# Find a "break" in time interval

## Recommended Posts

hello people, all right?

I have a CSV file with data at sea level over time!
My problem is:
In some parts of these data, there are failures, or data that is missing, and I need to count and locate these parts!

Initially created a code that checks the difference between the times "neighbors" to find these flaws, however, it takes a long time, about 224 seconds to find the flaws ...
Because the data is sorted, I thought about doing something with recursion, but I could not ...
Did you manage to think of something that can make my code faster?

This is my code:

```#include <Date.au3>

Local \$a = _ReadCSV(@ScriptDir & "\rqh0281a.csv")
;Time difference between 2 points
Local \$iTimeDiff = _DateDiff('n', \$a[0], \$a[1])
Local \$n = 0

ConsoleWrite('started' & @LF & 'Time Difference = ' & \$iTimeDiff & @LF & @LF)

\$t = TimerInit()

For \$i = 1 To UBound(\$a) - 1
;Compare the time difference with inicial difference.
If _DateDiff('n', \$a[\$i - 1], \$a[\$i]) <> \$iTimeDiff Then
\$n += 1
EndIf
Next
\$t = TimerDiff(\$t)

ConsoleWrite('End!' & @LF & 'Verification Time = ' & \$t / 1000 & @LF & "errors = " & \$n & @LF)

Local \$aReturn[\$n]

For \$i = 0 To \$n - 1
If @error Then ContinueLoop
\$aReturn[\$i] = \$sStrip[0] & "/" & \$sStrip[1] & "/" & \$sStrip[2] & " " & \$sStrip[3] & ":00:00"
Next
Return \$aReturn

and the CSV file:
CSV File

Thank you very much!

##### Share on other sites

I'm not sure to understand, but maybe try this code :

```#include <File.au3>
#include <Array.au3>

Local \$aCSV = _CheckCSV(@ScriptDir & "\rqh0281a.csv")

Func _CheckCSV(\$sCSVFile)
Local \$aLines, \$iTimeDiff, \$iLastTime
If @error Then Return SetError(@error, 0, 0)

Local \$aRet[ UBound(\$aLines) ], \$iCountValid = 0
Local \$aErrors[ UBound(\$aLines) ], \$iCountError = 0

For \$i = 0 To UBound(\$aLines) - 1
\$aNumbers = StringRegExp(\$aLines[\$i], "\d+", 3)
If @error Or UBound(\$aNumbers) <> 5 Then
\$aErrors[\$iCountError] = "Syntax error line " & \$i
\$iCountError += 1
Else
If \$i = 1 Then
\$iTimeDiff = (\$aNumbers[3] - \$iLastTime) * 60
ElseIf \$i > 1 Then
If (\$aNumbers[3] - \$iLastTime) * 60 <> \$iTimeDiff Then
\$aErrors[\$iCountError] = "Time diff error line " & \$i
\$iCountError += 1
EndIf
EndIf
\$aRet[\$iCountValid] = \$aNumbers[0] & "/" & \$aNumbers[1] & "/" & \$aNumbers[2] & " " & \$aNumbers[3] & ":00:00"
\$iCountValid += 1
\$iLastTime = \$aNumbers[3]
EndIf
Next
ReDim \$aRet[\$iCountValid]
ReDim \$aErrors[\$iCountError]

_ArrayDisplay(\$aRet)
_ArrayDisplay(\$aErrors)

EndFunc```

##### Share on other sites

jguinch,
Your code fires an error each time the hour changes from  23:00  to  0:00

##### Share on other sites

darkshark,

This runs in about 10 secs on my machine and returns some errors:

```#include <File.au3>

\$nBegin = TimerInit()

Local \$aLines

\$iLastHour = Number(StringRegExpReplace(\$aLines[1], "^.*,(\d+),\d{3,4}\$", "\$1"))

For \$i = 2 To \$aLines[0]
\$iExpectedHour = Mod(\$iLastHour + 1, 24)
\$iCurrHour = Number(StringRegExpReplace(\$aLines[\$i], "^.*,(\d+),\d{3,4}\$", "\$1"))
If \$iCurrHour <> \$iExpectedHour Then
\$aSplit = StringSplit(\$aLines[\$i - 1], ",")
\$sLastDate = \$aSplit[1] & "/" & \$aSplit[2] & "/" & \$aSplit[3] & " " & StringFormat("%02i", \$aSplit[4])
\$aSplit = StringSplit(\$aLines[\$i], ",")
\$sCurrDate = \$aSplit[1] & "/" & \$aSplit[2] & "/" & \$aSplit[3] & " " & StringFormat("%02i", \$aSplit[4])
ConsoleWrite("Missing data at line " & \$i & " between " & \$sLastDate & " ~ " & \$sCurrDate & @CRLF)
EndIf
\$iLastHour = \$iCurrHour
Next

ConsoleWrite(TimerDiff(\$nBegin) & @CRLF)```
```Missing data at line 394906 between 1999/4/9 11 ~ 1999/4/11 01
Missing data at line 395685 between 1999/5/13 11 ~ 1999/5/15 03
Missing data at line 451759 between 2005/10/30 12 ~ 2005/11/11 12
Missing data at line 452096 between 2005/11/25 12 ~ 2006/5/12 12
Execution time in ms: 10764.9902817937```

M23

Edit: I have been testing further on a small extract (1000 lines) of the full file and it fails if there is exactly 25 hours of data missing. I am working on a further check for that.

Edited by Melba23

Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind._______My UDFs:

Spoiler

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ----------Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area

##### Share on other sites

One information is missing in your specifications: what is the maximum time laps between two successive entries which is not considered an error? Also Is there a minimum?

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

##### Share on other sites

jc, did you take a look at the csv ?

##### Share on other sites

Yes, so?

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

##### Share on other sites

So, in the csv the entries are systematically listed every hour. So the answer seems obvious. but maybe I misunderstood

##### Share on other sites

Hello, everyone, first thank you for the answers!

Today in college I thought of something to streamline the code, tested and ... WOW. The code took just 0.24 seconds to find the mistakes (on my computer).

Thank @jguinch, the only problem is that as the @mikell, the code gives problem between 23:00 and 00:00 (I fixed the "bug" and worked correctly)

Thank you also @Melba23 for your code and @mikell
for joining the topic!

I need the full series, so I need not have any errors, so any "broken" point can be considered a mistake!
There are gauges that measure each hour and more accurate measuring minute by minute!

In general, the only problem is that both codes are considering only the time, and sometimes there are gaps of days (the old tide gauges stopped for maintenance) and this can give problems!

I made a "kludge" in my code and he successfully found errors in a very short time!

I will leave here below my solution! (
considering that the data are sorted, I check longer intervals instead of looking one by one, so need to do fewer checks! )

```#include <Date.au3>

Local \$a = _ReadCSV(@Scriptdir & "\rqh0281a.csv")
Local \$iTimeDiff = _DateDiff("n", \$a[0], \$a[1])
Local \$n = 0

ConsoleWrite("started" & @LF & "Time Difference = " & \$iTimeDiff & @LF & @LF)

;"Random" numbers to jump in array.
Local \$jump[12] = [100000, 50000, 10000, 5000, 2000, 1000, 500, 100, 10, 5, 3, 2]
Local \$jumped = False

\$t = TimerInit()
For \$i = 1 To UBound(\$a) - 1
;Verify if i can jump "n" rows comparing the dates and time.
For \$j = 0 To UBound(\$jump) - 1
If \$i + \$jump[\$j] <= UBound(\$a) And _DateDiff("n", \$a[\$i - 1], \$a[\$i + \$jump[\$j]]) = \$iTimeDiff * (\$jump[\$j] + 1) Then
\$i += \$jump[\$j]
\$jumped = True
ExitLoop
EndIf
Next
If Not \$jumped And _DateDiff("n", \$a[\$i - 1], \$a[\$i]) <> \$iTimeDiff Then
ConsoleWrite(\$a[\$i - 1] & @LF & \$a[\$i] & @LF & @LF)
\$n += 1
Else
\$jumped = false
EndIf
Next
\$t = TimerDiff(\$t)

ConsoleWrite("End!" & @LF & "Verification Time = " & \$t / 1000 & @LF & "errors = " & \$n & @LF)

Local \$aReturn[\$n]

For \$i = 0 To \$n - 1
If @error Then ContinueLoop
\$aReturn[\$i] = \$sStrip[0] & "/" & \$sStrip[1] & "/" & \$sStrip[2] & " " & \$sStrip[3] & ":00:00"
Next
Return \$aReturn

Edited by darkshark
Fix the code!

Clever

##### Share on other sites
4 hours ago, mikell said:

So, in the csv the entries are systematically listed every hour. So the answer seems obvious. but maybe I misunderstood

Not so systematically since the OP feels the need to finds gaps in this series. So no it wasn't obvious, just a gamble or at least an assumption. Also assuming that the nominal measurement frequency is exactly the time difference between the first two entries needs to be validated by a human or by some other mean.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

##### Share on other sites

@darkshark

I noticed your solution of post #9 is not finding the error at 2002/8/28 11:00:00 to 2002/9/11 12:00:00.

By inserting 20 into the \$jump array the error at 2002/8/28 11:00:00 to 2002/9/11 12:00:00 is found , But the errors at 1999/5/13 11:00:00 to 1999/5/15 3:00:00 and 2005/11/25 12:00:00 to 2006/5/12 12:00:00 are missed.

Using:-

`Local \$jump[20] = [100000, 50000, 30000, 20000, 10000, 5000, 3000, 2000, 1000, 500, 300, 200, 100, 50, 30, 20, 10, 5, 3, 2]`

all 7 error were found, and, a slight reduction in running time.

Malkey

##### Share on other sites
3 hours ago, Malkey said:

@darkshark

I noticed your solution of post #9 is not finding the error at 2002/8/28 11:00:00 to 2002/9/11 12:00:00.

By inserting 20 into the \$jump array the error at 2002/8/28 11:00:00 to 2002/9/11 12:00:00 is found , But the errors at 1999/5/13 11:00:00 to 1999/5/15 3:00:00 and 2005/11/25 12:00:00 to 2006/5/12 12:00:00 are missed.

Using:-

`Local \$jump[20] = [100000, 50000, 30000, 20000, 10000, 5000, 3000, 2000, 1000, 500, 300, 200, 100, 50, 30, 20, 10, 5, 3, 2]`

all 7 error were found, and, a slight reduction in running time.

Malkey

i've found the bug:

In this part:

`\$jumped = Not \$jumped`

i replaced with this:

`\$jumped = false`

and all works again!!

i will fix my code in my post!

Thanks again!

## Create an account or sign in to comment

You need to be a member in order to leave a comment

## Create an account

Sign up for a new account in our community. It's easy!

Register a new account

×

• Wiki

• Back

• Git

## Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.