Jump to content

Find a "break" in time interval


Recommended Posts

hello people, all right?

I have a CSV file with data at sea level over time!
My problem is:
In some parts of these data, there are failures, or data that is missing, and I need to count and locate these parts!

Initially created a code that checks the difference between the times "neighbors" to find these flaws, however, it takes a long time, about 224 seconds to find the flaws ...
Because the data is sorted, I thought about doing something with recursion, but I could not ...
Did you manage to think of something that can make my code faster?

 

This is my code:

#include <Date.au3>


Local $a = _ReadCSV(@ScriptDir & "\rqh0281a.csv")
;Time difference between 2 points
Local $iTimeDiff = _DateDiff('n', $a[0], $a[1])
Local $n = 0


ConsoleWrite('started' & @LF & 'Time Difference = ' & $iTimeDiff & @LF & @LF)

$t = TimerInit()

For $i = 1 To UBound($a) - 1
    ;Compare the time difference with inicial difference.
    If _DateDiff('n', $a[$i - 1], $a[$i]) <> $iTimeDiff Then
        $n += 1
    EndIf
Next
$t = TimerDiff($t)

ConsoleWrite('End!' & @LF & 'Verification Time = ' & $t / 1000 & @LF & "errors = " & $n & @LF)

Func _ReadCSV($sCSVFile)
    Local $sRead = FileRead($sCSVFile)
    Local $aDados = StringRegExp($sRead, "(?m)((?:[-\d]+[, ;]+)+\d+)", 3)
    Local $n = UBound($aDados)
    Local $aReturn[$n]

    For $i = 0 To $n - 1
        $sStrip = StringRegExp($aDados[$i], "(\d+)", 3)
        If @error Then ContinueLoop
        $aReturn[$i] = $sStrip[0] & "/" & $sStrip[1] & "/" & $sStrip[2] & " " & $sStrip[3] & ":00:00"
    Next
    Return $aReturn
EndFunc   ;==>_ReadCSV

and the CSV file:
CSV File



Thank you very much!

Link to comment
Share on other sites

I'm not sure to understand, but maybe try this code :

#include <File.au3>
#include <Array.au3>

Local $aCSV = _CheckCSV(@ScriptDir & "\rqh0281a.csv")

Func _CheckCSV($sCSVFile)
    Local $aLines, $iTimeDiff, $iLastTime
    _FileReadToArray($sCSVFile, $aLines, $FRTA_NOCOUNT)
    If @error Then Return SetError(@error, 0, 0)

    Local $aRet[ UBound($aLines) ], $iCountValid = 0
    Local $aErrors[ UBound($aLines) ], $iCountError = 0

    For $i = 0 To UBound($aLines) - 1
        $aNumbers = StringRegExp($aLines[$i], "\d+", 3)
        If @error Or UBound($aNumbers) <> 5 Then
            $aErrors[$iCountError] = "Syntax error line " & $i
            $iCountError += 1
        Else
            If $i = 1 Then
                $iTimeDiff = ($aNumbers[3] - $iLastTime) * 60
            ElseIf $i > 1 Then
                If ($aNumbers[3] - $iLastTime) * 60 <> $iTimeDiff Then
                    $aErrors[$iCountError] = "Time diff error line " & $i
                    $iCountError += 1
                EndIf
            EndIf
            $aRet[$iCountValid] = $aNumbers[0] & "/" & $aNumbers[1] & "/" & $aNumbers[2] & " " & $aNumbers[3] & ":00:00"
            $iCountValid += 1
            $iLastTime = $aNumbers[3]
        EndIf
    Next
    ReDim $aRet[$iCountValid]
    ReDim $aErrors[$iCountError]

    _ArrayDisplay($aRet)
    _ArrayDisplay($aErrors)

EndFunc

 

Link to comment
Share on other sites

  • Moderators

darkshark,

This runs in about 10 secs on my machine and returns some errors:

#include <File.au3>

$nBegin = TimerInit()

Local $aLines
_FileReadToArray("rqh0281a.csv", $aLines)

$iLastHour = Number(StringRegExpReplace($aLines[1], "^.*,(\d+),\d{3,4}$", "$1"))

For $i = 2 To $aLines[0]
    $iExpectedHour = Mod($iLastHour + 1, 24)
    $iCurrHour = Number(StringRegExpReplace($aLines[$i], "^.*,(\d+),\d{3,4}$", "$1"))
    If $iCurrHour <> $iExpectedHour Then
        $aSplit = StringSplit($aLines[$i - 1], ",")
        $sLastDate = $aSplit[1] & "/" & $aSplit[2] & "/" & $aSplit[3] & " " & StringFormat("%02i", $aSplit[4])
        $aSplit = StringSplit($aLines[$i], ",")
        $sCurrDate = $aSplit[1] & "/" & $aSplit[2] & "/" & $aSplit[3] & " " & StringFormat("%02i", $aSplit[4])
        ConsoleWrite("Missing data at line " & $i & " between " & $sLastDate & " ~ " & $sCurrDate & @CRLF)
    EndIf
    $iLastHour = $iCurrHour
Next

ConsoleWrite(TimerDiff($nBegin) & @CRLF)
Missing data at line 394906 between 1999/4/9 11 ~ 1999/4/11 01
Missing data at line 395685 between 1999/5/13 11 ~ 1999/5/15 03
Missing data at line 451759 between 2005/10/30 12 ~ 2005/11/11 12
Missing data at line 452096 between 2005/11/25 12 ~ 2006/5/12 12
Execution time in ms: 10764.9902817937

M23

Edit: I have been testing further on a small extract (1000 lines) of the full file and it fails if there is exactly 25 hours of data missing. I am working on a further check for that.

Edited by Melba23

Public_Domain.png.2d871819fcb9957cf44f4514551a2935.png Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind

Open spoiler to see my UDFs:

Spoiler

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ----------Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area

 

Link to comment
Share on other sites

@darkshark,

One information is missing in your specifications: what is the maximum time laps between two successive entries which is not considered an error? Also Is there a minimum?

 

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

Yes, so?

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

 
 
 
 
 

Hello, everyone, first thank you for the answers!

Today in college I thought of something to streamline the code, tested and ... WOW. The code took just 0.24 seconds to find the mistakes (on my computer).

Thank @jguinch, the only problem is that as the @mikell, the code gives problem between 23:00 and 00:00 (I fixed the "bug" and worked correctly)

Thank you also @Melba23 for your code and @mikell
for joining the topic!

@jchd, I need the full series, so I need not have any errors, so any "broken" point can be considered a mistake!
There are gauges that measure each hour and more accurate measuring minute by minute!

In general, the only problem is that both codes are considering only the time, and sometimes there are gaps of days (the old tide gauges stopped for maintenance) and this can give problems!

I made a "kludge" in my code and he successfully found errors in a very short time!

I will leave here below my solution! (
considering that the data are sorted, I check longer intervals instead of looking one by one, so need to do fewer checks! )

#include <Date.au3>


Local $a = _ReadCSV(@Scriptdir & "\rqh0281a.csv")
Local $iTimeDiff = _DateDiff("n", $a[0], $a[1])
Local $n = 0


ConsoleWrite("started" & @LF & "Time Difference = " & $iTimeDiff & @LF & @LF)

;"Random" numbers to jump in array.
Local $jump[12] = [100000, 50000, 10000, 5000, 2000, 1000, 500, 100, 10, 5, 3, 2]
Local $jumped = False

$t = TimerInit()
For $i = 1 To UBound($a) - 1
    ;Verify if i can jump "n" rows comparing the dates and time.
    For $j = 0 To UBound($jump) - 1
        If $i + $jump[$j] <= UBound($a) And _DateDiff("n", $a[$i - 1], $a[$i + $jump[$j]]) = $iTimeDiff * ($jump[$j] + 1) Then
            $i += $jump[$j]
            $jumped = True
            ExitLoop
        EndIf
    Next
    If Not $jumped And _DateDiff("n", $a[$i - 1], $a[$i]) <> $iTimeDiff Then
        ConsoleWrite($a[$i - 1] & @LF & $a[$i] & @LF & @LF)
        $n += 1
    Else
        $jumped = false
    EndIf
Next
$t = TimerDiff($t)

ConsoleWrite("End!" & @LF & "Verification Time = " & $t / 1000 & @LF & "errors = " & $n & @LF)

Func _ReadCSV($sCSVFile)
    Local $sRead = FileRead($sCSVFile)
    Local $aDados = StringRegExp($sRead, "(?m)((?:[-\d]+[, ;]+)+\d+)", 3)
    Local $n = UBound($aDados)
    Local $aReturn[$n]

    For $i = 0 To $n - 1
        $sStrip = StringRegExp($aDados[$i], "(\d+)", 3)
        If @error Then ContinueLoop
        $aReturn[$i] = $sStrip[0] & "/" & $sStrip[1] & "/" & $sStrip[2] & " " & $sStrip[3] & ":00:00"
    Next
    Return $aReturn
EndFunc   ;==>_ReadCSV

 

Edited by darkshark
Fix the code!
Link to comment
Share on other sites

4 hours ago, mikell said:

So, in the csv the entries are systematically listed every hour. So the answer seems obvious. but maybe I misunderstood

Not so systematically since the OP feels the need to finds gaps in this series. So no it wasn't obvious, just a gamble or at least an assumption. Also assuming that the nominal measurement frequency is exactly the time difference between the first two entries needs to be validated by a human or by some other mean.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

@darkshark

I noticed your solution of post #9 is not finding the error at 2002/8/28 11:00:00 to 2002/9/11 12:00:00.

By inserting 20 into the $jump array the error at 2002/8/28 11:00:00 to 2002/9/11 12:00:00 is found , But the errors at 1999/5/13 11:00:00 to 1999/5/15 3:00:00 and 2005/11/25 12:00:00 to 2006/5/12 12:00:00 are missed.

Using:-  

Local $jump[20] = [100000, 50000, 30000, 20000, 10000, 5000, 3000, 2000, 1000, 500, 300, 200, 100, 50, 30, 20, 10, 5, 3, 2]

all 7 error were found, and, a slight reduction in running time.

Malkey

Link to comment
Share on other sites

3 hours ago, Malkey said:

@darkshark

I noticed your solution of post #9 is not finding the error at 2002/8/28 11:00:00 to 2002/9/11 12:00:00.

By inserting 20 into the $jump array the error at 2002/8/28 11:00:00 to 2002/9/11 12:00:00 is found , But the errors at 1999/5/13 11:00:00 to 1999/5/15 3:00:00 and 2005/11/25 12:00:00 to 2006/5/12 12:00:00 are missed.

Using:-  

Local $jump[20] = [100000, 50000, 30000, 20000, 10000, 5000, 3000, 2000, 1000, 500, 300, 200, 100, 50, 30, 20, 10, 5, 3, 2]

all 7 error were found, and, a slight reduction in running time.

Malkey

Thanks a lot for your reply!

i've found the bug:

 

In this part:

$jumped = Not $jumped

i replaced with this:

$jumped = false

and all works again!!

 

i will fix my code in my post!

 

Thanks again!

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...