darkshark Posted August 19, 2016

Hello people, how's everyone? I have a CSV file with sea-level data over time. My problem: in some parts of this data there are gaps, readings that are missing, and I need to count and locate them. I initially wrote code that checks the time difference between "neighboring" entries to find the gaps, but it takes a long time, about 224 seconds. Because the data is sorted, I thought about doing something with recursion, but I couldn't get it to work. Can you think of something that would make my code faster? This is my code:

#include <Date.au3>

Local $a = _ReadCSV(@ScriptDir & "\rqh0281a.csv")

; Time difference between the first 2 points
Local $iTimeDiff = _DateDiff('n', $a[0], $a[1])
Local $n = 0
ConsoleWrite('started' & @LF & 'Time Difference = ' & $iTimeDiff & @LF & @LF)

$t = TimerInit()
For $i = 1 To UBound($a) - 1
    ; Compare each time difference with the initial difference.
    If _DateDiff('n', $a[$i - 1], $a[$i]) <> $iTimeDiff Then
        $n += 1
    EndIf
Next
$t = TimerDiff($t)
ConsoleWrite('End!' & @LF & 'Verification Time = ' & $t / 1000 & @LF & "errors = " & $n & @LF)

Func _ReadCSV($sCSVFile)
    Local $sRead = FileRead($sCSVFile)
    Local $aDados = StringRegExp($sRead, "(?m)((?:[-\d]+[, ;]+)+\d+)", 3)
    Local $n = UBound($aDados)
    Local $aReturn[$n]
    For $i = 0 To $n - 1
        $sStrip = StringRegExp($aDados[$i], "(\d+)", 3)
        If @error Then ContinueLoop
        $aReturn[$i] = $sStrip[0] & "/" & $sStrip[1] & "/" & $sStrip[2] & " " & $sStrip[3] & ":00:00"
    Next
    Return $aReturn
EndFunc   ;==>_ReadCSV

And the CSV file: CSV File. Thank you very much!
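For readers not familiar with AutoIt, the neighbour-difference scan above amounts to the following minimal Python sketch (the sample data here is hypothetical; the real script parses the timestamps out of the CSV):

```python
from datetime import datetime, timedelta

def count_gaps(timestamps, step=timedelta(hours=1)):
    """Count the places where two consecutive readings are not exactly one step apart."""
    gaps = 0
    for prev, curr in zip(timestamps, timestamps[1:]):
        if curr - prev != step:
            gaps += 1
    return gaps

# Three hourly readings, then a jump to 01:00 two days later: one gap.
readings = [datetime(1999, 4, 9, 9), datetime(1999, 4, 9, 10),
            datetime(1999, 4, 9, 11), datetime(1999, 4, 11, 1)]
print(count_gaps(readings))  # 1
```

This is O(n) in comparisons; the cost in the AutoIt version comes from calling _DateDiff on every pair.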
jguinch Posted August 19, 2016

I'm not sure I understand, but maybe try this code:

#include <File.au3>
#include <Array.au3>

Local $aCSV = _CheckCSV(@ScriptDir & "\rqh0281a.csv")

Func _CheckCSV($sCSVFile)
    Local $aLines, $iTimeDiff, $iLastTime
    _FileReadToArray($sCSVFile, $aLines, $FRTA_NOCOUNT)
    If @error Then Return SetError(@error, 0, 0)

    Local $aRet[UBound($aLines)], $iCountValid = 0
    Local $aErrors[UBound($aLines)], $iCountError = 0

    For $i = 0 To UBound($aLines) - 1
        $aNumbers = StringRegExp($aLines[$i], "\d+", 3)
        If @error Or UBound($aNumbers) <> 5 Then
            $aErrors[$iCountError] = "Syntax error line " & $i
            $iCountError += 1
        Else
            If $i = 1 Then
                $iTimeDiff = ($aNumbers[3] - $iLastTime) * 60
            ElseIf $i > 1 Then
                If ($aNumbers[3] - $iLastTime) * 60 <> $iTimeDiff Then
                    $aErrors[$iCountError] = "Time diff error line " & $i
                    $iCountError += 1
                EndIf
            EndIf
            $aRet[$iCountValid] = $aNumbers[0] & "/" & $aNumbers[1] & "/" & $aNumbers[2] & " " & $aNumbers[3] & ":00:00"
            $iCountValid += 1
            $iLastTime = $aNumbers[3]
        EndIf
    Next

    ReDim $aRet[$iCountValid]
    ReDim $aErrors[$iCountError]
    _ArrayDisplay($aRet)
    _ArrayDisplay($aErrors)
EndFunc   ;==>_CheckCSV
mikell Posted August 19, 2016

jguinch, your code fires an error each time the hour changes from 23:00 to 0:00.
Melba23 (Moderator) Posted August 19, 2016 (edited)

darkshark,

This runs in about 10 secs on my machine and returns some errors:

#include <File.au3>

$nBegin = TimerInit()

Local $aLines
_FileReadToArray("rqh0281a.csv", $aLines)

$iLastHour = Number(StringRegExpReplace($aLines[1], "^.*,(\d+),\d{3,4}$", "$1"))
For $i = 2 To $aLines[0]
    $iExpectedHour = Mod($iLastHour + 1, 24)
    $iCurrHour = Number(StringRegExpReplace($aLines[$i], "^.*,(\d+),\d{3,4}$", "$1"))
    If $iCurrHour <> $iExpectedHour Then
        $aSplit = StringSplit($aLines[$i - 1], ",")
        $sLastDate = $aSplit[1] & "/" & $aSplit[2] & "/" & $aSplit[3] & " " & StringFormat("%02i", $aSplit[4])
        $aSplit = StringSplit($aLines[$i], ",")
        $sCurrDate = $aSplit[1] & "/" & $aSplit[2] & "/" & $aSplit[3] & " " & StringFormat("%02i", $aSplit[4])
        ConsoleWrite("Missing data at line " & $i & " between " & $sLastDate & " ~ " & $sCurrDate & @CRLF)
    EndIf
    $iLastHour = $iCurrHour
Next
ConsoleWrite(TimerDiff($nBegin) & @CRLF)

Output:

Missing data at line 394906 between 1999/4/9 11 ~ 1999/4/11 01
Missing data at line 395685 between 1999/5/13 11 ~ 1999/5/15 03
Missing data at line 451759 between 2005/10/30 12 ~ 2005/11/11 12
Missing data at line 452096 between 2005/11/25 12 ~ 2006/5/12 12
Execution time in ms: 10764.9902817937

M23

Edit: I have been testing further on a small extract (1000 lines) of the full file, and it fails if there is exactly 25 hours of data missing. I am working on a further check for that.
Edited August 19, 2016 by Melba23
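The hour-of-day comparison Melba23 uses can be sketched in Python like this (hypothetical data, not the AutoIt original). Because the check reduces everything modulo 24, a gap of 25 hours (or any 24k+1 hours) between successive rows advances the hour column by exactly one and slips through, which matches the failure noted in the edit:

```python
def hour_gaps(hours):
    """Return indices whose hour-of-day is not (previous hour + 1) mod 24."""
    bad = []
    for i in range(1, len(hours)):
        if hours[i] != (hours[i - 1] + 1) % 24:
            bad.append(i)
    return bad

print(hour_gaps([22, 23, 0, 1, 5, 6]))  # [4]: jump from 01:00 to 05:00
# If the step from hour 12 to hour 13 actually spans 25 hours (a whole day
# is missing), the hour column still looks regular and nothing is flagged:
print(hour_gaps([11, 12, 13]))  # []
```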
jchd Posted August 19, 2016

@darkshark, one piece of information is missing from your specification: what is the maximum time lapse between two successive entries that is not considered an error? And is there a minimum?
mikell Posted August 19, 2016

jc, did you take a look at the csv?
jchd Posted August 19, 2016

Yes, so?
mikell Posted August 19, 2016

So, in the csv the entries are systematically listed every hour, so the answer seems obvious. But maybe I misunderstood.
darkshark (Author) Posted August 19, 2016 (edited)

Hello everyone, and first of all thank you for the answers!

Today at college I thought of a way to speed up the code, tested it and... WOW. It took just 0.24 seconds to find the errors (on my computer).

Thank you @jguinch; the only problem is that, as @mikell noted, the code misbehaves between 23:00 and 00:00 (I fixed that "bug" and it then worked correctly). Thank you also @Melba23 for your code and @mikell for joining the topic!

@jchd, I need the full series, so it must not contain any gaps; any "broken" point can be considered an error. There are gauges that measure every hour, and more accurate ones that measure minute by minute. In general, the only problem is that both codes consider only the hour, and sometimes there are gaps of days (the old tide gauges were stopped for maintenance), which can cause problems.

I made a "kludge" in my code and it found the errors in a very short time. My solution is below. (Since the data is sorted, I check longer intervals instead of comparing entries one by one, so fewer checks are needed.)

#include <Date.au3>

Local $a = _ReadCSV(@ScriptDir & "\rqh0281a.csv")

Local $iTimeDiff = _DateDiff("n", $a[0], $a[1])
Local $n = 0
ConsoleWrite("started" & @LF & "Time Difference = " & $iTimeDiff & @LF & @LF)

; "Random" numbers of rows to try to jump over in the array.
Local $jump[12] = [100000, 50000, 10000, 5000, 2000, 1000, 500, 100, 10, 5, 3, 2]
Local $jumped = False

$t = TimerInit()
For $i = 1 To UBound($a) - 1
    ; Check whether we can jump "n" rows by comparing the dates and times.
    For $j = 0 To UBound($jump) - 1
        ; Bounds check: the last valid index is UBound($a) - 1.
        If $i + $jump[$j] <= UBound($a) - 1 And _DateDiff("n", $a[$i - 1], $a[$i + $jump[$j]]) = $iTimeDiff * ($jump[$j] + 1) Then
            $i += $jump[$j]
            $jumped = True
            ExitLoop
        EndIf
    Next
    If Not $jumped And _DateDiff("n", $a[$i - 1], $a[$i]) <> $iTimeDiff Then
        ConsoleWrite($a[$i - 1] & @LF & $a[$i] & @LF & @LF)
        $n += 1
    Else
        $jumped = False
    EndIf
Next
$t = TimerDiff($t)
ConsoleWrite("End!" & @LF & "Verification Time = " & $t / 1000 & @LF & "errors = " & $n & @LF)

Func _ReadCSV($sCSVFile)
    Local $sRead = FileRead($sCSVFile)
    Local $aDados = StringRegExp($sRead, "(?m)((?:[-\d]+[, ;]+)+\d+)", 3)
    Local $n = UBound($aDados)
    Local $aReturn[$n]
    For $i = 0 To $n - 1
        $sStrip = StringRegExp($aDados[$i], "(\d+)", 3)
        If @error Then ContinueLoop
        $aReturn[$i] = $sStrip[0] & "/" & $sStrip[1] & "/" & $sStrip[2] & " " & $sStrip[3] & ":00:00"
    Next
    Return $aReturn
EndFunc   ;==>_ReadCSV

Edited August 20, 2016 by darkshark: fix the code!
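The jump idea translates to Python roughly as follows. This is a sketch under the same assumption the AutoIt code makes: the data is sorted with a fixed nominal step, so if the span across j+1 consecutive rows equals exactly j+1 steps, the whole block can be skipped in one move; otherwise it falls back to the neighbour check. The `i + j < len(ts)` guard keeps the lookahead in bounds.

```python
from datetime import datetime, timedelta

# "Random" jump sizes, largest first, mirroring the $jump array above.
JUMPS = [100000, 50000, 10000, 5000, 2000, 1000, 500, 100, 10, 5, 3, 2]

def count_gaps_with_jumps(ts, step=timedelta(hours=1)):
    gaps = 0
    i = 1
    while i < len(ts):
        for j in JUMPS:
            # If the span over the next j+1 intervals is exactly regular,
            # skip that whole block in one move.
            if i + j < len(ts) and ts[i + j] - ts[i - 1] == step * (j + 1):
                i += j
                break
        else:
            # No jump applied: fall back to the neighbour check.
            if ts[i] - ts[i - 1] != step:
                gaps += 1
        i += 1
    return gaps

# 50 hourly readings with one reading removed: exactly one gap.
ts = [datetime(2000, 1, 1) + timedelta(hours=h) for h in range(50) if h != 20]
print(count_gaps_with_jumps(ts))  # 1
```

The speedup comes from replacing most per-row comparisons with one comparison per skipped block; the worst case (many gaps) degrades back toward the linear scan.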
mikell Posted August 19, 2016

Clever
jchd Posted August 19, 2016

4 hours ago, mikell said: "So, in the csv the entries are systematically listed every hour. So the answer seems obvious. but maybe I misunderstood"

Not so systematically, since the OP feels the need to find gaps in this series. So no, it wasn't obvious, just a gamble, or at least an assumption. Also, assuming that the nominal measurement frequency is exactly the time difference between the first two entries needs to be validated by a human or by some other means.
Malkey Posted August 20, 2016

@darkshark I noticed that your solution in post #9 is not finding the error from 2002/8/28 11:00:00 to 2002/9/11 12:00:00. By inserting 20 into the $jump array, that error is found, but the errors from 1999/5/13 11:00:00 to 1999/5/15 3:00:00 and from 2005/11/25 12:00:00 to 2006/5/12 12:00:00 are missed. Using

Local $jump[20] = [100000, 50000, 30000, 20000, 10000, 5000, 3000, 2000, 1000, 500, 300, 200, 100, 50, 30, 20, 10, 5, 3, 2]

all 7 errors were found, with a slight reduction in running time.

Malkey
darkshark (Author) Posted August 20, 2016

Thanks a lot for your reply, @Malkey! I've found the bug: in this part I replaced

$jumped = Not $jumped

with

$jumped = False

and everything works again! I will fix the code in my post. Thanks again!