JohnMC Posted September 18, 2018 Posted September 18, 2018 (edited) So I noticed the other day that haveibeenpwned.com provides a downloadable list of the password hashes they have so that you can more securely check a password, obviously if you use the their website to check a password you would then have to consider that password less secure compared to If you check it locally on your machine. The password list is about 21GB of SHA1 hashes along with what I believe is the count of breaches the password has been found in and then a line break, looking like this: 0ADE7C2CF97F75D009975F4D720D1FA6C19F4897:9 1B6453892473A467D07372D45EB05ABC2031647A:4 356A192B7913B04C54574D18C28D46E6395428AB:1 77DE68DAECD823BABBB58EDB1C8E14D7106E83BB:3 902BA3CDA1883801594B6E1B452790CC53948FDA:7 AC3478D69A3C81FA62E60F5C3696165A4E5E6AC4:5 B1D5781111D84F7B3FE45A0852E59758CD7A87E5:10 C1DFD96EEA8CC2B62785275BCA38AC261256E278:6 DA4B9237BACCCDF19C0760CAB7AEC4A8359010B0:2 FE5DBBCEA5CE7E2988B8C69BCFDFDE8904AABC1F:8 I was surprised I couldn't find a function to efficiently search a large file like this, general googling for how to best search a large file of ordered data turned up nothing ... and when i searched for methods to search the haveibeenpwned.com text file specifically all I got was a number of linear search methods which again surprised me. I'm probably just missing something and I'd love for anyone to share what they know of for doing this. So mostly for fun I've created a function that I believe you could say is just a basic binary search and it is extremely fast and would be even on a file that was 22TB... I ran in to difficulty related to the precision of moving the file pointer and how much data to read from the file at once but so far my solutions have worked and all my tests have returned perfect accuracy. I'd love if someone can give me feedback on it so I can make improvements and feel more confident in it's accuracy, I'm also trying to adapt it to work with other types of data, especially where the individual data points being searched through are variable length but i haven't figured that out yet. Test data (hashes of 1-10 ) can be copied from above but is also attached to the post (script expects testdata.txt). expandcollapse popup#NoTrayIcon #Region ;**** Directives created by AutoIt3Wrapper_GUI **** #AutoIt3Wrapper_Change2CUI=y #EndRegion ;**** Directives created by AutoIt3Wrapper_GUI **** #include <FileConstants.au3> #include <Crypt.au3> Opt("TrayIconHide", 1) ; This is a test script of a way to do a binary search on a large file, specificly made for the file of sha1 hashes (hash ordered) provided by https://haveibeenpwned.com/Passwords ; The loop below provides an input to enter a password that will then be hashed and passed to the main function _FileBinarySearch where it will check pwned.txt for the hash ; You can also leave the input blank to loop through a set of test data which by default will check testdata.txt $Message = "" While 1 $SearchText = InputBox("Pwned File Search",$Message&@CRLF&@CRLF&"What do you want to search for? (leave blank for test data)","") If @error Then Exitloop endif If $SearchText = "" Then ;loop through $TestSamples[] and do a search for each ConsoleWrite(@CRLF&"Testing") Local $TestSamples = ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "0", "11"] $sFilePath = "testdata.txt" ;Local $TestSamples = ["passwd1", "passwd1234", "123456", "1234567", "12345678", "safdsadg3423547fgjh45", "sddsj439j23hkjh3gh4KJH3", "sdfsdkjh38saklj1jh7jdh"] ;$sFilePath = "pwned.txt" For $i = 0 To UBound($TestSamples)-1 $Message = $Message & _FileBinarySearch($sFilePath, StringTrimLeft(_Crypt_HashData($TestSamples[$i], $CALG_SHA1),2)) & "," next Else ;use the provided text and do a search $Message = _FileBinarySearch("pwned.txt", StringTrimLeft(_Crypt_HashData($SearchText, $CALG_SHA1),2)) EndIf Wend Func _FileBinarySearch($sFilePath, $Search, $Delimiter = @CRLF) Local $Size = FileGetSize($sFilePath) Local $hFileOpen = FileOpen($sFilePath, $FO_READ) Local $RangeStart = 0 Local $RangeEnd = $Size Local $EstimatedSearchSize = StringLen($Search) + StringLen($Delimiter) + 4 Local $FileReadSize = $EstimatedSearchSize*2 Local $Iterations = 1 Local $IterationsLimit = Floor(Log($Size) / Log(2)) ;the maximum expected difficulty to search the given data Local $Return = False ;ConsoleWrite(@CRLF&@CRLF&"Search: "&$Search&" ReadSize: "&$FileReadSize&" IterationsLimit: "&$IterationsLimit) while 1 ;Get pointer position for middle of given range offset by our read size $PointerPos = Floor(($RangeEnd - $RangeStart) / 2 + $RangeStart - ($FileReadSize / 2)) If $PointerPos < 0 Then $PointerPos = 0 ;Read the file FileSetPos ($hFileOpen, $PointerPos, 0) $Data = FileRead ($hFileOpen,$FileReadSize) If @extended < $FileReadSize Then $Data = $Data&$Delimiter ;end of file? add delimiter to make it easy If $PointerPos = 0 Then $Data = $Delimiter&$Data ;start of file? add delimiter to make it easy ;ConsoleWrite(@CRLF&@CRLF&"Raw Read: "&StringReplace(StringReplace($Data,@LF,"@"),@CR,"%")) ;Clean up the data we read ; an issue will arise in other sets of data with variable length where the data read might not contain delimiters $DataStart = StringInStr($Data,@CRLF) + StringLen($Delimiter) $Data = StringMid($Data, $DataStart, StringInStr($Data, $Delimiter, 0, 2) - $DataStart) ;Update pointer pos with precise start point of the data we are comparing $PointerPos = $PointerPos + $DataStart ;ConsoleWrite(@CRLF&"Data: "&$Data&" Range: "&$RangeStart&"/"&$RangeEnd&" Pointer: "&$PointerPos&" Iterations: "&$Iterations) If StringInStr($Data, $Search) Then ;found the string string ConsoleWrite(@CRLF&"Match Found At "&$PointerPos) $Return = $PointerPos Exitloop ElseIf $RangeEnd - $RangeStart < $EstimatedSearchSize Then ;nothing left to search ConsoleWrite(@CRLF&"No Match") Exitloop ElseIf $Iterations > $IterationsLimit Then ;exceeded the expected difficulty ConsoleWrite(@CRLF&" Error With Too Many Iterations") SetError(1) Exitloop ElseIf $Data < $Search Then ;the found string is lesser ;ConsoleWrite(" Result: Under") $RangeStart = $PointerPos + StringLen($Data) ElseIf $Data > $Search Then ;the found string is greater ;ConsoleWrite(" Result: Over") $RangeEnd = $PointerPos EndIf $Iterations += 1 WEnd FileClose($hFileOpen) Return $Return EndFunc pwned.au3 testdata.txt Edited September 18, 2018 by JohnMC https://johnscs.com
Recommended Posts
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now