ScottW
Posted February 15, 2019

So, I'm working on a compare tool that will compare the output of a route in the legacy HL7 engine with the output of the new HL7 engine. The idea is to run a month's worth of messages into the test route and capture the output in a file, then compare that file to the current live output. This will tell me whether all my coding and filtering has been done correctly.

We have UltraEdit, but an HL7 message consists of several lines, and the files consist of 500-500,000 HL7 messages each. We don't want lines to be compared, we want messages.

An HL7 message looks like this. Imagine a file with 100,000 of them stacked on top of each other to get an idea of what I'm working with.

MSH|^~\&|EPIC|EPICADT|SMS|SMSADT|199912271408|CHARRIS|ADT^A04|1817457|D|2.5|
PID||0493575^^^2^ID 1|454721||DOE^JOHN^^^^|DOE^JOHN^^^^|19480203|M||B|254 MYSTREET AVE^^MYTOWN^OH^44123^USA||(216)123-4567|||M|NON|400003403~1129086|
NK1||ROE^MARIE^^^^|SPO||(216)123-4567||EC|||||||||||||||||||||||||||
PV1||O|168 ~219~C~PMA^^^^^^^^^||||277^ALLEN MYLASTNAME^BONNIE^^^^|||||||||| ||2688684|||||||||||||||||||||||||199912271408||||||002376853

Each segment is prefixed with a segment type, each field in the segment is split with a pipe, each sub-field with a caret, and sub-sub-fields with an ampersand.

I started this project by reading each file into an array, splitting on MSH|^~\&. This gave me 2 arrays to compare.
I ran the compare using this snippet, modeled off a post I found here:

_ArraySort($AnchorArray, 0, 1, 0, 0, 1)

; comparing the 2 arrays
For $i = 0 To UBound($TestingArray) - 1
    $Index = _ArrayBinarySearch($AnchorArray, $TestingArray[$i], 1)
    GUICtrlSetData($LabelStatus, "Status: comparing record" & $i)
    ; add equal rows to a string
    If $Index <> -1 Then
        $delString1 &= ";" & $Index
        $delString2 &= ";" & $i
    EndIf
Next

; removing the equal rows from the arrays
_ArrayDelete($AnchorArray, $delString1)
_ArrayDelete($TestingArray, $delString2)

; writing the result to files
_FileWriteFromArray("migrationcompare\" & $FnMIA, $TestingArray)
_FileWriteFromArray("migrationcompare\" & $FnMIT, $AnchorArray)

and removed the matching messages, leaving me with 2 files: (Missing in Anchor) and (Missing in Test).

Now these "missing" messages can either actually be missing or be mapped wrong: they may exist, but some part of the string may be different. So I took it a step further. I run each of these files through another step and create 2 files with just the MSH.10 field (1817457 in the sample above). An MSH.10 field SHOULD be a unique ID for each message. I then compare these 2 MSH.10 files to see which MSH.10 values match between (Missing in Anchor) and (Missing in Test). That tells me which messages exist in both the anchor (legacy) outbound and the test (new) outbound but don't match. These I have to correct with my mapper logic.

I also create 2 other files, one for messages that got past my filters and shouldn't have, and one for messages that didn't get across and should have. I use these to work on my filtering logic.

The only thing is that these 3 final files are just MSH.10 values. I then have to search my files for the actual record, pull it out of the legacy and the test files, and do a compare with Notepad++ or UltraEdit to see what exactly doesn't match. What I'd like is for my final 3 files to contain not just the MSH.10 value but the actual messages, so I can skip the step of finding them.
So my thoughts are: look at the 2 'Missing' files that have the full HL7 records in them, search them for the matching MSH.10 values from above, and create 2 new missing files that contain only the messages with matching MSH.10 values. That should work, but a string-in-string search will take too long. So my thought is to prefix each message with the MSH.10 value and then compare on only the first X characters.

But am I wasting my time and CPU? Can I 'quickly' compare the MSH.10 value of each file and sort them out without first extracting the MSH.10 values into their own files?

This seems quicker, but how do I use it with just the MSH.10 value? I'm fumbling with this right now.

Local $a = $AnchorArray
Local $b = $TestingArray
Local $sda = ObjCreate("Scripting.Dictionary")
Local $sdb = ObjCreate("Scripting.Dictionary")
Local $sdc = ObjCreate("Scripting.Dictionary")

For $i In $a
    $sda.Item($i)
Next
For $i In $b
    $sdb.Item($i)
Next
For $i In $a
    If $sdb.Exists($i) Then $sdc.Item($i)
Next
$asd3 = $sdc.Keys()
For $i In $asd3
    If $sda.Exists($i) Then $sda.Remove($i)
    If $sdb.Exists($i) Then $sdb.Remove($i)
Next
$asd1 = $sda.Keys()
$asd2 = $sdb.Keys()

_ArrayDisplay($asd1, "$asd1")
_ArrayDisplay($asd2, "$asd2")
_ArrayDisplay($asd3, "$asd3")

; writing the result to files
_FileWriteFromArray("migrationcompare\new_" & $FnMIA, $asd2)
_FileWriteFromArray("migrationcompare\new_" & $FnMIT, $asd1)

(also found here somewhere)

Ideas? Suggestions? I can post the entire source code if needed.

Thanks,
Scott
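On the question of comparing by MSH.10 without first extracting the values into their own files: one possible approach, sketched here under assumptions rather than taken from the thread, is to key a Scripting.Dictionary by MSH.10 and store the full message as the value. `_GetMSH10` and `_IndexByMSH10` are hypothetical helper names, the sample messages are made up from the example in the post, and every message is assumed to start with MSH|^~\& and to carry a unique MSH.10.

```autoit
; Hypothetical helper: return the MSH.10 field of a message that starts
; with "MSH|^~\&|". In the full message, MSH.10 sits between the 9th
; and 10th pipe.
Func _GetMSH10($sMessage)
    Local $iStart = StringInStr($sMessage, "|", 0, 9)
    Local $iEnd = StringInStr($sMessage, "|", 0, 10)
    If $iStart = 0 Or $iEnd = 0 Then Return SetError(1, 0, "")
    Return StringMid($sMessage, $iStart + 1, $iEnd - $iStart - 1)
EndFunc   ;==>_GetMSH10

; Hypothetical helper: build a dictionary of MSH.10 => full message.
; If two messages share an MSH.10, the later one overwrites the earlier.
Func _IndexByMSH10(ByRef $aMessages)
    Local $oDict = ObjCreate("Scripting.Dictionary")
    Local $sKey
    For $i = 0 To UBound($aMessages) - 1
        $sKey = _GetMSH10($aMessages[$i])
        If @error Then ContinueLoop ; skip entries without a complete MSH header
        $oDict.Item($sKey) = $aMessages[$i]
    Next
    Return $oDict
EndFunc   ;==>_IndexByMSH10

; Made-up sample data based on the message shown in the post.
Local $aAnchor[2] = ["MSH|^~\&|EPIC|EPICADT|SMS|SMSADT|199912271408|CHARRIS|ADT^A04|1817457|D|2.5|", _
        "MSH|^~\&|EPIC|EPICADT|SMS|SMSADT|199912271409|CHARRIS|ADT^A04|1817458|D|2.5|"]
Local $aTest[1] = ["MSH|^~\&|EPIC|EPICADT|SMS|SMSADT|199912271408|CHARRIS|ADT^A08|1817457|D|2.5|"]

Local $oAnchor = _IndexByMSH10($aAnchor)
Local $oTest = _IndexByMSH10($aTest)

Local $aKeys = $oAnchor.Keys()
For $sKey In $aKeys
    If Not $oTest.Exists($sKey) Then
        ; MSH.10 only on the anchor side: a filtering difference
        ConsoleWrite("Missing in test: " & $oAnchor.Item($sKey) & @CRLF)
    ElseIf Not ($oAnchor.Item($sKey) == $oTest.Item($sKey)) Then
        ; same MSH.10 but different content: a candidate mapper error
        ConsoleWrite("Mismatch, anchor: " & $oAnchor.Item($sKey) & @CRLF)
        ConsoleWrite("Mismatch, test:   " & $oTest.Item($sKey) & @CRLF)
    EndIf
Next
```

Because the full message travels with its key, the output can hold the actual records instead of bare MSH.10 values. The `==` operator is used so the comparison is case-sensitive; running the same loop over `$oTest.Keys()` would give the other direction (messages missing in the anchor).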
Earthshine
Posted February 15, 2019 (edited)

Google's DiffMatchPatch library is extremely fast at comparing plain text files. Search the forum for running C# code in AutoIt. It's pretty cool stuff. I am not so sure this will work if your stuff is not plain text.

Edited February 15, 2019 by Earthshine
ScottW (Author)
Posted February 18, 2019

Thanks for the suggestion, I'll file that under the to-learn tab in my to-do list. I don't think it'll work very well for what I am working on. Here's what I finally came up with. I like it. It works!

It takes 2 large collections of messages and splits them into arrays at the beginning of each message. It compares the 2 files and deletes the exact matches, which leaves me 2 files without any exact matches. Then it looks at these 2 files and matches any messages that have the same MSH.10 value, and writes these matches out to 2 files (sorted) so they can be compared line by line later to see where the mapper errors are. Finally it writes out two more files with the extra data from each of the 2 sides of the compare. Now to just rework my front end and plug this in.

Here's my code. I'm sure there's some room for improvement, so fire away.

#include <Array.au3>
#include <File.au3>

$finalMSH10 = ""
$finalMSH10_test = ""
$Test_matches = ""
$Test_nonmatch = ""
$Anchor_matches = ""
$Anchor_nonmatches = ""

;--------------------------creates anchor array and anchor MSH.10 array---------------------------------------
;Local $Array_anchor = StringSplit(FileRead("C:\AutoIT\RewriteHL7\VarianADT2019-02-08-10-30-55-355 (2).txt"), "MSH|^~\&", 1)
Local $Array_anchor = StringSplit(FileRead("C:\AutoIT\RewriteHL7\Range_Anchor_20190129.txt"), "MSH|^~\&", 1)
; NOTE: only the count in [0] is removed here; the text before the first header (usually empty) stays in as element 0
_ArrayDelete($Array_anchor, '0')
For $ACount = 0 To UBound($Array_anchor) - 1
    ; the header was stripped by the split, so MSH.10 sits between the 8th and 9th pipe
    Local $start = StringInStr($Array_anchor[$ACount], "|", 0, 8, 1)
    Local $end = StringInStr($Array_anchor[$ACount], "|", 0, 9, 1)
    Local $MSH10_Length = $end - $start
    Local $MSH10 = StringMid($Array_anchor[$ACount], $start + 1, $MSH10_Length - 1)
    ;GUICtrlSetData($LabelStatus, "Status: Pulling MSH from Anchor-" & $a)
    $finalMSH10 = $finalMSH10 & $MSH10 & @CRLF
    $Array_anchor[$ACount] = "MSH|^~\&" & $Array_anchor[$ACount] ;put MSH and control characters back on
Next

;build the MSH.10 only array for compare later
$Array_MSH10_Anchor = StringSplit($finalMSH10, @CRLF, 1)
_ArrayDelete($Array_MSH10_Anchor, '0-1')
;_ArrayDisplay($Array_MSH10_Anchor)

;------------------------Creates Test Array and test MSH.10 array----------------------------------------
;Local $Array_test = StringSplit(FileRead("C:\AutoIT\RewriteHL7\Range_Anchor_20190129.txt"), "MSH|^~\&", 1)
Local $Array_test = StringSplit(FileRead("C:\AutoIT\RewriteHL7\VarianADT2019-02-08-10-30-55-355 (2).txt"), "MSH|^~\&", 1)
For $TCount = 0 To UBound($Array_test) - 1
    Local $start = StringInStr($Array_test[$TCount], "|", 0, 8, 1)
    Local $end = StringInStr($Array_test[$TCount], "|", 0, 9, 1)
    Local $MSH10_Length = $end - $start
    Local $MSH10_test = StringMid($Array_test[$TCount], $start + 1, $MSH10_Length - 1)
    ;GUICtrlSetData($LabelStatus, "Status: Pulling MSH from Test-" & $a)
    $finalMSH10_test = $finalMSH10_test & $MSH10_test & @CRLF
    $Array_test[$TCount] = "MSH|^~\&" & $Array_test[$TCount] ;put MSH and control characters back on
Next
$Array_MSH10_Test = StringSplit($finalMSH10_test, @CRLF, 1)
; here the count and the text before the first header are both removed, after the loop
_ArrayDelete($Array_test, '0-1')

;-----------create dictionary objects----------------------
Local $Object_String_Anchor = ObjCreate("Scripting.Dictionary") ;Holds Anchor Array
Local $Object_String_A_MSH10 = ObjCreate("Scripting.Dictionary") ;Holds Anchor MSH.10 values
Local $Object_String_T_MSH10 = ObjCreate("Scripting.Dictionary") ;Holds Test MSH.10 values
Local $Object_String_Test = ObjCreate("Scripting.Dictionary") ;Holds Test Array
Local $Object_String_Matches = ObjCreate("Scripting.Dictionary") ; holds array of matched messages, used to delete good matches

;----------- populate $Object_String_A_MSH10 object with anchor array msh.10 values-------------
; reading .Item() for a key that does not exist yet adds that key with an empty value
For $i In $Array_MSH10_Anchor
    $Object_String_A_MSH10.Item($i)
Next

;----------- populate $Object_String_T_MSH10 object with test array msh.10 values-------------
_ArrayDelete($Array_MSH10_Test, '0-2')
For $i In $Array_MSH10_Test
    $Object_String_T_MSH10.Item($i)
Next

;----------- populate $Object_String_Anchor object with anchor array-------------
For $i In $Array_anchor
    $Object_String_Anchor.Item($i)
Next

;-------------- populate $Object_String_Test object with test array--------------
For $i In $Array_test
    $Object_String_Test.Item($i)
Next

;-------------- check anchor array for matching items in test array and add them to the $Object_String_Matches object ----------------
For $i In $Array_anchor
    If $Object_String_Test.Exists($i) Then $Object_String_Matches.Item($i)
Next

;----- populate matching array Array_returned_Matches with values from $Object_String_Matches object -----------
$Array_returned_Matches = $Object_String_Matches.Keys()

;------------------ take the matches of Array_returned_Matches and remove them from the $Object_String_Anchor and $Object_String_Test objects ----------------------
For $i In $Array_returned_Matches
    If $Object_String_Anchor.Exists($i) Then $Object_String_Anchor.Remove($i)
    If $Object_String_Test.Exists($i) Then $Object_String_Test.Remove($i)
Next

;------ set arrays $Array_returned_Anchor and Array_returned_Test to the $Object_String_Anchor and $Object_String_Test objects -------------
$Array_returned_Anchor = $Object_String_Anchor.Keys()
$Array_returned_Test = $Object_String_Test.Keys()

;----------Part 2---------------------
;comments-start
;----------- Pull matching records out for Test-------------
For $TC = 0 To UBound($Array_returned_Test) - 1
    ; the header is back on the message now, so MSH.10 sits between the 9th and 10th pipe
    Local $start = StringInStr($Array_returned_Test[$TC], "|", 0, 9, 1)
    Local $end = StringInStr($Array_returned_Test[$TC], "|", 0, 10, 1)
    Local $MSH10_Length = $end - $start
    ;local $MSH10=StringMid($Array_returned_Test[$TC],$start+1, $MSH10_Length-1)
    If $Object_String_A_MSH10.Exists(StringMid($Array_returned_Test[$TC], $start + 1, $MSH10_Length - 1)) Then
        $Test_matches = $Test_matches & "AWAWRERET" & $Array_returned_Test[$TC] & @CRLF
    Else
        $Test_nonmatch = $Test_nonmatch & $Array_returned_Test[$TC] & @CRLF
    EndIf
Next
; "AWAWRERET" is a throwaway marker used only to split the matches back into an array
$Sorted_Test_Matches = StringSplit($Test_matches, "AWAWRERET", 1)
_ArraySort($Sorted_Test_Matches)
;_ArrayDisplay($Sorted_Test_Matches)
_FileWriteFromArray("C:\AutoIT\RewriteHL7\_Test_matches_sorted.txt", $Sorted_Test_Matches)
;$hFilehandle = FileOpen("C:\AutoIT\RewriteHL7\_Test_matches.txt", $FO_OVERWRITE)
;FileWrite($hFilehandle,$Test_matches)
;FileClose($hFilehandle)
$hFilehandle = FileOpen("C:\AutoIT\RewriteHL7\_Test_Non_matches.txt", $FO_OVERWRITE)
FileWrite($hFilehandle, $Test_nonmatch)
FileClose($hFilehandle)
;#comments-end

;----------- Pull matching records out for Anchor-------------
For $AC = 0 To UBound($Array_returned_Anchor) - 1
    Local $start = StringInStr($Array_returned_Anchor[$AC], "|", 0, 9, 1)
    Local $end = StringInStr($Array_returned_Anchor[$AC], "|", 0, 10, 1)
    Local $MSH10_Length = $end - $start
    If $Object_String_T_MSH10.Exists(StringMid($Array_returned_Anchor[$AC], $start + 1, $MSH10_Length - 1)) Then
        ;_arrayadd($Anchor_matches,$Array_returned_Anchor[$AC])
        $Anchor_matches = $Anchor_matches & "AWAWRERET" & $Array_returned_Anchor[$AC] & @CRLF
    Else
        ;_arrayadd($Anchor_nonmatches,$Array_returned_Anchor[$AC])
        $Anchor_nonmatches = $Anchor_nonmatches & $Array_returned_Anchor[$AC] & @CRLF
    EndIf
Next
$Sorted_Anchor_Matches = StringSplit($Anchor_matches, "AWAWRERET", 1)
_ArraySort($Sorted_Anchor_Matches)
_ArrayDisplay($Sorted_Anchor_Matches)
_FileWriteFromArray("C:\AutoIT\RewriteHL7\_Anchor_matches_sorted.txt", $Sorted_Anchor_Matches)
;$hFilehandle = FileOpen("C:\AutoIT\RewriteHL7\_Anchor_matches.txt", $FO_OVERWRITE)
;FileWrite($hFilehandle,$Anchor_matches)
;FileClose($hFilehandle)
$hFilehandle = FileOpen("C:\AutoIT\RewriteHL7\_Anchor_Non_matches.txt", $FO_OVERWRITE)
FileWrite($hFilehandle, $Anchor_nonmatches)
FileClose($hFilehandle)

;_ArrayDisplay($Array_returned_Anchor, "$Array_returned_Anchor")
;_ArrayDisplay($Array_returned_Test, "$Array_returned_Test")
;_ArrayDisplay($Array_returned_Matches, "$Array_returned_Matches")

; ------------write out arrays to files ------------------------------------
_FileWriteFromArray("C:\AutoIT\RewriteHL7\HL71(MissInTest).txt", $Array_returned_Anchor)
_FileWriteFromArray("C:\AutoIT\RewriteHL7\HL72(MissinAnch).txt", $Array_returned_Test)
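The '0', '0-1' and '0-2' deletes in the script are needed because StringSplit, when called with flag 1, returns the element count in [0] and whatever preceded the first delimiter in [1]. A possible way to handle that in one place is sketched below. It is untested against the real files, `_SplitHL7Messages` is a hypothetical helper name that does not appear in the thread, and the sample text is made up. It relies on the `$STR_ENTIRESPLIT` and `$STR_NOCOUNT` flags from StringConstants.au3, so no count element is returned.

```autoit
#include <StringConstants.au3>

; Hypothetical helper: split raw file text into an array of complete HL7
; messages, each with its "MSH|^~\&" header put back on.
; Sets @error = 1 and returns 0 when no header is found.
Func _SplitHL7Messages($sText)
    Local Const $sHeader = "MSH|^~\&"
    ; With $STR_NOCOUNT, element 0 is the text before the first header
    ; (usually empty), and elements 1..n are the messages minus the header.
    Local $aParts = StringSplit($sText, $sHeader, $STR_ENTIRESPLIT + $STR_NOCOUNT)
    Local $iCount = UBound($aParts) - 1
    If $iCount < 1 Then Return SetError(1, 0, 0)

    Local $aMessages[$iCount]
    For $i = 1 To $iCount
        $aMessages[$i - 1] = $sHeader & $aParts[$i]
    Next
    Return $aMessages
EndFunc   ;==>_SplitHL7Messages

; Made-up sample: two short messages, segments separated by carriage returns.
Local $sSample = "MSH|^~\&|EPIC|EPICADT|SMS|SMSADT|199912271408|CHARRIS|ADT^A04|1817457|D|2.5|" & @CR & _
        "PID||0493575^^^2^ID 1|454721|" & @CR & _
        "MSH|^~\&|EPIC|EPICADT|SMS|SMSADT|199912271409|CHARRIS|ADT^A04|1817458|D|2.5|" & @CR

Local $aMsgs = _SplitHL7Messages($sSample)
If @error Then
    ConsoleWrite("No MSH header found" & @CRLF)
Else
    ConsoleWrite("Messages found: " & UBound($aMsgs) & @CRLF)
EndIf
```

With the anchor and test files both read through the same helper, the two arrays would have the same layout, and the MSH.10 lookup could use pipes 9 and 10 everywhere instead of 8 and 9 in one place and 9 and 10 in another.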