Compare/delete text


Hi guys,

I have two text-files, which I can each load into a 2D array.

Need to check if the data in the first column of the first array is also in the second array. If the entry is not there, I want to delete it from the first array.

Is there any fast way to do this? Currently I would just loop with an inner/outer loop which would be quite time consuming on large data.

I don't want you to write code for me, just some input or a good idea.

 

Thanks!

 

// EDIT:

Text from first file

VAR_11
VAR_12
VAR_13
VAR_14

Text from second file

VAR_11
VAR_12
VAR_14

So basically I want to remove VAR_13 from the first file.

Edited by HurleyShanabarger

Yeah, I don't see any other way than looping to compare every item with each entry in the first column.

As long as everything is in an array, it should be pretty fast, since it's all in memory.


Check

StringCompare 

 


23 minutes ago, caramen said:

Check

StringCompare

 

That is not useful at all for this scenario.

 

I came up with a solution using a regular expression. Any ideas for improvements? On two text files with around 2000 lines it needs less than 500 ms.

#include <Array.au3>

$sText = StringReplace("VAR_11|VAR_12|VAR_14|VAR_15", "|", @CRLF)
$sFind = StringReplace("VAR_11|VAR_12|VAR_13", "|", @CRLF)
$sResult = _Text_FindInOtherText($sText, $sFind)
MsgBox(0, "1", "Match exact string" & @CRLF & $sResult)

$sText = StringReplace("VAR_11 test|VAR_12 next test|VAR_14 oh yeah|VAR_15", "|", @CRLF)
$sFind = StringReplace("VAR_11|VAR_12|VAR_13", "|", @CRLF)
$sResult = _Text_FindInOtherText($sText, $sFind, False)
MsgBox(0, "2", "Match exact string" & @CRLF & $sResult)

$sText = StringReplace("VAR_11 test|VAR_12 next test|VAR_14 oh yeah|VAR_15", "|", @CRLF)
$sFind = StringReplace("VAR_11|VAR_12|VAR_13", "|", @CRLF)
$sResult = _Text_FindInOtherText($sText, $sFind, True)
MsgBox(0, "3", "Match complete string" & @CRLF & $sResult)

Func _Text_FindInOtherText(Const ByRef $sText, $sFind, $boMatchLine = False)
    Local $sResult = "", $sMtch = "", $sFind_Tmp = ""
    Local $iFind_Tmp, $aData
    If StringRight($sFind, 2) <> @CRLF Then $sFind &= @CRLF
    If $boMatchLine Then $sFind = StringReplace($sFind, @CRLF, ".*" & @CRLF)

    ; Use a while loop, as the pattern length is limited to ~30000 characters
    While 1
        $sFind_Tmp = StringLeft($sFind, 20000)
        $iFind_Tmp = StringInStr($sFind_Tmp, @CRLF, 0, -1)
        $sFind_Tmp = StringLeft($sFind, $iFind_Tmp - 1)
        $sFind = StringTrimLeft($sFind, $iFind_Tmp + 1)
        $sMtch = StringReplace($sFind_Tmp, @CRLF, "|")
        If StringRight($sMtch, 1) = "|" Then $sMtch = StringTrimRight($sMtch, 1)
        $sMtch = "(" & $sMtch & ")"
        $aData = StringRegExp($sText, $sMtch, 3)
        If Not @error Then $sResult &= _ArrayToString($aData, @CRLF) & @CRLF
        If StringLen($sFind) = 0 Then ExitLoop
    WEnd
    Return StringTrimRight($sResult, 2)
EndFunc   ;==>_Text_FindInOtherText

 


You may want to begin by sorting both arrays. That way you need to loop only once: keep a counter for each array, advance both if an item exists in both (a match), and advance only one otherwise. Store the item in a new array when it does not exist in the other array.
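
The single pass described above could be sketched roughly like this. It is only a sketch under assumptions: the sample data, the variable names ($aFirst, $aSecond, $aMissing) and the count-in-element-0 result layout are made up for illustration, both arrays are plain 1D arrays, and neither array contains duplicates. StringCompare is used for the comparison because, as far as I know, _ArraySort orders strings the same way.

```autoit
#include <Array.au3>

; Hypothetical sample data: 0-based 1D arrays, no duplicates inside either array.
Local $aFirst[4] = ["VAR_11", "VAR_12", "VAR_13", "VAR_14"]
Local $aSecond[3] = ["VAR_11", "VAR_12", "VAR_14"]

_ArraySort($aFirst)
_ArraySort($aSecond)

; Result array: element [0] holds the count, so the array is never empty.
Local $aMissing[UBound($aFirst) + 1]
$aMissing[0] = 0
Local $iFirst = 0, $iSecond = 0, $iCmp

While $iFirst < UBound($aFirst)
    If $iSecond >= UBound($aSecond) Then
        $iCmp = -1 ; second array used up: whatever is left in the first has no partner
    Else
        $iCmp = StringCompare($aFirst[$iFirst], $aSecond[$iSecond])
    EndIf

    If $iCmp = 0 Then
        ; match: advance both counters
        $iFirst += 1
        $iSecond += 1
    ElseIf $iCmp < 0 Then
        ; entry of the first array does not exist in the second: store it
        $aMissing[0] += 1
        $aMissing[$aMissing[0]] = $aFirst[$iFirst]
        $iFirst += 1
    Else
        ; entry of the second array is smaller: advance only the second counter
        $iSecond += 1
    EndIf
WEnd

ReDim $aMissing[$aMissing[0] + 1]
_ArrayDisplay($aMissing, "In first file only")
```

With the sample data from the first post, this should list VAR_13 as the only entry missing from the second file. Collecting the matches instead of the misses would give the cleaned-up first array directly.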


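#include <Array.au3>

; For each entry in $MyArray, look for it in $MyOtherArray and erase it if it is missing.
; Loop backwards so that _ArrayDelete does not shift entries that still have to be checked.
For $Ia = UBound ( $MyArray ) - 1 To 0 Step -1
   $Match = 0
   For $Ib = 0 To UBound ( $MyOtherArray ) - 1
      If StringCompare ( $MyArray[$Ia], $MyOtherArray[$Ib] , 0 ) = 0 Then
         $Match = 1
         ExitLoop
      EndIf
   Next
   If $Match = 0 Then _ArrayDelete ( $MyArray, $Ia )
Next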

Maybe... If not, sorry.

 

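#include <Array.au3>

; Assumes $MyArray and $MyOtherArray are global 1D arrays.
Global $Ia = UBound ( $MyArray ) - 1

; Walk $MyArray from the end and erase every entry that Check () cannot find.
While ( $Ia >= 0 )
   If Not Check ( $MyArray[$Ia] ) Then _ArrayDelete ( $MyArray, $Ia )
   $Ia -= 1
WEnd

; Returns True if $sValue occurs anywhere in $MyOtherArray (not case sensitive).
Func Check ( $sValue )
   For $Ib = 0 To UBound ( $MyOtherArray ) - 1
      If StringCompare ( $sValue, $MyOtherArray[$Ib] , 0 ) = 0 Then Return True
   Next
   Return False
EndFunc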

 

Edited by caramen


Hello,

You can skip most of the comparisons if you sort your arrays:

 

#include <Array.au3>



Dim $aOne[1]=[0]
for $i = 1 to 1000
    _ArrayAdd($aOne,"Var_" & $i)
Next
$aOne[0]=UBound($aOne) - 1
_ArrayDisplay($aOne)


Dim $aTwo[1]=[0]
for $i = 1 to 1000
    If Random(0,2) > 1 Then
        _ArrayAdd($aTwo,"Var_" & $i)
    Else
        ContinueLoop
    EndIf
Next
$aTwo[0]=UBound($aTwo) -1
_ArrayDisplay($aTwo)

_ArraySort($aOne,0,1)
_ArraySort($aTwo,0,1)

$pointer = $aTwo[0]
$CleanupCounter=0

for $i = $aOne[0] to 1 step -1
    $Left=$aOne[$i]
    ConsoleWrite(" --------- " & $Left & " ------------" & @CRLF)
    for $x = $pointer to 1 Step -1
        ConsoleWrite(@TAB & "$i = " & $i & ", $x = " & $x & @CRLF)
        $Right= $aTwo[$x]
        if $Left < $Right Then
            ConsoleWrite(@TAB & @TAB & $Left & " < " & $Right & @CRLF)
            ; $aTwo[$x] is larger than every remaining entry of $aOne, so never look at it again
            $pointer = $x - 1
        Elseif $Left = $Right Then
            ConsoleWrite("MATCH! " & $Left & " = " & $Right & @CRLF)
            $pointer = $x
            ConsoleWrite("--- TAKE SHORTCUT MATCH ---" & @CRLF)
            ContinueLoop 2
        Else
            ConsoleWrite(@TAB & $Left & " > " & $Right & " --> no further matches can show up, as sorted arrays!" &  @CRLF)
            ConsoleWrite("/// TAKE SHORTCUT PASSED BY ///" & @CRLF)
            ExitLoop
        EndIf
    Next
    ; Only reached when no match was found: remove the entry from the first array
    _ArrayDelete($aOne, $i)
    $CleanupCounter += 1
Next

$aOne[0]=UBound($aOne) - 1

_ArrayDisplay($aOne,$CleanupCounter)

Regards, Rudi.


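If the arrays are sorted, use _ArrayBinarySearch. A binary search needs only about log2(n) comparisons per lookup instead of up to n for a linear scan, so it is a lot faster on large arrays.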
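
A minimal sketch of that idea, assuming plain 1D arrays and made-up sample data and variable names ($aFirst, $aSecond). Only the array that is searched has to be sorted. For 2D arrays, recent versions of Array.au3 should accept a column argument for both _ArraySort and _ArrayBinarySearch, but check the help file of your version before relying on that.

```autoit
#include <Array.au3>

; Hypothetical sample data: 0-based 1D arrays.
Local $aFirst[4] = ["VAR_11", "VAR_12", "VAR_13", "VAR_14"]
Local $aSecond[3] = ["VAR_11", "VAR_12", "VAR_14"]

; Only the array that gets searched needs to be sorted.
_ArraySort($aSecond)

; Walk backwards so that deleting does not shift entries that still have to be checked.
For $i = UBound($aFirst) - 1 To 0 Step -1
    ; _ArrayBinarySearch returns -1 when the value is not found
    If _ArrayBinarySearch($aSecond, $aFirst[$i]) = -1 Then _ArrayDelete($aFirst, $i)
Next

_ArrayDisplay($aFirst, "Entries also in second file")
```

With the sample data from the first post, VAR_13 should be the only entry removed. On very large arrays the repeated _ArrayDelete calls may end up costing more than the lookups, so copying the matches into a new array is worth considering.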


Curious if this would do you one better -- If your data can support this strategy that is --

You can skip any further processing if you process during the sorting.
If you could potentially make a separate array, which contains the elements from the arrays you're comparing, and sort that combined set of data, you could save some bit of time.

Then, while you're sorting, if a new value is added without the previous value being a duplicate, you overwrite the previous value.

For example:
 

Text from first file

VAR_11
VAR_12
VAR_13
VAR_14

Text from second file

VAR_11
VAR_12
VAR_14

That would give you VAR_11, VAR_11, VAR_12, VAR_12, VAR_13, VAR_14, VAR_14. VAR_13 is the only value without a duplicate, so it is the one to remove.

The issue would most likely come from the instance where VAR_13 appears in the second file but not the first, but you haven't specified if that's possible or not. 
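
One way the combined-array idea might look, as a rough sketch rather than a finished solution. The sample data and the names $aFirst, $aSecond, $aAll and $sKeep are made up for illustration, and the sketch assumes plain 1D arrays with no duplicates inside either file. Under that assumption, a value that shows up twice in a row in the sorted combined array must be in both files, so a value that exists only in the second file is simply skipped as well.

```autoit
#include <Array.au3>

; Hypothetical sample data: 0-based 1D arrays, no duplicates inside either array.
Local $aFirst[4] = ["VAR_11", "VAR_12", "VAR_13", "VAR_14"]
Local $aSecond[3] = ["VAR_11", "VAR_12", "VAR_14"]

; Build one combined, sorted array from both inputs.
Local $aAll = $aFirst ; assignment copies the array
_ArrayConcatenate($aAll, $aSecond)
_ArraySort($aAll)

; Two equal neighbours mean the value is present in both files: keep it.
Local $sKeep = "", $i = 0
While $i < UBound($aAll) - 1
    If StringCompare($aAll[$i], $aAll[$i + 1]) = 0 Then
        $sKeep &= $aAll[$i] & @CRLF
        $i += 2 ; skip the twin
    Else
        $i += 1 ; value without a partner: drop it
    EndIf
WEnd

ConsoleWrite($sKeep)
```

For the example data this should print VAR_11, VAR_12 and VAR_14, one per line.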
