Jump to content

Remove Duplicates - Excel **Answered past post**


Recommended Posts

@Subz actually answered this in a past post of mine, but it didnt come up in my first few searches dues to wording!

 

Ok..so ive read all the topics pertaining to this problem...I need something that can sort down  2 columns/175000 rows and delete duplicates (just the first)...

For $DupCount = 1 to 200000
            $RangeA = "A" & $DupCount
            $RangeB = "A" & ($DupCount+1)
            $TempReadA = _Excel_RangeRead($oWorkbook, "Main", $RangeA, 1)
            $TempReadB = _Excel_RangeRead($oWorkbook, "Main", $RangeB, 1)
            Switch $TempReadA
                Case $TempReadA = $TempReadB
                    _Excel_RangeWrite($oWorkbook, "MAIN", "", $RangeA)
                    _Excel_RangeWrite($oWorkbook, "MAIN", "", $RangeB)
                    $DupCount = $DupCount + 1
                    GUICtrlSetData($i_Counter, $DupCount, "")
                Case "EndFile"
                    GUICtrlSetData($i_Counter, $DupCount, "")
            EndSwitch
    Next

As you can see, I use 200000 as a temp number since the # of rows varies. I read cell A of the first row, then cell A from the secound row...If they match I remove cell A of the first row..if there is no match it restarts....This is extremely slow, but it does give the results I need. I dont want both rows to be deleted, just the first instance row. So, is there a way to speed this puppy up or do I just use the scrappy code I have?

@water This is mainly aimed at you since it seems you are the Excel guru!!

Edited by Fractured
Answered
Link to comment
Share on other sites

Can you please post the solution you found so it can be useful for others in the future?

My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2022-02-19 - Version 1.6.1.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2017-07-21 - Version 0.4.0.1) - Download - General Help & Support - Example Scripts
OutlookEX (2021-11-16 - Version 1.7.0.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX_GUI (2021-04-13 - Version 1.4.0.0) - Download
Outlook Tools (2019-07-22 - Version 0.6.0.0) - Download - General Help & Support - Wiki
PowerPoint (2021-08-31 - Version 1.5.0.0) - Download - General Help & Support - Example Scripts - Wiki
Task Scheduler (NEW 2022-07-28 - Version 1.6.0.1) - Download - General Help & Support - Wiki

Standard UDFs:
Excel - Example Scripts - Wiki
Word - Wiki

Tutorials:
ADO - Wiki
WebDriver - Wiki

 

Link to comment
Share on other sites

i would try to minimize interaction with sheet and range objects inside loops, something similar to this:

#include <Excel.au3>
#include <Array.au3>
#include <WinAPIFiles.au3>

Opt("MustDeclareVars", 1)
Opt("TrayIconDebug", 1)

Global $sFilePath = @ScriptDir & "\" & "test.xlsx"

Global $oMyError = ObjEvent("AutoIt.Error", "ErrFunc") ;Install a custom error handler
Global $iEventError ; to be checked to know if com error occurs. Must be reset after handling.

Global $oAppl = _Excel_Open(False, Default, Default, Default, True)
$oAppl.EnableEvents = False

Global $oWorkbook = _Excel_BookOpen($oAppl, $sFilePath)
If @error Then Exit MsgBox(0, "Error", "Error _Excel_BookOpen: " & $sFilePath & @CRLF & "@error = " & @error & ", @extended = " & @extended)

Global $oSheet = $oWorkbook.ActiveSheet
;Global $oSheet = $oWorkbook.Sheets("Main")
;MsgBox(0, "", $oSheet.Name)

;http://www.siddharthrout.com/index.php/2018/02/10/find-last-row-and-last-column/
;Find Last Row and Last Column. Siddharth Rout, Feb 2018
Global $LastRow = $oSheet.Cells(1048576, 1).End(-4162).Row

Global $aResult = _Excel_RangeRead($oWorkbook, Default, $oSheet.Range("A1:B" & $LastRow), 1)
If @error Then Exit MsgBox($MB_SYSTEMMODAL, "Excel UDF: _Excel_RangeRead", "Error reading from workbook." & @CRLF & "@error = " & @error & ", @extended = " & @extended)

;iamtheky, January 2011
;https://www.autoitscript.com/forum/topic/124194-compare-how-many-items-are-equal-in-twox-strings/?do=findComment&comment=862433
;martin
;https://www.autoitscript.com/forum/topic/108392-array-for-loop-help/?do=findComment&comment=763988
;Melba23
;https://www.autoitscript.com/forum/topic/132027-array-delete-blank-element/?do=findComment&comment=919584
For $i = UBound($aResult) - 1 to 0 Step - 1
   If $aResult[$i][0] = $aResult[$i][1] Then
   _ArrayDelete($aResult, $i)
   EndIf
Next

$oSheet.Range("A1:B" & $LastRow).ClearContents
_Excel_RangeWrite($oWorkbook, Default, $aResult, "A1")
;~ _Excel_RangeWrite($oWorkbook, $oSheet, $aResult, $oSheet.Cells(1, 1))

$oAppl.EnableEvents = True
;$oAppl.DisplayAlerts = True

$sFilePath = StringReplace($sFilePath, ".xls", "_result.xls")

_Excel_BookSaveAs($oWorkbook, $sFilePath, $xlOpenXMLWorkbook, True) ;$xlOpenXMLWorkbook  51  ;$xlExcel8  56
_Excel_BookClose($oWorkbook, False)
;If @error Then Exit MsgBox(0, "Error", "Error _Excel_BookClose: " & $sFilePath & @CRLF & "@error = " & @error & ", @extended = " & @extended)

While _WinAPI_FileInUse($sFilePath)
   Sleep(1000)
Wend

;https://www.autoitscript.com/forum/topic/136414-excel-close-problem/?do=findComment&comment=953433
;$oAppl.Application.Quit
;$oAppl = ""
;https://www.autoitscript.com/forum/topic/166043-close-the-entire-application-of-excel/
;https://www.autoitscript.com/forum/topic/166043-close-the-entire-application-of-excel/?do=findComment&comment=1262478
;Run(@ComSpec & " /c " & 'taskkill /im excel.exe /f /t', "", @SW_HIDE)
;https://www.autoitscript.com/forum/topic/166043-close-the-entire-application-of-excel/?do=findComment&comment=1262830
;water / Ok. Let's see if the problem is caused by open/close or by working with a workbook. Could you please try:
;#include <Excel.au3>
;$oExcel = _Excel_Open(False, False, False, False, True)
;$oExcelClose = _Excel_Close($oExcel, False, True)
_Excel_Close($oAppl, False, True)
;If @error Then Exit MsgBox(0, "Error", "Error _Excel_Close" & @CRLF & "@error = " & @error & ", @extended = " & @extended)

Sleep(1000)

Global $aProcesses = ProcessList("Excel.exe")
;_ArrayDisplay($aProcesses)
If $aProcesses[0][0] > 0 Then
   ;https://www.autoitscript.com/forum/topic/166043-close-the-entire-application-of-excel/?do=findComment&comment=1263191
   ;@water, thanks for your help so far, at least we pinned down that it's not a UDF bug. :)
   ;For now I will use a crude workaround by closing the most recent Excel.exe instance:
   ProcessClose($aProcesses[$aProcesses[0][0]][1])
   Sleep(500) ;just to allow some time for the process to definitely close (if it does close)
EndIf

;This is a custom error handler
Func ErrFunc()
   Local $HexNumber = Hex($oMyError.number, 8)
;~    MsgBox(0, "", "We intercepted a COM Error !" & @CRLF & _
;~       "Number is: " & $HexNumber & @CRLF & _
;~       "WinDescription is: " & $oMyError.windescription)
   ConsoleWrite("->    We intercepted a COM Error !" & @CRLF & _
   "->    err.number is: " & @TAB & $HexNumber & @CRLF & _
   "->    err.source: " & @TAB & $oMyError.source & @CRLF & _
   "->    err.windescription: " & @TAB & $oMyError.windescription & _
   "->    err.scriptline is: " & @TAB & $oMyError.scriptline & @CRLF)

   $iEventError = 1 ; Use to check when a COM Error occurs
EndFunc   ;==>ErrFunc

 

Edited by robertocm
Link to comment
Share on other sites

This article explains how to remove duplicates.
Start the Excel macro recorder, remove duplicates as described in the article and translate the VBA skript to AutoIt. Done.

My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2022-02-19 - Version 1.6.1.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2017-07-21 - Version 0.4.0.1) - Download - General Help & Support - Example Scripts
OutlookEX (2021-11-16 - Version 1.7.0.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX_GUI (2021-04-13 - Version 1.4.0.0) - Download
Outlook Tools (2019-07-22 - Version 0.6.0.0) - Download - General Help & Support - Wiki
PowerPoint (2021-08-31 - Version 1.5.0.0) - Download - General Help & Support - Example Scripts - Wiki
Task Scheduler (NEW 2022-07-28 - Version 1.6.0.1) - Download - General Help & Support - Wiki

Standard UDFs:
Excel - Example Scripts - Wiki
Word - Wiki

Tutorials:
ADO - Wiki
WebDriver - Wiki

 

Link to comment
Share on other sites

On 8/14/2020 at 9:38 PM, water said:

translate the VBA skript to AutoIt. Done.

does not seem to work

test.xlsx
A   B
abc bca
abc bca
def fed
def fed

 

#include <Excel.au3>

Global $sFilePath = @ScriptDir & "\" & "test.xlsx"

Global $oAppl = _Excel_Open()
Global $oWorkbook = _Excel_BookOpen($oAppl, $sFilePath)

Global $oSheet = $oWorkbook.ActiveSheet

Global $oRange = $oSheet.Range("A1:B5")
If IsObj($oRange) Then
   ConsoleWrite("-> Is Object" & @CRLF)
   ;$oRange.RemoveDuplicates
   $oRange.RemoveDuplicates(Default, $xlYes)
EndIf

 

using a Dictionary:

#include <Excel.au3>
#include <Array.au3>
#include <WinAPIFiles.au3>

Opt("MustDeclareVars", 1)
Opt("TrayIconDebug", 1)

Global $sFilePath = @ScriptDir & "\" & "test.xlsx"

Global $oMyError = ObjEvent("AutoIt.Error", "ErrFunc") ;Install a custom error handler
Global $iEventError ; to be checked to know if com error occurs. Must be reset after handling.

Global $oAppl = _Excel_Open(False, Default, Default, Default, True)
$oAppl.EnableEvents = False

Global $oWorkbook = _Excel_BookOpen($oAppl, $sFilePath)
If @error Then Exit MsgBox(0, "Error", "Error _Excel_BookOpen: " & $sFilePath & @CRLF & "@error = " & @error & ", @extended = " & @extended)

Global $oSheet = $oWorkbook.ActiveSheet
;Global $oSheet = $oWorkbook.Sheets("Main")
;MsgBox(0, "", $oSheet.Name)

;http://www.siddharthrout.com/index.php/2018/02/10/find-last-row-and-last-column/
;Find Last Row and Last Column. Siddharth Rout, Feb 2018
Global $LastRow = $oSheet.Cells(1048576, 1).End(-4162).Row

;~ ;Does not seem to work
;~ Global $oRange = $oSheet.Range("A1:B" & $LastRow)
;~ If IsObj($oRange) Then
;~    ConsoleWrite("-> Is Object" & @CRLF)
;~    $oRange.RemoveDuplicates
;~ EndIf

Global $aResult = _Excel_RangeRead($oWorkbook, Default, $oSheet.Range("A1:B" & $LastRow), 1)
If @error Then Exit MsgBox($MB_SYSTEMMODAL, "Excel UDF: _Excel_RangeRead", "Error reading from workbook." & @CRLF & "@error = " & @error & ", @extended = " & @extended)

Global $Key
Global $oDictionary = ObjCreate("Scripting.Dictionary")
$oDictionary.CompareMode = 1; case insensitive

;iamtheky, January 2011
;https://www.autoitscript.com/forum/topic/124194-compare-how-many-items-are-equal-in-twox-strings/?do=findComment&comment=862433
;martin
;https://www.autoitscript.com/forum/topic/108392-array-for-loop-help/?do=findComment&comment=763988
;Melba23
;https://www.autoitscript.com/forum/topic/132027-array-delete-blank-element/?do=findComment&comment=919584
For $i = UBound($aResult) - 1 to 0 Step - 1
   $Key = $aResult[$i][0] & "~" & $aResult[$i][1]
   If $oDictionary.Exists($Key) Then
      _ArrayDelete($aResult, $i)
   Else
      $oDictionary.Item($Key)
   EndIf
Next
Global $Keys = $oDictionary.Keys()
_ArrayDisplay($Keys, "Keys")

$oSheet.Range("A1:B" & $LastRow).ClearContents
_Excel_RangeWrite($oWorkbook, Default, $aResult, "A1")
;_Excel_RangeWrite($oWorkbook, $oSheet, $aResult, $oSheet.Cells(1, 1))

$oAppl.EnableEvents = True
;$oAppl.DisplayAlerts = True

$sFilePath = StringReplace($sFilePath, ".xls", "_result.xls")

_Excel_BookSaveAs($oWorkbook, $sFilePath, $xlOpenXMLWorkbook, True) ;$xlOpenXMLWorkbook  51  ;$xlExcel8  56
_Excel_BookClose($oWorkbook, False)
;If @error Then Exit MsgBox(0, "Error", "Error _Excel_BookClose: " & $sFilePath & @CRLF & "@error = " & @error & ", @extended = " & @extended)

While _WinAPI_FileInUse($sFilePath)
   Sleep(1000)
Wend

;https://www.autoitscript.com/forum/topic/136414-excel-close-problem/?do=findComment&comment=953433
;$oAppl.Application.Quit
;$oAppl = ""
;https://www.autoitscript.com/forum/topic/166043-close-the-entire-application-of-excel/
;https://www.autoitscript.com/forum/topic/166043-close-the-entire-application-of-excel/?do=findComment&comment=1262478
;Run(@ComSpec & " /c " & 'taskkill /im excel.exe /f /t', "", @SW_HIDE)
;https://www.autoitscript.com/forum/topic/166043-close-the-entire-application-of-excel/?do=findComment&comment=1262830
;water / Ok. Let's see if the problem is caused by open/close or by working with a workbook. Could you please try:
;#include <Excel.au3>
;$oExcel = _Excel_Open(False, False, False, False, True)
;$oExcelClose = _Excel_Close($oExcel, False, True)
_Excel_Close($oAppl, False, True)
;If @error Then Exit MsgBox(0, "Error", "Error _Excel_Close" & @CRLF & "@error = " & @error & ", @extended = " & @extended)

Sleep(1000)

Global $aProcesses = ProcessList("Excel.exe")
;_ArrayDisplay($aProcesses)
If $aProcesses[0][0] > 0 Then
   ;https://www.autoitscript.com/forum/topic/166043-close-the-entire-application-of-excel/?do=findComment&comment=1263191
   ;@water, thanks for your help so far, at least we pinned down that it's not a UDF bug. :)
   ;For now I will use a crude workaround by closing the most recent Excel.exe instance:
   ProcessClose($aProcesses[$aProcesses[0][0]][1])
   Sleep(500) ;just to allow some time for the process to definitely close (if it does close)
EndIf

;This is a custom error handler
Func ErrFunc()
   Local $HexNumber = Hex($oMyError.number, 8)
;~    MsgBox(0, "", "We intercepted a COM Error !" & @CRLF & _
;~       "Number is: " & $HexNumber & @CRLF & _
;~       "WinDescription is: " & $oMyError.windescription)
   ConsoleWrite("->    We intercepted a COM Error !" & @CRLF & _
   "->    err.number is: " & @TAB & $HexNumber & @CRLF & _
   "->    err.source: " & @TAB & $oMyError.source & @CRLF & _
   "->    err.windescription: " & @TAB & $oMyError.windescription & _
   "->    err.scriptline is: " & @TAB & $oMyError.scriptline & @CRLF)

   $iEventError = 1 ; Use to check when a COM Error occurs
EndFunc   ;==>ErrFunc

 

Edited by robertocm
Link to comment
Share on other sites

 

2 hours ago, robertocm said:

$oRange.RemoveDuplicates(Default, $xlYes)

@robertocm It works if you use the correct parameters:

$oRange.RemoveDuplicates(2, $xlYes)

Or

Global $aDupCols = [2]
$oRange.RemoveDuplicates($aDupCols, $xlYes)

 

Range.RemoveDuplicates

Edited by TheXman
Link to comment
Share on other sites

 A quick and dirty answer to my remove duplicates problem!!

$iRows = $oWorkbook.ActiveSheet.UsedRange.Rows.Count    ;$iRows = # of Rows in Sheet
GUICtrlSetData($i_Counter, $iRows, "")                  ;My GUI counter - not needed except for ref
Local $aColumns[] = [1, 2]                              ;$aColumns = # of columns used in sheet - Dosent Vary
$oWorkbook.ActiveSheet.Range("A1:B" & $iRows).RemoveDuplicates($aColumns) ;Removes Duplicates
$oWorkbook.Sheets("MAIN").Range("A:B").Columns.AutoFit    ; Autofit Cell Widths 
$oWorkbook.ActiveSheet.Columns("A:A").SpecialCells($xlCellTypeBlanks).EntireRow.Delete  ;  Remove blank rows Created From Remove Duplicates
$oWorkbook.ActiveSheet.Columns("B:B").SpecialCells($xlCellTypeBlanks).EntireRow.Delete  ;  Remove blank rows Created From Remove Duplicates

Not pretty but pretty fast!

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...