Jump to content

The way to remove the links that are duplicates?


Recommended Posts

yo, i need some help.

Im training with _StringsBetween, on big site! Alot fun, My script turned 8 times created a text file with a weight of 1GB. I saw that in the file was a lot of repetition. Is it possible to somehow set the string to save the file, only one URL without repetition?

#include <ButtonConstants.au3>
#include <EditConstants.au3>
#include <GUIConstantsEx.au3>
#include <WindowsConstants.au3>
#include <INet.au3>
#include <StringConstants.au3>
#include <File.au3>

;I'm coming for blood, no code of conduct, no law.
;I'm coming for blood, no code of conduct, no law.
#Region ### START Koda GUI section ### Form=
$Form1 = GUICreate("Form1", 1211, 812, 43, 110)
$Button1 = GUICtrlCreateButton("Button1", 0, 8, 249, 81)
$Button2 = GUICtrlCreateButton("Button2", 350, 8, 249, 81)
$Edit1 = GUICtrlCreateEdit("0", 8, 128, 609, 257, BitOR($ES_CENTER,$ES_AUTOHSCROLL,$ES_READONLY,$ES_WANTRETURN))
$Edit2 = GUICtrlCreateEdit("2", 632, 0, 577, 809)
GUISetState(@SW_SHOW)
#EndRegion ### END Koda
local $iFileSize = FileGetSize('')


Func VisitFrontPage()
    local $Liczba = _FileCountLines(@ScriptDir&'\data\links.txt')
    local $liczba2 = GUICtrlSetData($Edit1,Random(1,$Liczba,1))
    local $Liczba3 = GUICtrlRead($Edit1)
    local $Liczba4 = FileReadLine(@ScriptDir&'\data\links.txt',$Liczba3)
$data = _INetGetSource ( FileReadLine(@ScriptDir&'\data\links.txt',Random(1,$Liczba,1)))
$linki = StringRegExp($data, '<a href="http://www.wykop.pl/link/(.*?)/" title=""',3)
For $q = 0 To UBound($linki) -1
FileWrite(@ScriptDir&'\data\links.txt','http://www.wykop.pl/link/'&$linki[$q]&@CRLF)
Next
$linki = StringRegExp($data, 'href="http://www.wykop.pl/ludzie/(.*?)/">',3) ; pobieranie ludzi co dodali znaleziska
For $w = 0 To UBound($linki) -1
FileWrite(@ScriptDir&'\data\links.txt','http://www.wykop.pl/ludzie/'&$linki[$w]&@CRLF)
Next
$linki = StringRegExp($data, '<a href="http://www.wykop.pl/ludzie/(.*?)/" title="',3) ; pobieranie ludzi co sa na stronie z mikro
For $e = 0 To UBound($linki) -1
FileWrite(@ScriptDir&'\data\links.txt','http://www.wykop.pl/ludzie/'&$linki[$e]&@CRLF)
Next
$linki = StringRegExp($data, '<a class="tag create" href="http://www.wykop.pl/tag/(.*?)/"><em>',3) ; pobieranie ludzi co sa na stronie z mikro
For $r = 0 To UBound($linki) -1
FileWrite(@ScriptDir&'\data\links.txt','http://www.wykop.pl/tag/'&$linki[$r]&@CRLF)
Next
$linki = StringRegExp($data, '<class="showTagSummary" href="http://www.wykop.pl/tag/(.*?)">',3) ; pobieranie ludzi co sa na stronie z mikro
For $t = 0 To UBound($linki) -1
FileWrite(@ScriptDir&'\data\links.txt','http://www.wykop.pl/tag/'&$linki[$t]&@CRLF)
Next
 $linki = StringRegExp($data, 'href="http://www.wykop.pl/ludzie/(.*?)/" title=""',3) ; pobieranie ludzi z znaleziska, komentarze
For $a = 0 To UBound($linki) -1
FileWrite(@ScriptDir&'\data\links.txt','http://www.wykop.pl/ludzie/'&$linki[$a]&@CRLF)
Next
 $linki = StringRegExp($data, 'href="http://www.wykop.pl/tag/index/(.*?)/"',3) ; pobierane tagi ze znaleziska,
For $s = 0 To UBound($linki) -1
FileWrite(@ScriptDir&'\data\links.txt','http://www.wykop.pl/tag/index/'&$linki[$s]&@CRLF)
Next
 $linki = StringRegExp($data, '<a class="clearfix" href="http://www.wykop.pl/link/(.*?)/?utm_source',3) ; pobierane znaleziska z prawego menu
For $d = 0 To UBound($linki) -1
FileWrite(@ScriptDir&'\data\links.txt','http://www.wykop.pl/link/'&$linki[$d]&@CRLF)
Next
Sleep(5000)
GuiCtrlSetData($Edit2, GuiCtrlRead($Edit2)+1)
_FileWriteToLine(@ScriptDir&'\data\links.txt',GuiCtrlRead($edit1),'',1)
VisitFrontPage()
EndFunc

While 1
    $nMsg = GUIGetMsg()
    Switch $nMsg
        Case $Button1
        Case $Button2
            VisitFrontPage()
        Case $GUI_EVENT_CLOSE
            Exit

    EndSwitch
WEnd

 

Link to post
Share on other sites

Why do you make the function VisitFrontPage recursive and what's the purpose of

local $iFileSize = FileGetSize('')

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
  • Recently Browsing   0 members

    No registered users viewing this page.

  • Similar Content

    • By mLipok
      Usually when I collect data from DataBase I need to give EndUser a possibility to select rows which should be taken in the processing loop.
      I was searching on the forum and I'm not able to find any UDF or even example of how to select data from array.
      I have my own solutions but I think they are not worth posting on the forum as it is very old code and I am looking for a better solution.

      Could anybody point me to some examples/solutions ?

      Thank you in advance.
      @mLipok
    • By Hermes
      Hi, I am struggling in setting the value of a textarea based on the value of clipboard (that contains a long web page source codes). If I use _WD_SetElementValue, it freezes after some time, or appears to be pressing tab and goes out of focus. I can also use send keys but i need the script to run in the background.
      Here is the full script:
      #Include "Chrome.au3" #Include "wd_core.au3" #Include "wd_helper.au3" #Include "WinHttp.au3" #include <MsgBoxConstants.au3> #include <WinAPIFiles.au3> #include <Array.au3> #include <AutoItConstants.au3> #include <WinAPIFiles.au3> #include <GDIPlus.au3> #include <Excel.au3> Local $sDesiredCapabilities, $sSession SetupChrome() _WD_Startup() $sSession = _WD_CreateSession($sDesiredCapabilities) _WD_LoadWait($sSession) _WD_Navigate($sSession, "http://demo.borland.com/testsite/stadyn_largepagewithimages.html") _WD_LoadWait($sSession) Global $sSource = _WD_GetSource($sSession) Local $Paste = ClipPut($sSource) Local $sData = ClipGet() Local $aArray = 0, _ $iOffset = 1 While 1 $aArray = StringRegExp($sData, '(?s)<p>.*</p>', $STR_REGEXPARRAYMATCH, $iOffset) If @error Then ExitLoop $iOffset = @extended For $i = 0 To UBound($aArray) - 1 Local $Paste = ClipPut($aArray[$i]) Local $sRegExData = ClipGet() ;MsgBox(0, "", "$sRegExData = " & $sRegExData) Next WEnd _WD_Navigate($sSession, "https://www.w3schools.com/tags/tryit.asp?filename=tryhtml5_textarea_placeholder") _WD_WaitElement($sSession, $_WD_LOCATOR_ByCSSSelector, "iframe#iframeResult") Local $sElement1 = _WD_FindElement($sSession, $_WD_LOCATOR_ByCSSSelector, "iframe#iframeResult") _WD_FrameEnter($sSession, $sElement1) _WD_WaitElement($sSession, $_WD_LOCATOR_ByXPath, "//html/body/textarea") $textarea = _WD_FindElement($sSession, $_WD_LOCATOR_ByXPath, "//html/body/textarea") _WD_ElementAction($sSession, $textarea, 'click') ;WD SetElementValue(SsSession, Stextarea, $sRegExData) <-- I can do this but the focus goes out, or the browser freezes _WD_FrameLeave($sSession) sleep(2000) Send("^v") _WD_LoadWait($sSession) _WD_Shutdown() Func SetupChrome() _WD_Option('Driver', 'chromedriver.exe') _WD_Option('Port', 9515) _WD_Option('DriverParams', '--log-path="' & @ScriptDir & '\chrome.log"') $sDesiredCapabilities = '{"capabilities": {"alwaysMatch": {"goog:chromeOptions": {"w3c": true, "args":["start-maximized","disable-infobars"]}}}}' EndFunc ;==>SetupChrome Can someone help me please, or re-direct me to the right path? TIA!
    • By EmilyLove
      I have a string containing the full path of an executable and an array of executables without their paths. I am trying to compare the string to the list in the array and if a match is found, remove it from the array. The entry get removed from the array successfully, and after checking its return result, uses it to update the ubound if it succeeded, but it doesn't want to update to the new value. Any ideas what I am doing wrong? It acts like it is read-only.
      #include <Array.au3> #include <File.au3> Local $sApp_Exe = "F:\App\Nextcloud\nextcloud.exe" Local $aWaitForEXEX = [3, "Nextcloud.exe", "nextcloudcmd.exe", "QtWebEngineProcess.exe"] For $h = 1 To $aWaitForEXEX[0] If StringInStr($sApp_Exe, $aWaitForEXEX[$h]) <> 0 Then $iRet = _ArrayDelete($aWaitForEXEX, $h) If $iRet <> -1 Then $aWaitForEXEX[0] = $iRet ;this line doesn't work. $aWaitForEXEX[0] doesn't update and shortly gives Error: Array variable has incorrect number of subscripts or subscript dimension range exceeded.: _ArrayDisplay($aWaitForEXEX) EndIf Next  
    • By DJ143
      I have a autoit exe file which is used in upload/browse file functionality.  This has been integrated with selenium framework and I am invoking the autoit exe using Java process and runtime. 
      Now the issue is when I run the scripts and invoke the autoit exe in local it works perfectly.  But when I use selenium grid or jenkins to run the scripts in another windows server it is not working.
      Can anyone please suggest any solution for this?
    • By Hermes
      Hello, the script below will read column A from an excel file - and if a value matches in the browser, it will click the corresponding link and click on a specific button to paste the data, then writes "Completed" in Column B. It will continue to read from the excel file and do the same thing for all the remaining rows.
      #Include "Chrome.au3" #Include "wd_core.au3" #Include "wd_helper.au3" #Include "WinHttp.au3" #include <MsgBoxConstants.au3> #include <File.au3> #include <IE.au3> #include <Array.au3> #include <INet.au3> #include <AutoItConstants.au3> #include <WinAPIFiles.au3> #include <GDIPlus.au3> #include <Excel.au3> #Include "WinHttp.au3" #Include "_HtmlTable2Array.au3" Local $sDesiredCapabilities, $sSession SetupChrome() _WD_Startup() $sSession = _WD_CreateSession($sDesiredCapabilities) _WD_LoadWait($sSession) _WD_Navigate($sSession, "table1.html") _WD_LoadWait($sSession) _WD_WaitElement($sSession, $_WD_LOCATOR_ByXPath, "//table[@class='main']") Local $sElement = _WD_FindElement($sSession, $_WD_LOCATOR_ByXPath, "//table[@class='main']") ;ConsoleWrite ("mat-table " & $sElement & @CRLF) Local $aArray1 = _WD_FindElement($sSession, $_WD_LOCATOR_ByXPath, ".//td[contains(@class,'data')]", $sElement, True) sleep(1000) For $i = 0 to UBound($aArray1) - 1 $aArray1[$i] = _WD_ElementAction($sSession, $aArray1[$i], 'text') Next ;_ArrayDisplay($aArray1) ;Email variables $SmtpServer = "" ; address for the smtp-server to use - REQUIRED $FromName = "Hermes" ; name from who the email was sent $FromAddress = "sender@gmail.com" ; address from where the mail should come $ToAddress = "recipient@gmail.com" ; destination address of the email - REQUIRED, use commas (,) to add more email addresses $Subject = "File not found" ; subject from the email - can be anything you want it to be $Body = "File not found!" ; the messagebody from the mail - can be left blank but then you get a blank mail $AttachFiles = "" ; the file(s) you want to attach seperated with a ; (Semicolon) - leave blank if not needed $CcAddress = "" ; address for cc - leave blank if not needed $BccAddress = "" ; address for bcc - leave blank if not needed $Importance = "High" ; Send message priority: "High", "Normal", "Low" $Username = "" ; username for the account used from where the mail gets sent - REQUIRED $Password = "" ; password for the account used from where the mail gets sent - REQUIRED $IPPort = 25 ; port used for sending the mail $ssl = 0 ; enables/disables secure socket layer sending - put to 1 if using httpS $tls = 0 ; enables/disables TLS when required Local $oAppl = _Excel_Open() Local $sWorkbook = "c:\test.xlsx" Local $oWorkbook = _Excel_BookOpen($oAppl, $sWorkbook) ;open excel and pass both parameters If FileExists($sWorkbook) Then ;Check if the file exist. Local $oAppl = _Excel_Open() Local $sWorkbook = "c:\test.xlsx" Local $oWorkbook = _Excel_BookOpen($oAppl, $sWorkbook) ;open excel and pass both parameters Local $aArray2 = _Excel_RangeRead($oWorkbook,Default,$oWorkbook.ActiveSheet.Usedrange.Columns("A:A")) Local $iIdx Local $Skipline = 0 ;0==> first line Do Local $temprf For $i = 0 To UBound($aArray2) - 1 $temprf &= $aArray2[$i] _WD_WaitElement($sSession, $_WD_LOCATOR_ByXPath, ".//a[contains(@class,'edit') and contains(text(),'Edit')]") Local $aElement = _WD_FindElement($sSession, $_WD_LOCATOR_ByXPath, ".//a[contains(@class,'edit') and contains(text(),'Edit')]", $sElement, True) $iIdx = _ArraySearch($aArray1, $aArray2[$i]) If @error Then ContinueLoop _WD_ElementAction($sSession, $aElement[$iIdx], 'click') If $i < $Skipline Then ContinueLoop $oRange = $oWorkbook.ActiveSheet.Range("B" & $i + 1 & ":XFD" & $i + 1) _Excel_RangeCopyPaste($oWorkbook.Activesheet, $oRange) ;Paste Local $oTest4 = _WD_FindElement($sSession, $_WD_LOCATOR_ByCSSSelector, "pastebutton") _WD_ElementAction($sSession, $oTest4, 'click') Sleep(1000) ;Save Button Local $save3 = _WD_FindElement($sSession, $_WD_LOCATOR_ByCSSSelector, "button.button") _WD_ElementAction($sSession, $save3, 'click') _Excel_RangeWrite($oWorkbook, $oWorkbook.Activesheet, "Completed", "B" & $i+1) sleep(1000) Next Until (Not @error) _Excel_Close($oWorkbook) Else _INetSmtpMailCom($SmtpServer, $FromName, $FromAddress, $ToAddress, $Subject, $Body, $CcAddress, $BccAddress, $Importance, $Username, $Password, $IPPort, $ssl, $tls) Exit EndIf _WD_LoadWait($sSession) ;Attaching files to emails Func _INetSmtpMailCom($s_SmtpServer, $s_FromName, $s_FromAddress, $s_ToAddress, $s_Subject = "", $as_Body = "", $s_CcAddress = "", $s_BccAddress = "", $s_Importance="Normal", $s_Username = "", $s_Password = "", $IPPort = 25, $ssl = 0, $tls = 0) Local $objEmail = ObjCreate("CDO.Message") $objEmail.From = '"' & $s_FromName & '" <' & $s_FromAddress & '>' $objEmail.To = $s_ToAddress Local $i_Error = 0 Local $i_Error_desciption = "" If $s_CcAddress <> "" Then $objEmail.Cc = $s_CcAddress If $s_BccAddress <> "" Then $objEmail.Bcc = $s_BccAddress $objEmail.Subject = $s_Subject If StringInStr($as_Body, "<") And StringInStr($as_Body, ">") Then $objEmail.HTMLBody = $as_Body Else $objEmail.Textbody = $as_Body & @CRLF EndIf $objEmail.Configuration.Fields.Item ("http://schemas.microsoft.com/cdo/configuration/sendusing") = 2 $objEmail.Configuration.Fields.Item ("http://schemas.microsoft.com/cdo/configuration/smtpserver") = $s_SmtpServer If Number($IPPort) = 0 then $IPPort = 25 $objEmail.Configuration.Fields.Item ("http://schemas.microsoft.com/cdo/configuration/smtpserverport") = $IPPort ;Authenticated SMTP If $s_Username <> "" Then $objEmail.Configuration.Fields.Item ("http://schemas.microsoft.com/cdo/configuration/smtpauthenticate") = 1 $objEmail.Configuration.Fields.Item ("http://schemas.microsoft.com/cdo/configuration/sendusername") = $s_Username $objEmail.Configuration.Fields.Item ("http://schemas.microsoft.com/cdo/configuration/sendpassword") = $s_Password EndIf ; Set security params If $ssl Then $objEmail.Configuration.Fields.Item ("http://schemas.microsoft.com/cdo/configuration/smtpusessl") = True If $tls Then $objEmail.Configuration.Fields.Item ("http://schemas.microsoft.com/cdo/configuration/sendtls") = True ;Update settings $objEmail.Configuration.Fields.Update ; Set Email Importance Switch $s_Importance Case "High" $objEmail.Fields.Item ("urn:schemas:mailheader:Importance") = "High" Case "Normal" $objEmail.Fields.Item ("urn:schemas:mailheader:Importance") = "Normal" Case "Low" $objEmail.Fields.Item ("urn:schemas:mailheader:Importance") = "Low" EndSwitch $objEmail.Fields.Update ; Sent the Message $objEmail.Send $objEmail="" EndFunc ;==>_INetSmtpMailCom Local $aDir = _FileListToArrayRec(@TempDir, "scoped_dir*;chrome_*", $FLTAR_FOLDERS, $FLTAR_NORECUR, $FLTAR_NOSORT, $FLTAR_FULLPATH) Sleep(2000) For $i = 1 To $aDir[0] DirRemove($aDir[$i], $DIR_REMOVE) Next _WD_LoadWait($sSession) _WD_Shutdown() Func SetupChrome() _WD_Option('Driver', 'chromedriver.exe') _WD_Option('Port', 9515) _WD_Option('DriverParams', '--log-path="' & @ScriptDir & '\chrome.log"') $sDesiredCapabilities = '{"capabilities": {"alwaysMatch": {"goog:chromeOptions": {"w3c": true, "args":["start-maximized","disable-infobars"]}}}}' EndFunc ;==>SetupChrome If the excel file doesn't exists in the folder, it will send an email to a specific recipient.
      What i am trying figure out now is if the excel crashes while the script/loop is running, I want to relaunch the excel file continue to the last row before the excel crashed. So if the value of column B is not marked as "completed", it should continue from that row
      Appreciate any help that I can get to achieve this.
      table1.html test.xlsx
×
×
  • Create New...