How do I remove duplicate links?


Yo, I need some help.

I'm practicing with _StringBetween on a big site, and it's a lot of fun! After my script ran 8 times, it had created a text file weighing 1 GB. I saw that the file contained a lot of repeated entries. Is it possible to somehow make the script save each URL to the file only once, without repetition?

#include <ButtonConstants.au3>
#include <EditConstants.au3>
#include <GUIConstantsEx.au3>
#include <WindowsConstants.au3>
#include <INet.au3>
#include <StringConstants.au3>
#include <File.au3>

;I'm coming for blood, no code of conduct, no law.
#Region ### START Koda GUI section ### Form=
$Form1 = GUICreate("Form1", 1211, 812, 43, 110)
$Button1 = GUICtrlCreateButton("Button1", 0, 8, 249, 81)
$Button2 = GUICtrlCreateButton("Button2", 350, 8, 249, 81)
$Edit1 = GUICtrlCreateEdit("0", 8, 128, 609, 257, BitOR($ES_CENTER,$ES_AUTOHSCROLL,$ES_READONLY,$ES_WANTRETURN))
$Edit2 = GUICtrlCreateEdit("2", 632, 0, 577, 809)
GUISetState(@SW_SHOW)
#EndRegion ### END Koda
local $iFileSize = FileGetSize('')


Func VisitFrontPage()
    local $Liczba = _FileCountLines(@ScriptDir&'\data\links.txt')
    local $liczba2 = GUICtrlSetData($Edit1,Random(1,$Liczba,1))
    local $Liczba3 = GUICtrlRead($Edit1)
    local $Liczba4 = FileReadLine(@ScriptDir&'\data\links.txt',$Liczba3)
    $data = _INetGetSource(FileReadLine(@ScriptDir&'\data\links.txt',Random(1,$Liczba,1)))
    $linki = StringRegExp($data, '<a href="http://www.wykop.pl/link/(.*?)/" title=""',3)
    For $q = 0 To UBound($linki) -1
        FileWrite(@ScriptDir&'\data\links.txt','http://www.wykop.pl/link/'&$linki[$q]&@CRLF)
    Next
    $linki = StringRegExp($data, 'href="http://www.wykop.pl/ludzie/(.*?)/">',3) ; fetch the users who submitted the finds
    For $w = 0 To UBound($linki) -1
        FileWrite(@ScriptDir&'\data\links.txt','http://www.wykop.pl/ludzie/'&$linki[$w]&@CRLF)
    Next
    $linki = StringRegExp($data, '<a href="http://www.wykop.pl/ludzie/(.*?)/" title="',3) ; fetch the users shown on the microblog page
    For $e = 0 To UBound($linki) -1
        FileWrite(@ScriptDir&'\data\links.txt','http://www.wykop.pl/ludzie/'&$linki[$e]&@CRLF)
    Next
    $linki = StringRegExp($data, '<a class="tag create" href="http://www.wykop.pl/tag/(.*?)/"><em>',3) ; fetch the tags shown on the microblog page
    For $r = 0 To UBound($linki) -1
        FileWrite(@ScriptDir&'\data\links.txt','http://www.wykop.pl/tag/'&$linki[$r]&@CRLF)
    Next
    $linki = StringRegExp($data, '<class="showTagSummary" href="http://www.wykop.pl/tag/(.*?)">',3) ; fetch the tags shown on the microblog page
    For $t = 0 To UBound($linki) -1
        FileWrite(@ScriptDir&'\data\links.txt','http://www.wykop.pl/tag/'&$linki[$t]&@CRLF)
    Next
    $linki = StringRegExp($data, 'href="http://www.wykop.pl/ludzie/(.*?)/" title=""',3) ; fetch users from a find's comments
    For $a = 0 To UBound($linki) -1
        FileWrite(@ScriptDir&'\data\links.txt','http://www.wykop.pl/ludzie/'&$linki[$a]&@CRLF)
    Next
    $linki = StringRegExp($data, 'href="http://www.wykop.pl/tag/index/(.*?)/"',3) ; fetch tags from a find
    For $s = 0 To UBound($linki) -1
        FileWrite(@ScriptDir&'\data\links.txt','http://www.wykop.pl/tag/index/'&$linki[$s]&@CRLF)
    Next
    ; fetch finds from the right-hand menu (the ? is escaped so it matches literally)
    $linki = StringRegExp($data, '<a class="clearfix" href="http://www.wykop.pl/link/(.*?)/\?utm_source',3)
    For $d = 0 To UBound($linki) -1
        FileWrite(@ScriptDir&'\data\links.txt','http://www.wykop.pl/link/'&$linki[$d]&@CRLF)
    Next
    Sleep(5000)
    GuiCtrlSetData($Edit2, GuiCtrlRead($Edit2)+1)
    _FileWriteToLine(@ScriptDir&'\data\links.txt',GuiCtrlRead($edit1),'',1)
    VisitFrontPage()
EndFunc

While 1
    $nMsg = GUIGetMsg()
    Switch $nMsg
        Case $Button1
        Case $Button2
            VisitFrontPage()
        Case $GUI_EVENT_CLOSE
            Exit

    EndSwitch
WEnd
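
One possible way to save each URL only once, as a minimal sketch rather than a finished solution: keep the URLs already written in a Scripting.Dictionary and skip any URL that is already there. The names $g_sLinksFile, $g_oSeen, _LoadSeen and _WriteIfNew are hypothetical helpers made up for this illustration; they are not part of the script above.

```autoit
; Sketch only: all names below are hypothetical, not from the original script.
Global $g_sLinksFile = @ScriptDir & '\data\links.txt'
Global $g_oSeen = ObjCreate("Scripting.Dictionary")
If Not IsObj($g_oSeen) Then Exit 1 ; COM object could not be created

; Load the URLs already in the file so that earlier runs are taken into account.
Func _LoadSeen()
    Local $aLines = FileReadToArray($g_sLinksFile)
    If @error Then Return ; file missing or empty: nothing to load
    For $i = 0 To UBound($aLines) - 1
        If $aLines[$i] <> "" And Not $g_oSeen.Exists($aLines[$i]) Then $g_oSeen.Add($aLines[$i], 1)
    Next
EndFunc

; Append $sUrl to the file only if it has not been written before.
; Returns True if the URL was written, False if it was a duplicate.
Func _WriteIfNew($sUrl)
    If $g_oSeen.Exists($sUrl) Then Return False
    $g_oSeen.Add($sUrl, 1)
    FileWrite($g_sLinksFile, $sUrl & @CRLF)
    Return True
EndFunc
```

Under these assumptions, _LoadSeen() would be called once at startup, and each FileWrite(@ScriptDir&'\data\links.txt', ...) call in VisitFrontPage would become something like _WriteIfNew('http://www.wykop.pl/link/' & $linki[$q]). Note that the dictionary compares keys case-sensitively by default, and that loading a file as large as 1 GB into memory may not be practical, so the existing file would probably need to be cleaned up or restarted first.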

 


Why did you make the function VisitFrontPage recursive, and what's the purpose of this line?

local $iFileSize = FileGetSize('')
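
On the recursion point: VisitFrontPage calls itself at the end of every run, so the call stack keeps growing, and AutoIt aborts the script once its recursion limit is exceeded. A minimal sketch of the same "repeat after each page" behaviour with a plain loop is given here. It assumes a hypothetical _VisitOnce() that would hold the body of VisitFrontPage without the final self-call; _Crawl and $iMaxPages are likewise made up for illustration.

```autoit
; Sketch only: _VisitOnce and _Crawl are hypothetical names.

; Stand-in for the body of VisitFrontPage without the trailing VisitFrontPage() call.
Func _VisitOnce()
    ; ... fetch one page and append its links here ...
    Return True
EndFunc

; Visit up to $iMaxPages pages in a loop, so the call stack does not grow.
Func _Crawl($iMaxPages)
    For $i = 1 To $iMaxPages
        If Not _VisitOnce() Then ExitLoop ; stop early if a visit reports failure
        Sleep(5000) ; same delay between pages as in the original script
    Next
EndFunc

_Crawl(3)
```

With this shape, Case $Button2 would call _Crawl(...) instead of VisitFrontPage(). The loop still blocks the GUI while it runs, just as the original does, so a real script would probably also want a way to stop it.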
