Jump to content
galan2015

The way to remove the links that are duplicates?

Recommended Posts

galan2015

yo, i need some help.

Im training with _StringsBetween, on big site! Alot fun, My script turned 8 times created a text file with a weight of 1GB. I saw that in the file was a lot of repetition. Is it possible to somehow set the string to save the file, only one URL without repetition?

#include <ButtonConstants.au3>
#include <EditConstants.au3>
#include <GUIConstantsEx.au3>
#include <WindowsConstants.au3>
#include <INet.au3>
#include <StringConstants.au3>
#include <File.au3>

;I'm coming for blood, no code of conduct, no law.
;I'm coming for blood, no code of conduct, no law.
#Region ### START Koda GUI section ### Form=
$Form1 = GUICreate("Form1", 1211, 812, 43, 110)
$Button1 = GUICtrlCreateButton("Button1", 0, 8, 249, 81)
$Button2 = GUICtrlCreateButton("Button2", 350, 8, 249, 81)
$Edit1 = GUICtrlCreateEdit("0", 8, 128, 609, 257, BitOR($ES_CENTER,$ES_AUTOHSCROLL,$ES_READONLY,$ES_WANTRETURN))
$Edit2 = GUICtrlCreateEdit("2", 632, 0, 577, 809)
GUISetState(@SW_SHOW)
#EndRegion ### END Koda
local $iFileSize = FileGetSize('')


Func VisitFrontPage()
    local $Liczba = _FileCountLines(@ScriptDir&'\data\links.txt')
    local $liczba2 = GUICtrlSetData($Edit1,Random(1,$Liczba,1))
    local $Liczba3 = GUICtrlRead($Edit1)
    local $Liczba4 = FileReadLine(@ScriptDir&'\data\links.txt',$Liczba3)
$data = _INetGetSource ( FileReadLine(@ScriptDir&'\data\links.txt',Random(1,$Liczba,1)))
$linki = StringRegExp($data, '<a href="http://www.wykop.pl/link/(.*?)/" title=""',3)
For $q = 0 To UBound($linki) -1
FileWrite(@ScriptDir&'\data\links.txt','http://www.wykop.pl/link/'&$linki[$q]&@CRLF)
Next
$linki = StringRegExp($data, 'href="http://www.wykop.pl/ludzie/(.*?)/">',3) ; pobieranie ludzi co dodali znaleziska
For $w = 0 To UBound($linki) -1
FileWrite(@ScriptDir&'\data\links.txt','http://www.wykop.pl/ludzie/'&$linki[$w]&@CRLF)
Next
$linki = StringRegExp($data, '<a href="http://www.wykop.pl/ludzie/(.*?)/" title="',3) ; pobieranie ludzi co sa na stronie z mikro
For $e = 0 To UBound($linki) -1
FileWrite(@ScriptDir&'\data\links.txt','http://www.wykop.pl/ludzie/'&$linki[$e]&@CRLF)
Next
$linki = StringRegExp($data, '<a class="tag create" href="http://www.wykop.pl/tag/(.*?)/"><em>',3) ; pobieranie ludzi co sa na stronie z mikro
For $r = 0 To UBound($linki) -1
FileWrite(@ScriptDir&'\data\links.txt','http://www.wykop.pl/tag/'&$linki[$r]&@CRLF)
Next
$linki = StringRegExp($data, '<class="showTagSummary" href="http://www.wykop.pl/tag/(.*?)">',3) ; pobieranie ludzi co sa na stronie z mikro
For $t = 0 To UBound($linki) -1
FileWrite(@ScriptDir&'\data\links.txt','http://www.wykop.pl/tag/'&$linki[$t]&@CRLF)
Next
 $linki = StringRegExp($data, 'href="http://www.wykop.pl/ludzie/(.*?)/" title=""',3) ; pobieranie ludzi z znaleziska, komentarze
For $a = 0 To UBound($linki) -1
FileWrite(@ScriptDir&'\data\links.txt','http://www.wykop.pl/ludzie/'&$linki[$a]&@CRLF)
Next
 $linki = StringRegExp($data, 'href="http://www.wykop.pl/tag/index/(.*?)/"',3) ; pobierane tagi ze znaleziska,
For $s = 0 To UBound($linki) -1
FileWrite(@ScriptDir&'\data\links.txt','http://www.wykop.pl/tag/index/'&$linki[$s]&@CRLF)
Next
 $linki = StringRegExp($data, '<a class="clearfix" href="http://www.wykop.pl/link/(.*?)/?utm_source',3) ; pobierane znaleziska z prawego menu
For $d = 0 To UBound($linki) -1
FileWrite(@ScriptDir&'\data\links.txt','http://www.wykop.pl/link/'&$linki[$d]&@CRLF)
Next
Sleep(5000)
GuiCtrlSetData($Edit2, GuiCtrlRead($Edit2)+1)
_FileWriteToLine(@ScriptDir&'\data\links.txt',GuiCtrlRead($edit1),'',1)
VisitFrontPage()
EndFunc

While 1
    $nMsg = GUIGetMsg()
    Switch $nMsg
        Case $Button1
        Case $Button2
            VisitFrontPage()
        Case $GUI_EVENT_CLOSE
            Exit

    EndSwitch
WEnd

 

Share this post


Link to post
Share on other sites
jchd

Why do you make the function VisitFrontPage recursive and what's the purpose of

local $iFileSize = FileGetSize('')

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

  • Similar Content

    • nooneclose
      By nooneclose
      Hey. I'm working on a new project and was wondering if there is a better way to "update" my Column E array. 
      Here is my code: 
      Local $nI  = 0                                                            ;Creates a name index of 0: nI = Name index Local $nII = 1                                                            ;Creates a name index of 1 for second loop: nII = Name Index 2 For    $iN = 0 To $IndexRows Step 1                                       ;Checks the roster for any names that appear twice      For $iN2 = 0 To $IndexRows Step 1          if $d_Names[$nI] == $d_Names[$nII] And $d_Names[$nII] <> "" Then              Local $timeSheetName = _ArraySearch($e_Names, $d_Names[$nI], 0, 0, 0, 0, 1)              ;MsgBox($MB_SYSTEMMODAL, "Found it", $d_Names[$nI] & " In column E on Row " & $timeSheetName)              Local $eI  = $timeSheetName + 1              ;ConsoleWrite($timeSheetName & @CRLF)              ;ConsoleWrite($eI & @CRLF)              ;ConsoleWrite(@CRLF)              _Excel_RangeInsert($OpenWorkbook.ActiveSheet, "E" & $eI & ":F" & $eI, $xlShiftDown)                                                                          ;Inserts a empty cell in columns E and F.              _Excel_RangeWrite($OpenWorkbook, $OpenWorkbook.ActiveSheet, $d_Names[$nII], "E" & $eI)                                                                         ;Fills the empty cell in columns E with the doubled name              $aArray_Index = 2                                           ;Array element counter              For $Index = 2 To $IndexRows Step 1                        ;Loops through every row in the Excel file unto no rows are found or a null row is found                  $Array_Value_E = _Excel_RangeRead($OpenWorkbook, Default, "E"&$Index)                  $e_names[$aArray_Index] = $Array_Value_E                ;While the code loops every value in column E is stored in the E array (updating the array)                  $aArray_Index += 1              Next              ExitLoop          EndIf      Next      $nI  += 1      $nII += 1 Next Basically, It checks a roster for people whose name appears twice then inserts a new "row" for that person because they work in two different departments.
      I have to find that name however in Column E if two appear in column D. My code works but I think it is not as efficient as it could be. 
      Any ideas on how to improve the "update" for my array?
      (once it finds the double names in Column D it then searches for that name by going name by name in the Column E array and once it finds it inserts a new row. However, the E array doesn't have that new row stored in it so I have to "update" the array to properly find the next name)
      Any and all tips would be greatly appreciated. 
       
      NOTE: Just assume I'm opening the excel file properly please do not add that code in, it only complicates your answer. 
    • Miliardsto
      By Miliardsto
      Hello Im wondering if using this https://ohtejera.github.io/ImperiusAutoIt/#started
      UDF can i make that I can control my windows application with phone?
      like for example click button Start on android phone and then something would be done in my windows app?
    • nooneclose
      By nooneclose
      How do I properly convert this to Autoit? This is a VBA macro that I recorded in Excel.
       ActiveSheet.Outline.ShowLevels RowLevels:=2 I need this to close my subtotal once it is finished. 
      any help will be greatly appreciated. 
    • iMacg3
      By iMacg3
      Hi,
      I was looking into a way to delete a registry key (not a specific value, an entire key) if it is present. I was considering using RegRead and RegDelete. However, RegRead appears to only read values, not just keys. Is there a way to delete an entire registry key if it is present? I have heard that the below function may help.
       
      #include-once ; #UDF# ======================================================================================================================= ; Title .........: Reads\Search the name of a Key\Subkey\Value ; AutoIt Version : 3.3.8.1 ; Language ......: English ; Description ...: Lists all Keys\Subkeys\Values in a specified registry key ; Author(s) .....: DXRW4E ; Notes .........: ; =============================================================================================================================== ; #CURRENT# ===================================================================================================================== ;~ _RegEnumKeyEx ;~ _RegEnumValEx ; =============================================================================================================================== #Region ;**** Global constants and vars **** Global Const $sValueTypes[12] = ["REG_NONE","REG_SZ","REG_EXPAND_SZ","REG_BINARY","REG_DWORD","REG_DWORD_BIG_ENDIAN","REG_LINK","REG_MULTI_SZ","REG_RESOURCE_LIST","REG_FULL_RESOURCE_DESCRIPTOR","REG_RESOURCE_REQUIREMENTS_LIST","REG_QWORD"] #EndRegion ;**** Global constants and vars **** ; #FUNCTION# ======================================================================================================================== ; Name...........: _RegEnumKeyEx ; Description ...: Lists all subkeys in a specified registry key ; Syntax.........: _RegEnumKeyEx($KeyName[, $iFlag = 0[, $sFilter = "*"]]) ; Parameters ....: $KeyName - The registry key to read. ; $iFlag - Optional specifies Recursion (add the flags together for multiple operations): ; |$iFlag = 0 (Default) All Key-SubKeys Recursive Mod ; |$iFlag = 1 All SubKeys Not Recursive Mod ; |$iFlag = 2 Include in ArrayList in the first element $KeyName ; |$iFlag = 16 $sFilter do Case-Sensitive matching (By Default $sFilter do Case-Insensitive matching) ; |$iFlag = 32 Disable the return the count in the first element - effectively makes the array 0-based (must use UBound() to get the size in this case). ; By Default the first element ($array[0]) contains the number of strings returned, the remaining elements ($array[1], $array[2], etc.) ; |$iFlag = 64 $sFilter is REGEXP Mod, See Pattern Parameters in StringRegExp ; |$iFlag = 128 Enum value's name (_RegEnumKeyEx Return a 2D array, maximum Array Size limit is 3999744 Key\Value) ; |$iFlag = 256 Reads a value data, this flag will be ignored if the $iFlag = 128 is not set ; $sFilter - Optional the filter to use, default is *. (Multiple filter groups such as "All "*.XXx|*.YYY|*.ZZZ") ; Search the Autoit3 helpfile for the word "WildCards" For details. ; $vFilter - Optional the filter to use for ValueName, $vFilter will be ignored if the $iFlag = 128 is not set ; default is *. (Multiple filter groups such as "All "*.XXx|*.YYY|*.ZZZ") Search the Autoit3 helpfile for the word "WildCards" For details. ; $iValueTypes - Optional, set Value Types to search (Default $iValueTypes = 0 Read All), $iValueTypes will be ignored if the $iFlag = 128 is not set ; (add the flags together for multiple operations): ; 1 = REG_SZ ; 2 = REG_EXPAND_SZ ; 3 = REG_BINARY ; 4 = REG_DWORD ; 5 = REG_DWORD_BIG_ENDIAN ; 6 = REG_LINK ; 7 = REG_MULTI_SZ ; 8 = REG_RESOURCE_LIST ; 9 = REG_FULL_RESOURCE_DESCRIPTOR ; 10 = REG_RESOURCE_REQUIREMENTS_LIST ; 11 = REG_QWORD ; Return values .: Success - Return Array List (See Remarks) ; Failure - @Error ; |1 = Invalid $sFilter ; |2 = No Key-SubKey(s) Found ; |3 = Invalid $vFilter ; |4 = No Value-Name(s) Found ; Author ........: DXRW4E ; Modified.......: ; Remarks .......: The array returned is one-dimensional and is made up as follows: ; $array[0] = Number of Key-SubKeys returned ; $array[1] = 1st Key\SubKeys ; $array[2] = 2nd Key\SubKeys ; $array[3] = 3rd Key\SubKeys ; $array[n] = nth Key\SubKeys ; ; If is set the $iFlag = 128 The array returned is 2D array and is made up as follows: ; $array[0][0] = Number of Key-SubKeys returned ; $array[1][0] = 1st Key\SubKeys ; $array[1][1] = 1st Value name ; $array[1][2] = 1st Value Type (REG_NONE or REG_SZ or REG_EXPAND_SZ ect ect) ; $array[1][3] = 1st Value Data (If is set $iFlag = 256 Else Value Data = "") ; $array[2][0] = 2nd Key\SubKeys ; $array[2][1] = 2nd Value name ; $array[2][2] = 2nd Value Type (REG_NONE or REG_SZ or REG_EXPAND_SZ ect ect) ; $array[2][3] = 2nd Value Data (If is set $iFlag = 256 Else Value Data = "") ; $array[n][0] = nth Key\SubKeys ; Related .......: _RegEnumValEx() ; Link ..........: ; Example .......: _RegEnumKeyEx("HKEY_CURRENT_USER\Software\AutoIt v3") ; Note ..........: ; =================================================================================================================================== Func _RegEnumKeyEx($KeyName, $iFlag = 0, $sFilter = "*", $vFilter = "*", $iValueTypes = 0) If StringRegExp($sFilter, StringReplace("^\s*$|\v|\\|^\||\|\||\|$", Chr(BitAND($iFlag, 64) + 28) & "\|^\||\|\||\|$", "\\\\")) Then Return SetError(1, 0, "") Local $IndexSubKey[101] = [100], $SubKeyName, $BS = "\", $sKeyList, $I = 1, $sKeyFlag = BitAND($iFlag, 1), $sKeyFilter = StringReplace($sFilter, "*", "") If BitAND($iFlag, 2) Then $sKeyList = @LF & $KeyName If Not BitAND($iFlag, 64) Then $sFilter = StringRegExpReplace(BitAND($iFlag, 16) & "(?i)(", "16\(\?\i\)|\d+", "") & StringRegExpReplace(StringRegExpReplace(StringRegExpReplace(StringRegExpReplace($sFilter, "[^*?|]+", "\\Q$0\\E"), "\\E(?=\||$)", "$0\$"), "(?<=^|\|)\\Q", "^$0"), "\*+", ".*") & ")" While $I $IndexSubKey[$I] += 1 $SubKeyName = RegEnumKey($KeyName, $IndexSubKey[$I]) If @error Then $IndexSubKey[$I] = 0 $I -= 1 $KeyName = StringLeft($KeyName, StringInStr($KeyName, "\", 1, -1) - 1) ContinueLoop EndIf If $sKeyFilter Then If StringRegExp($SubKeyName, $sFilter) Then $sKeyList &= @LF & $KeyName & $BS & $SubKeyName Else $sKeyList &= @LF & $KeyName & $BS & $SubKeyName EndIf If $sKeyFlag Then ContinueLoop $I += 1 If $I > $IndexSubKey[0] Then $IndexSubKey[0] += 100 ReDim $IndexSubKey[$IndexSubKey[0] + 1] EndIf $KeyName &= $BS & $SubKeyName WEnd If Not $sKeyList Then Return SetError(2, 0, "") If BitAND($iFlag, 128) <> 128 Then Return StringSplit(StringTrimLeft($sKeyList, 1), @LF, StringReplace(BitAND($iFlag, 32), "32", 2)) $sKeyList = _RegEnumValEx(StringSplit(StringTrimLeft($sKeyList, 1), @LF), $iFlag, $vFilter, $iValueTypes) Return SetError(@Error, 0, $sKeyList) EndFunc ; #FUNCTION# ======================================================================================================================== ; Name...........: _RegEnumValEx ; Description ...: Lists all values in a specified registry key ; Syntax.........: _RegEnumValEx($KeyName[, $iFlag = 0[, $sFilter = "*"]]) ; Parameters ....: $KeyName - The registry key to read Or one-dimensional array RegKeyList ; use _RegEnumKeyEx() to get $RegKeyList (example $RegKeyList = [3, 1st Key\SubKeys, 2st Key\SubKeys, nth Key\SubKeys]) ; |$iFlag = 16 $sFilter do Case-Sensitive matching (By Default $sFilter do Case-Insensitive matching) ; |$iFlag = 32 Disable the return the count in the first element - effectively makes the array 0-based (must use UBound() to get the size in this case). ; By Default the first element ($array[0]) contains the number of strings returned, the remaining elements ($array[1], $array[2], etc.) ; |$iFlag = 64 $sFilter is REGEXP Mod, See Pattern Parameters in StringRegExp ; |$iFlag = 256 Reads a value data ; $sFilter - Optional the filter to use, default is *. (Multiple filter groups such as "All "*.XXx|*.YYY|*.ZZZ") ; Search the Autoit3 helpfile for the word "WildCards" For details. ; $iValueTypes - Optional, set Value Types to search (Default $iValueTypes = 0 Read All) ; (add the flags together for multiple operations): ; 1 = REG_SZ ; 2 = REG_EXPAND_SZ ; 3 = REG_BINARY ; 4 = REG_DWORD ; 5 = REG_DWORD_BIG_ENDIAN ; 6 = REG_LINK ; 7 = REG_MULTI_SZ ; 8 = REG_RESOURCE_LIST ; 9 = REG_FULL_RESOURCE_DESCRIPTOR ; 10 = REG_RESOURCE_REQUIREMENTS_LIST ; 11 = REG_QWORD ; Return values .: Success - Return Array List (See Remarks) ; Failure - @Error ; |3 = Invalid $sFilter ; |4 = No Value-Name(s) Found ; Author ........: DXRW4E ; Modified.......: ; Remarks .......: The array returned is 2D array and is made up as follows: ; $array[0][0] = Number of Key-SubKeys returned ; $array[1][0] = 1st Key\SubKeys ; $array[1][1] = 1st Value name ; $array[1][2] = 1st Value Type (REG_NONE or REG_SZ or REG_EXPAND_SZ ect ect) ; $array[1][3] = 1st Value Data (If is set $iFlag = 256 Else Value Data = "") ; $array[2][0] = 2nd Key\SubKeys ; $array[2][1] = 2nd Value name ; $array[2][2] = 2nd Value Type (REG_NONE or REG_SZ or REG_EXPAND_SZ ect ect) ; $array[2][3] = 2nd Value Data (If is set $iFlag = 256 Else Value Data = "") ; $array[n][0] = nth Key\SubKeys ; Related .......: _RegEnumKeyEx() ; Link ..........: ; Example .......: _RegEnumValEx("HKEY_CURRENT_USER\Software\AutoIt v3") ; Note ..........: ; =================================================================================================================================== Func _RegEnumValEx($aKeyList, $iFlag = 0, $sFilter = "*", $iValueTypes = 0) If StringRegExp($sFilter, "\v") Then Return SetError(3, 0, "") If Not IsArray($aKeyList) Then $aKeyList = StringSplit($aKeyList, @LF) Local $aKeyValList[1954][4], $iKeyVal = Int(BitAND($iFlag, 32) = 0), $sKeyVal = 1953, $sRegEnumVal, $iRegEnumVal, $RegRead = BitAND($iFlag, 256), $vFilter = StringReplace($sFilter, "*", "") If Not BitAND($iFlag, 64) Then $sFilter = StringRegExpReplace(BitAND($iFlag, 16) & "(?i)(", "16\(\?\i\)|\d+", "") & StringRegExpReplace(StringRegExpReplace(StringRegExpReplace(StringRegExpReplace($sFilter, "[^*?|]+", "\\Q$0\\E"), "\\E(?=\||$)", "$0\$"), "(?<=^|\|)\\Q", "^$0"), "\*+", ".*") & ")" For $i = 1 To $aKeyList[0] $iRegEnumVal = 0 While 1 If $iKeyVal = $sKeyVal Then If $sKeyVal = 3999744 Then ExitLoop $sKeyVal *= 2 ReDim $aKeyValList[$sKeyVal + 1][4] EndIf $aKeyValList[$iKeyVal][0] = $aKeyList[$i] $iRegEnumVal += 1 $sRegEnumVal = RegEnumVal($aKeyList[$i], $iRegEnumVal) If @Error <> 0 Then If $iRegEnumVal = 1 And $vFilter = "" Then $iKeyVal += 1 ExitLoop EndIf $aKeyValList[$iKeyVal][2] = $sValueTypes[@Extended] If BitAND(@Extended, $iValueTypes) <> $iValueTypes Then ContinueLoop If $vFilter And Not StringRegExp($sRegEnumVal, $sFilter) Then ContinueLoop $aKeyValList[$iKeyVal][1] = $sRegEnumVal If $RegRead Then $aKeyValList[$iKeyVal][3] = RegRead($aKeyList[$i], $sRegEnumVal) $iKeyVal += 1 WEnd Next $sRegEnumVal = $iKeyVal - Int(BitAND($iFlag, 32) = 0) If Not $sRegEnumVal Or ($sRegEnumVal = 1 And $vFilter = "" And $aKeyValList[$iKeyVal - $sRegEnumVal][2] = "") Then Return SetError(4, 0, "") ReDim $aKeyValList[$iKeyVal][4] If Not BitAND($iFlag, 32) Then $aKeyValList[0][0] = $iKeyVal - 1 Return $aKeyValList EndFunc Thanks.
       
    • great77
      By great77
      ; put the root in a variable $sRoot = "C:\Project\PHexample\" Global $sCurrentTime = _NowCalc() ; We can use that variable here Global $aList = _FileListToArray($sRoot, Default, 2) If @error Then Exit ;;;;;;MsgBox(0, "Error", "_FileListToArray returned @error = " & @error) ;;;;;;;;;;;This is a loop that runs from 1 to the number of items listed in the first element of the returned  array For $i = 1 To UBound($aList) - 1     MsgBox(0, "Folder date", $sRoot & "" & $aList[$i] & @CRLF & @CRLF & FileGetTime($sRoot & "" & $aList[$i], 1, 1))     MsgBox(0, "Folder date", FileGetTime($sRoot & "" & $aList[$i], 1, 1))      $a_filenew = StringReplace(StringReplace(StringReplace(_NowCalc(), "/", ""), ":", ""), " ", "")       MsgBox(0, "Folder date", $a_filenew) $adex =  _DateDiff('D', FileGetTime($sRoot & "" & $aList[$i], 1, 1), $a_filenew)  MsgBox(0,"ade", $adex) Next I have a code as seen above, but the difference in time is returning zero. I understand that the date yyyymmddhhmmss but how can I find the difference.
      The purpose is to try find the difference in days. Any suggestion?  
×