Dieuz, posted December 15, 2009 (edited)

Hey guys,

I want to delete URLs that don't meet my criteria from a file, but I don't know how to use StringRegExp properly to achieve it.

Wrong format: http://www.site1.com
Good format: http://www.site1.com/anything

    $file = FileOpen("URL.txt", 2) ; How can I set it to read/write mode at the same time?
    $count = _FileCountLines("URL.txt")
    For $x = 1 To $count
        $url = FileReadLine($file, $x)
        ; Remove URL from file if it doesn't meet the criteria (using StringRegExp?)
        ; Wrong format: http://www.site1.com
        ; Good format: http://www.site1.com/anything
    Next
    FileClose($file)

URL.txt:

    http://www.site1.com
    http://www.site2.com/anything
    http://www.site2.com/test
    http://www.site3.com
    http://www.site3.com/test

After running the code above, I would like to have this in URL.txt:

    http://www.site2.com/anything
    http://www.site2.com/test
    http://www.site3.com/test

Thanks!

smartee, posted December 15, 2009

I posted something similar here; see if it helps.

Authenticity, posted December 15, 2009

    #include <Array.au3>

    Local $aMatch, $sText = _
            "http://www.site1.com" & @CRLF & _
            "http://www.site2.com/anything" & @CRLF & _
            "http://www.site2.com/test" & @CRLF & _
            "http://www.site3.com" & @CRLF & _
            "http://www.site3.com/test"

    $aMatch = StringRegExp($sText, "(?i)http://www\.[^.\r\n]+\.[^/\r\n]+/.+", 3)
    If IsArray($aMatch) Then _ArrayDisplay($aMatch)

The pattern is simple (read: not very restrictive). Tweak as necessary.

Dieuz (Author), posted December 15, 2009 (edited)

Thanks, I can see the pattern!

How could I tweak it so it won't accept http://www.site1.com/ (a normal website with a slash at the end but nothing after it)?

Thanks!

EDIT: Found the FileRead function.

jvanegmond, posted December 15, 2009

Dieuz wrote: "Thanks, the regular expression seems to be accurate. How can I extract all lines from a file and transfer them to a string like you did?"

Funnily enough, his example uses line ends as breaks. So you only have to read the data from a file and you are done. xD

The function is FileRead.

GEOSoft, posted December 15, 2009 (edited)

    "\w+://.+/.{2,}"

EDIT: Better

    "(?i)(?m:^)(\w+://.+/\w.*)(?:\v|\z)"

George

Question about decompiling code? Read the decompiling FAQ and don't bother posting the question in the forums.
Be sure to read and follow the forum rules. -AKA the AutoIt Reading and Comprehension Skills test.***
The PCRE (Regular Expression) ToolKit for AutoIT - (Updated Oct 20, 2011 ver:3.0.1.13) - Please update your current version before filing any bug reports. The installer now includes both 32 and 64 bit versions. No change in version number.
Visit my Blog .. currently not active but it will soon be resplendent with news and views. Also please remove any links you may have to my website. It is soon to be closed and replaced with something else.
"Old age and treachery will always overcome youth and skill!"

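To see which URLs a pattern keeps or drops, each URL can be tested on its own. The following is only a minimal sketch, not code from the thread: the array $aUrls and the pattern in $sPattern are assumptions chosen for illustration. The pattern requires a scheme, a host that contains no slash, and then a slash followed by at least one non-space character, so a bare trailing slash should be rejected.

```autoit
; Sketch: classify sample URLs one at a time (illustrative data, not from a file).
Local $aUrls[4] = ["http://www.site1.com", "http://www.site1.com/", _
        "http://www.site2.com/anything", "http://www.site3.com/test"]

; Assumed pattern (not from the thread):
; scheme, "://", host without "/", then "/" plus at least one non-space character.
Local $sPattern = "(?i)^\w+://[^/\s]+/\S+$"

For $i = 0 To UBound($aUrls) - 1
    ; With the default flag 0, StringRegExp returns 1 on a match and 0 otherwise.
    If StringRegExp($aUrls[$i], $sPattern) Then
        ConsoleWrite("keep: " & $aUrls[$i] & @CRLF)
    Else
        ConsoleWrite("drop: " & $aUrls[$i] & @CRLF)
    EndIf
Next
```

Testing single strings with flag 0 keeps line endings out of the picture, which makes it easier to tell whether a surprise comes from the pattern or from the input text.
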
Dieuz (Author), posted December 15, 2009 (edited)

Thanks a lot, guys! Here's what I've got so far:

    $file = FileOpen("BACKLINK.txt", 0)
    $readbacklink = FileRead($file)
    $bl_array = StringRegExp($readbacklink, "\w+://.+/.{2,}", 3)
    FileClose($file)

    _FileCreate("BACKLINK.txt")
    $file2 = FileOpen("BACKLINK.txt", 1)
    For $w = 0 To UBound($bl_array) - 1
        FileWriteLine($file2, $bl_array[$w])
    Next
    FileClose($file2)

There is still one thing that isn't working properly. When the RegExp extracts the links and adds them to the array, it adds "[]h" at the end of each link...

GEOSoft, posted December 15, 2009

First off, change that SRE to the one I used in the edit. Secondly, you don't need FileOpen() or FileClose() for the reading part.

Next: are you saying that with a plain text file as given above you are getting the extra characters added?

Try this:

    $sStr = FileRead("backlink.txt")
    $bl_array = StringRegExp($sStr, "(?i)(?m:^)(\w+://.+/\w.*)(?:\v|\z)", 3)
    If NOT @Error Then
        Local $sOut = ""
        For $i = 0 To Ubound($bl_array) -1
            $sOut &= $bl_array[$i]
        Next
        $hFile = FileOpen("backlink.txt", 2)
        FileWrite($hFile, StringStripWS($sOut, 2))
        FileClose($hFile)
    EndIf

Dieuz (Author), posted December 15, 2009 (edited)

First,

GEOSoft wrote: "Next: Are you saying that with a plain text file as given above you are getting the extra characters added?"

Yes, even with a plain text file. Here is what I see if I do an _ArrayDisplay():

[screenshot]

Second, with the above code there is no line break between the URLs in the file. That's why I thought it was useful to use FileWriteLine().

By the way, thanks for taking the time to help me! I appreciate it!

GEOSoft, posted December 15, 2009

Change

    "(?i)(?m:^)(\w+://.+/\w.*)(?:\v|\z)"

to

    "(?i)(?m:^)(\w+://.+/\w.*)(?:\v|\z)+"

and see what you get. Please report back.

Dieuz (Author), posted December 15, 2009 (edited)

I am still getting the character added to every link. Here is a working example so you can try it without having any file.

    #include <Array.au3>
    #include <File.au3>

    Local $bl_array, $sText = _
            "http://www.site1.com/" & @CRLF & _
            "http://site2.com/anything" & @CRLF & _
            "http://www.site2.com/test" & @CRLF & _
            "http://www.site3.com/" & @CRLF & _
            "http://www.site3.com/test"

    $bl_array = StringRegExp($sText, "(?i)(?m:^)(\w+://.+/\w.*)(?:\v|\z)+", 3)
    _ArrayDisplay($bl_array)

As you can see, every URL is on a different line in the file.

Melba23 (Moderators), posted December 15, 2009

George (& Dieuz),

If it is of any assistance, I am not getting any additional characters when I run that script (on 3.3.1.7). I get what I expected.

M23

Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind.

My UDFs:
ArrayMultiColSort: Sort arrays on multiple columns
ChooseFileFolder: Single and multiple selections from specified path treeview listing
Date_Time_Convert: Easily convert date/time formats, including the language used
ExtMsgBox: A highly customisable replacement for MsgBox
GUIExtender: Extend and retract multiple sections within a GUI
GUIFrame: Subdivide GUIs into many adjustable frames
GUIListViewEx: Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx: Check/clear parent and child checkboxes in a TreeView
Marquee: Scrolling tickertape GUIs
NoFocusLines: Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify: Small notifications on the edge of the display
Scrollbars: Automatically sized scrollbars with a single command
StringSize: Automatically size controls to fit text
Toast: Small GUIs which pop out of the notification area

Dieuz (Author), posted December 15, 2009

Strange... I do have version 3.3.1.7 and I'm getting the additional characters, like in the picture I posted.

GEOSoft, posted December 15, 2009

Melba23 wrote: "George (& Dieuz), If it is of any assistance, I am not getting any additional characters when I run that script (on 3.3.1.7). I get what I expected. M23"

Neither am I, and I suspect his problem is in the text file. I've often seen this happen with a database, a spreadsheet or some HTML code.

Dieuz (Author), posted December 15, 2009

Argh... even without using a file at all (running the simple script above), I am getting the additional characters. Why?

GEOSoft, posted December 15, 2009 (edited)

Dieuz wrote: "Strange... I do have Version 3.3.1.7 and I'm getting the additional characters like in the picture I posted."

Okay, I'm assuming that you still get them with the code you posted (I don't). Try it with my code written the way it should have been (there was an error in it).

    $sStr = FileRead("backlink.txt")
    $bl_array = StringRegExp($sStr, "(?i)(?m:^)(\w+://.+/\w.*)(?:\v|\z)+", 3)
    If NOT @Error Then
        Local $sOut = ""
        For $i = 0 To Ubound($bl_array) -1
            $sOut &= $bl_array[$i] & @CRLF
        Next
        $hFile = FileOpen("backlink.txt", 2)
        FileWrite($hFile, StringStripWS($sOut, 2))
        FileClose($hFile)
    EndIf

If that still fails, try something that sounds really stupid at first glance: reboot your system and try it again. Also, what text editor are you reading the file with?

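One way to rule out line-ending surprises is to split the text into lines first and test each line separately. This is a minimal sketch under an assumption the thread never confirms, namely that the extra character is a stray carriage return picked up by the match. The sample text in $sText is illustrative, and the per-line pattern is an anchored variant of the one above rather than the exact pattern posted.

```autoit
; Sketch: filter line by line so no line-ending characters reach the regex.
Local $sText = "http://www.site1.com/" & @CRLF & _
        "http://site2.com/anything" & @CRLF & _
        "http://www.site3.com/test" & @CRLF

; Remove every carriage return, then split on line feeds.
; StringSplit stores the number of pieces in element 0.
Local $aLines = StringSplit(StringStripCR($sText), @LF)
Local $sOut = ""

For $i = 1 To $aLines[0]
    ; A word character must follow a slash after the host part;
    ; empty lines and bare "http://host/" lines do not match.
    If StringRegExp($aLines[$i], "(?i)^\w+://.+/\w.*$") Then
        $sOut &= $aLines[$i] & @CRLF
    EndIf
Next

ConsoleWrite($sOut)
```

The cost is one regex call per line instead of a single global match, which should not matter for a small URL list.
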
Dieuz (Author), posted December 15, 2009

    $sStr = FileRead("backlink.txt")
    $bl_array = StringRegExp($sStr, "(?i)(?m:^)(\w+://.+/\w.*)(?:\v|\z)+", 3)
    If NOT @Error Then
        Local $sOut = ""
        For $i = 0 To Ubound($bl_array) -1
            $sOut &= $bl_array[$i] & @CRLF
        Next
        $hFile = FileOpen("backlink.txt", 2)
        FileWrite($hFile, StringStripWS($sOut, 2))
        FileClose($hFile)
    EndIf

This code DOES work now. I really don't know why it wasn't working at first. I am not getting any additional characters! Thanks a lot!

Quick and last question: what would be the best way to make sure there is no duplicate element (URL) in $bl_array?

Seriously, thanks everyone for your help! I can now continue working on my app!

GEOSoft, posted December 15, 2009

    $sStr = FileRead("backlink.txt")
    $bl_array = StringRegExp($sStr, "(?i)(?m:^)(\w+://.+/\w.*)(?:\v|\z)+", 3)
    If NOT @Error Then
        Local $sOut = ""
        For $i = 0 To Ubound($bl_array) -1
            ; Only append a URL that is not already in the output string
            If NOT StringInStr($sOut, $bl_array[$i] & @CRLF) Then $sOut &= $bl_array[$i] & @CRLF
        Next
        $hFile = FileOpen("backlink.txt", 2)
        FileWrite($hFile, StringStripWS($sOut, 2))
        FileClose($hFile)
    EndIf

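The StringInStr check above is a substring test and is case-insensitive by default, so a URL that happens to be the tail of a longer, already stored URL would also be treated as a duplicate. If exact comparison matters, a possible alternative is to track seen URLs in a Scripting.Dictionary COM object. This is only a sketch: the sample array $aUrls and the variable $oSeen are hypothetical names, and the approach assumes the Windows Scripting runtime is available.

```autoit
; Sketch: drop duplicate URLs using exact (whole-string) comparison.
Local $aUrls[4] = ["http://www.site2.com/test", "http://www.site3.com/test", _
        "http://www.site2.com/test", "http://www.site2.com/anything"]

; Scripting.Dictionary is a Windows COM object whose keys are unique.
; Its default compare mode is binary, so keys are case-sensitive.
Local $oSeen = ObjCreate("Scripting.Dictionary")
Local $sOut = ""

For $i = 0 To UBound($aUrls) - 1
    If Not $oSeen.Exists($aUrls[$i]) Then
        ; First time this exact URL appears: remember it and keep it.
        $oSeen.Add($aUrls[$i], 1)
        $sOut &= $aUrls[$i] & @CRLF
    EndIf
Next

ConsoleWrite($sOut)
```

The loop keeps the first occurrence of each URL and preserves the original order, so $sOut can be written to the file the same way as in the code above.
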