Regular Expression Help!


Dieuz
Hey guys,

I want to delete URLs from a file that don't meet my criteria, but I don't know how to use StringRegExp properly to achieve it.

Wrong format: http://www.site1.com

Good format: http://www.site1.com/anything

$file = FileOpen("URL.txt", 2) ; How can I set it to Read/Write mode at the same time?
$count = _FileCountLines("URL.txt")

For $x = 1 to $count

$url = FileReadLine($file, $x)

; Remove URL from file if it doesn't meet the criteria (using StringRegExp?)
; Wrong format: http://www.site1.com
; Good format: http://www.site1.com/anything
    
Next

FileClose($file)

URL.txt

http://www.site1.com
http://www.site2.com/anything
http://www.site2.com/test
http://www.site3.com
http://www.site3.com/test

After running the code above, I would like to have this in URL.txt :

http://www.site2.com/anything
http://www.site2.com/test
http://www.site3.com/test

Thanks!

;)

Edited by Dieuz

#include <Array.au3>

Local $aMatch, $sText = _
    "http://www.site1.com" & @CRLF & _
    "http://www.site2.com/anything" & @CRLF & _
    "http://www.site2.com/test" & @CRLF & _
    "http://www.site3.com" & @CRLF & _
    "http://www.site3.com/test"
    
$aMatch = StringRegExp($sText, "(?i)http://www\.[^.\r\n]+\.[^/\r\n]+/.+", 3)

If IsArray($aMatch) Then _ArrayDisplay($aMatch)

The pattern is simple (read: not very restrictive). Tweak as necessary.


Thanks, I can see the pattern!

How could I tweak it so it won't accept:

http://www.site1.com/

(a normal website with a slash at the end but nothing after it)

Thanks!

EDIT: Found the FileRead function ;)

Edited by Dieuz

"\w+://.+/.{2,}"

EDIT: Better

"(?i)(?m:^)(\w+://.+/\w.*)(?:\v|\z)"

Edited by GEOSoft
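
A quick way to compare the two patterns is to run both against a sample list that includes a bare domain with a trailing slash. The sketch below is only an illustration: the sample URLs (including the one-character path) and the helper name _ShowMatches are made up for the test and do not come from a real file.

```autoit
#include <Array.au3>

; Sample data (made up for illustration): bare domain, trailing slash, real paths.
Local $sText = _
    "http://www.site1.com" & @CRLF & _
    "http://www.site1.com/" & @CRLF & _
    "http://www.site2.com/a" & @CRLF & _
    "http://www.site2.com/test" & @CRLF & _
    "http://www.site3.com/test"

; First suggestion: at least two characters must follow a slash.
_ShowMatches($sText, "\w+://.+/.{2,}", "Pattern 1")

; Edited suggestion: a word character must follow the slash, anchored to the line start.
_ShowMatches($sText, "(?i)(?m:^)(\w+://.+/\w.*)(?:\v|\z)", "Pattern 2")

; Hypothetical helper: runs a global match (flag 3) and displays the result, if any.
Func _ShowMatches($sInput, $sPattern, $sTitle)
    Local $aFound = StringRegExp($sInput, $sPattern, 3)
    If @error Then
        ConsoleWrite($sTitle & ": no match or bad pattern, @error = " & @error & @CRLF)
        Return
    EndIf
    _ArrayDisplay($aFound, $sTitle)
EndFunc
```

The trailing-slash URL and the one-character path are the interesting cases: the first pattern asks for two characters after the slash, the second only for one word character. What comes back can depend on how your AutoIt build treats line breaks inside ".", so treat the display as something to check rather than a given.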

George

Question about decompiling code? Read the decompiling FAQ and don't bother posting the question in the forums.

Be sure to read and follow the forum rules. -AKA the AutoIt Reading and Comprehension Skills test.***

The PCRE (Regular Expression) ToolKit for AutoIT - (Updated Oct 20, 2011 ver:3.0.1.13) - Please update your current version before filing any bug reports. The installer now includes both 32 and 64 bit versions. No change in version number.

Visit my Blog .. currently not active but it will soon be resplendent with news and views. Also please remove any links you may have to my website. it is soon to be closed and replaced with something else.

"Old age and treachery will always overcome youth and skill!"


Thanks a lot, guys!

Here's what I got so far:

#include <File.au3> ; needed for _FileCreate()

$file = FileOpen("BACKLINK.txt", 0)
$readbacklink = FileRead($file)
$bl_array = StringRegExp($readbacklink, "\w+://.+/.{2,}", 3)
FileClose($file)

_FileCreate("BACKLINK.txt")

$file2 = FileOpen("BACKLINK.txt", 1)

For $w = 0 To UBound($bl_array) - 1
    FileWriteLine($file2, $bl_array[$w])
Next

FileClose($file2)

There is still one thing that isn't working properly. When the RegExp extracts the links and adds them to the array, it adds "[]h" at the end of each link...

Edited by Dieuz

First off, change that SRE to the one I used in the edit.

Secondly, you don't need FileOpen() or FileClose() for the reading part.

Next: Are you saying that with a plain text file as given above you are getting the extra characters added?

Try This

$sStr = FileRead("backlink.txt")
$bl_array = StringRegExp($sStr, "(?i)(?m:^)(\w+://.+/\w.*)(?:\v|\z)", 3)
If NOT @Error Then
   Local $sOut = ""
   For $i = 0 To Ubound($bl_array) -1
        $sOut &= $bl_array[$i]
   Next
   $hFile = FileOpen("backlink.txt", 2)
   FileWrite($hFile, StringStripWS($sOut, 2))
   FileClose($hFile)
EndIf

George


First,

Next: Are you saying that with a plain text file as given above you are getting the extra characters added?

Yes, even with a plain text file.

Here is what I see if I do an _ArrayDisplay():

Posted Image

Second, with the above code there is no line break between the URLs in the file. That's why I thought it was useful to use FileWriteLine().

By the way, thanks for taking the time to help me! I appreciate it!

;)

Edited by Dieuz

Change

"(?i)(?m:^)(\w+://.+/\w.*)(?:\v|\z)"

to

"(?i)(?m:^)(\w+://.+/\w.*)(?:\v|\z)+"

and see what you get. Please report back.

George


I am still getting the character added to every link.

Here is a working example so you can try it without having any file.

#include <Array.au3>
#Include <File.au3>

Local $bl_array, $sText = _
    "http://www.site1.com/" & @CRLF & _
    "http://site2.com/anything" & @CRLF & _
  "http://www.site2.com/test" & @CRLF & _
    "http://www.site3.com/" & @CRLF & _
    "http://www.site3.com/test"

$bl_array = StringRegExp($sText, "(?i)(?m:^)(\w+://.+/\w.*)(?:\v|\z)+",3)

_ArrayDisplay($bl_array)

As you can see every url is on a different line in the file.

Edited by Dieuz

  • Moderators

George (& Dieuz),

If it is of any assistance, I am not getting any additional characters when I run that script (on 3.3.1.7). I get what I expected.

M23

Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind

My UDFs:

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ----------Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area

 


George (& Dieuz),

If it is of any assistance, I am not getting any additional characters when I run that script (on 3.3.1.7). I get what I expected.

M23

Neither am I, and I suspect his problem is in the text file. I've often seen this happen with text that came from a database, a spreadsheet or some HTML code.
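
If the file is the suspect, one way to check is to dump the character code at the end of each line. This is a rough diagnostic sketch, assuming the file is still called backlink.txt as in the code above; the deliberately loose pattern and the variable names are only for illustration.

```autoit
; Bail out early if the file from the thread is not in the script's working directory.
If Not FileExists("backlink.txt") Then
    ConsoleWrite("backlink.txt not found" & @CRLF)
    Exit
EndIf

; FileRead accepts a file name directly and returns the whole content.
Local $sData = FileRead("backlink.txt")

; Grab every run of characters up to a line feed. The pattern is loose on purpose,
; so any stray carriage return or control character stays inside the match.
Local $aLines = StringRegExp($sData, "[^\n]+", 3)
If @error Then
    ConsoleWrite("No lines found" & @CRLF)
    Exit
EndIf

For $i = 0 To UBound($aLines) - 1
    ; Print the length and the code of the last character: 000D is a carriage return.
    Local $sLast = StringRight($aLines[$i], 1)
    ConsoleWrite(StringLen($aLines[$i]) & " chars, last code 0x" & Hex(AscW($sLast), 4) & @CRLF)
Next
```

A last code of 000D would simply mean the file uses CRLF line endings. Anything that is neither that nor an ordinary URL character would suggest the file holds something beyond plain text.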

George


Strange... I do have version 3.3.1.7 and I'm getting the additional characters like in the picture I posted.

Okay, I'm assuming that you still get them with the code you posted (I'm not)

Try it with my code written the way it should have been (there is an error in it).

$sStr = FileRead("backlink.txt")
$bl_array = StringRegExp($sStr, "(?i)(?m:^)(\w+://.+/\w.*)(?:\v|\z)+",3)
If NOT @Error Then
   Local $sOut = ""
   For $i = 0 To Ubound($bl_array) -1
        $sOut &= $bl_array[$i] & @CRLF
   Next
   $hFile = FileOpen("backlink.txt", 2)
   FileWrite($hFile, StringStripWS($sOut, 2))
   FileClose($hFile)
EndIf

If that still fails, try something that sounds really stupid at first glance: reboot your system and try it again.

Also what text editor are you reading the file with?

Edited by GEOSoft

George


This code DOES work now. I really don't know why it wasn't working at first. I am not getting any additional characters! Thanks a lot!

Quick last question: what would be the best way to make sure there are no duplicate elements (URLs) in $bl_array?

Seriously, thanks everyone for your help! I can now continue working on my app!


$sStr = FileRead("backlink.txt")
$bl_array = StringRegExp($sStr, "(?i)(?m:^)(\w+://.+/\w.*)(?:\v|\z)+",3)
If NOT @Error Then
   Local $sOut = ""
   For $i = 0 To Ubound($bl_array) -1
        If NOT StringInStr($sOut, $bl_array[$i] & @CRLF) Then $sOut &= $bl_array[$i] & @CRLF
   Next
   $hFile = FileOpen("backlink.txt", 2)
   FileWrite($hFile, StringStripWS($sOut, 2))
   FileClose($hFile)
EndIf
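
The StringInStr() check above can, in principle, misfire when one URL happens to be the tail end of a longer URL that is already in the output, and it ignores case by default. An alternative sketch, assuming the Windows Scripting.Dictionary COM object is available (it is not part of AutoIt itself), compares whole keys instead. The sample array and the variable names are made up.

```autoit
; Sample matches (made up); in the thread these would come from StringRegExp(..., 3).
Local $aUrls[4] = ["http://www.site2.com/test", "http://www.site3.com/test", _
        "http://www.site2.com/test", "http://www.site2.com/anything"]

; Scripting.Dictionary is a Windows COM object, used here as a set of already-seen URLs.
Local $oSeen = ObjCreate("Scripting.Dictionary")
If Not IsObj($oSeen) Then
    ConsoleWrite("Scripting.Dictionary is not available" & @CRLF)
    Exit
EndIf

Local $sOut = ""
For $i = 0 To UBound($aUrls) - 1
    ; Exists() compares whole keys, so only an exact repeat counts as a duplicate.
    If Not $oSeen.Exists($aUrls[$i]) Then
        $oSeen.Add($aUrls[$i], 1)
        $sOut &= $aUrls[$i] & @CRLF
    EndIf
Next

; $sOut now holds each distinct URL once, in first-seen order.
ConsoleWrite($sOut)
```

The dictionary compares keys case-sensitively unless its CompareMode is changed, so two URLs that differ only in letter case are kept as separate entries here.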

George
