Zepx, posted May 23, 2008:
Hi, I have a problem. Say I have a very long sentence. My objective is to find every word that starts with http://; I won't know in advance what the word ends with. Is there a way to pick such words out of a huge sentence? I've been thinking about this for a few days, but can't find a good way to search for all words that start with http:// and end with something unknown. Please guide me in searching for these words.
Assuming:
1. The word might contain a space (links don't, but I would like some info on this)
2. The word might end with any extension (e.g. .php or .html)
3. The word might end with any domain (.com, .net, or just a slash)
Some guidelines would be appreciated. Thanks in advance.
aslani, posted May 23, 2008:
You have to use StringRegExp, but don't ask me to tell you how, because that thing always kicks my butt. When it comes to searching for variable strings, it is your friend.
Chances are, I'm wrong.
Zepx (author), posted May 23, 2008:
Thanks, I will read up on it.
Valuater, posted May 23, 2008:
This will work... (regexp is the right way)
    $Info = "This is some text http://www.tester.com and more text http://www.finisher/now/.php and some more words"
    ; Split on every colon; this assumes each colon in the text belongs to an "http:" prefix
    $split = StringSplit($Info, ":")
    $http = ""
    ; Element 1 is the text before the first colon, so start at element 2
    For $x = 2 To $split[0]
        ; Everything up to the first space is the rest of the link
        $part = StringSplit($split[$x], " ")
        $http &= "http:" & $part[1] & @CRLF
    Next
    MsgBox(0, 0, $http)
8)
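The split-on-colon approach above breaks as soon as the text contains a colon that is not part of a link. A minimal sketch of one way around that, assuming links never contain spaces: split on the whole string "http://" by passing flag 1 to StringSplit. The sample text and variable names here are hypothetical.

```autoit
; Hypothetical sample text; note the extra colon in "Note:".
Local $sText = "Note: see http://www.tester.com and http://www.finisher/now/.php for more"

; Flag 1 makes StringSplit treat "http://" as one delimiter
; instead of splitting on each of its characters.
Local $aChunks = StringSplit($sText, "http://", 1)
Local $aWords
Local $sLinks = ""

; Element 1 is the text before the first link, so start at element 2.
For $i = 2 To $aChunks[0]
    ; Keep only the part of the chunk up to the first space.
    $aWords = StringSplit($aChunks[$i], " ")
    $sLinks &= "http://" & $aWords[1] & @CRLF
Next

ConsoleWrite($sLinks)
```

If the text holds no link at all, $aChunks[0] is 1 and the loop body never runs, so $sLinks stays empty.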
Prophet, posted May 23, 2008 (edited May 24, 2008):
With RegExp:
    Local $SomeString = "This is a weird string -09 ;''l; sd sdfp[sodf[p ohttp://www.somesite.com/gimme.phpsdlf;ks;dflks';ldfk"
    $Site = StringRegExp($SomeString, '(?i)(.*)(http.*php|html|htm|asp)(.*)', 1, 1)
    MsgBox(0, "test", $Site[1])
The Definition of Madness: Creating a GUI, with GUI automation scripts
Valuater, posted May 24, 2008:
Quote (Prophet):
    With RegExp:
    Local $SomeString = "This is a weird string -09 ;''l; sd sdfp[sodf[p ohttp://www.somesite.com/gimme.phpsdlf;ks;dflks';ldfk"
    $Site = StringRegExp($SomeString, '(?i)(.*)(http.*php|html|htm|asp)(.*)', 1, 1)
    MsgBox(0, "test", $Site[1])
*php|html|htm|asp
That appears almost limited; per the OP, the word may end with "or just a slash" (i.e. unknown).
8)
Prophet, posted May 24, 2008:
Quote (Valuater):
    That appears almost limited, per the OP
What? I don't quite understand that sentence.
Zepx (author), posted May 24, 2008:
Hmm, it seems StringSplit could cover a wider range of cases here. StringRegExp is really complicated.
Valuater, posted May 24, 2008 (edited May 24, 2008):
Quote (Prophet):
    What? I don't quite understand that sentence.
"OP" is the original poster of this thread, and the regexp is limited by the types of "endings" it lists. As the OP stated, the words "end with something I wouldn't know."
Quote (Zepx):
    Hmm, it seems StringSplit could cover a wider range of cases here. StringRegExp is really complicated.
I learned StringSplit() and like it. I have not tried to get into RegExp().
8)
Siao, posted May 24, 2008 (edited May 24, 2008):
    $a = StringRegExp($string, "(?U)http://.+\h", 3)
"be smart, drink your wine"
Prophet, posted May 24, 2008:
Quote (Valuater):
    "OP" is the original poster of this thread, and the regexp is limited by the types of "endings" it lists. As the OP stated, the words "end with something I wouldn't know."
    I learned StringSplit() and like it. I have not tried to get into RegExp().
It depends on the string and whether there is any pattern in it. For a lot of things StringSplit is easier, but it has its limitations. If you know the format of the string, for example that the address will be followed by whitespace or <a>, you could remove the php|html|etc. ending.
Valuater, posted May 24, 2008:
Quote (Prophet):
    It depends on the string and whether there is any pattern in it. For a lot of things StringSplit is easier, but it has its limitations. If you know the format of the string, for example that the address will be followed by whitespace or <a>, you could remove the php|html|etc. ending.
Obviously there is "white space"; that's how I got the links with StringSplit()... Anyway, Siao got them with RegExp().
8)
Prophet, posted May 24, 2008:
Well, it wasn't that obvious to me.
Valuater, posted May 24, 2008:
Quote (Zepx):
    Assuming:
    1. The word might contain a space (links don't, but I would like some info on this)
    2. The word might end with any extension (e.g. .php or .html)
    3. The word might end with any domain (.com, .net, or just a slash)
    Some guidelines would be appreciated. Thanks in advance.
It was right there, lol.
8)
Prophet, posted May 24, 2008:
Lol, well, apart from the fact that it's 3:18 here, it says the word might have spaces; it could come from HTML source, for instance. Anyway, I just made a simple RegExp that will get an address out of any random data.
Valuater, posted May 24, 2008:
Me too 8)
Zepx (author), posted May 24, 2008:
@Siao, what does the (?U) do? The help file says: "Invert greediness of quantifiers." Would you mind giving me an example? I don't get the meaning.
Prophet, posted May 24, 2008 (edited May 24, 2008):
Normally the quantifiers in a RegExp are greedy: they match as much of the string as they can. If you invert the greediness of quantifiers, they match as little as they can. See the example script below. The RegExp ends on whitespace (\h). With greediness inverted, it selects everything from http:// up to the first whitespace. Without the inversion, it selects everything from the first http:// up to the last whitespace in the string, which here is the space just before the second http://.
    Local $SomeString = "site1: http://www.somesite.com/gimme.php site2: http://www.thesite.com/"
    ; With inverted greediness
    $Site = StringRegExp($SomeString, "(?U)http://.+\h", 3)
    MsgBox(0, "test", $Site[0])
    ; Without inverted greediness
    $Site = StringRegExp($SomeString, "http://.+\h", 3)
    MsgBox(0, "test", $Site[0])
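The same effect can be had for a single quantifier by writing .+? instead of switching the whole pattern with (?U). A small sketch reusing the sample string from the post above; the variable names are made up, and the square brackets in the output are only there to make the trailing space visible.

```autoit
Local $sText = "site1: http://www.somesite.com/gimme.php site2: http://www.thesite.com/"

; Lazy: .+? takes as few characters as possible,
; so the match ends at the first whitespace after http://.
Local $aLazy = StringRegExp($sText, "http://.+?\h", 3)
ConsoleWrite("[" & $aLazy[0] & "]" & @CRLF) ; [http://www.somesite.com/gimme.php ]

; Greedy: .+ takes as many characters as possible,
; so the match runs on to the last whitespace in the string.
Local $aGreedy = StringRegExp($sText, "http://.+\h", 3)
ConsoleWrite("[" & $aGreedy[0] & "]" & @CRLF) ; [http://www.somesite.com/gimme.php site2: ]
```

Note that neither pattern returns the second link on its own: nothing follows it, so the required \h never matches there.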
Zepx (author), posted May 24, 2008:
I see. Thanks for the clear description, Prophet.
Zepx (author), posted May 24, 2008 (edited May 24, 2008):
Hm... OK guys, these things work just like _StringBetween, but the thing is:

1. If the link does not end with a space, an end of line, or whatever, and it sits on the last line, how can I pick it up? E.g.

1st code:
    First line......
    Second line....
    something here http://thisislink.com

I just noticed that both the split and the StringRegExp approaches work between two points; if these points don't exist, they just take the whole sentence. This would happen when picking up the link from the last line:

2nd code:
    First line.....
    Second line...
    something here http://thisislink.com aaa

If I ignored the second point of Valuater's split, which is the space, and Prophet's second point, which is (http://.*rar|html|htm)(.*), I would get this result from my 2nd code:
    http://thisislink.com aaa

I've tried thinking deeper, but with no outcome. All I could think of was to add another line that checks whether or not the link ends with a second point.
If there is no second point -> I'll just use StringRegExp($link, "http://.*asp|html|php|", 3) to the end.
If a second point exists -> I can use the second method, StringRegExp($link, "(http://.*asp|html|php)(.*)", 3)

Any other ideas or methods to merge these two lines into one? Probably one that can check whether the link has an end.

---------------------------

2. Another thing: I just tested Prophet's pattern, and there seems to be a mistake in it. Since this is a link, I have no problem manually adding all the extensions I know, but that leads to my second problem:
    (http://.*php|html|htm)(.*)
Basically it works for the first file extension, but for the following ones it picks up only the "html" part.

I could solve it by adding extra brackets:
    ((http://.*)(php|html|htm))(.*)
However, that leads to another problem. If I remove the second point as a solution to my first problem, StringRegExp returns three pieces for the one link. E.g.
    $string = "http://thisisalink.php"
    $search = StringRegExp($string, "(?U)((http://.*)(php|html|htm))", 3)
    For $a = 0 To UBound($search) - 1
        ConsoleWrite($search[$a] & @CRLF)
    Next
ConsoleWrite will write out:
    http://thisisalink.php
    http://thisisalink.
    php

--------

Pretty long... but I did spend hours trying to work through my own logic; sorry, my logic is lacking. Thanks again in advance.
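One way to merge the two cases from the post above is to stop describing what comes after the link and instead match only the characters a link can contain. A minimal sketch, assuming a link never contains whitespace; the sample text and variable names are made up for illustration. The second pattern shows (?:...), a non-capturing group, which keeps the extension list without adding extra array elements.

```autoit
; Hypothetical sample text: the last link sits at the very end, with nothing after it.
Local $sText = "First line http://www.tester.com/page.php and more" & @CRLF & _
        "something here http://thisislink.com"

; \S+ matches a run of non-whitespace characters, so each match stops at a space,
; a line break, or the end of the string. No "second point" is needed.
Local $aLinks = StringRegExp($sText, "(?i)http://\S+", 3)
If @error Then
    ConsoleWrite("No links found" & @CRLF)
Else
    For $i = 0 To UBound($aLinks) - 1
        ConsoleWrite($aLinks[$i] & @CRLF)
    Next
EndIf

; If the extension list is still wanted, (?:...) groups the alternatives
; without capturing them, so flag 3 returns one array element per link.
Local $aPages = StringRegExp("http://thisisalink.php", "(?i)http://\S*?\.(?:php|html|htm)", 3)
If Not @error Then ConsoleWrite($aPages[0] & @CRLF)
```

The trade-off is that \S+ also swallows any punctuation that directly follows a link, such as a closing bracket or a full stop, so real-world text may need a stricter character class.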