Search this string


Hi,

I have a problem here. Say I have a very long sentence. My objective is to find every word in it that starts with http://, where the word may end with something I don't know in advance. Is there a way to pick such words out of a huge sentence? I've been thinking it over for a few days, but I can't find a good way to do it.

Please guide me in searching for these words.

Assuming:

1. The word might contain a space (links don't, but I would like some info on this case too)

2. The word might end with any extension (e.g. .php or .html)

3. The word might end with any domain (.com, .net, or just a slash)

Some guidelines would be appreciated.

Thanks in advance.

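With RegExp:

Local $SomeString = "This is a weird string -09 ;''l;  sd sdfp[sodf[p ohttp://www.somesite.com/gimme.phpsdlf;ks;dflks';ldfk"

; Flag 1 returns the captured groups as an array; $Site[1] is the second group (the link).
; Caveat: the alternation is not grouped, so the second group reads as "http.*php" OR "html" OR "htm" OR "asp".
$Site = StringRegExp($SomeString, '(?i)(.*)(http.*php|html|htm|asp)(.*)', 1)
MsgBox(0, "test", $Site[1])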
Edited by Prophet
+==================================================================+| The Definition of Madness: Creating a GUI, with GUI automation scripts |+==================================================================+

That appears almost limited, per the OP

What? I don't quite understand that sentence :)

OP is the original poster of this thread, and the regexp is limited by the types of "endings" it lists. As stated, the links "end with something I wouldn't know."

Hmm, it seems StringSplit could cover a wider range here. StringRegExp is really complicated.

I learned StringSplit() and like it. I have not tried to get into StringRegExp().

8)

Edited by Valuater


It depends on the string and whether there is any pattern in it. For a lot of things StringSplit is easier, but it has its limitations.

If you know the format of the string, for example that the address will be followed by whitespace or <a>, you could remove the php|html|etc. ending.


Obviously there is a "white space"; that's how I got it with StringSplit().

... anyway, siao got it with RegExp()

8)
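
For readers following along, the StringSplit() idea mentioned above might look roughly like the sketch below. It is not the code that was actually posted; the variable names and the sample string are assumptions made up for illustration. It splits the text on spaces and keeps every piece that begins with http://.

```autoit
; Sketch only: $sText and its contents are illustrative assumptions.
Local $sText = "site1: http://www.somesite.com/gimme.php site2: http://www.thesite.com/"

; Split on spaces; $aWords[0] holds the number of pieces.
Local $aWords = StringSplit($sText, " ")

For $i = 1 To $aWords[0]
    ; "http://" is 7 characters long, so compare the first 7 characters of each piece.
    If StringLeft($aWords[$i], 7) = "http://" Then
        ConsoleWrite($aWords[$i] & @CRLF)
    EndIf
Next
```

Because the last piece is simply whatever follows the final space, a link at the very end of the text is picked up as well. The sketch only treats spaces as separators, so links followed by tabs or line breaks would need those characters added to the delimiter string.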


Well, it wasn't that obvious to me :)


Assuming:

1. The word might contain a space (links don't, but I would like some info on this case too)

2. The word might end with any extension (e.g. .php or .html)

3. The word might end with any domain (.com, .net, or just a slash)

Some guidelines would be appreciated.

Thanks in advance.

it was right there

lol

8)


lol, well, apart from the fact that it's 3:18 here :), it says it might have spaces; it could come from HTML source, for instance.

Anyway :),

I just made a simple RegExp that will get an address out of any random data.


Normally a RegExp looks for the longest match in a string. If you invert the greediness of the quantifiers, it will look for the shortest match instead.

See the example script below.

Here the RegExp ends on horizontal whitespace (\h). If you invert the greediness with (?U), it will select everything from http:// up to the first whitespace. Without the inversion, it will select everything from the first http:// up to the last whitespace in the string, which in this example is the space right before the second http://.

Local $SomeString = "site1: http://www.somesite.com/gimme.php site2: http://www.thesite.com/"

; With inverted greediness: matches up to the first whitespace after http://
$Site = StringRegExp($SomeString, "(?U)http://.+\h", 3)
MsgBox(0, "test", $Site[0])

; Without inverted greediness: matches up to the last whitespace in the string
$Site = StringRegExp($SomeString, "http://.+\h", 3)
MsgBox(0, "test", $Site[0])
Edited by Prophet

Hm... OK guys, these things work just like _StringBetween, but the thing is:

1. If the link does not end with a space, a line break, or whatever, because it sits at the very end of the last line, how can I pick it up? E.g.

1st Code:

First line......
Second line....
something here http://thisislink.com

I just noticed that both the split and the StringRegExp approaches rely on the link sitting between two points. If those points don't exist, they would just take the whole sentence.

In that case, consider what happens when picking the link out of the last line here:

2nd Code:

First line.....
Second line...
something here http://thisislink.com aaa

If I dropped the 2nd point, which is the space in Valuater's split and the trailing (.*) in Prophet's pattern (http://.*rar|html|htm)(.*)

I would get my result from my 2nd code:

I've tried thinking deeper, but with no outcome. All I could think of was to add another line that checks whether the link ends with a 2nd point or not.

If it doesn't end with a 2nd point -> I'll just use StringRegExp($link, "http://.*asp|html|php|", 3) and match to the end.

If a 2nd point does exist -> I can use the 2nd method, or StringRegExp($link, "(http://.*asp|html|php)(.*)", 3)

Any other ideas or methods to merge these 2 lines into one? Probably one that can check whether the link has an end.
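
One possible way to cover both cases with a single pattern, offered as a sketch rather than a definitive answer: match http:// followed by a run of non-whitespace characters (\S+). The run stops at the first whitespace or at the end of the string, whichever comes first, so no second point is needed. The sample text below reuses the two examples from this post; the variable names are assumptions.

```autoit
; Sketch only: $sText combines the "1st Code" and "2nd Code" examples from the post.
Local $sText = "First line......" & @CRLF & "Second line...." & @CRLF & _
        "something here http://thisislink.com aaa" & @CRLF & _
        "something here http://thisislink.com"

; \S+ consumes non-whitespace characters, so it ends at a space,
; a line break, or the end of the string.
; Flag 3 returns every match in the string as an array.
Local $aLinks = StringRegExp($sText, "(?i)http://\S+", 3)

If @error Then
    ConsoleWrite("No link found" & @CRLF)
Else
    For $i = 0 To UBound($aLinks) - 1
        ConsoleWrite($aLinks[$i] & @CRLF)
    Next
EndIf
```

This does not handle assumption 1 (a link that itself contains a space), and any punctuation glued to the end of a link, such as a trailing comma, would be included in the match.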

---------------------------

2. Another thing: I just tested Prophet's pattern, and it seems there is a mistake in it. Of course, since this is a link, I have no problem manually adding all the extensions I know, but that leads to my 2nd problem, below.

(http://.*php|html|htm)(.*)

Basically, it works for the 1st file extension, but for the ones after it, it picks up only the "html" itself. I could solve that by adding extra brackets:

((http://.*)(php|html|htm))(.*)

However, that leads to another problem: if I remove the 2nd point as a solution to my 1st problem, StringRegExp picks up 3 copies of the link.

E.g

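$string = "http://thisisalink.php"

; Three capturing groups: the whole link, the part before the extension, and the extension.
; With flag 3, each group becomes one element of the returned array.
$search = StringRegExp($string, "(?U)((http://.*)(php|html|htm))", 3)
For $a = 0 To UBound($search) - 1
    ConsoleWrite($search[$a] & @CRLF)
Next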

ConsoleWrite will write out:

http://thisisalink.php
http://thisisalink.
php
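
A possible way around the extra copies, sketched under the assumption that only the whole link is wanted: every pair of plain brackets is a capturing group, and flag 3 returns one array element per group. Writing the alternation as a non-capturing group, (?:...), keeps the grouping without producing extra elements, so the array holds only the full match.

```autoit
; Sketch only: same sample string as in the post.
Local $sString = "http://thisisalink.php"

; (?:php|html|htm) groups the extensions without capturing them.
; With no capturing groups in the pattern, flag 3 returns the full match only.
Local $aSearch = StringRegExp($sString, "(?U)http://.*(?:php|html|htm)", 3)

If Not @error Then
    For $a = 0 To UBound($aSearch) - 1
        ConsoleWrite($aSearch[$a] & @CRLF)
    Next
EndIf
```

One caveat: because (?U) makes .* stop as early as possible, the match ends at the first "php", "html", or "htm" it meets, even when that text sits in the middle of a longer link.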

--------

Pretty long... but I did spend hours trying to work through my own logic. Sorry, my logic is lacking.

Thanks again in advance.

Edited by Zepx