Chimp

how to get a number located after a name from within a string?

#1 ·  Posted (edited)

How can I extract the number that follows a given word and an = sign, when they can appear at any position within a string?

Also, between the word, the = sign and the number there may be no spaces or any number of spaces, and the number may be bare or enclosed in double quotes (" ") or single quotes (' ').

For example, I need the number after ROWSPAN in lines like these:

'<TD ROWSPAN=3 BGCOLOR="#99CCFF">Sales</TD>'
or
 <td rowspan=   " 2 ">
or
<td style="width:400px;" rowspan   =    '5   ' ...</td>

Thanks for any suggestions.

Edited by Chimp

small minds discuss people average minds discuss events great minds discuss ideas.... and use AutoIt....


#2 ·  Posted (edited)

Chimp,

This seems to do the trick: :)

#include <Array.au3>

$sText = '<TD ROWSPAN=3 BGCOLOR="#99CCFF">Sales</TD>' & @CRLF & _
'<td rowspan=   " 2 ">' & @CRLF & _
'<td style="width:400px;" colspan   =    "5   " ...</td>'

$aExtract = StringRegExp($sText, '(?i)span\s*=\s*"?\s?(\d+)', 3)

_ArrayDisplay($aExtract, "", Default, 8)
M23

Edit: SRE decode (sorry I did not provide one last night):

(?i)    - Case insensitive (because we have ~SPAN and ~span)

span    - Look for the word "span",
\s*     - possibly followed by any number of spaces,
=       - but certainly by "=".
\s*     - Then there might be some more spaces,
"?      - and even a '"',
\s?     - with another possible space.
(\d+)   - Finally, capture the digits that come along!

3       - Return an array of every match found (here, just the captured digits)
For me a Regex is indispensable here, as it allows you to get around the fact that the number of spaces (and even their very existence) is completely variable. ;)

Edited by Melba23
Added decode
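A quick way to probe the decode is to feed the pattern a few lines it was not written for. The sketch below uses made-up test lines (assumptions for illustration only) and prints what the pattern captures from each. Because \s? allows at most one space after the quote, and "? only covers a double quote, the last two test lines are expected to produce no match:

```autoit
#include <StringConstants.au3>

; Melba23's pattern from the post above
Local $sPattern = '(?i)span\s*=\s*"?\s?(\d+)'

; Made-up test lines: bare number, double quotes with one space,
; double quotes with two spaces, single quotes
Local $aTests[4] = ['<td rowspan=3>', '<td rowspan=" 2 ">', '<td rowspan="  7">', "<td rowspan='5'>"]
Local $aMatch

For $i = 0 To UBound($aTests) - 1
    ; Flag 1 ($STR_REGEXPARRAYMATCH): groups of the first match; @error = 1 if none
    $aMatch = StringRegExp($aTests[$i], $sPattern, $STR_REGEXPARRAYMATCH)
    If @error Then
        ConsoleWrite($aTests[$i] & "  ->  no match" & @CRLF)
    Else
        ConsoleWrite($aTests[$i] & "  ->  " & $aMatch[0] & @CRLF)
    EndIf
Next
```

SadBunny's refinement later in the thread ([""']? and \s* in place of "? and \s?) covers both of those cases.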

Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind.
_______
My UDFs:


ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ----------Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area

 


#3 ·  Posted (edited)


Thanks Melba23

It's nearly what I need!

I wrote colspan on one line by mistake; the word is always exactly rowspan. Also, I need to parse one line at a time, not multiline text, so how can the regexp be adapted to parse just one line at a time and match only that exact word (not *span)?

thanks

edit:

Also, the number could be unquoted, or enclosed in single quotes or double quotes.

thanks

Edited by Chimp


#4 ·  Posted (edited)

That code should work on a single line or on multiline text.

#include <Array.au3>

$sText = '<TD ROWSPAN=3 BGCOLOR="#99CCFF">Sales</TD>' & @CRLF & _
'<td rowspan=   " 2 ">' & @CRLF & _
'<td style="width:400px;" colspan   =    "5   " ...</td>'

$aExtract = StringRegExp($sText, '(?i)rowspan\s*=\s*"?\s?(\d+)', 3)

_ArrayDisplay($aExtract, "", Default, 8)
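If you do want to handle one line at a time (for example, to know which line each value came from), one option is to split the text into lines first and apply the pattern to each line with flag 1 ($STR_REGEXPARRAYMATCH), which returns only the first match on that line. This is a minimal sketch, assuming lines are separated by @CRLF and that only the first rowspan on each line matters. Two changes to the pattern are additions for illustration, not part of the post above: the quote class is widened to ["'] so single quotes are accepted too, and \b makes rowspan match only as a whole word.

```autoit
#include <StringConstants.au3>

Local $sText = '<TD ROWSPAN=3 BGCOLOR="#99CCFF">Sales</TD>' & @CRLF & _
        '<td rowspan=   " 2 ">' & @CRLF & _
        '<td style="width:400px;" rowspan   =    ''5   '' ...</td>'

; Split on the whole @CRLF sequence; element [0] holds the line count
Local $aLines = StringSplit($sText, @CRLF, $STR_ENTIRESPLIT)
Local $aMatch

For $i = 1 To $aLines[0]
    ; \b: "rowspan" must start at a word boundary (so "xrowspan" is ignored)
    $aMatch = StringRegExp($aLines[$i], "(?i)\browspan\s*=\s*[""']?\s*(\d+)", $STR_REGEXPARRAYMATCH)
    If @error Then
        ConsoleWrite("Line " & $i & ": no rowspan found" & @CRLF)
    Else
        ConsoleWrite("Line " & $i & ": rowspan = " & $aMatch[0] & @CRLF)
    EndIf
Next
```

On the three sample lines this should report 3, 2 and 5.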
Edited by JohnOne

AutoIt Absolute Beginners    Require a serial    Pause Script    Video Tutorials by Morthawt   ipify 

Monkey's are, like, natures humans.


Thanks JohnOne

.... is just writing exactly "rowspan" in the pattern enough?
What does (?i) stand for?



 ...... I'm terrible at regex. .....

 

me too

I think I should decide to study RegExp one of these days.... (regexp patterns scare me)



#9 ·  Posted (edited)

Yes, "(?i)" indicates case-insensitive mode. From the StringRegExp manual:

 

Caseless: matching becomes case-insensitive from that point on. By default, matching is case-sensitive. When UCP is enabled casing applies to the entire Unicode plane 0, else applies by default to ASCII letters A-Z and a-z only.
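A small sketch of what that means in practice (the test strings are made up): without (?i) the pattern has to match the letter case exactly, and when (?i) appears in the middle of a pattern it only applies from that point on.

```autoit
#include <StringConstants.au3>

Local $sLine = '<TD ROWSPAN=3 BGCOLOR="#99CCFF">Sales</TD>'

; Flag 0 ($STR_REGEXPMATCH) returns 1 for a match and 0 for no match
ConsoleWrite(StringRegExp($sLine, 'rowspan=(\d+)', $STR_REGEXPMATCH) & @CRLF)      ; 0: case must match exactly
ConsoleWrite(StringRegExp($sLine, '(?i)rowspan=(\d+)', $STR_REGEXPMATCH) & @CRLF)  ; 1: whole pattern is caseless

; (?i) placed after "ROW": "ROW" stays case-sensitive, "span" becomes caseless
ConsoleWrite(StringRegExp('ROWspan=3', 'ROW(?i)span=(\d+)', $STR_REGEXPMATCH) & @CRLF) ; 1
ConsoleWrite(StringRegExp('rowspan=3', 'ROW(?i)span=(\d+)', $STR_REGEXPMATCH) & @CRLF) ; 0
```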

 

Regex is soooo sweet  :huggles:  I used to have this instinctive fear of it as well, but once I started using it daily it really took off. It's unbelievably powerful, especially when used with scripting languages like awk/sed/perl on linux machines. I now need and use it pretty much daily for work and have used it countless times in private life as well to perform all kinds of data mining tasks. Getting over that small difficulty spike at the beginning is well worth it :)

By the way, just install the nifty little tool called "Expresso" (I thought it was freeware, but apparently not; my company just has a license), and you can toy around with your data set and the regex JohnOne/Melba provided, see a breakdown of what every item in the pattern actually does, and get immediate results. Great for tweaking those longer regex patterns. (There are many other tools like that, but IMHO Expresso is just unbeatable.)

Also, I suggest getting a pillow cover printed with this regex cheatsheet. I once ordered a mousemat with it and it has served me well.

Edited by SadBunny

Roses are FF0000, violets are 0000FF... All my base are belong to you.


Of course, you could also have just done a StringSplit on rowspan= followed by a StringSplit on spaces to get the number.

Obviously for those who struggle with RegExp ... or don't want to expend the brainpower necessary ... like me.


AutoIt.4.Life Clubrooms - Life is like a Donut (secret key)

Make sure brain is in gear before opening mouth!
Remember, what is not said, can be just as important as what is said.


What is the Secret Key? Life is like a Donut

If I put effort into communication, I expect you to read properly & fully, or just not comment.
Ignoring those who try to divert conversation with irrelevancies.
If I'm intent on insulting you or being rude, I will be obvious, not ambiguous about it.
I'm only big and bad, to those who have an over-active imagination.

I may have the Artistic Liesense ;) to disagree with you. TheSaint's Toolbox (be advised many downloads are not working due to ISP screwup with my storage)


Here, a quick test in Expresso. I also refined your pattern somewhat, and added a line to your test set with multiple rowspans on the same line. I don't know what you want to do if you encounter those, or whether you ever would, but still, it came to mind.

One thing you should realize is why this actually gets the numbers: it's because the \d+, i.e. the one or more digits you are looking for, is (inside parentheses). That's a "capture group". StringRegExp with mode 3 ($STR_REGEXPARRAYGLOBALMATCH) returns an array of the substrings matched by that capture group.
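To see the effect of the capture group, the same pattern can be run with and without the parentheses around \d+. A short sketch on two of the sample lines: with the group, only the captured digits come back; without any group, StringRegExp returns the whole matched text instead.

```autoit
#include <StringConstants.au3>

Local $s = '<TD ROWSPAN=3 BGCOLOR="#99CCFF">Sales</TD>' & @CRLF & '<td rowspan=   " 2 ">'

; With (\d+): each element is just the captured digits
Local $aDigits = StringRegExp($s, "(?i)rowspan\s*=\s*[""']?\s*(\d+)", $STR_REGEXPARRAYGLOBALMATCH)
ConsoleWrite($aDigits[0] & " | " & $aDigits[1] & @CRLF)  ; 3 | 2

; Without a group: each element is the full match, from "rowspan" to the last digit
Local $aWhole = StringRegExp($s, "(?i)rowspan\s*=\s*[""']?\s*\d+", $STR_REGEXPARRAYGLOBALMATCH)
ConsoleWrite($aWhole[0] & " | " & $aWhole[1] & @CRLF)    ; ROWSPAN=3 | rowspan=   " 2
```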

One problem with regex is that default behaviour varies quite a bit between regex engines, so if it matters, always test as many scenarios as possible.

[Screenshot: the pattern tested against the sample lines in Expresso]

In AU3:

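#include <Array.au3>

; Test set: the sample lines plus one line with two rowspans
$s = '<TD ROWSPAN=3 BGCOLOR="#99CCFF">Sales</TD>' & @CRLF
$s = $s & '<td rowspan=   " 2 ">' & @CRLF
$s = $s & '<td style="width:400px;" rowspan   =    '' 5 '' ...</td>' & @CRLF
$s = $s & '<td rowspan = '' 6''</td><td rowspan = '' 7''</td>'

; Inside the double-quoted AutoIt string, "" is one literal ", so [""'] reaches the regex engine as ["']
; Flag 3 returns every captured number: 3, 2, 5, 6, 7
_ArrayDisplay(StringRegExp($s, "(?i)rowspan\s*=\s*[""']?\s*(\d+)", 3))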


Note, just to be complete in case you didn't already know: when including " in a "string", or a ' in a 'string', like in this pattern and in this example string, you need to double the quote to "escape" it and not break the string, otherwise you'll get syntax errors. So:

$s = "This string ""contains"" double doublequotes."
$s = 'This string ''contains'' double singlequotes.'

Hope this helps a bit in your understanding.



Of course, you could have also just done a StringSplit on rowspan= followed by a StringSplit on spaces, to get the number.

Obviously for those who struggle with RegExp ... or don't want to expend the brainpower necessary ... like me.

 

When trying to get specific substrings from unpredictable input it's often well worth the trouble. Before you know it you're spending much more brainpower and coding time on things like "how to stringsplit if there's maybe a single or double quote between the rowspan= and the number".

Furthermore, I have literally built regex patterns in my dreams. Choosing regex is not a matter of calculating whether it's worth expending the brainpower; it's a way of life  :thumbsup:



Well, I've survived quite well up until now, rarely using it.

A simple replace for quotes does the trick, and you can also include a StringIsDigit if you want.

Barely any thought in doing any of that.

But hey, if others want to use RegExp, go right ahead ... I'll even sit back and admire how clever you are, while still rarely bothering myself. In fact, I see it as almost a challenge, to not use RegExp these days, as so many seem so proficient at it. o:)

For those who struggle though, especially newbies to the finer art of programming, it always pays to give them a simple alternative too. Let them pick the one they are most comfortable with, especially if there is a need to adapt.
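For comparison, the non-regex route described above (StringSplit on rowspan=, a replace for the quotes, and a StringIsDigit check) might look like the sketch below. It is only one interpretation of that description: the helper name _GetRowspanNoRegExp is hypothetical, it lower-cases the line first so ROWSPAN is found too, and it squeezes out the spaces in front of = so that rowspan   = still splits. It only looks at the first rowspan= on a line, and a value followed directly by > (as in rowspan=3>) fails the StringIsDigit check.

```autoit
#include <StringConstants.au3>

; Hypothetical helper: returns the rowspan value of $sLine as a string,
; or "" (with @error = 1) if no plain number follows "rowspan=".
Func _GetRowspanNoRegExp($sLine)
    ; Lower-case first so ROWSPAN and rowspan are treated alike
    Local $sWork = StringLower($sLine)

    ; Replace both kinds of quote with spaces
    $sWork = StringReplace($sWork, '"', " ")
    $sWork = StringReplace($sWork, "'", " ")

    ; Remove spaces in front of "=" so "rowspan   =" becomes "rowspan="
    While StringInStr($sWork, " =")
        $sWork = StringReplace($sWork, " =", "=")
    WEnd

    ; Split on the whole text "rowspan="; element [2] is what follows it
    Local $aParts = StringSplit($sWork, "rowspan=", $STR_ENTIRESPLIT)
    If $aParts[0] < 2 Then Return SetError(1, 0, "")

    ; Drop leading spaces, split on spaces and keep the first token
    Local $aTokens = StringSplit(StringStripWS($aParts[2], $STR_STRIPLEADING), " ")
    If Not StringIsDigit($aTokens[1]) Then Return SetError(1, 0, "")
    Return $aTokens[1]
EndFunc

ConsoleWrite(_GetRowspanNoRegExp('<TD ROWSPAN=3 BGCOLOR="#99CCFF">Sales</TD>') & @CRLF) ; 3
ConsoleWrite(_GetRowspanNoRegExp('<td rowspan=   " 2 ">') & @CRLF)                      ; 2
ConsoleWrite(_GetRowspanNoRegExp('<td style="width:400px;" rowspan   =    ''5   '' ...</td>') & @CRLF) ; 5
```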




Obviously. Of course, to each his own  :huggles:

But I just can't imagine life without regex any more. A large part of my job, the part where I parse and process customer input (inherently unpredictable, weird character sets, possible injection attempts, accidental copy-pastes of the entire King James Bible, etc.), would be literally impossible without it  :sweating:



I found this the other day:

http://regex.inginf.units.it/#

It automatically creates regular expression patterns. It won't work every time, but it might help.


Lots of ways to create a working expression:

"(?is)(?:row|col)span\h*=\h*(?:'|"")?\h*(\d+)"

(?is) = (?i) makes the match case-insensitive; (?s) lets . match newline characters as well (HTML is famous for not working as expected without it). This pattern contains no ., so only the i part matters here

(?:row|col) = non-capturing group that accepts either rowspan or colspan

\h* = match any horizontal whitespace (spaces, tabs) if it exists

(?:'|")? = non-capturing group looking for a single or double quote; the "?" after it basically says... may or may not exist

(\d+) = capture group (give me my digits please)
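Since (?:row|col) does not capture, the returned array cannot tell you which attribute each number came from. A possible variation (an assumption, not part of the pattern above) turns the alternation into a capturing group; with flag 3, StringRegExp then returns the groups of each match in sequence, so the results come back as name/value pairs. The test line is made up:

```autoit
#include <StringConstants.au3>

Local $sLine = '<td colspan= ''4'' rowspan="2">'

; Two capture groups per match: (row|col) and (\d+)
Local $aPairs = StringRegExp($sLine, '(?i)(row|col)span\h*=\h*(?:''|")?\h*(\d+)', $STR_REGEXPARRAYGLOBALMATCH)

; Results arrive as name, value, name, value, ...
For $i = 0 To UBound($aPairs) - 1 Step 2
    ConsoleWrite($aPairs[$i] & "span = " & $aPairs[$i + 1] & @CRLF)  ; colspan = 4, then rowspan = 2
Next
```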

....

Btw, didn't I provide a pattern that did something like this in the _htmlraw_* UDF functions I wrote for you?


Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.


What about negated character types?

$word = "rowspan"
$res = StringRegExp($s, '(?i)\Q' & $word & '\E\s*=\D*(\d+)', 3)

:)


#18 ·  Posted (edited)

@Melba23
Thanks again for your help;
the "SRE decode" was also very instructive.

Thanks

@SadBunny
I think your regexp is exactly what I was looking for:
"(?i)rowspan\s*=\s*[""']?\s*(\d+)"
It also catches numbers that are double-quoted or single-quoted.
I also appreciated the bonus illustrated explanation... :)
Thanks

@SmOke_N
the listing in that post you provided was the first place I searched,
but the RegExp contained there, "(?is)<\s*(?:td|th)\h+rowspan=(?:\x22|\x27)(\d+)(?:\x22|\x27)\s*>", is a bit complicated for my knowledge, and maybe not general-purpose?
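One way to see how general that pattern is: run it against the sample lines. As written it expects rowspan to be the first attribute of a td or th tag, no spaces around =, the number wrapped tightly in quotes, and nothing else before the closing >. A quick check, where the fourth line is a made-up example that fits those rules:

```autoit
#include <StringConstants.au3>

; The _htmlraw_* pattern quoted above
Local $sStrict = '(?is)<\s*(?:td|th)\h+rowspan=(?:\x22|\x27)(\d+)(?:\x22|\x27)\s*>'

; The three lines from the first post, plus one "tidy" tag
Local $aTests[4] = ['<TD ROWSPAN=3 BGCOLOR="#99CCFF">Sales</TD>', _
        '<td rowspan=   " 2 ">', _
        '<td style="width:400px;" rowspan   =    ''5   '' ...</td>', _
        '<td rowspan="4">']

For $i = 0 To UBound($aTests) - 1
    ; $STR_REGEXPMATCH (0) returns 1 for a match, 0 otherwise
    ConsoleWrite(StringRegExp($aTests[$i], $sStrict, $STR_REGEXPMATCH) & "  " & $aTests[$i] & @CRLF)
Next
```

Only the last line matches (1); the three real-world samples all return 0, which is why a looser pattern fits this question better.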

(P.S. I'm still working on that table extraction function and I'm close to the result I want. I'll post there shortly, spare time allowing.)
Thanks

What about negated types ?

$word = "rowspan"
$res = StringRegExp($s, '(?i)\Q' & $word & '\E\s*=\D*(\d+)', 3)

:)

 

.... emmm :ermm: .... maybe yes :unsure:  thanks :)

@JohnOne @TheSaint @oapjr
Thanks for your contributions, much appreciated.

Edited by Chimp


.... emmm :ermm: .... maybe yes :unsure:  thanks :)

 

Oh sorry for the lack of comments, my f* laziness...  :)

You want to get digits, so you can use \D* (0 or more non-digit chars) after the "=" to match the spaces and quotes.
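Putting the negated-type pattern to work: \D* after the = swallows spaces and either kind of quote in one go, and \Q...\E makes the regex engine treat whatever is in $word as literal text. One side effect worth knowing about (shown with made-up test lines): \D* matches any non-digit, including other attributes and line breaks, so if the rowspan value is not numeric it can run on and pick up a later number.

```autoit
#include <StringConstants.au3>

Local $word = "rowspan"
; \Q...\E: $word is matched literally; \D*: skip any non-digits after "="
Local $sPattern = '(?i)\Q' & $word & '\E\s*=\D*(\d+)'

Local $aGood = StringRegExp("<td rowspan =  ' 5'>", $sPattern, $STR_REGEXPARRAYMATCH)
ConsoleWrite($aGood[0] & @CRLF)   ; 5

; Side effect: with a non-numeric rowspan, \D* runs on to the colspan value
Local $aOops = StringRegExp('<td rowspan="abc" colspan=4>', $sPattern, $STR_REGEXPARRAYMATCH)
ConsoleWrite($aOops[0] & @CRLF)   ; 4
```

If that matters for your input, a tighter class such as [\s"']* in place of \D* would stop at the first character that is not a space or quote.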
