Kyan

Regex / SRE conditional


#1 ·  Posted (edited)

Hi, I've never needed an "if" condition in SRE before. How can I discard the users that have no value from a list and capture only the ones that have one?

Example:

---

Alice has $200
Bob has $10
John has null

----

I tried this:

#include <Array.au3>
$tex='Alice has $200'&@CRLF&'Bob has $10'&@CRLF&'John has null'
$x=StringRegExp($tex,'(?i)^(.+?)\hhas\h(?:(?=:null)|(.+?))$',3)
_ArrayDisplay($x)
Exit

But it doesn't work.

Is this even possible to do with regex?

I saw that conditional SRE patterns exist on these websites:

http://www.regular-expressions.info/conditional.html

http://www.rexegg.com/regex-conditionals.html

Edited by Kyan

Heroes, there is no such thing

One day I'll discover what's so special about IE.au3 that so many users use it.
C'mon, there's InetRead and WinHTTP, way better.




It seems you don't need a conditional pattern for this.

Just use (?m) and capture the values only from lines that contain a price (\$\d+):

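#include <Array.au3>

$sString = "Alice has $200" & @CRLF & _
           "Bob has $10" & @CRLF & _
           "John has null"

; (?m): ^ matches at the start of every line; "John has null" has no "$" + digits, so it never matches
$aValues = StringRegExp($sString, "(?im)^([a-z-]+).*\$(\d+)", 3)
If @error Then Exit ; no line matched

; Flag 3 returns a flat array (name, amount, name, amount, ...): fold it into 2 columns
Local $aResult[UBound($aValues) / 2][2]
For $i = 0 To UBound($aValues) - 1 Step 2
    $aResult[$i / 2][0] = $aValues[$i]
    $aResult[$i / 2][1] = $aValues[$i + 1]
Next

_ArrayDisplay($aResult)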
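For comparison, a single pattern can also skip the null lines without any conditional, by using a negative lookahead where the first attempt had `(?=:null)`. This is a minimal sketch, assuming each line has the form "<name> has <value>" with no spaces inside the value; `$sText` and `$aFlat` are just illustrative names:

```autoit
Local $sText = 'Alice has $200' & @CRLF & 'Bob has $10' & @CRLF & 'John has null'

; (?m)       : ^ matches at the start of every line
; (?!null\b) : negative lookahead, refuse the line when the value is the word "null"
; (\S+)      : the value is assumed to contain no spaces; \S never matches @CR or @LF
Local $aFlat = StringRegExp($sText, '(?m)^(\S+)\h+has\h+(?!null\b)(\S+)', 3)
If @error Then Exit 1 ; no line matched

; Flag 3 returns name, value, name, value, ...
For $i = 0 To UBound($aFlat) - 1 Step 2
    ConsoleWrite($aFlat[$i] & ' -> ' & $aFlat[$i + 1] & @CRLF)
Next
```

The lookahead only peeks at the value without consuming it, so the same `(\S+)` can still capture it on the lines that pass.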

@jguinch, yeah, that's a good idea :)

@Malkey, can you tell me why, when I put ".+" inside a group, it splits things up? If it only works that way, does that mean I need to use non-capturing groups for everything else that matches something?
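The splitting comes from how flag 3 works: `StringRegExp` returns one array element per capturing group for every match, so wrapping `.+` in parentheses adds an element to each match, while a non-capturing group `(?:...)` groups without adding one. A minimal sketch on shortened sample data (the variable names are illustrative):

```autoit
Local $sText = 'Alice has $200' & @CRLF & 'Bob has $10'

; Two capturing groups: each match contributes 2 elements (name, amount)
Local $aTwo = StringRegExp($sText, '(?m)^(\w+) has \$(\d+)', 3)
; Same pattern with a non-capturing group around "has": still 2 elements per match
Local $aNonCap = StringRegExp($sText, '(?m)^(\w+) (?:has) \$(\d+)', 3)
; Capturing "has" as well: each match now contributes 3 elements
Local $aThree = StringRegExp($sText, '(?m)^(\w+) (has) \$(\d+)', 3)

ConsoleWrite(UBound($aTwo) & ' ' & UBound($aNonCap) & ' ' & UBound($aThree) & @CRLF)
```

So with flag 3, groups that exist only for structure (alternation, quantifiers) are best written as `(?:...)`, so that they don't add elements to the returned array.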



#5 ·  Posted (edited)

If your example data is representative, and if you want the full lines returned, it seems like you can just grab anything that doesn't end in an 'l' (lowercase L):

$x = StringRegExp($tex, "(?m)^.*[^l]$", 3)
_ArrayDisplay($x)
Edited by SadBunny

Roses are FF0000, violets are 0000FF... All my base are belong to you.


@SadBunny, with "(?m).+\hhas\h.+(?!null)$" the idea is that you get all of those values except the null ones; the real case doesn't allow a workaround like that.

I can post the real case; I guess it can be posted here.

text: <a href="#" class="ValuesLst"...

I want to exclude all the links with "#" in the regex itself (excluding post Do Loop), so how can I capture all the links except the ones with "#"?


I can post the real case, guess it can be posted here

 

You should. When dealing with regex, the description of the input text, the requirements and the expected results must be as precise as possible.


What does "(excluding post Do Loop)" mean?



"I like pigs.  Dogs look up to us.  Cats look down on us.  Pigs treat us as equals."

- Sir Winston Churchill


And how is "John has null" related to the <a href="#" links? :)


@kylomas, it's like saying "post work". I wrote it without checking whether it means the same in English, but Google Translate says so. I meant "without needing a Do loop after running the regex", like this:

$x = StringRegExp(....)
Local $i = 0
Do
    If $x[$i] = 'null' Then
        _ArrayDelete($x, $i) ; don't advance: the next element has shifted into index $i
    Else
        $i += 1
    EndIf
Until $i > UBound($x) - 1

@mikel, I explained it in comment #6: I'm using '<a href="(.+?)" class="ValuesLst"' but I don't want to capture the "#" links :) The idea is the same: you want to capture everything except that one case.



#11 ·  Posted (edited)

So which part of the link do you want captured exactly? Please give a couple of examples of actual input and the actual output you desire from those lines. Do you want the entire line? Do you just want the href target? Also, what is it exactly you want to skip? Any line containing anchors exactly equal to href="#"? Or any <a href...>...</a> construct containing that? (If there's more than one per line...)

Finally: in the beginning of this thread we didn't know you were parsing HTML. Regex is not a good tool to parse HTML. Read this stackoverflow answer for a very eloquent and linguistically dexterous explanation of that fact. Unless you know exactly what your HTML is going to look like and that it's going to be valid, you will run into problems by doing this.

Edited by SadBunny


#12 ·  Posted (edited)

All the .ValuesLst links except the ones with "#":
 

#include <Array.au3>
$b64HTML='CTx0ZD48YSBocmVmPSJodHRwOi8vaW1ndXIuY29tL2tpWWFvdzEiIHRhcmdldD0iX2JsYW5rIiBjbGFzcz0iVmFsdWVzTHN0Ij48aW1' & _
    'nIHNyYz0iL2ltYWdlcy9MaXN0SWNvbi5wbmciIGJvcmRlcj0nMCcgLz48L2E+PC90ZD48dGQ+PGEgY2xhc3M9InRvb2x0aXAiIHRpdGxlP' & _
    'SJTdGF0dXMiIGhyZWY9IiMiPjwvYT48L3RkPgkNCgkgPC90cj48dHI+DQoJCTx0ZD48YSBocmVmPSIjIiB0YXJnZXQ9Il9ibGFuayIgY2x' & _
    'hc3M9IlZhbHVlc0xzdCI+PGltZyBzcmM9Ii9pbWFnZXMvTGlzdEljb24ucG5nIiBib3JkZXI9JzAnIC8+PC9hPjwvdGQ+PHRkPjxhIGNsY' & _
    'XNzPSJ0b29sdGlwIiB0aXRsZT0iU3RhdHVzIiBocmVmPSIjIj48L2E+PC90ZD4JDQoJIDwvdHI+PHRyICBjbGFzcz0ib2RkIj4NCgkJPHR' & _
    'kPjxpbWcgc3JjPSIvaW1hZ2VzL25hbi5naWYiIC8+PC90ZD48dGQ+PGEgY2xhc3M9InRvb2x0aXAiIHRpdGxlPSJTdGF0dXMiIGhyZWY9I' & _
    'iMiPjxpbWcgc3JjPSIvaW1hZ2VzL3VuYXZhaWxhYmxlLnBuZyIgLz48L2E+PC90ZD4JDQoJIDwvdHI+PHRyPg0KCQk8dGQ+PGEgaHJlZj0' & _
    'iL2dyYXBocy8zOTI3MzczLnBuZyIgdGFyZ2V0PSJfYmxhbmsiPjxpbWcgc3JjPSIvaW1hZ2VzL0xpc3RJY29uLnBuZyIgYm9yZGVyPScwJ' & _
    'yAvPjwvYT48L3RkPg0KCSA8L3RyPjx0ciAgY2xhc3M9Im9kZCI+DQoJCTx0ZD48YSBocmVmPSIjIiB0YXJnZXQ9Il9ibGFuayIgY2xhc3M' & _
    '9IlZhbHVlc0xzdCI+PGltZyBzcmM9Ii9pbWFnZXMvTGlzdEljb24ucG5nIiBib3JkZXI9JzAnIC8+PC9hPjwvdGQ+DQoJIDwvdHI+PHRyP' & _
    'gkNCgkJPHRkPjxhIGhyZWY9Ii80MkNCODlBQjA0QUI4MzRCIiB0YXJnZXQ9Il9ibGFuayIgY2xhc3M9IlZhbHVlc0xzdCI+PGltZyBzcmM' & _
    '9Ii9pbWFnZXMvTGlzdEljb24ucG5nIiBib3JkZXI9JzAnIC8+PC9hPjwvdGQ+'
$sHTML = BinaryToString(_Base64Decode($b64HTML))
$rex=StringRegExp($sHTML,'(?im)href="(.+?)"[^>]+?target="[^"]+?"[^>]+?class="ValuesLst"',3)
_ArrayDisplay($rex)
Exit

Func _Base64Decode($input_string) ; by trancexx
    Local $struct = DllStructCreate('int')
    Local $a_Call = DllCall('Crypt32.dll', 'int', 'CryptStringToBinary', 'str', $input_string, 'int', 0, 'int', 1, 'ptr', 0, 'ptr', DllStructGetPtr($struct, 1), 'ptr', 0, 'ptr', 0)
    If @error Or Not $a_Call[0] Then Return SetError(1, 0, '')
    Local $a = DllStructCreate('byte[' & DllStructGetData($struct, 1) & ']')
    $a_Call = DllCall('Crypt32.dll', 'int', 'CryptStringToBinary', 'str', $input_string, 'int', 0, 'int', 1, 'ptr', DllStructGetPtr($a), 'ptr', DllStructGetPtr($struct, 1), 'ptr', 0, 'ptr', 0)
    If @error Or Not $a_Call[0] Then Return SetError(2, 0, '')
    Return DllStructGetData($a, 1)
EndFunc   ;==>_Base64Decode

output:

Row|Col 0
[0]|http://imgur.com/kiYaow1
[1]|#
[2]|#
[3]|/42CB89AB04AB834B

I don't want to capture [1] and [2] :/

EDIT: I thought this would work: '(?im)href="(?<!:#)(.*?)"[^>]+?target="[^"]+?"[^>]+class="ValuesLst"', but I don't know how to do it :|
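The attempt above cannot work as written for two reasons: `(?<!:#)` is a negative lookbehind for the two characters ":#" (the colon looks like a typo), and a lookbehind checks the text *before* the current position, which right after `href="` is the quote itself, so it never sees the "#". To reject a value that starts with "#", the check has to look *ahead*. A minimal sketch on a hypothetical two-link sample:

```autoit
Local $sHTML = '<a href="http://imgur.com/kiYaow1" target="_blank" class="ValuesLst">' & @CRLF & _
        '<a href="#" target="_blank" class="ValuesLst">'

; Lookbehind: checks the text BEFORE the position right after href=" (the quote), so "#" is never seen
Local $aBehind = StringRegExp($sHTML, '(?i)href="(?<!#)([^"]*)"', 3)
; Lookahead: checks the text AFTER that position, i.e. the first character of the value
Local $aAhead = StringRegExp($sHTML, '(?i)href="(?!#)([^"]*)"', 3)

ConsoleWrite('Lookbehind: ' & UBound($aBehind) & ' capture(s), lookahead: ' & UBound($aAhead) & ' capture(s)' & @CRLF)
```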

Edited by Kyan


Try this RE pattern

'(?i)href="([^#"]+).+?class="ValuesLst"'
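This pattern works because `[^#"]+` has to match at least one character that is neither "#" nor a quote right after `href="`, so `href="#"` can never match; the lazy `.+?` then runs up to the class attribute, and without `(?s)` it cannot cross a line break into the next link. A minimal sketch on a hypothetical inline sample modeled on the output above:

```autoit
Local $sHTML = '<a href="http://imgur.com/kiYaow1" target="_blank" class="ValuesLst">' & @CRLF & _
        '<a class="tooltip" title="Status" href="#"></a>' & @CRLF & _
        '<a href="#" target="_blank" class="ValuesLst">' & @CRLF & _
        '<a href="/42CB89AB04AB834B" target="_blank" class="ValuesLst">'

; [^#"]+ needs at least one char that is not "#" or a quote, so href="#" fails immediately
Local $aLinks = StringRegExp($sHTML, '(?i)href="([^#"]+).+?class="ValuesLst"', 3)

For $i = 0 To UBound($aLinks) - 1
    ConsoleWrite($aLinks[$i] & @CRLF)
Next
```

Note that an href containing a fragment, such as `page.html#top`, would not be skipped but captured truncated as `page.html`, because `[^#"]+` stops at the "#" and `.+?` absorbs the rest.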

It works! Thank you :D
 
Wouldn't it be possible to do this with an "if" in the regex, like '(?i)href="(?(?<!#).+?)" class="ValuesLst"'?



$rex=StringRegExp($sHTML,'(?i)href="(?(?!#)([^"]+)|mikell_was_here).+?ValuesLst', 3)

A little overcomplicated  :)
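How the conditional works: `(?(?!#)yes|no)` tests the assertion `(?!#)` at the position right after `href="`. If the next character is not "#", the yes branch `([^"]+)` captures the value; otherwise the no branch has to match the literal text `mikell_was_here`, which can never match at a "#", so that attempt fails. Because the no branch can never succeed there, the conditional behaves like a plain `(?!#)` lookahead in front of the capture. A minimal sketch comparing both forms on a hypothetical sample:

```autoit
Local $sHTML = '<a href="http://imgur.com/kiYaow1" target="_blank" class="ValuesLst">' & @CRLF & _
        '<a href="#" target="_blank" class="ValuesLst">' & @CRLF & _
        '<a href="/42CB89AB04AB834B" target="_blank" class="ValuesLst">'

; Conditional: if the value does not start with "#", capture it; else require a literal that cannot match here
Local $aCond = StringRegExp($sHTML, '(?i)href="(?(?!#)([^"]+)|mikell_was_here).+?ValuesLst', 3)
; Same result without a conditional: a negative lookahead in front of the capture
Local $aLook = StringRegExp($sHTML, '(?i)href="(?!#)([^"]+).+?ValuesLst', 3)

For $i = 0 To UBound($aCond) - 1
    ConsoleWrite($aCond[$i] & ' | ' & $aLook[$i] & @CRLF)
Next
```

The conditional form is mainly useful when the no branch should match something real; when it only exists to fail, the lookahead alone is simpler.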


Nice, nice: if it's not a "#", capture everything except quotes, else mikell_was_here xD


