Kyan

Regex / SRE conditional

Kyan

Hi, I've never needed an "if" condition in a StringRegExp (SRE) pattern before. How can I discard the users with no value from a list and capture only the ones that have a value?

Example:

---

Alice has $200
Bob has $10
John has null

----

I tried with

$tex='Alice has $200'&@CRLF&'Bob has $10'&@CRLF&'John has null'
$x=StringRegExp($tex,'(?i)^(.+?)\hhas\h(?:(?=:null)|(.+?))$',3)
_ArrayDisplay($x)
Exit

But it doesn't work.

Is this even possible to do with regex?

I've seen that conditionals exist in regex on these websites:

http://www.regular-expressions.info/conditional.html

http://www.rexegg.com/regex-conditionals.html

Edited by Kyan

Heroes, there is no such thing

One day I'll discover what IE.au3 has of special for so many users using it.
C'mon there's InetRead and WinHTTP, way better
jguinch

It seems you don't need a conditional pattern for this.

Just use (?m) and capture the values from each line that has a price (\$\d+) at the end of the line.

#Include <Array.au3>

$sString = "Alice has $200" & @CRLF & _ 
           "Bob has $10" & @CRLF & _ 
           "John has null"

$aValues = StringRegExp($sString, "(?im)^([a-z-]+).*\$(\d+)", 3)
Local $aResult[UBound($aValues) / 2][2]
For $i = 0 To UBound($aValues) - 1 Step 2
    $aResult[$i / 2][0] = $aValues[$i]
    $aResult[$i / 2][1] = $aValues[$i + 1]
Next

_ArrayDisplay($aResult)
Kyan

@jguinch, yeah, that's a good idea :)

@Malkey, can you tell me why things get split up when I add groups around ".+"? If it only works that way, does that mean I need to make everything else that matches something a non-capturing group?


SadBunny

If your example data is representative, and if you want the full lines returned, it seems like you can just grab anything that doesn't end in an 'l' (lowercase L):

$x = StringRegExp($tex, "(?m)^.*[^l]$", 3)
_ArrayDisplay($x)
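
If checking the last letter feels too fragile, a negative lookahead placed right before the value is another option. This is only a sketch against the three sample lines from the first post; the variable names are made up for the example, and it assumes the name is a single word. The key point is that the lookahead has to sit before the value: at the very end of the line there is nothing left to look ahead at, so a lookahead there always succeeds.

```autoit
#include <Array.au3>

; Sample data from the first post
Local $sText = 'Alice has $200' & @CRLF & 'Bob has $10' & @CRLF & 'John has null'

; (?!null\b) is tested right after "has ", before the value is consumed,
; so a line whose value is the word "null" cannot match at all.
; \S+ stops at the first whitespace, so it never swallows the line break.
Local $aPairs = StringRegExp($sText, '(?im)^(\w+)\hhas\h(?!null\b)(\S+)', 3)

; Flag 3 returns a flat array: name, value, name, value, ...
; For the sample above that should be: Alice, $200, Bob, $10
_ArrayDisplay($aPairs)
```

The flat result can be turned into a two-column array with the same loop jguinch posted.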
Edited by SadBunny

Roses are FF0000, violets are 0000FF... All my base are belong to you.

Kyan

@SadBunny, I mean something like "(?m).+\hhas\h.+(?!null)$": the point is to get all those values, but not when they are null. There's no other workaround in the real case.

I can post the real case, guess it can be posted here

text: <a href="#" class="ValuesLst"...

I want to exclude all the links with "#" through the regex itself (excluding post Do Loop), so how can I capture all links except the ones with "#"?


mikell

"I can post the real case, guess it can be posted here"

You should do it. When dealing with regex, the description of the initial text, the requirements, and the expected results must be as precise as possible.

kylomas

What does this mean: "(excluding post Do Loop)"?



"I like pigs.  Dogs look up to us.  Cats look down on us.  Pigs treat us as equals."

- Sir Winston Churchill

mikell

And what is the relevance of "John has null" to the <a href="#" links?  :)

Kyan

@kylomas, it's like saying "post-processing". I wrote it without checking whether it has the same meaning in English, but Google Translate says so. By it I meant "without the need for a Do loop after executing the regex", like:

$x = StringRegExp(....)
Local $i = 0
Do
    If $x[$i] = 'null' Then
        ; after a delete, the next element slides into index $i, so don't advance
        _ArrayDelete($x, $i)
    Else
        $i += 1
    EndIf
Until $i > (UBound($x) - 1)

@mikell, I explained it in comment #6. I'm using '<a href="(.+?)" class="ValuesLst"' but I don't want to capture "#" links :) The way of thinking is the same: you want to capture everything except one case.


SadBunny

So which part of the link do you want captured exactly? Please give a couple of examples of actual input and the actual output you desire from those lines. Do you want the entire line? Do you just want the href target? Also, what is it exactly you want to skip? Any line containing anchors exactly equal to href="#"? Or any <a href...>...</a> construct containing that? (If there's more than one per line...)

Finally: in the beginning of this thread we didn't know you were parsing HTML. Regex is not a good tool to parse HTML. Read this stackoverflow answer for a very eloquent and linguistically dexterous explanation of that fact. Unless you know exactly what your HTML is going to look like and that it's going to be valid, you will run into problems by doing this.

Edited by SadBunny

Kyan

I want all the href values from the .ValuesLst links, except the ones that are "#".
 

#include <Array.au3>
$b64HTML='CTx0ZD48YSBocmVmPSJodHRwOi8vaW1ndXIuY29tL2tpWWFvdzEiIHRhcmdldD0iX2JsYW5rIiBjbGFzcz0iVmFsdWVzTHN0Ij48aW1' & _
    'nIHNyYz0iL2ltYWdlcy9MaXN0SWNvbi5wbmciIGJvcmRlcj0nMCcgLz48L2E+PC90ZD48dGQ+PGEgY2xhc3M9InRvb2x0aXAiIHRpdGxlP' & _
    'SJTdGF0dXMiIGhyZWY9IiMiPjwvYT48L3RkPgkNCgkgPC90cj48dHI+DQoJCTx0ZD48YSBocmVmPSIjIiB0YXJnZXQ9Il9ibGFuayIgY2x' & _
    'hc3M9IlZhbHVlc0xzdCI+PGltZyBzcmM9Ii9pbWFnZXMvTGlzdEljb24ucG5nIiBib3JkZXI9JzAnIC8+PC9hPjwvdGQ+PHRkPjxhIGNsY' & _
    'XNzPSJ0b29sdGlwIiB0aXRsZT0iU3RhdHVzIiBocmVmPSIjIj48L2E+PC90ZD4JDQoJIDwvdHI+PHRyICBjbGFzcz0ib2RkIj4NCgkJPHR' & _
    'kPjxpbWcgc3JjPSIvaW1hZ2VzL25hbi5naWYiIC8+PC90ZD48dGQ+PGEgY2xhc3M9InRvb2x0aXAiIHRpdGxlPSJTdGF0dXMiIGhyZWY9I' & _
    'iMiPjxpbWcgc3JjPSIvaW1hZ2VzL3VuYXZhaWxhYmxlLnBuZyIgLz48L2E+PC90ZD4JDQoJIDwvdHI+PHRyPg0KCQk8dGQ+PGEgaHJlZj0' & _
    'iL2dyYXBocy8zOTI3MzczLnBuZyIgdGFyZ2V0PSJfYmxhbmsiPjxpbWcgc3JjPSIvaW1hZ2VzL0xpc3RJY29uLnBuZyIgYm9yZGVyPScwJ' & _
    'yAvPjwvYT48L3RkPg0KCSA8L3RyPjx0ciAgY2xhc3M9Im9kZCI+DQoJCTx0ZD48YSBocmVmPSIjIiB0YXJnZXQ9Il9ibGFuayIgY2xhc3M' & _
    '9IlZhbHVlc0xzdCI+PGltZyBzcmM9Ii9pbWFnZXMvTGlzdEljb24ucG5nIiBib3JkZXI9JzAnIC8+PC9hPjwvdGQ+DQoJIDwvdHI+PHRyP' & _
    'gkNCgkJPHRkPjxhIGhyZWY9Ii80MkNCODlBQjA0QUI4MzRCIiB0YXJnZXQ9Il9ibGFuayIgY2xhc3M9IlZhbHVlc0xzdCI+PGltZyBzcmM' & _
    '9Ii9pbWFnZXMvTGlzdEljb24ucG5nIiBib3JkZXI9JzAnIC8+PC9hPjwvdGQ+'
$sHTML = BinaryToString(_Base64Decode($b64HTML))
$rex=StringRegExp($sHTML,'(?im)href="(.+?)"[^>]+?target="[^"]+?"[^>]+?class="ValuesLst"',3)
_ArrayDisplay($rex)
Exit

Func _Base64Decode($input_string) ; by trancexx
    Local $struct = DllStructCreate('int')
    Local $a_Call = DllCall('Crypt32.dll', 'int', 'CryptStringToBinary', 'str', $input_string, 'int', 0, 'int', 1, 'ptr', 0, 'ptr', DllStructGetPtr($struct, 1), 'ptr', 0, 'ptr', 0)
    If @error Or Not $a_Call[0] Then Return SetError(1, 0, '')
    Local $a = DllStructCreate('byte[' & DllStructGetData($struct, 1) & ']')
    $a_Call = DllCall('Crypt32.dll', 'int', 'CryptStringToBinary', 'str', $input_string, 'int', 0, 'int', 1, 'ptr', DllStructGetPtr($a), 'ptr', DllStructGetPtr($struct, 1), 'ptr', 0, 'ptr', 0)
    If @error Or Not $a_Call[0] Then Return SetError(2, 0, '')
    Return DllStructGetData($a, 1)
EndFunc   ;==>_Base64Decode

output:

Row|Col 0
[0]|http://imgur.com/kiYaow1
[1]|#
[2]|#
[3]|/42CB89AB04AB834B

I don't want to capture [1] and [2] :/

EDIT: I thought this should have worked: '(?im)href="(?<!:#)(.*?)"[^>]+?target="[^"]+?"[^>]+class="ValuesLst"', but it doesn't, and I don't know how to do it :|

Edited by Kyan

Malkey

Try this RE pattern

'(?i)href="([^#"]+).+?class="ValuesLst"'
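
One caveat with the character class: [^#"] stops at a "#" anywhere in the link, so an href such as "/page.html#top" would be captured only up to the "#". If that matters, a negative lookahead that rejects only an href that is exactly "#" may be closer to the goal. A minimal sketch, using a made-up three-link sample (not the real page) and hypothetical variable names:

```autoit
#include <Array.au3>

; Hypothetical sample, shaped like the decoded HTML from the earlier post
Local $sHTML = '<a href="http://imgur.com/kiYaow1" target="_blank" class="ValuesLst">' & @CRLF & _
        '<a href="#" target="_blank" class="ValuesLst">' & @CRLF & _
        '<a href="/page.html#top" target="_blank" class="ValuesLst">'

; (?!#") rejects only an href whose whole value is "#".
; [^>]* keeps the rest of the match inside the same tag.
Local $aLinks = StringRegExp($sHTML, '(?i)href="(?!#")([^"]+)"[^>]*class="ValuesLst"', 3)

; For the sample above this should list the imgur link and /page.html#top
_ArrayDisplay($aLinks)
```

This assumes class comes after href inside the same tag, as it does in the posted HTML.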
Kyan

It works! Thank you :D

Wouldn't it be possible to do this with an "if" in the regex, like '(?i)href="(?(?<!#).+?)" class="ValuesLst"'?


mikell

$rex=StringRegExp($sHTML,'(?i)href="(?(?!#)([^"]+)|mikell_was_here).+?ValuesLst', 3)

A little overcomplicated  :)
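
For reference, the general shape is (?(condition)yes|no), where the condition can be a lookahead. In the pattern above, the "no" branch demands the literal text mikell_was_here, which never appears, so any href that starts with "#" simply fails to match. A slightly more explicit sketch of the same idea uses (?!), an empty negative lookahead that always fails, as the "no" branch. The sample string and variable names here are made up for the illustration:

```autoit
#include <Array.au3>

; Hypothetical two-link sample
Local $sHTML = '<a href="#" target="_blank" class="ValuesLst">' & @CRLF & _
        '<a href="/42CB89AB04AB834B" target="_blank" class="ValuesLst">'

; (?(?!#") yes | no)
;   yes branch: the href is not exactly "#", so capture it with ([^"]+)
;   no branch:  (?!) can never match, so the whole attempt is rejected
Local $aLinks = StringRegExp($sHTML, '(?i)href="(?(?!#")([^"]+)|(?!))"[^>]*class="ValuesLst"', 3)

; For the sample above this should contain only /42CB89AB04AB834B
_ArrayDisplay($aLinks)
```

A plain lookahead in front of the capture group gives the same result with less syntax, so the conditional is mostly useful when the two branches really need to match different things.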

Kyan

Nice, nice: if it is not a "#", capture everything except quotes, else mikell_was_here xD

