Regex / SRE conditional


Go to solution Solved by Malkey,

Hi, I've never needed an "if" condition in an SRE before. How can I discard empty users from a list and capture only the ones with a value?

Example:

---

Alice has $200
Bob has $10
John has null

----

I tried with

$tex='Alice has $200'&@CRLF&'Bob has $10'&@CRLF&'John has null'
$x=StringRegExp($tex,'(?i)^(.+?)\hhas\h(?:(?=:null)|(.+?))$',3)
_ArrayDisplay($x)
Exit

But it doesn't work.

Is this even possible to do with regex?

I saw that conditional regexes exist, according to these websites:

http://www.regular-expressions.info/conditional.html

http://www.rexegg.com/regex-conditionals.html

Edited by Kyan

Heroes, there is no such thing

One day I'll discover what IE.au3 has of special for so many users using it.
C'mon there's InetRead and WinHTTP, way better

It seems you don't need a conditional pattern for this.

Just use (?m) and capture the values from the lines that contain a price (\$\d+) at the end of the line:

#include <Array.au3>

$sString = "Alice has $200" & @CRLF & _
           "Bob has $10" & @CRLF & _
           "John has null"

; Flag 3 returns every capture as a flat array: name, amount, name, amount, ...
$aValues = StringRegExp($sString, "(?im)^([a-z-]+).*\$(\d+)", 3)

; Fold the flat array into rows of [name, amount]
Local $aResult[UBound($aValues) / 2][2]
For $i = 0 To UBound($aValues) - 1 Step 2
    $aResult[$i / 2][0] = $aValues[$i]
    $aResult[$i / 2][1] = $aValues[$i + 1]
Next

_ArrayDisplay($aResult)
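
If you'd rather filter on the word "null" itself instead of on the price, a negative lookahead is another option. This is only a sketch, assuming the value always follows "has" and a single horizontal space; the variable names are made up for the example.

```autoit
#include <Array.au3>

Local $sString = "Alice has $200" & @CRLF & _
        "Bob has $10" & @CRLF & _
        "John has null"

; (?!null) rejects a line when the text right after "has " starts with "null".
; Group 1 is the name, group 2 is the value (a run of non-whitespace).
Local $aValues = StringRegExp($sString, "(?im)^(\w+)\hhas\h(?!null)(\S+)", 3)

If @error Then
    ConsoleWrite("No matches" & @CRLF)
Else
    ; Flat array: Alice, $200, Bob, $10
    _ArrayDisplay($aValues)
EndIf
```

Note that (?!null) also rejects values that merely begin with "null"; use (?!null\b) if that matters for the real data.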

@jguinch, yeah, that's a good idea :)

@Malkey, can you tell me why adding groups around ".+" splits things up? If it only works that way, does it mean I need to use non-capturing groups for everything else that matches something?
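
For reference, with flag 3 StringRegExp returns every capturing group of every match as its own array element, so wrapping ".+" in parentheses adds elements to the result. A small sketch of the difference, with made-up variable names and a one-line sample:

```autoit
#include <Array.au3>

Local $sLine = "Alice has $200"

; Two capturing groups: each match contributes two array elements
Local $aBoth = StringRegExp($sLine, "(\w+)\hhas\h(\S+)", 3)
; [0] = Alice, [1] = $200

; Name group made non-capturing with (?:...): only the value is returned
Local $aValue = StringRegExp($sLine, "(?:\w+)\hhas\h(\S+)", 3)
; [0] = $200

_ArrayDisplay($aBoth, "Capturing")
_ArrayDisplay($aValue, "Non-capturing")
```

Parts of the pattern that are not grouped at all (like \hhas\h here) are never returned, so only the groups you do not want in the array need (?:...).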


If your example data is representative, and if you want the full lines returned, it seems like you can just grab anything that doesn't end in an 'l' (lowercase L):

$x = StringRegExp($tex, "(?m)^.*[^l]$", 3)
_ArrayDisplay($x)
Edited by SadBunny

Roses are FF0000, violets are 0000FF... All my base are belong to you.


@SadBunny, something like "(?m).+\hhas\h.+(?!null)$". The point is that you want to get all those values, but not when they are null; there's no other workaround in the real case.

I can post the real case; I guess it's fine to post it here:

text: <a href="#" class="ValuesLst"...

I want to exclude all the links with "#" through the regex itself (excluding post Do Loop), so how can I capture all links except the ones with "#"?


@kylomas, it's like saying "post work". I wrote it without checking whether it has the same meaning in English, but Google Translate says so. What I meant was "without the need for a Do loop after executing the regex", like:

$x = StringRegExp(....)
Local $i = 0
Do
    If $x[$i] = 'null' Then
        _ArrayDelete($x, $i) ; the next element shifts into index $i, so don't advance
    Else
        $i += 1
    EndIf
Until $i > (UBound($x) - 1)

@mikell, I explained it in comment #6. I'm using '<a href="(.+?)" class="ValuesLst"', but I don't want to capture the "#" links :) The way of thinking is the same: you want to capture everything except the one case.


So which part of the link do you want captured exactly? Please give a couple of examples of actual input and the actual output you desire from those lines. Do you want the entire line? Do you just want the href target? Also, what is it exactly you want to skip? Any line containing anchors exactly equal to href="#"? Or any <a href...>...</a> construct containing that? (If there's more than one per line...)

Finally: in the beginning of this thread we didn't know you were parsing HTML. Regex is not a good tool to parse HTML. Read this stackoverflow answer for a very eloquent and linguistically dexterous explanation of that fact. Unless you know exactly what your HTML is going to look like and that it's going to be valid, you will run into problems by doing this.

Edited by SadBunny


All from .ValuesLst except the ones with "#"
 

#include <Array.au3>
$b64HTML='CTx0ZD48YSBocmVmPSJodHRwOi8vaW1ndXIuY29tL2tpWWFvdzEiIHRhcmdldD0iX2JsYW5rIiBjbGFzcz0iVmFsdWVzTHN0Ij48aW1' & _
    'nIHNyYz0iL2ltYWdlcy9MaXN0SWNvbi5wbmciIGJvcmRlcj0nMCcgLz48L2E+PC90ZD48dGQ+PGEgY2xhc3M9InRvb2x0aXAiIHRpdGxlP' & _
    'SJTdGF0dXMiIGhyZWY9IiMiPjwvYT48L3RkPgkNCgkgPC90cj48dHI+DQoJCTx0ZD48YSBocmVmPSIjIiB0YXJnZXQ9Il9ibGFuayIgY2x' & _
    'hc3M9IlZhbHVlc0xzdCI+PGltZyBzcmM9Ii9pbWFnZXMvTGlzdEljb24ucG5nIiBib3JkZXI9JzAnIC8+PC9hPjwvdGQ+PHRkPjxhIGNsY' & _
    'XNzPSJ0b29sdGlwIiB0aXRsZT0iU3RhdHVzIiBocmVmPSIjIj48L2E+PC90ZD4JDQoJIDwvdHI+PHRyICBjbGFzcz0ib2RkIj4NCgkJPHR' & _
    'kPjxpbWcgc3JjPSIvaW1hZ2VzL25hbi5naWYiIC8+PC90ZD48dGQ+PGEgY2xhc3M9InRvb2x0aXAiIHRpdGxlPSJTdGF0dXMiIGhyZWY9I' & _
    'iMiPjxpbWcgc3JjPSIvaW1hZ2VzL3VuYXZhaWxhYmxlLnBuZyIgLz48L2E+PC90ZD4JDQoJIDwvdHI+PHRyPg0KCQk8dGQ+PGEgaHJlZj0' & _
    'iL2dyYXBocy8zOTI3MzczLnBuZyIgdGFyZ2V0PSJfYmxhbmsiPjxpbWcgc3JjPSIvaW1hZ2VzL0xpc3RJY29uLnBuZyIgYm9yZGVyPScwJ' & _
    'yAvPjwvYT48L3RkPg0KCSA8L3RyPjx0ciAgY2xhc3M9Im9kZCI+DQoJCTx0ZD48YSBocmVmPSIjIiB0YXJnZXQ9Il9ibGFuayIgY2xhc3M' & _
    '9IlZhbHVlc0xzdCI+PGltZyBzcmM9Ii9pbWFnZXMvTGlzdEljb24ucG5nIiBib3JkZXI9JzAnIC8+PC9hPjwvdGQ+DQoJIDwvdHI+PHRyP' & _
    'gkNCgkJPHRkPjxhIGhyZWY9Ii80MkNCODlBQjA0QUI4MzRCIiB0YXJnZXQ9Il9ibGFuayIgY2xhc3M9IlZhbHVlc0xzdCI+PGltZyBzcmM' & _
    '9Ii9pbWFnZXMvTGlzdEljb24ucG5nIiBib3JkZXI9JzAnIC8+PC9hPjwvdGQ+'
$sHTML = BinaryToString(_Base64Decode($b64HTML))
$rex=StringRegExp($sHTML,'(?im)href="(.+?)"[^>]+?target="[^"]+?"[^>]+?class="ValuesLst"',3)
_ArrayDisplay($rex)
Exit

Func _Base64Decode($input_string) ; by trancexx
    Local $struct = DllStructCreate('int')
    Local $a_Call = DllCall('Crypt32.dll', 'int', 'CryptStringToBinary', 'str', $input_string, 'int', 0, 'int', 1, 'ptr', 0, 'ptr', DllStructGetPtr($struct, 1), 'ptr', 0, 'ptr', 0)
    If @error Or Not $a_Call[0] Then Return SetError(1, 0, '')
    Local $a = DllStructCreate('byte[' & DllStructGetData($struct, 1) & ']')
    $a_Call = DllCall('Crypt32.dll', 'int', 'CryptStringToBinary', 'str', $input_string, 'int', 0, 'int', 1, 'ptr', DllStructGetPtr($a), 'ptr', DllStructGetPtr($struct, 1), 'ptr', 0, 'ptr', 0)
    If @error Or Not $a_Call[0] Then Return SetError(2, 0, '')
    Return DllStructGetData($a, 1)
EndFunc   ;==>_Base64Decode

output:

Row|Col 0
[0]|http://imgur.com/kiYaow1
[1]|#
[2]|#
[3]|/42CB89AB04AB834B

I don't want to capture [1] and [2] :/

EDIT: I thought this would work: '(?im)href="(?<!:#)(.*?)"[^>]+?target="[^"]+?"[^>]+class="ValuesLst"', but it doesn't, and I don't know how to do it :|

Edited by Kyan
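
One way to skip the "#" links inside the pattern itself is a negative lookahead placed right after href=". The sketch below is an illustration rather than a reply quoted from the thread; $sHTML is a shortened stand-in for the decoded HTML, assuming each anchor keeps href before class inside the same tag.

```autoit
#include <Array.au3>

; Shortened stand-in for the decoded HTML: three ValuesLst anchors, one of them "#"
Local $sHTML = '<td><a href="http://imgur.com/kiYaow1" target="_blank" class="ValuesLst"></a></td>' & @CRLF & _
        '<td><a href="#" target="_blank" class="ValuesLst"></a></td>' & @CRLF & _
        '<td><a href="/42CB89AB04AB834B" target="_blank" class="ValuesLst"></a></td>'

; (?!#") fails right after href=" when the whole attribute value is just "#".
; [^"]+ captures the link, and [^>]+? keeps the rest of the match inside the same tag.
Local $aLinks = StringRegExp($sHTML, '(?i)href="(?!#")([^"]+)"[^>]+?class="ValuesLst"', 3)

If @error Then
    ConsoleWrite("No matching links" & @CRLF)
Else
    ; [0] = http://imgur.com/kiYaow1, [1] = /42CB89AB04AB834B
    _ArrayDisplay($aLinks)
EndIf
```

A link such as "#top" would still be captured, because the lookahead only rejects a value that is exactly "#".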


It works! Thank you :D

Wouldn't it be possible to do this with an "if" in the regex, like '(?i)href="(?(?<!#).+?)" class="ValuesLst"'?
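
For what it's worth, PCRE does accept a lookaround as the condition of a conditional group, written (?(condition)yes|no). The sketch below shows that form under the same assumptions as before (made-up sample HTML, href before class in the same tag); the "no" branch is an empty negative lookahead, (?!), chosen here only because it can never match.

```autoit
#include <Array.au3>

; Made-up sample: one real link and one "#" link
Local $sHTML = '<a href="/42CB89AB04AB834B" target="_blank" class="ValuesLst"></a>' & @CRLF & _
        '<a href="#" target="_blank" class="ValuesLst"></a>'

; Condition: (?!#")  -> "the value is not exactly #"
; Yes branch: ([^"]+) captures everything up to the closing quote
; No branch:  (?!)    always fails, so "#" links produce no match at all
Local $sPattern = '(?i)href="(?(?!#")([^"]+)|(?!))"[^>]+?class="ValuesLst"'
Local $aLinks = StringRegExp($sHTML, $sPattern, 3)

If @error Then
    ConsoleWrite("No matching links" & @CRLF)
Else
    _ArrayDisplay($aLinks)
EndIf
```

Because the "no" branch always fails, this behaves the same as putting the plain lookahead (?!#") in front of the group; the conditional form only pays off when the "else" side has to match something different.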


Nice, nice: if it's not a "#", capture everything except quotes, else mikell_was_here xD
