Regexp question


Please help me. I am converting HTML to CSV using the StringRegExp function. In the example below, the field Help is not detected.

How should I change my regexp?

#include <Array.au3>

$sString = "<td NOWRAP>cel1</td><td NOWRAP>cel2</td><td NOWRAP>cel3</td><td>Help</td><td NOWRAP>cel4</td>"

$aReturn = StringRegExp($sString, '(?s)(?i)<td NOWRAP>(.*?)</td>', 3)

_ArrayDisplay($aReturn)

thnx.


I assume you use Internet Explorer as your browser. In that case you could use the built-in IE UDF function _IETableWriteToArray to read the content of a table into an array (for further processing).
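
A minimal sketch of that suggestion, assuming the table lives on a page Internet Explorer can open. The URL and the table index 0 are placeholders, not values from this thread; adjust both to your page.

```autoit
#include <IE.au3>
#include <Array.au3>

; Hypothetical URL: replace with the page that contains your table
Local $sUrl = "http://www.example.com/table.html"

; Open the page in Internet Explorer and wait for it to load
Local $oIE = _IECreate($sUrl)
If @error Then Exit

; Get the first table on the page (index 0 is an assumption)
Local $oTable = _IETableGetCollection($oIE, 0)
If @error Then Exit

; Read the table cells into a 2D array; True asks the UDF to transpose the result
Local $aTable = _IETableWriteToArray($oTable, True)
_ArrayDisplay($aTable)

_IEQuit($oIE)
```

This reads the cells from the parsed page, so it does not depend on how the td tags are written in the HTML source.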

My UDFs and Tutorials:

UDFs:
Active Directory (NEW 2022-02-19 - Version 1.6.1.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2017-07-21 - Version 0.4.0.1) - Download - General Help & Support - Example Scripts
OutlookEX (2021-11-16 - Version 1.7.0.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX_GUI (2021-04-13 - Version 1.4.0.0) - Download
Outlook Tools (2019-07-22 - Version 0.6.0.0) - Download - General Help & Support - Wiki
PowerPoint (2021-08-31 - Version 1.5.0.0) - Download - General Help & Support - Example Scripts - Wiki
Task Scheduler (NEW 2022-07-28 - Version 1.6.0.1) - Download - General Help & Support - Wiki

Standard UDFs:
Excel - Example Scripts - Wiki
Word - Wiki

Tutorials:
ADO - Wiki
WebDriver - Wiki

 


#include <Array.au3>

$sString = "<td NOWRAP>cel1</td><td NOWRAP>cel2</td><td NOWRAP>cel3</td><td>Help</td><td NOWRAP>cel4</td>"

$aReturn = StringRegExp($sString, '(?s)(?i)<td(?: NOWRAP)?>(.*?)</td>', 3)
_ArrayDisplay($aReturn)

; or else, to get only the Help cell

$aReturn = StringRegExp( $sString , '<td>(.*?)</td>', 3 )
_ArrayDisplay($aReturn)
Ask if you don't understand the code.
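
Since the goal was CSV output, here is a hedged sketch that goes one step further. It assumes the cells may carry attributes other than NOWRAP, so it uses <td[^>]*> instead of naming the attribute, and it joins the captured cells with _ArrayToString. The variable names $aCells and $sCsv are only for illustration.

```autoit
#include <Array.au3>

Local $sString = "<td NOWRAP>cel1</td><td NOWRAP>cel2</td><td NOWRAP>cel3</td><td>Help</td><td NOWRAP>cel4</td>"

; <td[^>]*> accepts a td tag with any attributes, or none at all
Local $aCells = StringRegExp($sString, '(?is)<td[^>]*>(.*?)</td>', 3)
If @error Then Exit ; no cells matched

; Join the captured cells into one comma-separated line
Local $sCsv = _ArrayToString($aCells, ",")
ConsoleWrite($sCsv & @CRLF)
```

Note that this does no CSV quoting: a cell that itself contains a comma or a double quote would need to be escaped before joining.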

My code:

PredictText: Predict Text of an Edit Control Like Scite. Remote Gmail: Execute your Scripts through Gmail. StringRegExp:Share and learn RegExp.

Run As System: A command line wrapper around PSEXEC.exe to execute your apps scripts as System (LSA). Database: An easier approach for _SQ_LITE beginners.

MathsEx: A UDF for Fractions and LCM, GCF/HCF. FloatingText: An UDF for make your text floating. Clipboard Extendor: A clipboard monitoring tool. 

Custom ScrollBar: Scroll Bar made with GDI+, user can use bitmaps instead. RestrictEdit_SRE: Restrict text in an Edit Control through a Regular Expression.


winhek,

Here are a couple of alternatives:

#include <Array.au3>
$sString = "<td NOWRAP>cel1</td><td NOWRAP>something before Help1</td><td NOWRAP>help,once again</td><br><td>Help</td><td NOWRAP>cel4</td>"
; get everything that is not HTML
$aReturn = StringRegExp($sString, '(?si)>([^<].*?)<', 3)
_ArrayDisplay($aReturn, 'All NON-HTML')
; get any non-HTML that begins with the string "help"
$aReturn = StringRegExp($sString, '(?s)(?i)(help.*?)<', 3)
_ArrayDisplay($aReturn, 'Help Only')
;==================================================================================
;
; REGEXP Experts - How would I get any non-HTML that contains the string "help"?
;
; I've tried multiple variations of the following without success
;
;==================================================================================
$aReturn = StringRegExp($sString, '(?si)>([^>].*?help.*?)<', 3)
_ArrayDisplay($aReturn, 'Help Only')

@SRE Experts - I can't figure out how to get the third example to work. I am trying to get any non-HTML containing a string.

kylomas

Forum Rules         Procedure for posting code

"I like pigs.  Dogs look up to us.  Cats look down on us.  Pigs treat us as equals."

- Sir Winston Churchill


An Example

#include <Array.au3>
$sString = "<td NOWRAP>cel1</td><td NOWRAP>something before Help1</td><td NOWRAP>help,once again</td><td>Help</td><td NOWRAP>cel4</td>"
Local $a, $aReturn = StringRegExp($sString, '>([^<>]+)<', 4), $aRet[1]
For $i = 0 To UBound($aReturn) - 1
    $a = $aReturn[$i]
    If StringInStr($a[1], "help") Then _ArrayAdd($aRet, $a[1])
Next
_ArrayDelete($aRet, 0)
_ArrayDisplay($aRet)

Direct Approach

#include <Array.au3>
$sString = "<td NOWRAP>cel1</td><td NOWRAP>something before Help1</td><td NOWRAP>help,once again</td><td>Help</td><td NOWRAP>cel4</td>"
$aReturn = StringRegExp($sString, '(?i)>([^<>]*?help[^<>]*?)<', 3)
_ArrayDisplay($aReturn)

Regards :)

Edited by PhoenixXL


@PhoenixXL,

I see it now. I was negating the "<" and ">", but then matching on any char "." (which is probably contradictory).

Thanks,

kylomas

edit: additional question

This pattern also works:

'(?si)>([^<>]*?help.*?)<'

Is that because the "<" is the first char encountered following "help"?

Edited by kylomas
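
The thread leaves this last question open, so here is a hedged side-by-side sketch of the two patterns on kylomas's test string. As far as I can tell the guess is right: the lazy .*? after "help" takes as few characters as possible, and because nothing has to match after the final "<", it ends at the first "<" it reaches. The part before "help" is different: there a lazy "." with (?s) keeps expanding across tags until it finds "help", which is why the earlier attempt should return captures such as "cel1</td><td NOWRAP>something before Help1".

```autoit
#include <Array.au3>

Local $sString = "<td NOWRAP>cel1</td><td NOWRAP>something before Help1</td><td NOWRAP>help,once again</td><br><td>Help</td><td NOWRAP>cel4</td>"

; "." before "help": the lazy .*? may run across tags to reach the next "help"
Local $aLoose = StringRegExp($sString, '(?si)>([^>].*?help.*?)<', 3)
_ArrayDisplay($aLoose, 'Dot before help')

; [^<>]*? before "help": the capture cannot leave the current piece of text.
; The lazy .*? after "help" ends at the first "<" that follows.
Local $aTight = StringRegExp($sString, '(?si)>([^<>]*?help.*?)<', 3)
_ArrayDisplay($aTight, 'Negated class before help')
```

PhoenixXL's version with [^<>]*? on both sides of "help" is still the more defensive choice, because it also refuses a stray ">" inside the captured text.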
