Regexp question

wimhek

Please help me. I am converting HTML to CSV using StringRegExp. In the example below, the field Help is not detected.

How should I change my regexp?

#include <Array.au3>

$sString = "<td NOWRAP>cel1</td><td NOWRAP>cel2</td><td NOWRAP>cel3</td><td>Help</td><td NOWRAP>cel4</td>"

$aReturn = StringRegExp($sString, '(?s)(?i)<td NOWRAP>(.*?)</td>', 3)

_ArrayDisplay($aReturn)

Thanks.

water

I assume you use Internet Explorer as your browser. If so, you could use the built-in IE UDF function _IETableWriteToArray to read the content of a table into an array (for further processing).
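A minimal sketch of that suggestion, for illustration only. The URL and the table index 0 are assumptions (nothing in the thread says where the table lives), and the script needs Internet Explorer to be available on the machine.

```autoit
#include <IE.au3>
#include <Array.au3>

; Hypothetical URL: replace it with the page that holds the table.
Local $sUrl = "http://www.example.com/table.html"

Local $oIE = _IECreate($sUrl)
If @error Then Exit MsgBox(16, "Error", "Could not start Internet Explorer")

; Index 0 is an assumption: it selects the first <table> on the page.
Local $oTable = _IETableGetCollection($oIE, 0)
If @error Then
    _IEQuit($oIE)
    Exit MsgBox(16, "Error", "No table found on the page")
EndIf

; The second parameter asks the UDF to swap rows and columns in the output array;
; whether you want that depends on the layout you need for the CSV.
Local $aTable = _IETableWriteToArray($oTable, True)
_IEQuit($oIE)

_ArrayDisplay($aTable)
```

This route lets the browser parse the HTML, so attributes such as NOWRAP do not matter at all.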


My UDFs and Tutorials:

UDFs:
Active Directory (NEW 2017-04-18 - Version 1.4.8.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX (NEW 2017-02-27 - Version 1.3.1.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2015-04-01 - Version 0.4.0.0) - Download - General Help & Support - Example Scripts
Excel - Example Scripts - Wiki
Word - Wiki
PowerPoint (2015-06-06 - Version 0.0.5.0) - Download - General Help & Support

Tutorials:
ADO - Wiki

 

PhoenixXL

#include <Array.au3>

$sString = "<td NOWRAP>cel1</td><td NOWRAP>cel2</td><td NOWRAP>cel3</td><td>Help</td><td NOWRAP>cel4</td>"

$aReturn = StringRegExp($sString, '(?s)(?i)<td(?: NOWRAP)?>(.*?)</td>', 3)
_ArrayDisplay($aReturn)

; or else, to get only the Help

$aReturn = StringRegExp( $sString , '<td>(.*?)</td>', 3 )
_ArrayDisplay($aReturn)
Ask if you don't get the code
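For readers who want the first pattern spelled out, here is an annotated sketch of the same regexp. The join into a single CSV line is an addition for illustration: the ";" separator is an assumption, and cells that contain the separator or quotes would need extra escaping.

```autoit
#include <Array.au3>

Local $sString = "<td NOWRAP>cel1</td><td NOWRAP>cel2</td><td NOWRAP>cel3</td><td>Help</td><td NOWRAP>cel4</td>"

; (?si)        : "." also matches newlines, and matching is case-insensitive
; <td          : literal start of the cell tag
; (?: NOWRAP)? : optional, non-capturing " NOWRAP", so both <td NOWRAP> and <td> match
; >(.*?)</td>  : capture the cell text lazily, up to the nearest closing tag
Local $aCells = StringRegExp($sString, '(?si)<td(?: NOWRAP)?>(.*?)</td>', 3)
If @error Then Exit MsgBox(16, "Error", "No cells found")

; Assumption: ";" as the CSV separator.
Local $sCsvLine = _ArrayToString($aCells, ";")
ConsoleWrite($sCsvLine & @CRLF) ; cel1;cel2;cel3;Help;cel4
```

If the cells can carry other attributes than NOWRAP, a looser opening such as `<td[^>]*>` may be a better fit; that variant is a suggestion, not something tested in this thread.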


My code:

PredictText: Predict Text of an Edit Control Like Scite. Remote Gmail: Execute your Scripts through Gmail. StringRegExp:Share and learn RegExp.

Run As System: A command line wrapper around PSEXEC.exe to execute your apps scripts as System (LSA). Database: An easier approach for _SQ_LITE beginners.

MathsEx: A UDF for Fractions and LCM, GCF/HCF. FloatingText: An UDF for make your text floating. Clipboard Extendor: A clipboard monitoring tool. 

Custom ScrollBar: Scroll Bar made with GDI+, user can use bitmaps instead. RestrictEdit_SRE: Restrict text in an Edit Control through a Regular Expression.

wimhek

Super, it works. I do not get the code, but that is my restriction :-)

kylomas

wimhek,

Here are a couple of alternatives:

#include <Array.au3>
$sString = "<td NOWRAP>cel1</td><td NOWRAP>something before Help1</td><td NOWRAP>help,once again</td><br><td>Help</td><td NOWRAP>cel4</td>"
; to get everything that is not HTML
$aReturn = StringRegExp($sString, '(?si)>([^<].*?)<', 3)
_ArrayDisplay($aReturn, 'All NON-HTML')
; get any non-HTML that begins with the string "help"
$aReturn = StringRegExp($sString, '(?s)(?i)(help.*?)<', 3)
_ArrayDisplay($aReturn, 'Help Only')
;==================================================================================
;
; REGEXP Experts - How would I get any non-HTML that contains the string "help"?
;
; I've tried multiple variations of the following without success
;
;===================================================================================
$aReturn = StringRegExp($sString, '(?si)>([^>].*?help.*?)<', 3)
_ArrayDisplay($aReturn, 'Help Only')

@SRE Experts - I can't figure out how to get the third example to work. I am trying to get any non-HTML containing a string.

kylomas


Forum Rules         Procedure for posting code

"I like pigs.  Dogs look up to us.  Cats look down on us.  Pigs treat us as equals."

- Sir Winston Churchill

PhoenixXL

An Example

#include <Array.au3>
$sString = "<td NOWRAP>cel1</td><td NOWRAP>something before Help1</td><td NOWRAP>help,once again</td><td>Help</td><td NOWRAP>cel4</td>"
Local $a, $aReturn = StringRegExp($sString, '>([^<>]+)<', 4), $aRet[1]
For $i = 0 To UBound($aReturn) - 1
    $a = $aReturn[$i]
    If StringInStr($a[1], "help") Then _ArrayAdd($aRet, $a[1])
Next
_ArrayDelete($aRet, 0 )
_ArrayDisplay($aRet)

Direct Approach

#include <Array.au3>
$sString = "<td NOWRAP>cel1</td><td NOWRAP>something before Help1</td><td NOWRAP>help,once again</td><td>Help</td><td NOWRAP>cel4</td>"
$aReturn = StringRegExp($sString, '(?i)>([^<>]*?help[^<>]*?)<', 3)
_ArrayDisplay($aReturn)

Regards :)

Edited by PhoenixXL

kylomas

@PhoenixXL,

I see it now. I was negating the "<" and ">", but then matching on any char "." (which is probably contradictory).

Thanks,

kylomas

edit: additional question

This pattern also works

'(?si)>([^<>]*?help.*?)<'

Is that because the "<" is the first char encountered following "help"?

Edited by kylomas
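A small sketch for comparing the three patterns discussed above on the same input. `_ShowMatches` is a hypothetical helper written for this illustration, not part of any UDF. With this input, the lazy `.*?` after "help" grows one character at a time and stops as soon as the following "<" can match, so the second and third patterns should capture the same three cells. The first pattern differs because its `.*?` sits before "help" and is free to run across tags while searching for it.

```autoit
Local $sString = "<td NOWRAP>cel1</td><td NOWRAP>something before Help1</td><td NOWRAP>help,once again</td><td>Help</td><td NOWRAP>cel4</td>"

; Original attempt: "." before "help" may cross tag boundaries, so captures can include markup.
_ShowMatches("dot before help", $sString, '(?si)>([^>].*?help.*?)<')

; Negated class on both sides: the capture can never contain "<" or ">".
_ShowMatches("negated class on both sides", $sString, '(?i)>([^<>]*?help[^<>]*?)<')

; Negated class before "help", lazy dot after it: stops at the first "<" after "help".
_ShowMatches("negated class, then lazy dot", $sString, '(?si)>([^<>]*?help.*?)<')

; Hypothetical helper: run a pattern and print every capture on its own line.
Func _ShowMatches($sLabel, $sText, $sPattern)
    Local $aMatches = StringRegExp($sText, $sPattern, 3)
    If @error Then
        ConsoleWrite($sLabel & ": no match" & @CRLF)
        Return
    EndIf
    ConsoleWrite($sLabel & ":" & @CRLF)
    For $i = 0 To UBound($aMatches) - 1
        ConsoleWrite("  [" & $i & "] " & $aMatches[$i] & @CRLF)
    Next
EndFunc   ;==>_ShowMatches
```

The last two patterns would only diverge if the text after "help" contained a ">" before the next "<": the lazy dot accepts it, the negated class does not.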
