Sign in to follow this  
Followers 0
wimhek

Regexp question

8 posts in this topic

Pleas help me , I am converting HTML to csv using the command stringreg exp. In the example belot, the field Help is not detected.

How to change my regexp ?

#include <Array.au3>

$sString = "<td NOWRAP>cel1</td><td NOWRAP>cel2</td><td NOWRAP>cel3</td><td>Help</td><td NOWRAP>cel4</td>"

$aReturn = StringRegExp($sString, '(?s)(?i)<td NOWRAP>(.*?)</td>', 3)

_ArrayDisplay($aReturn)

thnx.

Share this post


Link to post
Share on other sites

I assume you use Internet Explorer as browser. Then you could use the builtin IE UDF, fucntion _IETableWriteToArray, to read the content of a table into an array (for further processing).


My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2017-04-18 - Version 1.4.8.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX (NEW 2017-02-27 - Version 1.3.1.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2015-04-01 - Version 0.4.0.0) - Download - General Help & Support - Example Scripts
Excel - Example Scripts - Wiki
Word - Wiki
PowerPoint (2015-06-06 - Version 0.0.5.0) - Download - General Help & Support

Tutorials:
ADO - Wiki

 

Share this post


Link to post
Share on other sites

Thnx I will try that

Share this post


Link to post
Share on other sites

#include <Array.au3>

$sString = "<td NOWRAP>cel1</td><td NOWRAP>cel2</td><td NOWRAP>cel3</td><td>Help</td><td NOWRAP>cel4</td>"

$aReturn = StringRegExp($sString, '(?s)(?i)<td(?: NOWRAP)?>(.*?)</td>', 3)
_ArrayDisplay($aReturn)

;orelse to get only the Help

$aReturn = StringRegExp( $sString , '<td>(.*?)</td>', 3 )
_ArrayDisplay($aReturn)
Ask if you don't get the code


My code:

PredictText: Predict Text of an Edit Control Like Scite. Remote Gmail: Execute your Scripts through Gmail. StringRegExp:Share and learn RegExp.

Run As System: A command line wrapper around PSEXEC.exe to execute your apps scripts as System (LSA). Database: An easier approach for _SQ_LITE beginners.

MathsEx: A UDF for Fractions and LCM, GCF/HCF. FloatingText: An UDF for make your text floating. Clipboard Extendor: A clipboard monitoring tool. 

Custom ScrollBar: Scroll Bar made with GDI+, user can use bitmaps instead. RestrictEdit_SRE: Restrict text in an Edit Control through a Regular Expression.

Share this post


Link to post
Share on other sites

Super, it works. I do not get the code, but that is my restriction :-)

Share this post


Link to post
Share on other sites

winhek,

Here's a couple alternatives

#include <Array.au3>
$sString = "<td NOWRAP>cel1</td><td NOWRAP>something before Help1</td><td NOWRAP>help,once again</td><br><td>Help</td><td NOWRAP>cel4</td>"
; to get everything that is not HTML
$aReturn = StringRegExp($sString, '(?si)>([^<].*?)<', 3)
 _ArrayDisplay($aReturn, 'All NON-HTML')
 ; get any non-HTML that begins with the string "help"
 $aReturn = StringRegExp($sString, '(?s)(?i)(help.*?)<', 3)
 _ArrayDisplay($aReturn,'Help Only')
 ;==================================================================================
 ;
 ; REGEXP Experts - How would I get get any non-HTML that contains the string "help"
 ;
 ; I've tried multiple variations of the following without success
 ;
 ;===================================================================================
 $aReturn = StringRegExp($sString, '(?si)>([^>].*?help.*?)<', 3)
 _ArrayDisplay($aReturn,'Help Only')

@SRE Experts - I can't figure out how to get the third example to work. I am trying to get any non-HTML containing a string.

kylomas


Forum Rules         Procedure for posting code

"I like pigs.  Dogs look up to us.  Cats look down on us.  Pigs treat us as equals."

- Sir Winston Churchill

Share this post


Link to post
Share on other sites

#7 ·  Posted (edited)

An Example

#include <Array.au3>
$sString = "<td NOWRAP>cel1</td><td NOWRAP>something before Help1</td><td NOWRAP>help,once again</td><td>Help</td><td NOWRAP>cel4</td>"
Local $a, $aReturn = StringRegExp($sString, '>([^<>]+)<', 4), $aRet[1]
For $i = 0 To UBound($aReturn) - 1
$a = $aReturn[$i]
If StringInStr($a[1], "help") Then _ArrayAdd($aRet, $a[1])
Next
_ArrayDelete($aRet, 0 )
_ArrayDisplay($aRet)

Direct Approach

#include <Array.au3>
$sString = "<td NOWRAP>cel1</td><td NOWRAP>something before Help1</td><td NOWRAP>help,once again</td><td>Help</td><td NOWRAP>cel4</td>"
$aReturn = StringRegExp($sString, '(?i)>([^<>]*?help[^<>]*?)<', 3)
_ArrayDisplay($aReturn)

Regards :)

Edited by PhoenixXL

My code:

PredictText: Predict Text of an Edit Control Like Scite. Remote Gmail: Execute your Scripts through Gmail. StringRegExp:Share and learn RegExp.

Run As System: A command line wrapper around PSEXEC.exe to execute your apps scripts as System (LSA). Database: An easier approach for _SQ_LITE beginners.

MathsEx: A UDF for Fractions and LCM, GCF/HCF. FloatingText: An UDF for make your text floating. Clipboard Extendor: A clipboard monitoring tool. 

Custom ScrollBar: Scroll Bar made with GDI+, user can use bitmaps instead. RestrictEdit_SRE: Restrict text in an Edit Control through a Regular Expression.

Share this post


Link to post
Share on other sites

#8 ·  Posted (edited)

@PhoenixXL,

I see it now. I was negating the "<" and ">", but then matching on any char "." (which is probably contradictory).

Thanks,

kylomas

edit: additional question

This pattern also works

'(?si)>([^<>]*?help.*?)<'

Because the "<" is the first char encountered following "help"???? Edited by kylomas

Forum Rules         Procedure for posting code

"I like pigs.  Dogs look up to us.  Cats look down on us.  Pigs treat us as equals."

- Sir Winston Churchill

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!


Register a new account

Sign in

Already have an account? Sign in here.


Sign In Now
Sign in to follow this  
Followers 0

  • Similar Content

    • ShawnW
      By ShawnW
      I have a script that takes a large excel file, pulls out and reorganizes certain information I need, and spits out a trimmed down csv file which I uses to upload the information on my website. Some of this information contains characters with accents or em dashes. By default it would create a csv file in ANSI which I then uploaded but had to tell my website import system it was windows-1252 in order for it to look correct.
      This was all working fine except now I need to add in a non-breaking space and non-breaking hyphen into parts of my output. At first I tried using ChrW(0xA0) and ChrW(0x2011) as replacements. A quick test in the console looked correct, however opening the csv output in notepad++ showed the space correctly but a ? for the hyphen and the file was still encoded as ANSI. I tried to view it as UTF-8 instead but this just made the space appear as xAO and also other characters appeared that way like my em dashes appeared as x97 and another symbol as xA7 etc.
      If I instead do a convert to UTF-8 from notepad++ then those problems go away except the hyphen still displays as ?. I then noticed on the page I linked for the non-breaking hyphen it lists the UTF-8 hex as 0xE2 0x80 0x91 (e28091). I was unsure how to enter this in autoit but several things i tried all failed to get the hyphen inserted.
      I need a way to get both the space and hyphen added correctly as either ANSI or UTF-8, but if it is UTF-8 then I need a way to convert all of the other data I extracted from the excel file.
      I've included a test excel file with a single line and test script to create a csv demonstrating the problem.
      test.xlsx
      test.au3
    • Miliardsto
      By Miliardsto
      Hello . How to do that
      $regexp = starts from "abcdef" and after this could be anything in name
      WinActivate($regexp)
    • Robinson1
      By Robinson1
      Well the plan is to use the power of regular expressions engine of AutoIT for patching binary data.
      Something like this: StringRegExp( $BinaryData,  "(?s)\x55\x8B.."
       
      <cut> ... Okay straight to question/problem
      ... certain bytes that are in the range from 0x80 to 0xA0 won't match.
      Hmm seem to be a char encoding problem. In detail these are 27 chars: 0x80, 0x82~8C, 0x8E, 0x91~9C, 0x9E,0x9F
      Here's a small code snippet to explore / explain this problem:
      #include "StringConstants.au3" $TestData = BinaryToString("0x7E7F808182") ;Okay $match = StringRegExp( $TestData ,'\x7E' ,$STR_REGEXPARRAYFULLMATCH) ConsoleWrite('@extended = ' & @extended & ' $match = ' & $match & @CRLF) ;Okay $match = StringRegExp( $TestData ,'\x7F' ,$STR_REGEXPARRAYFULLMATCH) ConsoleWrite('@extended = ' & @extended & ' $match = ' & $match & @CRLF) ;Error no match $match = StringRegExp( $TestData ,'\x80' ,$STR_REGEXPARRAYFULLMATCH) ConsoleWrite('@extended = ' & @extended & ' $match = ' & $match & @CRLF) ;Okay $match = StringRegExp( $TestData ,'\x81' ,$STR_REGEXPARRAYFULLMATCH) ConsoleWrite('@extended = ' & @extended & ' $match = ' & $match & @CRLF) ;Error no match $match = StringRegExp( $TestData ,'\x82' ,$STR_REGEXPARRAYFULLMATCH) ConsoleWrite('@extended = ' & @extended & ' $match = ' & $match & @CRLF) ;~ output: ;~ @extended = 2 $match = ;~ @extended = 3 $match = ;~ @extended = 0 $match = 1 ;~ @extended = 5 $match = ;~ @extended = 0 $match = 1 Hmm what to do? Go back and use the 'numberstring monster' implementation or just omit that range of 'unsafe bytes'. What is the root of this problem?
      Any idea how to fix this?
       
      Update: Okay I know a byte is not a character.
      But StringRegExp operates on String and so character level.
      Okay as long as you stay at Ansi encoding and only use /x00 - /X7F in the search pattern using  StringRegExp works well to search for binary data.
      What bytes can be matched that are in the range from /X7F - /xFF is also depending on the code page.
      So this avoid to search for bytes in the range from 0x80-0xa0 only applies to Germany.
      I just change this country setting:

      to Thai and now near all bytes from /X7F - /xFF fails to match.
    • RichardL
      By RichardL
      Text in a file, read into var with fileread:
      <> <> <> <> < J please look > <> <> <> Hi, 
      I want  a RegExp to select around 'please', back to the previous < and forward to the next >.  I can select the line of text.  Then I add in (?s) and it selects the whole text.  I think I want to make it not greedy, (?U) , that seems to make it ungreedy after, but it still selects all the previous lines.
      $sPattern = "(?s)<.*please.*>" ; 1 $sPattern = "(?s)<(?U).*please.*>" ; 2 $sPattern = "(?s)<(?U).*please(?U).*>" ; 3 $sAry = StringRegExp($sHTML, $sPattern, 3)  
    • fopetesl
      By fopetesl
      I have a csv file with delimiters "," and @CRLF
      After trying several different examples (simple ones!) I still haven't resolved an answer
      #include <Array.au3> #include <File.au3> ;#include "ArrayMultiColSort.au3" Global $BatchDir = "C:\ncat\" FileChangeDir($BatchDir) $sFile = "mod1.csv" ; Read file into a 2D array Global $aArray _FileReadToArray($sFile, $aArray, $FRTA_NOCOUNT, ",") ; And here it is _ArrayDisplay($aArray, "Original", Default, 8) Firstly it throws a MsgBox error "No array variable passed to function" _ArrayDisplay(), so no displayed data
      Second and probably more important how do I get _FileReadToArray() to split the imported array[][] into rows and columns?
      I tried "," & @CRLF without success.