
Replacing an escaped HTML character using StringRegExpReplace



I am trying to format some lines of imported HTML code which contain escaped characters, e.g.

Denial & # 38; Deprivation

(spaces introduced to allow the problem to be viewed)

should be

Denial & Deprivation

(A rather apt title, since I have been tearing my hair out on this!)

I have the following snippet of code:

$x = "Denial &#38; Deprivation"
$y = _FormatTitle($x)
MsgBox(0, "Why oh why doesn't this work!?!?!?!?", $y)

Func _FormatTitle($s_input)
    Local $s_output = StringRegExpReplace($s_input, "&#(\d{2,3});", "\1")
    Return $s_output
EndFunc

which extracts the decimal ASCII code of the ampersand quite nicely, but how on Earth do I replace that code with its equivalent character (&)?

I have tried StringRegExpReplace ( "test", "pattern", Chr("\1")) and StringRegExpReplace ( "test", "pattern", Chr(Eval("\1"))) and similar variants, but I cannot work this out for the life of me.

Please help!

Regards

DG


I'm not very good at this, but try...

$string = "Denial & # 38; Deprivation"

$res=StringRegExpReplace($string,"[!£$%^#;:'&*()]"," ")

$ss=StringSplit($res," ")
$nn=''
For $x = 1 To $ss[0]
    If $ss[$x] <> "" Then
        If StringIsDigit($ss[$x]) = 0 Then $nn &= $ss[$x] & " "
    EndIf
Next
$str=StringTrimRight($nn,1)
ConsoleWrite(">>>"&$str&"<<<"&@lf)
MsgBox(0,"",$str,0)

EDIT: just improved it

Edited by Aceguy


Not quite sure what you want... but this does what your example says:

$x = "DE & # 38; Dep"
$y = _FormatTitle($x)
MsgBox(0, "Why oh why doesn't this work!?!?!?!?", $y)

Func _FormatTitle($s_input)
    Local $s_output
    $s_output = StringRegExpReplace($s_input, "\s#\s([0-9])*;", "")
    Return $s_output
EndFunc

[A second code block followed here; its contents were lost to an encoding error.]

Does one of these work? If not, post here with more examples and we can help you better =)

Edited by Szhlopp

StringRegExpReplace baffles me too......

Getting there... how would I get rid of the extra spaces and leave just one?

$string = "Denial & # 38; Deprivation"

$res=StringRegExpReplace($string,"\W", " ",0)
$res=StringRegExpReplace($res,"\d"," ",0)

ConsoleWrite($res&@lf)
;lots of spaces up till this point, ************

$ss=StringSplit($res," ")
$nn=''
For $x = 1 To $ss[0]
    If $ss[$x] <> "" Then
        $nn &= $ss[$x] & " "
    EndIf
Next
$str=StringTrimRight($nn,1)
ConsoleWrite(">>>"&$str&"<<<"&@lf)
MsgBox(0,"",$str,0)
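
On the question of collapsing the extra spaces into one: a minimal sketch, assuming the goal is simply a single space between words. The sample string and the variable names $one and $two are made up for illustration. Both StringRegExpReplace with "\s+" and the built-in StringStripWS should do this without the split-and-rejoin loop.

```autoit
; Sample input with runs of spaces, similar to what the "\W" replacement leaves behind
$res = "Denial     38   Deprivation"

; Option 1: replace every run of whitespace with a single space
$one = StringRegExpReplace($res, "\s+", " ")
ConsoleWrite(">>>" & $one & "<<<" & @LF)

; Option 2: StringStripWS with flag 7 = 1 (leading) + 2 (trailing) + 4 (doubled spaces between words)
$two = StringStripWS("  " & $res & "  ", 7)
ConsoleWrite(">>>" & $two & "<<<" & @LF)
```

Option 1 leaves a leading or trailing space in place if the input has one; option 2 trims those as well.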

This takes all the fun out of it, but check this out: #301771

Dale


Hi

I don't think I explained myself very well (as usual ;)), and it's difficult to describe because I'm using escaped characters, which don't appear in the posts!

Anyway, the example string is "Denial & #38; Deprivation" (I have to introduce the space between the '&' and the '#', otherwise you only see '&').

When I use

StringRegExpReplace($s_input, "&#(\d{2,3});", "\1")
this nicely returns "Denial 38 Deprivation", but the problem I'm having is the syntax for converting the 38 to Chr(38) so that it returns '&'.

I have tried different forms, e.g. ..., Chr("\1") or ..., Eval(Chr("\1")), but nothing seems to work.

What am I doing wrong???

Regards

DG


$string = "Denial & # 38; Deprivation"

$res=StringRegExpReplace($string,"\W", " ",0)
$res=StringReplace($res,"38","&")

ConsoleWrite($res&@lf)
;lots of spaces up till this point, ************

$ss=StringSplit($res," ")
$nn=''
For $x = 1 To $ss[0]
    If $ss[$x] <> "" Then
        $nn &= $ss[$x] & " "
    EndIf
Next
$str=StringTrimRight($nn,1)
ConsoleWrite(">>>"&$str&"<<<"&@lf)
MsgBox(0,"",$str,0)

Edited by Aceguy

I've found a very similar thread describing exactly the same problem, and it turns out you can't pass functions to StringRegExpReplace itself!

Similar Thread

So, I need to go back to the drawing board.

Many thanks for your help on this.

DG
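
A small sketch of why Chr("\1") never sees the captured number: as far as I can tell, AutoIt evaluates the arguments before StringRegExpReplace runs, so a function in the replacement slot is called once, with the literal text \1, and only its return value reaches the regex engine. The helper _Show is hypothetical, written just to make that visible.

```autoit
; Hypothetical helper: reports what it was called with and passes the value through
Func _Show($v)
    ConsoleWrite("called with: " & $v & @LF)
    Return $v
EndFunc

; _Show runs once, before any matching happens, and receives the literal text \1
$r = StringRegExpReplace("Denial &#38; Deprivation", "&#(\d{2,3});", _Show("\1"))

; The returned "\1" is then used as an ordinary back-reference
ConsoleWrite($r & @LF) ; Denial 38 Deprivation
```

Chr("\1") behaves the same way: it is handed the two-character string \1 rather than 38, so the conversion has to happen outside the replacement argument.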

Obviously this would need some modifying, but here you go ;)

$String = "Escaped &#38; Chars"
$regexp = StringRegExp($String, "&(.*);", 1)
$regexp = Chr(StringTrimLeft($regexp[0], 1))
$regexpRep = StringRegExpReplace($String, "&#(\d{2,3});", $regexp)
MsgBox(0, "", $regexpRep)

I've forgotten how all the escaped characters are formatted, so you would have to fix the regexp/regexrep commands to make it work with all the escaped character sets. But this is the basic idea of how to find one and replace it easily=)

Edited by Szhlopp

Well, I've now moved away from the StringRegExpReplace route, and come up with this, which does the trick:

$x = "Witness: Truth &#38; Lies"
$y = _FormatTitle($x)
MsgBox(0, "Oh Joy! It Does Work!", $y)

Func _FormatTitle($s_input)
    ; Start from the input so a string without entities is returned unchanged
    Local $s_output = $s_input

    For $i = 32 To 255
        ; StringReplace leaves the string alone when "&#NN;" is absent
        $s_output = StringReplace($s_output, "&#" & $i & ";", Chr($i))
    Next
    Return $s_output
EndFunc

It's just a pity that StringRegExpReplace doesn't accept functions - this would have been by far the more elegant route to take.

Regards

DG
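
A possible middle ground between the regex route and the 32-to-255 loop, offered as a sketch rather than a definitive fix: let StringRegExp collect only the codes that actually occur, then call Chr on each one. The function name _DecodeEntities and its variable names are made up for illustration, and the sketch assumes plain decimal entities such as &#38; with codes up to 255.

```autoit
Func _DecodeEntities($s_input)
    Local $s_output = $s_input

    ; Flag 3 returns an array of every captured group, i.e. each number between "&#" and ";"
    Local $a_codes = StringRegExp($s_input, "&#(\d{2,3});", 3)
    If @error Then Return $s_output ; no entities found, nothing to do

    For $i = 0 To UBound($a_codes) - 1
        ; Chr only covers 0 to 255, so leave larger codes untouched
        If Number($a_codes[$i]) > 255 Then ContinueLoop
        $s_output = StringReplace($s_output, "&#" & $a_codes[$i] & ";", Chr($a_codes[$i]))
    Next
    Return $s_output
EndFunc

MsgBox(0, "Decoded", _DecodeEntities("Witness: Truth &#38; Lies"))
```

This keeps the conversion outside the replacement argument, which is where the earlier Chr("\1") attempts went wrong, and it only loops over the entities present in the string.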

Edited by DobraGolonka