david1337

StringReplace special characters in htm file

7 posts in this topic

Hey guys

Can anyone help me understand this? :)

$szFile = "test.htm"

$szText = FileRead($szFile)


$szText = StringReplace($szText, "hello", "ö")

FileDelete($szFile)
FileWrite($szFile,$szText)

If the text in "test.htm" is changed to something containing non-US characters, in this example "ö", the output is "Ã¶" when shown in a browser.
If I manually change the text in the "test.htm" file to "ö", the output in the browser is "ö"!
In both cases, if the htm file is opened in Notepad, the content is just "ö", but the one changed by the script still shows as "Ã¶" in a browser. How weird is this?

I am aware that I can replace the text with "&ouml;", which is the HTML code for "ö", and then the output is correct in the browser, but this is just dumb when there are a lot of characters to be changed :)


Does anyone know why this happens, and how to solve it in a simpler way?

 




You're seeing UTF8 encoding of characters. Change the html header to indicate UTF8.
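For illustration, here is a minimal sketch of what "seeing UTF8 encoding" means. "ö" (U+00F6) is stored in UTF-8 as the two bytes C3 B6, and a reader that assumes a single-byte ANSI/ISO charset turns each byte into its own character. The variable names are made up for this example, and the result of the ANSI decode depends on the system code page.

```autoit
#include <StringConstants.au3>

; ChrW(246) is "ö"; using the code point avoids depending on the script file's own encoding.
Local $sChar = ChrW(246)

; The bytes a UTF-8 file stores for this character.
Local $bUtf8 = StringToBinary($sChar, $SB_UTF8)
ConsoleWrite("UTF-8 bytes: " & $bUtf8 & @CRLF)

; Decode the same bytes as ANSI, roughly what a browser does when the page declares an ISO/ANSI charset.
Local $sMisread = BinaryToString($bUtf8, $SB_ANSI)
ConsoleWrite("Characters after misreading: " & StringLen($sMisread) & @CRLF)
```

On a Western European (Windows-1252) system the misread string should be the two characters "Ã¶", which matches what the browser shows.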


This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

27 minutes ago, jchd said:

You're seeing UTF8 encoding of characters. Change the html header to indicate UTF8.

Are you sure it's the header? I think FileWrite is using the wrong mode.

@david1337: so test this first:

$szFile = "test.htm"

$szText = FileRead($szFile)


$szText = StringReplace($szText, "hello", "ö")
$hFile=FileOpen($hFile,$FO_OVERWRITE + $FO_UTF8)
FileWrite($hFile,$szText)
FileClose($hFile)

 


Either you leave the header as ISO and use FileOpen with the ANSI mode to write the file,
or you switch to full Unicode and change the header to UTF8.

The latter solution is universal; the former is not.
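A rough sketch of the "full Unicode" option, under some assumptions: the page declares its charset as "iso-8859-1" in a meta tag (the thread does not show the actual header), and the file name is the one from the first post. The variable names are only illustrative.

```autoit
#include <FileConstants.au3>

Local $sFile = "test.htm" ; same file name as in the first post
Local $sText = FileRead($sFile)

; Assumption: the header contains "charset=iso-8859-1". StringReplace is case-insensitive by default.
$sText = StringReplace($sText, "charset=iso-8859-1", "charset=utf-8")
If @extended = 0 Then ConsoleWrite("No ISO charset declaration found; header left unchanged." & @CRLF)

; The replacement from the original script; ChrW(246) is "ö".
$sText = StringReplace($sText, "hello", ChrW(246))

; Write the result as UTF-8 so the bytes match the declared charset.
Local $hFile = FileOpen($sFile, $FO_OVERWRITE + $FO_UTF8_NOBOM)
If $hFile = -1 Then
    ConsoleWrite("Could not open " & $sFile & " for writing." & @CRLF)
    Exit 1
EndIf
FileWrite($hFile, $sText)
FileClose($hFile)
```

If the header declares a different charset, or none at all, the first replacement finds nothing and the header has to be edited by hand.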


#5 ·  Posted (edited)

Hey guys, thanks for your answers.

2 hours ago, AutoBert said:

Are you sure it's the header? I think FileWrite is using the wrong mode.

@david1337: so test this first:

$szFile = "test.htm"

$szText = FileRead($szFile)


$szText = StringReplace($szText, "hello", "ö")
$hFile=FileOpen($hFile,$FO_OVERWRITE + $FO_UTF8)
FileWrite($hFile,$szText)
FileClose($hFile)

 

Your code gives an error. (Am I missing an include or something, so that the code knows what "$FO_UTF8" is?)
==> Variable used without being declared.:
$hFile=FileOpen($hFile,$FO_OVERWRITE + $FO_UTF8)
$hFile=FileOpen(^ ERROR

 

2 hours ago, jchd said:

Either you leave the header as ISO and use FileOpen with the ANSI mode to write the file,
or switch to full Unicode and switch the header to UTF8.

The latter solution is universal; the former is not.

You are correct. Changing the HTML header to UTF-8 fixed the problem :)
But what if I do the same thing with a txt file, and open that in a web browser? Then I have the same problem, and I can't add a header to a txt file.
 

Edited by david1337

4 minutes ago, david1337 said:

Your code gives an error. (Am I missing an include or something, so that the code knows what "$FO_UTF8" is?)

Yes, an include is missing, but there's also a second error (a typo: FileOpen was given $hFile instead of $szFile):

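#include <FileConstants.au3> ; defines $FO_OVERWRITE, $FO_UTF8 and $FO_ANSI

$szFile = "test.htm"
$szText = FileRead($szFile)
$szText = StringReplace($szText, "hello", "ö")

; Open by file name ($szFile), overwrite the old content and write as UTF-8
$hFile = FileOpen($szFile, $FO_OVERWRITE + $FO_UTF8)
FileWrite($hFile, $szText)
FileClose($hFile)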

and maybe you need the other mode:

$hFile = FileOpen($szFile, $FO_OVERWRITE + $FO_ANSI)

 


#7 ·  Posted (edited)

56 minutes ago, AutoBert said:

 

#include <FileConstants.au3>

$szFile = "test.htm"

$szText = FileRead($szFile)


$szText = StringReplace($szText, "hello", "ö")
$hFile=FileOpen($szFile,$FO_OVERWRITE + $FO_UTF8)
FileWrite($hFile,$szText)
FileClose($hFile)

 

AutoBert, this was exactly what I was looking for, and it worked perfectly! Thanks a lot :)
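A likely reason this also covers the .txt case: as far as I understand the AutoIt help, $FO_UTF8 writes a UTF-8 byte order mark (the bytes EF BB BF) at the start of the file, and browsers can use that mark to detect the encoding even without an HTML header. Below is a small sketch to check whether the mark is there, assuming the file written by the script above exists in the working directory. The variable names are only illustrative.

```autoit
#include <FileConstants.au3>

Local $sFile = "test.htm" ; file written by the script above
Local $hFile = FileOpen($sFile, $FO_READ + $FO_BINARY)
If $hFile = -1 Then
    ConsoleWrite("Could not open " & $sFile & @CRLF)
    Exit 1
EndIf

; In binary mode FileRead returns raw bytes; read only the first three.
Local $bHead = FileRead($hFile, 3)
FileClose($hFile)

; String() of binary data yields a hex string such as "0xEFBBBF".
If String($bHead) = "0xEFBBBF" Then
    ConsoleWrite("UTF-8 BOM found" & @CRLF)
Else
    ConsoleWrite("No UTF-8 BOM, first bytes: " & $bHead & @CRLF)
EndIf
```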

Edited by david1337


