bartekd

php characters

6 posts in this topic

I am working with an API in another piece of software, and I need to get the source of an HTML file and put it in a PHP file. I have everything working as intended, but I keep coming across characters that PHP doesn't like, and the script stops there. Is there a script I can use to catch all the characters that PHP doesn't like? This is part of how I am doing it now: I add another line like these every time I come across a character that makes the script stop. FYI, $sHTML is the source code of the page.

$sHTML = StringReplace($sHTML, '"', "'")
$sHTML = StringReplace($sHTML, '“', "'")
$sHTML = StringReplace($sHTML, '’', "'")
$sHTML = StringReplace($sHTML, '”', "'")
$sHTML = StringReplace($sHTML, 'à', "a")
$sHTML = StringReplace($sHTML, 'â', "a")
$sHTML = StringReplace($sHTML, 'ä', "a")
$sHTML = StringReplace($sHTML, 'è', "e")
$sHTML = StringReplace($sHTML, 'ê', "e")
$sHTML = StringReplace($sHTML, 'é', "e")
$sHTML = StringReplace($sHTML, 'ë', "e")
$sHTML = StringReplace($sHTML, 'î', "i")

#2 ·  Posted (edited)

Or...

StringRegExpReplace($sData, '[^²&"''\(-\)=°\+~\\#\{\[\|`\^@\]\}$\*\?,:!\.%/\w]', "")

This removes special letters and non-Latin characters: anything not listed inside the brackets is replaced with an empty string.
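A minimal sketch of how the call above might be used, assuming a made-up sample string ($sSample, $sClean and $iRemoved are illustrative names, not from the original post). StringRegExpReplace returns the new string rather than changing its argument, so the result has to be assigned:

```autoit
; Sketch only: $sSample is a hypothetical test string.
Local $sSample = 'café "menu"'
Local $sPattern = '[^²&"''\(-\)=°\+~\\#\{\[\|`\^@\]\}$\*\?,:!\.%/\w]'

; The function returns the cleaned string; the input variable is left unchanged.
Local $sClean = StringRegExpReplace($sSample, $sPattern, "")

; After the call, @extended holds the number of replacements performed.
Local $iRemoved = @extended

ConsoleWrite($sClean & " (" & $iRemoved & " replacements)" & @CRLF)
```

Note that any character missing from the bracketed list is dropped, so it is worth checking the output against a real page before relying on it.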

Br, FireFox.

Edited by FireFox

 

OS : Win XP SP2 (32 bits) / Win 7 SP1 (64 bits) / Win 8 (64 bits) | Autoit version: latest stable / beta.
Hardware : Intel(R) Core(TM) i5-2400 CPU @ 3.10Ghz / 8 GiB RAM DDR3.

My UDFs : Skype UDF | TrayIconEx UDF | GUI Panel UDF | Excel XML UDF | Is_Pressed_UDF

My Projects : YouTube Multi-downloader | FTP Easy-UP | Lock'n | WinKill | AVICapture | Skype TM | Tap Maker | ShellNew | Scriptner | Const Replacer | FT_Pocket | Chrome theme maker

My Examples : Capture tool | IP Camera | Crosshair | Draw Captured Region | Picture Screensaver | Jscreenfix | Drivetemp | Picture viewer

My Snippets : Basic TCP | Systray_GetIconIndex | Intercept End task | Winpcap various | Advanced HotKeySet | Transparent Edit control

 


#3 ·  Posted (edited)

Doesn't seem to work for me. I still see letters like é etc... Do I need to format it differently before using that?

#include <IE.au3>

$IEFile = "c:\test"
Local $oIE = _IECreate($IEFile & ".htm")
Local $sHTML = _IEDocReadHTML($oIE)
_IEQuit($oIE)
StringRegExpReplace($sHTML, '[^²&"''\(-\)=°\+~\\#\{\[\|`\^@\]\}$\*\?,:!\.%\w]', "")

Edited by bartekd


Never mind, I forgot to put "$sHTML = " before your code.

That doesn't seem to work because it takes out the important characters of the source (like the slashes etc). Any other ideas?


That doesn't seem to work because it takes out the important characters of the source (like the slashes etc). Any other ideas?

It's so hard to add a slash to the pattern.... edited my post.
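If listing every allowed character becomes tedious, one alternative is to invert the idea and remove only what falls outside the 7-bit ASCII range. This is a sketch of a different technique than the pattern posted above, and the sample input is made up for illustration:

```autoit
; Sketch, not from the thread: strip every character outside 7-bit ASCII.
; All ASCII characters (slashes, angle brackets, spaces, quotes) are left untouched.
Local $sHTML = '<p class="x">caf' & ChrW(233) & '</p>' ; hypothetical sample input

; [^\x00-\x7F] matches any character whose code point is above 127.
$sHTML = StringRegExpReplace($sHTML, '[^\x00-\x7F]', '')

ConsoleWrite($sHTML & @CRLF)
```

Unlike the StringReplace lines in the first post, this deletes accented letters instead of substituting a plain letter, so it only fits when losing those characters is acceptable.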

 


OK Thanks Firefox
