goldenix

[Solved] Can autoit process &#12354 ; ?

Recommended Posts

goldenix

I have a download manager that downloaded a large number of files.

The file names look like this (see below).

I had to add a space before each semicolon, otherwise the browser converts my text into Japanese.

I can't rename them manually because it would take forever.

How can I convert these strings into Japanese the way the browser does, so I can rename my files? What options do I have?

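&#12354 ;&#12427 ;&#12414 ;&#12376 ;&#12429 ;&#12358 ;&#21029 ;&#12473 ;&#12461 ;&#12515 ;&#12531 ;.jpg (displayed by the browser as あるまじろう別スキャン.jpg)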

My Projects:
Guide - ytube step by step tut for reading memory with autoitscript + samples
WinHide - tool to show hide windows, Skinned With GDI+
Virtualdub batch job list maker - Batch Process all files with same settings
Exp calc - Exp calculator for online games
Automated Microsoft SQL Server 2000 installer
Image sorter helper for IrfanView - 1 click opens img & move ur mouse to close opened img

goldenix

Quote: "Well, if it converts it to Japanese when you don't add spaces, then the easiest way to convert it to Japanese would be not to add spaces."

The browser does this, but the page source stays the same. The question is: how do I get what I see, not what is in the page source?


doudou

Try:

$img = $htmlDoc.images("someID")
$fileName = $img.getAttribute("src")
MsgBox(0, "Hi Nippon", "File name is: " & $fileName)

UDFS & Apps:


DDEML.au3 - DDE Client + Server
Localization.au3- localize your scripts
TLI.au3 - type information on COM objects (TLBINF emulation)
TLBAutoEnum.au3 - auto-import of COM constants (enums)
AU3Automation - export AU3 scripts via COM interfaces
TypeLibInspector

- OleView was yesterday

Coder's last words before final release: WE APOLOGIZE FOR INCONVENIENCE 

goldenix

This is only half of the code; I don't understand how to use it. What is someID? What is $htmlDoc? Must it be the file name?


doudou

You haven't posted any code yet; to build a common understanding it would help to see some. I assumed you were using MSHTML.HTMLDocument; that is what $htmlDoc is. If I am wrong, show what you are actually doing.

goldenix

I took your idea and made this for now, but I don't like it. I was hoping to do it quietly with your code, without using IE.

MSHTML.HTMLDocument: I don't have the slightest clue how to use it; I've never seen it before. And what is someID?

#include <IE.au3>

; Open the local HTML page in IE
$oIE = _IECreate("file:///C:/Documents%20and%20Settings/biteme/Desktop/New%20AutoIt%20v3%20Script.html")
; Collect all <img> elements; @extended holds the number of images
$oImgs = _IEImgGetCollection($oIE)
$iNumImg = @extended

For $oImg In $oImgs
    ConsoleWrite($oImg.src & @CRLF)

    ; Keep everything after the last '/' as the file name
    $split = StringSplit($oImg.src, '/', 1)
    $filename = $split[$split[0]]
    ConsoleWrite($filename & @CRLF)

    ; 'xxx...x.rar' is a placeholder for the original (entity-named) file
    FileMove('xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.rar', _
            $filename)
Next

_IEQuit($oIE)
Exit

Edit: OK, I had just never seen this before. I think I figured it out, but I still need to create an IE window, so it's the same as what I made. I guess I'll just have to list all the files in one HTML file and loop over them. If I'm wrong, feel free to correct me.

Thanks.

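; Drive Internet Explorer through COM directly (no IE.au3 UDF)
$ObjIE = ObjCreate("InternetExplorer.Application")
With $ObjIE
    .Visible = True
    .Navigate("C:\Documents and Settings\biteme\Desktop\New AutoIt v3 Script.html")

    ; Wait until the page has finished loading (ReadyState 4 = complete)
    While .ReadyState <> 4
        Sleep(50)
    WEnd
EndWith

; Read the src attribute of the first <img> on the page
$document = $ObjIE.document
$img = $document.getElementsByTagName("img").item(0)
$fileName = $img.getAttribute("src")

; Keep everything after the last '/' as the file name
$split = StringSplit($fileName, '/', 1)
$fileName = $split[$split[0]]

ConsoleWrite($fileName & @CRLF)

; Rename the entity-named file on disk to the decoded name
FileMove('&#12354.rar', $fileName)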
jchd

The issue is simple: the sequence &#12345; is the HTML way to denote Unicode code point 12345 (decimal). It's much like "&amp;" encoding the ampersand character "&" itself.

So when you get such a sequence, use a regexp to turn it into ChrW(12345), then Execute the result, and it should work.

Try this:

Local $s = 'Unicode html string with some fractional ((2n+1)/8) html codepoints &#8539;=1/8 &#8540;=3/8 &#8541;=5/8 &#8542;=7/8'
$s = Execute('"' & StringRegexpReplace($s, "&#(\d+);", '"&chrw($1)&"') & '"')
MsgBox(0, "Html Unicode test", $s & @LF)
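One caveat with the one-liner above, not raised in the thread: it wraps the whole text in double quotes before calling Execute, so a literal double quote in the input can end the string early, and Execute then either fails or returns a mangled result. A minimal sketch of a guarded version, assuming plain-text input: it first doubles every `"`, which is how AutoIt writes a quote inside a double-quoted string. The helper name `_HtmlDecimalDecode` is hypothetical.

```autoit
; Hypothetical helper: decode decimal HTML references such as &#12354; using the
; StringRegExpReplace + Execute trick, with double quotes escaped first.
Func _HtmlDecimalDecode($sText)
    ; In an AutoIt "..." literal, "" stands for a single " character
    Local $sEscaped = StringReplace($sText, '"', '""')
    ; Turn each &#NNN; into "&ChrW(NNN)&" and wrap the whole text in quotes,
    ; giving one string expression for Execute to evaluate
    Local $sExpr = '"' & StringRegExpReplace($sEscaped, "&#(\d+);", '"&ChrW($1)&"') & '"'
    Return Execute($sExpr)
EndFunc

; Usage: the embedded quotes survive and both references are decoded
ConsoleWrite(_HtmlDecimalDecode('He said "&#12354;" &#169; 2009') & @CRLF)
```

Without the escaping step, an input such as `a"&"b` would be evaluated as an expression and come back as `ab`; with it, the text is returned unchanged.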

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

KaFu

Really nice solution! I remembered that you had written something similar a few weeks ago, but I couldn't get it to work in this case :)

jchd

Which version/target string did you try? I use this about 100,000 times a day and Execute has never gone on strike. (OK, I admit I bribe it with plenty of $nnn with nnn > 0.)


goldenix

Oh, I see, it works. If you don't mind, could you please explain:

Why do we need Execute?

And what are these for? I don't understand them.

I can't grasp the logic here:

(\d+)

"&chrw($1)&"

StringRegexpReplace($s, "&#(\d+);", '"&chrw($1)&"')
doudou

People, I don't get it: why bother with RegExp when the MSHTML DOM's getAttribute() gives you all entities decoded to the system codepage for free? RegExp not only consumes far more resources but has also proven unreliable for decoding entities, especially because you have to catch every possible form and encoding: "&named;", "&#12345;" (16 bit), "&#12;&#12;" (8 bit), etc.
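The decimal-only pattern "&#(\d+);" indeed leaves other forms untouched: the hexadecimal form (&#x3042;) and named entities (&amp;, &copy;). A minimal sketch that extends the same Execute approach to the hexadecimal form, under the assumption that only numeric references need decoding; named entities are deliberately left alone, and the helper name `_HtmlNumericDecode` is hypothetical.

```autoit
; Hypothetical helper: decode both numeric forms of HTML character references,
; decimal (&#12354;) and hexadecimal (&#x3042; or &#X3042;).
; Named entities such as &amp; or &copy; are NOT handled and are left as-is.
; ChrW only accepts values 0..65535, so code points above that are not decoded correctly.
Func _HtmlNumericDecode($sText)
    ; Escape double quotes so the text can sit inside a "..." literal
    Local $sExpr = StringReplace($sText, '"', '""')
    ; Decimal form: &#12354;  ->  "&ChrW(12354)&"
    $sExpr = StringRegExpReplace($sExpr, "&#(\d+);", '"&ChrW($1)&"')
    ; Hexadecimal form: &#x3042;  ->  "&ChrW(Dec("3042"))&"
    $sExpr = StringRegExpReplace($sExpr, "&#[xX]([0-9A-Fa-f]+);", '"&ChrW(Dec("$1"))&"')
    ; Evaluate everything in one Execute call, so nothing is decoded twice
    Return Execute('"' & $sExpr & '"')
EndFunc

ConsoleWrite(_HtmlNumericDecode("&#12354;&#x308B;&#X307E;.jpg") & @CRLF)
```

Running both replacements before a single Execute means a decoded "&" can never combine with following text into a new reference.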

jchd

@Doudou,

The OP didn't reference any _IE functions at all. I also don't believe it consumes many resources, and I hardly see how "&#(\d+);" could possibly match anything other than an HTML Unicode code point (either that, or HTML grammar is flawed beyond repair, which would have been noticed before, BTW).

Since you mention it, decoding HTML code points into the system codepage sounds weird to me. Do you mean it would decode characters in the range 0x80..0xFF according to the codepage, but any code point above that as Unicode? That would be wrong, very wrong.

I agree that regexps are not the only truth on earth, but when specific issues like this one arise they prove handy, nothing less.

@goldenix,

No black magic involved. Let's take it in small pieces. The sequences you mentioned consist of an ampersand, followed by a sharp sign (#), followed by a decimal value, followed by a semicolon. The goal is to pick up the decimal value and feed it to ChrW(), as you would in a simple AutoIt statement such as Local $char = ChrW(169) to produce a copyright sign.

If we stick to this task, one simple way to do it is with a regular expression: StringRegExpReplace($s, "&#(\d+);", '"&chrw($1)&"')

The pattern is &#(\d+); where

& matches an ampersand

# matches a sharp sign

( ) the parentheses form a capturing group, the first one, so it can be referred to later as $1

\d+ is what we capture: \d stands for a decimal digit and + means one or more

; matches a semicolon

In short, the pattern recognizes the sequence we are after and captures the decimal string within it under the "name" $1. Now let's look at the replace part.
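To make the capture concrete, a small illustration (the sample string is an assumption) that lists what the pattern stores in $1 for each match. It uses StringRegExp with flag 3, which returns every match; with a single capturing group, each array element is the text that group captured.

```autoit
; Illustration: what "&#(\d+);" captures as $1 in a sample string
Local $sSample = 'abc&#123;def&#169;ghi'
; Flag 3 returns an array of all global matches; since the pattern has one
; capturing group, each element is the text that group captured (the $1 value)
Local $aCaptured = StringRegExp($sSample, "&#(\d+);", 3)
For $i = 0 To UBound($aCaptured) - 1
    ConsoleWrite("$1 = " & $aCaptured[$i] & "  ->  ChrW gives " & ChrW(Number($aCaptured[$i])) & @CRLF)
Next
```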

StringRegExpReplace will ... replace the matched parts of the string with the replacement pattern, which is "&chrw($1)&"

We've seen that $1 contains the decimal value we want, and we want to feed it to ChrW(). That's what the replacement pattern does! But it only builds the text of that call by concatenating strings; nothing is evaluated at this point.

Now we need to enclose the whole string returned by StringRegExpReplace in double quotes. Why? Because StringRegExpReplace cannot call ChrW(<value>) itself; it only inserts the text of the call, and the pieces around that text need opening and closing quotes to be valid AutoIt syntax:

Local $s = 'abc&#123;def' ;; ChrW(123) = '{'

$s = StringRegExpReplace($s, "&#(\d+);", '" & ChrW($1) & "') ;; I put whitespace to make things clearer

will produce exactly abc" & ChrW(123) & "def, so you can see why we need the enclosing quotes: to obtain "abc" & ChrW(123) & "def", which is now valid syntax. The final bit is to execute this very statement, as if you had typed it in a line of source, and assign the result to a variable (which can still be $s): $s = Execute('"abc" & ChrW(123) & "def"') produces $s = "abc{def", which is what we wanted.
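The three steps can be traced one at a time. A short sketch on the same sample string that prints each intermediate value; the step variable names are only illustrative.

```autoit
; Trace each step of the one-liner on the sample string 'abc&#123;def'
Local $s = 'abc&#123;def'
; Step 1: rewrite &#123; into the text of a ChrW() call
Local $sStep1 = StringRegExpReplace($s, "&#(\d+);", '" & ChrW($1) & "')
ConsoleWrite($sStep1 & @CRLF)   ; abc" & ChrW(123) & "def
; Step 2: add the enclosing quotes so the text is a valid AutoIt expression
Local $sStep2 = '"' & $sStep1 & '"'
ConsoleWrite($sStep2 & @CRLF)   ; "abc" & ChrW(123) & "def"
; Step 3: evaluate that expression
Local $sStep3 = Execute($sStep2)
ConsoleWrite($sStep3 & @CRLF)   ; abc{def
```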

For compactness, and once such constructs no longer surprise you, you can chain them in a single statement as I did.
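Applied to the original problem, the same chained statement can rename the downloaded files directly, with no browser involved. A minimal sketch, assuming the files sit in one folder and use only the decimal form; the helper name `_RenameEntityFiles` and the example path are hypothetical, and it is worth trying it on a copy of the folder first.

```autoit
; Hypothetical helper: renames every file in $sFolder whose name contains decimal
; HTML references (e.g. "&#12354;&#12427;.jpg") to its decoded Unicode name.
; Returns the number of files renamed.
Func _RenameEntityFiles($sFolder)
    Local $iRenamed = 0, $sName, $sNew
    ; Only look at names containing "&#" (wildcards are * and ?, so & and # match literally)
    Local $hSearch = FileFindFirstFile($sFolder & "\*&#*")
    If $hSearch = -1 Then Return 0 ; nothing to do
    While 1
        $sName = FileFindNextFile($hSearch)
        If @error Then ExitLoop         ; no more files
        If @extended Then ContinueLoop  ; skip sub-folders
        ; The one-liner from this thread; Windows file names cannot contain '"',
        ; so no quote escaping is needed here
        $sNew = Execute('"' & StringRegExpReplace($sName, "&#(\d+);", '"&ChrW($1)&"') & '"')
        If $sNew <> "" And Not ($sNew == $sName) Then
            ; Flag 0: never overwrite an existing file with the same decoded name
            If FileMove($sFolder & "\" & $sName, $sFolder & "\" & $sNew, 0) Then $iRenamed += 1
        EndIf
    WEnd
    FileClose($hSearch)
    Return $iRenamed
EndFunc

; Example (the path is an assumption):
; ConsoleWrite(_RenameEntityFiles("C:\Downloads") & " file(s) renamed" & @CRLF)
```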

