drtrann

Russian string causing issues

13 posts in this topic

#1 ·  Posted (edited)

Hey guys,

so I have a rather large and complicated parsing script that has been working flawlessly for the last few months. Recently a few clients asked for Russian support in the parser, and since I added it, it has caused multiple issues across the whole script, not just in the function it's contained in.

For $i = 1 To $aLog[0]
    If $iniRUclient = 1 Then
        ; split the log line; each character of "] " is treated as a delimiter
        $aRu_Split = StringSplit($aLog[$i], "] ")

        If $aRu_Split[5] = "в" Then
            ; post parse procedure
            ExitLoop
        EndIf
    Else
        $aFirst_Split = StringSplit($aLog[$i], ":")
        ; post parse procedure
    EndIf
Next

My question is less about how to fix it and more about why the hell this is happening. It has been increasingly erratic lately: every few times I restart the script it works properly, then it comes up with a new way of not functioning properly.

Edit: note that the в in the $aRu_Split comparison is not a Latin B but a similar-looking Russian (Cyrillic) character.
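A side note on the split itself, going by the StringSplit documentation: with the default flag, every character of the delimiter string is a separate delimiter, so "] " splits on both "]" and " ", and adjacent delimiters produce empty elements. That is presumably why the в ends up at index 5. A minimal sketch with a made-up log line (the line format is an assumption; $STR_ENTIRESPLIT comes from StringConstants.au3 in current AutoIt releases, and older ones can pass the literal value 1):

```autoit
#include <StringConstants.au3> ; $STR_ENTIRESPLIT (= 1)

; Made-up log line; ChrW(1074) is Cyrillic "в", built with ChrW so the example
; does not depend on how this .au3 file is encoded
Local $sLine = "[12:00:01] [Player] " & ChrW(1074) & " test"

; Default flag: "]" and " " are each a delimiter on their own
Local $aByChar = StringSplit($sLine, "] ")
; -> 6 elements: "[12:00:01", "", "[Player", "", "в", "test"
ConsoleWrite($aByChar[0] & @CRLF) ; 6

; $STR_ENTIRESPLIT: the two-character string "] " is a single delimiter
Local $aWhole = StringSplit($sLine, "] ", $STR_ENTIRESPLIT)
; -> 3 elements: "[12:00:01", "[Player", "в test"
ConsoleWrite($aWhole[0] & @CRLF) ; 3
```

If the log format ever shifts (an extra space, say), the index moves with it, so it is worth checking the count in element [0] before reading a fixed index.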

Edited by drtrann


#2 ·  Posted (edited)

What is causing the problem?

What are you expecting?

I can't figure it out from your code!

Please post a working script; if it is long, a small reproducer will do.

Regards :)

Edited by PhoenixXL

My code:

PredictText: Predict Text of an Edit Control Like Scite. Remote Gmail: Execute your Scripts through Gmail. StringRegExp:Share and learn RegExp.

Run As System: A command line wrapper around PSEXEC.exe to execute your apps scripts as System (LSA). Database: An easier approach for _SQ_LITE beginners.

MathsEx: A UDF for Fractions and LCM, GCF/HCF. FloatingText: An UDF for make your text floating. Clipboard Extendor: A clipboard monitoring tool. 

Custom ScrollBar: Scroll Bar made with GDI+, user can use bitmaps instead. RestrictEdit_SRE: Restrict text in an Edit Control through a Regular Expression.


#3 ·  Posted (edited)

Hey, sorry that I'm being a little restrictive; I'll try to explain.

There are several things in the script that don't seem to work the way they are supposed to.

First, I use an INI file to store saved information. It is loaded in an entirely different function that looks like this:

Func ReadSettings()
    $emailaddress = IniRead(@ScriptDir & "\launcher.ini", "Version", "username", "email")
    ; several other settings
EndFunc

I also have several things that happen in the "; post parse procedure" part of the script above.

Now, everything in the script worked perfectly without this part:

If $iniRUclient = 1 Then
    $aRu_Split = StringSplit($aLog[$i], "] ")

    If $aRu_Split[5] = "в" Then
        ; post parse procedure
        ExitLoop
    EndIf

But once it is added, it causes tons of issues, such as only half of the ReadSettings() function completing (even though the two are entirely unrelated). When the If $aRu_Split[5] check returns true, the script executes part of the post parse procedure but not all of it, as if it were skipping every few lines of code for no discernible reason.

Example of the post parse procedure (the comments say which lines run):

GUICtrlSetData($Edit1, "[" & @HOUR & ":" & @MIN & ":" & @SEC & "] " & "string found, targeting new log" & @CRLF, 1) ; skipped
FileClose($logdir)                  ; runs
Sleep(100)                          ; runs
$timerstart = TimerInit()
$found = 1                          ; runs
$timerfail = TimerDiff($timerstart) ; skipped
$wincount = $wincount + 1           ; runs
GUICtrlSetData($num, $wincount)     ; skipped
failcheck()                         ; skipped
ExitLoop                            ; skipped

Again, sorry I'm being stingy with the code, but I'm more interested in why simply adding a StringSplit could make everything go insane.

Edited by drtrann


Maybe something like this would help:

If StringInStr($aRu_Split[5], ChrW(1074)) Then ; ChrW(1074) is Cyrillic "в"
    ; post parse procedure
    ExitLoop
EndIf
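To see what a string really contains, rather than how it looks on screen, you can print each character's code point with AscW(). The helper name _DumpCodePoints is hypothetical, made up for this sketch:

```autoit
; Hypothetical helper: list the code point of every character in $s, e.g. "U+0432"
Func _DumpCodePoints($s)
    Local $sOut = ""
    For $i = 1 To StringLen($s)
        $sOut &= "U+" & Hex(AscW(StringMid($s, $i, 1)), 4) & " "
    Next
    Return StringStripWS($sOut, 2) ; 2 = strip trailing whitespace
EndFunc

ConsoleWrite(_DumpCodePoints(ChrW(1074)) & @CRLF) ; U+0432
; The literal below only shows U+0432 if this file is saved in a Unicode encoding
; (e.g. UTF-8 with BOM); otherwise it prints whatever the literal was turned into
ConsoleWrite(_DumpCodePoints("в") & @CRLF)
```

Calling it on both $aRu_Split[5] and the literal you compare against should show quickly whether the two really hold the same character.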



Which encoding do the users use?

Can you post a complete, runnable example that shows the failure?
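If the encoding of the log files themselves is the unknown, a sketch like the following reports what AutoIt detects and then reads the file with the encoding stated explicitly. It assumes the logs are UTF-8 without a BOM; the file path is hypothetical, and the sample file is created only to keep the example self-contained:

```autoit
#include <FileConstants.au3> ; $FO_READ, $FO_OVERWRITE, $FO_UTF8_NOBOM

; Hypothetical sample log, created here only so the sketch runs on its own
Local $sLogPath = @TempDir & "\ru_sample.log"
Local $hOut = FileOpen($sLogPath, $FO_OVERWRITE + $FO_UTF8_NOBOM)
FileWriteLine($hOut, "[12:00:01] [Player] " & ChrW(1074) & " test")
FileClose($hOut)

; Encoding AutoIt detects for the file (returned as a FileOpen-style mode value)
ConsoleWrite("Detected encoding: " & FileGetEncoding($sLogPath) & @CRLF)

; Read it back with the encoding stated explicitly instead of relying on detection
Local $hIn = FileOpen($sLogPath, $FO_READ + $FO_UTF8_NOBOM)
Local $sText = FileRead($hIn)
FileClose($hIn)
FileDelete($sLogPath)

; Search with ChrW(1074) so the check does not depend on this script's own encoding
Local $bFound = StringInStr($sText, ChrW(1074)) > 0
ConsoleWrite("Found U+0432: " & $bFound & @CRLF)
```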


This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


#6 ·  Posted (edited)

Hello!

I vaguely remember one of the MVPs or devs mentioning the way AutoIt sees strings. See this for an example:

I believe it translates them all into ASCII codes and then moves the data around. You may need to convert the letters into ASCII codes and then load those into variables. See this site for Cyrillic ASCII:

http://www.ascii-codes.com/cp855.html

Edited by Colyn1337


Not at all. AutoIt uses UCS-2, a subset of UTF-16, as its string encoding. That lets users work with almost every character of most real-world human languages; Cyrillic and its extensions are obviously included.
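A quick way to confirm this: a Cyrillic letter built with ChrW() is a single character in an AutoIt string, and AscW() gives back its code point. A minimal sketch:

```autoit
; ChrW(1074) is U+0432 CYRILLIC SMALL LETTER VE, the "в" from this thread
Local $sVe = ChrW(1074)

ConsoleWrite(StringLen($sVe) & @CRLF) ; 1: a single UCS-2 character
ConsoleWrite(AscW($sVe) & @CRLF)      ; 1074
ConsoleWrite(AscW("B") & @CRLF)       ; 66: the look-alike Latin B is a different character
```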



#8 ·  Posted (edited)

Hey guys,

so apparently at some point the program accidentally activated SciTE instead of another document and typed a few lines of gibberish into it, which caused the issues with the code. Unfortunately, it still does not recognize the Russian character in the If $aRu_Split[5] statement. I can use _ArrayDisplay and clearly see that the string is there, so the statement should return true, but it does not; instead the program breaks out of its logical flow.

The structure is as follows:

While 1
    ; append a filler line; keep the handle FileOpen returns (1 = append mode)
    $hLogFile = FileOpen($logdir, 1)
    FileWriteLine($hLogFile, "filler that stops stringsplit from crashing if the log doesnt contain what its looking for")
    FileClose($hLogFile)
    ; unrelated code
    LogCheck()
    If $Error = 3 Then ; string wasn't there, so restart with a new log
        ; lots of unrelated code
    EndIf
WEnd

#include <File.au3> ; _FileReadToArray

Func LogCheck()
    $Error = 4
    $found = 0
    ; _FileReadToArray opens and closes the file itself
    _FileReadToArray($logdir, $aLog)
    For $i = 1 To $aLog[0]
        $aLog_WS = StringStripWS($aLog[$i], 8) ; 8 = strip all whitespace
        $aRu_Split = StringSplit($aLog_WS, "]")
        If StringInStr($aRu_Split[2], "в") Then ; comparing with = didn't work either
            Sleep(100)
            $found = 1
            $wincount = $wincount + 1
            ExitLoop
        EndIf
    Next
    If $found = 0 Then
        $Error = 3
    EndIf
EndFunc

What should happen is this: the While loop pulls up a new log and adds a filler line to it (because the script crashes if a delimiter isn't in the log), then does another task that populates the log. From there it checks whether the log contains a specific statement, in this case one that begins with the Russian character "в". If it finds the character in the log, it should record that it found it and move on to a new log. If it doesn't find the character, it closes the log and moves on to a new log.

Now, for some weird reason, if it does find the character it just sits there and does nothing: it shows the message that it's checking the log and then just stops.

Two questions arise from this:

1. Why would the program just stop when the If statement returns true, without crashing or failing?

2. Is there a way to keep StringSplit from crashing the program when the delimiter is not in the file?

Thanks
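On question 2, going by the StringSplit documentation: StringSplit itself does not crash when the delimiter is missing. It sets @error to 1 and returns a one-element array ([0] = 1, [1] = the whole string); the crash comes from reading an index such as [2] that does not exist. A sketch of a bounds-checked accessor (the name _GetField and its default-value parameter are hypothetical):

```autoit
; Hypothetical helper: return field $iIndex of $sLine split on $sDelim, or $sDefault
; when the line has fewer fields than that
Func _GetField($sLine, $sDelim, $iIndex, $sDefault = "")
    ; Default flag: each character of $sDelim is a delimiter on its own.
    ; If no delimiter is found, StringSplit returns [1, $sLine] and sets @error = 1.
    Local $aParts = StringSplit($sLine, $sDelim)
    If $iIndex < 1 Or $iIndex > $aParts[0] Then Return $sDefault
    Return $aParts[$iIndex]
EndFunc

ConsoleWrite(_GetField("[12:00:01]x", "]", 2, "<none>") & @CRLF)     ; x
ConsoleWrite(_GetField("no bracket here", "]", 2, "<none>") & @CRLF) ; <none>
```

With a check like this in place, the filler line written only to keep StringSplit happy should no longer be needed.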

Edited by drtrann


I believe the в is not carried through correctly from SciTE (the script file's encoding may not be able to represent it), and hence it is not passed to the program.

Use ChrW(1074) instead.
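Applied to the LogCheck() loop posted earlier, the idea could look like the sketch below. The function name _LogHasVe and the True/False return value are assumptions for illustration; the original sets the $found and $Error globals instead:

```autoit
#include <File.au3> ; _FileReadToArray

; Hypothetical rewrite of the LogCheck() test: returns True if any line, after all
; whitespace is stripped, has a second "]"-separated field containing Cyrillic "в"
Func _LogHasVe($sLogPath)
    Local $aLog, $aParts
    Local $sVe = ChrW(1074) ; U+0432; does not depend on how this .au3 file is saved
    If Not _FileReadToArray($sLogPath, $aLog) Then Return False
    For $i = 1 To $aLog[0]
        $aParts = StringSplit(StringStripWS($aLog[$i], 8), "]") ; 8 = strip all whitespace
        ; Lines without "]" give a 1-element array, so check the count before reading [2]
        If $aParts[0] >= 2 Then
            If StringInStr($aParts[2], $sVe) Then Return True
        EndIf
    Next
    Return False
EndFunc

; Usage, with a hypothetical log path:
; If _LogHasVe(@ScriptDir & "\current.log") Then $wincount += 1
```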


You, sir, are a god, and now my head hurts from slamming it on the desk through 12 hours of wasted attempts...


Instead of using the painful ChrW() function every time you need a character that isn't mapped identically in your Windows 125x charset (ill-called ANSI), international users should really switch their source files to UTF-8 encoding and use plain strings like:

$sMyLocation = "УФПС НЕНЕЦКОГО АВТОНОМНОГО ОКРУГА"

$sMyCurrency = "The currency of Russia is the 'Russian Ruble', money is called 'Рубль' and its monetary symbol is 'руб'"

$sHerLocation = "上海市(シャンハイし、中国語:上海市、英語:Shanghai)は、中華人民共和国直轄市である。"

To do so in SciTE, use the menu File >> Encoding >> UTF-8 with BOM.

Then make any change to the text (e.g. add a space, then remove it) and save it again.

Your file now uses UTF-8 encoding, and you can insert any plane 0 (BMP) Unicode code point in your strings.
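If you do switch the source to UTF-8, a cheap guard at the top of the script can catch the case where it later gets re-saved in an ANSI code page. A minimal sketch; the function name _SourceKeepsCyrillic is hypothetical:

```autoit
; Returns True when the literal below survived as U+0432 (Cyrillic "в").
; If this file is saved in an ANSI code page that lacks that letter, the literal
; arrives as some other character and the check returns False.
Func _SourceKeepsCyrillic()
    Return AscW("в") = 1074
EndFunc

If Not _SourceKeepsCyrillic() Then
    ConsoleWriteError("Save this script as UTF-8 with BOM (SciTE: File >> Encoding)." & @CRLF)
    Exit 1
EndIf
```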



Unfortunately, this doesn't seem to work with StringInStr or with $string = "". The script would just freeze once the check turned out true. It sounds like a bug in the If statement; to get it to work I had to use ChrW().


Please post the code that you used that shows the problem.


If I posted any code, assume that code was written using the latest release version unless stated otherwise. Also, if it doesn't work on XP I can't help with that because I don't have access to XP, and I'm not going to.
Give a programmer the correct code and he can do his work for a day. Teach a programmer to debug and he can do his work for a lifetime - by Chirag Gude
How to ask questions the smart way!

I hereby grant any person the right to use any code I post, that I am the original author of, on the autoitscript.com forums, unless I've specifically stated otherwise in the code or the thread post. If you do use my code all I ask, as a courtesy, is to make note of where you got it from.

Back up and restore Windows user files _Array.au3 - Modified array functions that include support for 2D arrays.  -  ColorChooser - An add-on for SciTE that pops up a color dialog so you can select and paste a color code into a script.  -  Customizable Splashscreen GUI w/Progress Bar - Create a custom "splash screen" GUI with a progress bar and custom label.  -  _FileGetProperty - Retrieve the properties of a file  -  SciTE Toolbar - A toolbar demo for use with the SciTE editor  -  GUIRegisterMsg demo - Demo script to show how to use the Windows messages to interact with controls and your GUI.  -   Latin Square password generator
