Dubz

Manipulate Text & Optimize Script

16 posts in this topic

Hey again :think:

I'm very inexperienced at scripting. My goal is a script that logs in to a particular website, grabs information from several pages, and then manipulates the text so that I can send it into Excel. After running the script I have already put together, I end up with the following in Notepad:

<area shape="rect" coords="174,174,245,245" href="/en/s_en.php?s=zbsns2&p=map&sub=isle&pos1=15&pos2=14&pos3=13" title="Isle: ~30~ Covenant Empire
Position: 15:14:13
Ruler: shadownet
Alliance: [Ares]
Score: 1384
Colony: Yes"><area shape="rect" coords="10,92,81,163" href="/en/s_en.php?s=zbsns2&p=map&sub=isle&pos1=15&pos2=14&pos3=6" title="Isle: 018
Position: 15:14:6
Ruler: aapo_
Alliance: [HOD]
Score: 1045
Colony: Yes">
This is only a snippet. There will be about 100 lines in one .txt file, to be converted either at the end or, if possible, as I go. From this snippet I would like the following output:

~30~ Covenant Empire,15,14,13,shadownet,[Ares],1384,yes
018,15,14,6,aapo_,[HOD],1045,yes

That is, the Isle, Position, Ruler, Alliance, Score and Colony values, laid out so that I can export them to Excel. Another issue is that an isle may have no ruler; in that case the script should write 'unruled' or some other predetermined text.

Is there any information I should add?

Thank you in advance for any help :(




I'm sorry I was unclear. I have already written a script that logs in to the site, goes from page to page copying the page source into Notepad, and then saves the file as a .txt.

The txt has several hundred lines similar to the 1st quote of code I included above. I would like to extract certain parts of that code into a new txt file in a format that excel can read.

The new script would see the word "Isle:" and grab the text between "Isle:" and "Position:". For example:

<area shape="rect" coords="174,174,245,245" href="/en/s_en.php?s=zbsns2&p=map&sub=isle&pos1=15&pos2=14&pos3=13" title="Isle: ~30~ Covenant Empire

Position: 15:14:13

Ruler: shadownet

Alliance: [Ares]

Score: 1384

Colony: Yes">

The colored text (the isle name and the Position, Ruler, Alliance, Score and Colony values) is the text I want to keep, with a comma between each part; everything else should be cropped.

Is this possible to do with AutoIt, or would another language be better suited?



(assuming $var is equal to the page source)

$var = StringReplace($var, '<area shape="rect" coords="174,174,245,245" href="/en/s_en.php?s=zbsns2&p=map&sub=isle&pos1=15&pos2=14&pos3=13" title="Isle: ', "")

That's all one line; it should narrow it down a bit... (NOT THE BEST WAY)

Add this after it:

$var = StringReplace($var, "Position:", "")
$var = StringReplace($var, "Ruler:", "")
$var = StringReplace($var, "Alliance:", "")

And so on & so forth...

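Then you'll end up with a string something like

"~30~ Covenant Empire 15:14:13 shadownet [Ares] 1384 Yes>"

Then you can split it with StringSplit($var, " "). Note that an isle name containing spaces, such as ~30~ Covenant Empire, will be split into several elements.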

Look in the helpfile for more info on String* functions.
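A variant of the same StringReplace idea, sketched under the assumption that the fields inside one title="..." attribute are separated by @CRLF and that the leading "Isle: " part has already been removed as above: replacing each line break plus label with a comma yields the values already comma-separated, so isle names containing spaces stay intact. The last step turns the colons of the position into commas; it would also change any colon that happens to appear in an isle or ruler name.

```autoit
; Sample title text; assumes the fields are separated by @CRLF as in the first post
Local $sTitle = "~30~ Covenant Empire" & @CRLF & "Position: 15:14:13" & @CRLF & _
        "Ruler: shadownet" & @CRLF & "Alliance: [Ares]" & @CRLF & _
        "Score: 1384" & @CRLF & "Colony: Yes"

; Replace "line break + label" with a comma instead of only deleting the label
Local $aLabels[5] = ["Position: ", "Ruler: ", "Alliance: ", "Score: ", "Colony: "]
For $i = 0 To UBound($aLabels) - 1
    $sTitle = StringReplace($sTitle, @CRLF & $aLabels[$i], ",")
Next

; Split the position 15:14:13 into three columns (also affects any other colon left in the text)
$sTitle = StringReplace($sTitle, ":", ",")
ConsoleWrite($sTitle & @CRLF)
```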

--Hope this helps

~cdkid

Edited by cdkid

AutoIt Console written in C#. Write au3 code right at the console :D_FileWriteToLineWrite to a specific line in a file.My UDF Libraries: MySQL UDF Library version 1.6 MySQL Database UDF's for AutoItI have stopped updating the MySQL thread above, all future updates will be on my SVN. The svn location is:kan2.sytes.net/publicsvn/mysqlnote: This will still be available, but due to my new job, and school hours, am no longer developing this udf.My business: www.hirethebrain.com Hire The Brain HireTheBrain.com Computer Consulting, Design, Assembly and RepairOh no! I've commited Scriptocide!


I'm reading through them at the moment, trying to put something together that works. This is a better example of the source code I'm dealing with. I took much of it out so as not to fill the page with useless code:

<html><head><link rel="stylesheet" type="text/css" href="http://80.237.203.111/g/style.css"><script type="text/javascript" src="lib.js"></script></head><body onload="start()"><h1>Map</h1><hr width="420" align="left">22.04.2006 02:41:28 | <a href="/en/s_en.php?s=o96fjh&p=settings">Settings</a> | <a href="/en/s_en.php?s=o96fjh&p=mail&sub=invite">Invite a Friend</a> | <a href="/en/s_en.php?s=o96fjh&a=logout">Logout</a> | <a href="/en/s_en.php?s=o96fjh&p=mail"><img src="http://80.237.203.111/g/m_off.gif" border="0"></a><br>You <b>Dubz</b>, are ruler of the isle D (5:89:7)<table width="420" border="1" cellspacing="0" cellpadding="3"><tr><td width="33%"><img src="http://80.237.203.111/g/gold.gif"> 315</td><td width="33%"><img src="http://80.237.203.111/g/stones.gif"> 8940</td><td><img src="http://80.237.203.111/g/wood.gif"> 2456</td></tr></table><br><table width="420" border="0" cellspacing="1" cellpadding="3"><form action="/en/s_en.php?s=o96fjh&p=map" method="post"><tr><td bgcolor="#F0F0F0"><b>Ocean: </b><input type="text" name="pos1" value="1" size="3"><b> Group of Isles: </b><input type="text" name="pos2" value="1" size="3"><b> Member: </b><input type="text" size="8" name="highlight" value=""> <input type="submit" value="Show"></td></tr></form></table><br><img src="/en/s_en.php?s=o96fjh&a=draw_map&pos1=1&pos2=1&highlight=" border="0" usemap="#map"><map name="map"><area shape="rect" coords="411,9,420,411" href="/en/s_en.php?s=o96fjh&p=map&pos1=1&pos2=2&highlight=" title="Ocean: 1

Group of Isles: 2"><area shape="rect" coords="9,411,411,420" href="/en/s_en.php?s=o96fjh&p=map&pos1=1&pos2=11&highlight=" title="Ocean: 1

Group of Isles: 11"><area shape="rect" coords="256,256,327,327" href="/en/s_en.php?s=o96fjh&p=map⊂=isle&pos1=1&pos2=1&pos3=19" title="Isle: x druid Wario World

Position: 1:1:19

Ruler: chief_druid

Alliance: [Ares]

Score: 1285

Colony: Yes"><area shape="rect" coords="338,338,409,409" href="/en/s_en.php?s=o96fjh&p=map&sub=isle&pos1=1&pos2=1&pos3=25" title="Isle: ||Empire||

Position: 1:1:25

Ruler: commando

Alliance: [Debello]

Score: 1384

Colony: Yes"><area shape="rect" coords="174,174,245,245" href="/en/s_en.php?s=o96fjh&p=map&sub=isle&pos1=1&pos2=1&pos3=13" title="Isle: Pandinus

Position: 1:1:13

Ruler: scorpius

Alliance: [R-]

Score: 2797

Colony: No"></map><hr width="420" align="left"><a href="/en/s_en.php?s=o96fjh&p=main">Overview</a> | <a href="/en/s_en.php?s=o96fjh&p=alliance">Alliance</a> | <a href="/en/s_en.php?s=o96fjh&p=map">Map</a> | <a href="/en/s_en.php?s=o96fjh&p=isles">Isles</a> | <a href="/en/s_en.php?s=o96fjh&p=market">Market</a> | <a href="/en/s_en.php?s=o96fjh&p=ranking">Rank List</a> | <a href="/en/s_en.php?s=o96fjh&p=calc">Calculator</a></body></html><br><br>0.012

This is still somewhat cut down, but the middle part of the source contains several entries with the information I need, i.e. Isle, Ruler, etc. (I put it in bold.) The part in red, the session ID (s=o96fjh here), changes on each login and appears many times throughout the text. The coords and the time also change regularly. Would I need to know the exact text in order to replace/crop it using this method?


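Look at StringInStr, then use StringMid, something like:

$start = StringInStr($var, "Position: ") + StringLen("Position: ") ; first character after the label
$end = StringInStr($var, "Ruler: ") ; where the next label starts
$position = StringStripWS(StringMid($var, $start, $end - $start), 3) ; StringMid takes a length, not an end position; 3 trims both ends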

etc.
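A sketch of the same StringInStr/StringMid approach extended to a whole page. _Between is a hypothetical helper, not an AutoIt function: it adds the label length to the position StringInStr returns and passes a length (not an end position) to StringMid. It relies on the fifth (start) parameter of StringInStr, which current AutoIt releases have; the 2006 versions discussed in this thread may not.

```autoit
; Hypothetical helper: returns the trimmed text between $sStart and $sEnd,
; searching from character $iFrom. On success it moves $iFrom to the start
; of $sEnd so the next call continues from there; on failure it sets @error.
Func _Between($sText, $sStart, $sEnd, ByRef $iFrom)
    Local $iStart = StringInStr($sText, $sStart, 1, 1, $iFrom) ; 1 = case-sensitive
    If $iStart = 0 Then Return SetError(1, 0, "")
    $iStart += StringLen($sStart) ; skip past the label itself
    Local $iEnd = StringInStr($sText, $sEnd, 1, 1, $iStart)
    If $iEnd = 0 Then Return SetError(1, 0, "")
    $iFrom = $iEnd
    ; StringMid takes a length, so pass ($iEnd - $iStart), not $iEnd
    Return StringStripWS(StringMid($sText, $iStart, $iEnd - $iStart), 3)
EndFunc

; Sample input standing in for the saved page source
Local $var = 'title="Isle: Pandinus' & @CRLF & 'Position: 1:1:13' & @CRLF & 'Ruler: scorpius">' & _
        '<area title="Isle: ||Empire||' & @CRLF & 'Position: 1:1:25' & @CRLF & 'Ruler: commando">'

Local $iPos = 1, $sIsle, $sOut = ""
While 1
    $sIsle = _Between($var, "Isle: ", "Position: ", $iPos)
    If @error Then ExitLoop ; no more isles on the page
    $sOut &= $sIsle & "," & _Between($var, "Position: ", "Ruler: ", $iPos) & @CRLF
WEnd
ConsoleWrite($sOut)
```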

Hope this helps

~cdkid



Hmm, I haven't tried this yet (noob here), but I would try matching title=", reading a few characters ahead to check whether it is followed by Isle, and then getting the text between title=" and either "><area or "></map.

Then split that text into an array on whitespace, discard the elements equal to Isle:, Position:, Ruler:, etc., and join the elements in between.

example of pseudo code:

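#include <Array.au3>

; Labels that start a new field; they are dropped from the output (add Ruler:, Alliance:, ... as needed)
Local $aExcluded[2] = ["Isle:", "Position:"]

; StringSplit puts the element count in $a[0], so the words start at $a[1]
Local $a = StringSplit("Isle: my island Position: 1:1:19", " ")

Local $b[UBound($aExcluded) + 1] ; one slot per field, plus the unused $b[0]
Local $j = 0

For $i = 1 To $a[0]
    If _ArraySearch($aExcluded, $a[$i]) <> -1 Then
        $j += 1 ; a label: move on to the next field
    Else
        $b[$j] &= $a[$i] & " " ; append (not overwrite) the word to the current field
    EndIf
Next

; $b[1] = "my island "
; $b[2] = "1:1:19 "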

After that, use a loop to write the array elements to a text file, always including a comma between them.
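A minimal sketch of that last step, assuming the values have already been collected into an array ($aFields below is a hardcoded stand-in) and that isles.csv is an acceptable, hypothetical output name. Excel normally opens a .csv file directly; a value that itself contains a comma would need to be wrapped in double quotes.

```autoit
; Stand-in for the values collected by the loop; note the trailing spaces
Local $aFields[8] = ["~30~ Covenant Empire ", "15", "14", "13", "shadownet ", "[Ares] ", "1384 ", "Yes"]
Local $sLine = ""
For $i = 0 To UBound($aFields) - 1
    If $i > 0 Then $sLine &= "," ; comma between fields, none at the ends
    $sLine &= StringStripWS($aFields[$i], 3) ; drop the leading/trailing spaces
Next

Local $hFile = FileOpen("isles.csv", 1) ; 1 = append mode, file is created if missing
If $hFile = -1 Then Exit 1 ; could not open the file
FileWriteLine($hFile, $sLine)
FileClose($hFile)
ConsoleWrite($sLine & @CRLF)
```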

That's just an idea, though. I don't know how to really code it, or even whether it is the best way to do this with AutoIt.

(DISCLAIMER: the pseudo code presented may be completely incorrect and noobish. No warranties or refunds are offered. :think: )


Read in the helpfile about

StringSplit

StringMid

StringInStr

For...Next

FileWriteLine

those should be all the commands you need.
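One way those commands might fit together, as a sketch rather than cdkid's own code: the hypothetical function _ParseIsle takes the text of one title="..." attribute (cut out beforehand, for example with StringInStr and StringMid), splits it into lines and keeps whatever follows the first colon on each line. It assumes an unruled isle still has a "Ruler:" line with nothing after it, and writes "unruled" in that case.

```autoit
; Hypothetical function: turn one tooltip's text into a CSV line
Func _ParseIsle($sTitle)
    ; Normalise CRLF to LF, then split into lines ($aLines[0] = line count)
    Local $aLines = StringSplit(StringStripCR($sTitle), @LF)
    Local $sCsv = "", $sLine, $sValue, $iColon, $iField = 0
    For $i = 1 To $aLines[0]
        $sLine = StringStripWS($aLines[$i], 3)
        If $sLine = "" Then ContinueLoop ; skip blank lines
        $iColon = StringInStr($sLine, ":") ; end of the "Label:" part
        $sValue = StringStripWS(StringMid($sLine, $iColon + 1), 3)
        If StringLeft($sLine, 6) = "Ruler:" And $sValue = "" Then $sValue = "unruled"
        If StringLeft($sLine, 9) = "Position:" Then $sValue = StringReplace($sValue, ":", ",")
        If $iField > 0 Then $sCsv &= ","
        $sCsv &= $sValue
        $iField += 1
    Next
    Return $sCsv
EndFunc

Local $sTitle = "Isle: 018" & @CRLF & "Position: 15:14:6" & @CRLF & "Ruler:" & @CRLF & _
        "Alliance: [HOD]" & @CRLF & "Score: 1045" & @CRLF & "Colony: Yes"
FileWriteLine("isles.csv", _ParseIsle($sTitle)) ; FileWriteLine also accepts a file name
ConsoleWrite(_ParseIsle($sTitle) & @CRLF)
```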

If you run into problems, I'll see if I can help :think:

~cdkid



Assuming you just use FileRead() to read the HTML into the variable $str, this line of code will put each entry into an array, each entry taking up 8 elements of the array.

$ret = StringRegExp($str, 'Isle: (.*?)\r\nPosition: (.*?):(.*?):(.*?)\r\nRuler: (.*?)\r\nAlliance: (.*?)\r\nScore: (.*?)\r\nColony: (.*?)"', 3)

Here's an example that will show you the array:

#include <Array.au3>
$ret = StringRegExp($str, 'Isle: (.*?)\r\nPosition: (.*?):(.*?):(.*?)\r\nRuler: (.*?)\r\nAlliance: (.*?)\r\nScore: (.*?)\r\nColony: (.*?)"', 3)
_ArrayDisplay($ret, "")

Take a look at the StringRegExp guide in my signature for an explanation of this seeming gobbledygook.
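A sketch of the full conversion built on neogia's pattern, assuming a current AutoIt release rather than the 2006 beta. The changes to the pattern are assumptions, not part of the post: \s* between fields tolerates CRLF, LF or the blank lines seen in the pasted source, and the optional space after each label lets an empty Ruler: line still match, in which case "unruled" is written. map.txt and isles.csv are hypothetical file names.

```autoit
; Convert every isle tooltip in $sHtml into one CSV line
Func _IslesToCsv($sHtml)
    Local $sPattern = 'Isle: ?(.*?)\s*Position: ?(\d+):(\d+):(\d+)\s*Ruler: ?(.*?)\s*' & _
            'Alliance: ?(.*?)\s*Score: ?(\d*)\s*Colony: ?(\w*)"'
    Local $aRet = StringRegExp($sHtml, $sPattern, 3) ; 3 = all matches, groups in one flat array
    If @error Then Return "" ; no isle found
    Local $sCsv = ""
    For $i = 0 To UBound($aRet) - 1 Step 8 ; 8 groups per isle
        If $aRet[$i + 4] = "" Then $aRet[$i + 4] = "unruled"
        For $k = 0 To 7
            If $k > 0 Then $sCsv &= ","
            $sCsv &= $aRet[$i + $k]
        Next
        $sCsv &= @CRLF
    Next
    Return $sCsv
EndFunc

; Sample input; in the real script this would be FileRead("map.txt")
Local $sHtml = '<area title="Isle: ~30~ Covenant Empire' & @CRLF & 'Position: 15:14:13' & @CRLF & _
        'Ruler: shadownet' & @CRLF & 'Alliance: [Ares]' & @CRLF & 'Score: 1384' & @CRLF & 'Colony: Yes">' & _
        '<area title="Isle: 018' & @CRLF & 'Position: 15:14:6' & @CRLF & 'Ruler: ' & @CRLF & _
        'Alliance: [HOD]' & @CRLF & 'Score: 1045' & @CRLF & 'Colony: Yes">'

Local $sResult = _IslesToCsv($sHtml)
Local $hFile = FileOpen("isles.csv", 2) ; 2 = write mode, replaces any previous contents
If $hFile <> -1 Then
    FileWrite($hFile, $sResult)
    FileClose($hFile)
EndIf
ConsoleWrite($sResult)
```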


[u]My UDFs[/u]Coroutine Multithreading UDF LibraryStringRegExp GuideRandom EncryptorArrayToDisplayString"The Brain, expecting disaster, fails to find the obvious solution." -- neogia



That looks like just what I need :( Thanks Guys!

neogia, that guide is great. The code you posted gives me a parsing error though. I'm going to use it as a template for what it should look like approximately.

I'll post my results :think:

Edited by Dubz



Could you be more specific on what error it gives? And could you post an actual log file you want to parse?



I think it's because I'm running it in Beta. StringRegExp doesn't seem to appear in the Beta help :\ That, or I'm doing something wrong.

I've attached the file I will be parsing, along with the script I'm using to create it. I haven't made the loop yet, but it shouldn't be difficult. The username/password belong to a new account made for testing, so as not to give out my own password. I added your code to the very end to test it.

ScoreKeeperScriptv0.3.au3

Edited by Dubz


Bump, please. I can't find StringRegExp() in any help file I've looked at so far. Here, for example, is the online documentation:

http://www.autoitscript.com/autoit3/docs/

I think StringRegExp() would be the best tool to use, but I haven't been able to use it yet.



In beta

StringRegExp 
--------------------------------------------------------------------------------

Check if a string fits a given regular expression pattern.


StringRegExp ( "test", "pattern" [, flag ] )


 

Parameters

test The string to check 
pattern The regular expression to compare. 
flag [optional] A number to indicate how the function behaves. See below for details. The default is 0. 


Flag  Values 
0 Return true/false (1/0) as to whether the test matched the pattern. 
1 Return an array with the text that matched all the group patterns. Check @Extended to determine whether the pattern matched or not. 
2 Same as 0.  
3 Perform a global search, checking the entire string, returning an array of all results. Check @Extended to determine whether the pattern matched or not. 

 

Return Value

Check @Error to make sure the function executed properly.
@Error Meaning 
0 Executed properly. Check @Extended and/or the return value to determine if the pattern was found or not. 
1 Flag invalid. Flag must be one of the numbers above. Return value is "". 
2 Pattern invalid. Return value is the first location in the string that was invalid, as an integer. 


@Extended is true (1) or false (0) depending if the test string matched the pattern string. The return value is dependent on the Flag value. If the flag is set for true/false (0), then the return value will match @Extended. If the flag is set for array return (1 / 3) then the following table applies.


@Extended Return value 
0 (false) Match not found. Return value is "" (empty string). 
1 (true) Match found. Return value is an array of all group values. If there are no groups in the pattern, the function returns "" (empty string). 

 

Remarks

Regular expression notation is a compact way of specifying a pattern for strings that can be searched. Regular expressions are character strings in which plain text characters indicate what text should exist in the target string, and some characters are given special meanings to indicate what variability is allowed in the target string. AutoIt regular expressions are normally case-sensitive. 

Regular expressions are constructed of one or more of the following simple regular expression specifiers. If the character is not in the following table, then it will match only itself.

Repeating characters (*, +, ?, {...} ) will try to match the largest set possible, which allows the following characters to match as well, unless followed immediately by a question mark; then it will find the smallest pattern that allows the following characters to match as well.

Nested groups are allowed, but keep in mind that all the groups, except non-capturing groups, assign to the returned array, with the outer groups assigning after the inner groups.

Matching Characters

[ ... ] Match any character in the set. e.g. [aeiou] matches any lower-case vowel. A contiguous set can be defined using a dash between the starting and ending characters. e.g. [a-z] matches any lower case character. To include a dash (-) in a set, use it as the first or last character of the set. To include a closing bracket in a set, use it as the first character of the set. e.g. [][] will match either [ or ]. Note that special characters do not retain their special meanings inside a set, with the exception of \b, \n, \r, \t, and \\. \^, \- and \] match the escaped character inside a set. 
[^ ... ] Match any character not in the set. e.g. [^0-9] matches any non-digit. To include a caret (^) in a set, put it after the beginning of the set or escape it (\^). 
[:class:] Match a character in the given class of characters. Valid classes are: alpha (any alphabetic character), alnum (any alphanumeric character), lower (any lower-case letter), upper (any upper-case letter), digit (any decimal digit 0-9), xdigit (any hexadecimal digit, 0-9, A-F, a-f), space (any whitespace character), blank (only a space or tab), print (any printable character), graph (any printable character except spaces), cntrl (any control character [ascii 127 or <32]) or punct (any punctuation character) 
[^:class:] Match any character not in the class.  
( ... ) Group. The elements in the group are treated in order and can be repeated together. e.g. (ab)+ will match "ab" or "abab", but not "aba". A group will also store the text matched for use in back-references and in the array returned by the function, depending on flag value. 
(?i) Case-insensitivity flag. This does not operate as a group. It tells the regular expression engine to do case-insensitive matching from that point on. 
(?-i) (default) Case-sensitivity flag. This does not operate as a group. It tells the regular expression engine to do case-sensitive matching from that point on. 
(?i ... ) Case-insensitive group. Behaves just like a normal group, but performs case-insensitive matches within the group. 
(?-i ... ) Case-sensitive group. Behaves just like a normal group, but performs case-sensitive matches within the group. Primarily for use after the (?i) flag or inside a case-insensitive group. 
(?: ... ) Non-capturing group. Behaves just like a normal group, but does not record the matching characters in the array nor can the matched text be used for back-referencing. 
(?i: ... ) Case-insensitive non-capturing group. Behaves just like a non-capturing group, but performs case-insensitive matches within the group. 
(?-i: ... ) Case-sensitive non-capturing group. Behaves just like a non-capturing group, but performs case-sensitive matches within the group. 
. Match any single character 
| Or. The expression on one side or the other can be matched. 
\ Escape a special character (have it match the actual character) or introduce a special character type (see below) 
\\ Match an actual backslash (\) 
\1 - \9 Back-reference. Match the prior group number given exactly. For example, (\a)\1 would match a double letter. 
\a Match any alphabetic character (a-z, A-Z) 
\A Match any alphanumeric character (a-z, A-Z, 0-9) 
\b Match a backspace (chr(8)) 
\c? Match a control character, based on the next character. For example, \cM matches ctrl-M. 
\d Match any digit (0-9) 
\D Match any non-digit 
\e Match an escape character (chr(27)) 
\l (lower-L) Match any lower-case letter (a-z) 
\n Match a newline (@LF, chr(10)). 
\N Match either a newline or a carriage return, but not both in sequence. Try \N? for that. 
\p Match any punctuation character. 
\r Match a carriage return (@CR, chr(13)). 
\s Match any whitespace character: Chr(9) through Chr(13) which are Horizontal Tab, Line Feed, Vertical Tab, Form Feed, and Carriage Return, and the standard space ( Chr(32) ). 
\S Match any non-whitespace character 
\t Match a tab character. 
\u Match any capital letter (A-Z) 
\w Match any "word" character: a-z, A-Z or underscore (_) 
\W Match any non-word character 
\x Match any hexadecimal character. 
\0### Match the ascii character whose code is given. Can be up to 3 digits. 
\0x## Match the ascii character whose code is given in hexadecimal. Can be up to 2 digits. 
\# Position. Record the current character location in the test string into the returned content array. 
\< Match beginning of word. 
\> Match end of word. 

Repeating Characters

{x} Repeat the previous character, set or group exactly x times. 
{x,} Repeat the previous character, set or group at least x times. 
{,x} Repeat the previous character, set or group at most x times. 
{x, y} Repeat the previous character, set or group between x and y times, inclusive. 
* Repeat the previous character, set or group 0 or more times. Equivalent to {0,} 
+ Repeat the previous character, set or group 1 or more times. Equivalent to {1,} 
? The previous character, set or group may or may not appear. Equivalent to {0, 1} 
? (after a repeating character) Find the smallest match instead of the largest. 

"test" or "pattern" parameters cannot be a binaryString.


 

Related

StringInStr, StringRegExpReplace 
 

Example

Local $sPattern, $sTest, $vResult, $nFlag

$sPattern = InputBox("StringRegExp Sample", "What is the pattern to test?")
$sTest = InputBox("StringRegExp Sample", "What is the line to test?")
$vResult = StringRegExp($sTest, $sPattern)
Select
Case @Error = 2 
 ; Error.  The pattern was invalid.  $vResult = position in $sPattern where error occurred.
Case @Error = 0
   if @Extended  Then
; Success.  Pattern matched.  $vResult matches @Extended
   Else
; Failure.  Pattern not matched.  $vResult = ""
   EndIf
EndSelect

$sPattern = InputBox("StringRegExp Sample", "What is the pattern to test?")
$sTest = InputBox("StringRegExp Sample", "What is the line to test?")
$nFlag = InputBox("StringRegExp Sample", "What flag to use?  0 - true/false, 1 - single pattern array return, 3 - global pattern array return")
$vResult = StringRegExp($sTest, $sPattern, $nFlag)
Select
Case @Error = 1 
; Error.  Flag is bad.  $vResult = ""
Case @Error = 2 
 ; Error.  The pattern was invalid.  $vResult = position in $sPattern where error occurred.
Case @Error = 0
   if @Extended  Then
; Success.  Pattern matched.  $vResult has the text from the groups or true (1), depending on flag. 
   Else
; Failure.  Pattern not matched.  $vResult = "" or false (0), depending on flag.
   EndIf
EndSelect
Edited by thatsgreat2345


I'm such a dope :\ Thanks man.
