
Parsing .... again



OK, I have looked and looked on this forum about parsing text. My poor old feeble brain just does not get the StringRegExp stuff.

Here goes with my thinking.

I want to make a source code cross-reference tool (mainly for AutoIt, but I want to be able to use it with other languages). For that I need to read each line of code and parse out everything that is not a word, keyword, or variable: commas, arithmetic signs, parens, etc. Having said that, I am having trouble with the syntax of StringRegExp. Any further help would be greatly appreciated.

P.S. I would LIKE to make a module to call into to do this per line (can store the results in an array if necessary).

Just can't get it in my head how to use the call.



I have attached a snippet of what I am attempting to do. You will notice that (1) it skips any comment (whether it starts the line or follows code later in the line) and (2) it does not show the numbers added.

Hope this will help you understand what I would LIKE to accomplish.



Well, I don't see a snippet here, but I'd just read the file into a string, break the string into an array at each CRLF (chars 13 and 10), then make a second array of equal length.

Then iterate through the first array index by index, take each string, and rebuild it character by character, keeping only letters and digits:

$strOut = ""
For $x = 1 To StringLen($strIN)
    $ch = StringMid($strIN, $x, 1)
    ; keep only A-Z, a-z and 0-9
    If StringIsAlNum($ch) Then $strOut &= $ch
Next
$Array2[$y] = $strOut

That's a rough sketch for ya. Or you can use the Replace functions to remove unnecessary characters. In my experience, regex isn't such a good idea for stripping characters from strings. Too complex.
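The Replace route mentioned above could look roughly like the following. This is only a sketch: the sample line and the punctuation list are assumptions made for illustration, and the list would need extending for real scripts.

```autoit
; Sample line (assumption): one line taken from a script being scanned.
Local $sLine = '$mLineRead[$i] = $line'

; Punctuation to blank out, separated by spaces so StringSplit can cut it up.
; This list is illustrative and not complete.
Local $aPunct = StringSplit("[ ] ( ) , = + - * / & < >", " ")

; $aPunct[0] holds the number of entries; the entries start at index 1.
For $n = 1 To $aPunct[0]
    $sLine = StringReplace($sLine, $aPunct[$n], " ")
Next

; Flag 7 = 1 (leading) + 2 (trailing) + 4 (runs of spaces between words).
$sLine = StringStripWS($sLine, 7)
ConsoleWrite($sLine & @CRLF) ; $mLineRead $i $line
```

What is left is a space-separated list of words that StringSplit can break into an array. Note that this approach does nothing about comments or quoted strings.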


OK, since I screwed up my last post (no attachment), I will put it here:

***** Snippet to process ******


1 #include <Array.au3>

2 $i = 1

3 While 1


5 $line = FileReadLine($hFileHandle1)

6 If @error = -1 Then ExitLoop

7 $mLineRead[$i] =$line

8 ;_ArrayDisplay($mLineRead)

9 ;MsgBox(0, "Line read:", $i & ": " & $line)

10 $mLineRead[0] = $i

11 $i= $i +1

12 redim $mLineRead[$i + 1]

13 Wend


14 ; now that file has been read in, we need to parse out the stuff!


16 _ArrayDisplay($mLineRead)


***** Results ******************


$i - lines 2,7,10,11,12

$line - lines 5,7

$hFileHandle1 - lines 5

$mLineRead - lines 7,10,12,16

While - lines 3

If - lines 6

FileReadLine - lines 5

redim - lines 12

_ArrayDisplay - lines 16

ExitLoop - lines 6

@error - lines 6

#include - lines 1

<Array.au3> - lines 1


***** END OF SNIPPET ****************

This also discounts the formatting and any blank lines/spaces/tabs.

Thanks again
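A per-line routine that produces results like the ones above might be sketched as follows. Everything here is an assumption for illustration: the three sample lines, the token pattern, and the use of a Scripting.Dictionary (a Windows COM object) to collect line numbers. Comment stripping is deliberately naive.

```autoit
; Hypothetical sample input: three lines similar to the snippet.
Local $aLines[3] = ['$i = 1', ';MsgBox(0, "Line read:", $i)', '$mLineRead[$i] = $line']

; Maps each token to a comma-separated list of line numbers.
Local $oXref = ObjCreate("Scripting.Dictionary")
$oXref.CompareMode = 1 ; text compare, since AutoIt names are not case-sensitive

For $n = 0 To UBound($aLines) - 1
    ; Naive: drop everything from the first ";" on. A ";" inside a quoted
    ; string would be cut off as well.
    Local $sCode = StringRegExpReplace($aLines[$n], ";.*$", "")

    ; Flag 3 returns an array of all matches; @error is set if there are none.
    Local $aTokens = StringRegExp($sCode, "[$@#]?[A-Za-z_]\w*", 3)
    If @error Then ContinueLoop

    For $t = 0 To UBound($aTokens) - 1
        Local $sKey = $aTokens[$t]
        If $oXref.Exists($sKey) Then
            $oXref.Item($sKey) = $oXref.Item($sKey) & "," & ($n + 1)
        Else
            $oXref.Add($sKey, String($n + 1))
        EndIf
    Next
Next

; Print one "name - lines ..." entry per token.
For $sName In $oXref.Keys()
    ConsoleWrite($sName & " - lines " & $oXref.Item($sName) & @CRLF)
Next
```

With this input, $i should be reported on lines 1 and 3, and the commented-out line contributes nothing. A token that appears twice on one line is listed twice, and words inside quoted strings are still picked up, so both would need extra handling in a real tool.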



You may be interested to know that Tidy.exe, when run with the /gd option, produces some nice documentation, including an xref report at the bottom.

This is the output for the code snippet you posted.

===  Tidy report for :C:\AutoIt3Data\Scripts\test.au3

00001    #Region ;**** Directives created by AutoIt3Wrapper_GUI ****
00002    #Tidy_Parameters=/gd
00003    #EndRegion ;**** Directives created by AutoIt3Wrapper_GUI ****
00004    #include <Array.au3>
00005    $i = 1
00006  +-While 1
00007  |    
00008  |    $line = FileReadLine($hFileHandle1)
00009  v----If @error = -1 Then ExitLoop
00010  |    $mLineRead[$i] = $line
00011  |    ;_ArrayDisplay($mLineRead)
00012  |    ;MsgBox(0, "Line read:", $i & ": " & $line)
00013  |    $mLineRead[0] = $i
00014  |    $i = $i + 1
00015  |    ReDim $mLineRead[$i + 1]
00016  +-WEnd
00018    ; now that file has been read in, we need to parse out the stuff!
00020    _ArrayDisplay($mLineRead)

=== xref reports =====

== User functions =================================================================================================
Function name             Row     Referenced at Row(s)
========================= ====== ==================================================================================

#### indicates that this specific variable only occurs one time in the script.
---- indicates that this specific variable isn't declared with Dim/Local/Global/Const.

== Variables ======================================================================================================
Variable name             Dim   Used in Row(s)
========================= ===== ===================================================================================
$hFileHandle1             ----- 00008
$i                        ----- 00005 00010 00013 00014 00015
$line                     ----- 00008 00010
$mLineRead                ----- 00010 00013 00015 00020
@error                    ----- 00009



No, I did not know about Tidy.

So, that brings up some questions:

- how do you learn about Tidy?

- can it also display the functions?


However, I still am not understanding StringRegExp. Maybe I could get someone else to explain it better. I did find a "tutorial" on the forum, but it still left me puzzled.
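For what it is worth, the call itself takes the text to search, the pattern, and an optional flag that decides what comes back. The sketch below shows the two flags most relevant here; the sample line and the pattern are assumptions chosen to match the snippet earlier in the thread.

```autoit
#include <Array.au3>

; Sample line (assumption) taken from the snippet in this thread.
Local $sLine = 'If @error = -1 Then ExitLoop'

; Flag 0 (the default): returns 1 if the pattern matches anywhere, else 0.
; Pattern: "@" followed by one or more word characters.
ConsoleWrite(StringRegExp($sLine, "@\w+") & @CRLF)

; Flag 3: returns a 0-based array holding every match in the string.
; Pattern: an optional $, @ or # prefix, then a letter or underscore,
; then any number of letters, digits or underscores.
Local $aWords = StringRegExp($sLine, "[$@#]?[A-Za-z_]\w*", 3)
If @error Then
    ConsoleWrite("No words found" & @CRLF)
Else
    _ArrayDisplay($aWords, "Words in line")
EndIf
```

For this line the array should hold If, @error, Then and ExitLoop. The -1 is skipped because the pattern requires a name to start with a letter or underscore.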


Writing a language tokenizer is not a trivial task. You don't need extraordinarily complex regular expressions so much as you need the elements and semantics of the language to fit together correctly. For example, rvalue expressions never appear on the left of the assignment operator. Another example is that nested parentheses are resolved from the inside out. Regular expressions have a lot in common with what a tokenizer needs, but from my limited knowledge of tokenizers, RegExp only gets you part of the way, and I might be wrong.
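One way to see where a pattern helps and where it stops is to let a single pattern consume quoted strings and comments as whole units, so that names inside them are not reported. This is a rough sketch under assumptions: the sample line and the pattern are made up for illustration, and it knows nothing about nesting or about which side of an assignment a name is on.

```autoit
; Sample line (assumption): a ";" inside a string, plus a trailing comment.
Local $sLine = 'MsgBox(0, "Line; read:", $i & ": " & $line) ; show $line'

; Alternatives are tried left to right at each position:
; double-quoted string | single-quoted string | comment | name | number
Local $sPattern = '"[^"]*"|''[^'']*''|;.*|[$@#]?[A-Za-z_]\w*|\d+'

Local $aTokens = StringRegExp($sLine, $sPattern, 3)
If Not @error Then
    For $t = 0 To UBound($aTokens) - 1
        ; Skip the matches that are strings or comments, keep the rest.
        Local $sFirst = StringLeft($aTokens[$t], 1)
        If $sFirst = '"' Or $sFirst = "'" Or $sFirst = ";" Then ContinueLoop
        ConsoleWrite($aTokens[$t] & @CRLF)
    Next
EndIf
```

For this line the loop should print MsgBox, 0, $i and $line once each. The $line inside the trailing comment is swallowed with the comment. Anything that depends on structure, such as matching parentheses, still needs logic beyond the pattern.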

