dickep

Parsing .... again

8 posts in this topic

OK, I have looked and looked on this forum for help with parsing text. My poor old feeble brain just does not get the StringRegExp stuff.

Here is what I have in mind.

I want to make a source code cross-reference tool (mainly for AutoIt, but I want to be able to use it with other languages). For that I need to read each line of code and strip out everything that is not a word, keyword, or variable: commas, arithmetic signs, parens, etc. Having said that, I am having trouble with the syntax of StringRegExp. Any further help would be greatly appreciated.

P.S. I would LIKE to make a module to call into to do this per line (can store the results in an array if necessary).

Just can't get it in my head how to use the call.

E


I have attached a snippet of what I am attempting to do. You will notice that (1) it skips any comment (whether at the beginning of a line or later in a line of code) and (2) it does not show the numbers added.

Hope this will help you understand what I would LIKE to accomplish.

Thanks


Well, I don't see a snippet over here, but I'd just read the text into a string, break the string into an array at CRLF (chars 13 and 10), then make a second array of equal length.

Then iterate through Array1 index by index, take each string, walk its length, and rebuild it character by character:

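$strOut = ""
For $x = 1 To StringLen($strIN)
    ; keep only letters (A-Z, a-z) and digits (0-9)
    If StringIsAlNum(StringMid($strIN, $x, 1)) Then
        ; AutoIt joins strings with &, not +
        $strOut = $strOut & StringMid($strIN, $x, 1)
    EndIf
Next
; $y is the index of the current line in Array1
$Array2[$y] = $strOut

That's a rough sketch for ya. Or, you can use the Replace functions to remove the unnecessary characters. Regex for stripping characters out of strings isn't such a good idea in my experience. Too complex.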
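The splitting step described at the start of this post has no code, so here is a minimal sketch of it together with the per-line rebuild. The sample text and the names `$sText`, `$aLines`, and `$aClean` are made up for illustration. It also keeps spaces, which is an assumption on my part: without them the surviving words would run together.

```autoit
; Hypothetical two-line input; in practice this could come from FileRead.
Local $sText = '$i = 1' & @CRLF & '$line = FileReadLine($hFileHandle1)'

; Flag 1 makes StringSplit treat @CRLF as one delimiter instead of
; splitting on @CR and @LF separately. Element [0] holds the line count.
Local $aLines = StringSplit($sText, @CRLF, 1)

; Second array of equal length for the cleaned lines.
Local $aClean[$aLines[0] + 1]
$aClean[0] = $aLines[0]

For $y = 1 To $aLines[0]
    Local $sOut = ''
    For $x = 1 To StringLen($aLines[$y])
        Local $sChar = StringMid($aLines[$y], $x, 1)
        ; Keep letters, digits, and spaces; drop everything else.
        If StringIsAlNum($sChar) Or $sChar = ' ' Then $sOut &= $sChar
    Next
    $aClean[$y] = $sOut
Next

For $y = 1 To $aClean[0]
    ConsoleWrite($y & ': ' & $aClean[$y] & @CRLF)
Next
```

Note that this drops the `$` and `@` prefixes too, so variables and macros can no longer be told apart from keywords afterwards.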


OK, since I screwed up with my last post (no attachment), I will put it here:

***** Snippet to process ******

*******************************

1 #include <Array.au3>

2 $i = 1

3 While 1

4

5 $line = FileReadLine($hFileHandle1)

6 If @error = -1 Then ExitLoop

7 $mLineRead[$i] =$line

8 ;_ArrayDisplay($mLineRead)

9 ;MsgBox(0, "Line read:", $i & ": " & $line)

10 $mLineRead[0] = $i

11 $i= $i +1

12 redim $mLineRead[$i + 1]

13 Wend

14

14 ; now that file has been read in, we need to parse out the stuff!

15

16 _ArrayDisplay($mLineRead)

*********************************

***** Results ******************

*********************************

$i - lines 2,7,10,11,12

$line - lines 5,7

$hFileHandle1 - lines 5

$mLineRead - lines 7,10,12,16

While - lines 3

If - lines 6

FileReadLine - lines 5

redim - lines 12

_ArrayDisplay - lines 16

ExitLoop - lines 6

@error - lines 6

#include - lines 1

<Array.au3> - lines 1

**************************************

***** END OF SNIPPET ****************

This also discounts the formatting and any blank lines/spaces/tabs.

Thanks again

E
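A report shaped like the results above could be built roughly as follows. This is only a sketch under several assumptions: the sample lines are made up, the token pattern is a guess at what counts as a "word" in AutoIt, and a `Scripting.Dictionary` COM object stands in for whatever structure ends up holding the token-to-lines mapping. String literals are removed before comments so that a `;` inside quotes is not mistaken for the start of a comment; an apostrophe inside a comment could still confuse this simple approach.

```autoit
; Hypothetical sample lines standing in for a file read with FileReadLine.
Local $aSource[4] = ['$i = 1', _
        '$line = FileReadLine($hFileHandle1) ; read the next line', _
        ';MsgBox(0, "Line read:", $i & ": " & $line)', _
        '$i = $i + 1']

; Token -> comma-separated list of line numbers.
Local $oXref = ObjCreate('Scripting.Dictionary')
$oXref.CompareMode = 1 ; text compare, since AutoIt names are case-insensitive

For $n = 0 To UBound($aSource) - 1
    ; Drop double- and single-quoted string literals first.
    Local $sCode = StringRegExpReplace($aSource[$n], '"[^"]*"|''[^'']*''', '')
    ; Then drop everything from the first ; to the end of the line.
    $sCode = StringRegExpReplace($sCode, ';.*', '')

    ; Flag 3 returns every match: optional $, @ or # prefix, then a name.
    Local $aTokens = StringRegExp($sCode, '[$@#]?[A-Za-z_]\w*', 3)
    If @error Then ContinueLoop ; blank or comment-only line

    Local $sLineNo = String($n + 1)
    For $sTok In $aTokens
        If Not $oXref.Exists($sTok) Then
            $oXref.Add($sTok, $sLineNo)
        ElseIf Not StringInStr(',' & $oXref.Item($sTok) & ',', ',' & $sLineNo & ',') Then
            ; Record each line only once per token.
            $oXref.Item($sTok) = $oXref.Item($sTok) & ',' & $sLineNo
        EndIf
    Next
Next

For $sKey In $oXref.Keys()
    ConsoleWrite($sKey & ' - lines ' & $oXref.Item($sKey) & @CRLF)
Next
```

Numbers never match the pattern because a token must start with a letter, an underscore, or one of the prefixes, which lines up with the results above leaving the numbers out.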


You may be interested to know that Tidy.exe, when run with the /gd option, produces some nice documentation, including an xref report at the bottom.

This is the output for the code snippet you posted.

========================================================================================================
===  Tidy report for :C:\AutoIt3Data\Scripts\test.au3
========================================================================================================

00001    #Region ;**** Directives created by AutoIt3Wrapper_GUI ****
00002    #Tidy_Parameters=/gd
00003    #EndRegion ;**** Directives created by AutoIt3Wrapper_GUI ****
00004    #include <Array.au3>
00005    $i = 1
00006  +-While 1
00007  |    
00008  |    $line = FileReadLine($hFileHandle1)
00009  v----If @error = -1 Then ExitLoop
00010  |    $mLineRead[$i] = $line
00011  |    ;_ArrayDisplay($mLineRead)
00012  |    ;MsgBox(0, "Line read:", $i & ": " & $line)
00013  |    $mLineRead[0] = $i
00014  |    $i = $i + 1
00015  |    ReDim $mLineRead[$i + 1]
00016  +-WEnd
00017    
00018    ; now that file has been read in, we need to parse out the stuff!
00019    
00020    _ArrayDisplay($mLineRead)

======================
=== xref reports =====
======================

== User functions =================================================================================================
                          Func
Function name             Row     Referenced at Row(s)
========================= ====== ==================================================================================

#### indicates that this specific variable only occurs one time in the script.
---- indicates that this specific variable isn't declared with Dim/Local/Global/Const.

== Variables ======================================================================================================
Variable name             Dim   Used in Row(s)
========================= ===== ===================================================================================
$hFileHandle1             ----- 00008
$i                        ----- 00005 00010 00013 00014 00015
$line                     ----- 00008 00010
$mLineRead                ----- 00010 00013 00015 00020
@error                    ----- 00009

"Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the universe trying to build bigger and better idiots. So far, the universe is winning."- Rick Cook


No, I did not know about Tidy.

So, that brings some questions

- how do you learn about Tidy?

- can it also display the functions?

Thanks

However, I still am not understanding StringRegExp. Maybe I could get someone else to explain it better. I did find a "tutorial" on the forum, but it still left me puzzled.
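For what it's worth, the shape of the call may be easier to see on a single line of code. The sketch below assumes the goal is to pull names out of one line; the input line is borrowed from the snippet earlier in the thread and the pattern is just one plausible choice, not the only way to do it. The three arguments are the string to search, the pattern, and a flag that selects the return type. Flag 3 returns a 0-based array holding every match, and @error is set when nothing matches.

```autoit
; Input line borrowed from the snippet earlier in the thread.
Local $sLine = '$line = FileReadLine($hFileHandle1)'

; Pattern, piece by piece:
;   [$@]?      an optional $ (variable) or @ (macro) prefix
;   [A-Za-z_]  one letter or underscore to start the name
;   \w*        any number of further letters, digits, or underscores
Local $aTokens = StringRegExp($sLine, '[$@]?[A-Za-z_]\w*', 3)

If @error Then
    ConsoleWrite('No tokens found' & @CRLF)
Else
    For $i = 0 To UBound($aTokens) - 1
        ConsoleWrite($aTokens[$i] & @CRLF)
    Next
EndIf
```

On that line the matches should be `$line`, `FileReadLine`, and `$hFileHandle1`; the `=` and the parentheses are skipped because they never match the pattern.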


Writing a language tokenizer is not a trivial task. You don't need extraordinarily complex regular expressions so much as you need the elements and semantics of the language to fit together correctly. For example, rvalue expressions are never on the left of the assignment operator. Another example is that parentheses are evaluated right to left and from the inside out. Tokenizing may have a lot in common with what regular expressions offer, but from my limited knowledge of tokenizers, RegExp only plays a small part, and I might be wrong.

