WaitingForZion Posted February 20, 2010 (edited)

I was working on a UDF for inclusion in the UDF standard library that breaks text up into tokens based on an array of token definitions. I'm aware that, according to good engineering principles, code should be decomposed into several functions to make it more readable. However, since this was a UDF, I did not think it would be approved if I broke it up into smaller functions, so I decided to write it with only one subroutine.

You may think I'm inept at coding for saying this, but I'm starting to have trouble understanding my own code. I have nearly succeeded in creating a tokenizer that recognizes specified symbols, sets of characters matching a given regular expression, and quoted strings. My goal was to create a function that could break up structured information into tokens for easy processing. Unfortunately, the code has become so unmanageable and disorganized that I can no longer refine it through minor modifications, so I've chosen not to complete it. Nevertheless, the function does work with the proper parameters.

_Tokenize() takes three arguments:
$sText - the text to be tokenized.
$aTokenTypes - the array of token definitions.
$aTokens - the destination array to which the new tokens are added, each as [Type, Text].

Each token definition is an array of five elements:
1. The type name of the token.
2. Whether the token is matched against a single given character or against a regular expression describing the kind of character.
3. The character or regular expression.
4. Whether the token consists of a single character, or of multiple characters that each belong to the class given by the specified character or regular expression.
5. Whether to accept the following characters literally, as a string of the specified token type, until another character matching the same token definition is encountered.
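As an illustration of the five-element layout described above, a token definition could be packed with a small helper. This is only a sketch under assumptions: `_MakeTokenDef` and its parameter names are hypothetical and are not part of the posted UDF.

```autoit
; Hypothetical helper (not part of the original UDF): packs the five
; fields of one token definition into an array, in the order listed above.
Func _MakeTokenDef($sTypeName, $bIsRegExp, $sMatch, $bIsSingle, $bIsLiteralDelim)
    Local $aDef[5]
    $aDef[0] = $sTypeName       ; 1. type name of the token
    $aDef[1] = $bIsRegExp       ; 2. True: $sMatch is a regular expression; False: a literal character
    $aDef[2] = $sMatch          ; 3. the character or regular expression
    $aDef[3] = $bIsSingle       ; 4. True: one character per token; False: runs of matching characters
    $aDef[4] = $bIsLiteralDelim ; 5. True: the character opens and closes a literal string
    Return $aDef
EndFunc

; Example: three definitions collected into the array passed as $aTokenTypes.
Local $aTokenTypes[3]
$aTokenTypes[0] = _MakeTokenDef("comma", False, ",", True, False)
$aTokenTypes[1] = _MakeTokenDef("word", True, "[[:alnum:]]", False, False)
$aTokenTypes[2] = _MakeTokenDef("string", False, '"', False, True)
```

A helper like this would mainly save the repetitive element-by-element assignments; the resulting array has the same shape as a hand-built definition.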
I've never thoroughly studied or understood existing algorithms for tokenizing or parsing, so this process came from my own limited and faulty idea of how one would work. Any feedback, ideas, etc. will be appreciated.

```autoit
#include <Array.au3>

; Returned by _CharIdentifyTokenType() when a character matches no definition.
Global Const $NO_TOKEN = -1

; Note: this relies on _ArrayAdd() storing an array argument as ONE element,
; as it did in 2010. Newer Array.au3 versions may need the
; $ARRAYFILL_FORCE_SINGLEITEM flag to get the same behaviour.

; Splits $sText into [Type, Text] tokens and appends them to $aTokens.
; Returns -1 and sets @error on an unterminated literal.
Func _Tokenize($sText, $aTokenTypes, ByRef $aTokens)
    $iCharCount = StringLen($sText)
    $vLastType = 0
    $sLastChar = 0
    $sCurrentToken = ""
    $bLastIsSingle = False
    $bIsSingle = False
    $bHoldLastChar = False
    $bInLiteral = False
    $bStartLiteral = False
    $bLastStartLiteral = False
    $sLiteralText = ""
    Local $aNewToken[2]

    For $iCharIndex = 1 To $iCharCount
        $sChar = StringMid($sText, $iCharIndex, 1)
        $vType = _CharIdentifyTokenType($sChar, $aTokenTypes, $bIsSingle, $bStartLiteral)
        If $iCharIndex > 1 Then
            ; Re-classify the previous character unless it is being held.
            If Not $bHoldLastChar Then
                $sLastChar = StringMid($sText, $iCharIndex - 1, 1)
                $vLastType = _CharIdentifyTokenType($sLastChar, $aTokenTypes, $bLastIsSingle, $bLastStartLiteral)
            EndIf
            ; Inside a literal, collect everything that is not the delimiter.
            If $bInLiteral And $bStartLiteral <> $bLastStartLiteral Then
                $sLiteralText &= $sChar
            Else
                $bHoldLastChar = False
            EndIf
            ; A type change (or a repeated single-character token) ends the current token.
            If ($vType <> $vLastType Or ($vType == $vLastType And $bLastIsSingle)) And $vType <> $NO_TOKEN Then
                If Not $bInLiteral Then
                    If Not $bLastStartLiteral Then
                        $aNewToken[0] = $vLastType
                        $aNewToken[1] = $sCurrentToken
                        _ArrayAdd($aTokens, $aNewToken)
                    EndIf
                    ; An opening delimiter starts a new literal.
                    If $bStartLiteral Then
                        $sLiteralText = ""
                        $sLastChar = $sChar
                        $vLastType = $vType
                        $bLastStartLiteral = True
                        $bInLiteral = True
                        $bHoldLastChar = True
                    EndIf
                Else
                    ; A closing delimiter of the same type ends the literal.
                    If ($bStartLiteral And $bLastStartLiteral) And ($vType = $vLastType) Then
                        $aNewToken[0] = $vType
                        $aNewToken[1] = $sLiteralText
                        _ArrayAdd($aTokens, $aNewToken)
                        $bInLiteral = False
                        $bHoldLastChar = False
                    EndIf
                EndIf
                $sCurrentToken = ""
            ElseIf $vType = $NO_TOKEN Then
                $bHoldLastChar = True
            EndIf
            ; Last character of the input: flush whatever is pending.
            If $iCharIndex = $iCharCount And $vType <> $NO_TOKEN Then
                If $bInLiteral Then
                    Return SetError(1, 0, -1) ; unterminated literal
                EndIf
                If ($bStartLiteral And $bLastStartLiteral) And ($vType = $vLastType) Then
                    $aNewToken[0] = $vType
                    $aNewToken[1] = $sLiteralText
                    _ArrayAdd($aTokens, $aNewToken)
                    Return
                EndIf
                $sCurrentToken &= $sChar
                $aNewToken[0] = $vType
                $aNewToken[1] = $sCurrentToken
                _ArrayAdd($aTokens, $aNewToken)
                Return
            EndIf
            If $vType <> $NO_TOKEN Then
                $sCurrentToken &= $sChar
            EndIf
        Else
            ; First character of the input.
            If $vType <> $NO_TOKEN Then
                If $iCharCount = 1 Then
                    If $bStartLiteral Then
                        Return SetError(2, 0, -1) ; lone literal delimiter
                    EndIf
                    $aNewToken[0] = $vType
                    $aNewToken[1] = $sChar
                    _ArrayAdd($aTokens, $aNewToken)
                Else
                    $sCurrentToken &= $sChar
                EndIf
            EndIf
        EndIf
    Next
EndFunc

; Returns the type name of the first definition that matches $sChar, or
; $NO_TOKEN. $bIsSingle and $bStartLiteral receive elements 4 and 5 of that definition.
Func _CharIdentifyTokenType($sChar, $aTokenTypes, ByRef $bIsSingle, ByRef $bStartLiteral)
    For $aType In $aTokenTypes
        If $aType[1] = True And StringRegExp($sChar, $aType[2]) Then
            $bIsSingle = $aType[3]
            $bStartLiteral = $aType[4]
            Return $aType[0]
        ElseIf $aType[2] == $sChar Then
            $bIsSingle = $aType[3]
            $bStartLiteral = $aType[4]
            Return $aType[0]
        EndIf
    Next
    $bIsSingle = False
    $bStartLiteral = False
    Return $NO_TOKEN
EndFunc

; --- Example ---
Global $tokenDefs[5]
Global $token1[5]
Global $token2[5]
Global $token3[5]
Global $token4[5]
Global $token5[5]

$token1[0] = "open_paren"
$token1[1] = False
$token1[2] = "("
$token1[3] = True
$token1[4] = False

$token2[0] = "close_paren"
$token2[1] = False
$token2[2] = ")"
$token2[3] = True
$token2[4] = False

$token3[0] = "comma"
$token3[1] = False
$token3[2] = ","
$token3[3] = True
$token3[4] = False

$token4[0] = "single_alnum_word"
$token4[1] = True
$token4[2] = "[[:alnum:]]"
$token4[3] = False
$token4[4] = False

$token5[0] = "string"
$token5[1] = False
$token5[2] = '"'
$token5[3] = False
$token5[4] = True

$tokenDefs[0] = $token1
$tokenDefs[1] = $token2
$tokenDefs[2] = $token3
$tokenDefs[3] = $token4
$tokenDefs[4] = $token5

Global $tokens[1]
If _Tokenize('sandwich(cheese, "confusing solomy", "Another string?", mayonaze, mustard)', $tokenDefs, $tokens) < 0 Then
    ConsoleWrite(@error & @CRLF)
EndIf

; Element 0 is the empty slot created by the declaration above, so start at 1.
For $i = 1 To UBound($tokens) - 1
    $token = $tokens[$i]
    ConsoleWrite("Type: " & $token[0] & @CRLF & "Text: " & $token[1] & @CRLF & @CRLF)
Next
```

Edited February 20, 2010 by WaitingForZion
Spoiler "This then is the message which we have heard of him, and declare unto you, that God is light, and in him is no darkness at all. If we say that we have fellowship with him, and walk in darkness, we lie, and do not the truth: But if we walk in the light, as he is in the light, we have fellowship one with another, and the blood of Jesus Christ his Son cleanseth us from all sin. If we say that we have no sin, we deceive ourselves, and the truth is not in us. If we confess our sins, he is faithful and just to forgive us our sins, and to cleanse us from all unrighteousness. If we say that we have not sinned, we make him a liar, and his word is not in us." (I John 1:5-10)
WaitingForZion (Author) Posted February 21, 2010

I guess everyone thinks it's garbage. Well, that's ok. But can I have some kind of feedback?
Melba23 (Moderators) Posted February 21, 2010

WaitingForZion,

OK, I will bite! I can see what it does (and it seems to do it quite adequately) but why does it do it? What can it be used for? What lacuna in my coding life is it looking to fill?

Apologies if that sounds negative, but answering those questions might elicit some response. At the moment I can imagine most forum members looking at your post and thinking "This is a solution in search of a problem". So give us the problem!

M23

Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind

My UDFs:
ArrayMultiColSort - Sort arrays on multiple columns
ChooseFileFolder - Single and multiple selections from specified path treeview listing
Date_Time_Convert - Easily convert date/time formats, including the language used
ExtMsgBox - A highly customisable replacement for MsgBox
GUIExtender - Extend and retract multiple sections within a GUI
GUIFrame - Subdivide GUIs into many adjustable frames
GUIListViewEx - Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx - Check/clear parent and child checkboxes in a TreeView
Marquee - Scrolling tickertape GUIs
NoFocusLines - Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify - Small notifications on the edge of the display
Scrollbars - Automatically sized scrollbars with a single command
StringSize - Automatically size controls to fit text
Toast - Small GUIs which pop out of the notification area
Jos (Developers) Posted February 21, 2010

Melba23 wrote: "What can it be used for? What lacuna in my coding life is it looking to fill?"

To translate it for our average member: What game does this Bot handle?

Seriously: I have the same question: give us a real-life example where this script could be useful.

Jos

Live for the present, Dream of the future, Learn from the past.
WaitingForZion (Author) Posted February 21, 2010 (edited)

I explain that in my improved version: Alexi 1.0

Edited February 21, 2010 by WaitingForZion
dani Posted February 21, 2010 (edited)

No, you didn't, actually. I still don't see any example where the library is put to use. I do see an example.au3 in your Alexi.zip in that thread, by the way, though IMO it's not a very extensive or useful example. Maybe other people are interested in using it though, and kudos for also including a help file in the .zip (and for it using OO).

Edited February 21, 2010 by d4ni
Skizmata Posted February 21, 2010

I have no idea why you need to tokenize data but apparently people do. I assume for good reasons I don't yet understand. Apparently this is an important part of lexical analysis. I have seen tokenizing Perl modules on CPAN and tokenization code for Java and C++. Maybe after I read the Wikipedia page for lexical analysis I will get it.

AutoIt changed my life.
Fulano Posted February 22, 2010

Skizmata wrote: "I have no idea why you need to tokenize data but apparently people do. I assume for good reasons I don't yet understand. Apparently this is an important part of lexical analysis. I have seen tokenizing perl modules on cpan and tokenization code for java and c++. Maybe after I read the Wikipedia page for lexical analysis I will get it."

Tokenizing is mainly good for breaking up text. Some examples I can come up with off the top of my head are:
Configuration files
Preprocessing source code
Command line calculators
Script Engines <- heaven forbid, implementing a script engine inside AutoIt would be horribly inefficient
Etc.

Morgen
#fgpkerw4kcmnq2mns1ax7ilnd
open (Q, $0); while ($l = <Q>){if ($l =~ m/^#.*/){$l =~ tr/a-z1-9#/Huh, Junketeer's Alternate Pro Ace /; print $l;}}close (Q);
[code] tag ninja!
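To make one of the uses listed above concrete, here is a minimal sketch of tokenizing a command line calculator expression. It does not use the _Tokenize() UDF from this thread; it relies only on the built-in StringRegExp() in global-match mode (flag 3), and the sample expression and variable names are made up for illustration.

```autoit
; Sketch only: split a calculator expression into number and symbol tokens.
; Flag 3 makes StringRegExp() return an array of all matches.
Local $sExpr = "12 + 3*(4-1)"
Local $aTokens = StringRegExp($sExpr, "\d+|[-+*/()]", 3)
If @error Then
    ConsoleWrite("No tokens found" & @CRLF)
Else
    ; Whitespace matches neither alternative, so it is skipped.
    For $i = 0 To UBound($aTokens) - 1
        ConsoleWrite($aTokens[$i] & @CRLF)
    Next
EndIf
```

Each number, operator and parenthesis should come out as its own token, which is the form a calculator's parser would then consume.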
Raik Posted November 19, 2010 (edited)

Melba23 wrote: "This is a solution in search of a problem". So give us the problem!

@Melba23 & @Jon

Any syntax highlighting needs tokenizing, and any source code editor needs to tokenize the code to sort out what is code, what is a comment, what is commented-out code, etc.

My idea is to shorten the code; as I have written, without proper tokenizing it's impossible to do a perfect job.

I have played around:

```autoit
#include <Array.au3>

; Characters used to build the short replacement names.
Global Const $VFName = StringSplit("abcdefghijklmnopqrstuvwxyz_0123456789", '', 2) ; 0 - 36 elements
Global $VFcount = 0

$File = FileOpenDialog("", @ScriptDir, "Scripts (*.au3)", 5)
If @error Then
    MsgBox(0, "Error", @error)
    Exit
EndIf
$File = StringReplace($File, "|", @CRLF)
$Source = FileRead($File)

; Planned steps:
; insert used functions from includes
; remove comments
; identify and protect strings from changes
; count and replace variables
; count and replace function names
; hex to dec, if shorter
; replace constants with content, if shorter than the constant's name
; for constants and params eval BitOr, BitAnd, ...
; reduce whitespace as far as possible

; Returns a 2D array: [function name, number of calls, new short name],
; sorted by column 0 in descending order.
Func ReplaceFuncs()
    Dim $Vars[1][3]
    Local $l = 0
    $VFcount = 0
    $functions = StringRegExp($source, "(?i)func\s+(\w*)\(.*\)", 3)
    For $element In $functions
        ReDim $Vars[$l + 1][3]
        $count = StringRegExp($source, "[^$]" & $element & "\s*\(", 3)
        $Vars[$l][0] = $element
        $Vars[$l][1] = UBound($count)
        $l += 1
    Next
    _ArraySort($Vars, 1, 0, 0, 0)
    For $i = 0 To $l - 1
        $Vars[$i][2] = getName(True)
    Next
    Return $Vars
EndFunc

; Returns a 2D array: [variable name, number of uses, new short name],
; sorted by number of uses in descending order.
Func ReplaceVariables()
    Dim $Vars[1][3]
    Local $l = 0
    $VFcount = 0
    $variables = StringRegExp($source, "\$(\w*)", 3)
    For $element In $variables
        ; Already known: just increase its counter.
        For $i = 0 To $l - 1
            If StringCompare($Vars[$i][0], $element) = 0 Then
                $Vars[$i][1] += 1
                ContinueLoop 2
            EndIf
        Next
        ReDim $Vars[$l + 1][3]
        $Vars[$l][0] = $element
        $Vars[$l][1] = 1
        $l += 1
    Next
    _ArraySort($Vars, 1, 0, 0, 1)
    For $i = 0 To $l - 1
        $Vars[$i][2] = getName()
    Next
    Return $Vars
EndFunc

; Builds the next short name from $VFName and advances the global counter.
Func getName($type_func = False)
    ; For function names, skip counter values that map to a leading digit.
    If $type_func And Mod($VFcount, 37) = 27 Then $VFcount += 10
    $counter = $VFcount
    $mod = Mod($counter, 37)
    $var = $VFName[$mod]
    While $counter > 36
        $counter = ($counter - $mod) / 37 - 1
        $var &= $VFName[Mod($counter, 37)]
    WEnd
    $VFcount += 1
    Return $var
EndFunc

_ArrayDisplay(ReplaceFuncs(), "Funcs")
_ArrayDisplay(ReplaceVariables(), "Variables")
; $Source=StringStripCR($Source)
```

TODO:
* prevent numerics in the first place of new name for functions

Edited November 20, 2010 by Raik

AutoIt-Syntaxsheme for Proton & Phase5 * Firefox Addons by me (resizable Textarea 0.1d) (docked JS-Console 0.1.1)
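The point about telling code from comments can be illustrated with a small sketch. It assumes only double-quoted strings (with "" as an embedded quote) and `;` line comments, ignores single-quoted strings and #cs/#ce blocks, and uses made-up sample input; it is not taken from Raik's script.

```autoit
; Sketch only: find string literals and comments in one line of AutoIt code.
; The string alternative comes first, so a ';' inside quotes is not
; mistaken for the start of a comment.
Local $sCode = '$a = "x ; not a comment" ; real comment'
Local $sPattern = '"(?:[^"]|"")*"|;[^\r\n]*'
Local $aFound = StringRegExp($sCode, $sPattern, 3)
If Not @error Then
    For $i = 0 To UBound($aFound) - 1
        If StringLeft($aFound[$i], 1) = '"' Then
            ConsoleWrite("String:  " & $aFound[$i] & @CRLF)
        Else
            ConsoleWrite("Comment: " & $aFound[$i] & @CRLF)
        EndIf
    Next
EndIf
```

A plain search for `;` would cut the string in half here, which is the kind of mistake a tokenizing pass is meant to avoid before comments are stripped or names are replaced.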