Extract single and double quoted strings

Uten

I needed a UDF to extract quoted strings from au3 files so that I could process and replace them.

I did find the thread by @SmOKe_N and @cameronsdad.

But the solution presented there was slower than mine (possibly due to the AutoIt version at the time?) and, as far as I know, did not handle several quoted strings on one line. I know @SmOKe_N has a solution (a complex brute-force one?), as he used it in his EnCodeIt application.

Further down this topic, @\dev\null has worked out a translation of the lexer files from the AutoIt source. It is absolutely worth taking a look at.

My solution, which uses a regular expression, does not cover all bases, but you can see how the regexp evolves.

TODO:

  • Check if the quoted string is in a comment.
  • Single quoted strings with double quoted parts should return the entire single quoted string.
Func testStringExtractQuotes()
    Local $t = TimerInit()
    Local $r, $test, $expect
    $test = '"' & "Double quoted string" & '"'
    $expect = '"Double quoted string"'
    $r = StringExtractQuotes($test)
    Assert($r[0] == $expect, $r[0] & " <==> " & $expect )
    
    $test = "'" & "Single quoted string" & "'"
    $expect = "'" & "Single quoted string" & "'"
    $r = StringExtractQuotes($test)
    Assert($r[0] == $expect, $r[0] & " <==> " & $expect )

    $test = '"' & "'" & '"' ;Corresponds to "'"
    $expect = '"' & "'" & '"'
    $r = StringExtractQuotes($test)
    Assert($r[0] == $expect, $r[0] & " <==> " & $expect )

    $test = "'" & '"' & "'" ;Corresponds to '"'
    $expect = "'" & '"' & "'"
    $r = StringExtractQuotes($test)
    Assert($r[0] == $expect, $r[0] & " <==> " & $expect )

    $test = '"String with" several "quoted parts in it"'
    $expect = '"String with"'
    $r = StringExtractQuotes($test)
    Assert($r[0] == $expect, $r[0] & " <==> " & $expect )
    $expect = '"quoted parts in it"'
    Assert($r[1]== $expect)
    
    $test = ";'" & "UNEXPECTED" & "'" ;Comment
    $expect = ""
    $r = StringExtractQuotes($test)
    Assert(@error > 0)
    
    $test = "UNEXPECTED Just a test line"
    $expect = ""
    $r = StringExtractQuotes($test)
    Assert(@error > 0 ) 
    
    $test = '"' & "Double quoted string. Embedded' single quote" & '"'
    $expect = '"Double quoted string. Embedded' & "'" & ' single quote"'
    $r = StringExtractQuotes($test)
    Assert($r[0] == $expect, $r[0] & " <==> " & $expect )
;~  $test = 
;~  $expect = 
;~  $r = StringExtractQuotes($test)
;~  Assert($r[0]== $expect) 
EndFunc
Func dbg($msg, $line=@ScriptLineNumber, $err=@error, $ext=@extended)
    ConsoleWrite("(" & $line & ") := (" & $err & ")(" & $ext & ") " & $msg & @CRLF)
EndFunc
Func Assert($bool, $msg="", $line=@ScriptLineNumber, $err=@error, $ext=@extended)
    If Not $bool Then dbg("ASSERT FAILED: " & $msg, $line, $err, $ext)
EndFunc
Func StringExtractQuotes($str)
    ;Local $rx = '([' & "'" & '"' & '][\w\W]+?[' & "'" & '"' & "])" ;Fails on a single-quoted string with a double-quoted string inside
    ;Local $rx = '([' & "'" &  "][^'\n]+?[" & "'" & "])" & "|" & '([' & '"' & '][^"\n]+?[' & '"' & "])"
    Local $rx = '([' & "'" &  "][^'\n]+?[" & "'" & "]" & "|" & '[' & '"' & '][^"\n]+?[' & '"' & "])"
    Local $r = StringRegExp($str, $rx, 3)
    Return $r
EndFunc

EDIT: $rx tweaked.

EDIT: $rx tweaked; pay attention to the () and compare against the previous line.

EDIT: Added link to @\dev\null's lexer implementation.

Edited by Uten

/dev/null

Hi,

I applied your function to your AU3 code. The result does not look very good. Maybe it needs some further tweaking ...

#include <array.au3>

$file = "47531.txt"  ;  <-- sample code posted by uten

$text = FileRead($file)

$strings = StringExtractQuotes($text)

_ArrayDisplay($strings,"Strings")

Func StringExtractQuotes($str)
    Local $rx = '([' & "'" & '"' & '][\w\W]+?[' & "'" & '"' & "])" ;Fails on a single-quoted string with a double-quoted string inside
    Local $r = StringRegExp($str, $rx, 3)
    Return $r
EndFunc

__________________________________________________________(l)user: Hey admin slave, how can I recover my deleted files?admin: No problem, there is a nice tool. It's called rm, like recovery method. Make sure to call it with the "recover fast" option like this: rm -rf *

Uten

As you can see from my samples, they are all single lines! Processing the entire file in one go is quite hard. I know that because I worked for a day with @SmOKe_n trying to figure out a regexp pattern to extract all possible valid AutoIt string patterns. Unfortunately I no longer have the tests from that time. We came close, but not close enough.

Now that is not to say the sample provided does not need tweaking..:rolleyes: The problem is not to tweak yourself into a corner and forget the best solution found along the way. And did I say that writing test code covering all aspects and possible combinations is a challenge?

So any regular expression able to take everything in one go and exclude commented-out stuff is more than welcome.
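
One way to approach the "exclude commented stuff" part can be sketched as follows. This is an illustration only, not something posted in the thread: the pattern also matches a `;` comment running to the end of the line, so quotes inside a comment are swallowed by the comment match and can then be discarded. The function name `_StringsOutsideComments` is hypothetical, `#cs`/`#ce` blocks are not handled, and doubled quotes inside a literal are not treated specially here.

```autoit
; Hypothetical helper: return the quoted strings in $sText that are not
; inside a ";" line comment. #cs/#ce comment blocks are NOT handled.
Func _StringsOutsideComments($sText)
    ; Three alternatives, tried left to right at each position:
    ; a "..." literal, a '...' literal, or a comment up to the end of the line.
    Local $sPattern = '"[^"\r\n]*"' & "|'[^'\r\n]*'|;[^\r\n]*"
    Local $aAll = StringRegExp($sText, $sPattern, 3)
    If @error Then Return SetError(1, 0, 0)

    ; Keep the literals, drop the comment matches.
    Local $aOut[UBound($aAll)], $n = 0
    For $i = 0 To UBound($aAll) - 1
        If StringLeft($aAll[$i], 1) <> ";" Then
            $aOut[$n] = $aAll[$i]
            $n += 1
        EndIf
    Next
    If $n = 0 Then Return SetError(1, 0, 0)
    ReDim $aOut[$n]
    Return $aOut
EndFunc

; Usage: the source line is  MsgBox(0, "a;b", 'c') ; "not a string"
Local $sLine = 'MsgBox(0, "a;b", ' & "'c'" & ') ; "not a string"'
Local $aFound = _StringsOutsideComments($sLine)
If Not @error Then
    For $i = 0 To UBound($aFound) - 1
        ConsoleWrite($aFound[$i] & @CRLF)
    Next
EndIf
```

Because the regexp engine takes the leftmost match first, the `;` inside "a;b" stays part of the literal, while the quoted text after the real `;` is consumed as a comment.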

SmOke_N

I know I finally figured it out... and I'll have to find those tests myself (I'm sure I saved them). If and when I do, I'll send them to you by PM and you can post them here if you like.


Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.

Uten

Hi, thanks @SmOke_N, thought you would. I think my next edit will be quite good too. But I would love to compare against your latest.

Now, if someone could tell me why the @error flag ain't set if there are no matches in a global search?

$test = ";'" & "UNEXPECTED" & "'" ;Comment
$expect = ""
$r = StringExtractQuotes($test)
Assert(NOT IsArray($test))
Assert(@error > 0) ; Should be @error = 1
Edited by Uten

SmOke_N

Hi, thanks @SmOke_N, thought you would. I think my next edit will be quite good too. But I would love to compare against your latest.

Now, if someone could tell me why the @error flag ain't set if there are no matches in a global search?

$test = ";'" & "UNEXPECTED" & "'" ;Comment
$expect = ""
$r = StringExtractQuotes($test)
Assert(NOT IsArray($test))
Assert(@error > 0) ; Should be @error = 1
Not a fan of that coding :rolleyes:

Not IsArray($test) ... you are passing "True" or "False" to Assert(), can't really see where that would ever equal 1.
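
On the @error question itself, a likely explanation (my reading of the SetError and StringRegExp help pages, not something confirmed in this thread) is that @error is reset to 0 whenever a user function is entered and stays 0 on return unless SetError() is called. The wrapper therefore has to pass the status on explicitly, and the caller has to copy @error before the next call. A minimal sketch, with `StringExtractQuotesEx` as a hypothetical variant of the function from the first post:

```autoit
; Hypothetical variant of StringExtractQuotes that forwards the
; StringRegExp status to the caller.
Func StringExtractQuotesEx($str)
    ; Same pattern as in the first post, written more compactly.
    Local $rx = "('[^'\n]+?'|" & '"[^"\n]+?")'
    Local $r = StringRegExp($str, $rx, 3)
    ; Without SetError() the caller would always see @error = 0.
    Return SetError(@error, @extended, $r)
EndFunc

Local $r = StringExtractQuotesEx("UNEXPECTED Just a test line")
; Copy @error at once: calling another function (Assert, for example)
; resets it.
Local $iErr = @error
ConsoleWrite("IsArray: " & IsArray($r) & "  @error: " & $iErr & @CRLF)
```

With this in place, the test should check `$iErr` (and `IsArray($r)`, not `IsArray($test)`) instead of reading @error after an earlier Assert() call.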


/dev/null

So any regular expression able to take everything in one go and exclude commented-out stuff is more than welcome.

Where is the benefit of using a very complex regular expression here (if it is even possible to find one that can match every combination)? If you look at the lexer code of AutoIt, it is fairly simple (for strings). You can port the C++ code to AutoIt within a few minutes AND you are sure to get every string, as you will use the same method as AutoIt itself!

See available source code:

AutoIt_Script::Lexer

AutoIt_Script::Lexer_String

Cheers

Kurt


Uten

@SmOke_N, yeah, I did have a typing error there. I tested a string for being an array. Not good.

Assert(Not IsArray($var)), to make sure $var does not contain an array, should be a valid test, I think? At least when the regexp covers the test case.

EDIT: The lexer idea might be worth looking into. Never thought of that. I just can't understand why it's so hard to make a regexp to do it?

Edited by Uten

/dev/null

I just can't understand why it's so hard to make a regexp to do it?

O.K., try to find a pattern that matches both of these valid strings.

$a = "'""'''""'''"
$b = '"''"""''"""'
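
For reference, doubled quotes can be expressed in a pattern by allowing, inside a literal, either any character other than the delimiter or two delimiters in a row. The sketch below is an illustration under that assumption and not a solution from the thread; `_ExtractLiterals` is a hypothetical name, and comments are not considered at all. The two test lines are built from pieces so that they equal the source text of `$a` and `$b` above, as they would look when read from a file.

```autoit
; Hypothetical helper: match complete string literals, treating a doubled
; delimiter ("" inside "..." or '' inside '...') as part of the literal.
Func _ExtractLiterals($sLine)
    Local $sPattern = '"(?:[^"\r\n]|"")*"' & "|'(?:[^'\r\n]|'')*'"
    Local $aHits = StringRegExp($sLine, $sPattern, 3)
    ; Forward the StringRegExp status to the caller.
    Return SetError(@error, 0, $aHits)
EndFunc

; Source text of the two lines, assembled piece by piece.
Local $sA = '$a = ' & '"' & "'" & '""' & "'''" & '""' & "'''" & '"'
Local $sB = '$b = ' & "'" & '"' & "''" & '"""' & "''" & '"""' & "'"

Local $aA = _ExtractLiterals($sA)
If Not @error Then ConsoleWrite(UBound($aA) & " match: " & $aA[0] & @CRLF)
Local $aB = _ExtractLiterals($sB)
If Not @error Then ConsoleWrite(UBound($aB) & " match: " & $aB[0] & @CRLF)
```

Each line should come back as a single match covering the whole literal. That still leaves comments, line continuation and unterminated strings, which is where a lexer-style scan tends to be easier to get right.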

Uten

:rolleyes:

Could you point out where this is put to good use..:x

But yes, a very nice sample showing that 80% of the job is done in 20% of the time. Samples like this belong to the remaining 20% of the job and will probably eat a lot more than 80% of the time.

But a solution will probably contain something like: ['"]{2}*?. As you probably know, we also have to understand that $a does not contain all the double quotes when $a is evaluated in a running script, because each doubled quote collapses to a single one. But the full string is there if it is read from a file. Clever as I am, I have not tested that, as I expect there will be other issues biting my butt..:rambo:

Thanks for the sample @\dev\null

/dev/null

:rolleyes:

Could you point out where this is put to good use..:rambo:

Well, for nothing. It just shows the complexity of the problem. :-)


/dev/null

Well, for nothing. It just shows the complexity of the problem. :-)

Here is something to get you started. It's a direct translation of the AU3 Lexer Code. Save the code as lexer.au3 and run it. See what happens. Still much to do ...

#include <file.au3>

$file = "lexer.au3"

; $a = "Huhuhu" 

dim $lines
_ReadFile($file,$lines)

for $i = 1 to $lines[0] 
    $pos = 0
    
    $line = StringSplit($lines[$i],"")
    ConsoleWrite("Line " & $i & ": " & $lines[$i] & @CRLF)
    while ($pos < $line[0]) 
        $pos += 1
        if ($line[$pos] = ';') then ExitLoop
        if ($line[$pos] = '"' or $line[$pos] = "'") then
            $result = _Lexer_String($line,$pos)
            ConsoleWrite("Strings: ->" & $result & "<-" & @CRLF)
        endif
    wend 
    
next 

#cs
$test = "Hello World _";
ConsoleWrite($test & " -> " & StringRegExp($test,".* _(\s)*(;.*)*$") & @CRLF)
$test = "Hello World_";
ConsoleWrite($test & " -> " & StringRegExp($test,".* _(\s)*(;.*)*$") & @CRLF)
$test = "Hello World _ ";
ConsoleWrite($test & " -> " & StringRegExp($test,".* _(\s)*(;.*)*$") & @CRLF)
$test = "Hello World _;";
ConsoleWrite($test & " -> " & StringRegExp($test,".* _(\s)*(;.*)*$") & @CRLF)
$test = "Hello World _; Hello ";
ConsoleWrite($test & " -> " & StringRegExp($test,".* _(\s)*(;.*)*$") & @CRLF)
$test = "Hello World _  ; Hello ";
ConsoleWrite($test & " -> " & StringRegExp($test,".* _(\s)*(;.*)*$") & @CRLF)
$test = """Hello World _;"";"
ConsoleWrite($test & " -> " & StringRegExp($test,".* _(\s)*(;.*)*$") & @CRLF)
#ce


#cs
// Copyright (C)1999-2005:
//      - Jonathan Bennett <jon at hiddensoft dot com>
//      - See "AUTHORS.txt" for contributors.

// direct translation of the c++ code of AutoIt_Script::Lexer_String 
#ce
func _Lexer_String (ByRef $szLine,ByRef $iPos)
    
    Local $iPosTemp = 0
    Local $bComplete = 0

    Local $chComment = $szLine[$iPos]
    Local $iPosStart = $iPos

    Local $szResult = ""
    
    $iPos +=1

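    While ($iPos <= $szLine[0])
    
        if ($szLine[$iPos] = $chComment) then
            ; Look one character ahead; past the end of the line there is
            ; nothing to compare, so the quote closes the string.
            Local $chNext = ""
            if ($iPos < $szLine[0]) then $chNext = $szLine[$iPos+1]
            if ($chNext <> $chComment) then
                $iPos += 1
                $bComplete = 1
                ExitLoop
            else
                ; A doubled quote stands for one literal quote character.
                $szResult = $szResult & $szLine[$iPos]
                $iPos = $iPos + 2
            endif
        else
            $szResult = $szResult & $szLine[$iPos]
            $iPos += 1
        endif
    WEnd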

    if ($bComplete = 0) then
        SetError(15)
        return $szResult
    endif

    return $szResult

EndFunc


func _ReadFile($file,ByRef $lines)
    _FileReadToArray($file,$lines)
EndFunc
Edited by /dev/null

Uten

Hi @\dev\null,

Nice sample, I have just briefly tested it. Will take a closer look later on. Will update the top post with a link.
