Shorten repeated characters in a string


b0x4it

You will have to write your own function that iterates over the string, counts repeated letters, and rewrites them in the form you need. AutoIt has no built-in function that does this.

 

Do you mean analysing the whole string character by character and counting the identical characters?
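A minimal sketch of that character-by-character idea, for single repeated characters only. The function name _ShortenRuns is made up for illustration and does not come from the thread; bracketed groups such as [ab] are not handled here.

```autoit
; Hypothetical helper, not from the thread: collapses runs of one repeated character.
Func _ShortenRuns($s)
    Local $sOut = "", $sChar, $iRun, $i = 1, $iLen = StringLen($s)
    While $i <= $iLen
        $sChar = StringMid($s, $i, 1)
        $iRun = 1
        ; Count how many copies of the same character follow ("==" compares case-sensitively).
        While $i + $iRun <= $iLen And StringMid($s, $i + $iRun, 1) == $sChar
            $iRun += 1
        WEnd
        If $iRun > 1 Then
            $sOut &= $sChar & "{" & $iRun & "}" ; e.g. "aaaaa" becomes "a{5}"
        Else
            $sOut &= $sChar ; a single character is copied unchanged
        EndIf
        $i += $iRun ; jump past the run that was just handled
    WEnd
    Return $sOut
EndFunc   ;==>_ShortenRuns

ConsoleWrite(_ShortenRuns("aaaaabbbbbbccc") & @LF) ; a{5}b{6}c{3}
```

Because the loop walks the string once and never searches it again, a run is only ever rewritten at the position where it was found.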


This appears to work.

Local $string = "aaaaabbbbbbccc[ab][ab][ab][ab][cde][cde]"

ConsoleWrite(_StringConvert($string) & @LF) ; Return:-a{5}b{6}c{3}[ab]{4}[cde]{2}


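Func _StringConvert($s)
    ; Seed $sRep with a non-empty value so the loop body runs at least once (for a non-empty $s).
    Local $sBR, $iNum, $sRep = StringRegExpReplace($s, "^.*?((.)\2+).*$", "\1")
    While $sRep <> ""
        $sRep = ""
        $sBR = ""
        If StringRegExp($s, "((.)\2)") Then ; a single character repeated at least twice
            ; Extract the first run of a repeated character, e.g. "aaaaa".
            $sRep = StringRegExpReplace($s, "^.*?((.)\2+).*$", "\1")
            ; Replace the run with the character followed by the run length, e.g. "a{5}".
            $s = StringReplace($s, $sRep, StringLeft($sRep, 1) & "{" & StringLen($sRep) & "}")
        ElseIf StringRegExp($s, "(\[.+?\])\1") Then ; a bracketed group repeated at least twice
            ; $sRep is the whole run, e.g. "[ab][ab][ab][ab]"; $sBR is one group, e.g. "[ab]".
            $sRep = StringRegExpReplace($s, "^.*?((\[.+?\])\2+).*$", "\1")
            $sBR = StringRegExpReplace($s, "^.*?(\[.+?\])\1.*$", "\1")
            ; StringReplace() stores the number of replacements in @extended; it is used here as the group count.
            StringReplace($s, $sBR, $sBR)
            $iNum = @extended
            $s = StringReplace($s, $sRep, $sBR & "{" & $iNum & "}")
        EndIf
    WEnd ; when neither pattern matches, $sRep stays empty and the loop ends
    Return $s
EndFunc   ;==>_StringConvert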

WWOOWW, this is fantastic, exactly what I was looking for. It just doesn't work with groups wrapped in "{", like {ar}{ar}{ar}{ar}. Can it be adjusted to work with all characters?

I really appreciate it.


Try this.

Local $string = "aaaaa{ar}{ar}{ar}{ar}bbbbbbccc[ab][ab][ab][ab][cde][cde]"

ConsoleWrite(_StringConvert($string) & @LF) ; Return:- a{5}{ar}{4}b{6}c{3}[ab]{4}[cde]{2}


Func _StringConvert($s)
    Local $sBR, $sRep = StringRegExpReplace($s, "^.*?((.)\2+).*$", "\1")
    While $sRep <> ""
        $sRep = ""
        $sBR = ""
        If StringRegExp($s, "((.)\2)") Then
            $sRep = StringRegExpReplace($s, "^.*?((.)\2+).*$", "\1")
            $s = StringReplace($s, $sRep, StringLeft($sRep, 1) & "{" & StringLen($sRep) & "}")
        ElseIf StringRegExp($s, "([\[{].+?[}\]])\1") Then
            $sRep = StringRegExpReplace($s, "^.*?(([\[{].+?[}\]])\2+).*$", "\1")
            $sBR = StringRegExpReplace($s, "^.*?([\[{].+?[}\]])\1.*$", "\1")
            StringReplace($s, $sBR, $sBR)
            $iNum = @extended
            $s = StringReplace($s, $sRep, $sBR & "{" & $iNum & "}")
        EndIf
    WEnd
    Return $s
EndFunc   ;==>_StringConvert

Excellent, Thank you sooo much!

To reverse this process (to expand the string back to what it was before _StringConvert), should I look for {n} in the string and repeat whatever comes before it n times? Or can a regular expression be used for the reverse process as well?


I am sure there is more than one way to reverse the process. I would do it this way.

Local $string = "{ar}{4}[ab]{4}a{5}b{6}[cde]{2}c{3}"

ConsoleWrite(_StringExpand($string) & @LF) ; Return:-  {ar}{ar}{ar}{ar}[ab][ab][ab][ab]aaaaabbbbbb[cde][cde]ccc


Func _StringExpand($s)
    Local $sBR = StringRegExpReplace($s, "^.*?([\[{]?[^}\]]+[\]}]?)\{\d+\}.*$", "\1"), $sRep = ""
    While StringRegExp($s, "[\[{]?[^}\]]+[\]}]?\{\d+\}")
        $sRep = ""
        $sBR = ""
        If StringRegExp($s, "[^}\]]\{\d+\}") Then  ; "a letter {n}", where n = a number
            $sBR = StringRegExpReplace($s, "^.*?([^}\]])\{\d+\}.*$", "\1"); The letter part
            For $i = 1 To StringRegExpReplace($s, "^.*?[^}\]]\{(\d+)\}.*$", "\1") ; The number that is enclosed in "{}".
                $sRep &= $sBR
            Next
            $s = StringReplace($s, StringRegExpReplace($s, "^.*?([^}\]]\{\d+\}).*$", "\1"), $sRep)
        ElseIf StringRegExp($s, "[\[{][^}\]]+[\]}]\{\d+\}") Then  ; "[" or "{" letters "}" or "]" {n}, where n = a number
            $sBR = StringRegExpReplace($s, "^.*?([\[{][^}\]]+[\]}])\{\d+\}.*$", "\1") ; "[" or "{" letters "}" or "]"
            For $i = 1 To StringRegExpReplace($s, "^.*?[\[{][^}\]]+[\]}]\{(\d+)\}.*$", "\1") ; The number that is enclosed in "{}".
                $sRep &= $sBR
            Next
            $s = StringReplace($s, StringRegExpReplace($s, "^.*?([\[{][^}\]]+[\]}]\{\d+\}).*$", "\1"), $sRep)
        EndIf
    WEnd
    Return $s
EndFunc   ;==>_StringExpand

 


Is there any way to shorten repeated characters

 

 

I would recommend only shortening runs longer than 4 characters if the goal is to reduce the number of characters. The example is simple:

aa = 2 characters

a{2} = 4 characters
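One way to apply that length check, sketched as a small helper. The name _EncodeRun and its two parameters are assumptions made for this illustration and do not appear in the thread; the helper compares both forms and keeps whichever is shorter rather than hard-coding the limit of 4.

```autoit
; Hypothetical helper, not from the thread: encode one run only when that saves characters.
; $sUnit is the repeated piece ("a" or "[ab]"), $iCount is how often it repeats.
Func _EncodeRun($sUnit, $iCount)
    Local $sPlain = "", $sShort = $sUnit & "{" & $iCount & "}"
    For $i = 1 To $iCount
        $sPlain &= $sUnit ; rebuild the original, unshortened run
    Next
    ; Keep the plain form unless the encoded form is strictly shorter.
    If StringLen($sShort) < StringLen($sPlain) Then Return $sShort
    Return $sPlain
EndFunc   ;==>_EncodeRun

ConsoleWrite(_EncodeRun("a", 2) & @LF)    ; aa      (a{2} would be longer)
ConsoleWrite(_EncodeRun("a", 4) & @LF)    ; aaaa    (a{4} is the same length)
ConsoleWrite(_EncodeRun("a", 5) & @LF)    ; a{5}
ConsoleWrite(_EncodeRun("[ab]", 2) & @LF) ; [ab]{2}
```

Comparing lengths directly also covers multi-character groups, where even two repeats can already be worth shortening.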


Thank you very much.


#include <Array.au3>
$sString = '[ab][ab][ab][ab]'
$sString = 'asdfghjkqwertyuiopzxcvbnmasdfhjasdf'
$aString = StringSplit($sString, '')
; removes duplicates from the array, similar to _ArrayUnique.
$aString = _Calc($aString)
MsgBox(0, '', $aString)

Func _Calc(Const ByRef $aArray)
    If Not IsArray($aArray) Then Return SetError(1, 0, 0)
    Local $oDict = ObjCreate("Scripting.Dictionary")
    ; $oDict.CompareMode = 0 ; by default
    For $i = 0 To UBound($aArray) - 4
        $item = $aArray[$i] & $aArray[$i + 1] & $aArray[$i + 2] & $aArray[$i + 3]
        $oDict.Item($item) = $oDict.Item($item) + 1
    Next
    $aName = $oDict.Keys()
    $iCount = $oDict.Items()
    Local $sRes
    For $i = 0 To UBound($aName) - 1
        If $iCount[$i] = 1 Then ContinueLoop
        $sRes &= $aName[$i] & '[' & $iCount[$i] & ']'
    Next
    Return $sRes
EndFunc   ;==>_Calc


Wayfarer, do you know why this does not work if the string has @CRLF in it?

"aaaaa{ar}{ar}{ar}{ar}bbbb" & @CRLF & "bbccc{|}{|}[ab][ab][<][<]...AAA111"

is shortened to

a{61}


Thanks for your code, but I don't see how it works! It actually does not work for me.

I want [ab][ab][ab][ab] to become [ab]{4}, but this code turns it into [ab][4]ab][[3]b][a[3]][ab[3]!


#include <Array.au3>
$sString = '[ab][ab][ab][ab][zx][zx]'
$aString = _Calc($sString)
MsgBox(0, '', $aString)

Func _Calc(Const ByRef $sString)
    Local $oDict = ObjCreate("Scripting.Dictionary")
    ; $oDict.CompareMode = 0 ; by default
    $aArray = StringRegExp($sString, '\[\w{2}\]', 3)
    If @error Then Return SetError(1, 0, 0)
    For $i = 0 To UBound($aArray) - 1
        $oDict.Item($aArray[$i]) = $oDict.Item($aArray[$i]) + 1
    Next
    $aName = $oDict.Keys()
    $iCount = $oDict.Items()
    Local $sRes
    For $i = 0 To UBound($aName) - 1
        If $iCount[$i] = 1 Then ContinueLoop
        $sRes &= $aName[$i] & '[' & $iCount[$i] & ']'
    Next
    Return $sRes
EndFunc   ;==>_Calc


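By default the dot, ".", does not match newline characters such as @CRLF. Without a match, StringRegExpReplace returns the whole string unchanged, so the function treats the entire 61-character string as one run and rewrites it as "a{61}". Adding "(?s)" to the beginning of each regular expression pattern that contains "^" and "$" makes the dot match every character in the test string, including newlines.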
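The effect can be seen in isolation with a short test string. This is only an illustrative sketch; the variable names and the sample text are made up here and are not part of the original script.

```autoit
Local $sText = "aab" & @CRLF & "cc"

; Without (?s) the dot stops at the line break, so "^.*?((.)\2+).*$" cannot span
; the whole string. StringRegExpReplace() then returns the input unchanged.
Local $sPlain = StringRegExpReplace($sText, "^.*?((.)\2+).*$", "\1")
ConsoleWrite(($sPlain == $sText) & @LF) ; True: nothing was replaced

; With (?s) the dot also matches @CR and @LF, so the first run is extracted.
Local $sDotAll = StringRegExpReplace($sText, "(?s)^.*?((.)\2+).*$", "\1")
ConsoleWrite($sDotAll & @LF) ; aa
```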

Local $string = "aaaaa{ar}{ar}{ar}{ar}bbbb" & @CRLF & "bbccc{|}{|}[ab][ab][<][<]...AAA111"

ConsoleWrite(_StringConvert($string) & @LF) ; Return:- a{5}{ar}{4}b{4}, a line break, then b{2}c{3}{|}{2}[ab]{2}[<]{2}.{3}A{3}1{3}


Func _StringConvert($s)
    Local $sBR, $sRep = StringRegExpReplace($s, "(?s)^.*?((.)\2+).*$", "\1")
    While $sRep <> ""
        $sRep = ""
        $sBR = ""
        If StringRegExp($s, "((.)\2)") Then
            $sRep = StringRegExpReplace($s, "(?s)^.*?((.)\2+).*$", "\1")
            $s = StringReplace($s, $sRep, StringLeft($sRep, 1) & "{" & StringLen($sRep) & "}")
        ElseIf StringRegExp($s, "([\[{].+?[}\]])\1") Then
            $sRep = StringRegExpReplace($s, "(?s)^.*?(([\[{].+?[}\]])\2+).*$", "\1")
            $sBR = StringRegExpReplace($s, "(?s)^.*?([\[{].+?[}\]])\1.*$", "\1")
            StringReplace($s, $sBR, $sBR)
            $iNum = @extended
            $s = StringReplace($s, $sRep, $sBR & "{" & $iNum & "}")
        EndIf
    WEnd
    Return $s
EndFunc   ;==>_StringConvert

Thank you very much.

I tried to apply this to the reverse code, but it doesn't work (it returns just aaaaa). What did I do wrong?

Local $string = "a{5}{ar}{4}b{4}" & @CRLF & "b{2}c{3}{|}{2}[ab]{2}[<]{2}.{3}A{3}1{3}"

ConsoleWrite(_StringExpand($string) & @LF) ; Return:-  {ar}{ar}{ar}{ar}[ab][ab][ab][ab]aaaaabbbbbb[cde][cde]ccc


Func _StringExpand($s)
    Local $sBR = StringRegExpReplace($s, "(?s)^.*?([\[{]?[^}\]]+[\]}]?)\{\d+\}.*$", "\1"), $sRep = ""
    While StringRegExp($s, "[\[{]?[^}\]]+[\]}]?\{\d+\}")
        $sRep = ""
        $sBR = ""
        If StringRegExp($s, "[^}\]]\{\d+\}") Then  ; "a letter {n}", where n = a number
            $sBR = StringRegExpReplace($s, "(?s)^.*?([^}\]])\{\d+\}.*$", "\1"); The letter part
            For $i = 1 To StringRegExpReplace($s, "(?s)^.*?[^}\]]\{(\d+)\}.*$", "\1") ; The number that is enclosed in "{}".
                $sRep &= $sBR
            Next
            $s = StringReplace($s, StringRegExpReplace($s, "^.*?([^}\]]\{\d+\}).*$", "\1"), $sRep)
        ElseIf StringRegExp($s, "[\[{][^}\]]+[\]}]\{\d+\}") Then  ; "[" or "{" letters "}" or "]" {n}, where n = a number
            $sBR = StringRegExpReplace($s, "(?s)^.*?([\[{][^}\]]+[\]}])\{\d+\}.*$", "\1") ; "[" or "{" letters "}" or "]"
            For $i = 1 To StringRegExpReplace($s, "(?s)^.*?[\[{][^}\]]+[\]}]\{(\d+)\}.*$", "\1") ; The number that is enclosed in "{}".
                $sRep &= $sBR
            Next
            $s = StringReplace($s, StringRegExpReplace($s, "(?s)^.*?([\[{][^}\]]+[\]}]\{\d+\}).*$", "\1"), $sRep)
        EndIf
    WEnd
    Return $s
EndFunc   ;==>_StringExpand

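You missed one: line #16 of your script, the StringReplace call in the single-character branch, still has a pattern without "(?s)".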
