
Shorten repeated characters in a string


b0x4it
What about this here?

 

$sText1 = "[ab][ab][ab][ab]X[cd][cd]"
$sText2 = "abababababab"
$sText3 = "aaaaabbbbbbccc"

ConsoleWrite($sText1 & @LF)
ShortenRepeatedCharacters($sText1)
ConsoleWrite(@LF)
ConsoleWrite($sText2 & @LF)
ShortenRepeatedCharacters($sText2)
ConsoleWrite(@LF)
ConsoleWrite($sText3 & @LF)
ShortenRepeatedCharacters($sText3)
ConsoleWrite(@LF)

Func ShortenRepeatedCharacters($sText)
    Local $aResult = StringSplit($sText, "", 2)
    Local $iStartChar = $aResult[0], $sString = $iStartChar, $i
    If UBound($aResult) = 1 Then
        ConsoleWrite($iStartChar & "{1}")
        Return
    EndIf
    For $i = 1 To UBound($aResult) - 1
        If $aResult[$i] = $iStartChar Then ExitLoop
        $sString &= $aResult[$i]
    Next
    If $i < UBound($aResult) Then
        FindRepeatations($sString, $sText & " ")
    Else
        ConsoleWrite(StringLeft($sString, 1)  & "{1}")
        ShortenRepeatedCharacters(StringMid($sString, 2))
    EndIf
EndFunc

Func FindRepeatations($sSearch, $sText)
    Local $j, $c = 1, $bExit = False, $iLenSearch = StringLen($sSearch)
    For $j = $iLenSearch + 1 To StringLen($sText) - $iLenSearch Step $iLenSearch
        If StringMid($sText, $j, $iLenSearch) = $sSearch Then
            $c += 1
        Else
            $bExit = True
            ExitLoop
        EndIf
    Next
    If $sSearch <> " " Then ConsoleWrite($sSearch & "{" & $c & "}")
    Local $sNewString = StringMid(StringTrimRight($sText, 1), $j)
    If $bExit Or StringLen($sNewString) = 1 Then ShortenRepeatedCharacters($sNewString)
EndFunc
Not fully tested!

Edit: added some checks.

Edit2: added some more checks.

Br,

UEZ

Edited by UEZ

Please don't send me any personal message and ask for support! I will not reply!

Selection of finest graphical examples at Codepen.io

The own fart smells best!
Her 'sikim hıyar' diyene bir avuç tuz alıp koşma!
¯\_(ツ)_/¯  ٩(●̮̮̃•̃)۶ ٩(-̮̮̃-̃)۶ૐ


What about this here?

 

#include <Array.au3>

$sText1 = "[ab][ab][ab][ab][cd][cd]"
$sText2 = "abababababab"
$sText3 = "aaaaabbbbbbccc"

ConsoleWrite($sText1 & @LF)
ShortenRepeatedCharacters($sText1)
ConsoleWrite(@LF)
ConsoleWrite($sText2 & @LF)
ShortenRepeatedCharacters($sText2)
ConsoleWrite(@LF)
ConsoleWrite($sText3 & @LF)
ShortenRepeatedCharacters($sText3)
ConsoleWrite(@LF)

Func ShortenRepeatedCharacters($sText)
    Local $aResult = StringSplit($sText, "", 2)
    Local $iStartChar = $aResult[0], $sString = $iStartChar, $i
    If UBound($aResult) = 1 Then
        ConsoleWrite($iStartChar & "{1}" & @LF)
        Return
    EndIf
    For $i = 1 To UBound($aResult) - 1
        If $aResult[$i] = $iStartChar Then ExitLoop
        $sString &= $aResult[$i]
    Next
    If $i <= UBound($aResult) Then FindRepeatations($sString, $sText & " ")
EndFunc

Func FindRepeatations($sSearch, $sText)
    Local $j, $c = 1, $bExit = False, $iLenSearch = StringLen($sSearch)
    For $j = $iLenSearch + 1 To StringLen($sText) - $iLenSearch Step $iLenSearch
        If StringMid($sText, $j, $iLenSearch) = $sSearch Then
            $c += 1
        Else
            $bExit = True
            ExitLoop
        EndIf
    Next
    If $sSearch <> " " Then ConsoleWrite($sSearch & "{" & $c & "}" & @LF)
    Local $sNewString = StringMid(StringTrimRight($sText, 1), $j)
    If $bExit Or StringLen($sNewString) = 1 Then ShortenRepeatedCharacters($sNewString)
EndFunc
Not fully tested!

Edit: added some checks.

Br,

UEZ

 

Many thanks for this. It works, but if I add a @CRLF in the middle it gets confused:

$sText1 = "[ab][ab][ab]"& @CRLF & "[ab]<cd><cd>"

produces:

[ab][ab][ab]

[ab]<cd><cd>

[ab]{3}

[ab]<cd><cd>{1}

 

It also adds a newline after each shortened item.

Do you have a similar method in mind for the reverse procedure?


I updated the code in post #21. By the way, a @CRLF in the middle of the string makes no sense, IMHO.

What do you mean by reverse procedure? Rebuilding the original string from the shortened one?

#include <String.au3>

$sShorten = "a{5}b{6}c{3}"
ConsoleWrite(ExpandString($sShorten) & @LF & @LF)

$sShorten = "[ab]{4}X{1}[cd]{2}"
ConsoleWrite(ExpandString($sShorten) & @LF & @LF)


Func ExpandString($sString)
    ; Collect every repeat count, i.e. the digits inside each {n}.
    Local $aRepetitions = StringRegExp($sString, "\{(\d+)\}", "3")
    If @error Then Return SetError(1, 0, 0)
    ; Collect the text in front of each {n}; (?U) makes the match ungreedy.
    Local $aWords = StringRegExp($sString, "(?U)(.+)\{\d*\}", "3")
    If @error Then Return SetError(2, 0, 0)
    ; Every word needs exactly one count.
    If UBound($aRepetitions) <> UBound($aWords) Then Return SetError(3, 0, 0)
    Local $i, $sExpanded
    For $i = 0 To UBound($aWords) - 1
        ; Append the word as many times as its count says.
        $sExpanded &= _StringRepeat($aWords[$i], $aRepetitions[$i])
    Next
    Return $sExpanded
EndFunc
 

Br,

UEZ

Edited by UEZ


Another way:

#include <array.au3>
Local $items[1], $temp, $x, $y
$sText1 = "[ab][ab][ab]" & @CRLF & "[ab]<cd><cd>aaaaaaaabb[hello][hello]bbbbbbbcccccccccbn hello everybody bye bye"
For $i = 1 To StringLen($sText1)
    $x = StringMid($sText1, $i, 1)
    If $x = "[" Then
        $temp = ""
        $y = 1
        ContinueLoop
    EndIf
    If $x = "]" Then
        $y = 0
        _ArrayAdd($items, $temp)
        ContinueLoop
    EndIf
    If $y Then
        $temp &= $x
        ContinueLoop
    EndIf
    _ArrayAdd($items, $x)
Next
_ArrayDelete($items, 0)
$uniqueitems = _ArrayUnique($items)
_ArrayDelete($uniqueitems, 0)
For $i = 0 To UBound($uniqueitems) - 1
    $temp = _ArrayFindAll($items, $uniqueitems[$i])
    ConsoleWrite($uniqueitems[$i] & "{" & UBound($temp) & "}" & @CRLF)
Next

 

Chimp

small minds discuss people average minds discuss events great minds discuss ideas.... and use AutoIt....

Thanks for your code, but it does not work. For the $sText1 that you defined, it produces:

ab{4}

{1}

{1}

<{2}

c{11}

d{3}

>{2}

a{8}

b{13}

hello{2}

n{1}

 {4}

h{1}

e{5}

l{2}

o{2}

v{1}

r{1}

y{4}

It also removes all of the @CRLFs. The shortened version should keep each @CRLF so that it can be expanded back to the original later.


I updated the code in post #21. By the way, a @CRLF in the middle of the string makes no sense, IMHO.

What do you mean by reverse procedure? Rebuilding the original string from the shortened one?

#include <String.au3>

$sShorten = "a{5}b{6}c{3}"
ConsoleWrite(ExpandString($sShorten) & @LF & @LF)

$sShorten = "[ab]{4}X{1}[cd]{2}"
ConsoleWrite(ExpandString($sShorten) & @LF & @LF)


Func ExpandString($sString)
    Local $aRepetitions = StringRegExp($sString, "\{(\d+)\}", "3")
    If @error Then Return SetError(1, 0, 0)
    Local $aWords = StringRegExp($sString, "(?U)(.*)\{\d}", "3")
    If @error Then Return SetError(2, 0, 0)
    If UBound($aRepetitions) <> UBound($aWords) Then Return SetError(3, 0, 0)
    Local $i, $sExpanded
    For $i = 0 To UBound($aWords) - 1
        $sExpanded &= _StringRepeat($aWords[$i], $aRepetitions[$i])
    Next
    Return $sExpanded
EndFunc
 

Br,

UEZ

 

Thanks for your reply. Suppose you have a multiline text in the clipboard that you want to shorten and later expand without losing its multiline format. I think it must be able to handle @CRLF newlines as well. Am I right?

Also, could you please change this code so that it can expand repeat counts above 9? Currently it cannot repeat more than 9 times.


Well, in this case I would split the text at @CRLF, parse each line, and save each result on its own line. When you build the text up again, you just insert a @CRLF after each row.
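
The per-line idea could be sketched roughly as follows. This is only an illustration: ShortenLines and DemoShorten are hypothetical names, and DemoShorten is a stand-in that merely wraps each line in angle brackets so the line boundaries stay visible. A real shortener that returns its result as a string would be passed in its place.

```autoit
; Hypothetical helper: apply the function named $sFuncName to every line of $sText.
Func ShortenLines($sText, $sFuncName)
    ; Flag 1 splits at the whole @CRLF sequence, flag 2 omits the count element.
    Local $aLines = StringSplit($sText, @CRLF, 1 + 2)
    Local $sOut = ""
    For $i = 0 To UBound($aLines) - 1
        ; Re-insert the line break between rows so the layout survives.
        If $i > 0 Then $sOut &= @CRLF
        $sOut &= Call($sFuncName, $aLines[$i])
    Next
    Return $sOut
EndFunc

; Stand-in for a real per-line shortener (assumption for the demo only).
Func DemoShorten($sLine)
    Return "<" & $sLine & ">"
EndFunc

ConsoleWrite(ShortenLines("aaa" & @CRLF & "bbb", "DemoShorten") & @LF)
```

The same wrapper should work for the reverse direction if the name of an expanding function is passed instead.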

What do you mean by 9? This should also work for more.

Br,

UEZ

Edited by UEZ


True, thanks for your reply.

 

About 9: this

$sShorten = "a{5}b{6}c{3}<sdgs>{3}[df]{10}"
ConsoleWrite(ExpandString($sShorten) & @LF & @LF)

produces

0


It returns error = 3, which means I have to check the regex.

I will post an updated version when the issue is fixed.
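
A likely cause, judging only from the two word patterns that appear in this thread: the earlier pattern ends in \{\d}, which accepts a single digit, so an item followed by {10} is never captured and the two arrays end up with different sizes. A small sketch comparing the counts (the variable names are just for this demo):

```autoit
Local $sShorten = "a{5}b{6}c{3}<sdgs>{3}[df]{10}"

; Repeat counts: one entry per {n}.
Local $aReps = StringRegExp($sShorten, "\{(\d+)\}", 3)
; Earlier word pattern: \d matches exactly one digit, so [df]{10} is missed.
Local $aOld = StringRegExp($sShorten, "(?U)(.*)\{\d}", 3)
; Updated word pattern: \d* accepts any number of digits.
Local $aNew = StringRegExp($sShorten, "(?U)(.+)\{\d*\}", 3)

ConsoleWrite(UBound($aReps) & @LF) ; 5
ConsoleWrite(UBound($aOld) & @LF)  ; 4
ConsoleWrite(UBound($aNew) & @LF)  ; 5
```

With 5 counts against 4 words, the UBound comparison in ExpandString fails and the function returns 0 with @error = 3.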

Br,

UEZ


Hi b0x4it,
I don't fully understand the rules.
Are only the symbols "[" and "]" used to delimit groups, or others as well, such as "<" and ">" or many more?
And how can you recreate the original string? Once the letters are clustered together, you lose their original positions within the string, especially if they are not side by side. For example, if you have aaabbbaaa,
you get a(6) b(3), but you no longer know where they were located in the original string.
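
One way around the lost positions is to count only adjacent repeats instead of totals per item. A minimal sketch for single characters, assuming no bracketed groups (EncodeRuns is a hypothetical name, not a function from this thread):

```autoit
; Hypothetical helper: encode each run of identical adjacent characters as c{n}.
Func EncodeRuns($sText)
    Local $sOut = "", $sChar, $iRun
    Local $iLen = StringLen($sText), $i = 1
    While $i <= $iLen
        $sChar = StringMid($sText, $i, 1)
        $iRun = 1
        ; Count how many times the same character follows directly (== is case-sensitive).
        While $i + $iRun <= $iLen And StringMid($sText, $i + $iRun, 1) == $sChar
            $iRun += 1
        WEnd
        $sOut &= $sChar & "{" & $iRun & "}"
        ; Continue after the run that was just written.
        $i += $iRun
    WEnd
    Return $sOut
EndFunc

ConsoleWrite(EncodeRuns("aaabbbaaa") & @LF) ; a{3}b{3}a{3}
```

Because the runs are written in the order they occur, the two separate groups of a stay apart and the original order can be rebuilt.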

Edited by PincoPanco

 


Updated post#23 - should work now.

Br,

UEZ

Thanks for your update. I found another issue:

$sShorten = "[ab]{4}X{1}[cd]{2}<3>{10} a{10}"

produces:

[ab][ab][ab][ab]X[cd][cd]<3><3><3><3><3><3><3><3><3><3> a a a a a a a a a a

it should produce this:

[ab][ab][ab][ab]X[cd][cd]<3><3><3><3><3><3><3><3><3><3> aaaaaaaaaa


Thanks for your nice question. What I am trying to achieve here is to shorten a text file that can contain any character, including newlines. What I came up with is to find repeated characters or groups of characters and replace them with one instance of the repeated item followed by something like {number}. For example, if we have:

aaabbbbccc [as][as][as][as] [3456][3456][3456][3456][3456]

we can save space and yet have the possibility to reproduce the original text by changing it to

a{3}b{4}c{3} [as]{4} [3456]{5}

Think of it this way: a text file of 200 kB might shrink to 200 bytes after shortening, but the important point is that this must be reversible.

The one issue is that if the original text already contains something in the {number} format, the reverse procedure repeats whatever precedes it as well, which it shouldn't. As a solution, I propose a different format for the shortening, to lower the chance of the exact same text occurring in the original. For example, we could use _{number}_ or ~{number}~ or ::{number}:: or //{number}.
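
Whichever format is chosen, the text could be checked for such a collision before it is shortened. A minimal sketch for the {number} format, where HasMarkerCollision is a hypothetical helper name:

```autoit
; Hypothetical helper: True when $sText already contains text that looks like {number}.
Func HasMarkerCollision($sText)
    ; With the default flag, StringRegExp returns 1 on a match and 0 otherwise.
    Return StringRegExp($sText, "\{\d+\}") = 1
EndFunc

ConsoleWrite(HasMarkerCollision("aaabbbccc") & @LF)       ; False
ConsoleWrite(HasMarkerCollision("array{3} of int") & @LF) ; True
```

A check like this only detects the problem. It would still be up to the caller to switch to another marker or to refuse to shorten such a text.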

Several excellent pieces of code have been shared in this topic, but none of them covers all characters and all cases.

Please let me know if any part of this is still unclear.


This will build it back out:

#include <array.au3>
$sShorten = "[ab]{4}X{1}[cd]{2}<3>{10} a{10}"

$a = StringRegExp($sShorten,"[\[{<][^\]}>]+[\]}>]|[^\[{<]",3)

Local $string,$iRepeat
For $i = 0 to UBound($a)-1
    If StringRegExp($a[$i],"{.*",0) Then
        $iRepeat = StringRegExpReplace($a[$i],"({)(\d+)(})","\2") - 1
    Else
        $iRepeat = 1
        $sub = $a[$i]

    EndIf
    For $j = 1 To $iRepeat
        $string &= $sub
    Next
Next
ConsoleWrite($string & @CRLF)
Edited by jdelaney
IEbyXPATH-Grab IE DOM objects by XPATH IEscriptRecord-Makings of an IE script recorder ExcelFromXML-Create Excel docs without excel installed GetAllWindowControls-Output all control data on a given window.

This will do it all (I'm not sure why you don't want to count the spaces, but I removed that counting anyway):

#include <array.au3>
$sShorten = "[ab]{4}X{1}[cd]{2}<3>{10} a{10}"
;~ $sShorten = "a{3}b{4}c{3} [as]{4} [3456]{5}"

ConsoleWrite("Start:" & @TAB & $sShorten & @CRLF)

$sBuilt = Build($sShorten)
ConsoleWrite("Build:" & @TAB & $sBuilt & @CRLF)

$sNewShorten = Destruct($sBuilt)
ConsoleWrite("Short:" & @TAB & $sNewShorten & @CRLF)


Func Build($sShorten)
    Local $a = StringRegExp($sShorten,"[\[{<][^\]}>]+[\]}>]|[^\[{<]",3)
    Local $string,$iRepeat
    For $i = 0 to UBound($a)-1
        If StringRegExp($a[$i],"{.*",0) Then
            $iRepeat = StringRegExpReplace($a[$i],"({)(\d+)(})","\2") - 1
        Else
            $iRepeat = 1
            $sub = $a[$i]

        EndIf
        For $j = 1 To $iRepeat
            $string &= $sub
        Next
    Next
    Return $string
EndFunc

Func Destruct($sLengthen)
    Local $a = StringRegExp($sLengthen,"[\[<][^\]>]+[\]}>]|[^\[<]",3)

    Local $last, $iCount=0, $string
    For $i = 0 To UBound($a) - 1
        If $a[$i] = $last Then
            $iCount+=1
        Else
            If $iCount>0  Then
                If Not StringRegExp($last,"\s", 0) Then
                    $string&="{" & $iCount & "}"
                EndIf
            EndIf
            $string&=$a[$i]
            $last=$a[$i]
            $iCount=1
        EndIf
    Next
    If $iCount > 1 And Not StringRegExp($last,"\s", 0) Then $string&="{" & $iCount & "}"
    Return  $string
EndFunc

Returns:

Start: [ab]{4}X{1}[cd]{2}<3>{10} a{10}
Build: [ab][ab][ab][ab]X[cd][cd]<3><3><3><3><3><3><3><3><3><3> aaaaaaaaaa
Short: [ab]{4}X{1}[cd]{2}<3>{10} a{10}

I think this route is a lot more straightforward, since there are only 2 regexps in total (to break out the strings).

Edited by jdelaney
Excellent, I will do some testing and get back to you if there are any issues. Thank you very much!

You have a space in between! -> {10} a{10} should be {10}a{10}

[ab]{4}X{1}[cd]{2}<3>{10}a{10} works.
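
To see what gets captured, here is a small sketch that runs the word pattern from ExpandString on the tail of that string and prints each captured word in square brackets (the brackets are added only to make the space visible):

```autoit
; Same word pattern as in ExpandString; flag 3 returns all captures as an array.
Local $aWords = StringRegExp("<3>{10} a{10}", "(?U)(.+)\{\d*\}", 3)
For $i = 0 To UBound($aWords) - 1
    ConsoleWrite("[" & $aWords[$i] & "]" & @LF)
Next
; [<3>]
; [ a]
```

The second word is " a" including the leading space, so repeating it ten times yields " a a a ..." rather than "aaaaaaaaaa".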

Br,

UEZ
