Damein

StringSplit multiple formats...


Not sure what to label this.. and it may be easier to use StringRegExp but I'm still a noob at that ^^;

 

I need a way to compare two text files and find similarities. Now I know some of these already exist but I wanted one specific for what I need. Sadly, there are various types of formats that these text files can take. Here are the variants.

 

A: plain_text_here number

B: plain_text_here "number"

C: plain_text_here 'number'

D: plain_text_here"number"

E: plain_text_here'number'

F: plain_text_here"number" // text here

G: plain_text_here'number' // text here

H: plain_text_here "number" "number"

I: plain_text_here 'number 'number'

 

 

So yeah, I need a way to split off the plain_text_here part and all the possible number scenarios, then group them back up to check against each other. Each variant is on its own line, so the first thing I did was _FileReadToArray and then start cycling through the lines.

 

Example File


 

plain_text_here "1"

plain_text_here "2"

plain_text_here '3'

plain_text_here '4' '5'

plain_text_here"6"

 

I started doing it with just StringSplits but soon got lost in all the variants and wanted to see if anyone knows of a better/cleaner way. Thanks in advance!





after reading the lines to array, try this on each line:

step 1: trim any trailing comment (beginning with // )

step 2: use StringSplit with several delimiters simultaneously. the delimiters are whitespace, single quote, and double quote.

step 3: loop from the final substring backward, stop when the substring is not whitespace or a number (meaning you got to the plain_text_here substring). during the loop, ignore anything that is not a number.

if the plain_text_here substring does not contain whitespace or quotes, then you can make it easier - loop forward starting at 2nd substring.
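The steps above could be sketched roughly like this. It is only an illustration, not tested against every variant: it assumes plain_text_here contains no whitespace or quotes (so the forward loop applies), and the function name _ParseLine is made up for the example.

```autoit
; Hypothetical helper: returns "text n1 n2 ..." for one input line.
; Assumes the text part contains no whitespace or quotes.
Func _ParseLine($sLine)
    ; Step 1: trim any trailing comment beginning with //
    Local $iPos = StringInStr($sLine, "//")
    If $iPos > 0 Then $sLine = StringLeft($sLine, $iPos - 1)
    $sLine = StringStripWS($sLine, 3) ; 3 = strip leading + trailing whitespace

    ; Step 2: split on space, tab, single quote and double quote at once.
    ; With the default flag every character of the delimiter string is a delimiter.
    Local $aParts = StringSplit($sLine, " " & @TAB & "'" & '"')

    ; Step 3 (forward variant): the first substring is the text,
    ; after that keep only substrings made up entirely of digits.
    Local $sResult = $aParts[1]
    For $i = 2 To $aParts[0]
        If StringRegExp($aParts[$i], "^\d+$") Then $sResult &= " " & $aParts[$i]
    Next
    Return $sResult
EndFunc

ConsoleWrite(_ParseLine('plain_text_here"6" // text here') & @CRLF) ; plain_text_here 6
ConsoleWrite(_ParseLine("plain_text_here '4' '5'") & @CRLF)         ; plain_text_here 4 5
```

Empty substrings (produced wherever two delimiters sit next to each other) fail the digit test and are skipped, so no extra cleanup should be needed.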


Damein,

A Regex would seem the logical way to do this - here is my poor effort which requires 2 passes:

Global $aList[10] = [9]

$aList[1] = "plain_1_text_here 11"
$aList[2] = 'plain_2_text_here "2"'
$aList[3] = "plain_3_text_here '33'"
$aList[4] = 'plain_4_text_here"4"'
$aList[5] = "plain_5_text_here'55'"
$aList[6] = 'plain_6_text_here"6" // text here'
$aList[7] = "plain_7_text_here'77' // text here"
$aList[8] = 'plain_8_text_here "8" "88"'
$aList[9] = "plain_9_text_here '9 '99"

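For $i = 1 To $aList[0]
    ; Pass 1: everything before the first whitespace, double quote (\x22) or single quote (\x27)
    $sExtract_Text = StringRegExpReplace($aList[$i], "^(?U)(.*)[\s\x22\x27].*$", "$1")
    ; Pass 2: the rest of the line; \Q...\E keeps any regex metacharacters in the text literal
    $sExtract_Numbers = StringRegExpReplace($aList[$i], "^\Q" & $sExtract_Text & "\E(.*)$", "$1")
    ; Collect every run of digits found in the rest of the line
    $aNumbers = StringRegExp($sExtract_Numbers, "\d++", 3)
    For $j = 0 To UBound($aNumbers) - 1
        $sExtract_Text &= " " & $aNumbers[$j]
    Next
    ConsoleWrite($sExtract_Text & @CRLF)
Next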

No doubt a guru will come along soon and show us how to do it in one.

M23



#4 ·  Posted (edited)

#include <Array.au3>

Global $aList[10] = [9]

$aList[1] = "plain_1_text_here 11"
$aList[2] = 'plain_2_text_here "2"'
$aList[3] = "plain_3_text_here '33'"
$aList[4] = 'plain_4_text_here"4"'
$aList[5] = "plain_5_text_here'55'"
$aList[6] = 'plain_6_text_here"6" // text here'
$aList[7] = "plain_7_text_here'77' // text here"
$aList[8] = 'plain_8_text_here "8" "88"'
$aList[9] = "plain_9_text_here '9 '99"


For $i = 1 To $aList[0]
    $aList[$i] = StringRegExpReplace(StringRegExpReplace(StringRegExpReplace(StringReplace(StringReplace($aList[$i], "'", ""), '"', ''), "//(.*)", ""), "(\d+\s)\d+", "$1"), "(\D)(\d)", "$1 $2")
Next

_ArrayDisplay($aList)

 

edit: I am not the guru who was spoken of earlier, and my solution is janky at best.

Edited by iamtheky


Not sure what you mean by plain_text, but I gather it's probably pretty much anything before the first delimiter where the delimiter is any space, single quote or double quote.

Second, I assume that in:

Quote
I: plain_text_here 'number 'number'

... there should be a single quote behind that first number?

Third, not sure what you want with those "// text here" parts, but I'm guessing you want to just leave that in, and if so then also that a space behind wouldn't matter?

Maybe something like this?

#include <MsgBoxConstants.au3>
#include <StringConstants.au3>

Global $aList[10] = [9]

$aList[1] = "plain_1_text_here 11"
$aList[2] = 'plain_2_text_here "2"'
$aList[3] = "plain_3_text_here '33'"
$aList[4] = 'plain_4_text_here"4"'
$aList[5] = "plain_5_text_here'55'"
$aList[6] = 'plain_6_text_here"6" // text here'
$aList[7] = "plain_7_text_here'77' // text here"
$aList[8] = 'plain_8_text_here "8" "88"'
$aList[9] = "plain_9_text_here '9' '99'"

For $i = 1 To $aList[0]
    ; the pattern has no capture group, so the replacement is just a single space
    $extract = StringRegExpReplace($aList[$i], "[ '""]+", " ")
    ConsoleWrite("|" & $extract & "|" & @CRLF)
Next

This simply collapses every run of delimiters (spaces, single quotes, double quotes) into a single space, so <delimiter(s)><any number of digits><delimiter(s)> becomes <space><digits><space>. It leaves the "// text here" in, and it leaves a space after most strings. But it is very simple and maybe that's enough. I put pipe symbols around the results so you can see where the spaces are.
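If the trailing comment and the leftover space turn out to be unwanted, one possible follow-up (a sketch only, not part of the code above) is to cut the comment first and strip the ends afterwards. The sample line is taken from the test list; everything else is an assumption about what the cleaned-up result should look like.

```autoit
; Sketch only: drop a trailing // comment, collapse delimiters, trim the ends.
Local $sLine = 'plain_6_text_here"6" // text here'
$sLine = StringRegExpReplace($sLine, "\s*//.*$", "")   ; remove the comment, if any
$sLine = StringRegExpReplace($sLine, "[ '""]+", " ")   ; each delimiter run -> one space
$sLine = StringStripWS($sLine, 3)                      ; 3 = strip leading + trailing
ConsoleWrite("|" & $sLine & "|" & @CRLF)               ; |plain_6_text_here 6|
```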



#6 ·  Posted (edited)

Assuming that "plain_text_here" contains no space(s),  this *should* work

#Include <Array.au3>

Global $aList[10] = [9]
$aList[1] = "plain_1_text_here 11"
$aList[2] = 'plain_2_text_here "2"'
$aList[3] = "plain_3_text_here '33'"
$aList[4] = 'plain_4_text_here"4"'
$aList[5] = "plain_5_text_here'55'"
$aList[6] = 'plain_6_text_here"6" // text here'
$aList[7] = "plain_7_text_here'77' // text here"
$aList[8] = 'plain_8_text_here "8" "88"'
$aList[9] = "plain_9_text_here '9 '99"

For $i = 1 To $aList[0]
    $aList[$i] = StringRegExpReplace($aList[$i], '^\w+\K|[\s\D]+', " ")
Next
_ArrayDisplay($aList)

 

Edited by mikell

Share on other sites

#7 ·  Posted (edited)

One shot split:

#Include <Array.au3>

Global $aList[10] = [9]

$aList[1] = "plain_1_text_here 11"
$aList[2] = 'plain_2_text_here "2"'
$aList[3] = "plain_3_text_here '33'"
$aList[4] = 'plain_4_text_here"4"'
$aList[5] = "plain_5_text_here'55'"
$aList[6] = 'plain_6_text_here"6" // text here'
$aList[7] = "plain_7_text_here'77' // text here"
$aList[8] = 'plain_8_text_here "8" "88"'
$aList[9] = "plain_9_text_here '9 '99"


For $i = 1 To $aList[0]
    $aRes = StringRegExp($aList[$i], "(?|^(\w+)|(\d+))", 3)
    _ArrayDisplay($aRes)
Next

Sorry, edited because my first attempt was horrible.

Edited by jguinch

jguinch,
Is the branch reset (?|...) really necessary?

$aRes = StringRegExp($aList[$i], "^\w+|\d+", 3)
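To tie this back to the original question (finding lines that two files have in common), the pattern above could be used to normalise each line before comparing. This is a rough sketch under a few assumptions: the text part matches \w+, a.txt and b.txt are placeholder file names, and the helper names _NormaliseLine and _ShowCommon are invented for the example.

```autoit
#include <File.au3>
#include <Array.au3>

; Hypothetical helper: reduce a line to "text|n1|n2..." so that
; plain_text_here"6" and plain_text_here '6' compare as equal.
Func _NormaliseLine($sLine)
    ; cut a trailing // comment so digits inside it are not picked up
    $sLine = StringRegExpReplace($sLine, "//.*$", "")
    Local $aParts = StringRegExp($sLine, "^\w+|\d+", 3)
    If Not IsArray($aParts) Then Return "" ; nothing usable on this line
    Return _ArrayToString($aParts, "|")
EndFunc

; Hypothetical helper: print every line of $sFileB whose
; normalised form also occurs somewhere in $sFileA.
Func _ShowCommon($sFileA, $sFileB)
    Local $aA, $aB, $sKey
    If Not _FileReadToArray($sFileA, $aA) Then Return SetError(1, 0, 0)
    If Not _FileReadToArray($sFileB, $aB) Then Return SetError(2, 0, 0)

    ; Scripting.Dictionary serves as a lookup table (keys are case-sensitive by default)
    Local $oSeen = ObjCreate("Scripting.Dictionary")
    If Not IsObj($oSeen) Then Return SetError(3, 0, 0)

    For $i = 1 To $aA[0]
        $sKey = _NormaliseLine($aA[$i])
        If StringLen($sKey) > 0 Then $oSeen.Item($sKey) = True
    Next
    For $i = 1 To $aB[0]
        $sKey = _NormaliseLine($aB[$i])
        If StringLen($sKey) > 0 And $oSeen.Exists($sKey) Then ConsoleWrite($aB[$i] & @CRLF)
    Next
    Return 1
EndFunc

; Placeholder paths, adjust to the real files
_ShowCommon(@ScriptDir & "\a.txt", @ScriptDir & "\b.txt")
```

Joining the parts with "|" keeps "text 1 2" and "text 12" apart, which a plain concatenation would not.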

 
