Bert

Reading Strings

33 posts in this topic

I have a string that can look like the following example:

APPLE POWERMAC G4|APPLE-IMAC|COMPAQ DC5000|COMPAQ DESKPRO 2000|COMPAQ DESKPRO 4000|COMPAQ DESKPRO 5100|COMPAQ DESKPRO 6000|COMPAQ DESKPRO EN|COMPAQ DESKPRO EN SFF|COMPAQ DESKPRO EN-CMT|COMPAQ DESKPRO EP|COMPAQ EVO D500|COMPAQ EVO D510|COMPAQ EVO D530|COMPAQ EVO D530 CMT|COMPAQ EVO W6000|COMPAQ EVO W8000|COMPAQ PROLIANT HD|COMPAQ PROLIANT ML330|COMPAQ PROLIANT ML380|COMPAQ PROSIGNIA|COMPAQ SP 750|COMPAQ XW6000|DELL OPTIPLEX|HP DC7100|IBM PC 300GL|IBM PC 300PL|UNISYS TELLER STATION

Each entry has a "|" separating them. What I'm looking to do is just extract from the list the ones that meet my need. For example, in the above string I want to capture any item that begins with the letter "C". Once captured, I want to put that information in a new string so my output would look like this:

COMPAQ DC5000|COMPAQ DESKPRO 2000|COMPAQ DESKPRO 4000|COMPAQ DESKPRO 5100|COMPAQ DESKPRO 6000|COMPAQ DESKPRO EN|COMPAQ DESKPRO EN SFF|COMPAQ DESKPRO EN-CMT|COMPAQ DESKPRO EP|COMPAQ EVO D500|COMPAQ EVO D510|COMPAQ EVO D530|COMPAQ EVO D530 CMT|COMPAQ EVO W6000|COMPAQ EVO W8000|COMPAQ PROLIANT HD|COMPAQ PROLIANT ML330|COMPAQ PROLIANT ML380|COMPAQ PROSIGNIA|COMPAQ SP 750|COMPAQ XW6000

I'm having to go through and sort a list that is 15,000 items long. The above example is one of many things I have to sort, so I need to make it so it can be easily changed to the criteria I need.

I'm not sure how to search for this since, as you can see, my question is hard to phrase in a few words. I'm assuming this is just a couple of lines of code; I'm just not sure how to do it. Thoughts?
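A minimal sketch of one way to do this: split on "|", keep the entries that match a criterion, and join them back with "|". `_FilterList` is a hypothetical helper name (not a built-in), and expressing the criterion as a regular expression is an assumption made so it can be changed by editing one argument, e.g. `"^C"` for entries starting with C or `"^COMPAQ EVO"` for one product line.

```autoit
; Hypothetical helper: keep only the entries of a "|"-delimited list that
; match $sPattern, and return them in the same "|"-delimited format.
Func _FilterList($sList, $sPattern)
    Local $aItems = StringSplit($sList, "|") ; $aItems[0] holds the entry count
    Local $sResult = ""
    For $i = 1 To $aItems[0]
        ; StringRegExp(test, pattern, 0) returns 1 on a match, 0 otherwise
        If StringRegExp($aItems[$i], $sPattern, 0) Then
            $sResult &= $aItems[$i] & "|"
        EndIf
    Next
    ; Drop the trailing "|" added after the last kept entry
    If $sResult <> "" Then $sResult = StringTrimRight($sResult, 1)
    Return $sResult
EndFunc

Local $sData = "APPLE-IMAC|COMPAQ DC5000|COMPAQ EVO D500|DELL OPTIPLEX|HP DC7100"
ConsoleWrite(_FilterList($sData, "^C") & @LF) ; prints COMPAQ DC5000|COMPAQ EVO D500
```

StringRegExp is case-sensitive by default; prefixing the pattern with `(?i)` (for example `"(?i)^c"`) should make the match case-insensitive.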


$ary = StringSplit($data, "|")
$stack = $_StackEmpty
For $i = 1 to $ary[0]
   If StringLeft($ary[$i], 1) = "C" Then
      _StackPush($stack, $ary[$i])
   EndIf
Next
;;;;Now, $stack is your array of values.

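Requires my Stack UDFs. Put them at the top of your script, before this code, because $_StackEmpty has to be declared before the line $stack = $_StackEmpty runs: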

#region _Stack UDFs by nfwu
Global Const $_StackEmpty = "Empty"
Func _StackPop(ByRef $avArray)
    Local $sLastVal
    If (Not IsArray($avArray)) Then
        SetError(1)
        Return $_StackEmpty
    EndIf
    $sLastVal = $avArray[UBound($avArray) - 1]
    If UBound($avArray) = 1 Then
        $avArray = $_StackEmpty
    Else
        ReDim $avArray[UBound($avArray) - 1]
    EndIf
    
    Return $sLastVal
EndFunc
Func _StackPush(ByRef $avArray, $sValue)
    IF IsArray( $avArray ) Then
        ReDim $avArray[Ubound($avArray)+1]
    Else
        Dim $avArray[1]
    EndIf
    $avArray[UBound($avArray)-1] = $sValue
    SetError(0)
    Return 1
EndFunc
#endregion

You could modify this to use regular expressions, if you want.
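As a sketch of the regular-expression variant (my own, not nfwu's code), a single global match can pull out the C entries, which are then joined back into the item1|item2 format. The pattern is an assumption: `(?:^|\|)` consumes only the delimiter in front of an entry, so two matching entries in a row are both found, and `[^|]*` accepts any character except the delimiter.

```autoit
Local $sData = "APPLE-IMAC|COMPAQ DC5000|COMPAQ EVO D500|DELL OPTIPLEX|HP DC7100"
Local $sResult = ""
; (?:^|\|) = start of string or a "|", not captured
; (C[^|]*) = capture one entry starting with C, up to the next "|"
Local $aMatches = StringRegExp($sData, "(?:^|\|)(C[^|]*)", 3)
If Not @error Then ; @error is set when flag 3 finds no matches
    For $i = 0 To UBound($aMatches) - 1 ; flag 3 arrays are 0-based, no count element
        $sResult &= $aMatches[$i] & "|"
    Next
    $sResult = StringTrimRight($sResult, 1) ; remove the trailing "|"
EndIf
ConsoleWrite($sResult & @LF) ; prints COMPAQ DC5000|COMPAQ EVO D500
```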

#)

From the help file:

StringRegExp ( "test", "pattern" [, flag ] )
Flag  Values:
0 Return true/false (1/0) as to whether the test matched the pattern. 
1 Return an array with the text that matched all the group patterns. Check @Extended to determine whether the pattern matched or not. 
2 Same as 0.  
3 Perform a global search, checking the entire string, returning an array of all results. Check @Extended to determine whether the pattern matched or not.
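A small illustration of flags 0 and 3, with the test string first and the pattern second. The sample strings are made up. It follows the flag table quoted above; later AutoIt releases changed some of these meanings (flag 2, for instance, no longer behaves like 0), so it is worth checking the help file for the installed version.

```autoit
; Flag 0: 1 if the pattern matches anywhere in the test string, else 0
Local $iHit = StringRegExp("COMPAQ EVO D500", "^C", 0)  ; starts with C
Local $iMiss = StringRegExp("HP DC7100", "^C", 0)       ; does not
ConsoleWrite($iHit & " " & $iMiss & @LF)                ; prints 1 0

; Flag 3: global search, returns a 0-based array of every captured match
Local $aAll = StringRegExp("C1 X C2 Y C3", "(C\d)", 3)
ConsoleWrite(UBound($aAll) & " matches, first = " & $aAll[0] & @LF) ; 3 matches, first = C1
```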

So something like this should give you something to work with:

NOTE: I have not tested this, so it's only a suggestion to work with :think:

$data= "|Compaq test|HP test|Code test|"
$arr = StringRegExp("|C.*|",$data,3)

Very smart and short.

vollyman, just make sure your data file/string has the | character as its first and last character, as in the example above; otherwise you could miss the first and last possible matches.



I even tried this, and this didn't work either. WHAT AM I DOING WRONG??? :think:

$data= "Compaq test|HP test|Code test|"
$arr = StringRegExp("Compaq test|HP test|Code test|",$data,1)
MsgBox(0,"",$arr);test to see if desired string is correct

I still get a blank for a return string.

$data= "Compaq test|HP test|Code test|"
$arr = StringRegExp("|C.*|",$data,0)
MsgBox(0,"",$arr)

This also gives a blank for a return string.

Edited by vollyman


NOTE: In my sample post I goofed and switched the pattern and the data entries in the StringRegExp call. Sorry about that :think:

But StringRegExp does not behave as I expected, even after reading the documentation.

So some research is in order.

My test code reveals that no array is returned.

#include <Array.au3>
$data= ";Compaq test;HP test;Code test;"
$arr = StringRegExp($data,"C.*t",3)
ConsoleWrite("@error:=" & @error & ", @Extended:=" & @extended & @LF)
if not IsArray($arr) then ConsoleWrite("ERROR: Did not return array" & @LF)

_ArrayDisplay($arr, "Result")

I thought a | could have special meaning, and it does (in the pattern) so I have removed it in this test code.
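The special meaning of | in a pattern is alternation ("this OR that"). A quick check on a made-up sample string: unescaped, "|C.*|" contains empty alternatives, and an empty alternative matches anywhere; escaped as \|, the pipe is matched literally. Inside a character class such as [^|] the pipe needs no escaping.

```autoit
; Unescaped: "" OR "C.*" OR "" -> the empty alternative always matches
Local $iUnescaped = StringRegExp("|HP test|", "|C.*|", 0)
; Escaped: literal "|", a C, non-"|" characters, literal "|"
Local $iEscaped = StringRegExp("|HP test|", "\|C[^|]*\|", 0)
ConsoleWrite($iUnescaped & " " & $iEscaped & @LF) ; prints 1 0
```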


I tried this too, and I get a blank for a return:

Global Const $_StackEmpty = "Empty"
$data= "Compaq test|HP test|Code test|Compaq test2|Compaq test3"
;$arr = StringRegExp("[Compaq]",$data,3)


$ary = StringSplit($data, "|")
$stack = $_StackEmpty
For $i = 1 to $ary[0]
   If StringLeft($ary[$i], 1) = "C" Then
      _StackPush($stack, $ary[$i])
   EndIf
Next
;;;;Now, $stack is your array of values.
MsgBox(0,"",$stack);test to see if desired string is correct
#region _Stack UDFs by nfwu


Func _StackPop(ByRef $avArray)
    Local $sLastVal
    If (Not IsArray($avArray)) Then
        SetError(1)
        Return $_StackEmpty
    EndIf
    $sLastVal = $avArray[UBound($avArray) - 1]
    If UBound($avArray) = 1 Then
        $avArray = $_StackEmpty
    Else
        ReDim $avArray[UBound($avArray) - 1]
    EndIf
    
    Return $sLastVal
EndFunc
Func _StackPush(ByRef $avArray, $sValue)
    IF IsArray( $avArray ) Then
        ReDim $avArray[Ubound($avArray)+1]
    Else
        Dim $avArray[1]
    EndIf
    $avArray[UBound($avArray)-1] = $sValue
    SetError(0)
    Return 1
EndFunc
#endregion
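The blank MsgBox is most likely because $stack ends up as an array, and an array shown as a string comes out empty. A sketch of joining it back into the pipe-delimited format: `_JoinArray` is a hypothetical helper, and the hard-coded array only stands in for what _StackPush would build. Array.au3's _ArrayToString, used later in this thread, is an alternative.

```autoit
; Hypothetical helper: join the elements of a 0-based array with $sDelim.
; Returns "" when $aArray is not an array (e.g. the stack is still "Empty").
Func _JoinArray($aArray, $sDelim)
    If Not IsArray($aArray) Then Return ""
    Local $sOut = ""
    For $i = 0 To UBound($aArray) - 1
        If $i > 0 Then $sOut &= $sDelim ; delimiter only between elements
        $sOut &= $aArray[$i]
    Next
    Return $sOut
EndFunc

; Stand-in for the array _StackPush builds (0-based, one value per element)
Local $stack[3] = ["Compaq test", "Code test", "Compaq test2"]
ConsoleWrite(_JoinArray($stack, "|") & @LF) ; or MsgBox(0, "", _JoinArray($stack, "|"))
```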


Turns out we have to wrap up any pattern in ().

Like this:

#include <Array.au3>
$data= ";Compaq test;HP test;Code test;"
$arr = StringRegExp($data,"(;C[a-zA-Z0-9 ]*;)",3)
ConsoleWrite("@error:=" & @error & ", @Extended:=" & @extended & @LF)
if not IsArray($arr) then ConsoleWrite("ERROR: Did not return array" & @LF)

_ArrayDisplay($arr, "Result")

So now you have to sort out the |. I think I'll leave that as an exercise :think:


Some background: I have a program that reads the string and puts it into a list. The string must be in this format for the program to read it: item1|item2|item3

What I'm attempting is to design an update script that pulls data from a 15,000+ item Excel spreadsheet and transforms it into the string format I need. I can do this part and put the data into a string, but I need to pull out sections depending on what the items are, such as all Compaq items. When I pull the items, I need to create a string with just those items, in the same format. If you look at my first post, you will see what I mean. I really need to solve this problem with working code.

Edited by vollyman


I need the "|" for the list, and in some cases, my list can be over 5000 items.

Also, I need the output string in the format like this: item1|item2|item3

Having it listed in an array won't work for what I need it to do.

So you have to find a way to include the | or find another solution, whichever suits you best. Happy hunting :think:


Obviously you're out of imagination at the moment, @vollyman. You have a solution that does what you want; you just have to tweak it a bit. But it is not good enough for you since it does not give you the entire solution.

Even though I'm a bit pissed by that kind of attitude (you probably have a good excuse, I probably have it myself from time to time, and you did modify your post for the better), I will give you the solution:

#include <Array.au3>
$data= "|Compaq test|HP test|Code test|Some thinge else|"
$arr = StringRegExpReplace($data,"(\|[^C][a-zA-Z0-9 ]*\|)","|")
ConsoleWrite("@error:=" & @error & ", @Extended:=" & @extended & @LF)
ConsoleWrite("$arr:=" & $arr & @LF)


I'm really not trying to be annoying, believe me. I've been beating my head against the wall over this for several days now in frustration. Sorry to offend.

I tried the solution you gave. It doesn't work. Again, I'm not trying to be a pain in the butt here.

I know how it is, really.

What did it return? On my system it returns:

@error:=0, @Extended:=2
$arr:=|Compaq test|Code test|

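Which is what you want, is it not?

It is probably a matter of creating the right regexp, which can be a real pain in the butt :think:

EDIT: I did not notice this before I posted, but @Extended is 2 even though the result is as expected. (For StringRegExpReplace, @extended holds the number of replacements made, so 2 just means two entries were removed; it is not a warning.)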

Edited by Uten


I'm getting an error saying it didn't return the array.

I tried this, and it works, but it is messy. Also, I noticed it missed one of the items:

#include <Array.au3>
$_dataget= ";Compaq test;HP test;Code test;Compaq test1;HP test;Code test1;"
$replace = StringReplace($_dataget,"|",";")
;MsgBox(0,"",$replace)
$arr = StringRegExp($replace,"(;C[a-zA-Z0-9 ]*;)",3) ; search the ";"-delimited copy
ConsoleWrite("@error:=" & @error & ", @Extended:=" & @extended & @LF)
if not IsArray($arr) then ConsoleWrite("ERROR: Did not return array" & @LF)
;_ArrayDisplay($arr, "Result") 
$result1 =_ArrayToString($arr,"+")
$replace2 = StringReplace($result1,";","|")
$replace3 = StringReplace($replace2,"+","")
$replace2a = StringReplace($replace3,"||","|")
MsgBox(0, "this is the result",$replace2a)
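The missed item can be explained by the pattern consuming the ";" on both sides of an entry: after ";Code test;" matches, the ";" in front of "Compaq test1" is used up, so that entry is never seen. A sketch comparing the original pattern with a lookahead version (my variation, on the same sample data); (?=;) checks for the closing ";" without consuming it, and [^;]* accepts any character except the delimiter.

```autoit
Local $sData = ";Compaq test;HP test;Code test;Compaq test1;HP test;Code test1;"
; Original: each match eats both ";" so an adjacent C entry is skipped
Local $aOld = StringRegExp($sData, "(;C[a-zA-Z0-9 ]*;)", 3)
; Lookahead: only the leading ";" is consumed, the closing one is left in place
Local $aNew = StringRegExp($sData, ";(C[^;]*)(?=;)", 3)
ConsoleWrite(UBound($aOld) & " vs " & UBound($aNew) & @LF) ; prints 3 vs 4
```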


Ok, I must have made a mistake, for your code is working, but in testing I changed the data to look like this:

$data= "|Compaq test|HP test|Other data|Even more data|Code test|Some thinge else|"

and the output was |Compaq test|Other data|Code test|

I tried this data:

"|Compaq test|HP test|Other data|Even more data|Code test|Some thinge else|Some thinge else|Some thinge else|Some thinge else|Some thinge else|Some thinge else|Some thinge else|Some thinge else"
and got this: |Compaq test|Other data|Code test|Some thinge else|Some thinge else|Some thinge else|Some thinge else

I think StringRegExpReplace has some odd behaviour here, probably because each match consumes the trailing | that the next entry needs as its leading |, so the entry right after a removed one is skipped.

So try this:

#include <Array.au3>
;$data = "|Compaq test|HP test|Other data|Even more data|Code test|Some thinge else|Some thinge else|Some thinge else|Some thinge else|Some thinge else|Some thinge else|Some thinge else|Some thinge else|"
$data = "|APPLE POWERMAC G4|APPLE-IMAC|COMPAQ DC5000|COMPAQ DESKPRO 2000|COMPAQ DESKPRO 4000|COMPAQ DESKPRO 5100|COMPAQ DESKPRO 6000|COMPAQ DESKPRO EN|COMPAQ DESKPRO EN SFF|COMPAQ DESKPRO EN-CMT|COMPAQ DESKPRO EP|COMPAQ EVO D500|COMPAQ EVO D510|COMPAQ EVO D530|COMPAQ EVO D530 CMT|COMPAQ EVO W6000|COMPAQ EVO W8000|COMPAQ PROLIANT HD|COMPAQ PROLIANT ML330|COMPAQ PROLIANT ML380|COMPAQ PROSIGNIA|COMPAQ SP 750|COMPAQ XW6000|DELL OPTIPLEX|HP DC7100|IBM PC 300GL|IBM PC 300PL|UNISYS TELLER STATION|"

do 
    $data = StringRegExpReplace($data,"([^a-zA-Z0-9 ][^C][a-zA-Z0-9 ]*[^a-zA-Z0-9 ])","|")
    $res = @extended
until $res <= 0

;$arr = StringRegExpReplace($arr,"([\|][^C][a-zA-Z0-9 ]*[\|])","|")
ConsoleWrite("@error:=" & @error & ", @Extended:=" & @extended & @LF)
ConsoleWrite("$arr:=" & $data & @LF)

On my system it returns:

@error:=0, @Extended:=0
$arr:=|COMPAQ DC5000|COMPAQ DESKPRO 2000|COMPAQ DESKPRO 4000|COMPAQ DESKPRO 5100|COMPAQ DESKPRO 6000|COMPAQ DESKPRO EN|COMPAQ DESKPRO EN SFF|COMPAQ DESKPRO EN-CMT|COMPAQ DESKPRO EP|COMPAQ EVO D500|COMPAQ EVO D510|COMPAQ EVO D530|COMPAQ EVO D530 CMT|COMPAQ EVO W6000|COMPAQ EVO W8000|COMPAQ PROLIANT HD|COMPAQ PROLIANT ML330|COMPAQ PROLIANT ML380|COMPAQ PROSIGNIA|COMPAQ SP 750|COMPAQ XW6000|
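A single-pass alternative to the Do...Until loop, sketched under the assumption that the list is "|"-delimited and the criterion is "starts with an uppercase C". The lookahead (?=\|) checks for the next delimiter without consuming it, so the entry after a removed one still has its leading "|", and [^|]* accepts any characters in an entry, not just letters, digits and spaces. The sample data is the test string from earlier in the thread.

```autoit
Local $sData = "Compaq test|HP test|Other data|Even more data|Code test|Some thinge else"
; Wrap the list so the first and last entries are delimited like all the others
Local $sWork = "|" & $sData & "|"
; Remove every "|entry" whose entry does not start with C:
; (?!C) rejects C entries, (?=\|) requires the next "|" but leaves it in place
$sWork = StringRegExpReplace($sWork, "\|(?!C)[^|]*(?=\|)", "")
; Strip the wrapping "|" again to get the item1|item2 format
Local $sResult = StringTrimLeft(StringTrimRight($sWork, 1), 1)
ConsoleWrite($sResult & @LF) ; prints Compaq test|Code test
```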


I tried the script, and decided to try it with a different string to see what would happen. I hooked it up to the script that generates the string I need to pull data from, to capture every application we list in the sheet. In the "C" section we list about 50 applications, but when I run your script against it, it returns only 17. Hmmm... this search string thing is quite tricky.
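One plausible cause (the real data isn't shown, so this is an assumption): the patterns in this thread only allow [a-zA-Z0-9 ] inside an entry, and application names often contain characters such as ".", "-" or parentheses. The entry below is a made-up example; the class-based pattern rejects it, while [^|]* (anything except the delimiter) accepts it.

```autoit
; Hypothetical application name containing "." "(" and ")"
Local $sEntry = "|Crystal Reports 9.0 (SP2)|"
; [a-zA-Z0-9 ] has no "." "(" ")" or "-", so the whole-entry test fails
Local $iClass = StringRegExp($sEntry, "^\|C[a-zA-Z0-9 ]*\|$", 0)
; [^|] means "any character except the delimiter", so the same entry matches
Local $iAnyChar = StringRegExp($sEntry, "^\|C[^|]*\|$", 0)
ConsoleWrite($iClass & " " & $iAnyChar & @LF) ; prints 0 1
```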
