Find string in string by RegEx


I have 100 text files which contain some information which is (just) human-readable.

I want to extract two things from them, one is a string which fits a RegEx:

[a-zA-Z]{2}\d{4}[a-zA-Z]{2}\d{3}

(although technically it will say DescriptionAA1111AA111, and I just want to catch the AA1111AA111 part of it, but I can use a StringRight() function to clean that up)

 

And the second is a string showing the status of a device on our network, something like:

06:52:16 AWST01.24.00OnPlaying streamIdlePresent and mountedInternet: XX A1str.to: 0, buf.emp: 0, str.dsc: 0, vs.eof: 0

which always starts with a timestamp (something like \d{2}:\d{2}:\d{2})

Basically, I just want to define two variables:

  • The last 11 characters of DescriptionAA1111AA111 (only appears once in the string)
  • The entire line the first time a time like 06:52:16 is found

 

I tried using FileReadLine(), but that doesn't work because the strings aren't always on the same line from file to file. I also tried passing the entire file to StringRegExp(), but I can't seem to find a way to get StringRegExp() to trawl through the file looking for a match, rather than trying to match the entire contents against the expression.

 

Any help appreciated!

 


The regex pattern modifier for multiline is your friend: (?m)

#include <Array.au3>
#include <StringConstants.au3> ; defines $STR_REGEXPARRAYGLOBALMATCH

; Build sample data, one entry per line.
$fileContent = "This is the first line." & @CRLF
$fileContent &= "Something something, blah, DescriptionAA1111AA111, and another thing." & @CRLF
$fileContent &= "06:52:16 this line starts with a valid timestamp" & @CRLF
$fileContent &= "24:01:01 this line doesn't start with a valid timestamp" & @CRLF
$fileContent &= "00:00:00 this line also starts with a valid timestamp" & @CRLF
$fileContent &= "23:59:59 just like this one, which also has a description thingy: DescriptionZZ1234zz123" & @CRLF
$fileContent &= "This is a bonus line." & @CRLF
$fileContent &= "DescriptionXy1111aB111, aaaaaand it's gone." & @CRLF

; (?m) makes ^ and $ match at every line boundary, so each line is tested on its own.
; Hours are limited to 00-23, minutes and seconds to 00-59.
$aTimestampLines = StringRegExp($fileContent, "(?m)^((?:0\d|1\d|2[0-3]):[0-5]\d:[0-5]\d .*)$", $STR_REGEXPARRAYGLOBALMATCH)
_ArrayDisplay($aTimestampLines)

; No anchors in this pattern, so (?m) is not needed here.
$aDescriptionLines = StringRegExp($fileContent, "([a-zA-Z]{2}\d{4}[a-zA-Z]{2}\d{3})", $STR_REGEXPARRAYGLOBALMATCH)
_ArrayDisplay($aDescriptionLines)
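To get from the sample string to the two variables asked for in the first post, something along these lines might work. It is only a sketch: the file name device01.txt is made up, and it assumes the ID always directly follows the word "Description". It uses $STR_REGEXPARRAYMATCH so that only the first match is returned, and [^\r\n]* instead of .*$ so that the result does not depend on the newline convention.

```autoit
#include <MsgBoxConstants.au3>
#include <StringConstants.au3>

; Hypothetical path, for illustration only.
Local $sPath = @ScriptDir & "\device01.txt"

; Read the whole file into one string.
Local $sContent = FileRead($sPath)
If @error Then
    MsgBox($MB_ICONERROR, "Error", "Could not read " & $sPath)
    Exit 1
EndIf

; First variable: the 11-character ID right after "Description".
Local $sId = ""
Local $aId = StringRegExp($sContent, "Description([a-zA-Z]{2}\d{4}[a-zA-Z]{2}\d{3})", $STR_REGEXPARRAYMATCH)
If Not @error Then $sId = $aId[0]

; Second variable: the first whole line that starts with a timestamp.
Local $sStatus = ""
Local $aStatus = StringRegExp($sContent, "(?m)^(\d{2}:\d{2}:\d{2}\b[^\r\n]*)", $STR_REGEXPARRAYMATCH)
If Not @error Then $sStatus = $aStatus[0]

ConsoleWrite("ID:     " & $sId & @CRLF)
ConsoleWrite("Status: " & $sStatus & @CRLF)
```

For 100 files, the same steps could be wrapped in a loop over the file names; an empty string in either variable would mean that file had no match.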

 


That's too much safety for my taste :)

 

#include <Array.au3>
#include <StringConstants.au3> ; defines $STR_REGEXPARRAYGLOBALMATCH

; Build sample data, one entry per line.
$fileContent = "This is the first line." & @CRLF
$fileContent &= "Something something, blah, DescriptionAA1111AA111, and another thing." & @CRLF
$fileContent &= "06:52:16 this line starts with a valid timestamp" & @CRLF
$fileContent &= "24:01:01 this line doesn't start with a valid timestamp" & @CRLF
$fileContent &= "00:00:00 this line also starts with a valid timestamp" & @CRLF
$fileContent &= "23:59:59 just like this one, which also has a description thingy: DescriptionZZ1234zz123" & @CRLF
$fileContent &= "This is a bonus line." & @CRLF
$fileContent &= "DescriptionXy1111aB111, aaaaaand it's gone." & @CRLF

$aDescriptionLines = StringRegExp($fileContent, "Description(.{11})", $STR_REGEXPARRAYGLOBALMATCH) ; 11 characters after every instance of 'Description'
_ArrayDisplay($aDescriptionLines)

; No validity check on the timestamp: any line containing dd:dd:dd followed by whitespace matches.
MsgBox(0, '', StringRegExp($fileContent, "(.*\d\d:\d\d:\d\d\s.*)", $STR_REGEXPARRAYGLOBALMATCH)[0]) ; First match of timestamp

 

Edited by boththose


That was to show how simple it can be when things are guaranteed, like X characters after a static string. It is also an alternate syntax for when criteria like the validity of the timestamp are unnecessary and/or the data has no risk of containing another colon-separated string of numbers, like the back half of a MAC address.
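A rough illustration of that MAC address risk, using made-up sample data (the address and the status text are invented for this sketch): the unanchored pattern should pick up the MAC line first, because "11:22:33 " looks like a timestamp followed by whitespace, while anchoring the timestamp to the start of a line with (?m)^ should skip it.

```autoit
#include <StringConstants.au3>

; Made-up sample: a MAC address appears before the real status line.
Local $sSample = "MAC: 00:1A:2B:11:22:33 (made-up address)" & @CRLF & _
        "06:52:16 Playing stream" & @CRLF

; Unanchored: dd:dd:dd may sit anywhere in the line, so the tail of the MAC address qualifies.
Local $aLoose = StringRegExp($sSample, "(.*\d\d:\d\d:\d\d\s.*)", $STR_REGEXPARRAYMATCH)

; Anchored: the timestamp has to be the first thing on the line.
Local $aStrict = StringRegExp($sSample, "(?m)^(\d\d:\d\d:\d\d\s[^\r\n]*)", $STR_REGEXPARRAYMATCH)

ConsoleWrite("Loose:  " & $aLoose[0] & @CRLF)
ConsoleWrite("Strict: " & $aStrict[0] & @CRLF)
```

If the files never contain anything like a MAC address, the shorter unanchored pattern is probably fine.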


  • 3 years later...

@Parsix

Why didn't you create a new post in which you explain your issue and show us what you have tried so far, instead of bumping a four-year-old one?

It would be better for you to read this and this carefully before posting again :)

Edited by FrancescoDiMuro
