
StringRegExp Pattern



@jchd

You are absolutely... Godsent! Thank you so much for helping me with this. Not only have you found a solution, but I have been taught a great deal more about PCRE from your examples. Thank you again! I hope some day there is a way I can repay you for all the help that you have given me, not only today, but also in the past! Runtime was very minimal, and nothing at all in comparison with the line-by-line method I wrote 2 days ago. I tested with those 4 random documents I chose 2 nights ago, and I am very impressed and proud to announce that after reviewing the output, there was not one single anomaly. Now to convert these files into databases, and this should be all said, done, and over with by the end of tomorrow. Thanks again! I too needed sleep; as a matter of fact, I fell asleep while I was attempting to get the first, unedited example you posted last night to work.

@Malkey

Short, Sweet, and Simple. Thanks for adding your 2 cents; it too has taught me things about regular expressions that I had not yet known. However, without the company acronym, the output falls short of their expectations. Still, I like short, sweet, and simple, so while I am crunching these files today, I'll probably work on your example to extract the acronym as well and test the timing of these two wonderful examples, in case they call me and change how they want the data delivered again.

Again, thanks to everyone who helped me with this obviously complicated pattern.

Realm

My Contributions: Unix Timestamp: Calculate Unix time, or seconds since Epoch, accounting for your local timezone and daylight savings time. RegEdit Jumper: A Small & Simple interface based on Yashied's Reg Jumper Function, for searching Hives in your registry. 


I must say I'm surprised it worked out of the box, but isn't that what we expect from programs: that they behave in public?

Now something that I didn't mention clearly enough: the pattern relies on a fixed overall size for the company code, including extra whitespace if the code is shorter than 3 chars.

Let me put that otherwise: I've assumed that the company code can be 1 to 3 chars enclosed in square brackets. If it's 3 chars long, all is fine. BUT if it's shorter (so 1 or 2 chars), the regexp expects to find 2 or 1 whitespace characters (resp.) after the closing ] to keep track of field lengths.

If this rule (that I invented myself) doesn't fit the real-world data, then the regexp will have to be tripled to cope with two variable-length fields in a row instead of only one. In short, we would have to add two alternatives for each possible company name length.
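
The padding rule described above can be sketched as a three-way alternation. This is only an illustration under stated assumptions, not the pattern from the earlier post: the helper name _CodeFieldIsPadded and the test fragments are made up, and the pattern only looks at a fragment that starts at the opening bracket.

```autoit
; Hypothetical helper: returns True when a bracketed company code fills exactly
; 5 columns (3-char code, 2-char code + 1 space, or 1-char code + 2 spaces).
Func _CodeFieldIsPadded($sField)
    Local Const $sPattern = "^(?:\[\w{3}\]|\[\w{2}\]\x20|\[\w\]\x20{2})"
    Return StringRegExp($sField, $sPattern) = 1
EndFunc

; Made-up fragments, each starting at the "[" of the code field.
Local $aFields[6] = ["[WEN]71.5", "[WE] 71.5", "[W]  71.5", "[FE]71.5", "[F] 71.5", "[F]71.5"]
Local $sVerdict
For $i = 0 To UBound($aFields) - 1
    If _CodeFieldIsPadded($aFields[$i]) Then
        $sVerdict = "padded as expected"
    Else
        $sVerdict = "padding missing"
    EndIf
    ConsoleWrite(StringFormat("%-10s -> %s", $aFields[$i], $sVerdict) & @CRLF)
Next
```

The first three fragments satisfy the rule and the last three do not. The pattern is anchored with ^ on purpose: run unanchored over a whole line it would also hit supervisor entries such as Anderson[99] followed by a space.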

You can quickly add a check to ensure there is no such problem with your actual files, using another regexp (this one very simple) to single out problematic entries:

"^.{47,48}\["

Try this to see exactly what happens:

Local $text = 'Company                                          Est    Actual    Supervisor[Employees] ' & @CRLF _
      & '=== April 2012 ===================================================================================================================================' & @CRLF _
      & 'Henderson, Smith and Thurman Project [HST]      136.25    145.5     Anderson[2] Hoskins[1] Tillman[3] ' & @CRLF _
      & 'Acme Inc. [ACM]                                    41.65    193       Anderson[99] ' & @CRLF _
      & 'Disney Software [DSN]                                      488    ' & @CRLF _
      & 'Donald Duck [DND]                                                    ' & @CRLF _
      & 'Tillman Remod [TR]                              71.5     70        Anderson[3] ' & @CRLF _
      & 'Failing Entry 1                                 [F]71.5      70        Anderson[3] ' & @CRLF _
      & 'Failing Entry 2                                [F] 71.5      70        Anderson[3] ' & @CRLF _
      & 'Failing Entry 3                                [FE]71.5      70        Anderson[3] ' & @CRLF _
      & 'Working Entry 1                               [WE] 71.5      70        Anderson[3] ' & @CRLF _
      & 'Working Entry 2                               [W]  71.5      70        Anderson[3] ' & @CRLF _
      & 'Working Entry 3                               [WEN]71.5      70        Anderson[3] ' & @CRLF _
      & 'Baker Automotive Supplies Project [BAS]            212              ' & @CRLF _
      & 'Miller Sports Arena Project [MSA]                          451.1833  Baker2[14] ' & @CRLF
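
One way the check could be run is sketched below. The three sample lines are hypothetical and are built with StringFormat so that the opening bracket lands at offset 46, 47 and 48; the opening bracket is escaped in the pattern, and each line is tested on its own so that ^ does not depend on the newline convention in use.

```autoit
; Hypothetical sample lines: the name is left-justified and padded so that
; "[" sits at offset 46, 47 and 48 respectively.
Local $sText = StringFormat("%-46s[WEN]71.5", "Working Entry") & @CRLF _
        & StringFormat("%-47s[FE]71.5", "Failing Entry A") & @CRLF _
        & StringFormat("%-48s[F]71.5", "Failing Entry B") & @CRLF

; Split on the whole @CRLF delimiter (flag 1); element [0] holds the count.
Local $aLines = StringSplit($sText, @CRLF, 1)
For $i = 1 To $aLines[0]
    ; 47 or 48 characters followed by "[" means the code starts too far right.
    If StringRegExp($aLines[$i], "^.{47,48}\[") Then
        ConsoleWrite("Line " & $i & " needs a look: " & $aLines[$i] & @CRLF)
    EndIf
Next
```

With these made-up lines, only the second and third are reported.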

Yet I believe that a long company name would most probably have a 3-char code, and it may be that you don't have any 1-char company codes at all.

Anyway I feel the need to warn you about this caveat that I silently dug under your unsuspecting feet.

Last note: I don't recommend using option x or leaving lengthy regexp comments in cases where the expression is used routinely (pattern compilation takes significant time). This is obviously not a problem in this specific application, where the compilation of the regexp occurs only once. Also, in this case, the x option, which allows insignificant whitespace and detailed comments in the pattern, is of great help for readability and maintenance!
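
To illustrate the trade-off, here is a small sketch that puts a commented (?x) pattern next to its compact equivalent. The pattern and the sample line are made up for the example (a bracketed code of 2 or 3 word characters) and are not the pattern from the earlier post.

```autoit
; Commented form: (?x) makes PCRE ignore unescaped whitespace in the pattern
; and treat everything from # to the end of the line as a comment.
Local $sVerbose = "(?x)" & @CRLF _
        & "\[          # opening bracket" & @CRLF _
        & "(\w{2,3})   # company code: 2 or 3 word characters" & @CRLF _
        & "\]          # closing bracket"

; Compact form of the same pattern, better suited to code that runs it often.
Local $sCompact = "\[(\w{2,3})\]"

; Illustrative line; flag 1 returns the captures of the first match.
Local $sLine = "Acme Inc. [ACM]    41.65    193    Anderson[99]"
Local $aVerbose = StringRegExp($sLine, $sVerbose, 1)
Local $aCompact = StringRegExp($sLine, $sCompact, 1)
If IsArray($aVerbose) And IsArray($aCompact) Then
    ConsoleWrite("verbose: " & $aVerbose[0] & ", compact: " & $aCompact[0] & @CRLF)
EndIf
```

Both forms should capture the same code, so the commented version can live in the reference copy of a script while the compact one goes into code that runs often.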

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


I understand, and see what you mean with your string example above. I tested the code as is on a few documents first before implementing in my original script and I kept the script as is and saved it in a folder I keep scripts to learn and remember from. I did remove the comments in the code I implemented in my original script. I'm personally very picky about trying to keep my scripts clean and readable these days. I have also written a second script today, that will later double check the new database and verify the information from the text files, looking for missing records, and/or data.

As for the company code/acronym, their current standard is a three-letter code for all companies. However, when they started accumulating data in 1985, they used two-letter codes for each company, and they never changed those codes when they moved to the three-letter format. So codes will be a minimum of two letters and a maximum of three. I do appreciate the warning, and it is my fault that I didn't clarify the standards for the company codes in my OP.

Overall, there are 21 different document formats across 11,000 documents. The one you have graciously helped me with was the only one with irregular data that I could not formulate a pattern for. For this document format, along with 3 others, I need to extract data and create records/tables in an SQL database, from which they will later extract the data to the new machine. So far, I have compiled 14 years of data on this particular data set. I have viewed the database every time this script has finished compiling a year's worth of documents, and so far everything seems to be visually correct. When I decide to call it a night, I'll run the second script to verify the data that has been compiled thus far.


That sounds good. I see you now have the screwdriver with interchangeable bits in your hands!

I wish you the best of luck migrating the pile.


Since I was compiling this data for someone else, I wanted to be extra careful and write the second script to verify the data as well. So far everything has visually compiled correctly. The second (verification) script is currently 85% done, and no data errors or anomalies have been found yet! I just hope they are as impressed and happy with the results as I am.

I ended up taking what I have learned from your example and started looking over some of my old unfinished projects to see where this may help with roadblocks I hit in the past. I have found two so far that were stopped by SRE pattern problems this new knowledge could apply to, and I can't wait to get working on them again!

Thanks again, jchd and Malkey


You may want to study the official PCRE documentation which comes along with PCRE source available from http://www.pcre.org/

Install the html version and put a desktop icon on it. VERY useful!

Be warned however that the version included in AutoIt is a few releases behind and doesn't have the PCRE_UCP compile option.


Thanks again! This is even better than the text file version I downloaded a couple days ago.
