LondonNDIB Posted January 5, 2015 (edited)

I've read so many regex tutorials... I guess my brain just isn't wired for this beast. Just as I think I'm getting a handle... Can someone help me out? Sample text below:

<tbody>
<tr class="currentOrders_bluepanel">
<td class="currentOrders_bluetext">D135069445</td>
<td class="currentOrders_bluetext">Tracked Packet USA</td>
<td class="currentOrders_bluetext" nowrap="nowrap">
<script type="text/javascript" src="/esto/app/javax.faces.resource/jsf.js?ln=javax.faces"></script>
<a id="esto_currentorders_form:j_idt65:0:j_idt66:0:esto_currentorder" href="#" onclick="mojarra.jsfcljs(document.getElementById('esto_currentorders_form'),{'esto_currentorders_form:j_idt65:0:j_idt66:0:esto_currentorder':'esto_currentorders_form:j_idt65:0:j_idt66:0:esto_currentorder'},'reprintArtifact');return false">Shipping Label</a>
</td>
<td class="currentOrders_bluetext">LM026058444CA</td>
<td class="currentOrders_bluetext" align="right">7244353</td>
<td class="currentOrders_bluetext" align="right">95336</td>
</tr>
<tr class="currentOrders_greypanel">
<td class="currentOrders_bluetext">D135064462</td>
<td class="currentOrders_bluetext">Small Packet International Air</td>
<td class="currentOrders_bluetext" nowrap="nowrap"><a id="esto_currentorders_form:j_idt65:1:j_idt66:0:esto_currentorder" href="#" onclick="mojarra.jsfcljs(document.getElementById('esto_currentorders_form'),{'esto_currentorders_form:j_idt65:1:j_idt66:0:esto_currentorder':'esto_currentorders_form:j_idt65:1:j_idt66:0:esto_currentorder'},'reprintArtifact');return false">Shipping Label</a>
</td>
<td class="currentOrders_bluetext"></td>
<td class="currentOrders_bluetext" align="right">7244353</td>
<td class="currentOrders_bluetext" align="right">35018</td>
</tr>
</tbody>

I have a known order (e.g. D135069445) and I want to grab the tracking number associated with it (same example: LM026058444CA).

So basically I want to say: in the text after "D135069445", look for something that looks like LM026058444CA. But here's the kicker... sometimes it doesn't look like LM026058674CA. Sometimes it is all numeric, something like this: 7244353444343422. And it isn't safe to assume the number of characters will always be the same, although it should be close. Let's say "between 12 and 18" should be good.

So I guess I want "a bunch of alphanumeric characters between > and < at some point following D135069445".

If it helps, I know the tracking number will always be on the line immediately preceding a line containing "7244353". But there will be multiple instances of that, i.e. the 7244353 is not unique but the D135069445 is unique.

Heeeeeeeellllppppppppp!

LondonNDIB Posted January 5, 2015 (edited)

I came up with this pattern that does match... but I'm worried: is it succinct enough? I'm afraid I just don't understand this stuff enough to know WHY this worked, or to be comfortable that it won't false-match and that it will match every time:

(?s)(?:D135069445)(?:.*)(?:>)([[:alnum:]]{12,18})

LondonNDIB Posted January 5, 2015

Curious. Just to test my pattern, I doubled up on my sample text (copied and pasted so everything was there twice) and I expected it to return an array of two matches... but it doesn't, still just one. Why?

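A likely explanation for the single match is the greedy `.*` in the pattern: with `(?s)` it first swallows everything up to the end of the text and then backtracks only as far as the last spot where the rest of the pattern fits, so one match spans all the copies. The sketch below illustrates that on a hypothetical cut-down sample (the `$sSample` string and variable names are made up for illustration, not taken from the real page), comparing the greedy `.*` with the lazy `.*?`.

```autoit
; Hypothetical cut-down sample: two orders, each followed by its own tracking number.
Local $sSample = 'bluetext">D135069445</td><td>x</td><td class="bluetext">LM026058444CA</td>' & @CRLF & _
        'bluetext">D135064462</td><td>x</td><td class="bluetext">7244353444343422</td>'

; Greedy: .* runs to the end of the text, then backtracks to the LAST place the rest can match.
Local $aGreedy = StringRegExp($sSample, '(?s)D135069445.*>([[:alnum:]]{12,18})', 3)
ConsoleWrite("greedy: " & $aGreedy[0] & @CRLF) ; greedy: 7244353444343422

; Lazy: .*? grows one character at a time and stops at the FIRST place the rest can match.
Local $aLazy = StringRegExp($sSample, '(?s)D135069445.*?>([[:alnum:]]{12,18})', 3)
ConsoleWrite("lazy:   " & $aLazy[0] & @CRLF) ; lazy:   LM026058444CA
```

Flag 3 asks `StringRegExp` for an array of all matches; the greedy version still returns only one element because its single match has already consumed the rest of the text.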
mikell Posted January 5, 2015

Being more specific, here is another way:

#Include <Array.au3>

$txt = FileRead("1.txt")
$var = "D135069445"
$res = StringRegExp($txt, '(?s)bluetext">' & $var & '.*?</a>.*?bluetext">([^<]*)', 3)
_ArrayDisplay($res)

jdelaney Posted January 5, 2015

Suggestion: use XML parsers, or HTML parsers.

IEbyXPATH - Grab IE DOM objects by XPATH
IEscriptRecord - Makings of an IE script recorder
ExcelFromXML - Create Excel docs without Excel installed
GetAllWindowControls - Output all control data on a given window.

LondonNDIB Posted January 5, 2015

Thanks! Mine didn't work. When I tested with more orders and tracking numbers, it would always grab the LAST tracking number, rather than the one immediately following the order number. I don't get it.

I tested yours and expected that you had misunderstood my question, because TO ME it looks like you're trying to match the order number. I can't see ANYTHING in your example that indicates it will show what I'm looking for... but it does.

It's GREAT that you gave me an answer. I wish I understood it.

LondonNDIB Posted January 5, 2015

"Suggestion: use XML parsers, or html parsers."

Wanna point me in the right direction? Last night I came across the same advice on several other forums. So I googled "html parser" and only came across things that were even more complicated! They also seemed to be geared toward grabbing massive amounts of data (site mining, etc.). What am I missing?

mikell Posted January 5, 2015 (edited)

The regex works with marks from the html code, bluetext"> and </a>, with some lazy '0 or more chars' (.*?) in between.
([^<]*) means '0 or more non-< chars'.

Edit: try this one on your html

$res = StringRegExp($txt, '(?s)bluetext">([^<]+).*?</a>.*?bluetext">([^<]*)', 3)

This should grab all pairs of references (order number, tracking number).

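To make the explanation above concrete, here is a minimal sketch that builds the same pattern piece by piece, with a comment on what each piece does. The order number appears in the pattern only as an anchor; `StringRegExp` with flag 3 returns just what the parentheses capture, which is why the result is the tracking number and not the order number. The `$sRow` string is a hypothetical cut-down row in the same shape as the sample table, and the variable names are illustrative.

```autoit
Local $sOrder = "D135069445"

; Hypothetical cut-down row in the same shape as the sample table.
Local $sRow = '<td class="currentOrders_bluetext">' & $sOrder & '</td>' & @CRLF & _
        '<td class="currentOrders_bluetext" nowrap="nowrap"><a href="#">Shipping Label</a>' & @CRLF & _
        '</td>' & @CRLF & _
        '<td class="currentOrders_bluetext">LM026058444CA</td>'

Local $sPattern = '(?s)'              ; let . match line breaks too
$sPattern &= 'bluetext">' & $sOrder   ; anchor: the cell that holds the known order number
$sPattern &= '.*?</a>'                ; lazy skip to the end of the "Shipping Label" link
$sPattern &= '.*?bluetext">'          ; lazy skip to the next cell that opens with exactly bluetext">
$sPattern &= '([^<]*)'                ; capture up to the next "<": the tracking number

Local $aRes = StringRegExp($sRow, $sPattern, 3)
If @error Then
    ConsoleWrite("no match" & @CRLF)
Else
    ConsoleWrite($aRes[0] & @CRLF) ; LM026058444CA
EndIf
```

Cells such as the one with nowrap="nowrap" are skipped because their class attribute is not followed directly by ">", so they never match the literal bluetext"> marker.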
LondonNDIB Posted January 5, 2015

I think my brain kinda fills in with elevator music when I get to the terms "lazy" and "greedy". I'm grateful for your help. Some day, I may start to understand.

iamtheky Posted January 5, 2015 (edited)

If the line count and naming convention stay the same between the known value and the target string:

#Include <Array.au3>
#Include <File.au3>

Local $aArray = 0
$txt = _FileReadToArray("1.txt", $aArray)
; Find the line holding the known order number (partial match).
$index = _ArrayFindAll($aArray, "D135069445", 0, 0, 0, 1)
; The tracking number sits 6 lines below; strip whitespace, then trim the opening <td ...> and closing </td>.
MsgBox(0, '', StringTrimRight(StringTrimLeft(StringStripWS($aArray[$index[0] + 6], 8), 34), 5))

Here it is under the rule that the tracking number is on the line above the next 7244353:

#Include <Array.au3>
#Include <File.au3>

Local $aArray = 0
$txt = _FileReadToArray("1.txt", $aArray)
$index = _ArrayFindAll($aArray, "D135069445", 0, 0, 0, 1)
; Start the search at the order line ($index[0]; $index itself is an array), so the first 7244353 found belongs to this order.
$aTarget = _ArrayFindAll($aArray, "7244353", $index[0], 0, 0, 1)
MsgBox(0, '', StringTrimRight(StringTrimLeft(StringStripWS($aArray[$aTarget[0] - 1], 8), 34), 5))

Malkey Posted January 5, 2015

And another view of the problem.

#include <IE.au3>
#include <Array.au3>

; ------ Get html source --------
Local $sHTML = StringRegExpReplace(FileRead(@ScriptFullPath), "(?s)^.*#cs\s*(.+)\s*#ce.*$", "\1") ; Get HTML text from this script that is between #cs and #ce.
;Local $sHTML = FileRead("1.txt") ; or, Get HTML text from text file.
;ConsoleWrite($sHTML & @LF)
; -------------------------------

$sText = _IE_InnerhtmlToOutertext($sHTML) ; Get the text that is displayed in browser.
;ConsoleWrite( $sText & @LF)

$var = "D135069445" ; "D135064462" ;

$res = StringRegExpReplace($sText, '(?is)^.*' & $var & '.*?Shipping Label\s+([A-Z0-9]+)(?=\s*7244353).*$', "\1")
$res = @extended ? $res : "" ; If @extended <> 0 Then $res = $res Else $res = "". Used because this returns nothing instead of the whole string returning, when there is no RE match, or zero replacements.
ConsoleWrite($res & @LF)

;or
$res = StringRegExp($sText, '(?is)' & $var & '.*?Shipping Label\s+([A-Z0-9]+)(?=\s+7244353)', 3)
_ArrayDisplay($res)


; From HTML source, "innerhtml", get "outertext", the text that is displayed in browser.
Func _IE_InnerhtmlToOutertext($sSource)
    Local $oIE = _IECreate("about:blank", 0, 0) ; Hidden
    _IEPropertySet($oIE, "innerhtml", $sSource)
    Local $sRet = _IEPropertyGet($oIE, "outertext")
    _IEQuit($oIE)
    Return $sRet
EndFunc ;==>_IE_InnerhtmlToOutertext

#cs
<tbody>
<tr class="currentOrders_bluepanel">
<td class="currentOrders_bluetext">D135069445</td>
<td class="currentOrders_bluetext">Tracked Packet USA</td>
<td class="currentOrders_bluetext" nowrap="nowrap">
<script type="text/javascript" src="/esto/app/javax.faces.resource/jsf.js?ln=javax.faces"></script>
<a id="esto_currentorders_form:j_idt65:0:j_idt66:0:esto_currentorder" href="#" onclick="mojarra.jsfcljs(document.getElementById('esto_currentorders_form'),{'esto_currentorders_form:j_idt65:0:j_idt66:0:esto_currentorder':'esto_currentorders_form:j_idt65:0:j_idt66:0:esto_currentorder'},'reprintArtifact');return false">Shipping Label</a>
</td>
<td class="currentOrders_bluetext">LM026058444CA</td>
<td class="currentOrders_bluetext" align="right">7244353</td>
<td class="currentOrders_bluetext" align="right">95336</td>
</tr>
<tr class="currentOrders_greypanel">
<td class="currentOrders_bluetext">D135064462</td>
<td class="currentOrders_bluetext">Small Packet International Air</td>
<td class="currentOrders_bluetext" nowrap="nowrap"><a id="esto_currentorders_form:j_idt65:1:j_idt66:0:esto_currentorder" href="#" onclick="mojarra.jsfcljs(document.getElementById('esto_currentorders_form'),{'esto_currentorders_form:j_idt65:1:j_idt66:0:esto_currentorder':'esto_currentorders_form:j_idt65:1:j_idt66:0:esto_currentorder'},'reprintArtifact');return false">Shipping Label</a>
</td>
<td class="currentOrders_bluetext"></td>
<td class="currentOrders_bluetext" align="right">7244353</td>
<td class="currentOrders_bluetext" align="right">35018</td>
</tr>
</tbody>
#ce

LondonNDIB Posted January 6, 2015

Just wanted to post another "thanks" for the help in here, guys! After working WAY too long on this, I just ran through a job of 100+ labels and it went PERFECTLY! Yay! Every tracking number was accurately grabbed. OK, that was just a minor part of this, but it was a headache for me!

Incidentally, along the way I found a nice command line tool for marking up PDF files programmatically. I only found it after a lot of searching and wasting time on more complicated and/or less stable methods. I wonder if this kind of tool is something I should start a thread about here?