myspacee Posted July 21, 2010

Hello all,

I need some help extracting information from a report:

pv301a.txt|õ 140PV301A *PV301AœDUFFY°±ÇNEWYORK. ÁE' stata posata la prima
pv301t.txt| ª22PV301T *PV301TŽÃµÔ³ŒØ‘Confienza avr la casa di riposo‚’Posata la prima pietra nei
pv302a.txt|¤13PV302A *DISCO œGOOFY°²ÇBOSTON. ÁŠIl caldo e la fatica del viaggio
pv302b.txt|À 102PV302B *PV302AœMICHEL°±E' accaduto ieri mattina alle 8 sulla provinciale
pv302t.txt| ð30PV302T *PV302TŽ³ÅÔ¹ŒØ‘Veniva da Venezia e aveva viaggiato tutta la notte. Si scatena
pv303a.txt|71PV303A *PV303AŽÂÖÒ°ŒØµ´ŒØ€‘SANNAZZARO’Verso il {Settembre}œPaolo Calviµ´Prima
pv401a.txt|†20PV401A *PNNSARœZELDA°³ÇNEWDHELI. ÁAl via la festa della Madonna†del
pv401t.txt| '12PV401T *PNNSARŽÂ¸Ô°ŒØ‘SARTIRANA’“Festa del Carmelo‚
pv402a.txt|„20PV402A *PNNMO3œMARIO°³ÇSIDNEY. ÁLa cima Zumstein stata la meta
pv402t.txt| '12PV402T *PNNMO3ŽÂ¸Ô°ŒØ‘MORTARA’“In vetta con il Cai‚
pv403a.txt|€20PV403A *PV403AœSONIC°³ÇMILAN.Á Lotta alla zanzara tigre: il sindaco
pv403t.txt| .12PV403T *PV910AŽÂ¸Ô°ŒØ‘PIEVE DEL CAIRO’“Lotta alle zanzare‚

I have identified some delimiters:

œ ° -> contain names
Ç Á -> contain cities

Can anyone help me extract this information?

Thank you,
m.
linus Posted July 21, 2010

Well, in general you have to use string functions to find the positions of the start delimiter and the end delimiter (StringInStr), then extract what is between them (StringMid). I don't know whether this puts you on the right track, because I can't see your delimiters in your quote ...
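The approach linus describes can be sketched roughly as follows. This is a minimal illustration, not code from the thread: `_Between` is a hypothetical helper name, and the sample line is copied from the first record of the report above. The search is made case-sensitive on purpose, because the data also contains an uppercase Œ that a case-insensitive search for œ could hit.

```autoit
; Hypothetical helper (not from the thread): returns the text between
; $sStart and $sEnd, or "" with @error set when a delimiter is missing.
Func _Between($sText, $sStart, $sEnd)
    ; Third argument 1 = case-sensitive search
    Local $iStart = StringInStr($sText, $sStart, 1)
    If $iStart = 0 Then Return SetError(1, 0, "")
    $iStart += StringLen($sStart) ; first character after the start delimiter
    ; Look for the end delimiter only after the start delimiter
    Local $iEnd = StringInStr($sText, $sEnd, 1, 1, $iStart)
    If $iEnd = 0 Then Return SetError(2, 0, "")
    Return StringMid($sText, $iStart, $iEnd - $iStart)
EndFunc

Local $sLine = "œDUFFY°±ÇNEWYORK. Á"
ConsoleWrite("Name: " & _Between($sLine, "œ", "°") & @CRLF)
ConsoleWrite("City: " & _Between($sLine, "Ç", "Á") & @CRLF)
```

In this sample the text between Ç and Á includes the trailing period and space, so the city may need trimming afterwards.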
GEOSoft Posted July 21, 2010

Where File.txt contains the quoted material from above.

    $aValues = StringRegExp(FileRead("File.txt"), "œ\s*\b(\w+)\b.+Ç([\w\s]+)", 3)
    ;; Generates a 0 based array where even numbered elements contain the name and odd numbered = city
    If Not @error Then
        For $i = 0 To UBound($aValues) - 2 Step 2
            MsgBox(0, "Result", "Name: " & $aValues[$i] & @CRLF & "City: " & $aValues[$i + 1])
        Next
    EndIf

George

Question about decompiling code? Read the decompiling FAQ and don't bother posting the question in the forums.
Be sure to read and follow the forum rules. -AKA the AutoIt Reading and Comprehension Skills test.
*** The PCRE (Regular Expression) ToolKit for AutoIT - (Updated Oct 20, 2011 ver:3.0.1.13) - Please update your current version before filing any bug reports. The installer now includes both 32 and 64 bit versions. No change in version number.
Visit my Blog .. currently not active but it will soon be resplendent with news and views. Also please remove any links you may have to my website. It is soon to be closed and replaced with something else.
"Old age and treachery will always overcome youth and skill!"
myspacee Posted July 21, 2010 (edited July 21, 2010)

linus, thank you for the reply, but I think StringRegExp is the right way to do this.

GEOSoft, thank you for the code. I have a request: your script returns the name and city only when both are found. If one of them is missing, the script exits. How can I avoid this and return whichever value is present when a line contains only a city or only a name?

Thank you,
m.
GEOSoft Posted July 21, 2010

I gave you the code (tested), so it's a matter of deciding what you want returned from a function and then building the code into that. Example: if you want an array returned, you could do this:

    Func _MyData($sStr) ;; $sStr could be anything, including a file.
        If Not $sStr Then Return SetError(1, 1, "You must send an input value")
        ; If the argument is the path of an existing file, work on its contents instead
        If FileExists($sStr) Then $sStr = FileRead($sStr)
        Local $aValues = StringRegExp($sStr, "œ\s*\b(\w+)\b.+Ç([\w\s]+)", 3) ;; The 0 based 1 dimension array
        If Not @error Then
            Return $aValues
        EndIf
        Return SetError(1, 2, "No valid data found")
    EndFunc
myspacee Posted July 22, 2010

Thank you, GEOSoft, but I can't work out the StringRegExp extraction on my own. I tried reading my file line by line to get the two pieces of information I need:

    #include <Array.au3>

    $file = FileOpen("test.txt", 0)

    ; Check if file opened for reading OK
    If $file = -1 Then
        MsgBox(0, "Error", "Unable to open file.")
        Exit
    EndIf

    ; Read in lines of text until the EOF is reached
    While 1
        $line = FileReadLine($file)
        If @error = -1 Then ExitLoop
        $aValues = StringRegExp($line, "œ\s*\b(\w+)\b.+Ç([\w\s]+)", 3)
        _ArrayDisplay($aValues, "find these values")
    WEnd

    FileClose($file)

but I can't handle the case where either piece of information is missing. I could run two checks on each line, but I don't understand how to use StringRegExp to check first for the name and then for the city.

Can anyone help me a little more?

Thank you,
m.
GEOSoft Posted July 22, 2010

I said it worked with your example, although I do see something I would change now that I'm looking at it again. Now you say that it doesn't work if any of the information is missing, so post an example that doesn't work.

Also, you shouldn't need FileReadLine() to do this. I based it on FileRead(), where it captures all the values at one time. Using FileReadLine() I would probably change the expression somewhat, but I won't make any determination until you post an example that displays the problem.
myspacee Posted July 22, 2010 Author Share Posted July 22, 2010 GEOSoft i've some problem to give some working example, because my source are files and not given text. Post 2 of them:firstsecondit's a propretary txt format so i don't want to annoying anyone with this problem.Quick solution for me, is ask to guru how extract needed text specifing StringRegExp delimeters.Sorry if I annoying you, and understand that you want to spur me, but find StringRegExp hard func ever.thank you again for your help but i really need an hand to analizy more than 5000 files.m. Link to comment Share on other sites More sharing options...
GEOSoft Posted July 22, 2010

Change the RegEx to this:

    "(?i).+œ([a-z]+\s+[a-z]+).+?Ç(.+?)\."
GEOSoft Posted July 22, 2010

I just took the 2 lines from those links and created a text file named test.txt. Then I ran this code against it, and it seems to do what you want with FileReadLine(). You can add your own error checking to see if the file opened okay.

    $sFile = @ScriptDir & "\Test.txt"
    $hFile = FileOpen($sFile, 0)
    While 1
        $sLine = FileReadLine($hFile)
        If @error Then ExitLoop
        ; Rewrite the line as "name~city", then split it into a 0 based array
        $aLine = StringSplit(StringRegExpReplace($sLine, "(?i).+œ([a-z]+\s+[a-z]+).+?Ç(.+?)\..+", "$1~$2"), "~", 2)
        If UBound($aLine) < 2 Then ContinueLoop ; no "~" in the result means the pattern did not match this line
        MsgBox(0, "TEST", "Name: " & $aLine[0] & @CRLF & "City: " & $aLine[1])
    WEnd
    FileClose($hFile)
myspacee Posted July 23, 2010 (edited July 23, 2010)

Thank you for your code. I reread my request and I think the problem is my English.

I am searching for 2 pieces of information. This produces 4 cases:

- both are present (name and city)
- only the city is present
- only the name is present
- neither is present

I put together a test to explain this better:

œlion°ÇsavanaÁ
œ°ÇforestÁ
œcat°ÇÁ
œ°ÇÁ

- the first line contains a name (lion) and a city (savana)
- the second line contains only a city (forest)
- the third line contains only a name (cat)
- the fourth line contains nothing (only delimiters)

The script must return only the information that is present and, with error handling, report when it found nothing. So I am thinking of a double check for every line, with 2 separate StringRegExp calls using different delimiters: one for the name (œ°) and one for the city (ÇÁ).

Thank you for your time, I hope you can help me.

m.
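The double check described in this post could look something like the sketch below. It is an illustration only, not a tested answer from the thread: `_Capture` is a hypothetical helper name, the four test lines are the ones given in the post, and the patterns assume that both delimiters of a pair are present and that a field counts as missing when nothing sits between them.

```autoit
; Hypothetical helper (not from the thread): returns the first capture group
; of $sPattern in $sLine, or "" with @error set when there is no match.
Func _Capture($sLine, $sPattern)
    Local $aMatch = StringRegExp($sLine, $sPattern, 1) ; flag 1 = array of captures
    If @error Then Return SetError(1, 0, "")
    Return $aMatch[0]
EndFunc

Local $aLines[4] = ["œlion°ÇsavanaÁ", "œ°ÇforestÁ", "œcat°ÇÁ", "œ°ÇÁ"]
Local $sName, $sCity

For $sLine In $aLines
    ; Two independent checks, one per delimiter pair.
    ; [^°]+ and [^Á]+ need at least one character, so empty fields do not match.
    $sName = _Capture($sLine, "œ([^°]+)°")
    $sCity = _Capture($sLine, "Ç([^Á]+)Á")

    If $sName = "" And $sCity = "" Then
        ConsoleWrite("Nothing found" & @CRLF)
    Else
        If $sName <> "" Then ConsoleWrite("Name: " & $sName & @CRLF)
        If $sCity <> "" Then ConsoleWrite("City: " & $sCity & @CRLF)
    EndIf
Next
```

Keeping the two patterns separate means a missing name cannot stop the city from being reported, which is the situation a single combined pattern runs into.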
GEOSoft Posted July 23, 2010 (edited July 23, 2010)

Okay, I'll get back to this later today. I'm going to have to first replace your delimiters with something easier to work with, but that isn't difficult.

EDIT: Do you expect an array or a (delimited?) string back from the function?

EDIT2: There has also been some inconsistency in your examples. The first examples and the last contained "°" but the linked lines did not. Will that always be present?
myspacee Posted July 23, 2010 (edited July 23, 2010)

OK, thank you. If possible I expect a string back, but an array is also fine if a string would cause problems.

About the inconsistency: I looked at more than 100 files and it's true, the name does not always have both delimiters. The first one, 'œ', is always there, but the second is not always present. The city, on the other hand, is always between its delimiters.

Thank you again for your time,
m.
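Given that answer, one possible shape for a function that returns a delimited string is sketched below. Several things here are assumptions rather than anything stated in the thread: the function name `_NameAndCity`, the "|" separator, and above all the rule that a name consists only of ASCII letters and spaces, which is what lets the pattern stop at "°" or at any other byte when "°" is absent (as in the pv303a record, where the name is followed by "µ´").

```autoit
; Hypothetical function (not from the thread): returns "name|city" for one
; line. Either part may be empty; @error is 1 when neither part is found.
Func _NameAndCity($sLine)
    Local $sName = "", $sCity = ""

    ; Assumption: a name is ASCII letters and spaces only, so the match ends
    ; at "°" or at whatever other character follows the name.
    Local $aName = StringRegExp($sLine, "œ([A-Za-z][A-Za-z ]*)", 1)
    If Not @error Then $sName = StringStripWS($aName[0], 3) ; trim both ends

    ; The city is always enclosed by both of its delimiters.
    Local $aCity = StringRegExp($sLine, "Ç([^Á]+)Á", 1)
    If Not @error Then $sCity = $aCity[0]

    If $sName = "" And $sCity = "" Then Return SetError(1, 0, "")
    Return $sName & "|" & $sCity
EndFunc

ConsoleWrite(_NameAndCity("œPaolo Calviµ´Prima") & @CRLF) ; name without "°"
ConsoleWrite(_NameAndCity("œGOOFY°²ÇBOSTON. Á") & @CRLF)   ; name and city
```

If real names can contain accented letters, apostrophes, or digits, the character class would need to be widened accordingly.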