
Extract some info from report



Hello all,

I need some help extracting some info from a report:

pv301a.txt|õ 140PV301A *PV301AœDUFFY°±ÇNEWYORK. ÁE' stata posata la prima

pv301t.txt| ª22PV301T *PV301TŽÃµÔ³ŒØ‘Confienza avr la casa di riposo‚’Posata la prima pietra nei

pv302a.txt|¤13PV302A *DISCO œGOOFY°²ÇBOSTON. ÁŠIl caldo e la fatica del viaggio

pv302b.txt|À 102PV302B *PV302AœMICHEL°±E' accaduto ieri mattina alle 8 sulla provinciale

pv302t.txt| ð30PV302T *PV302TŽ³ÅÔ¹ŒØ‘Veniva da Venezia e aveva viaggiato tutta la notte. Si scatena

pv303a.txt|71PV303A *PV303AŽÂÖÒ°ŒØµ´ŒØ€‘SANNAZZARO’Verso il {Settembre}œPaolo Calviµ´Prima

pv401a.txt|†20PV401A *PNNSARœZELDA°³ÇNEWDHELI. ÁAl via la festa della Madonna†del

pv401t.txt| '12PV401T *PNNSARŽÂ¸Ô°ŒØ‘SARTIRANA’“Festa del Carmelo‚

pv402a.txt|„20PV402A *PNNMO3œMARIO°³ÇSIDNEY. ÁLa cima Zumstein stata la meta

pv402t.txt| '12PV402T *PNNMO3ŽÂ¸Ô°ŒØ‘MORTARA’“In vetta con il Cai‚

pv403a.txt|€20PV403A *PV403AœSONIC°³ÇMILAN.Á Lotta alla zanzara tigre: il sindaco

pv403t.txt| .12PV403T *PV910AŽÂ¸Ô°ŒØ‘PIEVE DEL CAIRO’“Lotta alle zanzare‚

I have identified some delimiters:

œ ... ° -> enclose names

Ç ... Á -> enclose cities

Can anyone help me extract this info?

Thank you,

m.


Well, in general you have to use string functions to find the positions of the start delimiter and the end delimiter (StringInStr), then extract what is between them (StringMid).

I don't know if this sets you on the right track, because I cannot see your delimiters in your quote ...
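The StringInStr/StringMid approach described above could look roughly like the sketch below. The helper name `_Between` is hypothetical, not something from this thread or from AutoIt itself, and the sketch assumes the script file is saved in an encoding that preserves the delimiter characters. The search is case-sensitive on purpose, because the sample lines also contain an uppercase Œ.

```autoit
; Usage on one of the simple test shapes (name between œ and °, city between Ç and Á)
Local $sLine = "œlion°ÇsavanaÁ"
Local $sName = _Between($sLine, "œ", "°")
Local $sCity = _Between($sLine, "Ç", "Á")
ConsoleWrite("Name: " & $sName & @CRLF & "City: " & $sCity & @CRLF)

; Hypothetical helper: returns the text between $sStart and $sEnd.
; Sets @error to 1 if the start delimiter is missing, 2 if the end delimiter is missing.
Func _Between($sText, $sStart, $sEnd)
    Local $iStart = StringInStr($sText, $sStart, 1) ; 1 = case-sensitive search
    If $iStart = 0 Then Return SetError(1, 0, "")
    $iStart += StringLen($sStart) ; position of the first character after the start delimiter
    Local $iEnd = StringInStr($sText, $sEnd, 1, 1, $iStart) ; first occurrence at or after $iStart
    If $iEnd = 0 Then Return SetError(2, 0, "")
    If $iEnd = $iStart Then Return "" ; both delimiters present, nothing between them
    Return StringMid($sText, $iStart, $iEnd - $iStart)
EndFunc
```

Checking @error after each call tells a missing delimiter apart from an empty field, which matters if some lines carry only one of the two values.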


Where File.txt contains the quoted material from above.

$aValues = StringRegExp(FileRead("File.txt"), "œ\s*\b(\w+)\b.+Ç([\w\s]+)", 3);; Generate a 0 based array where even numbered elements contain the name and odd numbered = city
If NOT @Error Then
   For $i = 0 To Ubound($aValues) -2 Step 2
       MsgBox(0, "Result", "Name: " & $aValues[$i] & @CRLF & "City: " & $aValues[$i+1])
   Next
EndIf

George

Question about decompiling code? Read the decompiling FAQ and don't bother posting the question in the forums.

Be sure to read and follow the forum rules. -AKA the AutoIt Reading and Comprehension Skills test.***

The PCRE (Regular Expression) ToolKit for AutoIT - (Updated Oct 20, 2011 ver:3.0.1.13) - Please update your current version before filing any bug reports. The installer now includes both 32 and 64 bit versions. No change in version number.

Visit my Blog .. currently not active but it will soon be resplendent with news and views. Also please remove any links you may have to my website. it is soon to be closed and replaced with something else.

"Old age and treachery will always overcome youth and skill!"


linus, thank you for the reply, but I think StringRegExp is the right way to do this.

Thank you, GEOSoft, for the code. I have a request.

Your script returns the name and city only when both are found. If either one is missing, the script exits. How can I avoid this and return whatever is present when a line contains a city and/or a name?

thank you,

m.

Edited by myspacee

I gave you the code (tested) so it's a matter of deciding what you want returned from a function and then building the code into that. Example: if you want an array returned, you could do this

Func _MyData($sStr);; $sStr could be anything including a file.
    If NOT $sStr Then Return SetError(1,1, "You must send an input value")
    If FileExists($sStr) Then $sStr = FileRead($sStr)
    Local $aValues = StringRegExp($sStr, "œ\s*\b(\w+)\b.+Ç([\w\s]+)", 3);; The 0 based 1 dimension array (searches the input passed in, not a hard-coded file)
    If NOT @Error Then
        Return $aValues
    EndIf
    Return SetError(1, 2, "No valid data found")
EndFunc

George


Thank you, GEOSoft,

but I can't work out the StringRegExp extraction on my own.

I tried reading my file line by line to get the two pieces of information I need:

#Include <Array.au3>

$file = FileOpen("test.txt", 0)

; Check if file opened for reading OK
If $file = -1 Then
    MsgBox(0, "Error", "Unable to open file.")
    Exit
EndIf

; Read in lines of text until the EOF is reached
While 1
    $line = FileReadLine($file)
    If @error = -1 Then ExitLoop
        
    $aValues = StringRegExp($line, "œ\s*\b(\w+)\b.+Ç([\w\s]+)", 3)
    _ArrayDisplay($aValues, "find these values")

Wend

FileClose($file)

but I can't handle the case where either piece of information is missing.

I could do two checks on each line, but I don't understand how to use StringRegExp to check for the name first and then for the city.

Can anyone help me a little more?

Thank you,

m.


I said it worked with your example although I do see something I would change now that I'm looking at it again.

Now you say that it doesn't work if any of the information is missing so post an example that doesn't work. Also you shouldn't need FileReadLine() to do this. I based it on FileRead where it captures all the values at one time. Using FileReadLine() I would probably change the expression somewhat but I won't make any determination until you post an example that will display the problem.

George


GEOSoft, I have some trouble giving a working example, because my sources are files and not plain text.

I am posting two of them:

first

second

It's a proprietary txt format, so I don't want to bother anyone with this problem.

The quick solution for me is to ask a guru how to extract the needed text by specifying the delimiters to StringRegExp.

Sorry if I am annoying you. I understand that you want to spur me on, but I find StringRegExp the hardest function ever.

Thank you again for your help, but I really need a hand to analyze more than 5000 files.

m.


Change the RegEx to this

"(?i).+œ([a-z]+\s+[a-z]+).+?Ç(.+?)\."

George


I just took the 2 lines from those links and created a text file named test.txt. Then I ran this code against it, and it seems to do what you want with FileReadLine(). You can add your own error checking to see if the file opened okay.

$sFile = @ScriptDir & "\Test.txt"
$hFile = FileOpen($sFile, 0)
While 1
    $sLine = FileReadLine($hFile)
    If @Error Then ExitLoop
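    $aLine = StringSplit(StringRegExpReplace($sLine, "(?i).+œ([a-z]+\s+[a-z]+).+?Ç(.+?)\..+", "$1~$2"), "~", 2)
    ; If the pattern did not match, the line comes back unchanged and the split has fewer than 2 elements
    ; (assuming the line contains no "~" of its own), so skip it instead of indexing past the array.
    If UBound($aLine) < 2 Then ContinueLoop
    MsgBox(0, "TEST", "Name: " & $aLine[0] & @CRLF & "City: " & $aLine[1])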
WEnd
FileClose($hFile)

George


Thank you for your code.

I reread my request and I think the problem is my English :blink:

I'm searching for two pieces of information. This gives four cases:

both are present (name and city)

only the city is present

only the name is present

neither is present

I put together a test to explain this better:

œlion°ÇsavanaÁ
œ°ÇforestÁ
œcat°ÇÁ
œ°ÇÁ

The first line contains a name (lion) and a city (savana).

The second line contains only a city (forest).

The third line contains only a name (cat).

The fourth line contains nothing (only delimiters).

The script must return only the information that is present and, with error handling, report when nothing was found.

So I am thinking of a double check for every line, with two separate StringRegExp calls using different delimiters, one for the name (œ°) and one for the city (ÇÁ).
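The double check could be sketched as below. This is only an illustration under assumptions: `_NameAndCity` is a hypothetical helper name, the script file is assumed to be saved in an encoding that keeps the delimiter characters intact, and each field is assumed to contain at least one character when present. It runs two independent StringRegExp calls per line and sets @error when neither value is found.

```autoit
; The four test lines from the post above
Local $aTest[4] = ["œlion°ÇsavanaÁ", "œ°ÇforestÁ", "œcat°ÇÁ", "œ°ÇÁ"]
For $i = 0 To UBound($aTest) - 1
    Local $sResult = _NameAndCity($aTest[$i])
    If @error Then
        ConsoleWrite("Line " & ($i + 1) & ": nothing found" & @CRLF)
    Else
        ConsoleWrite("Line " & ($i + 1) & ": " & $sResult & @CRLF)
    EndIf
Next

; Hypothetical helper: returns a string with whichever of name/city is present.
; Sets @error to 1 if the line holds neither.
Func _NameAndCity($sLine)
    Local $sOut = ""
    ; Name: one or more characters between œ and °
    Local $aName = StringRegExp($sLine, "œ([^°]+)°", 1)
    If Not @error Then $sOut &= "Name: " & $aName[0]
    ; City: one or more characters between Ç and Á
    Local $aCity = StringRegExp($sLine, "Ç([^Á]+)Á", 1)
    If Not @error Then
        If $sOut <> "" Then $sOut &= ", "
        $sOut &= "City: " & $aCity[0]
    EndIf
    If $sOut = "" Then Return SetError(1, 0, "")
    Return $sOut
EndFunc
```

Keeping the two patterns separate means a missing name never prevents the city from being reported, and the other way round.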

Thank you for your time. I hope you can help me.

m.

Edited by myspacee

Okay, I'll get back at this later today. I'm going to have to first replace your delimiters with something easier to work with but that isn't difficult.

EDIT: Do you expect an array or a (delimited?)string back from the Function?

EDIT2: There has also been some inconsistency in your examples. The first examples and the last contained "°" but the linked lines did not. Will that always be present?

Edited by GEOSoft

George


OK, thank you.

If possible I would like a string back, but an array is fine too if a string would cause problems.

As for the inconsistency: I looked at more than 100 files, and it's true, the name does not always have both delimiters. The first one, 'œ', is always there, but the second is not always present. The city, however, is always between its delimiters.
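When the closing ° may be missing, one option is to stop relying on it and describe the name itself instead. The sketch below assumes, and this is only an assumption based on the sample lines shown in this thread, that a name consists of ASCII letters separated by single spaces. The helper name `_NameAfterMarker` is hypothetical, and the match is case-sensitive so that an uppercase Œ is not treated as the marker.

```autoit
; Sample fragment taken from the pv303a line, where the name is not followed by °
Local $sName = _NameAfterMarker("{Settembre}œPaolo Calviµ´Prima")
If @error Then
    ConsoleWrite("No name found" & @CRLF)
Else
    ConsoleWrite("Name: " & $sName & @CRLF)
EndIf

; Hypothetical helper: returns the run of letters (and single spaces between words)
; that follows œ. Sets @error to 1 if œ is absent or not followed by a letter.
Func _NameAfterMarker($sLine)
    Local $aName = StringRegExp($sLine, "œ([A-Za-z]+(?: [A-Za-z]+)*)", 1)
    If @error Then Return SetError(1, 0, "")
    Return $aName[0]
EndFunc
```

Names with accented letters, digits or punctuation would be cut short by this pattern, so the character class would need widening if the real files contain them.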

Thank you again for your time,

m.

Edited by myspacee