myspacee

Extract text from txt file

myspacee

Hello,

These days I'm working on text transformation.

I start from a txt file and have to convert it into something else.

While working on it, I noticed that the job can be fully automated using delimiters.

So my idea is:

- identify the text in a file using the 'static' text around it.

- extract it into a 'transition' file (I think one value per row)

- reverse it into a new file.

A simple idea, but hard to realize.

txt example:

<v6.50><e0>

@campionato:JUNIORES GIRONE A

@risultati:

Bollengo-Victor Favria 5-1

Fenusma-Aosta 4-2

Ha riposato: Atletico 1912, Real Sarre, Sanson.

@hclassifica: Pt G V N P F S

@classifica:Atletico 1912 41 15 13 2 0 48 7

Monte Cervino 33 16 10 3 3 39 21

San Grato 30 15 9 3 3 36 16

Prossimo turno: Aosta-La Romanese; Atletico 1912-Fenusma; Monte Cervino-Bollengo; Real Canavese-Pont Donnaz; Real Sarre-G. Combin; Victor Favria-Sanson. Riposa: San Grato.

In bold we have some delimiters, and some others are hidden (e.g. TAB).

The 'extractor' can identify a lot of things and put them in a 'transition' file:

JUNIORES GIRONE A
Bollengo-Victor Favria
5-1
Fenusma-Aosta
4-2
Atletico 1912, Real Sarre, Sanson.
Atletico 1912
41  
15  
13  
2   
0   
48  
7
etc.....
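
As a rough sketch of the extraction step for one result row: the snippet below splits a line such as "Bollengo-Victor Favria 5-1" into the match and the score. It assumes the score always sits at the end of the line, looks like digits-dash-digits, and is separated from the teams by whitespace (a space or a hidden TAB). Rows such as "rinv." would need their own pattern.

```autoit
; Split one result row into "match" and "score" (assumed layout, see above)
Local $sLine = "Bollengo-Victor Favria 5-1"

; Flag 1 returns the captured groups of the first match as an array
Local $aParts = StringRegExp($sLine, "^(.*)\s+(\d+-\d+)$", 1)
If Not @error Then
    ConsoleWrite($aParts[0] & @CRLF) ; Bollengo-Victor Favria
    ConsoleWrite($aParts[1] & @CRLF) ; 5-1
EndIf
```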

The next step is to compile an output file from some 'targets'.

The idea is that every line of the transition file can be referenced by its line number, e.g.:

<line1>***<line6>

which produces this output:

JUNIORES GIRONE A***Atletico 1912, Real Sarre, Sanson.
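
A minimal sketch of how such a template could be filled in. The function name _FillTemplate and the "|"-joined test string are only placeholders for illustration. In practice the values would come from the transition file.

```autoit
; Hypothetical helper: replace every <lineN> in $sTemplate with element N of $aLines.
; $aLines is expected in StringSplit layout: count in [0], values from [1].
Func _FillTemplate($sTemplate, ByRef $aLines)
    ; Flag 3 returns all matches; with one capture group we get the numbers only
    Local $aTags = StringRegExp($sTemplate, "<line(\d+)>", 3)
    If @error Then Return $sTemplate ; no placeholders in the template

    For $i = 0 To UBound($aTags) - 1
        Local $n = Number($aTags[$i])
        ; Leave placeholders that point outside the array untouched
        If $n >= 1 And $n <= $aLines[0] Then
            $sTemplate = StringReplace($sTemplate, "<line" & $aTags[$i] & ">", $aLines[$n])
        EndIf
    Next
    Return $sTemplate
EndFunc

; Stand-in for the first six rows of the transition file
Local $aLines = StringSplit("JUNIORES GIRONE A|Bollengo-Victor Favria|5-1|Fenusma-Aosta|4-2|Atletico 1912, Real Sarre, Sanson.", "|")
ConsoleWrite(_FillTemplate("<line1>***<line6>", $aLines) & @CRLF)
; JUNIORES GIRONE A***Atletico 1912, Real Sarre, Sanson.
```

One limitation of this sketch: if an extracted value itself contains text like <line3>, a later pass of the loop may replace it too.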

This way I can extract all the info and then reverse it into a new (formatted) file.

A LOOONG (and boring) explanation, but I need some comments.

thank you,

m.

kaotkbliss

_FileReadToArray (from File.au3)

loop through the array using StringInStr to search for your static text

If Not found, FileWriteLine the unmodified line to a new file

If found StringReplace to change the line then FileWriteLine the modified line to the new file
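
Those steps could look roughly like this. The file names input.txt and output.txt are placeholders, and "@campionato:" is just one tag taken from the example above. Here the tag is simply stripped from the line.

```autoit
#include <File.au3>

Local $aLines
; _FileReadToArray returns 0 on failure; on success $aLines[0] holds the line count
If Not _FileReadToArray("input.txt", $aLines) Then
    MsgBox(0, "Error", "Unable to read input.txt")
    Exit
EndIf

Local $hOut = FileOpen("output.txt", 2) ; 2 = write mode, erase previous contents
If $hOut = -1 Then
    MsgBox(0, "Error", "Unable to open output.txt")
    Exit
EndIf

Local $sTag = "@campionato:" ; the 'static' text to look for
For $i = 1 To $aLines[0]
    If StringInStr($aLines[$i], $sTag) Then
        ; Found: write the modified line
        FileWriteLine($hOut, StringReplace($aLines[$i], $sTag, ""))
    Else
        ; Not found: copy the line unchanged
        FileWriteLine($hOut, $aLines[$i])
    EndIf
Next

FileClose($hOut)
```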

:mellow:


010101000110100001101001011100110010000001101001011100110010000

001101101011110010010000001110011011010010110011100100001

My Android cat and mouse game
https://play.google.com/store/apps/details?id=com.KaosVisions.WhiskersNSqueek

We're gonna need another Timmy!

myspacee

I have started coding.

Is there any way to script a text replacement like this?

Every <RanD0m t3xt> becomes <*>

<IP0><HR0,0.3,0,0><BXBEIGE,0,16,1.5,-1,-1><HR0,0.3,19,0><EL4><HS4><CF47><CP12>JUNIORES GIRONE A
<IB><EL7><COROSSO><CF44><CP9>RISULTATI
<CO><CP6.5><CS6><CF34>
<HR0,0.3,1,0><EL2>Bollengo-Victor Favria<QM>5-1
<HR0,0.3,1,0><EL2>Fenusma-Aosta<QM>rinv.
<HR0,0.3,1,0><EL2>G. Combin-Real Canavese<QM>rinv.

becomes:

<*><*><*><*><*><*><*><*>JUNIORES GIRONE A
<*><*><*><*><*>RISULTATI
<*><*><*><*>
<*><*>Bollengo-Victor Favria<*>5-1
<*><*>Fenusma-Aosta<*>rinv.
<*><*>G. Combin-Real Canavese<*>rinv.

thank you,

m.

kaotkbliss

oooh, now I think you are wanting to look into StringRegExp to find everything between < and > then do a StringReplace with *


myspacee

I hate the StringRegExp function, it's too complex for me.

Could you do me a favor and suggest some syntax? :)

Please? :mellow:

UEZ

Try this one:

$string = "<IP0><HR0,0.3,0,0><BXBEIGE,0,16,1.5,-1,-1><HR0,0.3,19,0><EL4><HS4><CF47><CP12>JUNIORES GIRONE A" & @LF & _
                "<IB><EL7><COROSSO><CF44><CP9>RISULTATI" & @LF & _
                "<CO><CP6.5><CS6><CF34>" & @LF & _
                "<HR0,0.3,1,0><EL2>Bollengo-Victor Favria<QM>5-1" & @LF & _
                "<HR0,0.3,1,0><EL2>Fenusma-Aosta<QM>rinv." & @LF & _
                "<HR0,0.3,1,0><EL2>G. Combin-Real Canavese<QM>rinv."

MsgBox(0, "Test", StringRegExpReplace($string, "(?U)<(.*)>", "<*>"))
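
For anyone who finds the pattern hard to read: (?U) switches the quantifiers to non-greedy, so .* stops at the first > instead of the last one on the line. An equivalent way to write it, shown here only as an alternative sketch, is a negated character class:

```autoit
Local $sLine = "<HR0,0.3,1,0><EL2>Bollengo-Victor Favria<QM>5-1"

; "<[^>]*>" = a "<", then any run of characters that are not ">", then the closing ">"
ConsoleWrite(StringRegExpReplace($sLine, "<[^>]*>", "<*>") & @CRLF)
; <*><*>Bollengo-Victor Favria<*>5-1
```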

Br,

UEZ


Please don't send me any personal message and ask for support! I will not reply!

Selection of finest graphical examples at Codepen.io

The own fart smells best!
Her 'sikim hıyar' diyene bir avuç tuz alıp koşma!
¯\_(ツ)_/¯  ٩(●̮̮̃•̃)۶ ٩(-̮̮̃-̃)۶ૐ

myspacee

I have a working version here, together with the files it needs.

#Include <Array.au3>


$file = FileOpen("IVGFBA.txt", 0)

; Check if file opened for reading OK
If $file = -1 Then
    MsgBox(0, "Error", "Unable to open file.")
    Exit
EndIf



;~ ------------------------------------------------------------
;~  open file and stringreplace in accord with given .ini
;~ ------------------------------------------------------------
$content_box = FileRead($file)                          ;put content file into var
FileClose($file)                                        ;release the handle, $file is reused further down

$file_filter = FileOpen("model_A.ini", 0)               ;map every NOT interesting value (will be delimiter)
; Check if file opened for reading OK
If $file_filter = -1 Then
    MsgBox(0, "Error", "Unable to open file.")
    Exit
EndIf

; change every NOT interesting value into *
While 1
    $line = FileReadLine($file_filter)
    If @error = -1 Then ExitLoop
        
    if $line <> "" then $content_box = StringReplace($content_box, $line, "*", 0)
Wend

FileClose($file_filter)



;~ ------------------------------------------------------------
;~  support for 'MACRO' filter [tab] - [enter] - etc.
;~ ------------------------------------------------------------
$file_filter = FileOpen("model_A.ini", 0)
$file_filter_contenuto = FileRead($file_filter)
FileClose($file_filter)


$result = StringInStr($file_filter_contenuto, '[TAB]')
if $result <> 0 then $content_box = StringReplace($content_box, @TAB, "*")

;~ ------------------------------------------------------------
;~  support for cosmetic text hack
;~ ------------------------------------------------------------
$content_box = StringReplace($content_box, "* ", "*")
$content_box = StringReplace($content_box, " *", "*")






;~ ------------------------------------------------------------
;~  save result and StringSplit row by row
;~ ------------------------------------------------------------
$file = FileOpen("mapped.txt", 10)
FileWrite($file, $content_box)
FileClose($file)

;open saved file to analyze and extract values
;values are addressed Battleship-style, by row and column [so happy :) ]
$file = FileOpen("mapped.txt", 0)

$line_counter = 0
; Read in lines of text until the EOF is reached
While 1
    $line_counter = $line_counter + 1               ;keep the row number for the coordinates
    $line = FileReadLine($file, $line_counter)
    If @error = -1 Then ExitLoop
    $valori = StringSplit($line, "*")               ;$valori[0] holds the number of fields

    For $r = 1 To $valori[0]                        ;$r is the column number
        If $valori[$r] <> "" Then                   ;report only fields that contain something
            MsgBox(0, "info : ", "Coord: " & @CRLF & "ROW:   " & $line_counter & @CRLF & "Column: " & $r & @CRLF & "Value: " & $valori[$r])
        EndIf
    Next
    
Wend

FileClose($file)

I feel serendipity in the air. This solves my problem, and a lot of other things.

The last step, the output model, is still missing, but the engine is ready.

Viva Autoit,

m.

Edited by myspacee

myspacee

I'm still working on this project and need some help again :graduated:

After the first step, I need to remove some trash:

<IP0><HR0,0.3,0,0><BXBEIGE,0,12,1.5,-1,-1><HR0,0.3,15,0><EL4><HS4><CF46><CP10>GIOV. FASCIA B GIRONE A<CF31>
<IB><EL5><COROSSO><CF44><CP9>RISULTATI<EL2><CO><CP7><CL8><CF34>
<HR0,0.3,-1,0>Atletico 1912-Evancon<QM>4-1
<HR0,0.3,-1,0>Charvensod-Banchette<QM>4-4
<HR0,0.3,-1,0>Coll. Pedenea-Castellamonte<QM>4-1
<HR0,0.3,-1,0>Rivarolese F-Pont Donnaz<QM>n.d.
<HR0,0.3,-1,0>Rivarolese M-Real Canavese<QM>2-1
<HR0,0.3,-1,0>Ha riposato: Aygreville.<QM>[riga_9_2]
<HR0,0.3,-1,0>[riga_10_1]<QM>
<EL2.5><HR0,0.5,-1,0><EL-8>
<TSL50,c+10,c+10,c+10,c+10,c+10,c+10,c+10><COROSSO>SQUADRE<TB>P<TB>G<TB>V<TB>N<TB>P<TB>F<TB>S<CONERO><CP7><CL8><QC>
<HR120,0.3,-1,0><CF34>9<TB>4<TB><CF33>3<TB>0<TB>1<TB>36<TB>9<TB>
<HR120,0.3,-1,0><CF34>Real Canavese<TB>9<TB><CF33>4<TB>3<TB>0<TB>1<TB>21<TB>
<HR120,0.3,-1,0><CF34>Rivarolese M<TB>9<TB><CF33>3<TB>3<TB>0<TB>0<TB>8<TB>
<HR120,0.3,-1,0><CF34>Coll. Pedenea<TB>6<TB><CF33>5<TB>2<TB>0<TB>3<TB>26<TB>
<HR120,0.3,-1,0><CF34>Charvensod<TB>4<TB><CF33>4<TB>1<TB>1<TB>2<TB>9<TB>
<HR120,0.3,-1,0><CF34>Atletico 1912<TB>3<TB><CF33>4<TB>1<TB>0<TB>3<TB>7<TB>
<HR120,0.3,-1,0><CF34>Aygreville<TB>3<TB><CF33>3<TB>1<TB>0<TB>2<TB>3<TB>
<HR120,0.3,-1,0><CF34>Castellamonte<TB>3<TB><CF33>4<TB>1<TB>0<TB>3<TB>13<TB>
<HR120,0.3,-1,0><CF34>Pont Donnaz<TB>3<TB><CF33>3<TB>1<TB>0<TB>2<TB>7<TB>
<HR120,0.3,-1,0><CF34>Rivarolese F<TB>0<TB><CF33>3<TB>0<TB>0<TB>3<TB>1<TB>
<HR120,0.3,-1,0><CF34>[riga_22_1]<TB>Atletico 1912-Rivarolese F<TB><CF33>Banchette-Rivarolese M<TB>Castellamonte-Pont Donnaz<TB>Evancon-Charvensod<TB>Real Canavese-Aygreville.<TB>Coll. Pedenea. <TB>
<HR120,0.3,-1,0><CF34>[riga_23_1]<TB>[riga_23_2]<TB><CF33>[riga_23_3]<TB>[riga_23_4]<TB>[riga_23_5]<TB>[riga_23_6]<TB>[riga_23_7]<TB>
<HR120,0.3,-1,0><CF34>[riga_24_1]<TB>[riga_24_2]<TB><CF33>[riga_24_3]<TB>[riga_24_4]<TB>[riga_24_5]<TB>[riga_24_6]<TB>[riga_24_7]<TB>
<HR120,0.3,-1,0><EL2><COROSSO><CF44><CP9>PROSSIMO TURNO<IB><CO><CF34><CP6>
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^<COROSSO><CF300><CP4> n <CP6><CF34>
<CONERO>^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^<COROSSO><CF300><CP4> n <CP6><CF34>
<CONERO>^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^<COROSSO><CF300><CP4> n <CP6><CF34>
<CONERO>^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^<COROSSO><CF300><CP4> n <CP6><CF34>
<CONERO>^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^<COROSSO><CF300><CP4> n <CP6><CF34>
<CONERO>^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^<COROSSO><CF300><CP4> n <CP6><CF34>
<CONERO>^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^<COROSSO><CF300><CP4> n <CP6><CF34>
<CONERO>^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I need to remove, or replace, all text between [ and ].

The text between the brackets is [RanD0m], so I need some help with the StringRegExpReplace function.

thank you for any help,

m.

Edited by myspacee

UEZ

Try this:

StringRegExpReplace($text, "(?U)\[(.*)\]", "")

where $text is the text from above.
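
One thing to watch with bracket patterns on rows that hold several [riga_..] markers: a greedy .* runs from the first [ to the last ] on the line and takes the tags in between with it. A small sketch on a shortened row from the sample above:

```autoit
Local $sRow = "<CF34>[riga_23_1]<TB>[riga_23_2]<TB>"

; Greedy: one match from the first "[" to the last "]", the <TB> in between is lost
ConsoleWrite(StringRegExpReplace($sRow, "\[.*\]", "") & @CRLF)     ; <CF34><TB>

; Negated class: every [...] pair is matched on its own, the tags survive
ConsoleWrite(StringRegExpReplace($sRow, "\[[^\]]*\]", "") & @CRLF) ; <CF34><TB><TB>
```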

Br,

UEZ


myspacee

thank you !
