Sign in to follow this  
Followers 0
ronburgandy

rip data from multiple files and write it to excel

21 posts in this topic

I am just learning about this scripting tool and was wondering if some of you veterans here could enlighten me. I was wondering if it would be possible to take several (possibly thousands) of text files that all have pretty much the same structure and reading each file, pulling the data (text) out of the file and writing it to a spreadsheet, then going to the next text file, writing that data to row #2 of the spreadsheet and so on into a loop until all text files in that folder have been read?

Share this post


Link to post
Share on other sites



Yes.

You could write them to a .csv file, XML file, or one of several other formats that excel supports.

Share this post


Link to post
Share on other sites

How does that work? do you tell the script to count x-number of spaces on line#1 then copy the text up to such and such tag or character and paste that in cell A, etc?

Share this post


Link to post
Share on other sites

i think its better to search all the files you desire for the "info' you want, then when that info is found .... write it to the .csv file.... then open the excell with that file to be opened in a "run" command

8)


NEWHeader1.png

Share this post


Link to post
Share on other sites

The info is contained in each file, I already know I want the info to be extracted out of each file. For example, I need all of the information out of file #1 to be copied into row#1 of the spreadsheet, then I need all of the info in file#2 to be inserted/copied into row#2 of the spreadsheet, and so on.

My question is more centered around how do I get a script to do this if the text files have some text I don't want inserted such as html tags. All of the files however have the same layout and internal structure, they just have different data. The html files are basically "View, source, save" of web pages of business listings. I have saved all of these as txt files and now want a script to extract the business listings out of the html code from the text files and save them all in a spreadsheet or MDB or some sort of database I can use to sort the listings and build mailing lists from

Share this post


Link to post
Share on other sites

So that we can better anwser your question:

1. Are you new to scripting / programming in general?

2. Are you new to Auto-It?

3. Just don't know how to do this particular task in Auto-It?

Share this post


Link to post
Share on other sites

#7 ·  Posted (edited)

i think you might understand it like this...

there is a hard way and an easy way to do this

1 hard way

we siffer through the downloaded files and find the info you want... then place it in the correct cell in the correct row and in the correct column of the excell program that is already open

2 easy way

we siffer through the downloaded files and find the info you want... then place in a file named "address.csv"..

then run excell with that file

Since you seem to be new and would need guidence, i strongly suggest the easy way

8)

Edited by Valuater

NEWHeader1.png

Share this post


Link to post
Share on other sites

So that we can better anwser your question:

1. Are you new to scripting / programming in general?

No

2. Are you new to Auto-It?

Yes

3. Just don't know how to do this particular task in Auto-It?

Yes

Share this post


Link to post
Share on other sites

i think you might understand it like this...

there is a hard way and an easy way to do this

1 hard way

we siffer through the downloaded files and find the info you want... then place it in the correct cell in the correct row and in the correct column of the excell program that is already open

2 easy way

we siffer through the downloaded files and find the info you want... then place in a file named "address.csv"..

then run excell with that file

Since you seem to be new and would need guidence, i strongly suggest the easy way

8)

Yes, my goal here is to try to get the data into a managable format. i don't care if it is tab-delimited, CSV, excel, mySQL, anything can be converted at that point.

Share this post


Link to post
Share on other sites

to start ( post a file )

1

we need to see the data from a file that was downloaded

2

tell us what you want to get ( info ) from that file

( since all files are of the same format )

8)


NEWHeader1.png

Share this post


Link to post
Share on other sites

So that we can better anwser your question:

1. Are you new to scripting / programming in general?

No

2. Are you new to Auto-It?

Yes

3. Just don't know how to do this particular task in Auto-It?

Yes

On #1 above. Have you done this task in another language, say VBS?

If so can you show us that code logic?

ViM

Share this post


Link to post
Share on other sites

So that we can better anwser your question:

1. Are you new to scripting / programming in general?

No

2. Are you new to Auto-It?

Yes

3. Just don't know how to do this particular task in Auto-It?

Yes

On #1 above. Have you done this task in another language, say VBS?

If so can you show us that code logic?

ViM

No, unfortunately, I have not - most of my scripting experience is with installation setups and registry editing

Share this post


Link to post
Share on other sites

to start ( post a file )

1

we need to see the data from a file that was downloaded

2

tell us what you want to get ( info ) from that file

( since all files are of the same format )

8)

here is the file, the information I want to extract is found under <!-- Business Data Detail -->. I would like to extract as much of these details as possible if I could

example.txt

Share this post


Link to post
Share on other sites

here is the file, the information I want to extract is found under <!-- Business Data Detail -->. I would like to extract as much of these details as possible if I could

In the original post you said test files, and this is labeled .txt and yet the first line says its an HTML.

In any rate, if I rename it from .txt to .htm I see what looks like five sections:

* * * Business Information * * *

* * * License Status * * *

* * * Classifications * * *

* * * Bonding Information * * *

* * * Workers Compensation Information * * *

If correct, is there one section in paticular that your trying to capture, say * * * Business Information * * *?

ViM

Share this post


Link to post
Share on other sites

In the original post you said test files, and this is labeled .txt and yet the first line says its an HTML.

In any rate, if I rename it from .txt to .htm I see what looks like five sections:

* * * Business Information * * *

* * * License Status * * *

* * * Classifications * * *

* * * Bonding Information * * *

* * * Workers Compensation Information * * *

If correct, is there one section in paticular that your trying to capture, say * * * Business Information * * *?

ViM

when saving the html output from my browser (view, source) I intentionally saved the file as txt instead of htm as I thought raw text would be easier to process. As far as the sections, I want to try to capture all of the information. Each section has entries I would like to be able to save in it's own individual cell - such as under * * * Business Information * * *, I want to save, name, street, city, zip, etc in their own field or cell.

Share this post


Link to post
Share on other sites

#16 ·  Posted (edited)

here is the info I want to pull from these files:

<!-- Business Data Detail -->

<table>

<tr><td>

<table bgcolor="#FFF8DC" border="0" cellpadding="2" cellspacing="1" width="100%"><tr><td width="100%">

<CENTER>Extract Date: <B>08/20/2006</B><br /></CENTER><CENTER><H3>* * * Business Information * * *</H3>ROB B SMITH<br />509 DIVOT DRIVE<br />CITRUS HEIGHTS, CA 95610<br />Business Phone Number: (916) 555-5555<br /><br />Entity: <B>Sole Ownership</B><br />Issue Date: <B>01/23/1968</B> Expire Date: <B>03/31/2007</B></CENTER><H3><CENTER>* * * License Status * * *</CENTER></H3>This license is expired at this time.<H3><CENTER>* * * Classifications * * *</CENTER></H3><CENTER><TABLE BORDER=1><TR><TH id=Class>Class</TH><TH id=Desc>Description</TH></TR><TR><TD headers=Class><B>B</B></TD></TR></TABLE></CENTER><H3><CENTER>* * * Bonding Information * * *</CENTER></H3><CENTER><B>BOND: </B>This license filed Bond number <B>555570</B> in the amount of <B>$5,000</B> with the bonding company <br /><B>BONDING COMPANY OF ATLANTA</A>.</B><br />Effective Date: <B>01/01/1980 </B> Cancellation Date: <B>06/07/2007</B><br /><br /></CENTER><H3><CENTER>* * * Workers Compensation Information * * *</CENTER></H3><CENTER>There was no workers comp information found for this license.<br /></CENTER><CENTER></CENTER><br /><br />

Edited by ronburgandy

Share this post


Link to post
Share on other sites

sorry so many posts, I'm just trying to illustrate what I would like to see accomplished with this...

Share this post


Link to post
Share on other sites

#18 ·  Posted (edited)

sorry so many posts, I'm just trying to illustrate what I would like to see accomplished with this...

can you post the website(s) ( or pm me... you can also give me an IM name )

i will try to get this going

8)

Edited by Valuater

NEWHeader1.png

Share this post


Link to post
Share on other sites

I am so glad he chickened out and left the building...

I was going to do all of it for him

.... what a load off of myself...wehw

8)

i wont a a guilt-trip if he returns either..lol


NEWHeader1.png

Share this post


Link to post
Share on other sites

Well, this mess of a function will take the file given and extract out the desired data:

; $Modes is text that labels the following text as data that we want
$Modes = "License #|Extract Date:|* * * Business Information * * *|Business Phone Number:|Entity:|Issue Date:|Expire Date:" _
    & "|* * * License Status * * *|Description|" _
    & "This license filed Contractor's Bond number|in the amount of|with the bonding company|" _
    & "Effective Date:|Cancellation Date:|* * * Workers Compensation Information * * *"
$ModesA = StringSplit($Modes, "|")

$Myfile = @DesktopDir & "\example.txt"
$theData = _StripData($Myfile)
; write data out to file
FileWrite(StringReplace($Myfile, ".txt", "-2.txt"), $theData)
MsgBox(0, "Debug", $theData)
        
Func _StripData($file)
    $data = FileRead($file, FileGetSize($file))
    ; Preprocess the data a little to make it easier to deal with.  we will delimit all data with a @TAB
    $data = StringReplace($data, "<TD>License #", "License #" & @TAB) ; - so we can find the license number
    $data = StringReplace($data, "Business Phone Number:", "Business Phone Number:" & @TAB) ; - so we can find the phone number
    $tags = "<TD>,</TD>,<TR>,</TR>,</font>,<B>,</B>,<H3>,</H3>,<CENTER>,</CENTER>,</A>," _
        & "<br />,</TABLE>,<TH id=Class>,</TH>,<TH id=Desc>,<TD headers=Class>, ," & @LF & "," & @CR
    $tagsA = StringSplit($tags, ",")
    For $i = 1 to $tagsA[0]
        $data = StringReplace($data, $tagsA[$i], @TAB)  ; - replace all the tags we can easily identify with @TAB
    Next
    $dataA = StringSplit($data, @TAB)   ; - make into an array
    Dim $Mode, $line = 1, $PrettyData, $Address, $Class
    For $j = 1 to $dataA[0]
        ; "$Mode" lets us know that we have found a tag that identifies significant data coming up.
        ; We need to treat some data special, so we use a Select-Case statement
        If $dataA[$j] <> '' And $Mode <> '' Then    ; Ignore blank data
            Select
                Case $Mode = "* * * Business Information * * *"
                    If $dataA[$j] <> "Business Phone Number:" Then
                        $Address = $Address & $dataA[$j] & @TAB ; - stack up the address.  Assume that all addresses have the
                        ; same number of lines in each file.  Otherwise, might want to put commas between lines and have the
                        ; full address go into a single cell in Excel.
                    EndIf
                Case $Mode = "Business Phone Number:"
                    ; Now that we are at the Phone number, add the address to the saved data
                    $PrettyData = $PrettyData & $Address & $dataA[$j] & @TAB
                    $Mode = ''
                Case $Mode = "Description"
                    If StringStripWS($dataA[$j], 3) = "* * * Bonding Information * * *" Then
                        ; don't record anymore when we get to this string
                        $Mode = ''
                    Else
                        $Class = $Class & $dataA[$j] & @TAB
                    EndIf
                Case $Mode = "* * * License Status * * *"
                    $PrettyData = $PrettyData & $dataA[$j] & @TAB
                    $Mode = ''
                Case $Mode = "This license filed Contractor's Bond number"
                    $TmpA = StringSplit($Class, @TAB)
                    If $TmpA[2] = '' Then
                        ; handle the case where the Description is blank, like in the example, by adding extra tab.
                        $Class = $Class & @TAB
                    EndIf
                    $PrettyData = $PrettyData & $Class & $dataA[$j] & @TAB
                    $Mode = ''
                Case Else
                    ; All other modes
                    ; $Mode = "NO_MORE" means that we are not looking for any more data.
                    If $Mode <> "NO_MORE" And StringInStr($Modes, $Mode) And Not(StringInStr($Modes, $dataA[$j])) Then
                        $PrettyData = $PrettyData & $dataA[$j] & @TAB
                        If $Mode = "* * * Workers Compensation Information * * *" Then
                            $Mode = "NO_MORE"
                        Else
                            $Mode = ''
                        EndIf
                    EndIf
            EndSelect           
        EndIf
        ; See if the mode changes
        If $Mode <> "NO_MORE" Then
            For $k = 1 to $ModesA[0]
                If StringStripWS($dataA[$j], 3) = $ModesA[$k] Then
                    ; We found a data tag.
                    $Mode = $ModesA[$k]
                EndIf
            Next
        EndIf
        ;ConsoleWrite("Data = " & $dataA[$j] & "   " & "$Mode = " & $Mode & @LF)
        
    Next
    Return $PrettyData
EndFunc

BlueBearrOddly enough, this is what I do for fun.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!


Register a new account

Sign in

Already have an account? Sign in here.


Sign In Now
Sign in to follow this  
Followers 0