Help extracting a line from files and storing them.


Flaips
Go to solution Solved by Flaips,

I have around 900 HTML files stored in a folder. Each file has a 4-digit number as its name, and one line inside each file contains a string of text. I want to store the name of the file plus that string in a txt file, like so:

1222 - String 1

1443 - String 2

Any help on that or is that even possible?

Thank you all in advance.


Which "line"? The mere concept of line in html source is rather fuzzy.

When you open a .htm file in a text editor like Notepad++, line 87 of every file contains something like:

<td>Example name</td>

but each file has a different string between the <td> tags.

I just want to save these particular lines (the tags can stay, but saving without them would be great) together with the name of the file, as I explained earlier.

I hope that this explains it all.
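
As a small illustration of the "save without the tags" part, here is a minimal sketch. It assumes the line looks like the example above and that the indentation consists of ordinary whitespace; the variable names are only for illustration.

```autoit
; Sample line modelled on the example above (leading spaces are an assumption).
Local $sLine = "            <td>Example name</td>"

; Remove anything that looks like a tag, then trim leading and trailing
; whitespace (flag 3 = 1 + 2, i.e. strip both ends).
Local $sText = StringStripWS(StringRegExpReplace($sLine, "<[^>]*>", ""), 3)

ConsoleWrite($sText & @CRLF) ; prints: Example name
```

This strips every tag on the line, so it only suits lines that hold a single cell.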


It seems this can be done in the simplest way.

Place the script in the same folder as the HTML files.

#include <String.au3>
#include <Array.au3>

Local $array[1000][2], $n = 0
$hSearch = FileFindFirstFile("*.html")
If $hSearch = -1 Then Exit ; no .html files in the current folder
While 1
     $sFileName = FileFindNextFile($hSearch)
     If @error Then ExitLoop
     $n += 1 ; count a file only after it has been found
     $array[$n][0] = StringReplace($sFileName, ".html", "")
     $array[$n][1] = _StringBetween(FileReadLine($sFileName, 87), '<td>', '</td>')
Wend
FileClose($hSearch)
$array[0][0] = $n ; row 0 holds the number of files
Redim $array[$n+1][2]
_ArrayDisplay($array)

Hi Flaips,

Yes, that is doable with AutoIt, and in fact it's not hard.

Take a look at:

_FileListToArray

_StringBetween

FileWriteLine or FileOpen and FileWrite

Write your code, and post it here when you think you need help.

Good luck!

Thanks, I will have a look at that later, even if it's only to understand what's going on.

 

 

The script did return all the file names, which is a great start, like so:

956|
0997|

With over 900 more numbers, of course. But for some reason it didn't return the names, only blanks, so I tried playing around with the code and changed this:

$array[$n][1] = _StringBetween(FileReadLine($sFileName, 87), '<td>', '</td>')

to this:

$array[$n][1] = _StringBetween(FileReadLine($sFileName, 87), '            <td>', '</td>')

because of the leading whitespace on the line. I don't know if that is necessary, but after that it returned 0, like so:

956|
0997|0
0998|0

Any clue as to what is going on?

Sorry for bothering you, but thanks for the help.


The _StringBetween function returns an array. So, in your script you are storing an array in $array[$n][1]: an array inside an array.

If you are using the latest AutoIt release (version 3.3.10.2) or the latest beta version, then this should work. Note the trailing "[0]".

$array[$n][1] = _StringBetween(FileReadLine($sFileName, 87), '<td>', '</td>')[0]
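
A hedged note on the two symptoms described earlier, offered as a sketch rather than as part of the original answer. On success _StringBetween returns an array; when nothing matches it sets @error and returns a non-array value, which would be consistent with the 0 seen after the start delimiter was changed (the indentation may be tabs rather than spaces, though that is only a guess). Indexing the result with [0] when nothing matched would stop the script, so it may be safer to test the result first. The helper name _GetCell is hypothetical.

```autoit
#include <String.au3>

; Hypothetical helper: returns the text between <td> and </td> on the given
; line of a file, or an empty string if that line has no such cell.
Func _GetCell($sFileName, $iLine)
    Local $sLine = FileReadLine($sFileName, $iLine)
    If @error Then Return "" ; file unreadable or line past end of file
    Local $aMatch = _StringBetween($sLine, '<td>', '</td>')
    If @error Or Not IsArray($aMatch) Then Return "" ; nothing matched
    Return $aMatch[0] ; first match on the line
EndFunc

; Possible use inside the loop of the script above:
; $array[$n][1] = _GetCell($sFileName, 87)
```

Because _StringBetween searches inside the string, the plain '<td>' delimiter should work regardless of how the line is indented.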

  • Solution

 

Yup, that worked perfectly, thank you all for the help ^^.

And in case anyone ever finds themselves with the same problem, this is the code I used:

 
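#include <String.au3>
#include <Array.au3>

; Run this script from the folder that contains the .htm files.
Local $array[1000][2], $n = 0
$hSearch = FileFindFirstFile("*.htm")
If $hSearch = -1 Then Exit ; no .htm files in the current folder
While 1
     $sFileName = FileFindNextFile($hSearch)
     If @error Then ExitLoop
     $n += 1 ; count a file only after it has been found
     $array[$n][0] = StringReplace($sFileName, ".htm", "")
     ; _StringBetween returns an array; [0] is the first match on line 87
     $array[$n][1] = _StringBetween(FileReadLine($sFileName, 87), '<td>', '</td>')[0]
Wend
FileClose($hSearch)
$array[0][0] = $n ; row 0 holds the number of files
Redim $array[$n+1][2]
_ArrayDisplay($array)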

Thanks everyone for the help and support.
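
The script above shows the result in a window, while the original question asked for a txt file with lines such as "1222 - String 1". The following is a minimal sketch of that last step, built from the functions suggested earlier in the thread (_FileListToArray, _StringBetween, FileOpen, FileWriteLine). It assumes the .htm files sit next to the script and that the cell is on line 87; the output name "output.txt" is hypothetical.

```autoit
#include <File.au3>
#include <String.au3>

; List the .htm files in the script's folder (flag 1 = files only).
Local $aFiles = _FileListToArray(@ScriptDir, "*.htm", 1)
If @error Then
    MsgBox(16, "Error", "No .htm files found in " & @ScriptDir)
    Exit
EndIf

; Mode 2 = open for writing and erase any previous contents.
Local $hOut = FileOpen(@ScriptDir & "\output.txt", 2)
If $hOut = -1 Then
    MsgBox(16, "Error", "Could not open output.txt for writing")
    Exit
EndIf

For $i = 1 To $aFiles[0] ; element 0 holds the file count
    Local $sLine = FileReadLine(@ScriptDir & "\" & $aFiles[$i], 87)
    Local $aMatch = _StringBetween($sLine, '<td>', '</td>')
    Local $sText = ""
    If IsArray($aMatch) Then $sText = $aMatch[0] ; stays empty if nothing matched
    ; Drop the extension: "1222.htm" becomes "1222"
    Local $sName = StringRegExpReplace($aFiles[$i], "(?i)\.html?$", "")
    FileWriteLine($hOut, $sName & " - " & $sText)
Next

FileClose($hOut)
```

Files whose line 87 has no <td> cell get an empty string after the dash instead of stopping the script.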
