Scrape Text File Smart


Slipk

Hello everyone,

#include <GUIConstantsEx.au3>
#include <GUIListBox.au3>
#include <WindowsConstants.au3>

#cs

    FILE DATA :

    <name>John</name>
    <random>Hello</random>
    <name>Silly</name>
    <other>Test</other>
    <other>World</other>
    <name>Billy</name>

#ce

Local $file = "file.txt"

$Form1 = GUICreate("Get items from text", 360, 250, -1, -1)
$List1 = GUICtrlCreateList("", 8, 8, 160, 235, -1, 0)

_getitems()

GUISetState(@SW_SHOW, $Form1)

While 1
    $nMsg = GUIGetMsg()
    Switch $nMsg
        Case $GUI_EVENT_CLOSE
            Exit

    EndSwitch
WEnd

Func _getitems()
EndFunc   ;==>_getitems

 

I have the code above and I'm trying to figure out the next step:

It must read a text file with fields like in the example below.

<name>John</name>
<random>Hello</random>
<name>Silly</name>
<other>Test</other>
<other>World</other>
<name>Billy</name>

How can I add to the list only what is between <name> and </name>? I tried to apply something like a wildcard, but it is not working.

Any suggestions?

Thank you!


  • Moderators

@Slipk there are a number of ways to accomplish this. I would suggest taking a look at _StringBetween in the help file, as it is a pretty straightforward path to what you want.
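
A minimal sketch of the _StringBetween route, assuming the sample data is saved as file.txt next to the script (the path and the variable names $sFile, $sData and $aNames are illustrative, not from the original post):

```autoit
#include <Array.au3>
#include <String.au3>

; Assumption: "file.txt" sits next to the script and holds the sample data.
Local $sFile = @ScriptDir & "\file.txt"

; Read the whole file into one string.
Local $sData = FileRead($sFile)
If @error Then Exit MsgBox(16, "Error", "Could not read " & $sFile)

; _StringBetween returns an array with every substring found
; between the start and end delimiters, and sets @error if none is found.
Local $aNames = _StringBetween($sData, "<name>", "</name>")
If @error Then Exit MsgBox(16, "Error", "No <name> entries found")

_ArrayDisplay($aNames, "Names")
```

Because only the <name> delimiters are searched for, the other tags in the file do not need to be known in advance.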


 

@Slipk

You can use a regular expression (SRE) too:

#include <Array.au3>
#include <StringConstants.au3>

Global $strString = "<name>John</name>" & @CRLF & _
                    "<random>Hello</random>" & @CRLF & _
                    "<name>Silly</name>" & @CRLF & _
                    "<other>Test</other>" & @CRLF & _
                    "<other>World</other>" & @CRLF & _
                    "<name>Billy</name>", _
       $arrResult
       
$arrResult = StringRegExp($strString, "(?m)^(?:<[^>]+>)([^<]*)(?:<\/[^>]+>)$", $STR_REGEXPARRAYGLOBALMATCH)

_ArrayDisplay($arrResult)

:)

Edited by FrancescoDiMuro


Hello Francesco,

Thank you for your answer.

This is how I want to extract the data, but there is one problem that makes the difference.

It has to extract only what is between <name> and </name>.

And I don't know the other data in the text; it could be random text. It is something like scraping, but for text files.

 

I tried, but got no results. :(


@Slipk
Just change the SRE pattern in this way:

#include <Array.au3>
#include <StringConstants.au3>

Global $strString = "<name>John</name>" & @CRLF & _
                    "<random>Hello</random>" & @CRLF & _
                    "<name>Silly</name>" & @CRLF & _
                    "<other>Test</other>" & @CRLF & _
                    "<other>World</other>" & @CRLF & _
                    "<name>Billy</name>", _
       $arrResult


$arrResult = StringRegExp($strString, "(?m)^(?:<name>)([^<]*)(?:</name>)$", $STR_REGEXPARRAYGLOBALMATCH)

_ArrayDisplay($arrResult)

:)
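
A possible way to wire the <name> pattern into the original GUI script is sketched here. It fills in the empty _getitems() stub from the first post; the error codes passed to SetError and the variable names $sData and $aNames are assumptions made for illustration, and the script expects file.txt to contain the sample data.

```autoit
#include <GUIConstantsEx.au3>
#include <StringConstants.au3>

Global $file = "file.txt" ; same file name as in the original script

Global $Form1 = GUICreate("Get items from text", 360, 250, -1, -1)
Global $List1 = GUICtrlCreateList("", 8, 8, 160, 235, -1, 0)

_getitems()

GUISetState(@SW_SHOW, $Form1)

While 1
    Switch GUIGetMsg()
        Case $GUI_EVENT_CLOSE
            Exit
    EndSwitch
WEnd

Func _getitems()
    ; Read the whole file; @error 1 here is an illustrative code for "read failed".
    Local $sData = FileRead($file)
    If @error Then Return SetError(1, 0, 0)

    ; Capture only the text between <name> and </name> on each line.
    Local $aNames = StringRegExp($sData, "(?m)^<name>([^<]*)</name>$", $STR_REGEXPARRAYGLOBALMATCH)
    If @error Then Return SetError(2, 0, 0) ; illustrative code for "no match"

    ; Add each captured name to the list control.
    For $i = 0 To UBound($aNames) - 1
        GUICtrlSetData($List1, $aNames[$i])
    Next
    Return UBound($aNames)
EndFunc   ;==>_getitems
```

With the default list style the control may sort its items, so the names are not necessarily shown in file order.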


I'd use microsoft.xmldom (do a search on this forum) and XPath expressions to parse out the specific node, assuming your file is a syntactically proper XML document.
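
A rough sketch of the XML DOM route, under two assumptions that are not from the thread: the file is read from file.txt next to the script, and, because the sample data has no single root element, it is wrapped in a made-up <root> element before parsing. Real files with characters that are not valid XML would fail to load.

```autoit
; Create the MSXML DOM object via COM.
Local $oXml = ObjCreate("Microsoft.XMLDOM")
If Not IsObj($oXml) Then Exit MsgBox(16, "Error", "Could not create Microsoft.XMLDOM")
$oXml.async = False

; Assumption: "file.txt" sits next to the script.
Local $sData = FileRead(@ScriptDir & "\file.txt")
If @error Then Exit MsgBox(16, "Error", "Could not read file.txt")

; Assumption: wrap the fragments in a hypothetical <root> element
; so that the parser sees a well-formed document.
If Not $oXml.loadXML("<root>" & $sData & "</root>") Then
    Exit MsgBox(16, "Error", "Parse error: " & $oXml.parseError.reason)
EndIf

; Select every <name> node and print its text.
Local $oNodes = $oXml.selectNodes("//name")
For $oNode In $oNodes
    ConsoleWrite($oNode.text & @CRLF)
Next
```

This approach is stricter than the regular expression: it only works when the content really is well-formed XML, but it does not depend on each tag sitting on its own line.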

