Slipk

Scrape Text File Smart


Hello everyone,

#include <GUIConstantsEx.au3>
#include <GUIListBox.au3>
#include <WindowsConstants.au3>

#cs

    FILE DATA :

    <name>John</name>
    <random>Hello</random>
    <name>Silly</name>
    <other>Test</other>
    <other>World</other>
    <name>Billy</name>

#ce

Local $file = "file.txt"

$Form1 = GUICreate("Get items from text", 360, 250, -1, -1)
$List1 = GUICtrlCreateList("", 8, 8, 160, 235, -1, 0)

_getitems()

GUISetState(@SW_SHOW, $Form1)

While 1
    $nMsg = GUIGetMsg()
    Switch $nMsg
        Case $GUI_EVENT_CLOSE
            Exit

    EndSwitch
WEnd

Func _getitems()
EndFunc   ;==>_getitems

 

I have the code above, and I am trying to figure out the next step:

It must read a text file with fields like in the example below.

<name>John</name>
<random>Hello</random>
<name>Silly</name>
<other>Test</other>
<other>World</other>
<name>Billy</name>

How can I add to the list only what is between <name> and </name>? I tried something like a wildcard (<name>*</name>), but it is not working.

Any suggestions?

Thank you!


@Slipk there are a number of ways to accomplish this. I would suggest taking a look at _StringBetween in the help file, as it is a pretty straightforward path to what you want.
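A minimal sketch of the _StringBetween approach, assuming the sample data from the first post. The data is inlined here only for illustration (in the real script it would presumably come from FileRead on file.txt), and the variable names are hypothetical.

```autoit
#include <Array.au3>
#include <String.au3>

; Sample data inlined for illustration; in practice this could be FileRead("file.txt")
Local $sData = "<name>John</name>" & @CRLF & _
        "<random>Hello</random>" & @CRLF & _
        "<name>Silly</name>" & @CRLF & _
        "<other>Test</other>" & @CRLF & _
        "<other>World</other>" & @CRLF & _
        "<name>Billy</name>"

; _StringBetween returns a 0-based array with every substring found between the two delimiters
Local $aNames = _StringBetween($sData, "<name>", "</name>")
If @error Then
    ; @error is set when nothing is found between the delimiters
    ConsoleWrite("No <name> entries found" & @CRLF)
Else
    _ArrayDisplay($aNames)
EndIf
```

To fill the list box from the original script, the array could then be joined with _ArrayToString($aNames, "|") and passed to GUICtrlSetData($List1, ...) inside _getitems().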


@Slipk

You can use a regular expression (SRE, via StringRegExp) too:

#include <Array.au3>
#include <StringConstants.au3>

Global $strString = "<name>John</name>" & @CRLF & _
                    "<random>Hello</random>" & @CRLF & _
                    "<name>Silly</name>" & @CRLF & _
                    "<other>Test</other>" & @CRLF & _
                    "<other>World</other>" & @CRLF & _
                    "<name>Billy</name>", _
       $arrResult
       
$arrResult = StringRegExp($strString, "(?m)^(?:<[^>]+>)([^<]*)(?:<\/[^>]+>)$", $STR_REGEXPARRAYGLOBALMATCH)

_ArrayDisplay($arrResult)

:)


Hello Francesco,

Thank you for your answer.

This is how I want to extract the data, but there is one problem that makes the difference.

It has to extract only what is between <name> and </name>.

I also don't know what the other data in the text will be; it could be random text. It is something like scraping, but for text files.

 

I tried but no results. :(


@Slipk
Just change the SRE pattern in this way:

#include <Array.au3>
#include <StringConstants.au3>

Global $strString = "<name>John</name>" & @CRLF & _
                    "<random>Hello</random>" & @CRLF & _
                    "<name>Silly</name>" & @CRLF & _
                    "<other>Test</other>" & @CRLF & _
                    "<other>World</other>" & @CRLF & _
                    "<name>Billy</name>", _
       $arrResult


$arrResult = StringRegExp($strString, "(?m)^(?:<name>)([^<]*)(?:</name>)$", $STR_REGEXPARRAYGLOBALMATCH)

_ArrayDisplay($arrResult)

:)



@mikell
Yes, thanks :)
I overused non-capturing groups, and I didn't remove the ^ $ anchors since I was testing the pattern on a single-line string.
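The post being replied to is not shown in this thread, so the following is only a sketch of what the simplified pattern might look like once the non-capturing groups and the ^ $ anchors are dropped. The sample string is hypothetical and deliberately puts one <name> tag in the middle of a line, which an anchored pattern would not match.

```autoit
#include <Array.au3>
#include <StringConstants.au3>

; Hypothetical sample: the second <name> tag is surrounded by other text on the same line
Local $sData = "<name>John</name>" & @CRLF & _
        "random text <name>Silly</name> more text" & @CRLF & _
        "<other>World</other>" & @CRLF & _
        "<name>Billy</name>"

; Only the text between the literal tags is captured; no anchors, no non-capturing groups
Local $aNames = StringRegExp($sData, "<name>([^<]*)</name>", $STR_REGEXPARRAYGLOBALMATCH)
If @error Then
    ; In global-match mode @error is non-zero when nothing matches
    ConsoleWrite("No <name> entries found" & @CRLF)
Else
    _ArrayDisplay($aNames)
EndIf
```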


I'd use Microsoft.XMLDOM (do a search on this forum) and XPath to parse out the specific node, assuming your file is a well-formed XML document.
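A rough sketch of the XMLDOM approach, under the assumption that the MSXML COM object is available on the machine. The sample data from the first post has no single root element, so it is wrapped in a hypothetical <root> tag here to make it well-formed; the variable names are also illustrative.

```autoit
; Sample data from the first post, inlined for illustration
Local $sData = "<name>John</name><random>Hello</random><name>Silly</name>" & _
        "<other>Test</other><other>World</other><name>Billy</name>"

Local $oXML = ObjCreate("Microsoft.XMLDOM")
If Not IsObj($oXML) Then
    ConsoleWrite("Could not create the Microsoft.XMLDOM object" & @CRLF)
    Exit 1
EndIf

; Load synchronously; wrap the fragments in a single (hypothetical) root element
$oXML.async = False
If Not $oXML.loadXML("<root>" & $sData & "</root>") Then
    ConsoleWrite("The data is not well-formed XML" & @CRLF)
    Exit 2
EndIf

; Select every <name> node anywhere in the document and print its text
Local $oNodes = $oXML.selectNodes("//name")
For $oNode In $oNodes
    ConsoleWrite($oNode.text & @CRLF)
Next
```

If the file contains arbitrary text around the tags, as described earlier in the thread, it will not parse as XML and the string-based approaches are likely the better fit.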

