Parsing HTML



I've been experimenting, with little luck, with parsing various elements of web pages using IE.au3.

Example:

<head><meta name="keywords" content="stuff" />

<title>my title</title>

</head>

If I wanted to parse "stuff" and "my title" into a text file, how should I do it?

Please advise.

BP



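Use the IE.au3 function that reads the source of a page (_IEDocReadHTML), then chop the HTML on the left and right of each value. StringTrimLeft and StringTrimRight remove a fixed number of characters, so first find the delimiter positions with StringInStr.

For the keywords, cut on the left at '"keywords" content="' and on the right at '" />'. For the title, cut on the left at '<title>' and on the right at the closing title tag.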
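
A minimal sketch of that chopping approach, in case it helps. _TextBetween is a hypothetical helper name, not part of IE.au3, and the HTML string is hard-coded here so the snippet runs without a browser. In a real script the source would come from _IEDocReadHTML($oIE), and the markup IE hands back may not match the original page character for character, so the delimiters may need adjusting.

```autoit
; Hypothetical helper: returns the text between $sLeft and $sRight,
; or "" with @error set if either delimiter is missing.
Func _TextBetween($sSource, $sLeft, $sRight)
    Local $iStart = StringInStr($sSource, $sLeft)
    If $iStart = 0 Then Return SetError(1, 0, "")
    $iStart += StringLen($sLeft) ; first character after the left delimiter
    ; Search for the right delimiter only after the left one.
    Local $iEnd = StringInStr($sSource, $sRight, 0, 1, $iStart)
    If $iEnd = 0 Then Return SetError(2, 0, "")
    Return StringMid($sSource, $iStart, $iEnd - $iStart)
EndFunc

; Sample markup from the question (assumed to use a proper closing title tag).
Local $sHtml = '<head><meta name="keywords" content="stuff" />' & @CRLF & _
        '<title>my title</title>' & @CRLF & '</head>'

ConsoleWrite("Keywords: " & _TextBetween($sHtml, '"keywords" content="', '" />') & @CRLF)
ConsoleWrite("Title: " & _TextBetween($sHtml, '<title>', '</title>') & @CRLF)
```

StringInStr is case-insensitive by default, which is convenient if the browser returns tags in a different case than the page author wrote them.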



You could also use _SRE_Between() (do a search on the forum for it).
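
If tracking down _SRE_Between() is inconvenient, the standard String.au3 include has a similar function, _StringBetween(), which returns a 0-based array of every match. A small sketch, with the sample HTML hard-coded as an assumption:

```autoit
#include <String.au3>

; Sample markup, hard-coded for illustration only.
Local $sHtml = '<head><meta name="keywords" content="stuff" /><title>my title</title></head>'

; _StringBetween sets @error when nothing is found between the delimiters.
Local $aTitle = _StringBetween($sHtml, '<title>', '</title>')
If @error Then
    ConsoleWrite("No title found" & @CRLF)
Else
    ConsoleWrite("Title: " & $aTitle[0] & @CRLF)
EndIf
```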


Hi,

Or something like this:

#include <Array.au3>

Global $path_to_file = 'test.txt'

; FileOpen returns -1 on failure, so check the handle before reading.
Global $handle = FileOpen($path_to_file, 0)
If $handle = -1 Then
    MsgBox(0, "Error", "Unable to open file.")
    Exit
EndIf

Global $file = FileRead($handle)
FileClose($handle)
MsgBox(0, "The file", $file)

; Passing 1 as the last argument returns an array of all matches.
Global $titles = _SRE_Between($file, '<title>', '</title>', 1)
Global $set_resource_name = _SRE_Between($file, '<set_resource_name>', '</set_resource_name>', 1)
Global $summary = _SRE_Between($file, '<summary>', '</summary>', 1)

_ArrayDisplay($titles, "Titles")
_ArrayDisplay($set_resource_name, "$set_resource_name")
_ArrayDisplay($summary, "$summary")

; Returns the text found between $s_Start and $s_End.
; $i_ReturnArray = 1 returns an array of all matches; the default (0) returns only the first match.
; The delimiters go into the pattern as-is, so regex metacharacters in them are not escaped.
Func _SRE_Between($s_String, $s_Start, $s_End, $i_ReturnArray = 0)
    Local $a_Array = StringRegExp($s_String, '(?:' & $s_Start & ')(.*?)(?:' & $s_End & ')', 3)
    If @error Or Not IsArray($a_Array) Then Return SetError(1, 0, 0)
    If $i_ReturnArray Then Return $a_Array
    Return $a_Array[0]
EndFunc

Just change $path_to_file and the tags to search between.
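
One caveat with _SRE_Between() above: the delimiters are pasted straight into the regular expression, so characters such as ?, ., ( or + in a tag are treated as regex syntax. A hedged variant that quotes both delimiters with \Q...\E might look like the sketch below. _SRE_BetweenLiteral is a hypothetical name, the link markup is made up for illustration, and the quoting breaks down if a delimiter itself contains \E.

```autoit
; Hypothetical variant of _SRE_Between: both delimiters are matched as literal text.
; (?s) lets the dot match line breaks too, so a value may span several lines.
Func _SRE_BetweenLiteral($s_String, $s_Start, $s_End)
    Local $s_Pattern = '(?s)\Q' & $s_Start & '\E(.*?)\Q' & $s_End & '\E'
    Local $a_Array = StringRegExp($s_String, $s_Pattern, 3)
    If @error Then Return SetError(1, 0, 0)
    Return $a_Array ; 0-based array of every match
EndFunc

; Made-up sample: the start delimiter contains "." and "?", which are regex metacharacters.
Local $sHtml = '<a href="page.php?id=1">first link</a>'
Local $aMatch = _SRE_BetweenLiteral($sHtml, '<a href="page.php?id=1">', '</a>')
If IsArray($aMatch) Then ConsoleWrite("Link text: " & $aMatch[0] & @CRLF)
```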

So long,

Mega


An addition to big_daddy's script to include the page title:

#include <IE.au3>

$sURL = "http://autoitscript.com"
$oIE = _IECreate($sURL)
; Index 0 returns only the first META element instead of the whole collection.
$oMeta = _IETagNameGetCollection($oIE, "META", 0)
ConsoleWrite("Title: " & $oIE.document.title & @CRLF)
ConsoleWrite("Name: " & $oMeta.name & @CRLF)
ConsoleWrite("Content: " & $oMeta.content & @CRLF)
ConsoleWrite("HTTP-EQUIV: " & $oMeta.httpEquiv & @CRLF)


Another thought, since there can be multiple META tags:

#include <IE.au3>

$sURL = "http://autoitscript.com"
$oIE = _IECreate($sURL)

ConsoleWrite("Title: " & $oIE.document.title & @CRLF & @CRLF)

; Without an index, _IETagNameGetCollection returns every META element on the page.
$oMetas = _IETagNameGetCollection($oIE, "META")

For $oMeta In $oMetas
    ConsoleWrite("Name: " & $oMeta.name & @CRLF)
    ConsoleWrite("Content: " & $oMeta.content & @CRLF)
    ConsoleWrite("HTTP-EQUIV: " & $oMeta.httpEquiv & @CRLF & @CRLF)
Next
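
Since the original question asked for the values in a text file, here is a rough sketch that builds on the loop above: it keeps only the META tag named "keywords" and writes it, together with the title, to a file. The output path page_info.txt is an assumption for illustration, and the script needs Internet Explorer available to run.

```autoit
#include <IE.au3>

Local $sURL = "http://autoitscript.com"
Local $sOutFile = @ScriptDir & "\page_info.txt" ; hypothetical output path

Local $oIE = _IECreate($sURL)
If @error Then Exit MsgBox(16, "Error", "Could not start Internet Explorer.")

Local $sOut = "Title: " & $oIE.document.title & @CRLF

Local $oMetas = _IETagNameGetCollection($oIE, "META")
If @error Then Exit MsgBox(16, "Error", "Could not read the META tags.")

For $oMeta In $oMetas
    ; AutoIt's = operator compares strings case-insensitively, so "Keywords" matches too.
    If $oMeta.name = "keywords" Then
        $sOut &= "Keywords: " & $oMeta.content & @CRLF
    EndIf
Next

; Mode 2 opens the file for writing and erases any previous contents.
Local $hFile = FileOpen($sOutFile, 2)
If $hFile = -1 Then Exit MsgBox(16, "Error", "Unable to open " & $sOutFile)
FileWrite($hFile, $sOut)
FileClose($hFile)

_IEQuit($oIE)
```

Reading the values from the document object avoids string chopping altogether, so it keeps working if the page changes its attribute order or quoting.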
