Guest louis

need help extracting paragraphs from website

Guest louis

Hello everyone,

I need some help, please. I wrote a small script that reads all the data from a given website and puts it in a text file.

What I want to do next is search that text for certain keywords: if a paragraph contains any of them, I want to extract that paragraph and put it in a new text file.

Could someone please help me?

Many thanks.

Hi and Welcome to AutoIt! :shocked:

Have you tried to do this yourself yet? Please post any code you've tried, and you'll be more likely to get help with what's not working with that specific code.

In the meantime, to get you started, look up FileRead(), StringSplit() (use @CR as the delimiter; search the help file for more info) and StringInStr() in the help file. This is a pretty easy task and a perfect one for learning some more basic coding concepts, so take it as such :-) Don't get frustrated: just try stuff until you've tried everything you can think of, then post your code here and someone will point out the problems with it.

Most of this forum's members subscribe to the philosophy of "Teach a man to fish, and he'll fish for a lifetime" rather than coding for you. It's a helpful group, but you have to show you've put some effort in first. Have a great one!
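
For reference, here is a rough sketch of the FileRead() / StringSplit() / StringInStr() approach described above. The file names and the keyword list are placeholders, and it assumes paragraphs are separated by blank lines, which may not match your text file. It also normalises line endings first, because text saved on Windows usually ends lines with @CRLF rather than a bare @CR.

```autoit
; Sketch: copy every paragraph that mentions a keyword into a new file.
; Assumptions (not from the thread): the file names, the keyword list,
; and that paragraphs are separated by blank lines.
Local $sInFile = @ScriptDir & "\page.txt"       ; hypothetical input file
Local $sOutFile = @ScriptDir & "\matches.txt"   ; hypothetical output file
Local $aKeywords[2] = ["autoit", "script"]      ; hypothetical keywords

Local $sText = FileRead($sInFile)
If @error Then Exit MsgBox(16, "Error", "Could not read " & $sInFile)

; Normalise all line endings to @LF so a blank line is always @LF & @LF.
$sText = StringReplace($sText, @CRLF, @LF)
$sText = StringReplace($sText, @CR, @LF)

; Flag 1 makes StringSplit() treat the whole delimiter string as one separator.
Local $aParagraphs = StringSplit($sText, @LF & @LF, 1)

For $i = 1 To $aParagraphs[0]
    For $j = 0 To UBound($aKeywords) - 1
        ; StringInStr() is not case sensitive by default.
        If StringInStr($aParagraphs[$i], $aKeywords[$j]) Then
            ; FileWrite() with a file name appends to the end of that file.
            FileWrite($sOutFile, $aParagraphs[$i] & @CRLF & @CRLF)
            ExitLoop ; one match is enough, move on to the next paragraph
        EndIf
    Next
Next
```

If your saved text still contains HTML tags, the blank-line assumption will probably not hold and you will need a different way to find paragraph boundaries.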

Edited by james3mg

"There are 10 types of people in this world - those who can read binary, and those who can't."

"We've heard that a million monkeys at a million keyboards could produce the complete works of Shakespeare; now, thanks to the Internet, we know that is not true." ~Robert Wilensky

0101101 1001010 1100001 1101101 1100101 1110011 0110011 1001101 10001110000101 0000111 0001000 0001110 0001101 0010010 1010110 0100001 1101110

As part of learning AutoIt, look at regular expressions (regexp). They are very useful for the kind of task you describe here.

In particular, look at _StringBetween() in "C:\Program Files\AutoIt3\Include\string.au3":

; Function Name:        _StringBetween($sString, $sStart, $sEnd, $vCase, $iSRE)
; Parameters:           $sString:     The string to search
;                       $sStart:      The beginning of the string to find
;                       $sEnd:        The end of the string to find
;                       $vCase:       Case sensitive search:  Default or -1 = Not case sensitive
;                       $iSRE:        Choose whether to use StringRegExp or Regular String Manipulation to get the result
;                                     Default or -1:  Regular String Manipulation used (Non StringRegExp())
; Description:          Returns the string between the start search ($sStart) and the end search ($sEnd)
; Requirement(s)        AutoIt Beta or higher
; Return Value(s)       On Success:    A 0 based array [0] contains the first found string
;                       On Failure:    @Error = 1: No inbetween string was found
; Author(s):            SmOke_N
;                       Thanks to Valik for helping with the new StringRegExp (?s)(?i) issue

Func _StringBetween($sString, $sStart, $sEnd, $vCase = -1, $iSRE = -1)
    If $iSRE = -1 Or $iSRE = Default Then
        ; Plain string manipulation: map $vCase onto the StringInStr() casesense flag
        If $vCase = -1 Or $vCase = Default Then
            $vCase = 0
        Else
            $vCase = 1
        EndIf
        Local $sHold = '', $sSnSStart = '', $sSnSEnd = ''
        While StringLen($sString) > 0
            $sSnSStart = StringInStr($sString, $sStart, $vCase)
            If Not $sSnSStart Then ExitLoop
            $sString = StringTrimLeft($sString, ($sSnSStart + StringLen($sStart)) - 1)
            $sSnSEnd = StringInStr($sString, $sEnd, $vCase)
            If Not $sSnSEnd Then ExitLoop
            $sHold &= StringLeft($sString, $sSnSEnd - 1) & Chr(1)
            $sString = StringTrimLeft($sString, $sSnSEnd)
        WEnd
        If Not $sHold Then Return SetError(1, 0, 0)
        $sHold = StringSplit(StringTrimRight($sHold, 1), Chr(1))
        Local $avArray[UBound($sHold) - 1]
        For $iCC = 1 To UBound($sHold) - 1
            $avArray[$iCC - 1] = $sHold[$iCC]
        Next
        Return $avArray
    Else
        ; StringRegExp mode: $vCase becomes an optional (?i) prefix
        If $vCase = Default Or $vCase = -1 Then
            $vCase = '(?i)'
        Else
            $vCase = ''
        EndIf
        Local $aArray = StringRegExp($sString, '(?s)' & $vCase & $sStart & '(.*?)' & $sEnd, 3)
        If IsArray($aArray) Then Return $aArray
        Return SetError(1, 0, 0)
    EndIf
EndFunc   ;==>_StringBetween
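
A minimal usage sketch, assuming the paragraphs on your page are wrapped in plain <p> and </p> tags. The sample HTML and the keyword are made up for illustration. Tags with attributes, such as <p class="...">, will not match a literal "<p>" start string, so treat this as a starting point only.

```autoit
#include <String.au3>

; Hypothetical HTML; in practice this would be the text read from the page.
Local $sHtml = "<p>AutoIt is a scripting language.</p><p>Unrelated text.</p>"
Local $sKeyword = "scripting" ; hypothetical keyword

; Returns a 0-based array of everything found between "<p>" and "</p>".
Local $aParas = _StringBetween($sHtml, "<p>", "</p>")
If @error Then Exit MsgBox(48, "Result", "No <p>...</p> pairs found")

For $i = 0 To UBound($aParas) - 1
    ; Keep only the paragraphs that contain the keyword.
    If StringInStr($aParas[$i], $sKeyword) Then ConsoleWrite($aParas[$i] & @CRLF)
Next
```

Swap ConsoleWrite() for FileWrite() once the matches look right.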

Using the IE.au3 routines you could also examine the paragraph text in the browser content and then just write out what you want, where you want it.
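
A rough sketch of that idea, assuming the page marks its paragraphs with <p> tags. The URL, keyword and output file name are placeholders, not values from this thread.

```autoit
#include <IE.au3>

; Hypothetical values, replace with your own.
Local $sUrl = "http://www.example.com/"
Local $sKeyword = "example"
Local $sOutFile = @ScriptDir & "\matches.txt"
Local $sText = ""

; Open the page in a hidden browser window and wait for it to load.
Local $oIE = _IECreate($sUrl, 0, 0)
If @error Then Exit MsgBox(16, "Error", "Could not open " & $sUrl)

; Walk every <p> element and keep the ones that mention the keyword.
Local $oParas = _IETagNameGetCollection($oIE, "p")
For $oPara In $oParas
    $sText = $oPara.innerText ; text of the paragraph without HTML tags
    If StringInStr($sText, $sKeyword) Then FileWrite($sOutFile, $sText & @CRLF & @CRLF)
Next

_IEQuit($oIE)
```

Because the browser has already parsed the HTML, there are no tags to strip from the paragraph text before writing it out.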


Free Internet Tools: DebugBar, AutoIt IE Builder, HTTP UDF, MODIV2, IE Developer Toolbar, IEDocMon, Fiddler, HTML Validator, WGet, curl

MSDN docs: InternetExplorer Object, Document Object, Overviews and Tutorials, DHTML Objects, DHTML Events, WinHttpRequest, XmlHttpRequest, Cross-Frame Scripting, Office object model

Automate input type=file (Related)

Alternative to _IECreateEmbedded? better: _IECreatePseudoEmbedded  Better Better?

IE.au3 issues with Vista - Workarounds

SciTe Debug mode - it's magic: #AutoIt3Wrapper_run_debug_mode=Y

"Doesn't work" needs to be ripped out of the troubleshooting lexicon. It means that what you tried did not produce the results you expected. It begs the questions 1) what did you try?, 2) what did you expect? and 3) what happened instead?

Reproducer: a small (the smallest?) piece of stand-alone code that demonstrates your trouble
