Guest louis

need help extracting paragraphs from website

Hello everyone,

I need some help, please. I wrote a small script that reads all the data from a given website and puts it in a text file.

What I want to do next is search that text for certain keywords. If a paragraph contains any of those keywords, I want to extract that paragraph and put it in a new text file.

Could someone please help me?

many thanks.

Hi and Welcome to AutoIt! :shocked:

Have you tried to do this yourself yet? Please post any code you've tried, and you'll be more likely to get help with what's not working with that specific code.

In the meantime, to get you started, look up FileRead(), StringSplit() (use @CR as the delimiter in StringSplit(); search the help file for more info) and StringInStr() in the help file. This is a pretty easy task and a perfect one for learning some more basic coding concepts, so take it as such :-) Don't get frustrated. Just try stuff until you've tried everything you can think of, then post your code here and someone will point out the problems with it.
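
To make those three functions a bit more concrete, here is a minimal sketch, not a finished solution. It assumes the page text was already saved to a file, that paragraphs in that file are separated by blank lines, and that any one keyword is enough for a match. The file names and keywords are made up for illustration.

```autoit
; Hypothetical file names and keywords, used only for illustration.
Local $sInFile = @ScriptDir & "\page.txt"
Local $sOutFile = @ScriptDir & "\matches.txt"
Local $aKeywords[2] = ["autoit", "script"]

Local $sText = FileRead($sInFile)
If @error Then
    MsgBox(16, "Error", "Could not read " & $sInFile)
    Exit 1
EndIf

; Normalize line endings so a blank line is always two @LF characters.
$sText = StringReplace($sText, @CRLF, @LF)
$sText = StringReplace($sText, @CR, @LF)

; Flag 1 makes StringSplit() treat the whole delimiter string as one separator.
; Element [0] of the returned array holds the number of pieces.
Local $aParagraphs = StringSplit($sText, @LF & @LF, 1)

Local $hOut = FileOpen($sOutFile, 2) ; 2 = write mode, erase previous contents
If $hOut = -1 Then
    MsgBox(16, "Error", "Could not open " & $sOutFile)
    Exit 1
EndIf

For $i = 1 To $aParagraphs[0]
    For $j = 0 To UBound($aKeywords) - 1
        ; StringInStr() is case-insensitive by default and returns 0 when not found.
        If StringInStr($aParagraphs[$i], $aKeywords[$j]) Then
            FileWrite($hOut, $aParagraphs[$i] & @CRLF & @CRLF)
            ExitLoop ; write each matching paragraph only once
        EndIf
    Next
Next

FileClose($hOut)
```

If the saved file is raw HTML rather than plain text, blank lines will probably not line up with paragraphs, and splitting on the paragraph tags is likely a better fit.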

Most of this forum's members subscribe to the philosophy of "Teach a man to fish, and he'll fish for a lifetime" rather than coding for you. It's a helpful group, but you have to show you've put some effort in first. Have a great one!

While you are learning AutoIt, look at regular expressions (regexp). They are very useful for what you want to do here.

Especially look at "C:\Program Files\AutoIt3\Include\string.au3" --> _StringBetween()

; Function Name:        _StringBetween($sString, $sStart, $sEnd, $vCase, $iSRE)
; Parameters:           $sString:     The string to search
;                       $sStart:      The beginning of the string to find
;                       $sEnd:        The end of the string to find
;                       $vCase:       Case sensitive search:  Default or -1 = Not case sensitive
;                       $iSRE:        Choose whether to use StringRegExp or Regular String Manipulation to get the result
;                                     Default or -1:  Regular String Manipulation used (Non StringRegExp())
; Description:          Returns the string between the start search ($sStart) and the end search ($sEnd)
; Requirement(s)        AutoIt Beta or higher
; Return Value(s)       On Success:    A 0 based array [0] contains the first found string
;                       On Failure:    @Error = 1: No inbetween string was found
; Author(s):            SmOke_N
;                       Thanks to Valik for helping with the new StringRegExp (?s)(?i) issue

Func _StringBetween($sString, $sStart, $sEnd, $vCase = -1, $iSRE = -1)
    If $iSRE = -1 Or $iSRE = Default Then
        ; Plain string manipulation branch
        If $vCase = -1 Or $vCase = Default Then
            $vCase = 0
        Else
            $vCase = 1
        EndIf
        Local $sHold = '', $sSnSStart = '', $sSnSEnd = ''
        While StringLen($sString) > 0
            $sSnSStart = StringInStr($sString, $sStart, $vCase)
            If Not $sSnSStart Then ExitLoop
            $sString = StringTrimLeft($sString, ($sSnSStart + StringLen($sStart)) - 1)
            $sSnSEnd = StringInStr($sString, $sEnd, $vCase)
            If Not $sSnSEnd Then ExitLoop
            $sHold &= StringLeft($sString, $sSnSEnd - 1) & Chr(1)
            $sString = StringTrimLeft($sString, $sSnSEnd)
        WEnd
        If Not $sHold Then Return SetError(1, 0, 0)
        $sHold = StringSplit(StringTrimRight($sHold, 1), Chr(1))
        Local $avArray[UBound($sHold) - 1]
        For $iCC = 1 To UBound($sHold) - 1
            $avArray[$iCC - 1] = $sHold[$iCC]
        Next
        Return $avArray
    Else
        ; StringRegExp branch
        If $vCase = Default Or $vCase = -1 Then
            $vCase = '(?i)'
        Else
            $vCase = ''
        EndIf
        Local $aArray = StringRegExp($sString, '(?s)' & $vCase & $sStart & '(.*?)' & $sEnd, 3)
        If IsArray($aArray) Then Return $aArray
        Return SetError(1, 0, 0)
    EndIf
EndFunc   ;==>_StringBetween
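
As a rough illustration of how _StringBetween() could be applied to the original question, the sketch below pulls out the text between <p> and </p> and keeps only the pieces that contain a keyword. The HTML string and the keyword are made up, and it assumes the paragraph tags carry no attributes (a tag like <p class="x"> would not match the plain "<p>" start string).

```autoit
#include <String.au3>

; Made-up HTML standing in for the downloaded page source.
Local $sHtml = "<p>AutoIt is a scripting language.</p><p>Unrelated text.</p>"
Local $sKeyword = "scripting" ; hypothetical keyword

; Only the three required parameters are used, so the defaults apply.
Local $aParas = _StringBetween($sHtml, "<p>", "</p>")
If @error Then
    ConsoleWrite("No paragraphs found." & @CRLF)
Else
    ; The returned array is 0-based, as noted in the function header.
    For $i = 0 To UBound($aParas) - 1
        If StringInStr($aParas[$i], $sKeyword) Then ConsoleWrite($aParas[$i] & @CRLF)
    Next
EndIf
```

Replacing ConsoleWrite() with FileWrite() on a file opened with FileOpen() would send the matches to a text file instead.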

Using the IE.au3 routines you could also examine the paragraph text in the browser content and then just write out what you want, where you want it.
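
A minimal sketch of the IE.au3 route, assuming Internet Explorer is installed and that the text of interest sits in <p> elements. The URL, keyword and output file name are placeholders, not anything from the original post.

```autoit
#include <IE.au3>

Local $sUrl = "http://www.example.com/" ; placeholder URL
Local $sKeyword = "example"             ; hypothetical keyword
Local $sOutFile = @ScriptDir & "\matches.txt"
Local $sText = ""

; Open the page in a hidden browser window (third parameter 0 = not visible)
; and wait for it to finish loading (fourth parameter 1).
Local $oIE = _IECreate($sUrl, 0, 0, 1)
If @error Then Exit 1

; Collect every <p> element on the page.
Local $oParas = _IETagNameGetCollection($oIE, "p")
If IsObj($oParas) Then
    Local $hOut = FileOpen($sOutFile, 2) ; 2 = write mode, erase previous contents
    If $hOut <> -1 Then
        For $oPara In $oParas
            ; innerText is the element's visible text without the HTML markup.
            $sText = $oPara.innerText
            If StringInStr($sText, $sKeyword) Then FileWrite($hOut, $sText & @CRLF & @CRLF)
        Next
        FileClose($hOut)
    EndIf
EndIf

_IEQuit($oIE)
```

Because the browser has already parsed the page, this avoids handling the raw HTML tags yourself, at the cost of starting a browser instance.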


