DoubleMcLovin

Find and remove(replace) text in a document

5 posts in this topic

I have a script that extracts a certain part of a web page and writes it to a file. This works great, except that the output includes all the HTML tags! I would like my script to find every "<" and delete everything up to and including the next ">", throughout the document. Can anyone help me out? I download the HTML file with InetGet and read it with FileReadLine.

Check your help file for _StringBetween() or StringRegExp(). Either one will pull out everything between those tags. You might also reconsider how you get the HTML: the _IE* functions, like _IEBodyReadText(), _IEFormElementGetValue(), etc., may give you just the text you want in the first place.
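
For illustration, here is a minimal sketch of the _IE* route, assuming Internet Explorer is available on the machine. The URL is a hypothetical placeholder, not something from the original question.

```autoit
#include <IE.au3>

; Hypothetical URL, used only for illustration
Local $sUrl = "http://www.example.com/"

; Open the page in a hidden IE window (no attach, not visible) and wait for it to load
Local $oIE = _IECreate($sUrl, 0, 0)
If @error Then Exit 1

; _IEBodyReadText returns the text of the document body without the tags
Local $sText = _IEBodyReadText($oIE)
_IEQuit($oIE)

ConsoleWrite($sText & @CRLF)
```

Since the browser does the parsing, no tag stripping is needed afterwards, at the cost of starting an IE instance for each page.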


DoubleMcLovin,

No doubt a RegExp guru will turn up soon, but this works: :mellow:

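#include <String.au3>

; Sample text containing two tags to strip
$sText = "fred1<rubbish here to be got rid of>fred2<more rubbish here to be got rid of>fred3"

While 1
    ; Find the text between "<" and ">"; @error is set once no tag is left
    $sTag = _StringBetween($sText, "<", ">")
    If @error Then ExitLoop
    ConsoleWrite("Tag: " & $sTag[0] & @CRLF)

    ; Remove every occurrence of this tag, delimiters included
    $sText = StringReplace($sText, "<" & $sTag[0] & ">", "")
    ConsoleWrite("Txt: " & $sText & @CRLF)
WEnd

; The text with all tags removed
ConsoleWrite($sText & @CRLF)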

M23

Edit: Ah, the rakishly good looking water fowl has struck first! :(


Hi,

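this may help you:

; Sample HTML with irregular spacing inside the tags
$string = "<  tag> text 123456 zhfdklasjhf </tag>" & @CRLF
$string &= "<body  > dfhsflkjsdhahejksrh </body  >" & @CRLF
$string &= "<htm l > dfjsklfalsdhfsdkj < /h t m l>"
; Delete everything from each "<" up to the next ">"; (?s) lets "." match line breaks too
MsgBox(0, "", StringRegExpReplace($string, '(?s)<.*?>', ""))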

;-))

Stefan
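
Applied to the original question (a page downloaded with InetGet), the same replacement could look roughly like the sketch below. The URL and both file names are hypothetical placeholders. It reads the file in one go with FileRead instead of line by line with FileReadLine, on the assumption that some tags span more than one line.

```autoit
; Hypothetical URL and file names, used only for illustration
Local $sUrl = "http://www.example.com/"
Local $sHtmlFile = @TempDir & "\page.html"
Local $sTextFile = @ScriptDir & "\page.txt"

; Download the page; by default InetGet waits until the download has finished
Local $iResult = InetGet($sUrl, $sHtmlFile)
If @error Or $iResult = 0 Then Exit 1

; Read the whole file at once so tags spanning several lines are caught
Local $sHtml = FileRead($sHtmlFile)
If @error Then Exit 2

; (?s) lets "." match line breaks; the lazy .*? stops at the first ">"
Local $sText = StringRegExpReplace($sHtml, "(?s)<.*?>", "")

; Mode 2 = open for writing and erase any previous contents
Local $hOut = FileOpen($sTextFile, 2)
If $hOut = -1 Then Exit 3
FileWrite($hOut, $sText)
FileClose($hOut)
```

This only removes the tags themselves. Script or style content between tags and HTML entities such as &amp; would still be left in the output.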


Thank you for your help. I will look into the RegExp and _IE functions. Till then this seems to work perfectly.
