DoubleMcLovin
Posted February 23, 2010

I have a script that grabs a certain part of a web page and writes it to a file. This works great, except that the output includes all the HTML tags! I would like my script to find every "<", delete everything up to and including the next ">", and do that throughout the document. Can anyone help me out? I am using FileReadLine on an HTML file downloaded with InetGet to get the original information.
PsaltyDS
Posted February 23, 2010

Check your help file for _StringBetween() or StringRegExp(). Either one will pull out everything between those tags. You might also reconsider how you get the HTML to begin with. The _IE* functions, like _IEBodyReadText(), _IEFormElementGetValue(), etc., may give you just the text you want without any tags to strip.
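A minimal sketch of the _IE* route suggested above, assuming the page can be loaded in Internet Explorer and that the whole body text is wanted. The URL is a hypothetical placeholder, and the body-text function in the IE.au3 UDF is _IEBodyReadText().

```autoit
#include <IE.au3>

; Hypothetical URL, used only for illustration.
Local $sUrl = "http://www.example.com/"

; Open the page in a hidden IE window (second argument: do not attach
; to an existing window; third argument: 0 = not visible).
Local $oIE = _IECreate($sUrl, 0, 0)
If @error Then Exit 1

; Read the text of the <body> element, without any HTML tags.
Local $sText = _IEBodyReadText($oIE)
ConsoleWrite($sText & @CRLF)

; Close the browser instance created above.
_IEQuit($oIE)
```

This avoids tag stripping altogether, at the cost of starting a browser instance for each page.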
Melba23 (Moderator)
Posted February 23, 2010 (edited)

DoubleMcLovin,

No doubt a RegExp guru will turn up soon, but this works:

    #include <String.au3>

    $sText = "fred1<rubbish here to be got rid of>fred2<more rubbish here to be got rid of>fred3"

    While 1
        ; Find the text between the next "<" and ">"
        $sTag = _StringBetween($sText, "<", ">")
        ; @error is set when no more tags are found
        If @error Then ExitLoop
        ConsoleWrite("Tag: " & $sTag[0] & @CRLF)
        ; Remove that tag, including its angle brackets
        $sText = StringReplace($sText, "<" & $sTag[0] & ">", "")
        ConsoleWrite("Txt: " & $sText & @CRLF)
    WEnd

    ConsoleWrite($sText & @CRLF)

M23

Edit: Ah, the rakishly good looking water fowl has struck first!
99ojo
Posted February 23, 2010

Hi,

this may help you:

    $string = "< tag> text 123456 zhfdklasjhf </tag>" & @CRLF
    $string &= "<body > dfhsflkjsdhahejksrh </body >" & @CRLF
    $string &= "<htm l > dfjsklfalsdhfsdkj < /h t m l>"
    ; Replace everything from "<" up to the next ">" with an empty string
    MsgBox(0, "", StringRegExpReplace($string, '<(?i)(.*?)>', ""))

;-))
Stefan
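Applied to the original question (an HTML file saved by InetGet), the regular expression approach could look roughly like the sketch below. The file names are hypothetical, and reading the whole file with FileRead instead of line by line with FileReadLine is a swapped-in choice, made so that a tag broken across two lines is still matched.

```autoit
; Hypothetical file names, for illustration only.
Local $sInFile = @ScriptDir & "\page.html"
Local $sOutFile = @ScriptDir & "\page.txt"

; Read the whole file into one string.
Local $sHtml = FileRead($sInFile)
If @error Then Exit 1

; (?s) lets "." match line breaks as well; ".*?" is non-greedy,
; so each match stops at the first ">" after a "<".
Local $sText = StringRegExpReplace($sHtml, "(?s)<.*?>", "")

; Mode 2 = open for writing and erase any previous contents.
Local $hOut = FileOpen($sOutFile, 2)
If $hOut = -1 Then Exit 2
FileWrite($hOut, $sText)
FileClose($hOut)
```

Note that this removes only the tags themselves. The contents of script and style elements, and HTML entities such as &amp;amp;, are left in the output.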
DoubleMcLovin (Author)
Posted February 23, 2010

Thank you for your help. I will look into the RegExp and _IE functions. Till then this seems to work perfectly.