dvlkn (posted September 1, 2008):
Hi,
Just wondering if it's possible to parse HTML. I am trying to pick up a number from a webpage for further calculations, for example:
<html>You can buy <b>8</b> bla bla bla </html>
I looked in the help file but could not find a function for this, or a way to combine the existing ones to do the job. I know about _IEBodyReadHTML and _IEBodyReadText, but how can I get only the required part rather than the whole body? Any help appreciated.
oMBRa (posted September 1, 2008, edited):
You get the whole body, then you filter out the required part. Look in the help file under String Management.
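A minimal sketch of the "read everything, then filter" idea, assuming the page text is already in a string. Here the sample from the first post stands in for the result of _IEBodyReadHTML, and _StringBetween from String.au3 is just one possible filter; the variable names are illustrative.

```autoit
#include <String.au3>

; Stand-in for the string returned by _IEBodyReadHTML($oIE).
Local $sHTML = "<html>You can buy <b>8</b> bla bla bla </html>"

; Keep only what sits between the two delimiters.
Local $aParts = _StringBetween($sHTML, "You can buy <b>", "</b>")
If @error Then
    ConsoleWrite("Delimiters not found" & @CRLF)
Else
    ; Convert the extracted text to a number for further calculations.
    ConsoleWrite("Number: " & Number($aParts[0]) & @CRLF)
EndIf
```

Including the surrounding words in the start delimiter is what keeps other bold text on the page from being picked up.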
dbzfanatic (posted September 1, 2008):
If it's always going to be in bold, you could get the tag collection and find what you want in there.
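One way to read dbzfanatic's suggestion is the sketch below, which lists the text of every <b> element on a page with _IETagNameGetCollection from IE.au3. The URL is hypothetical and only there for illustration.

```autoit
#include <IE.au3>

; Hypothetical URL, for illustration only.
Local $oIE = _IECreate("http://www.example.com/shares.html")

; Collect every <b> element in the document.
Local $oBolds = _IETagNameGetCollection($oIE, "b")
For $oBold In $oBolds
    ; innerText is the text between <b> and </b>.
    ConsoleWrite($oBold.innerText & @CRLF)
Next
```

If the page contains several bold elements, you would still need to pick the right one, for example by its position in the collection or by checking the text around it.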
dvlkn (author, posted September 1, 2008):
OK, I looked. Is it the String Regular Expression function that I should use? It looks complicated. And dbzfanatic, yes, it's always in bold, but it's not the only bold text on the page. Would that interfere?
dvlkn (author, posted September 1, 2008):
Hmm, I am not familiar with regex and am having a difficult time parsing the bold part in "You can buy *random number here* shares". Can someone help me with the expression, please?
Szhlopp (posted September 1, 2008, edited):
RegExReplace tester =)

```autoit
#include <GUIConstants.au3>
#include <EditConstants.au3>

; Build the GUI: text to search on top, pattern/replace/count inputs below.
$hMainGUI = GUICreate("StringRegExReplace Tester", 574, 454, 245, 130)
$hMainEdit = GUICtrlCreateEdit("", 0, 0, 573, 182)
GUICtrlCreateGroup("Input", 2, 182, 569, 270)
$hPatternInput = GUICtrlCreateInput("", 70, 206, 150, 21)
$hReplaceInput = GUICtrlCreateInput("", 70, 236, 150, 21)
GUICtrlCreateLabel("Pattern:", 12, 208, 41, 17)
GUICtrlCreateLabel("Replace:", 12, 239, 47, 17)
$hCountInput = GUICtrlCreateInput("0", 70, 266, 150, 21)
GUICtrlCreateLabel("Count:", 12, 269, 35, 17)
$hReturn = GUICtrlCreateEdit("", 5, 326, 560, 121)
GUICtrlSendMsg($hReturn, $EM_SETREADONLY, -1, 0)
GUICtrlCreateLabel("@Error:", 385, 206, 40, 17)
GUICtrlCreateLabel("@Extended:", 385, 234, 63, 17)
$hErrorInput = GUICtrlCreateInput("", 454, 204, 62, 21)
GUICtrlSetState(-1, $GUI_DISABLE)
$hExtendedInput = GUICtrlCreateInput("", 454, 230, 62, 21)
GUICtrlSetState(-1, $GUI_DISABLE)
$hTestButton = GUICtrlCreateButton("RegExReplace", 231, 298, 108, 26, 0)
GUICtrlCreateGroup("", -99, -99, 1, 1)
GUISetState(@SW_SHOW)

While 1
    $nMsg = GUIGetMsg()
    Switch $nMsg
        Case $GUI_EVENT_CLOSE
            Exit
        Case $hTestButton
            ;Sleep(3000)
            $sRegEx = StringRegExpReplace(GUICtrlRead($hMainEdit), GUICtrlRead($hPatternInput), GUICtrlRead($hReplaceInput), GUICtrlRead($hCountInput))
            ; Save both macros right away; later calls overwrite them.
            $iError = @error
            $iExt = @extended
            If $iError = 2 Then
                ; Bad pattern: select the offending character in the pattern box.
                GUICtrlSetState($hPatternInput, $GUI_FOCUS)
                GUICtrlSendMsg($hPatternInput, $EM_SETSEL, $iExt - 2, $iExt - 1)
            EndIf
            GUICtrlSetData($hErrorInput, $iError)
            GUICtrlSetData($hExtendedInput, $iExt)
            GUICtrlSetData($hReturn, $sRegEx)
    EndSwitch
WEnd
```

```autoit
#include <GUIConstants.au3>
#include <EditConstants.au3>
#include <WindowsConstants.au3>
#include <ListViewConstants.au3>
#include <GuiListView.au3>

; Build the GUI: string, pattern and flag inputs, plus a list of results.
$GUI = GUICreate("RegEx Tester", 622, 450)
$Group1 = GUICtrlCreateGroup("Input Parameters", 16, 16, 569, 185)
$String = GUICtrlCreateEdit("", 88, 48, 465, 97)
$Test = GUICtrlCreateButton("Test", 464, 160, 89, 25, 0)
$Pattern = GUICtrlCreateInput("", 96, 160, 137, 21)
$Flag = GUICtrlCreateInput("", 320, 160, 73, 21, BitOR($ES_AUTOHSCROLL, $ES_NUMBER))
GUICtrlCreateLabel("String:", 32, 88, 34, 17)
GUICtrlCreateLabel("Pattern:", 32, 160, 41, 17)
GUICtrlCreateLabel("Flag", 272, 162, 24, 17)
GUICtrlCreateGroup("", -99, -99, 1, 1)
$Group2 = GUICtrlCreateGroup("Output", 16, 216, 577, 217)
$Group3 = GUICtrlCreateGroup("Return Values", 328, 240, 249, 129)
GUICtrlCreateLabel("Return: ", 344, 264, 42, 17)
GUICtrlCreateLabel("@Extended", 344, 296, 60, 17)
GUICtrlCreateLabel("@error", 344, 328, 36, 17)
$Return = GUICtrlCreateInput("", 408, 264, 137, 21, BitOR($ES_AUTOHSCROLL, $ES_READONLY))
$Extended = GUICtrlCreateInput("", 408, 296, 137, 21, BitOR($ES_AUTOHSCROLL, $ES_READONLY))
$Error = GUICtrlCreateInput("", 408, 328, 137, 21, BitOR($ES_AUTOHSCROLL, $ES_READONLY))
GUICtrlCreateGroup("", -99, -99, 1, 1)
$ListView = GUICtrlCreateListView("Element|Data", 40, 240, 257, 177, -1, BitOR($WS_EX_CLIENTEDGE, $LVS_EX_GRIDLINES, $LVS_EX_FULLROWSELECT))
GUICtrlCreateGroup("", -99, -99, 1, 1)
GUISetState(@SW_SHOW)

While 1
    $nMsg = GUIGetMsg()
    Switch $nMsg
        Case $GUI_EVENT_CLOSE
            Exit
        Case $Test
            If GUICtrlRead($String) = "" Or GUICtrlRead($Pattern) = "" Then
                MsgBox(0, "Error", "Please fill in all required parameters.")
            Else
                ; An empty flag box falls back to the default flag (0).
                If GUICtrlRead($Flag) = "" Then
                    Process(GUICtrlRead($String), GUICtrlRead($Pattern))
                Else
                    Process(GUICtrlRead($String), GUICtrlRead($Pattern), GUICtrlRead($Flag))
                EndIf
            EndIf
    EndSwitch
WEnd

; Run StringRegExp and show the return value, @error and @extended.
Func Process($String, $Pattern, $Flag = 0)
    _GUICtrlListView_DeleteAllItems(GUICtrlGetHandle($ListView))
    $Result = StringRegExp($String, $Pattern, $Flag)
    $Err = @error
    $Ext = @extended
    If IsArray($Result) Then
        ; One list row per array element.
        Dim $Item[UBound($Result)]
        For $x = 0 To UBound($Result) - 1
            $Item[$x] = GUICtrlCreateListViewItem("[" & $x & "]|" & $Result[$x], $ListView)
        Next
        GUICtrlSetData($Return, "Array")
        GUICtrlSetData($Extended, $Ext)
        GUICtrlSetData($Error, $Err)
    Else
        GUICtrlSetData($Return, $Result)
        GUICtrlSetData($Extended, $Ext)
        GUICtrlSetData($Error, $Err)
    EndIf
EndFunc
```
dvlkn (author, posted September 2, 2008):
Thanks. Using the program I could only do half the job, so I still need some help. This is the part of the text I need to parse:
<br>You currently have <b>412</b> shares.<br>You can buy a maximum of <b>20</b> shares.<br>
With the pattern ([0-9]{1,3})(?:</b> shares) and flag 1, it returns 412</b> shares. What I actually need is the second number, 20 in this example.
Szhlopp (posted September 2, 2008, edited):
Heh, here ya go =)
Text used: <br>You currently have <b>412</b> shares.<br>You can buy a maximum of <b>20</b> shares.<br>
Pattern: <b>(.*?)<
Flag: 3
Array[0] is 412
Array[1] is 20
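As a script, the test above might look like the sketch below. The second call is an assumption of mine rather than something Szhlopp posted: it anchors the pattern on the words "maximum of" so that only the second number comes back.

```autoit
Local $sHTML = "<br>You currently have <b>412</b> shares.<br>You can buy a maximum of <b>20</b> shares.<br>"

; Flag 3 returns an array holding the captured group of every match.
Local $aAll = StringRegExp($sHTML, "<b>(.*?)<", 3)
If IsArray($aAll) Then
    For $i = 0 To UBound($aAll) - 1
        ConsoleWrite("[" & $i & "] " & $aAll[$i] & @CRLF)
    Next
EndIf

; Assumed variant: include the surrounding words to target one number.
; Flag 1 returns the captured groups of the first match only.
Local $aMax = StringRegExp($sHTML, "maximum of <b>(\d+)</b> shares", 1)
If IsArray($aMax) Then ConsoleWrite("Maximum: " & $aMax[0] & @CRLF)
```

The first loop should list 412 and 20, matching the array shown in the post; the second call should yield 20 on its own.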
dvlkn (author, posted September 2, 2008, edited):
Thanks for your help. With a bit of tweaking I managed to filter the required part; I used _IEBodyReadText instead of the HTML. But now I am getting another error, which says "Subscript used with non-Array variable". This is the part that triggers it:
$sResult = StringRegExp($sText, 'You can buy a maximum of (.*?) shares', 3)
$Amount = Int ($sResult[0] / 24) * 24
Strangely, sometimes it works and sometimes it does not :S Can anyone help me clear this up, please?
James (posted September 2, 2008):
Use ConsoleWrite to check that $sResult actually contains something. It could also be a page timeout problem.
mc83 (posted September 2, 2008, edited):
AFAIK, if StringRegExp does not return an array, it means it could not match anything with your pattern.
I've written a couple of web crawlers in PHP, and I can tell you one thing: never trust "user" input (well, it's a well-known fact, actually).
Maybe, in some cases, there is no space before or after the number, so the pattern can't match. It might be a good idea to capture anything between "maximum of" and "shares" and then trim the result.
You could also fetch a large number of such pages and analyze the text yourself, to see whether it really changes every once in a while.
Anyway, I think you should use regexes on the actual HTML code. I believe it's easier to extract bits of text by using the tag structure of an HTML document.
Hope it helps,
Radu
Edit: typo
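A small sketch of the capture-then-trim idea mc83 describes, assuming the text has already been read into a string. The sample text with irregular spacing is made up for illustration.

```autoit
; Made-up sample with irregular spacing around the number.
Local $sText = "You can buy a maximum of   20shares."

; Capture whatever sits between the two phrases, spaces included.
Local $aResult = StringRegExp($sText, "maximum of(.*?)shares", 1)
If @error Then
    ConsoleWrite("No match" & @CRLF)
Else
    ; Flag 3 = strip leading (1) plus trailing (2) whitespace.
    Local $sNumber = StringStripWS($aResult[0], 3)
    ConsoleWrite("Number: " & Number($sNumber) & @CRLF)
EndIf
```

Because the pattern no longer requires a space on either side of the number, it tolerates pages where the spacing varies.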
dvlkn (author, posted September 2, 2008):
Thanks for the advice, I'll run a couple more tests. I am pretty sure my code is good, though, because when I used a MsgBox to show the value, it was correct and worked every time.
Szhlopp (posted September 2, 2008):
Check @error right after the StringRegExp call. Also use UBound to see how big the array is.
Try this too...
Text: You can buy a maximum 54 shares or you could buy a minimum of 640 =P You can buy a maximum 59shares and minimum 70. A maximum of 100! shares and minimum70?.
Pattern: (?:maximum|minimum)\s?(?:of)?\s?([0-9]*)\s?
Flag: 3
Always returns what you want =)
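Applied to the snippet that raised "Subscript used with non-Array variable", those checks could look like the sketch below. The sample text and the number 100 are assumptions for illustration; in the real script the text would come from _IEBodyReadText.

```autoit
; Assumed sample text; normally the result of _IEBodyReadText($oIE).
Local $sText = "You can buy a maximum of 100 shares."

Local $aResult = StringRegExp($sText, "You can buy a maximum of (.*?) shares", 3)
Local $iError = @error ; save it before another call resets it

If $iError Or Not IsArray($aResult) Then
    ; No array came back, so indexing $aResult here would fail.
    ConsoleWrite("No match, @error = " & $iError & @CRLF)
Else
    ConsoleWrite("Matches found: " & UBound($aResult) & @CRLF)
    ; Round down to a multiple of 24, as in the original snippet.
    Local $iAmount = Int($aResult[0] / 24) * 24
    ConsoleWrite("Amount: " & $iAmount & @CRLF)
EndIf
```

The point of the guard is that the array is only indexed on the branch where a match is known to exist.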
dvlkn (author, posted September 2, 2008):
It's working now. I just had to add some delay, even though I was already calling _IELoadWait. Thanks everyone, this forum and AutoIt rock! /thread.
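The thread does not show how the delay was added. One possible alternative to a fixed Sleep is to re-read the page until the pattern matches or a timeout expires, as sketched below. _WaitForMatch is a hypothetical helper written for this example, not part of IE.au3, and the timeout and polling interval are arbitrary.

```autoit
#include <IE.au3>

; Hypothetical helper: poll the page text until $sPattern matches.
; Returns the match array, or sets @error to 1 on timeout.
Func _WaitForMatch($oIE, $sPattern, $iTimeoutMs = 10000)
    Local $hTimer = TimerInit()
    While TimerDiff($hTimer) < $iTimeoutMs
        Local $aResult = StringRegExp(_IEBodyReadText($oIE), $sPattern, 3)
        If IsArray($aResult) Then Return $aResult
        Sleep(250) ; arbitrary polling interval
    WEnd
    Return SetError(1, 0, 0)
EndFunc
```

A call such as _WaitForMatch($oIE, "You can buy a maximum of (.*?) shares") would then replace the bare StringRegExp line, with @error checked before the result is indexed.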