2Toes
Posted August 13, 2018

Greetings fellow Autoiteers!

I've been working with AutoIt for about 4 months now, and I've created 3 programs so far. The vast majority of what I've learned along the way has come from this forum. So I'd like to start by thanking everybody who has contributed to this forum over the years and helped compile the vast wealth of knowledge here. This forum alone has helped me tremendously through my journey of learning to code with AutoIt.

However, the most recent project I've been working on has me at a complete loss, and brings me here asking my 1st question in search of a point in the right direction.

I have a bunch of mini websites that I've created over the past 8 years or so, and I'm trying to create a program that goes to all of my sites and grabs only the sentences that contain specified keyword phrases.

For example, using AutoIt's Basic Example page, we get these 3 lines of text:

"This is a simple HTML page with text, links and images. AutoIt is a wonderful automation scripting language. It is supported by a very active and supporting user forum."

...and let's say, for example, that the specified keyword phrase is "automation scripting". You'll notice that this keyword phrase can be found in the 2nd sentence.

How can I grab that entire sentence, based on the keyword given, while ignoring the rest of the content pulled from the web page using _IEBodyReadText()?
    #include <IE.au3>
    #include <MsgBoxConstants.au3>

    Local $oIE = _IE_Example("basic")
    Local $sText = _IEBodyReadText($oIE)
    ConsoleWrite($sText & @CRLF)

I know that you can crawl through the page and find the given keyword phrase with a basic StringRegExp() call:

    #include <IE.au3>
    #include <MsgBoxConstants.au3>

    Local $oIE = _IE_Example("basic")
    Local $sText = _IEBodyReadText($oIE)
    ConsoleWrite($sText & @CRLF)

    $testReg = StringRegExp($sText, "automation scripting", 1)
    ConsoleWrite('Keyword found: "' & $testReg[0] & '"' & @CRLF)

But after that, as far as grabbing the entire sentence that contains the given keyword, I'm at a complete loss.

Is what I'm trying to accomplish here even possible with AutoIt? Any help, or a point in the right direction, would be greatly appreciated!
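One thing worth guarding against in the snippet above: when the phrase is not on the page, StringRegExp() sets @error instead of returning a usable match, so indexing the result directly can stop the script. A minimal sketch of the same lookup with that check added (the variable name $aMatch is only illustrative):

```autoit
#include <IE.au3>

Local $oIE = _IE_Example("basic")
Local $sText = _IEBodyReadText($oIE)

; Flag 1 returns an array of matches, or sets @error when nothing matches.
Local $aMatch = StringRegExp($sText, "automation scripting", 1)
If @error Then
    ConsoleWrite("Keyword not found" & @CRLF)
Else
    ConsoleWrite('Keyword found: "' & $aMatch[0] & '"' & @CRLF)
EndIf
```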
FrancescoDiMuro
Posted August 13, 2018 (edited)

@2Toes
Happy to hear those kind words about this amazing community (I can confirm!) and the help it gives to everyone who kindly asks here.

By the way, I think that your approach is correct, but the search pattern in the StringRegExp() call needs to be extended to fit your request.

I am sure it is possible, but I'm not a big fan of StringRegExp(), so I really don't know where to start. Maybe @mikell could give you some tips.
2Toes (Author)
Posted August 13, 2018

@FrancescoDiMuro Thanks for stopping by to leave a response!

I'm with you.. I'm not a big fan of StringRegExp() either! I've been reading through the help file and online tutorials over the past 2 days, and StringRegExp() is a massive can of worms with a lot of information to take in and understand.

Even after looking into it as much as I have, I'm still unsure where to start, mainly because I'm just not sure what steps would need to be taken to accomplish such a task. It's all very confusing lol.

I appreciate you recommending and tagging mikell in the post for me. Hopefully he/she will be able to send me off in the right direction.

Anywho, it's good to hear from ya.. And thanks again for stopping by to leave a response!
mikell
Posted August 13, 2018

Hum, StringRegExp is magic, everybody knows that.
So what you want to do is possible, BUT the requirements need to be precisely defined.

Example: in the code below, a sentence is strictly defined as "a sequence of characters which are not a dot, beginning at start of text OR after a dot followed by 0 or more white spaces"

    #Include <Array.au3>

    $s = "This is a simple HTML page with text, links and images. " & @crlf & _
        "AutoIt is a wonderful automation scripting language." & @crlf & _
        "It is supported by a very active and supporting user forum."

    ;$keyword = "automation scripting"
    ;$keyword = "this"
    ;$keyword = "is a"
    $keyword = "and"

    $r = StringRegExp($s, '(?is)(?:^|\.\s*)([^.]*' & $keyword & '[^.]*)', 3)
    _ArrayDisplay($r)
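For readers new to regular expressions, the pattern above can be read piece by piece. The sketch below builds the same pattern from three parts, each with a comment; the variable names ($sOptions, $sStart, $sSentence) are only illustrative and are not from the post.

```autoit
#include <Array.au3>

Local $sText = "This is a simple HTML page with text, links and images. " & @CRLF & _
        "AutoIt is a wonderful automation scripting language." & @CRLF & _
        "It is supported by a very active and supporting user forum."
Local $sKeyword = "automation scripting"

; (?is): i = ignore case; s = "." may match line breaks. The s option has no
; effect in this pattern because it contains no unescaped dot.
Local $sOptions = '(?is)'

; Non-capturing group: a sentence starts at the start of the text, or right
; after a literal dot followed by zero or more whitespace characters.
Local $sStart = '(?:^|\.\s*)'

; Capturing group: a run of non-dot characters, the keyword, then another run
; of non-dot characters. Only this group ends up in the result array.
Local $sSentence = '([^.]*' & $sKeyword & '[^.]*)'

; Flag 3 returns an array of all matches found in the text.
Local $aResult = StringRegExp($sText, $sOptions & $sStart & $sSentence, 3)
If Not @error Then _ArrayDisplay($aResult)
```

Because `[^.]` means "any character except a dot", the dot is the only thing that ends a sentence under this definition.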
2Toes (Author)
Posted August 13, 2018

@mikell Wow, that is fantastic!! I don't know what any of that means, or how it works... but it definitely shows me what to look into to better understand the logic behind it, which is exactly what I was hoping to get here.

I cannot thank you enough for that.. Very much appreciated my friend!!
2Toes (Author)
Posted August 13, 2018

@mikell Quick follow-up question.. What part of that code detects when there is a period in the sentence?

Assuming that not all sentences will end with a period, or follow a sentence ending with a period, I changed the end of the 1st and 2nd sentences to a '?' and a '!'.

    #Include <Array.au3>

    $s = "This is a simple HTML page with text, links and images? " & @crlf & _
        "AutoIt is a wonderful automation scripting language!" & @crlf & _
        "It is supported by a very active and supporting user forum."

    ;$keyword = "automation scripting"
    ;$keyword = "this"
    $keyword = "is a"
    ;$keyword = "and"

    $r = StringRegExp($s, '(?is)(?:^|\.\s*)([^.]*' & $keyword & '[^.]*)', 3)
    _ArrayDisplay($r)

After doing so, the code no longer works properly, and places the entire block of content into a single array element.

What part of the code would I focus on to handle sentences ending with a '?' and '!' etc?

Thank you again for your help!
iamtheky
Posted August 14, 2018 (edited)

Strip all your double spaces, then split by all the punctuation you want it split by. Then you can play with rows as lines however you need; if it will only ever be singular matches you don't even have to go regex at that point. But if there is even a 2nd criteria definitely use whatever mikell posts.

    #Include <Array.au3>

    $s = "This is a simple HTML page with text. Links and images, maybe? " & @crlf & _
        "AutoIt is a wonderful automation scripting language! " & @crlf & _
        "It is supported by a very active and supporting user forum."

    ;~ $keyword = "Links"
    ;~ $keyword = "HTML"
    $keyword = "by a very"
    ;~ $keyword = "wonderful"

    msgbox(0, '' , _ArrayExtract(stringsplit(stringstripws($s , 4) , ".?!" , 2) , _ArraySearch(stringsplit(stringstripws($s , 4) , ".?!" , 2) , $keyword , 0 , 0 ,0 , 1))[0])
mikell
Posted August 14, 2018 (edited)

6 hours ago, iamtheky said: "But if there is even a 2nd criteria"

Sorry but I don't agree

    Local $r, $a = StringSplit(StringStripWS($s , 4) , ".?!" , 2)
    For $i = 0 to UBound($a)-1
        If StringInStr($a[$i], $keyword) Then $r &= StringStripWS($a[$i], 3) & @crlf
    Next
    ;Msgbox(0, "", StringTrimRight($r, 2))
    $final = StringSplit(StringTrimRight($r, 2), @crlf, 3)
    _ArrayDisplay($final)

9 hours ago, 2Toes said: "the code no longer works properly"

Of course, if the definition of what is a sentence changes, then the regex must be amended to fit with that.
That is the reason why I said "the requirements need to be precisely defined".

    $r = StringRegExp($s, '(?is)(?:^|[.\?!]\s*)([^.\?!]*' & $keyword & '[^.\?!]*)', 3)
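Since the set of sentence-ending characters is what keeps changing, one option is to pass it in as a parameter. The function below is a hypothetical helper, not something posted in the thread; it assumes the terminator string holds no characters that are special inside a character class (such as `]`, `^`, `-` or `\`). It also wraps the keyword in `\Q...\E` so that regex metacharacters in the keyword are matched literally, which assumes the keyword itself never contains `\E`.

```autoit
#include <Array.au3>

; Hypothetical helper (not from the thread): returns an array of the sentences
; in $sText that contain $sKeyword. $sTerminators lists the characters that
; end a sentence. Assumption: none of them is special inside [...].
Func _SentencesWithKeyword($sText, $sKeyword, $sTerminators = ".?!")
    Local $sEnd = '[' & $sTerminators & ']'
    Local $sNotEnd = '[^' & $sTerminators & ']*'
    ; \Q...\E makes the regex engine treat the keyword as literal text.
    Local $sPattern = '(?i)(?:^|' & $sEnd & '\s*)(' & $sNotEnd & '\Q' & $sKeyword & '\E' & $sNotEnd & ')'
    Local $aSentences = StringRegExp($sText, $sPattern, 3)
    If @error Then Return SetError(1, 0, 0) ; no sentence contains the keyword
    Return $aSentences
EndFunc   ;==>_SentencesWithKeyword

Local $s = "This is a simple HTML page with text, links and images? " & @CRLF & _
        "AutoIt is a wonderful automation scripting language!" & @CRLF & _
        "It is supported by a very active and supporting user forum."

Local $aResult = _SentencesWithKeyword($s, "is a")
If @error Then
    ConsoleWrite("No matching sentence" & @CRLF)
Else
    _ArrayDisplay($aResult)
EndIf
```

Calling it with a different third argument, for example ".?!;", changes the sentence definition without touching the pattern itself.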
iamtheky
Posted August 14, 2018

Well played. Suppose now I will spend too many minutes trying to pull off that dance without the loop.