Pured Posted August 24, 2018

This is my first personal project. I have been wanting to write a script for it for a while now, but I just never got around to it. Up until now, all my work scripts have revolved around pixel checks and mouse clicks. Is there a smart way to do the following task using the HTML of the website?

The website is http://lang-8.com/

What I want to do:

1- Go through each page one by one, by clicking the "Next" button.

2- Open each entry (e.g. 8月23日 (木)).

3- Easy part: View Page Source -> copy/paste the text between these tags and filter out the junk I don't want:

<div class='cfx' id='body_show'>
<div id='body_show_ori'>
僕は今、「モブサイコ100」というアニメを見ています。見たことがありますか?とても面白いアニメです。このシーンを見て下さい。でも、日本語のバージョンをユーチュブで見つけませんでした。<br/><object width="560" height="315">
<param name="movie" value="https://www.youtube.com/v/gnj9J6bO2c4"></param>
<embed src="https://www.youtube.com/v/gnj9J6bO2c4" type="application/x-shockwave-flash" width="560" height="315"></embed>
</object>
<br/>そのキャラは、他のバカげた「絶技」もあります(笑)。サイキックの真似にしているごとしだけですが。
</div>
<div id='body_show_mo'>
I'm currently watching an anime called "Mob Psycho 100". Have you seen it? It's a really funny anime. Please watch the following scene. I was unable to find the Japanese version on YouTube though...<br/><object width="560" height="315">
<param name="movie" value="https://www.youtube.com/v/gnj9J6bO2c4"></param>
<embed src="https://www.youtube.com/v/gnj9J6bO2c4" type="application/x-shockwave-flash" width="560" height="315"></embed>
</object>
<br/>That character has other silly "special moves". XD He's nothing more than a con-artist pretending to be a psychic.
</div>

4- Hard part: Get all of the sentence corrections in a nice format. The page source for them looks like:

</div>
<div class='correct_sentence_body'>
<div style=''>
<div>
<span class='sentence' id='sentence_7'>そのキャラは、他のバカげた「絶技」もあります(笑)。</span>
</div>
<div id='corrections_7'>
<ul class='correction_field'>
<li class='correct '>
そのキャラには、他のバカげた「必殺技」もあります(笑)。
<span id='e-correction-67496066'>
<span class="nice_pt_status">
<a class="eval_user_links" href="#" onclick="new Ajax.Updater('eval_users_list', '/journals/eval_users/67496066', {asynchronous:true, evalScripts:true, parameters:'authenticity_token=' + encodeURIComponent('98JA0ldsHLF4DiB+nbMCKcfRK1pWuBduv3oGx1SVIh8=')}); return false;">2 people</a>
think this correction is good.
</span>
</span>
</li>
<li class='correct '>
<span class="f_red">この</span>キャラは、他<span class="f_red">にも</span>バカげた「<span class="f_red">必殺</span>技」<span class="f_red">を持って</span>ます(笑)。
<span id='e-correction-67498048'>
<span class="nice_pt_status">
</span>

I realise I can find these by looking for "sentence_7" or any other number, looping from 1 up to the first number that is not found.

5- Then, parse the text so that things like <span class="f_red"> turn the text colour red in Word (I will also need strikethrough).

I think I could hardcode a lot of that, but it would take me a very long time, and there may be a better way to do this with a few fancy function calls.

Things I don't know how to do without pixel searches and mouse clicks: 1, 2 and 3.
Things I don't know how to do at all: 5.

Thank you in advance.
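For steps 4 and 5, one possible approach is to let an HTML parser walk the markup instead of hardcoding string searches. The sketch below is in Python rather than AutoIt, purely as an illustration; `CorrectionParser` and `parse_corrections` are made-up names, and the sample is a trimmed copy of the second correction pasted above. It assumes that `f_red` marks the red text and that the `e-correction-…` span only holds the rating widget, which is all the pasted source shows.

```python
import re
from html.parser import HTMLParser


class CorrectionParser(HTMLParser):
    """Collect each <li class='correct'> as a list of (text, is_red) runs."""

    def __init__(self):
        super().__init__()
        self.corrections = []  # one list of runs per correction
        self._runs = None      # runs of the <li> being read, None when outside
        self._spans = []       # kind of each open <span>: red, skip or plain

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "li" and "correct" in classes:
            self._finish()     # the pasted source may leave an <li> unclosed
            self._runs, self._spans = [], []
        elif tag == "span" and self._runs is not None:
            if "f_red" in classes:
                self._spans.append("red")
            elif (attrs.get("id") or "").startswith("e-correction-"):
                self._spans.append("skip")  # rating widget, not correction text
            else:
                self._spans.append("plain")

    def handle_endtag(self, tag):
        if tag == "span" and self._spans:
            self._spans.pop()
        elif tag == "li":
            self._finish()

    def handle_data(self, data):
        if self._runs is not None and "skip" not in self._spans:
            self._runs.append((re.sub(r"\s+", " ", data), "red" in self._spans))

    def close(self):
        super().close()
        self._finish()

    def _finish(self):
        """Trim outer whitespace and store the current correction, if any."""
        runs = self._runs or []
        while runs and not runs[0][0].strip():
            runs.pop(0)
        while runs and not runs[-1][0].strip():
            runs.pop()
        if runs:
            runs[0] = (runs[0][0].lstrip(), runs[0][1])
            runs[-1] = (runs[-1][0].rstrip(), runs[-1][1])
            self.corrections.append(runs)
        self._runs = None


def parse_corrections(html):
    """Return every correction in `html` as a list of (text, is_red) runs."""
    parser = CorrectionParser()
    parser.feed(html)
    parser.close()
    return parser.corrections


# Trimmed from the page source quoted in the post (left unclosed, as pasted).
SAMPLE = """
<ul class='correction_field'>
<li class='correct '>
<span class="f_red">この</span>キャラは、他<span class="f_red">にも</span>バカげた「<span class="f_red">必殺</span>技」<span class="f_red">を持って</span>ます(笑)。
<span id='e-correction-67498048'> <span class="nice_pt_status"> </span>
"""

if __name__ == "__main__":
    for runs in parse_corrections(SAMPLE):
        print(runs)
```

Each run keeps its text together with a red/not-red flag, so the formatting survives until the Word step. If the third-party python-docx package is an option (an assumption; nothing in the thread mentions it), a run could become `paragraph.add_run(text)` with `run.font.color.rgb` set for red runs, and `run.font.strike` could cover strikethrough once the class the site uses for it is known.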
caramen Posted August 24, 2018 (edited)

Well... I do not want to discourage you, and I'll try to do the opposite, since the community of this forum will help you understand everything but won't write the code for you.

My advice: write a reproducer for each thing you want to accomplish. A reproducer is a very small sample of code that exercises just the function you are trying out, so you can learn how it behaves. Keep each thing in its own topic, and ask about the part of the code you do not understand. Then, when you have all the knowledge, you can build your final project with your own code, and if you run into problems we are here for you.

I can give you some places to start.

Browser automation (search the forum for these subjects):
UIA-automation > FF, Chrome, IE automation
FF_UDF.au3 > FF automation
WebDriver_UDF.au3 > Chrome automation (Thx @water)

Text manipulation (see the help file for these):
String functions
ClipGet()
Send()

Maybe someone will add something I forgot...
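As an illustration of what such a reproducer might look like for step 3 alone, here is a small self-contained sketch. It is written in Python rather than AutoIt, and `DivTextExtractor` and `extract_div_text` are hypothetical names. It pulls the text out of one `<div>` by id, drops everything inside `<object>`, and treats `<br/>` as a line break; the sample markup is a shortened stand-in for the page source in the first post.

```python
from html.parser import HTMLParser


class DivTextExtractor(HTMLParser):
    """Collect the visible text inside the <div> that has a given id."""

    def __init__(self, div_id):
        super().__init__()
        self.div_id = div_id
        self.depth = 0   # <div> nesting inside the target, 0 means outside
        self.skip = 0    # nesting inside <object> (the embedded video junk)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.depth == 0:
            if tag == "div" and dict(attrs).get("id") == self.div_id:
                self.depth = 1
            return
        if tag == "div":
            self.depth += 1
        elif tag == "object":
            self.skip += 1
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if self.depth == 0:
            return
        if tag == "div":
            self.depth -= 1
        elif tag == "object" and self.skip:
            self.skip -= 1

    def handle_data(self, data):
        if self.depth and not self.skip:
            self.parts.append(data)


def extract_div_text(html, div_id):
    """Return the text of the div, one line per <br/>, whitespace tidied."""
    parser = DivTextExtractor(div_id)
    parser.feed(html)
    parser.close()
    lines = [" ".join(line.split()) for line in "".join(parser.parts).split("\n")]
    return "\n".join(line for line in lines if line)


# Shortened stand-in for the quoted page source.
SAMPLE = """
<div class='cfx' id='body_show'>
<div id='body_show_ori'> first line<br/><object width="560" height="315">
<param name="movie" value="https://www.youtube.com/v/gnj9J6bO2c4"></param>
</object> <br/>second line </div>
<div id='body_show_mo'> translated line </div>
"""

if __name__ == "__main__":
    print(extract_div_text(SAMPLE, "body_show_ori"))
    print(extract_div_text(SAMPLE, "body_show_mo"))
```

Because it works on a string, the same function can be tried on a saved copy of the page source before any browser automation is involved.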
Pured (Author) Posted August 24, 2018

@caramen I can code it (apart from #5). It would just take me a long time, so I asked before starting to see whether there were better ways. I will read up on the names you've provided. Thanks. I'll update this thread if I have any issues.
junkew Posted August 25, 2018

Better ways start with FAQ 31 ("How to click some elements"), where you will find different UDFs for clicking, navigating, and scraping the screen. For what you have described, the IE UDF would be a good start.
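Steps 1 and 2 can also be approached from the HTML instead of clicking: find the link whose text is "Next" and follow its `href`. The sketch below shows the idea in Python, not AutoIt, and is only a rough outline. The real markup of the site's listing pages is not shown in this thread, so the link label, the `fetch` callable, and all function names here are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (href, visible text) for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def collect_links(html, base_url):
    """Return (absolute URL, link text) pairs found in `html`."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return [(urljoin(base_url, href), text) for href, text in parser.links]


def find_next_url(html, base_url, label="Next"):
    """Return the URL of the first link labelled `label`, or None."""
    for url, text in collect_links(html, base_url):
        if text == label:
            return url
    return None


def walk_pages(fetch, start_url, max_pages=50):
    """Yield (url, html) per listing page, following "Next" links.

    `fetch` is any callable that maps a URL to its HTML text (hypothetical;
    it could wrap a browser automation library or an HTTP client).
    """
    url, seen = start_url, set()
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        html = fetch(url)
        yield url, html
        url = find_next_url(html, url)
```

Passing `fetch` in as a parameter keeps the page-walking logic testable against saved HTML, and the `seen` set and `max_pages` limit guard against a "Next" link that loops back on itself. Entry links on each page could be picked out of `collect_links` in the same way once their pattern is known.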