Posted

This is my first personal project, and I have been wanting to write a script for it for a while now, but I just never got around to it.
Up until now, all my work scripts revolve around pixel checks and mouse clicks.

Is there a smart way to do the following task using the HTML of the website? The website is http://lang-8.com/

What I want to do:
1- Go through each page 1 by 1, by clicking the "Next" button.
2- Open each entry (e.g. 8月23日 (木))
3- Easy part: View Page Source -> Copy/Paste The text between these tags and filter out the junk I don't want:

<div class='cfx' id='body_show'>
<div id='body_show_ori'>
僕は今、「モブサイコ100」というアニメを見ています。見たことがありますか?とても面白いアニメです。このシーンを見て下さい。でも、日本語のバージョンをユーチュブで見つけませんでした。<br/><object width="560" height="315">
<param name="movie" value="https://www.youtube.com/v/gnj9J6bO2c4"></param>
<embed src="https://www.youtube.com/v/gnj9J6bO2c4" type="application/x-shockwave-flash" width="560" height="315"></embed>
</object>
<br/>そのキャラは、他のバカげた「絶技」もあります(笑)。サイキックの真似にしているごとしだけですが。
</div>
<div id='body_show_mo'>
I&#39;m currently watching an anime called &quot;Mob Psycho 100&quot;. Have you seen it? It&#39;s a really funny anime. Please watch the following scene. I was unable to find the Japanese version on YouTube though...<br/><object width="560" height="315">
<param name="movie" value="https://www.youtube.com/v/gnj9J6bO2c4"></param>
<embed src="https://www.youtube.com/v/gnj9J6bO2c4" type="application/x-shockwave-flash" width="560" height="315"></embed>
</object>
<br/>That character has other silly &quot;special moves&quot;. XD He&#39;s nothing more than a con-artist pretending to be a psychic.
</div>
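
For step 3, here is a minimal sketch of pulling the text out of one of those divs once the page source is in a string. It is written in Python with only the standard library, purely to show the logic; the same idea carries over to AutoIt string functions. The names `DivTextExtractor` and `extract_div_text` are made up for this example, and it assumes the page really uses the ids `body_show_ori` and `body_show_mo` as in the source above.

```python
from html.parser import HTMLParser


class DivTextExtractor(HTMLParser):
    """Collect the visible text inside the <div> with a given id."""

    def __init__(self, target_id):
        super().__init__()  # default convert_charrefs=True decodes &#39; and &quot;
        self.target_id = target_id
        self.depth = 0   # <div> nesting depth inside the target; 0 means outside
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            if self.depth:
                self.depth += 1
            elif dict(attrs).get("id") == self.target_id:
                self.depth = 1
        elif tag == "br" and self.depth:
            self.parts.append("\n")  # treat <br/> as a line break

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # <object>, <param> and <embed> carry no text, so they drop out here.
        if self.depth:
            self.parts.append(data)


def extract_div_text(page_source, div_id):
    """Return the text of the div with id `div_id`, one non-empty line per row."""
    parser = DivTextExtractor(div_id)
    parser.feed(page_source)
    parser.close()
    lines = (line.strip() for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```

Calling `extract_div_text(page_source, "body_show_ori")` should give the original text and `"body_show_mo"` the translation, with the YouTube embed filtered out.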

4- Hard part: Get all of these things in a nice format:
[screenshot: the corrections shown under an entry]
Of which the page source looks like:  

</div>
<div class='correct_sentence_body'>
<div style=''>
<div>
<span class='sentence' id='sentence_7'>そのキャラは、他のバカげた「絶技」もあります(笑)。</span>
</div>
<div id='corrections_7'>
<ul class='correction_field'>
<li class='correct '>
そのキャラには、他のバカげた「必殺技」もあります(笑)。
<span id='e-correction-67496066'>
<span class="nice_pt_status">

  <a class="eval_user_links" href="#" onclick="new Ajax.Updater('eval_users_list', '/journals/eval_users/67496066', {asynchronous:true, evalScripts:true, parameters:'authenticity_token=' + encodeURIComponent('98JA0ldsHLF4DiB+nbMCKcfRK1pWuBduv3oGx1SVIh8=')}); return false;">2 people</a> think this correction is good.

</span>

</span>
</li>
<li class='correct '>
<span class="f_red">この</span>キャラは、他<span class="f_red">にも</span>バカげた「<span class="f_red">必殺</span>技」<span class="f_red">を持って</span>ます(笑)。
<span id='e-correction-67498048'>
<span class="nice_pt_status">

</span>

I realise I can find these by looking for "sentence_7" or any other number, which I can loop from 1->first-non-found-number.
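
A rough sketch of that loop, again in Python with the standard library only and just to illustrate the idea. `SentenceCollector` and `sentences_in_order` are hypothetical names, and the sketch assumes each `sentence_N` span holds plain text with no nested spans, as in the sample above.

```python
from html.parser import HTMLParser


class SentenceCollector(HTMLParser):
    """Map ids such as 'sentence_7' to the text of the matching <span>."""

    def __init__(self):
        super().__init__()
        self.sentences = {}
        self.current_id = None

    def handle_starttag(self, tag, attrs):
        span_id = dict(attrs).get("id") or ""
        if tag == "span" and span_id.startswith("sentence_"):
            self.current_id = span_id
            self.sentences[span_id] = ""

    def handle_endtag(self, tag):
        # Assumption: sentence spans are not nested, so any </span> closes one.
        if tag == "span":
            self.current_id = None

    def handle_data(self, data):
        if self.current_id is not None:
            self.sentences[self.current_id] += data


def sentences_in_order(page_source):
    """Return sentence_1, sentence_2, ... up to the first number not found."""
    collector = SentenceCollector()
    collector.feed(page_source)
    collector.close()
    result = []
    number = 1
    while f"sentence_{number}" in collector.sentences:
        result.append(collector.sentences[f"sentence_{number}"].strip())
        number += 1
    return result
```

The same pattern should work for the `corrections_N` blocks, since they share the number of the sentence they belong to.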

5- Then, parse the text so that things like <span class="f_red"> turns the text colour red in Word (I will also need strikethrough).
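
One way to approach step 5 is to first split each correction into runs of (text, style) and only then hand the runs to whatever writes the Word document. The sketch below is Python, standard library only, and is an illustration rather than a finished solution. `RunCollector` and `parse_runs` are invented names, and only `f_red` comes from the page source above; the class used for strikethrough is not shown in this post, so it has to be looked up and added to the mapping.

```python
from html.parser import HTMLParser


class RunCollector(HTMLParser):
    """Split one correction into (text, style) runs; style is None for plain text."""

    def __init__(self, class_styles):
        super().__init__()
        self.class_styles = class_styles
        self.stack = []  # one entry per open <span>: its style, or None
        self.runs = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.stack.append(self.class_styles.get(dict(attrs).get("class")))

    def handle_endtag(self, tag):
        if tag == "span" and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.replace("\n", "")  # drop line breaks from the page source
        if not text:
            return
        # The innermost styled <span> decides how this run is formatted.
        style = next((s for s in reversed(self.stack) if s), None)
        self.runs.append((text, style))


def parse_runs(correction_html, class_styles=None):
    """Return a list of (text, style) tuples for the HTML of one correction."""
    if class_styles is None:
        class_styles = {"f_red": "red"}  # extend once the strikethrough class is known
    collector = RunCollector(class_styles)
    collector.feed(correction_html)
    collector.close()
    return collector.runs
```

Each run can then be written with the matching font colour or strikethrough setting, which keeps the HTML parsing separate from the Word formatting.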

 

I think I could hardcode a lot of that, but it would take me a very long time, and there may be a better way to do this with a few fancy function calls.
Things I don't know how to do without doing pixel searches and mouse clicks: 1, 2 and 3.
Things I don't know how to do at all: 5.

Thank you in advance.

Posted (edited)

Well... I do not want to discourage you, and I'll try to do the opposite, since the community of this forum will help you understand everything but won't write the code for you.

 

My advice: you should write a reproducer for each thing you want to accomplish.

A reproducer means a very small sample of code that exercises the one function you are trying out, so you can learn how it behaves.

 

Put each thing in its own topic, and ask about the part of the code you do not understand. Once you have all the knowledge, you can build your final project with your own code, and if you run into problems we are here for you.

 

I can give you some places to start.

First thing:

Browser automation (search the forum for these subjects):

UIAutomation > Firefox, Chrome, and IE automation

FF_UDF.au3 > Firefox automation

WebDriver_UDF.au3 > Chrome automation (thanks @water ;) )

Text manipulation (see the help file for these):

String functions

ClipGet()

Send()

Maybe someone will add something I forgot... :)

Edited by caramen


Posted

@caramen I can code it (apart from #5). It would just take me a long time, so I was asking before I started to see if there were better ways.

I will read up on the names you've provided. Thanks.
I'll update this thread if I have any issues.
