Posted (edited)

Hello, guys.

I'm having trouble capturing a certain section of text from a webpage (using Internet Explorer).

1. My goal
What I want is to copy a certain section of text from several webpages and then paste it into a Word document. Basically, it's just clicking, dragging (to select the text), copying and pasting it just the way it is into the Word document.

2. Webpage structure (see attached image "Screenshot")
The page consists of many sections of text, each with a small header. The sections vary in size, and that's the root of the problem (it would be easy to use MouseClickDrag to select the text if they all had the same size).

Looking at the attached screenshot (I covered what might be confidential information), the section I want to copy starts at 1 ("Dados do...") and ends at 2 ("Partes do..."), that is, everything inside the red square. I want to copy and paste it with the same formatting.

3. What I tried already
a. Click, drag and copy: doesn't work because the sections of text vary in size
b. _IEBodyReadText: this almost works, but the text loses the format

I was thinking that maybe something could be done with _IEBodyReadHTML and then translating the HTML formatting instructions to Word, but I haven't had any insights so far. Would it work?
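For what it's worth, the _IEBodyReadHTML idea could look something like the untested sketch below. It cuts the body HTML between the two marker words, saves the fragment to a temporary .html file, and lets Word do the HTML-to-Word conversion through Selection.InsertFile. The helper name _SliceBetween, the temp file name, and the choice to attach to any open IE instance are assumptions for illustration. Cutting at the visible words can leave unclosed tags at the edges, and formatting that comes from an external stylesheet rather than from the HTML itself would likely not carry over.

```autoit
#include <IE.au3>
#include <Word.au3>
#include <FileConstants.au3>

; Hypothetical helper: return the part of $sText that starts at the first
; $sStart and stops just before the next $sEnd.
Func _SliceBetween($sText, $sStart, $sEnd)
    Local $iStart = StringInStr($sText, $sStart)
    If $iStart = 0 Then Return SetError(1, 0, "")
    Local $iEnd = StringInStr($sText, $sEnd, 0, 1, $iStart + StringLen($sStart))
    If $iEnd = 0 Then Return SetError(2, 0, "")
    Return StringMid($sText, $iStart, $iEnd - $iStart)
EndFunc

; Attach to an IE window that is already open (assumption: any instance will do).
Local $oIE = _IEAttach("", "instance")
If @error Then Exit 1

; "Dados do" and "Movimentações" are the marker words from the post.
Local $sPart = _SliceBetween(_IEBodyReadHTML($oIE), "Dados do", "Movimentações")
If @error Then Exit 2

; Wrap the fragment in a minimal page and save it as UTF-8.
Local $sFile = @TempDir & "\section.html" ; hypothetical file name
Local $hFile = FileOpen($sFile, $FO_OVERWRITE + $FO_UTF8)
FileWrite($hFile, '<html><head><meta charset="utf-8"></head><body>' & $sPart & '</body></html>')
FileClose($hFile)

; Let Word import the HTML file at the cursor of a new document.
Local $oWord = _Word_Create()
Local $oDoc = _Word_DocAdd($oWord)
$oWord.Selection.InsertFile($sFile)
```

The slicing is done on the raw HTML, so the markers must appear in the source exactly as typed here (no tags or entities in the middle of the word).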

Is there a simpler way?

Note: the section "Movimentações" always comes right after, in case that helps somehow.

 

Versions:
AutoIt v3.3.14.5
Windows 7
IE 11

Screenshot.png

Edited by vitorbf
Posted

Thanks for the reply, jdelaney!

Sorry for my lack of knowledge, but would the .text you mentioned keep the current formatting (the same way a copy/paste to Word with mouse selection does)? I already know how to copy and paste the text itself by using _IEBodyReadText, but that way the formatting is lost.

 

I have been looking into the IE functions, and I can feel the answer is right around the corner, but I lack knowledge of HTML structures and manipulation... Could you point me in the right direction, for example by naming a specific function or outlining how I could achieve my goal?

 

Sorry again if I'm asking too much. I'm not a programmer, and I'm also new to AutoIt.

  • vitorbf changed the title to How to copy/paste text of random size from webpage and keep format
Posted

It would be kind of involved. You'd have to grab all the formatting and apply that formatting yourself in Word. So it would be overkill for what you are attempting to do. But it would also be the most accurate way to grab the data consistently.

 

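#include <IE.au3>

; Open the page (second parameter 1 = try to attach to an existing window first)
Local $oIE = _IECreate('https://www.google.com/', 1)
; Get the element by its id, then read its value and its computed style
Local $oObj = _IEGetObjById($oIE, 'gbqfbb')
ConsoleWrite($oObj.getAttribute('value') & @CRLF)
ConsoleWrite($oObj.currentStyle.color & @CRLF)
ConsoleWrite($oObj.currentStyle.fontFamily & @CRLF)
ConsoleWrite($oObj.currentStyle.fontSize & @CRLF)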

output:
I'm Feeling Lucky
#757575
arial,sans-serif
13px
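To give an idea of what "adding the formatting yourself" would involve, here is a rough sketch that converts style values like the ones printed above into Word font settings. The three helper functions are hypothetical names made up for this example. It assumes the color comes back as "#RRGGBB" (IE can also return named colors), that sizes are in px at 96 DPI, and that the first font in the family list is installed.

```autoit
#include <Word.au3>

; Hypothetical helper: "#RRGGBB" -> the integer Word expects for Font.Color
; (red in the low byte, blue in the high byte).
Func _CssHexToWordColor($sHex)
    Local $aRGB = StringRegExp($sHex, "^#([0-9A-Fa-f]{2})([0-9A-Fa-f]{2})([0-9A-Fa-f]{2})$", 1)
    If @error Then Return SetError(1, 0, 0)
    Return Dec($aRGB[2]) * 65536 + Dec($aRGB[1]) * 256 + Dec($aRGB[0])
EndFunc

; Hypothetical helper: "13px" -> points, assuming 96 DPI (1px = 0.75pt),
; rounded to the nearest half point.
Func _CssPxToPoints($sPx)
    Local $fPoints = Number(StringReplace($sPx, "px", "")) * 0.75
    Return Round($fPoints * 2) / 2
EndFunc

; Hypothetical helper: "arial,sans-serif" -> "arial" (first family, quotes removed).
Func _CssFirstFontFamily($sFamilies)
    Local $aParts = StringSplit($sFamilies, ",")
    Local $sFirst = StringReplace(StringReplace($aParts[1], '"', ''), "'", "")
    Return StringStripWS($sFirst, 3)
EndFunc

; The literal values below are copied from the output above; in a real script
; they would be read from $oObj.currentStyle for each element.
Local $oWord = _Word_Create()
Local $oDoc = _Word_DocAdd($oWord)
Local $oRange = $oDoc.Content
$oRange.Text = "I'm Feeling Lucky"
$oRange.Font.Name = _CssFirstFontFamily("arial,sans-serif")
$oRange.Font.Size = _CssPxToPoints("13px")
$oRange.Font.Color = _CssHexToWordColor("#757575")
```

This only covers font name, size, and color for a single element. Bold, italics, alignment, tables, and so on would each need their own mapping, which is why this route gets involved quickly.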

 

Posted

@vitorbf

Maybe like in this example:

 

btw.
:welcome: to the forum.

 

Signature beginning:
Please remember: "AutoIt"..... *  Wondering who uses AutoIt and what it can be used for ? * Forum Rules *
ADO.au3 UDF * POP3.au3 UDF * XML.au3 UDF * IE on Windows 11 * How to ask ChatGPT for AutoIt Codefor other useful stuff click the following button:

Spoiler

Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind. 

My contribution (my own projects): * Debenu Quick PDF Library - UDF * Debenu PDF Viewer SDK - UDF * Acrobat Reader - ActiveX Viewer * UDF for PDFCreator v1.x.x * XZip - UDF * AppCompatFlags UDF * CrowdinAPI UDF * _WinMergeCompare2Files() * _JavaExceptionAdd() * _IsBeta() * Writing DPI Awareness App - workaround * _AutoIt_RequiredVersion() * Chilkatsoft.au3 UDF * TeamViewer.au3 UDF * JavaManagement UDF * VIES over SOAP * WinSCP UDF * GHAPI UDF - modest begining - comunication with GitHub REST APIErrorLog.au3 UDF - A logging Library * Include Dependency Tree (Tool for analyzing script relations) * Show_Macro_Values.au3 *

 

My contribution to others projects or UDF based on  others projects: * _sql.au3 UDF  * POP3.au3 UDF *  RTF Printer - UDF * XML.au3 UDF * ADO.au3 UDF SMTP Mailer UDF * Dual Monitor resolution detection * * 2GUI on Dual Monitor System * _SciLexer.au3 UDF * SciTE - Lexer for console pane

Useful links: * Forum Rules * Forum etiquette *  Forum Information and FAQs * How to post code on the forum * AutoIt Online Documentation * AutoIt Online Beta Documentation * SciTE4AutoIt3 getting started * Convert text blocks to AutoIt code * Games made in Autoit * Programming related sites * Polish AutoIt Tutorial * DllCall Code Generator * 

Wiki: Expand your knowledge - AutoIt Wiki * Collection of User Defined Functions * How to use HelpFile * Good coding practices in AutoIt * 

OpenOffice/LibreOffice/XLS Related: WriterDemo.au3 * XLS/MDB from scratch with ADOX

IE Related:  * How to use IE.au3  UDF with  AutoIt v3.3.14.x * Why isn't Autoit able to click a Javascript Dialog? * Clicking javascript button with no ID * IE document >> save as MHT file * IETab Switcher (by LarsJ ) * HTML Entities * _IEquerySelectorAll() (by uncommon) * IE in TaskSchedulerIE Embedded Control Versioning (use IE9+ and HTML5 in a GUI) * PDF Related:How to get reference to PDF object embeded in IE * IE on Windows 11

I encourage you to read: * Global Vars * Best Coding Practices * Please explain code used in Help file for several File functions * OOP-like approach in AutoIt * UDF-Spec Questions *  EXAMPLE: How To Catch ConsoleWrite() output to a file or to CMD *

I also encourage you to check awesome @trancexx code:  * Create COM objects from modules without any demand on user to register anything. * Another COM object registering stuffOnHungApp handlerAvoid "AutoIt Error" message box in unknown errors  * HTML editor

winhttp.au3 related : * https://www.autoitscript.com/forum/topic/206771-winhttpau3-download-problem-youre-speaking-plain-http-to-an-ssl-enabled-server-port/

"Homo sum; humani nil a me alienum puto" - Publius Terentius Afer
"Program are meant to be read by humans and only incidentally for computers and execute" - Donald Knuth, "The Art of Computer Programming"
:naughty:  :ranting:, be  :) and       \\//_.

Anticipating Errors :  "Any program that accepts data from a user must include code to validate that data before sending it to the data store. You cannot rely on the data store, ...., or even your programming language to notify you of problems. You must check every byte entered by your users, making sure that data is the correct type for its field and that required fields are not empty."

Signature last update: 2023-04-24

Posted

@jdelaney

Thanks for the idea.
That would be hard as hell, though...

@mLipok

Thanks, dude!

I tried your script and used it in the webpage that I wanted, but both "withCSS" and "noCSS" were identical.

 

Maybe it will help if I provide the exact word I want the mouse to hover over, along with the HTML containers. See the picture below. I tried _IEGetObjByName($oIE, "Movimentações "), but it doesn't seem to work.
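A note on why that call probably fails: _IEGetObjByName looks elements up by their name or id attribute, not by the text they display. One possible alternative, sketched below, is to loop over the elements of a given tag and compare their visible text, then ask IE for the element's screen position. The function name _FindElementByText is made up for this example, and the "h2" tag is only a guess, since the real tag is in the screenshot; it should be replaced with the tag that actually wraps the word.

```autoit
#include <IE.au3>

; Hypothetical helper: return the first element with tag $sTag whose
; visible text contains $sText, or 0 (with @error set) if none is found.
Func _FindElementByText($oIE, $sTag, $sText)
    Local $oElems = _IETagNameGetCollection($oIE, $sTag)
    If @error Then Return SetError(1, 0, 0)
    For $oElem In $oElems
        If StringInStr(_IEPropertyGet($oElem, "innertext"), $sText) Then Return $oElem
    Next
    Return SetError(2, 0, 0)
EndFunc

; Attach to an IE window that is already open (assumption: any instance will do).
Local $oIE = _IEAttach("", "instance")
If @error Then Exit 1

; "h2" is an assumed tag; use the one shown in the page's HTML.
Local $oHeader = _FindElementByText($oIE, "h2", "Movimentações")
If IsObj($oHeader) Then
    ; Screen coordinates of the element's top-left corner.
    Local $iX = _IEPropertyGet($oHeader, "screenx")
    Local $iY = _IEPropertyGet($oHeader, "screeny")
    MouseMove($iX, $iY)
EndIf
```

Picking a broad container tag such as div would match the outermost block that contains the word, so the narrower the tag, the better. The element may also need to be scrolled into view before its screen coordinates are useful for the mouse.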

image.png.a7e699ca82a666abccfe522ba89e8891.png

 

EXTRA:

I searched some more and found topics that are closer to what I want to accomplish, but they are old, and the examples provided throw errors I do not understand (I believe this may be due to version changes).

I'll give some links just to define better what I want:

 

 

Posted
10 hours ago, vitorbf said:

I tried your script and used it in the webpage that I wanted, but both "withCSS" and "noCSS" were identical.

Does this website contain <frame ... or <iframe ... ?

 


Posted

@mLipok

No, no <frame> or <iframe>...

 

Anyway, I found a solution. I used Ctrl+F to highlight the words and PixelSearch to find the pixels with the highlight color, which gives me the coordinates of the beginning of each word.

The solution is so improvised and "ugly" that I don't know if I should share it with the forum, but I'll share it here anyway. (Should I close the topic?)

 

I did something like this:

1) Hit Ctrl+F and type "Dados do..." (where the selection will start) so that the word is highlighted in the webpage

2) Use PixelSearch looking for the first pixel matching the color of the highlight (0x3399FF)

3) Get the coordinates of the pixel

pic01.png.1d9739ea6b48bfe96ccf5f1acce411a4.png

4) Repeat steps 1 to 3 to find the word that always comes right after what I want (word "Movimentações")

pic02.png.afa23008634da9a275c881cee2fd1c5a.png

5) Use MouseClickDrag with the two coordinates. The selection will include the word at the first coordinate and leave out the word "Movimentações", because that word comes after the pixel where the selection ends. This made the selection exactly what I wanted. See below.

pic03.png.8a4f412b8a23c98c80f33b8a8e46a3e8.png
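For anyone landing here later, the five steps above could be scripted roughly as follows. This is only a sketch of the approach, not the exact script that was used. The function name _FindHighlightedWord, the sleep times, and the IEFrame window class are assumptions, and it also assumes that both words are visible on screen without scrolling and that pressing Ctrl+F a second time selects the old search text so the new word replaces it.

```autoit
#include <Word.au3>

Opt("PixelCoordMode", 1) ; absolute screen coordinates
Opt("MouseCoordMode", 1)

; Hypothetical helper: search for $sWord with Ctrl+F and return the [x, y]
; position of the first pixel that has the highlight color.
Func _FindHighlightedWord($sWord, $iColor = 0x3399FF)
    Send("^f")       ; open (or refocus) the Find bar
    Sleep(500)
    Send($sWord, 1)  ; flag 1 = send the text raw
    Sleep(500)       ; give IE time to highlight the match
    Local $aWin = WinGetPos("[CLASS:IEFrame]")
    If @error Then Return SetError(1, 0, 0)
    ; Limit the search to the IE window rectangle.
    Local $aPos = PixelSearch($aWin[0], $aWin[1], $aWin[0] + $aWin[2], $aWin[1] + $aWin[3], $iColor)
    If @error Then Return SetError(2, 0, 0)
    Return $aPos
EndFunc

WinActivate("[CLASS:IEFrame]")
WinWaitActive("[CLASS:IEFrame]", "", 5)

; Steps 1 to 4: locate the start word and the word that follows the section.
Local $aStart = _FindHighlightedWord("Dados do")
Local $aEnd = _FindHighlightedWord("Movimentações")

If IsArray($aStart) And IsArray($aEnd) Then
    ; Step 5: drag from the first highlight to the second and copy.
    MouseClickDrag("left", $aStart[0], $aStart[1], $aEnd[0], $aEnd[1])
    Sleep(300)
    Send("^c")

    ; Paste into a new Word document, formatting included.
    Local $oWord = _Word_Create()
    _Word_DocAdd($oWord)
    $oWord.Selection.Paste()
EndIf
```

Since 0x3399FF is also a common selection color elsewhere in Windows, PixelSearch may pick up something other than the highlighted word if such a pixel appears earlier in the search rectangle, so narrowing the rectangle to the page area would make this more reliable.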
