nihylo

Extract text from several types of documents

4 posts in this topic

Hi,

I apologize if this question has already been asked. I searched a lot, but I couldn't find a simple way to check whether the text "foobar" is present in PDF, Word or HTML files.

I'm using AutoIt for almost the first time.

Thanks in advance.


You are going to have a tough time with PDF files. Their text is usually stored in compressed or encoded streams rather than as plain text, so you will need a program that can extract the contents to a text file before you can search them.
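As a rough illustration of the "extract to a text file first" approach, here is a minimal sketch. It assumes the external pdftotext command-line tool (shipped with Xpdf and Poppler) is installed and on the PATH; that tool is not mentioned in this thread. The function name _PdfContainsText and the sample path are made up for the example.

```autoit
; Sketch only: convert a PDF to plain text with an external tool, then search it.
; ASSUMPTION: pdftotext (Xpdf / Poppler) is installed and reachable via PATH.
; _PdfContainsText is a hypothetical helper name, not part of any UDF.
Func _PdfContainsText($sPdfPath, $sNeedle)
    ; Temporary file that receives the extracted text.
    Local $sTxtPath = @TempDir & "\pdf_extract_" & @AutoItPID & ".txt"

    ; Run: pdftotext "input.pdf" "output.txt" and wait for it to finish.
    Local $iExit = RunWait('pdftotext "' & $sPdfPath & '" "' & $sTxtPath & '"', "", @SW_HIDE)
    If @error Or $iExit <> 0 Then Return SetError(1, 0, False)

    ; Read the extracted text and remove the temporary file.
    Local $sText = FileRead($sTxtPath)
    FileDelete($sTxtPath)

    ; StringInStr is case-insensitive by default and returns 0 when not found.
    Return StringInStr($sText, $sNeedle) > 0
EndFunc

; Example call with a hypothetical path.
If _PdfContainsText("C:\docs\sample.pdf", "foobar") Then
    ConsoleWrite("foobar found in PDF" & @CRLF)
EndIf
```

This will not find text in scanned PDFs that contain only images; those would need OCR.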


To help you search for a solution:

0. Read the Obj/COM reference topic in the AutoIt help file (it is listed below the function reference).

1. Word: search this forum for Word.au3 to find AutoIt samples, and search Google for "Word.Application" to find countless examples of the Word automation object.

2. HTML: search the forum for IE.au3, and search for "shdocvw" and "InternetExplorer.Application" to find countless more examples.

3. PDF: a little harder. Look for OCR on this forum, and/or open the search box in Adobe Reader and use ControlSend to send it the text to find.

; Create COM automation objects for Excel, Word and Internet Explorer
$oExcel = ObjCreate("Excel.Application")
$oWord = ObjCreate("Word.Application")
$oIE = ObjCreate("InternetExplorer.Application")
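To show where the pointers above lead, here is a minimal sketch for the Word and HTML cases. The Word part uses the Word.Application COM object directly and assumes Microsoft Word is installed. The HTML part is a swapped-in simplification: instead of automating Internet Explorer, it strips tags from the raw markup with regular expressions and does not decode entities such as &amp;. Both function names and the sample paths are made up for the example.

```autoit
; Sketch only: check Word and HTML content for a string.
; _WordContainsText and _HtmlContainsText are hypothetical helper names.

; Word: open the document read-only through COM and search its plain text.
; ASSUMPTION: Microsoft Word is installed on the machine.
Func _WordContainsText($sDocPath, $sNeedle)
    Local $oWord = ObjCreate("Word.Application")
    If Not IsObj($oWord) Then Return SetError(1, 0, False)
    $oWord.Visible = False

    ; Documents.Open(FileName, ConfirmConversions, ReadOnly)
    Local $oDoc = $oWord.Documents.Open($sDocPath, False, True)
    If Not IsObj($oDoc) Then
        $oWord.Quit()
        Return SetError(2, 0, False)
    EndIf

    Local $sText = $oDoc.Content.Text
    $oDoc.Close(0) ; 0 = wdDoNotSaveChanges
    $oWord.Quit()
    Return StringInStr($sText, $sNeedle) > 0
EndFunc

; HTML: drop script/style blocks, strip the remaining tags, then search.
; Takes the markup as a string so it works for files and downloaded pages alike.
Func _HtmlContainsText($sHtml, $sNeedle)
    Local $sText = StringRegExpReplace($sHtml, "(?is)<(script|style)\b.*?</\1>", " ")
    $sText = StringRegExpReplace($sText, "(?s)<[^>]*>", "")
    Return StringInStr($sText, $sNeedle) > 0
EndFunc
```

A call might look like _WordContainsText("C:\docs\report.doc", "foobar") or _HtmlContainsText(FileRead("C:\docs\page.html"), "foobar"). Both searches are case-insensitive, because that is the default for StringInStr.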

