Jury
Posted December 14, 2019 (edited)

Hi,

I used to batch convert Word files (see attached file) to HTML and it worked fine, but recently they've started putting some title page information in a different way. After conversion I use a RegExp to capture the text associated with what I believe are inline images:

    <img.*?001.*?002.*?alt="(.*?)"><img.*?(?=</p>)

but this doesn't always work well. Because these files come in large batches, I'd like to get the text displayed in the HTML during the Word to HTML conversion stage. Is this possible, or am I barking up the wrong tree?

This is the information on the front page I wish to capture in the HTML:

    [2019] UKFTT 0717 (TC)
    TC07488
    CAPITAL GAINS TAX – Contracts for land and properties to be built in Barbados – Properties not built – Disposal of contractual rights – Whether payments made under contracts prior to disposal of rights gave rise to losses for CGT purposes – Appeal allowed

Thanks for any help/hints. Here is a portion of my conversion script (thanks to those such as Water who've helped me with examples in the past):

    $oDoc = _Word_DocOpen($oWord, $processing & $sFileName)
    ; Save the file in the format(s) selected by $convert2.
    ; Word save formats used: 10 = filtered HTML, 6 = RTF, 17 = PDF
    Select
        Case $convert2 = 1 ; HTML only
            Local $i_Format = 10
            _Word_DocSaveAs($oDoc, $out & $fn & ".html", $i_Format)
            If @error <> 0 Then Exit MsgBox($MB_SYSTEMMODAL, "Word UDF: _Word_DocSaveAs Example", _
                    "Error saving the Word document." & @CRLF & "@error = " & @error & ", @extended = " & @extended)
        Case $convert2 = 2 ; RTF only
            Local $i_Format = 6
            _Word_DocSaveAs($oDoc, $out & $fn & ".rtf", $i_Format)
        Case $convert2 = 3 ; PDF only
            Local $i_Format = 17
            _Word_DocSaveAs($oDoc, $out & $fn & ".pdf", $i_Format)
        Case $convert2 = 4 ; HTML and RTF
            Local $i_Format = 10
            _Word_DocSaveAs($oDoc, $out & $fn & ".html", $i_Format)
            Local $i_Format = 6
            _Word_DocSaveAs($oDoc, $out & $fn & ".rtf", $i_Format)
        Case $convert2 = 5 ; HTML and PDF (an existing PDF is not overwritten)
            Local $i_Format = 10
            _Word_DocSaveAs($oDoc, $out & $fn & ".html", $i_Format)
            Local $i_Format = 17
            $iFileExists = FileExists($out & $fn & ".pdf")
            If $iFileExists Then
                ConsoleWrite("PDF exists: " & $out & $fn & ".pdf" & @CRLF)
            Else
                _Word_DocSaveAs($oDoc, $out & $fn & ".pdf", $i_Format)
            EndIf
        Case $convert2 = 6 ; HTML, RTF and PDF
            Local $i_Format = 10
            _Word_DocSaveAs($oDoc, $out & $fn & ".html", $i_Format)
            Local $i_Format = 6
            _Word_DocSaveAs($oDoc, $out & $fn & ".rtf", $i_Format)
            Local $i_Format = 17
            _Word_DocSaveAs($oDoc, $out & $fn & ".pdf", $i_Format)
    EndSelect
    _Word_DocClose($oDoc, $WdSaveChanges)

Attachment: TC07488.docx

Edited December 14, 2019 by Jury (typo)

Danp2
Posted December 14, 2019

1 hour ago, Jury said:
    This is the information on the front page I wish to capture in the html:

Looking at the document, that info is stored in text boxes. You may want to review this thread --

P.S. When I manually saved the file as HTML, the text you're looking for was present in the resulting file, so it's unclear to me what exactly isn't working for you. You may want to provide more details (Word version, AutoIt version, etc.) and a short reproducer script that we could actually run to test the issue.

Jury (Author)
Posted December 14, 2019

Thanks Danp2. Oh, text boxes; no wonder my searches didn't come up with anything! You've given me some good threads to start with.

Yes, sorry, I can see the information in the HTML, but it is often in a strange position on the page, sometimes even over the top of other text on the page, so I'd like to capture it in a variable and have more control over where it is placed.

Again, thanks for taking the time.

Joe
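
For capturing the text-box contents in a variable, as discussed above, here is a minimal sketch using the Word UDF together with Word's own object model (the Shapes collection and each shape's TextFrame). It is not taken from the thread Danp2 linked. The file path and the function name _GetTextBoxText are hypothetical placeholders, and it assumes the text boxes are floating shapes in the main document body; text held in inline shapes, grouped shapes, or headers would need extra handling.

```autoit
#include <Word.au3>

; Hypothetical input path, for illustration only
Local $sFile = @ScriptDir & "\TC07488.docx"

Local $oWord = _Word_Create(False) ; start Word hidden
If @error Then Exit ConsoleWrite("Could not start Word, @error = " & @error & @CRLF)

; Open read-only so the source document is never modified
Local $oDoc = _Word_DocOpen($oWord, $sFile, False, $WdOpenFormatAuto, True)
If @error Then
    ConsoleWrite("Could not open document, @error = " & @error & @CRLF)
    _Word_Quit($oWord)
    Exit
EndIf

Local $sTitleInfo = _GetTextBoxText($oDoc)
ConsoleWrite($sTitleInfo & @CRLF)

_Word_DocClose($oDoc, $WdDoNotSaveChanges)
_Word_Quit($oWord)

; Hypothetical helper: concatenates the text of every shape in the
; document body that has a text frame containing text.
Func _GetTextBoxText($oDoc)
    Local $sText = ""
    For $oShape In $oDoc.Shapes
        ; HasText is non-zero only when the shape's text frame holds text
        If $oShape.TextFrame.HasText Then
            $sText &= $oShape.TextFrame.TextRange.Text & @CRLF
        EndIf
    Next
    Return $sText
EndFunc   ;==>_GetTextBoxText
```

In a batch script like the one in the first post, the helper could be called right after _Word_DocOpen and before _Word_DocSaveAs, so the captured string can be placed wherever it is needed in the generated HTML instead of relying on a RegExp over the converted output.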