
Posted (edited)

Hi,

I used to batch-convert Word files (see attached file) to HTML and it worked fine, but recently they've started presenting some of the title page information in a different way. After conversion I use a RegExp to capture the text associated with what I believe are inline images:

<img.*?001.*?002.*?alt="(.*?)"><img.*?(?=</p>)

but this doesn't always work well, and because these files come in large batches I'd like to get the text displayed in the HTML during the Word-to-HTML conversion stage. Is this possible, or am I barking up the wrong tree?

This is the information on the front page I wish to capture in the html:

[2019] UKFTT 0717 (TC)

TC07488

CAPITAL GAINS TAX – Contracts for land and properties to be built in Barbados – Properties not built – Disposal of contractual rights – Whether payments made under contracts prior to disposal of rights gave rise to losses for CGT purposes – Appeal allowed


 

 

Thanks for any help/hints. 

Here is a portion of my conversion script (thanks to those, such as Water, who've helped me with examples in the past):

 

        $oDoc = _Word_DocOpen($oWord, $processing & $sFileName)
        ; Save the file in the format(s) selected by $convert2.
        ; WdSaveFormat values used below: 6 = RTF, 10 = filtered HTML, 17 = PDF.
        Local $i_Format
        Select
            Case $convert2 = 1 ; HTML only
                $i_Format = 10
                _Word_DocSaveAs($oDoc, $out & $fn & ".html", $i_Format)
                If @error <> 0 Then Exit MsgBox($MB_SYSTEMMODAL, "Word UDF: _Word_DocSaveAs Example", _
                        "Error saving the Word document." & @CRLF & "@error = " & @error & ", @extended = " & @extended)

            Case $convert2 = 2 ; RTF only
                $i_Format = 6
                _Word_DocSaveAs($oDoc, $out & $fn & ".rtf", $i_Format)
            Case $convert2 = 3 ; PDF only
                $i_Format = 17
                _Word_DocSaveAs($oDoc, $out & $fn & ".pdf", $i_Format)
            Case $convert2 = 4 ; HTML and RTF
                $i_Format = 10
                _Word_DocSaveAs($oDoc, $out & $fn & ".html", $i_Format)
                $i_Format = 6
                _Word_DocSaveAs($oDoc, $out & $fn & ".rtf", $i_Format)
            Case $convert2 = 5 ; HTML, plus PDF unless one already exists
                $i_Format = 10
                _Word_DocSaveAs($oDoc, $out & $fn & ".html", $i_Format)
                $i_Format = 17
                $iFileExists = FileExists($out & $fn & ".pdf")
                If $iFileExists Then
                    ConsoleWrite("PDF exists:   " & $out & $fn & ".pdf" & @CRLF)
                Else
                    _Word_DocSaveAs($oDoc, $out & $fn & ".pdf", $i_Format)
                EndIf
            Case $convert2 = 6 ; HTML, RTF and PDF
                $i_Format = 10
                _Word_DocSaveAs($oDoc, $out & $fn & ".html", $i_Format)
                $i_Format = 6
                _Word_DocSaveAs($oDoc, $out & $fn & ".rtf", $i_Format)
                $i_Format = 17
                _Word_DocSaveAs($oDoc, $out & $fn & ".pdf", $i_Format)
        EndSelect
        _Word_DocClose($oDoc, $WdSaveChanges)

 

TC07488.docx

Edited by Jury
typo
Posted
1 hour ago, Jury said:

This is the information on the front page I wish to capture in the html:

Looking at the document, that info is stored in text boxes. You may want to review this thread --

P.S. When I manually saved the file as HTML, the text you're looking for was present in the resulting file, so it's unclear to me what exactly isn't working for you. You may want to provide more details (Word version, AutoIt version, etc.) and a short reproducer script that we could actually run to test the issue.

Posted

Thanks, Danp2. Oh, text boxes; no wonder my searches didn't come up with anything!

You've given me some good threads to start with. 

Yes, sorry, I can see the information in the HTML, but it is often in a strange position on the page, sometimes even over the top of other text, so I'd like to capture it in a variable and have more control over where it is placed. Again, thanks for taking the time.

Joe 
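
For readers following the text-box suggestion, here is a minimal sketch of one way to pull text-box content into a variable before saving as HTML. It is not taken from the linked thread. It assumes the text boxes are floating shapes in the main document body, reachable through the Word object model's `Shapes` collection and each shape's `TextFrame`. The helper name `_GetTextBoxText` and the file path are hypothetical; text boxes inside grouped shapes or in headers/footers would need extra handling that is not shown.

```autoit
#include <Word.au3>
#include <MsgBoxConstants.au3>

; Hypothetical helper (not part of the Word UDF): collects the text of every
; shape in the document body whose text frame contains text.
Func _GetTextBoxText($oDoc)
    Local $sText = ""
    For $oShape In $oDoc.Shapes
        ; TextFrame.HasText is non-zero when the shape holds text.
        ; Shapes without a usable text frame are skipped.
        If $oShape.TextFrame.HasText Then
            $sText &= $oShape.TextFrame.TextRange.Text & @CRLF
        EndIf
    Next
    Return $sText
EndFunc   ;==>_GetTextBoxText

Local $oWord = _Word_Create()
If @error Then Exit MsgBox($MB_SYSTEMMODAL, "Text box sketch", "Could not start Word. @error = " & @error)

; Assumed location of the sample document; adjust to your own batch folder.
Local $oDoc = _Word_DocOpen($oWord, @ScriptDir & "\TC07488.docx")
If @error Then
    _Word_Quit($oWord)
    Exit MsgBox($MB_SYSTEMMODAL, "Text box sketch", "Could not open the document. @error = " & @error)
EndIf

; The captured text can now be placed wherever it is needed in the HTML.
Local $sTitleInfo = _GetTextBoxText($oDoc)
ConsoleWrite($sTitleInfo)

_Word_DocClose($oDoc) ; closes without saving changes by default
_Word_Quit($oWord)
```

In a batch script like the one above, the helper could be called right after `_Word_DocOpen` and before `_Word_DocSaveAs`, so the captured text is available for post-processing the saved HTML.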
