
Extracting text out of .doc file


trancexx

ManusX, just below this line:

$text = StringReplace(BinaryToString($content1), Chr(0), "")

add these lines:

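; Two-byte leftovers of UTF-16LE characters once the Chr(0) bytes are stripped:
; U+0102/U+0103 (A/a with breve)
$text = StringReplace($text, Chr(2) & Chr(1), "A")
$text = StringReplace($text, Chr(3) & Chr(1), "a")
; Single ANSI bytes (A/a and I/i with circumflex); casesense = 1 keeps upper and lower case apart
$text = StringReplace($text, Chr(194), "A", 0, 1)
$text = StringReplace($text, Chr(226), "a", 0, 1)
$text = StringReplace($text, Chr(206), "I", 0, 1)
$text = StringReplace($text, Chr(238), "i", 0, 1)
; U+0218..U+021B (S/s and T/t with comma below)
$text = StringReplace($text, Chr(24) & Chr(2), "S")
$text = StringReplace($text, Chr(25) & Chr(2), "s")
$text = StringReplace($text, Chr(26) & Chr(2), "T")
$text = StringReplace($text, Chr(27) & Chr(2), "t")
; U+201C/U+201D (curly double quotes) become a straight double quote
$text = StringReplace($text, Chr(28) & Chr(32), '"')
$text = StringReplace($text, Chr(29) & Chr(32), '"')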

These lines replace the diacritics with their unaccented equivalents.
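
If the list keeps growing, the same replacements could be driven from a lookup table instead of one StringReplace line per character. This is only a rough sketch, not part of the tested script: the function name _ReplaceDocDiacritics and the array $aMap are made up for illustration, and it assumes $text has already had its Chr(0) bytes stripped as shown earlier. It uses case-sensitive matching for every entry, which makes no difference for the control-character pairs.

```autoit
; Hypothetical helper (name and layout are assumptions, not from the original script).
; Expects text that has already had its Chr(0) bytes removed.
Func _ReplaceDocDiacritics($sText)
    ; Each row is [search string, replacement]; the order matches the lines above.
    Local $aMap[12][2] = [ _
            [Chr(2) & Chr(1), "A"], _
            [Chr(3) & Chr(1), "a"], _
            [Chr(194), "A"], _
            [Chr(226), "a"], _
            [Chr(206), "I"], _
            [Chr(238), "i"], _
            [Chr(24) & Chr(2), "S"], _
            [Chr(25) & Chr(2), "s"], _
            [Chr(26) & Chr(2), "T"], _
            [Chr(27) & Chr(2), "t"], _
            [Chr(28) & Chr(32), '"'], _
            [Chr(29) & Chr(32), '"']]

    For $i = 0 To UBound($aMap) - 1
        ; occurrence = 0 replaces every match, casesense = 1 compares case-sensitively
        $sText = StringReplace($sText, $aMap[$i][0], $aMap[$i][1], 0, 1)
    Next
    Return $sText
EndFunc   ;==>_ReplaceDocDiacritics
```

You would then call it right after the Chr(0) line, e.g. $text = _ReplaceDocDiacritics($text), and add new characters by appending a row and bumping the first array dimension.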

mesale0077, can you post a sample document?

[EDIT] ManusX: I've tested it with one of my documents, and it replaced everything I needed.

[EDIT2] Tested with several documents; it now replaces all diacritics.

Edited by taietel