Extracting text from a doc file

There's almost no formatting (except for the header, which I believe is the problem I'm running into). I can do a FileReadLine (FileRead fails to get past ÐÏà¡±á) and it gives me everything but the first line, which is the one I want to read. If I do a FileReadLine for lines 1-2, I get

            WARRANTY DEED

What I should've gotten (based on opening it in notepad) was

ÐÏà¡±á                >  þÿ                   b          e      þÿÿÿ    a   ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿì¥Á €      ð¿             €>   bjbjBrBr                       8^      r6      
                       ÿÿ         ÿÿ         ÿÿ                 ·     Â      Â                                       ÿÿÿÿ                  8   Q  ,   }  T         Å+  h  Ñ      Ñ      Ñ      Ñ      Ñ      Ð      Ð      Ð      D+     F+      F+      F+      F+      F+      F+  $   --  ¢  Ï/  ˆ   j+                           Ð                      Ð      Ð      Ð      Ð      j+                          Ñ              Ñ  ÿ   +     !      !      !      Ð  b        Ñ            Ñ      D+              !                                                      Ð      D+              !              !                                                                              !      Ñ      ÿÿÿÿ    âã(WÆÍ              2  ¢  !              0+     •+  0   Å+      !      W0      Ô   @   W0      !                                                                      !     W0                    (!  
  Ð      Ð      !      Ð      Ð                                      Ð      Ð      Ð      j+      j+                                      !                                      Ð      Ð      Ð      Å+      Ð      Ð      Ð      Ð              ÿÿÿÿ    ÿÿÿÿ    ÿÿÿÿ            ÿÿÿÿ    ÿÿÿÿ    ÿÿÿÿ    ÿÿÿÿ    ÿÿÿÿ    ÿÿÿÿ    ÿÿÿÿ    ÿÿÿÿ    ÿÿÿÿ    ÿÿÿÿ    ÿÿÿÿ    ÿÿÿÿ    ÿÿÿÿ    ÿÿÿÿ    W0      Ð      Ð      Ð      Ð      Ð      Ð                                                              Ð      Ð      Ð      Â        Ë  :                                                                                                                                                                                                                                                                                                                                                                                                                                               NOV 10, 2012        HALL COUNTY AREA BULLETIN            PAGE        1
            WARRANTY DEED

Opening the file in Notepad, the first line contains a lot of binary data (far more than what FileReadLine returned) before it eventually reaches a regular string, but my output never gets to that string. Any ideas why FileReadLine stops where it does instead of reading to the actual end of the line?
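
For illustration only, here is a minimal sketch of one likely explanation, written in Python rather than AutoIt. The leading bytes ÐÏà¡±á are the signature of an OLE2 compound file (D0 CF 11 E0 A1 B1 1A E1), which is the container format of legacy .doc files, so the "first line" is really a binary header. The sketch assumes, without confirmation from this thread, that the text-mode read is cut short by NUL bytes in that header and that the visible text is stored as 8-bit characters. The names `is_ole_doc`, `printable_runs`, and `sample` are hypothetical, and `sample` is a made-up stand-in for the real file.

```python
import re

# OLE2 compound-file signature found at the start of legacy .doc files.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def is_ole_doc(data):
    """Return True if the bytes start with the OLE2 signature."""
    return data.startswith(OLE_MAGIC)


def printable_runs(data, min_len=4):
    """Return every run of printable ASCII that is at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7E]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# Hypothetical stand-in for the file: signature, NUL padding, other binary
# bytes, then the header text and the next line.
sample = (
    OLE_MAGIC
    + b"\x00" * 8
    + b"\xff\xff"
    + b"NOV 10, 2012        HALL COUNTY AREA BULLETIN"
    + b"\r\n"
    + b"            WARRANTY DEED"
)

# What a read that treats NUL as end-of-string would see: only the signature.
truncated = sample.split(b"\x00", 1)[0]

# Reading the raw bytes and keeping printable runs recovers the header text.
runs = printable_runs(sample)
```

Under these assumptions, `truncated` holds only the eight signature bytes, while `runs` contains the bulletin header followed by the WARRANTY DEED line. If the text in the real file is stored as UTF-16 instead, the pattern would need to allow a NUL after each character.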

Edited by Mechaflash



