
Extract Text Of Any File Type


HAMID

This is a small part of a file:

Resources)\idrc_PMST\1119.idrc–%a88a90fc[iNSTALL

öäÒÀ®œŠxfTB0File–WnstallFile–TôòŸôüÅJõÊGü3þÒ

ËÅ0±è

b_óýk… êãøÿö×&Xð‹É_!aüË~-&ýo÷ÿ³X†

Ü—pœá7«wó›þaOîï+ðpþëßéŠ)…ÖSŠW§Ôߎ~þó¿·çáÌÿ

ïßú{ÔõýÀqÉoáOºÐUD|Uæ‘?LÒQÊ&ñ7˜CRünòiÌ)ú=ýü«˜ôW

éSr³ÿ.þýÜzJ ̈ÛþÒߌç÷á—|¹ø5Îþáÿü×ùöûž}òïÓGûq‘ׄǿö?Bq?“

‡õ—%¿Þ÷ÿ±£ÿáïÿÿèõÑý'ÿ¯Ë?aôgýÿò?ò×ýØüü«

¿eûÝÿìø·~¯?é_øßÿïÇ¿Þ÷ëýÌo÷ÿ„ßõOûíÿ‹¿ëÿ>úö

~ï_ûþ}Š·Þÿù‡üÿü³ßüþè_ôOþËOç¿æ¯ûì7ûM?üØ?þü

ë?ûéo~ò›üÌ_5ýSæOù¿øgþÝ¿ý_ücÿÇ?°ý÷~×W¿î§_ýƒÿùoþ_ÿ±/>

¹ûþë¿ÿ¿ð¿ýóÏ~Í?èwø'þñƒ¿öKÿ¾lô‡ÿ¡ÿóOþGÿÚþïþÕŸù7ü%Úú§ÿÛ

?÷×ý

~zù§ýùÛÙ_ñoþ

How can I extract these words from the file:

Resources, INSTALL, ALL, File

I mean, extract the meaningful words from the file.

I wrote this function, but I want it to be accurate and very fast.

I want to use it to extract text from any file type.

Please help me.

#include <File.au3>
#include <String.au3>
#include <Array.au3>

$SameFiles=_FileListToArray(@ScriptDir , "*.Exe")
LevelOne( $SameFiles)

Func LevelOne( $SameFiles)
    ;Define Variables
    ;------------------------------------>
    $MemoryFile = @ScriptDir & "\InitialStandardText.Lrn"
    ;------------------------------------>

    ;Generate Number For Random Read Files
    ;---------------------------------------------->
    If $SameFiles[0]=1 Then
        $Count=1
        Dim $Number[1]
        $Number[0]="NULL"
    ElseIf $SameFiles[0]<6 Then
        $Count=Random(1,$SameFiles[0],1)
        Dim $Number[$Count]
        For $i=0 To $Count-1
            $Number[$i]="NULL"
        Next
    Else
        $Count=Random(1,6,1)
        Dim $Number[$Count]
        For $i=0 To $Count-1
            $Number[$i]="NULL"
        Next
    EndIf
    ;MsgBox(0, "", $Count)
    ;_ArrayDisplay($Number)
    ;---------------------------------------------->

    $i=0
    While $Number[$Count-1]="NULL"
        $Number[$i] = Random(1,Int($SameFiles[0]),1)
        $File = $SameFiles[$Number[$i]]

        ;Read File
        ;---------------------------------------------->
        $OpenedFile=FileOpen($File)
        $WritedText=""
        $Read=FileRead($OpenedFile )
        ;---------------------------------------------->
        ;MsgBox(0,"",$Read)

        $ReadArray=StringToASCIIArray($Read)
        $Text=""

        ;Mark Null Bytes And Short Repeating Patterns With "'"
        $k=0
        While $k<=UBound($ReadArray)-4 ; -4 For Scrutiny $ReadArray[$k+3] In Loop
            If ($ReadArray[$k]=0) Then
                $ReadArray[$k]="'"
            EndIf
            If ($ReadArray[$k]=$ReadArray[$k+2] And $ReadArray[$k+1]=$ReadArray[$k+3] ) Then
                $ReadArray[$k]="'"
                $ReadArray[$k+1]="'"
                $ReadArray[$k+2]="'"
                $ReadArray[$k+3]="'"
            EndIf
            If ($ReadArray[$k]=$ReadArray[$k+1]) And ($ReadArray[$k+1]=$ReadArray[$k+2]) Then
                $ReadArray[$k]="'"
                $ReadArray[$k+1]="'"
                $ReadArray[$k+2]="'"
            EndIf
            $k+=1
        WEnd
        ;_ArrayDisplay($ReadArray)

        ;Collect Runs Of Letters; Keep Runs Of 4 Or More Characters
        For $k=0 To UBound($ReadArray)-1
            If StringIsLower (Chr($ReadArray[$k])) Or StringIsUpper(Chr($ReadArray[$k])) Then
                $Text=$Text & Chr($ReadArray[$k])
            ElseIf (StringLen($Text)<4) And ($ReadArray[$k]<>0) Then
                $Text=""
            ElseIf (StringLen($Text)>=4) And ($ReadArray[$k]<>0) Then ; >= So A 4-Letter Run Is Written Instead Of Lingering In $Text
                ;FileWrite($hLrnInitialFileTypeMemory,$Text & @CRLF ) ; ---------------------> Must Convert To IniWrite
                $WritedText = $WritedText & $Text & @CRLF
                $Text=""
            EndIf
            ; StringIsASCII ( Chr($ReadArray[$i]) ) ; Contain Digit Characters
            ;Local $s = StringFromASCIIArray($ReadArray)
            ;MsgBox(0, "", $s)
            ;_ArrayDisplay($a)
        Next

        FileClose($OpenedFile)
        FileWrite($MemoryFile,$WritedText & "<<---------->>" & @CRLF )

        $i=$i+1
        If $i=$Count Then ExitLoop
    WEnd
EndFunc

Extract Text Of File.zip


You are probably going to need some kind of dictionary to compare possible words to.

Doing dictionary comparisons on every ASCII combination will be pretty time-consuming. Although you could make something that does what you want with AutoIt, I doubt you will be able to make it run very fast.

If you are just trying to scan files for the existence of keywords, you can probably do that with reasonable speed.
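
As a rough illustration of the keyword-scan idea, here is a minimal sketch. The file name and the keyword list are assumptions made up for the example (the keywords are borrowed from the question), and reading an executable with a plain FileRead() is also an assumption; a file containing unusual byte sequences may need binary handling instead.

```autoit
; Hypothetical target file and keyword list, for illustration only.
Local $sData = FileRead(@ScriptDir & "\somefile.exe")
Local $aKeywords[3] = ["Resources", "INSTALL", "File"]
Local $iPos

For $i = 0 To UBound($aKeywords) - 1
    ; StringInStr returns the 1-based position of the first match, or 0 if absent.
    ; It is case-insensitive by default.
    $iPos = StringInStr($sData, $aKeywords[$i])
    If $iPos > 0 Then
        ConsoleWrite($aKeywords[$i] & ": found at position " & $iPos & @CRLF)
    Else
        ConsoleWrite($aKeywords[$i] & ": not found" & @CRLF)
    EndIf
Next
```

Each keyword costs one pass over the data at most, so the run time grows with file size times the number of keywords rather than with every possible character combination.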



There's no reason you shouldn't be able to do a FileRead() and use regular expressions to find the characters you're looking for, and it should be pretty quick.


I don't think the OP knows the characters; it seems to me that he just wants English words, which would require a dictionary.

It looks like something that's trying to dig up some legible details from a compiled executable.

AutoIt Absolute Beginners    Require a serial    Pause Script    Video Tutorials by Morthawt   ipify 

Monkey's are, like, natures humans.


It's easy to get all of the words used; however, you would then have to compare those to a dictionary word list to get the valid words.

Your list of words is incorrect: the word "All" does not exist. Although it appears as part of a string, it is not there as a separate word.

The simplest approach would be using StringRegExp.

$aWords = StringRegExp(FileRead("somefile.exe"), "(?i)\b[a-z]+", 3)

Then you would probably want to reduce the array to unique elements, so you would use _ArrayUnique() to do that.

After that, each element would have to be compared to a word list (I have one of almost 89000 words), again using a StringRegExp() call.
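
The three steps above might be sketched roughly as follows. This is only an outline under assumptions: the file names are hypothetical, the word list is assumed to be a plain text file with one word per line, the minimum length of 4 letters is borrowed from the cutoff in the original function, and the final lookup uses StringInStr() on a newline-delimited list in place of the StringRegExp() call mentioned above, purely to keep the example short.

```autoit
#include <Array.au3>

; Hypothetical file names, used only for illustration.
Local $sTarget = @ScriptDir & "\somefile.exe"
Local $sWordFile = @ScriptDir & "\wordlist.txt" ; assumed: one word per line

; 1. Collect every stand-alone run of 4 or more letters (flag 3 = return all matches).
Local $aWords = StringRegExp(FileRead($sTarget), "(?i)\b[a-z]{4,}\b", 3)
If @error Then
    ConsoleWrite("No candidate words found." & @CRLF)
    Exit
EndIf

; 2. Drop duplicates; by default element [0] of the result holds the count.
Local $aUnique = _ArrayUnique($aWords)

; 3. Keep only candidates that appear in the word list.
;    Wrapping the list and each candidate in @LF forces whole-line matches.
Local $sList = @LF & StringStripCR(FileRead($sWordFile)) & @LF
Local $sFound = ""
For $i = 1 To $aUnique[0]
    ; StringInStr is case-insensitive by default.
    If StringInStr($sList, @LF & $aUnique[$i] & @LF) Then $sFound &= $aUnique[$i] & @CRLF
Next

ConsoleWrite($sFound)
```

Because the pattern is anchored with \b on both sides, "INSTALL" is returned as one word and "ALL" is not pulled out of it, which matches the point made about the word list in the question.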

George

Question about decompiling code? Read the decompiling FAQ and don't bother posting the question in the forums.

Be sure to read and follow the forum rules. -AKA the AutoIt Reading and Comprehension Skills test.***

The PCRE (Regular Expression) ToolKit for AutoIT - (Updated Oct 20, 2011 ver:3.0.1.13) - Please update your current version before filing any bug reports. The installer now includes both 32 and 64 bit versions. No change in version number.

Visit my Blog .. currently not active but it will soon be resplendent with news and views. Also please remove any links you may have to my website. it is soon to be closed and replaced with something else.

"Old age and treachery will always overcome youth and skill!"


I have a word list in SQLite db format (imported from the OPTED dictionary), with word types and soundex (no definitions) if anyone is interested.



I'm interested if you don't mind PMing it to me.

George


@HAMID

Speed is going to depend entirely on the size of the files being read. There is certainly nothing slow about the SRE I gave you; I've used it many times and in fact use something very similar to replace the _FileReadToArray() function where it proves itself much faster.
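
The post does not show that replacement, so the following is only a guess at its general shape: a single StringRegExp() call that splits a file into lines. The file name is hypothetical. Unlike _FileReadToArray(), this sketch returns a zero-based array with no count in element 0, and it skips empty lines.

```autoit
; Hypothetical input file, for illustration only.
Local $sFile = @ScriptDir & "\input.txt"

; Each match is one run of characters containing no CR or LF, i.e. one line.
; Empty lines produce no match, so they are skipped.
Local $aLines = StringRegExp(FileRead($sFile), "[^\r\n]+", 3)
If @error Then
    ConsoleWrite("No lines found." & @CRLF)
    Exit
EndIf

For $i = 0 To UBound($aLines) - 1
    ConsoleWrite($i & ": " & $aLines[$i] & @CRLF)
Next
```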

George


Do you mean the replacement for _FileReadToArray()?

George


I tested this code, but it is not accurate:

#include <Array.au3>

$sFilePath = @ScriptDir & '\' & "MyProg 1.exe"
$hwnd = FileOpen($sFilePath, 0)
If $hwnd = -1 Then Exit
$sRead = FileRead($hwnd)
$aWords = StringRegExp($sRead, '.+\b', 3)
FileClose($hwnd)
_ArrayDisplay($aWords,'$aWords')

------------------------------------------------------------------------

Please explain more about this:

Speed is going to depend entirely on the size of the files being read. There is certainly nothing slow about the SRE I gave you; I've used it many times and in fact use something very similar to replace the _FileReadToArray() function where it proves itself much faster.

I'm sorry for my bad English, but I'm writing a program that uses artificial intelligence, so I need your help.

Edited by HAMID

Your expression is totally incorrect for doing what you wanted. .+ will match ANY character (apart from line breaks, by default). You asked to match only alpha characters (a-z). Also, you are losing time with FileOpen() and FileClose(). You don't have to open a file to read it unless it has to be opened in binary mode.
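
Applied to the snippet from the previous post, those two points might look like the sketch below. It keeps the same file name and assumes that a plain-text FileRead() is adequate for the file in question; the (?i) option is an addition so that capitalised words such as "Resources" and "INSTALL" are matched as well.

```autoit
#include <Array.au3>

; Same file name as in the post above.
Local $sFilePath = @ScriptDir & "\MyProg 1.exe"

; FileRead() accepts a file name directly, so no FileOpen()/FileClose() pair is needed.
Local $sRead = FileRead($sFilePath)
If @error = 1 Then
    ConsoleWrite("Could not read " & $sFilePath & @CRLF)
    Exit
EndIf

; Letters only; (?i) makes a-z match upper case too. Flag 3 returns all matches.
Local $aWords = StringRegExp($sRead, "(?i)\b[a-z]+", 3)
If @error Then
    ConsoleWrite("No matches." & @CRLF)
    Exit
EndIf

_ArrayDisplay($aWords, "$aWords")
```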

George
