Read Extended File Properties via Binary ADS Read


lod3n
I know you can do this with the COM object Shell.Application, but that's not the point of this demo.
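For comparison, the Shell.Application route could look roughly like the sketch below. The variable names are only illustrative, and the loop bound of 40 is a guess, since the number and order of detail columns differ between Windows versions.

```autoit
; Sketch only: list the non-empty shell detail columns for a chosen file
Local $sFile = FileOpenDialog("Select a file", @DesktopDir & "\", "All (*.*)", 1)
If @error Then Exit

; Split the full path into folder and file name (assumes the file is not in a drive root)
Local $iSlash = StringInStr($sFile, "\", 0, -1)
Local $sDir = StringLeft($sFile, $iSlash - 1)
Local $sName = StringMid($sFile, $iSlash + 1)

Local $oShell = ObjCreate("Shell.Application")
Local $oFolder = $oShell.NameSpace($sDir)
Local $oItem = $oFolder.ParseName($sName)

For $i = 0 To 40 ; assumed upper bound on the column index
    Local $sLabel = $oFolder.GetDetailsOf($oFolder.Items, $i) ; column heading
    Local $sValue = $oFolder.GetDetailsOf($oItem, $i)         ; value for this file
    If $sValue <> "" Then ConsoleWrite($sLabel & ": " & $sValue & @CRLF)
Next
```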

I was curious about where exactly NTFS metadata is stored. This is the information you see when you right-click a file, choose Properties, and open the Summary tab: Keywords, Title, Categories, and so on.

It seems that Microsoft does not store this information in the file itself, but in a binary file attached to it via an Alternate Data Stream (ADS). The following is a good primer on ADS, and it briefly covers what this script is actually doing:

http://members.cox.net/slatteryt/Streams.html
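As a quick illustration of the idea, here is a minimal sketch that writes to a named stream and reads it back. The file name and stream name are made up for the demo, and it assumes the temp folder is on an NTFS volume.

```autoit
; Hypothetical file and stream names, for illustration only
Local $sHost = @TempDir & "\ads_demo.txt"
Local $sStream = $sHost & ":demo_stream"

FileDelete($sHost)
FileWrite($sHost, "visible content")

; A named stream is addressed as "filename:streamname"
Local $hStream = FileOpen($sStream, 2) ; 2 = write mode (erase previous contents)
FileWrite($hStream, "hidden in the stream")
FileClose($hStream)

ConsoleWrite("Main data:   " & FileRead($sHost) & @CRLF)
ConsoleWrite("Stream data: " & FileRead($sStream) & @CRLF)
```

Explorer should still show only the size of the main data; the stream rides along with the file without appearing in a normal directory listing.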

The following link is a technical analysis of the binary files that Windows is using to store the metadata:

http://sedna-soft.de/summary-information-stream/

I have never dissected and reassembled a binary file like this before, so I am sure my methodology is less than ideal. If you understand what I am doing here and have some suggestions, I am all ears. The file starts with some headers that point to sections, and the sections point to data blocks, each of which seems to be null terminated. I can't explain it any further than that. :)
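One possible alternative to slicing a hex string is to read the fields straight from the binary data with BinaryMid. The sketch below follows the same header -> section -> property chain using the offsets from the script further down (0x2C for the first section offset, then ID/offset pairs after the two section header dwords). _ReadLE and _ListProperties are hypothetical helper names, and the path is a placeholder.

```autoit
; Read an unsigned little-endian integer of $iBytes bytes at zero-based $iOffset
Func _ReadLE($bData, $iOffset, $iBytes)
    Local $iValue = 0
    For $i = $iBytes To 1 Step -1
        ; BinaryMid is 1-based, so byte $i of the field sits at $iOffset + $i
        $iValue = $iValue * 256 + Dec(Hex(BinaryMid($bData, $iOffset + $i, 1)))
    Next
    Return $iValue
EndFunc

; List each property's ID, absolute offset and the type stored in the stream
Func _ListProperties($bData)
    Local $iSectOffset = _ReadLE($bData, 0x2C, 4)            ; offset of the first section
    Local $iPropCount = _ReadLE($bData, $iSectOffset + 4, 4) ; section = length dword, count dword, then pairs
    For $n = 0 To $iPropCount - 1
        Local $iEntry = $iSectOffset + 8 + $n * 8
        Local $iId = _ReadLE($bData, $iEntry, 4)
        Local $iPropOffset = $iSectOffset + _ReadLE($bData, $iEntry + 4, 4)
        Local $iType = _ReadLE($bData, $iPropOffset, 4)
        ConsoleWrite("ID " & $iId & " at offset " & $iPropOffset & ", type " & $iType & @CRLF)
    Next
EndFunc

; Placeholder path: point this at a file that has Summary properties set
Local $sTarget = "C:\path\to\file.ext"
Local $hFile = FileOpen($sTarget & ":" & Chr(5) & "SummaryInformation", 16) ; 16 = binary mode
If $hFile <> -1 Then
    _ListProperties(FileRead($hFile))
    FileClose($hFile)
EndIf
```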

The end goal I have in mind for this is to accurately disassemble the 2 metadata files, and then rebuild them, and reapply them to the target file. Eventually this will evolve into a _FileSetMetadata UDF, but I have a long way to go. As far as I can tell, Microsoft provides no documentation for this, which is kind of cool, in that I seem to be doing something unique.

You must run this in SciTE, as it writes its output to the console.

#include <string.au3>

$message = "Select a file that has Metadata properties set"

$filename = FileOpenDialog($message, @DesktopDir & "\", "All (*.*)", 1)

If @error Then
    MsgBox(4096,"","No File(s) chosen")
    Exit
EndIf

; ADS file containing most extended properties
$ads1 = $filename&":"&Chr(5)&"SummaryInformation" 

; ADS file containing the category information
$ads2 = $filename&":"&Chr(5)&"DocumentsummaryInformation"


If Not FileExists($ads1) Then
    MsgBox(16,"Error","The file: " & @CRLF & "   " & $filename & @CRLF & "does not have any metadata properties set. "&@CRLF & @CRLF&"Please choose a different file")
    Exit
EndIf

$propType_1 = 2
$propType_2 = 30
$propType_3 = 30
$propType_4 = 30
$propType_5 = 30
$propType_6 = 30
$propType_7 = 30
$propType_8 = 30
$propType_9 = 30
$propType_10 = 64
$propType_11 = 64
$propType_12 = 64
$propType_13 = 64
$propType_14 = 3
$propType_15 = 3
$propType_16 = 3
$propType_17 = 71
$propType_18 = 30
$propType_19 = 3

$propDesc_1 = "Code page"
$propDesc_2 = "Title"
$propDesc_3 = "Subject"
$propDesc_4 = "Author"
$propDesc_5 = "Keywords"
$propDesc_6 = "Comments"
$propDesc_7 = "Template"
$propDesc_8 = "Last Saved By"
$propDesc_9 = "Revision Number"
$propDesc_10 = "Total Editing Time"
$propDesc_11 = "Last Printed"
$propDesc_12 = "Create Time/Date"
$propDesc_13 = "Last Saved Time/Date"
$propDesc_14 = "Number of Pages"
$propDesc_15 = "Number of Words"
$propDesc_16 = "Number of Characters"
$propDesc_17 = "Thumbnail"
$propDesc_18 = "Name of Creating Application"
$propDesc_19 = "Security"






; field sizes in bytes
$word = 2
$dword = 4
$byte = 1
$guid = $dword + ($word * 2) + ($byte * 2) + ($byte * 6) ; 16 bytes

; 16 = force binary (byte) mode, so FileRead returns the raw bytes of the stream
$file = FileOpen($ads1, 16)
If $file = -1 Then
    MsgBox(0, "Error", "Unable to open file.")
    Exit
EndIf
Global $data = Hex(FileRead($file))
FileClose($file)

; read header

; offset 0x00 is a byte-order indicator (bytes FE FF), not a text BOM
$byteOrderMarkForUTF16LE = Chomp(0x00,$word)
ConsoleWrite("$byteOrderMarkForUTF16LE: " & $byteOrderMarkForUTF16LE & @CRLF)

$streamValidation = Chomp(0x02,$word)
ConsoleWrite("$streamValidation: " & $streamValidation & @CRLF)

$unknownPurpose = Chomp(0x04,$word)
ConsoleWrite("$unknownPurpose: " & $unknownPurpose & @CRLF)

; Dec, not Number: Chomp returns hex digits, like every other field read here
$OSIndicator = Dec(LEDec(Chomp(0x06,$word)))
ConsoleWrite("$OSIndicator: " & $OSIndicator & @CRLF)

$streamClassID = Chomp(0x08,$guid )
ConsoleWrite("$streamClassID: " & $streamClassID & @CRLF)

$sectionCount = Dec(LEDec(Chomp(0x18,$dword)))
ConsoleWrite("$sectionCount: " & $sectionCount & @CRLF)


; read section declarations

$sect1ClassID = LEDec(Chomp(0x1c,$guid)) ; not sure this should be LE decoded
ConsoleWrite("$sect1ClassID: " & $sect1ClassID & @CRLF)

$sect1Offset = Dec(LEDec(Chomp(0x2c,$dword)))
ConsoleWrite("$sect1Offset: " & $sect1Offset & @CRLF)

; read first section header

$sect1Length = Dec(LEDec(Chomp($sect1Offset+0x00,$dword)))
ConsoleWrite("$sect1Length: " & $sect1Length & @CRLF)

$sect1PropCount = Dec(LEDec(Chomp($sect1Offset+0x04,$dword)))
ConsoleWrite("$sect1PropCount: " & $sect1PropCount & @CRLF)

; read Property declarations

$cursor = 0x04
For $i = 1 To $sect1PropCount
    
    ConsoleWrite("------------------" & @crlf)
    
    ; read property ID and Offset from Section Header
    
    $cursor += $dword
    $PropId = Dec(LEDec(Chomp($sect1Offset+$cursor,$dword)))
    $cursor += $dword
    $PropOffset = Dec(LEDec(Chomp($sect1Offset+$cursor,$dword)))
    $realPropOffset = $sect1Offset+$PropOffset
    $propType = PropGetType($propID)
    
    
    $propDesc = PropGetDesc($propID)
    
    Switch $propType
        Case 2  
            ConsoleWrite("2 byte signed integer" & @CRLF)
        Case 3  
            ConsoleWrite("4 byte signed integer" & @CRLF)
        Case 30 
            ConsoleWrite("null-terminated string prepended by dword string length" & @CRLF)
        Case 64 
            ConsoleWrite("Filetime (64-bit value representing the number of 100-nanosecond intervals since January 1, 1601)" & @CRLF)
        Case 71 
            ConsoleWrite("Clipboard format" & @CRLF)
        Case Else
            ConsoleWrite("Unknown Type" & @CRLF)
    EndSwitch

    ConsoleWrite($i & " $PropId: " & $PropId & @CRLF)
    ConsoleWrite($i & " $realPropOffset: " & $realPropOffset & @CRLF)
    ConsoleWrite($i & " $propType: " & $propType & @CRLF)
    
    
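    $ptype = Dec(LEDec(Chomp($realPropOffset,$dword))) ; type actually stored in the stream; 31 = VT_LPWSTR (UTF-16 string)
    ConsoleWrite("$ptype = " & $ptype & @CRLF)
    $plen = Dec(LEDec(Chomp($realPropOffset+$dword,$dword))) ; for strings: the length dword that precedes the characters
    ConsoleWrite("$plen = " & $plen & @CRLF)
    
    ; only type 31 is decoded as text; integers, FILETIMEs etc. would need their own decoding
    $pdataLoc = $realPropOffset+$dword+$dword
    $pcursor = 0 
    $propStringValue = ""
    While $ptype = 31
        $char = Dec(LEDec(Chomp($pdataLoc+$pcursor,$word)))
        If $char = 0 Then ExitLoop ; null terminator (or end of data)
        $propStringValue &= ChrW($char) ; ChrW, because each character is a 16-bit code unit
        $pcursor += $word
    WEnd
    ConsoleWrite("! " & $propDesc & ": " &$propStringValue & @CRLF)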
    
Next




;read a number of hex characters from a given offset point
Func Chomp($offset,$length)
    ;ConsoleWrite($offset & ": ")
    $charPosition = ($offset*2)+1
    $charLen = $length*2
    Return StringMid($data,$charPosition,$charLen)
EndFunc

; little endian decoder
Func LEDec($hexstring)
    Local $output = ""
    For $i = 1 To StringLen($hexstring) Step 2
        $output = StringMid($hexstring,$i,2) & $output
    Next
    Return $output
EndFunc



Func PropGetType($propID)
    If IsDeclared ("propType_"&$propID) Then
        Return Eval("propType_"&$propID)
    Else
        Return ""
    EndIf
EndFunc

Func PropGetDesc($propID)
    If IsDeclared ("propDesc_"&$propID) Then
        Return Eval("propDesc_"&$propID)
    Else
        Return ""
    EndIf
EndFunc



Exit ; the hex dump below is leftover debug code and is never reached
ConsoleWrite("---------------------" & @CRLF)
$count = 0
While StringLen($data) > 0
    $char = StringLeft($data,2)
    $data = StringTrimLeft($data,2)
    $ascii = StringStripCR(_HexToString($char))
    ConsoleWrite($count & @TAB & $char & @TAB & $ascii & @CRLF)
    $count += 1
WEnd

[font="Fixedsys"][list][*]All of my AutoIt Example Scripts[*]http://saneasylum.com[/list][/font]

Right-click the file and select Properties, then click the Summary tab. Add text to the Title, Subject, Author, Category, Keywords and Comments fields and click OK. Not all files provide a Summary tab, for some reason.

Also, you need to be running NT or later, and your hard drive must already be NTFS-formatted.

Edited by lod3n


  • 1 month later...

Yes I have thought about that: "The end goal I have in mind for this is to accurately disassemble the 2 metadata files, and then rebuild them, and reapply them to the target file. Eventually this will evolve into a _FileSetMetadata UDF, but I have a long way to go."

My attempts so far have not generated good results, but once they do, I will post something. The problem is I have no motivation to work on this right now, as I have no project that would benefit from it.


Nice idea, using ADS for extended file properties with an AutoIt-based UDF.

I used this page to get an idea about using ADS http://www.irongeek.com/i.php?page=security/altds

I found the link to that page in this thread http://www.autoitscript.com/forum/index.ph...5222&hl=ADS

The only thing that sort of turned me off using ADS more was the idea of losing the ADS data if I copied my file to a FAT32 drive (e.g. a memory stick).
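A script could guard against that before copying. This is only a sketch: _TargetKeepsStreams is a made-up helper name, the example path is a placeholder, and it simply treats NTFS as the one file system that keeps the streams.

```autoit
; Returns True if the drive holding $sPath reports NTFS (hypothetical helper)
; Assumes a full path that starts with a drive letter, e.g. "E:\backup\file.doc"
Func _TargetKeepsStreams($sPath)
    Return DriveGetFileSystem(StringLeft($sPath, 3)) = "NTFS"
EndFunc

Local $sDest = "E:\backup\file.doc" ; placeholder destination
If Not _TargetKeepsStreams($sDest) Then
    ConsoleWrite("! Copying to " & $sDest & " would drop any alternate data streams" & @CRLF)
EndIf
```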

Cheers


This article at Desaware should give some insight, and keywords to search for, for anyone interested in MS alternate data streams. Dan Appleman and Desaware were quite early in exploring and supporting this "technology", but the 1.0 component was quite buggy. I burned my fingers a bit on that one. :)

I have the ADS portion of this all sorted out; that's not the difficulty at all. :)

What I struggle with is manipulating the actual hex in the binary metadata files themselves. It's not like they're INIs or something, and the third-party documentation that I linked to above is not 100% accurate in describing the contents of the two files. If anyone is good at decoding proprietary undocumented binary file formats from Microsoft, please take a look at how this works, and let me know if you have any suggestions or improvements.
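For the rebuilding side, the reverse of the LEDec function from the first post is needed: turning a number back into little-endian hex. A minimal sketch, with _SwapBytes and _LEEnc as made-up helper names, might be:

```autoit
; Reverse the byte order of a hex string (same logic as LEDec in the first post)
Func _SwapBytes($sHex)
    Local $sOut = ""
    For $i = 1 To StringLen($sHex) Step 2
        $sOut = StringMid($sHex, $i, 2) & $sOut
    Next
    Return $sOut
EndFunc

; Encode an integer as little-endian hex, $iBytes bytes wide (hypothetical helper)
Func _LEEnc($iValue, $iBytes)
    Return _SwapBytes(Hex($iValue, $iBytes * 2))
EndFunc

ConsoleWrite(_LEEnc(48, 4) & @CRLF)                  ; 48 = 0x30 -> 30000000
ConsoleWrite(Dec(_SwapBytes(_LEEnc(48, 4))) & @CRLF) ; round trip -> 48
```

Because swapping bytes is its own inverse, the same routine serves for both reading and writing; the pieces can then be concatenated and passed through Binary("0x" & $sHex) before writing them out in binary mode.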

If the fact that the binary files are stored in ADS is problematic for anyone, here is a small function to extract them so you can hack on them in your favorite hex editor:

$filename = @ScriptDir&"\file.flv"

$AdsSrc = $filename &":"&Chr(5)& "SummaryInformation"
$BinTarget = $filename & "_SummaryInformation.bin"
_ExtractAdsFile($AdsSrc,$BinTarget)

$AdsSrc = $filename &":"&Chr(5)&"DocumentsummaryInformation"
$BinTarget = $filename & "_DocumentsummaryInformation.bin"
_ExtractAdsFile($AdsSrc,$BinTarget)

Func _ExtractAdsFile($src,$target)
    ; 16 = binary mode
    Local $fhSrc = FileOpen($src, 16)
    If $fhSrc = -1 Then 
        ConsoleWrite("! Could not open " & $src & " for reading" & @CRLF)
        Return False
    EndIf
    Local $data = FileRead($fhSrc)
    FileClose($fhSrc)
    ; 16 = binary, 8 = create the directory path if needed, 2 = overwrite
    Local $fhTarg = FileOpen($target, 16+8+2)
    If $fhTarg = -1 Then 
        ConsoleWrite("! Could not open " & $target & " for writing" & @CRLF)
        Return False
    EndIf
    FileWrite($fhTarg,$data)
    FileClose($fhTarg)
    Return True
EndFunc

Attached are the extracted metadata files that this function produced, and a screenshot of the data that they contain.

post-14785-1189632671_thumb.png

metadata.zip
