fu2m8

Breaking a text file into sections

6 posts in this topic

Hey guys,

Here at work I've got a text (LDIF) file that I would like to extract data from, but I'm not sure how to do it properly.

Basically, the LDIF file contains a whole bunch of LDAP information relating to certain workstations; however, the only thing I'm concerned with currently is the lines that begin with groupMembership: and the data those lines contain.

Currently, if I export a single workstation's LDAP information, I can use StringInStr to return the relevant lines and dump them into Excel. What I'd hope to be able to do is export a whole container of multiple workstations' LDAP information (which I can do, but all the information goes into one file) and somehow pull each workstation's groupMembership: lines into its own array/section, which I could then populate into Excel (either in its own workbook per workstation or some other way I choose to format it).

The basic structure of the LDIF file is as follows (note that a whole chunk of irrelevant information has been taken out for simplicity's sake but this is the structure of the text file):

#-------------------------------------------------------------------------------
# This file has been generated on 11.03.2006 at 17:02 from test-server:839
# by Softerra LDAP Browser 2.6 (http://www.ldapbrowser.com)
#-------------------------------------------------------------------------------
version: 1
dn: cn=00003,ou=Workstations,ou=ZEN,ou=test,o=test
groupMembership: cn=Base Applications,ou=Workstations,ou=ZEN,ou=test,o=test
groupMembership: cn=A0738 Trim,ou=Applications,ou=Services,ou=test,o=test
groupMembership: cn=A0886 Dial Connect,ou=Applications,ou=ZEN,ou=test,o=test
groupMembership: cn=A1021 Lawpoint,ou=Applications,ou=ZEN,ou=test,o=test
groupMembership: cn=A0533 Corporate Archive Viewer,ou=Applications,ou=ZEN,ou=test,o=test

dn: cn=00004,ou=Workstations,ou=ZEN,ou=test,o=test
groupMembership: cn=Base Applications,ou=Workstations,ou=ZEN,ou=test,o=test
groupMembership: cn=A1021 Lawpoint,ou=Applications,ou=ZEN,ou=test,o=test
groupMembership: cn=A0886 Dial Connect,ou=Applications,ou=ZEN,ou=test,o=test
groupMembership: cn=A0533 Corporate Archive Viewer,ou=Applications,ou=ZEN,ou=test,o=test

dn: cn=00005,ou=Workstations,ou=ZEN,ou=test,o=test
groupMembership: cn=Base Applications,ou=Workstations,ou=ZEN,ou=test,o=test
groupMembership: cn=A0115 CMS Author,ou=Applications,ou=ZEN,ou=test,o=test
groupMembership: cn=A0886 Dial Connect,ou=Applications,ou=ZEN,ou=test,o=test
groupMembership: cn=A0025 MS Project 2000,ou=Applications,ou=ZEN,ou=test,o=test
groupMembership: cn=A0020 MS Access 2000,ou=Applications,ou=ZEN,ou=test,o=test
groupMembership: cn=A0056 ACL,ou=Applications,ou=ZEN,ou=test,o=test
groupMembership: cn=A0533 Corporate Archive Viewer,ou=Applications,ou=ZEN,ou=test,o=test

The dn: lines (highlighted in red in the original post) identify the workstation (so 3 workstations in this example), and the groupMembership: lines directly below each one list the applications associated with that workstation (delivered through Novell ZENworks). Each workstation's entry seems to be separated from the next by a blank line (an extra carriage return, i.e. what you get when you push Enter twice).

To recap, I would basically like to know how I could break the 3 workstations (with their relevant data) into their own array/section that I could then use in some other way (ideally, populating the information into Excel).

Here's the basic code I have for importing a single workstation's groupMembership: lines into an Excel document:

#include <File.au3>
#include <Array.au3>
#include <ExcelCOM_UDF.au3>

Dim $aRecords
Dim $y = 1 ; next Excel row to write

$iBegin = MsgBox(1, "This will run the test code", "This example will open Excel & write some data from the text file to it. Click OK to begin, or CANCEL to exit.")
If $iBegin = 2 Then Exit ; Close out if the user clicked "Cancel"

; _FileReadToArray opens and closes the file itself, so no separate FileOpen handle is needed
If Not _FileReadToArray("Workstations.ldif", $aRecords) Then
    MsgBox(16, "Error", "Could not read Workstations.ldif")
    Exit
EndIf

$oExcel = _ExcelBookNew()

For $x In $aRecords
    ; Keep groupMembership: lines, but skip the Base Applications group
    If StringInStr($x, "groupMembership:") And Not StringInStr($x, "Base Applications") Then
        _ExcelWriteCellR1C1($oExcel, $y, 2, $x)
        $y = $y + 1
    EndIf
Next

Exit

Thanks for any ideas or insight you guys can provide! :P

Peace


#2 ·  Posted (edited)

Well, I took a stab at it and tried to use StringRegExp(). It could just as easily be done with FileRead/StringSplit/StringInStr (or _FileReadToArray + StringInStr) and a few loops.

If anyone wants to figure out why I can't get the last group, feel free:

CODE
$sString = FileRead(@DesktopDir & '\blah.txt')
$aArray = _StringBetween($sString, 'dn:\s', '\n', -1, 1);Get [0] of the 2 dim array (the headers of the workstations)
If Not IsArray($aArray) And MsgBox(64, 'Info', 'No WorkStations Found') Then Exit
Dim $aWKStationData[UBound($aArray)][2]
For $iCC = 0 To UBound($aArray) - 1
    $aWKStationData[$iCC][0] = 'dn: ' & $aArray[$iCC]
    $aData = _StringBetween($sString, 'groupMembership:\s', '\r\n\r\n|$', -1, 1) ;supposed to get the remaining groupmemberships under the headers...
    If IsArray($aData) Then $aWKStationData[$iCC][1] = 'groupMembership: ' & $aData[$iCC]
Next
    
;~ ================== _StringBetween is In the newest 3.2.1.12 beta And _ArrayDisplay2D is my own    =====================
_ArrayDisplay2D($aWKStationData, 'Array Display 2Dim', 0)   
Func _ArrayDisplay2D($aArray, $sTitle = 'Array Display 2Dim', $iBase = 1, $sToConsole = 0)
    If Not IsArray($aArray) Then Return SetError(1, 0, 0)
    Local $sHold = 'Dimension 1 Has:  ' & UBound($aArray, 1) -1 & ' Element(s)' & @LF & _
            'Dimension 2 Has:  ' & UBound($aArray, 2) - 1 & ' Element(s)' & @LF & @LF
    For $iCC = $iBase To UBound($aArray, 1) - 1
        For $xCC = 0 To UBound($aArray, 2) - 1
            $sHold &= '[' & $iCC & '][' & $xCC & ']  = ' & $aArray[$iCC][$xCC] & @LF
        Next
    Next
    If $sToConsole Then Return ConsoleWrite(@LF & $sHold)
    Return MsgBox(262144, $sTitle, StringTrimRight($sHold, 1))
EndFunc
Func _StringBetween($sString, $sStart, $sEnd, $vCase = -1, $iSRE = -1)
    If $iSRE = -1 Or $iSRE = Default Then
        If $vCase = -1 Or $vCase = Default Then 
            $vCase = 0
        Else
            $vCase = 1
        EndIf
        Local $sHold = '', $sSnSStart = '', $sSnSEnd = ''
        While StringLen($sString) > 0
            $sSnSStart = StringInStr($sString, $sStart, $vCase)
            If Not $sSnSStart Then ExitLoop
            $sString = StringTrimLeft($sString, ($sSnSStart + StringLen($sStart)) - 1)
            $sSnSEnd = StringInStr($sString, $sEnd, $vCase)
            If Not $sSnSEnd Then ExitLoop
            $sHold &= StringLeft($sString, $sSnSEnd - 1) & Chr(1)
            $sString = StringTrimLeft($sString, $sSnSEnd)
        WEnd
        If Not $sHold Then Return SetError(1, 0, 0)
        $sHold = StringSplit(StringTrimRight($sHold, 1), Chr(1))
        Local $aArray[UBound($sHold) - 1]
        For $iCC = 1 To UBound($sHold) - 1
            $aArray[$iCC - 1] = $sHold[$iCC]
        Next
        Return $aArray
    Else
        If $vCase = Default Or $vCase = -1 Then 
            $vCase = '(?i)'
        Else
            $vCase = ''
        EndIf
        Local $aArray = StringRegExp($sString, '(?s)' & $vCase & $sStart & '(.*?)' & $sEnd, 3)
        If IsArray($aArray) Then Return $aArray
        Return SetError(1, 0, 0)
    EndIf
EndFunc
;~ =====================================================================================================================
Edited by SmOke_N
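
One likely reason the last group goes missing is alternation precedence: in the end pattern '\r\n\r\n|$', the "|" splits the whole assembled expression, so "$" becomes an alternative on its own instead of an alternative terminator. The sketch below compares that shape with a grouped terminator. It assumes the PCRE-based StringRegExp, records separated by a single blank line, and a file that does not end with a blank line; $sData and the other variable names are made up for illustration.

```autoit
; Sketch only: two made-up records separated by one blank line
Local $sData = 'dn: cn=00003' & @CRLF & _
        'groupMembership: cn=A0738 Trim' & @CRLF & _
        @CRLF & _
        'dn: cn=00004' & @CRLF & _
        'groupMembership: cn=A1021 Lawpoint'

; Original shape: "$" is a stand-alone alternative, so the final record
; (which has no trailing blank line) never satisfies the left-hand branch
Local $aLoose = StringRegExp($sData, '(?s)groupMembership:\s(.*?)\r\n\r\n|$', 3)

; Grouped terminator: a blank line OR the very end of the string
Local $aGrouped = StringRegExp($sData, '(?s)groupMembership:\s(.*?)(?:\r\n\r\n|\z)', 3)

ConsoleWrite('Ungrouped terminator:' & @LF)
For $i = 0 To UBound($aLoose) - 1
    ConsoleWrite('  [' & $aLoose[$i] & ']' & @LF)
Next

ConsoleWrite('Grouped terminator:' & @LF)
For $i = 0 To UBound($aGrouped) - 1
    ConsoleWrite('  [' & $aGrouped[$i] & ']' & @LF)
Next
```

With _StringBetween as posted, the equivalent change would be passing '(?:\r\n\r\n|\z)' as the end pattern. Calling it once before the loop, rather than on every iteration, would also avoid repeating the same search.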


If I understand the task at hand this should do it?

Func testExtractLDIFRecords()
   Local $data, $regexp, $arr, $i
#region - data   
   $data &= '#-------------------------------------------------------------------------------' & @CRLF
   $data &= '# This file has been generated on 11.03.2006 at 17:02 from test-server:839' & @CRLF
   $data &= '# by Softerra LDAP Browser 2.6 (http://www.ldapbrowser.com)' & @CRLF
   $data &= '#-------------------------------------------------------------------------------' & @CRLF
   $data &= 'version: 1' & @CRLF
   $data &= 'dn: cn=00003,ou=Workstations,ou=ZEN,ou=test,o=test' & @CRLF
   $data &= 'groupMembership: cn=Base Applications,ou=Workstations,ou=ZEN,ou=test,o=test' & @CRLF
   $data &= 'groupMembership: cn=A0738 Trim,ou=Applications,ou=Services,ou=test,o=test' & @CRLF
   $data &= 'groupMembership: cn=A0886 Dial Connect,ou=Applications,ou=ZEN,ou=test,o=test' & @CRLF
   $data &= 'groupMembership: cn=A1021 Lawpoint,ou=Applications,ou=ZEN,ou=test,o=test' & @CRLF
   $data &= 'groupMembership: cn=A0533 Corporate Archive Viewer,ou=Applications,ou=ZEN,ou=test,o=test' & @CRLF
   $data &= '' & @CRLF
   $data &= 'dn: cn=00004,ou=Workstations,ou=ZEN,ou=test,o=test' & @CRLF
   $data &= 'groupMembership: cn=Base Applications,ou=Workstations,ou=ZEN,ou=test,o=test' & @CRLF
   $data &= 'groupMembership: cn=A1021 Lawpoint,ou=Applications,ou=ZEN,ou=test,o=test' & @CRLF
   $data &= 'groupMembership: cn=A0886 Dial Connect,ou=Applications,ou=ZEN,ou=test,o=test' & @CRLF
   $data &= 'groupMembership: cn=A0533 Corporate Archive Viewer,ou=Applications,ou=ZEN,ou=test,o=test' & @CRLF
   $data &= '' & @CRLF
   $data &= 'dn: cn=00005,ou=Workstations,ou=ZEN,ou=test,o=test' & @CRLF
   $data &= 'groupMembership: cn=Base Applications,ou=Workstations,ou=ZEN,ou=test,o=test' & @CRLF
   $data &= 'groupMembership: cn=A0115 CMS Author,ou=Applications,ou=ZEN,ou=test,o=test' & @CRLF
   $data &= 'groupMembership: cn=A0886 Dial Connect,ou=Applications,ou=ZEN,ou=test,o=test' & @CRLF
   $data &= 'groupMembership: cn=A0025 MS Project 2000,ou=Applications,ou=ZEN,ou=test,o=test' & @CRLF
   $data &= 'groupMembership: cn=A0020 MS Access 2000,ou=Applications,ou=ZEN,ou=test,o=test' & @CRLF
   $data &= 'groupMembership: cn=A0056 ACL,ou=Applications,ou=ZEN,ou=test,o=test' & @CRLF
   $data &= 'groupMembership: cn=A0533 Corporate Archive Viewer,ou=Applications,ou=ZEN,ou=test,o=test'
#endregion   
   $regexp = '(?m)(?s)(dn:.*(\s{4}|$))'
   $arr = StringRegExp($data, $regexp, 3)
   If @error <> 0 Then ConsoleWrite("@error:=" & @error & ", @extended:=" & @extended & @LF)
   If Not IsArray($arr) Then ConsoleWrite("NOT an array" & @LF)
   For $i = 0 to UBound($arr) -1
      ConsoleWrite($arr[$i] & @LF)
   Next 
EndFunc


Thanks Smoke_N and Uten for the quick replies ;)

Gee, this StringRegExp stuff could do your head in, couldn't it lol :whistle:

I made only one slight change to your code, Uten; however, it is returning all the entries for everything in the Workstations.ldif file.

$data = FileRead("Workstations.ldif")

A full entry for one workstation in the LDIF file looks something like this (these attributes weren't shown in the original post):

CODE

dn: cn=00003,ou=Workstations,ou=ZEN,ou=test,o=test
zenwmDisableUserHistory: FALSE
zenimgCompression: 1
zenimgImageFlags: 0
zenzfdVersion: <?xml version="1.0" encoding="UTF-8"?><AgentData><Version>7.0.1.0</Version><VerWriteTime>1162507394</VerWriteTime></AgentData>
zenwmSubnetMask: 255.255.255.0
zenwmMACAddress: 00:15:C5:43:E4:42
zenwmID: a6507acdc884b8d66b144d398db7fd30
wMLastRegisteredTime: 20061102224314Z
wMNAMEComputer: 00003
wMNAMECPU: PENTIUM III
wMNAMEDNS: 00003.commerce.nsw.gov.au
wMNAMEOS: WINXP (5.1 Service Pack 2)
wMNAMEServer: PWS-IS2-NSRV
wMNAMEUser: cn=testuser,ou=AUDIT,ou=ESD,ou=test,o=test
wMNetworkAddress: 192.168.115.16
wMUserHistory: cn=ZEN7Adm1,ou=Department,ou=ZEN7Test,ou=test,o=test
wMUserHistory: cn=testuser1,ou=TIM,ou=ESD,ou=test,o=test
wMUserHistory: cn=AuditG,ou=Department,ou=ZEN7Test,ou=test,o=test
wMUserHistory: cn=testuser,ou=AUDIT,ou=ESD,ou=test,o=test
groupMembership: cn=Base Applications,ou=Workstations,ou=ZEN,ou=test,o=test
groupMembership: cn=A0738 Trim,ou=Applications,ou=Services,ou=test,o=test
groupMembership: cn=A0886 Dial Connect,ou=Applications,ou=ZEN,ou=test,o=test
groupMembership: cn=A1021 Lawpoint,ou=Applications,ou=ZEN,ou=test,o=test
groupMembership: cn=A0533 Corporate Archive Viewer,ou=Applications,ou=ZEN,ou=test,o=test
securityEquals: cn=Base Applications,ou=Workstations,ou=ZEN,ou=test,o=test
securityEquals: cn=A0738 Trim,ou=Applications,ou=Services,ou=test,o=test
securityEquals: cn=A0886 Dial Connect,ou=Applications,ou=ZEN,ou=test,o=test
securityEquals: cn=A1021 Lawpoint,ou=Applications,ou=ZEN,ou=test,o=test
securityEquals: cn=A0533 Corporate Archive Viewer,ou=Applications,ou=ZEN,ou=test,o=test
objectClass: Workstation
objectClass: computer
objectClass: device
objectClass: top
cn: 00003
ACL: 16#subtree#cn=ZEN7 Server Package:General:Workstation Import,ou=Policies,ou=ZEN,ou=test,o=test#[Entry Rights]
ACL: 1#subtree#cn=00003,ou=Workstations,ou=ZEN,ou=test,o=test#[Entry Rights]
ACL: 15#subtree#cn=00003,ou=Workstations,ou=ZEN,ou=test,o=test#[All Attributes Rights]
ACL: 3#entry#[Public]#wMNetworkAddress
ACL: 3#entry#[Public]#zenwmMACAddress
ACL: 3#entry#[Public]#zenwmSubnetMask
ACL: 3#entry#[Public]#groupMembership

and there could potentially be hundreds of these entries in the file.

I'm assuming it's the regular expression that is returning all the entries when I run the test function, so I'll have a go at working it out, but if it's glaringly obvious to any of you pros what to change, please feel free to pass it on :P

I was also thinking I could do what Smoke_N mentioned and create a couple of loops that check each line: if a dn: line is found (a dn: line means it's a new workstation), add every groupMembership: line into an array until the next dn: line is found, then start a new array for the new workstation.

Thanks again for the help (and for any in the future) ;)
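
The loop idea described above could be sketched roughly as follows. This is a minimal sketch, not a tested solution: it assumes the file is named Workstations.ldif, that no attribute value is wrapped onto a continuation line, and that "|" never appears inside a group name (it is used here as a hypothetical separator). All variable names are made up for illustration.

```autoit
#include <File.au3>

; Sketch only: collect the groupMembership: values of each workstation
Local $aLines, $aGroups
If Not _FileReadToArray('Workstations.ldif', $aLines) Then
    ConsoleWrite('Could not read Workstations.ldif' & @LF)
    Exit 1
EndIf

; Column 0 = dn of the workstation, column 1 = its memberships joined with "|".
; Sized for the worst case of one record per line.
Local $aStations[$aLines[0] + 1][2]
Local $iCount = 0, $sLine, $sValue

For $i = 1 To $aLines[0]
    $sLine = StringStripWS($aLines[$i], 3) ; trim leading and trailing whitespace
    If StringLeft($sLine, 3) = 'dn:' Then
        ; A dn: line starts a new workstation record
        $aStations[$iCount][0] = StringStripWS(StringTrimLeft($sLine, 3), 3)
        $aStations[$iCount][1] = ''
        $iCount += 1
    ElseIf $iCount > 0 And StringLeft($sLine, 16) = 'groupMembership:' Then
        $sValue = StringStripWS(StringTrimLeft($sLine, 16), 3)
        If StringInStr($sValue, 'Base Applications') Then ContinueLoop
        If $aStations[$iCount - 1][1] <> '' Then $aStations[$iCount - 1][1] &= '|'
        $aStations[$iCount - 1][1] &= $sValue
    EndIf
Next

; Print each workstation followed by its groups, one per line
For $i = 0 To $iCount - 1
    ConsoleWrite($aStations[$i][0] & @LF)
    If $aStations[$i][1] = '' Then ContinueLoop
    $aGroups = StringSplit($aStations[$i][1], '|')
    For $j = 1 To $aGroups[0]
        ConsoleWrite('    ' & $aGroups[$j] & @LF)
    Next
Next
```

Storing the memberships as one joined string per workstation keeps everything in a single 2D array; the inner StringSplit loop is where a write to Excel (one sheet or column per workstation) could go instead of ConsoleWrite.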


Yeah, the regexp stuff can be a real pain :whistle: Especially if it has been a while since you last did it. I think a well-crafted regexp will be faster in this case, but you might feel more comfortable with a loop.


Ohh, yeah, forgot...

To filter out the lines you don't want, you have to either play with non-capturing groups, or use a regexp that captures only lines starting with dn: or groupMembership: and do a manual filter in the array loop.

Something like

$regexp = '(?m)(dn:.*)$|(groupMembership:.*)$'
And then filter in a loop?
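
A variation on that idea, sketched under assumptions rather than taken from the thread: capturing the attribute name and its value as two groups makes StringRegExp (flag 3) return a flat array of name/value pairs, which a loop can then walk two elements at a time. The pattern below is a swapped-in alternative to the one above, and the file name and variable names are illustrative only.

```autoit
; Sketch only: read the whole file and keep just the dn: and groupMembership: lines
Local $sData = FileRead('Workstations.ldif')
If @error Then
    ConsoleWrite('Could not read Workstations.ldif' & @LF)
    Exit 1
EndIf

; Group 1 = attribute name, group 2 = rest of the line.
; "^" with (?m) anchors at the start of a line, so text such as
; "ACL: 3#entry#[Public]#groupMembership" is not picked up.
Local $aPairs = StringRegExp($sData, '(?m)^(dn|groupMembership):[ \t]*([^\r\n]*)', 3)
If @error Then
    ConsoleWrite('No dn: or groupMembership: lines found' & @LF)
    Exit
EndIf

Local $sCurrent = '' ; dn of the workstation currently being listed
For $i = 0 To UBound($aPairs) - 2 Step 2
    If $aPairs[$i] = 'dn' Then
        $sCurrent = $aPairs[$i + 1]
        ConsoleWrite(@LF & $sCurrent & @LF)
    ElseIf $sCurrent <> '' And Not StringInStr($aPairs[$i + 1], 'Base Applications') Then
        ConsoleWrite('    ' & $aPairs[$i + 1] & @LF)
    EndIf
Next
```

Because both groups take part in every match, the array stays aligned in pairs and there are no empty placeholder elements to filter out.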
