Extracting value in list of files using regex


Hi!

 

I have a folder that contains multiple nested folders. The deepest folder in each branch holds a set of JBoss logs, from which I want to extract usernames and their counts and write that information to a file. I need help fixing the broken pieces. Here's what I have:

 

#include <File.au3>
#include <Array.au3>
#include <Date.au3>
#include <ListViewEditInput.au3>
#include <GuiListView.au3>
#include <ButtonConstants.au3>
#include <EditConstants.au3>
#include <GUIConstantsEx.au3>
#include <GUIListBox.au3>
#include <ListViewConstants.au3>
#include <WindowsConstants.au3>
#include <Excel.au3>
#include <_FindInFile.au3>
#include <Array.au3>

AutoItSetOption("TrayIconDebug",1)
$Output=@DesktopDir & "/output_users.txt"

    Global $sFolderPath = FileSelectFolder("Select Folder", "","")
Global $aFileList = _FileListToArrayRec($sFolderPath, "*.*", $FLTAR_FILES, $FLTAR_RECUR, $FLTAR_SORT, $FLTAR_FULLPATH)
    #cs
    If @error = 1 Then
            MsgBox(0, "", "No Folders Found.")
            Exit
        EndIf
        If @error = 4 Then
            MsgBox(0, "", "No Files Found.")
            Exit
        EndIf
#ce

        ;For $i=0  to $aFileList[0]
            ConsoleWrite($sFolderPath &@CRLF&$aFileList[0])
            MsgBox(0,"Test",$aFileList[0])
        ;Next

        
        $FileArray=_FileReadToArray($aFileList)

        

        ;For $i = 1 To $FileList[0]
        ;   StringSplit($FileArray[$i]," ")
        ;Next

For $i=0 To UBound($FileArray)-1
$sRegEx= "((\w)(\.)(-?\d*)(\.)(\d*)(\.)(\b" & $UserNames[$i] & "\b))" 
; I also have no idea how to pull only usernames from the info but that's exactly how the regex looks like. Any help with that matter would also be appreciated.
If StringRegExp($sFileContent,$sRegEx,5) Then
        FileWriteLine($Output,$UserNames[$i])
EndIf
Next

I'm trying to debug the code in small parts but I can't comprehend what fails.

 

Edit: I was thinking maybe I would write the logic for count later

Edited by pranaynanda

2 hours ago, pranaynanda said:

a set of jboss logs out of which I intend to extract usernames and their counts

You might post a sample log file, so we can see the content
We currently can't guess what it looks like

BTW flag 5 doesn't exist  ^^
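
For reference, the documented StringRegExp flags run from 0 to 4. A minimal sketch of the three most relevant here, assuming the named constants from StringConstants.au3 and a made-up one-line input:

```autoit
#include <StringConstants.au3>

; Sample input and pattern are illustrative only
Local $sLine = "User authenticated: gtlead"
Local $sPattern = "User authenticated: (\S+)"

; Flag 0 ($STR_REGEXPMATCH, the default): returns 1 on a match, 0 otherwise
If StringRegExp($sLine, $sPattern, $STR_REGEXPMATCH) Then ConsoleWrite("match found" & @CRLF)

; Flag 1 ($STR_REGEXPARRAYMATCH): array holding the captured groups of the first match
Local $aFirst = StringRegExp($sLine, $sPattern, $STR_REGEXPARRAYMATCH)
If Not @error Then ConsoleWrite("first match: " & $aFirst[0] & @CRLF)

; Flag 3 ($STR_REGEXPARRAYGLOBALMATCH): array holding the captured groups of every match
Local $aAll = StringRegExp($sLine & @CRLF & $sLine, $sPattern, $STR_REGEXPARRAYGLOBALMATCH)
If Not @error Then ConsoleWrite("number of matches: " & UBound($aAll) & @CRLF)
```

For a plain "does it match" test inside an If statement, flag 0 (or no flag at all) is enough.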


It looks something like this:

 

(http-0.0.0.0:8543-2) E72N7-007584.31672.01.gtlead.00197 - 2017/07/26-14:56:35,709 UTC - #SecretServerName - User authenticated: gtlead

I need to extract that part where it says 'gtlead' and get its count in the directory the file is in.

 

13 hours ago, mikell said:

BTW flag 5 doesn't exist  ^^

I can't be sure, but if you say so, then it must be true. However, it was working for me in an application I made previously, although that one was based on reading an Excel sheet and finding information in it.

 

 


More details needed, please. I can see "that part where it says 'gtlead'", but what exactly is "its count"?
In your example, what precisely is the expected result?

1 hour ago, pranaynanda said:

However, it was working for me

Well, if StringRegExp was only used to do a check, note that this one :
StringRegExp($txt, $pattern, 375*2+@computername)
works nicely too - though IMHO this flag doesn't exist either (I'm pretty sure)


Here it is - for the concept. I wrote some comments  :)

#include <File.au3>
#include <Array.au3>

Global $sFolderPath = FileSelectFolder("Select Folder", @scriptdir,"")
Global $aFileList = _FileListToArrayRec($sFolderPath, "*.log", $FLTAR_FILES, $FLTAR_RECUR, $FLTAR_SORT, $FLTAR_FULLPATH)
;_ArrayDisplay($aFileList)

Local $sUsersList
For $i = 1 to $aFileList[0]
     $content = FileRead($aFileList[$i])
     $user = StringRegExpReplace($content, '.*User authenticated: (\S+).*', "$1")
     If not @error Then $sUsersList &= $user & @crlf 
Next
;MsgBox(0, "", $sUsersList)

$aList = StringSplit($sUsersList, @crlf, 3)
  ; create a 2D array
Local $aListCount[UBound($aList)][2], $k = 0
For $i = 0 to UBound($aList)-1
       ; count occurrences of each user in the list
    $sUsersList = StringReplace($sUsersList, $aList[$i], "")
    $count = @extended
       ; skip if count already done for this user
    If $count = 0 Then ContinueLoop
       ; fill the 2D array with users and count
    $aListCount[$k][0] = $aList[$i]
    $aListCount[$k][1] = $count
    $k += 1
Next

Redim $aListCount[$k][2]
_ArrayDisplay($aListCount)

 

Edited by mikell

Why does it fetch the entire content here instead of only giving usernames?

 

31 minutes ago, mikell said:
Local $sUsersList
For $i = 1 to $aFileList[0]
     $content = FileRead($aFileList[$i])
     $user = StringRegExpReplace($content, '.*User authenticated: (\w+).*', "$1")
     If not @error Then $sUsersList &= $user & @crlf
Next
MsgBox(0, "", $sUsersList)

 

 


Usually things happen like this

  1. You provide information that is as precise as possible and ask a question
  2. My answer (and/or code) is perforce based on this information
  3. Next you say "ok, it works" and we are done, OR "it doesn't work" and then you describe the failure with a maximum of details - required

That is why all I can suggest for the moment is: try changing this line

If not @error Then $sUsersList &= $user & @crlf

to this one

If @extended > 0 Then $sUsersList &= $user & @crlf
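
The reasoning behind this change: StringRegExpReplace reports a bad pattern through @error, but when the pattern simply does not match it hands back the input unchanged, and the number of replacements made is stored in @extended. A small sketch with made-up one-line inputs:

```autoit
Local $sPattern = '.*User authenticated: (\S+).*'

; No match: the input comes back unchanged and @extended is 0
Local $sNoMatch = StringRegExpReplace("some unrelated log line", $sPattern, "$1")
Local $iErr = @error, $iCount = @extended
ConsoleWrite("error=" & $iErr & " replacements=" & $iCount & " result=" & $sNoMatch & @CRLF)

; Match: the whole line is replaced by the captured group and @extended is 1
Local $sMatch = StringRegExpReplace("x - User authenticated: gtlead", $sPattern, "$1")
$iErr = @error
$iCount = @extended
ConsoleWrite("error=" & $iErr & " replacements=" & $iCount & " result=" & $sMatch & @CRLF)
```

So a test on @error alone lets the complete file content through whenever a file has no matching line.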

 


On 8/4/2017 at 8:06 PM, mikell said:

Usually things happen like this

  1. You provide information that is as precise as possible and ask a question
  2. My answer (and/or code) is perforce based on this information
  3. Next you say "ok, it works" and we are done, OR "it doesn't work" and then you describe the failure with a maximum of details - required

Apologies for not having been able to provide sufficient information. Please let me try again.

I have a set of folders that contain some files. I wish to extract the usernames that appear in those files in statements that look like:

 

(http-0.0.0.0:8543-2) E72N7-007584.31672.01.gtlead.00197 - 2017/07/26-14:56:35,709 UTC - #SecretServerName - User authenticated: gtlead

I wish to extract the string 'gtlead' from 'E72N7-007584.31672.01.gtlead.00197'. In my opinion, the regex for this translates to 

((\w)(\.)(-?\d*)(\.)(\d*)(\.)(\b" & $UserNames[$i] & "\b))

where $UserNames[$i] can be written as (\w). I don't know how to extract that value.

 

Right now, on executing the code you helped with, I get entire statements whereas I only need the usernames as per the regex. Please help me with that.

 

Also, out of curiosity, and because I could not understand the help file, what does '$1' mean here? 

$user = StringRegExpReplace($content, '.*User authenticated: (\S+).*', "$1")

15 hours ago, pranaynanda said:

I wish to extract the string 'gtlead'

The username exists in 'E72N7-007584.31672.01.gtlead.00197' and also in 'User authenticated: gtlead'. The latter is easier to get  :)
But if the statement line you wrote is not the only text in the .log files, then the regex must be fixed by adding "(?s)", which lets "." match newlines too, so the pattern can consume the whole file content. This assumes that there is one log file per user

$content = "text" & @crlf & _
" (http-0.0.0.0:8543-2) E72N7-007584.31672.01.gtlead.00197 - 2017/07/26-14:56:35,709 UTC - #SecretServerName - User authenticated: gtlead " & @crlf & "end text"

$user = StringRegExpReplace($content, '(?s).*User authenticated: (\S+).*', "$1")
Msgbox(0,"", $content & @crlf & @crlf & "user is " & $user)

So what does '$1' mean here?
In a StringRegExpReplace, the pattern describes the whole string. The part to keep is the group (between parentheses), and "$1" is the backreference that contains what the group matched
The expression in the code above means:
"replace the whole text with the run of one or more non-space characters that follows the literal 'User authenticated: ' string"

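The same backreference mechanism also works on the session token itself. A rough sketch, assuming every token has the five-part layout of the single sample posted above (that layout is a guess) and that usernames contain only word characters:

```autoit
Local $sToken = "E72N7-007584.31672.01.gtlead.00197"

; Five groups, guessed from one sample: id . number . number . username . number
Local $sPattern = "^(\w+-\d+)\.(\d+)\.(\d+)\.(\w+)\.(\d+)$"

; "$4" keeps only what the fourth group matched
Local $sUser = StringRegExpReplace($sToken, $sPattern, "$4")
ConsoleWrite($sUser & @CRLF) ; prints gtlead

; Several backreferences can be combined in one replacement string
Local $sBoth = StringRegExpReplace($sToken, $sPattern, "$4 ($1)")
ConsoleWrite($sBoth & @CRLF) ; prints gtlead (E72N7-007584)
```

Usernames containing dots or hyphens would need a different fourth group, which is one more reason the 'User authenticated: ' text is the easier target.
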

Edit
BUT, if there are several users mentioned in the same .log file then it should rather be like this

#include <File.au3>
#include <Array.au3>

Global $sFolderPath = FileSelectFolder("Select Folder", @scriptdir,"")
Global $aFileList = _FileListToArrayRec($sFolderPath, "*.log", $FLTAR_FILES, $FLTAR_RECUR, $FLTAR_SORT, $FLTAR_FULLPATH)
;_ArrayDisplay($aFileList)

; create global list of all usernames found
Local $sUsersList
For $i = 1 to $aFileList[0]
     $content = FileRead($aFileList[$i])
     $aUsers = StringRegExp($content, 'User authenticated: (\S+)', 3)
     If not @error Then 
        For $k = 0 to UBound($aUsers)-1
             $sUsersList &= $aUsers[$k] & @crlf 
        Next
     EndIf
Next
;MsgBox(0, "", $sUsersList)

$aList = StringSplit($sUsersList, @crlf, 3)
  ; create a 2D array
Local $aListCount[UBound($aList)][2], $k = 0
For $i = 0 to UBound($aList)-1
       ; count occurrences of each user in the list
    $sUsersList = StringReplace($sUsersList, $aList[$i], "")
    $count = @extended
       ; skip if count already done for this user
    If $count = 0 Then ContinueLoop
       ; fill the 2D array with users and count
    $aListCount[$k][0] = $aList[$i]
    $aListCount[$k][1] = $count
    $k += 1
Next

Redim $aListCount[$k][2]
_ArrayDisplay($aListCount)
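
One caveat with counting through StringReplace: it matches substrings, so a short username (a hypothetical 'gt', say) would also be counted inside 'gtlead'. A minimal sketch of an alternative, assuming the Windows Scripting.Dictionary COM object is available; the sample names are made up, and the output path is the one from the first post:

```autoit
; Made-up sample; in the script above this would be the list of extracted usernames
Local $aUsers[5] = ["gt", "gtlead", "gtlead", "admin", "gt"]

Local $oCounts = ObjCreate("Scripting.Dictionary")
If Not IsObj($oCounts) Then Exit
$oCounts.CompareMode = 1 ; text compare, so "GTLead" and "gtlead" share one entry

; Whole-name counting: each username is a dictionary key, its value is the count
For $i = 0 To UBound($aUsers) - 1
    If $aUsers[$i] = "" Then ContinueLoop
    If $oCounts.Exists($aUsers[$i]) Then
        $oCounts.Item($aUsers[$i]) = $oCounts.Item($aUsers[$i]) + 1
    Else
        $oCounts.Add($aUsers[$i], 1)
    EndIf
Next

; Write one "username <tab> count" line per user
Local $sOutput = @DesktopDir & "\output_users.txt"
Local $hFile = FileOpen($sOutput, 2) ; 2 = overwrite mode
If $hFile = -1 Then
    MsgBox(0, "Error", "Cannot open " & $sOutput)
    Exit
EndIf
Local $aKeys = $oCounts.Keys()
For $i = 0 To UBound($aKeys) - 1
    FileWriteLine($hFile, $aKeys[$i] & @TAB & $oCounts.Item($aKeys[$i]))
Next
FileClose($hFile)
```

For counts per directory rather than overall, the dictionary key could combine the file's folder with the username.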

 

Edited by mikell
