pranaynanda

Extracting value in list of files using regex

11 posts in this topic

#1 ·  Posted (edited)

Hi!

 

I have a folder that contains multiple nested subfolders. The deepest folder in each branch contains a set of JBoss logs, from which I intend to extract usernames and their counts and write that info to a file. I need help fixing the broken pieces. Here's what I have:

 

#include <File.au3>
#include <Array.au3>
#include <Date.au3>
#include <ListViewEditInput.au3>
#include <GuiListView.au3>
#include <ButtonConstants.au3>
#include <EditConstants.au3>
#include <GUIConstantsEx.au3>
#include <GUIListBox.au3>
#include <ListViewConstants.au3>
#include <WindowsConstants.au3>
#include <Excel.au3>
#include <_FindInFile.au3>
#include <Array.au3>

AutoItSetOption("TrayIconDebug",1)
$Output=@DesktopDir & "/output_users.txt"

    Global $sFolderPath = FileSelectFolder("Select Folder", "","")
Global $aFileList = _FileListToArrayRec($sFolderPath, "*.*", $FLTAR_FILES, $FLTAR_RECUR, $FLTAR_SORT, $FLTAR_FULLPATH)
    #cs
    If @error = 1 Then
            MsgBox(0, "", "No Folders Found.")
            Exit
        EndIf
        If @error = 4 Then
            MsgBox(0, "", "No Files Found.")
            Exit
        EndIf
#ce

        ;For $i=0  to $aFileList[0]
            ConsoleWrite($sFolderPath &@CRLF&$aFileList[0])
            MsgBox(0,"Test",$aFileList[0])
        ;Next

        
        $FileArray=_FileReadToArray($aFileList)

        

        ;For $i = 1 To $FileList[0]
        ;   StringSplit($FileArray[$i]," ")
        ;Next

For $i=0 To UBound($FileArray)-1
$sRegEx= "((\w)(\.)(-?\d*)(\.)(\d*)(\.)(\b" & $UserNames[$i] & "\b))" 
; I also have no idea how to pull only usernames from the info but that's exactly how the regex looks like. Any help with that matter would also be appreciated.
If StringRegExp($sFileContent,$sRegEx,5) Then
        FileWriteLine($Output,$UserNames[$i])
EndIf
Next

I'm trying to debug the code in small parts, but I can't work out what fails.

Edit: I was thinking I would write the logic for the count later.

Edited by pranaynanda




#2 ·  Posted

2 hours ago, pranaynanda said:

a set of jboss logs out of which I intend to extract usernames and their counts

You might post a sample log file so we can see the content.
We currently can't guess what it looks like.

BTW flag 5 doesn't exist for StringRegExp (the valid flags are 0 to 4)  ^^


#3 ·  Posted

It looks something like this:

 

(http-0.0.0.0:8543-2) E72N7-007584.31672.01.gtlead.00197 - 2017/07/26-14:56:35,709 UTC - #SecretServerName - User authenticated: gtlead

I need to extract that part where it says 'gtlead' and get its count in the directory the file is in.

 

13 hours ago, mikell said:

BTW flag 5 doesn't exist  ^^

I can't be sure, but if you say so, then it must be true. However, it was working for me in the application that I made previously. But then, that one was based on reading an Excel sheet and finding information.

 

 


#4 ·  Posted

More details needed, please. I can see "that part where it says 'gtlead'", but what exactly is "its count"?
In your example, what precisely is the expected result?

1 hour ago, pranaynanda said:

However, it was working for me

Well, if StringRegExp was only used to do a check, note that this one:
StringRegExp($txt, $pattern, 375*2+@computername)
works nicely too - though IMHO this flag doesn't exist either (I'm pretty sure)
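
For reference, a minimal sketch of two of the documented flags, assuming the log line format posted above. The sample string and variable names are made up for illustration; the flag constants come from StringConstants.au3.

```autoit
#include <StringConstants.au3>

; Hypothetical sample line, shortened from the log excerpt posted above
Local $sLine = "UTC - #SecretServerName - User authenticated: gtlead"
Local $sPattern = "User authenticated: (\S+)"

; Flag 0 ($STR_REGEXPMATCH): returns 1 if the pattern matches, 0 otherwise
If StringRegExp($sLine, $sPattern, $STR_REGEXPMATCH) Then
    ConsoleWrite("The line contains an authentication entry" & @CRLF)
EndIf

; Flag 3 ($STR_REGEXPARRAYGLOBALMATCH): returns an array holding every captured group
Local $aUsers = StringRegExp($sLine, $sPattern, $STR_REGEXPARRAYGLOBALMATCH)
If Not @error Then ConsoleWrite("First captured user: " & $aUsers[0] & @CRLF)
```

Flag 0 is enough for a yes/no check; an array flag is needed as soon as the captured text itself is wanted.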


#5 ·  Posted (edited)

5 minutes ago, mikell said:

what exactly is "its count" ?

If gtlead occurs more than once in all those files in the directory, I want to know how many times.

Edited by pranaynanda


#6 ·  Posted (edited)

Here it is - for the concept. I wrote some comments  :)

#include <File.au3>
#include <Array.au3>

Global $sFolderPath = FileSelectFolder("Select Folder", @scriptdir,"")
Global $aFileList = _FileListToArrayRec($sFolderPath, "*.log", $FLTAR_FILES, $FLTAR_RECUR, $FLTAR_SORT, $FLTAR_FULLPATH)
;_ArrayDisplay($aFileList)

Local $sUsersList
For $i = 1 to $aFileList[0]
     $content = FileRead($aFileList[$i])
     $user = StringRegExpReplace($content, '.*User authenticated: (\S+).*', "$1")
     If not @error Then $sUsersList &= $user & @crlf 
Next
;MsgBox(0, "", $sUsersList)

$aList = StringSplit($sUsersList, @crlf, 3)
  ; create a 2D array
Local $aListCount[UBound($aList)][2], $k = 0
For $i = 0 to UBound($aList)-1
       ; count occurrences of each user in the list
    $sUsersList = StringReplace($sUsersList, $aList[$i], "")
    $count = @extended
       ; skip if count already done for this user
    If $count = 0 Then ContinueLoop
       ; fill the 2D array with users and count
    $aListCount[$k][0] = $aList[$i]
    $aListCount[$k][1] = $count
    $k += 1
Next

Redim $aListCount[$k][2]
_ArrayDisplay($aListCount)

 

Edited by mikell


#7 ·  Posted

Why does it fetch the entire content here instead of only giving usernames?

 

31 minutes ago, mikell said:
Local $sUsersList
For $i = 1 to $aFileList[0]
     $content = FileRead($aFileList[$i])
     $user = StringRegExpReplace($content, '.*User authenticated: (\w+).*', "$1")
     If not @error Then $sUsersList &= $user & @crlf
Next
MsgBox(0, "", $sUsersList)

 

 


#8 ·  Posted

Usually things happen like this

  1. You provide as much precise information as possible and ask a question
  2. My answer (and/or code) is perforce based on this information
  3. Next you say "ok, it works" and we are done, OR "it doesn't work" and then you describe the failure with a maximum of details - required

That is why all I can suggest for the moment is: try changing this line

If not @error Then $sUsersList &= $user & @crlf

to this one

If @extended > 0 Then $sUsersList &= $user & @crlf
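
A minimal sketch of why this change matters, using made-up sample lines: when the pattern does not match, StringRegExpReplace returns the input unchanged and leaves @error at 0 (it is only set for an invalid pattern), so @extended, the number of replacements performed, is what tells the two cases apart.

```autoit
; Hypothetical sample lines; only the second one contains the marker text
Local $aLines[2] = ["Server started", "UTC - #SecretServerName - User authenticated: gtlead"]
Local $sPattern = ".*User authenticated: (\S+).*"
Local $sUser, $iError, $iCount

For $i = 0 To UBound($aLines) - 1
    $sUser = StringRegExpReplace($aLines[$i], $sPattern, "$1")
    ; Save both macros right away: the next function call resets them
    $iError = @error
    $iCount = @extended
    ; First line: no replacement, the result is the untouched input
    ; Second line: one replacement, the result is the captured username
    ConsoleWrite("@error=" & $iError & "  @extended=" & $iCount & "  result=" & $sUser & @CRLF)
Next
```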

 


#9 ·  Posted

On 8/4/2017 at 8:06 PM, mikell said:

Usually things happen like this

  1. You provide as much precise information as possible and ask a question
  2. My answer (and/or code) is perforce based on this information
  3. Next you say "ok, it works" and we are done, OR "it doesn't work" and then you describe the failure with a maximum of details - required

Apologies for not having been able to provide sufficient information. Please let me try again.

I have a set of folders that contain some files. I wish to extract a set of usernames from the files; they appear in statements that look like:

 

(http-0.0.0.0:8543-2) E72N7-007584.31672.01.gtlead.00197 - 2017/07/26-14:56:35,709 UTC - #SecretServerName - User authenticated: gtlead

I wish to extract the string 'gtlead' from 'E72N7-007584.31672.01.gtlead.00197'. In my opinion, the regex for this translates to 

((\w)(\.)(-?\d*)(\.)(\d*)(\.)(\b" & $UserNames[$i] & "\b))

where $UserNames[$i] can be written as (\w). I don't know how to extract that value.

 

Right now, on executing the code you helped with, I get entire statements whereas I only need the usernames as per the regex. Please help me with that.

 

Also, out of curiosity, and because I could not understand the help file, what does '$1' mean here? 

$user = StringRegExpReplace($content, '.*User authenticated: (\S+).*', "$1")


#10 ·  Posted (edited)

15 hours ago, pranaynanda said:

I wish to extract the string 'gtlead'

The username exists in 'E72N7-007584.31672.01.gtlead.00197' and also in 'User authenticated: gtlead'. The latter is easier to get  :)
But if the statement line you wrote is not the only text in the .log files, then the regex must be fixed by adding "(?s)", which lets "." also match newlines so that ".*" can span the whole file content. This assumes that there is one log file per user

$content = "text" & @crlf & _
" (http-0.0.0.0:8543-2) E72N7-007584.31672.01.gtlead.00197 - 2017/07/26-14:56:35,709 UTC - #SecretServerName - User authenticated: gtlead " & @crlf & "end text"

$user = StringRegExpReplace($content, '(?s).*User authenticated: (\S+).*', "$1")
Msgbox(0,"", $content & @crlf & @crlf & "user is " & $user)

So what does '$1' mean here?
In a StringRegExpReplace, the pattern describes the whole string. The part to extract is the group (between parentheses), and "$1" is the backreference that contains what the group matched.
The expression in the code above means:
"replace the whole text with the run of one or more non-space characters that follows the literal 'User authenticated: ' string"
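
To illustrate how backreferences are numbered, here is a small sketch with two groups. The sample line is shortened from the log excerpt and the variable names are made up; "$1" refers to the first pair of parentheses and "$2" to the second.

```autoit
; Hypothetical shortened log line with two parts worth capturing
Local $sLine = "E72N7-007584.31672.01.gtlead.00197 - User authenticated: gtlead"

; Group 1 captures the token at the start, group 2 captures the username at the end
Local $sResult = StringRegExpReplace($sLine, "^(\S+) - User authenticated: (\S+)$", "$2 ($1)")

; Prints: gtlead (E72N7-007584.31672.01.gtlead.00197)
ConsoleWrite($sResult & @CRLF)
```

The replacement string can reorder or drop groups freely, which is why replacing everything with "$1" alone leaves just the username.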


Edit
BUT, if several users are mentioned in the same .log file, then it should rather be like this:

#include <File.au3>
#include <Array.au3>

Global $sFolderPath = FileSelectFolder("Select Folder", @scriptdir,"")
Global $aFileList = _FileListToArrayRec($sFolderPath, "*.log", $FLTAR_FILES, $FLTAR_RECUR, $FLTAR_SORT, $FLTAR_FULLPATH)
;_ArrayDisplay($aFileList)

; create global list of all usernames found
Local $sUsersList
For $i = 1 to $aFileList[0]
     $content = FileRead($aFileList[$i])
     $aUsers = StringRegExp($content, 'User authenticated: (\S+)', 3)
     If not @error Then 
        For $k = 0 to UBound($aUsers)-1
             $sUsersList &= $aUsers[$k] & @crlf 
        Next
     EndIf
Next
;MsgBox(0, "", $sUsersList)

$aList = StringSplit($sUsersList, @crlf, 3)
  ; create a 2D array
Local $aListCount[UBound($aList)][2], $k = 0
For $i = 0 to UBound($aList)-1
       ; count occurrences of each user in the list
    $sUsersList = StringReplace($sUsersList, $aList[$i], "")
    $count = @extended
       ; skip if count already done for this user
    If $count = 0 Then ContinueLoop
       ; fill the 2D array with users and count
    $aListCount[$k][0] = $aList[$i]
    $aListCount[$k][1] = $count
    $k += 1
Next

Redim $aListCount[$k][2]
_ArrayDisplay($aListCount)
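
One caveat with the counting step: StringReplace matches substrings, so a short username that is contained in a longer one (for example "gt" inside "gtlead") would inflate the count. If that can happen in your logs, a possible alternative is to sort the names and count runs of identical entries. This is only a sketch; the sample array is made up and stands in for the usernames collected by the loop above.

```autoit
#include <Array.au3>

; Hypothetical sample data standing in for the captured usernames
Local $aNames[5] = ["gtlead", "gt", "gtlead", "admin", "gt"]

; After sorting, identical names sit next to each other
_ArraySort($aNames)

Local $aCount[UBound($aNames)][2], $k = 0
For $i = 0 To UBound($aNames) - 1
    If $k > 0 Then
        ; Same name as the previous row: increment its counter and move on
        If $aCount[$k - 1][0] == $aNames[$i] Then
            $aCount[$k - 1][1] += 1
            ContinueLoop
        EndIf
    EndIf
    ; New name: start a new row with a count of 1
    $aCount[$k][0] = $aNames[$i]
    $aCount[$k][1] = 1
    $k += 1
Next

ReDim $aCount[$k][2]
_ArrayDisplay($aCount)
```

The == operator compares strings case-sensitively, so "GTLead" and "gtlead" would be counted separately here.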

 

Edited by mikell

#11 ·  Posted

Amazing. Thanks. Out of plain curiosity, how can I create a regex and capture a single term or a group of them? 
