steve8tch

Script just exits

I have an application that has grown substantially over recent weeks. It is a control and reporting application running on ~1100 headless computers.

The application does 2 things.

It listens for requests for information, and it runs checks at regular intervals on process and application configuration information.

We feed the results of these checks to a reporting website; effectively, this application acts as our eyes and ears on these computers.

But...

Recently, some of these instances have started to stop without warning, typically one per day at present. In the environment I manage, 1 failure in 1000 instances is not going to break anything, but from a coding point of view this failure rate is too high.

As you would imagine, I have added a lot of debug code to find where the issue is, and my findings may surprise you. It turns out the script was not failing in any of my functions or in any of the socket functions, but in a "Sleep" statement.

Look at the code below.

```autoit
; Assumes TCPStartup() has already been called earlier in the script.
$MainSocket = TCPListen($g_IP, $g_port) ; Create a listening socket
If $MainSocket = -1 Then
    MsgBox(0, "error", "NTserver failed to create a network listening socket" & @CRLF & "error : " & @error & @CRLF)
    Exit
EndIf
While 1 ; Poll the listening socket for incoming connections
    FileWrite($h_debugLog, 1)
    $ConnectedSocket = TCPAccept($MainSocket) ; Returns -1 when no connection is pending
    FileWrite($h_debugLog, 2)
    If $ConnectedSocket = -1 Then
        FileWrite($h_debugLog, 3)
        Sleep(20) ; Idle wait between polls
        FileWrite($h_debugLog, 4)
        If $min = @MIN Then
            FileWrite($h_debugLog, 5)
            ContinueLoop ; Minute unchanged: keep polling
        Else
            FileWrite($h_debugLog, 6)
            .........
        EndIf
    Else
        ............
        ; Do some work
        ; 4000+ lines of code
    EndIf
WEnd
```

I have put some debug markers around the main socket polling loop, and in every case, when the code fails, the last marker written to the debug log file is "3". It never gets to "4".

In other words, the script seems to exit spontaneously during the Sleep(20) call.

There is no AutoIt error dialog.

It just quietly exits.

I don't understand this.

Does anyone have any insight into what might be going on?

Thanks for your interest and help.

If I understand you right, this is a long-running program. If so, it sounds like a resource is being exhausted. Use Process Explorer or a similar program, look at the Handles count and the other miscellaneous statistics, and see if you can spot something abnormal. Compare a long-running instance against one that has not been running very long.

Valik, thanks for the helpful reply.

I have been making lots of changes to this code recently (adding some functionality, but mainly trying to track down this issue), so the current code has not run for very long. Failures typically first occur after about 24 hours of run time.

I have Process Explorer on all the PCs; I will check the handle count on a number of them and report back.

Valik

I don't record all the FileOpen or FileFindFirstFile handles, but I do record the debug file handle.

Here is a snippet from the logs.

Opened new debuglog file - hnd :1

The handle number is "1".

I have rechecked this several times: no FileOpen or FileFindFirstFile handles are left open; they are all closed after use.

Process Explorer shows the following:

For the instance that has been up the longest:
Handles: 137, Thread count: 4

For an instance I have just restarted:
Handles: 137, Thread count: 4

I have also seen some with:
Handles: 138, Thread count: 4

I am not sure where all the handles come from. I have perhaps 20 FileOpen/FileFindFirstFile calls that run on a regular basis, and as I said earlier, they all get closed.

Check all the other numbers and see if anything looks weird when comparing a long-running versus short-running copy of the program.

Valik, an update.

A pattern emerged, and I was able to match the failures to an operation completed by another application.

This other application invokes a COM control to manipulate a form that can interact with the desktop. When that application was told to exit, the OS seems to have sent out a message similar to a logoff message. I noticed that another application (ccApp.exe, part of the Symantec AV suite) also exits at that moment, as if the OS were closing applications with window hooks in preparation for a logoff.

When I looked through my code, it turned out I still had a few GUI functions in it, although they were never called (a problem of reusing old code!). I presume that because those functions were present, AutoIt prepares the script for windowed interaction, even though the script was being run as a service without desktop interaction.

Anyway, removing those lines of unused code allowed the script to survive these events.

I think the code always appeared to fail at the same point simply because of probability: Sleep(20) was effectively the loop's wait state, and the script spends about 99% of its time in that listening loop.

So AutoIt is fine; there is no issue with it.
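To confirm a diagnosis like this directly, rather than inferring it from the last debug marker written, one option is to register an exit handler that records how the script ended. This is a minimal sketch, assuming AutoIt v3.3.x, where OnAutoItExitRegister and the @exitMethod macro are available; the log path ($g_sExitLog) and the handler name (_LogExitReason) are hypothetical. An @exitMethod value of 3 (user logoff) or 4 (Windows shutdown) would be consistent with the logoff-message theory. If the process is killed outright (for example via TerminateProcess), no exit handler runs at all, so a missing log entry is informative too.

```autoit
; Sketch only; assumes AutoIt v3.3.x (OnAutoItExitRegister, @exitMethod).
; Hypothetical log path; adjust to match the real debug log location.
Global Const $g_sExitLog = @ScriptDir & "\exit_reason.log"

; Register the handler once, near the top of the script.
OnAutoItExitRegister("_LogExitReason")

; Hypothetical handler: translate @exitMethod into text and append it to the log.
Func _LogExitReason()
    Local $sReason
    Switch @exitMethod
        Case 0
            $sReason = "natural end of script"
        Case 1
            $sReason = "Exit statement"
        Case 2
            $sReason = "tray icon Exit clicked"
        Case 3
            $sReason = "user logoff"
        Case 4
            $sReason = "Windows shutdown"
        Case Else
            $sReason = "unknown (" & @exitMethod & ")"
    EndSwitch
    ; Mode 1 = append. Open and close immediately so the line reaches disk.
    Local $hLog = FileOpen($g_sExitLog, 1)
    If $hLog <> -1 Then
        FileWriteLine($hLog, @YEAR & "-" & @MON & "-" & @MDAY & " " & @HOUR & ":" & @MIN & ":" & @SEC & _
                " exit method=" & $sReason & " exit code=" & @exitCode)
        FileClose($hLog)
    EndIf
EndFunc
```

Because the handler opens and closes the file for each write, the entry is not left sitting in a buffer when the process goes away.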

Thanks :)

PS: I look forward to playing with the beta.

I think you need to provide more detail. Was this unused GUI set up so that closing the GUI would close the entire script?

It was all very odd.

These are all headless computers, about 1100 of them.

Engineers and IT staff use RDP to connect to them when they need to do some work. They all use one account. The computers never get logged off or logged on with a different account.

A few weeks ago, I added some code that would allow us to send a message to the "screen" saying "sorry, I am going to bump you off". I used a simple method: write the message to a text file and display it in Notepad. This meant changing the service so that it could interact with the desktop.

But the application then failed at a rate of one per day, and I assumed that either some sort of logoff activity was killing the application because it was interacting with the desktop (there were definitely no actual logoffs), or some subtle interaction was killing off GUI applications under certain circumstances.

So I commented out the function call and reconfigured the application to run under the System account, NOT interacting with the desktop.

At this point the application was still failing at a rate of one per day, so I added a huge amount of logging to try to catch the event, and that is when I made my first post in the forum.

The problem was solved by removing the code that referred to the Notepad window. Remember, this code was never called, but the function still existed in the script.

I was using WinExists, WinMove and WinClose.

Steve
