steve8tch Posted May 17, 2009

I have an application that has grown substantially over recent weeks. It is a control and reporting application running on ~1100 headless computers. The application does 2 things: it listens for requests for information, and at regular time intervals it runs checks on process and application configuration information. We feed the information received from these checks to a reporting website - effectively, this application acts as our eyes and ears on these computers.

But... recently some of these instances suddenly started to just stop. Typically 1 per day at present. In the environment I manage, 1 in 1000 instances failing is not going to break it - but from a coding point of view this failure rate is too high.

As you would imagine, I have put a whole lot of debug code in, trying to find where the issue is, and my findings will surprise you. It turns out it was not failing in any of my functions or in any of the socket functions - but in a "Sleep" statement. Look at the code below.

    $MainSocket = TCPListen($g_IP, $g_port) ; Create a listening "SOCKET"
    If $MainSocket = -1 Then
        MsgBox(0, "error", "NTserver failed to create a network listening socket" & @CRLF & "error : " & @error & @CRLF)
        Exit
    EndIf
    While 1 ; Poll the socket, listening for connections
        FileWrite($h_debugLog, 1)
        $ConnectedSocket = TCPAccept($MainSocket)
        FileWrite($h_debugLog, 2)
        If $ConnectedSocket = -1 Then ; No connection waiting
            FileWrite($h_debugLog, 3)
            Sleep(20)
            FileWrite($h_debugLog, 4)
            If $min = @MIN Then
                FileWrite($h_debugLog, 5)
                ContinueLoop
            Else
                FileWrite($h_debugLog, 6)
                .........
            EndIf
        Else
            ............ ; Do some work
            ; 4000+ lines of code
        EndIf
    WEnd

I have put some debug markers around the main socket polling loop and in every case, when the code fails, the last marker written to the debug log file is the number "3". It never gets to the number "4".

In other words - the code just seems to spontaneously exit during the "Sleep(20)" call. There is no AutoIt application error box. It just quietly exits. I don't understand this. Does anyone have an insight as to what might be going on?

Thanks for your interest and help.
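The conclusion above depends on every marker reaching the disk before the process disappears. A minimal sketch of a marker helper that flushes after each write, assuming writes through a file handle may be buffered; `_DebugMark` and the log file name are hypothetical and not part of the original script:

```autoit
; Hypothetical helper: write one debug marker and push it to disk at once,
; so the last marker in the log reflects the last line actually reached.
Func _DebugMark($hLog, $sMark)
    FileWrite($hLog, $sMark)
    FileFlush($hLog) ; flush the file buffer to disk
EndFunc

; Usage sketch: the log location is an assumption.
Local $h_debugLog = FileOpen(@ScriptDir & "\debug.log", 1) ; 1 = append mode
If $h_debugLog = -1 Then Exit 1 ; could not open the log

_DebugMark($h_debugLog, "3")
Sleep(20)
_DebugMark($h_debugLog, "4")

FileClose($h_debugLog)
```

With a flush after each marker, a log that ends in "3" is stronger evidence that the process ended during the Sleep call rather than a little later with unwritten output.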
Valik Posted May 17, 2009

If I understand you right, this is a long-running program. If so, then it sounds like a resource is exhausted. Use Process Explorer or a similar program and look at the Handles count and the other miscellaneous statistics and see if you can spot something abnormal. This should obviously be run against a long-running instance compared to an instance that has not been running very long.
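On 1100 headless machines, the comparison suggested above could also be collected by the script itself. A rough sketch, assuming the Win32 function `GetProcessHandleCount` in kernel32.dll is called through `DllCall`; `_GetOwnHandleCount` and the log file name are hypothetical:

```autoit
; Hypothetical helper: return the number of handles this process has open,
; or -1 (with @error set) if either API call fails.
Func _GetOwnHandleCount()
    ; GetCurrentProcess returns a pseudo-handle that does not need closing.
    Local $aProc = DllCall("kernel32.dll", "handle", "GetCurrentProcess")
    If @error Then Return SetError(1, 0, -1)

    Local $aCount = DllCall("kernel32.dll", "bool", "GetProcessHandleCount", _
            "handle", $aProc[0], "dword*", 0)
    If @error Or Not $aCount[0] Then Return SetError(2, 0, -1)

    Return $aCount[2] ; the second API parameter receives the count
EndFunc

; Usage sketch: append one timestamped sample (file name is an assumption).
Local $aMem = ProcessGetStats(-1, 0) ; current process, memory statistics
Local $sWorkingSet = "n/a"
If IsArray($aMem) Then $sWorkingSet = $aMem[0]

FileWriteLine(@ScriptDir & "\resource_stats.log", _
        @YEAR & "-" & @MON & "-" & @MDAY & " " & @HOUR & ":" & @MIN & ":" & @SEC & _
        " handles=" & _GetOwnHandleCount() & " workingset=" & $sWorkingSet)
```

Sampling this once per minute in the polling loop would show whether the handle count or working set climbs over a 24 hour run.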
steve8tch Posted May 17, 2009

Valik, thanks for the helpful reply. I have been making loads of changes to this code recently (adding some additional functionality, but mainly trying to track down this issue), so the current code has not been allowed to run for too long. The first instances of the code failing typically occur after 24 hours of run time. I have Process Explorer on all the PCs - I will look at the handle count on a number of them and report back.
steve8tch Posted May 17, 2009

Valik, I don't record all the FileOpen or FileFindFirstFile handles - but I do record the debug file handle. Here is a snippet from the logs.

    Opened new debuglog file - hnd :1

The handle number is "1". I have rechecked (a number of times) - I have no FileOpen or FileFindFirstFile handles left open - they are all closed after use.

Process Explorer shows the following information.

For the instance that has been up the longest:
    Handles :137
    Thread count :4
For an instance I have just restarted:
    Handles :137
    Thread count :4
I have seen some with:
    Handles :138
    Thread count :4

I am not too sure where all the handles come from - I have maybe 20 calls which use FileOpen / FileFindFirstFile that get called on a regular basis - as I said earlier, all these get closed.
Valik Posted May 17, 2009

Check all the other numbers and see if anything looks weird when comparing a long-running versus short-running copy of the program.
steve8tch Posted May 22, 2009

Valik - an update. A pattern emerged, and I was able to track the failures down to coincide with an operation being completed by another application. This other application invokes a COM control to manipulate a form that can interact with the desktop, and when the command was given to this application to exit, it (the OS) seems to have sent out a message similar to a logoff message. I noticed that another application (ccApp.exe - part of the Symantec AV suite) also exits - as if the OS was closing applications with windows hooks in preparation for a logoff.

It turns out, when I looked in my code, that I had a few GUI functions in there (although they were never called - a problem of re-using old code !!!). I "presume" that because those functions were in the code, AutoIt must prepare the script for windowed interaction (even though the code was being run as a service without desktop interaction). Anyway - removing those lines of unused code allowed the script to survive these events.

I think the reason why the code was always failing at the same point was just probability. The Sleep(20) was effectively the loop wait state, and the script spends 99% of its time in that listening loop.

Anyway - AutoIt is good - no issue. Thanks

ps I look forward to playing with the beta
Valik Posted May 22, 2009

I think you need to provide more detail. Was this unused GUI set up so that closing the GUI would close the entire script?
steve8tch Posted May 22, 2009

It was all very odd. These are all headless computers, ~1100 of them. Engineers and IT need to use RDP to connect to them if they need to do some work. They all use one account. The computers never get logged off, or logged on with a different account.

A few weeks ago I had added some code that would allow us to send a message to the "screen" saying "sorry - I am going to bump you off". I had used a simple method of writing out a message in a text file and displaying it in Notepad - this meant changing the way the service ran so that it could interact with the desktop. But the application failed at the rate of 1 per day, and I assumed it was some sort of logoff activity that was killing off the application because it was running with desktop interaction (there were definitely no actual logoffs), or some subtle interaction that was killing off GUI applications under certain circumstances.

So I commented out the function call and reconfigured the application to run as the system account, NOT interacting with the desktop. At this point I still had the application failing at the rate of 1 per day, so I put in a huge amount of logging to try and catch an event - and at that point I put my first post into the forum.

The solution to the problem came about by removing the code that referred to the Notepad window. Remember this code was not called - but the function did still exist in the code. I was using WinExist, WinMove, WinClose.

Steve
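One way to check the logoff hypothesis directly is to have the script record why it is exiting. A minimal sketch using `OnAutoItExitRegister` and the `@exitMethod` macro from current AutoIt releases; the function names and the log file name are hypothetical, and there is no guarantee the handler runs if the process is terminated forcibly:

```autoit
; Hypothetical helper: translate an @exitMethod value into readable text.
Func _ExitMethodName($iMethod)
    Switch $iMethod
        Case 0
            Return "natural end of script"
        Case 1
            Return "Exit statement"
        Case 2
            Return "tray menu exit"
        Case 3
            Return "user logoff"
        Case 4
            Return "Windows shutdown"
        Case Else
            Return "unknown (" & $iMethod & ")"
    EndSwitch
EndFunc

; Exit handler: append the exit reason to a log (file name is an assumption).
Func _LogExitReason()
    FileWriteLine(@ScriptDir & "\exit_reason.log", _
            @YEAR & "-" & @MON & "-" & @MDAY & " " & @HOUR & ":" & @MIN & ":" & @SEC & _
            " exit method: " & _ExitMethodName(@exitMethod) & _
            ", exit code: " & @exitCode)
EndFunc

; Register the handler once, near the top of the script.
OnAutoItExitRegister("_LogExitReason")
```

If the quiet exits were being triggered by a logoff-style message, a log entry reading "user logoff" on a machine where nobody logged off would point at that mechanism rather than at the Sleep call.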