Server crashes/hangs

I just wanted to bring everyone up to date on the server reliability issues we have had here at WSPRnet over the past few months. Unfortunately, for much of that time, I was not very active at either hamming or server upkeep because of work commitments. However, I am very actively on the case now.

For those who know about such things, the symptom has been that the Linux kernel on the server suddenly dies or hangs without logging any indication of impending doom beforehand. There was some stack trace information on the virtual console (this server is running on a Linode virtual machine these days), and it has taken me a while to set up full logging of console output to an external machine.

Often, these crashes have been at times of heavy load...for instance, I run a bunch of things after 0000Z to compute statistics, perform database backups, etc., and that has been a common time of failure (including tonight). However, sometimes it seems to happen at times of low load as well.

I tried running on a newer experimental kernel, but that showed no improvement. I finally got more information during the crash tonight (0018Z), which leads me to believe something is suddenly allocating huge amounts of memory, and I did find a typo in the MySQL configuration file which could cause excess memory usage. We shall see.
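
For anyone curious how a single typo in a MySQL configuration file can matter this much, here is a rough sketch in Python. It uses the common rule of thumb that worst-case memory is roughly the global buffers plus max_connections times the per-connection buffers. The setting names are real MySQL variables, but the values (and the "2G instead of 2M" slip) are made up for illustration and are not the actual settings or typo on this server.

```python
# Rough worst-case MySQL memory estimate: global buffers plus
# max_connections times the per-connection buffers.
# All values used with these helpers are hypothetical examples.

GLOBAL_KEYS = ("key_buffer_size", "innodb_buffer_pool_size", "query_cache_size")
PER_THREAD_KEYS = ("sort_buffer_size", "read_buffer_size",
                   "read_rnd_buffer_size", "join_buffer_size", "thread_stack")


def parse_size(value: str) -> int:
    """Convert a my.cnf-style size such as '16M' or '1G' to bytes."""
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    value = value.strip().upper()
    if value and value[-1] in units:
        return int(value[:-1]) * units[value[-1]]
    return int(value)


def worst_case_bytes(settings: dict) -> int:
    """Estimate the memory used if every connection fills its buffers."""
    global_buffers = sum(parse_size(settings[k]) for k in GLOBAL_KEYS if k in settings)
    per_thread = sum(parse_size(settings[k]) for k in PER_THREAD_KEYS if k in settings)
    return global_buffers + int(settings["max_connections"]) * per_thread


# Hypothetical intended settings, and the same settings with one letter wrong.
intended = {"key_buffer_size": "64M", "max_connections": "100",
            "sort_buffer_size": "2M", "read_buffer_size": "1M"}
with_typo = dict(intended, sort_buffer_size="2G")

if __name__ == "__main__":
    for label, cfg in (("intended", intended), ("with typo", with_typo)):
        print(f"{label}: {worst_case_bytes(cfg) / 1024 ** 2:.0f} MiB worst case")
```

The point is only that per-connection settings get multiplied by the number of connections, so one wrong suffix can turn a modest configuration into one that could ask for far more memory than the machine has under load.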

I also plan to increase the amount of swap space available, though during normal operation, memory usage isn't an issue. In any case, I will continue to post updates on progress or lack thereof here...the current situation is unacceptable!
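
As a rough illustration of how memory and swap headroom can be watched on a Linux box, here is a small Python sketch that parses /proc/meminfo. The helper names are my own for this example, and this is not necessarily what runs on the server. It assumes the usual "Name: value kB" layout and falls back to MemFree on older kernels that do not report MemAvailable.

```python
# Parse /proc/meminfo and report how much memory plus swap is still free.
# Helper names here are illustrative, not part of any existing tool.

def parse_meminfo(text: str) -> dict:
    """Return a dict mapping field names to their integer values (kB for most fields)."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[name.strip()] = int(fields[0])
    return info


def headroom_kib(info: dict) -> int:
    """Memory still available plus free swap, in KiB."""
    # MemAvailable is missing on older kernels, so fall back to MemFree.
    available = info.get("MemAvailable", info["MemFree"])
    return available + info["SwapFree"]


def read_meminfo(path: str = "/proc/meminfo") -> dict:
    """Read and parse the live meminfo file (Linux only)."""
    with open(path) as handle:
        return parse_meminfo(handle.read())
```

Calling headroom_kib(read_meminfo()) from a cron job every minute and appending the result to a log on another machine would show whether free memory collapses just before a hang.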