bmurray at snf.stanford.edu
Sun Mar 3 22:49:54 PST 2002
I took Miranda and a friend to the beach today and didn't get home until
about 8pm. I checked the logs then and noticed that problems began at
about 12:30pm today with transaction 20796, which never completed. The
clients can still start, but no heartbeats have been recorded in the
database since then. I've spent the last few hours tracking down the
problem, and I'm still looking.
We've had other problems from the moment we started the servers. I don't
know how I missed these few lines in the admmgr log:
This means that we have both old and new code running in the production
version of the admin manager. The AgentMonitor appears to be using an
updateStatus signature that doesn't exist. The compiler should have
caught this, so I suspect there is a dependency that Troy and I didn't
capture in the makefiles. A clean make should help me find this one.
However, this is complicating my search for the "real" problem, because
this error suggests that the AgentMonitor hasn't been running at all.
I'll keep searching. At some point this evening, I'll probably have
to move the logs and restart everything. I would like to watch and see
what eventually causes everything to stop, although I may already have
enough information to track this down.