Coral died this morning ...
bmurray at snf.stanford.edu
Mon Jan 6 16:30:53 PST 2003
I was looking at the logs. I'm puzzled. From the logs it appears that
the Resource Manager could not heartbeat the Admin Manager, so the
Admin Manager assumed it was dead and restarted the servers. The Admin
Manager successfully restarted the servers. However, the Resource Manager
was still unable to contact the Admin Manager and register so the Resource
Manager shut down. At this point, no managers are registered with the
Admin Manager so everything stops. I would say this was clearly a network
problem, but how can that be if all the servers are running on rosen?
Perhaps it was an ORB failure?
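
The sequence read from the logs can be sketched as a toy simulation. This is only an illustration of the failure mode described above, not Coral's code: the class and function names, the timeout value, and the link_up callback are all hypothetical stand-ins.

```python
# Toy model of the failure sequence described above. All names here
# (AdminManager, simulate, link_up) are hypothetical, not Coral's API.

class AdminManager:
    def __init__(self, heartbeat_timeout):
        self.heartbeat_timeout = heartbeat_timeout
        self.last_seen = {}  # manager name -> time of last heartbeat

    def register(self, name, now):
        self.last_seen[name] = now

    def heartbeat(self, name, now):
        # Only registered managers can refresh their timestamp.
        if name in self.last_seen:
            self.last_seen[name] = now

    def check(self, now):
        """Drop managers whose heartbeat is overdue and return their names."""
        dead = [n for n, t in self.last_seen.items()
                if now - t > self.heartbeat_timeout]
        for n in dead:
            del self.last_seen[n]
        return dead


def simulate(link_up, steps=10, timeout=3):
    """link_up(t) says whether the Resource Manager can reach the
    Admin Manager at time t. Returns (events, registered managers)."""
    admin = AdminManager(heartbeat_timeout=timeout)
    admin.register("resource", now=0)
    resource_running = True
    events = []
    for t in range(1, steps):
        if resource_running and link_up(t):
            admin.heartbeat("resource", t)
        if admin.check(t):
            # Admin Manager assumes the Resource Manager is dead.
            events.append((t, "admin restarts servers"))
            if link_up(t):
                admin.register("resource", t)
            else:
                # Restarted Resource Manager cannot register and gives up.
                resource_running = False
                events.append((t, "resource manager shuts down"))
    return events, sorted(admin.last_seen)


if __name__ == "__main__":
    # The link between the two managers fails from t = 2 onward.
    events, registered = simulate(lambda t: t < 2)
    for event in events:
        print(event)
    print("registered managers:", registered)
```

With a healthy link the event list stays empty and the Resource Manager remains registered. Once calls stop getting through, the restart happens but registration still fails, which leaves the Admin Manager with nothing registered.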
On Mon, 6 Jan 2003, John Shott wrote:
> Bill and Mike:
> Hmmm ... the coral servers appear to have died this morning at about 9:10
> a.m. I think that the admin manager thought that the resource manager had
> died and likely tried to restart. However, its restart was unsuccessful ...
> if I am not mistaken, it may be because user "admin" (rather than root) tried
> to restart the servers. In any event, by 9:20 or so, I got calls from the lab
> and restarted the servers manually ...