Interesting!

Bill Murray bmurray at snf.stanford.edu
Thu Aug 2 00:15:25 PDT 2001


John,

There was something in Sunil's remote access problem description that began
to bother me: "and finally after 10 minutes gives org.omg.CORBA.COMM_FAILURE:
minor code: 2 completed: No".  I doubt it was actually 10 minutes, but it
was way too long for us to rely upon the COMM_FAILURE exception to decide
whether a remote client is behind a firewall.  (I will return to this problem
in a minute.)  Then it suddenly dawned on me why
we have periodic slowdowns in the event service.  I believe that on more
than one occasion you have speculated that these slowdowns might be related
to lots of defunct clients.  I think you're right.  If Sunil has to wait
a long time for the COMM_FAILURE, then the event service post method also
probably waits quite some time for the COMM_FAILURE to find and remove
defunct clients.  If you look at the logs, these slowdowns typically
coincide with the removal of lots of defunct clients.  The timing, for
example first thing in the morning, is consistent with the same explanation.
Here's a typical slowdown from the current log where 5 out of 33 clients
are unreachable:

2001-07-18 16:56:14 Posted event: 101 (33, 28) Elapsed time: 183318 + 623 msec
Unsubscribe all: this agent has no subscriptions.
Service queue posted this event -->
2001-07-18 16:56:40 Posted event: 102 (33, 28) Elapsed time: 158309 + 563 msec
Unsubscribe all: this agent has no subscriptions.
Unsubscribe all: this agent has no subscriptions.
Service queue posted this event -->
2001-07-18 16:58:04 Posted event: 102 (33, 28) Elapsed time: 74500 + 562 msec
Service queue posted this event -->
2001-07-18 16:57:06 Posted event: 101 (33, 28) Elapsed time: 133146 + 206 msec

How do we solve this problem?  First, determine why clients are unreachable.
If a user closes the Coral window without using "Exit", we could catch the
window/frame close and unsubscribe.  If the user logs out without closing
or exiting Coral, we could use heartbeats to remove defunct clients.  The
same would work for stale network connections.  I think it may be time
to consider implementing heartbeats.
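
As a starting point for discussion, here is a minimal sketch of the
bookkeeping a heartbeat scheme might need on the event service side.  It is
not existing Coral code: the class name HeartbeatRegistry, its methods, and
the timeout value are all hypothetical.  The idea is that each client checks
in periodically, and the service drops any client it has not heard from
within the timeout, so the post method never has to wait on a COMM_FAILURE
to discover that a client is gone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: remembers when each subscribed client last checked in. */
class HeartbeatRegistry {
    private final long timeoutMillis;
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();

    HeartbeatRegistry(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Record a heartbeat from a client at the given time (milliseconds). */
    void heartbeat(String clientId, long nowMillis) {
        lastSeen.put(clientId, nowMillis);
    }

    /** Remove and return every client whose last heartbeat is older than the timeout. */
    List<String> removeDefunct(long nowMillis) {
        List<String> removed = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastSeen.entrySet()) {
            boolean expired = nowMillis - e.getValue() > timeoutMillis;
            // remove(key, value) only succeeds if no newer heartbeat arrived meanwhile.
            if (expired && lastSeen.remove(e.getKey(), e.getValue())) {
                removed.add(e.getKey());
            }
        }
        return removed;
    }

    /** Number of clients currently considered alive. */
    int size() {
        return lastSeen.size();
    }
}
```

The service would call removeDefunct from a timer and unsubscribe whatever
it returns.  The timeout would need tuning against how often clients send
heartbeats.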

Now returning to the earlier question, how do we determine if a remote
client is behind a firewall without waiting for a COMM_FAILURE?  We
could ask remote users, or place a variable in client configuration files
(FIREWALL=true).  Any ideas?
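
To make the configuration-file option concrete, here is a rough sketch of
how a client might read such a flag.  It assumes the configuration file uses
the standard java.util.Properties key=value format, which may not match the
actual Coral client configuration; the ClientConfig class and its method
names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical sketch: reads a FIREWALL flag from key=value configuration text. */
class ClientConfig {
    /** Parse configuration text in java.util.Properties format. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** True only if FIREWALL is set to "true" (case-insensitive); defaults to false. */
    static boolean isBehindFirewall(Properties props) {
        return Boolean.parseBoolean(props.getProperty("FIREWALL", "false").trim());
    }
}
```

A client that sees the flag set could skip the direct callback attempt
entirely instead of waiting for the exception.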

The good news is that it appears that our proposed solution will work
once we solve this problem.

Good night,
Bill



