Bug 667835

Summary: Dropped connections are not handled correctly in some cases
Product: Red Hat Enterprise Linux 6 Reporter: Lon Hohberger <lhh>
Component: dbusAssignee: David King <dking>
Status: CLOSED WONTFIX QA Contact: Desktop QE <desktop-qa-list>
Severity: low Docs Contact:
Priority: low    
Version: 6.0   
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-01-05 18:58:47 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Fix
none
Test case none

Description Lon Hohberger 2011-01-06 22:52:04 UTC
Description of problem:

If the dbus-daemon dies or is killed, it's possible for certain clients to never notice.

Typically, you can figure out that the remote has closed its half of the connection by checking POLLHUP, but the dbus client library does not check for this, it only checks for POLLERR.


Version-Release number of selected component (if applicable): 1.2.24-3.el6

How reproducible: 100%


Steps to Reproduce:
1.  Connect an application which only emits signals.
2.  Kill dbus-daemon
3.  Emit a couple of signals in the client
4.  Check for dbus_connection_get_is_connected()
  
Actual results:

* Every subsequent signal emitted allocates memory, which could eventually run the system out of RAM.
* dbus_connection_get_is_connected() _never_ returns false.

Expected results:

* No memory leak
* Client low-level transport connection structures set to disconnected.  That is, "dbus_connection_get_is_connected()" should return FALSE after the dbus-daemon has died.

Additional info:

This can be worked around in clients by each one polling the dbus connection file descriptor and checking for POLLUP, but should not be necessary, and the dbus documentation expressly disdains doing this sort of thing.

The attached patch resolves the issue.

Comment 2 Lon Hohberger 2011-01-06 22:53:45 UTC
Created attachment 472155 [details]
Fix

Comment 3 RHEL Program Management 2011-01-07 16:01:36 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.

Comment 4 Lon Hohberger 2011-01-07 19:12:15 UTC
Created attachment 472280 [details]
Test case

Compile:

  gcc -o dbus-bug dbus-bug.c `pkg-config dbus-1 --cflags` `pkg-config dbus-1 --libs` -D_GNU_SOURCE

usage:

  ./dbus-bug [w]

  (if argv[1] is present, it enables a poll() workaround which is similar
   to the patch provided)

Expected behavior:

   Program should exit after dbus-daemon is killed.

Behavior with unpatched dbus-1.2.24-3.el6:

   Program never exits.

Behavior with patched dbus-1.2.24-3.el6:

   Program exits after the dbus-daemon has been killed.

Behavior in both cases when workaround is enabled:

   Program exits after the dbus-daemon has been killed.

We can use the workaround as provided in this test case if required, however, since we'll have to duplicate it several times (once in each daemon), it makes more sense to fix the dbus library.

Comment 5 Lon Hohberger 2011-01-07 19:15:37 UTC
BTW, if you comment out the following line in dbus-bug.c:

    dbus_connection_set_exit_on_disconnect(dbc, FALSE);

... the behavior of the program does not change in any test case.

Comment 6 Lon Hohberger 2011-01-07 20:00:00 UTC
dbus_connection_flush() detects dbus-daemon's death in a timely manner as well.

Comment 7 Lon Hohberger 2011-01-07 20:22:51 UTC
http://dbus.freedesktop.org/doc/api/html/group__DBusConnection.html#ga580d8766c23fe5f49418bc7d87b67dc6

^^ Also can be used in a thread to detect connection death.

Comment 8 Colin Walters 2011-01-07 21:02:50 UTC
(In reply to comment #6)
> dbus_connection_flush() detects dbus-daemon's death in a timely manner as well.

If you're not using a mainloop to talk to dbus (as "dbus-bug" isn't), then I think the right thing to do is call dbus_connection_flush() after you're done with any dbus_connection_send()s.  This would be line 148 of dbus-bug.c.

The docs for dbus_connection_send() do say that a flush() is unnecessary *if* you're using a mainloop, but they don't make it clear that it really *is* necessary if you're not, otherwise the state is sort of undefined.

I'm not entirely sure about your patch; it smells right but I'd need to truly understand what it could possibly change.

Can you confirm if you add the dbus_connection_flush() that the test case works as expected?

Comment 9 Lon Hohberger 2011-01-10 23:06:05 UTC
dbus_connection_flush() works, as noted in comment #6.

We went the other route for now, adding a thread which calls dbus_connection_read_write() in a loop, which also picks up if the dbus-daemon dies.

I assume both at some level do some sort of non-POLLHUP error checking if the dbus-daemon died (ex: "select() said there was data ready to read, but read returned 0 -> oops, connection died").  I didn't investigate that much.

So, this might be a case of WONTFIX - but in talking with others, not checking POLLHUP seemed to be a bug, so I left this open.

dbus_connection_send() sends a message immediately if it can, and only queues it if it can't send it immediately (by calling poll()).  When it goes to try to send, it calls poll().

If we check POLLHUP, we can immediately detect the connection is dead instead of queueing, waiting for the other thread to wake up do its error detection (which I still need to look at) and handling it later.