Bugzilla will be upgraded to version 5.0 on a still to be determined date in the near future. The original upgrade date has been delayed.
Bug 617310 - dbus_connection_send_with_reply_and_block() takes longer than expected in a virtualized environment
dbus_connection_send_with_reply_and_block() takes longer than expected in a v...
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: dbus (Show other bugs)
6.0
All Linux
low Severity medium
: rc
: ---
Assigned To: David King
Desktop QE
: RHELNAK
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2010-07-22 14:44 EDT by Andrew Beekhof
Modified: 2017-12-06 06:51 EST (History)
0 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-12-06 06:51:42 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Andrew Beekhof 2010-07-22 14:44:04 EDT
Description of problem:

ricci uses dbus to perform various jobs, the jobs themselves take approx. 3ms to complete and on bare-metal the call to dbus_connection_send_with_reply_and_block() returns after 10-20ms.

we've noticed however, that the same functionality takes much longer when performed on a virtual machine.  the actions still take 3ms or so, but it seems dbus_connection_send_with_reply_and_block() is slow to find out the job is done.

I rebuilt dbus with  --enable-verbose-mode, added timestamps to the logs and added a few extra log points.

As far as I can tell, the client is in socket_do_iteration() calling:
      poll_res = _dbus_poll (&poll_fd, 1, poll_timeout);

and the dbus server is also calling poll

  time in ms    pid            log message
4205254211 6380: unlock socket_do_iteration pre poll
4205254211 6380:   UNLOCK: _dbus_connection_unlock
4205254211 6380: _dbus_poll real polling
... more messages from 5592...
4205254222 5592: _dbus_poll real polling
4205255238 5592: _dbus_connection_handle_watch start
... more messages from 5592...
4205255254 6380: lock socket_do_iteration post poll

An strace (using the -Tv options) of the server would also seem to confirm this:

poll([{fd=4, events=POLLIN}, {fd=6, events=POLLIN}, {fd=8, events=POLLIN}, {fd=10, events=POLLIN}, {fd=12, events=POLLIN}, {fd=13, events=POLLIN}], 6, -1) = 1 ([{fd=10, revents=POLLIN}]) <1.012872>

"help?"
 
Version-Release number of selected component (if applicable):

dbus-libs-1.2.24-2.el6.x86_64
dbus-1.2.24-2.el6.x86_64

How reproducible:

Every time

Steps to Reproduce:
1.  Install strace, ricci and modcluster in a virtual machine running rhel6

2. Run: service messagebus start

3. Run: service ricci start

4.  Paste the following into /tmp/job

<batch status="1"><module name="cluster"><request API_version="1.0"><function_call name="get_cluster.conf"></function_call></request></module></batch>

5.  strace -Tv -p $pidofdbus

6.  Run /usr/libexec/ricci/ricci-worker -f /tmp/job
  
Actual results:

One call to poll will take > 1s
Look for the pattern "<1."
For example:
poll([{fd=4, events=POLLIN}, {fd=6, events=POLLIN}, {fd=8, events=POLLIN}, {fd=10, events=POLLIN}, {fd=12, events=POLLIN}, {fd=13, events=POLLIN}], 6, -1) = 1 ([{fd=10, revents=POLLIN}]) <1.012872>


Expected results:

No call to poll takes anywhere near 1s

Additional info:
Comment 1 RHEL Product and Program Management 2010-07-22 14:58:30 EDT
This issue has been proposed when we are only considering blocker
issues in the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **
Comment 2 Andrew Beekhof 2010-07-22 16:07:58 EDT
Sorry, I forgot to mention that /etc/cluster/cluster.conf needs to exist before
executing step 6.
Try pasting the following...

[root@pcmk-4 ~]# cat /etc/cluster/cluster.conf 
<cluster config_version="12" name="blahblah">  
  <fence_daemon clean_start="0" post_fail_delay="1" post_join_delay="3"/>  
  <clusternodes>    
    <clusternode name="pcmk-2.beekhof.net" nodeid="1" votes="1"/>      
    <clusternode name="pcmk-3.beekhof.net" nodeid="2" votes="1"/>      
  </clusternodes>  
  <cman expected_votes="1" two_node="1"/>  
  <fencedevices>    
    <fencedevice agent="fence_xvm" name="virt-fence"/>    
  </fencedevices>  
  <rm>    
    <failoverdomains/>      
  </rm>  
</cluster>
Comment 6 Colin Walters 2010-08-06 12:06:03 EDT
First, don't call send_with_reply_and_block if you don't actually need the reply immediately.  Instead do asynchronous calls with _send_with_reply, or if you don't need the reply at all, just don't request one and use _send.

There's some unavoidable latency for round trips with dbus because it requires a few context switches (message sender, bus daemon, message receipient, bus daemon, message sender); in a virtualized environment this is likely to be magnified because the entire virtual OS needs to compete for scheduling with other OSes.

But again, your dbus experience is going to be significantly better if you avoid blocking while waiting for a response and instead do as many requests in parallel as possible.  For example, if you need to call a GetStatus() method on 15 objects, this should be obviously paralellizable.
Comment 7 Andrew Beekhof 2010-09-06 04:52:54 EDT
(In reply to comment #6)
> First, don't call send_with_reply_and_block if you don't actually need the
> reply immediately. 

But we do.  There's no parallelism in the caller, its a single process with a single job to execute.

> There's some unavoidable latency for round trips with dbus because it requires
> a few context switches (message sender, bus daemon, message receipient, bus
> daemon, message sender); in a virtualized environment this is likely to be
> magnified because the entire virtual OS needs to compete for scheduling with
> other OSes.

This was on a 4 core machine with a single KVM guest running.

> But again, your dbus experience is going to be significantly better if you
> avoid blocking while waiting for a response and instead do as many requests in
> parallel as possible.  For example, if you need to call a GetStatus() method on
> 15 objects, this should be obviously paralellizable.

Not really relevant to our use-case I'm afraid.
Comment 8 RHEL Product and Program Management 2011-01-07 10:37:22 EST
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.
Comment 9 Jan Kurik 2017-12-06 06:51:42 EST
Red Hat Enterprise Linux 6 is in the Production 3 Phase. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available.

The official life cycle policy can be reviewed here:

http://redhat.com/rhel/lifecycle

This issue does not meet the inclusion criteria for the Production 3 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification. Note that a strong business justification will be required for re-evaluation. Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL:

https://access.redhat.com/

Note You need to log in before you can comment on or make changes to this bug.