617310 – dbus_connection_send_with_reply_and_block() takes longer than expected in a virtualized environment

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 617310 - dbus_connection_send_with_reply_and_block() takes longer than expected in a virtualized environment

Summary: dbus_connection_send_with_reply_and_block() takes longer than expected in a v...

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	dbus
Sub Component:
Version:	6.0
Hardware:	All
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	David King
QA Contact:	Desktop QE
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2010-07-22 18:44 UTC by Andrew Beekhof
Modified:	2017-12-06 11:51 UTC (History)
CC List:	0 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2017-12-06 11:51:42 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Description Andrew Beekhof 2010-07-22 18:44:04 UTC

Description of problem:

ricci uses dbus to perform various jobs, the jobs themselves take approx. 3ms to complete and on bare-metal the call to dbus_connection_send_with_reply_and_block() returns after 10-20ms.

we've noticed however, that the same functionality takes much longer when performed on a virtual machine.  the actions still take 3ms or so, but it seems dbus_connection_send_with_reply_and_block() is slow to find out the job is done.

I rebuilt dbus with  --enable-verbose-mode, added timestamps to the logs and added a few extra log points.

As far as I can tell, the client is in socket_do_iteration() calling:
      poll_res = _dbus_poll (&poll_fd, 1, poll_timeout);

and the dbus server is also calling poll

  time in ms    pid            log message
4205254211 6380: unlock socket_do_iteration pre poll
4205254211 6380:   UNLOCK: _dbus_connection_unlock
4205254211 6380: _dbus_poll real polling
... more messages from 5592...
4205254222 5592: _dbus_poll real polling
4205255238 5592: _dbus_connection_handle_watch start
... more messages from 5592...
4205255254 6380: lock socket_do_iteration post poll

An strace (using the -Tv options) of the server would also seem to confirm this:

poll([{fd=4, events=POLLIN}, {fd=6, events=POLLIN}, {fd=8, events=POLLIN}, {fd=10, events=POLLIN}, {fd=12, events=POLLIN}, {fd=13, events=POLLIN}], 6, -1) = 1 ([{fd=10, revents=POLLIN}]) <1.012872>

"help?"
 
Version-Release number of selected component (if applicable):

dbus-libs-1.2.24-2.el6.x86_64
dbus-1.2.24-2.el6.x86_64

How reproducible:

Every time

Steps to Reproduce:
1.  Install strace, ricci and modcluster in a virtual machine running rhel6

2. Run: service messagebus start

3. Run: service ricci start

4.  Paste the following into /tmp/job

<batch status="1"><module name="cluster"><request API_version="1.0"><function_call name="get_cluster.conf"></function_call></request></module></batch>

5.  strace -Tv -p $pidofdbus

6.  Run /usr/libexec/ricci/ricci-worker -f /tmp/job
  
Actual results:

One call to poll will take > 1s
Look for the pattern "<1."
For example:
poll([{fd=4, events=POLLIN}, {fd=6, events=POLLIN}, {fd=8, events=POLLIN}, {fd=10, events=POLLIN}, {fd=12, events=POLLIN}, {fd=13, events=POLLIN}], 6, -1) = 1 ([{fd=10, revents=POLLIN}]) <1.012872>


Expected results:

No call to poll takes anywhere near 1s

Additional info:

Comment 1 RHEL Program Management 2010-07-22 18:58:30 UTC

This issue has been proposed when we are only considering blocker
issues in the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **

Comment 2 Andrew Beekhof 2010-07-22 20:07:58 UTC

Sorry, I forgot to mention that /etc/cluster/cluster.conf needs to exist before
executing step 6.
Try pasting the following...

[root@pcmk-4 ~]# cat /etc/cluster/cluster.conf 
<cluster config_version="12" name="blahblah">  
  <fence_daemon clean_start="0" post_fail_delay="1" post_join_delay="3"/>  
  <clusternodes>    
    <clusternode name="pcmk-2.beekhof.net" nodeid="1" votes="1"/>      
    <clusternode name="pcmk-3.beekhof.net" nodeid="2" votes="1"/>      
  </clusternodes>  
  <cman expected_votes="1" two_node="1"/>  
  <fencedevices>    
    <fencedevice agent="fence_xvm" name="virt-fence"/>    
  </fencedevices>  
  <rm>    
    <failoverdomains/>      
  </rm>  
</cluster>

Comment 6 Colin Walters 2010-08-06 16:06:03 UTC

First, don't call send_with_reply_and_block if you don't actually need the reply immediately.  Instead do asynchronous calls with _send_with_reply, or if you don't need the reply at all, just don't request one and use _send.

There's some unavoidable latency for round trips with dbus because it requires a few context switches (message sender, bus daemon, message receipient, bus daemon, message sender); in a virtualized environment this is likely to be magnified because the entire virtual OS needs to compete for scheduling with other OSes.

But again, your dbus experience is going to be significantly better if you avoid blocking while waiting for a response and instead do as many requests in parallel as possible.  For example, if you need to call a GetStatus() method on 15 objects, this should be obviously paralellizable.

Comment 7 Andrew Beekhof 2010-09-06 08:52:54 UTC

(In reply to comment #6)
> First, don't call send_with_reply_and_block if you don't actually need the
> reply immediately. 

But we do.  There's no parallelism in the caller, its a single process with a single job to execute.

> There's some unavoidable latency for round trips with dbus because it requires
> a few context switches (message sender, bus daemon, message receipient, bus
> daemon, message sender); in a virtualized environment this is likely to be
> magnified because the entire virtual OS needs to compete for scheduling with
> other OSes.

This was on a 4 core machine with a single KVM guest running.

> But again, your dbus experience is going to be significantly better if you
> avoid blocking while waiting for a response and instead do as many requests in
> parallel as possible.  For example, if you need to call a GetStatus() method on
> 15 objects, this should be obviously paralellizable.

Not really relevant to our use-case I'm afraid.

Comment 8 RHEL Program Management 2011-01-07 15:37:22 UTC

This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.

Comment 9 Jan Kurik 2017-12-06 11:51:42 UTC

Red Hat Enterprise Linux 6 is in the Production 3 Phase. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available.

The official life cycle policy can be reviewed here:

http://redhat.com/rhel/lifecycle

This issue does not meet the inclusion criteria for the Production 3 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification. Note that a strong business justification will be required for re-evaluation. Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL:

https://access.redhat.com/

Note You need to log in before you can comment on or make changes to this bug.