Bug 760419

Summary: Fails to push updated cluster.conf with dbus errors
Product: [Fedora] Fedora
Component: ricci
Version: 15
Hardware: Unspecified
OS: Unspecified
Status: CLOSED WONTFIX
Severity: high
Priority: high
Reporter: Madison Kelly <mkelly>
Assignee: Chris Feist <cfeist>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: cfeist, fdinitto, jpokorny, rmccabe
Doc Type: Bug Fix
Last Closed: 2012-08-07 20:23:01 UTC

Description Madison Kelly 2011-12-06 06:10:42 UTC
Description of problem:

When attempting to push an updated cluster.conf using 'cman_tool version -r', ricci fails with dbus errors:

=====
Dec  5 23:28:21 test-node-1 dbus[783]: [system] Rejected send message, 1 matched rules; type="method_call", sender=":1.8" (uid=998 pid=2471 comm="/usr/libexec/ricci/ricci-worker -f /var/lib/ricci/") interface="com.redhat.ricci" member="modcluster_rw" error name="(unset)" requested_reply="0" destination="com.redhat.ricci" (uid=0 pid=2359 comm="/usr/sbin/oddjobd -p /var/run/oddjobd.pid -t 300 ")
Dec  5 23:28:21 test-node-1 dbus-daemon[783]: dbus[783]: [system] Rejected send message, 1 matched rules; type="method_call", sender=":1.8" (uid=998 pid=2471 comm="/usr/libexec/ricci/ricci-worker -f /var/lib/ricci/") interface="com.redhat.ricci" member="modcluster_rw" error name="(unset)" requested_reply="0" destination="com.redhat.ricci" (uid=0 pid=2359 comm="/usr/sbin/oddjobd -p /var/run/oddjobd.pid -t 300 ")
Dec  5 23:28:21 test-node-1 corosync[1379]:   [QUORUM] Members[3]: 1 2 3
=====
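
The rejection lines above pack several fields into one syslog entry. A minimal Python sketch for pulling out the parts that matter when diagnosing this (the rejected member, and the sending and receiving processes) might look like the following. The function and field names are hypothetical, and the regular expressions assume the exact layout shown in the log excerpt above.

```python
import re

# Patterns assume the dbus-daemon log layout quoted in this report.
_PATTERNS = {
    "sender_comm": r'sender="[^"]*" \(uid=\d+ pid=\d+ comm="([^"]*)"\)',
    "interface": r'interface="([^"]*)"',
    "member": r'member="([^"]*)"',
    "destination": r'destination="([^"]*)"',
    "destination_comm": r'destination="[^"]*" \(uid=\d+ pid=\d+ comm="([^"]*)"\)',
}


def parse_dbus_rejection(line):
    """Return the fields of a 'Rejected send message' line, or None."""
    if "Rejected send message" not in line:
        return None
    fields = {}
    for name, pattern in _PATTERNS.items():
        match = re.search(pattern, line)
        fields[name] = match.group(1) if match else None
    return fields


# Copy of the first log line from this report, without the syslog prefix.
SAMPLE = (
    'dbus[783]: [system] Rejected send message, 1 matched rules; '
    'type="method_call", sender=":1.8" (uid=998 pid=2471 '
    'comm="/usr/libexec/ricci/ricci-worker -f /var/lib/ricci/") '
    'interface="com.redhat.ricci" member="modcluster_rw" '
    'error name="(unset)" requested_reply="0" '
    'destination="com.redhat.ricci" (uid=0 pid=2359 '
    'comm="/usr/sbin/oddjobd -p /var/run/oddjobd.pid -t 300 ")'
)

info = parse_dbus_rejection(SAMPLE)
print(info["member"], "<-", info["sender_comm"])
```

Here the rejected member is modcluster_rw, called by ricci-worker and addressed to oddjobd, which is the detail that points towards modcluster in the comments below.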

Each target node (every node other than the one pushing) then starts emitting:

=====
Dec  6 00:46:42 test-node-2 dbus[848]: [system] Rejected send message, 1 matched rules; type="method_call", sender=":1.12" (uid=998 pid=3239 comm="/usr/libexec/ricci/ricci-worker -f /var/lib/ricci/") interface="com.redhat.ricci" member="modcluster_rw" error name="(unset)" requested_reply="0" destination="com.redhat.ricci" (uid=0 pid=1121 comm="/usr/sbin/oddjobd -p /var/run/oddjobd.pid -t 300 ")
Dec  6 00:46:42 test-node-2 dbus-daemon[848]: dbus[848]: [system] Rejected send message, 1 matched rules; type="method_call", sender=":1.12" (uid=998 pid=3239 comm="/usr/libexec/ricci/ricci-worker -f /var/lib/ricci/") interface="com.redhat.ricci" member="modcluster_rw" error name="(unset)" requested_reply="0" destination="com.redhat.ricci" (uid=0 pid=1121 comm="/usr/sbin/oddjobd -p /var/run/oddjobd.pid -t 300 ")
Dec  6 00:46:42 test-node-2 corosync[1384]:   [CMAN  ] Unable to load new config in corosync: New configuration version has to be newer than current running configuration
Dec  6 00:46:42 test-node-2 corosync[1384]:   [CMAN  ] Can't get updated config version 4: New configuration version has to be newer than current running configuration#012.
Dec  6 00:46:42 test-node-2 corosync[1384]:   [CMAN  ] Activity suspended on this node
Dec  6 00:46:42 test-node-2 corosync[1384]:   [CMAN  ] Error reloading the configuration, will retry every second
Dec  6 00:46:43 test-node-2 corosync[1384]:   [CMAN  ] Unable to load new config in corosync: New configuration version has to be newer than current running configuration
Dec  6 00:46:43 test-node-2 corosync[1384]:   [CMAN  ] Can't get updated config version 4: New configuration version has to be newer than current running configuration#012.
Dec  6 00:46:43 test-node-2 corosync[1384]:   [CMAN  ] Activity suspended on this node
Dec  6 00:46:43 test-node-2 corosync[1384]:   [CMAN  ] Error reloading the configuration, will retry every second
=====
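
The CMAN messages above are consistent with a target node being asked to load version 4 while its on-disk cluster.conf is still the old one, since ricci never delivered the new file. A rough Python sketch for checking that condition by hand is shown below. It assumes the version is stored in the config_version attribute of the root <cluster> element; the function names and the sample XML (including the cluster name) are made up for illustration.

```python
import xml.etree.ElementTree as ET


def config_version(cluster_conf_xml):
    """Return the integer config_version from cluster.conf XML text."""
    root = ET.fromstring(cluster_conf_xml)
    if root.tag != "cluster":
        raise ValueError("root element is not <cluster>")
    return int(root.attrib["config_version"])


def is_newer(candidate_xml, running_version):
    """True if the candidate file is newer than the running configuration."""
    return config_version(candidate_xml) > running_version


# Hypothetical file contents: the pushed file and a stale copy on a target node.
PUSHED = '<cluster name="example" config_version="4"></cluster>'
STALE = '<cluster name="example" config_version="3"></cluster>'

print(is_newer(PUSHED, 3))  # the file the pushing node holds
print(is_newer(STALE, 3))   # the file a target node still has on disk
```

On a real node the XML text would come from reading the cluster.conf file itself rather than from a string literal.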

Version-Release number of selected component (if applicable):

ricci-0.18.7-1.fc15.x86_64
cluster 3.1.8 rc

How reproducible:

100%

Steps to Reproduce:
1. Update cluster.conf.
2. Try to push it out using 'cman_tool version -r'.

Actual results:

Fails to push.

Expected results:

The updated cluster.conf is pushed out to the other nodes.

Additional info:

Once I rsync the file to the other nodes, they pick up the changes and the cluster returns to normal.

Comment 1 Jan Pokorný [poki] 2011-12-06 11:49:40 UTC
digimer,

given that D-Bus problem, did you have modclusterd installed at the time
you ran cman_tool?

Comment 2 Jan Pokorný [poki] 2011-12-06 12:26:35 UTC
(modclusterd is in modcluster package)

Comment 3 Madison Kelly 2011-12-06 16:16:02 UTC
Installing modcluster, starting modclusterd and restarting ricci solved the problem.

I would recommend adding a more verbose error message. The dbus error is cryptic and might not mean much or be of much help to users trying to diagnose this issue.

Cheers

Comment 4 Jan Pokorný [poki] 2011-12-06 16:36:13 UTC
As per the discussion on <irc://chat.freenode.net/linux-cluster>, the
issue is that the modcluster package was not installed before the ricci
service was started on the respective nodes.  Under this circumstance, the
same issue will show up when one deploys a cluster using the luci interface
with the "install packages" option selected -- when the modcluster package
is installed as part of the process, updates to the D-Bus policies are not
propagated to ricci's already-existing D-Bus connection.

The D-Bus error being cryptic is a feature of the D-Bus side, not ours :)


As mentioned, the workaround is to install the modcluster package and then
(re)start ricci before using ricci's cluster functionality.
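
The workaround sequence reported in comment 3 (install modcluster, start modclusterd, restart ricci) could be scripted roughly as follows. This is only a sketch: the exact commands are assumptions (yum and the "service" wrapper are guesses at what the affected nodes use), and the function name is hypothetical. It defaults to a dry run that only lists the steps.

```python
import subprocess

# Assumed commands; adjust to the package manager and init system in use.
WORKAROUND_STEPS = [
    ["yum", "install", "-y", "modcluster"],
    ["service", "modclusterd", "start"],
    ["service", "ricci", "restart"],
]


def apply_workaround(dry_run=True):
    """Run (or, by default, just list) the workaround steps in order."""
    performed = []
    for cmd in WORKAROUND_STEPS:
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(cmd, check=True)
        performed.append(" ".join(cmd))
    return performed


for step in apply_workaround():
    print(step)
```

Restarting ricci comes last so that its new D-Bus connection is opened only after the modcluster policies are in place.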


The solution can be either (1) making the modcluster package a dependency
of ricci, as discussed in bug 721109 (public), or (2) a patch making ricci
able to restart its D-Bus connection when necessary (related but non-public
bug 742345), though this is limited to the create-cluster-via-luci scenario
only.

Thinking about it, something like adding "service ricci condrestart" to
%post in modcluster's spec file should be enough -- ricci is most probably
robust enough to handle this, but it would require some (extensive)
testing.


Anyway, reassigning to the same person as with the mentioned bugs.

Comment 5 Jan Pokorný [poki] 2011-12-06 16:44:35 UTC
Another variation of the "service ricci condrestart" idea is to add a
function to ricci's API that forces a D-Bus connection restart, and to
invoke that instead.  Still, (1) seems to be the sanest and safest way.

Comment 6 Chris Feist 2011-12-06 17:05:43 UTC
Fixed in ricci-0.18.7-2.fc16 and should be pushed live for fc16 in the next week or two.

Comment 7 Fedora End Of Life 2012-08-07 20:23:04 UTC
This message is a notice that Fedora 15 is now at end of life. Fedora
has stopped maintaining and issuing updates for Fedora 15. It is
Fedora's policy to close all bug reports from releases that are no
longer maintained. At this time, all open bugs with a Fedora 'version'
of '15' have been closed as WONTFIX.

(Please note: Our normal process is to give advanced warning of this
occurring, but we forgot to do that. A thousand apologies.)

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, feel free to reopen
this bug and simply change the 'version' to a later Fedora version.

Bug Reporter: Thank you for reporting this issue and we are sorry that
we were unable to fix it before Fedora 15 reached end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora, you are encouraged to click on
"Clone This Bug" (top right of this page) and open it against that
version of Fedora.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

The process we are following is described here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping