Bug 1022050 - libqb-0.16.0 breaks pacemaker-1.1.8
Summary: libqb-0.16.0 breaks pacemaker-1.1.8
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libqb
Version: 6.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: David Vossel
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-10-22 14:40 UTC by Matthew Mosesohn
Modified: 2014-01-09 14:33 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-01-09 14:33:30 UTC
Target Upstream Version:


Attachments
crm_report data (177.94 KB, application/octet-stream)
2013-10-22 14:45 UTC, Matthew Mosesohn

Description Matthew Mosesohn 2013-10-22 14:40:31 UTC
Description of problem:
libqb-0.16.0 was released accidentally, breaking corosync and pacemaker deployments

Version-Release number of selected component (if applicable):
libqb-0.16.0-1.el6_4.1.x86_64
pacemaker-1.1.8-7.el6.x86_64
corosync-1.4.1-15.el6_4.1.x86_64

Additional info:
Steps to reproduce are a bit tricky to share here, but I can post crm_report data. This may severely impact production deployments that accidentally pick up the package through a yum update, because it prevents any changes to the existing configuration.


Developers Fabio Di Nitto and David Vossel confirmed that this release was accidental and that it will remain a problem until either this package is recalled from the RHEL HA repository or pacemaker and corosync updates are released.

Comment 1 Matthew Mosesohn 2013-10-22 14:45:43 UTC
Created attachment 815024
crm_report data

Comment 3 David Vossel 2013-10-22 14:58:39 UTC
The crm_report is lacking the pacemaker log portion.  I need to see that to understand why this is failing.

Comment 4 David Vossel 2013-10-22 20:04:10 UTC
I was able to reproduce this.

This issue has to do with pacemaker not properly removing the ipc server's client connections from mainloop.  There was a race condition in pacemaker that caused mainloop to dispatch an fd that libqb had already told us to remove.

Here are the related upstream patches that resolved this in pacemaker.
https://github.com/ClusterLabs/pacemaker/commit/0628021134835ea1c683d0d70cef2a4112e08404

Libqb's example server implementation received this same change.
https://github.com/ClusterLabs/libqb/commit/1e1397fb22c04e46197873b116c6798892a29ee3
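For illustration only, the class of bug described above (dispatching an fd after it was asked to be removed) can be sketched with a toy event loop. This is not pacemaker, GLib, or libqb code, and it does not reproduce the actual patches; the names `ToyMainloop`, `check_before_dispatch`, and `run` are hypothetical, and the "callback for one fd removes another fd in the same batch" scenario is an assumed stand-in for the real race.

```python
class ToyMainloop:
    """Toy event loop mapping integer fds to callbacks (illustrative only)."""

    def __init__(self, check_before_dispatch):
        self.handlers = {}  # fd -> callback currently registered
        self.check_before_dispatch = check_before_dispatch
        self.stale_dispatches = []  # fds dispatched after being removed

    def add(self, fd, callback):
        self.handlers[fd] = callback

    def remove(self, fd):
        self.handlers.pop(fd, None)

    def dispatch(self, ready):
        # Snapshot the callbacks for the ready batch, as a loop might do
        # right after polling.
        batch = [(fd, self.handlers[fd]) for fd in ready]
        for fd, callback in batch:
            if fd not in self.handlers:
                if self.check_before_dispatch:
                    # fd was removed by an earlier callback in this batch.
                    continue
                self.stale_dispatches.append(fd)
            callback(self, fd)


def run(check_before_dispatch):
    loop = ToyMainloop(check_before_dispatch)
    # fd 3's callback tears down fd 4 (assumed stand-in for a removal request).
    loop.add(3, lambda lp, fd: lp.remove(4))
    loop.add(4, lambda lp, fd: None)
    loop.dispatch([3, 4])
    return loop.stale_dispatches
```

With `run(False)` the toy loop still invokes the callback for fd 4 after it was removed; with `run(True)` the removed fd is skipped. The real fix is in the commits linked above.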

I'm not entirely sure why this appeared stable with the old version of libqb.  There were a couple of libqb reference leaks that were discovered between 0.14.4 and 0.16.0.  My guess is that those leaks covered up the issue.  When I fixed the leaks, the mainloop issue likely surfaced, which led me to fix that as well.

The end result here is that libqb 0.16.0 should only be used with pacemaker 1.1.10 or greater.
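As a rough sketch, the constraint above could be checked before an update with a small helper. The function names `parse` and `compatible` are hypothetical. Treating every libqb version at or above 0.16.0 as needing pacemaker 1.1.10 is an assumption that goes beyond what is stated here, and the helper expects plain upstream versions such as `1.1.8`, not full RPM strings such as `1.1.8-7.el6`.

```python
def parse(version):
    """Turn a dotted upstream version like '1.1.10' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def compatible(libqb, pacemaker):
    """Return False for the combination this bug warns about.

    Assumption: libqb >= 0.16.0 needs pacemaker >= 1.1.10. The bug report
    only states this for libqb 0.16.0 itself.
    """
    if parse(libqb) >= parse("0.16.0"):
        return parse(pacemaker) >= parse("1.1.10")
    return True
```

For the versions reported in this bug, `compatible("0.16.0", "1.1.8")` is false, while `compatible("0.16.0", "1.1.10")` is true.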

-- Vossel

Comment 5 RHEL Program Management 2013-10-26 17:14:54 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

