Bug 887787 - Call to cman_stop_notification() causes termination with SIGPIPE
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: cluster
Version: 6.4
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Assignee: Fabio Massimo Di Nitto
QA Contact: Cluster QE
Blocks: 895654
Reported: 2012-12-17 10:52 UTC by Andrew Beekhof
Modified: 2013-02-21 07:43 UTC (History)

Fixed In Version: cluster-3.0.12.1-47.el6
Doc Type: Bug Fix
Last Closed: 2013-02-21 07:43:27 UTC

Links
System: Red Hat Product Errata
ID: RHBA-2013:0287
Priority: normal
Status: SHIPPED_LIVE
Summary: cluster and gfs2-utils bug fix and enhancement update
Last Updated: 2013-02-20 20:36:42 UTC

Description Andrew Beekhof 2012-12-17 10:52:03 UTC
Description of problem:

If corosync (running under cman) is killed, pacemakerd notices and attempts to shut down pacemaker gracefully.

However, any client using the cman API then calls cman_stop_notification(), which writes into the now-dead pipe, and the client is killed by SIGPIPE.

Version-Release number of selected component (if applicable):

Version     : 3.0.12.1                          Vendor: Red Hat, Inc.
Release     : 32.el6_3.2                    Build Date: Tue 20 Nov 2012 06:12:23 EST

How reproducible:

100%

Steps to Reproduce:
1. Start cman
2. Start pacemaker
3. killall -9 corosync
  
Actual results:

The client (crmd) using the cman API is terminated by SIGPIPE.

Expected results:

The connection is closed out gracefully.

Additional info:

Stack trace from crmd...

#0  0x00007fd4aace789b in writev () from /lib64/libc.so.6
#1  0x00007fd4abbdc54d in ?? () from /usr/lib64/libcman.so.3
#2  0x00007fd4abbdcd20 in ?? () from /usr/lib64/libcman.so.3
#3  0x00007fd4abbde19c in cman_stop_notification () from /usr/lib64/libcman.so.3
#4  0x00007fd4ad33f112 in terminate_cs_connection () at legacy.c:441

Strace output from crmd...

shutdown(14, 2 /* send and receive */)  = 0
close(14)                               = 0
munmap(0x7fac7e959000, 24)              = 0
unlink("/dev/shm/qb-stonith-ng-control-8010-8014-11") = -1 ENOENT (No such file or directory)
close(13)                               = 0
close(14)                               = -1 EBADF (Bad file descriptor)
write(3, "Dec 13 04:56:44 [8014] corosync-"..., 146) = 146
write(3, "Dec 13 04:56:44 [8014] corosync-"..., 135) = 135
poll([{fd=11, events=POLLIN}], 1, 0)    = 0 (Timeout)
shutdown(11, 2 /* send and receive */)  = 0
close(11)                               = 0
munmap(0x7fac7e974000, 40960)           = 0
munmap(0x7fac7e97e000, 8248)            = 0
munmap(0x7fac7e967000, 40960)           = 0
munmap(0x7fac7e971000, 8248)            = 0
munmap(0x7fac7e95a000, 40960)           = 0
munmap(0x7fac7e964000, 8248)            = 0
write(3, "Dec 13 04:56:44 [8014] corosync-"..., 128) = 128
write(3, "Dec 13 04:56:44 [8014] corosync-"..., 134) = 134
write(3, "Dec 13 04:56:44 [8014] corosync-"..., 124) = 124
write(3, "Dec 13 04:56:44 [8014] corosync-"..., 154) = 154
gettimeofday({1355392604, 46118}, NULL) = 0
sendto(4, "<165>Dec 13 04:56:44 crmd[8014]:"..., 95, MSG_NOSIGNAL, NULL, 0) = 95
write(3, "Dec 13 04:56:44 [8014] corosync-"..., 135) = 135
write(3, "Dec 13 04:56:44 [8014] corosync-"..., 125) = 125
futex(0x7fac7ea9b010, FUTEX_WAKE, 1)    = 0
gettimeofday({1355392604, 47222}, NULL) = 0
futex(0x7fac7ea9b070, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 0, {1355392606, 0}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
poll([{fd=6, events=0}], 1, 0)          = 1 ([{fd=6, revents=POLLHUP}])
write(3, "Dec 13 04:56:45 [8014] corosync-"..., 126) = 126
writev(9, [{"NAMC\3\0\0\20\24\0\0\0\2\0\0\0\0\0\0\0", 20}], 1) = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
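
The trace above shows the syslog sendto() using MSG_NOSIGNAL, while the final writev() on fd 9 does not and raises SIGPIPE. As a hedged illustration of that difference, and not a description of how the errata actually fixed libcman, the sketch below wraps sendmsg() with MSG_NOSIGNAL so that a write to a dead peer returns EPIPE without a signal. It assumes the descriptor is a socket; writev_nosignal() and send_to_dead_peer() are hypothetical names.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: behaves like writev() on a socket, but passes
 * MSG_NOSIGNAL so a dead peer yields EPIPE without raising SIGPIPE.
 */
ssize_t writev_nosignal(int fd, struct iovec *iov, int iovcnt)
{
    struct msghdr msg = {0};

    msg.msg_iov = iov;
    msg.msg_iovlen = iovcnt;
    return sendmsg(fd, &msg, MSG_NOSIGNAL);
}

/*
 * Demo: send to a socket whose peer was closed and return the errno
 * observed (0 on success). SIGPIPE keeps its default disposition here,
 * so reaching the return statement shows no signal was delivered.
 */
int send_to_dead_peer(void)
{
    int fds[2];
    char payload[] = "ping";
    struct iovec iov = { .iov_base = payload, .iov_len = sizeof(payload) };
    int err = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return errno;

    /* Simulate the killed daemon by closing its end. */
    close(fds[1]);

    if (writev_nosignal(fds[0], &iov, 1) < 0)
        err = errno;

    close(fds[0]);
    return err;
}
```

Unlike ignoring SIGPIPE process-wide, this approach only affects the individual call, which makes it a better fit for library code that does not own the application's signal dispositions.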

Comment 7 errata-xmlrpc 2013-02-21 07:43:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0287.html

