Bug 887787

Summary: Call to cman_stop_notification() causes termination with SIGPIPE
Product: Red Hat Enterprise Linux 6
Component: cluster
Version: 6.4
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: high
Type: Bug
Reporter: Andrew Beekhof <abeekhof>
Assignee: Fabio Massimo Di Nitto <fdinitto>
QA Contact: Cluster QE <mspqa-list>
CC: ccaulfie, cluster-maint, jkortus, lhh, mjuricek, rpeterso, teigland, tlavigne, wanghb8
Target Milestone: rc
Fixed In Version: cluster-3.0.12.1-47.el6
Doc Type: Bug Fix
Last Closed: 2013-02-21 07:43:27 UTC
Bug Blocks: 895654

Description Andrew Beekhof 2012-12-17 10:52:03 UTC
Description of problem:

If (cman) corosync is killed, pacemakerd notices and attempts to shut down pacemaker gracefully.

However, any client using the cman API then calls cman_stop_notification(), which writes to the now-dead pipe, and the client is killed by SIGPIPE.

Version-Release number of selected component (if applicable):

Version     : 3.0.12.1                          Vendor: Red Hat, Inc.
Release     : 32.el6_3.2                    Build Date: Tue 20 Nov 2012 06:12:23 EST

How reproducible:

100%

Steps to Reproduce:
1. Start cman
2. Start pacemaker
3. killall -9 corosync
  
Actual results:

Client (crmd) using the cman API terminates with SIGPIPE

Expected results:

Connection is gracefully closed out.
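
One way a library can turn a write to a dead peer into an ordinary error is to send with MSG_NOSIGNAL. The sketch below is illustrative only: writev_nosignal() is a hypothetical helper, not a libcman function, and this is not necessarily how the erratum fixed the problem. It assumes the descriptor is a socket, since sendmsg() does not work on plain pipes.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes MSG_NOSIGNAL under strict -std=c99 */

#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Hypothetical helper (not part of libcman): gather-write like writev(),
 * but through sendmsg() with MSG_NOSIGNAL, so a dead peer produces
 * -1 with errno == EPIPE instead of raising SIGPIPE.
 * Only valid on socket descriptors.
 */
ssize_t writev_nosignal(int fd, const struct iovec *iov, int iovcnt)
{
    struct msghdr msg = {0};
    ssize_t n;

    msg.msg_iov = (struct iovec *)iov;  /* sendmsg() does not modify the array */
    msg.msg_iovlen = (size_t)iovcnt;

    /* Retry if a signal handler interrupted the call before any data moved. */
    do {
        n = sendmsg(fd, &msg, MSG_NOSIGNAL);
    } while (n < 0 && errno == EINTR);

    return n;
}
```

A caller would treat -1 with errno set to EPIPE as "the daemon is gone" and continue tearing down its own state.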

Additional info:

Stack trace from crmd...

#0  0x00007fd4aace789b in writev () from /lib64/libc.so.6
#1  0x00007fd4abbdc54d in ?? () from /usr/lib64/libcman.so.3
#2  0x00007fd4abbdcd20 in ?? () from /usr/lib64/libcman.so.3
#3  0x00007fd4abbde19c in cman_stop_notification () from /usr/lib64/libcman.so.3
#4  0x00007fd4ad33f112 in terminate_cs_connection () at legacy.c:441

Strace output from crmd...

shutdown(14, 2 /* send and receive */)  = 0
close(14)                               = 0
munmap(0x7fac7e959000, 24)              = 0
unlink("/dev/shm/qb-stonith-ng-control-8010-8014-11") = -1 ENOENT (No such file or directory)
close(13)                               = 0
close(14)                               = -1 EBADF (Bad file descriptor)
write(3, "Dec 13 04:56:44 [8014] corosync-"..., 146) = 146
write(3, "Dec 13 04:56:44 [8014] corosync-"..., 135) = 135
poll([{fd=11, events=POLLIN}], 1, 0)    = 0 (Timeout)
shutdown(11, 2 /* send and receive */)  = 0
close(11)                               = 0
munmap(0x7fac7e974000, 40960)           = 0
munmap(0x7fac7e97e000, 8248)            = 0
munmap(0x7fac7e967000, 40960)           = 0
munmap(0x7fac7e971000, 8248)            = 0
munmap(0x7fac7e95a000, 40960)           = 0
munmap(0x7fac7e964000, 8248)            = 0
write(3, "Dec 13 04:56:44 [8014] corosync-"..., 128) = 128
write(3, "Dec 13 04:56:44 [8014] corosync-"..., 134) = 134
write(3, "Dec 13 04:56:44 [8014] corosync-"..., 124) = 124
write(3, "Dec 13 04:56:44 [8014] corosync-"..., 154) = 154
gettimeofday({1355392604, 46118}, NULL) = 0
sendto(4, "<165>Dec 13 04:56:44 crmd[8014]:"..., 95, MSG_NOSIGNAL, NULL, 0) = 95
write(3, "Dec 13 04:56:44 [8014] corosync-"..., 135) = 135
write(3, "Dec 13 04:56:44 [8014] corosync-"..., 125) = 125
futex(0x7fac7ea9b010, FUTEX_WAKE, 1)    = 0
gettimeofday({1355392604, 47222}, NULL) = 0
futex(0x7fac7ea9b070, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 0, {1355392606, 0}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
poll([{fd=6, events=0}], 1, 0)          = 1 ([{fd=6, revents=POLLHUP}])
write(3, "Dec 13 04:56:45 [8014] corosync-"..., 126) = 126
writev(9, [{"NAMC\3\0\0\20\24\0\0\0\2\0\0\0\0\0\0\0", 20}], 1) = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---

Comment 7 errata-xmlrpc 2013-02-21 07:43:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0287.html