Description of problem:
If corosync (cman) is killed, pacemakerd notices and attempts to shut down pacemaker gracefully. However, any client using the cman API then tries to call cman_stop_notification(), which writes into the now-dead pipe, and the client is killed with SIGPIPE.

Version-Release number of selected component (if applicable):
Version     : 3.0.12.1       Vendor: Red Hat, Inc.
Release     : 32.el6_3.2     Build Date: Tue 20 Nov 2012 06:12:23 EST

How reproducible:
100%

Steps to Reproduce:
1. Start cman
2. Start pacemaker
3. killall -9 corosync

Actual results:
The client (crmd) using the cman API terminates with SIGPIPE.

Expected results:
The connection is closed gracefully.

Additional info:

Stack trace from crmd...

#0  0x00007fd4aace789b in writev () from /lib64/libc.so.6
#1  0x00007fd4abbdc54d in ?? () from /usr/lib64/libcman.so.3
#2  0x00007fd4abbdcd20 in ?? () from /usr/lib64/libcman.so.3
#3  0x00007fd4abbde19c in cman_stop_notification () from /usr/lib64/libcman.so.3
#4  0x00007fd4ad33f112 in terminate_cs_connection () at legacy.c:441

Strace output from crmd...

shutdown(14, 2 /* send and receive */) = 0
close(14) = 0
munmap(0x7fac7e959000, 24) = 0
unlink("/dev/shm/qb-stonith-ng-control-8010-8014-11") = -1 ENOENT (No such file or directory)
close(13) = 0
close(14) = -1 EBADF (Bad file descriptor)
write(3, "Dec 13 04:56:44 [8014] corosync-"..., 146) = 146
write(3, "Dec 13 04:56:44 [8014] corosync-"..., 135) = 135
poll([{fd=11, events=POLLIN}], 1, 0) = 0 (Timeout)
shutdown(11, 2 /* send and receive */) = 0
close(11) = 0
munmap(0x7fac7e974000, 40960) = 0
munmap(0x7fac7e97e000, 8248) = 0
munmap(0x7fac7e967000, 40960) = 0
munmap(0x7fac7e971000, 8248) = 0
munmap(0x7fac7e95a000, 40960) = 0
munmap(0x7fac7e964000, 8248) = 0
write(3, "Dec 13 04:56:44 [8014] corosync-"..., 128) = 128
write(3, "Dec 13 04:56:44 [8014] corosync-"..., 134) = 134
write(3, "Dec 13 04:56:44 [8014] corosync-"..., 124) = 124
write(3, "Dec 13 04:56:44 [8014] corosync-"..., 154) = 154
gettimeofday({1355392604, 46118}, NULL) = 0
sendto(4, "<165>Dec 13 04:56:44 crmd[8014]:"..., 95, MSG_NOSIGNAL, NULL, 0) = 95
write(3, "Dec 13 04:56:44 [8014] corosync-"..., 135) = 135
write(3, "Dec 13 04:56:44 [8014] corosync-"..., 125) = 125
futex(0x7fac7ea9b010, FUTEX_WAKE, 1) = 0
gettimeofday({1355392604, 47222}, NULL) = 0
futex(0x7fac7ea9b070, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 0, {1355392606, 0}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
poll([{fd=6, events=0}], 1, 0) = 1 ([{fd=6, revents=POLLHUP}])
write(3, "Dec 13 04:56:45 [8014] corosync-"..., 126) = 126
writev(9, [{"NAMC\3\0\0\20\24\0\0\0\2\0\0\0\0\0\0\0", 20}], 1) = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
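
In the trace, the final writev() on fd 9 fails with EPIPE and raises SIGPIPE, while the earlier syslog sendto() on fd 4 passes MSG_NOSIGNAL. The sketch below shows, in isolation, how a write to a dead peer can return an error instead of raising the signal. It rests on assumptions: that the client connection is a Unix-domain stream socket, and that replacing writev() with sendmsg(..., MSG_NOSIGNAL) would be acceptable. The helper names send_nosignal() and errno_after_peer_death() are hypothetical and are not part of libcman, and this is not necessarily how the erratum resolves the bug.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of libcman): send one buffer on a
 * connected socket without raising SIGPIPE if the peer has gone away.
 * Returns the number of bytes sent, or -1 with errno set
 * (EPIPE when the peer has closed its end).
 */
static ssize_t send_nosignal(int fd, const void *buf, size_t len)
{
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    struct msghdr msg;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    /* MSG_NOSIGNAL suppresses SIGPIPE for this call only. */
    return sendmsg(fd, &msg, MSG_NOSIGNAL);
}

/*
 * Simulates the killed daemon with a socketpair: one end is closed
 * (the "dead" peer), then the surviving end is written to.
 * Returns the errno seen by the sender, 0 if the send succeeded,
 * or -1 if the socketpair could not be created.
 */
static int errno_after_peer_death(void)
{
    int sv[2];
    const char payload[] = "stop-notification";
    ssize_t rc;
    int err;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    close(sv[1]); /* the peer goes away, as corosync does after kill -9 */

    rc = send_nosignal(sv[0], payload, sizeof(payload));
    err = (rc < 0) ? errno : 0;

    close(sv[0]);
    return err;
}
```

With this approach a caller such as a shutdown path can treat EPIPE as "already disconnected" and continue its cleanup. A process-wide alternative is signal(SIGPIPE, SIG_IGN), which affects every descriptor in the process rather than a single call.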
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-0287.html