Bug 590101
Summary: | cman service engine segfaults in unbind_con (no symbols)
---|---
Product: | Red Hat Enterprise Linux 5
Component: | cman
Version: | 5.4
Hardware: | All
OS: | Linux
Status: | CLOSED ERRATA
Severity: | urgent
Priority: | urgent
Reporter: | Steven Dake <sdake>
Assignee: | Fabio Massimo Di Nitto <fdinitto>
QA Contact: | Cluster QE <mspqa-list>
CC: | cluster-maint, djansa, edamato, fdinitto, mjuricek, rdassen
Target Milestone: | rc
Keywords: | ZStream
Fixed In Version: | cman-2.0.115-93.el5
Doc Type: | Bug Fix
Last Closed: | 2012-02-21 05:23:17 UTC
Bug Blocks: | 795814
Description
Steven Dake
2010-05-07 18:00:35 UTC
That's rather an odd stacktrace. unbind_con is never called from any of the routines listed above it! Also there is nowhere that I can see in unbind_con that can segv - the only pointer it uses is dereferenced several times before it is called. Can you run it with proper debugging info, or post reliable reproduction instructions?

#0 unbind_con (con=0xcf4d750) at commands.c:1448
1448 port_array[con->port] = NULL;
(gdb) print con->port
$1 = 1649748712
(gdb) print con
$2 = (struct connection *) 0xcf4d750
(gdb) print *con
$3 = {fd = 1649737756, type = 53, port = 1649748712, shutdown_reply = 53, events = 0, confchg = 0, write_msgs = {n = 0x41, p = 0x3562552a38}, num_write_msgs = 1649748536, next = 0x11acc2b8b9f53e60, list = {n = 0x2aaaab0bdb70, p = 0xcf4cf38}}
(gdb) up
#1 0x00002aaaaaeb3728 in remove_client (handle=<value optimized out>, con=0xcf4d750) at daemon.c:124
124 unbind_con(con);
(gdb) up
#2 0x00002aaaaaeb38db in send_reply_message (con=0xcf4d750, msg=0x7fffe372b680) at daemon.c:89
89 remove_client(ais_poll_handle, con);
(gdb) up
#3 0x00002aaaaaeb3a61 in send_status_return (con=0x7fffe372b610, cmd=<value optimized out>, status=1649748712) at daemon.c:421
421 return send_reply_message(con, (struct sock_header *)&msg);
(gdb) print con
$4 = (struct connection *) 0x7fffe372b610
(gdb) print *con
$5 = {fd = 59392, type = CON_COMMS, port = 4294967287, shutdown_reply = 4294967295, events = 3815946064, confchg = 32767, write_msgs = {n = 0x2aaaaaeb3728, p = 0x7fffe372b750}, num_write_msgs = 217372496, next = 0x7fffe372b680, list = {n = 0x2aaaaaeb38db, p = 0x7fffe372b750}}
(gdb) up
#4 0x00002aaaaaeb6ff5 in send_to_userport (fromport=<value optimized out>, toport=<value optimized out>, nodeid=2, tgtid=<value optimized out>, recv_buf=0x7fffe372b750 "\a�\003", len=4, endian_conv=0) at commands.c:1901
1901 send_status_return(shutdown_con, CMAN_CMD_TRY_SHUTDOWN, 0);
(gdb) down
#3 0x00002aaaaaeb3a61 in send_status_return (con=0x7fffe372b610, cmd=<value optimized out>, status=1649748712) at daemon.c:421
421 return send_reply_message(con, (struct sock_header *)&msg);
(gdb) up
#4 0x00002aaaaaeb6ff5 in send_to_userport (fromport=<value optimized out>, toport=<value optimized out>, nodeid=2, tgtid=<value optimized out>, recv_buf=0x7fffe372b750 "\a�\003", len=4, endian_conv=0) at commands.c:1901
1901 send_status_return(shutdown_con, CMAN_CMD_TRY_SHUTDOWN, 0);
(gdb) print shutdown_con
$6 = (struct connection *) 0xcf4d750
(gdb) print *shutdown_con
$7 = {fd = 1649737756, type = 53, port = 1649748712, shutdown_reply = 53, events = 0, confchg = 0, write_msgs = {n = 0x41, p = 0x3562552a38}, num_write_msgs = 1649748536, next = 0x11acc2b8b9f53e60, list = {n = 0x2aaaab0bdb70, p = 0xcf4cf38}}
(gdb) up
#5 0x00002aaaaaeb4c49 in cman_deliver_fn (nodeid=<value optimized out>, iovec=0x7fffe372b7d0, iov_len=<value optimized out>, endian_conversion_required=0) at ais.c:412
412 send_to_userport(header->srcport, header->tgtport,
(gdb) print *header
$8 = {tgtport = 0 '\0', srcport = 0 '\0', pad = 25138, flags = 0, srcid = 2, tgtid = 0}
(gdb) print *iovec
$9 = {iov_base = 0x7fffe372b740, iov_len = 20}
(gdb) print sizeof (struct cl_protheader)
$10 = 16
(gdb) print buf
$11 = 0
(gdb) print iovec->iov_base
$12 = (void *) 0x7fffe372b740
(gdb) print header
$13 = (struct cl_protheader *) 0x7fffe372b740
(gdb) print *header
$14 = {tgtport = 0 '\0', srcport = 0 '\0', pad = 25138, flags = 0, srcid = 2, tgtid = 0}
(gdb) print buf
$15 = 0
(gdb) down
#4 0x00002aaaaaeb6ff5 in send_to_userport (fromport=<value optimized out>, toport=<value optimized out>, nodeid=2, tgtid=<value optimized out>, recv_buf=0x7fffe372b750 "\a�\003", len=4, endian_conv=0) at commands.c:1901
1901 send_status_return(shutdown_con, CMAN_CMD_TRY_SHUTDOWN, 0);
(gdb) print to_port
No symbol "to_port" in current context.
(gdb) up
#5 0x00002aaaaaeb4c49 in cman_deliver_fn (nodeid=<value optimized out>, iovec=0x7fffe372b7d0, iov_len=<value optimized out>, endian_conversion_required=0) at ais.c:412
412 send_to_userport(header->srcport, header->tgtport,
(gdb) print to_port
No symbol "to_port" in current context.
(gdb) up
#6 0x00000000004152f5 in ?? ()
(gdb) down
#5 0x00002aaaaaeb4c49 in cman_deliver_fn (nodeid=<value optimized out>, iovec=0x7fffe372b7d0, iov_len=<value optimized out>, endian_conversion_required=0) at ais.c:412
412 send_to_userport(header->srcport, header->tgtport,
(gdb) print header->srcport
$16 = 0 '\0'
(gdb) print header->srcid
$17 = 2
(gdb) print header->tgtport
$18 = 0 '\0'
(gdb) print header->tgtid
$19 = 0
(gdb) print endian_conversion_required
$20 = 0
(gdb) down
#4 0x00002aaaaaeb6ff5 in send_to_userport (fromport=<value optimized out>, toport=<value optimized out>, nodeid=2, tgtid=<value optimized out>, recv_buf=0x7fffe372b750 "\a�\003", len=4, endian_conv=0) at commands.c:1901
1901 send_status_return(shutdown_con, CMAN_CMD_TRY_SHUTDOWN, 0);
(gdb) print shutdown_con
$21 = (struct connection *) 0xcf4d750
(gdb) rpint *shutdown_con
Undefined command: "rpint". Try "help".
(gdb) print *shutdown_con
$22 = {fd = 1649737756, type = 53, port = 1649748712, shutdown_reply = 53, events = 0, confchg = 0, write_msgs = {n = 0x41, p = 0x3562552a38}, num_write_msgs = 1649748536, next = 0x11acc2b8b9f53e60, list = {n = 0x2aaaab0bdb70, p = 0xcf4cf38}}
(gdb)

Chrissie,

I've attached a patch which adds -g to the "optimized" build environment for the service engine. This is required to get the debug symbols requested in comment #2. It has no effect on optimization and only makes the debuginfo package slightly larger. I believe there was a 5.5 issue on this but it did not solve the problem - maybe you can reopen that bug or file a new bug with this patch.

Regards
-steve

Created attachment 413246 [details]
patch which adds -g to CFLAGS for service engine to get proper debug output
Created attachment 413742 [details]
Patch that might fix
Thanks Steven, that gave me the information I needed. Try this patch, I hope it will fix it.
I've just checked, and this patch is already in STABLE3/RHEL6.

commit e5d433e74f26ddab3e7eedb4c50f2b3546738ecd
Author: Christine Caulfield <ccaulfie>
Date: Fri Sep 19 13:02:40 2008 +0100

    cman: Clean shutdown_con if the controlling process is killed.

    If a shutdown is initiated by a process that is then killed, the shutdown_con isn't cleared. So if another process replies to the shutdown request cman could segfault.

In this case the process is killing itself because the output queue is full. So you might still see odd failures that can be fixed by increasing the output queue size:

<cman max_queued="1024"/>

Christine,

I was unable to test this patch before releasing the test cluster back to QE. It did have a fairly high production rate with the 5.4 qe test cases however resulting in a segfault.

Regards
-steve

*** Bug 746936 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0167.html
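
For readers following the commit message quoted in this report ("Clean shutdown_con if the controlling process is killed"), here is a minimal sketch of that failure mode and the kind of guard the commit describes. It is not the actual patch: the struct is a cut-down, hypothetical stand-in for cman's struct connection, and although remove_client and shutdown_con are names taken from the backtrace, their bodies here, the one-argument signature, and the helper send_shutdown_reply are assumptions made purely for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, cut-down stand-in for cman's struct connection. */
struct connection {
    int fd;
    int port;
};

/* The client that initiated a shutdown request, if any
 * (name taken from the backtrace; usage here is assumed). */
static struct connection *shutdown_con = NULL;

/* Tear down a client connection. Without the NULL assignment below,
 * shutdown_con would keep pointing at freed memory, which matches the
 * garbage seen in "print *shutdown_con" in the gdb session. */
static void remove_client(struct connection *con)
{
    if (con == shutdown_con)
        shutdown_con = NULL;   /* the guard the commit message describes */
    free(con);
}

/* Illustrative helper (not a cman function): returns the fd a shutdown
 * reply would be written to, or -1 if the initiator has gone away. */
static int send_shutdown_reply(void)
{
    if (shutdown_con == NULL)
        return -1;
    return shutdown_con->fd;
}
```

In this sketch, a caller that registers a connection as shutdown_con, removes it, and then calls send_shutdown_reply gets -1 back instead of dereferencing a stale pointer.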