Bug 472460
Summary: | cman_tool nodes -F name segfaults when a node is out of membership | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Nate Straz <nstraz> | ||||
Component: | cman | Assignee: | Christine Caulfield <ccaulfie> | ||||
Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list> | ||||
Severity: | medium | Docs Contact: | |||||
Priority: | medium | ||||||
Version: | 5.3 | CC: | baptiste.millemathias, cluster-maint, edamato | ||||
Target Milestone: | rc | ||||||
Target Release: | --- | ||||||
Hardware: | All | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2009-09-02 11:08:26 UTC | Type: | --- | ||||
Description
Nate Straz
2008-11-20 22:53:40 UTC
This isn't as easy to reproduce as you might think; can you provide a backtrace from the coredump, please?

```
[root@cc-xen-03 ~]# cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   X  31928                        cc-xen-01.lab.msp.redhat.com
   2   M  31928   2008-11-14 02:01:48  cc-xen-02.lab.msp.redhat.com
   3   M  31924   2008-11-14 02:01:47  cc-xen-03.lab.msp.redhat.com
   4   X  31940                        cc-xen-04.lab.msp.redhat.com
   5   X      0                        cc-xen-05.lab.msp.redhat.com
   6   X      0                        cc-xen-06.lab.msp.redhat.com
[root@cc-xen-03 ~]# cman_tool nodes -F name,type
cc-xen-01.lab.msp.redhat.com X
cc-xen-02.lab.msp.redhat.com M
cc-xen-03.lab.msp.redhat.com M
cc-xen-04.lab.msp.redhat.com X
cc-xen-05.lab.msp.redhat.com X
cc-xen-06.lab.msp.redhat.com X
```

Here is some info from the core south-02:/tmp/core.12531. There are several more to choose from in the same directory.

```
[New process 12531]
#0  0x0000000000406aad in cman_get_node_addrs (handle=0x16770030, nodeid=1, max_addrs=10, num_addrs=0x7fffe8ef8684, addrs=0x7fffe8ef8400) at libcman.c:1092
1092                memcpy(&addrs[i].cna_address, &outbuf->addrs[i].addr, outbuf->addrs[i].addrlen);
(gdb) bt
#0  0x0000000000406aad in cman_get_node_addrs (handle=0x16770030, nodeid=1, max_addrs=10, num_addrs=0x7fffe8ef8684, addrs=0x7fffe8ef8400) at libcman.c:1092
#1  0x00000000004028ab in show_nodes ()
#2  0x000081a400007fff in ?? ()
#3  0x0000001400000000 in ?? ()
#4  0x0000000000000000 in ?? ()
(gdb) list
1087
1088            if (outbuf->numaddrs > max_addrs)
1089                outbuf->numaddrs = max_addrs;
1090
1091            for (i=0; i < outbuf->numaddrs; i++) {
1092                memcpy(&addrs[i].cna_address, &outbuf->addrs[i].addr, outbuf->addrs[i].addrlen);
1093                addrs[i].cna_addrlen = outbuf->addrs[i].addrlen;
1094            }
1095        }
1096        return ret;
(gdb) info locals
i = 0
h = (struct cman_handle *) 0x16770030
ret = 0
buf = 0x7fffe8ef79f0 "\001"
outbuf = (struct cl_get_node_addrs *) 0x7fffe8ef79f0
(gdb) info args
handle = (cman_handle_t) 0x16770030
nodeid = 1
max_addrs = 10
num_addrs = (int *) 0x7fffe8ef8684
addrs = (struct cman_node_address *) 0x7fffe8ef8400
(gdb) print *num_addrs
$1 = 376900256
(gdb) print *outbuf
$2 = {numaddrs = 1, addrs = 0x7fffe8ef79f8}
(gdb) print *addrs
$3 = {cna_addrlen = 1, cna_address = "�\000w\026\000\000\000\0008&\003A9", '\0' <repeats 14 times>}
```

I dug into this with Chrissie and I think I know what is going on here. From gdb I get the following data from inside cman_dispatch():

```
(gdb) print *h
$6 = {magic = 1129136462, fd = 7, zero_fd = 8, privdata = 0x0, want_reply = 1, event_callback = 0, data_callback = 0, confchg_callback = 0, reply_buffer = 0x7fff8c258d40, reply_buflen = 1368, reply_status = 28, saved_data_msg = 0x0, saved_event_msg = 0x0, saved_reply_msg = 0x0}
(gdb) print /x *header
$12 = {magic = 0x434d414e, version = 0x7fff, length = 0x18, command = 0x400000bf, flags = 0x0}
```

Here we see a reply whose total size is 24 bytes (length = 0x18). The basic header takes 20 bytes, and the reply header takes 24 bytes, which leaves nothing for the reply data. In do_cmd_get_node_addrs() we have the following code:

```c
/* AIS doesn't know about nodes that are not members */
if (node->state != NODESTATE_MEMBER)
	return 0;
```

We don't set retlen here, so no data is returned with this reply. We need to at least return an int saying that there are 0 node addresses.

I'm hitting this because cman_get_node_addrs() allocates the return buffer on the stack and doesn't initialize it. In my failing case, outbuf->numaddrs == 1, and it never gets overwritten because of the problem in do_cmd_get_node_addrs().

Created attachment 324322
Patch to make do_cmd_get_node_addrs() return data.
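The analysis above comes down to a length mismatch: a reply whose header says 24 bytes carries no payload, so no address count follows it, and the client then trusts whatever was already in its uninitialized stack buffer. The C sketch below illustrates both sides of that under stated assumptions. The `sketch_*` structs and functions are hypothetical simplifications modeled on the gdb output (five 32-bit header fields, plus an assumed 32-bit status field in the reply header); they are not cman's real definitions, and this is not the contents of attachment 324322.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the basic socket header; the field list mirrors
 * the "print /x *header" output above. Not the real cman definition. */
struct sketch_header {
    uint32_t magic;
    uint32_t version;
    uint32_t length;   /* total message length, headers included */
    uint32_t command;
    uint32_t flags;
};

/* Hypothetical reply header: basic header plus an assumed 32-bit status. */
struct sketch_reply_header {
    struct sketch_header header;
    int32_t status;
};

/* Payload bytes carried by a reply whose header reports `length` bytes. */
static size_t reply_payload_len(uint32_t length)
{
    if (length < sizeof(struct sketch_reply_header))
        return 0;
    return length - sizeof(struct sketch_reply_header);
}

/* Server side (hypothetical handler): for a node that is not a member,
 * write an explicit count of zero addresses and set *retlen, instead of
 * returning without any reply data. */
static int sketch_nonmember_reply(char *outbuf, int *retlen)
{
    int32_t numaddrs = 0;

    memcpy(outbuf, &numaddrs, sizeof(numaddrs));
    *retlen = (int)sizeof(numaddrs);
    return 0;
}

/* Client side (hypothetical): only trust an address count that the reply
 * actually carried; otherwise report zero addresses rather than reading
 * stale stack contents. */
static int32_t sketch_read_numaddrs(const char *payload, size_t payload_len)
{
    int32_t numaddrs = 0;

    assert(payload != NULL);
    if (payload_len >= sizeof(numaddrs))
        memcpy(&numaddrs, payload, sizeof(numaddrs));
    return numaddrs;
}
```

With the reply seen in gdb (`length = 0x18`), `reply_payload_len()` returns 0, so `sketch_read_numaddrs()` reports zero addresses even if the buffer still holds a stale `1` as in the core. With the server-side change, the reply grows by 4 bytes and the explicit zero count is read back. Guarding both ends is a design choice of this sketch; the attachment's description only says the handler is made to return data.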
*** Bug 473873 has been marked as a duplicate of this bug. ***

Thanks for investigating this, Nate; that patch looks good to me.

Patch committed for 5.4

```
commit 6266ee3b4f6b7c85fe87316f334d8a47a1a71165
Author: Christine Caulfield <ccaulfie>
Date:   Mon Dec 1 14:07:24 2008 +0000

    cman: Don't crash cman_tool nodes -a
```

Patch is already in the code, adding flags to make sure this shows up in the errata.

An advisory has been issued which should help with the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1341.html