Bug 472460 - cman_tool nodes -F name segfaults when a node is out of membership
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: cman
Version: 5.3
Hardware: All Linux
Priority: medium  Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Christine Caulfield
QA Contact: Cluster QE
Duplicates: 473873
Reported: 2008-11-20 17:53 EST by Nate Straz
Modified: 2016-04-26 12:44 EDT

Doc Type: Bug Fix
Last Closed: 2009-09-02 07:08:26 EDT

Attachments
Patch to make do_cmd_get_node_addrs() return data. (627 bytes, text/plain)
2008-11-21 11:56 EST, Nate Straz

Description Nate Straz 2008-11-20 17:53:40 EST
Description of problem:

When another node in the cluster is out of membership, the output shows this:

[root@south-02 ~]# cman_tool nodes 
Node  Sts   Inc   Joined               Name
   1   X  106200                        south-01
   2   M  106028   2008-11-20 11:02:44  south-02
...

But when the -F option is used, the command segfaults.

[root@south-02 ~]# cman_tool nodes -F name,type
Segmentation fault (core dumped)

Once the node returns to membership, the command works again.

[root@south-02 ~]# cman_tool nodes -F name,type
south-01 M 
south-02 M 
...

Version-Release number of selected component (if applicable):
cman-2.0.97-1.el5

How reproducible:
100%

Steps to Reproduce:
1. on node A, service cman stop
2. on node B, cman_tool nodes -F name,type
  
Actual results:
See above

Expected results:
cman_tool nodes -F name,type
south-01 X
south-02 M

Additional info:
Comment 1 Christine Caulfield 2008-11-21 03:31:47 EST
This isn't as easy to reproduce as you might think. Can you provide a backtrace from the core dump, please? Here is what I see:

[root@cc-xen-03 ~]# cman_tool nodes             
Node  Sts   Inc   Joined               Name
   1   X  31928                        cc-xen-01.lab.msp.redhat.com
   2   M  31928   2008-11-14 02:01:48  cc-xen-02.lab.msp.redhat.com
   3   M  31924   2008-11-14 02:01:47  cc-xen-03.lab.msp.redhat.com
   4   X  31940                        cc-xen-04.lab.msp.redhat.com
   5   X      0                        cc-xen-05.lab.msp.redhat.com
   6   X      0                        cc-xen-06.lab.msp.redhat.com

[root@cc-xen-03 ~]# cman_tool nodes -F name,type
cc-xen-01.lab.msp.redhat.com X 
cc-xen-02.lab.msp.redhat.com M 
cc-xen-03.lab.msp.redhat.com M 
cc-xen-04.lab.msp.redhat.com X 
cc-xen-05.lab.msp.redhat.com X 
cc-xen-06.lab.msp.redhat.com X
Comment 2 Nate Straz 2008-11-21 09:02:26 EST
Here is some info from the core south-02:/tmp/core.12531.  There are several more to choose from in the same directory.

[New process 12531]
#0  0x0000000000406aad in cman_get_node_addrs (handle=0x16770030, nodeid=1, 
    max_addrs=10, num_addrs=0x7fffe8ef8684, addrs=0x7fffe8ef8400)
    at libcman.c:1092
1092                            memcpy(&addrs[i].cna_address, &outbuf->addrs[i].addr, outbuf->addrs[i].addrlen);
(gdb) bt
#0  0x0000000000406aad in cman_get_node_addrs (handle=0x16770030, nodeid=1, 
    max_addrs=10, num_addrs=0x7fffe8ef8684, addrs=0x7fffe8ef8400)
    at libcman.c:1092
#1  0x00000000004028ab in show_nodes ()
#2  0x000081a400007fff in ?? ()
#3  0x0000001400000000 in ?? ()
#4  0x0000000000000000 in ?? ()
(gdb) list
1087
1088                    if (outbuf->numaddrs > max_addrs)
1089                            outbuf->numaddrs = max_addrs;
1090
1091                    for (i=0; i < outbuf->numaddrs; i++) {
1092                            memcpy(&addrs[i].cna_address, &outbuf->addrs[i].addr, outbuf->addrs[i].addrlen);
1093                            addrs[i].cna_addrlen = outbuf->addrs[i].addrlen;
1094                    }
1095            }
1096            return ret;
(gdb) info locals
i = 0
h = (struct cman_handle *) 0x16770030
ret = 0
buf = 0x7fffe8ef79f0 "\001"
outbuf = (struct cl_get_node_addrs *) 0x7fffe8ef79f0
(gdb) info args
handle = (cman_handle_t) 0x16770030
nodeid = 1
max_addrs = 10
num_addrs = (int *) 0x7fffe8ef8684
addrs = (struct cman_node_address *) 0x7fffe8ef8400
(gdb) print *num_addrs
$1 = 376900256
(gdb) print *outbuf
$2 = {numaddrs = 1, addrs = 0x7fffe8ef79f8}
(gdb) print *addrs
$3 = {cna_addrlen = 1, 
  cna_address = "�\000w\026\000\000\000\0008&\003A9", '\0' <repeats 14 times>}
Comment 3 Nate Straz 2008-11-21 11:21:12 EST
I dug into this with Chrissie and I think I know what is going on here.
From gdb I get the following data from inside cman_dispatch():

(gdb) print *h
$6 = {magic = 1129136462, fd = 7, zero_fd = 8, privdata = 0x0, want_reply = 1, 
  event_callback = 0, data_callback = 0, confchg_callback = 0, 
  reply_buffer = 0x7fff8c258d40, reply_buflen = 1368, reply_status = 28, 
  saved_data_msg = 0x0, saved_event_msg = 0x0, saved_reply_msg = 0x0}
(gdb) print /x *header
$12 = {magic = 0x434d414e, version = 0x7fff, length = 0x18, 
  command = 0x400000bf, flags = 0x0}

Here we see that we have a header with a reply and the total size is
24 bytes.  The header itself takes 20 bytes, but the reply header
takes 24 bytes which leaves nothing for the reply data.
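
The size arithmetic above can be sketched in C. This is only an illustration:
the struct names msg_header and reply_header and the helper
reply_payload_len() are hypothetical stand-ins, and the layout assumes that
each field gdb printed for *header is 32 bits wide and that the reply header
adds a single 32-bit status.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the wire structures. The field list mirrors
 * what gdb printed for *header, assuming every field is 32 bits wide. */
struct msg_header {
    uint32_t magic;
    uint32_t version;
    uint32_t length;   /* total message length, headers included */
    uint32_t command;
    uint32_t flags;
};

/* Assumed reply layout: the common header followed by one status word. */
struct reply_header {
    struct msg_header header;
    int32_t status;
};

/* Number of payload bytes that follow the reply header. */
static size_t reply_payload_len(const struct msg_header *h)
{
    assert(h->length >= sizeof(struct reply_header));
    return h->length - sizeof(struct reply_header);
}
```

With length = 0x18 (24), as in the gdb output, this helper yields 0, meaning
the reply carries a status and no payload at all.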

In do_cmd_get_node_addrs() we have the following code:

        /* AIS doesn't know about nodes that are not members */
        if (node->state != NODESTATE_MEMBER)
                return 0;

We don't set retlen here, so we don't return any data for this reply.
We need to at least return an int saying that we have 0 node addresses
to return.

I'm hitting this because in cman_get_node_addrs() we allocate the
return buffer on the stack and don't initialize it.  In my failing
case, outbuf->numaddrs == 1 and it never gets overwritten because of
the problem in do_cmd_get_node_addrs.
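
A minimal simulation of the failure mode described above, not the actual cman
code: struct addr_reply, daemon_reply_nonmember() and client_count() are
hypothetical names, and the stale stack contents are modelled by explicitly
setting numaddrs to 1, since reading truly uninitialised memory is undefined
behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ADDRS 10

/* Hypothetical, simplified reply payload: a count followed by entries. */
struct addr_reply {
    int numaddrs;
    int addrlen[MAX_ADDRS];
};

/* Simulated daemon side: writes the reply for a non-member node and
 * returns how many bytes it wrote. `fixed` selects the patched behaviour. */
static size_t daemon_reply_nonmember(void *out, int fixed)
{
    assert(out != NULL);
    if (!fixed)
        return 0;                     /* buggy: nothing written, no length set */

    int zero = 0;
    memcpy(out, &zero, sizeof zero);  /* patched: report 0 addresses */
    return sizeof zero;
}

/* Simulated client side: returns the address count it would loop over. */
static int client_count(int fixed)
{
    struct addr_reply buf;

    /* Stand-in for stale stack contents left in the reply buffer. */
    memset(&buf, 0, sizeof buf);
    buf.numaddrs = 1;

    daemon_reply_nonmember(&buf, fixed);
    return buf.numaddrs;
}
```

In this sketch client_count(0) returns the stale value 1, so a caller would go
on to copy one bogus address, while client_count(1) returns 0 and the copy
loop is skipped. Zeroing the client buffer before the call would be an
additional defensive measure, but that is a suggestion here, not something the
attached patch is known to do.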
Comment 4 Nate Straz 2008-11-21 11:56:17 EST
Created attachment 324322 [details]
Patch to make do_cmd_get_node_addrs() return data.
Comment 5 Christine Caulfield 2008-12-01 06:46:25 EST
*** Bug 473873 has been marked as a duplicate of this bug. ***
Comment 6 Christine Caulfield 2008-12-01 09:03:18 EST
Thanks for investigating this Nate, that patch looks good to me.
Comment 7 Christine Caulfield 2008-12-03 06:01:48 EST
Patch committed for 5.4

commit 6266ee3b4f6b7c85fe87316f334d8a47a1a71165
Author: Christine Caulfield <ccaulfie@redhat.com>
Date:   Mon Dec 1 14:07:24 2008 +0000

    cman: Don't crash cman_tool nodes -a
Comment 8 Nate Straz 2009-06-22 14:48:07 EDT
Patch is already in the code; adding flags to make sure this shows up in the errata.
Comment 12 errata-xmlrpc 2009-09-02 07:08:26 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1341.html
