Bug 134134 - cman_tool join -N int: kernel oops when specifying node ID
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: gfs
Version: 4
Hardware: i686
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Christine Caulfield
QA Contact: GFS Bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2004-09-29 19:20 UTC by Derek Anderson
Modified: 2010-01-12 02:59 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2004-09-30 16:23:54 UTC
Embargoed:



Description Derek Anderson 2004-09-29 19:20:54 UTC
Description of problem:  I was attempting to specify the node IDs for
each node on the command line.  I started ccsd on all nodes, then
issued "cman_tool join -N 1000" on node link-10.  I then issued
"cman_tool join -N 2000" on link-11 and got this oops on link-10.

Unable to handle kernel paging request at virtual address e00f7f40
 printing eip:
e02b56d0
*pde = 014ef067
Oops: 0002 [#1]
SMP
Modules linked in: dlm loop cman ipv6 parport_pc lp parport autofs4
sunrpc e1000 floppy sg microcode dm_mod uhci_hcd ehci_hcd button
battery asus_acpi ac ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod
scsi_mod
CPU:    0
EIP:    0060:[<e02b56d0>]    Not tainted
EFLAGS: 00010246   (2.6.8.1)
EIP is at add_new_node+0x1a0/0x310 [cman]
eax: e00f6000   ebx: e00f6000   ecx: 000007d0   edx: 00000077
esi: e00e6200   edi: e00f6228   ebp: db348c00   esp: dcabbedc
ds: 007b   es: 007b   ss: 0068
Process cman_memb (pid: 5115, threadinfo=dcaba000 task=df19a8d0)
Stack: 00000000 0000007b 0000008a 00000003 e02caf40 e02caf00 e02caf48 e02cadc8
       dcabbf78 e02b691f 000007d0 00000001 e02caf40 e02caf30 00000000 dcabbf70
       00000000 00000048 df19a8d0 e02caf00 dcabbf78 00000048 ffffffff e02b48fa
Call Trace:
 [<e02b691f>] do_process_joinreq+0xff/0x1b0 [cman]
 [<e02b48fa>] do_membership_packet+0x4a/0x1e0 [cman]
 [<e02b739a>] dispatch_messages+0xda/0x100 [cman]
 [<e02b3744>] membership_kthread+0x194/0x400 [cman]
 [<c0105db2>] ret_from_fork+0x6/0x14
 [<c011efb0>] default_wake_function+0x0/0x10
 [<e02b35b0>] membership_kthread+0x0/0x400 [cman]
 [<c01042b5>] kernel_thread_helper+0x5/0x10
Code: 89 2c 88 b0 01 86 05 20 bb 2c e0 bb 48 ac 2c e0 ba 77 00 00
 Sep 29 14:16:08 link-10 kernel: CMAN: forming a new cluster


Version-Release number of selected component (if applicable):
[root@link-11 /]# cman_tool -V
cman_tool DEVEL.1096386815 (built Sep 28 2004 10:54:45)
Copyright (C) Red Hat, Inc.  2004  All rights reserved.

How reproducible:
Will try again after I file bug.

Steps to Reproduce:
1. Start ccsd on two nodes
2. Run 'cman_tool join -N 1000' on one
3. Run 'cman_tool join -N 2000' on the other.
  
Actual results:
First node panics.

Expected results:
Cluster formation or graceful failure.

Additional info:

Comment 1 Derek Anderson 2004-09-29 19:28:12 UTC
Reproduced with -N 1000 and -N 2000.  It did not oops with -N 10
and -N 20, so the larger IDs must be too big for something.
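
The register dump in the description is consistent with an out-of-range
array index. The first bytes of the "Code:" line, 89 2c 88, decode to
mov %ebp,(%eax,%ecx,4), and ecx holds 0x7d0, which is 2000 decimal, the
node ID given on link-11. Reading eax as the base of an array of 4-byte
entries is an interpretation of the dump, not something the report
states. The helper name below is hypothetical. A minimal sketch of the
arithmetic:

```c
#include <stdint.h>

/* Register values copied from the oops in the description. */
#define OOPS_EAX 0xe00f6000u /* eax: assumed to be the array base */
#define OOPS_ECX 0x000007d0u /* ecx: 2000 decimal, the joining node's ID */

/*
 * Effective address of an x86 store of the form
 * mov %reg,(base,index,scale): base + index * scale.
 */
uint32_t effective_address(uint32_t base, uint32_t index, uint32_t scale)
{
    return base + index * scale;
}
```

effective_address(OOPS_EAX, OOPS_ECX, 4) evaluates to 0xe00f7f40, which
is the faulting virtual address reported on the first line of the oops.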

Comment 2 Christine Caulfield 2004-09-30 15:41:16 UTC
This affected DLM too. Both keep an array of nodeIDs for fast access
on the (previously correct) assumption that nodeIDs would be
contiguous. By starting a node with ID of 1000 it went way off the end
of that array.

Anyway, both now do rather smarter allocation of node information.
I've also put a limit on the size of a nodeID simply to prevent the
kernel running out of memory if someone tries to make a nodeID that's
really, really big.
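
For illustration only, one way a nodeID-indexed table can tolerate
non-contiguous IDs is to grow on demand and refuse IDs above a cap. This
is a userspace sketch, not the actual cman or DLM change: the struct and
function names are hypothetical, realloc stands in for kernel
allocation, and the 4096 cap is assumed from the error message shown in
the verification comment.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_NODE_ID 4096 /* assumed cap, taken from the verified error message */

struct node_info {
    int node_id; /* hypothetical payload */
};

struct node_table {
    struct node_info **slots; /* indexed directly by node ID */
    size_t size;              /* number of slots currently allocated */
};

/* Store info under node_id, growing the table if needed.
 * Returns 0 on success, -1 on a bad ID or allocation failure. */
int node_table_set(struct node_table *t, int node_id, struct node_info *info)
{
    if (node_id < 1 || node_id > MAX_NODE_ID)
        return -1;

    if ((size_t)node_id >= t->size) {
        size_t new_size = (size_t)node_id + 1;
        struct node_info **p = realloc(t->slots, new_size * sizeof(*p));
        if (p == NULL)
            return -1;
        /* Mark every newly added slot as empty. */
        for (size_t i = t->size; i < new_size; i++)
            p[i] = NULL;
        t->slots = p;
        t->size = new_size;
    }
    t->slots[node_id] = info;
    return 0;
}

/* Look up node_id; returns NULL for unknown or out-of-range IDs
 * instead of reading past the end of the array. */
struct node_info *node_table_get(const struct node_table *t, int node_id)
{
    if (node_id < 1 || (size_t)node_id >= t->size)
        return NULL;
    return t->slots[node_id];
}
```

Starting from an empty table ({ NULL, 0 }), setting node 1000 allocates
1001 slots, and a later lookup of node 2000 returns NULL where the old
fixed-size array would have been indexed out of bounds.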

Comment 3 Derek Anderson 2004-09-30 16:23:54 UTC
Verified: 
 
No Oops and no garbage-in.  I approve this message. 
 
[root@link-10 root]# cman_tool join -N 5000 
cman_tool: Node id must be between 1 and 4096 
[root@link-10 root]# echo $? 
1 
[root@link-10 root]# 
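
The range check shown above could be implemented along these lines. This
is a sketch under assumptions, not the cman_tool source: the function
name is hypothetical, and only the 1 to 4096 range comes from the output
above.

```c
#include <errno.h>
#include <stdlib.h>

#define NODE_ID_MIN 1
#define NODE_ID_MAX 4096 /* range taken from the error message above */

/* Parse a decimal node ID from a command-line argument.
 * Returns 0 and stores the value in *out on success; returns -1 if the
 * argument is empty, not a plain number, or outside the allowed range. */
int parse_node_id(const char *arg, int *out)
{
    char *end;
    long value;

    if (arg == NULL || *arg == '\0')
        return -1;

    errno = 0;
    value = strtol(arg, &end, 10);
    if (errno != 0 || *end != '\0')
        return -1; /* overflow or trailing garbage */
    if (value < NODE_ID_MIN || value > NODE_ID_MAX)
        return -1;

    *out = (int)value;
    return 0;
}
```

A caller would print the "Node id must be between 1 and 4096" message
and exit with status 1 when this returns -1, matching the session above.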

Comment 4 Kiersten (Kerri) Anderson 2004-11-16 19:03:36 UTC
Updating version to the right level in the defects.  Sorry for the storm.

