
Bug 134134

Summary: cman_tool join -N int: kernel oops when specifying node ID
Product: [Retired] Red Hat Cluster Suite
Component: gfs
Version: 4
Hardware: i686
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Reporter: Derek Anderson <danderso>
Assignee: Christine Caulfield <ccaulfie>
QA Contact: GFS Bugs <gfs-bugs>
Doc Type: Bug Fix
Last Closed: 2004-09-30 16:23:54 UTC

Description Derek Anderson 2004-09-29 19:20:54 UTC
Description of problem:  I was attempting to specify the node ID for
each node on the command line.  I started ccsd on all nodes, then issued
"cman_tool join -N 1000" on node link-10.  I then issued "cman_tool
join -N 2000" on link-11 and got this Oops on link-10.

Unable to handle kernel paging request at virtual address e00f7f40
 printing eip:
e02b56d0
*pde = 014ef067
Oops: 0002 [#1]
SMP
Modules linked in: dlm loop cman ipv6 parport_pc lp parport autofs4 sunrpc e1000 floppy sg microcode dm_mod uhci_hcd ehci_hcd button battery asus_acpi ac ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
CPU:    0
EIP:    0060:[<e02b56d0>]    Not tainted
EFLAGS: 00010246   (2.6.8.1)
EIP is at add_new_node+0x1a0/0x310 [cman]
eax: e00f6000   ebx: e00f6000   ecx: 000007d0   edx: 00000077
esi: e00e6200   edi: e00f6228   ebp: db348c00   esp: dcabbedc
ds: 007b   es: 007b   ss: 0068
Process cman_memb (pid: 5115, threadinfo=dcaba000 task=df19a8d0)
Stack: 00000000 0000007b 0000008a 00000003 e02caf40 e02caf00 e02caf48 e02cadc8
       dcabbf78 e02b691f 000007d0 00000001 e02caf40 e02caf30 00000000 dcabbf70
       00000000 00000048 df19a8d0 e02caf00 dcabbf78 00000048 ffffffff e02b48fa
Call Trace:
 [<e02b691f>] do_process_joinreq+0xff/0x1b0 [cman]
 [<e02b48fa>] do_membership_packet+0x4a/0x1e0 [cman]
 [<e02b739a>] dispatch_messages+0xda/0x100 [cman]
 [<e02b3744>] membership_kthread+0x194/0x400 [cman]
 [<c0105db2>] ret_from_fork+0x6/0x14
 [<c011efb0>] default_wake_function+0x0/0x10
 [<e02b35b0>] membership_kthread+0x0/0x400 [cman]
 [<c01042b5>] kernel_thread_helper+0x5/0x10
Code: 89 2c 88 b0 01 86 05 20 bb 2c e0 bb 48 ac 2c e0 ba 77 00 00
 Sep 29 14:16:08 link-10 kernel: CMAN: forming a new cluster


Version-Release number of selected component (if applicable):
[root@link-11 /]# cman_tool -V
cman_tool DEVEL.1096386815 (built Sep 28 2004 10:54:45)
Copyright (C) Red Hat, Inc.  2004  All rights reserved.

How reproducible:
Will try again after I file bug.

Steps to Reproduce:
1. Start ccsd on two nodes
2. Run 'cman_tool join -N 1000' on one
3. Run 'cman_tool join -N 2000' on the other.
  
Actual results:
First node panics.

Expected results:
Cluster formation or graceful failure.

Additional info:

Comment 1 Derek Anderson 2004-09-29 19:28:12 UTC
Recreated with the -N 1000 and -N 2000.  It did not Oops with -N 10 
and -N 20, so those ints must be too big for something. 
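
The registers in the Oops seem consistent with an index running past the end of an array. The sketch below rests on one assumption that the trace does not confirm: that the first bytes of the Code: line (89 2c 88) are the faulting instruction, which decodes on i386 as "mov %ebp,(%eax,%ecx,4)", a 4-byte store to eax + ecx*4. Under that reading eax would be the array base and ecx the index. The function name is made up for illustration; the register values are copied from the Oops.

```python
# Register values copied from the Oops in the description.
EAX = 0xE00F6000            # assumed: base address of the node array
ECX = 0x000007D0            # 2000 decimal, the ID passed to -N on link-11
FAULT_ADDRESS = 0xE00F7F40  # "Unable to handle kernel paging request at ..."

ENTRY_SIZE = 4  # bytes per pointer on i686


def indexed_store_address(base, index, scale=ENTRY_SIZE):
    """Effective address of a store to base[index], with 32-bit wraparound."""
    return (base + index * scale) & 0xFFFFFFFF


if __name__ == "__main__":
    addr = indexed_store_address(EAX, ECX)
    print(f"node ID in ecx:         {ECX}")
    print(f"computed store address: {addr:#010x}")
    print(f"matches fault address:  {addr == FAULT_ADDRESS}")
```

The computed address equals the faulting address, and ecx holds 2000, the ID given on link-11. The error code 0002 also fits: on i386 it indicates a kernel-mode write to a page that is not present. With -N 10 and -N 20 the same store would land only 40 or 80 bytes past the base, presumably still inside the allocation.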

Comment 2 Christine Caulfield 2004-09-30 15:41:16 UTC
This affected DLM too. Both keep an array of nodeIDs for fast access
on the (previously correct) assumption that nodeIDs would be
contiguous. By starting a node with ID of 1000 it went way off the end
of that array.

Anyway, both now do rather smarter allocation of node information.
I've also put a limit on the size of a nodeID simply to prevent the
kernel running out of memory if someone tries to make a nodeID that's
really, really big.
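
As a rough illustration of the idea, not the actual cman or DLM change, here is a toy Python model of a node table that grows on demand and rejects oversized IDs. The class name, method names, and GROW_CHUNK are hypothetical; the 4096 ceiling is taken from the cman_tool message in the verification comment, and the real kernel code may apply its limit differently.

```python
MAX_NODE_ID = 4096  # upper bound reported by cman_tool in this bug
GROW_CHUNK = 16     # hypothetical growth increment, chosen for illustration


class NodeTable:
    """Toy model of a table indexed by node ID that grows as needed."""

    def __init__(self):
        self._slots = []

    def add(self, node_id, info):
        # Refuse IDs outside the allowed range before touching the table.
        if not 1 <= node_id <= MAX_NODE_ID:
            raise ValueError(f"node ID must be between 1 and {MAX_NODE_ID}")
        if node_id >= len(self._slots):
            # Grow to the next multiple of GROW_CHUNK above node_id
            # instead of assuming IDs are contiguous from 1.
            new_size = (node_id // GROW_CHUNK + 1) * GROW_CHUNK
            self._slots.extend([None] * (new_size - len(self._slots)))
        self._slots[node_id] = info

    def get(self, node_id):
        # Out-of-range lookups return None rather than reading past the end.
        if 0 <= node_id < len(self._slots):
            return self._slots[node_id]
        return None


if __name__ == "__main__":
    table = NodeTable()
    table.add(1000, "link-10")
    table.add(2000, "link-11")
    print(table.get(1000), table.get(2000), table.get(10))
```

The range check keeps the table size bounded, which is the point of the limit: without it, a single very large ID would force a correspondingly large allocation.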

Comment 3 Derek Anderson 2004-09-30 16:23:54 UTC
Verified: 
 
No Oops and no garbage-in.  I approve this message. 
 
[root@link-10 root]# cman_tool join -N 5000 
cman_tool: Node id must be between 1 and 4096 
[root@link-10 root]# echo $? 
1 
[root@link-10 root]# 

Comment 4 Kiersten (Kerri) Anderson 2004-11-16 19:03:36 UTC
Updating version to the right level in the defects.  Sorry for the storm.