Bug 129510

Summary: cat /proc/cluster/services with 100 filesystems: kernel Oops
Product: [Retired] Red Hat Cluster Suite
Reporter: Derek Anderson <danderso>
Component: gfs
Assignee: Christine Caulfield <ccaulfie>
Status: CLOSED CURRENTRELEASE
QA Contact: GFS Bugs <gfs-bugs>
Severity: medium
Priority: medium
Version: 4
CC: teigland
Target Milestone: ---
Target Release: ---
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2004-08-30 19:23:52 UTC

Description Derek Anderson 2004-08-09 20:30:03 UTC
Description of problem:
Set up a 2-node cluster and create and mount 100 GFS filesystems on each
node.  Run 'cat /proc/cluster/services' on one of the nodes and you get
these messages in the log file (the command also randomly segfaults):

proc_file_read: Apparent buffer overflow!
proc_file_read: Apparent buffer overflow!
proc_file_read: Apparent buffer overflow!
proc_file_read: Apparent buffer overflow!

The output of the command also shows only the first 50 of the 100
DLM Lock Spaces that were created.

Ran the cat command a few more times to stress it, and the kernel
Oopsed:
proc_file_read: Apparent buffer overflow!
proc_file_read: Apparent buffer overflow!
proc_file_read: Apparent buffer overflow!
proc_file_read: Apparent buffer overflow!
proc_file_read: Apparent buffer overflow!
proc_file_read: Apparent buffer overflow!
proc_file_read: Apparent buffer overflow!
proc_file_read: Apparent buffer overflow!
proc_file_read: Apparent buffer overflow!
proc_file_read: Apparent buffer overflow!
Unable to handle kernel paging request at virtual address 31347366
 printing eip:
e030b6b7
*pde = 00000000
Oops: 0000 [#1]
Modules linked in: loop gnbd lock_gulm lock_nolock lock_dlm dlm cman
gfs lock_harness ipv6 parport_pc lp parport autofs4 sunrpc e1000
floppy sg microcode dm_mod uhci_hcd ehci_hcd button battery asus_acpi
ac ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
CPU:    0
EIP:    0060:[<e030b6b7>]    Not tainted
EFLAGS: 00010282   (2.6.7)
EIP is at search_bucket+0x17/0x70 [dlm]
eax: df2a6000   ebx: 31347366   ecx: 00000018   edx: 0000017e
esi: df765738   edi: 00000002   ebp: 00000018   esp: db25fdb0
ds: 007b   es: 007b   ss: 0068
Process dlm_recvd (pid: 4509, threadinfo=db25e000 task=dac90b30)
Stack: e031cbc3 00000000 d4070d7d 00000002 df765738 00000002 00000018
e030b742
       0000017e 00000246 35736667 206e7520 d4070d7d 00000002 d4070d08
df765738
       00000000 e031b08a 00000018 00000246 20202020 0a226131 ff000000
d4070d08
Call Trace:
 [<e030b742>] dlm_dir_remove+0x32/0xf0 [dlm]
 [<e031b08a>] _release_rsb+0x11a/0x2a0 [dlm]
 [<e030d2a1>] dlm_unlock_stage2+0xd1/0x1a0 [dlm]
 [<e030f8d3>] process_cluster_request+0x243/0xd30 [dlm]
 [<c02b0e98>] inet_recvmsg+0x48/0x70
 [<c026bd0c>] sock_recvmsg+0xbc/0xc0
 [<e0313a53>] midcomms_process_incoming_buffer+0x173/0x250 [dlm]
 [<c0136af3>] __alloc_pages+0x2f3/0x340
 [<c026bd0c>] sock_recvmsg+0xbc/0xc0
 [<e0311721>] receive_from_sock+0x141/0x300 [dlm]
 [<c0117e67>] recalc_task_prio+0x97/0x190
 [<e031261b>] process_sockets+0x7b/0xa0 [dlm]
 [<e031288e>] dlm_recvd+0x9e/0xf0 [dlm]
 [<e03127f0>] dlm_recvd+0x0/0xf0 [dlm]
 [<c010429d>] kernel_thread_helper+0x5/0x18

Code: 8b 0b 89 0c 24 0f 18 01 90 8d 04 d0 39 c3 74 23 89 44 24 04
<1>Unable to handle kernel paging request at virtual address 36312038
 printing eip:
c013b7c2
*pde = 00000000
Oops: 0000 [#2]
Modules linked in: loop gnbd lock_gulm lock_nolock lock_dlm dlm cman
gfs lock_harness ipv6 parport_pc lp parport autofs4 sunrpc e1000
floppy sg microcode dm_mod uhci_hcd ehci_hcd button battery asus_acpi
ac ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
CPU:    0
EIP:    0060:[<c013b7c2>]    Not tainted
EFLAGS: 00010203   (2.6.7)
EIP is at put_page+0x2/0x90
eax: 36312038   ebx: 00000001   ecx: df2a7880   edx: 36312038
esi: ddf32c80   edi: 00000000   ebp: df494680   esp: db23ddf8
ds: 007b   es: 007b   ss: 0068
Process cman_comms (pid: 4412, threadinfo=db23c000 task=dac905b0)
Stack: c026f39f ddf32c80 db23df6c c026f3c8 00000000 c026f483 ddf32c80
ddf32c80
       db23df6c ddf32c80 ddf32c80 c02aa1ca 00000018 db23de48 00000018
ddf32c90
       db23df4c df4947ac 00000018 00000040 c0334140 db23df6c db23df6c
c02b0e98
Call Trace:
 [<c026f39f>] skb_release_data+0x6f/0x90
 [<c026f3c8>] kfree_skbmem+0x8/0x20
 [<c026f483>] __kfree_skb+0xa3/0x140
 [<c02aa1ca>] udp_recvmsg+0x20a/0x290
 [<c02b0e98>] inet_recvmsg+0x48/0x70
 [<c026bd0c>] sock_recvmsg+0xbc/0xc0
 [<c0118897>] __wake_up_common+0x37/0x70
 [<c026ed5c>] sock_def_readable+0x5c/0x60
 [<e02ea9dd>] send_to_userport+0x1d/0x520 [cman]
 [<e02ea155>] receive_message+0x85/0xf0 [cman]
 [<e02ea319>] cluster_kthread+0x159/0x2d0 [cman]
 [<c0105c12>] ret_from_fork+0x6/0x14
 [<c0118850>] default_wake_function+0x0/0x10
 [<e02ea1c0>] cluster_kthread+0x0/0x2d0 [cman]
 [<c010429d>] kernel_thread_helper+0x5/0x18

Code: 8b 00 a9 00 00 08 00 75 47 8b 02 f6 c4 08 75 2e 8b 02 89 d1

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Set up a 2-node cluster and create and mount 100 GFS filesystems on
   each node.
2. Run 'cat /proc/cluster/services' on one of the nodes.
3. Repeat the command several times.
  
Actual results:
"proc_file_read: Apparent buffer overflow!" messages are logged, the
output lists only the first 50 of the 100 DLM Lock Spaces, the command
occasionally segfaults, and repeated runs eventually Oops the kernel.


Expected results:
All 100 DLM Lock Spaces are listed with no errors, no segfaults, and no
kernel Oops.


Additional info:
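The "Apparent buffer overflow" warnings come from proc_file_read() in
fs/proc/generic.c: it hands the old-style read_proc callback a single page
of buffer space and logs that warning when the callback reports more data
than was asked for.  With 100 mounted filesystems the services listing no
longer fits in one page, which is why the output stops around the 50th DLM
lock space.  A hypothetical read_proc handler in that style, shown only to
illustrate the failure mode (this is not the actual cman-kernel code):

/* Hypothetical old-style read_proc handler, for illustration only.
 * proc_file_read() calls it with a single page of buffer space, so once
 * the formatted output of all entries no longer fits in one page the
 * listing is cut short and "Apparent buffer overflow!" is logged. */
static int demo_services_read_proc(char *page, char **start, off_t off,
                                   int count, int *eof, void *data)
{
        int len = 0;
        int i;

        /* stand-in for walking 100 DLM lock spaces */
        for (i = 0; i < 100; i++)
                len += sprintf(page + len,
                               "DLM Lock Space:  \"demo%d\"\n", i);

        *eof = 1;
        return len;     /* can exceed count -> overflow warning */
}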

Comment 1 Christine Caulfield 2004-08-23 12:42:12 UTC
Now uses seq_file for /proc/cluster/services.

Checking in proc.c;
/cvs/cluster/cluster/cman-kernel/src/proc.c,v  <--  proc.c
new revision: 1.2; previous revision: 1.1
done
Checking in sm_misc.c;
/cvs/cluster/cluster/cman-kernel/src/sm_misc.c,v  <--  sm_misc.c
new revision: 1.2; previous revision: 1.1
done
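
For reference, here is a minimal sketch of the seq_file pattern the fix
switches to, assuming a simple numbered list of entries; the demo_* names
are hypothetical and this is not the actual proc.c/sm_misc.c change.  With
a start/next/stop/show iterator the kernel emits one record at a time, so
the output length is no longer limited to a single page:

/* Minimal seq_file-backed /proc file in the style of a 2.6-era module.
 * All names are hypothetical; the real change walks the service
 * manager's own lists under its own locking. */
#include <linux/module.h>
#include <linux/init.h>
#include <linux/fs.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>

#define DEMO_ITEMS 100          /* stand-in for 100 DLM lock spaces */

static void *demo_start(struct seq_file *m, loff_t *pos)
{
        return (*pos < DEMO_ITEMS) ? pos : NULL;
}

static void *demo_next(struct seq_file *m, void *v, loff_t *pos)
{
        (*pos)++;
        return (*pos < DEMO_ITEMS) ? pos : NULL;
}

static void demo_stop(struct seq_file *m, void *v)
{
}

static int demo_show(struct seq_file *m, void *v)
{
        long long i = *(loff_t *)v;

        /* one record per iteration; seq_file handles the buffering */
        seq_printf(m, "DLM Lock Space:  \"demo%lld\"  %lld\n", i, i);
        return 0;
}

static struct seq_operations demo_seq_ops = {
        .start = demo_start,
        .next  = demo_next,
        .stop  = demo_stop,
        .show  = demo_show,
};

static int demo_open(struct inode *inode, struct file *file)
{
        return seq_open(file, &demo_seq_ops);
}

static struct file_operations demo_fops = {
        .owner   = THIS_MODULE,
        .open    = demo_open,
        .read    = seq_read,
        .llseek  = seq_lseek,
        .release = seq_release,
};

static int __init demo_init(void)
{
        /* 2.6-era registration; newer kernels use proc_create() instead */
        struct proc_dir_entry *pde = create_proc_entry("services_demo",
                                                       0444, NULL);
        if (!pde)
                return -ENOMEM;
        pde->proc_fops = &demo_fops;
        return 0;
}

static void __exit demo_exit(void)
{
        remove_proc_entry("services_demo", NULL);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");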

Comment 2 Corey Marthaler 2004-08-30 19:23:52 UTC
Fix verified.

Comment 3 Kiersten (Kerri) Anderson 2004-11-16 19:05:27 UTC
Updating version to the right level in the defects.  Sorry for the storm.