Bug 133499

Summary:          Various 5.2.1 crashes on Opteron
Product:          [Retired] Red Hat Cluster Suite
Component:        gfs
Version:          3
Hardware:         All
OS:               Linux
Status:           CLOSED WORKSFORME
Severity:         medium
Priority:         medium
Reporter:         Derek Anderson <danderso>
Assignee:         Ben Marzinski <bmarzins>
QA Contact:       GFS Bugs <gfs-bugs>
Target Milestone: ---
Target Release:   ---
Doc Type:         Bug Fix
Last Closed:      2004-10-04 15:12:38 UTC

Description Derek Anderson 2004-09-24 14:21:36 UTC
Description of problem:
GFS 5.2.1 p031 on a 3-node cluster.  The test first mounts 40
filesystems simultaneously, then runs sistina-test vedder -S QUICK.
In two runs so far I have gotten two different crashes: an Oops and
panic while running 'gnbd_import -R', and a kernel BUG and reboot
while running 'pool_assemble'.
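
For reference, a rough shell sketch of that sequence; the mount points,
pool device names, and the 40-filesystem loop are illustrative
assumptions -- only 'pool_assemble', 'gnbd_import -R', and the vedder
invocation come from this report:

  #!/bin/sh
  # Rough reproduction sketch; device paths and mount points are
  # placeholders, not taken from this report.

  # Assemble the pool devices (one crash hit in the pool_assemble
  # ioctl path; see the first trace below).
  pool_assemble

  # Mount 40 GFS filesystems simultaneously (repeat on each of the
  # 3 nodes).
  for i in $(seq -w 1 40); do
      mount -t gfs /dev/pool/pool$i /mnt/gfs$i &
  done
  wait

  # Run the load test from the sistina-test suite, as in the report.
  vedder -S QUICK

  # Remove the imported GNBD devices (the other crash hit here; see
  # the second trace below).
  gnbd_import -R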

Kernel BUG at buffer:2809
invalid operand: 0000
CPU 0
Pid: 29493, comm: pool_assemble Tainted: P
RIP: 0010:[<ffffffff8015761e>]{grow_buffers+62}
RSP: 0000:0000010033013958  EFLAGS: 00010206
RAX: 000000002975251f RBX: 000000000000fe01 RCX: 0000000029752520
RDX: ffffffffa0182ac0 RSI: 0000000000000000 RDI: 000000000000fe01
RBP: 0000000000000400 R08: 000000000000000a R09: 0000000000000400
R10: 000000000000fe01 R11: 0000000000000010 R12: 0000000000000000
R13: 000000000000fe01 R14: 0000000000000000 R15: 0000000000000000
FS:  0000002a958bc4c0(0000) GS:ffffffff804fbdc0(0000)
knlGS:0000000040016a80
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000040942f CR3: 0000000000101000 CR4: 00000000000006e0

Call Trace: [<ffffffff80154ec4>]{getblk+52}
[<ffffffffa0188b72>]{:pool:scan_dev_for_pool_label+82}
       [<ffffffffa01891b2>]{:pool:find_and_sort_labels+82}
       [<ffffffffa018a048>]{:pool:assemble_pools+40}
[<ffffffffa01921d8>]{:pool:pools+312}
       [<ffffffffa0187d75>]{:pool:pool_ioctl+869}
[<ffffffffa01920a0>]{:pool:pools+0}
       [<ffffffff80154ed8>]{getblk+72}
[<ffffffffa00dad41>]{:ext3:ext3_getblk+177}
       [<ffffffffa00ddfea>]{:ext3:ext3_find_entry+618}
[<ffffffff80163c30>]{filldir64+192}
       [<ffffffff801634fd>]{vfs_readdir+125}
[<ffffffff8015a6ea>]{blkdev_ioctl+26}
       [<ffffffff80163422>]{sys_ioctl+578}
[<ffffffff8015299e>]{filp_close+110}
       [<ffffffff8010fe37>]{system_call+119}
Process pool_assemble (pid: 29493, stackpage=10033013000)
Stack: 0000010033013958 0000000000000000 0000000000000000 0000000000000000
       0000000000000400 0000000000000000 000000000000fe01 0000000000000007
       ffffffff80154ec4 0000000000000007 000000000000fe01 0000000000000000
       00000100330139f8 0000000000000004 ffffffffa0188b72 0000000000007051
       0000000000000000 0000010033013de8 0000000000000004 0000000000000000
       ffffffffa01891b2 00000100049669c0 000001003f92c540 0000000000000000
       0000000000000000 0000000000000000 0000000000000000 0000000000000000

====================================================================
Unable to handle kernel paging request at virtual address ffffffffa00a6ac0
 printing rip:
ffffffff801599b9
PML4 103027 PGD 105027 PMD 3f9b9067 PTE 0
Oops: 0000
CPU 0
Pid: 11092, comm: gnbd_import Tainted: P
RIP: 0010:[<ffffffff801599b9>]{set_blocksize+89}
RSP: 0018:0000010038f49e88  EFLAGS: 00010286
RAX: 000000000000fe00 RBX: 0000000000000400 RCX: 0000000000000200
RDX: ffffffffa00a6ac0 RSI: 00000000000000fe RDI: 0000000000000000
RBP: 000000000000fe00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000400 R11: 0000000000000004 R12: 000000000000fe00
R13: 0000010004a93f78 R14: 0000000000000000 R15: 000001003d086b40
FS:  0000002a958bd4c0(0000) GS:ffffffff804fbdc0(0000)
knlGS:0000000040023740
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffffa00a6ac0 CR3: 0000000000101000 CR4: 00000000000006e0

Call Trace: [<ffffffff8015a497>]{do_open+215}
[<ffffffff801525ec>]{dentry_open+252}
       [<ffffffff801524e1>]{filp_open+65}
[<ffffffff8015e4c9>]{getname+137}
       [<ffffffff8015289d>]{sys_open+77}
[<ffffffff8010fe37>]{system_call+119}

Process gnbd_import (pid: 11092, stackpage=10038f49000)
Stack: 0000010038f49e88 0000000000000018 0000010038f2ac00 0000010038f2ac00
       0000010004a93f40 000000000000fe00 ffffffff8015a497 000001003fe2a480
       000001003d086b40 0000010038f2ac00 0000010004b755c0 0000000000000000
       000001003fe2a480 0000000000000000 ffffffff801525ec 0000000000008001
       0000000000000004 000001003faa5000 000000000000fe00 0000000000000000
       ffffffff801524e1 000001003fe2a480 0000010004b755c0 00000000004053de
       ffffffff8015e4c9 0000000100000001 000000003faa5000 0000000000008001
       ffffffff8015289d 0000000000000002 0000000000000004 0000000000000002
       0000007fbfefe178 0000000000000000 ffffffff8010fe37 0000000000000246
       0000002a956878c0 ffffffffffffffc0 00000000004053de 0000000000000002

Code: 42 8b 04 82 85 c0 0f 45 c8 39 cb b8 ea ff ff ff 0f 8c a6 00

Kernel panic: Fatal exception


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Derek Anderson 2004-10-04 15:12:38 UTC
We have been unable to reproduce this bug in over a week of testing.
I now think that some inadvertent LVM2 testing on other nodes
connected to the same storage caused these failures.  Closing this
bug and releasing 5.2.1 p031.