Bug 167650

Summary: NMI watchdog lockup while attempting nfs traffic on gfs during ip relocation
Product: [Retired] Red Hat Cluster Suite
Reporter: Corey Marthaler <cmarthal>
Component: gfs
Assignee: Ben Marzinski <bmarzins>
Status: CLOSED WORKSFORME
QA Contact: GFS Bugs <gfs-bugs>
Severity: medium
Priority: medium    
Version: 4   
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Last Closed: 2006-05-04 15:38:04 UTC Type: ---
Bug Blocks: 165449    

Description Corey Marthaler 2005-09-06 16:25:38 UTC
Description of problem:
This occurred while attempting to reproduce bz #166701 with Ben's fix.

I had a very similar cluster setup:
Steps to Reproduce:
1. Configure a service IP address on a two node cluster (link-02 & link-08)
2. Export a GFS filesystem over NFS
3. Mount the GFS export from an NFS client (link-01) using the service IP address
4. Generate read/write traffic over the NFS mount so that the CPU load is at least 50%
5. Use a simple script to move the service IP address between the two nodes using clusvcadm -r every 10 seconds.
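
The relocation script from step 5 was not attached to this report. The following is only a minimal sketch of what such a loop might look like. The service name "nfs_ip" is hypothetical, and the use of the -m option to name the target member is an assumption about how the script picked a node; only clusvcadm -r, the node names and the 10 second interval come from the steps above.

```python
import itertools
import subprocess
import time

SERVICE = "nfs_ip"               # hypothetical service name, not from this report
NODES = ["link-02", "link-08"]   # the two cluster nodes from step 1
INTERVAL = 10                    # seconds between relocations (step 5)


def relocate_command(service, node):
    """Build the clusvcadm command line that relocates a service to a node.

    Assumption: -m selects the target member; the original script may have
    let the cluster choose the member instead.
    """
    return ["clusvcadm", "-r", service, "-m", node]


def relocate_loop(service, nodes, interval, rounds=None,
                  run=subprocess.call, sleep=time.sleep):
    """Alternate the service between nodes, pausing `interval` seconds each time.

    rounds=None loops until interrupted. `run` and `sleep` are injectable so
    the loop can be exercised without a cluster.
    """
    targets = itertools.cycle(nodes)
    count = 0
    while rounds is None or count < rounds:
        run(relocate_command(service, next(targets)))
        sleep(interval)
        count += 1
    return count
```

Calling relocate_loop(SERVICE, NODES, INTERVAL) on a cluster node would keep bouncing the service until interrupted.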

How reproducible:
Seen only once so far.


NMI Watchdog detected LOCKUP, CPU=1, registers:
CPU 1
Modules linked in: nfsd(U) exportfs(U) lockd(U) gfs(U) lock_dlm(U)
lock_harness(U) dlm(U) cman(U) parport_pc(U) lp(U) parport(U) autofs4(U)
i2c_dev(U) i2c_core(U) md5(U) ipv6(U) sunrpc(U) ds(U) yenta_socket(U)
pcmcia_core(U) button(U) battery(U) ac(U) ohci_hcd(U) hw_random(U) tg3(U)
floppy(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) ext3(U) jbd(U) dm_mod(U)
qla2300(U) qla2xxx(U) scsi_transport_fc(U) sd_mod(U) scsi_mod(U)
Pid: 6875, comm: nfsd Tainted: G   M  2.6.9-11.kdbsmp
RIP: 0010:[<ffffffff8015b734>] <ffffffff8015b734>{cache_alloc_refill+352}
RSP: 0018:0000010037aa9b58  EFLAGS: 00000013
RAX: 000001001bca8000 RBX: 0000000000000025 RCX: 00000100331a5000
RDX: 000001003ff6f4c8 RSI: 00000000000000d0 RDI: 000001003ff6f528
RBP: 0000010037e3b000 R08: 0000000000000010 R09: 0000000000000000
R10: 0000010024041f00 R11: 0000000000000070 R12: 000001003ff6f4c8
R13: 000001003ff6f480 R14: 0000010024041b00 R15: 0000010037aa9c68
FS:  0000002a9589eb00(0000) GS:ffffffff804eb900(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000006c3530 CR3: 000000003ff38000 CR4: 00000000000006e0
Process nfsd (pid: 6875, threadinfo 0000010037aa8000, task 00000100326bb030)
Stack: 000000d03ff6f480 000001003ff6f480 00000000000000d0 000001000ec00580
       ffffff0000263000 0000010024041b00 0000010037aa9c68 ffffffff8015b503
       0000000000000202 0000000000000070
Call Trace:<ffffffff8015b503>{__kmalloc+123} <ffffffffa024fd8d>{:gfs:gmalloc+15}
       <ffffffffa02395f8>{:gfs:gfs_log_commit+286}
<ffffffffa024e85c>{:gfs:gfs_trans_end+195}
       <ffffffffa0243e07>{:gfs:gfs_create+243} <ffffffff80180399>{vfs_create+210}
       <ffffffffa028cbd6>{:nfsd:nfsd_create_v3+811}
<ffffffffa02927dc>{:nfsd:nfsd3_proc_create+307}
       <ffffffffa02876f9>{:nfsd:nfsd_dispatch+220}
<ffffffffa012a1cb>{:sunrpc:svc_process+1160}
       <ffffffffa0287245>{:nfsd:nfsd+0} <ffffffffa028747d>{:nfsd:nfsd+568}
       <ffffffff80110cab>{child_rip+8} <ffffffffa0287245>{:nfsd:nfsd+0}
       <ffffffffa0287245>{:nfsd:nfsd+0} <ffffffff80110ca3>{child_rip+0}

Comment 1 Ben Marzinski 2005-09-19 19:03:20 UTC
Um.  If we can reproduce this, I'll look at it. It may have gotten fixed with the
other change I made to this call path.