Description of problem:
This occurred while attempting to reproduce bz #166701 with Ben's fix. I had a very similar cluster setup.

Steps to Reproduce:
1. Configure a service IP address on a two-node cluster (link-02 & link-08).
2. Export a GFS filesystem over NFS.
3. Mount the GFS export from an NFS client (link-01) using the service IP address.
4. Generate read/write traffic over the NFS mount so that the CPU load is at least 50%.
5. Use a simple script to move the service IP address between the two nodes using clusvcadm -r every 10 seconds.

How reproducible:
Seen only once so far.

NMI Watchdog detected LOCKUP, CPU=1, registers:
CPU 1
Modules linked in: nfsd(U) exportfs(U) lockd(U) gfs(U) lock_dlm(U) lock_harness(U) dlm(U) cman(U) parport_pc(U) lp(U) parport(U) autofs4(U) i2c_dev(U) i2c_core(U) md5(U) ipv6(U) sunrpc(U) ds(U) yenta_socket(U) pcmcia_core(U) button(U) battery(U) ac(U) ohci_hcd(U) hw_random(U) tg3(U) floppy(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) ext3(U) jbd(U) dm_mod(U) qla2300(U) qla2xxx(U) scsi_transport_fc(U) sd_mod(U) scsi_mod(U)
Pid: 6875, comm: nfsd Tainted: G M 2.6.9-11.kdbsmp
RIP: 0010:[<ffffffff8015b734>] <ffffffff8015b734>{cache_alloc_refill+352}
RSP: 0018:0000010037aa9b58 EFLAGS: 00000013
RAX: 000001001bca8000 RBX: 0000000000000025 RCX: 00000100331a5000
RDX: 000001003ff6f4c8 RSI: 00000000000000d0 RDI: 000001003ff6f528
RBP: 0000010037e3b000 R08: 0000000000000010 R09: 0000000000000000
R10: 0000010024041f00 R11: 0000000000000070 R12: 000001003ff6f4c8
R13: 000001003ff6f480 R14: 0000010024041b00 R15: 0000010037aa9c68
FS: 0000002a9589eb00(0000) GS:ffffffff804eb900(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000006c3530 CR3: 000000003ff38000 CR4: 00000000000006e0
Process nfsd (pid: 6875, threadinfo 0000010037aa8000, task 00000100326bb030)
Stack: 000000d03ff6f480 000001003ff6f480 00000000000000d0 000001000ec00580
       ffffff0000263000 0000010024041b00 0000010037aa9c68 ffffffff8015b503
       0000000000000202 0000000000000070
Call Trace:
<ffffffff8015b503>{__kmalloc+123}
<ffffffffa024fd8d>{:gfs:gmalloc+15}
<ffffffffa02395f8>{:gfs:gfs_log_commit+286}
<ffffffffa024e85c>{:gfs:gfs_trans_end+195}
<ffffffffa0243e07>{:gfs:gfs_create+243}
<ffffffff80180399>{vfs_create+210}
<ffffffffa028cbd6>{:nfsd:nfsd_create_v3+811}
<ffffffffa02927dc>{:nfsd:nfsd3_proc_create+307}
<ffffffffa02876f9>{:nfsd:nfsd_dispatch+220}
<ffffffffa012a1cb>{:sunrpc:svc_process+1160}
<ffffffffa0287245>{:nfsd:nfsd+0}
<ffffffffa028747d>{:nfsd:nfsd+568}
<ffffffff80110cab>{child_rip+8}
<ffffffffa0287245>{:nfsd:nfsd+0}
<ffffffffa0287245>{:nfsd:nfsd+0}
<ffffffff80110ca3>{child_rip+0}

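For anyone trying to reproduce this, the relocation loop from step 5 might look roughly like the sketch below. The original script was not attached, so this is only a guess at its shape. The service name `nfs_service` is hypothetical, and the sketch assumes clusvcadm accepts `-m <member>` alongside `-r <service>` to name the target node; check the man page on your release before relying on it.

```python
import itertools
import subprocess
import time

NODES = ["link-02", "link-08"]  # cluster members named in the report
SERVICE = "nfs_service"         # hypothetical; use your configured service name
INTERVAL_SECONDS = 10           # relocation interval from step 5


def relocate_command(service, node):
    """Build the argv for one relocation.

    Assumption: '-m <member>' selects the target node for 'clusvcadm -r'.
    """
    return ["clusvcadm", "-r", service, "-m", node]


def relocation_plan(service, nodes, count):
    """Return `count` relocation commands, alternating between `nodes`."""
    targets = itertools.islice(itertools.cycle(nodes), count)
    return [relocate_command(service, node) for node in targets]


def bounce_service(service, nodes, count, interval=INTERVAL_SECONDS,
                   runner=subprocess.call, sleep=time.sleep):
    """Run each planned relocation, pausing `interval` seconds after each.

    `runner` and `sleep` are injectable so the loop can be dry-run
    on a machine without the cluster tools installed.
    """
    for cmd in relocation_plan(service, nodes, count):
        runner(cmd)
        sleep(interval)
```

Calling `bounce_service(SERVICE, NODES, 360)` on a cluster node would keep the service moving for about an hour, not counting the time each relocation takes. Passing `runner=print` and `interval=0` gives a dry run that only prints the commands.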
Um. If we can reproduce this, I'll look at it. It may have gotten fixed with the other change I made to this call path.