Bug 193817
Summary: | panic on mkdir on NFS on GFS | ||
---|---|---|---|
Product: | [Retired] Red Hat Cluster Suite | Reporter: | Nate Straz <nstraz> |
Component: | gfs | Assignee: | Wendy Cheng <nobody+wcheng> |
Status: | CLOSED ERRATA | QA Contact: | GFS Bugs <gfs-bugs> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 4 | ||
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | RHBA-2006-0561 | Doc Type: | Bug Fix |
Last Closed: | 2006-08-10 21:35:48 UTC | Type: | --- |
Bug Blocks: | 180185 |
Description
Nate Straz
2006-06-01 20:24:11 UTC
I just hit this same issue on a 4-node ia64 cluster. No service relocation was taking place, just IO to the NFS service.

1) Create service (GFS fs exported via NFS)
2) Mount on client node
3) Start IO load

Unable to handle kernel NULL pointer dereference (address 00000000000000f0)
nfsd[13682]: Oops 11012296146944 [1]
Modules linked in: nfsd exportfs lockd nfs_acl lock_dlm(U) gnbd(U) lock_nolock(U) gfs(U) lock_harness(U) dlm(U) cman(U) md5 ipv6 parport_pc lp parport autofs4 sunrpc ds yenta_socket pcmcia_core vfat fat button ohci_hcd ehci_hcd e100 mii tg3 dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod lpfc scsi_transport_fc mptscsih mptsas mptspi mptfc mptscsi mptbase sd_mod scsi_mod

Pid: 13682, CPU 0, comm: nfsd
psr : 0000121008126010 ifs : 800000000000038b ip : [<a00000020077eba1>] Not tainted
ip is at gfs_fsync+0x21/0x200 [gfs]
unat: 0000000000000000 pfs : 0000000000000309 rsc : 0000000000000003
rnat: e00000000b24fc00 bsps: e00000000b24fc00 pr : 0000000000009a81
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
csd : 0000000000000000 ssd : 0000000000000000
b0 : a0000002008a9470 b6 : a00000020077eb80 b7 : a000000100203420
f6 : 1003e6000000d0f655180 f7 : 1003ee00000003f153b80
f8 : 1003e0000000000000035 f9 : 10002f000000000000000
f10 : 0ffffaaaaaaaaa9574a00 f11 : 1003e0000000000000001
r1 : a000000200918000 r2 : e00000000b24fd98 r3 : a00000020077eb80
r8 : 0000000000000000 r9 : e000000006079988 r10 : 0000000000000000
r11 : e00000000b24fde0 r12 : e00000000b24fd50 r13 : e00000000b248000
r14 : a0000002007c6540 r15 : e000000006079af0 r16 : e000000039374950
r17 : 0000000000000000 r18 : e00000000b24fdd8 r19 : e00000000b24fde8
r20 : e00000000b24fdf0 r21 : e00000000b24fdf8 r22 : e00000000b24fde0
r23 : 0000000000000000 r24 : e00000000b24fdd0 r25 : 0000000000000001
r26 : e00000000b24fdc8 r27 : 0000000000000000 r28 : e00000000b24fdc0
r29 : 0000000000000000 r30 : e000000039374940 r31 : 0000000000000000

Call Trace:
 [<a000000100016da0>] show_stack+0x80/0xa0
                                sp=e00000000b24f8e0 bsp=e00000000b2491f0
 [<a0000001000176b0>] show_regs+0x890/0x8c0
                                sp=e00000000b24fab0 bsp=e00000000b2491a8
 [<a00000010003e8f0>] die+0x150/0x240
                                sp=e00000000b24fad0 bsp=e00000000b249168
 [<a000000100064440>] ia64_do_page_fault+0x8c0/0xbc0
                                sp=e00000000b24fad0 bsp=e00000000b249100
 [<a00000010000f600>] ia64_leave_kernel+0x0/0x260
                                sp=e00000000b24fb80 bsp=e00000000b249100
 [<a00000020077eba0>] gfs_fsync+0x20/0x200 [gfs]
                                sp=e00000000b24fd50 bsp=e00000000b2490a8
 [<a0000002008a9470>] nfsd_sync_dir+0xb0/0x100 [nfsd]
                                sp=e00000000b24fdf0 bsp=e00000000b249078
 [<a0000002008b0f00>] nfsd_create+0x700/0x940 [nfsd]
                                sp=e00000000b24fdf0 bsp=e00000000b248fe8
 [<a0000002008c4b40>] nfsd3_proc_mkdir+0x1a0/0x220 [nfsd]
                                sp=e00000000b24fdf0 bsp=e00000000b248f90
 [<a00000020089f820>] nfsd_dispatch+0x340/0x600 [nfsd]
                                sp=e00000000b24fdf0 bsp=e00000000b248f38
 [<a0000002004654d0>] svc_process+0x1630/0x1880 [sunrpc]
                                sp=e00000000b24fdf0 bsp=e00000000b248ec0
 [<a00000020089efb0>] nfsd+0x490/0x9c0 [nfsd]
                                sp=e00000000b24fe00 bsp=e00000000b248e38
 [<a000000100018c70>] kernel_thread_helper+0x30/0x60
                                sp=e00000000b24fe30 bsp=e00000000b248e10
 [<a000000100008c60>] start_kernel_thread+0x20/0x40
                                sp=e00000000b24fe30 bsp=e00000000b248e10
Kernel panic - not syncing: Fatal exception

After further investigation, it appears that the panic is caused by a mkdir issued by our tools on an NFS client when trying to start the NFS load. Relocation had not been attempted when the panic happened.

This is what happens. RHEL4 defaults all exports to the "sync" option, so nfsd_create() goes to:

    if (EX_ISSYNC(fhp->fh_export)) {
        nfsd_sync_dir(dentry);
        write_inode_now(dchild->d_inode, 1);
    }

And nfsd_sync_dir() calls nfsd_dosync() with filp set to NULL:

    void nfsd_sync_dir(struct dentry *dp)
    {
        nfsd_dosync(NULL, dp, dp->d_inode->i_fop);
    }

so nfsd_dosync() passes gfs_fsync() a NULL filp:

    inline void nfsd_dosync(struct file *filp, struct dentry *dp,
                            struct file_operations *fop)
    {
        struct inode *inode = dp->d_inode;
        int (*fsync) (struct file *, struct dentry *, int);

        filemap_fdatawrite(inode->i_mapping);
        if (fop && (fsync = fop->fsync))
            fsync(filp, dp, 0);
        filemap_fdatawait(inode->i_mapping);
    }

And I used filp to get the mapping pointer in gfs_fsync(), which dereferences the NULL pointer. OK, fix is on the way.

Code checked into CVS. Please re-try.

This bug isn't interfering with our testing anymore.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0561.html
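
For readers following the analysis in this report, here is a minimal userspace sketch of the failure mode and one way an fsync handler can tolerate a NULL filp. It is not the change that was checked into CVS, which this report does not show. The struct definitions are simplified stand-ins for the kernel types, and demo_fsync(), demo_sync_dir() and demo_run() are hypothetical names invented for illustration. The idea assumed here is that the handler reaches the mapping through the dentry's inode, which nfsd always supplies, and not through filp.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumption: only the
 * fields needed for this illustration are modelled). */
struct address_space { int nr_dirty; };
struct inode { struct address_space *i_mapping; };
struct dentry { struct inode *d_inode; };
struct file { struct address_space *f_mapping; };

/* Hypothetical NULL-safe fsync handler: it never dereferences 'file',
 * because callers such as nfsd_sync_dir() may pass NULL for it. */
int demo_fsync(struct file *file, struct dentry *dentry, int datasync)
{
    (void)file;      /* may legitimately be NULL */
    (void)datasync;

    if (dentry == NULL || dentry->d_inode == NULL)
        return -1;

    /* Take the mapping from the inode, not from file->f_mapping. */
    struct address_space *mapping = dentry->d_inode->i_mapping;
    if (mapping == NULL)
        return -1;

    mapping->nr_dirty = 0;   /* pretend the dirty pages were flushed */
    return 0;
}

/* Mirrors the shape of nfsd_sync_dir(): no struct file is available
 * for a directory sync, so NULL is passed down. */
int demo_sync_dir(struct dentry *dp)
{
    return demo_fsync(NULL, dp, 0);
}

/* Small self-check: returns 0 if a directory sync with a NULL file
 * pointer succeeds and clears the dirty count. */
int demo_run(void)
{
    struct address_space mapping = { 3 };
    struct inode dir_inode = { &mapping };
    struct dentry dir = { &dir_inode };

    if (demo_sync_dir(&dir) != 0)
        return 1;
    return mapping.nr_dirty == 0 ? 0 : 2;
}
```

Calling demo_run() exercises the same call shape as the trace above (sync_dir calling fsync with a NULL file pointer) and completes without touching the NULL pointer.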