Hi,

Description of problem:
Execution of a particular program from the Arachne suite reliably causes a kernel panic due to a NULL-pointer dereference in nfs4_proc_lock().

Version-Release number of selected component (if applicable):
2.6.18-164.2.1.el5

How reproducible:
Always, on NFSv4-mounted directories.

Steps to Reproduce:
1. wget http://www.genoscope.cns.fr/externe/redhat/XMLMissingField
2. Save a copy on an NFSv4-mounted directory
3. Execute it

Actual results:
Kernel panic

Expected results:
No panic

Additional info:
Console output:

Unable to handle kernel NULL pointer dereference at 0000000000000030
RIP: [<ffffffff8837b210>] :nfs:nfs4_proc_lock+0x21f/0x3ad
PGD 1026eec067 PUD 1026f2f067 PMD 0
Oops: 0000 [1] SMP
last sysfs file: /block/dm-1/range
CPU 0
Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler nfs lockd fscache nfs_acl sunrpc bonding ipv6 xfrm_nalgo crypto_api video hwmon backlight sbs i2c_ec button battery asus_acpi acpi_memhotplug ac joydev sg shpchp i2c_nforce2 i2c_core forcedeth dm_snapshot dm_zero dm_mod sata_nv libata mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 4070, comm: XMLMissingField Not tainted 2.6.18-164.2.1.el5 #1
RIP: 0010:[<ffffffff8837b210>]  [<ffffffff8837b210>] :nfs:nfs4_proc_lock+0x21f/0x3ad
RSP: 0018:ffff810819bdbdd8  EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff810827c52088 RSI: 0000000000000006 RDI: ffff810819bdbe38
RBP: ffff81081a6dfdc0 R08: 0000000000000001 R09: ffff810819bdbd68
R10: ffff810819bdbd68 R11: 00000000000000d0 R12: ffff810827c52088
R13: 0000000000000000 R14: ffff810819a9b930 R15: 0000000000000006
FS:  00002b97d31fc7b0(0000) GS:ffffffff803c1000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000030 CR3: 00000010268cb000 CR4: 00000000000006e0
Process XMLMissingField (pid: 4070, threadinfo ffff810819bda000, task ffff810827d6a7e0)
Stack:  00000000000003e8 0000000000000000
 ffff810819a9b930 ffffffff88373e4f 0000000000000000 0000000000000000
 0000000000000000 0000000019a9ba40 ffff810819bdbe18 ffff810819bdbe18
 0000000000000000 0000000000000000
Call Trace:
 [<ffffffff88373e4f>] :nfs:nfs_sync_inode_wait+0x116/0x1db
 [<ffffffff8836a226>] :nfs:do_setlk+0x55/0x8c
 [<ffffffff80039e72>] fcntl_setlk+0x11e/0x273
 [<ffffffff800b66fa>] audit_syscall_entry+0x180/0x1b3
 [<ffffffff8002e5bb>] sys_fcntl+0x269/0x2dc
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Code: 49 8b 45 30 4c 89 e6 4c 89 ef 45 8a 74 24 58 48 8b 40 18 48
RIP  [<ffffffff8837b210>] :nfs:nfs4_proc_lock+0x21f/0x3ad
 RSP <ffff810819bdbdd8>
CR2: 0000000000000030
<0>Kernel panic - not syncing: Fatal exception

Also see http://www.spinics.net/linux/lists/linux-nfs/msg03357.html
Here's what this binary does:

open("/proc/self/exe", O_RDONLY) = 3
fcntl(3, F_SETLK, {type=F_RDLCK, whence=SEEK_SET, start=0, len=0}

I'll see if I can roll up a more stripped-down reproducer (with source code).
Here's the assembly from the crash:

   169ec: e8 37 f0 ff ff          callq  15a28 <nfs4_wait_for_completion_rpc_task>
   169f1: 48 89 df                mov    %rbx,%rdi
   169f4: 89 c5                   mov    %eax,%ebp
   169f6: e8 00 00 00 00          callq  169fb <nfs4_proc_lock+0x217>
   169fb: 45 88 b4 24 88 00 00    mov    %r14b,0x88(%r12)
   16a02: 00
   16a03: e9 ab 01 00 00          jmpq   16bb3 <nfs4_proc_lock+0x3cf>
   16a08: bd 00 fe ff ff          mov    $0xfffffe00,%ebp
   16a0d: e9 a1 01 00 00          jmpq   16bb3 <nfs4_proc_lock+0x3cf>
   16a12: fc                      cld
   16a13: 48 8d 7c 24 50          lea    0x50(%rsp),%rdi
   16a18: 31 c0                   xor    %eax,%eax
   16a1a: ab                      stos   %eax,%es:(%rdi)
   16a1b: ab                      stos   %eax,%es:(%rdi)
   16a1c: ab                      stos   %eax,%es:(%rdi)
   16a1d: ab                      stos   %eax,%es:(%rdi)
   16a1e: 49 8b 45 30             mov    0x30(%r13),%rax      <<<< CRASH HERE
   16a22: 4c 89 e6                mov    %r12,%rsi
   16a25: 4c 89 ef                mov    %r13,%rdi
   16a28: 45 8a b4 24 88 00 00    mov    0x88(%r12),%r14b
   16a2f: 00
   16a30: 48 8b 40 48             mov    0x48(%rax),%rax
   16a34: 48 89 44 24 08          mov    %rax,0x8(%rsp)
   16a39: e8 00 00 00 00          callq  16a3e <nfs4_proc_lock+0x25a>
   16a3e: 85 c0                   test   %eax,%eax
   16a40: 89 c3                   mov    %eax,%ebx
   16a42: 0f 85 e4 00 00 00       jne    16b2c <nfs4_proc_lock+0x348>
   16a48: 41 80 8c 24 88 00 00    orb    $0x8,0x88(%r12)
   16a4f: 00 08
   16a51: 49 8b bc 24 80 00 00    mov    0x80(%r12),%rdi
   16a58: 00
   16a59: 4c 89 e6                mov    %r12,%rsi
   16a5c: e8 c9 e8 ff ff          callq  1532a <do_vfs_lock>
Created attachment 367124 [details]
reproducer program

This program can reproduce the problem easily... I was hoping that the patch for bug 511278 would fix this too, but that doesn't seem to be the case. Still working on determining the cause; there are a lot of inlined functions in this area, so it's hard to tell exactly where it crashed.
The crash is actually occurring in nfs4_proc_setlk(). It does this:

	do {
		err = nfs4_handle_exception(NFS_SERVER(state->inode),
				_nfs4_proc_setlk(state, cmd, request),
				&exception);
	} while (exception.retry);

...but the problem here is that "state" is NULL, and dereferencing the inode pointer through it causes the crash. So the real problem appears to be in the open code, which is where state should be getting set.
A little bit of printk debugging:

------------------[snip]--------------------
nfs4_intent_set_file: filp=ffff810029c75a70
Call Trace:
 [<ffffffff88578c54>] :nfs:nfs4_intent_set_file+0x3e/0x6f
 [<ffffffff88579e9e>] :nfs:nfs4_open_revalidate+0x8c/0x43f
 [<ffffffff88564a2b>] :nfs:nfs_open_revalidate+0x13d/0x21d
 [<ffffffff8000d8ac>] do_lookup+0x191/0x1e8
 [<ffffffff8000a5bc>] __link_path_walk+0xa1d/0xf62
 [<ffffffff8000f370>] link_path_walk+0x42/0xb2
 [<ffffffff8000d59a>] do_path_lookup+0x279/0x2f6
 [<ffffffff80024a13>] __path_lookup_intent_open+0x56/0x97
 [<ffffffff8003e3ad>] open_exec+0x24/0xc0
 [<ffffffff80041036>] do_execve+0x32/0x1f7
 [<ffffffff80041036>] do_execve+0x32/0x1f7
 [<ffffffff8000cece>] cache_alloc_debugcheck_after+0xef/0x1c1
 [<ffffffff8004104a>] do_execve+0x46/0x1f7
 [<ffffffff8005796a>] sys_execve+0x36/0x4c
 [<ffffffff8006050b>] stub_execve+0x67/0xb0
nfs4_proc_lock: filp=ffff81002a302c20 ctx=ffff810027702b90
nfs4_proc_setlk: state=0000000000000000 file_lock=ffff810038c88838
nfs4_proc_setlk: calling nfs4_handle_exception
Unable to handle kernel NULL pointer dereference at 0000000000000030
RIP: [<ffffffff88576ac8>] :nfs:nfs4_proc_lock+0x286/0x459
PGD 27abe067 PUD 27abd067 PMD 0
------------------[snip]--------------------

One major problem here: the filp passed to nfs4_proc_lock() is different from the one that was set up in the execve call. Possibly a VFS bug where we're not attaching the filp instantiated in the lookup to the fd?
Of course though...that *should* be the case. We're not interested in the filp associated with the execve call, but rather the one associated with the open(). Very confusing reproducer...I'll keep poking at it.
Turns out that the nfs_debug facility in RHEL5 is broken (and has been since GA); some messages don't get printed out. Here's some more output with that fixed:

---------------------[snip]--------------------
NFS: nfs_fhget(0:18/43651 ct=1)
nfs_file_open: i_ino=43651 filp=ffff810029db5380
nfs4_intent_set_file: filp=ffff810029db5380
Call Trace:
 [<ffffffff885790a0>] :nfs:nfs4_intent_set_file+0x3e/0x6f
 [<ffffffff8857ac44>] :nfs:nfs4_atomic_open+0x184/0x1a4
 [<ffffffff800239af>] d_alloc+0x24/0x1d3
 [<ffffffff8856489b>] :nfs:nfs_atomic_lookup+0xb2/0x105
 [<ffffffff8000d802>] do_lookup+0xe7/0x1e8
 [<ffffffff8000a5bc>] __link_path_walk+0xa1d/0xf62
 [<ffffffff8000f370>] link_path_walk+0x42/0xb2
 [<ffffffff8000d59a>] do_path_lookup+0x279/0x2f6
 [<ffffffff80024a13>] __path_lookup_intent_open+0x56/0x97
 [<ffffffff8003e3ad>] open_exec+0x24/0xc0
 [<ffffffff80041036>] do_execve+0x32/0x1f7
 [<ffffffff80041036>] do_execve+0x32/0x1f7
 [<ffffffff8000cece>] cache_alloc_debugcheck_after+0xef/0x1c1
 [<ffffffff8004104a>] do_execve+0x46/0x1f7
 [<ffffffff8005796a>] sys_execve+0x36/0x4c
 [<ffffffff8006050b>] stub_execve+0x67/0xb0
encode_compound: tag=
NFS: permission(0:18/43651), mask=0x1, res=0
nfs: read(/bz529227, 128@0)
NFS: nfs_readpages (0:18/43651 2)
NFS: 0 initiated read call (req 0:18/43651, 7578 bytes @ offset 0)
encode_compound: tag=
nfs_readpage_result: 111, (status 7578)
NFS: read done (0:18/43651 4096@0)
NFS: read done (0:18/43651 3482@4096)
nfs: read(/bz529227, 448@64)
nfs: read(/bz529227, 28@512)
NFS: permission(0:18/43651), mask=0x4, res=0
nfs: mmap(/bz529227)
nfs: mmap(/bz529227)
nfs: flush(0:18/43651)
^^^^ exec call ends...
NFS: permission(0:18/43651), mask=0x4, res=0
nfs_file_open: i_ino=43651 filp=ffff81001ef10d50
^^^^ open call -- no lookup
NFS: nfs_lock(f=0:18/43651, t=0, fl=1, r=0:9223372036854775807)
nfs4_proc_lock: filp=ffff81001ef10d50 ctx=ffff81001ecaf560
nfs4_proc_setlk: state=0000000000000000 file_lock=ffff810038d58268
nfs4_proc_setlk: calling nfs4_handle_exception
Unable to handle kernel NULL pointer dereference at 0000000000000030
RIP:
^^^^ oops because context state is zeroed out
--------------------[snip]-----------------------

I think we need to ensure that the state is set on the open context when a lookup doesn't occur, or somehow force a lookup on the second open here. I'll need to check more recent kernels and see how this works there.
The really odd thing here is that in this situation, the second open doesn't seem to cause d_revalidate to get called. Obviously we need to do a lookup there, and the dentry is probably in the dcache at that point, so the VFS should call d_revalidate. But I never see nfs_open_revalidate get called, even though it's the d_revalidate operation for nfs4. I need to confirm whether I'm doing something wrong, but I don't think that I am.
Ok, I've identified a patch that prevents this oops in rawhide. The downside is that the SETLK call then returns -ENOLCK instead of acquiring a read lock, so I still think there's something not quite right here. What we'll probably need to do is take that patch in the near term, and then look at fixing this the "right way" upstream.
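From userspace, applications that can live without the lock may want to treat that -ENOLCK as a soft failure. A hedged sketch of that pattern (my own code, not part of the patch):

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>

/* Try to take a whole-file advisory read lock, treating ENOLCK as a
 * soft failure: with the interim kernel patch described above, F_SETLK
 * on NFSv4 can return -ENOLCK instead of granting the lock. */
int try_read_lock(int fd)
{
    struct flock fl = {
        .l_type   = F_RDLCK,
        .l_whence = SEEK_SET,
        .l_start  = 0,
        .l_len    = 0,
    };
    if (fcntl(fd, F_SETLK, &fl) == 0)
        return 0;                 /* lock acquired */
    if (errno == ENOLCK) {
        /* No lock manager available (e.g. the patched NFSv4 path):
         * carry on without the lock rather than failing hard. */
        fprintf(stderr, "warning: no lock available, continuing\n");
        return 1;
    }
    return -1;                    /* real error (EBADF, EAGAIN, ...) */
}
```

On a local filesystem the lock simply succeeds; the ENOLCK branch only fires in the degraded NFSv4 case.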
I have some kernels on my people.redhat.com page that have a patch for this issue. If you have a non-critical place to do so, could you test them and report whether they fix the issue for you? The kernels are here: http://people.redhat.com/jlayton
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Hi, thanks for looking into this. I have installed your test kernel (2.6.18-172.el5.jtltest.92) and can confirm that the panic no longer happens. From what I've seen, the lock is acquired principally in order to prevent make from overwriting the executable, which in our case is unlikely -- so returning ENOLCK would be OK from our standpoint. The general problem remains, however: have you been able to contact upstream about this?

For information, below is the corresponding source leading to the lock request:

// Place an advisory lock on the current executable file. This will
// go away automatically when the process ends for any reason, as
// will the file descriptor that's needed to make the lock. We use
// the lock to prevent make from overwriting the current executable,
// as that may have disastrous effects on the running process in
// certain circumstances.
void readLockThisExe()
{
    // Note: While this works on LSF and wga*, it does not work on the
    // alphas. There is no such file so the open() will fail and no
    // lock will be placed.
    const char filename[] = "/proc/self/exe";
    int ret;
    int fid = open(filename, O_RDONLY, 0);
    if (fid > 0) {
        ret = lock_reg(fid, F_SETLK, F_RDLCK, 0, SEEK_SET, 0);
        // Note we ignore errors in setting the lock.
    }
}

Simon
Oops -- here is lock_reg, as called in readLockThisExe():

// Helper for setting advisory locks, from Stevens.
int lock_reg(int fd, int cmd, int type, off_t offset, int whence, off_t len)
{
    struct flock lock;

    lock.l_type = type;
    lock.l_start = offset;
    lock.l_whence = whence;
    lock.l_len = len;
    return fcntl(fd, cmd, &lock);
}

Simon
No, I haven't contacted upstream on this yet. I want to do a bit more research and see if I can determine the cause (and maybe generate a patch). In the interim, though, we'll see about getting the fix for the oops into an errata soon.
Upstream commit: http://git.kernel.org/linus/d953126a28f97ec965d23c69fd5795854c048f30
(In reply to comment #3)
> Created an attachment (id=367124) [details]
> reproducer program
>
> This program can reproduce the problem easily...
>
> I was hoping that the patch for 511278 would fix this too, but that doesn't
> seem to be the case. Still working on determining the cause, there are a lot of
> inlined functions in this area so it's hard to tell exactly where it crashed.

To clarify how to use this: build the program, copy it to any NFSv4 mount, and run it.
For guidance on mitigating NULL-pointer-dereference vulnerabilities, see http://kbase.redhat.com/faq/docs/DOC-20536
This issue has been addressed in the following products:

  MRG for RHEL-5

Via RHSA-2009:1635 https://rhn.redhat.com/errata/RHSA-2009-1635.html
We have had a report of a different manifestation of the same bug, this time while unlocking a file with a NULL state.

============================================================================
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000024
 printing eip:
c060f3f1
*pde = 15ea2001
Oops: 0002 [#1]
SMP
last sysfs file: /class/net/eth7/address
Modules linked in: linuxshield(U) lshook(U) mptctl autofs4 hidp nfs lockd fscache nfs_acl rfcomm l2cap bluetooth ipmi(U) smbus(U) sunrpc bonding loop dm_multipath scsi_dh video sr_mod cdrom hwmon backlight sbs i2c_ec button battery asus_acpi ac parport_pc lp parport usb_storage sg i2c_i801 i2c_core tg3 pcspkr libphy bnx2 serio_raw dm_snapshot dm_zero dm_mirror dm_log dm_mod megaraid_sas mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
CPU: 0
EIP: 0060:[<c060f3f1>] Tainted: G VLI
EFLAGS: 00010246 (2.6.18-128.2.1.el5PAE #1)
EIP is at _spin_lock+0x0/0xf
eax: 00000024 ebx: f91e3fa0 ecx: d7777f30 edx: d7777f30
esi: 00000000 edi: d7777f30 ebp: 00000000 esp: d7777e90
ds: 007b es: 007b ss: 0068
Process scanner (pid: 7956, ti=d7777000 task=d781a550 task.ti=d7777000)
Stack: f91dfee1 d7777f30 d75880c0 f91e3fa0 d7777f30 d7777f30 00000006 f91d6849
       00000000 000003e8 f28c4d40 f28c4e88 00000000 f91d0206 00000000 00000000
       00000000 00000000 00000000 d7777edc d7777edc f28c4f30 00000000 d7777f30
Call Trace:
 [<f91dfee1>] nfs4_set_lock_state+0x4a/0xe9 [nfs]
 [<f91d6849>] nfs4_proc_lock+0x16d/0x374 [nfs]
 [<f91d0206>] nfs_sync_inode_wait+0xf5/0x1ab [nfs]
 [<f91c75d7>] do_unlk+0x3f/0x58 [nfs]
 [<f91c76f4>] nfs_lock+0x0/0xfb [nfs]
 [<c0483d7b>] vfs_lock_file+0x2d/0x31
 [<c048489e>] locks_remove_posix+0x7d/0x97
 [<c0436ef1>] hrtimer_try_to_cancel+0x3c/0x42
 [<c0427a3c>] do_setitimer+0xe0/0x4b0
 [<c046fb8e>] filp_close+0x47/0x54
 [<c0470dc1>] sys_close+0x71/0xa0
 [<c0404f17>] syscall_call+0x7/0xb
Code: 79 05 f0 ff 02 30 c9 89 c8 c3 89 c2 f0 81 28 00 00 00 01 0f 94 c0 84 c0 b9 01 00 00 00 75 09 f0 81 02 00 00 00 01 30 c9 89 c8 c3 <f0> fe 08 79 09 f3 90 80 38 00 7e f9 eb f2 c3 f0 81 28 00 00 00
EIP: [<c060f3f1>] _spin_lock+0x0/0xf SS:ESP 0068:d7777e90
=============================================================================

The crash happens at:

static struct nfs4_lock_state *nfs4_get_lock_state(struct nfs4_state *state, fl_owner_t owner)
{
	struct nfs4_lock_state *lsp, *new = NULL;

	for(;;) {
		spin_lock(&state->state_lock);   <-- here
		lsp = __nfs4_find_lock_state(state, owner);
	}

The problem is caused by nfs4_proc_lock() passing a NULL state to nfs4_proc_unlck():

static int nfs4_proc_lock(struct file *filp, int cmd, struct file_lock *request)
{
	..
	if (request->fl_type == F_UNLCK)
		return nfs4_proc_unlck(state, cmd, request);
	..
}
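To illustrate the shape of the guard both crash paths need, here is a stubbed-down model of the control flow. This is my own sketch using stand-in types, not the actual upstream diff (see the upstream commit linked earlier for the real fix); it only shows why a single NULL check before the F_UNLCK branch covers both the lock and unlock oopses.

```c
#include <errno.h>
#include <stddef.h>

/* Stand-ins for the kernel structures involved; this only models the
 * control flow of nfs4_proc_lock(), not the real fs/nfs/nfs4proc.c. */
struct nfs4_state { int unused; };
struct file_lock  { int fl_type; };

#ifndef F_UNLCK
#define F_UNLCK 2
#endif

/* Sketch of the guard: with a NULL open-context state, bail out with
 * -ENOLCK before either the unlck or the setlk path can dereference
 * it (previously both paths oopsed on state->...). */
static int nfs4_proc_lock_sketch(struct nfs4_state *state,
                                 struct file_lock *request)
{
    if (state == NULL)
        return -ENOLCK;
    if (request->fl_type == F_UNLCK)
        return 0;   /* would call nfs4_proc_unlck(state, cmd, request) */
    return 0;       /* would call nfs4_proc_setlk(state, cmd, request) */
}
```

The cost, as noted in an earlier comment, is that a lock request that used to succeed (or crash) now fails cleanly with -ENOLCK.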
This issue has been addressed in the following products:

  Red Hat Enterprise Linux 5

Via RHSA-2009:1670 https://rhn.redhat.com/errata/RHSA-2009-1670.html
Acknowledgements: Red Hat would like to thank Simon Vallet for responsibly reporting this issue.
This issue has been addressed in the following products:

  Red Hat Enterprise Linux 4

Via RHSA-2010:0474 https://rhn.redhat.com/errata/RHSA-2010-0474.html
Statement: The Linux kernel as shipped with Red Hat Enterprise Linux 3 did not include NFSv4 support and is therefore not affected by this issue. It was addressed in Red Hat Enterprise Linux 4, Red Hat Enterprise Linux 5, and Red Hat Enterprise MRG via https://rhn.redhat.com/errata/RHSA-2010-0474.html, https://rhn.redhat.com/errata/RHSA-2009-1670.html and https://rhn.redhat.com/errata/RHSA-2009-1635.html respectively.