Bug 848867
| Summary: | 3.5.1-1.fc17.x86_64 kernel panic | ||
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | H.J. Lu <hongjiu.lu> |
| Component: | kernel | Assignee: | Jeff Layton <jlayton> |
| Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 17 | CC: | bfields, gansalmon, htl10, itamar, jaroslav.sykora, jlayton, jonathan, kernel-maint, madhu.chinakonda, steved |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2012-12-05 15:18:21 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
H.J. Lu
2012-08-16 15:51:58 UTC
(gdb) list *(svc_destroy+0x1a)
0xe1aa is in svc_destroy (net/sunrpc/svc.c:540).
535 * protect the sv_nrthreads, sv_permsocks and sv_tempsocks.
536 */
537 void
538 svc_destroy(struct svc_serv *serv)
539 {
540 dprintk("svc: svc_destroy(%s, %d)\n",
541 serv->sv_program->pg_name,
542 serv->sv_nrthreads);
543
544 if (serv->sv_nrthreads) {
...which was called from:
(gdb) list *(write_ports+0x2cf)
0x1b9f is in write_ports (fs/nfsd/nfsctl.c:679).
674 err = svc_addsock(nfsd_serv, fd, buf, SIMPLE_TRANSACTION_LIMIT);
675 if (err < 0) {
676 if (nfsd_serv->sv_nrthreads == 1)
677 svc_shutdown_net(nfsd_serv, net);
678 svc_destroy(nfsd_serv);
679 return err;
680 }
681
682 /* Decrease the count, but don't shut down the service */
683 nfsd_serv->sv_nrthreads--;
(gdb) disassemble svc_destroy
Dump of assembler code for function svc_destroy:
0x000000000000e190 <+0>: push %rbp
0x000000000000e191 <+1>: mov %rsp,%rbp
0x000000000000e194 <+4>: push %rbx
0x000000000000e195 <+5>: sub $0x8,%rsp
0x000000000000e199 <+9>: callq 0xe19e <svc_destroy+14>
0x000000000000e19e <+14>: testb $0x2,0x0(%rip) # 0xe1a5 <svc_destroy+21>
0x000000000000e1a5 <+21>: mov %rdi,%rbx
0x000000000000e1a8 <+24>: jne 0xe2a5 <svc_destroy+277>
0x000000000000e1ae <+30>: mov 0x14(%rbx),%eax
...so I think what happened was that nfsd_serv got set to NULL just before svc_destroy got called on it. IOW, a race of some sort (ugh).
I've got the same: Fedora 17, 3.5.3-1.fc17.x86_64 Sep 19 07:48:14 nibbler kernel: [ 36.710360] nfsd: last server has exited, flushing export cache Sep 19 07:48:14 nibbler kernel: [ 36.710394] BUG: unable to handle kernel NULL pointer dereference at 0000000000000014 Sep 19 07:48:14 nibbler kernel: [ 36.710418] IP: [<ffffffffa028c1ae>] svc_destroy+0x1e/0x140 [sunrpc] Sep 19 07:48:14 nibbler kernel: [ 36.710447] PGD 0 Sep 19 07:48:14 nibbler kernel: [ 36.710453] Oops: 0000 [#1] SMP Sep 19 07:48:14 nibbler kernel: [ 36.710461] CPU 2 Sep 19 07:48:14 nibbler kernel: [ 36.710465] Modules linked in: ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat xt_CHECKSUM iptable_mangle vboxpci(O) bridge stp llc vboxnetadp(O) bnep bluetooth rfkill vboxnetflt(O) vboxdrv(O) be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi max6650 f71882fg ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_physdev nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack snd_hda_codec_realtek nfsd snd_hda_intel nfs_acl snd_hda_codec auth_rpcgss snd_hwdep snd_pcm snd_page_alloc snd_timer snd r8169 lpc_ich mfd_core e1000e soundcore mii i2c_i801 coretemp serio_raw microcode lockd vhost_net tun macvtap macvlan sunrpc kvm_intel kvm binfmt_misc uinput firewire_ohci firewire_core crc_itu_t nouveau mxm_wmi wmi video i2c_algo_bit drm_kms_helper ttm drm i2c_core [last unloaded: scsi_wait_scan] Sep 19 07:48:14 nibbler kernel: [ 36.710686] Sep 19 07:48:14 nibbler kernel: [ 36.710689] Pid: 939, comm: rpc.nfsd Tainted: G C O 3.5.3-1.fc17.x86_64 #1 Hewlett-Packard HP Compaq dx7500 Microtower/2A84h Sep 19 07:48:14 nibbler kernel: [ 36.710705] RIP: 0010:[<ffffffffa028c1ae>] [<ffffffffa028c1ae>] svc_destroy+0x1e/0x140 [sunrpc] Sep 19 07:48:14 nibbler kernel: [ 36.710726] RSP: 0018:ffff88020547de28 EFLAGS: 00010246 Sep 19 07:48:14 nibbler kernel: [ 36.710735] RAX: 00000000fffffff3 RBX: 0000000000000000 RCX: 0000000000000100 Sep 19 07:48:14 nibbler kernel: [ 36.710743] RDX: 0000000000000100 RSI: 0000000050595c7a RDI: 0000000000000000 Sep 19 07:48:14 nibbler kernel: [ 36.710751] RBP: ffff88020547de38 R08: ffff8802125c7000 R09: 0000000000000000 Sep 19 07:48:14 nibbler kernel: [ 36.710758] R10: 0000000000000000 R11: 0000000000040000 R12: 0000000000000002 Sep 19 07:48:14 nibbler kernel: [ 36.710766] R13: 0000000000000004 R14: ffff88020547df50 R15: 00007fc6681cd2ae Sep 19 07:48:14 nibbler kernel: [ 36.710774] FS: 00007fc668186740(0000) GS:ffff88021fd00000(0000) knlGS:0000000000000000 Sep 19 07:48:14 nibbler kernel: [ 36.710782] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b Sep 19 07:48:14 nibbler kernel: [ 36.710789] CR2: 0000000000000014 CR3: 000000020e41a000 CR4: 00000000000407e0 Sep 19 07:48:14 nibbler kernel: [ 36.710797] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Sep 19 07:48:14 nibbler kernel: [ 36.710804] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Sep 19 07:48:14 nibbler kernel: [ 36.710811] Process rpc.nfsd (pid: 939, threadinfo ffff88020547c000, task ffff8802055d2e20) Sep 19 07:48:14 nibbler kernel: [ 36.710819] Stack: Sep 19 07:48:14 nibbler kernel: [ 36.710823] ffff880210d62008 ffff880210d62008 ffff88020547dec8 ffffffffa0398b9f Sep 19 07:48:14 nibbler kernel: [ 36.710836] ffff88020547df50 00007fc6fffffff3 ffff880210d62009 ffff88020547de6f Sep 19 07:48:14 nibbler kernel: [ 36.710850] 0034fffffffffff4 0000000000000002 ffff88020547de88 ffffffff81129a36 Sep 19 07:48:14 nibbler kernel: [ 36.710863] Call Trace: Sep 19 07:48:14 nibbler kernel: [ 36.710878] [<ffffffffa0398b9f>] write_ports+0x2cf/0x3e0 [nfsd] Sep 19 07:48:14 nibbler kernel: [ 36.710890] [<ffffffff81129a36>] ? get_zeroed_page+0x16/0x20 Sep 19 07:48:14 nibbler kernel: [ 36.710899] [<ffffffff811aae95>] ? simple_transaction_get+0xc5/0xe0 Sep 19 07:48:14 nibbler kernel: [ 36.710911] [<ffffffffa03988d0>] ? write_gracetime+0x60/0x60 [nfsd] Sep 19 07:48:14 nibbler kernel: [ 36.710923] [<ffffffffa0398047>] nfsctl_transaction_write+0x57/0x90 [nfsd] Sep 19 07:48:14 nibbler kernel: [ 36.710934] [<ffffffff811881bc>] vfs_write+0xac/0x180 Sep 19 07:48:14 nibbler kernel: [ 36.710941] [<ffffffff811884ea>] sys_write+0x4a/0x90 Sep 19 07:48:14 nibbler kernel: [ 36.710953] [<ffffffff81614ae9>] system_call_fastpath+0x16/0x1b Sep 19 07:48:14 nibbler kernel: [ 36.710959] Code: 68 37 e1 eb b9 e8 23 c1 dc e0 0f 1f 00 55 48 89 e5 53 48 83 ec 08 66 66 66 66 90 f6 05 68 1b 02 00 02 48 89 fb 0f 85 f7 00 00 00 <8b> 43 14 85 c0 0f 84 d6 00 00 00 83 e8 01 85 c0 89 43 14 0f 85 Sep 19 07:48:14 nibbler kernel: [ 36.711066] RIP [<ffffffffa028c1ae>] svc_destroy+0x1e/0x140 [sunrpc] Sep 19 07:48:14 nibbler kernel: [ 36.711084] RSP <ffff88020547de28> Sep 19 07:48:14 nibbler kernel: [ 36.711089] CR2: 0000000000000014 Sep 19 07:48:14 nibbler kernel: [ 36.734295] ---[ end trace db1e88ecfb895970 ]--- Sep 19 07:48:14 nibbler systemd[1]: nfs-server.service: main process exited, code=killed, status=9 Sep 19 07:48:14 nibbler systemd[1]: Job nfs-idmap.service/start failed with result 'dependency'. Sep 19 07:48:14 nibbler systemd[1]: Job nfs-rquotad.service/start failed with result 'dependency'. Sep 19 07:48:14 nibbler systemd[1]: Job nfs-mountd.service/start failed with result 'dependency'. Sep 19 07:48:14 nibbler systemd[1]: Unit nfs-server.service entered failed state. related: bug 858489 *** Bug 858489 has been marked as a duplicate of this bug. *** Ok, I think I see the bug here. It was a regression created by 074d0f67cfe. In nfsd_create_serv, we create the struct svc_serv by calling svc_create_pooled(). That function is passed an argument that sets the sv_shutdown parameter to nfsd_last_thread(). That function is what sets nfsd_serv to NULL. It then tries to svc_bind the service to register it with rpcbind. This fails, and we end up calling svc_destroy() on the service. It used to be that svc_destroy() would call sv_shutdown but in early 3.5 kernels, that was moved to svc_shutdown_net(). So now we have nfsd_serv set to a non-NULL value, but its refcount has gone to zero and it has been destroyed. On the next call into I think all of this code was fixed upstream in commit 57c8b13e3c, which unfortunately relies on a number of previous commits. I'm going to close this with a resolution of ERRATA since any kernel 3.6 and up shouldn't be vulnerable to the problem. |