Description of problem:
Got a kernel panic while running an nfsd containerization test.

Version-Release number of selected component (if applicable):
kernel-5.14.0-252.el9

How reproducible:
TBD

Steps to Reproduce:
TBD

Actual results:
'''
[ 210.198337] NFSD: Using nfsdcld client tracking operations.
[ 210.199175] NFSD: no clients to reclaim, skipping NFSv4 grace period (net f00001f0)
[ 212.109828] nfsd: last server has exited, flushing export cache
[ 213.277753] ------------[ cut here ]------------
[ 213.278380] kernel BUG at mm/slub.c:385!
[ 213.278897] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[ 213.279532] CPU: 0 PID: 20567 Comm: rpc.nfsd Kdump: loaded Not tainted 5.14.0-252.el9.x86_64 #1
[ 213.280596] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 213.281312] RIP: 0010:__slab_free+0x25f/0x3f0
[ 213.281864] Code: ff 80 7c 24 4b 00 0f 89 eb fe ff ff 48 83 c4 60 48 89 ee 4c 89 ef ba 01 00 00 00 5b 5d 41 5c 41 5d 41 5e 41 5f e9 61 0e 00 00 <0f> 0b 41 f7 45 08 00 0d 21 00 0f 85 3a ff ff ff e9 2c ff ff ff 80
[ 213.284100] RSP: 0000:ffffb96940bcbbd8 EFLAGS: 00010246
[ 213.284747] RAX: ffff9a5f89c59800 RBX: ffff9a5f89c59000 RCX: ffff9a5f89c59000
[ 213.285624] RDX: 0000000000080003 RSI: ffffeef884271600 RDI: ffff9a5f80042d00
[ 213.286631] RBP: ffffeef884271600 R08: 0000000000000001 R09: ffffffff8cb68ad7
[ 213.287503] R10: 0000000000000001 R11: 0000000000000003 R12: ffff9a5f89c59000
[ 213.288381] R13: ffff9a5f80042d00 R14: ffff9a5f89c59000 R15: dead000000000100
[ 213.289250] FS: 00007f1f030f8740(0000) GS:ffff9a60b7c00000(0000) knlGS:0000000000000000
[ 213.290232] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 213.290948] CR2: 0000559b09034d20 CR3: 00000001093c2003 CR4: 00000000007706f0
[ 213.291828] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 213.292739] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 213.293630] PKRU: 55555554
[ 213.293975] Call Trace:
[ 213.294291] <TASK>
[ 213.294578] ? __cancel_work_timer+0x103/0x190
[ 213.295140] rhashtable_free_and_destroy+0x47/0x140
[ 213.295754] nfs4_state_shutdown_net+0x171/0x220 [nfsd]
[ 213.296443] nfsd_shutdown_net+0x2d/0x80 [nfsd]
[ 213.297055] nfsd_put+0x123/0x140 [nfsd]
[ 213.297572] nfsd_svc+0x15c/0x190 [nfsd]
[ 213.298085] write_threads+0x95/0x100 [nfsd]
[ 213.298656] ? _copy_from_user+0x3a/0x60
[ 213.299155] ? simple_transaction_get+0xc4/0xf0
[ 213.299730] ? write_pool_threads+0x230/0x230 [nfsd]
[ 213.300372] nfsctl_transaction_write+0x43/0x80 [nfsd]
[ 213.301047] vfs_write+0xb2/0x280
[ 213.301482] ksys_write+0x5f/0xe0
[ 213.301904] do_syscall_64+0x59/0x90
[ 213.302361] ? do_sys_openat2+0x85/0x160
[ 213.302855] ? syscall_exit_work+0x11a/0x150
[ 213.303400] ? syscall_exit_to_user_mode+0x12/0x30
[ 213.303995] ? do_syscall_64+0x69/0x90
[ 213.304470] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 213.305108] RIP: 0033:0x7f1f02f3eb97
[ 213.305566] Code: 0b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ 213.307808] RSP: 002b:00007fff54f579b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 213.308749] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f1f02f3eb97
[ 213.309628] RDX: 0000000000000002 RSI: 000055ac9aacfc20 RDI: 0000000000000003
[ 213.310506] RBP: 0000000000000003 R08: 0000000000000000 R09: 00007fff54f57850
[ 213.311384] R10: 0000000000000000 R11: 0000000000000246 R12: 000055ac9aacfc20
[ 213.312255] R13: 00007f1f030f86c0 R14: 00007fff54f57a80 R15: 00000000ffffffff
[ 213.313134] </TASK>
[ 213.313455] Modules linked in: rpcrdma rdma_cm iw_cm ib_cm ib_core nfsd nfs_acl nft_masq nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink veth tls rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill sunrpc intel_rapl_msr intel_rapl_common isst_if_common nfit libnvdimm virtio_balloon pcspkr i2c_piix4 joydev drm fuse xfs libcrc32c ata_generic ata_piix libata crct10dif_pclmul virtio_net crc32_pclmul crc32c_intel virtio_blk ghash_clmulni_intel net_failover failover serio_raw dm_mirror dm_region_hash dm_log dm_mod

[ 0.000000] Linux version 5.14.0-252.el9.x86_64 (mockbuild.eng.bos.redhat.com) (gcc (GCC) 11.3.1 20221121 (Red Hat 11.3.1-4), GNU ld version 2.35.2-35.el9) #1 SMP PREEMPT_DYNAMIC Wed Feb 1 09:46:00 EST 2023
'''

Expected results:
no panic

Additional info:
Nice catch. I think I see the bug. It's tearing down the global nfs4_file_rhltable in the per-net shutdown procedures, which leads to a double free after a containerized nfs server is shut down.
You can probably reproduce this by starting the NFS server on the host, and then starting another one inside a container. Then shut the containerized one down and then the one on the host. Presumably, that should make the machine crash with a similar stack trace.
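For illustration, the failure mode described above can be modeled in userspace C. This is only a sketch and not the nfsd code or the actual fix: `file_table`, `global_table_init`, `global_table_destroy`, `buggy_net_shutdown`, `fixed_net_shutdown`, and `run_scenario` are all hypothetical names, and the second teardown is merely counted rather than executed, so the program never performs a real double free. It assumes one table shared by every namespace and follows the sequence of host start, container start, container stop, host stop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a table shared by all network namespaces. */
struct global_table {
    int *buckets; /* NULL once the table has been torn down */
};

static struct global_table file_table;

/* Allocate the shared table once (module-load time in this model). */
void global_table_init(void)
{
    file_table.buckets = calloc(16, sizeof(int));
    assert(file_table.buckets != NULL);
}

/*
 * Tear down the shared table. Returns true when the table was already
 * gone, i.e. the point where real code would free the memory twice.
 */
bool global_table_destroy(void)
{
    if (file_table.buckets == NULL)
        return true;
    free(file_table.buckets);
    file_table.buckets = NULL;
    return false;
}

/* Per-namespace server state (hypothetical). */
struct net_state {
    bool running;
};

/* Buggy pattern: every per-net shutdown also destroys the global table. */
bool buggy_net_shutdown(struct net_state *net)
{
    net->running = false;
    return global_table_destroy();
}

/* Safer pattern: per-net shutdown releases only per-net state. */
void fixed_net_shutdown(struct net_state *net)
{
    net->running = false;
}

/*
 * Host and container servers start, then the container stops, then the
 * host stops. Returns the number of repeated teardowns detected.
 */
int run_scenario(bool buggy)
{
    struct net_state host = { true };
    struct net_state container = { true };
    int repeated = 0;

    global_table_init();
    if (buggy) {
        repeated += buggy_net_shutdown(&container);
        repeated += buggy_net_shutdown(&host);
    } else {
        fixed_net_shutdown(&container);
        fixed_net_shutdown(&host);
        /* Global teardown happens exactly once (module-unload time here). */
        repeated += global_table_destroy();
    }
    return repeated;
}
```

In this model, `run_scenario(true)` detects one repeated teardown and `run_scenario(false)` detects none, because a global resource is destroyed only in a global teardown path instead of in each per-namespace shutdown.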
(In reply to Jeff Layton from comment #5)
> You can probably reproduce this by starting the NFS server on the host, and
> then starting another one inside a container. Then shut the containerized
> one down and then the one on the host. Presumably, that should make the
> machine crash with a similar stack trace.

Good to know. With your tips I can now reproduce it 100% of the time :) Thanks, Jeff!
*** Bug 2170140 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: kernel security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2023:2458