Bug 2223560 - [xfstests/nfs generic/650] kernel BUG at fs/nfsd/xdr4.h:780!
Summary: [xfstests/nfs generic/650] kernel BUG at fs/nfsd/xdr4.h:780!
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: kernel
Version: 9.3
Hardware: All
OS: Linux
medium
high
Target Milestone: rc
Assignee: Jeff Layton
QA Contact: Boyang Xue
URL:
Whiteboard:
Depends On:
Blocks: 2209174
 
Reported: 2023-07-18 08:40 UTC by Boyang Xue
Modified: 2023-08-11 07:00 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-162542 0 None None None 2023-07-18 08:41:12 UTC

Description Boyang Xue 2023-07-18 08:40:35 UTC
Description of problem:
The latest RHEL-9 kernel crashes when running xfstests generic/650 on NFS. We have seen two reproductions so far, one on -336.el9 and the other on -339.el9, but unfortunately no vmcore was saved. We have also run generic/650 in a loop hundreds of times, but none of those runs reproduced the bug.

Log:
```
[51935.482019] XFS (vda3): Filesystem has been shut down due to log error (0x2). 
[51935.482020] XFS (vda3): Please unmount the filesystem and rectify the problem(s). 
[51935.482550] vda3: writeback error on inode 25320400, offset 2097152, sector 58684120 
[51935.497820] ------------[ cut here ]------------ 
[51935.497821] kernel BUG at fs/nfsd/xdr4.h:780! 
[51935.497828] invalid opcode: 0000 [#1] PREEMPT SMP PTI 
[51935.512241] CPU: 3 PID: 56031 Comm: nfsd Kdump: loaded Not tainted 5.14.0-339.el9.x86_64 #1 
[51935.515313] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 
[51935.517807] RIP: 0010:do_open_lookup+0x1e3/0x1f0 [nfsd] 
[51935.520261] smpboot: CPU 0 is now offline 
[51935.530115] Code: 5d 41 5e c3 cc cc cc cc 8b 16 48 8b bb 08 01 00 00 48 83 c6 04 89 57 58 48 83 c7 5c e8 86 a8 f3 d6 48 8b 75 00 e9 4d ff ff ff <0f> 0b b8 00 00 27 18 e9 0e ff ff ff 90 0f 1f 44 00 00 41 57 41 56 
[51935.530119] RSP: 0018:ffffb7454099bdc8 EFLAGS: 00010246 
[51935.530122] RAX: 0000000000000000 RBX: ffff88e504d12280 RCX: 0000000000000070 
[51935.530124] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000 
[51935.530125] RBP: ffffb7454099bdf8 R08: ffffb7454099bcf4 R09: 0000000000000000 
[51935.530126] R10: ffff88e51a8af1e0 R11: 0000000000000000 R12: ffff88e527e1c028 
[51935.530128] R13: ffff88e503bb8000 R14: ffff88e503bb8000 R15: ffffffff99c65d40 
[51935.530130] FS:  0000000000000000(0000) GS:ffff88e53bd80000(0000) knlGS:0000000000000000 
[51935.530137] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
[51935.530139] CR2: 00007f23ab5fb000 CR3: 0000000110356005 CR4: 00000000000206e0 
[51935.530141] Call Trace: 
[51935.530143]  <TASK> 
[51935.530146]  ? show_trace_log_lvl+0x1c4/0x2df 
[51935.530152]  ? show_trace_log_lvl+0x1c4/0x2df 
[51935.530156]  ? nfsd4_open+0x26f/0x4b0 [nfsd] 
[51935.530193]  ? __die_body.cold+0x8/0xd 
[51935.530196]  ? die+0x2b/0x50 
[51935.530200]  ? do_trap+0xce/0x120 
[51935.530201]  ? do_open_lookup+0x1e3/0x1f0 [nfsd] 
[51935.530236]  ? do_error_trap+0x65/0x80 
[51935.530238]  ? do_open_lookup+0x1e3/0x1f0 [nfsd] 
[51935.530271]  ? exc_invalid_op+0x4e/0x70 
[51935.530274]  ? do_open_lookup+0x1e3/0x1f0 [nfsd] 
[51935.530308]  ? asm_exc_invalid_op+0x16/0x20 
[51935.530312]  ? do_open_lookup+0x1e3/0x1f0 [nfsd] 
[51935.530345]  ? do_open_lookup+0x169/0x1f0 [nfsd] 
[51935.530391]  nfsd4_open+0x26f/0x4b0 [nfsd] 
[51935.530426]  nfsd4_proc_compound+0x44e/0x700 [nfsd] 
[51935.608662]  nfsd_dispatch+0x53/0x170 [nfsd] 
[51935.610697]  svc_process_common+0x357/0x640 [sunrpc] 
[51935.612865]  ? nfsd_svc+0x170/0x170 [nfsd] 
[51935.614767]  ? nfsd_shutdown_threads+0xa0/0xa0 [nfsd] 
[51935.616983]  svc_process+0x12d/0x180 [sunrpc] 
[51935.618969]  nfsd+0xd5/0x190 [nfsd] 
[51935.620685]  kthread+0xe0/0x100 
[51935.622334]  ? kthread_complete_and_exit+0x20/0x20 
[51935.624482]  ret_from_fork+0x22/0x30 
[51935.626210]  </TASK> 
[51935.627577] Modules linked in: loop dm_mod tls rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache netfs rpcrdma rdma_cm iw_cm ib_cm ib_core nfsd auth_rpcgss nfs_acl lockd grace snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core kvm_intel snd_hwdep qxl rfkill sunrpc snd_pcm drm_ttm_helper kvm ttm irqbypass drm_kms_helper snd_timer syscopyarea snd sysfillrect soundcore i2c_piix4 virtio_balloon pcspkr joydev sysimgblt drm fuse xfs libcrc32c ata_generic crct10dif_pclmul virtio_net crc32_pclmul ata_piix net_failover crc32c_intel virtio_console failover virtio_blk libata ghash_clmulni_intel serio_raw 
```

reproduction on -336.el9
https://beaker.engineering.redhat.com/recipes/14196563
reproduction on -339.el9
https://beaker.engineering.redhat.com/recipes/14235733

Version-Release number of selected component (if applicable):
kernel-5.14.0-339.el9

How reproducible:
2 out of hundreds of runs

Steps to Reproduce:
1. Run xfstests generic/650 on nfs

Actual results:
Kernel crash

Expected results:
No kernel crash

Additional info:
N/A

Comment 1 Jeff Layton 2023-07-18 19:06:56 UTC
I'm pretty sure the call chain where we tripped the BUG is:

nfsd4_open
    do_open_lookup
        set_change_info

The problem is that I can't see any way that we could end up calling set_change_info with fh_pre_saved set to false. All of the paths that lead to that call ensure that it's set to true, AFAICT.
Another possibility is that something raced in and called fh_clear_pre_post_attrs while we were processing here. That seems unlikely though, as nothing else should be operating on this compound.

Either way, it seems plausible that the apparent disk failure in the log messages above this point may have been a factor here.

A core would be really helpful if one turns up!

Comment 2 Jeff Layton 2023-07-20 12:09:49 UTC
Oh! I do see one way this can happen: if the OPEN was not a create request and nfsd_lookup fails, we'll call fh_fill_both_attrs(), which does this:

void fh_fill_both_attrs(struct svc_fh *fhp)
{
        fh_fill_post_attrs(fhp);
        if (!fhp->fh_post_saved)
                return;
        fhp->fh_pre_change = fhp->fh_post_change;
        fhp->fh_pre_mtime = fhp->fh_post_attr.mtime;
        fhp->fh_pre_ctime = fhp->fh_post_attr.ctime;
        fhp->fh_pre_size = fhp->fh_post_attr.size;
        fhp->fh_pre_saved = true;
}

...if fh_fill_post_attrs also fails at that point, then fh_post_saved will be false and fh_pre_saved will never be set to true. So this is likely related to the disk failure that happened just before. Both operations failed, and fh_pre_saved never got set.

The code to do this changed in the last few years to lean more heavily on getattr operations to fetch this info. Prior to that, it fetched the info directly out of the inode, so asserting that fh_pre_saved was set was safe. Now that's no longer the case -- we need to be able to cope with the fetch failing.

I think we probably just need to rip out the BUG_ON, but we may need to do some further work to ensure that we comply with the protocol. It's not clear to me what you're supposed to do when you can't fetch the change info. I'll have to dive into the spec.

Comment 4 Boyang Xue 2023-07-24 03:26:49 UTC
As I had not successfully reproduced this bug by loop-running generic/650 alone, I submitted 10 jobs to run the full xfstests suite (on the assumption that earlier tests affect generic/650) on the latest -340.el9, but unfortunately there was no reproduction.

Link to Beaker jobs - full xfstests runs:
https://url.corp.redhat.com/8c2741d

Link to Beaker jobs - generic/650 solo runs:
https://url.corp.redhat.com/44a7d4b

Comment 5 Jeff Layton 2023-07-24 12:35:08 UTC
Yeah, I think this is going to be a hard one to reliably reproduce. You basically need a disk failure to coincide with a non-O_CREAT open request on a dentry that is already in cache.

The current upstream patch and discussion are here:

    https://lore.kernel.org/linux-nfs/963cc8896ff6a6759c57f1d97e51f35972d4fc6d.camel@kernel.org/T/#t

I'll see if I can get this in for v6.5.

