Bug 2223560

Summary: [xfstests/nfs generic/650] kernel BUG at fs/nfsd/xdr4.h:780!
Product: Red Hat Enterprise Linux 9
Reporter: Boyang Xue <bxue>
Component: kernel
Assignee: Jeff Layton <jlayton>
Kernel sub component: NFS
QA Contact: Boyang Xue <bxue>
Status: CLOSED MIGRATED
Severity: high
Priority: medium
CC: jlayton, nfs-team, xzhou, yoyang
Version: 9.3
Keywords: MigratedToJIRA, Triaged
Target Milestone: rc
Target Release: ---
Hardware: All
OS: Linux
Last Closed: 2023-09-23 11:55:41 UTC
Type: Bug
Bug Blocks: 2209174    

Description Boyang Xue 2023-07-18 08:40:35 UTC
Description of problem:
The latest RHEL-9 kernel crashes when running xfstests generic/650 on nfs. We have two reproductions so far, one on -336.el9 and the other on -339.el9, but unfortunately no vmcore was saved. We have also loop-run generic/650 hundreds of times, but none of those runs reproduced the bug.

Log:
```
[51935.482019] XFS (vda3): Filesystem has been shut down due to log error (0x2). 
[51935.482020] XFS (vda3): Please unmount the filesystem and rectify the problem(s). 
[51935.482550] vda3: writeback error on inode 25320400, offset 2097152, sector 58684120 
[51935.497820] ------------[ cut here ]------------ 
[51935.497821] kernel BUG at fs/nfsd/xdr4.h:780! 
[51935.497828] invalid opcode: 0000 [#1] PREEMPT SMP PTI 
[51935.512241] CPU: 3 PID: 56031 Comm: nfsd Kdump: loaded Not tainted 5.14.0-339.el9.x86_64 #1 
[51935.515313] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 
[51935.517807] RIP: 0010:do_open_lookup+0x1e3/0x1f0 [nfsd] 
[51935.520261] smpboot: CPU 0 is now offline 
[51935.530115] Code: 5d 41 5e c3 cc cc cc cc 8b 16 48 8b bb 08 01 00 00 48 83 c6 04 89 57 58 48 83 c7 5c e8 86 a8 f3 d6 48 8b 75 00 e9 4d ff ff ff <0f> 0b b8 00 00 27 18 e9 0e ff ff ff 90 0f 1f 44 00 00 41 57 41 56 
[51935.530119] RSP: 0018:ffffb7454099bdc8 EFLAGS: 00010246 
[51935.530122] RAX: 0000000000000000 RBX: ffff88e504d12280 RCX: 0000000000000070 
[51935.530124] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000 
[51935.530125] RBP: ffffb7454099bdf8 R08: ffffb7454099bcf4 R09: 0000000000000000 
[51935.530126] R10: ffff88e51a8af1e0 R11: 0000000000000000 R12: ffff88e527e1c028 
[51935.530128] R13: ffff88e503bb8000 R14: ffff88e503bb8000 R15: ffffffff99c65d40 
[51935.530130] FS:  0000000000000000(0000) GS:ffff88e53bd80000(0000) knlGS:0000000000000000 
[51935.530137] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
[51935.530139] CR2: 00007f23ab5fb000 CR3: 0000000110356005 CR4: 00000000000206e0 
[51935.530141] Call Trace: 
[51935.530143]  <TASK> 
[51935.530146]  ? show_trace_log_lvl+0x1c4/0x2df 
[51935.530152]  ? show_trace_log_lvl+0x1c4/0x2df 
[51935.530156]  ? nfsd4_open+0x26f/0x4b0 [nfsd] 
[51935.530193]  ? __die_body.cold+0x8/0xd 
[51935.530196]  ? die+0x2b/0x50 
[51935.530200]  ? do_trap+0xce/0x120 
[51935.530201]  ? do_open_lookup+0x1e3/0x1f0 [nfsd] 
[51935.530236]  ? do_error_trap+0x65/0x80 
[51935.530238]  ? do_open_lookup+0x1e3/0x1f0 [nfsd] 
[51935.530271]  ? exc_invalid_op+0x4e/0x70 
[51935.530274]  ? do_open_lookup+0x1e3/0x1f0 [nfsd] 
[51935.530308]  ? asm_exc_invalid_op+0x16/0x20 
[51935.530312]  ? do_open_lookup+0x1e3/0x1f0 [nfsd] 
[51935.530345]  ? do_open_lookup+0x169/0x1f0 [nfsd] 
[51935.530391]  nfsd4_open+0x26f/0x4b0 [nfsd] 
[51935.530426]  nfsd4_proc_compound+0x44e/0x700 [nfsd] 
[51935.608662]  nfsd_dispatch+0x53/0x170 [nfsd] 
[51935.610697]  svc_process_common+0x357/0x640 [sunrpc] 
[51935.612865]  ? nfsd_svc+0x170/0x170 [nfsd] 
[51935.614767]  ? nfsd_shutdown_threads+0xa0/0xa0 [nfsd] 
[51935.616983]  svc_process+0x12d/0x180 [sunrpc] 
[51935.618969]  nfsd+0xd5/0x190 [nfsd] 
[51935.620685]  kthread+0xe0/0x100 
[51935.622334]  ? kthread_complete_and_exit+0x20/0x20 
[51935.624482]  ret_from_fork+0x22/0x30 
[51935.626210]  </TASK> 
[51935.627577] Modules linked in: loop dm_mod tls rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache netfs rpcrdma rdma_cm iw_cm ib_cm ib_core nfsd auth_rpcgss nfs_acl lockd grace snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core kvm_intel snd_hwdep qxl rfkill sunrpc snd_pcm drm_ttm_helper kvm ttm irqbypass drm_kms_helper snd_timer syscopyarea snd sysfillrect soundcore i2c_piix4 virtio_balloon pcspkr joydev sysimgblt drm fuse xfs libcrc32c ata_generic crct10dif_pclmul virtio_net crc32_pclmul ata_piix net_failover crc32c_intel virtio_console failover virtio_blk libata ghash_clmulni_intel serio_raw 
```

Reproduction on -336.el9:
https://beaker.engineering.redhat.com/recipes/14196563
Reproduction on -339.el9:
https://beaker.engineering.redhat.com/recipes/14235733

Version-Release number of selected component (if applicable):
kernel-5.14.0-339.el9

How reproducible:
2 out of hundreds of runs

Steps to Reproduce:
1. Run xfstests generic/650 on nfs

Actual results:
Kernel crash

Expected results:
No kernel crash

Additional info:
N/A

Comment 1 Jeff Layton 2023-07-18 19:06:56 UTC
I'm pretty sure the call chain where we tripped the BUG is:

nfsd4_open
    do_open_lookup
        set_change_info

The problem is that I can't see any way that we could end up calling set_change_info with fh_pre_saved set to false. All of the paths that lead to that call ensure that it's set to true, AFAICT.
Another possibility is that something raced in and called fh_clear_pre_post_attrs while we were processing here. That seems unlikely though, as nothing else should be operating on this compound.
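
For reference, the assertion that fired is the BUG_ON at the top of set_change_info() in fs/nfsd/xdr4.h (the line reported in the oops). A minimal sketch of roughly what that helper looks like in this era of the code, reconstructed from memory, so the exact fields may differ slightly from -339.el9:

```
static inline void
set_change_info(struct nfsd4_change_info *cinfo, struct svc_fh *fhp)
{
	/* The assertion the oops reports at fs/nfsd/xdr4.h:780 */
	BUG_ON(!fhp->fh_pre_saved);
	cinfo->atomic = (u32)fhp->fh_post_saved;

	cinfo->before_change = fhp->fh_pre_change;
	cinfo->after_change = fhp->fh_post_change;
}
```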

Either way, it seems plausible that the apparent disk failure in the log messages above this point may have been a factor here.

A core would be really helpful if one turns up!

Comment 2 Jeff Layton 2023-07-20 12:09:49 UTC
Oh! I do see one way that this can happen. If the OPEN was not a create request, and the nfsd_lookup fails, we'll call fh_fill_both_attrs(), which does this:

void fh_fill_both_attrs(struct svc_fh *fhp)
{
        fh_fill_post_attrs(fhp);
        if (!fhp->fh_post_saved)
                return;
        fhp->fh_pre_change = fhp->fh_post_change;
        fhp->fh_pre_mtime = fhp->fh_post_attr.mtime;
        fhp->fh_pre_ctime = fhp->fh_post_attr.ctime;
        fhp->fh_pre_size = fhp->fh_post_attr.size;
        fhp->fh_pre_saved = true;
}

...if fh_fill_post_attrs also fails at that point, then fh_post_saved will be false and fh_pre_saved will never be set to true. So this is likely related to the disk failure that happened just before. Both operations failed, and fh_pre_saved never got set.

The code to do this changed in the last few years to lean more heavily on getattr operations to fetch this info. Prior to that, it fetched the info directly out of the inode, which couldn't fail, so asserting that fh_pre_saved was set was safe. Now that's no longer the case -- we need to be able to cope with the fetch failing.

I think we probably just need to rip out the BUG_ON, but we may need to do some further work to ensure that we comply with the protocol. It's not clear to me what you're supposed to do when you can't fetch the change info. I'll have to dive into the spec.
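
To make the direction concrete, here is a rough sketch of what ripping out the BUG_ON could look like; this is illustrative only, not the actual upstream patch (see the lore thread linked in comment 5). The idea is to report the change info as non-atomic whenever either the pre- or post-op attributes weren't captured, instead of asserting:

```
static inline void
set_change_info(struct nfsd4_change_info *cinfo, struct svc_fh *fhp)
{
	/*
	 * Illustrative sketch: never BUG on a missing pre-op snapshot.
	 * Only claim atomicity when both snapshots were actually saved.
	 */
	cinfo->atomic = (u32)(fhp->fh_pre_saved && fhp->fh_post_saved);
	cinfo->before_change = fhp->fh_pre_change;
	cinfo->after_change = fhp->fh_post_change;
}
```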

Comment 4 Boyang Xue 2023-07-24 03:26:49 UTC
As I had not been able to reproduce this bug by loop-running generic/650 alone, I submitted 10 jobs running the full xfstests suite (on the assumption that earlier tests influence generic/650) on the latest -340.el9, but unfortunately there was no reproduction.

Link to Beaker jobs - full xfstests runs:
https://url.corp.redhat.com/8c2741d

Link to Beaker jobs - generic/650 solo runs:
https://url.corp.redhat.com/44a7d4b

Comment 5 Jeff Layton 2023-07-24 12:35:08 UTC
Yeah, I think this is going to be a hard one to reliably reproduce. You basically need a disk failure to coincide with a non-O_CREAT open request on a dentry that is already in cache.

The current upstream patch and discussion are here:

    https://lore.kernel.org/linux-nfs/963cc8896ff6a6759c57f1d97e51f35972d4fc6d.camel@kernel.org/T/#t

I'll see if I can get this in for v6.5.

Comment 7 RHEL Program Management 2023-09-23 11:53:07 UTC
Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

Comment 8 RHEL Program Management 2023-09-23 11:55:41 UTC
This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated.  Be sure to add yourself to Jira issue's "Watchers" field to continue receiving updates and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer.  You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.