Bug 1854379 - Can't unmount bind-mounted NFS mounts with "Stale file handle"
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 32
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Steve Dickson
QA Contact: Qin Ping
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-07-07 10:43 UTC by Vadim Rutkovsky
Modified: 2021-05-25 18:04 UTC
CC List: 24 users

Fixed In Version:
Clone Of:
Clones: 1900239
Environment:
Last Closed: 2021-05-25 18:04:37 UTC
Type: Bug
Embargoed:


Links:
System: Linux Kernel, ID: 209399, Private: 0, Priority: None, Status: None, Summary: None, Last Updated: 2020-09-26 08:03:52 UTC

Comment 1 Jan Safranek 2020-07-13 14:42:19 UTC
Reproduced with kernel 5.7.8-200.fc32.x86_64. Reformatted for readability.

E0713 14:30:19.540124    5301 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/nfs/15bc6521-d0be-459b-8e1b-37e307510db9-pvc-cb938e62-792c-41a2-b7a8-ec9650c97d6e podName:15bc6521-d0be-459b-8e1b-37e307510db9 nodeName:}" failed. No retries permitted until 2020-07-13 14:32:21.54008141 +0000 UTC m=+944.066210092 (durationBeforeRetry 2m2s).

Error: "error cleaning subPath mounts for volume \"test-volume\" (UniqueName: \"kubernetes.io/nfs/15bc6521-d0be-459b-8e1b-37e307510db9-pvc-cb938e62-792c-41a2-b7a8-ec9650c97d6e\") pod \"15bc6521-d0be-459b-8e1b-37e307510db9\" (UID: \"15bc6521-d0be-459b-8e1b-37e307510db9\") :
error processing /var/lib/kubelet/pods/15bc6521-d0be-459b-8e1b-37e307510db9/volume-subpaths/pvc-cb938e62-792c-41a2-b7a8-ec9650c97d6e/test-container-subpath-dynamicpv-wjts:
error cleaning subpath mount /var/lib/kubelet/pods/15bc6521-d0be-459b-8e1b-37e307510db9/volume-subpaths/pvc-cb938e62-792c-41a2-b7a8-ec9650c97d6e/test-container-subpath-dynamicpv-wjts/0:
  unmount failed: exit status 16
  Unmounting arguments: /var/lib/kubelet/pods/15bc6521-d0be-459b-8e1b-37e307510db9/volume-subpaths/pvc-cb938e62-792c-41a2-b7a8-ec9650c97d6e/test-container-subpath-dynamicpv-wjts/0
  Output: umount.nfs4: /var/lib/kubelet/pods/15bc6521-d0be-459b-8e1b-37e307510db9/volume-subpaths/pvc-cb938e62-792c-41a2-b7a8-ec9650c97d6e/test-container-subpath-dynamicpv-wjts/0: Stale file handle

Brief summary of the test:
1. Create a pod with an NFS volume and two containers:
  - The first uses a subdirectory of the PV as a subpath.
  - The second uses the whole volume.

2. Exec into the second container and remove the subpath directory.

3. Delete the pod.
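
The three steps above could be sketched with kubectl roughly as follows. This is an illustration only, not the actual e2e test: the pod name `subpath-repro`, the container names, the `busybox` image, the `reproduce` subdirectory and the claim name `nfs-claim` are all hypothetical, and an NFS-backed PersistentVolumeClaim is assumed to exist already.

```shell
#!/bin/sh
# Sketch only: every name below (pod, containers, image, claim) is hypothetical.

# Print a pod manifest with two containers sharing one NFS-backed volume.
print_repro_pod() {
    cat <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: subpath-repro
spec:
  containers:
  - name: subpath-user          # mounts only a subdirectory of the PV
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: test-volume
      mountPath: /data
      subPath: reproduce
  - name: whole-volume          # mounts the whole PV
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: test-volume
      mountPath: /data
  volumes:
  - name: test-volume
    persistentVolumeClaim:
      claimName: nfs-claim
EOF
}

# Steps 1-3 from the summary; meant to be called by hand on a test cluster.
run_repro() {
    print_repro_pod | kubectl apply -f -
    kubectl wait --for=condition=Ready pod/subpath-repro
    # Remove the subpath directory through the container that sees the whole volume.
    kubectl exec subpath-repro -c whole-volume -- rmdir /data/reproduce
    kubectl delete pod subpath-repro
}
```

Nothing runs when the file is sourced; `run_repro` has to be invoked explicitly. On an affected node, the kubelet log should then show the subpath cleanup error quoted above.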

Comment 2 Jan Safranek 2020-07-17 10:41:01 UTC
Starting with 5.7.x, the kernel does not allow users to unmount NFS mounts that are in the "Stale file handle" state.

I tested with 5.7.4-200.fc32.x86_64, the first 5.7.x kernel in Fedora 32.

Steps to reproduce (basically, get a "Stale file handle" error on a bind-mounted NFS directory):

1. Use this dummy /etc/exports:
/var/tmp 127.0.0.1(rw,sync,all_squash,anonuid=1000)

2. Mount it to /mnt/test:
$ mkdir /mnt/test
$ mount localhost:/var/tmp /mnt/test

3. Bind-mount a subdirectory of it to /mnt/test2:
$ mkdir /mnt/test/reproduce
$ mkdir /mnt/test2
$ mount --bind /mnt/test/reproduce /mnt/test2

4. Remove the bind-mounted dir:
$ rmdir /mnt/test/reproduce

5. Check that /mnt/test2 is not happy about that:
$ ls /mnt/test2
ls: cannot access '/mnt/test2': Stale file handle

This is expected.

6. Try to unmount /mnt/test2:
$ umount /mnt/test2
umount.nfs4: /mnt/test2: Stale file handle

This is not expected! There is no way to unmount the directory; it stays mounted forever. Even a reboot gets stuck.


With kernel-core-5.6.19-300.fc32.x86_64 (the last 5.6.x in Fedora 32), step 6 succeeds.
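
To see which mounts on a machine could end up in this state, one option is to list NFS mounts whose root is a subdirectory of the filesystem rather than "/", which is what a bind mount such as /mnt/test2 looks like. A minimal sketch, assuming the /proc/self/mountinfo field layout described in proc(5); the function name is made up for illustration:

```shell
#!/bin/sh
# Sketch: print "<mount point> <root within the filesystem>" for every NFS
# mount whose root is not "/". Reads /proc/self/mountinfo unless another
# file in the same format is given as the first argument.
list_nfs_subdir_mounts() {
    awk '{
        # Optional fields end at a lone "-"; the filesystem type follows it.
        sep = 0
        for (i = 7; i <= NF; i++) {
            if ($i == "-") { sep = i; break }
        }
        # Field 4 is the root of the mount, field 5 the mount point.
        if (sep && $(sep + 1) ~ /^nfs4?$/ && $4 != "/") {
            print $5, $4
        }
    }' "${1:-/proc/self/mountinfo}"
}
```

Calling `list_nfs_subdir_mounts` with no argument reads the live mount table. It only lists candidates; it does not tell whether the backing directory has already been removed.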

Comment 3 Vadim Rutkovsky 2020-07-23 07:37:29 UTC
Steve, could you have a look? Reproducible in the latest 5.7.x kernel in F32.

Comment 4 Vadim Rutkovsky 2020-08-20 07:48:56 UTC
No longer happening in 5.7.15-200.fc32.x86_64

Comment 5 Vadim Rutkovsky 2020-09-23 13:53:51 UTC
I was wrong: the test didn't pass, it was skipped instead.

The issue still occurs on 5.8.10-200.fc32.x86_64

Comment 6 Colin Walters 2020-09-23 14:20:11 UTC
Probably the best way to get traction on this is to bisect it and report to the linux-nfs@ email list: https://linux-nfs.org/wiki/index.php/Main_Page

Speaking with a very broad brush, Fedora kernel BZs are mostly triaged to upstream bugs and that's the best way to address them.
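
For the bisection suggested above, each candidate kernel has to be built and booted before the reproducer from comment 2 can run, so the process is mostly manual. A small helper can still turn the outcome of the final umount into the word `git bisect` expects. This is a sketch: the function name is hypothetical, and the mainline tags v5.6 and v5.7 are assumed endpoints derived from the Fedora kernels reported as working and failing in this bug.

```shell
#!/bin/sh
# Sketch: run the given command and print the matching `git bisect` verdict.
# "good" means the command succeeded (umount worked), "bad" means it failed.
bisect_verdict() {
    if "$@" >/dev/null 2>&1; then
        echo good
    else
        echo bad
    fi
}

# Intended use, by hand, in a kernel source tree:
#   git bisect start
#   git bisect bad v5.7     # assumed endpoint
#   git bisect good v5.6    # assumed endpoint
# then, after booting each candidate kernel and repeating steps 1-5:
#   git bisect "$(bisect_verdict umount /mnt/test2)"
```

The helper treats any umount failure as "bad", so the mount setup in steps 1-5 needs to have succeeded first; otherwise a broken test setup would be recorded as a regression.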

Comment 7 Colin Walters 2020-09-23 14:30:56 UTC
Looking at the changes and code, at a vague guess this may be related to https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=779df6a5480f1307d51b66ea72352be592265cad
Specifically, this part:
```
	if (ctx->clone_data.sb) {
		if (d_inode(fc->root)->i_fop != &nfs_dir_operations) {
			error = -ESTALE;
```

has the right appearance for this problem at least.

Comment 8 Vadim Rutkovsky 2020-10-09 15:29:57 UTC
Still occurs on 5.8.10-200.fc32.x86_64 (same test failing in https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/promote-release-openshift-okd-machine-os-content-e2e-gcp-4.5/1314504432246853632)

Comment 9 Vadim Rutkovsky 2020-11-12 15:57:22 UTC
kernel-5.9.8-200.fc33 from updates-testing is also affected
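
Going by the versions reported in this bug (5.6.19 works; 5.7.4, 5.7.8, 5.8.10 and 5.9.8 fail), a machine can be checked against the reported range with a simple version comparison. A rough sketch, assuming GNU `sort -V` is available; the function name is hypothetical, and no upper bound is known from this report:

```shell
#!/bin/sh
# Sketch: succeed if the given kernel release (as printed by `uname -r`) is
# 5.7 or newer, the first series reported as affected in this bug.
# The report does not say where, or whether, the problem ends.
kernel_at_least_5_7() {
    release=$1
    # Strip the distro suffix: 5.7.8-200.fc32.x86_64 -> 5.7.8
    version=${release%%-*}
    # Version-sort the two strings; if 5.7 comes first, $version is >= 5.7.
    lowest=$(printf '%s\n%s\n' "5.7" "$version" | sort -V | head -n 1)
    [ "$lowest" = "5.7" ]
}
```

For example, `kernel_at_least_5_7 "$(uname -r)" && echo "in the reported range"` prints the message on the failing kernels listed above and stays silent on 5.6.19.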

Comment 10 Fedora Program Management 2021-04-29 17:13:26 UTC
This message is a reminder that Fedora 32 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 32 on 2021-05-25.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '32'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 32 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 11 Ben Cotton 2021-05-25 18:04:37 UTC
Fedora 32 changed to end-of-life (EOL) status on 2021-05-25. Fedora 32 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

