Bug 1924363
| Summary: | nfsserver: Failure to unmount /var/lib/nfs doesn't cause stop failure | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Reid Wahl <nwahl> |
| Component: | resource-agents | Assignee: | Oyvind Albrigtsen <oalbrigt> |
| Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 8.5 | CC: | agk, cluster-maint, fdinitto, mjuricek, pbhoite, phagara |
| Target Milestone: | rc | Keywords: | Triaged |
| Target Release: | 8.5 | Flags: | pm-rhel: mirror+ |
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | resource-agents-4.1.1-91.el8 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-11-09 17:26:02 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
ON_QA bug without Verified:Tested should be in the MODIFIED state.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: resource-agents security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4139
Description of problem:

If the unbind_tree() function fails to unmount /var/lib/nfs and the rest of the stop operation succeeds, the stop operation as a whole is declared a success. The operation should fail if the RA fails to unmount /var/lib/nfs. The stop operation should arguably also fail if unbind_tree() fails to unmount the rpcpipefs_dir.

~~~
unbind_tree ()
{
	local i=1
	while `mount | grep -q " on $OCF_RESKEY_rpcpipefs_dir "` && [ "$i" -le 10 ]; do
		ocf_log info "Stop: umount ($i/10 attempts)"
		umount -t rpc_pipefs $OCF_RESKEY_rpcpipefs_dir
		sleep 1
		i=$((i + 1))
	done
	#
	# Insert error check here, probably

	if is_bound /var/lib/nfs; then
		umount /var/lib/nfs
	fi
	#
	# Insert another error check here
}
~~~

Since the shared infodir is mounted on /var/lib/nfs and should reside on shared, cluster-managed storage, a resource closer to the base of the resource group is likely to fail to stop during recovery. For example, an LVM-activate resource may fail because the LV is busy (because it's mounted on /var/lib/nfs).

In practical terms, this nfsserver RA misbehavior is unlikely to cause any additional impact. The desired behavior for nfsserver is a stop failure. If the stop failure doesn't occur in the nfsserver resource, then it's likely to occur farther up the chain.

-----

Version-Release number of selected component (if applicable):

resource-agents-4.1.1-68.el8

-----

How reproducible:

Always

-----

Steps to Reproduce:

With /var/lib/nfs:

1. Create an ocf:heartbeat:nfsserver resource.

     Resource: nfs-daemon (class=ocf provider=heartbeat type=nfsserver)
      Attributes: nfs_shared_infodir=/mnt/nfs_shared_infodir
      Operations: monitor interval=10s timeout=20s (nfs-daemon-monitor-interval-10s)
                  start interval=0s timeout=40s (nfs-daemon-start-interval-0s)
                  stop interval=0s timeout=20s (nfs-daemon-stop-interval-0s)

2. Hold open /var/lib/nfs.

    # touch /var/lib/nfs/testfile
    # exec 3>/var/lib/nfs/testfile

3. Stop the resource.

    # pcs resource debug-stop nfs-daemon

With rpcpipefs_dir:

As noted in the description, IMO the resource also should fail to stop if it fails to unmount the rpcpipefs_dir. However, I can't get the resource to start with a custom rpcpipefs_dir at all. It times out during start because nfs-idmapd tries to use /var/lib/nfs/rpc_pipefs instead of the directory specified in OCF_RESKEY_rpcpipefs_dir. I'm not spending further time trying to get this particular configuration to work right now.

-----

Actual results:

    # pcs resource debug-stop nfs-daemon
    Operation stop for nfs-daemon (ocf:heartbeat:nfsserver) returned: 'ok' (0)
    Feb 02 17:30:13 INFO: Stopping NFS server ...
    Feb 02 17:30:13 INFO: Stop: threads
    Feb 02 17:30:13 INFO: Stop: rpc-statd
    Feb 02 17:30:13 INFO: Stop: nfs-idmapd
    Feb 02 17:30:13 INFO: Stop: nfs-mountd
    Feb 02 17:30:13 INFO: Stop: nfsdcld
    Feb 02 17:30:13 INFO: Stop: rpc-gssd
    Feb 02 17:30:13 INFO: Stop: umount (1/10 attempts)
    umount: /var/lib/nfs: target is busy.
    Feb 02 17:30:14 INFO: NFS server stopped

    # mount | grep /var/lib/nfs
    /dev/mapper/cluster_vg-cluster_lv1 on /var/lib/nfs type ext4 (rw,relatime,seclabel)

-----

Expected results:

The resource fails to stop.
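
-----

Possible shape of the error checks:

The two "Insert error check here" spots in the description could be filled in roughly as follows. This is only a sketch under stated assumptions, not the change shipped in the erratum. The ocf_log stub, the is_mounted helper, the is_bound stand-in, the return-code values, and the default for OCF_RESKEY_rpcpipefs_dir are all hypothetical placeholders so the snippet is self-contained; the real agent gets its logging and return codes from ocf-shellfuncs and has its own is_bound.

```shell
#!/bin/sh
# Sketch only. These stand in for what the agent normally sources from
# ocf-shellfuncs; the values and stubs here are assumptions.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
: "${OCF_RESKEY_rpcpipefs_dir:=/var/lib/nfs/rpc_pipefs}"

# Hypothetical stub logger: prints "<level>: <message>" to stderr.
ocf_log() {
	local level="$1"
	shift
	echo "$level: $*" >&2
}

# Hypothetical helper: succeeds if something is mounted on the directory.
is_mounted() {
	mount | grep -q " on $1 "
}

# Stand-in for the agent's is_bound(); here it only checks the mount table.
is_bound() {
	is_mounted "$1"
}

unbind_tree() {
	local i=1
	while is_mounted "$OCF_RESKEY_rpcpipefs_dir" && [ "$i" -le 10 ]; do
		ocf_log info "Stop: umount ($i/10 attempts)"
		umount -t rpc_pipefs "$OCF_RESKEY_rpcpipefs_dir"
		sleep 1
		i=$((i + 1))
	done
	# First check: rpcpipefs_dir is still mounted after all attempts.
	if is_mounted "$OCF_RESKEY_rpcpipefs_dir"; then
		ocf_log err "Failed to unmount $OCF_RESKEY_rpcpipefs_dir"
		return $OCF_ERR_GENERIC
	fi

	# Second check: the umount of /var/lib/nfs itself failed.
	if is_bound /var/lib/nfs && ! umount /var/lib/nfs; then
		ocf_log err "Failed to unmount /var/lib/nfs"
		return $OCF_ERR_GENERIC
	fi
	return $OCF_SUCCESS
}
```

For this to change the result of the stop operation, the caller in the stop path would also need to propagate the return value instead of ignoring it, for example with something like `unbind_tree || return $OCF_ERR_GENERIC`. Where exactly that belongs in the agent is left open here.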