Bug 1740069
| Field | Value |
|---|---|
| Summary: | NovaEvacuate: InstanceHA evacuation fails with "Failed to get "write" lock Is another process using the image?" when using NFSv4 |
| Product: | Red Hat Enterprise Linux 7 |
| Component: | resource-agents |
| Status: | CLOSED ERRATA |
| Severity: | high |
| Priority: | high |
| Version: | 7.7 |
| Target Milestone: | rc |
| Target Release: | 7.9 |
| Hardware: | x86_64 |
| OS: | Linux |
| Fixed In Version: | resource-agents-4.1.1-40.el7 |
| Reporter: | Yadnesh Kulkarni <ykulkarn> |
| Assignee: | Oyvind Albrigtsen <oalbrigt> |
| QA Contact: | cluster-qe <cluster-qe> |
| CC: | agk, aherr, cfeist, cluster-maint, dasmith, eglynn, fdinitto, jhakimra, jjoyce, jschluet, kchamart, kmehta, lmiccini, lyarwood, mschuppe, phagara, pkomarov, sbauza, sbradley, sgordon, slinaber, tvignaud, vromanso |
| Keywords: | Reopened, Triaged, ZStream |
| Clones: | 1755760 1756262 1775587 (view as bug list) |
| Last Closed: | 2020-03-31 19:47:12 UTC |
| Type: | Bug |
| Bug Blocks: | 1755760, 1756262 |
Description
Yadnesh Kulkarni 2019-08-12 08:55:33 UTC

In my test environment, on all the compute nodes I remounted /var/lib/nova/instances with NFS version 3 instead of 4, and added the "nolock" option. The resulting configuration on my compute node:

~~~
[root@overcloud-novacomputeiha-0 ~]# cat /etc/fstab
LABEL=img-rootfs / xfs defaults 0 1
192.168.122.1:/home/nova /var/lib/nova/instances nfs _netdev,bg,nolock,context=system_u:object_r:nfs_t:s0,vers=3,nfsvers=3 0 0
~~~

With this workaround I was able to evacuate instances.

---

(In reply to Yadnesh Kulkarni from comment #0)
> 2019-08-05 09:57:39.121 1 ERROR nova.compute.manager DiskNotFound: No disk
> at /var/lib/nova/instances/739e7248-6fd4-476c-8cf7-c833ae322ee4/disk

That is a different disk for a different instance; it can be ignored in the context of this evacuation bug.

---

(In reply to Yadnesh Kulkarni from comment #2)
> In my test environment, on all the compute nodes I remounted
> /var/lib/nova/instances with NFS version 3 instead of 4, and added
> the "nolock" option.
> ...
> With this workaround I was able to evacuate instances.

NACK. We actually want the locking provided by NFSv4 here, to ensure the destination instance only starts once the source is really dead and no longer accessing the disk(s).

Which NFS backend is being used here, and what is the currently configured lease timeout? I assume the instance is being evacuated before the lease held by the original compute host has timed out.

---

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1067
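The NACK above hinges on NFSv4 lease behavior: the dead compute host's locks on the image persist until its lease expires on the server, and evacuation cannot take the "write" lock before then. The following is a diagnostic sketch only, assuming a Linux knfsd NFS server; the mount path matches this bug, but the server-side paths and the 90-second default are assumptions about a stock Linux nfsd, not something stated in this report.

~~~shell
# On the compute node: confirm which NFS version was actually negotiated
# for the instances share (look for vers=3 vs vers=4.x in the options).
mount -t nfs,nfs4 | grep /var/lib/nova/instances

# On a Linux knfsd server: the NFSv4 lease time in seconds (default 90).
# A dead client's locks are only released after this period, so an
# evacuation started sooner will hit the "write" lock error.
cat /proc/fs/nfsd/nfsv4leasetime

# Lowering it (while nfsd is stopped) shortens that window, at the cost
# of more frequent lease renewals from clients:
#   echo 20 > /proc/fs/nfsd/nfsv4leasetime
~~~

Note this only tunes how long the window lasts; the fix shipped in resource-agents-4.1.1-40.el7 addresses the agent side rather than the lease itself.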