Bug 1001544
Summary: nova resize fails on shared storage, qemu can't open the instance image

Product: Red Hat OpenStack
Component: openstack-nova
Version: 3.0
Target Release: 3.0
Target Milestone: ---
Hardware: x86_64
OS: Linux
Status: CLOSED WONTFIX
Severity: high
Priority: unspecified
Reporter: Jaroslav Henner <jhenner>
Assignee: Vladan Popovic <vpopovic>
QA Contact: Jaroslav Henner <jhenner>
CC: apevec, breeler, ddomingo, ndipanov, sclewis, vpopovic, yeylon
Keywords: Regression, Reopened, ZStream
Whiteboard: docs-relnotes-adhoc
Type: Bug
Last Closed: 2014-08-01 12:52:38 UTC
Clones: 1112632, 1112634 (view as bug list)
Bug Depends On: 1115946
Bug Blocks: 1112632, 1112634

Doc Type: Release Note
Doc Text:
When using NFS shared storage for the Nova instance store, it is advisable to mount the share with the noac or the lookupcache=none option (see the NFS man page for details) to prevent NFS clients from caching file attributes. This enables migrating and resizing instances between compute hosts that use the shared storage, at a slight performance cost.
In a future release of RHOS, the requirement to use noac or lookupcache=none may be removed. We will update the Release Notes to indicate when it is safe to use NFS shared storage for the Nova instance store without enabling the noac option.

Attachments:
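The mount options recommended in the Doc Text above could be applied via /etc/fstab along these lines (a sketch only — the server name, export path, and remaining options are placeholders; pick exactly one of the two cache options):

```
# Hypothetical /etc/fstab entry; server and export path are placeholders.
nfs-server:/export/nova  /var/lib/nova/instances  nfs4  rw,lookupcache=none  0 0
# or, with the heavier attribute-cache option:
# nfs-server:/export/nova  /var/lib/nova/instances  nfs4  rw,noac  0 0
```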
Created attachment 790907 [details]: lithium.log (logs from another reproduction)
This looks like a regression: it did not reproduce in 2013-08-05.1 after doing a clean install. Note that it seems to happen on non-shared storage as well; I am about 80% sure it does.

I've been able to reproduce it while using shared storage, but it happens somewhat randomly and I couldn't see any pattern. The only thing that comes to mind right now is a possible race condition between nova and NFS, causing the destination host to try a resize on an image that doesn't exist in the filesystem yet. The source host renames the instance directory (which contains the disks and log) and merges the disk image with its backing file into a new file (using the same old directory name). After adding some logging around the files seen on the destination host, I've observed that sometimes, just before extending the disk, it still sees the original disk (smaller, because it is backed), which has been moved to another directory — hence the access error. On the other hand, when this error doesn't appear and the disk has been resized properly, I've noticed another possible problem: SELinux preventing libvirt/qemu from writing to the console.log (contained in an NFS share). I imagine this can be solved with proper configuration. I haven't been able to reproduce this when not using shared storage. @jhenner: if possible, tomorrow I'd like to run some tests on the host where you found this.

tl;dr with the proposed fix at the bottom. After some more tests I can confirm that the bug can *always* be reproduced. I was hit by the "observer effect" when listing the instances directory to check that the files were created: what I was really doing was updating NFS's cache for the shared directory, thus making the disk file available to nova when it was about to resize it and giving the impression it was working. This problem was introduced in backport https://review.openstack.org/32768, which changes the behaviour of migrate/resize when using shared storage.
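The rename-and-recreate sequence described above can be illustrated on a local filesystem, where the inode change is visible immediately (hypothetical paths; on an NFS client with a warm name cache the old inode would keep being returned for the same path, which is exactly the failure seen here):

```python
import os
import tempfile

# Hypothetical layout mirroring the Nova instance store described above.
base = tempfile.mkdtemp()
inst = os.path.join(base, "instance-uuid")
os.mkdir(inst)
disk = os.path.join(inst, "disk")
with open(disk, "w") as f:
    f.write("small, backed image")
old_ino = os.stat(disk).st_ino

# The source host moves the old instance directory aside...
os.rename(inst, inst + "_resize")
# ...and recreates the original path containing the merged disk.
os.mkdir(inst)
with open(disk, "w") as f:
    f.write("merged image")

# Locally, the path resolves to the new inode at once; a caching NFS
# client may keep handing back old_ino (and thus the moved, root-owned
# file) for the same path until its cache is refreshed.
assert os.stat(disk).st_ino != old_ino
```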
In 2013.1.2 the disk was moved to the new host over ssh even when using shared storage (which could cause some data loss when an error happened), but in 2013.1.3, if we're using shared storage, nova won't send the disk to the other host but instead expects to be able to access it directly. In the end both hosts use the same storage, so what should the problem be?

To continue we need some background on how NFS handles its client cache (which I just discovered). The NFS client keeps a cache mapping file names to inodes which, if no process forces a refresh first, is revalidated at intervals of 3 to 60 seconds (see the NFS options ac[dir|reg][min|max]). So if a process tries to access a file that has been renamed on the remote server, it will access the old version, because the name is still associated with the old inode (the cache is updated when listing a directory, but not when accessing a file).

In our case, the origin compute node renamed the instance directory to "$INSTANCE_DIR/<instance_uuid>_resize" (owned by root after qemu stops) and created the new instance disk from it in the new directory "$INSTANCE_DIR/<instance_uuid>". From the destination host, even though we were trying to access the new disk file at "$INSTANCE_DIR/<instance_uuid>/disk", we were still holding the old inode for that path, which pointed to "$INSTANCE_DIR/<instance_uuid>_resize/disk" (owned by root, inaccessible, and the wrong image).

As a temporary workaround, we can prevent this issue by mounting the NFS share with the "noac" option, which (from the man page) "forces application writes to become synchronous so that local changes to a file become visible on the server immediately." A definitive fix would be to make the resize task wait until the file is available on the destination host before doing any operation on it (which can take up to one minute).

I'll test this on Havana too, because it is likely to be affected as well. I've been able to reproduce this in Havana too.
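The "definitive fix" proposed above — having the resize task wait until the destination host can actually see the new disk — could be sketched roughly as below. This is an illustration only, not Nova's actual code; the function name, defaults, and the directory-listing trick (which forces the NFS client to revalidate names) are assumptions drawn from this discussion:

```python
import os
import time

def wait_for_file(path, timeout=60.0, interval=1.0):
    """Poll until `path` exists or `timeout` seconds elapse.

    Each attempt lists the parent directory first, because (as noted
    above) an NFS client refreshes its name cache on a directory
    listing but not on a plain file access.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # Force the NFS client to revalidate the directory's entries,
        # working around the stale name->inode binding described above.
        os.listdir(os.path.dirname(path) or ".")
        if os.path.exists(path):
            return True
        time.sleep(interval)
    return False
```

The 60-second default mirrors the upper bound of the NFS attribute-cache revalidation window (acregmax/acdirmax) mentioned above.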
I have two hosts installed with devstack in a multi-node configuration, with the directory /opt/stack/data/nova/instances/ shared using NFS. When performing a resize I get the same error as reported in this BZ:

qemu-img: Could not open '/opt/stack/data/nova/instances/7dbeb7f2-39e2-4f1d-8228-0b7a84d27745/disk': Permission denied

I'll open a bug upstream and propose a fix (probably: wait until the file is available before doing anything with it).

Created bug upstream.

Adding regressions to next bugfix release.

Last comment from the upstream review, which got abandoned: "There's a much simpler solution here. We should simply recommend "lookupcache=none" be set as an NFS mount option." Is this better than the currently suggested "noac" in the relnote here?

Sorry for the late response; I forgot about this and got a notification today :/ This error is reproducible in icehouse as well, but when setting up NFS with either the "lookupcache=none" or the "noac" option in fstab, nova resize works. I really can't tell which option is better. According to the NFS man page both methods have performance penalties. Maybe "lookupcache=none" is the better option (since *noac extracts a significant performance penalty*). From the man page:

- If lookupcache=none is specified, the client revalidates both types of directory cache entries before an application can use them. This permits quick detection of files that were created or removed by other clients, but can impact application and server performance.
- Using the noac option provides greater cache coherence among NFS clients accessing the same files, but it extracts a significant performance penalty. As such, judicious use of file locking is encouraged instead.

(In reply to Vladan Popovic from comment #14)
> I really can't tell which option is better. According to the nfs manpage
> both methods have performance penalties. Maybe "lookupcache=none" seems like
> a better option (since *noac extracts a significant performance penalty*).
Some webapp benchmarks that could back this: http://www.sebastien-han.fr/blog/2012/12/18/noac-performance-impact-on-web-applications/

It doesn't really change anything for openstack, if this benchmark means something, that is:

$ for i in {1..5}; do time nova resize test$i m1.mini --poll; done

noac | lookupcache=none
---|---
real 0m11.421s | real 0m11.413s
user 0m0.438s | user 0m0.430s
sys 0m0.045s | sys 0m0.048s
real 0m11.461s | real 0m11.822s
user 0m0.451s | user 0m0.479s
sys 0m0.056s | sys 0m0.046s
real 0m11.511s | real 0m11.734s
user 0m0.452s | user 0m0.437s
sys 0m0.061s | sys 0m0.048s
real 0m11.349s | real 0m11.288s
user 0m0.435s | user 0m0.417s
sys 0m0.060s | sys 0m0.045s
real 0m11.373s | real 0m11.365s
user 0m0.416s | user 0m0.412s
sys 0m0.049s | sys 0m0.052s

Hi Don, yes, this is applicable to RHOS 4 and 5. I'll clone it to both releases on the doc-Release_Notes component.

It didn't help, and I found another problem: https://bugs.launchpad.net/nova/+bug/1337760 — I have the share mounted as:

str-02.foo.bar.redhat.com:/mnt/export/nfs/10/lithium/libvirt on /var/lib/nova/instances type nfs4 (rw,relatime,vers=4.0,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=172.16.32.25,lookupcache=none,local_lock=none,addr=1.2.3.204)

In accordance with the Red Hat Enterprise Linux OpenStack Platform Support Policy, the one-year life cycle of Production Support for version 3 will end on July 31, 2014. On August 1, 2014, Red Hat Enterprise Linux OpenStack Platform version 3 will enter an inactive state and will no longer receive updated packages, including Critical-impact security patches or urgent-priority bug fixes. In addition, technical support through Red Hat's Global Support Services will no longer be provided after this date.
We encourage customers to plan their migration from Red Hat Enterprise Linux OpenStack Platform 3.0 to a supported version of Red Hat Enterprise Linux OpenStack Platform. To upgrade to Red Hat Enterprise Linux OpenStack Platform version 4, see the "Upgrading" chapter in the Release Notes document linked in the References section. Full details of the Red Hat Enterprise Linux OpenStack Platform life cycle can be found at https://access.redhat.com/support/policy/updates/openstack/platform/

https://rhn.redhat.com/errata/RHSA-2014-0995.html
Created attachment 790854 [details]: nova show

Description of problem:
nova resize of instances on a shared /var/lib/nova/instances fails with:
Stderr: "qemu-img: Could not open '/var/lib/nova/instances/c4beef19-c8c3-43b9-95eb-9e912a99480a/disk'"

Version-Release number of selected component (if applicable):
openstack-nova-api-2013.1.3-3.el6ost.noarch

How reproducible: always

Steps to Reproduce:
1. have /var/lib/nova/instances shared between the compute nodes
2. have distributed ssh keys and have resizing allowed
3. nova boot cirros --image cirros1 --flavor m1.tiny
4. nova resize cirros m1.small
5. nova show cirros

Actual results: ERRORS in log, see attachment.

Expected results: no problem

Additional info: