Bug 1594261 - [OSP14] starting nova_compute docker on a new compute node makes guest disks on nfs share read-only
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 12.0 (Pike)
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: beta
Target Release: 14.0 (Rocky)
Assignee: Ollie Walsh
QA Contact: Joe H. Rahme
URL:
Whiteboard:
Depends On:
Blocks: 1603538 1608913
 
Reported: 2018-06-22 13:10 UTC by Sven Michels
Modified: 2021-12-10 16:27 UTC (History)
CC List: 27 users

Fixed In Version: openstack-tripleo-heat-templates-9.0.0-0.20180919080946.0rc1.0rc1.el7ost
Doc Type: Bug Fix
Doc Text:
Prior to this update, with shared storage for /var/lib/nova/instances (such as NFS), restarting the nova_compute container on any compute node resulted in an owner/group change of the instances' virtual ephemeral disks and console.log. As a result, instances lost access to their virtual ephemeral disks and stopped working. The method used to modify the ownership of the instance files in /var/lib/nova/instances has been improved to target only the necessary files and directories. Instances no longer lose access to their files when nova_compute is restarted.
Clone Of:
: 1603538 1608913 (view as bug list)
Environment:
Last Closed: 2019-01-11 11:50:06 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Launchpad 1778465 0 None None None 2018-06-25 06:35:09 UTC
OpenStack gerrit 577855 0 None MERGED Improve nova statedir ownership logic 2021-02-09 17:48:42 UTC
Red Hat Product Errata RHEA-2019:0045 0 None None None 2019-01-11 11:50:31 UTC

Description Sven Michels 2018-06-22 13:10:10 UTC
Description of problem:
When starting the nova_compute container on a compute node that has /var/lib/nova/instances shared between multiple nodes, all running instances get their disks put into read-only mode.

Version-Release number of selected component (if applicable):


How reproducible:
Test environment:
1 Hypervisor Host for Infrastructure
2 Compute Nodes
1 NetApp (maybe other NFS servers would work, too).

All nodes in our setup were predeployed, but this should be reproducible on normal installations as well.

Install: RHOSP12, 3 Controllers, 2 Compute. Containerized Environment.
The NFS share was added before the actual RHOSP installation, but from my tests it should also work when adding it later.

After the installation, we prepare only one node to receive VMs, so that the test resembles a scale-out (it's just faster).
So, if not predeployed: stop all docker containers, add the NFS share to /var/lib/nova/instances on compute1, and start the containers on compute1.

After that, and a bit of OpenStack preparation, you should be able to start a VM.
Inside the VM, we ran bonnie++ to put I/O load on the root disk:
bonnie++ -d /root -c 1 -s 8100 -x 2000 -f -b -u root

Now add the second compute node by adding the share to it (this should work without issues). After that, start the containers one by one, but leave nova_compute as the last one to start. Before you start nova_compute, you can run something like:
while true; do date; dmesg | grep -i error; sleep 1; done
in the VM's console. This prints the time every second and checks dmesg for errors.

If you're ready, start nova_compute and watch the VM go read-only.

Steps to Reproduce:
If you did the above, an easy way to reproduce (see the sketch after these steps):
1. stop nova_compute on node2
2. start a new VM as described above
3. start nova_compute on node2
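
(A consolidated sketch of the flow above, for convenience. Host names and the image/flavor/network names are illustrative placeholders, not values from this environment.)

    # On compute node 2: make sure nova_compute is stopped
    [root@compute02 ~]# docker stop nova_compute

    # Boot a fresh test VM (it lands on compute node 1) and start I/O inside it,
    # e.g. the bonnie++ run or the dmesg loop shown above
    (overcloud) $ openstack server create --image <image> --flavor <flavor> --network <network> testvm

    # On compute node 2: start nova_compute and watch the VM's disk go read-only
    [root@compute02 ~]# docker start nova_compute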

Actual results:
The guest (we tested with different guest images!) gets I/O errors and its root disk goes read-only until you power it off and back on.

Expected results:
Guests on other hypervisors (but same share) are not affected at all.

Additional info:
We hit a problem in one of our customers' environments when we scaled two new compute nodes into it. The OpenStack environment consists of four availability zones; two of them got a node added. After the scaling, we got reports about VMs reporting their root disks as read-only. We noticed that out of the four zones only two were affected, but within those zones all VMs were affected.
After some investigation, we noticed that the NFS share is the only commonality between all affected nodes/VMs. The NFS share is added before RHOSP is installed, so it's not part of the RHOSP installation. To verify, we removed the share from a new compute node, scaled that node into the environment, and nothing happened. So it must somehow be related to the share. We then started to test only the tasks related to the share, and finally we were able to reproduce it by simply starting nova_compute.

The customer plans to put RHOSP12 into production at the beginning of July. This would currently be a showstopper.

Some info:
# docker start nova_compute
nova_compute
# 

/var/log/messages:
2018-06-22 14:07:57 +02:00 compute02 kern.warning kernel: [79615.684979] overlayfs: upperdir is in-use by another mount, accessing files from both mounts will result in undefined behavior.
2018-06-22 14:07:57 +02:00 compute02 kern.warning kernel: [79615.684992] overlayfs: workdir is in-use by another mount, accessing files from both mounts will result in undefined behavior.
2018-06-22 14:07:57 +02:00 compute02 user.debug oci-systemd-hook[304421]:  systemdhook <debug>: ae399472d6c8: Skipping as container command is kolla_start, not init or systemd
2018-06-22 14:07:57 +02:00 compute02 user.debug oci-umount[304422]:  umounthook <debug>: prestart container_id:ae399472d6c8 rootfs:/var/lib/docker/overlay2/6324b25985ec8262ad5bf750e9ae3fb96dda4b45a15901ff5242b71c77d93498/merged


Same moment on the Guest:
/var/log/messages:
2018-06-22 14:07:57 +02:00 (none) kern.err kernel: [ 1760.124936] end_request: I/O error, dev vda, sector 17793208
2018-06-22 14:07:57 +02:00 (none) kern.err kernel: [ 1760.124940] Buffer I/O error on device vda1, logical block 2223895
2018-06-22 14:07:57 +02:00 (none) kern.warning kernel: [ 1760.124942] lost page write due to I/O error on vda1
2018-06-22 14:07:58 +02:00 (none) kern.err kernel: [ 1760.353043] end_request: I/O error, dev vda, sector 17793208
2018-06-22 14:07:58 +02:00 (none) kern.err kernel: [ 1760.353046] Buffer I/O error on device vda1, logical block 2223895
2018-06-22 14:07:58 +02:00 (none) kern.warning kernel: [ 1760.353048] lost page write due to I/O error on vda1
2018-06-22 14:07:58 +02:00 (none) kern.err kernel: [ 1760.353498] end_request: I/O error, dev vda, sector 17793208
2018-06-22 14:07:58 +02:00 (none) kern.err kernel: [ 1760.353500] Buffer I/O error on device vda1, logical block 2223895
2018-06-22 14:07:58 +02:00 (none) kern.warning kernel: [ 1760.353501] lost page write due to I/O error on vda1
2018-06-22 14:07:58 +02:00 (none) kern.err kernel: [ 1760.353916] end_request: I/O error, dev vda, sector 17793208
2018-06-22 14:07:58 +02:00 (none) kern.err kernel: [ 1760.353917] Buffer I/O error on device vda1, logical block 2223895
2018-06-22 14:07:58 +02:00 (none) kern.warning kernel: [ 1760.353918] lost page write due to I/O error on vda1
2018-06-22 14:07:58 +02:00 (none) kern.err kernel: [ 1760.357556] end_request: I/O error, dev vda, sector 17793208
2018-06-22 14:07:58 +02:00 (none) kern.err kernel: [ 1760.357558] Buffer I/O error on device vda1, logical block 2223895
2018-06-22 14:07:58 +02:00 (none) kern.warning kernel: [ 1760.357559] lost page write due to I/O error on vda1
2018-06-22 14:07:58 +02:00 (none) kern.err kernel: [ 1760.357963] end_request: I/O error, dev vda, sector 17793208
2018-06-22 14:07:58 +02:00 (none) kern.err kernel: [ 1760.357965] Buffer I/O error on device vda1, logical block 2223895
2018-06-22 14:07:58 +02:00 (none) kern.warning kernel: [ 1760.357966] lost page write due to I/O error on vda1
2018-06-22 14:07:58 +02:00 (none) kern.err kernel: [ 1760.358420] end_request: I/O error, dev vda, sector 17793208
2018-06-22 14:07:58 +02:00 (none) kern.err kernel: [ 1760.358423] Buffer I/O error on device vda1, logical block 2223895
2018-06-22 14:07:58 +02:00 (none) kern.warning kernel: [ 1760.358425] lost page write due to I/O error on vda1
2018-06-22 14:07:58 +02:00 (none) kern.err kernel: [ 1760.358932] end_request: I/O error, dev vda, sector 17793208
2018-06-22 14:07:58 +02:00 (none) kern.err kernel: [ 1760.358935] Buffer I/O error on device vda1, logical block 2223895
2018-06-22 14:07:58 +02:00 (none) kern.warning kernel: [ 1760.358937] lost page write due to I/O error on vda1
2018-06-22 14:07:58 +02:00 (none) kern.err kernel: [ 1760.359378] end_request: I/O error, dev vda, sector 17793208
2018-06-22 14:07:58 +02:00 (none) kern.err kernel: [ 1760.359380] Buffer I/O error on device vda1, logical block 2223895
2018-06-22 14:07:58 +02:00 (none) kern.warning kernel: [ 1760.359381] lost page write due to I/O error on vda1
2018-06-22 14:07:58 +02:00 (none) kern.err kernel: [ 1760.359827] end_request: I/O error, dev vda, sector 17793208
2018-06-22 14:07:58 +02:00 (none) kern.err kernel: [ 1760.359830] Buffer I/O error on device vda1, logical block 2223895
2018-06-22 14:07:58 +02:00 (none) kern.warning kernel: [ 1760.359832] lost page write due to I/O error on vda1
2018-06-22 14:07:58 +02:00 (none) kern.err kernel: [ 1760.360345] end_request: I/O error, dev vda, sector 17793208
2018-06-22 14:07:58 +02:00 (none) kern.err kernel: [ 1760.360790] end_request: I/O error, dev vda, sector 17793208

Comment 1 Sven Michels 2018-06-22 13:19:07 UTC
The NetApp in this case is a FAS6290 running Ontap Release 8.2.3P6

Comment 2 Matthew Booth 2018-06-22 13:54:59 UTC
(In reply to Sven Michels from comment #0)
> /var/log/messages:
> 2018-06-22 14:07:57 +02:00 compute02 kern.warning kernel: [79615.684979]
> overlayfs: upperdir is in-use by another mount, accessing files from both
> mounts will result in undefined behavior.

I'm pretty sure that's the problem right there. overlayfs can't be involved here or, as it says, the behaviour is undefined. This needs to be mounted directly. Presumably docker on the source is detecting that the directory is mounted elsewhere and is marking it read-only for safety.

When qemu writes to a disk, that write needs to go directly to the filer with no overlayfs involved. The docker containers need to be configured to achieve that; that's what we need to do.

Presumably this is still a compute bug, but nova isn't involved.

Comment 5 Sven Michels 2018-06-22 15:44:30 UTC
As Irina pointed out, we're using RHEL7.5 and according to this doc:
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/7.5_release_notes/technology_previews_file_systems

overlayfs is tech preview and only supported under some circumstances. One note I found:
Only XFS is currently supported for use as a lower layer file system.

The nodes are running on ext4. I'm not sure how this relates to the issue, but Irina and I agreed that it might be interesting.

Comment 6 Ollie Walsh 2018-06-22 18:08:52 UTC
Can you retest with the nfs export mounted on /var/lib/nova instead of /var/lib/nova/instances?

This appears to be safe, looking at the state_path docs in 
 https://docs.openstack.org/ocata/config-reference/compute/config-options.html: "In some scenarios (for example migrations) it makes sense to use a storage location which is shared between multiple compute hosts (for example via NFS)"
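
For reference, a minimal sketch of such a mount in /etc/fstab (server name, export path and options are illustrative, not taken from this environment):

    nfs-server.example.com:/nfs/nova   /var/lib/nova   nfs   defaults,_netdev   0 0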

Comment 7 Sven Michels 2018-06-22 21:04:10 UTC
Hey Ollie,

I tested the suggestion. I stopped all the docker containers and VMs, created an instances directory on the share, moved everything in there, unmounted the share, moved /var/lib/nova away, created a new directory, set permissions, changed fstab, mounted /var/lib/nova and started docker again. I also made sure a fresh new VM was used for testing. After the VM was up and running, I started the containers on the second compute node again. The result was the same: everything was normal until I started the nova_compute container, then the VM on the other node immediately got a read-only disk. So the change didn't change much, sorry :(

Comment 8 Daniel Walsh 2018-06-23 10:00:01 UTC
I believe a conflict between overlayfs and NFS is causing this issue.

Adding Vivek to see what he thinks.

Comment 9 Luca Miccini 2018-06-23 13:28:30 UTC
I've been able to reproduce this with a fresh deployment (1 controller + 2 computes) using the latest docker images (12.0-20180529.1) and a RHEL NFS server:

(undercloud) [stack@undercloud-12 ~]$ cat /etc/exports
/nfs/nova	*(rw,sync,no_root_squash,no_all_squash)
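
(The corresponding client-side mount on each compute node would be something along these lines; the exact mount command and options are not shown in this report and are given here only as an illustration:)

    [root@overcloud-compute-0 ~]# mount -t nfs undercloud-12:/nfs/nova /var/lib/nova/instances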

here is what happens to a cirros instance running on compute-0:

$ 
$ while true; do echo "1" > /tmp/iotest && sleep 2; done
[ 1989.829606] end_request: I/O error, dev vda, sector 64899
[ 1989.833039] Buffer I/O error on device vda1, logical block 24417
[ 1989.833039] lost page write due to I/O error on vda1
[ 1989.833039] end_request: I/O error, dev vda, sector 46791
[ 1989.833039] Buffer I/O error on device vda1, logical block 15363
[ 1989.833039] lost page write due to I/O error on vda1
[ 1989.910484] JBD: Detected IO errors while flushing file data on vda1
[ 1989.932504] end_request: I/O error, dev vda, sector 51025
[ 1989.945224] Aborting journal on device vda1.
[ 1989.955549] EXT3-fs (vda1): error: ext3_journal_start_sb: Detected aborted journal
[ 1989.970686] end_request: I/O error, dev vda, sector 49351
[ 1989.974598] Buffer I/O error on device vda1, logical block 16643
[ 1989.974598] lost page write due to I/O error on vda1
[ 1990.006547] EXT3-fs (vda1): error: remounting filesystem read-only
[ 1990.019265] JBD: I/O error detected when updating journal superblock for vda1.
[ 1990.038305] end_request: I/O error, dev vda, sector 26931
[ 1990.042271] Buffer I/O error on device vda1, logical block 5433
[ 1990.042271] lost page write due to I/O error on vda1
[ 1990.072201] JBD: Detected IO errors while flushing file data on vda1
-sh: can't create /tmp/iotest: Read-only file system

the filesystem remains accessible read/write on the underlying host:

[root@overcloud-compute-0 ~]# touch /var/lib/nova/instances/a
[root@overcloud-compute-0 ~]# echo a > /var/lib/nova/instances/a 
[root@overcloud-compute-0 ~]# 

My deployment is using standard images, so XFS on the overcloud (and on the undercloud/NFS server, FWIW).

Stopping and starting the VM restores read/write access to the root disk.

There are no entries in dmesg/messages that correlate with this issue.

Comment 10 Luca Miccini 2018-06-23 15:14:47 UTC
I've run this on compute-0 (where the VM runs) from within /var/lib/nova/instances/:


[root@overcloud-compute-0 instances]# while true; do echo `date` >> /tmp/output && ls -lart * >> /tmp/output && sleep 1; done

Sat Jun 23 15:04:24 UTC 2018

locks:
total 0
-rw-r--r--. 1 42436 42436  0 Jun 23 12:43 nova-e0aa1cba172506b12f79eb056c4c9ea0ae9442b7
drwxr-xr-x. 2 42436 42436 59 Jun 23 12:43 .
drwxrwxrwx. 5 42436 42436 94 Jun 23 15:01 ..

_base:
total 18176
drwxr-xr-x. 2 42436 42436       54 Jun 23 12:44 .
-rw-r--r--. 1 qemu  qemu  41126400 Jun 23 12:44 e0aa1cba172506b12f79eb056c4c9ea0ae9442b7
drwxrwxrwx. 5 42436 42436       94 Jun 23 15:01 ..

49ef880a-d19a-46ab-91f9-239975a31ddc:
total 2588
-rw-r--r--. 1 42436 42436      79 Jun 23 12:43 disk.info
drwxrwxrwx. 5 42436 42436      94 Jun 23 15:01 ..
drwxr-xr-x. 2 42436 42436      54 Jun 23 15:01 .
-rw-r--r--. 1 qemu  qemu  2686976 Jun 23 15:02 disk
-rw-------. 1 root  root    16635 Jun 23 15:04 console.log



Here the VM is running fine; notice the disk is owned by qemu:qemu while console.log is root:root.


I then started the nova_compute container on compute-1, and here is what happens:

Sat Jun 23 15:04:56 UTC 2018

locks:
total 0
-rw-r--r--. 1 42436 42436  0 Jun 23 12:43 nova-e0aa1cba172506b12f79eb056c4c9ea0ae9442b7
drwxr-xr-x. 2 42436 42436 59 Jun 23 12:43 .
drwxrwxrwx. 5 42436 42436 94 Jun 23 15:01 ..

_base:
total 18176
drwxr-xr-x. 2 42436 42436       54 Jun 23 12:44 .
-rw-r--r--. 1 42436 42436 41126400 Jun 23 12:44 e0aa1cba172506b12f79eb056c4c9ea0ae9442b7
drwxrwxrwx. 5 42436 42436       94 Jun 23 15:01 ..

49ef880a-d19a-46ab-91f9-239975a31ddc:
total 2588
-rw-r--r--. 1 42436 42436      79 Jun 23 12:43 disk.info
drwxrwxrwx. 5 42436 42436      94 Jun 23 15:01 ..
drwxr-xr-x. 2 42436 42436      54 Jun 23 15:01 .
-rw-r--r--. 1 42436 42436 2686976 Jun 23 15:02 disk
-rw-------. 1 42436 42436   16657 Jun 23  2018 console.log

disk and console.log are chown'd to 42436:42436 (the uid of the nova user inside the container).
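
(To double-check where 42436 comes from, the mapping can be verified from the host, for example:)

    [root@overcloud-compute-0 ~]# docker exec nova_compute id nova
    uid=42436(nova) gid=42436(nova) ...   # exact group list may differ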

Stopping nova_compute and then stopping/starting the VM brings things back on track:

-rw-------. 1 root  root    16684 Jun 23 15:13 console.log
-rw-r--r--. 1 qemu  qemu  2686976 Jun 23 15:13 disk
-rw-r--r--. 1 42436 42436      79 Jun 23 12:43 disk.info


Sven, can you check whether this is actually happening in the customer environment?

Comment 11 Luca Miccini 2018-06-23 15:30:19 UTC
nova_compute container logs:

INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
INFO:__main__:Validating config file
INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
INFO:__main__:Copying service configuration files
INFO:__main__:Deleting /etc/libvirt/libvirtd.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/libvirt/libvirtd.conf to /etc/libvirt/libvirtd.conf
INFO:__main__:Deleting /etc/libvirt/passwd.db
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/libvirt/passwd.db to /etc/libvirt/passwd.db
INFO:__main__:Deleting /etc/libvirt/qemu.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/libvirt/qemu.conf to /etc/libvirt/qemu.conf
INFO:__main__:Deleting /etc/my.cnf.d/tripleo.cnf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/my.cnf.d/tripleo.cnf to /etc/my.cnf.d/tripleo.cnf
INFO:__main__:Deleting /etc/nova/migration/authorized_keys
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/nova/migration/authorized_keys to /etc/nova/migration/authorized_keys
INFO:__main__:Deleting /etc/nova/migration/identity
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/nova/migration/identity to /etc/nova/migration/identity
INFO:__main__:Deleting /etc/nova/nova.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/nova/nova.conf to /etc/nova/nova.conf
INFO:__main__:Deleting /etc/sasl2/libvirt.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/sasl2/libvirt.conf to /etc/sasl2/libvirt.conf
INFO:__main__:Deleting /etc/ssh/sshd_config
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/ssh/sshd_config to /etc/ssh/sshd_config
INFO:__main__:Deleting /var/lib/nova/.ssh/config
INFO:__main__:Copying /var/lib/kolla/config_files/src/var/lib/nova/.ssh/config to /var/lib/nova/.ssh/config
INFO:__main__:Deleting /etc/ceph/rbdmap
INFO:__main__:Copying /var/lib/kolla/config_files/src-ceph/rbdmap to /etc/ceph/rbdmap
INFO:__main__:Writing out command to execute
INFO:__main__:Setting permission for /var/log/nova
INFO:__main__:Setting permission for /var/log/nova/nova-compute.log
INFO:__main__:Setting permission for /var/lib/nova
INFO:__main__:Setting permission for /var/lib/nova/buckets
INFO:__main__:Setting permission for /var/lib/nova/networks
INFO:__main__:Setting permission for /var/lib/nova/.ssh
INFO:__main__:Setting permission for /var/lib/nova/tmp
INFO:__main__:Setting permission for /var/lib/nova/keys
INFO:__main__:Setting permission for /var/lib/nova/instances
INFO:__main__:Setting permission for /var/lib/nova/.ssh/config
INFO:__main__:Setting permission for /var/lib/nova/instances/49ef880a-d19a-46ab-91f9-239975a31ddc
INFO:__main__:Setting permission for /var/lib/nova/instances/_base
INFO:__main__:Setting permission for /var/lib/nova/instances/locks
INFO:__main__:Setting permission for /var/lib/nova/instances/a
INFO:__main__:Setting permission for /var/lib/nova/instances/b
INFO:__main__:Setting permission for /var/lib/nova/instances/49ef880a-d19a-46ab-91f9-239975a31ddc/disk.info
INFO:__main__:Setting permission for /var/lib/nova/instances/49ef880a-d19a-46ab-91f9-239975a31ddc/disk
INFO:__main__:Setting permission for /var/lib/nova/instances/49ef880a-d19a-46ab-91f9-239975a31ddc/console.log
INFO:__main__:Setting permission for /var/lib/nova/instances/_base/e0aa1cba172506b12f79eb056c4c9ea0ae9442b7
INFO:__main__:Setting permission for /var/lib/nova/instances/locks/nova-e0aa1cba172506b12f79eb056c4c9ea0ae9442b7
Running command: '/usr/bin/nova-compute --config-file /etc/nova/nova.conf --config-file /etc/nova/rootwrap.conf'

IMHO kolla_start should check whether permissions are actually OK before setting them recursively. Thoughts?

Comment 12 Sven Michels 2018-06-23 15:44:18 UTC
Hey Luca,

I can confirm. I saw this yesterday during my tests but didn't really pay attention to it. To be 100% sure, I rechecked:

[root@compute01 41ed0b4b-3445-4757-a6ff-3e13f5f43a86]# ls -la
total 70652
drwxr-xr-x. 2 42436 42436     4096 Jun 23 17:33 .
drwxr-xr-x. 6 42436 42436     4096 Jun 23 17:32 ..
-rw-------. 1 root  root         0 Jun 23 17:33 console.log
-rw-r--r--. 1 qemu  qemu  71630848 Jun 23 17:36 disk
-rw-r--r--. 1 qemu  qemu    475136 Jun 23 17:33 disk.config
-rw-r--r--. 1 42436 42436       79 Jun 23 17:32 disk.info

[root@compute02 ~]# docker start nova_compute
nova_compute
[root@compute02 ~]#

[root@compute01 41ed0b4b-3445-4757-a6ff-3e13f5f43a86]# ls -la
total 70716
drwxr-xr-x. 2 42436 42436     4096 Jun 23 17:33 .
drwxr-xr-x. 6 42436 42436     4096 Jun 23 17:32 ..
-rw-------. 1 42436 42436        0 Jun 23 17:33 console.log
-rw-r--r--. 1 42436 42436 71696384 Jun 23 17:36 disk
-rw-r--r--. 1 42436 42436   475136 Jun 23 17:33 disk.config
-rw-r--r--. 1 42436 42436       79 Jun 23 17:32 disk.info
[root@d100siul0552 41ed0b4b-3445-4757-a6ff-3e13f5f43a86]# 


So the permissions are changed after starting the compute service.

I agree that this should be done carefully, but I'm also not sure the first set of permissions is actually the correct one. The 42436 id is usually the nova uid from the container, and everything being owned by this id made more sense to me at first glance, because otherwise we have two or three different owners here:
the container's nova uid for the info file, qemu for the disk, and root for the log.

Cheers,
Sven

Comment 13 Ollie Walsh 2018-06-23 15:49:12 UTC
The first set is correct: qemu runs the VMs so it needs to own the disk, and libvirt (root) manages the logs.

This is the culprit - https://github.com/openstack/tripleo-heat-templates/blob/stable/pike/docker/services/nova-compute.yaml#L122

I assume this was added to handle upgrades from baremetal OSP11 to docker OSP12, as the nova uid/gid is not the same on the host and in the kolla images. It should be safe to remove this to work around the issue.

Comment 14 Luca Miccini 2018-06-23 16:04:52 UTC
(In reply to Ollie Walsh from comment #13)
> The first set is correct: qemu runs the VMs so it needs to own the disk, and
> libvirt (root) manages the logs.
> 
> This is the culprit -
> https://github.com/openstack/tripleo-heat-templates/blob/stable/pike/docker/
> services/nova-compute.yaml#L122
> 
> I assume this was added to handle upgrades from baremetal OSP11 to docker
> OSP12, as the nova uid/gid is not the same on the host and in the kolla
> images. It should be safe to remove this to work around the issue.

Thanks Ollie!

If it makes sense for upgrades, we could maybe limit the recursion depth to "1" (not sure if that's feasible)?

If we prevent the chown from reaching the disks/console.log, it should be safe to keep.

Comment 15 Ollie Walsh 2018-06-23 16:16:23 UTC
kolla doesn't currently have a max-depth option.

Also I don't think it would be sufficient - we need a recursive chown, but only on the files currently owned by the host nova user.
We also don't need to do this on every nova-compute start, just once during upgrade.

I'll figure something out on Monday...
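
(For illustration only, and not the actual patch: one possible shape of that logic is GNU chown's --from option, which restricts a recursive chown to entries still owned by the old host nova uid/gid and leaves the qemu- and root-owned instance files untouched. The host uid below is an example value.)

    # Sketch only: chown entries still owned by the host nova user, skip the rest.
    HOST_NOVA="162:162"       # nova uid:gid on the baremetal host (example value)
    KOLLA_NOVA="42436:42436"  # nova uid:gid inside the kolla image (see the listings above)
    chown -R -v --from="$HOST_NOVA" "$KOLLA_NOVA" /var/lib/nova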

Comment 16 Luca Miccini 2018-06-23 16:30:32 UTC
(In reply to Ollie Walsh from comment #15)
> kolla doesn't currently have a max-depth option.
> 
> Also I don't think it would be sufficient - we need a recursive chown, but
> only on the files currently owned by the host nova user.
> We also don't need to do this on every nova-compute start, just once during
> upgrade.
> 
> I'll figure something out on Monday...

awesome, thank you very much! And have a nice rest of the weekend :)

@Sven:

for a quick & dirty workaround, without redeployment and without touching the containers, you can edit /var/lib/kolla/config_files/nova_compute.json

the original should look similar to:

{"config_files": [{"dest": "/", "merge": true, "source": "/var/lib/kolla/config_files/src/*", "preserve_properties": true}, {"dest": "/etc/ceph/", "merge": true, "source": "/var/lib/kolla/config_files/src-ceph/", "preserve_properties": true}], "command": "/usr/bin/nova-compute --config-file /etc/nova/nova.conf --config-file /etc/nova/rootwrap.conf", "permissions": [{"owner": "nova:nova", "path": "/var/log/nova", "recurse": true}, {"owner": "nova:nova", "path": "/var/lib/nova", "recurse": true}, {"owner": "nova:nova", "path": "/etc/ceph/ceph.client.openstack.keyring", "perm": "0600"}]}


I've modified it like this:

{"config_files": [{"dest": "/", "merge": true, "source": "/var/lib/kolla/config_files/src/*", "preserve_properties": true}, {"dest": "/etc/ceph/", "merge": true, "source": "/var/lib/kolla/config_files/src-ceph/", "preserve_properties": true}], "command": "/usr/bin/nova-compute --config-file /etc/nova/nova.conf --config-file /etc/nova/rootwrap.conf", "permissions": [{"owner": "nova:nova", "path": "/var/log/nova", "recurse": true}, {"owner": "nova:nova", "path": "/var/lib/nova", "recurse": false}, {"owner": "nova:nova", "path": "/var/lib/nova/instances/*", "recurse": false}, {"owner": "nova:nova", "path": "/var/lib/nova/buckets", "recurse": false}, {"owner": "nova:nova", "path": "/var/lib/nova/keys", "recurse": false}, {"owner": "nova:nova", "path": "/var/lib/nova/networks", "recurse": false}, {"owner": "nova:nova", "path": "/var/lib/nova/tmp", "recurse": false}, {"owner": "nova:nova", "path": "/etc/ceph/ceph.client.openstack.keyring", "perm": "0600"}]}


Alternatively you can remove the /var/lib/nova paths altogether, I guess, or set recurse to false. I just included all the subdirs of /var/lib/nova for testing purposes.
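
(The same modified "permissions" list as in the one-line JSON above, pretty-printed here purely for readability; no content changes:)

    "permissions": [
        {"owner": "nova:nova", "path": "/var/log/nova", "recurse": true},
        {"owner": "nova:nova", "path": "/var/lib/nova", "recurse": false},
        {"owner": "nova:nova", "path": "/var/lib/nova/instances/*", "recurse": false},
        {"owner": "nova:nova", "path": "/var/lib/nova/buckets", "recurse": false},
        {"owner": "nova:nova", "path": "/var/lib/nova/keys", "recurse": false},
        {"owner": "nova:nova", "path": "/var/lib/nova/networks", "recurse": false},
        {"owner": "nova:nova", "path": "/var/lib/nova/tmp", "recurse": false},
        {"owner": "nova:nova", "path": "/etc/ceph/ceph.client.openstack.keyring", "perm": "0600"}
    ]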

Comment 17 Sven Michels 2018-06-25 09:52:46 UTC
Hey there,

As the JSON fix would be at risk when they run a stack update, I would like to adjust the template instead. So my proposal would be to modify docker/services/nova-compute.yaml:

103             permissions:
104               - path: /var/log/nova
105                 owner: nova:nova
106                 recurse: true
107               - path: /var/lib/nova
108                 owner: nova:nova
109                 recurse: true


and remove line 109, or change it to false. This way we can deploy and the nasty part should be gone. We will test it with the affected customer, and if that works for them, we're done, because they can't wait for an official fix.

Ollie: would you agree with this?

Thanks and cheers,
Sven

Comment 18 Rhys Oxenham 2018-06-25 11:55:42 UTC
I have reproduced this from #16 and can confirm that dropping the recursive chown on /var/lib/nova/instances fixes this problem.

Comment 19 Matthew Booth 2018-06-25 12:21:13 UTC
After looking at a reproducer system, firstly I can confirm that overlayfs isn't involved here. This was my primary concern, as this would be a data integrity issue.

The issue appears to relate to how NFS manages open file handles when permissions change. I did the following quick reproducer:

$ touch foo; tail -f foo

# date >> foo

Observe that the unprivileged tail can read the data written by root.

# chown root.root foo; chmod 600 foo; date >> foo

Observe that even though the unprivileged tail no longer has permissions to read the file, it can still read the new data written by root.

The above test fails on NFS, though, with:

tail: error reading 'foo': Input/output error

Which matches what we see from VMs when we change the file permissions.
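
(The same reproducer consolidated in one place; "shell 1" is unprivileged, "shell 2" is root, both in the same NFS-mounted directory.)

    # shell 1 (unprivileged): keep an open read handle on the file
    touch foo
    tail -f foo

    # shell 2 (root): append, then change ownership/permissions and append again
    date >> foo                            # tail in shell 1 prints the new line
    chown root.root foo; chmod 600 foo
    date >> foo                            # local filesystem: tail still prints the data
                                           # NFS: tail: error reading 'foo': Input/output error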

Comment 20 Ollie Walsh 2018-06-25 16:19:37 UTC
With the patch from https://review.openstack.org/577855:

[root@overcloud-novacompute-0 nova]# docker logs nova_statedir_owner
ownership of '/var/lib/nova' retained as nova:nova
ownership of '/var/lib/nova/buckets' retained as nova:nova
ownership of '/var/lib/nova/.ssh' retained as nova:nova
ownership of '/var/lib/nova/.ssh/config' retained as nova:nova
ownership of '/var/lib/nova/keys' retained as nova:nova
ownership of '/var/lib/nova/instances' retained as nova:nova
ownership of '/var/lib/nova/instances/_base' retained as nova:nova
ownership of '/var/lib/nova/instances/locks' retained as nova:nova
ownership of '/var/lib/nova/instances/0d0dd47b-0354-404a-9c20-a6049d5ac103' retained as nova:nova
ownership of '/var/lib/nova/instances/0d0dd47b-0354-404a-9c20-a6049d5ac103/disk.info' retained as nova:nova
ownership of '/var/lib/nova/tmp' retained as nova:nova
ownership of '/var/lib/nova/networks' retained as nova:nova
ownership of '/var/lib/nova/.bash_history' retained as nova:nova
changed ownership of '/var/lib/nova/foo' from root:root to nova:nova

Comment 27 Vivek Goyal 2018-11-05 19:23:52 UTC
(In reply to Matthew Booth from comment #2)
> (In reply to Sven Michels from comment #0)
> > /var/log/messages:
> > 2018-06-22 14:07:57 +02:00 compute02 kern.warning kernel: [79615.684979]
> > overlayfs: upperdir is in-use by another mount, accessing files from both
> > mounts will result in undefined behavior.
> 
> I'm pretty sure that's the problem right there. overlayfs can't be involved
> here or, as it says, the behaviour is undefined. This needs to be mounted
> directly. Presumably docker on the source is detecting that the directory is
> mounted elsewhere and is marking it read-only for safety.

I doubt that this is causing the problem you are seeing, because overlayfs either denies the mount or just warns (it does not make the mount read-only). So while the leaked mount is a problem, it's a different issue.


What about all these errors from the disk (vda)? I don't understand the configuration fully, but that seems to be part of the problem. I am assuming that the VM images are on NFS and show up as vda in the guest. So is that error happening because NFS is read-only?

Also, overlayfs should not have anything to do with an NFS mount. Can somebody explain what the correlation here is between NFS and overlayfs?

Comment 28 Ollie Walsh 2018-11-05 20:34:00 UTC
(In reply to Vivek Goyal from comment #27)
> (In reply to Matthew Booth from comment #2)
> > (In reply to Sven Michels from comment #0)
> > > /var/log/messages:
> > > 2018-06-22 14:07:57 +02:00 compute02 kern.warning kernel: [79615.684979]
> > > overlayfs: upperdir is in-use by another mount, accessing files from both
> > > mounts will result in undefined behavior.
> > 
> > I'm pretty sure that's the problem right there. overlayfs can't be involved
> > here or, as it says, the behaviour is undefined. This needs to be mounted
> > directly. Presumably docker on the source is detecting that the directory is
> > mounted elsewhere and is marking it read-only for safety.
> 
> I doubt that this is causing the problem you are seeing, because overlayfs
> either denies the mount or just warns (it does not make the mount read-only).
> So while the leaked mount is a problem, it's a different issue.
> 
> 
> What about all these errors from the disk (vda)? I don't understand the
> configuration fully, but that seems to be part of the problem. I am assuming
> that the VM images are on NFS and show up as vda in the guest. So is that
> error happening because NFS is read-only?
> 
> Also, overlayfs should not have anything to do with an NFS mount. Can somebody
> explain what the correlation here is between NFS and overlayfs?

Did you read on? It had nothing to do with overlayfs. The root cause was a recursive chown combined with the fact that NFS isn't fully POSIX (it results in I/O errors on open file handles).

Comment 35 errata-xmlrpc 2019-01-11 11:50:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:0045

