Bug 1487920 - Encrypted volume attachment to instance failed in containerize environment
Summary: Encrypted volume attachment to instance failed in containerize environment
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
urgent
high
Target Milestone: rc
: 12.0 (Pike)
Assignee: Eric Harney
QA Contact: Avi Avraham
Don Domingo
URL:
Whiteboard:
: 1478161 (view as bug list)
Depends On:
Blocks: 1285089 1479751
TreeView+ depends on / blocked
 
Reported: 2017-09-03 11:12 UTC by Avi Avraham
Modified: 2019-09-09 14:35 UTC (History)
29 users (show)

Fixed In Version: openstack-tripleo-heat-templates-7.0.3-0.20171024200824.el7ost
Doc Type: Known Issue
Doc Text:
Encrypted volumes cannot attach correctly to instances in containerized environments. The Compute service runs "cryptsetup luksOpen", which waits for the udev device creation process to finish. This process does not actually finish, which causes the command to hang. Workaround: Restart the containerized Compute service with the docker option "--ipc=host".
Clone Of:
Environment:
Last Closed: 2017-12-13 22:02:49 UTC
Target Upstream Version:


Attachments (Terms of Use)
Log from nova server (1.77 MB, text/plain)
2017-09-03 11:12 UTC, Avi Avraham
no flags Details
nova-compute debug log (919.26 KB, application/x-gzip)
2017-10-25 12:40 UTC, Avi Avraham
no flags Details
strace of cryptsetup which hangs indefinitely (123.54 KB, text/plain)
2017-10-26 14:42 UTC, Matthew Booth
no flags Details
cryptsetup luksOpen --debug output (2.18 KB, text/plain)
2017-11-01 18:52 UTC, Eric Harney
no flags Details
config and logs of verification. (552.67 KB, application/x-gzip)
2017-11-13 09:44 UTC, Tzach Shefi
no flags Details


Links
System ID Priority Status Summary Last Updated
Launchpad 1729419 None None None 2017-11-01 19:49:19 UTC
OpenStack gerrit 517096 None MERGED Set ipc=host for services attaching encrypted volumes 2020-01-24 06:16:12 UTC
OpenStack gerrit 517209 None MERGED Set ipc=host for services attaching encrypted volumes 2020-01-24 06:16:12 UTC
Red Hat Product Errata RHEA-2017:3462 normal SHIPPED_LIVE Red Hat OpenStack Platform 12.0 Enhancement Advisory 2018-02-16 01:43:25 UTC

Description Avi Avraham 2017-09-03 11:12:29 UTC
Created attachment 1321488 [details]
Log from nova server

Description of problem:
Fail to attach encrypted Cinder volume while getting error in nova log 

**********************************************************************
017-09-03 10:35:46.574 1 INFO os_vif [req-702a0989-eddc-4c79-b1d5-2eb576cf58bb - - - - -] Successfully plugged vif VIFBridge(active=True,address=fa:16:3e:f3:dd:4a,bridge_name='qbra737dff6-42',has_traffic_filtering=True,id=a737dff6-420c-4171-b141-79325dc75cbe,network=Network(5222a77c-b210-4eef-82de-8a400b167a76),plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=False,vif_name='tapa737dff6-42')
2017-09-03 10:35:46.629 1 WARNING nova.compute.monitors [req-702a0989-eddc-4c79-b1d5-2eb576cf58bb - - - - -] Excluding nova.compute.monitors.cpu monitor virt_driver. Not in the list of enabled monitors (CONF.compute_monitors).
2017-09-03 10:35:48.687 1 INFO nova.compute.resource_tracker [req-702a0989-eddc-4c79-b1d5-2eb576cf58bb - - - - -] Final resource view: name=compute-0.localdomain phys_ram=6143MB used_ram=4352MB phys_disk=19GB used_disk=0GB total_vcpus=2 used_vcpus=2 pci_stats=[]
2017-09-03 10:36:53.051 1 INFO nova.compute.resource_tracker [req-99a28998-ca4c-4dfd-a090-8674c7da0aeb - - - - -] Final resource view: name=compute-0.localdomain phys_ram=6143MB used_ram=4352MB phys_disk=19GB used_disk=0GB total_vcpus=2 used_vcpus=2 pci_stats=[]
2017-09-03 10:37:04.653 1 INFO nova.compute.manager [req-2ed95e06-e301-49dd-8be4-d442a880231e 5945076c6d1244aea334d49892f27a3c 5c4dec03342e432ebb52043b9c1d1861 - default default] [instance: 0594beef-8e50-4004-aa58-57b05e3867cb] Attaching volume 1b752023-0976-492d-b93e-ad04d3076e34 to /dev/vdc
2017-09-03 10:37:34.749 1 WARNING nova.volume.cinder [req-2ed95e06-e301-49dd-8be4-d442a880231e 5945076c6d1244aea334d49892f27a3c 5c4dec03342e432ebb52043b9c1d1861 - default default] The support for the Cinder API v2 is deprecated, please upgrade to Cinder API v3.
2017-09-03 10:37:34.844 1 INFO oslo.privsep.daemon [req-2ed95e06-e301-49dd-8be4-d442a880231e 5945076c6d1244aea334d49892f27a3c 5c4dec03342e432ebb52043b9c1d1861 - default default] Running privsep helper: ['sudo', 'nova-rootwrap', '/etc/nova/rootwrap.conf', 'privsep-helper', '--config-file', '/etc/nova/nova.conf', '--config-file', '/etc/nova/rootwrap.conf', '--privsep_context', 'os_brick.privileged.default', '--privsep_sock_path', '/tmp/tmpGevJW8/privsep.sock']
2017-09-03 10:37:35.488 1 INFO oslo.privsep.daemon [req-2ed95e06-e301-49dd-8be4-d442a880231e 5945076c6d1244aea334d49892f27a3c 5c4dec03342e432ebb52043b9c1d1861 - default default] Spawned new privsep daemon via rootwrap
2017-09-03 10:37:35.418 108 INFO oslo.privsep.daemon [-] privsep daemon starting
2017-09-03 10:37:35.426 108 INFO oslo.privsep.daemon [-] privsep process running with uid/gid: 0/0
2017-09-03 10:37:35.429 108 INFO oslo.privsep.daemon [-] privsep process running with capabilities (eff/prm/inh): CAP_SYS_ADMIN/CAP_SYS_ADMIN/none
2017-09-03 10:37:35.430 108 INFO oslo.privsep.daemon [-] privsep daemon running as pid 108
2017-09-03 10:37:35.675 1 WARNING nova.volume.cinder [req-2ed95e06-e301-49dd-8be4-d442a880231e 5945076c6d1244aea334d49892f27a3c 5c4dec03342e432ebb52043b9c1d1861 - default default] The support for the Cinder API v2 is deprecated, please upgrade to Cinder API v3.
2017-09-03 10:37:40.081 1 WARNING nova.volume.cinder [req-2ed95e06-e301-49dd-8be4-d442a880231e 5945076c6d1244aea334d49892f27a3c 5c4dec03342e432ebb52043b9c1d1861 - default default] The support for the Cinder API v2 is deprecated, please upgrade to Cinder API v3.
2017-09-03 10:37:40.147 1 INFO os_brick.initiator.connectors.iscsi [req-2ed95e06-e301-49dd-8be4-d442a880231e 5945076c6d1244aea334d49892f27a3c 5c4dec03342e432ebb52043b9c1d1861 - default default] Trying to connect to iSCSI portal 172.17.3.16:3260
2017-09-03 10:37:45.626 1 WARNING nova.keymgr.conf_key_mgr [req-2ed95e06-e301-49dd-8be4-d442a880231e 5945076c6d1244aea334d49892f27a3c 5c4dec03342e432ebb52043b9c1d1861 - default default] This key manager is insecure and is not recommended for production deployments
2017-09-03 10:37:45.627 1 WARNING os_brick.encryptors [req-2ed95e06-e301-49dd-8be4-d442a880231e 5945076c6d1244aea334d49892f27a3c 5c4dec03342e432ebb52043b9c1d1861 - default default] Use of the in tree encryptor class nova.volume.encryptors.luks.LuksEncryptor by directly referencing the implementation class will be blocked in the Queens release of os-brick.
2017-09-03 10:37:45.881 1 WARNING os_brick.encryptors.luks [req-2ed95e06-e301-49dd-8be4-d442a880231e 5945076c6d1244aea334d49892f27a3c 5c4dec03342e432ebb52043b9c1d1861 - default default] isLuks exited abnormally (status 1): Device /dev/sdb is not a valid LUKS device.
Command failed with code 22: Invalid argument
: ProcessExecutionError: Unexpected error while running command.
Command: cryptsetup isLuks --verbose /dev/sdb
Exit code: 1
Stdout: u''
Stderr: u'Device /dev/sdb is not a valid LUKS device.\nCommand failed with code 22: Invalid argument\n'
2017-09-03 10:37:45.882 1 INFO os_brick.encryptors.luks [req-2ed95e06-e301-49dd-8be4-d442a880231e 5945076c6d1244aea334d49892f27a3c 5c4dec03342e432ebb52043b9c1d1861 - default default] /dev/sdb is not a valid LUKS device; formatting device for first use
2017-09-03 10:37:56.528 1 INFO nova.compute.resource_tracker [req-99a28998-ca4c-4dfd-a090-8674c7da0aeb - - - - -] Final resource view: name=compute-0.localdomain phys_ram=6143MB used_ram=4352MB phys_disk=19GB used_disk=0GB total_vcpus=2 used_vcpus=2 pci_stats=[]
*********************************************************************
 

Version-Release number of selected component (if applicable):
openstack-nova-placement-api-16.0.0-0.20170808024016.6d4fc21.el7ost.noarch
openstack-nova-compute-16.0.0-0.20170808024016.6d4fc21.el7ost.noarch
openstack-nova-conductor-16.0.0-0.20170808024016.6d4fc21.el7ost.noarch
puppet-nova-11.3.0-0.20170805105252.30a205c.el7ost.noarch
openstack-nova-common-16.0.0-0.20170808024016.6d4fc21.el7ost.noarch
openstack-nova-api-16.0.0-0.20170808024016.6d4fc21.el7ost.noarch
python-novaclient-9.1.0-0.20170804194758.0a53d19.el7ost.noarch
python-nova-16.0.0-0.20170808024016.6d4fc21.el7ost.noarch
openstack-nova-novncproxy-16.0.0-0.20170808024016.6d4fc21.el7ost.noarch
openstack-nova-console-16.0.0-0.20170808024016.6d4fc21.el7ost.noarch
openstack-nova-scheduler-16.0.0-0.20170808024016.6d4fc21.el7ost.noarch
openstack-nova-migration-16.0.0-0.20170808024016.6d4fc21.el7ost.noarch

How reproducible:


Steps to Reproduce:
0. Create a "fixed_key" and configure it in Cinder and Nova configuration files
1.Create a Cinder encrypted volume 
# openstack type create LUKS
# cinder encryption-type-create \
--cipher aes-xts-plain64 --key_size 256 \
--control_location front-end LUKS \
nova.volume.encryptors.luks.LuksEncryptor
# openstack volume create --type LUKS --size 1 vol1  
2. Create instance  
3.Attach volume to instance  

Actual results:
volume stay in "attaching" state for ever 

Expected results:
Successfully attachment of volume to instance.  

Additional info:
This scenario was successfully tested on non containerize environment

Comment 2 Lee Yarwood 2017-09-04 14:01:04 UTC
Can we get DEBUG nova-compute logs from the environment?

Note that the test here isn't correct for testing RHBZ#1285089 that is associated with _booting_ an instance from a volume that is _already_ encrypted. We should be creating an encrypted volume and directly booting from it.

This appears to either be an issue with c-vol encrypting the original volume _or_ the n-cpu container not having access to the required underlying host block devices. Either way n-cpu DEBUG logs should show us what is at fault here.

Comment 3 Avi Avraham 2017-09-05 10:38:22 UTC
(In reply to Lee Yarwood from comment #2)
> Can we get DEBUG nova-compute logs from the environment?
> 
> Note that the test here isn't correct for testing RHBZ#1285089 that is
> associated with _booting_ an instance from a volume that is _already_
> encrypted. We should be creating an encrypted volume and directly booting
> from it.
> 
> This appears to either be an issue with c-vol encrypting the original volume
> _or_ the n-cpu container not having access to the required underlying host
> block devices. Either way n-cpu DEBUG logs should show us what is at fault
> here.

This Issue been revealed as part of test run for the following RFE on containerize environment. 
https://bugzilla.redhat.com/show_bug.cgi?id=1406802
I will setup debug, run it again and send the output.

Comment 4 Lee Yarwood 2017-09-08 14:27:56 UTC
(In reply to Avi Avraham from comment #3)
> This Issue been revealed as part of test run for the following RFE on
> containerize environment. 
> https://bugzilla.redhat.com/show_bug.cgi?id=1406802
> I will setup debug, run it again and send the output.

Thanks, adding the needinfo back in until we get logs here. 

Please flip it back to me once we have these.

Comment 5 Lee Yarwood 2017-09-28 15:27:02 UTC
Closing this out with INSUFFICIENT_DATA, please reopen if this reproduces with DEBUG nova-compute logs and I'll be happy to take another look at this.

Comment 6 Alan Bishop 2017-10-11 16:14:14 UTC
I wonder if this issue is related bug #1484467, with dealt with an IQN mismatch between services running on bare metal and containers in the same host.

Comment 7 Avi Avraham 2017-10-25 12:31:41 UTC
I reproduce the setup heating the same issue again 
Adding log files + I got a working environment to test it 
Please update if someone can take a look on the setup

Comment 8 Avi Avraham 2017-10-25 12:40:03 UTC
Created attachment 1343211 [details]
nova-compute debug log

Comment 11 Matthew Booth 2017-10-26 14:41:50 UTC
The immediate reason this error is occurring is that luksOpen hangs on the compute. When we try to attach a fresh encrypted volume to a nova instance nova first attaches the iscsi volume, which succeeds, then attaches the luks encryptor. The encryptor does the following sequence of calls:

1. cryptsetup luksOpen: this fails fast, because the the volume isn't luks formatted yet
2. cryptsetup isLuks: this reveals that it's not luks
3. cryptsetup luksFormat: this formats the volume as luks
4. cryptsetup luksOpen: this hangs indefinitely

Because it hangs rather than fails, Nova's cleanup never runs. It never detaches the iscsi volume from the compute, and therefore the volume remains in the attaching state. I don't see that there is anything either Nova or Cinder can do about this.

Note that while luksOpen is hung and never returns, the encrypted block device is created on the compute and appears to be functional:

========
()[root@compute-0 /]# ls /dev/mapper/
control  crypt-scsi-360014059815b9b9aaf94198b915113ee
()[root@compute-0 /]# mke2fs /dev/mapper/crypt-scsi-360014059815b9b9aaf94198b915113ee
mke2fs 1.42.9 (28-Dec-2013)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=16 blocks, Stripe width=16 blocks
65408 inodes, 261632 blocks
13081 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=268435456
8 block groups
32768 blocks per group, 32768 fragments per group
8176 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376

Allocating group tables: done
Writing inode tables: done
Writing superblocks and filesystem accounting information: done
========

I replaced /usr/sbin/cryptsetup with a wrapper which invokes it via strace. I've attached the full log of that to the bug, but the pertinent part seems to be:

========
ioctl(5, DM_DEV_CREATE, 0x243da70)      = 0
ioctl(5, DM_TABLE_LOAD, 0x243da70)      = 0
ioctl(5, DM_DEV_SUSPEND, 0x24399b0)     = 0
semget(0xd4d9dce, 1, 0)                 = 0
semctl(0, 0, GETVAL, 0xffffffff)        = 2
semop(0, [{0, -1, IPC_NOWAIT}], 1)      = 0
semop(0, [{0, 0, 0}], 1
========

A search revealed this post, which seems to describe a very similar/identical issue:

https://github.com/moby/moby/issues/22025

I'm going to change the component here to os-brick as that's the immediate high-level component causing the issue. However, I suspect the actual issue is at a lower layer.

Comment 12 Matthew Booth 2017-10-26 14:42:43 UTC
Created attachment 1343802 [details]
strace of cryptsetup which hangs indefinitely

Comment 13 Lee Yarwood 2017-11-01 12:38:36 UTC
*** Bug 1478161 has been marked as a duplicate of this bug. ***

Comment 14 Eric Harney 2017-11-01 18:52:52 UTC
Created attachment 1346673 [details]
cryptsetup luksOpen --debug output

Output of a cryptsetup run that hangs with --debug.

Comment 16 Eric Harney 2017-11-01 19:32:11 UTC
Launching the nova compute container with the docker "--ipc=host" option seems to resolve this issue.

Both the manual cryptsetup luksOpen test from comment #14 succeeded, as did an attach of an encrypted Cinder volume.

Comment 21 Avi Avraham 2017-11-07 15:31:16 UTC
The bug was verified as cherry pick 
RPM was manually installed 
Polarion test run 
https://polarion.engineering.redhat.com/polarion/#/project/RHELOpenStackPlatform/testrun?id=20170813-1217
waiting for the package to be part of rhos12 puddle

Comment 26 Tzach Shefi 2017-11-13 09:04:30 UTC
Verified on:
openstack-tripleo-heat-templates-7.0.3-0.20171024200825.el7ost.noarch

Test on a containerized system, works as expected.
1. Created and successfully attached two Cinder volumes encrypted and none encrypted to an instance 

(overcloud) [stack@undercloud-0 ~]$ cinder list
+--------------------------------------+--------+------------------+------+-------------+----------+--------------------------------------+
| ID                                   | Status | Name             | Size | Volume Type | Bootable | Attached to                          |
+--------------------------------------+--------+------------------+------+-------------+----------+--------------------------------------+
| 6c33b84b-14c2-4eb8-a9a4-12b7457c4310 | in-use | -                | 1    | -           | false    | 056a61f4-53b0-4dd3-8e78-77ad42901d4a |
| 985a2bc6-8f13-4d35-bd0b-1252e2269355 | in-use | encrypted volume | 1    | LUKS        | false    | 056a61f4-53b0-4dd3-8e78-77ad42901d4a |
+--------------------------------------+--------+------------------+------+-------------+----------+--------------------------------------+

2. $ nova list
+--------------------------------------+-------+--------+------------+-------------+-----------------------------------+
| ID                                   | Name  | Status | Task State | Power State | Networks                          |
+--------------------------------------+-------+--------+------------+-------------+-----------------------------------+
| 056a61f4-53b0-4dd3-8e78-77ad42901d4a | inst1 | ACTIVE | -          | Running     | internal=192.168.0.16, 10.0.0.212 |
+--------------------------------------+-------+--------+------------+-------------+-----------------------------------+


3. Wrote txt data to both volumes.

4. Synced volumes 

5. On Cinder volume grep-ed for txt data, only none encrypted data was returned.

# strings /dev/cinder-volumes/* | grep txt
tshefi.txt

# strings /dev/cinder-volumes/* | grep kuku  


Encrypted data kuku.txt isn't returned as expected. 


Encrypted volume is successfully attached to an instance on a containerized nova compute deployment.

Comment 27 Tzach Shefi 2017-11-13 09:44:40 UTC
Created attachment 1351494 [details]
config and logs of verification.

Comment 31 errata-xmlrpc 2017-12-13 22:02:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462


Note You need to log in before you can comment on or make changes to this bug.