Description of problem:

A customer is testing Cinder/Barbican LUKS encryption. They have followed the Red Hat guides on setting this up; the only changes made are the following overrides to Barbican's policies:

  {"creator": "role:_member_",
   "secret:decrypt": "rule:secret_decrypt_non_private_read or rule:secret_project_creator or rule:secret_project_admin or rule:secret_acl_read or role:admin",
   "secret:delete": "rule:secret_project_admin or rule:secret_project_match or rule:secret_project_creator or role:admin",
   "secret:get": "rule:secret_non_private_read or rule:secret_project_creator or rule:secret_project_admin or rule:secret_acl_read or role:admin"}

These changes allow a member user to be a creator and consumer of Barbican secrets, and also allow the Cinder/Nova services to handle secrets; the changes were needed to make live migration work with encrypted volumes. They also have "resume_guest_on_boot" set to true for Nova.

Test procedure:
1. Create an encrypted boot volume.
2. Boot an instance from this volume.
3. Check the instance is working.
4. Power off the hypervisor (hard power off, not an OS shutdown).
5. Power the hypervisor back on.

At this point the following is seen in the nova-compute log:

  2019-10-22 13:24:29.707 1 ERROR os_brick.encryptors [req-16cd0b51-6fd3-40b4-ac54-c4486c9d8e1b - - - - -] Failed to retrieve encryption metadata for volume <volume ID>: Unknown auth type: None (HTTP 401): Unauthorized: Unknown auth type: None (HTTP 401)

The instance is now shown in the error state and cannot be started. What we have tried:
1. Reset state, reboot.
2. Reset state, reboot --hard.

At the same time, "virsh list --all" shows that the instance has been removed from the hypervisor completely, and as far as we can see the domain XML file is also missing. A soft reboot (OS shutdown) works as expected.

Version-Release number of selected component (if applicable):
OSP13

How reproducible:
Always

Steps to Reproduce:
1. Deploy an instance from an encrypted Cinder volume.
2. Power off the hypervisor by removing power (not an OS shutdown).
3. Bring the host back online.
4. Try to power on the instance.

Actual results:
Instance does not boot and is left in the error state.

Expected results:
Instance boots normally.

Additional info:
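As a sanity check, the Barbican policy overrides quoted in the description can be validated as JSON before being dropped into policy.json; this is just a sketch that confirms the overrides parse and reference the expected policy targets (the rule names themselves are as given in the report):

```python
import json

# Barbican policy overrides as quoted in the bug description, reformatted
# for readability. This only verifies they are well-formed JSON; it does
# not evaluate the oslo.policy rules themselves.
policy_overrides = """
{
  "creator": "role:_member_",
  "secret:decrypt": "rule:secret_decrypt_non_private_read or rule:secret_project_creator or rule:secret_project_admin or rule:secret_acl_read or role:admin",
  "secret:delete": "rule:secret_project_admin or rule:secret_project_match or rule:secret_project_creator or role:admin",
  "secret:get": "rule:secret_non_private_read or rule:secret_project_creator or rule:secret_project_admin or rule:secret_acl_read or role:admin"
}
"""

policy = json.loads(policy_overrides)
print(sorted(policy))
# -> ['creator', 'secret:decrypt', 'secret:delete', 'secret:get']
```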
What version of Nova is installed here? I'd like to know if https://review.opendev.org/#/c/656464/ is present in their environment already.
One further note: further testing shows that setting "resume_guest_on_boot" to false does not manifest the problem. With it set to false, we can hard power off hosts and everything works as expected (although we then obviously have to manually restart instances). The problem occurs only when "resume_guest_on_boot" is set to true.
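For reference, the Nova configuration option that controls this behaviour is documented as resume_guests_state_on_host_boot (the comments above refer to it as "resume_guest_on_boot"); a minimal nova.conf fragment matching the workaround described above might look like:

```ini
[DEFAULT]
# When true, nova-compute automatically resumes guests that were running
# when the host went down; per this bug, that auto-resume path hits the
# Barbican HTTP 401 for instances on encrypted volumes.
resume_guests_state_on_host_boot = false
```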
The upstream patch Eric referenced in comment #2 seems relevant, and looks to be included in the upcoming 13z9 release (it does not appear to be in z8). I'd like the nova team to confirm all of this is true.
https://review.opendev.org/#/c/656464/ could potentially work around this, but I think the issue is slightly different. I believe the issue here is that we don't have the required user or admin context to satisfy the b-api policy at n-cpu start-up; even with this change in place, I think we still call out to b-api to fetch the encryption metadata, so this might still fail. IIRC when testing upstream with devstack I didn't hit this issue, so I wonder if service tokens are the real solution here with TripleO?
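For context, enabling service tokens means the service attaches its own credentials alongside (or in place of) the user token, so n-cpu's call to b-api can still be authorized after a host restart. A hedged sketch of the relevant nova.conf fragment, following the standard oslo [service_user] option names (the auth_url, username, and password values here are placeholders, not taken from this environment):

```ini
[service_user]
# Send a service token with requests made on behalf of users, so calls
# made at n-cpu start-up (with no fresh user context) can still pass
# the Barbican policy checks.
send_service_user_token = true
auth_type = password
auth_url = http://keystone.example.com:5000/v3
username = nova
password = SERVICE_PASSWORD
user_domain_name = Default
project_name = service
project_domain_name = Default
```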
I'm going to close this out as WONTFIX, as the real solution here is to use service tokens, which are enabled by default from OSP 16.0 onwards downstream. Outside of that, in OSP 13 users will need to manually start instances using encrypted volumes after the compute host has restarted, in order for n-cpu to fetch encryption keys from b-api.
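The manual restart described above can be scripted; a hedged sketch using python-openstackclient (the host name is a placeholder, and this assumes admin credentials are loaded in the environment):

```shell
# After a hard power-off of the compute host, list the SHUTOFF instances
# on that host and start them, so n-cpu re-fetches the encryption keys
# from b-api with a fresh user/admin token.
openstack server list --all-projects --host compute-0.example.com \
    --status SHUTOFF -f value -c ID \
  | xargs -r -n1 openstack server start
```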
(In reply to Lee Yarwood from comment #7) > I'm going to close this out as WONTFIX as the real solution here is to use > service tokens which are enabled by default from OSP 16.0 onwards downstream. > > Outside of that in OSP 13 users will need to manually start instances using > encrypted volumes after the compute has restarted in order for n-cpu to > fetch encryption keys from b-api. I'm reopening this bug after some related issues have been raised upstream even with the policy changes and service tokens being enabled as discussed here.
Any update on this backport to OSP13? It would be great to be able to get my instances to auto-restart after a hypervisor crash. Happy to test some things if needed; I have a dedicated OSP13 test platform.
(In reply to Steve Relf from comment #15) > Any update on this backport to OSP13? > > Would be great to be able to get my instances to auto restart on a > hypervisor crash. > > Happy to test some stuff if needed, i have a dedicated osp13 test platform. Hey Steve, This was fixed and released as part of OSP 13 z15 via bug #1905017 (linked in the blocks field). Please let me know if you have any additional issues with that version of the fix in that bug. Regards, Lee
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2022:6543