Bug 1964475

Summary: [OSP 13] After host reboot VM goes to error state due to nova-compute EmptyCatalog error
Product: Red Hat OpenStack Reporter: David Hill <dhill>
Component: openstack-nova Assignee: Lee Yarwood <lyarwood>
Status: CLOSED EOL QA Contact: OSP DFG:Compute <osp-dfg-compute>
Severity: high Docs Contact:
Priority: high    
Version: 13.0 (Queens) CC: apavlovs, dasmith, eglynn, enothen, jhakimra, kchamart, lyarwood, sbauza, sgordon, vromanso
Target Milestone: async Keywords: Patch, TestOnly, Triaged, ZStream
Target Release: 13.0 (Queens)   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: openstack-nova-17.0.13-38.el7ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-07-10 17:21:17 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1979850    

Description David Hill 2021-05-25 14:57:13 UTC
Description of problem:
Instances with encrypted volumes won't come back without a hard reboot after the hypervisor is crashed via sysrq, but they boot normally after a normal reboot of the hypervisor.  This is an improvement over [1], but given that hosts might not always reboot cleanly and might crash or lose power, this could still be a problem that requires manual intervention in order to get VMs back after such a crash.   I suspect libvirt writes data to disk when it is stopped normally, but that data never makes it to disk in the event of a power loss or hard crash.   I filed this issue against nova as we might be missing a bind path in the container, but it might just as well be a by-design libvirt behaviour or a bug.

[1] https://bugzilla.redhat.com/1905017
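
To illustrate the suspicion above (data that is written but never flushed can vanish on a power loss or hard crash), here is a minimal Python sketch of a write that is forced to stable storage. It is an illustration of the general technique only, not how libvirt or nova persist secrets; the function name `durable_write` and the example file name are hypothetical.

```python
import os
import tempfile


def durable_write(path, data):
    """Write bytes to `path` so the content should survive a sudden crash.

    Hypothetical helper for illustration; not taken from libvirt or nova.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory so the final
    # rename happens within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the file contents out of the page cache
        os.replace(tmp_path, path)  # swap the finished file into place
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # fsync the directory so the new directory entry is persisted as well.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Without the two `os.fsync` calls, the file can look fine until a clean shutdown flushes it, which would match the "survives a normal reboot, gone after a crash" pattern described here. Whether that is what actually happens in this environment is still unconfirmed.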


Comment 1 David Hill 2021-05-25 15:36:16 UTC
Before the crash, we see /etc/libvirt/secrets/$UUID.${EXTS} ... after the crash, that file is gone.

If we normally reboot, that file stays.

I tried manually creating a secret using virsh in the nova_libvirt container and powercycled the VM ... the file remained present.   My next step is to try to reproduce this using nova ...
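
The before/after check described in this comment could be scripted roughly as follows. This is a sketch under assumptions: the helper names and the snapshot file location are hypothetical, and it has to run wherever /etc/libvirt/secrets is actually visible (for example inside the nova_libvirt container).

```python
import json
import os

SECRETS_DIR = "/etc/libvirt/secrets"  # directory named in this comment


def snapshot(directory=SECRETS_DIR):
    """Return a sorted list of the file names currently in `directory`."""
    try:
        return sorted(os.listdir(directory))
    except FileNotFoundError:
        return []


def save_snapshot(names, out_path):
    """Store the listing and fsync it so the record itself survives the crash."""
    with open(out_path, "w") as fh:
        json.dump(names, fh)
        fh.flush()
        os.fsync(fh.fileno())


def missing_since(out_path, directory=SECRETS_DIR):
    """Return the names recorded in `out_path` that are no longer present."""
    with open(out_path) as fh:
        before = set(json.load(fh))
    return sorted(before - set(snapshot(directory)))
```

Usage would be `save_snapshot(snapshot(), "/root/secrets-before.json")` before triggering the crash and `missing_since("/root/secrets-before.json")` once the host is back. The snapshot path is only an example; it should live on storage that is known to persist across the crash.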

Comment 2 Lee Yarwood 2021-05-25 18:03:21 UTC
(In reply to David Hill from comment #1)
> Before the crash, we see /etc/libvirt/secrets/$UUID.${EXTS} ... after the
> crash, that file is gone.
> 
> If we normally reboot, that file stays.
> 
> I tried manually creating a secret using virsh in the nova_libvirt container
> and powercycled the VM ... the file remained present.   My next step is to
> try to reproduce this using nova ...

Reproducing the removal after a sysrq crash in an OSP env would be super useful. I'll try to do some background reading on how Docker is using device-mapper in OSP 13 to see if there's something we've missed to ensure these secrets get persisted all the time.

Comment 3 David Hill 2021-05-26 15:39:44 UTC
I tried reproducing this issue with an LVM-backed Cinder volume with LUKS and I wasn't able to.

This is interesting.  Is there a service that might be starting in their environment that would clean up /etc/libvirt/secrets ?
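
One way to look for such a service is to search the usual configuration locations for anything that mentions the secrets path. The sketch below does a plain text search; the list of candidate directories is an assumption about where cleanup rules or units might live, and the function name is hypothetical.

```python
import os

# Assumed locations for tmpfiles rules, systemd units and cron jobs.
# Adjust for the environment; the relevant files may be on the host
# or inside the nova_libvirt container.
CANDIDATE_DIRS = [
    "/etc/tmpfiles.d",
    "/usr/lib/tmpfiles.d",
    "/etc/systemd/system",
    "/usr/lib/systemd/system",
    "/etc/cron.d",
]


def find_references(needle, roots=CANDIDATE_DIRS):
    """Return (path, line number, line) for every line containing `needle`."""
    hits = []
    for root in roots:
        # os.walk silently yields nothing for directories that do not exist.
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                try:
                    with open(path, "r", errors="replace") as fh:
                        for lineno, line in enumerate(fh, 1):
                            if needle in line:
                                hits.append((path, lineno, line.strip()))
                except OSError:
                    # Unreadable files and broken symlinks are skipped.
                    continue
    return hits
```

Calling `find_references("/etc/libvirt/secrets")` only catches literal mentions of the path. A rule that targets a parent directory such as /etc/libvirt would need a broader search string, and an empty result does not rule out a cleanup done by other means.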

Comment 14 Lon Hohberger 2023-03-16 10:32:39 UTC
According to our records, this should be resolved by openstack-nova-17.0.13-40.el7ost.  This build is available now.

Comment 15 Lon Hohberger 2023-07-10 17:21:17 UTC
OSP13 support officially ended on 27 June 2023.