Bug 1764244

Summary: [OSP14] With NovaResumeGuestsStateOnHostBoot: True a change to /etc/sysconfig/libvirt-guests triggers stop/start of instances
Product: Red Hat OpenStack Reporter: Martin Schuppert <mschuppe>
Component: puppet-tripleo Assignee: Martin Schuppert <mschuppe>
Status: CLOSED NEXTRELEASE QA Contact: nlevinki <nlevinki>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 13.0 (Queens) CC: amodi, jjoyce, jschluet, lyarwood, mbooth, mburns, ramishra, rhos-maint, slinaber, ssmolyak, tvignaud, vkoul
Target Milestone: z5 Keywords: Patch, Triaged, ZStream
Target Release: 14.0 (Rocky)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: puppet-nova-13.3.2-0.20190426043946.d968cc2.el7ost puppet-tripleo-9.4.1-0.20190508182411.el7ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1764243 Environment:
Last Closed: 2019-12-09 09:23:31 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1761373, 1764240, 1764243    
Bug Blocks:    

Description Martin Schuppert 2019-10-22 14:11:41 UTC
+++ This bug was initially created as a clone of Bug #1764243 +++

+++ This bug was initially created as a clone of Bug #1764240 +++

+++ This bug was initially created as a clone of Bug #1761373 +++

Description of problem:
While rolling out the newest updates from RHOSP13z8, all instances on our hosts were automatically rebooted.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:
Instances should not be rebooted

Additional info:


--- Additional comment from Martin Schuppert on 2019-10-21 13:47:28 UTC ---

* The Ansible host prepare step started libvirt-guests correctly:
Oct 9 11:40:41 overcloud-compute-15 ansible-stat: Invoked with checksum_algorithm=sha1 get_checksum=True follow=False checksum_algo=sha1 path=/etc/systemd/system/libvirt-guests.service get_md5=None get_mime=True get_attributes=True
Oct 9 11:40:41 overcloud-compute-15 ansible-copy: Invoked with directory_mode=None force=True remote_src=None _original_basename=tmpcf2YY1 owner=None follow=False local_follow=None group=None unsafe_writes=None setype=None content=NOT_LOGGING_PARAMETER serole=None dest=/etc/systemd/system/libvirt-guests.service selevel=None regexp=None validate=None src=/root/.ansible/tmp/ansible-tmp-1570614041.02-199123311360310/source checksum=b05237d34e522f44407f65882217e7e518b356dc seuser=None delimiter=None mode=None attributes=None backup=False
Oct 9 11:40:41 overcloud-compute-15 ansible-systemd: Invoked with no_block=False force=None name=libvirt-guests enabled=True daemon_reload=True state=started masked=None user=False
Oct 9 11:40:41 overcloud-compute-15 systemd: Reloading.
Oct 9 11:40:41 overcloud-compute-15 systemd: Started Flexible Branding Service.
Oct 9 11:40:42 overcloud-compute-15 systemd: Reloading.
Oct 9 11:40:42 overcloud-compute-15 systemd: Reached target Libvirt guests shutdown.
Oct 9 11:40:42 overcloud-compute-15 systemd: Starting Suspend/Resume Running libvirt Guests...
Oct 9 11:40:42 overcloud-compute-15 systemd: Started Flexible Branding Service.
Oct 9 11:40:42 overcloud-compute-15 systemd: Started Suspend/Resume Running libvirt Guests.
...
Oct 9 11:40:58 overcloud-compute-15 os-collect-config: TASK [is Nova Resume Guests State On Host Boot enabled] ************************
Oct 9 11:40:58 overcloud-compute-15 os-collect-config: ok: [localhost]
Oct 9 11:40:58 overcloud-compute-15 os-collect-config: TASK [libvirt-guests unit to stop nova_compute container before shutdown VMs] ***
Oct 9 11:40:58 overcloud-compute-15 os-collect-config: changed: [localhost]
Oct 9 11:40:58 overcloud-compute-15 os-collect-config: TASK [libvirt-guests enable VM shutdown on compute reboot/shutdown] ************
Oct 9 11:40:58 overcloud-compute-15 os-collect-config: changed: [localhost]

* Then puppet-tripleo was called for the OS::TripleO::Services::NovaLibvirtGuests service, which had been newly added to the compute role:
Oct 9 11:53:56 overcloud-compute-15 puppet-user[560442]: Compiled catalog for overcloud-compute-15.xyz in environment production in 2.96 seconds
Oct 9 11:53:57 overcloud-compute-15 puppet-user[560442]: (/Stage[main]/Main/Package_manifest[/var/lib/tripleo/installed-packages/overcloud_Compute]/ensure) created
Oct 9 11:53:57 overcloud-compute-15 puppet-user[560442]: (/Stage[main]/Tripleo::Profile::Base::Nova::Compute::Libvirt_guests/File[/etc/systemd/system/virt-guest-shutdown.target.wants]/ensure) created
Oct 9 11:53:58 overcloud-compute-15 puppet-user[560442]: (/Stage[main]/Tripleo::Profile::Base::Kernel/Kmod::Load[nf_conntrack_proto_sctp]/Exec[modprobe nf_conntrack_proto_sctp]/returns) executed successfully
Oct 9 11:53:58 overcloud-compute-15 puppet-user[560442]: (/Stage[main]/Tripleo::Profile::Base::Nova::Compute::Libvirt_guests/Systemd::Unit_file[paunch-container-shutdown.service]/File[/etc/systemd/system/virt-guest-shutdown.target.wants/paunch-container-shutdown.service]/ensure) created

- Note that we also do a systemctl daemon-reload in [1]:
Oct 9 11:53:58 overcloud-compute-15 systemd: Reloading.
Oct 9 11:53:59 overcloud-compute-15 puppet-user[560442]: (/Stage[main]/Systemd::Systemctl::Daemon_reload/Exec[systemctl-daemon-reload]) Triggered 'refresh' from 1 events
Oct 9 11:53:59 overcloud-compute-15 systemd: Started Flexible Branding Service.
Oct 9 11:53:59 overcloud-compute-15 puppet-user[560442]: (/Stage[main]/Nova::Compute::Libvirt_guests/File_line[/etc/sysconfig/libvirt-guests ON_BOOT]/ensure) created
Oct 9 11:53:59 overcloud-compute-15 puppet-user[560442]: (/Stage[main]/Nova::Compute::Libvirt_guests/File_line[/etc/sysconfig/libvirt-guests ON_SHUTDOWN]/ensure) created
Oct 9 11:53:59 overcloud-compute-15 puppet-user[560442]: (/Stage[main]/Nova::Compute::Libvirt_guests/File_line[/etc/sysconfig/libvirt-guests SHUTDOWN_TIMEOUT]/ensure) created

- The libvirt-guests stop was a result of the puppet-tripleo/puppet-nova run:
Oct 9 11:53:59 overcloud-compute-15 systemd: Stopping Suspend/Resume Running libvirt Guests...
Oct 9 11:54:05 overcloud-compute-15 journal: 2019-10-09 09:54:05.820+0000: 557266: info : libvirt version: 4.5.0, package: 23.el7_7.1 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2019-08-16-11:33:27, x86-vm-28.build.eng.bos.redhat.com)
Oct 9 11:54:05 overcloud-compute-15 journal: 2019-10-09 09:54:05.820+0000: 557266: info : hostname: overcloud-compute-15
Oct 9 11:54:05 overcloud-compute-15 journal: 2019-10-09 09:54:05.820+0000: 557266: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error
Oct 9 11:54:05 overcloud-compute-15 dockerd-current: time="2019-10-09T11:54:05.931711967+02:00" level=warning msg="dcb3e6206fe2a41e7d9888fe9a8fe0577516e50ac5f6fb18d6cb8494d0d3e26b cleanup: failed to unmount secrets: invalid argument"
Oct 9 11:54:05 overcloud-compute-15 docker: nova_compute
Oct 9 11:54:06 overcloud-compute-15 libvirt-guests.sh: Running guests on default URI: instance-0000111d, instance-00000e59, instance-000015ea, instance-00000d36
Oct 9 11:54:06 overcloud-compute-15 libvirt-guests.sh: Shutting down guests on default URI...
Oct 9 11:54:06 overcloud-compute-15 libvirt-guests.sh: Starting shutdown on guest: instance-0000111d
Oct 9 11:54:08 overcloud-compute-15 libvirt-guests.sh: Waiting for guest instance-0000111d to shut down, 300 seconds left
...
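
To check other compute hosts for the same pattern, a log scan along the following lines might help. This is only a minimal sketch: the regular expressions are derived from the messages quoted above, the function names and the 5-second window are assumptions for illustration, and it assumes syslog-style lines whose change and stop events fall on the same day.

```python
import re

# Patterns taken from the log messages quoted in this comment.
CONFIG_CHANGE = re.compile(
    r"File_line\[/etc/sysconfig/libvirt-guests \w+\]/ensure\) created"
)
GUESTS_STOP = re.compile(r"systemd: Stopping Suspend/Resume Running libvirt Guests")
TIMESTAMP = re.compile(r"(\d{1,2}):(\d{2}):(\d{2})")


def seconds_of_day(line):
    """Return the first HH:MM:SS in the line as seconds since midnight, or None."""
    match = TIMESTAMP.search(line)
    if match is None:
        return None
    hours, minutes, seconds = (int(part) for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def stops_after_config_change(lines, window=5):
    """Return (change_line, stop_line) pairs where the libvirt-guests stop
    follows a puppet change to /etc/sysconfig/libvirt-guests within `window`
    seconds. The window value is an arbitrary assumption."""
    last_change = None
    hits = []
    for line in lines:
        if CONFIG_CHANGE.search(line):
            last_change = line
        elif GUESTS_STOP.search(line) and last_change is not None:
            changed_at = seconds_of_day(last_change)
            stopped_at = seconds_of_day(line)
            if changed_at is None or stopped_at is None:
                continue
            if 0 <= stopped_at - changed_at <= window:
                hits.append((last_change, line))
    return hits
```

Feeding it the lines of /var/log/messages from a compute node (for example via `open(path).read().splitlines()`) would return the matching pairs; an empty list means the pattern was not found.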

--- Additional comment from Martin Schuppert on 2019-10-22 14:04:25 UTC ---

If there is a config change to /etc/sysconfig/libvirt-guests, the libvirt-guests service is
notified to restart, which stops the instances on the compute node via libvirt-guests.
Because NovaResumeGuestsStateOnHostBoot is set to true, nova later starts the instances
again.

Working on a patch to remove the restart on config change. The restart is not required,
because /usr/libexec/libvirt-guests.sh sources /etc/sysconfig/libvirt-guests on each run.
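
Why no restart is needed can be illustrated with a minimal Python sketch. It is not the libvirt-guests.sh implementation: `read_sysconfig`, `run_once` and `demo` are hypothetical names, the parser only handles simple KEY=VALUE lines, and the values written to the file are illustrative. It shows that a script which re-reads its config file on every invocation sees a changed file on its next run, with no restart in between.

```python
import os
import tempfile


def read_sysconfig(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and comments.
    As with sourcing a shell file, a later assignment overrides an earlier one."""
    settings = {}
    with open(path) as handle:
        for raw in handle:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def run_once(path):
    """Hypothetical stand-in for one run of the script: the config is loaded
    fresh each time, so nothing is cached between invocations."""
    settings = read_sysconfig(path)
    return settings.get("ON_SHUTDOWN"), settings.get("SHUTDOWN_TIMEOUT")


def demo():
    """Run once, change the file (as a config management tool would), run again."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "libvirt-guests")
        with open(path, "w") as handle:
            handle.write("ON_SHUTDOWN=suspend\n")
        before = run_once(path)
        with open(path, "a") as handle:
            handle.write("ON_SHUTDOWN=shutdown\nSHUTDOWN_TIMEOUT=300\n")
        after = run_once(path)
    return before, after


if __name__ == "__main__":
    print(demo())
```

The second call returns the new values although nothing was restarted in between, which is the property that makes the restart on config change unnecessary.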