Bug 1692608

Summary: [RFE][OSP17] Restart nova_virtlogd makes console logs cannot be updated
Product: Red Hat OpenStack
Reporter: Meiyan Zheng <mzheng>
Component: openstack-tripleo-heat-templates
Assignee: Rajesh Tailor <ratailor>
Status: CLOSED ERRATA
QA Contact: Jason Grosso <jgrosso>
Severity: high
Docs Contact:
Priority: low
Version: 17.0 (Wallaby)
CC: alifshit, bdobreli, dasmith, dhill, egallen, eglynn, emacchi, igallagh, jamsmith, jgrosso, jhakimra, jparker, jschluet, kchamart, lsvaty, lyarwood, mariel, mburns, mschuppe, owalsh, ratailor, sbaker, sbauza, scohen, sgordon, spower, stephenfin, vromanso
Target Milestone: ga
Keywords: FutureFeature, Patch, Triaged
Target Release: 17.1
Hardware: Unspecified
OS: All
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-14.3.1-0.20220607161058.ced328c.el9ost puppet-tripleo-14.2.3-0.20220607163018.bc63c9e.el9ost openstack-tripleo-common-15.4.1-0.20220608080349.caa0c1f.el9ost
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-08-16 01:09:21 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version: Victoria
Embargoed:

Description Meiyan Zheng 2019-03-26 02:52:15 UTC
Description of problem:
After restarting nova_virtlogd on a compute node where an instance is running,
the instance on that compute node can no longer write logs to its console.log file.


Version-Release number of selected component (if applicable):
container image version: openstack-nova-libvirt:13.0-79.1548959794

How reproducible:

Steps to Reproduce:
1. Restart nova_virtlogd with "docker restart nova_virtlogd"
2. Reboot an instance running on that compute node by running the command "reboot" inside the instance
3. Monitor /var/lib/nova/instances/<uuid>/console.log 
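A minimal sketch of how step 3 could be automated, assuming Python is available on the compute node. The helper name `console_log_grew` and its timeout parameters are hypothetical and not part of any OpenStack tooling; it only checks whether the file size increases within a time window.

```python
import os
import time


def console_log_grew(path, wait_seconds=30.0, poll_interval=1.0):
    """Return True if the file at `path` grows within `wait_seconds`.

    Hypothetical helper for illustration: it compares the file size
    against the size observed when the function was called.
    """
    start_size = os.path.getsize(path)
    deadline = time.monotonic() + wait_seconds
    while time.monotonic() < deadline:
        if os.path.getsize(path) > start_size:
            return True
        time.sleep(poll_interval)
    # One final check in case the file grew during the last sleep.
    return os.path.getsize(path) > start_size


# Example use (path taken from the steps above, <uuid> left as a placeholder):
#   console_log_grew("/var/lib/nova/instances/<uuid>/console.log", wait_seconds=120)
# A False result after rebooting the instance would match the behavior reported here.
```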

Actual results:
No additional console logs are written to /var/lib/nova/instances/<uuid>/console.log

Expected results:
Additional boot logs should be recorded in /var/lib/nova/instances/<uuid>/console.log

Additional info:

Comment 3 Martin Schuppert 2019-03-29 11:57:36 UTC
This behavior is expected whenever virtlogd is restarted. If there are changes to virtlogd on a live system, a signal can be sent to the process so that it keeps the current logs open, instead of doing a full restart:

~~~
On receipt of SIGUSR1 virtlogd will re-exec() its binary, while maintaining all current logs and clients. This allows for live upgrades of the virtlogd service.
~~~

What is the use case? Updating a compute node where we get a new container version with virtlogd? If yes, then the instances should be migrated off the compute before upgrading the compute.
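The SIGUSR1 re-exec quoted above could be triggered as in the following sketch. It assumes a Linux host where the virtlogd process is visible under /proc and where the caller may signal it; the helper names `find_pids_by_name` and `request_reexec` are hypothetical. It does not address the container update case, where the container itself is replaced.

```python
import os
import signal


def find_pids_by_name(name, proc_root="/proc"):
    """Return sorted PIDs whose /proc/<pid>/comm equals `name`.

    Hypothetical helper; `proc_root` is a parameter only so the
    lookup can be exercised against a fake directory tree.
    """
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as fh:
                if fh.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            # The process exited, or the entry is not readable.
            continue
    return sorted(pids)


def request_reexec(pid):
    """Send SIGUSR1, which virtlogd documents as 're-exec, keep logs and clients'."""
    os.kill(pid, signal.SIGUSR1)


# Example use on a non-containerized host (requires sufficient privileges):
#   for pid in find_pids_by_name("virtlogd"):
#       request_reexec(pid)
```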

Comment 4 Martin Schuppert 2019-03-29 16:02:55 UTC
Dan,

from the virtlogd/libvirtd side, is there anything that can be done to reopen the console logs when virtlogd gets restarted? For non-container environments you could send the signal mentioned in my last update, but this won't work when updating containers.

Thanks!

Comment 5 Daniel Berrangé 2019-03-29 16:12:51 UTC
No, this is explicitly *NOT* supported. The virtlogd daemon must *never* be stopped while there are running VMs. It is explicitly split out into a separate daemon from libvirtd so that it can upgrade its own software while keeping FDs open by re-exec'ing itself. If running virtlogd in a container this container must never be restarted while VMs are running.

AFAIK this shouldn't be a problem on OSP in general, as the recommended software upgrade process involves live migrating all VMs off to a new host, before upgrading any of the containers / software on the original host.
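The constraint above (never stop virtlogd while VMs are running) could be checked before a restart with a guard like the one sketched below. This is only an illustration: the function names are hypothetical, and it assumes `virsh` is reachable from wherever the script runs (on a containerized compute node that would likely mean running it inside the libvirt container).

```python
import subprocess


def parse_running_domains(virsh_output):
    """Turn the output of `virsh list --state-running --name` into a list of names."""
    return [line.strip() for line in virsh_output.splitlines() if line.strip()]


def running_domains(virsh_cmd=("virsh", "list", "--state-running", "--name")):
    """Ask libvirt for running domains; raises CalledProcessError if virsh fails."""
    result = subprocess.run(
        list(virsh_cmd), capture_output=True, text=True, check=True
    )
    return parse_running_domains(result.stdout)


def safe_to_restart_virtlogd(domains):
    """virtlogd may only be restarted when no domain is running on the host."""
    return len(domains) == 0


# Example use:
#   if not safe_to_restart_virtlogd(running_domains()):
#       raise SystemExit("VMs still running; migrate them off before restarting virtlogd")
```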

Comment 8 Steve Baker 2019-04-12 01:45:39 UTC
To me it sounds like virtlogd should not be managed by paunch at all, and it should be treated in a similar way to the neutron l3[1] and dhcp[2] agents.

I'll set a NEEDINFO for beagles to provide an opinion on whether the wrapper approach is appropriate for virtlogd, and how it might be done.

[1] https://github.com/openstack/puppet-tripleo/blob/master/manifests/profile/base/neutron/l3_agent_wrappers.pp
    https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/neutron/neutron-l3-container-puppet.yaml#L172-L186
[2] https://github.com/openstack/puppet-tripleo/blob/master/manifests/profile/base/neutron/dhcp_agent_wrappers.pp
    https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/neutron/neutron-dhcp-container-puppet.yaml

Comment 14 Brent Eagles 2019-08-27 18:46:12 UTC
If there were a per-VM process for virtlogd, neutron-esque sidecars could be the way to go. Even if it were a single process, sidecar containers would also remove the interaction with updates etc. However, we do this kind of thing in neutron because of the relationship between OpenStack resources and processes and the desire to preserve the data plane at all costs. If there is a 1:1 relationship between libvirtd and virtlogd, and virtlogd doesn't like being restarted whether containerized or not, I'm not sure that neutron-like sidecars are appropriate.

Comment 15 David Hill 2019-11-15 18:49:23 UTC
Guys, what if we want to change the max_size for logs and restart virtlogd on servers with SR-IOV NICs (or any pci_passthrough)?
Are there any technical reasons that would prevent us from reloading virtlogd, given some code improvement?

If we absolutely have to reboot after restarting virtlogd, I guess we'll have to update this KCS article [1] to mention this ...

[1] https://access.redhat.com/solutions/3293321

Comment 25 spower 2022-07-05 15:13:06 UTC
The TRAC team has stated there will be no RFEs in z-streams for OSP 17.0, so moving this to 17.1. If you have any questions, please contact rhos-trac.

Comment 44 errata-xmlrpc 2023-08-16 01:09:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2023:4577