Description of problem:
OSP11 -> OSP12 upgrade: the libvirtd service is still running on the host after the upgrade, and the nova_libvirt container keeps restarting.

After running upgrade-non-controller.sh --upgrade compute-0 we can see on the compute node:

[root@compute-0 heat-admin]# docker ps
CONTAINER ID  IMAGE                                                                   COMMAND        CREATED       STATUS                        PORTS  NAMES
97b653296f64  192.168.24.1:8787/rhosp12/openstack-nova-compute-docker:2017-07-26.10  "kolla_start"  15 hours ago  Up 9 minutes                         nova_compute
6dd2914c9bee  192.168.24.1:8787/rhosp12/openstack-iscsid-docker:2017-07-26.10        "kolla_start"  15 hours ago  Up 9 minutes                         iscsid
e5d4c5cd5ec7  192.168.24.1:8787/rhosp12/openstack-nova-libvirt-docker:2017-07-26.10  "kolla_start"  15 hours ago  Restarting (1) 2 minutes ago         nova_libvirt

[root@compute-0 heat-admin]# systemctl status libvirtd
● libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2017-08-01 08:33:42 UTC; 7min ago
     Docs: man:libvirtd(8)
           http://libvirt.org
 Main PID: 1787 (libvirtd)
   Memory: 2.0M
   CGroup: /system.slice/libvirtd.service
           └─1787 /usr/sbin/libvirtd

Aug 01 08:40:49 compute-0 libvirtd[1787]: 2017-08-01 08:40:49.975+0000: 1787: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
Aug 01 08:40:49 compute-0 libvirtd[1787]: 2017-08-01 08:40:49.975+0000: 1787: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
Aug 01 08:40:49 compute-0 libvirtd[1787]: 2017-08-01 08:40:49.975+0000: 1787: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
Aug 01 08:40:49 compute-0 libvirtd[1787]: 2017-08-01 08:40:49.975+0000: 1787: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
Aug 01 08:40:49 compute-0 libvirtd[1787]: 2017-08-01 08:40:49.975+0000: 1787: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
Aug 01 08:40:49 compute-0 libvirtd[1787]: 2017-08-01 08:40:49.975+0000: 1787: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
Aug 01 08:40:49 compute-0 libvirtd[1787]: 2017-08-01 08:40:49.975+0000: 1787: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
Aug 01 08:40:49 compute-0 libvirtd[1787]: 2017-08-01 08:40:49.975+0000: 1787: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
Aug 01 08:40:49 compute-0 libvirtd[1787]: 2017-08-01 08:40:49.975+0000: 1787: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
Aug 01 08:40:49 compute-0 libvirtd[1787]: 2017-08-01 08:40:49.975+0000: 1787: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-7.0.0-0.20170721174554.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP11
2. Upgrade to OSP12
3. Check containers on compute node

Actual results:
nova_libvirt keeps restarting because the libvirtd service is running on the host and already holds the pid file:

Running command: '/usr/sbin/libvirtd --config /etc/libvirt/libvirtd.conf'
2017-08-01 08:40:41.026+0000: 16792: info : libvirt version: 3.2.0, package: 14.el7 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2017-06-21-11:12:42, x86-037.build.eng.bos.redhat.com)
2017-08-01 08:40:41.026+0000: 16792: info : hostname: compute-0.localdomain
2017-08-01 08:40:41.026+0000: 16792: error : virPidFileAcquirePath:422 : Failed to acquire pid file '/var/run/libvirtd.pid': Resource temporarily unavailable

Expected results:
The libvirtd service on the host is stopped and disabled, and the nova_libvirt container starts fine.

Additional info:
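The symptom above (host libvirtd active while nova_libvirt is stuck in "Restarting") can be checked for with a small script. This is only a sketch: the function names libvirtd_conflict and check_compute_node are made up for illustration and are not part of tripleo-heat-templates, and it assumes docker and systemctl are available on the compute node.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, not part of any TripleO tooling.

# Decide whether the two observed states match the symptom in this report.
#   $1: host state as printed by `systemctl is-active libvirtd`
#   $2: container status as printed by `docker ps --format '{{.Status}}'`
libvirtd_conflict() {
    [ "$1" = "active" ] || return 1
    case "$2" in
        Restarting*) return 0 ;;
        *) return 1 ;;
    esac
}

# Collect both states on the node and report whether they conflict.
check_compute_node() {
    host_state="$(systemctl is-active libvirtd 2>/dev/null)"
    container_status="$(docker ps -a --filter name=nova_libvirt --format '{{.Status}}')"
    if libvirtd_conflict "$host_state" "$container_status"; then
        echo "conflict: host libvirtd is active while nova_libvirt is restarting"
        return 1
    fi
    echo "ok: no host/container libvirtd conflict detected"
}
```

Running check_compute_node as root on the compute node after the upgrade step should report the conflict on an affected node. The comparison logic is kept in its own function so it can be exercised without docker or systemd.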
o/ @mcornea - marking as triaged; first pass here. Questions, please:

1. Can you confirm what is in your roles_data.yaml? In particular, do you have disable_upgrade_deployment set, and for which roles?

2. Can you confirm your upgrade workflow and env files? I.e. environments/major-upgrade-composable-steps-docker.yaml, then upgrade-non-controller.sh for the computes, which afaics from comment 0 is when this happens/is seen.

So I see on master we still have the disable_upgrade_deployment flag [1], and tripleo_upgrade_node.sh [2] is still being delivered [3] to the nodes based on that flag. There _is_ an appropriate "stop and disable libvirtd service" ansible task @ [4], but it isn't being executed during the upgrade, again because of that flag.

I have just posted [5] (and am adding it to the trackers and the upstream bug), which adds the systemctl stop and disable to tripleo_upgrade_node.sh. Not sure that is all that is needed, but it's a start. In particular, I'm concerned that only puppet is being executed in tripleo_upgrade_node.sh [2] and not the docker tasks (I guess those happen on converge?), but let's see after testing with [5].

thanks, marios

[1] https://github.com/openstack/tripleo-heat-templates/blob/5f313f27c9120b0e3bac905d155c2b6d234d27bb/roles/Compute.yaml#L13
[2] https://github.com/openstack/tripleo-heat-templates/blob/29a8a46d9833f095d503941d32ec500f63abf675/extraconfig/tasks/tripleo_upgrade_node.sh
[3] https://github.com/openstack/tripleo-heat-templates/blob/c54e9b681b44ab962c4503cf1d88c44b683a972e/puppet/major_upgrade_steps.j2.yaml#L41
[4] https://github.com/openstack/tripleo-heat-templates/blob/a8442ba386082cef7188c3ff8001f8995b1d7ff7/docker/services/nova-libvirt.yaml#L181-L184
[5] https://review.openstack.org/489619
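For illustration, a "stop and disable" step of the kind described for tripleo_upgrade_node.sh could look roughly like the following. This is a sketch under assumptions, not the content of review [5]: the function name stop_and_disable_host_libvirtd and the SYSTEMCTL override variable are hypothetical, and the override exists only so the logic can be exercised without touching a real service.

```shell
#!/bin/sh
# Hypothetical sketch; NOT the actual change posted in review 489619.
# SYSTEMCTL is an assumed override hook, defaulting to the real systemctl.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

stop_and_disable_host_libvirtd() {
    # Stop the host daemon so the containerized libvirtd can acquire
    # /var/run/libvirtd.pid.
    if "$SYSTEMCTL" is-active --quiet libvirtd; then
        "$SYSTEMCTL" stop libvirtd
    fi
    # Disable the unit so it does not come back after a reboot.
    if "$SYSTEMCTL" is-enabled --quiet libvirtd; then
        "$SYSTEMCTL" disable libvirtd
    fi
}
```

Calling stop_and_disable_host_libvirtd before the nova_libvirt container is started would cover the case in comment 0. Guarding each call with is-active and is-enabled keeps the step safe to rerun on a node that has already been handled.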
(In reply to marios from comment #1)
> o/ @mcornea - marking as triaged and first pass here, questions please:
>
> 1. can you confirm what is in your roles_data.yaml? in particular do you
> have disable_upgrade_deployment set and for which roles please?

I was using the default roles_data.yaml provided by tht, so disable_upgrade_deployment was set for the compute and object storage roles.

> 2. can you confirm your upgrade workflow and env files (i.e.
> environments/major-upgrade-composable-steps-docker.yaml , then
> upgrade-non-controller.sh for computes afaics from comment 0 which is when
> this happens/is seen.

First, the major-upgrade-composable-steps-docker step:

openstack overcloud deploy \
--templates /usr/share/openstack-tripleo-heat-templates \
--libvirt-type kvm \
--ntp-server clock.redhat.com \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/virt/network/network-environment.yaml \
-e /home/stack/virt/hostnames.yml \
-e /home/stack/virt/debug.yaml \
-e /home/stack/virt/nodes_data.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/docker.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/docker-ha.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-composable-steps-docker.yaml \
-e /home/stack/docker-osp12.yaml

Then the compute upgrade:

upgrade-non-controller.sh --upgrade compute-0

> So I see on master we still have the disable_upgrade_deployment flag [1] and
> the tripleo_upgrade_node.sh [2] is still being delivered [3] to the nodes
> based on that flag. There _is_ an appropriate "stop and disable libvirtd
> service" ansible task @ [4] but it isn't being executed during the upgrade,
> again because of that flag.
>
> I have just posted [5] (and adding to trackers & the upstream bug for it)
> which adds the systemctl stop and disable into the tripleo_upgrade_node.sh.
> Not sure that is all that is needed though, but its a start. In particular
> I'm concerned that only puppet is being executed in that
> tripleo_upgrade_node.sh [2] and not the docker tasks (I guess those are
> happening on converge?) but lets see after testing with [5]
>

With the patch applied I was no longer able to reproduce the initial error, so it looks good.
Code merged upstream.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:3462