Bug 1477111

Summary: OSP11 -> OSP12 upgrade: libvirtd service is running on host after upgrade and nova_libvirt container keeps restarting
Product: Red Hat OpenStack Reporter: Marius Cornea <mcornea>
Component: openstack-tripleo-heat-templates Assignee: Emilien Macchi <emacchi>
Status: CLOSED ERRATA QA Contact: Marius Cornea <mcornea>
Severity: urgent Docs Contact:
Priority: high    
Version: 12.0 (Pike) CC: dbecker, jfrancoa, mandreou, mbultel, mburns, morazi, owalsh, rhel-osp-director-maint, sathlang, tvignaud
Target Milestone: beta Keywords: Triaged
Target Release: 12.0 (Pike)   
Hardware: Unspecified   
OS: Unspecified   
Fixed In Version: openstack-tripleo-heat-templates-7.0.0-0.20170805163046.el7ost
Last Closed: 2017-12-13 21:48:30 UTC Type: Bug
Bug Blocks: 1399762    

Description Marius Cornea 2017-08-01 08:46:33 UTC
Description of problem:
After an OSP11 -> OSP12 upgrade, the libvirtd service is still running on the host and the nova_libvirt container keeps restarting.

After running upgrade-non-controller.sh --upgrade compute-0, the compute node shows:

[root@compute-0 heat-admin]# docker ps
CONTAINER ID        IMAGE                                                                   COMMAND             CREATED             STATUS                         PORTS               NAMES
97b653296f64        192.168.24.1:8787/rhosp12/openstack-nova-compute-docker:2017-07-26.10   "kolla_start"       15 hours ago        Up 9 minutes                                       nova_compute
6dd2914c9bee        192.168.24.1:8787/rhosp12/openstack-iscsid-docker:2017-07-26.10         "kolla_start"       15 hours ago        Up 9 minutes                                       iscsid
e5d4c5cd5ec7        192.168.24.1:8787/rhosp12/openstack-nova-libvirt-docker:2017-07-26.10   "kolla_start"       15 hours ago        Restarting (1) 2 minutes ago                       nova_libvirt

[root@compute-0 heat-admin]# systemctl status libvirtd
● libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2017-08-01 08:33:42 UTC; 7min ago
     Docs: man:libvirtd(8)
           http://libvirt.org
 Main PID: 1787 (libvirtd)
   Memory: 2.0M
   CGroup: /system.slice/libvirtd.service
           └─1787 /usr/sbin/libvirtd

Aug 01 08:40:49 compute-0 libvirtd[1787]: 2017-08-01 08:40:49.975+0000: 1787: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
(the message above is repeated 9 more times with the same timestamp)

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-7.0.0-0.20170721174554.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP11
2. Upgrade to OSP12
3. Check containers on compute node

Actual results:
nova_libvirt keeps restarting because the libvirtd service is still running on the host, so the libvirtd started inside the container cannot acquire the pid file:
Running command: '/usr/sbin/libvirtd --config /etc/libvirt/libvirtd.conf'
2017-08-01 08:40:41.026+0000: 16792: info : libvirt version: 3.2.0, package: 14.el7 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2017-06-21-11:12:42, x86-037.build.eng.bos.redhat.com)
2017-08-01 08:40:41.026+0000: 16792: info : hostname: compute-0.localdomain
2017-08-01 08:40:41.026+0000: 16792: error : virPidFileAcquirePath:422 : Failed to acquire pid file '/var/run/libvirtd.pid': Resource temporarily unavailable
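
The pid-file error above is consistent with two libvirtd processes competing for one lock: "Resource temporarily unavailable" is the standard text for EAGAIN, the error a non-blocking lock attempt typically returns on Linux when another process already holds the lock. The sketch below only illustrates that kind of contention with flock(1) on a temporary file. It is not libvirt's code, and the temporary file merely stands in for /var/run/libvirtd.pid.

```shell
#!/usr/bin/env bash
# Illustration only: two "daemons" competing for a single pid-file lock.
# The temp file is a stand-in for /var/run/libvirtd.pid (assumption).
pidfile="$(mktemp)"

# "Host daemon": open the file on fd 9 and take an exclusive,
# non-blocking lock. The lock is held for as long as fd 9 stays open.
exec 9>"$pidfile"
flock -n 9 && first="acquired"

# "Container daemon": a second, independent open of the same file on fd 8.
# Its non-blocking lock attempt fails because the first lock is still held.
if ( flock -n 8 ) 8>"$pidfile"; then
    second="acquired"
else
    second="refused"
fi

echo "first=$first second=$second"

# Release the first lock and clean up.
exec 9>&-
rm -f "$pidfile"
```

Once the first holder goes away (here, closing fd 9; on the compute node, stopping the host libvirtd), a later lock attempt can succeed.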


Expected results:
The libvirtd service on the host is stopped and disabled, and the nova_libvirt container starts cleanly.

Additional info:

Comment 1 Marios Andreou 2017-08-01 14:29:05 UTC
o/ @mcornea - marking as triaged and first pass here, questions please:

   1. Can you confirm what is in your roles_data.yaml? In particular, do you have disable_upgrade_deployment set, and for which roles?

   2. Can you confirm your upgrade workflow and env files? As far as I can see from comment 0, that is environments/major-upgrade-composable-steps-docker.yaml first, then upgrade-non-controller.sh for the computes, which is when this is seen.

So I see that on master we still have the disable_upgrade_deployment flag [1], and tripleo_upgrade_node.sh [2] is still being delivered [3] to the nodes based on that flag. There _is_ an appropriate "stop and disable libvirtd service" Ansible task at [4], but it isn't being executed during the upgrade, again because of that flag.

I have just posted [5] (and am adding it to the trackers and the upstream bug), which adds the systemctl stop and disable to tripleo_upgrade_node.sh. I'm not sure that is all that is needed, but it's a start. In particular, I'm concerned that only puppet is executed in tripleo_upgrade_node.sh [2] and not the docker tasks (I guess those happen on converge?), but let's see after testing with [5].

thanks, marios 

[1] https://github.com/openstack/tripleo-heat-templates/blob/5f313f27c9120b0e3bac905d155c2b6d234d27bb/roles/Compute.yaml#L13 
   
[2] https://github.com/openstack/tripleo-heat-templates/blob/29a8a46d9833f095d503941d32ec500f63abf675/extraconfig/tasks/tripleo_upgrade_node.sh

[3] https://github.com/openstack/tripleo-heat-templates/blob/c54e9b681b44ab962c4503cf1d88c44b683a972e/puppet/major_upgrade_steps.j2.yaml#L41

[4] https://github.com/openstack/tripleo-heat-templates/blob/a8442ba386082cef7188c3ff8001f8995b1d7ff7/docker/services/nova-libvirt.yaml#L181-L184

[5] https://review.openstack.org/489619
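
For readers who want a concrete picture of the "systemctl stop and disable" step mentioned in this comment, here is a minimal sketch. It is not the content of review [5]. The function name and the SYSTEMCTL override are hypothetical and exist only so the logic can be dry-run with a stub.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT taken from the referenced review.
# SYSTEMCTL may be pointed at a stub for dry runs (assumption for testing).
stop_and_disable_host_libvirtd() {
    local systemctl_cmd="${SYSTEMCTL:-systemctl}"

    # Stop the host daemon only if it is currently running, so the
    # containerized libvirtd can take over the pid file.
    if "$systemctl_cmd" is-active --quiet libvirtd; then
        "$systemctl_cmd" stop libvirtd
    fi

    # Disable the unit so it does not start again on the next boot.
    if "$systemctl_cmd" is-enabled --quiet libvirtd; then
        "$systemctl_cmd" disable libvirtd
    fi
}
```

Calling stop_and_disable_host_libvirtd as root on the compute node, before the nova_libvirt container is started, would be the intended use. Guarding each action with is-active and is-enabled keeps the helper safe to run more than once.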

Comment 2 Marius Cornea 2017-08-02 07:36:33 UTC
(In reply to marios from comment #1)
> o/ @mcornea - marking as triaged and first pass here, questions please:
> 
>    1. can you confirm what is in your roles_data.yaml? in particular do you
> have disable_upgrade_deployment set and for which roles please?

I was using the default roles_data.yaml provided by tht, so disable_upgrade_deployment was set for the compute and object store roles.

>    2. can you confirm your upgrade workflow and env files (i.e.
> environments/major-upgrade-composable-steps-docker.yaml , then
> upgrade-non-controller.sh for computes afaics from comment 0 which is when
> this happens/is seen. 

First, the major-upgrade-composable-steps-docker step:

openstack overcloud deploy \
--templates /usr/share/openstack-tripleo-heat-templates \
--libvirt-type kvm \
--ntp-server clock.redhat.com \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/virt/network/network-environment.yaml \
-e /home/stack/virt/hostnames.yml \
-e /home/stack/virt/debug.yaml \
-e /home/stack/virt/nodes_data.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/docker.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/docker-ha.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-composable-steps-docker.yaml \
-e /home/stack/docker-osp12.yaml

Then the compute upgrade:

upgrade-non-controller.sh --upgrade compute-0

> So I see on master we still have the disable_upgrade_deployment flag [1] and
> the tripleo_upgrade_node.sh [2] is still being delivered [3] to the nodes
> based on that flag. There _is_ an appropriate "stop and disable libvirtd
> service" ansible task @ [4] but it isn't being executed during the upgrade,
> again because of that flag.
> 
> I have just posted [5] (and adding to trackers & the upstream bug for it)
> which adds the systemctl stop and disable into the tripleo_upgrade_node.sh.
> Not sure that is all that is needed though, but its a start. In particular
> I'm concerned that only puppet is being executed in that
> tripleo_upgrade_node.sh [2] and not the docker tasks (I guess those are
> happening on converge?) but lets see after testing with [5] 
> 

With the patch applied, I was no longer able to reproduce the initial error, so it looks good.
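
One way to check the expected state on a compute node after the upgrade is sketched below. It is an assumption about how such a check could be scripted, not the procedure used in this comment. The function name and the SYSTEMCTL and DOCKER overrides are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeeds only when the host libvirtd is inactive
# and the nova_libvirt container reports an "Up ..." status.
# SYSTEMCTL and DOCKER can be replaced with stubs for dry runs.
check_libvirt_handover() {
    local systemctl_cmd="${SYSTEMCTL:-systemctl}"
    local docker_cmd="${DOCKER:-docker}"
    local status

    # The host service must no longer be running.
    if "$systemctl_cmd" is-active --quiet libvirtd; then
        echo "host libvirtd is still active"
        return 1
    fi

    # Read the STATUS column of the nova_libvirt container.
    status="$("$docker_cmd" ps --all --filter name=nova_libvirt --format '{{.Status}}')"
    case "$status" in
        Up*) echo "nova_libvirt: $status" ;;
        *)   echo "nova_libvirt not healthy: ${status:-not found}"
             return 1 ;;
    esac
}
```

On the node from the description, this check would have failed at the first step, since systemctl reported libvirtd as active (running).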

Comment 3 Sofer Athlan-Guyot 2017-08-02 14:38:20 UTC
Code merged upstream.

Comment 9 errata-xmlrpc 2017-12-13 21:48:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462