Bug 1477111 - OSP11 -> OSP12 upgrade: libvirtd service is running on host after upgrade and nova_libvirt container keeps restarting
Status: ON_QA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ga
Target Release: 12.0 (Pike)
Assigned To: Emilien Macchi
QA Contact: Marius Cornea
Keywords: Triaged
Depends On:
Blocks: 1399762
Reported: 2017-08-01 04:46 EDT by Marius Cornea
Modified: 2017-09-05 20:38 EDT
CC List: 10 users

Fixed In Version: openstack-tripleo-heat-templates-7.0.0-0.20170805163046.el7ost
Type: Bug

External Trackers
Tracker ID Priority Status Summary Last Updated
Launchpad 1707926 None None None 2017-08-01 10:29 EDT
OpenStack gerrit 489619 None None None 2017-08-01 10:31 EDT

Description Marius Cornea 2017-08-01 04:46:33 EDT
Description of problem:
OSP11 -> OSP12 upgrade: the libvirtd service is still running on the host after the upgrade and the nova_libvirt container keeps restarting.

After running upgrade-non-controller.sh --upgrade compute-0 we can see the following on the compute node:

[root@compute-0 heat-admin]# docker ps
CONTAINER ID        IMAGE                                                                   COMMAND             CREATED             STATUS                         PORTS               NAMES
97b653296f64        192.168.24.1:8787/rhosp12/openstack-nova-compute-docker:2017-07-26.10   "kolla_start"       15 hours ago        Up 9 minutes                                       nova_compute
6dd2914c9bee        192.168.24.1:8787/rhosp12/openstack-iscsid-docker:2017-07-26.10         "kolla_start"       15 hours ago        Up 9 minutes                                       iscsid
e5d4c5cd5ec7        192.168.24.1:8787/rhosp12/openstack-nova-libvirt-docker:2017-07-26.10   "kolla_start"       15 hours ago        Restarting (1) 2 minutes ago                       nova_libvirt

[root@compute-0 heat-admin]# systemctl status libvirtd
● libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2017-08-01 08:33:42 UTC; 7min ago
     Docs: man:libvirtd(8)
           http://libvirt.org
 Main PID: 1787 (libvirtd)
   Memory: 2.0M
   CGroup: /system.slice/libvirtd.service
           └─1787 /usr/sbin/libvirtd

Aug 01 08:40:49 compute-0 libvirtd[1787]: 2017-08-01 08:40:49.975+0000: 1787: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
Aug 01 08:40:49 compute-0 libvirtd[1787]: 2017-08-01 08:40:49.975+0000: 1787: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
Aug 01 08:40:49 compute-0 libvirtd[1787]: 2017-08-01 08:40:49.975+0000: 1787: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
Aug 01 08:40:49 compute-0 libvirtd[1787]: 2017-08-01 08:40:49.975+0000: 1787: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
Aug 01 08:40:49 compute-0 libvirtd[1787]: 2017-08-01 08:40:49.975+0000: 1787: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
Aug 01 08:40:49 compute-0 libvirtd[1787]: 2017-08-01 08:40:49.975+0000: 1787: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
Aug 01 08:40:49 compute-0 libvirtd[1787]: 2017-08-01 08:40:49.975+0000: 1787: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
Aug 01 08:40:49 compute-0 libvirtd[1787]: 2017-08-01 08:40:49.975+0000: 1787: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
Aug 01 08:40:49 compute-0 libvirtd[1787]: 2017-08-01 08:40:49.975+0000: 1787: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
Aug 01 08:40:49 compute-0 libvirtd[1787]: 2017-08-01 08:40:49.975+0000: 1787: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-7.0.0-0.20170721174554.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP11
2. Upgrade to OSP12
3. Check containers on compute node

Actual results:
nova_libvirt is restarting because libvirtd service is running on the host:
Running command: '/usr/sbin/libvirtd --config /etc/libvirt/libvirtd.conf'
2017-08-01 08:40:41.026+0000: 16792: info : libvirt version: 3.2.0, package: 14.el7 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2017-06-21-11:12:42, x86-037.build.eng.bos.redhat.com)
2017-08-01 08:40:41.026+0000: 16792: info : hostname: compute-0.localdomain
2017-08-01 08:40:41.026+0000: 16792: error : virPidFileAcquirePath:422 : Failed to acquire pid file '/var/run/libvirtd.pid': Resource temporarily unavailable


Expected results:
The libvirtd service on the host is stopped and disabled, and the nova_libvirt container starts normally.
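
A quick way to spot this symptom on a compute node is to filter the docker ps output for containers stuck in the Restarting state. The following is a minimal sketch and not part of the original report: the function name restarting_containers and the "|" field separator are assumptions chosen for illustration.

```shell
#!/bin/bash
# Hypothetical helper (not from the bug report): print the names of
# containers whose status starts with "Restarting".
# Expects "name|status" lines on stdin, one container per line.
restarting_containers() {
    awk -F '|' '$2 ~ /^Restarting/ { print $1 }'
}

# On a compute node the input would come from docker, for example:
#   docker ps --format '{{.Names}}|{{.Status}}' | restarting_containers
```

Reading from stdin keeps the filter independent of docker itself, so it can be checked against saved output such as the listing in the description.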

Additional info:
Comment 1 marios 2017-08-01 10:29:05 EDT
o/ @mcornea - marking as triaged and first pass here, questions please:

   1. Can you confirm what is in your roles_data.yaml? In particular, do you have disable_upgrade_deployment set, and for which roles please?

   2. Can you confirm your upgrade workflow and env files (i.e. environments/major-upgrade-composable-steps-docker.yaml, then upgrade-non-controller.sh for computes)? Afaics from comment 0, that is when this happens/is seen.
   
So I see on master we still have the disable_upgrade_deployment flag [1] and the tripleo_upgrade_node.sh [2] is still being delivered [3] to the nodes based on that flag. There _is_ an appropriate "stop and disable libvirtd service" ansible task @ [4] but it isn't being executed during the upgrade, again because of that flag.

I have just posted [5] (and am adding it to the trackers & the upstream bug for it), which adds the systemctl stop and disable into the tripleo_upgrade_node.sh. Not sure that is all that is needed though, but it's a start. In particular I'm concerned that only puppet is being executed in that tripleo_upgrade_node.sh [2] and not the docker tasks (I guess those are happening on converge?) but let's see after testing with [5].

thanks, marios 

[1] https://github.com/openstack/tripleo-heat-templates/blob/5f313f27c9120b0e3bac905d155c2b6d234d27bb/roles/Compute.yaml#L13 
   
[2] https://github.com/openstack/tripleo-heat-templates/blob/29a8a46d9833f095d503941d32ec500f63abf675/extraconfig/tasks/tripleo_upgrade_node.sh

[3] https://github.com/openstack/tripleo-heat-templates/blob/c54e9b681b44ab962c4503cf1d88c44b683a972e/puppet/major_upgrade_steps.j2.yaml#L41

[4] https://github.com/openstack/tripleo-heat-templates/blob/a8442ba386082cef7188c3ff8001f8995b1d7ff7/docker/services/nova-libvirt.yaml#L181-L184

[5] https://review.openstack.org/489619
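
For illustration, the stop-and-disable step described above could be sketched in shell as follows. This is not the content of the patch in [5]. The function name stop_and_disable_host_libvirtd and the SYSTEMCTL override variable are hypothetical; the override exists only so the function can be exercised without touching a real host.

```shell
#!/bin/bash
# Sketch only, under stated assumptions; not the actual change in [5].
# SYSTEMCTL is a hypothetical override so the logic can be dry-run.
SYSTEMCTL=${SYSTEMCTL:-systemctl}

stop_and_disable_host_libvirtd() {
    # Act only if the host unit is enabled or currently running, so the
    # step is a no-op on nodes where libvirtd is already out of the way.
    if $SYSTEMCTL is-enabled --quiet libvirtd || $SYSTEMCTL is-active --quiet libvirtd; then
        $SYSTEMCTL stop libvirtd
        $SYSTEMCTL disable libvirtd
    fi
}

# Usage on a compute node (as root):
#   stop_and_disable_host_libvirtd
```

Stopping before disabling releases the pid file held by the host daemon, which is what the containerized libvirtd fails to acquire in the log above.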
Comment 2 Marius Cornea 2017-08-02 03:36:33 EDT
(In reply to marios from comment #1)
> o/ @mcornea - marking as triaged and first pass here, questions please:
> 
>    1. can you confirm what is in your roles_data.yaml? in particular do you
> have disable_upgrade_deployment set and for which roles please?

I was using the default roles_data.yaml provided by tht, so disable_upgrade_deployment was set for the compute and object store roles.

>    2. can you confirm your upgrade workflow and env files (i.e.
> environments/major-upgrade-composable-steps-docker.yaml , then
> upgrade-non-controller.sh for computes afaics from comment 0 which is when
> this happens/is seen. 

First, the major-upgrade-composable-steps-docker step:

openstack overcloud deploy \
--templates /usr/share/openstack-tripleo-heat-templates \
--libvirt-type kvm \
--ntp-server clock.redhat.com \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/virt/network/network-environment.yaml \
-e /home/stack/virt/hostnames.yml \
-e /home/stack/virt/debug.yaml \
-e /home/stack/virt/nodes_data.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/docker.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/docker-ha.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-composable-steps-docker.yaml \
-e /home/stack/docker-osp12.yaml

Then the compute upgrade:

upgrade-non-controller.sh --upgrade compute-0

> So I see on master we still have the disable_upgrade_deployment flag [1] and
> the tripleo_upgrade_node.sh [2] is still being delivered [3] to the nodes
> based on that flag. There _is_ an appropriate "stop and disable libvirtd
> service" ansible task @ [4] but it isn't being executed during the upgrade,
> again because of that flag.
> 
> I have just posted [5] (and adding to trackers & the upstream bug for it)
> which adds the systemctl stop and disable into the tripleo_upgrade_node.sh.
> Not sure that is all that is needed though, but it's a start. In particular
> I'm concerned that only puppet is being executed in that
> tripleo_upgrade_node.sh [2] and not the docker tasks (I guess those are
> happening on converge?) but let's see after testing with [5]
> 

With the patch applied, I was no longer able to reproduce the initial error, so it looks good.
Comment 3 Sofer Athlan-Guyot 2017-08-02 10:38:20 EDT
Code merged upstream.
