
Bug 1279652

Summary: Overcloud instances lose floating IP connectivity during update from 7.0 to 7.1
Product: Red Hat OpenStack
Reporter: Marius Cornea <mcornea>
Component: rhosp-director
Assignee: James Slagle <jslagle>
Status: CLOSED ERRATA
QA Contact: Marius Cornea <mcornea>
Severity: high
Priority: high
Version: 7.0 (Kilo)
CC: augol, calfonso, dmacpher, kbasil, mburns, ohochman, rhel-osp-director-maint
Target Milestone: y2
Keywords: TestOnly, Triaged
Target Release: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Doc Text:
Orphaned OpenStack Networking L3 agent keepalived processes were left running by OpenStack Networking's "netns-cleanup" script. As a result, OpenStack Networking tenant router failover did not work during the Controller node update in the Overcloud. This fix ensures that the keepalived processes are cleaned up properly during the Controller node update, so tenant router failover works normally and the high availability of the tenant network is preserved.
Type: Bug
Last Closed: 2015-12-21 16:58:02 UTC
Attachments:
Description            Flags
controller0 sosreport  none
controller1 sosreport  none
controller2            none
update.yaml            none

Description Marius Cornea 2015-11-09 23:49:27 UTC
Description of problem:
During an update from 7.0 to 7.1 on an HA deployment (3 controllers + 1 compute node) with network isolation, the overcloud instances lose connectivity via their floating IPs.

Steps to Reproduce:
1. Deploy 7.0:
openstack overcloud deploy --templates ~/templates-7.0/my-overcloud -e ~/templates-7.0/my-overcloud/environments/network-isolation.yaml -e ~/templates-7.0/network-environment.yaml  --control-scale 3 --compute-scale 1 --ntp-server clock.redhat.com --libvirt-type qemu

2. Create an external network, a tenant network, and a router in the overcloud.

3. Boot an instance and assign it a floating IP from the external network.

4. Update the undercloud to 7.1.

5. Update the overcloud stack:
openstack overcloud update stack overcloud -i --templates ~/templates-7.1/my-overcloud -e ~/templates-7.1/my-overcloud/overcloud-resource-registry-puppet.yaml -e ~/templates-7.1/my-overcloud/environments/network-isolation.yaml -e ~/templates-7.1/network-environment.yaml  -e ~/templates-7.1/update.yaml

6. Check connectivity to the instance's floating IP.

Actual results:
There's no connectivity during the update.

Expected results:
The instance keeps connectivity during the update.

Additional info:
During the update, the L3 agents hosting the tenant router report inconsistent ha_state values: first active on 2 nodes, then active on all 3 nodes.

At the end, the update fails and only a single L3 agent is still alive, but its ha_state is standby.

stack@instack:~>>> neutron l3-agent-list-hosting-router tenant-router
+--------------------------------------+------------------------------------+----------------+-------+----------+
| id                                   | host                               | admin_state_up | alive | ha_state |
+--------------------------------------+------------------------------------+----------------+-------+----------+
| aef76e90-1157-46cf-a805-2059975639f8 | overcloud-controller-1.localdomain | True           | :-)   | active   |
| 2b195c24-65a1-4923-a506-5dba365d7d5c | overcloud-controller-0.localdomain | True           | :-)   | standby  |
| f8e2331e-fc16-4c7b-a8f3-4a51fd0e3406 | overcloud-controller-2.localdomain | True           | :-)   | active   |
+--------------------------------------+------------------------------------+----------------+-------+----------+
stack@instack:~>>> neutron l3-agent-list-hosting-router tenant-router
+--------------------------------------+------------------------------------+----------------+-------+----------+
| id                                   | host                               | admin_state_up | alive | ha_state |
+--------------------------------------+------------------------------------+----------------+-------+----------+
| aef76e90-1157-46cf-a805-2059975639f8 | overcloud-controller-1.localdomain | True           | :-)   | active   |
| 2b195c24-65a1-4923-a506-5dba365d7d5c | overcloud-controller-0.localdomain | True           | :-)   | active   |
| f8e2331e-fc16-4c7b-a8f3-4a51fd0e3406 | overcloud-controller-2.localdomain | True           | :-)   | active   |
+--------------------------------------+------------------------------------+----------------+-------+----------+
stack@instack:~>>> neutron l3-agent-list-hosting-router tenant-router
+--------------------------------------+------------------------------------+----------------+-------+----------+
| id                                   | host                               | admin_state_up | alive | ha_state |
+--------------------------------------+------------------------------------+----------------+-------+----------+
| aef76e90-1157-46cf-a805-2059975639f8 | overcloud-controller-1.localdomain | True           | :-)   | standby  |
| 2b195c24-65a1-4923-a506-5dba365d7d5c | overcloud-controller-0.localdomain | True           | xxx   | active   |
| f8e2331e-fc16-4c7b-a8f3-4a51fd0e3406 | overcloud-controller-2.localdomain | True           | xxx   | standby  |
+--------------------------------------+------------------------------------+----------------+-------+----------+
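The three snapshots above can be checked mechanically. The following is a minimal sketch, not part of the original report: a hypothetical bash function `summarize_ha_state` that reads the `neutron l3-agent-list-hosting-router` table on stdin and counts agents that are both alive (`:-)`) and `active`. It assumes the column layout shown above; the `SPLIT-BRAIN`, `NO-ACTIVE` and `OK` labels are illustrative, not Neutron terminology.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not a Neutron tool: summarize the HA state of one
# router from `neutron l3-agent-list-hosting-router <router>` output on stdin.
# Assumes the column order shown above: id | host | admin_state_up | alive | ha_state
summarize_ha_state() {
  awk -F'|' '
    BEGIN { up = 0; active = 0 }
    # Data rows split into 7 fields on "|" (empty first and last fields);
    # border lines contain no "|" at all, and the header row has "id" in field 2.
    NF == 7 && $2 !~ /^ *id *$/ {
      alive = $5; state = $6
      gsub(/ /, "", alive); gsub(/ /, "", state)
      if (alive == ":-)") {
        up++
        if (state == "active") active++
      }
    }
    END {
      printf "alive=%d active=%d ", up, active
      if (active > 1)       print "SPLIT-BRAIN"   # more than one live agent claims master
      else if (active == 0) print "NO-ACTIVE"     # no live agent is master
      else                  print "OK"
    }'
}
```

Usage: `neutron l3-agent-list-hosting-router tenant-router | summarize_ha_state`. Applied to the snapshots above, it would print `alive=3 active=2 SPLIT-BRAIN` for the first, `alive=3 active=3 SPLIT-BRAIN` for the second, and `alive=1 active=0 NO-ACTIVE` for the last, where the only alive agent is in standby.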

Comment 2 Marius Cornea 2015-11-09 23:54:06 UTC
Created attachment 1092013 [details]
controller0 sosreport

Comment 3 Marius Cornea 2015-11-09 23:56:02 UTC
Created attachment 1092014 [details]
controller1 sosreport

Comment 4 Marius Cornea 2015-11-09 23:57:28 UTC
Created attachment 1092015 [details]
controller2

Comment 5 Marius Cornea 2015-11-09 23:58:34 UTC
Created attachment 1092019 [details]
update.yaml

Comment 7 Marius Cornea 2015-12-15 13:49:26 UTC
Results of a ping during the update:

--- 172.16.23.111 ping statistics ---
5409 packets transmitted, 5393 received, 0% packet loss, time 5413793ms
rtt min/avg/max/mdev = 0.815/1.727/7.237/0.378 ms
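The summary line reports 0% loss, but 5409 - 5393 = 16 probes went unanswered, about 0.3%. At roughly one probe per second (5413793 ms for 5409 probes), that corresponds to about 16 seconds without replies over the whole update, not necessarily contiguous. A minimal sketch that recomputes this from the summary line, assuming the ping summary format shown above (the helper name `exact_loss` is hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helper: recompute packet loss from a ping summary line such as
# "5409 packets transmitted, 5393 received, 0% packet loss, time 5413793ms".
# Assumes the field layout shown above (sent in field 1, received in field 4,
# total time as the last field with an "ms" suffix).
exact_loss() {
  awk '/packets transmitted/ {
    sent = $1; recv = $4; lost = sent - recv
    t = $NF; sub(/ms$/, "", t)                # total run time in ms
    # Estimated time without replies = lost probes x average seconds per probe.
    printf "lost=%d loss=%.2f%% approx_outage=%.0fs\n", lost, 100 * lost / sent, lost * (t / 1000) / sent
  }'
}
```

For the statistics above, `echo "5409 packets transmitted, 5393 received, 0% packet loss, time 5413793ms" | exact_loss` prints `lost=16 loss=0.30% approx_outage=16s`.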

Comment 8 James Slagle 2015-12-15 21:20:42 UTC
Hi, the doc text for this one is the same as for https://bugzilla.redhat.com/show_bug.cgi?id=1285079, so I've copied it here as well.

Comment 10 errata-xmlrpc 2015-12-21 16:58:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2015:2651