| Summary: | VIP network interface in the qlbaas namespaces are not recreated when the OpenStack cluster is standby/unstandby. | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Pratik Pravin Bandarkar <pbandark> | ||||
| Component: | openstack-neutron | Assignee: | Nir Magnezi <nmagnezi> | ||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Alexander Stafeyev <astafeye> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | high | ||||||
| Version: | 6.0 (Juno) | CC: | amuller, apevec, chrisw, lhh, nyechiel, pbandark, srevivo, tfreger | ||||
| Target Milestone: | async | Keywords: | TestOnly, Unconfirmed, ZStream | ||||
| Target Release: | 6.0 (Juno) | ||||||
| Hardware: | All | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | openstack-neutron-2014.2.3-35.el7ost | Doc Type: | Bug Fix | ||||
| Doc Text: |
Cause:
Upon failover, the pcs cluster invokes ovs-cleanup on the standby node, which removes the VIP network interface in the qlbaas namespaces.
Consequence:
The VIP network interfaces in the qlbaas namespaces are not recreated when the pcs cluster fails over a second time, back to the original active node.
When the lbaas agent starts again, it detects that the haproxy process is still running and attempts to _reload_pool(), which fails with the error: 'Unable to deploy instance for pool'
Fix:
Modify the neutron-netns-cleanup script to kill haproxy, as is already done for keepalived and radvd in other cases.
Result:
The agent reconfigures the interfaces.
|
Story Points: | --- | ||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2016-07-06 18:53:07 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Attachments: |
|
||||||
Verification steps for QE:
==========================
0. Have a cluster of two lbaas agents.
1. Configure PCS to invoke neutron-netns-cleanup on the standby node, upon failover.
2. repeat the first 6 steps from comment #0 and verify VIP interfaces are in place and loadbalancing works.
3. Verify neutron-netns-cleanup did not kill any haproxy process that is not related to the lbaas agent.

According to our records, this should be resolved by openstack-neutron-2014.2.3-37.el7ost. This build is available now.

(In reply to Lon Hohberger from comment #24)
> According to our records, this should be resolved by
> openstack-neutron-2014.2.3-37.el7ost. This build is available now.

Does that mean the fix is already included with "openstack-neutron-2014.2.3-37.el7ost" ?

customer confirmed that the test fix solved the issue. Now he is waiting for errata/official fix.

(In reply to Pratik Pravin Bandarkar from comment #25)
> (In reply to Lon Hohberger from comment #24)
> > According to our records, this should be resolved by
> > openstack-neutron-2014.2.3-37.el7ost. This build is available now.
>
> Does that mean the fix is already included with
> "openstack-neutron-2014.2.3-37.el7ost" ?
>
>

That would be a yes. If you look at the 'Fixed In Version' field you'll notice that the fix is incorporated starting from openstack-neutron-2014.2.3-35.el7ost

> customer confirmed that the test fix solved the issue. Now he is waiting for
> errata/official fix.

The fix is officially included in OSP6.

Created attachment 1176009 [details]
patch
The gerrithub URL seems to be broken for an unknown reason.
Attaching the patch to this rhbz.
Code tested on latest OSP6 puddle - openstack-neutron-2014.2.3-38.el7ost.noarch Within neutron-netns-cleanup file under - /usr/lib/ocf/lib/neutron/neutron-netns-cleanup |
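Verification step 3 in the QE notes (no unrelated haproxy process may be killed by neutron-netns-cleanup) could be checked with something like the sketch below. It is not part of the attached patch. The helper names `unrelated_haproxy_pids` and `killed_pids` are made up for illustration, and the sketch assumes that every haproxy started by the lbaas agent references `/var/lib/neutron/lbaas/` on its command line, as in the `ps -ef` output in comment #0.

```shell
# Read `ps -ef` output on stdin and print the PID of every haproxy
# process whose command line does NOT reference the lbaas state
# directory (assumption: lbaas-owned instances always do).
unrelated_haproxy_pids() {
    awk '$8 ~ /(^|\/)haproxy$/ && index($0, "/var/lib/neutron/lbaas/") == 0 { print $2 }'
}

# Print every PID listed in $1 that is absent from $2.
# Both arguments are whitespace-separated PID lists.
killed_pids() {
    after=" $(echo $2) "   # collapse newlines and repeated blanks
    for pid in $1; do
        case "$after" in
            *" $pid "*) ;;          # still running
            *) echo "$pid" ;;       # gone after the failover
        esac
    done
}

# Possible usage around a failover (run as root on the standby node):
#   before=$(ps -ef | unrelated_haproxy_pids)
#   ... trigger the failover ...
#   after=$(ps -ef | unrelated_haproxy_pids)
#   killed_pids "$before" "$after"    # expected to print nothing
```

An empty result only shows that the PIDs recorded before the failover still exist afterwards; it does not prove that those processes are healthy.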
Description of problem:
VIP network interface in the qlbaas namespaces are not recreated when the OpenStack cluster is standby/unstandby

Version-Release number of selected component (if applicable):
RHOS6

How reproducible:

(1) Standby OpenStack cluster

    [root@os3ctrl03 ~]# pcs cluster standby --all

(2) Note HAProxy loadbalancer process is still running:

    [root@os3ctrl03 ~]# ps -ef | grep lbaas
    nobody 22148 1 0 10:22 ? 00:00:00 haproxy -f /var/lib/neutron/lbaas/e9adbe8b-fbe3-4c1d-9a49-42b6113de190/conf -p /var/lib/neutron/lbaas/e9adbe8b-fbe3-4c1d-9a49-42b6113de190/pid -sf 18150

(3) Unstandby cluster

    [root@os3ctrl03 ~]# pcs cluster unstandby --all

(4) Note same HAProxy loadbalancer process is respawned on a new PID

    neutron 14529 1 11 10:29 ? 00:00:00 /usr/bin/python /usr/bin/neutron-lbaas-agent --config-file /usr/share/neutron/neutron-dist.conf --config-file /usr/share/neutron/neutron-lbaas-dist.conf --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/lbaas_agent.ini --config-dir /etc/neutron/conf.d/neutron-lbaas-agent --log-file /var/log/neutron/lbaas-agent.log
    nobody 14759 1 0 10:29 ? 00:00:00 haproxy -f /var/lib/neutron/lbaas/e9adbe8b-fbe3-4c1d-9a49-42b6113de190/conf -p /var/lib/neutron/lbaas/e9adbe8b-fbe3-4c1d-9a49-42b6113de190/pid -sf 22148

(5) Check network interfaces in the namespace of the loadbalancer. Note the VIP network interface is missing!

    [root@os3ctrl03 neutron(openstack_admin)]# ip netns exec qlbaas-e9adbe8b-fbe3-4c1d-9a49-42b6113de190 ip addr
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever

(6) Note that restarting neutron-lbaas-agent does not resolve the issue. The only workaround is described below.
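The check in step (5) can be scripted so that it is easy to repeat after every standby/unstandby cycle. The following is only a sketch: `non_loopback_ifaces` and `vip_iface_present` are hypothetical helper names, and it assumes that any interface other than `lo` inside the qlbaas namespace is the VIP port.

```shell
# Read `ip addr` output on stdin and print the name of every
# interface except the loopback device.
non_loopback_ifaces() {
    awk '/^[0-9]+: / {
        name = $2
        sub(/:$/, "", name)     # drop the trailing colon
        sub(/@.*$/, "", name)   # drop a "@peer" suffix, if present
        if (name != "lo") print name
    }'
}

# Succeed if the qlbaas namespace of pool $1 has a non-loopback
# interface. Needs root, like the manual command in step (5).
vip_iface_present() {
    ip netns exec "qlbaas-$1" ip addr | non_loopback_ifaces | grep -q .
}

# Possible usage with the pool ID from this report:
#   vip_iface_present e9adbe8b-fbe3-4c1d-9a49-42b6113de190 || echo "VIP interface missing"
```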
_______________________________________________
Steps to workaround:

(1) Stop neutron-lbaas-agent

    pcs resource disable neutron-lbaas-agent-clone

(2) Kill running haproxy loadbalancer processes

    [root@os3ctrl03 neutron(openstack_admin)]# ps -ef | grep lbaas
    nobody 9794 1 0 10:37 ? 00:00:00 haproxy -f /var/lib/neutron/lbaas/e9adbe8b-fbe3-4c1d-9a49-42b6113de190/conf -p /var/lib/neutron/lbaas/e9adbe8b-fbe3-4c1d-9a49-42b6113de190/pid -sf 14759
    [root@os3ctrl03 neutron(openstack_admin)]# kill 9794
    [root@os3ctrl03 neutron(openstack_admin)]# ps -ef | grep lbaas
    [root@os3ctrl03 neutron(openstack_admin)]#

(3) Start neutron-lbaas-agent - agent will recreate haproxy processes and VIP interface in the qlbaas namespaces

    [root@os3ctrl03 neutron(openstack_admin)]# pcs resource enable neutron-lbaas-agent-clone

(4) Check the namespace for network interfaces in the haproxy loadbalancer. Note the VIP network interface has returned!

    [root@os3ctrl03 neutron(openstack_admin)]# ps -ef | grep lbaas
    neutron 15123 1 8 10:49 ? 00:00:01 /usr/bin/python /usr/bin/neutron-lbaas-agent --config-file /usr/share/neutron/neutron-dist.conf --config-file /usr/share/neutron/neutron-lbaas-dist.conf --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/lbaas_agent.ini --config-dir /etc/neutron/conf.d/neutron-lbaas-agent --log-file /var/log/neutron/lbaas-agent.log
    nobody 15762 1 0 10:49 ? 00:00:00 haproxy -f /var/lib/neutron/lbaas/e9adbe8b-fbe3-4c1d-9a49-42b6113de190/conf -p /var/lib/neutron/lbaas/e9adbe8b-fbe3-4c1d-9a49-42b6113de190/pid
    [root@os3ctrl03 neutron(openstack_admin)]# ip netns exec qlbaas-e9adbe8b-fbe3-4c1d-9a49-42b6113de190 ip addr
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    107: tap65d0c08e-e3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
        link/ether fa:16:3e:bf:1d:19 brd ff:ff:ff:ff:ff:ff
        inet 192.168.33.33/24 brd 192.168.33.255 scope global tap65d0c08e-e3
           valid_lft forever preferred_lft forever
        inet6 fe80::f816:3eff:febf:1d19/64 scope link
           valid_lft forever preferred_lft forever

Actual results:
VIP network interface in the qlbaas namespaces are not recreated when the OpenStack cluster is standby/unstandby.

Expected results:
VIP network interface in the qlbaas namespaces should get recreated when the OpenStack cluster is standby/unstandby.

Additional info:
lbaas is configured in pacemaker.
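The manual workaround could be wrapped in a small script along these lines. This is a rough sketch and not the shipped fix: `lbaas_pids_from_pidfiles` and `lbaas_workaround` are illustrative names, and it assumes each pool keeps its haproxy PID in `/var/lib/neutron/lbaas/<pool-id>/pid`, which is the path passed with `-p` in the `ps` output above. Instead of grepping `ps`, it reads those pid files and only kills a PID whose command name is still `haproxy`, to guard against stale pid files.

```shell
# Print the PIDs recorded in every <state_dir>/<pool-id>/pid file.
# state_dir defaults to the path seen in the ps output of this report.
lbaas_pids_from_pidfiles() {
    state_dir=${1:-/var/lib/neutron/lbaas}
    for pidfile in "$state_dir"/*/pid; do
        [ -r "$pidfile" ] || continue   # no pools, or unreadable file
        cat "$pidfile"
    done
}

# Hypothetical wrapper around workaround steps (1) to (3).
lbaas_workaround() {
    pcs resource disable neutron-lbaas-agent-clone || return 1
    for pid in $(lbaas_pids_from_pidfiles); do
        # Skip stale pid files: only kill if the PID is still haproxy.
        if [ "$(ps -p "$pid" -o comm= 2>/dev/null)" = "haproxy" ]; then
            kill "$pid"
        fi
    done
    pcs resource enable neutron-lbaas-agent-clone
}
```

The sketch does not wait for the agent to stop before killing haproxy; on a busy cluster a short pause or a status check between the steps may be needed.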