Description of problem:
The customer is running a neutron-ha setup with Pacemaker, and we suspect that the current resource configuration does not include the neutron-netns-cleanup scripts. The customer knows how to remove the duplicate namespaces manually, but we are looking for a better solution for their production environment.

Some ideas coming from them:
- reconfigure Pacemaker to invoke neutron-netns-cleanup when failing over,
- enable router_delete_namespaces = True in the L3 agents,
- or both combined.

Usually we expect this kind of stop to happen under three conditions:
1) When the node is taken out of the cluster.
2) When the neutron-agent resources are moved off the node.
3) If the neutron-netns-cleanup script is installed as a service, it cleans up all netns namespaces during reboot/poweroff/halt or when leaving the programmed runlevels.

We have the following doubts:
- Will it work if the l3_agent is failed over to a different node, or only when the router is deleted? When the l3_agent is failed over, there is no l3-agent process on the failed node, so I am not sure whether it will clean the node or not.

Packages involved:
openstack-neutron-2013.2.3-16.el6ost.noarch
openstack-neutron-openvswitch-2013.2.3-16.el6ost.noarch
iproute-2.6.32-130.el6ost.netns.3.x86_64
kernel-2.6.32-504.el6.x86_64

How reproducible:
After a Pacemaker neutron node failover, the l3-agents on the active server shut down and the l3-agents on the passive server started. It then switched back 20 minutes later. We are causing a failover after deleting the namespaces on the passive node in order to test whether this is the root cause.

Additional info:
We need to provide feedback to the customer to test in the preproduction environment before requesting a maintenance window to apply the change in production.
I was able to reproduce this issue in my environment. A2.
[root@mac848f69fbc4c3 bin(openstack_admin)]# rpm -qa | grep neutron
python-neutron-2014.1.3-11.el7ost.noarch
python-neutronclient-2.3.4-3.el7ost.noarch
openstack-neutron-openvswitch-2014.1.3-11.el7ost.noarch
openstack-neutron-ml2-2014.1.3-11.el7ost.noarch
openstack-neutron-2014.1.3-11.el7ost.noarch

Currently the problem is in the puppet modules.
1. A reboot cleans the namespaces.
2. A failover, or moving neutron resources from one cluster node to another, leaves namespaces undeleted, so we end up with several nodes holding duplicated namespaces.

It's a problem with the deployment: it clones the neutron-*-cleanup resources across nodes to make things go faster, but then you need this kind of manual intervention. We need to ask the deployers not to clone the neutron-*-cleanup resources across all nodes.

--------------
 Clone: neutron-ovs-cleanup-clone
  Resource: neutron-ovs-cleanup (class=ocf provider=neutron type=OVSCleanup)
   Operations: start interval=0s timeout=40 (neutron-ovs-cleanup-start-timeout-40)
               stop interval=0s timeout=300 (neutron-ovs-cleanup-stop-timeout-300)
               monitor interval=30s (neutron-ovs-cleanup-monitor-interval-30s)
 Clone: neutron-netns-cleanup-clone
  Resource: neutron-netns-cleanup (class=ocf provider=neutron type=Net)
--------------
This is the wrong configuration.
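For testing in preproduction, it may help to confirm which namespaces are actually duplicated after a failover. The following is only a rough sketch, not part of neutron or Pacemaker: it assumes you have already collected the output of `ip netns list` from each cluster node by some means, and that the namespaces of interest use the usual `qrouter-` and `qdhcp-` prefixes. The function names and the node names in the example are hypothetical.

```python
from collections import defaultdict

# Prefixes of namespaces created by the neutron L3 and DHCP agents (assumption:
# only these are relevant when looking for leftovers after a failover).
NEUTRON_PREFIXES = ("qrouter-", "qdhcp-")


def parse_ip_netns_list(output):
    """Turn the text printed by `ip netns list` into a list of namespace names.

    Newer iproute versions append "(id: N)" after the name, so only the first
    whitespace-separated token of each non-empty line is kept.
    """
    return [line.split()[0] for line in output.splitlines() if line.strip()]


def find_duplicate_namespaces(namespaces_by_node):
    """Return {namespace: [nodes]} for neutron namespaces seen on 2+ nodes."""
    nodes_by_ns = defaultdict(set)
    for node, namespaces in namespaces_by_node.items():
        for ns in namespaces:
            if ns.startswith(NEUTRON_PREFIXES):
                nodes_by_ns[ns].add(node)
    return {
        ns: sorted(nodes)
        for ns, nodes in nodes_by_ns.items()
        if len(nodes) > 1
    }


if __name__ == "__main__":
    # Hypothetical sample data; replace with real output gathered per node.
    collected = {
        "node-a": parse_ip_netns_list("qrouter-1111\nqdhcp-2222\n"),
        "node-b": parse_ip_netns_list("qrouter-1111\n"),
    }
    for ns, nodes in find_duplicate_namespaces(collected).items():
        print(ns, "is present on:", ", ".join(nodes))
```

Any namespace reported here exists on more than one node and is a candidate for the manual cleanup discussed above; the sketch only reports and does not delete anything.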
Moving this to ofi, so I can fix it.
Miguel is not on PTO today, so removing myself from needinfo list.
I don't think this is going to be fixed in quickstack, please reopen if needed.