Bug 1022980
| Summary: | quantum is creating dup NAT rules when under stress | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Jaroslav Henner <jhenner> |
| Component: | openstack-quantum | Assignee: | Terry Wilson <twilson> |
| Status: | CLOSED NEXTRELEASE | QA Contact: | Ofer Blaut <oblaut> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.0 | CC: | adarazs, chrisw, danken, ggillies, jhenner, lpeer, psedlak, sclewis, twilson, yeylon |
| Target Milestone: | z4 | Keywords: | TestBlocker, ZStream |
| Target Release: | 3.0 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | network | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1038737 | Environment: | |
| Last Closed: | 2014-01-22 14:59:25 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
iptables -tnat -S
-P PREROUTING ACCEPT
-P POSTROUTING ACCEPT
-P OUTPUT ACCEPT
-N quantum-l3-agent-OUTPUT
-N quantum-l3-agent-POSTROUTING
-N quantum-l3-agent-PREROUTING
-N quantum-l3-agent-float-snat
-N quantum-l3-agent-snat
-N quantum-postrouting-bottom
-A PREROUTING -j quantum-l3-agent-PREROUTING
-A POSTROUTING -j quantum-l3-agent-POSTROUTING
-A POSTROUTING -j quantum-postrouting-bottom
-A OUTPUT -j quantum-l3-agent-OUTPUT
-A quantum-l3-agent-OUTPUT -d 10.1.2.203/32 -j DNAT --to-destination 172.16.0.10
-A quantum-l3-agent-OUTPUT -d 10.1.2.207/32 -j DNAT --to-destination 172.16.0.11
-A quantum-l3-agent-OUTPUT -d 10.1.2.198/32 -j DNAT --to-destination 172.16.0.11
-A quantum-l3-agent-OUTPUT -d 10.1.2.198/32 -j DNAT --to-destination 172.16.0.10
-A quantum-l3-agent-OUTPUT -d 10.1.2.202/32 -j DNAT --to-destination 172.16.0.11
-A quantum-l3-agent-OUTPUT -d 10.1.2.196/32 -j DNAT --to-destination 172.16.0.12
-A quantum-l3-agent-OUTPUT -d 10.1.2.237/32 -j DNAT --to-destination 172.16.0.16
-A quantum-l3-agent-POSTROUTING ! -i qg-19f7ff97-cc ! -o qg-19f7ff97-cc -m conntrack ! --ctstate DNAT -j ACCEPT
-A quantum-l3-agent-PREROUTING -d 169.254.169.254/32 -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 9697
-A quantum-l3-agent-PREROUTING -d 10.1.2.203/32 -j DNAT --to-destination 172.16.0.10
-A quantum-l3-agent-PREROUTING -d 10.1.2.207/32 -j DNAT --to-destination 172.16.0.11
-A quantum-l3-agent-PREROUTING -d 10.1.2.198/32 -j DNAT --to-destination 172.16.0.11
-A quantum-l3-agent-PREROUTING -d 10.1.2.198/32 -j DNAT --to-destination 172.16.0.10
-A quantum-l3-agent-PREROUTING -d 10.1.2.202/32 -j DNAT --to-destination 172.16.0.11
-A quantum-l3-agent-PREROUTING -d 10.1.2.196/32 -j DNAT --to-destination 172.16.0.12
-A quantum-l3-agent-PREROUTING -d 10.1.2.237/32 -j DNAT --to-destination 172.16.0.16
-A quantum-l3-agent-PREROUTING -d 172.16.0.1/32 -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 9697
-A quantum-l3-agent-float-snat -s 172.16.0.10/32 -j SNAT --to-source 10.1.2.203
-A quantum-l3-agent-float-snat -s 172.16.0.11/32 -j SNAT --to-source 10.1.2.207
-A quantum-l3-agent-float-snat -s 172.16.0.11/32 -j SNAT --to-source 10.1.2.198
-A quantum-l3-agent-float-snat -s 172.16.0.10/32 -j SNAT --to-source 10.1.2.198
-A quantum-l3-agent-float-snat -s 172.16.0.11/32 -j SNAT --to-source 10.1.2.202
-A quantum-l3-agent-float-snat -s 172.16.0.12/32 -j SNAT --to-source 10.1.2.196
-A quantum-l3-agent-float-snat -s 172.16.0.16/32 -j SNAT --to-source 10.1.2.237
-A quantum-l3-agent-snat -j quantum-l3-agent-float-snat
-A quantum-l3-agent-snat -s 172.16.0.0/16 -j SNAT --to-source 10.1.2.204
-A quantum-postrouting-bottom -j quantum-l3-agent-snat

After a restart of the L3 agent, all the NAT rules disappear and only the correct ones seem to reappear a few seconds later. Therefore I think the rules are stored correctly in the database, but the L3 agent misses some delete event or fails to delete the old rules.

I had to modify the test script somewhat to get it to run on my setup.
Diff:

--- stress 2013-08-06 02:32:51.377999955 -0500
+++ stress_terry 2013-08-06 02:32:47.614999956 -0500
@@ -1,7 +1,6 @@
 #!/bin/bash -x
 vm_count=5
-nova_boot_cmdline="--flavor=m1.tiny --image=d09e66b8-6ddb-468c-912e-a7acd34a8d32 floating_bug"
-floatingip_create_cmdline="notrouted-shared"
+nova_boot_cmdline="--flavor=m1.tiny --image=cirros floating_bug"
 
 vm_ids=""
 fips=""
@@ -16,7 +15,7 @@
 
 function create_assing_fips {
     for vm_id in $vm_ids; do
-        fip=`nova floating-ip-create | awk '/ None / { print $2 }'`
+        fip=`nova floating-ip-create public | awk '/ None / { print $2 }'`
         echo $fip
         fips="$fips $fip"
     done
@@ -24,9 +23,10 @@
 
 function randomly_assign_fips {
     newline=$'\n'
-    shuffled_vm_ids=`echo $vm_ids | replace ' ' "$newline" | sort -r`
-    for vm_id in $shuffled_vm_ids; do
-        nova add-floating-ip "$vm_id" "$fip"
+    shuffled_vm_ids=(`echo $vm_ids | replace ' ' "$newline" | sort -R`)
+    fips_arr=($fips)
+    for ((i=0;i<$vm_count;i++));do
+        nova add-floating-ip "${shuffled_vm_ids[${i}]}" "${fips_arr[${i}]}"
     done
 }
 
@@ -34,6 +34,7 @@
     for fip in $fips; do
         nova floating-ip-delete "$fip"
     done
+    fips=""
 }
 
 

I have run this many, many times now and not been able to reproduce it (against openstack-quantum-2013.1.4-3.el6ost.noarch) installed via packstack --allinone.
My output after running:

[root@rhel-6 ~(keystone_demo)]# ip netns exec qrouter-44384dbc-81dc-44e3-8c30-7ed6301b1873 iptables -tnat -S|grep quantum-l3-agent-OUTPUT
-N quantum-l3-agent-OUTPUT
-A OUTPUT -j quantum-l3-agent-OUTPUT
-A quantum-l3-agent-OUTPUT -d 172.24.4.232/32 -j DNAT --to-destination 10.0.0.4
-A quantum-l3-agent-OUTPUT -d 172.24.4.233/32 -j DNAT --to-destination 10.0.0.2
-A quantum-l3-agent-OUTPUT -d 172.24.4.234/32 -j DNAT --to-destination 10.0.0.7
-A quantum-l3-agent-OUTPUT -d 172.24.4.235/32 -j DNAT --to-destination 10.0.0.6
-A quantum-l3-agent-OUTPUT -d 172.24.4.236/32 -j DNAT --to-destination 10.0.0.5

jhenner: I notice that the example shown shows iptables -tnat -S to get the list of iptables rules without an ip netns. Do you have network namespace support disabled, or did you leave that out in the name of brevity? I'm pretty sure we only support using network namespaces now. Also, can you see if you can replicate this with version openstack-quantum-2013.1.4-3.el6ost.noarch on your setup?

Maybe this could be related to (already fixed for Havana) https://bugzilla.redhat.com/show_bug.cgi?id=971518 - https://review.openstack.org/#/c/33254/ ?

Looking at the attached launchpad bug: https://bugs.launchpad.net/neutron/+bug/1191768 it looks like someone reported afterwards that they were still seeing the issue.

(In reply to Terry Wilson from comment #6)
> I had to modify the test script somewhat to get it to run on my setup.
> Diff:
>
> --- stress 2013-08-06 02:32:51.377999955 -0500
> +++ stress_terry 2013-08-06 02:32:47.614999956 -0500
> @@ -1,7 +1,6 @@
> #!/bin/bash -x
> vm_count=5
> -nova_boot_cmdline="--flavor=m1.tiny
> --image=d09e66b8-6ddb-468c-912e-a7acd34a8d32 floating_bug"
> -floatingip_create_cmdline="notrouted-shared"
> +nova_boot_cmdline="--flavor=m1.tiny --image=cirros floating_bug"
>
> vm_ids=""
> fips=""
> @@ -16,7 +15,7 @@
>
> function create_assing_fips {
> for vm_id in $vm_ids; do
> - fip=`nova floating-ip-create | awk '/ None / { print $2 }'`
> + fip=`nova floating-ip-create public | awk '/ None / { print $2 }'`
> echo $fip
> fips="$fips $fip"
> done
> @@ -24,9 +23,10 @@
>
> function randomly_assign_fips {
> newline=$'\n'
> - shuffled_vm_ids=`echo $vm_ids | replace ' ' "$newline" | sort -r`
> - for vm_id in $shuffled_vm_ids; do
> - nova add-floating-ip "$vm_id" "$fip"
> + shuffled_vm_ids=(`echo $vm_ids | replace ' ' "$newline" | sort -R`)
> + fips_arr=($fips)
> + for ((i=0;i<$vm_count;i++));do
> + nova add-floating-ip "${shuffled_vm_ids[${i}]}" "${fips_arr[${i}]}"
> done
> }
>
> @@ -34,6 +34,7 @@
> for fip in $fips; do
> nova floating-ip-delete "$fip"
> done
> + fips=""
> }
>
>
> I have run this many, many times now and not been able to reproduce it
> (against openstack-quantum-2013.1.4-3.el6ost.noarch) installed via packstack
> --allinone.
> My output after running:
>
> [root@rhel-6 ~(keystone_demo)]# ip netns exec
> qrouter-44384dbc-81dc-44e3-8c30-7ed6301b1873 iptables -tnat -S|grep
> quantum-l3-agent-OUTPUT
> -N quantum-l3-agent-OUTPUT
> -A OUTPUT -j quantum-l3-agent-OUTPUT
> -A quantum-l3-agent-OUTPUT -d 172.24.4.232/32 -j DNAT --to-destination
> 10.0.0.4
> -A quantum-l3-agent-OUTPUT -d 172.24.4.233/32 -j DNAT --to-destination
> 10.0.0.2
> -A quantum-l3-agent-OUTPUT -d 172.24.4.234/32 -j DNAT --to-destination
> 10.0.0.7
> -A quantum-l3-agent-OUTPUT -d 172.24.4.235/32 -j DNAT --to-destination
> 10.0.0.6
> -A quantum-l3-agent-OUTPUT -d 172.24.4.236/32 -j DNAT --to-destination
> 10.0.0.5
>
> jhenner: I notice that the example shown shows iptables -tnat -S to get the
> list of iptables rules without an ip netns. Do you have network namespace
> support disabled, or did you leave that out in the name of brevity? I'm
> pretty sure we only support using network namespaces now. Also, can you see
> if you can replicate this with version
> openstack-quantum-2013.1.4-3.el6ost.noarch on your setup?

We _are_ using namespaces. The namespace selection was not copy-pasted here, so it looks like it wasn't used.

I cannot reproduce with openstack-quantum-2013.1.4-3.el6ost.noarch.

I experienced the issue the other day with python-neutron-2013.2-9.el6ost.noarch too, so I'm cloning the bug to RHOS 4.0.

Created attachment 834390 [details]
server.log
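Because each router's NAT rules live inside its own namespace, a bare iptables -tnat -S on the host does not show them. The following is a minimal sketch for collecting the rules from every router at once, assuming the namespaces follow the qrouter-<uuid> naming seen in the output quoted earlier. The function name dump_router_nat_rules is made up for illustration and is not part of quantum.

```shell
#!/bin/sh
# Hypothetical helper (not part of quantum): print the NAT table of every
# qrouter-* network namespace, each preceded by a comment naming it.
# Needs root, because "ip netns exec" does.
dump_router_nat_rules() {
    ip netns list | awk '/^qrouter-/ { print $1 }' | while read -r ns; do
        echo "# namespace: $ns"
        # </dev/null keeps the command from consuming the loop's input.
        ip netns exec "$ns" iptables -t nat -S < /dev/null
    done
}
```

Run as root, dump_router_nat_rules prints one block per router, which makes it unambiguous which namespace a pasted rule list came from.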
I have just hit it on the Grizzly openstack-quantum-2013.1.4-3.el6ost.noarch:
-A quantum-l3-agent-PREROUTING -d 10.34.68.207/32 -j DNAT --to-destination 172.16.0.13
-A quantum-l3-agent-PREROUTING -d 10.34.68.207/32 -j DNAT --to-destination 172.16.0.15
-A quantum-l3-agent-OUTPUT -d 10.34.68.207/32 -j DNAT --to-destination 172.16.0.13
-A quantum-l3-agent-OUTPUT -d 10.34.68.207/32 -j DNAT --to-destination 172.16.0.15
-A quantum-l3-agent-float-snat -s 172.16.0.13/32 -j SNAT --to-source 10.34.68.207
-A quantum-l3-agent-float-snat -s 172.16.0.15/32 -j SNAT --to-source 10.34.68.207
I don't know why I was unable to reproduce it with the script I have posted before.
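Spotting such pairs by eye gets tedious in a long rule list. Here is a rough sketch of a filter that reads iptables -S style output on standard input and reports any floating IP that is DNATed to more than one fixed IP within the same chain. The function name find_conflicting_dnat is hypothetical, and the parsing assumes the rule layout shown in this report (-d <address> ... --to-destination <address>).

```shell
#!/bin/sh
# Hypothetical helper: report DNAT rules in the same chain that send one
# destination address to different targets. Reads "iptables -S" output
# from stdin.
find_conflicting_dnat() {
    awk '
        $1 == "-A" && /-j DNAT/ {
            chain = $2; dst = ""; target = ""
            # Pick out the values that follow -d and --to-destination.
            for (i = 3; i < NF; i++) {
                if ($i == "-d") dst = $(i + 1)
                if ($i == "--to-destination") target = $(i + 1)
            }
            if (dst == "" || target == "") next
            key = chain " " dst
            if (key in seen && seen[key] != target)
                print "conflict in " chain ": " dst " -> " seen[key] " and " target
            else
                seen[key] = target
        }
    '
}
```

Fed the six rules listed in this comment, it prints one conflict line for the PREROUTING chain and one for the OUTPUT chain, both for 10.34.68.207/32. The SNAT rules are ignored by this sketch.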
After discussing this with jhenner, we agreed that since many race-related issues are being handled in Icehouse and we don't have customers using Grizzly and Neutron, we can close this bug. If the issue can be reproduced in Havana or Icehouse, we'll report a new bug with the relevant logs.
Created attachment 815744 [details]
just a dirty stressing thingy in bash

Description of problem:
If a bunch of floating IPs are created, assigned, and then deleted, recreated, and reassigned, quantum often creates dup rules like

-A quantum-l3-agent-OUTPUT -d 10.1.2.198/32 -j DNAT --to-destination 172.16.0.11
-A quantum-l3-agent-OUTPUT -d 10.1.2.198/32 -j DNAT --to-destination 172.16.0.10

or

-A quantum-l3-agent-OUTPUT -d 10.1.2.207/32 -j DNAT --to-destination 172.16.0.11
-A quantum-l3-agent-OUTPUT -d 10.1.2.202/32 -j DNAT --to-destination 172.16.0.11

Version-Release number of selected component (if applicable):
openstack-quantum-2013.1.3-1.el6ost.noarch

How reproducible:
99%

Steps to Reproduce:
1. reproducer attached

Actual results:
dup rules, VMs unreachable. IIRC it was "No route to host".

Expected results:
no dups, VMs reachable
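To tell which of two conflicting rules is the leftover, the rule list can be compared against the associations the server currently reports. This is only a sketch under stated assumptions: the expected mapping is supplied as a plain text file with one "floating_ip fixed_ip" pair per line (a made-up format that would have to be produced from the server's floating IP listing by hand or by another script), and the function name find_stale_dnat is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print DNAT rules whose floating IP -> fixed IP
# pairing is absent from the expected mapping.
# Usage: find_stale_dnat EXPECTED_FILE < rules
# EXPECTED_FILE (assumed format): one "floating_ip fixed_ip" pair per line.
find_stale_dnat() {
    awk '
        # First input: load the expected floating -> fixed associations.
        FILENAME == ARGV[1] { expected[$1] = $2; next }
        # Second input (stdin): check every DNAT rule against them.
        $1 == "-A" && /-j DNAT/ {
            dst = ""; target = ""
            for (i = 3; i < NF; i++) {
                if ($i == "-d") dst = $(i + 1)
                if ($i == "--to-destination") target = $(i + 1)
            }
            sub(/\/32$/, "", dst)   # rules carry a /32 suffix, the file does not
            if (dst != "" && expected[dst] != target)
                print "stale: " $0
        }
    ' "$1" -
}
```

A rule is reported both when its floating IP is mapped to a different fixed IP in the file and when the floating IP is missing from the file altogether, which would correspond to a floating IP that was deleted while its rule stayed behind.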