Red Hat Bugzilla – Bug 859467
All VM's lost their routing tables
Last modified: 2014-06-18 04:31:27 EDT
Description of problem:
While testing Use Case 4.1 as described in the "Open vSwitch Requirements Document", I identified this bug.
As required by the test case, four VMs were created and partitioned into two VLANs. The VMs in the first pair can ping each other but cannot ping either VM in the second pair. The same applies to the second pair.
However, after a while, all VMs lost their routing tables, which resulted in "network unreachable" errors on ping tests.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Configure an OVS bridge (named ovsbridge0).
2. Create four VMs, with vports vnet0, vnet1, vnet2 and vnet3 as their respective interfaces.
3. Run the following to add all four vports to the OVS bridge:
ovs-vsctl add-port ovsbridge0 vnet0 tag=10
ovs-vsctl add-port ovsbridge0 vnet1 tag=10
ovs-vsctl add-port ovsbridge0 vnet2 tag=20
ovs-vsctl add-port ovsbridge0 vnet3 tag=20
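The four add-port commands above can be folded into a loop, with a follow-up check of the tag stored for each port. This is only a sketch: the OVS_VSCTL variable and the function names add_tagged_ports and show_port_tags are hypothetical helpers introduced here for illustration, not part of the original report.

```shell
#!/bin/sh
# OVS_VSCTL is a hypothetical override so the script can be dry-run
# (for example with OVS_VSCTL=echo) on a host without Open vSwitch.
OVS_VSCTL=${OVS_VSCTL:-ovs-vsctl}
BRIDGE=ovsbridge0

# Add vnet0/vnet1 to VLAN 10 and vnet2/vnet3 to VLAN 20, as in step 3.
add_tagged_ports() {
    for pair in vnet0:10 vnet1:10 vnet2:20 vnet3:20; do
        port=${pair%%:*}   # text before the colon
        tag=${pair##*:}    # text after the colon
        "$OVS_VSCTL" add-port "$BRIDGE" "$port" tag="$tag"
    done
}

# Print the tag currently stored in the Port record of each vport.
show_port_tags() {
    for port in vnet0 vnet1 vnet2 vnet3; do
        printf '%s: ' "$port"
        "$OVS_VSCTL" get port "$port" tag
    done
}
```

Calling add_tagged_ports once and then show_port_tags at intervals would show whether a tag disappears from the database over time.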
Actual results:
All four VMs lost their routing tables after a while (2 to 3 hours).
Expected results:
All routing tables should remain intact.
Regarding the "Actual results" above: I just ran an experiment, and the time to reproduce can be as short as 30 minutes.
"ovs-vsctl show" indicated that vnet2 and vnet3 had no tags at all, and their corresponding VMs were able to ping each other.
As an experiment, I ran "ovs-vsctl del-port ovsbridge0 vnet2" and "ovs-vsctl del-port ovsbridge0 vnet3", then added both ports back with tag=20. Initially, the two corresponding VMs could ping each other, but after about 30 minutes the pings failed with "network unreachable".
(In reply to comment #1)
> Referring to the "Actual results" about, just ran an experiment and the time
> to reproduce can be as short as 30 minutes.
> "ovs-vsctl show" indicated that vnet2 and vnet3 were having no tags at all.
> And their corresponding VM's were ping each other.
Just making sure I understand this correctly: after about 30 minutes, "ovs-vsctl show" no longer lists the tag? And this only happens for vnet2 and vnet3?
Is this correct?
The original issue happened to all four of them. Then I used virt-manager to "shut off and then run" the two VMs corresponding to vnet2 and vnet3. Since then, those two VMs have been able to ping each other.
NOTE: I left the other pair (vnet0 and vnet1) alone.
After I submitted the initial description, I ran "ovs-vsctl show" and found that vnet2 and vnet3 had no tags, while vnet0 and vnet1 still had tag=10. So I realized that if I deleted vnet2 and vnet3 and then added them back with tag=20, I could reproduce the issue.
So that's what I did: I deleted them, then added them back with tag=20. Initially, the VMs corresponding to the two interfaces were able to ping each other. But after about 30 minutes, pings failed and "netstat -r" showed an empty table.
Hopefully, you can get a better picture now.
Both vnet2 and vnet3 still had tag=20 after 30 minutes, according to "ovs-vsctl show".
As I mentioned above, I left vnet0 and vnet1 alone last night, so their corresponding VMs had empty routing tables and pings failed. "ovs-vsctl show" reported that both still had tag=10.
As an experiment, I deleted them and added them back without tags. Then, on each corresponding VM, I ran ifdown and ifup. Bingo! Both route tables were back, and pings have been successful since then.
Another experiment: this one reproduced the missing route table issue right away.
* Originally, vnet0 was added to ovsbridge0 without VLAN tagging. Below is the IP route table of the corresponding VM:
[root@test1232 ~]# ip route
10.10.8.0/22 dev eth0 proto kernel scope link src 10.10.10.232
169.254.0.0/16 dev eth0 scope link metric 1002
default via 10.10.11.254 dev eth0
* Ran "ovs-vsctl del-port ovsbridge0 vnet0"
* Ran "ovs-vsctl add-port ovsbridge0 vnet0 tag=10"
* Ran "ifdown eth0"
* Ran "ifup eth0". This failed, as shown by "ping 10.10.11.254" failing.
* "ip route" returned nothing.
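The guest-side half of this experiment (bounce eth0, then look at the route table) could be scripted roughly as follows. This is a sketch only: has_default_route and bounce_and_check are hypothetical helper names, and the interface name eth0 is taken from the "ip route" output above.

```shell
#!/bin/sh
# Succeeds if the "ip route" output supplied on stdin contains a default route.
has_default_route() {
    grep -q '^default '
}

# Restart the given interface (eth0 by default) and report the route state.
bounce_and_check() {
    iface=${1:-eth0}
    ifdown "$iface"
    ifup "$iface"
    if ip route | has_default_route; then
        echo "default route present"
    else
        echo "default route missing"
    fi
}
```

Running bounce_and_check inside the VM once before and once after the port is re-added with tag=10 should show the difference described in the steps above.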
I found that the loss of IP routes was related to DHCPREQUEST failures (each VM was configured using DHCP).
The log indicated that each VM sent a DHCPREQUEST about every 11 minutes to renew its lease. Without tagging, the renewal succeeded every time. But once the VLAN tag was turned on, the next renewal would fail. This could be because the ACKs from the DHCP server were dropped with tagging on. Note: the switch is likely not set up to handle tagging.
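One way to confirm the renewal failures would be to count requests and acknowledgements in the guest's DHCP client log and to watch DHCP traffic on the wire. The sketch below rests on assumptions: count_dhcp_events and capture_dhcp are hypothetical helper names, and it assumes the client logs lines containing the words DHCPREQUEST and DHCPACK, as dhclient normally does.

```shell
#!/bin/sh
# Summarise DHCP client activity from syslog-style text read on stdin.
count_dhcp_events() {
    awk '
        /DHCPREQUEST/ { req++ }
        /DHCPACK/     { ack++ }
        END { printf "requests=%d acks=%d\n", req, ack }
    '
}

# Watch DHCP traffic on an interface (eth0 by default).
# -n disables name resolution, -e also prints link-level headers.
capture_dhcp() {
    iface=${1:-eth0}
    tcpdump -n -e -i "$iface" 'udp port 67 or udp port 68'
}
```

For example, "grep dhclient /var/log/messages | count_dhcp_events" (the log path is an assumption) would show more requests than ACKs if renewals are going unanswered. On a link that carries tagged frames, the tcpdump filter would probably need to be prefixed with "vlan and" to match them.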
Configuring the VMs with the "--bootproto static" network option eliminates the issue.
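For a guest that is already installed, a comparable static setup on a RHEL-style system might be an ifcfg file along these lines. This is a sketch under assumptions: the file path is the conventional location rather than something stated in this report, and the addresses are simply copied from the "ip route" output in the earlier comment (a /22 prefix corresponds to netmask 255.255.252.0).

```shell
# /etc/sysconfig/network-scripts/ifcfg-eth0 (assumed path)
DEVICE=eth0
BOOTPROTO=static
ONBOOT=yes
IPADDR=10.10.10.232
NETMASK=255.255.252.0
GATEWAY=10.10.11.254
```

With a static address there is no lease to renew, so a dropped DHCPACK can no longer empty the route table. It does not by itself make tagged traffic pass through the switch.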
The problem is related to VLAN-tagged frames being dropped on the way to the DHCP server.