Bug 859467

Summary: All VM's lost their routing tables
Product: [Fedora] Fedora
Component: openvswitch
Version: 17
Hardware: x86_64
OS: Linux
Status: CLOSED NOTABUG
Severity: high
Priority: unspecified
Reporter: Jean-Tsung Hsiao <jhsiao>
Assignee: Thomas Graf <tgraf>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: chrisw, markmc, rkhan, tgraf
Doc Type: Bug Fix
Type: Bug
Last Closed: 2012-10-08 09:33:27 EDT

Description Jean-Tsung Hsiao 2012-09-21 11:52:24 EDT
Description of problem:
During testing of Use Case 4.1 as described in the "Open vSwitch Requirements Document" I identified this bug.

As required by the test case, four VMs were created and partitioned into two VLANs. The VMs in the first pair can ping each other, but cannot ping either VM in the second pair. The same applies to the second pair.

After a while, however, all VMs lost their routing tables, which resulted in "network unreachable" errors on ping tests.

Version-Release number of selected component (if applicable):
1.17.1


How reproducible:
It's reproducible.

Steps to Reproduce:
1. Configure an OVS bridge (named ovsbridge0).
2. Create four VMs, with vports vnet0, vnet1, vnet2 and vnet3 as their respective interfaces.
3. Run the following to add all four ports to the OVS bridge:
ovs-vsctl add-port ovsbridge0 vnet0 tag=10
ovs-vsctl add-port ovsbridge0 vnet1 tag=10
ovs-vsctl add-port ovsbridge0 vnet2 tag=20
ovs-vsctl add-port ovsbridge0 vnet3 tag=20
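
The four commands in step 3 can also be generated from a single port-to-VLAN mapping. The following is a minimal sketch and not part of the original report: the helper names `run` and `add_tagged_ports` and the `DRY_RUN` variable are hypothetical, and by default the script only prints the commands instead of executing them. `ovs-vsctl get port <name> tag` is included as one way to read the tag back after adding the port.

```shell
#!/bin/sh
# Sketch: attach each vnet port to ovsbridge0 with its VLAN tag.
# DRY_RUN, run and add_tagged_ports are illustrative names, not from the report.
BRIDGE=ovsbridge0
DRY_RUN=${DRY_RUN:-1}   # default: print the commands instead of running them

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

add_tagged_ports() {
    # Arguments are port:tag pairs, for example vnet0:10
    for pair in "$@"; do
        port=${pair%%:*}
        tag=${pair##*:}
        run ovs-vsctl add-port "$BRIDGE" "$port" "tag=$tag"
        run ovs-vsctl get port "$port" tag   # read the tag back
    done
}

add_tagged_ports vnet0:10 vnet1:10 vnet2:20 vnet3:20
```

Setting DRY_RUN=0 would execute the commands for real; that requires a running Open vSwitch and root privileges.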

Actual results:
All four VMs lost their routing tables after a while (2 to 3 hours).

Expected results:
All routing tables should remain intact.

Additional info:
Comment 1 Jean-Tsung Hsiao 2012-09-21 12:41:39 EDT
Referring to the "Actual results" above, I just ran an experiment, and the time to reproduce can be as short as 30 minutes.

Experiment:

"ovs-vsctl show" indicated that vnet2 and vnet3 had no tags at all, and their corresponding VMs were able to ping each other.

As an experiment, I ran "ovs-vsctl del-port ovsbridge0 vnet2" and "ovs-vsctl del-port ovsbridge0 vnet3", then added both ports back with tag=20. Initially, the two corresponding VMs could ping each other. But after about 30 minutes, the pings failed with "network unreachable".
Comment 2 Thomas Graf 2012-09-21 13:11:56 EDT
(In reply to comment #1)
> Referring to the "Actual results" above, I just ran an experiment, and the
> time to reproduce can be as short as 30 minutes.
> 
> Experiment:
> 
> "ovs-vsctl show" indicated that vnet2 and vnet3 had no tags at all, and
> their corresponding VMs were able to ping each other.

Just making sure I understand this correctly: after about 30 minutes, "ovs-vsctl show" no longer lists the tag? And this only happens for vnet2 and vnet3?

Is this correct?
Comment 3 Jean-Tsung Hsiao 2012-09-21 14:19:37 EDT
The original issue happened to all four of them. Then I used virt-manager to "shut off and then run" the two VMs corresponding to vnet2 and vnet3. Since then, those two VMs have been able to ping each other.

NOTE: I left the other pair (vnet0 and vnet1) alone.

After I submitted the initial description, I ran "ovs-vsctl show" and found that vnet2 and vnet3 had no tags, while vnet0 and vnet1 still had tag=10. So I realized that if I deleted vnet2 and vnet3 and then added them back with tag=20, I could reproduce the issue.

So that's what I did. I deleted them, then added them back with tag=20. Initially, the VMs corresponding to the two interfaces were able to ping each other. But after about 30 minutes, the pings failed and "netstat -r" showed an empty table.

Hopefully, you can get a better picture now.

Thanks!

Jean
Comment 4 Jean-Tsung Hsiao 2012-09-21 14:32:22 EDT
Hi Thomas,

Both vnet2 and vnet3 still had tag=20 after 30 minutes, according to "ovs-vsctl show".

Thanks!

Jean
Comment 5 Jean-Tsung Hsiao 2012-09-21 15:09:15 EDT

As I mentioned above, I left vnet0 and vnet1 alone last night. Their corresponding VMs had empty routing tables and pings failed. "ovs-vsctl show" reported that both ports had tag=10.

As an experiment, I deleted them and added them back without tags. Then, on each corresponding VM, I ran ifdown and ifup. Bingo! Both routing tables were back, and pings have been successful since then.
Comment 6 Jean-Tsung Hsiao 2012-09-24 13:28:30 EDT
Another experiment: this one reproduced the missing routing table issue right away.

* Originally, vnet0 was added to ovsbridge0 without VLAN tagging. Below is the IP routing table of the corresponding VM:

[root@test1232 ~]# ip route
10.10.8.0/22 dev eth0  proto kernel  scope link  src 10.10.10.232 
169.254.0.0/16 dev eth0  scope link  metric 1002 
default via 10.10.11.254 dev eth0 

* Ran "ovs-vsctl del-port ovsbridge0 vnet0"

* Ran "ovs-vsctl add-port ovsbridge0 vnet0 tag=10"

* Ran "ifdown eth0"

* Ran "ifup eth0". This failed as "ping 10.10.11.254" failed.

* "ip route" returned empty.
Comment 7 Jean-Tsung Hsiao 2012-09-26 09:59:11 EDT
I found that the loss of IP routes was related to DHCPREQUEST failures (each VM was configured using DHCP).

The log indicated that each VM sent a DHCPREQUEST about every 11 minutes to renew its lease. Without tagging, the renewal succeeded every time. But once the VLAN tag was turned on, the next renewal would fail. This could be because the ACKs from the DHCP server were dropped with the tag on. Note: the switch is likely not set up to handle tagging.
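
One way to test the dropped-ACK hypothesis would be to capture DHCP traffic on the host's physical uplink and compare tagged and untagged frames. The following is a minimal sketch under stated assumptions: the uplink is taken to be eth0 (the report does not name it), and `UPLINK` and `dhcp_capture_cmd` are hypothetical names. The script only prints the tcpdump commands to run. Whether tags are visible in a capture may also depend on the NIC's VLAN offload settings.

```shell
#!/bin/sh
# Sketch: build tcpdump commands that show DHCP traffic with or without
# an 802.1Q tag. UPLINK is a placeholder; the report does not name the
# physical interface.
UPLINK=${UPLINK:-eth0}

dhcp_capture_cmd() {
    # $1 = "tagged" or "untagged"
    case "$1" in
        tagged)   filter='vlan and (udp port 67 or udp port 68)' ;;
        untagged) filter='udp port 67 or udp port 68' ;;
        *) echo "usage: dhcp_capture_cmd tagged|untagged" >&2; return 1 ;;
    esac
    # -e prints the link-level header, -n disables name resolution
    echo "tcpdump -e -n -i $UPLINK '$filter'"
}

dhcp_capture_cmd tagged
dhcp_capture_cmd untagged
```

If DHCPREQUEST frames leave the host tagged and no DHCPACK comes back, that would be consistent with the frames being dropped between the bridge and the DHCP server.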
Comment 8 Jean-Tsung Hsiao 2012-09-30 21:21:38 EDT
Configuring the VMs with the "--bootproto static" network option eliminates the issue.
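
On an already installed guest, a comparable static setup could be expressed in the interface configuration file. This is a minimal sketch, assuming the usual Fedora location /etc/sysconfig/network-scripts/ifcfg-eth0; the address, netmask and gateway are taken from the "ip route" output in comment 6 and would differ for each VM.

```shell
# Sketch of ifcfg-eth0 for the VM shown in comment 6 (values are per-VM).
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=none          # no DHCP client, so no lease to renew or lose
IPADDR=10.10.10.232
NETMASK=255.255.252.0   # 10.10.8.0/22
GATEWAY=10.10.11.254
```

With no lease to expire, the routes stay in place regardless of whether DHCP replies reach the VM.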
Comment 9 Thomas Graf 2012-10-08 09:33:27 EDT
The problem is related to VLAN-tagged frames being dropped on the way to the DHCP server.