Bug 859467 - All VM's lost their routing tables
Product: Fedora
Classification: Fedora
Component: openvswitch
Hardware: x86_64 Linux
Priority: unspecified, Severity: high
Assigned To: Thomas Graf
QA Contact: Fedora Extras Quality Assurance
Depends On:
Reported: 2012-09-21 11:52 EDT by Jean-Tsung Hsiao
Modified: 2014-06-18 04:31 EDT (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Last Closed: 2012-10-08 09:33:27 EDT
Type: Bug
Description Jean-Tsung Hsiao 2012-09-21 11:52:24 EDT
Description of problem:
During testing of Use Case 4.1 as described in the "Open vSwitch Requirements Document" I identified this bug.

As required by the test case, four VMs were created and partitioned into two VLANs. The VMs in the first pair can ping each other, but cannot ping either VM in the second pair. The same applies to the second pair.

After a while, however, all VMs lost their routing tables, which resulted in "network unreachable" on ping tests.

Version-Release number of selected component (if applicable):

How reproducible:
It's reproducible.

Steps to Reproduce:
1. Configure an OVS bridge (named ovsbridge0).
2. Create four VMs, with vports vnet0, vnet1, vnet2 and vnet3 as their respective interfaces.
3. Run the following to add all four ports to the OVS bridge:
ovs-vsctl add-port ovsbridge0 vnet0 tag=10
ovs-vsctl add-port ovsbridge0 vnet1 tag=10
ovs-vsctl add-port ovsbridge0 vnet2 tag=20
ovs-vsctl add-port ovsbridge0 vnet3 tag=20
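
For repeated test runs, step 3 could be wrapped in a small script. The following is only a sketch, assuming a POSIX shell; the helper name add_tagged_port and the RUN variable are made up for illustration and are not part of Open vSwitch. By default it only prints the commands; with RUN=1 it executes them, which requires Open vSwitch and root privileges.

```shell
#!/bin/sh
# Sketch only: prints the ovs-vsctl commands from step 3 instead of running them.
# Set RUN=1 to execute them for real (requires Open vSwitch and root).
BRIDGE=ovsbridge0

# add_tagged_port is a hypothetical helper, not an Open vSwitch command.
add_tagged_port() {
    # $1 = port name, $2 = VLAN tag
    cmd="ovs-vsctl add-port $BRIDGE $1 tag=$2"
    if [ "${RUN:-0}" = "1" ]; then
        $cmd
    else
        echo "$cmd"
    fi
}

# First pair on VLAN 10, second pair on VLAN 20, as in the steps above.
add_tagged_port vnet0 10
add_tagged_port vnet1 10
add_tagged_port vnet2 20
add_tagged_port vnet3 20
```

Afterwards, "ovs-vsctl show" can be used to confirm that each port carries the expected tag.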

Actual results:
All four VMs lost their routing tables after a while (2 to 3 hours).

Expected results:
All routing tables should remain intact.

Additional info:
Comment 1 Jean-Tsung Hsiao 2012-09-21 12:41:39 EDT
Referring to the "Actual results" above, I just ran an experiment, and the time to reproduce can be as short as 30 minutes.

Experiment:
"ovs-vsctl show" indicated that vnet2 and vnet3 had no tags at all, and their corresponding VMs were pinging each other.

As an experiment, I ran "ovs-vsctl del-port ovsbridge0 vnet2" and "ovs-vsctl del-port ovsbridge0 vnet3", then added both ports back with tag=20. Initially, both corresponding VMs were pinging each other. But in about 30 minutes, pings failed with "network unreachable".
Comment 2 Thomas Graf 2012-09-21 13:11:56 EDT
(In reply to comment #1)
> Referring to the "Actual results" about, just ran an experiment and the time
> to reproduce can be as short as 30 minutes.
> Experiment:
> "ovs-vsctl show" indicated that vnet2 and vnet3 were having no tags at all.
> And their corresponding VM's were ping each other.

Just making sure I understand this correctly. After about 30 minutes, "ovs-vsctl show" no longer lists the tag? And this only happens for vnet2 and vnet3?

Is this correct?
Comment 3 Jean-Tsung Hsiao 2012-09-21 14:19:37 EDT
The original issue happened to all four of them. Then I used virt-manager to "shut off and then run" the two VMs corresponding to vnet2 and vnet3. Since then, the two VMs have been able to ping each other.

NOTE: I left the other pair (vnet0 and vnet1) alone.

After I submitted the initial description, I ran "ovs-vsctl show" and found that vnet2 and vnet3 had no tags, while vnet0 and vnet1 still had tag=10. So I realized that if I deleted vnet2 and vnet3 and then added them back with tag=20, I could reproduce the issue.

So that's what I did. I deleted them, then added them back with tag=20. Initially, the VMs corresponding to the two interfaces were able to ping each other. But in about 30 minutes, pings failed and "netstat -r" showed an empty table.

Hopefully, you can get a better picture now.


Comment 4 Jean-Tsung Hsiao 2012-09-21 14:32:22 EDT
Hi Thomas,

Both vnet2 and vnet3 still had tag=20 after 30 minutes, based on "ovs-vsctl show".


Comment 5 Jean-Tsung Hsiao 2012-09-21 15:09:15 EDT

As I mentioned above, I left vnet0 and vnet1 alone last night. So their corresponding VMs had empty routing tables and pings failed. "ovs-vsctl show" showed that both had tag=10.

As an experiment, I deleted them and added them back without tags. Then, on each corresponding VM, I ran ifdown and ifup. Bingo! Both routing tables were back, and pings have been successful since then.
Comment 6 Jean-Tsung Hsiao 2012-09-24 13:28:30 EDT
Another experiment: This experiment reproduced the missing routing table issue right away.

* Originally, vnet0 was added to ovsbridge0 without VLAN tagging. Below is the IP routing table of the corresponding VM:

[root@test1232 ~]# ip route
dev eth0  proto kernel  scope link  src
dev eth0  scope link  metric 1002
default via dev eth0

* Ran "ovs-vsctl del-port ovsbridge0 vnet0"

* Ran "ovs-vsctl add-port ovsbridge0 vnet0 tag=10"

* Ran "ifdown eth0"

* Ran "ifup eth0". This failed, as "ping" failed afterwards.

* "ip route" returned nothing.
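
The last check in this experiment could be scripted inside the guest. This is a minimal sketch, assuming a POSIX shell with grep; has_default_route is a made-up helper name. It reads "ip route" output on standard input and succeeds only if a line starting with "default " is present.

```shell
#!/bin/sh
# Sketch only: reads "ip route" output on stdin and reports, via its exit
# status, whether a default route is present.
# has_default_route is a hypothetical helper name.
has_default_route() {
    grep -q '^default '
}

# Typical use inside the guest, after "ifup eth0":
#   ip route | has_default_route && echo "default route present" \
#                                || echo "no default route"
```

An empty "ip route" result, as seen in this experiment, makes the helper return a non-zero status.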
Comment 7 Jean-Tsung Hsiao 2012-09-26 09:59:11 EDT
I found out that the issue of losing the IP routes was related to DHCPREQUEST failure. Each VM was configured using DHCP.

The log indicated that each VM sent out a DHCPREQUEST about every 11 minutes to renew its lease. Without tagging, the renewal was successful every time. But once the VLAN tag was turned on, the next renewal would fail. This could be because the ACKs from the DHCP server got dropped with the tag on. Note: The switch is likely not set up to handle tagging.
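
One way to test this theory would be to watch DHCP traffic on the host, on both sides of the bridge. The sketch below only prints tcpdump commands for review and does not run them. It assumes tcpdump is installed on the host and that ovsbridge0 has a physical uplink port; the uplink name em1 and the helper dhcp_capture_cmd are placeholders, not taken from this setup. Depending on NIC VLAN offload settings, the tag may not be visible to the "vlan" filter.

```shell
#!/bin/sh
# Sketch only: prints tcpdump commands for review; it does not run them.
# UPLINK is a hypothetical name for the host's physical port on ovsbridge0.
UPLINK=${UPLINK:-em1}

# dhcp_capture_cmd is a hypothetical helper name.
dhcp_capture_cmd() {
    # $1 = interface, $2 = "tagged" if 802.1Q-tagged frames are expected there
    if [ "$2" = "tagged" ]; then
        # The "vlan" primitive shifts later matches past the 802.1Q header.
        echo "tcpdump -n -e -i $1 'vlan and (udp port 67 or udp port 68)'"
    else
        echo "tcpdump -n -e -i $1 'udp port 67 or udp port 68'"
    fi
}

# VM side of the access port: frames are expected to be untagged here.
dhcp_capture_cmd vnet0
# Uplink side: frames from a tagged port are expected to carry the VLAN tag.
dhcp_capture_cmd "$UPLINK" tagged
```

If requests are seen leaving on the uplink with a tag but no ACK comes back, that would be consistent with tagged frames being dropped beyond the host.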
Comment 8 Jean-Tsung Hsiao 2012-09-30 21:21:38 EDT
Configuring the VMs with the "--bootproto static" network option eliminates the issue.
Comment 9 Thomas Graf 2012-10-08 09:33:27 EDT
The problem is related to VLAN-tagged frames being dropped on the way to the DHCP server.
