Bug 1251970 - openvswitch tunnel no communication after update to kernel 3.10.0-229.11.1.el7.x86_64
Summary: openvswitch tunnel no communication after update to kernel 3.10.0-229.11.1.el7.x86_64
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: RDO
Classification: Community
Component: openvswitch
Version: Kilo
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: Kilo
Assignee: Flavio Leitner
QA Contact: Ofer Blaut
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-08-10 12:36 UTC by mariojmdavid
Modified: 2016-04-27 03:09 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-09-08 19:08:58 UTC
Embargoed:



Description mariojmdavid 2015-08-10 12:36:59 UTC
Description of problem:
OpenStack Kilo with all updates
CentOS 7.1
openvswitch 2.3.1

With the CentOS 7 kernel 3.10.0-123.20.1.el7.x86_64 I had a working OpenStack infrastructure:
I can instantiate a VM, it fetches its metadata correctly, a private IP is correctly associated with the instance,
and I can run

# ip netns exec qrouter-97be4b64-71d0-4443-84d0-ec8cfb9f94a4 ping 192.168.1.3
PING 192.168.1.3 (192.168.1.3) 56(84) bytes of data.
64 bytes from 192.168.1.3: icmp_seq=1 ttl=64 time=0.387 ms
64 bytes from 192.168.1.3: icmp_seq=2 ttl=64 time=0.426 ms

After upgrading all OpenStack nodes, including the network node (neutron agents) and the compute node (nova with KVM),
to kernel 3.10.0-229.11.1.el7.x86_64, with no changes in the configuration whatsoever:

# ip netns exec qrouter-97be4b64-71d0-4443-84d0-ec8cfb9f94a4 ping 192.168.1.3
PING 192.168.1.3 (192.168.1.3) 56(84) bytes of data.
From 192.168.1.254 icmp_seq=1 Destination Host Unreachable
From 192.168.1.254 icmp_seq=2 Destination Host Unreachable

I have fetched openvswitch 2.4.0 and compiled it against the latest kernel, as well as its kernel module, still with no success.
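
(For reference, a minimal sketch of that kind of build, assuming the openvswitch-2.4.0 release tarball and that kernel-devel for the running kernel is installed; package names and paths below are the usual defaults, not taken from this report:)

# yum install -y gcc make openssl-devel python-devel kernel-devel-$(uname -r)
# tar xzf openvswitch-2.4.0.tar.gz && cd openvswitch-2.4.0
# ./configure --with-linux=/lib/modules/$(uname -r)/build
# make && make install && make modules_install && depmod -a
# modprobe -r openvswitch && modprobe openvswitch     # swap in the freshly built module (with OVS services stopped)
# modinfo openvswitch | grep -E 'filename|vermagic'   # confirm which module the kernel will load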

I have taken a deep look into all logs (with debug mode enabled), checked the routes, and run tcpdump in the network namespace,
but couldn't determine what the exact problem is.

The ping reaches the private router:
# ip netns exec qrouter-43dbd8f3-b36c-4f43-be95-91e3d92cc86a ping 192.168.1.254
PING 192.168.1.254 (192.168.1.254) 56(84) bytes of data.
64 bytes from 192.168.1.254: icmp_seq=1 ttl=64 time=0.067 ms
64 bytes from 192.168.1.254: icmp_seq=2 ttl=64 time=0.025 ms

but it seems that br-int does not pass any packets on to the instance.
When I run

# ip netns exec qrouter-43dbd8f3-b36c-4f43-be95-91e3d92cc86a ping 192.168.1.2

I see

# ip netns exec qrouter-43dbd8f3-b36c-4f43-be95-91e3d92cc86a tcpdump -i qr-871497f4-1b -vvv host 192.168.1.2
tcpdump: listening on qr-871497f4-1b, link-type EN10MB (Ethernet), capture size 65535 bytes
13:19:27.627954 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.1.2 tell nimbus-net01.ncg.ingrid.pt, length 28
13:19:28.629674 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.1.2 tell nimbus-net01.ncg.ingrid.pt, length 28

There are ARP requests, but no replies.
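
(A few checks that usually narrow down where the reply is lost; a diagnostic sketch, assuming the default br-int/br-tun bridge names and that the qr- port's OpenFlow port number is substituted for the placeholder:)

# ovs-ofctl dump-flows br-int    # any drop rules, or expected flows with n_packets=0?
# ovs-ofctl dump-flows br-tun    # do packets make it to the tunnel bridge at all?
# ovs-appctl fdb/show br-int     # has the instance's MAC address been learned?
# ovs-dpctl dump-flows           # the kernel datapath's view of the same traffic
# ovs-appctl ofproto/trace br-int in_port=<ofport of qr-871497f4-1b>,dl_type=0x0806,arp_tpa=192.168.1.2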

For kernel 3.10.0-123.20.1.el7.x86_64

# modinfo openvswitch
filename:       /lib/modules/3.10.0-123.20.1.el7.x86_64/kernel/net/openvswitch/openvswitch.ko
license:        GPL
description:    Open vSwitch switching datapath
srcversion:     1241855A733802E089FD201
depends:        libcrc32c,vxlan,gre
intree:         Y
vermagic:       3.10.0-123.20.1.el7.x86_64 SMP mod_unload modversions 
signer:         CentOS Linux kernel signing key
sig_key:        18:2E:BB:09:CD:40:C9:4C:A0:C3:CE:4E:E3:F7:1D:F5:20:B4:DA:80
sig_hashalgo:   sha256

# modinfo gre
filename:       /lib/modules/3.10.0-123.20.1.el7.x86_64/kernel/net/ipv4/gre.ko
license:        GPL
author:         D. Kozlov (xeb)
description:    GRE over IPv4 demultiplexer driver
srcversion:     976DD3A723FD7DBEA067264
depends:        
intree:         Y
vermagic:       3.10.0-123.20.1.el7.x86_64 SMP mod_unload modversions 
signer:         CentOS Linux kernel signing key
sig_key:        18:2E:BB:09:CD:40:C9:4C:A0:C3:CE:4E:E3:F7:1D:F5:20:B4:DA:80
sig_hashalgo:   sha256


For kernel 3.10.0-229.11.1.el7.x86_64

# modinfo openvswitch
filename:       /lib/modules/3.10.0-229.11.1.el7.x86_64/kernel/net/openvswitch/openvswitch.ko
license:        GPL
description:    Open vSwitch switching datapath
rhelversion:    7.1
srcversion:     FFFD428FDB7B6580B22B985
depends:        libcrc32c,vxlan,gre
intree:         Y
vermagic:       3.10.0-229.11.1.el7.x86_64 SMP mod_unload modversions 
signer:         CentOS Linux kernel signing key
sig_key:        99:7D:A0:E2:1A:70:E7:B6:13:42:3A:B6:22:65:07:4A:78:60:35:4C
sig_hashalgo:   sha256

# modinfo gre
filename:       /lib/modules/3.10.0-229.11.1.el7.x86_64/kernel/net/ipv4/gre.ko
license:        GPL
author:         D. Kozlov (xeb)
description:    GRE over IPv4 demultiplexer driver
rhelversion:    7.1
srcversion:     4F8C563CCD7AC190E40FEE6
depends:        
intree:         Y
vermagic:       3.10.0-229.11.1.el7.x86_64 SMP mod_unload modversions 
signer:         CentOS Linux kernel signing key
sig_key:        99:7D:A0:E2:1A:70:E7:B6:13:42:3A:B6:22:65:07:4A:78:60:35:4C
sig_hashalgo:   sha256



If requested I can post the output of ovs-vsctl show and ifconfig, and any logs.

Version-Release number of selected component (if applicable):


How reproducible:
In principle it should always be reproducible with the following steps.

Steps to Reproduce:
1. Start from a fully working OpenStack Kilo on CentOS 7.1 where the network and compute nodes run kernel 3.10.0-123.20.1.el7.x86_64.
2. Instantiate a VM and check the network (private net) to it. Do not destroy this VM.
3. Upgrade the network and compute nodes to kernel 3.10.0-229.11.1.el7.x86_64, start the same VM again, and check the network (a sketch of this step follows below).
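
(A rough sketch of step 3 on CentOS 7; the versioned package syntax and the qrouter UID are illustrative:)

# yum install -y kernel-3.10.0-229.11.1.el7    # installs alongside the old kernel
# reboot
# uname -r                                     # confirm the node booted the new kernel
# lsmod | grep openvswitch                     # confirm the in-tree module is loaded
# ip netns exec qrouter-<UID> ping -c 3 192.168.1.3   # repeat the connectivity check from step 2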

Actual results:


Expected results:


Additional info:

Comment 1 Flavio Leitner 2015-08-26 20:15:36 UTC
What happens if you just boot with the previous kernel?
Does it work?
Thanks

Comment 2 mariojmdavid 2015-08-26 20:33:06 UTC
hi Flavio

my latest tests were exactly that:
booting back into the previous kernel, the tunnels come up successfully and
all networking to the VM works again.

best
Mario
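
(For reference, on CentOS 7 the previous kernel can also be selected non-interactively; a sketch, assuming the old kernel is still installed and grubby is available:)

# grubby --info=ALL | grep -E '^(index|title)'                      # list the installed kernel entries
# grubby --set-default=/boot/vmlinuz-3.10.0-123.20.1.el7.x86_64     # make the old kernel the default
# reboot
# uname -r                                                          # verify which kernel is running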

Comment 3 Flavio Leitner 2015-09-02 21:56:44 UTC
Hi Mario,

I've checked with OVS QE and RHOS QE, and OVS works with that kernel, so I am surprised that changing the kernel breaks something. I will need additional information.

Could you please provide the output of the following commands after having reproduced the issue? (A small collection script is sketched below the list.)

# rpm -qi openvswitch
# rpm -V openvswitch
# ovs-vsctl show
# iptables -L -nv
# iptables -t nat -L -nv
# ovs-ofctl dump-flows br-ex
# ovs-ofctl dump-flows br-int
# ip netns exec qrouter-<UID> ip addr list
# ip netns exec qrouter-<UID> ip link list
# ip netns exec qrouter-<UID> iptables -L -nv
# ip netns exec qrouter-<UID> iptables -t nat -L -nv
# dmesg
# /var/log/openvswitch/*log
# systemctl

Thanks!
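
(To gather the outputs above in one pass, a minimal collection sketch; the output directory and the qrouter namespace lookup are illustrative assumptions:)

#!/bin/bash
# Dump the requested diagnostics into one directory for attaching to the bug.
OUT=/tmp/rdo-1251970; mkdir -p "$OUT"
NS=$(ip netns list | awk '/^qrouter-/{print $1; exit}')   # first qrouter namespace found

rpm -qi openvswitch              > "$OUT/rpm-qi.txt"
rpm -V  openvswitch              > "$OUT/rpm-verify.txt"
ovs-vsctl show                   > "$OUT/ovs-vsctl-show.txt"
iptables -L -nv                  > "$OUT/iptables-filter.txt"
iptables -t nat -L -nv           > "$OUT/iptables-nat.txt"
ovs-ofctl dump-flows br-ex       > "$OUT/flows-br-ex.txt"
ovs-ofctl dump-flows br-int      > "$OUT/flows-br-int.txt"
ip netns exec "$NS" ip addr list           > "$OUT/qrouter-ip-addr.txt"
ip netns exec "$NS" ip link list           > "$OUT/qrouter-ip-link.txt"
ip netns exec "$NS" iptables -L -nv        > "$OUT/qrouter-iptables-filter.txt"
ip netns exec "$NS" iptables -t nat -L -nv > "$OUT/qrouter-iptables-nat.txt"
dmesg                            > "$OUT/dmesg.txt"
cp /var/log/openvswitch/*log       "$OUT/" 2>/dev/null
systemctl                        > "$OUT/systemctl.txt"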

Comment 4 Flavio Leitner 2015-09-02 21:57:47 UTC
I forgot one command:
# plotnetcfg

Thanks
fbl
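
(plotnetcfg dumps the host's network configuration as a graphviz graph; a usage sketch, assuming the package is available in the distribution repos:)

# yum install -y plotnetcfg graphviz
# plotnetcfg | dot -Tsvg > netcfg.svg    # attach the resulting netcfg.svg to the bug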

Comment 5 mariojmdavid 2015-09-08 15:51:55 UTC
hi Flavio

Sorry for the late reply, I was on vacation.
In the meantime I have upgraded to the RPMs from 24 August;
I don't know at the moment whether that was the problem, but it has disappeared.

I managed to successfully instantiate an instance with both the network and compute nodes on the latest kernel, and the network works properly,
both with a private IP and with a public IP.

Apologies for this, and thanks for the help.

You can close the bug (or non-bug, as I see it now).

best
Mario

Comment 6 Flavio Leitner 2015-09-08 19:08:58 UTC
Hi Mario,

OK, unfortunately I couldn't reproduce the issue yet, so I can't dig deeper.
I will close this bug as INSUFFICIENT_DATA; feel free to re-open it if you see the issue again.
Thanks!

