Bug 1289116 - vdsm creates network problems for VM with docker container
Summary: vdsm creates network problems for VM with docker container
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: vdsm
Classification: oVirt
Component: General
Version: 4.17.11
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ovirt-4.1.1
Assignee: Petr Horáček
QA Contact: Meni Yakove
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-12-07 13:28 UTC by Thomas Hamel
Modified: 2017-02-20 12:21 UTC
4 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2017-02-20 12:21:33 UTC
oVirt Team: Network
Embargoed:
rule-engine: ovirt-4.1+


Attachments
bridge details (5.14 KB, text/plain)
2015-12-08 08:55 UTC, Thomas Hamel
Network overview (61.75 KB, image/jpeg)
2015-12-08 08:56 UTC, Thomas Hamel
vdsm log (408.87 KB, application/octet-stream)
2015-12-08 08:57 UTC, Thomas Hamel

Description Thomas Hamel 2015-12-07 13:28:18 UTC
Description of problem:

A Docker container inside a VM has network problems when the VM runs in a vdsm environment.
Packets from external servers reach the VM's eth0 interface but are not properly NATted to the docker0 bridge.
Some packets are lost in the NATting process from eth0 to docker0.

When the same VM (setup via PXE) with a docker container is running on a host with KVM, these network problems are not present.

The iptables configurations for the filter and nat tables are identical in both VMs.
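One way to check that claim mechanically (a sketch; `iptables-save` needs root, so the dumps may be empty when run unprivileged, and the second file name in the diff is hypothetical):

```shell
# dump both tables to files on each VM, then diff the dumps between the VMs
iptables-save -t filter > /tmp/filter.rules 2>/dev/null || true
iptables-save -t nat    > /tmp/nat.rules    2>/dev/null || true
# after copying the other VM's dumps over:
# diff /tmp/nat.rules /tmp/nat.rules.other-vm
```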


Version-Release number of selected component (if applicable):

Software versions of the host with vdsm:

OS Version:         RHEL - 7 - 1.1503.el7.centos.2.8
Kernel Version:     3.10.0 - 229.20.1.el7.x86_64
KVM Version:        2.3.0 - 29.1.el7
LIBVIRT Version:    libvirt-1.2.8-16.el7_1.5
VDSM Version:       vdsm-4.17.12-0.el7.centos
SPICE Version:      0.12.4 - 9.el7_1.3


Software version of the host running KVM:

[root@E7 /]# uname -a
Linux E7.test.local 3.10.0-229.20.1.el7.x86_64 #1 SMP Tue Nov 3 19:10:07 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

[root@E7 /]# rpm -qa | grep qemu-kvm
qemu-kvm-1.5.3-86.el7_1.8.x86_64
qemu-kvm-common-1.5.3-86.el7_1.8.x86_64

[root@E7 /]# rpm -qa | grep libvirt
libvirt-daemon-driver-network-1.2.8-16.el7_1.5.x86_64
libvirt-daemon-config-nwfilter-1.2.8-16.el7_1.5.x86_64
libvirt-daemon-driver-qemu-1.2.8-16.el7_1.5.x86_64
libvirt-daemon-kvm-1.2.8-16.el7_1.5.x86_64
libvirt-python-1.2.8-7.el7_1.1.x86_64
libvirt-daemon-driver-nwfilter-1.2.8-16.el7_1.5.x86_64
libvirt-daemon-driver-lxc-1.2.8-16.el7_1.5.x86_64
libvirt-1.2.8-16.el7_1.5.x86_64
libvirt-gobject-0.1.7-3.el7.x86_64
libvirt-daemon-driver-interface-1.2.8-16.el7_1.5.x86_64
libvirt-daemon-driver-storage-1.2.8-16.el7_1.5.x86_64
libvirt-client-1.2.8-16.el7_1.5.x86_64
libvirt-daemon-driver-nodedev-1.2.8-16.el7_1.5.x86_64
libvirt-gconfig-0.1.7-3.el7.x86_64
libvirt-daemon-driver-secret-1.2.8-16.el7_1.5.x86_64
libvirt-daemon-config-network-1.2.8-16.el7_1.5.x86_64
libvirt-glib-0.1.7-3.el7.x86_64
libvirt-daemon-1.2.8-16.el7_1.5.x86_64


How reproducible:

Always


Steps to Reproduce:
1. Install a VM.
2. Install docker within the VM.
3. Initiate a TCP session to a host.
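Step 3 can be sketched from the command line (a sketch, not the reporter's exact commands; `--rm` and the curl bundled in the centos image are assumptions, and the block is a no-op where docker is absent):

```shell
# probe external TCP connectivity from a throwaway container
target=http://www.google.com            # any external site works
if command -v docker >/dev/null 2>&1; then
    docker run --rm centos curl -s -o /dev/null --max-time 10 "$target" \
        && echo "container reached $target" \
        || echo "container FAILED to reach $target"
else
    echo "docker not installed; run this inside the prepared VM"
fi
```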


Actual results:

Packets are lost in the NATting between the eth0 interface of the VM and the docker0 bridge in the VM.


Expected results:

No packets are lost.


Additional info:

Comment 1 Dan Kenigsberg 2015-12-08 06:51:14 UTC
Can you share the vdsm.log of the VM startup (from VM.create to the Up state)?

Can you share the output of `brctl show`, as well as details of your bridge's connectivity to the outside world?

Can you try installing vdsm-hook-macspoof.rpm and configuring it on your VM interface (cf. http://www.ovirt.org/Vdsm_Hooks to learn how)? Just to rule out a set of other issues.

Comment 2 Thomas Hamel 2015-12-08 08:55:59 UTC
Created attachment 1103484 [details]
bridge details

Comment 3 Thomas Hamel 2015-12-08 08:56:41 UTC
Created attachment 1103485 [details]
Network overview

Comment 4 Thomas Hamel 2015-12-08 08:57:26 UTC
Created attachment 1103487 [details]
vdsm log

Comment 5 Thomas Hamel 2015-12-08 09:00:27 UTC
I installed vdsm-hook-macspoof on the host, made the config change in the engine, and created the macspoof key on the VM.

There is no change in the behaviour; packets are still lost.

Comment 6 Dan Kenigsberg 2015-12-09 09:43:05 UTC
Thanks, Thomas. Can you provide more information about your guest OS and applications (versions, which containers)?

Federico, have you noticed anything like this during your integration work http://www.ovirt.org/images/d/dd/2014-ovirt-docker-integration.pdf ?

Comment 7 Thomas Hamel 2015-12-09 10:14:51 UTC
We are running CentOS 7.1 x86_64 with the latest updates on all components:

- Host
- VM
- Docker container

Both the VM and the container run minimal CentOS 7.1 installations.

The docker RPMs in the VM come from the CentOS-7.1-Extras repository.


We start the container from the command line:

    docker run -it centos /bin/bash


When we start the container with host networking (the VM's IP address is used) instead of NAT, we do not see the network problem:

    docker run -it --net=host centos71 /bin/bash

However, our application requires us to run multiple containers within a VM, each accessible from outside, so host networking is not an option for us.
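For reference, Docker's port mapping keeps the default NAT networking while still exposing each container externally; a hypothetical sketch (the ports, names and the SimpleHTTPServer command are illustrative, and the block is a no-op where docker is absent):

```shell
# each container keeps its NATted docker0 address but is reachable from
# outside on a distinct port of the VM
if command -v docker >/dev/null 2>&1; then
    docker run -d -p 8080:80 --name web1 centos python -m SimpleHTTPServer 80
    docker run -d -p 8081:80 --name web2 centos python -m SimpleHTTPServer 80
fi
```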



I tried a different installation procedure on one host:

- Install the CentOS 7.1 via PXE
- Install KVM (qemu-kvm, libvirt & virt-manager RPMs)
- Attach the host into an oVirt cluster
- Migrate an existing VM with a running container to this newly installed host

There the container was able to access the internet.
But after a reboot of this host the problem was present again.
 
So some part of the KVM installation seemed to make changes that were reverted by the reboot.

Comment 8 Thomas Hamel 2015-12-09 10:17:46 UTC
In the VM we are running docker 1.8.2:

[root@dc2-ovm1 /]# docker version
Client:
 Version:      1.8.2
 API version:  1.20
 Package Version: docker-1.8.2-7.el7.centos.x86_64
 Go version:   go1.4.2
 Git commit:   bb472f0/1.8.2
 Built:
 OS/Arch:      linux/amd64

Server:
 Version:      1.8.2
 API version:  1.20
 Package Version:
 Go version:   go1.4.2
 Git commit:   bb472f0/1.8.2
 Built:
 OS/Arch:      linux/amd64

Comment 9 Petr Horáček 2017-01-23 11:31:15 UTC
Hello, I tried to reproduce your problem, but without success.

I started a CentOS docker container with port 8000 mapped (8000:8000) on a CentOS VM and then probed VM:8000 with paping (https://code.google.com/archive/p/paping/); no packets were lost. ICMP ping from the container itself to the outside network was fine as well.

Could you please check whether my steps were wrong, or whether the bug is no longer there?

Have you used any special iptables rules other than the Docker port mapping?

Thanks,
Petr

Comment 10 Thomas Hamel 2017-01-24 07:17:43 UTC
Hi,

we have not used any port mapping for the Docker container.

In the container we used curl to test HTTP access to a server.

E.g., in the container, just run

    curl http://www.google.com

or any other web site.

I usually run tcpdump in parallel on the docker bridge in the VM and on the VM's outgoing interface.
There I spotted that acknowledgement packets from the HTTP access arrive at the VM interface but are not forwarded to the docker bridge, so they never reach the Docker container.
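The parallel capture described above can be sketched as follows (needs root and tcpdump; the interface names come from this report, while port 80 and the sleep times are arbitrary):

```shell
# capture both sides of the NAT at once, then compare which TCP ACKs
# show up on eth0 but never on docker0
tcpdump -ni eth0    -w /tmp/eth0.pcap    'tcp port 80' 2>/dev/null &
pid_eth0=$!
tcpdump -ni docker0 -w /tmp/docker0.pcap 'tcp port 80' 2>/dev/null &
pid_docker0=$!
sleep 1
curl -s -o /dev/null --max-time 10 http://www.google.com || true  # generate traffic
sleep 1
kill "$pid_eth0" "$pid_docker0" 2>/dev/null || true
```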

Comment 11 Thomas Hamel 2017-01-24 07:20:23 UTC
For the Docker container, we haven't modified any iptables rules.
Only what was configured by Docker itself.

Comment 12 Dan Kenigsberg 2017-02-15 09:28:19 UTC
We still find it hard to reproduce your condition.

Do you still see it? Have you tried newer versions of docker or ovirt since?

Can you share a sanitized tcpdump, so we could attempt to understand which packets are lost, and when?

Comment 13 Petr Horáček 2017-02-16 21:41:37 UTC
Hello again,

I ran `curl www.google.com` 1000 times (poor Google) while `tcpdump -i eth0 -s 65535 -w eth0_1000.pcap` and `tcpdump -i docker0 -s 65535 -w docker0_1000.pcap` were running. I checked the captured traffic (docker0_1000.pcap and eth0_1000.pcap) in Wireshark, some requests by hand and the rest via the tcp.analysis.lost_segment filter, and I was not able to find any lost segments or dropped packets.
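A repeatable version of that loop (a sketch; the request count and timeout are arbitrary, and any external URL works):

```shell
# repeat the fetch and count failures; on a lossy NAT path curl hits its
# timeout and exits non-zero, so "fails" approximates dropped sessions
n=${N:-10}                               # the comment above used 1000
fails=0
for i in $(seq 1 "$n"); do
    curl -s --max-time 5 -o /dev/null http://www.google.com || fails=$((fails + 1))
done
echo "failed requests: $fails / $n"
```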

Could you please try the same yourself, to see whether my test was invalid or whether it just works in my environment?

Thank you very much.

Comment 14 Thomas Hamel 2017-02-17 12:56:25 UTC
I just retested with CentOS 7.3 as the OS in the VM (with latest updates).

For docker I installed docker-engine-1.13.1.

With this setup I can run wget inside the docker container and pull an ISO image without any packet loss.

Comment 15 Petr Horáček 2017-02-20 12:21:33 UTC
Thanks for the response, closing as NOTABUG, please reopen if it appears again.

