Bug 1273327 - [DOCS] docker push fails - large packets get dropped by vxlan
Summary: [DOCS] docker push fails - large packets get dropped by vxlan
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Documentation
Version: 3.1.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: brice
QA Contact: Vikram Goyal
URL:
Whiteboard:
Depends On:
Blocks: 1277592
 
Reported: 2015-10-20 08:45 UTC by Johnny Liu
Modified: 2016-05-17 01:21 UTC
CC: 13 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1277592 (view as bug list)
Environment:
Last Closed: 2016-05-17 01:19:57 UTC
Target Upstream Version:
Embargoed:



Description Johnny Liu 2015-10-20 08:45:39 UTC
Description of problem:
The sti build fails when the registry pod and the sti-builder pod are running on different nodes.

Version-Release number of selected component (if applicable):
AtomicOpenShift/3.1/2015-10-17.1
openshift3/ose-sti-builder:v3.0.2.901          3e544fca2924

How reproducible:
Always

Steps to Reproduce:
1. Set up 3.1 OSE env, 1 master + 2 nodes. (Apply workaround for BZ#1273294)
2. Install docker-registry, which is running on node-1.
3. Create a new app and trigger an sti build, making sure the whole sti build process happens on node-2 (for example, use "oadm manage-node" to disable scheduling on node-1; see the sketch below).
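
A sketch of that scheduling change, assuming the node name is node-1 as above:

$ oadm manage-node node-1 --schedulable=false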

Actual results:
$ oc new-app https://github.com/openshift/simple-openshift-sinatra-sti.git -i openshift/ruby

$ oc build-logs simple-openshift-sinatra-sti-2
<--snip-->
I1020 03:18:31.725949       1 sti.go:288] Successfully built 172.30.196.230:5000/jialiu1/simple-openshift-sinatra-sti:latest
I1020 03:18:36.344326       1 cleanup.go:23] Removing temporary directory /tmp/sti649809172
I1020 03:18:36.344358       1 fs.go:99] Removing directory '/tmp/sti649809172'
I1020 03:18:36.364885       1 sti.go:162] Using provided push secret for pushing 172.30.196.230:5000/jialiu1/simple-openshift-sinatra-sti:latest image
I1020 03:18:36.364912       1 sti.go:166] Pushing 172.30.196.230:5000/jialiu1/simple-openshift-sinatra-sti:latest image ...
I1020 03:34:03.519356       1 sti.go:171] Registry server Address: 
I1020 03:34:03.519378       1 sti.go:172] Registry server User Name: serviceaccount
I1020 03:34:03.519386       1 sti.go:173] Registry server Email: serviceaccount
I1020 03:34:03.519393       1 sti.go:178] Registry server Password: <<non-empty>>
F1020 03:34:03.519411       1 builder.go:54] Build error: Failed to push image. Response from registry is: Post http://172.30.196.230:5000/v2/jialiu1/simple-openshift-sinatra-sti/blobs/uploads/: EOF


Expected results:
sti build is completed successfully.

Additional info:
When the env has only one node, the sti build runs successfully.

Comment 1 Scott Dodson 2015-10-20 14:27:30 UTC
Possibly a dupe of https://bugzilla.redhat.com/show_bug.cgi?id=1273129

Can you verify that you don't have an iptables rule allowing UDP 4789? If you add it to all of the hosts does it suddenly start working?
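
For example, something like this on each host (the exact chain and rule placement may vary in your setup):

# iptables -I INPUT -p udp --dport 4789 -j ACCEPT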

Comment 2 Ma xiaoqiang 2015-10-21 08:20:46 UTC
QE installed the env with the '4789' port open today; we still cannot deploy the pod.

After running 'iptables -F' to flush the iptables rules, the pod can be deployed successfully. Users can access the registry from all nodes, but cannot push images when the registry pod and the sti-builder pod are running on different nodes.

Comment 3 Ben Parees 2015-10-22 13:48:49 UTC
Sounds like this would be a networking issue to me.

Comment 4 Wenjing Zheng 2015-10-26 08:51:35 UTC
Also cannot connect to the DB master by service name from either the master pod or the slave pod, but it succeeds with the master pod IP. The error below appears when trying to connect by service name (master and slave pods are on different nodes):
ERROR 2005 (HY000): Unknown MySQL server host 'mysql-master' (110)

Comment 5 Dan Winship 2015-10-26 21:02:49 UTC
(In reply to Ma xiaoqiang from comment #2)
> After running 'iptables -F' to flush iptable rules

You can't do that. Kubernetes and OpenShift depend on certain iptables rules. It is absolutely expected that things will break if you delete them.

Comment 6 Ma xiaoqiang 2015-10-27 02:01:32 UTC
(In reply to Dan Winship from comment #5)
> (In reply to Ma xiaoqiang from comment #2)
> > After running 'iptables -F' to flush iptable rules
> 
> You can't do that. Kubernetes and OpenShift depend on certain iptables
> rules. It is absolutely expected that things will break if you delete them.

You can refer to BZ#1273294. On the other hand, we didn't flush the iptables rules on puddle [3.1/2015-10-21.4]; the deployment on it worked fine, but this issue still existed.

Comment 7 Meng Bo 2015-10-27 05:49:22 UTC
(In reply to Dan Winship from comment #5)
> (In reply to Ma xiaoqiang from comment #2)
> > After running 'iptables -F' to flush iptable rules
> 
> You can't do that. Kubernetes and OpenShift depend on certain iptables
> rules. It is absolutely expected that things will break if you delete them.

It seems that after `iptables -F`, restarting the node service can re-create the necessary iptables rules for Kubernetes.

Comment 8 Meng Bo 2015-10-27 06:58:39 UTC
(In reply to Meng Bo from comment #7)
> (In reply to Dan Winship from comment #5)
> > (In reply to Ma xiaoqiang from comment #2)
> > > After running 'iptables -F' to flush iptable rules
> > 
> > You can't do that. Kubernetes and OpenShift depend on certain iptables
> > rules. It is absolutely expected that things will break if you delete them.
> 
> Seems after `iptables -F`, restart the node service can re-create the
> necessary iptables rules for kubernetes.

And restarting the docker service.
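
For reference, roughly the restarts meant above (service names assumed for an OSE 3.1 RPM install):

# systemctl restart atomic-openshift-node
# systemctl restart docker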

Comment 9 Johnny Liu 2015-10-27 11:03:35 UTC
(In reply to Dan Winship from comment #5)
> (In reply to Ma xiaoqiang from comment #2)
> > After running 'iptables -F' to flush iptable rules
> 
> You can't do that. Kubernetes and OpenShift depend on certain iptables
> rules. It is absolutely expected that things will break if you delete them.

"iptables -F" was used as workaround for BZ#1273294, if "iptables -F" is not allowed, we have to run "iptables -D FORWARD -j REJECT --reject-with icmp-host-prohibited" as workaround for pod deploy failure issue. After that, the result is the same - build failed upon pushing image data to docker-registry.

Comment 11 Johnny Liu 2015-10-28 06:40:31 UTC
Re-tested against AtomicOpenShift/3.1/2015-10-27.1; now we do NOT need to run the "iptables -D FORWARD -j REJECT --reject-with icmp-host-prohibited" workaround, but the sti build still fails.

Comment 12 Ben Parees 2015-10-28 16:59:04 UTC
Johnny, please provide full loglevel 5 logs for the most recent failure and confirm whether or not the builds succeed in a single node environment.

Thanks.

Comment 13 Ben Parees 2015-10-28 16:59:38 UTC
I should say, level 5 build logs for the failure. I don't think we need openshift logs at present.

registry pod logs would also be good.

Comment 14 Dan Winship 2015-10-28 19:42:28 UTC
With the latest packages, I tried setting up a multi-node cluster, running docker-registry on one node, and doing a build on another. It works fine every time, with either the single-tenant or multi-tenant plugin.

Please try again with the latest packages, and if you still see this problem, post more detailed how-to-reproduce instructions and get the output of the "debug.sh" script while the build is running.

Comment 16 Dan Winship 2015-10-29 18:17:34 UTC
Right. So:

(In reply to Johnny Liu from comment #15)
> Here is build logs:

> I1029 03:20:12.035706       1 sti.go:214] Pushing
> 172.30.137.27:5000/jialiu/simple-openshift-sinatra-sti:latest image ...

> Here is registry logs:

> 10.1.1.1 - - [29/Oct/2015:03:20:12 -0400] "GET /v2/ HTTP/1.1" 401 114 ""
> "docker/1.8.2 go/go1.4.2 kernel/3.10.0-326.el7.x86_64 os/linux arch/amd64"
> time="2015-10-29T03:20:12-04:00" level=error msg="error authorizing context:
> authorization header with basic token required"

So two problems:

  1. sti (or something) is apparently not sending your auth info to the
     docker registry.

  2. it then fails to notice that it got an error and spends 15 minutes
     waiting for something that's never going to happen

Comment 17 Cesar Wong 2015-10-29 18:53:17 UTC
It seems that the registry is just having trouble communicating back with the master to authenticate users.

Outside of a build, if I run:
docker login http://172.30.137.27:5000
Username: serviceaccount
Password: [Enter valid sa token]
Email: test

It will hang for 15 min just as with the build. I do see the 401 line in the registry log:
10.1.1.1 - - [29/Oct/2015:14:41:19 -0400] "GET /v2/ HTTP/1.1" 401 114 "" "docker/1.8.2 go/go1.4.2 kernel/3.10.0-326.el7.x86_64 os/linux arch/amd64"

but I believe that gets output for every registry auth challenge.

Comment 19 Cesar Wong 2015-10-29 21:28:07 UTC
Paul, I'm sending this one your way. Given that I can reproduce the behavior with the Docker client, the builder or s2i doesn't seem to be involved. Please see my comment above. On the same machine that the docker client hangs on login, I can login just fine to the registry.

Comment 20 Cesar Wong 2015-10-29 21:30:38 UTC
Sorry, just realized my statement above is confusing. I can log in just fine to Docker Hub, but not to the OpenShift registry.

Comment 24 Dan Winship 2015-10-30 17:05:28 UTC
ok, strace shows that the kube-proxy part of openshift on the build node IS receiving a "POST /v2/default/sti-php/blobs/uploads/" request from the builder, opening a connection to the registry, and proxying the request data. The registry sees the connection being opened but never sees any data written on it; those packets seem to get dropped somewhere.

Comment 25 Michal Minar 2015-10-30 18:37:06 UTC
I'm confirming networking issues. Registry seems to be working fine. If I curl it on the registry-running node, I get:
# curl -k 172.30.41.4:5000
{"errors":[{"code":"UNAUTHORIZED","message":"access to the requested resource is not authorized","detail":null}]}

which is expected. On master node however:

# curl -k 172.30.41.4:5000
curl: (7) Failed connect to 172.30.41.4:5000; Connection refused

Comment 26 Dan Winship 2015-10-30 19:15:54 UTC
(In reply to Michal Minar from comment #25)
> # curl -k 172.30.41.4:5000
> curl: (7) Failed connect to 172.30.41.4:5000; Connection refused

connection refused would be a different problem.

The problem here is something MTU-related, because the ridiculously long passwords used cause the POST request to be larger than the MTU of the tunnel, and something is going wrong with fragmentation/segmentation/something. (This is why the unauthenticated GET request gets through reliably every time, but the authenticated POST request gets dropped every time.)
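
A rough way to see this from the build node is to ping the registry pod with DF set at two sizes (the pod IP is illustrative; -s 1422 plus 28 bytes of ICMP/IP headers gives a 1450-byte packet):

# ping -c 3 -M do -s 1422 10.1.1.5    # falls in the problematic 1400-1450 range: dropped in this setup
# ping -c 3 -M do -s 1300 10.1.1.5    # well below the underlay MTU: gets through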

Comment 27 Dan Winship 2015-10-30 19:16:42 UTC
FTR,

  ovs-vsctl set Interface vxlan0 options:df_default=false

fixes the problem. I'm trying to figure out now if that has any negative side effects.

Comment 31 Dan Winship 2015-11-03 15:00:01 UTC
OK, figured this out; the problem is that since you are deploying on top of OpenStack, eth0 has an MTU of 1400, but we were still using the default MTU of 1450 on the veths/bridge/tunnel. Since PMTU through OVS doesn't work, this meant that packets of size 1400-1450 ended up getting dropped.

Changing mtu to 1350 in node-config.yaml and restarting the node service fixes the problem.
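
For reference, a sketch of the change in /etc/origin/node/node-config.yaml (the service name below is assumed for an OSE 3.1 RPM install; adjust for your environment):

networkConfig:
   mtu: 1350
   networkPluginName: redhat/openshift-ovs-subnet

# systemctl restart atomic-openshift-node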

So:

    1. This is technically "user error".

    2. We obviously need to document this better.

    3. We should have the installer sanity-check this if we can
       (That the node-config.yaml mtu value is less than the default
       route's MTU.)

    4. We should possibly have openshift itself sanity-check this at
       startup

Comment 32 Dan Winship 2015-11-03 15:40:59 UTC
reassigning to docs (will clone another bug for the installer)

The install docs (https://docs.openshift.org/latest/install_config/install/advanced_install.html ?) need to mention that if you are installing on top of a virtual network (e.g., OpenStack), you need to set the mtu accordingly in the ansible config, e.g., to 50 less than the MTU of eth0.
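
For the ansible-based advanced install, that would be an inventory variable along these lines (variable name assumed; please verify against the openshift-ansible documentation):

[OSEv3:vars]
openshift_node_sdn_mtu=1350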

Comment 33 Johnny Liu 2015-11-04 07:40:07 UTC
Hi Dan,

Great work!

Following your suggestion, I changed the OpenStack MTU setting to 1500 so the instance's eth0 has an MTU of 1500. Then docker can push data across nodes to the registry successfully.

Because this bug is now changed to the DOC component, I'll leave the remaining verification work to the doc team.

Comment 35 Johnny Liu 2016-01-14 06:36:33 UTC
You can find the setting in /etc/origin/node/node-config.yaml.
<--snip-->
networkConfig:
   mtu: 1450
   networkPluginName: redhat/openshift-ovs-subnet
<--snip-->

# ip addr
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether fa:16:3e:1f:a0:7f brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.35/24 brd 192.168.0.255 scope global dynamic eth0
       valid_lft 73385sec preferred_lft 73385sec
    inet6 fe80::f816:3eff:fe1f:a07f/64 scope link 
       valid_lft forever preferred_lft forever


the "eth0"'s mts is set to 1500, then in openshift SDN network, mtu should be less than its real NIC's mtu.

Comment 36 brice 2016-04-20 03:42:26 UTC
Johnny Liu,

I've created a PR for this BZ:

https://github.com/openshift/openshift-docs/pull/1923

I've added a section on troubleshooting virtual networks to the troubleshooting SDN file. Please let me know if this fulfills the requirements of this BZ.

Thanks!

Comment 37 Johnny Liu 2016-04-20 05:35:21 UTC
Your PR looks good to me.

Comment 38 brice 2016-04-20 05:43:50 UTC
Thanks for that. I'll put this onto review.

Comment 39 Johnny Liu 2016-04-21 13:19:56 UTC
According to comment 37, moving this to verified.

Comment 40 openshift-github-bot 2016-04-28 23:36:31 UTC
Commit pushed to master at https://github.com/openshift/openshift-docs

https://github.com/openshift/openshift-docs/commit/4dd4600a7fb6a0544adf7c4dd1d88dc40dfeb8da
Merge pull request #1923 from bfallonf/bz1273327

Bug 1273327 : added troubleshooting for virtual networking

