Bug 1225410
| Summary: | Fail to do STI build for sample-app example | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | xjia <xjia> |
| Component: | Build | Assignee: | Rajat Chopra <rchopra> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Johnny Liu <jialiu> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 3.0.0 | CC: | akostadi, cewong, ejacobs, jialiu, libra-bugs, maschmid, sdodson, tschloss, wzheng, xtian, zroubali |
| Target Milestone: | --- | Keywords: | TestBlocker |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | openshift-0.5.2.2-0.git.19.8dc4a9a.el7ose.x86_64 | Doc Type: | Bug Fix |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2015-11-23 14:43:08 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
xjia
2015-05-27 10:22:10 UTC
xjia - it looks like either rubygems.org is not accessible from the build container, or DNS lookup is not working. Can you please check that you can access rubygems.org from the node itself?

Yep, I have checked my environment. Both the node and the container can connect to "rubygems.org" (using curl -k https://rubygems.org), but I have no idea why it always fails to fetch data from rubygems.org. Is there any way that I could access your environment to troubleshoot?

It is not just downloading from rubygems.org; there is also a problem when trying to install dependencies in a container built from the Perl image:

I0528 03:42:56.500439 1 docker.go:354] Starting container
I0528 03:42:56.691054 1 docker.go:364] Waiting for container
I0528 03:42:56.907048 1 sti.go:392] ---> Installing application source
I0528 03:42:56.948766 1 sti.go:392] ---> Installing modules from cpanfile ...
E0528 03:43:37.563564 1 sti.go:418] ! Finding Module::CoreList on cpanmetadb failed.
E0528 03:43:37.563628 1 sti.go:418] ! Finding Module::CoreList on cpanmetadb failed.

I have a very similar issue with EAP STI. It seems that the openshift-master has to configure the docker containers (I can see that /etc/resolv.conf is different in containers created by openshift and in containers created manually using docker), but these settings are not passed to the subsequent STI builder (i.e. the EAP image). When I try to use STI, the ose-sti-builder downloads the sources (resolving the git URL correctly and accessing the server), but once the EAP part starts (in a new container with the EAP image), /etc/resolv.conf is in its default state and connections outside the container do not work (I have tried to resolve DNS and to access an IP address). It blocks xPaaS testing of Beta4.

In my scenarios, I did not see DNS resolver issues, but I still cannot fetch data from rubygems.org (the same behaviour as the initial report). I can see that the STI build container has the correct DNS resolver:

# cat /var/lib/docker/containers/71a1d75b1af017af4d306fa99832ecbdd0e6004f1e5759aaf6b091df951589e5/resolv.conf
nameserver 192.168.1.192
nameserver 10.11.5.19
search jialiu.cluster.local cluster.local openstacklocal cluster.local

71a1d75b1af017af4d306fa99832ecbdd0e6004f1e5759aaf6b091df951589e5 is the STI builder container UUID. Because the container is already terminated after the sti build failure, I have no chance to log into it to check the DNS setup. So I logged into the router/docker-registry docker container to check its connection to rubygems.org; I even tried "bundle install", just like what the sti build does, and everything went well.

# docker exec -t -i <router-container-ID> /bin/sh
sh-4.2# cat /etc/resolv.conf
nameserver 192.168.1.192
nameserver 10.11.5.19
search default.cluster.local cluster.local openstacklocal cluster.local

The same resolver order, just like what the sti build container has. 192.168.1.192 is the master where SkyDNS is running; 10.11.5.19 is the office network DNS resolver.

sh-4.2# dig @192.168.1.192 rubygems.org

; <<>> DiG 9.9.4-RedHat-9.9.4-18.el7_1.1 <<>> @192.168.1.192 rubygems.org
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 22025
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;rubygems.org. IN A

;; Query time: 2 msec
;; SERVER: 192.168.1.192#53(192.168.1.192)
;; WHEN: Fri May 29 04:52:32 EDT 2015
;; MSG SIZE rcvd: 30

sh-4.2# dig rubygems.org

; <<>> DiG 9.9.4-RedHat-9.9.4-18.el7_1.1 <<>> rubygems.org
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 20393
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;rubygems.org. IN A

;; Query time: 1 msec
;; SERVER: 192.168.1.192#53(192.168.1.192)
;; WHEN: Fri May 29 04:53:59 EDT 2015
;; MSG SIZE rcvd: 30

This is weird; it seems that, inside the container, the 2nd resolver is not used when the 1st resolver cannot resolve the name. But interestingly, I could run the following commands successfully without any change to /etc/resolv.conf in the container:

sh-4.2# yum install rubygem-bundler
sh-4.2# curl -k https://rubygems.org/
sh-4.2# gem install rack
sh-4.2# bundle install

I have additional information about how the build goes:
1) build is started
2) docker container from the ose-pod image is created
3) docker container from the ose-sti-builder image is created - this container has access to the network, is set up to use the container created in 2) as its NetworkMode, and downloads the sources from github
4) docker container from the eap-openshift image is created - this container is not using the container created in 2) as its NetworkMode, and has no access to the internet (either by IP or by DNS)
5) build fails

It seems that the sti builder is creating the docker container itself, which causes problems, because in the current setup, docker containers created by docker do not have access anywhere.

(In reply to Johnny Liu from comment #8)
Yes, when I connect to the router, everything works just fine (resolving DNS, downloading from github), but if I just create a docker container (e.g. docker run --rm -it fedora bash), it is unable to connect to the internet. By doing 'watch docker ps' I found out that a new container is started by the STI build. That container is created by docker run, so it has no access anywhere (verified, since the maven build takes time to fail).

(In reply to Tomas Schlosser from comment #10)
> (In reply to Johnny Liu from comment #8)
> Yes, when I connect to the router, everything works just fine (resolving DNS,
> downloading from github), but if I just create a docker container (e.g. docker
> run --rm -it fedora bash), it is unable to connect to the internet.
>
> By doing 'watch docker ps' I found out that a new container is started by the
> STI build. That container is created by docker run, so it has no access anywhere
> (verified, since the maven build takes time to fail).

Agree.

The issue is caused by the container run by the STI builder not getting the same nameserver configuration as the builder pod. The solution is to manually add the IP of the openshift master to each node's resolv.conf. We will print out a warning if this is not the case when starting the node. However, this needs to be done manually or by deploy scripts. I'll move this bug to ON_QA when documentation has been updated.

Jhon, assigning this one to you, since you are going to be working on the documentation.

(In reply to Cesar Wong from comment #12)
> The issue is caused by the container run by the STI builder not getting the
> same nameserver configuration as the builder pod.
>
> The solution is to manually add the ip of the openshift master to each
> node's resolv.conf
>
> We will print out a warning if this is not the case when starting the node.
> However, this needs to be done manually or by deploy scripts. I'll move this
> bug to ON_QA when documentation has been updated.

We just reconfigured skydns not to recurse. Are you adding only the master's IP to the resolv.conf of the containers spawned by the S2I builder? I think we'd need to be consistent with other pods and add the master first, then the node's nameservers after that. See https://github.com/openshift/origin/pull/2569 for discussion regarding disabling recursion in skydns.

(In reply to Cesar Wong from comment #12)
> The issue is caused by the container run by the STI builder not getting the
> same nameserver configuration as the builder pod.

I don't think this is a DNS issue; I have tried curl with an IP address, with the result "No route to host".

> The solution is to manually add the ip of the openshift master to each
> node's resolv.conf
>
> We will print out a warning if this is not the case when starting the node.
> However, this needs to be done manually or by deploy scripts. I'll move this
> bug to ON_QA when documentation has been updated.

I have tried adding the master to each node's resolv.conf. I tried adding it as the second nameserver (after the real-world one) as well as the first nameserver. None of these setups work and the STI build still fails. I have updated openshift to the latest version:

openshift-0.5.2.2-0.git.13.685a58e.el7ose.x86_64
openshift-master-0.5.2.2-0.git.13.685a58e.el7ose.x86_64
tuned-profiles-openshift-node-0.5.2.2-0.git.13.685a58e.el7ose.x86_64
openshift-sdn-ovs-0.5.2.2-0.git.13.685a58e.el7ose.x86_64
openshift-node-0.5.2.2-0.git.13.685a58e.el7ose.x86_64

I went through the docker inspect output again, and it seems that the main difference between a manually run container and a container run by OSE is the NetworkMode. The container created by docker run has the NetworkMode set to "bridge", while the openshift-pod container (that serves the network to the sti-builder) has the NetworkMode set to "" (empty string). I didn't find a way to reproduce this using the docker run command, so I can't check if it would solve the problem.

(In reply to Cesar Wong from comment #12)
> The issue is caused by the container run by the STI builder not getting the
> same nameserver configuration as the builder pod.
>
> The solution is to manually add the ip of the openshift master to each
> node's resolv.conf
>
> We will print out a warning if this is not the case when starting the node.
> However, this needs to be done manually or by deploy scripts. I'll move this
> bug to ON_QA when documentation has been updated.

My behaviour is the same as what is described in comment 15; it seems that adding the master IP to the node's resolv.conf does not resolve this issue. Here are my testing steps:

1. openshift version
# openshift version
openshift v0.5.2.2-14-gef0f6ad
kubernetes v0.17.1-804-g496be63
# docker images|grep sti
docker-buildvm-rhose.usersys.redhat.com:5000/openshift3_beta/ose-sti-image-builder v0.5.2.2 ad33ee97468d 3 days ago 445.4 MB
docker-buildvm-rhose.usersys.redhat.com:5000/openshift3_beta/ose-sti-builder v0.5.2.2 63a3596cbba6 3 days ago 289.1 MB

2. According to comment 12, add the master IP to /etc/resolv.conf on the nodes.
# cat /etc/resolv.conf
; generated by /usr/sbin/dhclient-script
search openstacklocal cluster.local
nameserver 192.168.1.192
nameserver 10.11.5.19

3. Trigger an sti build.

4. During the sti build, the following docker container is spawned. (NOTE: this container does not have an associated ose-pod container)
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b964e9349b4d openshift/ruby-20-rhel7:latest "/bin/sh -c 'tar -C 3 seconds ago Up 2 seconds 8080/tcp loving_pike

5. Before this container is terminated, check its resolv.conf to make sure the master IP is there.
# docker inspect b964e9349b4d|grep res
"CpuShares": 0,
"MacAddress": "",
"CpuShares": 0,
"GlobalIPv6Address": "",
"IPAddress": "10.1.0.8",
"LinkLocalIPv6Address": "fe80::42:aff:fe1:8",
"MacAddress": "02:42:0a:01:00:08",
"ResolvConfPath": "/var/lib/docker/containers/b964e9349b4dbfa6efc4687db959d08e8cedfef667ab42782fbb51560bf87138/resolv.conf",
# cat /var/lib/docker/containers/b964e9349b4dbfa6efc4687db959d08e8cedfef667ab42782fbb51560bf87138/resolv.conf
; generated by /usr/sbin/dhclient-script
search openstacklocal cluster.local
nameserver 192.168.1.192
nameserver 10.11.5.19

But there is still the same error in the build log:

$ osc build-logs ruby-sample-build-1
Switched to a new branch 'beta3'
Branch beta3 set up to track remote branch beta3 from origin.
I0601 02:16:57.711274 1 sti.go:392] ---> Installing application source
I0601 02:16:57.722962 1 sti.go:392] ---> Building your Ruby application from source
I0601 02:16:57.723158 1 sti.go:392] ---> Running 'bundle install --deployment'
I0601 02:17:38.268773 1 sti.go:392] Fetching source index from https://rubygems.org/
I0601 02:18:18.333187 1 sti.go:392] Could not fetch specs from https://rubygems.org/
F0601 02:18:19.510684 1 builder.go:75] Build error: non-zero (13) exit code from openshift/ruby-20-rhel7

Assigning back to myself to investigate further.

Late today Cesar and I did some debugging, and we hope to test a fix tomorrow related to the NetworkMode not being set properly on containers created by the sti-builder pod; we found that those containers couldn't even ping the default router. Hopefully we'll have a fix tomorrow.

It seems like this is an sdn network configuration issue. After restarting the openshift-node service, openshift-sdn-kube-subnet-setup.sh is called, and "ip route" shows the following. But this causes containers (started by "docker run" or spawned by the sti builder) to lose their network connection.
# ip route
default via 192.168.1.1 dev eth0
10.1.0.0/24 dev tun0 proto kernel scope link src 10.1.0.1
10.1.0.0/16 dev tun0 proto kernel scope link
169.254.0.0/16 dev eth0 scope link metric 1002
192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.193
Checking all the steps of openshift-sdn-kube-subnet-setup.sh, the following line causes the container to lose its network connection:
<-->
# delete the subnet routing entry created because of lbr0
ip route del ${subnet} dev lbr0 proto kernel scope link src ${subnet_gateway} || true
<-->
I am not sure if this is by design; what is the reason?
At least, if I comment out this line, ip route shows the following:
# ip route
default via 192.168.1.1 dev eth0
10.1.0.0/24 dev lbr0 proto kernel scope link src 10.1.0.1
10.1.0.0/24 dev tun0 proto kernel scope link src 10.1.0.1
10.1.0.0/16 dev tun0 proto kernel scope link
169.254.0.0/16 dev eth0 scope link metric 1002
192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.193
Then the container can connect outside successfully.
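The missing-route symptom described above lends itself to a quick scripted check. The following is a hypothetical diagnostic sketch, not part of the original report: the helper name `check_lbr0_route` is invented here, and the 10.1.0.0/24 subnet is taken from this environment.

```shell
#!/bin/sh
# check_lbr0_route: read `ip route` output on stdin and succeed only if the
# given pod subnet is routed via the lbr0 bridge (the route that
# openshift-sdn-kube-subnet-setup.sh deletes). Hypothetical helper, not part
# of OpenShift; dots in the subnet are matched loosely by grep.
check_lbr0_route() {
    subnet="$1"
    grep -q "^${subnet} dev lbr0 " -
}

# Example against the broken routing table quoted in this bug:
broken="default via 192.168.1.1 dev eth0
10.1.0.0/24 dev tun0 proto kernel scope link src 10.1.0.1"

if printf '%s\n' "$broken" | check_lbr0_route "10.1.0.0/24"; then
    echo "lbr0 route present"
else
    echo "lbr0 route missing"    # this branch runs for the broken table
fi
```

On an affected node this would be run against live output, e.g. `ip route | check_lbr0_route 10.1.0.0/24`.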
Rajat, assigning this one to you, given that in SDN environments this is a networking issue. Containers not started by openshift have no access to the outside network.

(In reply to Johnny Liu from comment #20)
I have tried that setup as well, and it brings these problems:
- deployments end with "dial tcp: i/o timeout"
- pods on the same node can't connect to each other (e.g. a build on node1 can't use the docker-registry on node1)
- liveness probes don't work (because curl from a node to its containers doesn't work)

So it won't work for us even as a workaround.

Fixed with https://github.com/openshift/origin/pull/2719

@johnny @xjia Could we ask you to test this before it is merged? Thanks.

It seems 3.0/2015-06-02.3 already merged the PR mentioned in comment 23, so I re-tested this bug with 3.0/2015-06-02.3; the container still cannot get an outside network connection. Adding more log info that I hope could help your debugging.
Node log message:
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + lock_file=/var/lock/openshift-sdn.lock
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + subnet_gateway=10.1.0.1
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + subnet=10.1.0.0/24
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + cluster_subnet=10.1.0.0/16
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + subnet_mask_len=24
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + tun_gateway=10.1.0.1
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + printf 'Container network is "%s"; local host has subnet "%s" and gateway "%s".\n' 10.1.0.0/16 10.1.0.0/24 10.1.0.1
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: Container network is "10.1.0.0/16"; local host has subnet "10.1.0.0/24" and gateway "10.1.0.1".
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + TUN=tun0
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + lockwrap setup
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + flock 200
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + setup
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + rm -f /etc/openshift-sdn/config.env
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + ovs-vsctl del-br br0
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + ovs-vsctl add-br br0 -- set Bridge br0 fail-mode=secure
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + ovs-vsctl set bridge br0 protocols=OpenFlow13
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + ovs-vsctl del-port br0 vxlan0
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: ovs-vsctl: no port named vxlan0
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + true
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + ovs-vsctl add-port br0 vxlan0 -- set Interface vxlan0 type=vxlan options:remote_ip=flow options:key=flow ofport_request=1
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + ovs-vsctl add-port br0 tun0 -- set Interface tun0 type=internal ofport_request=2
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + ip link del vlinuxbr
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + ip link add vlinuxbr type veth peer name vovsbr
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + ip link set vlinuxbr up
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + ip link set vovsbr up
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + ip link set vlinuxbr txqueuelen 0
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + ip link set vovsbr txqueuelen 0
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + ovs-vsctl del-port br0 vovsbr
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: ovs-vsctl: no port named vovsbr
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + true
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + ovs-vsctl add-port br0 vovsbr -- set Interface vovsbr ofport_request=9
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + ip link set lbr0 down
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + brctl delbr lbr0
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + brctl addbr lbr0
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + ip addr add 10.1.0.1/24 dev lbr0
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + ip link set lbr0 up
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + brctl addif lbr0 vlinuxbr
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + ip addr add 10.1.0.1/24 dev tun0
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + ip link set tun0 up
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + ip route add 10.1.0.0/16 dev tun0 proto kernel scope link
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + iptables -t nat -D POSTROUTING -s 10.1.0.0/16 '!' -d 10.1.0.0/16 -j MASQUERADE
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + iptables -t nat -A POSTROUTING -s 10.1.0.0/16 '!' -d 10.1.0.0/16 -j MASQUERADE
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + iptables -D INPUT -p udp -m multiport --dports 4789 -m comment --comment '001 vxlan incoming' -j ACCEPT
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + iptables -D INPUT -i tun0 -m comment --comment 'traffic from docker for internet' -j ACCEPT
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: ++ iptables -nvL INPUT --line-numbers
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: ++ grep 'state RELATED,ESTABLISHED'
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: ++ awk '{print $1}'
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + lineno=
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + iptables -I INPUT -p udp -m multiport --dports 4789 -m comment --comment '001 vxlan incoming' -j ACCEPT
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + iptables -I INPUT 1 -i tun0 -m comment --comment 'traffic from docker for internet' -j ACCEPT
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: ++ iptables -nvL FORWARD --line-numbers
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: ++ grep 'reject-with icmp-host-prohibited'
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: ++ tail -n 1
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: ++ awk '{print $1}'
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + fwd_lineno=
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + iptables -I FORWARD -d 10.1.0.0/16 -j ACCEPT
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + iptables -I FORWARD -s 10.1.0.0/16 -j ACCEPT
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + [[ -z '' ]]
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + DOCKER_NETWORK_OPTIONS='-b=lbr0 --mtu=1450'
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + grep -q '^DOCKER_NETWORK_OPTIONS='\''-b=lbr0 --mtu=1450'\''' /etc/sysconfig/docker-network
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + systemctl daemon-reload
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + systemctl restart docker.service
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + ip route del 10.1.0.0/24 dev lbr0 proto kernel scope link src 10.1.0.1
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + mkdir -p /etc/openshift-sdn
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + echo 'export OPENSHIFT_SDN_TAP1_ADDR=10.1.0.1'
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: + echo 'export OPENSHIFT_CLUSTER_SUBNET=10.1.0.0/16'
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: I0603 16:15:50.534760 86703 kube.go:78] Output of adding table=0,cookie=0xac,priority=100,ip,nw_dst=10.1.1.0/...: (<nil>)
Jun 03 16:15:50 minion1.cluster.local openshift-node[86703]: I0603 16:15:50.546071 86703 kube.go:80] Output of adding table=0,cookie=0xac,priority=100,arp,nw_dst=10.1.1.0...: (<nil>)
Jun 03 16:15:50 minion1.cluster.local systemd[1]: Started OpenShift Node.
# brctl show
bridge name bridge id STP enabled interfaces
lbr0 8000.0eb084ed433d no vlinuxbr
# ovs-vsctl show
bc8dae8c-d22c-4dce-9d9b-10a816018729
Bridge "br0"
fail_mode: secure
Port "veth2bc9197"
Interface "veth2bc9197"
Port "tun0"
Interface "tun0"
type: internal
Port "veth347c6dc"
Interface "veth347c6dc"
Port "br0"
Interface "br0"
type: internal
Port vovsbr
Interface vovsbr
Port "vxlan0"
Interface "vxlan0"
type: vxlan
options: {key=flow, remote_ip=flow}
Port "veth5c96ddf"
Interface "veth5c96ddf"
ovs_version: "2.3.1-git3282e51"
# ip route
default via 192.168.1.1 dev eth0
10.1.0.0/24 dev tun0 proto kernel scope link src 10.1.0.1
10.1.0.0/16 dev tun0 proto kernel scope link
169.254.0.0/16 dev eth0 scope link metric 1002
192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.193
# iptables -L -n -t nat
Chain PREROUTING (policy ACCEPT)
target prot opt source destination
KUBE-PORTALS-CONTAINER all -- 0.0.0.0/0 0.0.0.0/0 /* handle Portals; NOTE: this must be before the NodePort rules */
KUBE-NODEPORT-CONTAINER all -- 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL /* handle service NodePorts; NOTE: this must be the last rule in the chain */
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
KUBE-PORTALS-HOST all -- 0.0.0.0/0 0.0.0.0/0 /* handle Portals; NOTE: this must be before the NodePort rules */
KUBE-NODEPORT-HOST all -- 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL /* handle service NodePorts; NOTE: this must be the last rule in the chain */
DOCKER all -- 0.0.0.0/0 !127.0.0.0/8 ADDRTYPE match dst-type LOCAL
Chain POSTROUTING (policy ACCEPT)
target prot opt source destination
MASQUERADE all -- 10.1.0.0/24 0.0.0.0/0
MASQUERADE all -- 10.1.0.0/16 !10.1.0.0/16
MASQUERADE tcp -- 10.1.0.5 10.1.0.5 tcp dpt:1936
MASQUERADE tcp -- 10.1.0.5 10.1.0.5 tcp dpt:443
MASQUERADE tcp -- 10.1.0.5 10.1.0.5 tcp dpt:80
Chain DOCKER (1 references)
target prot opt source destination
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:1936 to:10.1.0.5:1936
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:443 to:10.1.0.5:443
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:80 to:10.1.0.5:80
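For completeness, here is a hypothetical check (not from the original report) that the SDN's cluster MASQUERADE rule is present in NAT-table output such as the above; the helper name `has_cluster_masquerade` is invented, and 10.1.0.0/16 is the cluster subnet from this environment.

```shell
#!/bin/sh
# has_cluster_masquerade: read `iptables -L -n -t nat` output on stdin and
# succeed only if the cluster-subnet MASQUERADE rule is present.
# Hypothetical helper; 10.1.0.0/16 is the cluster subnet from this report.
has_cluster_masquerade() {
    grep -Eq '^MASQUERADE +all +-- +10\.1\.0\.0/16 +!10\.1\.0\.0/16' -
}

# Example against the POSTROUTING lines quoted above:
printf 'MASQUERADE  all  --  10.1.0.0/16  !10.1.0.0/16\n' \
    | has_cluster_masquerade && echo "cluster MASQUERADE rule present"
```

Note that, as this bug shows, the rule being present is necessary but not sufficient: bridged traffic can still be dropped elsewhere.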
I think this is actually an SDN issue. Two containers can't reach each other, and the container can't reach the host's gateway:
[root@ose3-master openshift-ansible]# docker run -it google/golang /bin/bash
root@e94e34575453:/# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
19: eth0: <BROADCAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP
link/ether 02:42:0a:01:00:04 brd ff:ff:ff:ff:ff:ff
inet 10.1.0.4/24 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::42:aff:fe01:4/64 scope link
valid_lft forever preferred_lft forever
root@e94e34575453:/# ping 10.1.0.1
PING 10.1.0.1 (10.1.0.1) 56(84) bytes of data.
64 bytes from 10.1.0.1: icmp_req=1 ttl=64 time=0.141 ms
^C
--- 10.1.0.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.141/0.141/0.141/0.000 ms
root@e94e34575453:/# ping 192.168.133.1
PING 192.168.133.1 (192.168.133.1) 56(84) bytes of data.
^C
--- 192.168.133.1 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 1999ms
*** other container on different host***
[root@ose3-node1 ~]# docker run -it google/golang /bin/bash
root@744ba0f177d9:/# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
15: eth0: <BROADCAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP
link/ether 02:42:0a:01:01:02 brd ff:ff:ff:ff:ff:ff
inet 10.1.1.2/24 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::42:aff:fe01:102/64 scope link
valid_lft forever preferred_lft forever
root@744ba0f177d9:/# ping 10.1.0.4
PING 10.1.0.4 (10.1.0.4) 56(84) bytes of data.
^C
--- 10.1.0.4 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 999ms
The following command is required on all nodes:

sysctl -w net.bridge.bridge-nf-call-iptables=0

This will eventually get fixed in ansible/vagrant. Being tracked here: https://github.com/detiber/openshift-ansible/issues/33

Version:
3.0/2015-06-03.2/
Verify:
It can download successfully.
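As a side note on the sysctl workaround given earlier: `sysctl -w` does not survive a reboot. A minimal sketch of persisting the setting follows; the drop-in filename is an assumption, not something specified in this bug.

```shell
# /etc/sysctl.d/99-bridge-nf.conf -- hypothetical drop-in filename
# Persists the workaround from this bug across reboots; apply immediately
# with `sysctl --system`. The bridge netfilter module must be loaded for
# this key to exist.
net.bridge.bridge-nf-call-iptables = 0
```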
[xuan@master sample-app]$ osc build-logs ruby-sample-build-1 -n jia
I0603 21:55:02.909533 1 cfg.go:50] Problem accessing /root/.dockercfg: stat /root/.dockercfg: no such file or directory
I0603 21:55:02.911822 1 sti.go:67] Creating a new S2I builder with build request: api.Request{BaseImage:"openshift/ruby-20-centos7:latest", DockerConfig:(*api.DockerConfig)(0xc20803b680), DockerCfgPath:"", PullAuthentication:docker.AuthConfiguration{Username:"", Password:"", Email:"", ServerAddress:""}, PreserveWorkingDir:false, Source:"git://github.com/openshift/ruby-hello-world.git", Ref:"", Tag:"172.30.168.21:5000/jia/origin-ruby-sample:latest", Incremental:true, RemovePreviousImage:false, Environment:map[string]string{"OPENSHIFT_BUILD_NAME":"ruby-sample-build-1", "OPENSHIFT_BUILD_NAMESPACE":"jia", "OPENSHIFT_BUILD_SOURCE":"git://github.com/openshift/ruby-hello-world.git"}, CallbackURL:"", ScriptsURL:"", Location:"", ForcePull:false, WorkingDir:"", LayeredBuild:false, InstallDestination:"", Quiet:false, ContextDir:""}
I0603 21:55:02.915286 1 docker.go:170] Image openshift/ruby-20-centos7:latest available locally
I0603 21:55:02.918042 1 sti.go:73] Starting S2I build from jia/ruby-sample-build-1 BuildConfig ...
I0603 21:55:02.918084 1 sti.go:114] Building 172.30.168.21:5000/jia/origin-ruby-sample:latest
I0603 21:55:02.918740 1 clone.go:26] Cloning into /tmp/sti640436419/upload/src
I0603 21:55:05.204811 1 docker.go:170] Image openshift/ruby-20-centos7:latest available locally
I0603 21:55:05.204858 1 docker.go:222] Image contains STI_SCRIPTS_URL set to 'image:///usr/local/sti'
I0603 21:55:05.204907 1 download.go:55] Using image internal scripts from: image:///usr/local/sti/assemble
I0603 21:55:05.204923 1 download.go:55] Using image internal scripts from: image:///usr/local/sti/run
I0603 21:55:05.207682 1 docker.go:170] Image openshift/ruby-20-centos7:latest available locally
I0603 21:55:05.207702 1 docker.go:222] Image contains STI_SCRIPTS_URL set to 'image:///usr/local/sti'
I0603 21:55:05.207722 1 download.go:55] Using image internal scripts from: image:///usr/local/sti/save-artifacts
I0603 21:55:05.207741 1 sti.go:185] Using assemble from image:///usr/local/sti
I0603 21:55:05.207756 1 sti.go:185] Using run from image:///usr/local/sti
I0603 21:55:05.207766 1 sti.go:185] Using save-artifacts from image:///usr/local/sti
I0603 21:55:05.208856 1 sti.go:122] Clean build will be performed
I0603 21:55:05.208892 1 sti.go:125] Performing source build from git://github.com/openshift/ruby-hello-world.git
I0603 21:55:05.208906 1 sti.go:133] Building 172.30.168.21:5000/jia/origin-ruby-sample:latest
I0603 21:55:05.208921 1 sti.go:330] Using image name openshift/ruby-20-centos7:latest
I0603 21:55:05.208998 1 environment.go:52] Setting 'RACK_ENV' to 'production'
I0603 21:55:05.210884 1 tar.go:133] Adding to tar: /tmp/sti640436419/upload/src/.gitignore as src/.gitignore
I0603 21:55:05.211088 1 tar.go:133] Adding to tar: /tmp/sti640436419/upload/src/.sti/bin/README as src/.sti/bin/README
I0603 21:55:05.211183 1 tar.go:133] Adding to tar: /tmp/sti640436419/upload/src/.sti/environment as src/.sti/environment
I0603 21:55:05.211272 1 tar.go:133] Adding to tar: /tmp/sti640436419/upload/src/Dockerfile as src/Dockerfile
I0603 21:55:05.211360 1 tar.go:133] Adding to tar: /tmp/sti640436419/upload/src/Gemfile as src/Gemfile
I0603 21:55:05.211495 1 tar.go:133] Adding to tar: /tmp/sti640436419/upload/src/Gemfile.lock as src/Gemfile.lock
I0603 21:55:05.211605 1 tar.go:133] Adding to tar: /tmp/sti640436419/upload/src/README.md as src/README.md
I0603 21:55:05.211697 1 tar.go:133] Adding to tar: /tmp/sti640436419/upload/src/Rakefile as src/Rakefile
I0603 21:55:05.211790 1 tar.go:133] Adding to tar: /tmp/sti640436419/upload/src/app.rb as src/app.rb
I0603 21:55:05.211932 1 tar.go:133] Adding to tar: /tmp/sti640436419/upload/src/config/database.rb as src/config/database.rb
I0603 21:55:05.212036 1 tar.go:133] Adding to tar: /tmp/sti640436419/upload/src/config/database.yml as src/config/database.yml
I0603 21:55:05.214260 1 tar.go:133] Adding to tar: /tmp/sti640436419/upload/src/config.ru as src/config.ru
I0603 21:55:05.214494 1 tar.go:133] Adding to tar: /tmp/sti640436419/upload/src/db/migrate/20141102191902_create_key_pair.rb as src/db/migrate/20141102191902_create_key_pair.rb
I0603 21:55:05.214648 1 tar.go:133] Adding to tar: /tmp/sti640436419/upload/src/models.rb as src/models.rb
I0603 21:55:05.214764 1 tar.go:133] Adding to tar: /tmp/sti640436419/upload/src/run.sh as src/run.sh
I0603 21:55:05.214919 1 tar.go:133] Adding to tar: /tmp/sti640436419/upload/src/views/main.erb as src/views/main.erb
I0603 21:55:05.218025 1 docker.go:268] Base directory for STI scripts is '/usr/local/sti'. Untarring destination is '/tmp'.
I0603 21:55:05.218066 1 docker.go:294] Creating container using config: {Hostname: Domainname: User: Memory:0 MemorySwap:0 CPUShares:0 CPUSet: AttachStdin:false AttachStdout:true AttachStderr:false PortSpecs:[] ExposedPorts:map[] Tty:false OpenStdin:true StdinOnce:true Env:[RACK_ENV=production OPENSHIFT_BUILD_NAME=ruby-sample-build-1 OPENSHIFT_BUILD_NAMESPACE=jia OPENSHIFT_BUILD_SOURCE=git://github.com/openshift/ruby-hello-world.git] Cmd:[/bin/sh -c tar -C /tmp -xf - && /usr/local/sti/assemble] DNS:[] Image:openshift/ruby-20-centos7:latest Volumes:map[] VolumesFrom: WorkingDir: MacAddress: Entrypoint:[] NetworkDisabled:false SecurityOpts:[] OnBuild:[] Labels:map[]}
I0603 21:55:07.298492 1 docker.go:301] Attaching to container
I0603 21:55:07.305606 1 docker.go:354] Starting container
I0603 21:55:08.064878 1 docker.go:364] Waiting for container
I0603 21:55:09.147178 1 sti.go:392] ---> Installing application source
I0603 21:55:09.223986 1 sti.go:392] ---> Building your Ruby application from source
I0603 21:55:09.224019 1 sti.go:392] ---> Running 'bundle install --deployment'
I0603 21:55:14.053936 1 sti.go:392] Fetching gem metadata from https://rubygems.org/..........
I0603 21:55:17.846857 1 sti.go:392] Installing rake (10.3.2)
I0603 21:55:18.203740 1 sti.go:392] Installing i18n (0.6.11)
I0603 21:55:21.842195 1 sti.go:392] Installing json (1.8.1)
I0603 21:55:23.026363 1 sti.go:392] Installing minitest (5.4.2)
I0603 21:55:23.527062 1 sti.go:392] Installing thread_safe (0.3.4)
I0603 21:55:23.912921 1 sti.go:392] Installing tzinfo (1.2.2)
I0603 21:55:24.448167 1 sti.go:392] Installing activesupport (4.1.7)
I0603 21:55:24.664853 1 sti.go:392] Installing builder (3.2.2)
I0603 21:55:24.865857 1 sti.go:392] Installing activemodel (4.1.7)
I0603 21:55:25.125443 1 sti.go:392] Installing arel (5.0.1.20140414130214)
I0603 21:55:25.719118 1 sti.go:392] Installing activerecord (4.1.7)
I0603 21:55:31.490834 1 sti.go:392] Installing mysql2 (0.3.16)
I0603 21:55:32.685277 1 sti.go:392] Installing rack (1.5.2)
I0603 21:55:32.934857 1 sti.go:392] Installing rack-protection (1.5.3)
I0603 21:55:33.153033 1 sti.go:392] Installing tilt (1.4.1)
I0603 21:55:33.602519 1 sti.go:392] Installing sinatra (1.4.5)
I0603 21:55:33.728385 1 sti.go:392] Installing sinatra-activerecord (2.0.3)
I0603 21:55:33.728701 1 sti.go:392] Using bundler (1.3.5)
I0603 21:55:33.763013 1 sti.go:392] Your bundle is complete!
I0603 21:55:33.763054 1 sti.go:392] It was installed into ./bundle
I0603 21:55:33.803384 1 sti.go:392] ---> Cleaning up unused ruby gems
I0603 21:55:34.712984 1 docker.go:370] Container exited
I0603 21:55:34.713013 1 docker.go:376] Invoking postExecution function
I0603 21:55:34.713109 1 environment.go:52] Setting 'RACK_ENV' to 'production'
I0603 21:55:34.713146 1 docker.go:408] Committing container with config: {Hostname: Domainname: User: Memory:0 MemorySwap:0 CPUShares:0 CPUSet: AttachStdin:false AttachStdout:false AttachStderr:false PortSpecs:[] ExposedPorts:map[] Tty:false OpenStdin:false StdinOnce:false Env:[RACK_ENV=production OPENSHIFT_BUILD_NAME=ruby-sample-build-1 OPENSHIFT_BUILD_NAMESPACE=jia OPENSHIFT_BUILD_SOURCE=git://github.com/openshift/ruby-hello-world.git] Cmd:[/usr/local/sti/run] DNS:[] Image: Volumes:map[] VolumesFrom: WorkingDir: MacAddress: Entrypoint:[] NetworkDisabled:false SecurityOpts:[] OnBuild:[] Labels:map[]}
I0603 21:55:44.180502 1 sti.go:249] Successfully built 172.30.168.21:5000/jia/origin-ruby-sample:latest
I0603 21:55:44.180540 1 sti.go:250] Tagged 811131ea1b64dd87c8fd5625b885f7d9ba6d91b61d98a25310d96b4fef12083a as 172.30.168.21:5000/jia/origin-ruby-sample:latest
I0603 21:55:48.197948 1 cleanup.go:24] Removing temporary directory /tmp/sti640436419
I0603 21:55:48.198003 1 fs.go:99] Removing directory '/tmp/sti640436419'
I0603 21:55:48.202826 1 cfg.go:50] Problem accessing /root/.dockercfg: stat /root/.dockercfg: no such file or directory
I0603 21:55:48.202856 1 sti.go:92] Pushing 172.30.168.21:5000/jia/origin-ruby-sample:latest image ...