Bug 1294061
Summary: | Cannot fetch "https://rubygems.org/" when docker/custom build with ruby-hello-world for ruby-22 image
---|---
Product: | OpenShift Container Platform
Reporter: | wewang <wewang>
Component: | Networking
Assignee: | Dan Williams <dcbw>
Status: | CLOSED CURRENTRELEASE
QA Contact: | Meng Bo <bmeng>
Severity: | urgent
Docs Contact: |
Priority: | urgent
Version: | 3.1.0
CC: | aos-bugs, bleanhar, bparees, dcbw, eparis, haowang, jhonce, jokerman, mmccomas, rchopra, tdawson
Target Milestone: | ---
Keywords: | Regression, Reopened
Target Release: | ---
Hardware: | Unspecified
OS: | Unspecified
Whiteboard: |
Fixed In Version: |
Doc Type: | Bug Fix
Doc Text: |
Story Points: | ---
Clone Of: |
: | 1390478 (view as bug list)
Environment: |
Last Closed: | 2016-01-29 20:57:57 UTC
Type: | Bug
Regression: | ---
Mount Type: | ---
Documentation: | ---
CRM: |
Verified Versions: |
Category: | ---
oVirt Team: | ---
RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | ---
Target Upstream Version: |
Embargoed: |
Description (wewang, 2015-12-24 10:54:30 UTC)
1. It works with a source build: it can fetch https://rubygems.org/.
2. It does not work with a docker build: it cannot fetch https://rubygems.org/.

Ruby hello world cannot be docker-built with the Ruby 2.0 image anymore; we switched it to use the Ruby 2.2 SCL package, hence the error you are getting. This was an intentional change. @wewang, this is not a bug, closed.

Reopening this bug:
1. As the bug title says, we are testing against the ruby-2.2 image; this is visible in the build log.
2. Current env info: on a 3.1.1.0 containerized install, an sti build works fine with the ruby-22 builder image, while docker and custom builds fail.
3. The same builder image docker-builds fine on a 3.1.0.4 rpm-install env.

So, Ben, could you please help verify this and find the problem?

My apologies, I'm not sure why I thought you were using the 2.0 image. I'm able to run this successfully on origin (but using the same rcm ruby image you used), and the error you're getting certainly seems like a networking one (unable to reach rubygems.org), though I understand you were successful with the source-type build. I'd like you to start by trying again, just to rule out network issues.
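For reference, the two build types being compared can be reproduced with something like the following (a sketch; it assumes the standard ruby-hello-world repo and a builder imagestream named `ruby-22-rhel7`, which may differ in your cluster):

```
# Source (s2i) build -- reported to work:
oc new-app ruby-22-rhel7~https://github.com/openshift/ruby-hello-world

# Docker-strategy build -- the failing case:
oc new-app https://github.com/openshift/ruby-hello-world --strategy=docker

# Watch the build log for the rubygems.org fetch:
oc logs -f bc/ruby-hello-world
```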
I'd also like to know if you have this problem with origin installations (but using the RCM ruby-22-rhel7 image to build). Here's my log output from the build:

```
Step 0 : FROM rcm-img-docker01.build.eng.bos.redhat.com:5001/rhscl/ruby-22-rhel7
 ---> 9416bc460d0c
Step 1 : ENV "EXAMPLE" "sample-app"
 ---> Running in 946a0401b355
 ---> 19870ac8688f
Removing intermediate container 946a0401b355
Step 2 : USER default
 ---> Running in f1057f027cab
 ---> 65e0ae8f9fac
Removing intermediate container f1057f027cab
Step 3 : EXPOSE 8080
 ---> Running in dffab589eaec
 ---> 4dc91b8af03d
Removing intermediate container dffab589eaec
Step 4 : ENV RACK_ENV production
 ---> Running in fba4c440983b
 ---> 6956746fa073
Removing intermediate container fba4c440983b
Step 5 : ENV RAILS_ENV production
 ---> Running in 026f47dbf1c3
 ---> a097a875a602
Removing intermediate container 026f47dbf1c3
Step 6 : COPY . /opt/app-root/src/
 ---> c3ae348617b8
Removing intermediate container 9fe9b26d0581
Step 7 : RUN scl enable rh-ruby22 "bundle install"
 ---> Running in 62c3b35222df
Fetching gem metadata from https://rubygems.org/..........
Installing rake 10.3.2
Installing i18n 0.6.11
Installing json 1.8.3
Installing minitest 5.4.2
Installing thread_safe 0.3.4
Installing tzinfo 1.2.2
Installing activesupport 4.1.7
Installing builder 3.2.2
Installing activemodel 4.1.7
Installing arel 5.0.1.20140414130214
Installing activerecord 4.1.7
Installing mysql2 0.3.16
Installing rack 1.5.2
Installing rack-protection 1.5.3
Installing tilt 1.4.1
Installing sinatra 1.4.5
Installing sinatra-activerecord 2.0.3
Using bundler 1.7.8
Your bundle is complete!
Use `bundle show [gemname]` to see where a bundled gem is installed.
```
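For context, the Dockerfile driving this build can be reconstructed from the step numbers in the log above (a sketch assembled from the log, not copied from the ruby-hello-world repo itself):

```dockerfile
FROM rcm-img-docker01.build.eng.bos.redhat.com:5001/rhscl/ruby-22-rhel7
ENV "EXAMPLE" "sample-app"
USER default
EXPOSE 8080
ENV RACK_ENV production
ENV RAILS_ENV production
COPY . /opt/app-root/src/
# This is the step that needs network access to rubygems.org:
RUN scl enable rh-ruby22 "bundle install"
CMD scl enable rh-ruby22 ./run.sh
USER root
RUN chmod og+rw /opt/app-root/src/db
```

Note that the `bundle install` at Step 7 is the first step that requires outbound network access from the build container, which is why a networking misconfiguration surfaces exactly there.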
```
 ---> 242d39c8204b
Removing intermediate container 62c3b35222df
Step 8 : CMD scl enable rh-ruby22 ./run.sh
 ---> Running in 3151ed30cdb1
 ---> ac98d0e4a8b7
Removing intermediate container 3151ed30cdb1
Step 9 : USER root
 ---> Running in 9e2bd4e5d47f
 ---> 88d6943b2bdb
Removing intermediate container 9e2bd4e5d47f
Step 10 : RUN chmod og+rw /opt/app-root/src/db
 ---> Running in 2170eda54edd
 ---> 63c357bbde11
<truncated>
```

Ben: AFAIK, when we run a docker or sti build, it starts a container that is not scheduled by Kubernetes, and its network appears to differ from the pod network (network topology here: https://github.com/openshift/openshift-sdn/blob/master/isolation-node-interfaces-diagram.pdf); the container is bridged to the lbr0 interface. So I started a container on one node:

```
$ docker run -ti bmeng/hello-openshift /bin/bash
```

1. curl rubygems.org inside the container:

```
$ curl -k https://rubygems.org/
curl: (6) Could not resolve host: rubygems.org; Unknown error
```

2. Cannot connect to the nameserver inside the container:

```
bash-4.3# cat /etc/resolv.conf
# Generated by NetworkManager
search openstacklocal nay.redhat.com
nameserver 10.11.5.19

ping 10.11.5.19
PING 10.11.5.19 (10.11.5.19): 56 data bytes
^C
--- 10.11.5.19 ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss
```

The env is a containerized install on Atomic Host; when I do the same operations on the rpm-install env, the curl works. So there seems to be a network problem, but I am not sure what the difference between source and docker builds is, or why the source build works with no network problem. I will paste the env info in the next private comment; could you please help check this?

Yes, this sounds like a networking issue.
s2i builds have some special logic to pick up the cluster networking config when launching a container: https://github.com/openshift/origin/pull/5372. It seems we may need to do something similar for Docker and Custom builds so those containers also have the right networking configuration. Rajat, can you confirm that is the right solution here (since you were tagged in the s2i PR)?

For custom builds, there's nothing we can do, since the custom image itself is going to control the network configuration of any docker operations it performs. (The custom container itself has the correct network settings because it's just a pod, but the container it creates as a result of running docker build does not/will not.) For docker builds, I don't see a way for us to configure the network like you can for docker run, so I'm not convinced there is anything we can do to fix that from our side either. I would like the network and container teams to weigh in on this, though.

The rpm-install env also has the problem that comment 6 describes:

```
openshift v3.1.1.1
kubernetes v1.1.0-origin-1107-g4c8e6f4
etcd 2.1.2
```

Updating the priority to high for this network problem.

The issue appears to be that net.bridge.bridge-nf-call-iptables = 1 when it should be 0. That is likely due to https://github.com/openshift/openshift-sdn/pull/244/files.

That openshift-sdn PR has been merged; we must now wait for an openshift-sdn update to be merged into the origin repos.

Verified that it works in an origin env after modifying net.bridge.bridge-nf-call-iptables to 0 on the node. Moving the bug back; please move it to ON_QA once the change is merged into the latest OSE rpm build.

The fix was not included in build 2016-01-13.2 with version:

```
atomic-openshift-node-3.1.1.2-1.git.0.30f8d65.el7aos.x86_64
atomic-openshift-sdn-ovs-3.1.1.2-1.git.0.30f8d65.el7aos.x86_64

# sysctl -a | grep ^net.*iptables
net.bridge.bridge-nf-call-iptables = 1
```

Reducing the severity, since the workaround can be applied by the user.
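For anyone hitting this before the fix lands, the workaround referenced above is to flip the sysctl on the affected node. A minimal sketch (run as root on the node; the drop-in file name below is arbitrary, and note that later comments in this bug show the node service can reset the value on restart until the fix is installed):

```
# Check the current value:
sysctl net.bridge.bridge-nf-call-iptables

# Apply the workaround immediately (not persistent across reboot):
sysctl -w net.bridge.bridge-nf-call-iptables=0

# Optionally persist it, e.g. via a drop-in such as /etc/sysctl.d/99-bridge-nf.conf
# containing the line:
#   net.bridge.bridge-nf-call-iptables = 0
# then reload all sysctl configuration:
sysctl --system
```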
Looks like we missed the latest SDN update with yesterday's build: https://github.com/openshift/origin/pull/6650. There will be another build today, so I'll move this back to ON_QA once it's ready.

The fix has been merged into OSE build 2016-01-14.1 with rpms:

```
atomic-openshift-node-3.1.1.3-1.git.0.59b3b7b.el7aos.x86_64
atomic-openshift-sdn-ovs-3.1.1.3-1.git.0.59b3b7b.el7aos.x86_64
```

But I found that net.bridge.bridge-nf-call-iptables is set back to 1, not 0, after a node service restart:

```
# sysctl -a | grep bridge.*iptables
net.bridge.bridge-nf-call-iptables = 1
# sysctl -w net.bridge.bridge-nf-call-iptables=0
net.bridge.bridge-nf-call-iptables = 0
# sysctl -a | grep bridge.*iptables
net.bridge.bridge-nf-call-iptables = 0
# systemctl restart atomic-openshift-node
# sysctl -a | grep bridge.*iptables
net.bridge.bridge-nf-call-iptables = 1
```

Please help confirm what the problem is.

Ok, I think I know the issue. Docker's libnetwork bridge driver (which is used because we tell docker to use lbr0) always sets bridge-nf-call-iptables=1 when it starts up. openshift sets bridge-nf-call-iptables=0 when it starts up, but *only* if it thinks the SDN is not yet configured. openshift also restarts docker when it starts up, but then terminates setup early when it thinks the SDN is configured, and that's long before it sets bridge-nf-call-iptables=0. Although openshift only restarts docker if the options changed, so I'm still investigating what's going on here.

It is possible, and even likely, that he restarted docker by hand at some point. I have heard of an operations group that sometimes does so...

Final analysis:
1) docker is not the cause; it sets bridge-nf-call-iptables=1, but only if inter-container communication (ICC) is disabled, and that defaults to enabled.
2) the actual issue is the upstream Kubernetes proxy code, which also sets bridge-nf-call-iptables=1 and runs after the SDN code runs, and thus always overwrites the value the SDN code set.
https://github.com/openshift/origin/pull/6688 replaces the above PR because of difficulties in ordering related to 6686. Moving to MODIFIED, as this is in HEAD and should make the next OSE 3.1.1 build.

Checked with the latest origin build: net.bridge.bridge-nf-call-iptables is now set back to 0 after the node starts.

```
# openshift version
openshift v1.1-806-gd95ec08
kubernetes v1.1.0-origin-1107-g4c8e6f4
# sysctl -a | grep ^net.bridge.bridge.*iptables
net.bridge.bridge-nf-call-iptables = 0
# sysctl -w net/bridge/bridge-nf-call-iptables=1
net.bridge.bridge-nf-call-iptables = 1
# systemctl restart openshift-node
# sysctl -a | grep ^net.bridge.bridge.*iptables
net.bridge.bridge-nf-call-iptables = 0
```

@dcbw I'm curious why this was not a problem before the setup.sh refactor? At least, docker builds worked fine when 3.1.0 was released. Does it mean all the functions in setup.sh run earlier during node start after being rewritten in golang?

Just confirmed that the fix has been included in the latest OSE build, 2016-01-16.1.

(In reply to Meng Bo from comment #26)
> @dcbw I'm curious why this was not a problem before the setup.sh refactor? At
> least, docker builds worked fine when 3.1.0 was released. Does it mean all
> the functions in setup.sh run earlier during node start after being
> rewritten in golang?

It may have been due to 8971251ba2095bb6daace2e6396ff1a1f6882b27 (committed upstream Nov 10th, after 3.1 was released), which changed node initialization from being done in a goroutine to being synchronous. Before that change, init from a goroutine would usually have allowed RunProxy() to happen before node initialization completed, though it was not guaranteed.

Verified on openshift v3.1.1.5.