Bug 1507424 - Build always failed in cri-o env
Summary: Build always failed in cri-o env
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Build
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 3.8.0
Assignee: Ben Parees
QA Contact: Wenjing Zheng
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-10-30 08:54 UTC by DeShuai Ma
Modified: 2018-03-28 14:09 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Build pods launch containers using the docker daemon. When run in environments in which the build pod itself was managed by CRIO, permission issues arose between the build pod and the docker-launched container.
Consequence: The docker-launched container would be unable to access cluster resources such as the network and builds could fail.
Fix: Additional permissions are granted to the docker-launched container to ensure it can access the cluster network.
Result: Builds can succeed when run on a cluster using CRIO to run pods and docker to run build containers.
Clone Of:
Environment:
Last Closed: 2018-03-28 14:08:55 UTC
Target Upstream Version:
Embargoed:


Attachments
build-logs.txt (129.56 KB, text/plain)
2017-10-30 16:07 UTC, DeShuai Ma


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:0489 0 None None None 2018-03-28 14:09:21 UTC

Description DeShuai Ma 2017-10-30 08:54:00 UTC
Description of problem:
OpenShift was installed with cri-o. When running s2i (sti) and docker builds, the build pod always fails with an error. From the build log it seems that the build process can't access the external network, but the db pod can access the external network successfully.

Version-Release number of selected component (if applicable):
openshift v3.7.0-0.184.0
kubernetes v1.7.6+a08f5eeb62
etcd 3.2.8
cri-o : 1.0.1

How reproducible:
Always

Steps to Reproduce:
[root@qe-dma-master-etcd-1 cri-o.0]# oc get po
NAME                          READY     STATUS    RESTARTS   AGE
django-psql-example-1-build   0/1       Error     0          4m
postgresql-1-gnb2b            1/1       Running   0          4m
ruby-ex-1-build               0/1       Error     0          25s
[root@qe-dma-master-etcd-1 cri-o.0]# oc logs ruby-ex-1-build
---> Installing application source ...
---> Building your Ruby application from source ...
---> Running 'bundle install --deployment --without development:test' ...
Fetching source index from https://rubygems.org/

Retrying fetcher due to error (2/4): Bundler::HTTPError Could not fetch specs from https://rubygems.org/
Retrying fetcher due to error (3/4): Bundler::HTTPError Could not fetch specs from https://rubygems.org/
Retrying fetcher due to error (4/4): Bundler::HTTPError Could not fetch specs from https://rubygems.org/Could not fetch specs from https://rubygems.org/
error: build error: non-zero (13) exit code from registry.access.redhat.com/rhscl/ruby-24-rhel7@sha256:79c08145aa2d6a680a3dc8b0f026b48b77b970bfcfe184898b3cb6eaf65276d0
[root@qe-dma-master-etcd-1 cri-o.0]# oc logs django-psql-example-1-build
---> Installing application source ...
---> Installing dependencies ...
Collecting django<1.12,>=1.11 (from -r requirements.txt (line 1))
  Retrying (Retry(total=4, connect=None, read=None, redirect=None)) after connection broken by 'ProtocolError('Connection aborted.', gaierror(-2, 'Name or service not known'))': /simple/django/
  Retrying (Retry(total=3, connect=None, read=None, redirect=None)) after connection broken by 'ProtocolError('Connection aborted.', gaierror(-2, 'Name or service not known'))': /simple/django/
  Retrying (Retry(total=2, connect=None, read=None, redirect=None)) after connection broken by 'ProtocolError('Connection aborted.', gaierror(-2, 'Name or service not known'))': /simple/django/
  Retrying (Retry(total=1, connect=None, read=None, redirect=None)) after connection broken by 'ProtocolError('Connection aborted.', gaierror(-2, 'Name or service not known'))': /simple/django/
  Retrying (Retry(total=0, connect=None, read=None, redirect=None)) after connection broken by 'ProtocolError('Connection aborted.', gaierror(-2, 'Name or service not known'))': /simple/django/
  Retrying (Retry(total=4, connect=None, read=None, redirect=None)) after connection broken by 'ProtocolError('Connection aborted.', gaierror(-2, 'Name or service not known'))': /simple/django/
  Retrying (Retry(total=3, connect=None, read=None, redirect=None)) after connection broken by 'ProtocolError('Connection aborted.', gaierror(-2, 'Name or service not known'))': /simple/django/
  Retrying (Retry(total=2, connect=None, read=None, redirect=None)) after connection broken by 'ProtocolError('Connection aborted.', gaierror(-2, 'Name or service not known'))': /simple/django/
  Retrying (Retry(total=1, connect=None, read=None, redirect=None)) after connection broken by 'ProtocolError('Connection aborted.', gaierror(-2, 'Name or service not known'))': /simple/django/
  Retrying (Retry(total=0, connect=None, read=None, redirect=None)) after connection broken by 'ProtocolError('Connection aborted.', gaierror(-2, 'Name or service not known'))': /simple/django/
  Could not find a version that satisfies the requirement django<1.12,>=1.11 (from -r requirements.txt (line 1)) (from versions: )
No matching distribution found for django<1.12,>=1.11 (from -r requirements.txt (line 1))
error: build error: non-zero (13) exit code from registry.access.redhat.com/rhscl/python-35-rhel7@sha256:be9df8f0385cb443c5c8ceabfa8b98aa3f213fa60ef1cd40c3649f650693df2e


Actual results:

Expected results:
Build successfully

Additional info:
]# docker run -ti --entrypoint /bin/bash registry.access.redhat.com/openshift3/ose-sti-builder:latest
[root@3951d95c493e origin]# ping rubygems.org
PING rubygems.org (151.101.2.2) 56(84) bytes of data.
64 bytes from 151.101.2.2 (151.101.2.2): icmp_seq=1 ttl=55 time=10.2 ms
64 bytes from 151.101.2.2 (151.101.2.2): icmp_seq=2 ttl=55 time=9.97 ms
64 bytes from 151.101.2.2 (151.101.2.2): icmp_seq=3 ttl=55 time=9.97 ms

Comment 1 DeShuai Ma 2017-10-30 08:55:46 UTC
This blocks our build testing in the cri-o environment.

Comment 2 Ben Parees 2017-10-30 14:18:38 UTC
The assemble container is supposed to be launched using the network namespace from the build pod.  Mrunal can you take a look?

DeShuai if you run the build with loglevel 5 I think we'll dump more information about the way the assemble container is being launched.

Comment 3 DeShuai Ma 2017-10-30 16:07:55 UTC
Created attachment 1345529 [details]
build-logs.txt

Added build-loglevel=5 to get the detailed build logs.

Comment 4 Ben Parees 2017-10-30 16:15:57 UTC
here's the network mode value we used to launch the container in question:

NetworkMode: netns:/proc/50018/ns/net
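One way to check whether a container launched with a value like this really ends up in the build pod's network namespace is to compare the namespace links under /proc. The following is only a minimal sketch: the "netns:<path>" format is taken from the value in this comment, the function names are made up for illustration, and it relies on the Linux behavior that /proc/<pid>/ns/net is a symlink whose target identifies the namespace.

```python
import os


def netns_path_from_mode(network_mode):
    """Extract the namespace path from a 'netns:<path>' network mode string.

    The 'netns:' prefix is assumed from the value quoted in this bug;
    any other form is rejected.
    """
    prefix = "netns:"
    if not network_mode.startswith(prefix) or len(network_mode) == len(prefix):
        raise ValueError("not a netns network mode: %r" % (network_mode,))
    return network_mode[len(prefix):]


def same_network_namespace(path_a, path_b, readlink=os.readlink):
    """Return True if both namespace links point at the same namespace.

    On Linux each link reads like 'net:[<inode>]', so equal targets mean
    the two processes share a network namespace. The readlink parameter
    exists only so the check can be exercised without /proc.
    """
    return readlink(path_a) == readlink(path_b)
```

On the node, same_network_namespace(netns_path_from_mode("netns:/proc/50018/ns/net"), "/proc/<assemble-pid>/ns/net") would then be expected to return True, where <assemble-pid> is a placeholder for the pid of the assemble container's process. Reading another process's namespace link usually requires root.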

Comment 5 Ben Parees 2017-10-30 16:20:02 UTC
this also looks like it could be dns issues in the container, so perhaps a problem w/ the resolv.conf in the crio pod which is being mounted into the assemble container.


DeShuai, can your other pods (not build pods) successfully perform DNS resolution?
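To separate a DNS problem from a general connectivity problem, a small check along these lines could be run both in a regular pod and in the assemble container. It is a rough sketch, not part of the build tooling: the function names are hypothetical, and the resolver and connector parameters exist only so the logic can be exercised without a network. The gaierror case corresponds to the "Name or service not known" errors in the django build log.

```python
import socket


def parse_nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf content."""
    servers = []
    for line in resolv_conf_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines ('#' or ';').
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def classify_fetch_failure(host, port=443,
                           resolver=socket.getaddrinfo,
                           connector=socket.create_connection,
                           timeout=5.0):
    """Return 'dns' if the name does not resolve, 'connect' if it
    resolves but a TCP connection fails, and 'ok' otherwise."""
    try:
        resolver(host, port)
    except socket.gaierror:
        return "dns"
    try:
        conn = connector((host, port), timeout=timeout)
    except OSError:
        return "connect"
    conn.close()
    return "ok"
```

Calling classify_fetch_failure("rubygems.org") in both environments, and comparing parse_nameservers(open("/etc/resolv.conf").read()) between them, would show whether the mounted resolv.conf differs or lists nameservers that are unreachable from the assemble container.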

Comment 10 Daniel Walsh 2017-10-30 17:49:37 UTC
Yes that would work.

Comment 11 Ben Parees 2017-10-30 20:47:13 UTC
https://github.com/openshift/origin/pull/17094

Comment 14 Ben Parees 2017-11-30 23:05:42 UTC
ignore comment 13.


relevant PR (comment 11) has merged.

Comment 15 XiuJuan Wang 2017-12-21 09:31:28 UTC
Tested in openshift cluster v3.8.22.
Can't reproduce this bug in the cri-o env; s2i and docker builds work well.

Comment 17 XiuJuan Wang 2018-01-03 08:05:03 UTC
s2i and docker builds work well in openshift cluster v3.9.0-0.9.0.
Moving this bug to verified.

Comment 20 errata-xmlrpc 2018-03-28 14:08:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489

