Bug 1507424 - Builds always fail in cri-o env
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Build
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 3.8.0
Assigned To: Ben Parees
QA Contact: Wenjing Zheng
Depends On:
Blocks:
Reported: 2017-10-30 04:54 EDT by DeShuai Ma
Modified: 2018-03-28 10:09 EDT
CC List: 8 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Build pods launch containers using the docker daemon. When run in environments in which the build pod itself was managed by CRI-O, permission issues arose between the build pod and the docker-launched container.
Consequence: The docker-launched container would be unable to access cluster resources such as the network, and builds could fail.
Fix: Additional permissions are granted to the docker-launched container to ensure it can access the cluster network. (A minimal sketch of the namespace-sharing technique appears after this field list.)
Result: Builds can succeed when run on a cluster using CRI-O to run pods and docker to run build containers.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-03-28 10:08:55 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
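
To illustrate the namespace-sharing technique referenced in the Doc Text above: a docker-launched container can be made to share networking with an already-running container, which is the general behavior a builder needs when the surrounding pod owns the network namespace. This is a minimal sketch, not the actual fix from the PR; the image and container name below are placeholders.

# Hypothetical illustration: the second container joins the first
# container's network namespace, so it sees the same network the
# "owner" does (placeholder image and name, not from this report).
docker run -d --name netowner registry.access.redhat.com/rhel7 sleep 3600
docker run --rm --net=container:netowner registry.access.redhat.com/rhel7 \
    getent hosts rubygems.org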


Attachments
build-logs.txt (129.56 KB, text/plain)
2017-10-30 12:07 EDT, DeShuai Ma


External Trackers
Red Hat Product Errata RHBA-2018:0489, last updated 2018-03-28 10:09 EDT

Description DeShuai Ma 2017-10-30 04:54:00 EDT
Description of problem:
Installed OpenShift with cri-o. When doing S2I and docker builds, the build pod always fails with an error. From the build logs it seems the build process cannot access the external network, but from the db pods we can access the external network successfully.

Version-Release number of selected component (if applicable):
openshift v3.7.0-0.184.0
kubernetes v1.7.6+a08f5eeb62
etcd 3.2.8
cri-o 1.0.1

How reproducible:
Always

Steps to Reproduce:
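The report goes straight to the resulting pod states; the two failing builds correspond to the standard OpenShift sample apps. A plausible setup, where the template and builder-image names are assumptions inferred from the pod names and images in the logs, not taken from the report:

# Assumed reproduction setup (names match the pods shown below):
oc new-app --template=django-psql-example
oc new-app registry.access.redhat.com/rhscl/ruby-24-rhel7~https://github.com/openshift/ruby-ex.git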
[root@qe-dma-master-etcd-1 cri-o.0]# oc get po
NAME                          READY     STATUS    RESTARTS   AGE
django-psql-example-1-build   0/1       Error     0          4m
postgresql-1-gnb2b            1/1       Running   0          4m
ruby-ex-1-build               0/1       Error     0          25s
[root@qe-dma-master-etcd-1 cri-o.0]# oc logs ruby-ex-1-build
---> Installing application source ...
---> Building your Ruby application from source ...
---> Running 'bundle install --deployment --without development:test' ...
Fetching source index from https://rubygems.org/

Retrying fetcher due to error (2/4): Bundler::HTTPError Could not fetch specs from https://rubygems.org/
Retrying fetcher due to error (3/4): Bundler::HTTPError Could not fetch specs from https://rubygems.org/
Retrying fetcher due to error (4/4): Bundler::HTTPError Could not fetch specs from https://rubygems.org/Could not fetch specs from https://rubygems.org/
error: build error: non-zero (13) exit code from registry.access.redhat.com/rhscl/ruby-24-rhel7@sha256:79c08145aa2d6a680a3dc8b0f026b48b77b970bfcfe184898b3cb6eaf65276d0
[root@qe-dma-master-etcd-1 cri-o.0]# oc logs django-psql-example-1-build
---> Installing application source ...
---> Installing dependencies ...
Collecting django<1.12,>=1.11 (from -r requirements.txt (line 1))
  Retrying (Retry(total=4, connect=None, read=None, redirect=None)) after connection broken by 'ProtocolError('Connection aborted.', gaierror(-2, 'Name or service not known'))': /simple/django/
  Retrying (Retry(total=3, connect=None, read=None, redirect=None)) after connection broken by 'ProtocolError('Connection aborted.', gaierror(-2, 'Name or service not known'))': /simple/django/
  Retrying (Retry(total=2, connect=None, read=None, redirect=None)) after connection broken by 'ProtocolError('Connection aborted.', gaierror(-2, 'Name or service not known'))': /simple/django/
  Retrying (Retry(total=1, connect=None, read=None, redirect=None)) after connection broken by 'ProtocolError('Connection aborted.', gaierror(-2, 'Name or service not known'))': /simple/django/
  Retrying (Retry(total=0, connect=None, read=None, redirect=None)) after connection broken by 'ProtocolError('Connection aborted.', gaierror(-2, 'Name or service not known'))': /simple/django/
  Retrying (Retry(total=4, connect=None, read=None, redirect=None)) after connection broken by 'ProtocolError('Connection aborted.', gaierror(-2, 'Name or service not known'))': /simple/django/
  Retrying (Retry(total=3, connect=None, read=None, redirect=None)) after connection broken by 'ProtocolError('Connection aborted.', gaierror(-2, 'Name or service not known'))': /simple/django/
  Retrying (Retry(total=2, connect=None, read=None, redirect=None)) after connection broken by 'ProtocolError('Connection aborted.', gaierror(-2, 'Name or service not known'))': /simple/django/
  Retrying (Retry(total=1, connect=None, read=None, redirect=None)) after connection broken by 'ProtocolError('Connection aborted.', gaierror(-2, 'Name or service not known'))': /simple/django/
  Retrying (Retry(total=0, connect=None, read=None, redirect=None)) after connection broken by 'ProtocolError('Connection aborted.', gaierror(-2, 'Name or service not known'))': /simple/django/
  Could not find a version that satisfies the requirement django<1.12,>=1.11 (from -r requirements.txt (line 1)) (from versions: )
No matching distribution found for django<1.12,>=1.11 (from -r requirements.txt (line 1))
error: build error: non-zero (13) exit code from registry.access.redhat.com/rhscl/python-35-rhel7@sha256:be9df8f0385cb443c5c8ceabfa8b98aa3f213fa60ef1cd40c3649f650693df2e


Actual results:
Build pods end in Error with the network failures shown above.

Expected results:
Builds complete successfully.

Additional info:
]# docker run -ti --entrypoint /bin/bash registry.access.redhat.com/openshift3/ose-sti-builder:latest
[root@3951d95c493e origin]# ping rubygems.org
PING rubygems.org (151.101.2.2) 56(84) bytes of data.
64 bytes from 151.101.2.2 (151.101.2.2): icmp_seq=1 ttl=55 time=10.2 ms
64 bytes from 151.101.2.2 (151.101.2.2): icmp_seq=2 ttl=55 time=9.97 ms
64 bytes from 151.101.2.2 (151.101.2.2): icmp_seq=3 ttl=55 time=9.97 ms
Comment 1 DeShuai Ma 2017-10-30 04:55:46 EDT
This blocks our build testing in the cri-o env.
Comment 2 Ben Parees 2017-10-30 10:18:38 EDT
The assemble container is supposed to be launched using the network namespace from the build pod. Mrunal, can you take a look?

DeShuai, if you run the build with loglevel 5, I think we'll dump more information about how the assemble container is being launched.
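
For reference, a standard way to do that is the BUILD_LOGLEVEL environment variable on the build config; the bc name below matches the failing build from the report:

# Raise the build log level, re-run the build, and follow the logs
# of the latest build for this build config:
oc set env bc/ruby-ex BUILD_LOGLEVEL=5
oc start-build ruby-ex
oc logs -f bc/ruby-ex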
Comment 3 DeShuai Ma 2017-10-30 12:07 EDT
Created attachment 1345529 [details]
build-logs.txt

Added build-loglevel=5 to get the detailed build logs.
Comment 4 Ben Parees 2017-10-30 12:15:57 EDT
Here's the network mode value used to launch the container in question:

NetworkMode: netns:/proc/50018/ns/net
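
That netns path can be probed directly from the node while the process is alive; a quick check with util-linux nsenter, with the PID taken from the value above:

# Run name resolution and interface checks inside the same network
# namespace the assemble container was given (only valid while
# PID 50018 exists):
nsenter --net=/proc/50018/ns/net -- getent hosts rubygems.org
nsenter --net=/proc/50018/ns/net -- ip addr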
Comment 5 Ben Parees 2017-10-30 12:20:02 EDT
This also looks like it could be a DNS issue in the container, so perhaps a problem with the resolv.conf from the cri-o pod that is being mounted into the assemble container.

DeShuai, can your other pods (not build pods) successfully perform DNS resolution?
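
One concrete way to answer that, using the database pod already running in the reproducer (assuming getent is available in the image):

# Inspect resolver config and test DNS from a non-build pod:
oc exec postgresql-1-gnb2b -- cat /etc/resolv.conf
oc exec postgresql-1-gnb2b -- getent hosts rubygems.org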
Comment 10 Daniel Walsh 2017-10-30 13:49:37 EDT
Yes that would work.
Comment 11 Ben Parees 2017-10-30 16:47:13 EDT
https://github.com/openshift/origin/pull/17094
Comment 14 Ben Parees 2017-11-30 18:05:42 EST
Ignore comment 13.

The relevant PR (comment 11) has merged.
Comment 15 XiuJuan Wang 2017-12-21 04:31:28 EST
Tested in OpenShift cluster v3.8.22.
Can't reproduce this bug in the cri-o env; s2i and docker builds work well.
Comment 17 XiuJuan Wang 2018-01-03 03:05:03 EST
s2i and docker builds work well in OpenShift cluster v3.9.0-0.9.0.
Moving this bug to VERIFIED.
Comment 20 errata-xmlrpc 2018-03-28 10:08:55 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489
