1743125 – Deploy pods failed to pull image in global proxy env

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Containers
Sub Component:
Version:	4.2.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	high
Target Milestone:	---
Target Release:	4.2.0
Assignee:	Mrunal Patel
QA Contact:	weiwei jiang
Docs Contact:
URL:
Whiteboard:
Depends On:	1743507
Blocks:

Reported:	2019-08-19 07:10 UTC by wewang
Modified:	2019-10-16 06:36 UTC (History)
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-10-16 06:36:23 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2019:2922	0	None	None	None	2019-10-16 06:36:36 UTC

Comment 4 Gabe Montero 2019-08-19 14:13:46 UTC

For completeness, the proxy settings on the build pod look correct ... they reflect the settings on the proxy config status subresource.  Your wording in the description seemed backwards @wenwang.

the spec no proxy was "test.no-proxy.com"

the status no proxy was "10.128.0.0/14,127.0.0.1,172.30.0.0/16,api-int.qe-xiuwang-proxy.qe.devcluster.openshift.com,api.qe-xiuwang-proxy.qe.devcluster.openshift.com,etcd-0.qe-xiuwang-proxy.qe.devcluster.openshift.com,localhost,test.no-proxy.com"  .... in other words, the proxy controller added more items on top of what you specified in spec

and the build pod has a noproxy of "10.128.0.0/14,127.0.0.1,172.30.0.0/16,api-int.qe-xiuwang-proxy.qe.devcluster.openshift.com,api.qe-xiuwang-proxy.qe.devcluster.openshift.com,etcd-0.qe-xiuwang-proxy.qe.devcluster.openshift.com,localhost,test.no-proxy.com"
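
The relationship described above (every entry from spec should reappear in status and on the build pod, alongside whatever the proxy controller adds) can be checked mechanically. The following is a minimal shell sketch and not part of the original report: the function name entries_missing_from is hypothetical, and both lists are passed in as plain strings rather than read from the cluster.

```shell
# Hypothetical helper: print each entry of the first comma-separated list
# that does not appear in the second comma-separated list.
entries_missing_from() {
  emf_rest=$1,            # trailing comma so the last entry is split off too
  emf_haystack=",$2,"     # surrounding commas allow whole-entry matching
  while [ -n "$emf_rest" ]; do
    emf_entry=${emf_rest%%,*}   # text before the first comma
    emf_rest=${emf_rest#*,}     # text after the first comma
    if [ -z "$emf_entry" ]; then
      continue
    fi
    case $emf_haystack in
      *",$emf_entry,"*) ;;               # present: nothing to report
      *) printf '%s\n' "$emf_entry" ;;   # absent: print it
    esac
  done
  return 0
}
```

Called with the spec value "test.no-proxy.com" as the first argument and the status value quoted above as the second, it prints nothing, which matches the observation that the controller only added entries.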

Comment 6 wewang 2019-08-20 03:17:55 UTC

(In reply to Gabe Montero from comment #4)
> For completeness, the proxy settings on the build pod look correct ... they
> reflect the settings on the proxy config status subresource.  Your wording
> in the description seemed backwards @wenwang.
> 
> the spec no proxy was "test.no-proxy.com"
> 
> the status no proxy was
> "10.128.0.0/14,127.0.0.1,172.30.0.0/16,api-int.qe-xiuwang-proxy.qe.devcluster.openshift.com,api.qe-xiuwang-proxy.qe.devcluster.openshift.com,etcd-0.qe-xiuwang-proxy.qe.devcluster.openshift.com,localhost,test.no-proxy.com"
> .... in other words, the proxy controller added more items on top of
> what you specified in spec
> 
> and the build pod has a noproxy of
> "10.128.0.0/14,127.0.0.1,172.30.0.0/16,api-int.qe-xiuwang-proxy.qe.devcluster.openshift.com,api.qe-xiuwang-proxy.qe.devcluster.openshift.com,etcd-0.qe-xiuwang-proxy.qe.devcluster.openshift.com,localhost,test.no-proxy.com"

Yes, Gabe, you are right. I take back my wording.

Comment 9 Gabe Montero 2019-08-21 13:31:50 UTC

Moving to post per discussion with Adam

This bug does not need to be worked from a build perspective.

But we want QA to complete the Build-related tests they were performing here once https://bugzilla.redhat.com/show_bug.cgi?id=1743507 is done, so we are not closing this as a dupe.

Note: as of this comment, the above bug is in modified and its PR https://github.com/openshift/cluster-network-operator/pull/295 is in the merge queue.

Comment 10 Gabe Montero 2019-08-21 15:14:47 UTC

PR https://github.com/openshift/cluster-network-operator/pull/295 has merged and https://bugzilla.redhat.com/show_bug.cgi?id=1743507 is in modified

moving this one to modified

Comment 11 XiuJuan Wang 2019-08-22 06:03:54 UTC

@Gabe,
https://bugzilla.redhat.com/show_bug.cgi?id=1743507 has been fixed, and the builds now complete.
But the deploy pods are still broken:

$ oc get pods 
NAME                        READY   STATUS         RESTARTS   AGE
jenkins-1-deploy            0/1     Error          0          12m
jenkins-2-deploy            1/1     Running        0          11s
jenkins-2-ft279             0/1     ErrImagePull   0          8s
ruby-hello-world-1-build    0/1     Completed      0          13m
ruby-hello-world-1-deploy   0/1     Error          0          11m
ruby-hello-world-2-deploy   1/1     Running        0          5s
ruby-hello-world-2-zd8hz    0/1     ErrImagePull   0          3s

$ oc describe pods ruby-hello-world-2-zd8hz
  Warning  Failed     6s (x2 over 20s)  kubelet, compute-0  Failed to pull image "image-registry.openshift-image-registry.svc:5000/xiuwang/ruby-hello-world@sha256:9c86cf5a1ac111a687539c03c2b9afa51ec8af764b5a15746291dec455fda4e8": rpc error: code = Unknown desc = pinging docker registry returned: Get https://image-registry.openshift-image-registry.svc:5000/v2/: Service Unavailable
  Warning  Failed     6s (x2 over 20s)  kubelet, compute-0  Error: ErrImagePull

There are no proxy settings in the deploymentconfig (both commands return nothing):
$ oc get deploymentconfig ruby-hello-world -o yaml | grep -i proxy
$ oc get dc jenkins -o yaml | grep -i proxy

I could pull the image image-registry.openshift-image-registry.svc:5000/xiuwang/ruby-hello-world@sha256:9c86cf5a1ac111a687539c03c2b9afa51ec8af764b5a15746291dec455fda4e8 from the node.

After setting the proxy in the deploymentconfig manually, the pod still can't pull images and fails with the same error.

Comment 13 Gabe Montero 2019-08-22 14:12:19 UTC

OK I'll be taking a detailed look today.  Once I have specifics we'll see about re-routing to new components, citing as works for me, etc. as needed

Comment 14 Gabe Montero 2019-08-22 17:37:53 UTC

OK investigation finished, including discussion with Daneyon Hansen, Ben Parees, and Antonio Murdaca.

First XiuJuan, the inner workings are different for deployments/deploymentconfigs than they are for Builds.

The kubelet gets the proxy settings from the MCO, which in turn gets it from the node/install.

According to Antonio, the proxy cfg information is stored for the kubelet at:

/etc/systemd/system/kubelet.service.d/10-default-env.conf

So that is where to look, rather than at env vars on the deploymentconfig/replicationcontroller/pod.

Antonio asks that you provide the contents of this file from the environment where builds
properly leverage the proxy config but DCs etc. do not.
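
To compare that file against the cluster proxy config without reading it by eye, the NO_PROXY value can be pulled out with sed. This is a hedged sketch, not something from the thread: the function name no_proxy_from_dropin is hypothetical, and it assumes the value sits on a single Environment=NO_PROXY=... line, optionally wrapped in double quotes.

```shell
# Hypothetical helper: print the NO_PROXY value found in a systemd drop-in
# file whose path is given as $1. Prints nothing if no such line exists.
no_proxy_from_dropin() {
  # Match 'Environment=NO_PROXY=' followed by an optionally quoted value
  # and print only the value itself.
  sed -n 's/^Environment=NO_PROXY="\{0,1\}\([^"]*\)"\{0,1\}$/\1/p' "$1"
}
```

On a node this would be run as: no_proxy_from_dropin /etc/systemd/system/kubelet.service.d/10-default-env.conf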

Comment 15 Gabe Montero 2019-08-22 17:40:52 UTC

Reassigning to container engine team per exchange in https://coreos.slack.com/archives/CLJSH16J0/p1566494588075000

Comment 16 Gabe Montero 2019-08-22 17:43:04 UTC

Based on what XiuJuan provides wrt /etc/systemd/system/kubelet.service.d/10-default-env.conf we can see if this is a setup problem, code problem (and where that might be), etc.

Comment 18 Daneyon Hansen 2019-08-25 19:00:01 UTC

".svc" and ".cluster.local" are missing from NO_PROXY. Can you verify your installer version includes https://github.com/openshift/installer/commit/045402165b7aa5cb95ab050d4ca0827016e05dd1 and https://github.com/openshift/cluster-network-operator/commit/6739a0af92cb6593c9910fca637673c42da0dad2?

For example:
apiVersion: config.openshift.io/v1
kind: Proxy
metadata:
  creationTimestamp: null
  name: cluster
spec:
  httpProxy: http://admin:admin@35.196.128.173:3128
  httpsProxy: https://admin:admin@35.231.5.161:3128
  trustedCA:
    name: user-ca-bundle
status:
  httpProxy: http://admin:admin@35.196.128.173:3128
  httpsProxy: https://admin:admin@35.231.5.161:3128
  noProxy: .cluster.local,.svc,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.ericproxy.devcluster.openshift.com,api.ericproxy.devcluster.openshift.com,etcd-0.ericproxy.devcluster.openshift.com,etcd-1.ericproxy.devcluster.openshift.com,etcd-2.ericproxy.devcluster.openshift.com,localhost
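
The failing pull in this bug targets image-registry.openshift-image-registry.svc:5000, so a NO_PROXY list without ".svc" presumably leaves that request going through the proxy. The shell sketch below illustrates the idea under simplifying assumptions: it handles only exact host matches and leading-dot suffix entries, and ignores CIDR ranges and ports, which real clients treat in their own ways. The function name host_bypasses_proxy is hypothetical.

```shell
# Hypothetical helper: return 0 if host $1 is covered by the comma-separated
# NO_PROXY list in $2, and 1 otherwise.
# Simplified rules: an entry starting with '.' matches any host ending in it;
# any other entry must equal the host exactly. CIDR entries never match here.
host_bypasses_proxy() {
  hbp_host=$1
  hbp_rest=$2,            # trailing comma so the last entry is split off too
  while [ -n "$hbp_rest" ]; do
    hbp_entry=${hbp_rest%%,*}   # text before the first comma
    hbp_rest=${hbp_rest#*,}     # text after the first comma
    case $hbp_entry in
      '') continue ;;
      .*)
        # Leading-dot entry: suffix match against the host name.
        case $hbp_host in
          *"$hbp_entry") return 0 ;;
        esac
        ;;
      *)
        if [ "$hbp_host" = "$hbp_entry" ]; then
          return 0
        fi
        ;;
    esac
  done
  return 1
}
```

With the registry host name as the first argument, the function returns 1 for a list that lacks ".svc" and 0 once ".svc" is present.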

Comment 19 XiuJuan Wang 2019-08-26 03:16:14 UTC

Tested with 4.2.0-0.nightly-2019-08-25-233755, which includes https://github.com/openshift/installer/pull/2257 and https://github.com/openshift/cluster-network-operator/pull/295.

The deployment and deploymentconfig pods can now pull images from the internal registry and run well.
QE can mark this as verified once it moves to on_qa status.

$ oc debug  node/compute-0 
Starting pod/compute-0-debug ...
To use host binaries, run `chroot /host`
Pod IP: 139.178.76.8
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4#  cat /etc/systemd/system/kubelet.service.d/10-default-env.conf 
[Service]
Environment=HTTP_PROXY="http://proxy-user1:***********@139.178.76.57:3128"
Environment=HTTPS_PROXY="http://proxy-user1:***********@139.178.76.57:3128"
Environment=NO_PROXY=".cluster.local,.svc,10.128.0.0/14,127.0.0.1,172.30.0.0/16,api-int.qe-xiuwang-proxy-826.qe.devcluster.openshift.com,api.qe-xiuwang-proxy-826.qe.devcluster.openshift.com,etcd-0.qe-xiuwang-proxy-826.qe.devcluster.openshift.com,localhost,test.no-proxy.com"


Thanks

Comment 21 Wenjing Zheng 2019-08-26 06:54:49 UTC

Removing the TestBlocker keyword. QE can mark this as verified once it moves to on_qa status, thanks!

Comment 22 errata-xmlrpc 2019-10-16 06:36:23 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922
