Bug 1747509

Summary: openshift-samples operator reports Degraded=True and Progressing=True when proxy is enabled.
Product: OpenShift Container Platform
Reporter: Daneyon Hansen <dhansen>
Component: openshift-apiserver
Assignee: Stefan Schimanski <sttts>
Status: CLOSED DUPLICATE
QA Contact: Xingxing Xia <xxia>
Severity: high
Docs Contact:
Priority: high
Version: 4.2.0
CC: adam.kaplan, aos-bugs, gblomqui, mfojtik, slaznick
Target Milestone: ---
Target Release: 4.2.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2019-09-04 14:08:31 UTC
Type: Bug

Description Daneyon Hansen 2019-08-30 16:16:21 UTC
Description of problem:
openshift-samples-operator is reporting degraded=true and progressing=true after a cluster installation with proxy enabled.

Version-Release number of selected component (if applicable):
4.2.0-0.okd-2019-08-29-220902

How reproducible:
Always

Steps to Reproduce:
1. Install a cluster on AWS with proxy enabled
2. oc get clusteroperator/openshift-samples

Actual results:
$ oc get clusteroperator/openshift-samples
NAME                VERSION                         AVAILABLE   PROGRESSING   DEGRADED   SINCE
openshift-samples   4.2.0-0.okd-2019-08-29-220902   True        True          True       16h

Expected results:
$ oc get clusteroperator/openshift-samples
NAME                VERSION                         AVAILABLE   PROGRESSING   DEGRADED   SINCE
openshift-samples   4.2.0-0.okd-2019-08-29-220902   True        False         False      16h
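
The comparison between the actual and expected results can be sketched in a few lines of Python. This is only an illustration: it assumes the whitespace-separated column layout shown above with every column populated, and the function name `unhealthy_conditions` is hypothetical, not part of any OpenShift tooling.

```python
def unhealthy_conditions(table_text):
    """Return {operator name: [columns whose value differs from the steady state]}.

    Assumes the layout printed by `oc get clusteroperator` in this report:
    NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE, with no empty cells.
    """
    # Steady state described under "Expected results".
    expected = {"AVAILABLE": "True", "PROGRESSING": "False", "DEGRADED": "False"}
    lines = [ln for ln in table_text.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    result = {}
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        bad = [col for col, want in expected.items() if row.get(col) != want]
        if bad:
            result[row["NAME"]] = bad
    return result


# Output copied from "Actual results" above.
actual = """\
NAME                VERSION                         AVAILABLE   PROGRESSING   DEGRADED   SINCE
openshift-samples   4.2.0-0.okd-2019-08-29-220902   True        True          True       16h
"""
print(unhealthy_conditions(actual))
```

For the output under "Actual results", the function flags the PROGRESSING and DEGRADED columns; for the expected output it returns an empty dict.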

Additional info:

The operator pod is making external HTTPS calls to https://registry.redhat.io/v2 and failing due to:

certificate signed by unknown authority<imagestream/fuse7-java-openshift><imagestream/jboss-webserver31-tomcat7-openshift>Internal


Since this is considered an external call, the operator should be consuming proxy env vars and mounting the trust bundle.
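
As a generic illustration of what "consuming proxy env vars and mounting the trust bundle" means for an HTTP client, here is a minimal Python sketch. It is not the operator's code (the operator is a separate Go project). The function name `proxy_aware_opener` is hypothetical, and the default bundle path is simply the one inspected later in this report.

```python
import os
import ssl
import urllib.request


def proxy_aware_opener(ca_bundle_path="/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem"):
    """Build a urllib opener that honours proxy env vars and a mounted CA bundle.

    Returns (opener, proxies) so the caller can inspect what was picked up.
    """
    # getproxies() reads HTTP_PROXY / HTTPS_PROXY / NO_PROXY (upper or lower case).
    proxies = urllib.request.getproxies()

    if os.path.exists(ca_bundle_path):
        # Trust only the CAs in the mounted bundle.
        context = ssl.create_default_context(cafile=ca_bundle_path)
    else:
        # Assumption for this sketch: fall back to the system default trust store.
        context = ssl.create_default_context()

    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler(proxies),
        urllib.request.HTTPSHandler(context=context),
    )
    return opener, proxies


opener, proxies = proxy_aware_opener()
# Only the schemes are printed here, so proxy credentials do not end up in logs.
print(sorted(proxies))
# A real request would then be: opener.open("https://registry.redhat.io/v2/")
```

If the proxy re-signs TLS traffic, the request only succeeds when the proxy's CA is present in the bundle that the process making the call actually loads.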

Comment 1 Daneyon Hansen 2019-08-30 19:59:40 UTC
I submitted https://github.com/openshift/cluster-samples-operator/pull/181 to fix this issue. With the PR I can see the proxy env vars and mounted trust bundle in the operator container, but I'm still hitting the x509 cert error:

$ oc exec -it cluster-samples-operator-7d88d5867f-g9ccr -n openshift-cluster-samples-operator env | grep PROX
HTTP_PROXY=http://jcallen:6cpbEH6uCepwEhNr2iB05ixP@52.73.102.120:3129
HTTPS_PROXY=http://jcallen:6cpbEH6uCepwEhNr2iB05ixP@52.73.102.120:3129
NO_PROXY=.cluster.local,.svc,.us-west-2.compute.internal,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.latest-proxy.devcluster.openshift.com,api.latest-proxy.devcluster.openshift.com,etcd-0.latest-proxy.devcluster.openshift.com,etcd-1.latest-proxy.devcluster.openshift.com,etcd-2.latest-proxy.devcluster.openshift.com,localhost

$ oc get cm/trusted-ca -n openshift-cluster-samples-operator -o yaml
apiVersion: v1
data:
  ca-bundle.crt: |
    -----BEGIN CERTIFICATE-----
<SNIP>

$ oc exec -it cluster-samples-operator-7d88d5867f-g9ccr -n openshift-cluster-samples-operator cat /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem
-----BEGIN CERTIFICATE-----
<SNIP>

$ oc logs cluster-samples-operator-7d88d5867f-g9ccr -n openshift-cluster-samples-operator
<SNIP>
time="2019-08-30T19:44:32Z" level=warning msg="Image import for imagestream redhat-sso70-openshift tag 1.3 generation 2 failed with detailed message Internal error occurred: Get https://registry.redhat.io/v2/: x509: certificate signed by unknown authority"
time="2019-08-30T19:44:32Z" level=warning msg="Image import for imagestream fuse7-java-openshift tag 1.0 generation 2 failed with detailed message Internal error occurred: Get https://registry.redhat.io/v2/: x509: certificate signed by unknown authority"
time="2019-08-30T19:44:32Z" level=warning msg="Image import for imagestream redhat-openjdk18-openshift tag 1.0 generation 2 failed with detailed message Internal error occurred: Get https://registry.redhat.io/v2/: x509: certificate signed by unknown authority"
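
To see at a glance which imagestreams are affected, the warning lines can be summarized with a short script. This is a sketch only: the regular expression is modelled on the three lines quoted above and is not guaranteed to match other log formats, and the name `failed_imports` is hypothetical.

```python
import re

# Pattern modelled on the warning lines quoted in this comment.
IMPORT_FAILURE = re.compile(
    r"Image import for imagestream (?P<stream>\S+) tag (?P<tag>\S+) "
    r"generation \d+ failed with detailed message (?P<detail>.*)"
)


def failed_imports(log_text):
    """Return a list of (imagestream, tag, is_x509_error) for each failed import."""
    failures = []
    for line in log_text.splitlines():
        match = IMPORT_FAILURE.search(line)
        if match:
            is_x509 = "x509: certificate signed by unknown authority" in match.group("detail")
            failures.append((match.group("stream"), match.group("tag"), is_x509))
    return failures


# Two of the log lines from above, copied verbatim.
sample_log = "\n".join([
    'time="2019-08-30T19:44:32Z" level=warning msg="Image import for imagestream redhat-sso70-openshift tag 1.3 generation 2 failed with detailed message Internal error occurred: Get https://registry.redhat.io/v2/: x509: certificate signed by unknown authority"',
    'time="2019-08-30T19:44:32Z" level=warning msg="Image import for imagestream fuse7-java-openshift tag 1.0 generation 2 failed with detailed message Internal error occurred: Get https://registry.redhat.io/v2/: x509: certificate signed by unknown authority"',
])
for stream, tag, is_x509 in failed_imports(sample_log):
    print(stream, tag, "x509" if is_x509 else "other")
```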

Comment 2 Adam Kaplan 2019-08-30 20:33:46 UTC
@Daneyon The imagestream import is run by the openshift-apiserver, not by the samples operator. The log is echoing the error message from the imagestream. The operator today reports itself degraded if imagestreams fail to import.

Comment 3 Standa Laznicka 2019-09-03 13:12:31 UTC
Might be fixed by https://github.com/openshift/cluster-openshift-apiserver-operator/pull/231

Comment 4 Michal Fojtik 2019-09-04 09:02:37 UTC
*** Bug 1748633 has been marked as a duplicate of this bug. ***

Comment 5 Greg Blomquist 2019-09-04 14:08:31 UTC

*** This bug has been marked as a duplicate of bug 1747260 ***