Bug 1917114

Summary: Upgrade from 4.5.9 to 4.7 fails as authentication operator is Degraded due to '"ProxyConfigController" controller failed to sync "key"' error
Product: OpenShift Container Platform Reporter: Robert Heinzmann <rheinzma>
Component: apiserver-auth Assignee: Standa Laznicka <slaznick>
Status: CLOSED ERRATA QA Contact: pmali
Severity: high Docs Contact:
Priority: high    
Version: 4.7 CC: aos-bugs, eparis, jokerman, mfojtik, satwsing, trees
Target Milestone: ---   
Target Release: 4.7.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: A naive implementation of NO_PROXY environment variable content matching. Consequence: The controller that checks whether the proxy environment variables are properly configured might report false positives. Fix: Implement NO_PROXY environment variable hostname matching consistent with upstream Golang. Result: No more false positives should be reported when the oauth-server's endpoint falls under a domain configured in the NO_PROXY environment variable.
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-02-24 15:53:53 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Robert Heinzmann 2021-01-17 10:28:26 UTC
Description of problem:

When updating from 4.6.9 to 4.7.0-fc.2, the update gets stuck because the authentication operator is degraded. The error message indicates that the ProxyConfigController of the authentication operator is blocking the update because of a check on the NO_PROXY record for the oauth endpoint.

Version-Release number of selected component (if applicable):

UPDATE
FROM OpenShift 4.6.9
TO OpenShift 4.7.0-fc.2

How reproducible:

Currently my no_proxy settings are as follows; they worked for a fresh 4.7.0-fc.2 installation as well as for a 4.6.9 installation.

~~~
proxy:
  httpProxy: http://192.168.100.73:3128
  httpsProxy: http://192.168.100.73:3128
  noProxy: "x.x.x.x,apps.ocp.ocp-test1.osp16amd.x.x.x.x.nip.io"
~~~

With x.x.x.x being the IP of the server and ocp.ocp-test1.osp16amd.x.x.x.x.nip.io being the cluster base domain.


Steps to Reproduce:
1. Deploy 4.6.9 with install-config.yaml from above
2. Trigger update (disconnected)
3. Verify the auth operator

Actual results:

Operator degraded:

[stack@osp16amd ocp-test1]$ oc get clusteroperator
NAME                                       VERSION      AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.7.0-fc.2   True        False         True       157m
baremetal                                  4.7.0-fc.2   True        False         False      10h
cloud-credential                           4.7.0-fc.2   True        False         False      11h
cluster-autoscaler                         4.7.0-fc.2   True        False         False      11h
config-operator                            4.7.0-fc.2   True        False         False      11h
console                                    4.7.0-fc.2   True        False         False      158m
csi-snapshot-controller                    4.7.0-fc.2   True        False         False      10h
dns                                        4.6.9        True        False         False      11h
etcd                                       4.7.0-fc.2   True        False         False      11h
image-registry                             4.7.0-fc.2   True        False         False      11h
ingress                                    4.7.0-fc.2   True        False         False      11h
insights                                   4.7.0-fc.2   True        False         False      11h
kube-apiserver                             4.7.0-fc.2   True        False         False      11h
kube-controller-manager                    4.7.0-fc.2   True        False         False      11h
kube-scheduler                             4.7.0-fc.2   True        False         False      11h
kube-storage-version-migrator              4.7.0-fc.2   True        False         False      11h
machine-api                                4.7.0-fc.2   True        False         False      11h
machine-approver                           4.7.0-fc.2   True        False         False      11h
machine-config                             4.6.9        True        False         False      11h
marketplace                                4.7.0-fc.2   True        False         False      10h
monitoring                                 4.7.0-fc.2   True        False         False      3h42m
network                                    4.7.0-fc.2   True        False         False      10h
node-tuning                                4.7.0-fc.2   True        False         False      10h
openshift-apiserver                        4.7.0-fc.2   True        False         False      158m
openshift-controller-manager               4.7.0-fc.2   True        False         False      11h
openshift-samples                          4.7.0-fc.2   True        False         False      10h
operator-lifecycle-manager                 4.7.0-fc.2   True        False         False      11h
operator-lifecycle-manager-catalog         4.7.0-fc.2   True        False         False      11h
operator-lifecycle-manager-packageserver   4.7.0-fc.2   True        False         False      158m
service-ca                                 4.7.0-fc.2   True        False         False      11h
storage                                    4.7.0-fc.2   True        False         False      10h


[stack@osp16amd ocp-test1]$ oc describe clusteroperator authentication
Name:         authentication
Namespace:    
Labels:       <none>
Annotations:  exclude.release.openshift.io/internal-openshift-hosted: true
              include.release.openshift.io/self-managed-high-availability: true
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2021-01-16T21:01:08Z
  Generation:          1
  Managed Fields:
    API Version:  config.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:exclude.release.openshift.io/internal-openshift-hosted:
          f:include.release.openshift.io/self-managed-high-availability:
      f:spec:
      f:status:
        .:
        f:extension:
    Manager:      cluster-version-operator
    Operation:    Update
    Time:         2021-01-16T21:01:08Z
    API Version:  config.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        f:conditions:
        f:relatedObjects:
        f:versions:
    Manager:         authentication-operator
    Operation:       Update
    Time:            2021-01-16T21:56:46Z
  Resource Version:  200503
  Self Link:         /apis/config.openshift.io/v1/clusteroperators/authentication
  UID:               add58efa-1eb1-45f3-948e-d4a5532ac447
Spec:
Status:
  Conditions:
    Last Transition Time:  2021-01-16T22:12:25Z
    Message:               ProxyConfigControllerDegraded: failed to reach endpoint("https://oauth-openshift.apps.ocp.ocp-test1.osp16amd.x.x.x.x.nip.io/healthz") missing in NO_PROXY(".cluster.local,.svc,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,192.168.150.0/24,x.x.x.x,api-int.ocp.ocp-test1.osp16amd.x.x.x.x.nip.io,apps.ocp.ocp-test1.osp16amd.x.x.x.x.nip.io,etcd-0.ocp.ocp-test1.osp16amd.x.x.x.x.nip.io,etcd-1.ocp.ocp-test1.osp16amd.x.x.x.x.nip.io,etcd-2.ocp.ocp-test1.osp16amd.x.x.x.x.nip.io,localhost") with error: Get "https://oauth-openshift.apps.ocp.ocp-test1.osp16amd.x.x.x.x.nip.io/healthz": EOF
    Reason:                ProxyConfigController_SyncError
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2021-01-16T22:11:32Z
    Message:               All is well
    Reason:                AsExpected
    Status:                False
    Type:                  Progressing
    Last Transition Time:  2021-01-17T06:06:55Z
    Message:               OAuthServerDeploymentAvailable: availableReplicas==2
    Reason:                AsExpected
    Status:                True
    Type:                  Available
    Last Transition Time:  2021-01-16T21:04:33Z
    Message:               All is well
    Reason:                AsExpected
    Status:                True
    Type:                  Upgradeable
  Extension:               <nil>
  Related Objects:
    Group:      operator.openshift.io
    Name:       cluster
    Resource:   authentications
    Group:      config.openshift.io
    Name:       cluster
    Resource:   authentications
    Group:      config.openshift.io
    Name:       cluster
    Resource:   infrastructures
    Group:      config.openshift.io
    Name:       cluster
    Resource:   oauths
    Group:      route.openshift.io
    Name:       oauth-openshift
    Namespace:  openshift-authentication
    Resource:   routes
    Group:      
    Name:       oauth-openshift
    Namespace:  openshift-authentication
    Resource:   services
    Group:      
    Name:       openshift-config
    Resource:   namespaces
    Group:      
    Name:       openshift-config-managed
    Resource:   namespaces
    Group:      
    Name:       openshift-authentication
    Resource:   namespaces
    Group:      
    Name:       openshift-authentication-operator
    Resource:   namespaces
    Group:      
    Name:       openshift-ingress
    Resource:   namespaces
    Group:      
    Name:       openshift-oauth-apiserver
    Resource:   namespaces
  Versions:
    Name:     oauth-apiserver
    Version:  4.7.0-fc.2
    Name:     oauth-openshift
    Version:  4.7.0-fc.2_openshift
    Name:     operator
    Version:  4.7.0-fc.2
Events:       <none>

Error log indicates:

E0117 10:05:13.066689       1 base_controller.go:250] "ProxyConfigController" controller failed to sync "key", err: failed to reach endpoint("https://oauth-openshift.apps.ocp.ocp-test1.osp16amd.x.x.x.x.nip.io/healthz") missing in NO_PROXY(".cluster.local,.svc,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,192.168.150.0/24,x.x.x.x,api-int.ocp.ocp-test1.osp16amd.x.x.x.x.nip.io,apps.ocp.ocp-test1.osp16amd.x.x.x.x.nip.io,etcd-0.ocp.ocp-test1.osp16amd.x.x.x.x.nip.io,etcd-1.ocp.ocp-test1.osp16amd.x.x.x.x.nip.io,etcd-2.ocp.ocp-test1.osp16amd.x.x.x.x.nip.io,localhost") with error: Get "https://oauth-openshift.apps.ocp.ocp-test1.osp16amd.x.x.x.x.nip.io/healthz": EOF

However, curl from inside the operator pod works:

[stack@osp16amd ocp-test1]$ oc exec authentication-operator-95cfc9cbd-bmw8m -- curl -s -q --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt https://oauth-openshift.apps.ocp.ocp-test1.osp16amd.x.x.x.x.nip.io/healthz
ok

Expected results:

Update succeeds.

Additional info:

After adding the oauth record explicitly, the update continues:

FROM: 

~~~
spec:
  httpProxy: http://192.168.100.73:3128
  httpsProxy: http://192.168.100.73:3128
  noProxy: x.x.x.x,apps.ocp.ocp-test1.osp16amd.x.x.x.x.nip.io
  trustedCA:
    name: user-ca-bundle
~~~

TO: 

~~~
spec:
  httpProxy: http://192.168.100.73:3128
  httpsProxy: http://192.168.100.73:3128
  noProxy: x.x.x.x,apps.ocp.ocp-test1.osp16amd.x.x.x.x.nip.io,oauth-openshift.apps.ocp.ocp-test1.osp16amd.x.x.x.x.nip.io
  trustedCA:
    name: user-ca-bundle
~~~

However, since the original record (apps.ocp.ocp-test1.osp16amd.x.x.x.x.nip.io) already covers the required record (oauth-openshift.apps.ocp.ocp-test1.osp16amd.x.x.x.x.nip.io), the ProxyConfigController introduced here https://github.com/openshift/cluster-authentication-operator/blame/f363a21c37613ec5d47fc7dea923afa074c1a617/pkg/operator/starter.go#L394 should probably also accept "subdomain wildcards", and not only explicit records for the oauth endpoint.
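
To illustrate the difference between explicit-record and subdomain matching, here is a minimal Go sketch. It is not the operator's code. `exactMatch` is a hypothetical stand-in for a check that only accepts explicit records, and `matchesNoProxy` is a simplified approximation of how upstream Go treats NO_PROXY entries (ports and the special handling of localhost are left out). The host names and the NO_PROXY value are made-up placeholders.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// exactMatch is a hypothetical "explicit records only" check: an entry
// counts only if it is identical to the host.
func exactMatch(host, noProxy string) bool {
	for _, entry := range strings.Split(noProxy, ",") {
		if strings.TrimSpace(entry) == host {
			return true
		}
	}
	return false
}

// matchesNoProxy reports whether host is covered by a comma-separated
// NO_PROXY value. Simplified sketch: ports and loopback handling are ignored.
func matchesNoProxy(host, noProxy string) bool {
	host = strings.ToLower(strings.TrimSuffix(host, "."))
	ip := net.ParseIP(host)

	for _, entry := range strings.Split(noProxy, ",") {
		entry = strings.ToLower(strings.TrimSpace(entry))
		if entry == "" {
			continue
		}
		if entry == "*" {
			return true // wildcard: never use a proxy
		}
		// CIDR entries such as 10.128.0.0/14 only apply to IP hosts.
		if _, cidr, err := net.ParseCIDR(entry); err == nil {
			if ip != nil && cidr.Contains(ip) {
				return true
			}
			continue
		}
		// Literal IP entries must be equal to the host IP.
		if entryIP := net.ParseIP(entry); entryIP != nil {
			if ip != nil && entryIP.Equal(ip) {
				return true
			}
			continue
		}
		// "*.example.com" is treated like ".example.com".
		if strings.HasPrefix(entry, "*.") {
			entry = entry[1:]
		}
		// A leading dot covers subdomains only.
		if strings.HasPrefix(entry, ".") {
			if strings.HasSuffix(host, entry) {
				return true
			}
			continue
		}
		// A plain domain covers the host itself and all of its subdomains.
		if host == entry || strings.HasSuffix(host, "."+entry) {
			return true
		}
	}
	return false
}

func main() {
	// Placeholder values, not taken from the cluster in this report.
	noProxy := ".cluster.local,.svc,10.128.0.0/14,apps.ocp.example.com"
	host := "oauth-openshift.apps.ocp.example.com"

	fmt.Println("explicit record match:", exactMatch(host, noProxy))
	fmt.Println("subdomain-aware match:", matchesNoProxy(host, noProxy))
}
```

With these placeholder values the explicit-record check reports false while the subdomain-aware check reports true, which is the behaviour requested above: an entry for the apps domain should be enough to cover the oauth-openshift route host.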

Comment 3 Prashanth Sundararaman 2021-02-03 15:24:34 UTC
*** Bug 1921018 has been marked as a duplicate of this bug. ***

Comment 7 errata-xmlrpc 2021-02-24 15:53:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633