Bug 1935528 - [AWS][Proxy] ingress reports degrade with CanaryChecksSucceeding=False in the cluster with proxy setting
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Routing
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.8.0
Assignee: Stephen Greene
QA Contact: Hongan Li
URL:
Whiteboard:
Depends On:
Blocks: 1936093
Reported: 2021-03-05 03:30 UTC by Hongan Li
Modified: 2021-07-27 22:51 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The fix for Bug 1932401 overrides the default Go HTTP client transport.
Consequence: Cluster-wide proxy settings are not used by the canary client in the ingress operator pod, so canary checks may fail on a cluster with a cluster-wide egress proxy.
Fix: Explicitly set the proxy settings in the canary client's HTTP transport.
Result: Canary checks work on clusters with a cluster-wide proxy.
Clone Of:
Environment:
Last Closed: 2021-07-27 22:51:10 UTC
Target Upstream Version:
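
The cause and fix summarized in the Doc Text can be illustrated with a minimal Go sketch. This is not the operator's actual code: `newCanaryClient`, `usesProxyFunc`, and the 10-second timeout are hypothetical, chosen only for illustration. It shows that a hand-built `http.Transport` leaves its `Proxy` field nil and therefore dials directly, whereas `http.DefaultTransport` uses `http.ProxyFromEnvironment`; setting `Proxy` explicitly restores the environment-based behavior.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// newCanaryClient is a hypothetical helper, not the operator's real function.
// When useProxy is false the transport's Proxy field stays nil, so requests
// are dialed directly even if HTTP_PROXY/HTTPS_PROXY are set in the pod.
func newCanaryClient(useProxy bool) *http.Client {
	tr := &http.Transport{}
	if useProxy {
		// Same proxy function that http.DefaultTransport uses.
		tr.Proxy = http.ProxyFromEnvironment
	}
	// The timeout value is an arbitrary assumption for this sketch.
	return &http.Client{Transport: tr, Timeout: 10 * time.Second}
}

// usesProxyFunc reports whether the client's transport has a proxy function set.
func usesProxyFunc(c *http.Client) bool {
	tr, ok := c.Transport.(*http.Transport)
	return ok && tr.Proxy != nil
}

func main() {
	fmt.Println("overridden transport uses proxy func:", usesProxyFunc(newCanaryClient(false)))
	fmt.Println("explicit proxy setting uses proxy func:", usesProxyFunc(newCanaryClient(true)))
}
```

Replacing the default transport is a common way to lose proxy support by accident, because the zero value of `http.Transport` does not consult the environment.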


Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-ingress-operator pull 565 0 None open Bug 1935528: Canary: Use cluster-wide proxy for canary client 2021-03-05 15:06:33 UTC
Red Hat Product Errata RHSA-2021:2438 0 None None None 2021-07-27 22:51:41 UTC

Description Hongan Li 2021-03-05 03:30:27 UTC
Description of problem:
The ingress cluster operator reports Degraded with CanaryChecksSucceeding=False on a cluster with a proxy configured.

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2021-03-04-093123

profile: aos-4_7/ipi-on-aws/versioned-installer-customer_vpc-http_proxy-multiblockdevices-fips-ovn-ipsec-c

How reproducible:
always

Steps to Reproduce:
1. install cluster with the profile: aos-4_7/ipi-on-aws/versioned-installer-customer_vpc-http_proxy-multiblockdevices-fips-ovn-ipsec-ci


Actual results:
co/ingress reports Degraded after installation.

$ oc get co/ingress
NAME      VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
ingress   4.7.0-0.nightly-2021-03-04-093123   True        False         True       91m

$ oc get co/ingress -oyaml
<---snip--->
spec: {}
status:
  conditions:
  - lastTransitionTime: "2021-03-05T01:48:49Z"
    message: desired and current number of IngressControllers are equal
    reason: AsExpected
    status: "True"
    type: Available
  - lastTransitionTime: "2021-03-05T01:48:49Z"
    message: desired and current number of IngressControllers are equal
    reason: AsExpected
    status: "False"
    type: Progressing
  - lastTransitionTime: "2021-03-05T01:50:07Z"
    message: 'Some ingresscontrollers are degraded: ingresscontroller "default" is
      degraded: DegradedConditions: One or more other status conditions indicate a
      degraded state: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures:
      Canary route checks for the default ingress controller are failing)'
    reason: IngressControllersDegraded
    status: "True"
    type: Degraded

$ oc -n openshift-ingress-operator rsh ingress-operator-6866b54f8b-npcx2
Defaulting container name to ingress-operator.
Use 'oc describe pod/ingress-operator-6866b54f8b-npcx2 -n openshift-ingress-operator' to see all of the containers in this pod.
sh-4.4$ 
sh-4.4$ env | grep -i proxy
HTTP_PROXY=http://proxy-user1:JYgU8qRZV4DY4PXJbxJK@ec2-18-188-209-51.us-east-2.compute.amazonaws.com:3128
NO_PROXY=.cluster.local,.svc,.us-east-2.compute.internal,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.zhsungre1.qe.devcluster.openshift.com,localhost,test.no-proxy.com
HTTPS_PROXY=http://proxy-user1:JYgU8qRZV4DY4PXJbxJK@ec2-18-188-209-51.us-east-2.compute.amazonaws.com:3128
sh-4.4$ 
sh-4.4$ curl http://canary-openshift-ingress-canary.apps.zhsungre1.qe.devcluster.openshift.com/ -v
* Uses proxy env variable NO_PROXY == '.cluster.local,.svc,.us-east-2.compute.internal,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.zhsungre1.qe.devcluster.openshift.com,localhost,test.no-proxy.com'
*   Trying 3.18.114.75...
* TCP_NODELAY set

------->>>>>>> (eventually times out)



sh-4.4$ curl https://canary-openshift-ingress-canary.apps.zhsungre1.qe.devcluster.openshift.com/ -v
* Uses proxy env variable NO_PROXY == '.cluster.local,.svc,.us-east-2.compute.internal,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.zhsungre1.qe.devcluster.openshift.com,localhost,test.no-proxy.com'
* Uses proxy env variable HTTPS_PROXY == 'http://proxy-user1:JYgU8qRZV4DY4PXJbxJK@ec2-18-188-209-51.us-east-2.compute.amazonaws.com:3128'
*   Trying 10.0.3.228...
* TCP_NODELAY set
* Connected to ec2-18-188-209-51.us-east-2.compute.amazonaws.com (10.0.3.228) port 3128 (#0)



Expected results:
co/ingress should not report Degraded on such a cluster.

Additional info:
It seems that "curl http://canary..." did not use the proxy and timed out, whereas "curl https://canary..." used the proxy and was able to connect.
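
One possible reading of this observation: curl reads only the lowercase `http_proxy` variable for plain-HTTP URLs, so the uppercase `HTTP_PROXY` set in the pod is ignored for `http://` while `HTTPS_PROXY` is honored for `https://`. Go's `http.ProxyFromEnvironment`, by contrast, does read uppercase `HTTP_PROXY` (outside a CGI environment). The sketch below prints which proxy a Go client would select per URL. The host names `proxy.example.com` and `apps.example.com`, the shortened `NO_PROXY` list, and the helper names are placeholders, not values from this cluster.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
)

// setExampleProxyEnv sets placeholder proxy variables. It must run before the
// first call to http.ProxyFromEnvironment, which caches the environment.
func setExampleProxyEnv() {
	os.Setenv("HTTP_PROXY", "http://proxy.example.com:3128")
	os.Setenv("HTTPS_PROXY", "http://proxy.example.com:3128")
	os.Setenv("NO_PROXY", ".cluster.local,.svc,10.0.0.0/16,localhost")
}

// proxyFor returns the proxy host:port Go would pick for rawURL,
// or "direct" when the request would bypass the proxy.
func proxyFor(rawURL string) (string, error) {
	req, err := http.NewRequest(http.MethodGet, rawURL, nil)
	if err != nil {
		return "", err
	}
	u, err := http.ProxyFromEnvironment(req)
	if err != nil {
		return "", err
	}
	if u == nil {
		return "direct", nil
	}
	return u.Host, nil
}

func main() {
	setExampleProxyEnv()
	targets := []string{
		"http://canary-openshift-ingress-canary.apps.example.com/",
		"https://canary-openshift-ingress-canary.apps.example.com/",
		"https://kubernetes.default.svc/", // matched by the ".svc" NO_PROXY entry
	}
	for _, target := range targets {
		p, err := proxyFor(target)
		fmt.Println(target, "->", p, err)
	}
}
```

Under these assumptions both canary URLs resolve to the proxy in Go, so a manual curl test against `http://` does not necessarily reflect what a Go client with environment-based proxying would do.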

Comment 2 Hongan Li 2021-03-05 08:41:35 UTC
Tested the same version on Azure and GCP with a proxy configured, but could not reproduce the issue.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-03-04-093123   True        False         28m     Cluster version is 4.7.0-0.nightly-2021-03-04-093123

$ oc get infrastructures.config.openshift.io cluster -oyaml
spec:
  cloudConfig:
    key: config
    name: cloud-provider-config
  platformSpec:
    type: GCP
status:
  apiServerInternalURI: https://api-int.hongli-gcpxy.qe.gcp.devcluster.openshift.com:6443
  apiServerURL: https://api.hongli-gcpxy.qe.gcp.devcluster.openshift.com:6443
  etcdDiscoveryDomain: ""
  infrastructureName: hongli-gcpxy-dchd6
  platform: GCP
  platformStatus:
    gcp:
      projectID: openshift-qe
      region: us-central1
    type: GCP

$ oc get proxies.config.openshift.io cluster -oyaml
spec:
  httpProxy: http://proxy-user1:JYgU8qRZV4DY4PXJbxJK@10.0.0.2:3128
  httpsProxy: http://proxy-user1:JYgU8qRZV4DY4PXJbxJK@10.0.0.2:3128
  noProxy: test.no-proxy.com
  trustedCA:
    name: ""
status:
  httpProxy: http://proxy-user1:JYgU8qRZV4DY4PXJbxJK@10.0.0.2:3128
  httpsProxy: http://proxy-user1:JYgU8qRZV4DY4PXJbxJK@10.0.0.2:3128
  noProxy: .cluster.local,.svc,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.hongli-gcpxy.qe.gcp.devcluster.openshift.com,localhost,metadata,metadata.google.internal,metadata.google.internal.,test.no-proxy.com

$ oc -n openshift-ingress-canary get route
NAME     HOST/PORT                                                                           PATH   SERVICES         PORT   TERMINATION     WILDCARD
canary   canary-openshift-ingress-canary.apps.hongli-gcpxy.qe.gcp.devcluster.openshift.com          ingress-canary   8080   edge/Redirect   None


$ oc -n openshift-ingress-operator rsh ingress-operator-58c955ffd4-tkxnr
Defaulting container name to ingress-operator.
Use 'oc describe pod/ingress-operator-58c955ffd4-tkxnr -n openshift-ingress-operator' to see all of the containers in this pod.
sh-4.4$ 
sh-4.4$ 
sh-4.4$ env | grep -i proxy
HTTP_PROXY=http://proxy-user1:JYgU8qRZV4DY4PXJbxJK@10.0.0.2:3128
NO_PROXY=.cluster.local,.svc,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.hongli-gcpxy.qe.gcp.devcluster.openshift.com,localhost,metadata,metadata.google.internal,metadata.google.internal.,test.no-proxy.com
HTTPS_PROXY=http://proxy-user1:JYgU8qRZV4DY4PXJbxJK@10.0.0.2:3128
sh-4.4$ 
sh-4.4$ curl http://canary-openshift-ingress-canary.apps.hongli-gcpxy.qe.gcp.devcluster.openshift.com -v -kL
* Rebuilt URL to: http://canary-openshift-ingress-canary.apps.hongli-gcpxy.qe.gcp.devcluster.openshift.com/
* Uses proxy env variable NO_PROXY == '.cluster.local,.svc,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.hongli-gcpxy.qe.gcp.devcluster.openshift.com,localhost,metadata,metadata.google.internal,metadata.google.internal.,test.no-proxy.com'
*   Trying 130.211.196.23...
* TCP_NODELAY set
* Connected to canary-openshift-ingress-canary.apps.hongli-gcpxy.qe.gcp.devcluster.openshift.com (130.211.196.23) port 80 (#0)
> GET / HTTP/1.1
> Host: canary-openshift-ingress-canary.apps.hongli-gcpxy.qe.gcp.devcluster.openshift.com
> User-Agent: curl/7.61.1
> Accept: */*
> 
< HTTP/1.1 302 Found
< Cache-Control: no-cache
< Content-length: 0
< Location: https://canary-openshift-ingress-canary.apps.hongli-gcpxy.qe.gcp.devcluster.openshift.com/
< 
* Connection #0 to host canary-openshift-ingress-canary.apps.hongli-gcpxy.qe.gcp.devcluster.openshift.com left intact
* Issue another request to this URL: 'https://canary-openshift-ingress-canary.apps.hongli-gcpxy.qe.gcp.devcluster.openshift.com/'
* Uses proxy env variable NO_PROXY == '.cluster.local,.svc,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.hongli-gcpxy.qe.gcp.devcluster.openshift.com,localhost,metadata,metadata.google.internal,metadata.google.internal.,test.no-proxy.com'
* Uses proxy env variable HTTPS_PROXY == 'http://proxy-user1:JYgU8qRZV4DY4PXJbxJK@10.0.0.2:3128'
*   Trying 10.0.0.2...
* TCP_NODELAY set
* Connected to 10.0.0.2 (10.0.0.2) port 3128 (#1)
* allocate connect buffer!
* Establish HTTP proxy tunnel to canary-openshift-ingress-canary.apps.hongli-gcpxy.qe.gcp.devcluster.openshift.com:443
* Proxy auth using Basic with user 'proxy-user1'
> CONNECT canary-openshift-ingress-canary.apps.hongli-gcpxy.qe.gcp.devcluster.openshift.com:443 HTTP/1.1
> Host: canary-openshift-ingress-canary.apps.hongli-gcpxy.qe.gcp.devcluster.openshift.com:443
> Proxy-Authorization: Basic cHJveHktdXNlcjE6SllnVThxUlpWNERZNFBYSmJ4Sks=
> User-Agent: curl/7.61.1
> Proxy-Connection: Keep-Alive
> 
< HTTP/1.1 200 Connection established
< 
* Proxy replied 200 to CONNECT request
* CONNECT phase completed!
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* CONNECT phase completed!
* CONNECT phase completed!
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, [no content] (0):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server did not agree to a protocol
* Server certificate:
*  subject: CN=*.apps.hongli-gcpxy.qe.gcp.devcluster.openshift.com
*  start date: Mar  5 07:42:38 2021 GMT
*  expire date: Mar  5 07:42:39 2023 GMT
*  issuer: CN=ingress-operator@1614930157
*  SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway.
* TLSv1.3 (OUT), TLS app data, [no content] (0):
> GET / HTTP/1.1
> Host: canary-openshift-ingress-canary.apps.hongli-gcpxy.qe.gcp.devcluster.openshift.com
> User-Agent: curl/7.61.1
> Accept: */*
> 
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS app data, [no content] (0):
< HTTP/1.1 200 OK
< X-Request-Port: 8080
< Date: Fri, 05 Mar 2021 08:39:04 GMT
< Content-Length: 17
< Content-Type: text/plain; charset=utf-8
< Set-Cookie: c6e529a6ab19a530fd4f1cceb91c08a9=79d2a53e9db507e1f33d020ea2b01580; path=/; HttpOnly; Secure; SameSite=None
< Cache-control: private
< 
Hello OpenShift!
* Connection #1 to host 10.0.0.2 left intact
sh-4.4$ 
sh-4.4$ 
sh-4.4$ curl https://canary-openshift-ingress-canary.apps.hongli-gcpxy.qe.gcp.devcluster.openshift.com -v -kL
* Rebuilt URL to: https://canary-openshift-ingress-canary.apps.hongli-gcpxy.qe.gcp.devcluster.openshift.com/
* Uses proxy env variable NO_PROXY == '.cluster.local,.svc,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.hongli-gcpxy.qe.gcp.devcluster.openshift.com,localhost,metadata,metadata.google.internal,metadata.google.internal.,test.no-proxy.com'
* Uses proxy env variable HTTPS_PROXY == 'http://proxy-user1:JYgU8qRZV4DY4PXJbxJK@10.0.0.2:3128'
*   Trying 10.0.0.2...
* TCP_NODELAY set
* Connected to 10.0.0.2 (10.0.0.2) port 3128 (#0)
* allocate connect buffer!
* Establish HTTP proxy tunnel to canary-openshift-ingress-canary.apps.hongli-gcpxy.qe.gcp.devcluster.openshift.com:443
* Proxy auth using Basic with user 'proxy-user1'
> CONNECT canary-openshift-ingress-canary.apps.hongli-gcpxy.qe.gcp.devcluster.openshift.com:443 HTTP/1.1
> Host: canary-openshift-ingress-canary.apps.hongli-gcpxy.qe.gcp.devcluster.openshift.com:443
> Proxy-Authorization: Basic cHJveHktdXNlcjE6SllnVThxUlpWNERZNFBYSmJ4Sks=
> User-Agent: curl/7.61.1
> Proxy-Connection: Keep-Alive
> 
< HTTP/1.1 200 Connection established
< 
* Proxy replied 200 to CONNECT request
* CONNECT phase completed!
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* CONNECT phase completed!
* CONNECT phase completed!
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, [no content] (0):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server did not agree to a protocol
* Server certificate:
*  subject: CN=*.apps.hongli-gcpxy.qe.gcp.devcluster.openshift.com
*  start date: Mar  5 07:42:38 2021 GMT
*  expire date: Mar  5 07:42:39 2023 GMT
*  issuer: CN=ingress-operator@1614930157
*  SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway.
* TLSv1.3 (OUT), TLS app data, [no content] (0):
> GET / HTTP/1.1
> Host: canary-openshift-ingress-canary.apps.hongli-gcpxy.qe.gcp.devcluster.openshift.com
> User-Agent: curl/7.61.1
> Accept: */*
> 
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS app data, [no content] (0):
< HTTP/1.1 200 OK
< X-Request-Port: 8080
< Date: Fri, 05 Mar 2021 08:39:14 GMT
< Content-Length: 17
< Content-Type: text/plain; charset=utf-8
< Set-Cookie: c6e529a6ab19a530fd4f1cceb91c08a9=46e9977a9255b48ae1413425f8ae159f; path=/; HttpOnly; Secure; SameSite=None
< Cache-control: private
< 
Hello OpenShift!
* Connection #0 to host 10.0.0.2 left intact
sh-4.4$

Comment 5 Hongan Li 2021-03-08 11:18:33 UTC
Thank you for your detailed explanation. @Stephen

Verified with 4.8.0-0.nightly-2021-03-08-053437; the check passed.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-03-08-053437   True        False         47m     Cluster version is 4.8.0-0.nightly-2021-03-08-053437

$ oc get co/ingress
NAME      VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
ingress   4.8.0-0.nightly-2021-03-08-053437   True        False         False      61m

$ oc get proxy cluster -oyaml
<---snip--->
spec:
  httpProxy: http://proxy-user1:JYgU8qRZV4DY4PXJbxJK@ec2-3-142-53-28.us-east-2.compute.amazonaws.com:3128
  httpsProxy: http://proxy-user1:JYgU8qRZV4DY4PXJbxJK@ec2-3-142-53-28.us-east-2.compute.amazonaws.com:3128
  noProxy: test.no-proxy.com
  trustedCA:
    name: ""
status:
  httpProxy: http://proxy-user1:JYgU8qRZV4DY4PXJbxJK@ec2-3-142-53-28.us-east-2.compute.amazonaws.com:3128
  httpsProxy: http://proxy-user1:JYgU8qRZV4DY4PXJbxJK@ec2-3-142-53-28.us-east-2.compute.amazonaws.com:3128
  noProxy: .cluster.local,.svc,.us-east-2.compute.internal,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.hongli-np.qe.devcluster.openshift.com,localhost,test.no-proxy.com
<---snip--->

Comment 8 errata-xmlrpc 2021-07-27 22:51:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

