Bug 2094562
| Summary: | Cluster wide proxy values passed to Update service operator pod, but not operand pod | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Chad Crum <ccrum> |
| Component: | OpenShift Update Service | Assignee: | Pratik Mahajan <pmahajan> |
| OpenShift Update Service sub component: | operand | QA Contact: | Yang Yang <yanyang> |
| Status: | CLOSED ERRATA | Docs Contact: | Kathryn Alexander <kalexand> |
| Severity: | medium | | |
| Priority: | medium | CC: | aos-team-ota, ccrum, jiajliu, lmohanty, pmahajan, yanyang |
| Version: | 4.10 | | |
| Target Milestone: | --- | | |
| Target Release: | 4.11.0 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-08-03 11:50:52 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Reproducing it with OSUS 4.9.1 on OCP 4.10.18
Steps to reproduce:
1. Install a 4.10 OCP cluster with the cluster-wide proxy enabled (without a trusted CA); a sketch of the corresponding install-config.yaml proxy stanza follows the output below.
# oc get proxy -oyaml
apiVersion: v1
items:
- apiVersion: config.openshift.io/v1
kind: Proxy
metadata:
creationTimestamp: "2022-06-24T06:45:50Z"
generation: 1
name: cluster
resourceVersion: "624"
uid: ec2ba7df-a596-4c9c-8fc3-13b2f05a20db
spec:
httpProxy: http://$user:pass@$ip:3128
httpsProxy: http://$user:pass@$ip:3128
noProxy: test.no-proxy.com
trustedCA:
name: ""
status:
httpProxy: http://$user:pass@$ip:3128
httpsProxy: http://$user:pass@$ip:3128
noProxy: .cluster.local,.svc,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.yanyang-0624b.qe.gcp.devcluster.openshift.com,localhost,metadata,metadata.google.internal,metadata.google.internal.,test.no-proxy.com
kind: List
metadata:
resourceVersion: ""
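For reference, a minimal sketch of the proxy stanza of install-config.yaml that enables such a cluster-wide proxy at install time (the $user, $pass, and $ip placeholders mirror the output above; additionalTrustBundle is left out because the cluster is deployed without a trusted CA):
proxy:
  httpProxy: http://$user:$pass@$ip:3128
  httpsProxy: http://$user:$pass@$ip:3128
  noProxy: test.no-proxy.com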
2. Install OSUS 4.9.1 on the cluster
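A minimal sketch of one way to install the operator through OLM (the openshift-update-service namespace, the cincinnati-operator package name, and the v1 channel are assumptions based on the published OSUS install flow; installing through the OperatorHub console works just as well):
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: update-service-operator-group
  namespace: openshift-update-service
spec:
  targetNamespaces:
  - openshift-update-service
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: update-service-subscription
  namespace: openshift-update-service
spec:
  channel: v1                       # assumed channel name
  installPlanApproval: Automatic
  name: cincinnati-operator         # assumed package name in the redhat-operators catalog
  source: redhat-operators
  sourceNamespace: openshift-marketplace
After the operator is installed and a sample operand is created, the pods look like this: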
# oc get pod
NAME READY STATUS RESTARTS AGE
sample-64f5b54867-kcfxc 1/2 Running 0 8m2s
sample-ff75f8c58-c4wq5 1/2 Running 0 12m
updateservice-operator-645c68cc7c-g8vcc 1/1 Running 0 17m
Graph builder is not ready.
# oc logs pod/sample-64f5b54867-kcfxc
Defaulted container "graph-builder" out of: graph-builder, policy-engine, graph-data (init)
[2022-06-24T07:44:24Z DEBUG graph_builder] application settings:
AppSettings {
address: ::,
credentials_path: None,
mandatory_client_parameters: {},
manifestref_key: "io.openshift.upgrades.graph.release.manifestref",
path_prefix: "",
pause_secs: 300s,
scrape_timeout_secs: None,
port: 8080,
registry: "quay.io",
repository: "openshift-release-dev/ocp-release",
status_address: ::,
status_port: 9080,
verbosity: Trace,
fetch_concurrency: 16,
metrics_required: {
"graph_upstream_raw_releases",
},
plugin_settings: [
ReleaseScrapeDockerv2Settings {
registry: "quay.io",
repository: "openshift-release-dev/ocp-release",
manifestref_key: "io.openshift.upgrades.graph.release.manifestref",
fetch_concurrency: 16,
username: None,
password: None,
credentials_path: Some(
"/var/lib/cincinnati/registry-credentials/.dockerconfigjson",
),
},
OpenshiftSecondaryMetadataParserSettings {
data_directory: "/var/lib/cincinnati/graph-data",
key_prefix: "io.openshift.upgrades.graph",
default_arch: "amd64",
disallowed_errors: {},
},
EdgeAddRemovePlugin {
key_prefix: "io.openshift.upgrades.graph",
remove_all_edges_value: "*",
remove_consumed_metadata: false,
},
],
tracing_endpoint: None,
}
[2022-06-24T07:44:24Z DEBUG graph_builder::graph] graph update triggered
[2022-06-24T07:44:24Z TRACE cincinnati::plugins] Running next plugin 'release-scrape-dockerv2'
The graph builder hangs.
Checking the proxy settings in the OSUS pods.
The proxy values are passed to the operator pod:
# oc get pod/updateservice-operator-645c68cc7c-g8vcc -oyaml | grep -i proxy
- name: HTTP_PROXY
value: http://$user:pass@$ip:3128
- name: HTTPS_PROXY
value: http://$user:pass@$ip:3128
- name: NO_PROXY
value: .cluster.local,.svc,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.yanyang-0624b.qe.gcp.devcluster.openshift.com,localhost,metadata,metadata.google.internal,metadata.google.internal.,test.no-proxy.com
However, the proxy values are not passed to the operand pod (the following grep returns nothing):
# oc get pod/sample-64f5b54867-kcfxc -oyaml | grep -i proxy
The issue is reproduced.
Verifying with OSUS 5.0.0 on OCP 4.10.20:
cincinnati-container-v5.0.0-4
cincinnati-operator-bundle-container-v5.0.0-3
cincinnati-operator-container-v5.0.0-4
Steps to verify:
1. Install a 4.10 OCP cluster with the cluster-wide proxy enabled (without a trusted CA)
# oc get proxy -oyaml
apiVersion: v1
items:
- apiVersion: config.openshift.io/v1
kind: Proxy
metadata:
creationTimestamp: "2022-06-28T02:44:05Z"
generation: 1
name: cluster
resourceVersion: "618"
uid: 9e901142-95e1-499e-931d-c0d124aa607d
spec:
httpProxy: http://$user:$pass@$ip:3128
httpsProxy: http://$user:$pass@$ip:3128
noProxy: test.no-proxy.com
trustedCA:
name: ""
status:
httpProxy: http://$user:$pass@$ip:3128
httpsProxy: http://$user:$pass@$ip:3128
noProxy: test.no-proxy.com,$ips
kind: List
metadata:
resourceVersion: ""
2. Install OSUS 5.0.0 on the cluster
# oc get all
NAME READY STATUS RESTARTS AGE
pod/sample-574c488d4f-lxvfj 2/2 Running 0 3m54s
pod/updateservice-operator-54f7686cb-mdms8 1/1 Running 0 118m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/sample-graph-builder ClusterIP 172.30.236.196 <none> 8080/TCP,9080/TCP 31m
service/sample-policy-engine ClusterIP 172.30.41.29 <none> 80/TCP,9081/TCP 31m
service/updateservice-operator-metrics ClusterIP 172.30.161.56 <none> 8443/TCP 118m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/sample 1/1 1 1 31m
deployment.apps/updateservice-operator 1/1 1 1 118m
NAME DESIRED CURRENT READY AGE
replicaset.apps/sample-574c488d4f 1 1 1 3m54s
replicaset.apps/sample-86ccf6d94b 0 0 0 31m
replicaset.apps/updateservice-operator-54f7686cb 1 1 1 118m
NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
route.route.openshift.io/sample-route sample-route-openshift-update-service.apps.yanyang-0628a.qe.gcp.devcluster.openshift.com sample-policy-engine policy-engine edge/None None
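The sample operand above corresponds to an UpdateService custom resource roughly like the following (a minimal sketch assuming the updateservice.operator.openshift.io/v1 API; the releases and graphDataImage values are taken from the description at the end of this report, and the replica count matches the deployment shown above):
apiVersion: updateservice.operator.openshift.io/v1
kind: UpdateService
metadata:
  name: sample
  namespace: openshift-update-service
spec:
  replicas: 1
  releases: quay.io/openshift-release-dev/ocp-release
  graphDataImage: quay.io/chadcrum0/cincinnati-graph-data-container:latest
This time the cluster-wide proxy values are injected into the operand pod: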
# oc get pod/sample-574c488d4f-lxvfj -oyaml | grep -i proxy
- name: HTTP_PROXY
value: http://$user:$pass@$ip:3128
- name: HTTPS_PROXY
value: http://$user:$pass@$ip:3128
- name: NO_PROXY
value: test.no-proxy.com,$ips
3. Check that the graph URL works (a sketch of how the graph URI can be derived follows the output below)
# curl -sk --header Accept:application/json --output /dev/null --write-out "%{http_code}" "${POLICY_ENGINE_GRAPH_URI}?channel=stable-4.10"
200
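The ${POLICY_ENGINE_GRAPH_URI} used above can be derived from the operand status; a sketch, assuming the UpdateService status exposes policyEngineURI and the operand is named sample in the openshift-update-service namespace:
# POLICY_ENGINE_GRAPH_URI="$(oc -n openshift-update-service get updateservice sample -o jsonpath='{.status.policyEngineURI}/api/upgrades_info/v1/graph{"\n"}')"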
4. Configure the cluster to use the local OSUS (a sketch of one way to do this follows the oc adm upgrade output below)
# oc adm upgrade
Cluster version is 4.10.20
Upstream: https://sample-route-openshift-update-service.apps.yanyang-0628a.qe.gcp.devcluster.openshift.com/api/upgrades_info/v1/graph
Channel: candidate-4.11 (available channels: candidate-4.10, candidate-4.11)
No updates available. You may force an upgrade to a specific release image, but doing so may not be supported and may result in downtime or data loss.
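For reference, pointing the cluster at the local OSUS amounts to setting spec.upstream (and, if desired, spec.channel) on the ClusterVersion resource; a sketch using the policy-engine graph URI shown above:
# oc patch clusterversion version --type merge -p '{"spec":{"upstream":"https://sample-route-openshift-update-service.apps.yanyang-0628a.qe.gcp.devcluster.openshift.com/api/upgrades_info/v1/graph","channel":"candidate-4.11"}}'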
Moving it to the verified state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (RHEA: OSUS Enhancement Update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2022:5871
Description of problem (please be detailed as possible and provide log snippets):
Installed a 4.10 IPI BM cluster (IPv4 connected environment) with a cluster-wide proxy, then tried to deploy the Update Service operator + operand. The Update Service operator automatically had the cluster-wide proxy values added as environment variables on the operator pod, but the Update Service operand did not have the proxy values added, causing the graph builder to hang while trying to query the image registry.
# Graph builder hangs here
oc logs -f test2-597cbfbd8-mblhr graph-builder
tracing_endpoint: None,
}
[2022-06-07T21:50:42Z DEBUG graph_builder::graph] graph update triggered
[2022-06-07T21:50:42Z TRACE cincinnati::plugins] Running next plugin 'release-scrape-dockerv2'
Scaling back the operator deployment and adding the cluster-wide proxy environment variables to the operand deployment allows the operand to fully deploy and the graph builder to query properly through the proxy.
Version of all relevant components (if applicable):
update-service-operator.v4.9.1
OpenShift Update Service 4.9.1
4.10.0-0.nightly-2022-06-06-184250
Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
This was discovered when trying to configure a customer environment. The customer has a connected "hub" cluster (connected via proxy) that runs the update service, and disconnected spoke clusters.
Is there any workaround available to the best of your knowledge?
Yes - scale back the operator deployment and manually add the proxy variables that are set in the operator deployment to the operand deployment (see the sketch at the end of this report).
Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
3
Is this issue reproducible?
Yes
Can this issue be reproduced from the UI?
Yes
If this is a regression, please provide more details to justify this:
No
Steps to Reproduce:
1. Must have a connected IPv4 IPI hub cluster deployed with a cluster-wide proxy
2. Deploy the Update Service operator via OperatorHub
3. Create the Update Service operand with the following spec:
spec:
  graphDataImage: quay.io/chadcrum0/cincinnati-graph-data-container:latest
  releases: quay.io/openshift-release-dev/ocp-release
Actual results:
# Graph builder hangs here
oc logs -f test2-597cbfbd8-mblhr graph-builder
tracing_endpoint: None,
}
[2022-06-07T21:50:42Z DEBUG graph_builder::graph] graph update triggered
[2022-06-07T21:50:42Z TRACE cincinnati::plugins] Running next plugin 'release-scrape-dockerv2'
Expected results:
The graph builder properly queries the image registry API via the proxy.
Additional info:
The spec uses public quay.io for the graph data image and releases in this scenario - this is an initial attempt to get everything working (without the added complexity of certs). I expect the next step to use an internal mirror registry for both values. However, this seems like a valid scenario, even when using a proxy to reach an internal registry.
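For completeness, a rough sketch of the workaround described above, assuming the operand deployment is named sample in the openshift-update-service namespace (substitute the actual operand deployment name and the proxy values from the cluster Proxy object):
# oc -n openshift-update-service scale deployment/updateservice-operator --replicas=0
# oc -n openshift-update-service set env deployment/sample HTTP_PROXY=http://$user:$pass@$ip:3128 HTTPS_PROXY=http://$user:$pass@$ip:3128 NO_PROXY=test.no-proxy.com,$ips
Scaling the operator down first keeps it from reconciling the manually added environment variables away.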