Description of problem:

It's possible that, for one reason or another, we're unable to reach the Manila endpoint. In the past we've tried to be smart and handle errors differently depending on whether they were a 404, a 403, or some other type of error. The problem with this approach is that it's very easy to forget valid failure cases. We had a recent example where proxy settings were not correctly propagated to the Manila pod, which degraded the cluster. We should instead treat every failure to reach the Manila endpoint as a non-fatal error and disable the Manila operator, rather than marking the cluster Degraded.
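A minimal sketch of the intended behavior, in Go (the operator's implementation language). The helper name probeManilaEndpoint and the manilaDecision type are hypothetical illustrations, not the operator's actual code; the gophercloud share-type listing mirrors the check that the error messages later in this bug refer to:

package probe

import (
	"fmt"

	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack/sharedfilesystems/v2/sharetypes"
)

// manilaDecision says whether the Manila CSI driver should be deployed
// or quietly disabled. (Hypothetical type, for illustration only.)
type manilaDecision int

const (
	deployManila manilaDecision = iota
	disableManila
)

// probeManilaEndpoint lists Manila share types through an authenticated
// gophercloud service client for the shared-file-systems service.
// Under the proposed behavior, any error (404, 403, timeout, bad proxy
// settings, ...) leads to disableManila with an explanatory message,
// never to a Degraded cluster condition.
func probeManilaEndpoint(client *gophercloud.ServiceClient) (manilaDecision, string) {
	allPages, err := sharetypes.List(client, &sharetypes.ListOpts{}).AllPages()
	if err != nil {
		// Previously only some error codes were tolerated here; now every
		// failure to reach the endpoint is treated as non-fatal.
		return disableManila, fmt.Sprintf("CSI driver for Manila is disabled: cannot list available share types: %v", err)
	}
	if _, err := sharetypes.ExtractShareTypes(allPages); err != nil {
		return disableManila, fmt.Sprintf("CSI driver for Manila is disabled: %v", err)
	}
	return deployManila, ""
}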
Verified on OCP 4.10.0-0.nightly-2021-09-09-163608 on top of OSP 16.1 (RHOS-16.1-RHEL-8-20210818.n.0) with the OpenShiftSDN network type.

The IPI installation, performed on a restricted network with a proxy, finished successfully while the SG rules on the proxy instance were blocking the egress traffic going to the OSP Manila endpoint:

$ openstack catalog show manila | grep public
| | public: https://10.46.44.10:13786/v1/65d84c01ef224b0c9fe8892d43fa804a |

# Egress rules on the instance where the proxy is running:
$ openstack security group rule list --egress installer_host-sg
+--------------------------------------+-------------+-----------+-----------+-------------+-----------------------+
| ID                                   | IP Protocol | Ethertype | IP Range  | Port Range  | Remote Security Group |
+--------------------------------------+-------------+-----------+-----------+-------------+-----------------------+
| 17a7dccc-d005-4f22-8369-bc511b86ff83 | udp         | IPv4      | 0.0.0.0/0 |             | None                  |
| 22c83d2c-33f1-401a-b8fe-319628066615 | tcp         | IPv4      | 0.0.0.0/0 | 13787:65000 | None                  |
| 45d10ca3-9954-49c1-ad47-abfcc63a0d93 | tcp         | IPv4      | 0.0.0.0/0 | 1:13785     | None                  |
| d39faa61-7294-4e7e-8a29-aac757354233 | None        | IPv6      | ::/0      |             | None                  |
+--------------------------------------+-------------+-----------+-----------+-------------+-----------------------+

(Note that the two TCP rules allow ports 1:13785 and 13787:65000, deliberately leaving out 13786, the Manila API port.)

As a result, the manila-csi-driver-operator gets a timeout while reaching the Manila API, but the other OSP endpoints remain reachable (tested with Keystone):

- The Manila OSP API is not reachable:

$ oc rsh -n openshift-cluster-csi-drivers $(oc get pods -n openshift-cluster-csi-drivers -l name=manila-csi-driver-operator -o name)
sh-4.4$ curl --connect-timeout 5 --proxy-cacert /etc/openstack-ca/ca-bundle.pem --cacert /etc/openstack-ca/ca-bundle.pem https://10.46.44.10:13786/v1/65d84c01ef224b0c9fe8892d43fa804a
curl: (28) Operation timed out after 5002 milliseconds with 0 out of 0 bytes received

- However, Keystone is reachable:

sh-4.4$ curl --connect-timeout 5 --proxy-cacert /etc/openstack-ca/ca-bundle.pem --cacert /etc/openstack-ca/ca-bundle.pem https://10.46.44.10:13000
{"versions": {"values": [{"id": "v3.13", "status": "stable", "updated": "2019-07-19T00:00:00Z", "links": [{"rel": "self", "href": "https://10.46.44.10:13000/v3/"}], "media-types": [{"base": "application/json", "type": "application/vnd.openstack.identity-v3+json"}]}]}}

Under these circumstances, the IPI installation works fine:

DEBUG Time elapsed per stage:
DEBUG : 1m49s
DEBUG Bootstrap Complete: 24m4s
DEBUG API: 3m26s
DEBUG Bootstrap Destroy: 33s
DEBUG Cluster Operators: 20m14s
INFO Time elapsed: 47m36s

All cluster operators are available:

$ oc get clusteroperators
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.10.0-0.nightly-2021-09-09-163608   True        False         False      23m
baremetal                                  4.10.0-0.nightly-2021-09-09-163608   True        False         False      47m
cloud-controller-manager                   4.10.0-0.nightly-2021-09-09-163608   True        False         False      55m
cloud-credential                           4.10.0-0.nightly-2021-09-09-163608   True        False         False      62m
cluster-autoscaler                         4.10.0-0.nightly-2021-09-09-163608   True        False         False      47m
config-operator                            4.10.0-0.nightly-2021-09-09-163608   True        False         False      52m
console                                    4.10.0-0.nightly-2021-09-09-163608   True        False         False      27m
csi-snapshot-controller                    4.10.0-0.nightly-2021-09-09-163608   True        False         False      48m
dns                                        4.10.0-0.nightly-2021-09-09-163608   True        False         False      47m
etcd                                       4.10.0-0.nightly-2021-09-09-163608   True        False         False      49m
image-registry                             4.10.0-0.nightly-2021-09-09-163608   True        False         False      31m
ingress                                    4.10.0-0.nightly-2021-09-09-163608   True        False         False      29m
insights                                   4.10.0-0.nightly-2021-09-09-163608   True        False         False      45m
kube-apiserver                             4.10.0-0.nightly-2021-09-09-163608   True        False         False      46m
kube-controller-manager                    4.10.0-0.nightly-2021-09-09-163608   True        False         False      48m
kube-scheduler                             4.10.0-0.nightly-2021-09-09-163608   True        False         False      48m
kube-storage-version-migrator              4.10.0-0.nightly-2021-09-09-163608   True        False         False      50m
machine-api                                4.10.0-0.nightly-2021-09-09-163608   True        False         False      41m
machine-approver                           4.10.0-0.nightly-2021-09-09-163608   True        False         False      47m
machine-config                             4.10.0-0.nightly-2021-09-09-163608   True        False         False      46m
marketplace                                4.10.0-0.nightly-2021-09-09-163608   True        False         False      47m
monitoring                                 4.10.0-0.nightly-2021-09-09-163608   True        False         False      27m
network                                    4.10.0-0.nightly-2021-09-09-163608   True        False         False      49m
node-tuning                                4.10.0-0.nightly-2021-09-09-163608   True        False         False      47m
openshift-apiserver                        4.10.0-0.nightly-2021-09-09-163608   True        False         False      42m
openshift-controller-manager               4.10.0-0.nightly-2021-09-09-163608   True        False         False      47m
openshift-samples                          4.10.0-0.nightly-2021-09-09-163608   True        False         False      44m
operator-lifecycle-manager                 4.10.0-0.nightly-2021-09-09-163608   True        False         False      49m
operator-lifecycle-manager-catalog         4.10.0-0.nightly-2021-09-09-163608   True        False         False      49m
operator-lifecycle-manager-packageserver   4.10.0-0.nightly-2021-09-09-163608   True        False         False      44m
service-ca                                 4.10.0-0.nightly-2021-09-09-163608   True        False         False      52m
storage                                    4.10.0-0.nightly-2021-09-09-163608   True        False         False      43m

and Manila is not deployed, as reported by the storage clusteroperator:

$ oc get clusteroperator storage -o json | jq '.status.conditions[] | select(.type=="Available")'
{
  "lastTransitionTime": "2021-09-10T14:42:33Z",
  "message": "OpenStackCinderCSIDriverOperatorCRAvailable: All is well\nManilaCSIDriverOperatorCRAvailable: CSI driver for Manila is disabled: Unable to retrieve Manila share types: cannot list available share types: Get \"https://10.46.44.10:13786/v2/65d84c01ef224b0c9fe8892d43fa804a/types\": Service Unavailable",
  "reason": "AsExpected",
  "status": "True",
  "type": "Available"
}

$ oc get sc
NAME                 PROVISIONER                RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
standard (default)   kubernetes.io/cinder       Delete          WaitForFirstConsumer   true                   65m
standard-csi         cinder.csi.openstack.org   Delete          WaitForFirstConsumer   true                   63m

$ oc get pods -A | grep -i manila
openshift-cluster-csi-drivers   manila-csi-driver-operator-66d4476d74-x9bs2   1/1   Running   0   64m

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2021-09-09-163608   True        False         40m     Cluster version is 4.10.0-0.nightly-2021-09-09-163608
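The jq check above can also be automated. The following is an illustrative Go sketch (not part of this verification run) that uses the OpenShift config client to assert the same thing: the storage clusteroperator stays Available while its message reports Manila as disabled:

package main

import (
	"context"
	"fmt"
	"strings"

	configclient "github.com/openshift/client-go/config/clientset/versioned"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load the kubeconfig the same way oc does (KUBECONFIG or default path).
	cfg, err := clientcmd.NewNonInteractiveDeferredLoadingClientConfig(
		clientcmd.NewDefaultClientConfigLoadingRules(), nil).ClientConfig()
	if err != nil {
		panic(err)
	}
	client, err := configclient.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	co, err := client.ConfigV1().ClusterOperators().Get(context.TODO(), "storage", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}

	for _, cond := range co.Status.Conditions {
		if cond.Type != "Available" {
			continue
		}
		// The operator must stay Available even though Manila is disabled.
		ok := cond.Status == "True" &&
			strings.Contains(cond.Message, "CSI driver for Manila is disabled")
		fmt.Printf("Available=%s, Manila disabled as expected: %v\n", cond.Status, ok)
	}
}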
Removing the Triaged keyword because:
* the QE automation assessment (flag qe_test_coverage) is missing
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056