Bug 2002556
Summary: | Cluster becomes degraded if it can't talk to Manila | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Martin André <m.andre> |
Component: | Storage | Assignee: | Eric Duen <eduen> |
Storage sub component: | OpenStack CSI Drivers | QA Contact: | rlobillo |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | urgent | ||
Priority: | urgent | CC: | adeshpan, aos-bugs, eduen, juriarte, openshift-bugzilla-robot, pprinett, rlobillo |
Version: | 4.6 | Keywords: | FastFix, Triaged |
Target Milestone: | --- | ||
Target Release: | 4.6.z | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | 2002555 | Environment: | |
Last Closed: | 2021-09-29 12:06:50 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
Bug Depends On: | 2002555 | ||
Bug Blocks: |
Comment 1
Martin André
2021-09-15 09:28:32 UTC
Pre-verified on 4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest (cluster-bot build for build openshift/csi-driver-manila-operator#127) on top of OSP16.1 (RHOS-16.1-RHEL-8-20210818.n.0) with OpenshiftSDN network type.

The UPI installation performed on a restricted network with a proxy finished successfully when the SG rules on the proxy instance allow all egress traffic.

# Egress rules on the instance where the proxy is running:
$ openstack security group rule list --egress installer_host-sg
+--------------------------------------+-------------+-----------+-----------+------------+-----------------------+
| ID                                   | IP Protocol | Ethertype | IP Range  | Port Range | Remote Security Group |
+--------------------------------------+-------------+-----------+-----------+------------+-----------------------+
| 016e5030-bca6-402d-8cfa-e4b7271ba9ec | None        | IPv6      | ::/0      |            | None                  |
| 06a10b42-a8f3-4227-b294-bbc5fe6775ca | None        | IPv4      | 0.0.0.0/0 |            | None                  |
+--------------------------------------+-------------+-----------+-----------+------------+-----------------------+

$ oc get proxy cluster -o json | jq .status
{
  "httpProxy": "http://dummy:dummy@172.16.0.3:3128/",
  "httpsProxy": "http://dummy:dummy@172.16.0.3:3128/",
  "noProxy": ".cluster.local,.svc,10.128.0.0/14,127.0.0.1,169.254.169.254,172.16.0.0/24,172.30.0.0/16,api-int.ostest.shiftstack.com,etcd-0.ostest.shiftstack.com,etcd-1.ostest.shiftstack.com,etcd-2.ostest.shiftstack.com,localhost"
}

Due to a known limitation, the manila-csi-driver-operator gets a timeout while reaching OSP endpoints, as it does not have the PROXY env variables:

$ oc rsh -n openshift-cluster-csi-drivers $(oc get pods -n openshift-cluster-csi-drivers -l name=manila-csi-driver-operator -o name)
sh-4.4$ env | grep -i http
KUBERNETES_SERVICE_PORT_HTTPS=443
sh-4.4$ env | grep -i proxy
sh-4.4$ # As an example, the access to keystone OSP service is not working inside the operator:
sh-4.4$ curl --connect-timeout 5 --proxy-cacert /etc/openstack-ca/ca-bundle.pem --cacert /etc/openstack-ca/ca-bundle.pem https://10.46.44.10:13000
curl: (7) Failed to connect to 10.46.44.10 port 13000: No route to host
sh-4.4$

Despite the above, the UPI installation works fine and all cluster operators are available:

$ oc get clusteroperators
NAME                                      VERSION                                                  AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                            4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      8m8s
cloud-credential                          4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      83m
cluster-autoscaler                        4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      62m
config-operator                           4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      67m
console                                   4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      14m
csi-snapshot-controller                   4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      62m
dns                                       4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      62m
etcd                                      4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      65m
image-registry                            4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      25m
ingress                                   4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      24m
insights                                  4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      67m
kube-apiserver                            4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      61m
kube-controller-manager                   4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      59m
kube-scheduler                            4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      59m
kube-storage-version-migrator             4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      25m
machine-api                               4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      58m
machine-approver                          4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      63m
machine-config                            4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      60m
marketplace                               4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      62m
monitoring                                4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      20m
network                                   4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      69m
node-tuning                               4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      62m
openshift-apiserver                       4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      59m
openshift-controller-manager              4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      63m
openshift-samples                         4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      59m
operator-lifecycle-manager                4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      62m
operator-lifecycle-manager-catalog        4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      62m
operator-lifecycle-manager-packageserver  4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      59m
service-ca                                4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      67m
storage                                   4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      66m

Manila is not deployed, as stated in the storage clusteroperator status:

$ oc get clusteroperator storage -o json | jq '.status.conditions[] | select(.type=="Available")'
{
  "lastTransitionTime": "2021-09-15T14:51:09Z",
  "message": "ManilaCSIDriverOperatorCRAvailable: CSI driver for Manila is disabled: Unable to retrieve Manila share types: cannot authenticate with given credentials: Get \"https://10.46.44.10:13000/\": dial tcp 10.46.44.10:13000: connect: no route to host",
  "reason": "AsExpected",
  "status": "True",
  "type": "Available"
}

$ oc get sc
NAME                 PROVISIONER            RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
standard (default)   kubernetes.io/cinder   Delete          WaitForFirstConsumer   true                   53m

$ oc get pods -A | grep -i manila
openshift-cluster-csi-drivers   manila-csi-driver-operator-7d7c4b7b89-mzmxs   1/1   Running   2   53m

$ oc get clusterversion
NAME      VERSION                                                  AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         7m59s   Cluster version is 4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest

And the cluster is fully operational:

$ oc get pods,pvc -n demo
NAME       READY   STATUS    RESTARTS   AGE
pod/app2   1/1     Running   0          80s

NAME                         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS     AGE
persistentvolumeclaim/pvc2   Bound    pvc-71156835-446a-4f17-84d3-a789f738335a   1Gi        RWO            sc-test-intree   80s

Verified on 4.6.0-0.nightly-2021-09-16-160553 on top of OSP16.1 (RHOS-16.1-RHEL-8-20210818.n.0) with OpenshiftSDN network type.

The UPI installation performed on a restricted network with a proxy finished successfully when the SG rules on the proxy instance allow all egress traffic.

# Egress rules on the instance where the proxy is running:
$ openstack security group rule list --egress installer_host-sg
+--------------------------------------+-------------+-----------+-----------+------------+-----------------------+
| ID                                   | IP Protocol | Ethertype | IP Range  | Port Range | Remote Security Group |
+--------------------------------------+-------------+-----------+-----------+------------+-----------------------+
| 016e5030-bca6-402d-8cfa-e4b7271ba9ec | None        | IPv6      | ::/0      |            | None                  |
| 06a10b42-a8f3-4227-b294-bbc5fe6775ca | None        | IPv4      | 0.0.0.0/0 |            | None                  |
+--------------------------------------+-------------+-----------+-----------+------------+-----------------------+

$ oc get proxy cluster -o json | jq .status
{
  "httpProxy": "http://dummy:dummy@172.16.0.3:3128/",
  "httpsProxy": "https://dummy:dummy@172.16.0.3:3130/",
  "noProxy": ".cluster.local,.svc,10.128.0.0/14,127.0.0.1,169.254.169.254,172.16.0.0/24,172.30.0.0/16,api-int.ostest.shiftstack.com,etcd-0.ostest.shiftstack.com,etcd-1.ostest.shiftstack.com,etcd-2.ostest.shiftstack.com,localhost"
}

Due to a known limitation, the manila-csi-driver-operator gets a timeout while reaching OSP endpoints, as it does not have the PROXY env variables:

$ oc rsh -n openshift-cluster-csi-drivers $(oc get pods -n openshift-cluster-csi-drivers -l name=manila-csi-driver-operator -o name)
sh-4.4$ env | grep -i http
SOURCE_GIT_URL=https://github.com/openshift/csi-driver-manila-operator
KUBERNETES_SERVICE_PORT_HTTPS=443
sh-4.4$ env | grep -i proxy
sh-4.4$ # As an example, the access to keystone OSP service is not working inside the operator:
sh-4.4$ curl --connect-timeout 5 --proxy-cacert /etc/openstack-ca/ca-bundle.pem --cacert /etc/openstack-ca/ca-bundle.pem https://10.46.44.10:13000
curl: (7) Failed to connect to 10.46.44.10 port 13000: No route to host

Despite the above, the UPI installation works fine and all cluster operators are available:

NAME                                      VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                            4.6.0-0.nightly-2021-09-16-160553   True        False         False      20m
cloud-credential                          4.6.0-0.nightly-2021-09-16-160553   True        False         False      63m
cluster-autoscaler                        4.6.0-0.nightly-2021-09-16-160553   True        False         False      51m
config-operator                           4.6.0-0.nightly-2021-09-16-160553   True        False         False      53m
console                                   4.6.0-0.nightly-2021-09-16-160553   True        False         False      26m
csi-snapshot-controller                   4.6.0-0.nightly-2021-09-16-160553   True        False         False      53m
dns                                       4.6.0-0.nightly-2021-09-16-160553   True        False         False      51m
etcd                                      4.6.0-0.nightly-2021-09-16-160553   True        False         False      52m
image-registry                            4.6.0-0.nightly-2021-09-16-160553   True        False         False      31m
ingress                                   4.6.0-0.nightly-2021-09-16-160553   True        False         False      31m
insights                                  4.6.0-0.nightly-2021-09-16-160553   True        False         False      53m
kube-apiserver                            4.6.0-0.nightly-2021-09-16-160553   True        False         False      51m
kube-controller-manager                   4.6.0-0.nightly-2021-09-16-160553   True        False         False      50m
kube-scheduler                            4.6.0-0.nightly-2021-09-16-160553   True        False         False      49m
kube-storage-version-migrator             4.6.0-0.nightly-2021-09-16-160553   True        False         False      31m
machine-api                               4.6.0-0.nightly-2021-09-16-160553   True        False         False      42m
machine-approver                          4.6.0-0.nightly-2021-09-16-160553   True        False         False      51m
machine-config                            4.6.0-0.nightly-2021-09-16-160553   True        False         False      51m
marketplace                               4.6.0-0.nightly-2021-09-16-160553   True        False         False      51m
monitoring                                4.6.0-0.nightly-2021-09-16-160553   True        False         False      29m
network                                   4.6.0-0.nightly-2021-09-16-160553   True        False         False      54m
node-tuning                               4.6.0-0.nightly-2021-09-16-160553   True        False         False      53m
openshift-apiserver                       4.6.0-0.nightly-2021-09-16-160553   True        False         False      46m
openshift-controller-manager              4.6.0-0.nightly-2021-09-16-160553   True        False         False      44m
openshift-samples                         4.6.0-0.nightly-2021-09-16-160553   True        False         False      47m
operator-lifecycle-manager                4.6.0-0.nightly-2021-09-16-160553   True        False         False      51m
operator-lifecycle-manager-catalog        4.6.0-0.nightly-2021-09-16-160553   True        False         False      51m
operator-lifecycle-manager-packageserver  4.6.0-0.nightly-2021-09-16-160553   True        False         False      31m
service-ca                                4.6.0-0.nightly-2021-09-16-160553   True        False         False      53m
storage                                   4.6.0-0.nightly-2021-09-16-160553   True        False         False      53m

Manila is not deployed, as stated in the storage clusteroperator status:

$ oc get clusteroperator storage -o json | jq '.status.conditions[] | select(.type=="Available")'
{
  "lastTransitionTime": "2021-09-17T08:23:46Z",
  "message": "ManilaCSIDriverOperatorCRAvailable: CSI driver for Manila is disabled: Unable to retrieve Manila share types: cannot authenticate with given credentials: Get \"https://10.46.44.10:13000/\": dial tcp 10.46.44.10:13000: connect: no route to host",
  "reason": "AsExpected",
  "status": "True",
  "type": "Available"
}

$ oc get sc
NAME                 PROVISIONER            RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
standard (default)   kubernetes.io/cinder   Delete          WaitForFirstConsumer   true                   54m

$ oc get pods -A | grep -i manila
openshift-cluster-csi-drivers   manila-csi-driver-operator-6c496dcb95-94stj   1/1   Running   1   54m

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2021-09-16-160553   True        False         19m     Cluster version is 4.6.0-0.nightly-2021-09-16-160553

And the cluster is fully operational:

$ oc get pods,pvc
NAME                        READY   STATUS    RESTARTS   AGE
pod/app2                    1/1     Running   0          5m50s
pod/demo-7897db69cc-67lvg   1/1     Running   0          31m
pod/demo-7897db69cc-7xsgl   1/1     Running   0          31m
pod/demo-7897db69cc-s2hk5   1/1     Running   0          31m

NAME                         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS     AGE
persistentvolumeclaim/pvc2   Bound    pvc-8d54d363-7d32-4cca-aaa7-02f80a611935   1Gi        RWO            sc-test-intree   14m

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.46 bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3643
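
The verification above relies on reading the Available condition of the storage clusteroperator: the operator stays available while its message reports that the Manila CSI driver is disabled. A minimal Python sketch of that check follows, assuming the JSON comes from `oc get clusteroperator storage -o json`. The helper names `get_condition` and `manila_disabled_but_available` are hypothetical and not part of any OpenShift tooling, and the sample document is trimmed to the single condition quoted in this report.

```python
import json

# Sample trimmed to the one condition quoted in this bug report; a real
# `oc get clusteroperator storage -o json` document carries more fields.
SAMPLE = r'''
{
  "status": {
    "conditions": [
      {
        "lastTransitionTime": "2021-09-17T08:23:46Z",
        "message": "ManilaCSIDriverOperatorCRAvailable: CSI driver for Manila is disabled: Unable to retrieve Manila share types: cannot authenticate with given credentials: Get \"https://10.46.44.10:13000/\": dial tcp 10.46.44.10:13000: connect: no route to host",
        "reason": "AsExpected",
        "status": "True",
        "type": "Available"
      }
    ]
  }
}
'''


def get_condition(cluster_operator, cond_type):
    """Return the status condition of the given type, or None if absent."""
    for cond in cluster_operator.get("status", {}).get("conditions", []):
        if cond.get("type") == cond_type:
            return cond
    return None


def manila_disabled_but_available(cluster_operator):
    """True when the operator is Available and reports Manila as disabled.

    Matching on the message text is an assumption based on the wording
    quoted in this report; the wording may differ between releases.
    """
    cond = get_condition(cluster_operator, "Available")
    if cond is None:
        return False
    return (
        cond.get("status") == "True"
        and "CSI driver for Manila is disabled" in cond.get("message", "")
    )


if __name__ == "__main__":
    operator = json.loads(SAMPLE)
    print(manila_disabled_but_available(operator))
```

On a live cluster, the same functions could be fed the output of the `oc` command instead of `SAMPLE`, for example by piping it to the script and calling `json.load(sys.stdin)`.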