Bug 2002556
| Summary: | Cluster becomes degraded if it can't talk to Manila | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Martin André <m.andre> |
| Component: | Storage | Assignee: | Eric Duen <eduen> |
| Storage sub component: | OpenStack CSI Drivers | QA Contact: | rlobillo |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | urgent | ||
| Priority: | urgent | CC: | adeshpan, aos-bugs, eduen, juriarte, openshift-bugzilla-robot, pprinett, rlobillo |
| Version: | 4.6 | Keywords: | FastFix, Triaged |
| Target Milestone: | --- | ||
| Target Release: | 4.6.z | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 2002555 | Environment: | |
| Last Closed: | 2021-09-29 12:06:50 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 2002555 | ||
| Bug Blocks: | |||
Comment 1
Martin André
2021-09-15 09:28:32 UTC
Pre-verified on 4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest (cluster-bot build of openshift/csi-driver-manila-operator#127)
on top of OSP16.1 (RHOS-16.1-RHEL-8-20210818.n.0) with the OpenShiftSDN network type.
The UPI installation, performed on a restricted network with a proxy, finished successfully when the SG rules on the proxy instance allow all egress traffic.
# Egress rules on the instance where the proxy is running:
$ openstack security group rule list --egress installer_host-sg
+--------------------------------------+-------------+-----------+-----------+------------+-----------------------+
| ID | IP Protocol | Ethertype | IP Range | Port Range | Remote Security Group |
+--------------------------------------+-------------+-----------+-----------+------------+-----------------------+
| 016e5030-bca6-402d-8cfa-e4b7271ba9ec | None | IPv6 | ::/0 | | None |
| 06a10b42-a8f3-4227-b294-bbc5fe6775ca | None | IPv4 | 0.0.0.0/0 | | None |
+--------------------------------------+-------------+-----------+-----------+------------+-----------------------+
$ oc get proxy cluster -o json | jq .status
{
"httpProxy": "http://dummy:dummy@172.16.0.3:3128/",
"httpsProxy": "http://dummy:dummy@172.16.0.3:3128/",
"noProxy": ".cluster.local,.svc,10.128.0.0/14,127.0.0.1,169.254.169.254,172.16.0.0/24,172.30.0.0/16,api-int.ostest.shiftstack.com,etcd-0.ostest.shiftstack.com,etcd-1.ostest.shiftstack.com,etcd-2.ostest.shiftstack.com,localhost"
}
Due to a known limitation, the manila-csi-driver-operator cannot reach the OSP endpoints, because its pod does not have the proxy environment variables set:
$ oc rsh -n openshift-cluster-csi-drivers $(oc get pods -n openshift-cluster-csi-drivers -l name=manila-csi-driver-operator -o name)
sh-4.4$ env | grep -i http
KUBERNETES_SERVICE_PORT_HTTPS=443
sh-4.4$ env | grep -i proxy
sh-4.4$
# For example, the OSP Keystone service is unreachable from inside the operator pod:
sh-4.4$ curl --connect-timeout 5 --proxy-cacert /etc/openstack-ca/ca-bundle.pem --cacert /etc/openstack-ca/ca-bundle.pem https://10.46.44.10:13000
curl: (7) Failed to connect to 10.46.44.10 port 13000: No route to host
sh-4.4$
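The failure above follows from the proxy configuration: the Keystone address 10.46.44.10 is not covered by any `noProxy` entry, so it is only reachable through the proxy, and a pod without the proxy variables tries a direct connection instead. The following is a minimal Python sketch of that reasoning. The function name `bypasses_proxy` is hypothetical, and the matching rules are deliberately simplified (exact names, leading-dot suffixes, IPs and CIDRs only); real proxy-aware clients apply more detailed rules.

```python
import ipaddress

# noProxy value copied from the `oc get proxy cluster` output above.
NO_PROXY = (
    ".cluster.local,.svc,10.128.0.0/14,127.0.0.1,169.254.169.254,"
    "172.16.0.0/24,172.30.0.0/16,api-int.ostest.shiftstack.com,"
    "etcd-0.ostest.shiftstack.com,etcd-1.ostest.shiftstack.com,"
    "etcd-2.ostest.shiftstack.com,localhost"
)


def bypasses_proxy(host: str, no_proxy: str) -> bool:
    """Return True if `host` matches an entry of a comma-separated noProxy list.

    Simplified, illustrative matching only (hypothetical helper).
    """
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None  # host is a DNS name, not an IP literal

    for entry in (e.strip() for e in no_proxy.split(",")):
        if not entry:
            continue
        if ip is not None:
            # Compare IP hosts only against IP or CIDR entries.
            try:
                if ip in ipaddress.ip_network(entry, strict=False):
                    return True
            except ValueError:
                pass  # entry is a DNS name or suffix
            continue
        if entry.startswith("."):
            # Leading-dot entries match any host under that domain.
            if host.endswith(entry):
                return True
        elif host == entry:
            return True
    return False


# Keystone endpoint from the curl test: not in noProxy, so it needs the proxy.
print(bypasses_proxy("10.46.44.10", NO_PROXY))  # False
# An address in the 172.30.0.0/16 range is reached directly.
print(bypasses_proxy("172.30.0.1", NO_PROXY))  # True
```

Under these assumptions, a `False` result for the Keystone address is consistent with the "No route to host" error seen from inside the operator pod.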
Despite the above, the UPI installation works fine and all cluster operators are available:
$ oc get clusteroperators
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
authentication 4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest True False False 8m8s
cloud-credential 4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest True False False 83m
cluster-autoscaler 4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest True False False 62m
config-operator 4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest True False False 67m
console 4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest True False False 14m
csi-snapshot-controller 4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest True False False 62m
dns 4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest True False False 62m
etcd 4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest True False False 65m
image-registry 4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest True False False 25m
ingress 4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest True False False 24m
insights 4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest True False False 67m
kube-apiserver 4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest True False False 61m
kube-controller-manager 4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest True False False 59m
kube-scheduler 4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest True False False 59m
kube-storage-version-migrator 4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest True False False 25m
machine-api 4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest True False False 58m
machine-approver 4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest True False False 63m
machine-config 4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest True False False 60m
marketplace 4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest True False False 62m
monitoring 4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest True False False 20m
network 4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest True False False 69m
node-tuning 4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest True False False 62m
openshift-apiserver 4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest True False False 59m
openshift-controller-manager 4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest True False False 63m
openshift-samples 4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest True False False 59m
operator-lifecycle-manager 4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest True False False 62m
operator-lifecycle-manager-catalog 4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest True False False 62m
operator-lifecycle-manager-packageserver 4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest True False False 59m
service-ca 4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest True False False 67m
storage 4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest True False False 66m
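A listing like the one above can also be checked mechanically rather than by eye. The sketch below is a hedged illustration, not part of the original verification: the helper name `unhealthy_operators` is hypothetical, and it assumes the whitespace-separated column layout shown above (NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED, SINCE) with no spaces inside those fields.

```python
from typing import List


def unhealthy_operators(table: str) -> List[str]:
    """Return names of operators that are not Available or are Degraded.

    Hypothetical helper; expects the text output of `oc get clusteroperators`.
    """
    lines = [ln for ln in table.splitlines() if ln.strip()]
    if not lines:
        return []
    # Locate columns by header name instead of hard-coding positions.
    col = {name: i for i, name in enumerate(lines[0].split())}
    bad = []
    for ln in lines[1:]:
        fields = ln.split()
        if fields[col["AVAILABLE"]] != "True" or fields[col["DEGRADED"]] != "False":
            bad.append(fields[col["NAME"]])
    return bad


# Two rows copied from the listing above.
SAMPLE = """\
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
service-ca 4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest True False False 67m
storage 4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest True False False 66m
"""

print(unhealthy_operators(SAMPLE))  # []
```

An empty list corresponds to the state reported here: every operator, including storage, is Available and not Degraded.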
Manila is not deployed, as reported by the storage cluster operator:
$ oc get clusteroperator storage -o json | jq '.status.conditions[] | select(.type=="Available")'
{
"lastTransitionTime": "2021-09-15T14:51:09Z",
"message": "ManilaCSIDriverOperatorCRAvailable: CSI driver for Manila is disabled: Unable to retrieve Manila share types: cannot authenticate with given credentials: Get \"https://10.46.44.10:13000/\": dial tcp 10.46.44.10:13000: connect: no route to host",
"reason": "AsExpected",
"status": "True",
"type": "Available"
}
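The condition above captures the behavior being verified: the storage operator stays Available while its message reports that the Manila CSI driver is disabled. A small Python sketch of that check follows. The helper name `manila_disabled_but_available` is hypothetical, and matching on the message substring is an assumption based only on the text shown in this output, not a documented interface.

```python
import json

# Condition copied from the `jq` output above.
CONDITION_JSON = r'''
{
  "lastTransitionTime": "2021-09-15T14:51:09Z",
  "message": "ManilaCSIDriverOperatorCRAvailable: CSI driver for Manila is disabled: Unable to retrieve Manila share types: cannot authenticate with given credentials: Get \"https://10.46.44.10:13000/\": dial tcp 10.46.44.10:13000: connect: no route to host",
  "reason": "AsExpected",
  "status": "True",
  "type": "Available"
}
'''


def manila_disabled_but_available(condition: dict) -> bool:
    """True if the Available condition is True and its message says Manila is disabled.

    Hypothetical helper; the substring match is an assumption from the output above.
    """
    return (
        condition.get("type") == "Available"
        and condition.get("status") == "True"
        and "CSI driver for Manila is disabled" in condition.get("message", "")
    )


condition = json.loads(CONDITION_JSON)
print(manila_disabled_but_available(condition))  # True
```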
$ oc get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
standard (default) kubernetes.io/cinder Delete WaitForFirstConsumer true 53m
$ oc get pods -A | grep -i manila
openshift-cluster-csi-drivers manila-csi-driver-operator-7d7c4b7b89-mzmxs 1/1 Running 2 53m
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest True False 7m59s Cluster version is 4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest
The cluster is fully operational:
$ oc get pods,pvc -n demo
NAME READY STATUS RESTARTS AGE
pod/app2 1/1 Running 0 80s
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
persistentvolumeclaim/pvc2 Bound pvc-71156835-446a-4f17-84d3-a789f738335a 1Gi RWO sc-test-intree 80s
Verified on 4.6.0-0.nightly-2021-09-16-160553 on top of OSP16.1 (RHOS-16.1-RHEL-8-20210818.n.0) with the OpenShiftSDN network type.
The UPI installation, performed on a restricted network with a proxy, finished successfully when the SG rules on the proxy instance allow all egress traffic.
# Egress rules on the instance where the proxy is running:
$ openstack security group rule list --egress installer_host-sg
+--------------------------------------+-------------+-----------+-----------+------------+-----------------------+
| ID | IP Protocol | Ethertype | IP Range | Port Range | Remote Security Group |
+--------------------------------------+-------------+-----------+-----------+------------+-----------------------+
| 016e5030-bca6-402d-8cfa-e4b7271ba9ec | None | IPv6 | ::/0 | | None |
| 06a10b42-a8f3-4227-b294-bbc5fe6775ca | None | IPv4 | 0.0.0.0/0 | | None |
+--------------------------------------+-------------+-----------+-----------+------------+-----------------------+
$ oc get proxy cluster -o json | jq .status
{
"httpProxy": "http://dummy:dummy@172.16.0.3:3128/",
"httpsProxy": "https://dummy:dummy@172.16.0.3:3130/",
"noProxy": ".cluster.local,.svc,10.128.0.0/14,127.0.0.1,169.254.169.254,172.16.0.0/24,172.30.0.0/16,api-int.ostest.shiftstack.com,etcd-0.ostest.shiftstack.com,etcd-1.ostest.shiftstack.com,etcd-2.ostest.shiftstack.com,localhost"
}
Due to a known limitation, the manila-csi-driver-operator cannot reach the OSP endpoints, because its pod does not have the proxy environment variables set:
$ oc rsh -n openshift-cluster-csi-drivers $(oc get pods -n openshift-cluster-csi-drivers -l name=manila-csi-driver-operator -o name)
sh-4.4$ env | grep -i http
SOURCE_GIT_URL=https://github.com/openshift/csi-driver-manila-operator
KUBERNETES_SERVICE_PORT_HTTPS=443
sh-4.4$ env | grep -i proxy
sh-4.4$
# For example, the OSP Keystone service is unreachable from inside the operator pod:
sh-4.4$ curl --connect-timeout 5 --proxy-cacert /etc/openstack-ca/ca-bundle.pem --cacert /etc/openstack-ca/ca-bundle.pem https://10.46.44.10:13000
curl: (7) Failed to connect to 10.46.44.10 port 13000: No route to host
Despite the above, the UPI installation works fine and all cluster operators are available:
$ oc get clusteroperators
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
authentication 4.6.0-0.nightly-2021-09-16-160553 True False False 20m
cloud-credential 4.6.0-0.nightly-2021-09-16-160553 True False False 63m
cluster-autoscaler 4.6.0-0.nightly-2021-09-16-160553 True False False 51m
config-operator 4.6.0-0.nightly-2021-09-16-160553 True False False 53m
console 4.6.0-0.nightly-2021-09-16-160553 True False False 26m
csi-snapshot-controller 4.6.0-0.nightly-2021-09-16-160553 True False False 53m
dns 4.6.0-0.nightly-2021-09-16-160553 True False False 51m
etcd 4.6.0-0.nightly-2021-09-16-160553 True False False 52m
image-registry 4.6.0-0.nightly-2021-09-16-160553 True False False 31m
ingress 4.6.0-0.nightly-2021-09-16-160553 True False False 31m
insights 4.6.0-0.nightly-2021-09-16-160553 True False False 53m
kube-apiserver 4.6.0-0.nightly-2021-09-16-160553 True False False 51m
kube-controller-manager 4.6.0-0.nightly-2021-09-16-160553 True False False 50m
kube-scheduler 4.6.0-0.nightly-2021-09-16-160553 True False False 49m
kube-storage-version-migrator 4.6.0-0.nightly-2021-09-16-160553 True False False 31m
machine-api 4.6.0-0.nightly-2021-09-16-160553 True False False 42m
machine-approver 4.6.0-0.nightly-2021-09-16-160553 True False False 51m
machine-config 4.6.0-0.nightly-2021-09-16-160553 True False False 51m
marketplace 4.6.0-0.nightly-2021-09-16-160553 True False False 51m
monitoring 4.6.0-0.nightly-2021-09-16-160553 True False False 29m
network 4.6.0-0.nightly-2021-09-16-160553 True False False 54m
node-tuning 4.6.0-0.nightly-2021-09-16-160553 True False False 53m
openshift-apiserver 4.6.0-0.nightly-2021-09-16-160553 True False False 46m
openshift-controller-manager 4.6.0-0.nightly-2021-09-16-160553 True False False 44m
openshift-samples 4.6.0-0.nightly-2021-09-16-160553 True False False 47m
operator-lifecycle-manager 4.6.0-0.nightly-2021-09-16-160553 True False False 51m
operator-lifecycle-manager-catalog 4.6.0-0.nightly-2021-09-16-160553 True False False 51m
operator-lifecycle-manager-packageserver 4.6.0-0.nightly-2021-09-16-160553 True False False 31m
service-ca 4.6.0-0.nightly-2021-09-16-160553 True False False 53m
storage 4.6.0-0.nightly-2021-09-16-160553 True False False 53m
Manila is not deployed, as reported by the storage cluster operator:
$ oc get clusteroperator storage -o json | jq '.status.conditions[] | select(.type=="Available")'
{
"lastTransitionTime": "2021-09-17T08:23:46Z",
"message": "ManilaCSIDriverOperatorCRAvailable: CSI driver for Manila is disabled: Unable to retrieve Manila share types: cannot authenticate with given credentials: Get \"https://10.46.44.10:13000/\": dial tcp 10.46.44.10:13000: connect: no route to host",
"reason": "AsExpected",
"status": "True",
"type": "Available"
}
$ oc get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
standard (default) kubernetes.io/cinder Delete WaitForFirstConsumer true 54m
$ oc get pods -A | grep -i manila
openshift-cluster-csi-drivers manila-csi-driver-operator-6c496dcb95-94stj 1/1 Running 1 54m
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.6.0-0.nightly-2021-09-16-160553 True False 19m Cluster version is 4.6.0-0.nightly-2021-09-16-160553
The cluster is fully operational:
$ oc get pods,pvc
NAME READY STATUS RESTARTS AGE
pod/app2 1/1 Running 0 5m50s
pod/demo-7897db69cc-67lvg 1/1 Running 0 31m
pod/demo-7897db69cc-7xsgl 1/1 Running 0 31m
pod/demo-7897db69cc-s2hk5 1/1 Running 0 31m
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
persistentvolumeclaim/pvc2 Bound pvc-8d54d363-7d32-4cca-aaa7-02f80a611935 1Gi RWO sc-test-intree 14m
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6.46 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2021:3643