Bug 2002556

Summary: Cluster becomes degraded if it can't talk to Manila
Product: OpenShift Container Platform
Reporter: Martin André <m.andre>
Component: Storage
Assignee: Eric Duen <eduen>
Storage sub component: OpenStack CSI Drivers
QA Contact: rlobillo
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
CC: adeshpan, aos-bugs, eduen, juriarte, openshift-bugzilla-robot, pprinett, rlobillo
Version: 4.6
Keywords: FastFix, Triaged
Target Release: 4.6.z
Hardware: Unspecified
OS: Unspecified
Clone Of: 2002555
Last Closed: 2021-09-29 12:06:50 UTC
Bug Depends On: 2002555

Comment 1 Martin André 2021-09-15 09:28:32 UTC
*** Bug 2000097 has been marked as a duplicate of this bug. ***

Comment 2 rlobillo 2021-09-15 16:04:50 UTC
Pre-verified on 4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest (cluster-bot build for openshift/csi-driver-manila-operator#127)
on top of OSP16.1 (RHOS-16.1-RHEL-8-20210818.n.0) with the OpenShiftSDN network type.


The UPI installation, performed on a restricted network with a proxy, finished successfully when the security group (SG) rules on the proxy instance allow all egress traffic.

# Egress rules on the instance where the proxy is running:
$ openstack security group rule list --egress installer_host-sg
+--------------------------------------+-------------+-----------+-----------+------------+-----------------------+
| ID                                   | IP Protocol | Ethertype | IP Range  | Port Range | Remote Security Group |
+--------------------------------------+-------------+-----------+-----------+------------+-----------------------+
| 016e5030-bca6-402d-8cfa-e4b7271ba9ec | None        | IPv6      | ::/0      |            | None                  |
| 06a10b42-a8f3-4227-b294-bbc5fe6775ca | None        | IPv4      | 0.0.0.0/0 |            | None                  |
+--------------------------------------+-------------+-----------+-----------+------------+-----------------------+


$ oc get proxy cluster  -o json | jq .status
{
  "httpProxy": "http://dummy:dummy@172.16.0.3:3128/",
  "httpsProxy": "http://dummy:dummy@172.16.0.3:3128/",
  "noProxy": ".cluster.local,.svc,10.128.0.0/14,127.0.0.1,169.254.169.254,172.16.0.0/24,172.30.0.0/16,api-int.ostest.shiftstack.com,etcd-0.ostest.shiftstack.com,etcd-1.ostest.shiftstack.com,etcd-2.ostest.shiftstack.com,localhost"
}

Due to a known limitation, the manila-csi-driver-operator cannot reach the OSP endpoints, because its container does not have the proxy environment variables set:

$ oc rsh -n openshift-cluster-csi-drivers $(oc get pods -n openshift-cluster-csi-drivers -l name=manila-csi-driver-operator -o name)
sh-4.4$ env | grep -i http
KUBERNETES_SERVICE_PORT_HTTPS=443
sh-4.4$ env | grep -i proxy
sh-4.4$ 
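The interactive rsh check above can also be run non-interactively. A minimal sketch follows; has_proxy_env is a hypothetical helper (not part of any OpenShift tooling) that reads env-style KEY=VALUE lines on stdin and succeeds only if HTTP_PROXY, HTTPS_PROXY or NO_PROXY is set, in upper or lower case. The usage lines assume oc exec accepts the pod/<name> form returned by -o name, as oc rsh does above.

```shell
# has_proxy_env (hypothetical helper): reads KEY=VALUE lines on stdin and
# returns 0 if any of HTTP_PROXY, HTTPS_PROXY or NO_PROXY is defined
# (matched case-insensitively at the start of a line), 1 otherwise.
has_proxy_env() {
    grep -qiE '^(https?_proxy|no_proxy)='
}

# Usage against a live cluster, reusing the pod lookup from the rsh command above:
#   pod=$(oc get pods -n openshift-cluster-csi-drivers -l name=manila-csi-driver-operator -o name)
#   oc exec -n openshift-cluster-csi-drivers "$pod" -- env | has_proxy_env \
#     || echo "manila-csi-driver-operator has no proxy variables"
```

Variables that merely contain "https" in their value or name, such as KUBERNETES_SERVICE_PORT_HTTPS, are deliberately not matched.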

# For example, the OSP Keystone service is not reachable from inside the operator pod:

sh-4.4$ curl --connect-timeout 5 --proxy-cacert /etc/openstack-ca/ca-bundle.pem --cacert /etc/openstack-ca/ca-bundle.pem https://10.46.44.10:13000
curl: (7) Failed to connect to 10.46.44.10 port 13000: No route to host
sh-4.4$ 


Despite the above, the UPI installation succeeds and all cluster operators are Available, with none Degraded:

$ oc get clusteroperators
NAME                                       VERSION                                                  AVAILABLE   PROGRESSING   DEGRADED   SINCE                                               
authentication                             4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      8m8s                                                
cloud-credential                           4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      83m                                                 
cluster-autoscaler                         4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      62m                                                 
config-operator                            4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      67m                                                 
console                                    4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      14m                                                 
csi-snapshot-controller                    4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      62m                                                 
dns                                        4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      62m                                                 
etcd                                       4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      65m                                                 
image-registry                             4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      25m                                                 
ingress                                    4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      24m                                                 
insights                                   4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      67m                                                 
kube-apiserver                             4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      61m                                                 
kube-controller-manager                    4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      59m                                                 
kube-scheduler                             4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      59m                                                 
kube-storage-version-migrator              4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      25m                                                 
machine-api                                4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      58m                                                 
machine-approver                           4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      63m                                                 
machine-config                             4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      60m                                                 
marketplace                                4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      62m                                                 
monitoring                                 4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      20m                                                 
network                                    4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      69m                                                 
node-tuning                                4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      62m                                                 
openshift-apiserver                        4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      59m                                                 
openshift-controller-manager               4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      63m                                                 
openshift-samples                          4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      59m                                                 
operator-lifecycle-manager                 4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      62m                                                 
operator-lifecycle-manager-catalog         4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      62m                                                 
operator-lifecycle-manager-packageserver   4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      59m                                                 
service-ca                                 4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      67m                                                 
storage                                    4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         False      66m    
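Since this bug is about the cluster going Degraded, the check can be scripted rather than read off the table. A minimal sketch: unhealthy_operators is a hypothetical helper that reads the default oc get clusteroperators table on stdin and prints every operator whose AVAILABLE column is not True or whose DEGRADED column is not False. It assumes the 4.6 column order shown above (NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED, SINCE) and a non-empty VERSION column; parsing -o json with jq would be more robust.

```shell
# unhealthy_operators (hypothetical helper): reads `oc get clusteroperators`
# table output on stdin, skips the header row, and prints the NAME of every
# operator that is not AVAILABLE=True or not DEGRADED=False.
# Assumed columns: NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE.
unhealthy_operators() {
    awk 'NR > 1 && ($3 != "True" || $5 != "False") { print $1 }'
}

# Usage against a live cluster:
#   bad=$(oc get clusteroperators | unhealthy_operators)
#   [ -z "$bad" ] && echo "all cluster operators are Available and not Degraded"
```

If an operator has no VERSION yet, awk's whitespace splitting shifts the columns left and that operator is reported as unhealthy, which errs on the side of flagging it.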

The Manila CSI driver is not deployed, as reported by the storage cluster operator, which stays Available (reason AsExpected) instead of going Degraded:

$ oc get clusteroperator storage -o json | jq '.status.conditions[] | select(.type=="Available")'
{
  "lastTransitionTime": "2021-09-15T14:51:09Z",
  "message": "ManilaCSIDriverOperatorCRAvailable: CSI driver for Manila is disabled: Unable to retrieve Manila share types: cannot authenticate with given credentials: Get \"https://10.46.44.10:13000/\": dial tcp 10.46.44.10:13000: connect: no route to host",
  "reason": "AsExpected",
  "status": "True",
  "type": "Available"
}
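The same Available message can also be read without jq, using the jsonpath output format of oc, and matched against the "disabled" text shown above. A minimal sketch; manila_csi_disabled is a hypothetical helper, and relying on the human-readable status message is an assumption that may break if the operator rewords it.

```shell
# manila_csi_disabled (hypothetical helper): returns 0 if the given storage
# operator condition message says the Manila CSI driver is disabled, 1 otherwise.
manila_csi_disabled() {
    case "$1" in
        *"CSI driver for Manila is disabled"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage against a live cluster:
#   msg=$(oc get clusteroperator storage \
#     -o jsonpath='{.status.conditions[?(@.type=="Available")].message}')
#   manila_csi_disabled "$msg" && echo "Manila CSI driver disabled; storage operator still Available"
```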

$ oc get sc
NAME                 PROVISIONER            RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
standard (default)   kubernetes.io/cinder   Delete          WaitForFirstConsumer   true                   53m

$ oc get pods -A | grep -i manila
openshift-cluster-csi-drivers                      manila-csi-driver-operator-7d7c4b7b89-mzmxs               1/1     Running     2          53m

$ oc get clusterversion
NAME      VERSION                                                  AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest   True        False         7m59s   Cluster version is 4.6.0-0.ci.test-2021-09-15-140334-ci-ln-3jvdc2t-latest

The cluster is fully operational; a test pod is running and its PVC is Bound:

$ oc get pods,pvc -n demo
NAME       READY   STATUS    RESTARTS   AGE
pod/app2   1/1     Running   0          80s

NAME                         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS     AGE
persistentvolumeclaim/pvc2   Bound    pvc-71156835-446a-4f17-84d3-a789f738335a   1Gi        RWO            sc-test-intree   80s

Comment 4 rlobillo 2021-09-17 09:37:14 UTC
Verified on 4.6.0-0.nightly-2021-09-16-160553 on top of OSP16.1 (RHOS-16.1-RHEL-8-20210818.n.0) with the OpenShiftSDN network type.

The UPI installation, performed on a restricted network with a proxy, finished successfully when the security group (SG) rules on the proxy instance allow all egress traffic.

# Egress rules on the instance where the proxy is running:
$ openstack security group rule list --egress installer_host-sg
+--------------------------------------+-------------+-----------+-----------+------------+-----------------------+
| ID                                   | IP Protocol | Ethertype | IP Range  | Port Range | Remote Security Group |
+--------------------------------------+-------------+-----------+-----------+------------+-----------------------+
| 016e5030-bca6-402d-8cfa-e4b7271ba9ec | None        | IPv6      | ::/0      |            | None                  |
| 06a10b42-a8f3-4227-b294-bbc5fe6775ca | None        | IPv4      | 0.0.0.0/0 |            | None                  |
+--------------------------------------+-------------+-----------+-----------+------------+-----------------------+


$ oc get proxy cluster -o json | jq .status
{
  "httpProxy": "http://dummy:dummy@172.16.0.3:3128/",
  "httpsProxy": "https://dummy:dummy@172.16.0.3:3130/",
  "noProxy": ".cluster.local,.svc,10.128.0.0/14,127.0.0.1,169.254.169.254,172.16.0.0/24,172.30.0.0/16,api-int.ostest.shiftstack.com,etcd-0.ostest.shiftstack.com,etcd-1.ostest.shiftstack.com,etcd-2.ostest.shiftstack.com,localhost"
}


Due to a known limitation, the manila-csi-driver-operator cannot reach the OSP endpoints, because its container does not have the proxy environment variables set:

$ oc rsh -n openshift-cluster-csi-drivers $(oc get pods -n openshift-cluster-csi-drivers -l name=manila-csi-driver-operator -o name)
sh-4.4$ env | grep -i http
SOURCE_GIT_URL=https://github.com/openshift/csi-driver-manila-operator
KUBERNETES_SERVICE_PORT_HTTPS=443
sh-4.4$ env | grep -i proxy
sh-4.4$ 

# For example, the OSP Keystone service is not reachable from inside the operator pod:

sh-4.4$ curl --connect-timeout 5 --proxy-cacert /etc/openstack-ca/ca-bundle.pem --cacert /etc/openstack-ca/ca-bundle.pem https://10.46.44.10:13000
curl: (7) Failed to connect to 10.46.44.10 port 13000: No route to host


Despite the above, the UPI installation succeeds and all cluster operators are Available, with none Degraded:

$ oc get clusteroperators
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.6.0-0.nightly-2021-09-16-160553   True        False         False      20m
cloud-credential                           4.6.0-0.nightly-2021-09-16-160553   True        False         False      63m
cluster-autoscaler                         4.6.0-0.nightly-2021-09-16-160553   True        False         False      51m
config-operator                            4.6.0-0.nightly-2021-09-16-160553   True        False         False      53m
console                                    4.6.0-0.nightly-2021-09-16-160553   True        False         False      26m
csi-snapshot-controller                    4.6.0-0.nightly-2021-09-16-160553   True        False         False      53m
dns                                        4.6.0-0.nightly-2021-09-16-160553   True        False         False      51m
etcd                                       4.6.0-0.nightly-2021-09-16-160553   True        False         False      52m
image-registry                             4.6.0-0.nightly-2021-09-16-160553   True        False         False      31m
ingress                                    4.6.0-0.nightly-2021-09-16-160553   True        False         False      31m
insights                                   4.6.0-0.nightly-2021-09-16-160553   True        False         False      53m
kube-apiserver                             4.6.0-0.nightly-2021-09-16-160553   True        False         False      51m
kube-controller-manager                    4.6.0-0.nightly-2021-09-16-160553   True        False         False      50m
kube-scheduler                             4.6.0-0.nightly-2021-09-16-160553   True        False         False      49m
kube-storage-version-migrator              4.6.0-0.nightly-2021-09-16-160553   True        False         False      31m
machine-api                                4.6.0-0.nightly-2021-09-16-160553   True        False         False      42m
machine-approver                           4.6.0-0.nightly-2021-09-16-160553   True        False         False      51m
machine-config                             4.6.0-0.nightly-2021-09-16-160553   True        False         False      51m
marketplace                                4.6.0-0.nightly-2021-09-16-160553   True        False         False      51m
monitoring                                 4.6.0-0.nightly-2021-09-16-160553   True        False         False      29m
network                                    4.6.0-0.nightly-2021-09-16-160553   True        False         False      54m
node-tuning                                4.6.0-0.nightly-2021-09-16-160553   True        False         False      53m
openshift-apiserver                        4.6.0-0.nightly-2021-09-16-160553   True        False         False      46m
openshift-controller-manager               4.6.0-0.nightly-2021-09-16-160553   True        False         False      44m
openshift-samples                          4.6.0-0.nightly-2021-09-16-160553   True        False         False      47m
operator-lifecycle-manager                 4.6.0-0.nightly-2021-09-16-160553   True        False         False      51m
operator-lifecycle-manager-catalog         4.6.0-0.nightly-2021-09-16-160553   True        False         False      51m
operator-lifecycle-manager-packageserver   4.6.0-0.nightly-2021-09-16-160553   True        False         False      31m
service-ca                                 4.6.0-0.nightly-2021-09-16-160553   True        False         False      53m
storage                                    4.6.0-0.nightly-2021-09-16-160553   True        False         False      53m

The Manila CSI driver is not deployed, as reported by the storage cluster operator, which stays Available (reason AsExpected) instead of going Degraded:

$ oc get clusteroperator storage -o json | jq '.status.conditions[] | select(.type=="Available")'
{
  "lastTransitionTime": "2021-09-17T08:23:46Z",
  "message": "ManilaCSIDriverOperatorCRAvailable: CSI driver for Manila is disabled: Unable to retrieve Manila share types: cannot authenticate with given credentials: Get \"https://10.46.44.10:13000/\": dial tcp 10.46.44.10:13000: connect: no route to host",
  "reason": "AsExpected",
  "status": "True",
  "type": "Available"
}


$ oc get sc
NAME                 PROVISIONER            RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
standard (default)   kubernetes.io/cinder   Delete          WaitForFirstConsumer   true                   54m

$ oc get pods -A | grep -i manila
openshift-cluster-csi-drivers                      manila-csi-driver-operator-6c496dcb95-94stj               1/1     Running             1          54m

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2021-09-16-160553   True        False         19m     Cluster version is 4.6.0-0.nightly-2021-09-16-160553


The cluster is fully operational; the test pods are running and the PVC is Bound:

$ oc get pods,pvc
NAME                        READY   STATUS    RESTARTS   AGE
pod/app2                    1/1     Running   0          5m50s
pod/demo-7897db69cc-67lvg   1/1     Running   0          31m
pod/demo-7897db69cc-7xsgl   1/1     Running   0          31m
pod/demo-7897db69cc-s2hk5   1/1     Running   0          31m

NAME                         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS     AGE
persistentvolumeclaim/pvc2   Bound    pvc-8d54d363-7d32-4cca-aaa7-02f80a611935   1Gi        RWO            sc-test-intree   14m

Comment 7 errata-xmlrpc 2021-09-29 12:06:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.46 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3643