Bug 2002555

Summary: Cluster becomes degraded if it can't talk to Manila
Product: OpenShift Container Platform Reporter: Martin André <m.andre>
Component: Storage Assignee: Eric Duen <eduen>
Storage sub component: OpenStack CSI Drivers QA Contact: rlobillo
Status: CLOSED ERRATA Docs Contact:
Severity: urgent    
Priority: urgent CC: aos-bugs, eduen, juriarte, openshift-bugzilla-robot, rlobillo
Version: 4.6 Keywords: FastFix, Triaged
Target Milestone: ---   
Target Release: 4.7.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 2002554
: 2002556 (view as bug list) Environment:
Last Closed: 2021-09-29 14:10:09 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2002554    
Bug Blocks: 2002556    

Comment 2 rlobillo 2021-09-15 14:08:15 UTC
Pre-verified on 4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest (cluster-bot build for build openshift/csi-driver-manila-operator#126)
on top of OSP16.1 (RHOS-16.1-RHEL-8-20210818.n.0) with the OpenShiftSDN network type.


The UPI installation, performed on a restricted network with a proxy, finished successfully even though the SG rules on the proxy instance block egress traffic to the OSP Manila endpoint:

$ openstack catalog show manila | grep public
|           |   public: https://10.46.44.10:13786/v1/1ebc41dabb5e4e9bae86a22bb4ffcb40 |


# Egress rules on the instance where the proxy is running:
$ openstack security group rule list --egress installer_host-sg
+--------------------------------------+-------------+-----------+-----------+-------------+-----------------------+
| ID                                   | IP Protocol | Ethertype | IP Range  | Port Range  | Remote Security Group |
+--------------------------------------+-------------+-----------+-----------+-------------+-----------------------+
| 016e5030-bca6-402d-8cfa-e4b7271ba9ec | None        | IPv6      | ::/0      |             | None                  |
| 1d4be39b-8236-4968-8624-4458a82da619 | tcp         | IPv4      | 0.0.0.0/0 | 13787:65000 | None                  |
| 9b8dbd27-299f-420e-82f2-f46e35d938be | udp         | IPv4      | 0.0.0.0/0 |             | None                  |
| dceae5ee-38fb-44b0-824b-9f4975c2ce05 | tcp         | IPv4      | 0.0.0.0/0 | 1:13785     | None                  |
+--------------------------------------+-------------+-----------+-----------+-------------+-----------------------+
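The two TCP rules above leave a one-port gap at 13786, which is the port of the public Manila endpoint. A minimal Python sketch of that check, assuming only the IPv4 TCP ranges from the rule list matter for this traffic (the function name `tcp_egress_allowed` is illustrative, not part of any OpenStack tooling):

```python
# IPv4 TCP egress port ranges copied from the security group rule list above.
TCP_EGRESS_RANGES = [(13787, 65000), (1, 13785)]


def tcp_egress_allowed(port, ranges=TCP_EGRESS_RANGES):
    """Return True if `port` falls inside any allowed (low, high) range."""
    return any(low <= port <= high for low, high in ranges)


# 13786 is the Manila endpoint port, 13000 is the Keystone endpoint port.
for name, port in (("manila", 13786), ("keystone", 13000)):
    state = "allowed" if tcp_egress_allowed(port) else "blocked"
    print(f"{name} ({port}): {state}")
```

With these ranges, only port 13786 is rejected, so the proxy can reach Keystone but not Manila.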

$ oc get proxy cluster  -o json | jq .status
{
  "httpProxy": "http://dummy:dummy@172.16.0.3:3128/",
  "httpsProxy": "http://dummy:dummy@172.16.0.3:3128/",
  "noProxy": ".cluster.local,.svc,10.128.0.0/14,127.0.0.1,169.254.169.254,172.16.0.0/24,172.30.0.0/16,api-int.ostest.shiftstack.com,localhost"
}
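The Manila endpoint address (10.46.44.10) is not covered by any `noProxy` entry, so requests to it go through the proxy and are subject to the egress rules on the proxy instance. The sketch below is a simplified approximation of `noProxy` matching (exact IPs, CIDR ranges, and domain suffixes); real HTTP clients may differ in detail, and `bypasses_proxy` is a hypothetical helper written for illustration:

```python
import ipaddress

# noProxy value copied from the proxy status above.
NO_PROXY = (
    ".cluster.local,.svc,10.128.0.0/14,127.0.0.1,169.254.169.254,"
    "172.16.0.0/24,172.30.0.0/16,api-int.ostest.shiftstack.com,localhost"
)


def bypasses_proxy(host, no_proxy=NO_PROXY):
    """Return True if `host` matches a noProxy entry (simplified rules)."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        addr = None  # host is a DNS name, not an IP literal

    for entry in (e.strip() for e in no_proxy.split(",")):
        if not entry:
            continue
        if addr is not None:
            try:
                # A bare IP parses as a single-address network.
                if addr in ipaddress.ip_network(entry, strict=False):
                    return True
            except ValueError:
                pass  # entry is a host name, cannot match an IP literal
        elif entry.startswith("."):
            if host.endswith(entry):
                return True
        elif host == entry or host.endswith("." + entry):
            return True
    return False


print(bypasses_proxy("10.46.44.10"))  # Manila and Keystone endpoint address
print(bypasses_proxy("172.30.0.1"))   # inside 172.30.0.0/16
```

Under these assumptions, both the Manila and Keystone endpoints are proxied, which is consistent with only the blocked port making the difference.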


As a result, the manila-csi-driver-operator times out when reaching the Manila API, while other endpoints remain reachable (tested with Keystone):

$ oc rsh -n openshift-cluster-csi-drivers $(oc get pods -n openshift-cluster-csi-drivers -l name=manila-csi-driver-operator -o name)

sh-4.4$ curl --connect-timeout 5 --proxy-cacert /etc/openstack-ca/ca-bundle.pem --cacert /etc/openstack-ca/ca-bundle.pem https://10.46.44.10:13786/v1/1ebc41dabb5e4e9bae86a22bb4ffcb40
curl: (28) Operation timed out after 5000 milliseconds with 0 out of 0 bytes received

sh-4.4$ curl --connect-timeout 5 --proxy-cacert /etc/openstack-ca/ca-bundle.pem --cacert /etc/openstack-ca/ca-bundle.pem https://10.46.44.10:13000                                    
{"versions": {"values": [{"id": "v3.13", "status": "stable", "updated": "2019-07-19T00:00:00Z", "links": [{"rel": "self", "href": "https://10.46.44.10:13000/v3/"}], "media-types": [{"base": "application/json", "type": "application/vnd.openstack.identity-v3+json"}]}]}}
sh-4.4$


Under these circumstances, the UPI installation works fine and all cluster operators are available:

$ oc get clusteroperators
NAME                                       VERSION                                                  AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      81s
baremetal                                  4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      43m
cloud-credential                           4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      74m
cluster-autoscaler                         4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      50m
config-operator                            4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      57m
console                                    4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      7m26s
csi-snapshot-controller                    4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      45m
dns                                        4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      45m
etcd                                       4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      51m
image-registry                             4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      13m
ingress                                    4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      11m
insights                                   4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      46m
kube-apiserver                             4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      49m
kube-controller-manager                    4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      49m
kube-scheduler                             4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      49m
kube-storage-version-migrator              4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      11m
machine-api                                4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      42m
machine-approver                           4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      51m
machine-config                             4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      49m
marketplace                                4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      49m
monitoring                                 4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      9m24s
network                                    4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      53m
node-tuning                                4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      49m
openshift-apiserver                        4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      45m
openshift-controller-manager               4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      44m
openshift-samples                          4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      44m
operator-lifecycle-manager                 4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      50m
operator-lifecycle-manager-catalog         4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      50m
operator-lifecycle-manager-packageserver   4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      47m
service-ca                                 4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      57m
storage                                    4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      49m
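The same health check can be scripted instead of read by eye. This is a minimal sketch that parses the text layout of the table above, assuming the six whitespace-separated columns shown there; the sample embeds only three of the rows, and `unhealthy_operators` is a hypothetical helper:

```python
# Header and three rows copied from the `oc get clusteroperators` output above.
SAMPLE = """\
NAME                                       VERSION                                                  AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      81s
console                                    4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      7m26s
storage                                    4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      49m
"""


def unhealthy_operators(table):
    """Return names of operators that are not Available, or are Progressing or Degraded."""
    rows = [line.split() for line in table.splitlines()[1:] if line.strip()]
    return [
        name
        for name, _version, available, progressing, degraded, _since in rows
        if (available, progressing, degraded) != ("True", "False", "False")
    ]


print(unhealthy_operators(SAMPLE))
```

An empty list means every listed operator is available and neither progressing nor degraded.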


Manila is not deployed, as reported by the storage cluster operator:

$ oc get clusteroperator storage -o json | jq '.status.conditions[] | select(.type=="Available")'
{
  "lastTransitionTime": "2021-09-15T13:12:23Z",
  "message": "ManilaCSIDriverOperatorCRAvailable: CSI driver for Manila is disabled: Unable to retrieve Manila share types: cannot list available share types: Get \"https://10.46.44.10:13786/v2/1ebc41dabb5e4e9bae86a22bb4ffcb40/types\": Service Unavailable\nOpenStackCinderCSIDriverOperatorCRAvailable: All is well",
  "reason": "AsExpected",
  "status": "True",
  "type": "Available"
}
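The condition message packs one line per operand, each prefixed with the operand name. A short sketch of splitting it apart, assuming the `Name: text` layout seen in this particular message (the helper `split_operand_messages` is illustrative and not part of any OpenShift tooling):

```python
import json

# Condition copied from the `jq` output above; the raw string keeps the JSON escapes intact.
CONDITION_JSON = r'''{
  "lastTransitionTime": "2021-09-15T13:12:23Z",
  "message": "ManilaCSIDriverOperatorCRAvailable: CSI driver for Manila is disabled: Unable to retrieve Manila share types: cannot list available share types: Get \"https://10.46.44.10:13786/v2/1ebc41dabb5e4e9bae86a22bb4ffcb40/types\": Service Unavailable\nOpenStackCinderCSIDriverOperatorCRAvailable: All is well",
  "reason": "AsExpected",
  "status": "True",
  "type": "Available"
}'''


def split_operand_messages(message):
    """Map each operand name to its text, splitting on the first ': ' of every line."""
    result = {}
    for line in message.splitlines():
        name, _, text = line.partition(": ")
        result[name] = text
    return result


condition = json.loads(CONDITION_JSON)
operands = split_operand_messages(condition["message"])

# The operator stays Available while reporting the Manila driver as disabled.
print(condition["status"])
for name, text in operands.items():
    print(f"{name}: {text}")
```

This makes the verified behavior explicit: the `Available` condition remains `True`, and the Manila failure is surfaced only as a message rather than degrading the cluster.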

$ oc get sc
NAME                 PROVISIONER                RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
standard (default)   kubernetes.io/cinder       Delete          WaitForFirstConsumer   true                   47m
standard-csi         cinder.csi.openstack.org   Delete          WaitForFirstConsumer   true                   43m

$ oc get pods -A | grep -i manila
openshift-cluster-csi-drivers                      manila-csi-driver-operator-9b79c5846-xf4fs                1/1     Running             1          44m

$ oc get clusterversion
NAME      VERSION                                                  AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         13s     Cluster version is 4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest

Comment 7 errata-xmlrpc 2021-09-29 14:10:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.7.32 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3636