Bug 2002555 - Cluster becomes degraded if it can't talk to Manila
Summary: Cluster becomes degraded if it can't talk to Manila
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.7.z
Assignee: Eric Duen
QA Contact: rlobillo
URL:
Whiteboard:
Depends On: 2002554
Blocks: 2002556
Reported: 2021-09-09 08:28 UTC by Martin André
Modified: 2021-09-29 14:10 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2002554
Clones: 2002556
Environment:
Last Closed: 2021-09-29 14:10:09 UTC
Target Upstream Version:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift csi-driver-manila-operator pull 126 | 0 | None | open | [release-4.7] Bug 2002555: Do not degrade cluster on failure to reach Manila | 2021-09-16 08:43:54 UTC
Red Hat Product Errata | RHBA-2021:3636 | 0 | None | None | None | 2021-09-29 14:10:32 UTC

Comment 2 rlobillo 2021-09-15 14:08:15 UTC
Pre-verified on 4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest (cluster-bot build for build openshift/csi-driver-manila-operator#126)
on top of OSP16.1 (RHOS-16.1-RHEL-8-20210818.n.0) with the OpenShiftSDN network type.


A UPI installation on a restricted network with a proxy finished successfully while the SG rules on the proxy instance were blocking egress traffic to the OSP Manila endpoint:

$ openstack catalog show manila | grep public
|           |   public: https://10.46.44.10:13786/v1/1ebc41dabb5e4e9bae86a22bb4ffcb40 |


# Egress rules on the instance where the proxy is running:
$ openstack security group rule list --egress installer_host-sg
+--------------------------------------+-------------+-----------+-----------+-------------+-----------------------+
| ID                                   | IP Protocol | Ethertype | IP Range  | Port Range  | Remote Security Group |
+--------------------------------------+-------------+-----------+-----------+-------------+-----------------------+
| 016e5030-bca6-402d-8cfa-e4b7271ba9ec | None        | IPv6      | ::/0      |             | None                  |
| 1d4be39b-8236-4968-8624-4458a82da619 | tcp         | IPv4      | 0.0.0.0/0 | 13787:65000 | None                  |
| 9b8dbd27-299f-420e-82f2-f46e35d938be | udp         | IPv4      | 0.0.0.0/0 |             | None                  |
| dceae5ee-38fb-44b0-824b-9f4975c2ce05 | tcp         | IPv4      | 0.0.0.0/0 | 1:13785     | None                  |
+--------------------------------------+-------------+-----------+-----------+-------------+-----------------------+
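The two IPv4 TCP rules in the listing allow ports 1-13785 and 13787-65000, so the only TCP port left without an egress rule is 13786, the one used by the Manila public endpoint. A minimal sketch of that check, with the ranges hard-coded from the listing (the variable and function names are hypothetical, not part of any OpenStack tooling):

```shell
# TCP egress port ranges copied from the security group listing above.
TCP_EGRESS_RANGES="1:13785 13787:65000"

# Hypothetical helper: succeeds if the given TCP port falls inside
# any of the allowed ranges, fails otherwise.
tcp_egress_allowed() {
  port=$1
  for range in $TCP_EGRESS_RANGES; do
    lo=${range%%:*}   # text before the colon
    hi=${range##*:}   # text after the colon
    if [ "$port" -ge "$lo" ] && [ "$port" -le "$hi" ]; then
      return 0
    fi
  done
  return 1
}

# 13000 is the Keystone port and 13786 the Manila port in this environment.
for p in 13000 13786; do
  if tcp_egress_allowed "$p"; then
    echo "tcp/$p: allowed"
  else
    echo "tcp/$p: blocked"
  fi
done
```

With these ranges the loop reports tcp/13000 as allowed and tcp/13786 as blocked.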

$ oc get proxy cluster  -o json | jq .status
{
  "httpProxy": "http://dummy:dummy@172.16.0.3:3128/",
  "httpsProxy": "http://dummy:dummy@172.16.0.3:3128/",
  "noProxy": ".cluster.local,.svc,10.128.0.0/14,127.0.0.1,169.254.169.254,172.16.0.0/24,172.30.0.0/16,api-int.ostest.shiftstack.com,localhost"
}


As a result, the manila-csi-driver-operator times out when reaching the Manila API, while other OpenStack endpoints remain reachable (tested with Keystone):

$ oc rsh -n openshift-cluster-csi-drivers $(oc get pods -n openshift-cluster-csi-drivers -l name=manila-csi-driver-operator -o name)

sh-4.4$ curl --connect-timeout 5 --proxy-cacert /etc/openstack-ca/ca-bundle.pem --cacert /etc/openstack-ca/ca-bundle.pem https://10.46.44.10:13786/v1/1ebc41dabb5e4e9bae86a22bb4ffcb40
curl: (28) Operation timed out after 5000 milliseconds with 0 out of 0 bytes received

sh-4.4$ curl --connect-timeout 5 --proxy-cacert /etc/openstack-ca/ca-bundle.pem --cacert /etc/openstack-ca/ca-bundle.pem https://10.46.44.10:13000
{"versions": {"values": [{"id": "v3.13", "status": "stable", "updated": "2019-07-19T00:00:00Z", "links": [{"rel": "self", "href": "https://10.46.44.10:13000/v3/"}], "media-types": [{"base": "application/json", "type": "application/vnd.openstack.identity-v3+json"}]}]}}


Under these circumstances, the UPI installation works fine and all cluster operators are available:

$ oc get clusteroperators
NAME                                       VERSION                                                  AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      81s
baremetal                                  4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      43m
cloud-credential                           4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      74m
cluster-autoscaler                         4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      50m
config-operator                            4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      57m
console                                    4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      7m26s
csi-snapshot-controller                    4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      45m
dns                                        4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      45m
etcd                                       4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      51m
image-registry                             4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      13m
ingress                                    4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      11m
insights                                   4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      46m
kube-apiserver                             4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      49m
kube-controller-manager                    4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      49m
kube-scheduler                             4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      49m
kube-storage-version-migrator              4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      11m
machine-api                                4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      42m
machine-approver                           4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      51m
machine-config                             4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      49m
marketplace                                4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      49m
monitoring                                 4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      9m24s
network                                    4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      53m
node-tuning                                4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      49m
openshift-apiserver                        4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      45m
openshift-controller-manager               4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      44m
openshift-samples                          4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      44m
operator-lifecycle-manager                 4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      50m
operator-lifecycle-manager-catalog         4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      50m
operator-lifecycle-manager-packageserver   4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      47m
service-ca                                 4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      57m
storage                                    4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         False      49m
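A listing this long is easy to misread, so the same check can be done with a small filter. This is a sketch only: it reads the tabular output on stdin, assumes the column order shown above (NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE), and the function name is hypothetical.

```shell
# Hypothetical helper: print the name of every cluster operator that is
# not Available or that is Degraded. Skips the header row (NR > 1).
unhealthy_operators() {
  awk 'NR > 1 && ($3 != "True" || $5 != "False") { print $1 }'
}
```

Piping the listing through it, for example `oc get clusteroperators | unhealthy_operators`, prints nothing when every row reads True / False / False as above.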


Manila is not deployed, as reported in the Available condition of the storage cluster operator:

$ oc get clusteroperator storage -o json | jq '.status.conditions[] | select(.type=="Available")'
{
  "lastTransitionTime": "2021-09-15T13:12:23Z",
  "message": "ManilaCSIDriverOperatorCRAvailable: CSI driver for Manila is disabled: Unable to retrieve Manila share types: cannot list available share types: Get \"https://10.46.44.10:13786/v2/1ebc41dabb5e4e9bae86a22bb4ffcb40/types\": Service Unavailable\nOpenStackCinderCSIDriverOperatorCRAvailable: All is well",
  "reason": "AsExpected",
  "status": "True",
  "type": "Available"
}
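The condition message carries one line per CSI driver operator, separated by "\n". A rough sketch of how the Manila line could be checked once the message has been decoded to plain text; the function name is hypothetical and the matched prefix is copied from the message above, so it would need adjusting if the wording differs in other builds:

```shell
# Hypothetical helper: reads the decoded condition message on stdin and
# succeeds only if the Manila line reports the driver as disabled.
manila_reported_disabled() {
  grep -q '^ManilaCSIDriverOperatorCRAvailable: CSI driver for Manila is disabled'
}
```

One way to feed it is `oc get clusteroperator storage -o json | jq -r '.status.conditions[] | select(.type=="Available") | .message' | manila_reported_disabled`, where `jq -r` emits the raw string so the "\n" becomes a real line break.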

$ oc get sc
NAME                 PROVISIONER                RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
standard (default)   kubernetes.io/cinder       Delete          WaitForFirstConsumer   true                   47m
standard-csi         cinder.csi.openstack.org   Delete          WaitForFirstConsumer   true                   43m

$ oc get pods -A | grep -i manila
openshift-cluster-csi-drivers                      manila-csi-driver-operator-9b79c5846-xf4fs                1/1     Running             1          44m

$ oc get clusterversion
NAME      VERSION                                                  AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest   True        False         13s     Cluster version is 4.7.0-0.ci.test-2021-09-15-113122-ci-ln-n4zbqbk-latest

Comment 7 errata-xmlrpc 2021-09-29 14:10:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.7.32 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3636

