2000097 – Manilacsi becomes degraded even though it is not available with the underlying Openstack


Keywords:
Status:	CLOSED DUPLICATE of bug 2002556
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Storage
Sub Component:
Version:	4.6.z
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	4.6.z
Assignee:	Eric Duen
QA Contact:	rlobillo
Docs Contact:
URL:
Whiteboard:
Depends On:	1987036
Blocks:

Reported:	2021-09-01 11:48 UTC by Martin André
Modified:	2024-12-20 20:53 UTC
CC List:	15 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:	1987036
Environment:
Last Closed:	2021-09-15 09:28:32 UTC
Target Upstream Version:
Embargoed:


Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift csi-driver-manila-operator pull 118	0	None	None	None	2021-09-03 01:00:46 UTC

Comment 8 Silvia Parpatekar 2021-09-13 17:35:58 UTC

Hello Team,

My customer is facing this issue with OCP 4.6.40 while installing the cluster.
~~~
$ oc get clusterversion
NAME     VERSION  AVAILABLE  PROGRESSING  SINCE  STATUS
version           False      True         15m    Unable to apply 4.6.40: the cluster operator storage is degraded
Cluster operator storage is reporting a failure: ManilaCSIDriverOperatorCRDegraded: ManilaControllerDegraded: cannot authenticate with given credentials: Get "https://abc.com:yyy/": dial tcp x.x.x.x:yyy: i/o timeout
~~~

They are using a proxy. Because the installation is stuck, this is a blocker on their end. The customer plans to use OCS (external) and has no plans to use Manila in the future either.

Can this issue be prioritized?

Comment 9 Martin André 2021-09-13 18:29:22 UTC

(In reply to Silvia Parpatekar from comment #8)

Backporting full proxy support to 4.6 proved too complicated, so we are taking a different approach instead: the Manila operator will no longer degrade the cluster when it fails.

Please follow https://bugzilla.redhat.com/show_bug.cgi?id=2002556, which is already set to urgent priority.

Comment 11 Aditya Deshpande 2021-09-14 08:29:46 UTC

If the Manila CSI driver is not used in the OCP cluster, the degraded storage operator will not affect any other OpenShift processes or components, right?

Comment 12 rlobillo 2021-09-15 08:21:29 UTC

Failed on OCP 4.6.0-0.nightly-2021-09-13-181051 (OpenShiftSDN network type) on top of RHOS-16.1-RHEL-8-20210818.n.0 with Manila enabled.

UPI installation of an OCP cluster in a restricted network using a proxy, with the following install-config.yaml:

compute:
- name: worker
  platform:
    openstack:
      zones: []
      additionalNetworkIDs: ['671634d1-c06f-433f-878f-745244a1f803']
  replicas: 0
controlPlane:
  name: master
  platform:
    openstack:
      zones: []
  replicas: 3
metadata:
  name: "ostest"
networking:
  clusterNetworks:
  - cidr:             10.128.0.0/14
    hostSubnetLength: 9
  serviceCIDR: 172.30.0.0/16
  machineCIDR: "172.16.0.0/24"
  type: "OpenShiftSDN"
platform:
  openstack:
    cloud:            "shiftstack"
    externalNetwork:  ""
    region:           "regionOne"
    computeFlavor:    "m4.xlarge"
    machinesSubnet: f27e32b6-2353-464c-95bc-1b8f1858ae59
    apiVIP: "172.16.0.5"
    ingressVIP: "172.16.0.7"
proxy:
  httpProxy: http://dummy:dummy@172.16.0.3:3128/
  httpsProxy: http://dummy:dummy@172.16.0.3:3128/

The Manila operator does not use the cluster proxy configuration, so the storage clusteroperator remains degraded. This scenario will be avoided with https://bugzilla.redhat.com/show_bug.cgi?id=2002556, which will mark the clusteroperator as Available with the Manila functionality disabled.

$ oc get co storage -o json | jq '.status.conditions[] | select (.type=="Available")'
{
  "lastTransitionTime": "2021-09-14T17:51:18Z",
  "message": "ManilaCSIDriverOperatorCRAvailable: Waiting for Manila operator to report status",
  "reason": "ManilaCSIDriverOperatorCR_WaitForOperator",
  "status": "False",
  "type": "Available"
}

The proxy is configured on the OCP cluster:

$ oc get proxy cluster -o json | jq .status
{
  "httpProxy": "http://dummy:dummy@172.16.0.3:3128/",
  "httpsProxy": "http://dummy:dummy@172.16.0.3:3128/",
  "noProxy": ".cluster.local,.svc,10.128.0.0/14,127.0.0.1,169.254.169.254,172.16.0.0/24,172.30.0.0/16,api-int.ostest.shiftstack.com,etcd-0.ostest.shiftstack.com,etcd-1.ostest.shiftstack.com,etcd-2.ostest.shiftstack.com,localhost"
}
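The Keystone endpoint in the operator log below (10.46.44.10) does not fall inside any IP or CIDR entry of this noProxy list, so requests to it would be expected to go through the proxy rather than directly. A minimal Go sketch of that check follows; `coveredByNoProxy` and `clusterNoProxy` are hypothetical names, and the function deliberately handles only exact-IP and CIDR entries (hostname entries such as .svc or localhost are ignored), so it is a simplification, not the exact matching logic used by OpenShift or by Go's HTTP client.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// clusterNoProxy is copied from the proxy status shown above.
const clusterNoProxy = ".cluster.local,.svc,10.128.0.0/14,127.0.0.1,169.254.169.254,172.16.0.0/24,172.30.0.0/16,api-int.ostest.shiftstack.com,etcd-0.ostest.shiftstack.com,etcd-1.ostest.shiftstack.com,etcd-2.ostest.shiftstack.com,localhost"

// coveredByNoProxy is a simplified, hypothetical check: it reports whether
// host is an IP address matched by an exact-IP or CIDR entry in noProxy.
// Hostname and domain-suffix entries (".svc", "localhost", ...) are ignored.
func coveredByNoProxy(host, noProxy string) bool {
	ip := net.ParseIP(host)
	if ip == nil {
		return false // not an IP literal; out of scope for this sketch
	}
	for _, entry := range strings.Split(noProxy, ",") {
		entry = strings.TrimSpace(entry)
		if _, cidr, err := net.ParseCIDR(entry); err == nil {
			if cidr.Contains(ip) {
				return true
			}
			continue
		}
		if exact := net.ParseIP(entry); exact != nil && exact.Equal(ip) {
			return true
		}
	}
	return false
}

func main() {
	// Keystone endpoint taken from the operator log in this comment.
	host := "10.46.44.10"
	fmt.Printf("%s covered by noProxy: %v\n", host, coveredByNoProxy(host, clusterNoProxy))
	// prints: 10.46.44.10 covered by noProxy: false
}
```

The 10.128.0.0/14 entry only spans 10.128.0.0 to 10.131.255.255, which is why 10.46.44.10 is not bypassed.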

But the manila-csi-driver-operator is still not making use of it:

$ oc logs -n openshift-cluster-csi-drivers manila-csi-driver-operator-6d55b5cd79-jvmbs | tail -1
E0915 08:10:44.704358       1 base_controller.go:250] "ManilaController" controller failed to sync "key", err: cannot authenticate with given credentials: Get "https://10.46.44.10:13000/": dial tcp 10.46.44.10:13000: connect: no route to host

$ oc rsh -n openshift-cluster-csi-drivers manila-csi-driver-operator-6d55b5cd79-jvmbs
sh-4.4$ env | grep -i http
SOURCE_GIT_URL=https://github.com/openshift/csi-driver-manila-operator
KUBERNETES_SERVICE_PORT_HTTPS=443
sh-4.4$
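The empty env output matters because Go's default HTTP transport only uses a proxy through `http.ProxyFromEnvironment`, which reads HTTPS_PROXY, HTTP_PROXY and NO_PROXY (or their lowercase forms) from the process environment. With none of them set in the pod, such a client dials the OpenStack endpoint directly, which matches the "no route to host" error in this restricted network. The sketch below assumes the operator's OpenStack client relies on that default behavior (not confirmed in this bug); `proxyFor` is a hypothetical helper for checking which proxy, if any, would be selected.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"os"
)

// proxyFor reports which proxy, if any, http.ProxyFromEnvironment (the proxy
// function used by Go's http.DefaultTransport) selects for target. A nil URL
// with a nil error means the request would be sent directly, without a proxy.
// Note: the standard library reads the proxy environment once per process.
func proxyFor(target string) (*url.URL, error) {
	req, err := http.NewRequest(http.MethodGet, target, nil)
	if err != nil {
		return nil, err
	}
	return http.ProxyFromEnvironment(req)
}

func main() {
	// Print the variable that is missing from the pod environment above.
	fmt.Printf("HTTPS_PROXY=%q\n", os.Getenv("HTTPS_PROXY"))

	// Keystone endpoint taken from the operator log in this comment.
	p, err := proxyFor("https://10.46.44.10:13000/")
	switch {
	case err != nil:
		fmt.Println("error:", err)
	case p == nil:
		fmt.Println("no proxy selected: dialing the endpoint directly")
	default:
		fmt.Println("using proxy:", p.Redacted())
	}
}
```

Run inside the pod shown above, this would take the "no proxy selected" branch; with HTTPS_PROXY set to the cluster's httpsProxy value, it would instead print the redacted proxy URL.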

Comment 13 Martin André 2021-09-15 09:28:32 UTC

Thanks, Ramon, for confirming that https://github.com/openshift/csi-driver-manila-operator/pull/118 alone does not fix the problem (the proxy environment variables are not exported in the pod), so we need the alternate approach tracked in https://bugzilla.redhat.com/show_bug.cgi?id=2002556. With that approach, the Manila operator will be disabled instead of degraded, which is what people are asking for in this BZ.

For this reason, I'll mark this bug as a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=2002556.

*** This bug has been marked as a duplicate of bug 2002556 ***
