Bug 1859165

Summary: [UI] PVC expansion should not be allowed in UI if PV spec does not carry expand secret
Product: [Red Hat Storage] Red Hat OpenShift Container Storage Reporter: Jilju Joy <jijoy>
Component: management-consoleAssignee: Nishanth Thomas <nthomas>
Status: CLOSED WONTFIX QA Contact: Elad <ebenahar>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 4.5CC: anbehl, etamir, hchiramm, jefbrown, madam, nberry, nthomas, ocs-bugs, uchapaga
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-08-10 13:33:50 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Jilju Joy 2020-07-21 11:28:00 UTC
Description of problem (please be detailed as possible and provide log
snippests):

PVC expansion is allowed from UI even if the volume is not able to be expanded. This occurs particularly for the PVCs which are created before upgrading to OCS 4.5.

This in turn cause multiple retries for expansion all of which are failing.

Log snippet from csi-rbdplugin-provisioner pod csi-resizer container. These errors are repeated many times.

E0715 14:04:34.438152 1 controller.go:361] Resize volume "pvc-4635bd77-846b-47e0-9542-5b7d40aa7a7c" by resizer "openshift-storage.rbd.csi.ceph.com" failed: rpc error: code = Internal desc = provided secret is empty
I0715 14:04:34.438232 1 event.go:281] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"namespace-test-77568bca51874c368ac8fa5d633c0a0c", Name:"pvc-test-ee2409c820574708be26902b657bcbae", UID:"4635bd77-846b-47e0-9542-5b7d40aa7a7c", APIVersion:"v1", ResourceVersion:"318543", FieldPath:""}): type: 'Warning' reason: 'VolumeResizeFailed' resize volume pvc-4635bd77-846b-47e0-9542-5b7d40aa7a7c failed: rpc error: code = Internal desc = provided secret is empty

------------------------------------------------------------------
Version of all relevant components (if applicable):

OpenShift Container Storage   4.5.0-493.ci
Cluster version is 4.5.0-0.nightly-2020-07-17-014709


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
This cause multiple retry attempts for expansion and lot of error events will be shown in UI.


Is there any workaround available to the best of your knowledge?
No


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2

Can this issue reproducible?
Yes


Can this issue reproduce from the UI?
Yes

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. On an OCS4.4 cluster, create a ceph-rbd or cephfs based PVC using and attach to pod.
2. Upgrade to OCS4.5.
3. Manually re-create SCs ocs-storagecluster-ceph-rbd and ocs-storagecluster-cephfs to support volume expansion. This is done manually due to the issue https://bugzilla.redhat.com/show_bug.cgi?id=1846085 . The SCs will be updated/re-created once it is fixed.
4. Try to expand PVC which is created in step 1


Actual results:
No error is shown in UI while trying to expand the PVC. But the resize will actually fail because the PV spec does not carry expand secret. Error events will be shown repeatedly.




Expected results:
Expansion should not be allowed from UI if PV spec does not carry expand secret.

Additional info:


I will share the logs shortly.

Comment 3 Neha Berry 2020-07-21 15:32:45 UTC
BZ in CSI - Bug 1859183

Comment 4 Jilju Joy 2020-07-21 15:45:56 UTC
Faced some issue with the initially tested cluster. Reproduced in a different setup.

Tried to expand PVCs pvc-test-011b851b860341b18b20c64a7da3e146 from 10Gi to 15Gi. Expand is done from UI.

log -  http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/BZ-1859165/

$ oc get pvc pvc-test-011b851b860341b18b20c64a7da3e146 -o yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: openshift-storage.cephfs.csi.ceph.com
  creationTimestamp: "2020-07-21T12:38:02Z"
  finalizers:
  - kubernetes.io/pvc-protection
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        f:accessModes: {}
        f:resources:
          f:requests: {}
        f:storageClassName: {}
        f:volumeMode: {}
    manager: oc
    operation: Update
    time: "2020-07-21T12:38:02Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:pv.kubernetes.io/bind-completed: {}
          f:pv.kubernetes.io/bound-by-controller: {}
          f:volume.beta.kubernetes.io/storage-provisioner: {}
      f:spec:
        f:volumeName: {}
      f:status:
        f:accessModes: {}
        f:capacity:
          .: {}
          f:storage: {}
        f:phase: {}
    manager: kube-controller-manager
    operation: Update
    time: "2020-07-21T12:38:04Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        f:resources:
          f:requests:
            f:storage: {}
    manager: Mozilla
    operation: Update
    time: "2020-07-21T13:28:30Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        f:conditions:
          .: {}
          k:{"type":"Resizing"}:
            .: {}
            f:lastProbeTime: {}
            f:lastTransitionTime: {}
            f:status: {}
            f:type: {}
    manager: csi-resizer
    operation: Update
    time: "2020-07-21T14:48:49Z"
  name: pvc-test-011b851b860341b18b20c64a7da3e146
  namespace: namespace-test-69b432a13dd241fea838a8182ae9d96d
  resourceVersion: "142183"
  selfLink: /api/v1/namespaces/namespace-test-69b432a13dd241fea838a8182ae9d96d/persistentvolumeclaims/pvc-test-011b851b860341b18b20c64a7da3e146
  uid: efffe66a-9efe-48b2-918f-e427ff9a8317
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 15Gi
  storageClassName: ocs-storagecluster-cephfs
  volumeMode: Filesystem
  volumeName: pvc-efffe66a-9efe-48b2-918f-e427ff9a8317
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 10Gi
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-07-21T14:48:49Z"
    status: "True"
    type: Resizing
  phase: Bound

Comment 5 Nishanth Thomas 2020-07-21 16:21:01 UTC
Can admin allowed to do it from the CLI? can you please confirm?

As we always talk, if an admin can do it from the CLI, we are not in favour of doing additional validations from the UI.

OCP 4.5 is released already, so its out question to handle this in 4.5. Also I don't think this is a 'high' severity bz from UI perspective.
I am reducing the severity and moving out of 4.5.

Comment 7 umanga 2020-07-22 07:45:46 UTC
First thing first, Expand Secret is on Storage Class parameter not on PV.
PV creation is dynamic and we don't control what parameters it can or can not have.

About blocking expansion in case where it is not possible, how do we expect UI to do that?
For something that'll probably crash if there's a delay of more than a certain threshold, asking UI to do such calculation (>100 PVs) doesn't seem right to me.

Even if we decide to do it, how will UI differentiate between PVs that allow expansion and those that don't?
Where will this information be persisted or do we always evaluate this (co-relate 100s of PVs to possibly multiple Storage Classes for evaluation)?

Of all the components in OCS, UI is probably the worst place for such restrictions to be enforced. I might be over-thinking this, but I don't think UI is that magic
component which can enforce anything that other components can not. Let's find an agreeable fix in operator or csi?

Comment 8 Humble Chirammal 2020-07-22 09:23:05 UTC
(In reply to umanga from comment #7)
> First thing first, Expand Secret is on Storage Class parameter not on PV.

Its on the PV spec too!

> PV creation is dynamic and we don't control what parameters it can or can
> not have.

But we can list/check PVs.

> 
> About blocking expansion in case where it is not possible, how do we expect
> UI to do that?
....
> Even if we decide to do it, how will UI differentiate between PVs that allow
> expansion and those that don't?

If the provisioner belong to OCS and if it does not carry the expand secrets in it we can mask those PVCs for expansion.

> For something that'll probably crash if there's a delay of more than a
> certain threshold, asking UI to do such calculation (>100 PVs) doesn't seem
> right to me.
> 

> Where will this information be persisted or do we always evaluate this
> (co-relate 100s of PVs to possibly multiple Storage Classes for evaluation)?
> 
> Of all the components in OCS, UI is probably the worst place for such
> restrictions to be enforced. I might be over-thinking this, but I don't
> think UI is that magic
> component which can enforce anything that other components can not. Let's
> find an agreeable fix in operator or csi?

The solution/workaround can be in UI, operator or CSI .. My take here would be 
this:

*) If its about blocking the operation -> its better to do it from starting point which is UI
*) If its about allowing the operation with an extra mile or some hack -> Pick either Operator or CSI.

Comment 9 umanga 2020-07-22 10:34:50 UTC
Got clarification for some of my points here https://bugzilla.redhat.com/show_bug.cgi?id=1859183#c16 .
Turns out I was wrong about some points, but it's good for us. We have enough data to help UI make decision whether to block expansion or not.

> If the provisioner belong to OCS and if it does not carry the expand secrets in it we can mask those PVCs for expansion.

While it's entirely upto UI devs if/how they want to tackle this, I believe if there are multiple entry points (UI and CLI in this case), checks should be at a point where eventually every request reaches and gets the same response.
If we want to completely disregard CLI and say that UI is the only entry point that will ever be used by the admin then blocking on UI seems okay.

But, that'll raise UX issue where some PVCs have the "Expand" option available while the others don't. How do we explain the user why it's like that?

> If the provisioner belong to OCS and if it does not carry the expand secrets in it we can mask those PVCs for expansion.

If it is this simple, why does the resizer controller in csi (or kubernetes) not do that? Can't it just say that "the PV you are trying to expand does not have the expand secret so we can not proceed"? This could then be easily reflected on UI without adding a bunch of checks. An example of similar behavior is when you create PV/C with negative size, kubernetes responds with "negative values are not allowed" and both UI and CLI reflects it accurately (no custom checks or blocks needed).

Comment 10 Ankush Behl 2020-07-22 10:43:33 UTC
Firstly we should have the event generated for the user if something like this happens & not check the operator logs to figure out.

Also The UI is not the right point to have this validation as when CLI user creates it they will not get any error message.
and as per console convention they want to provide same error from UI/CLI when something went wrong with the resource so this doesn't follows/matches that pattern.

Comment 11 Michael Adam 2020-07-23 13:21:00 UTC
There is, afaik, no way to block the operation from the CLI, since it is a generic "oc/kubectl edit" operation.
A deeper layer check in the csi driver doesn't help since the kubernetes retry mechanism would still kick in. So it wouldn't solve the problem.

On the other hand it is possible to check th PV and prevent this operation in the first place from the GUI.
Hence the suggestion.

https://bugzilla.redhat.com/show_bug.cgi?id=1859183#c22
https://bugzilla.redhat.com/show_bug.cgi?id=1859183#c26

@Humble - can you confirm this?

Comment 12 Ankush Behl 2020-07-23 13:54:36 UTC
@michael, The thing is we need to write a extension to support this validation in generic console code which is actually like writing a whole feature in which we will not be able to backport as this is not just about writing if else condition.

Secondly, I find this solution intrim & bit hacky as we need to support the extension forever or refactor the code(as we know) after two three release as when OCP 4.5 is unsupported this extension we wrote will not be required anymore. 

Also the Expension of PVC is a simple task for any admin to do from CLI and they will get into same situation where they need to open a customer case or read the docs that the older PV doesn't support expansion. 

Even if it gets in How would you explain customer that half of the PVC provisioned by OCP supports expansion & half of them doesn't. In that case also we need to redirect them to docs and explain why of it.

Why not a have a generic solution to these problems where user can get in trouble either ways?

Comment 13 Ankush Behl 2020-07-23 15:49:12 UTC
(In reply to Ankush Behl from comment #10)
> Firstly we should have the event generated for the user if something like
> this happens & not check the operator logs to figure out.
I checked that the event for PVC is coming up stating the same as mentioned in Operator logs. So user understands the
secret is not present.

We can imporove for the event to tell user which secret is not present and that's why we are not able to expand.
I see this event currently `provided secret is empty` which looks bit vague to me.

Comment 14 Humble Chirammal 2020-07-30 03:52:41 UTC
Can console/UI do or execute one command ( equivalant to #oc edit/patch) when someone try to click on "Expand PVC" ?  The advantage with this approach is that, we dont want to do or operate on all the PVs in one shot and dont want to change any other component. Its a minimal operation for that particular PV. 

Nishanth/Ankush is above possible ?

Comment 15 Ankush Behl 2020-07-30 06:58:05 UTC
Expand is generic flow for console we need to find specific OCS provisioners & do it. The problem is that if something specific need to be done we need extension points to accomodate this functionality. Which is hard to backport and I think find it ideal solution for the problem and problem still persist in the CLI flow because no one will be able to patch in CLI.

The same effort can be done from a template job & it will done for once & all.

Comment 16 Humble Chirammal 2020-08-10 13:18:20 UTC
(In reply to Ankush Behl from comment #15)
> Expand is generic flow for console we need to find specific OCS provisioners
> & do it. The problem is that if something specific need to be done we need
> extension points to accomodate this functionality. Which is hard to backport
> and I think find it ideal solution for the problem and problem still persist
> in the CLI flow because no one will be able to patch in CLI.
> 
> The same effort can be done from a template job & it will done for once &
> all.

Based on all the discussions so far, the take here is we are not going to fix 
this in the GUI, Isnt it ? if thats case may be we have to close this bug?

Comment 17 Nishanth Thomas 2020-08-10 13:33:50 UTC
Closing as the decision is to fix this in operator side.