Bug 2218850

Summary: ramen-dr-cluster-operator crashes if a malformed PVC request is provided when creating a PVC that will be part of VolSync
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation Reporter: Elvir Kuric <ekuric>
Component: odf-dr Assignee: Benamar Mekhissi <bmekhiss>
odf-dr sub component: ramen QA Contact: krishnaram Karthick <kramdoss>
Status: ON_QA --- Docs Contact:
Severity: unspecified    
Priority: unspecified CC: muagarwa, odf-bz-bot, rtalur, srangana
Version: 4.13   
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---

Description Elvir Kuric 2023-06-30 08:58:40 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

When a PVC created by an external application does not contain
"storageClassName" in its "spec" section (an example of a bad case is shown below):

--- 
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
  volumeMode: Filesystem
  volumeName: pvc-81d3d148-044b-43ac-a715-fd22baa55621
status:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 5Gi
  phase: Bound 

---
then this PVC cannot be included by VolSync for replication, and the "ramen-dr-cluster-operator" pod in the "openshift-dr-system" namespace will end up in CrashLoopBackOff

https://gist.githubusercontent.com/ekuric/e2cb1e4f1d870095c12ebc0c9e03ee37/raw/b53b6913d208c7330156b894af94589763d104ac/ramen-crash

and will not recover until the problematic pod/PVC is removed from the cluster.

In the ramen logs we see:

https://gist.githubusercontent.com/ekuric/09cb8d0be478a2b1c68e3980671fcc47/raw/5b4bb62723f981b9fc6dd953571c9ac085a3cdaa/ramen 

Version of all relevant components (if applicable):
volsync-product.v0.7.1
ceph version 17.2.6-70.el9cp (fe62dcdbb2c6e05782a3e2b67d025b84ff5047cc) quincy (stable)
ODF v4.13
OCP v4.13

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
No

Is there any workaround available to the best of your knowledge?
Yes. Ensure that the application sets "storageClassName" in the "spec" section when creating the PVC.
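
For illustration, a PVC that follows this workaround might look like the sketch below. The name, namespace, and storage class value are assumptions and are not taken from the affected cluster; replace the storage class with whichever class the cluster actually provides for the volume.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-app-data          # hypothetical name
  namespace: example-app          # hypothetical namespace
spec:
  # The field that was missing in the bad case above.
  # Assumed value; use a storage class that exists on the cluster.
  storageClassName: ocs-storagecluster-cephfs
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
  volumeMode: Filesystem
```

The remaining fields match the bad case; only "storageClassName" is added.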

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
3

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
NA

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. On a cluster with the ODF VolSync feature, create a PVC without "storageClassName" in the "spec" section.
2. Monitor the ramen logs.


Actual results:
The ramen-dr-cluster-operator pod crashes and does not recover.

Expected results:
ramen-dr-cluster-operator should not crash if a user provides a malformed PVC object. While ramen-dr-cluster-operator is crashing, other users cannot add new PVCs to VolSync because the operator is not working.

Additional info:
NA