Description of problem (please be as detailed as possible and provide log snippets):
Steps to reproduce:
-------------------
1) Keep the workload in the RDR setup running for more than a week
Additional Info:
----------------
Some rbd commands cannot be executed. The following error is returned when running the command below:
rbd mirror snapshot schedule list --recursive
rbd: rbd mirror snapshot schedule list failed: (11) Resource temporarily unavailable
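For triage, a minimal set of checks such as the following can help confirm whether the rbd_support mgr module (which serves the mirror snapshot schedule commands) is the component returning EAGAIN. This is only a suggested sketch: it assumes the rook-ceph toolbox is available and that the RBD pool is the default ocs-storagecluster-cephblockpool; adjust names to the actual environment.

# enter the toolbox pod on the affected cluster (assumes the rook-ceph-tools deployment is enabled)
oc rsh -n openshift-storage deploy/rook-ceph-tools

# overall cluster and mirroring health
ceph status
rbd mirror pool status ocs-storagecluster-cephblockpool

# the schedule commands are handled by the rbd_support mgr module; verify it is enabled
ceph mgr module ls

# a blocklisted mgr client is one possible reason for the module becoming unresponsive
ceph osd blocklist ls

# this call is served by the same module and is expected to fail the same way while it is stuck
rbd mirror snapshot schedule status --pool ocs-storagecluster-cephblockpool

If the module does turn out to be stuck, failing over the active mgr (ceph mgr fail) is a possible mitigation to try, though it has not been verified for this report.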
Actual results:
---------------
The following error message is observed:
'rados: ret=-11, Resource temporarily unavailable'
Because of this, snapshot scheduling stops.
Expected results:
------------------
-> Snapshot scheduling should not stop
VR YAML:
-------
oc get vr busybox-pvc-61 -o yaml
apiVersion: replication.storage.openshift.io/v1alpha1
kind: VolumeReplication
metadata:
  creationTimestamp: "2023-07-10T08:04:25Z"
  finalizers:
  - replication.storage.openshift.io
  generation: 1
  name: busybox-pvc-61
  namespace: appset-busybox-4
  ownerReferences:
  - apiVersion: ramendr.openshift.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: VolumeReplicationGroup
    name: busybox-4-placement-drpc
    uid: 6f21ad83-16e0-4eb9-98bf-e43b9fb9bdf0
  resourceVersion: "36486402"
  uid: a85e701c-4109-49a5-9dd6-fcb682a818bf
spec:
  autoResync: false
  dataSource:
    apiGroup: ""
    kind: PersistentVolumeClaim
    name: busybox-pvc-61
  replicationHandle: ""
  replicationState: primary
  volumeReplicationClass: rbd-volumereplicationclass-2263283542
status:
  conditions:
  - lastTransitionTime: "2023-07-10T08:04:26Z"
    message: ""
    observedGeneration: 1
    reason: FailedToPromote
    status: "False"
    type: Completed
  - lastTransitionTime: "2023-07-10T08:04:26Z"
    message: ""
    observedGeneration: 1
    reason: Error
    status: "True"
    type: Degraded
  - lastTransitionTime: "2023-07-10T08:04:26Z"
    message: ""
    observedGeneration: 1
    reason: NotResyncing
    status: "False"
    type: Resyncing
  message: 'rados: ret=-11, Resource temporarily unavailable'
  observedGeneration: 1
  state: Unknown
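To gauge whether this is limited to busybox-pvc-61 or affects every VR in the workload namespace, a quick survey like the one below can be used (the namespace is taken from this report; the CSI provisioner deployment name assumes a default ODF install and may differ):

# list the reported state of all VolumeReplication resources in the application namespace
oc get volumereplication -n appset-busybox-4 \
  -o custom-columns=NAME:.metadata.name,DESIRED:.spec.replicationState,STATE:.status.state

# look for the same rados EAGAIN error in the RBD CSI provisioner logs
oc logs -n openshift-storage deploy/csi-rbdplugin-provisioner --all-containers | grep -i 'ret=-11'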
Must gather logs
----------------
c1 - http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/bz-2219628/july10/c1/
c2 - http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/bz-2219628/july10/c2/
hub - http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/bz-2219628/july10/hub/
Version of all relevant components (if applicable):
OCP Version - 4.13.0-0.nightly-2023-06-05-164816
ODF - ODF 4.13.0-219.snaptrim
Submariner version - v0.15.1
VolSync version - volsync-product.v0.7.1
Ceph version - ceph version 17.2.6-70.0.TEST.bz2119217.el9cp (6d74fefa15d1216867d1d112b47bb83c4913d28f) quincy (stable)
Does this issue impact your ability to continue to work with the product
(please explain in detail what the user impact is)?
Is there any workaround available to the best of your knowledge?
Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
Is this issue reproducible?
Can this issue be reproduced from the UI?
If this is a regression, please provide more details to justify this: