Bug 2195989

Summary: timeout waiting for condition: "error preparing volumesnapshots"
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: David Vaanunu <dvaanunu>
Component: csi-driver
Assignee: Nobody <nobody>
Status: CLOSED ERRATA
QA Contact: krishnaram Karthick <kramdoss>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.12
CC: bniver, kbg, muagarwa, ocs-bugs, odf-bz-bot, sostapov
Target Milestone: ---
Target Release: ODF 4.12.4
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: 4.12.4-1
Doc Type: Bug Fix
Doc Text:
Previously, stale RADOS block device (RBD) images were left in the cluster because deleting the RBD images failed with a "Numerical result out of range" error. With this fix, the size of the trash entries list in go-ceph is increased, so stale RBD images are no longer left behind in the Ceph cluster. (A sketch of the underlying retry pattern follows the header fields below.)
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-06-14 21:20:41 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
Target Upstream Version:
Embargoed:
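
The Doc Text above is terse about the mechanism. The "Numerical result out of range" error is C's ERANGE: Ceph's rbd_trash_list() fills a caller-supplied array and fails with -ERANGE when that array is smaller than the number of trash entries, so the caller is expected to retry with a larger buffer. Below is a minimal, self-contained Go sketch of that retry pattern; listTrash and TrashEntry are hypothetical stand-ins for the C call and go-ceph's TrashInfo, not go-ceph's actual code.

// Illustrative sketch only, not the actual go-ceph implementation.
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// TrashEntry is a hypothetical stand-in for go-ceph's rbd.TrashInfo.
type TrashEntry struct{ ID, Name string }

// listTrash mimics rbd_trash_list: it reports ERANGE (and the
// required size via need) when the buffer is too small.
func listTrash(buf []TrashEntry, need *int) (int, error) {
	const total = 2000 // pretend the cluster has 2000 trash entries
	*need = total
	if len(buf) < total {
		return 0, syscall.ERANGE
	}
	for i := 0; i < total; i++ {
		buf[i] = TrashEntry{ID: fmt.Sprintf("img-%d", i)}
	}
	return total, nil
}

// getTrashList grows its buffer until the call stops returning
// ERANGE -- the retry-with-a-larger-list pattern the fix relies on.
func getTrashList() ([]TrashEntry, error) {
	size := 512 // a too-small starting size triggers the retry
	for {
		buf := make([]TrashEntry, size)
		var need int
		n, err := listTrash(buf, &need)
		if errors.Is(err, syscall.ERANGE) {
			size = need // retry with the size the call asked for
			continue
		}
		if err != nil {
			return nil, err
		}
		return buf[:n], nil
	}
}

func main() {
	entries, err := getTrashList()
	if err != nil {
		panic(err)
	}
	fmt.Printf("listed %d trash entries\n", len(entries))
}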

Description David Vaanunu 2023-05-07 09:31:12 UTC
Description of problem (please be detailed as possible and provide log
snippets):

During OADP testing (backup & restore), the restore flow produces errors
in the OADP logs regarding VolumeSnapshots.

Errors:

time="2023-05-04T05:32:17Z" level=error msg="Namespace perf-busy-data-cephrbd-50pods, resource restore error: error preparing volumesnapshots.snapshot.storage.k8s.io/perf-busy-data-cephrbd-50pods/velero-pvc-busy-data-rbd-50pods-1-szwqr: rpc error: code = Unknown desc = timed out waiting for the condition" logSource="/remote-source/velero/app/pkg/controller/restore_controller.go:498" restore=openshift-adp/dm-restore-rbd-50pvs-cc50-iter4


time="2023-05-04T05:29:07Z" level=error msg="Timed out awaiting reconciliation of volumesnapshotrestoreList" cmd=/plugins/velero-plugin-for-vsm logSource="/remote-source/app/internal/util/util.go:393" pluginName=velero-plugin-for-vsm restore=openshift-adp/dm-restore-rbd-50pvs-cc50-iter4


Version of all relevant components (if applicable):

OCP 4.12.9
ODF 4.12.2
OADP 1.2.0-63

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

Yes. The tests fail and cannot complete a full backup & restore cycle.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
yes

Can this issue be reproduced from the UI?
no

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Create a namespace with a few PVs (and data).
2. Run an OADP backup; it ends with 'Completed' status.
3. Delete the namespace.
4. Run an OADP restore (a sketch of the equivalent Velero API objects follows).
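
For reference, steps 2 and 4 expressed as the Velero Backup/Restore objects that OADP drives. This is an illustrative sketch: the namespace name is taken from the logs above, while the CR names and everything else are assumptions, not the exact CRs the test created. In practice these CRs are usually applied as YAML or via the velero CLI.

package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"

	velerov1 "github.com/vmware-tanzu/velero/pkg/apis/velero/v1"
)

func main() {
	scheme := runtime.NewScheme()
	_ = velerov1.AddToScheme(scheme)
	c, err := client.New(ctrl.GetConfigOrDie(), client.Options{Scheme: scheme})
	if err != nil {
		panic(err)
	}
	ctx := context.Background()

	// Step 2: back up the namespace holding the PVs.
	backup := &velerov1.Backup{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "dm-backup-rbd-50pvs", // hypothetical name
			Namespace: "openshift-adp",
		},
		Spec: velerov1.BackupSpec{
			IncludedNamespaces: []string{"perf-busy-data-cephrbd-50pods"},
		},
	}
	if err := c.Create(ctx, backup); err != nil {
		panic(err)
	}

	// Steps 3-4: after deleting the namespace, restore from the backup.
	restore := &velerov1.Restore{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "dm-restore-rbd-50pvs", // hypothetical name
			Namespace: "openshift-adp",
		},
		Spec: velerov1.RestoreSpec{BackupName: backup.Name},
	}
	if err := c.Create(ctx, restore); err != nil {
		panic(err)
	}
}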


Actual results:
The restore fails with 'PartiallyFailed' status.


Expected results:
The restore should succeed with 'Completed' status.


Additional info:

Comment 2 Mudit Agarwal 2023-05-10 03:17:01 UTC
Not a 4.13 blocker

Comment 12 krishnaram Karthick 2023-06-01 11:54:33 UTC
Moving the bug to verified based on the regression run on 4.12.4-1 - https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster-prod/7951/

Comment 20 errata-xmlrpc 2023-06-14 21:20:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Data Foundation 4.12.4 security and Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:3609