Bug 1960655 - "Invalid file size" and "not enough cache capacity" errors in Restic pod log during Restore process
Summary: "Invalid file size" and "not enough cache capacity" errors in Restic pod log ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Migration Toolkit for Containers
Classification: Red Hat
Component: Velero
Version: 1.4.3
Hardware: Unspecified
OS: Unspecified
Severity: high
Priority: high
Target Milestone: ---
Target Release: 1.7.0
Assignee: Pranav Gaikwad
QA Contact: Xin jiang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-05-14 14:13 UTC by John Matthews
Modified: 2024-06-14 01:31 UTC
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-24 06:32:26 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2022:1043 0 None None None 2022-03-24 06:32:38 UTC

Description John Matthews 2021-05-14 14:13:24 UTC
Description of problem:

MTC 1.4.3 uses Velero 1.5.x, which includes a version of Restic that is susceptible to restoring incomplete files. We are not sure of the exact sequence of events needed to reproduce this.

The issue has so far been observed by our team in only one environment, which consisted of:

OCP 4.6 -> OCP 4.6 migration, using 'minio' as the object storage.

Others in the community have reported this issue against Restic:
https://github.com/restic/restic/issues/2244

Restic fixed this issue via:
https://github.com/restic/restic/pull/2195

The fixed version of Restic is included in Velero 1.6.0.


Version-Release number of selected component (if applicable):
MTC 1.4.3 with Velero 1.5.x
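
For reference, one quick way to confirm which Velero (and therefore which bundled Restic) build is actually running on a cluster is to read the container images off the Velero workloads. A minimal sketch, assuming the openshift-migration namespace seen in the logs below and the default "velero" deployment / "restic" daemonset resource names; adjust for your install:

# Print the image used by the Velero deployment (default names assumed).
oc -n openshift-migration get deployment velero \
  -o jsonpath='{.spec.template.spec.containers[0].image}'

# Print the image used by the restic daemonset pods.
oc -n openshift-migration get daemonset restic \
  -o jsonpath='{.spec.template.spec.containers[0].image}'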


How reproducible:
Rare
Characteristics of the specific data being restored, and possibly of the object storage, may trigger this bug.


Steps to Reproduce:
We do not have a reproducer at the time of filing this bug.


Actual results:
ERRORS from restic pod:
restic-tt8nj velero/velero/logs/current.log
verifying files in ... finished verifying 88 files in ...
There were 13 errors:
ignoring error for /01F4Z59N1ZCC0V52C3AZ836CMC/chunks/000001: not enough cache capacity: requested 12562762, available 10636051
ignoring error for /01F4M744PSAG17Z0HJZDB8A0E5/chunks/000001: not enough cache capacity: requested 8388640, available 5492532
ignoring error for /01F4CFXYXX3JAQ3Q6KD1VCYHGE/chunks/000001: not enough cache capacity: requested 8388640, available 4476956
ignoring error for /01F5GHFNAVZRJEXRR49ZD0TSPP/index: not enough cache capacity: requested 6032572, available 5850911
ignoring error for /01F5F89KZ2C7A8A5W9RHBVWEH0/chunks/000001: not enough cache capacity: requested 12188827, available 8724831
ignoring error for /01F5BKHXVKT4S5FJN5VJ288DXG/chunks/000001: not enough cache capacity: requested 11874414, available 10395305
ignoring error for /01F4Z59N1ZCC0V52C3AZ836CMC/chunks/000001: Setxattr: xattr.Set /host_pods/8762c14a-54d0-4ec9-9f1d-702fa06a0bdf/volumes/kubernetes.io~nfs/pvc-16b557cb-ff59-4321-bec9-b3191ee6192b/01F4Z59N1ZCC0V52C3AZ836CMC/chunks/000001 system.nfs4_acl: no such file or directory
ignoring error for /01F4CFXYXX3JAQ3Q6KD1VCYHGE/chunks/000001: Invalid file size: expected 161900260 got 34699391
ignoring error for /01F4M744PSAG17Z0HJZDB8A0E5/chunks/000001: Invalid file size: expected 176122679 got 27205078
ignoring error for /01F4Z59N1ZCC0V52C3AZ836CMC/chunks/000001: stat 01F4Z59N1ZCC0V52C3AZ836CMC/chunks/000001: no such file or directory
ignoring error for /01F5BKHXVKT4S5FJN5VJ288DXG/chunks/000001: Invalid file size: expected 81178935 got 57432348
ignoring error for /01F5F89KZ2C7A8A5W9RHBVWEH0/chunks/000001: Invalid file size: expected 168691379 got 88352163
ignoring error for /01F5GHFNAVZRJEXRR49ZD0TSPP/index: Invalid file size: expected 16029695 got 6931839
 "controller=pod-volume-restore logSource="pkg/controller/pod_volume_restore_controller.go:389" name=4900adc0-b357-11eb-b8be-c95423c0d8dd-hk994-bkj2d namespace=openshift-migration restore=openshift-migration/4900adc0-b357-11eb-b8be-c95423c0d8dd-hk994


When we look at the PVs, we see a size mismatch confirming that not all data was restored.

source: 3.4G    ./server-pvc-a023be0d-a6ca-4921-98f7-f44bcb2cc924
destination: 2.9G    ./server-pvc-16b557cb-ff59-4321-bec9-b3191ee6192b
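
For reference, the figures above are du output from the source and destination clusters. A minimal sketch of how such a comparison can be made, assuming shell access to a pod mounting each PV; the pod name, namespace, and the /data mount path below are hypothetical placeholders, not values from this bug:

# Run on both clusters and compare the totals (placeholders throughout).
oc -n <app-namespace> exec <pod-mounting-the-pv> -- du -sh /data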

Comment 4 Pranav Gaikwad 2021-06-09 17:53:28 UTC
This issue is fixed through the Velero version bump (1.6) in MTC 1.5.0.

Comment 12 Daniel Gur 2021-11-22 13:52:09 UTC
Erik, what should the target release be for this customer bug?

Comment 13 Erik Nelson 2021-11-22 14:22:55 UTC
This should be fixed in currently supported versions as we've rebased onto a Velero that contains the fix.

I would suggest that we simply verify it is not reproducible following Alay's guidance here: https://bugzilla.redhat.com/show_bug.cgi?id=1960655#c3

Updated with target release.
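
As an additional integrity check alongside that guidance, restic's own repository verification can be run directly against the backup storage. A minimal sketch, assuming direct access to the repository; the repository URL and password file below are placeholders, not values from this bug:

# Point restic at the backup repository (placeholder values).
export RESTIC_REPOSITORY=s3:http://minio.example.com/bucket/velero/restic/<namespace>
export RESTIC_PASSWORD_FILE=/tmp/restic-password

# 'check' verifies the repository structure; --read-data additionally
# downloads and verifies every pack file, so it catches truncated files.
restic check --read-data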

Comment 18 errata-xmlrpc 2022-03-24 06:32:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Migration Toolkit for Containers (MTC) 1.7.0 release advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:1043

