Bug 2006385 - ResticVerifyErrors warning is not displayed when Restic reports a verification error
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Migration Toolkit for Containers
Classification: Red Hat
Component: General
Version: 1.6.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: low
Target Milestone: ---
Target Release: 1.7.0
Assignee: Jaydip Gabani
QA Contact: Xin jiang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-09-21 15:51 UTC by Sergio
Modified: 2022-03-24 06:32 UTC (History)
4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-24 06:32:27 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2022:1043 0 None None None 2022-03-24 06:32:41 UTC

Description Sergio 2021-09-21 15:51:06 UTC
Description of problem:
When we run a migration with the "verify" option enabled and there is corrupted data, the migmigration resource does not display a `ResticVerifyErrors` warning. Instead, we get a `VeleroStageRestorePartiallyFailed` warning.
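The mismatch can be seen by filtering the migmigration's status conditions for active warnings. A minimal sketch (the resource is shown as a plain dict, as `oc get migmigration -o json` would return it; the `warning_types` helper is hypothetical, for illustration only):

```python
# Sketch: extract the types of all active warning conditions from a
# migmigration resource. The condition shown mirrors this bug's output.
migmigration = {
    "status": {
        "conditions": [
            {
                "category": "Warn",
                "durable": True,
                "status": "True",
                "type": "VeleroStageRestorePartiallyFailed",
            },
        ]
    }
}

def warning_types(resource):
    """Return the types of all conditions with category Warn and status True."""
    conditions = resource.get("status", {}).get("conditions", [])
    return [
        c["type"]
        for c in conditions
        if c.get("category") == "Warn" and c.get("status") == "True"
    ]

print(warning_types(migmigration))  # ['VeleroStageRestorePartiallyFailed']
```

With the bug present, only `VeleroStageRestorePartiallyFailed` appears in this list; `ResticVerifyErrors` is expected instead.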

Version-Release number of selected component (if applicable):
TARGET CLUSTER: AWS OCP4.9 GP2 (MTC 1.6.0) (CONTROLLER + IU)
SOURCE CLUSTER: AWS OCP3.11 GP2 (MTC 1.5.1)
REPLICATION REPOSITORY: AWS S3

How reproducible:
Always

Steps to Reproduce:
1. Follow "OCP-30240 - Data validation with indirect migration" test steps

Actual results:

The migmigration resource displays this warning

12:41:14.479 - INFO: "category": "Warn",
12:41:14.480 - INFO: "durable": true,
12:41:14.482 - INFO: "lastTransitionTime": "2021-09-17T12:40:55Z",
12:41:14.483 - INFO: "message": "Stage Restore openshift-migration/ocp-30240-datavalidation-mig-1631882348-stage-lnsfb: partially failed on destination cluster",
12:41:14.484 - INFO: "status": "True",
12:41:14.485 - INFO: "type": "VeleroStageRestorePartiallyFailed"
12:41:14.485 - INFO: },



The podvolumerestore resource is displaying this error

status:
  completionTimestamp: "2021-09-17T12:49:22Z"
  message: |-
    error restoring volume: error running restic restore, cmd=restic restore --repo=s3:s3-us-east-2.amazonaws.com/camreplication/velero/restic/ocp-30240-datavalidation --password-file=/tmp/credentials/openshift-migration/velero-restic-credentials-repository-password --cache-dir=/scratch/.cache/restic 8b2700ff --target=. --delete --verify, stdout=restoring <Snapshot 8b2700ff of [/host_pods/fecb458b-17b4-11ec-97b2-0ef8128329e3/volumes/kubernetes.io~aws-ebs/pvc-feb97997-17b4-11ec-97b2-0ef8128329e3] at 2021-09-17 12:49:03.908299474 +0000 UTC by root@velero> to .
    verifying files in .
    , stderr=ignoring error for /index.html: Invalid file size: expected 21 got 0
    Fatal: There were 1 errors
 
    : exit status 1
  phase: Failed
  progress:
    bytesDone: 21
    totalBytes: 21
  startTimestamp: "2021-09-17T12:49:20Z"
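The stderr captured above is what distinguishes a verify failure from a generic restore failure: restic prints `ignoring error for <file>` lines while verifying restored files, then exits non-zero with a `Fatal:` summary. A hypothetical sketch of classifying that output (the marker strings come from the stderr in this bug, not from the actual controller code; the real fix is in openshift/velero PR 147):

```python
# Heuristic sketch: decide whether a failed restic restore failed during
# the --verify phase, based on the stderr text this bug captured.
def is_verify_error(stderr: str) -> bool:
    """True if stderr looks like restic verification errors rather than a
    generic restore failure."""
    return "ignoring error for" in stderr and "Fatal:" in stderr

stderr = (
    "ignoring error for /index.html: Invalid file size: expected 21 got 0\n"
    "Fatal: There were 1 errors\n"
)
print(is_verify_error(stderr))  # True
```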


The restore resource is displaying this error:

status:
  completionTimestamp: "2021-09-17T12:49:24Z"
  errors: 1
  phase: PartiallyFailed
  progress:
    itemsRestored: 6
    totalItems: 6
  startTimestamp: "2021-09-17T12:49:07Z"
  warnings: 2


Describing the restore resource shows this error:

Errors:
  Velero:  pod volume restore failed: error restoring volume: error running restic restore, cmd=restic restore --repo=s3:s3-us-east-2.amazonaws.com/camreplication/velero/restic/ocp-30240-datavalidation --password-file=/tmp/credentials/openshift-migration/velero-restic-credentials-repository-password --cache-dir=/scratch/.cache/restic 8b2700ff --target=. --delete --verify, stdout=restoring <Snapshot 8b2700ff of [/host_pods/fecb458b-17b4-11ec-97b2-0ef8128329e3/volumes/kubernetes.io~aws-ebs/pvc-feb97997-17b4-11ec-97b2-0ef8128329e3] at 2021-09-17 12:49:03.908299474 +0000 UTC by root@velero> to .
verifying files in .
, stderr=ignoring error for /index.html: Invalid file size: expected 21 got 0
Fatal: There were 1 errors
 
: exit status 1
  Cluster:    <none>
  Namespaces: <none>



The restic pod log shows this error:

time="2021-09-17T12:47:06Z" level=info msg="Ran command=restic restore --repo=s3:s3-us-east-2.amazonaws.com/camreplication/velero/restic/ocp-30240-datavalidation --password-file=/tmp/credentials/openshift-migration/velero-restic-credentials-repository-password --cache-dir=/scratch/.cache/restic 35cc7b9a --target=. --delete --verify, stdout=restoring <Snapshot 35cc7b9a of [/host_pods/fecb458b-17b4-11ec-97b2-0ef8128329e3/volumes/kubernetes.io~aws-ebs/pvc-feb97997-17b4-11ec-97b2-0ef8128329e3] at 2021-09-17 12:46:22.060127014 +0000 UTC by root@velero> to .\nverifying files in .\nfinished verifying 1 files in .\n, stderr=" controller=pod-volume-restore logSource="pkg/controller/pod_volume_restore_controller.go:399" name=ocp-30240-datavalidation-mig-1631882720-stage-phas-stage-zm5drc namespace=openshift-migration restore=openshift-migration/ocp-30240-datavalidation-mig-1631882720-stage-phas-stage-zfqfh
time="2021-09-17T12:47:06Z" level=info msg="Restore completed" controller=pod-volume-restore logSource="pkg/controller/pod_volume_restore_controller.go:323" name=ocp-30240-datavalidation-mig-1631882720-stage-phas-stage-zm5drc namespace=openshift-migration restore=openshift-migration/ocp-30240-datavalidation-mig-1631882720-stage-phas-stage-zfqfh
time="2021-09-17T12:47:06Z" level=info msg="Restore starting" controller=pod-volume-restore logSource="pkg/controller/pod_volume_restore_controller.go:282" name=ocp-30240-datavalidation-mig-1631882720-stage-phas-stage-z9gz8w namespace=openshift-migration restore=openshift-migration/ocp-30240-datavalidation-mig-1631882720-stage-phas-stage-zfqfh
time="2021-09-17T12:47:08Z" level=info msg="Ran command=restic restore --repo=s3:s3-us-east-2.amazonaws.com/camreplication/velero/restic/ocp-30240-datavalidation --password-file=/tmp/credentials/openshift-migration/velero-restic-credentials-repository-password --cache-dir=/scratch/.cache/restic 2cb7df66 --target=. --delete --verify, stdout=restoring <Snapshot 2cb7df66 of [/host_pods/fecb458b-17b4-11ec-97b2-0ef8128329e3/volumes/kubernetes.io~aws-ebs/pvc-feb4a4b2-17b4-11ec-97b2-0ef8128329e3] at 2021-09-17 12:46:19.488067273 +0000 UTC by root@velero> to .\nverifying files in .\nfinished verifying 2 files in .\n, stderr=" controller=pod-volume-restore logSource="pkg/controller/pod_volume_restore_controller.go:399" name=ocp-30240-datavalidation-mig-1631882720-stage-phas-stage-z9gz8w namespace=openshift-migration restore=openshift-migration/ocp-30240-datavalidation-mig-1631882720-stage-phas-stage-zfqfh
time="2021-09-17T12:47:08Z" level=info msg="Restore completed" controller=pod-volume-restore logSource="pkg/controller/pod_volume_restore_controller.go:323" name=ocp-30240-datavalidation-mig-1631882720-stage-phas-stage-z9gz8w namespace=openshift-migration restore=openshift-migration/ocp-30240-datavalidation-mig-1631882720-stage-phas-stage-zfqfh
time="2021-09-17T12:49:20Z" level=info msg="Restore starting" controller=pod-volume-restore logSource="pkg/controller/pod_volume_restore_controller.go:282" name=ocp-30240-datavalidation-mig-1631882901-stage-jzq7g-5mmnv namespace=openshift-migration restore=openshift-migration/ocp-30240-datavalidation-mig-1631882901-stage-jzq7g
time="2021-09-17T12:49:22Z" level=error msg="Error restoring volume" controller=pod-volume-restore error="error running restic restore, cmd=restic restore --repo=s3:s3-us-east-2.amazonaws.com/camreplication/velero/restic/ocp-30240-datavalidation --password-file=/tmp/credentials/openshift-migration/velero-restic-credentials-repository-password --cache-dir=/scratch/.cache/restic 8b2700ff --target=. --delete --verify, stdout=restoring <Snapshot 8b2700ff of [/host_pods/fecb458b-17b4-11ec-97b2-0ef8128329e3/volumes/kubernetes.io~aws-ebs/pvc-feb97997-17b4-11ec-97b2-0ef8128329e3] at 2021-09-17 12:49:03.908299474 +0000 UTC by root@velero> to .\nverifying files in .\n, stderr=ignoring error for /index.html: Invalid file size: expected 21 got 0\nFatal: There were 1 errors\n\n: exit status 1" error.file="/go/src/github.com/vmware-tanzu/velero/pkg/controller/pod_volume_restore_controller.go:397" error.function="github.com/vmware-tanzu/velero/pkg/controller.(*podVolumeRestoreController).restorePodVolume" logSource="pkg/controller/pod_volume_restore_controller.go:310" name=ocp-30240-datavalidation-mig-1631882901-stage-jzq7g-5mmnv namespace=openshift-migration restore=openshift-migration/ocp-30240-datavalidation-mig-1631882901-stage-jzq7g
time="2021-09-17T12:49:22Z" level=info msg="Restore starting" controller=pod-volume-restore logSource="pkg/controller/pod_volume_restore_controller.go:282" name=ocp-30240-datavalidation-mig-1631882901-stage-jzq7g-j2nxn namespace=openshift-migration restore=openshift-migration/ocp-30240-datavalidation-mig-1631882901-stage-jzq7g
time="2021-09-17T12:49:24Z" level=info msg="Ran command=restic restore --repo=s3:s3-us-east-2.amazonaws.com/camreplication/velero/restic/ocp-30240-datavalidation --password-file=/tmp/credentials/openshift-migration/velero-restic-credentials-repository-password --cache-dir=/scratch/.cache/restic e3a4a92a --target=. --delete --verify, stdout=restoring <Snapshot e3a4a92a of [/host_pods/fecb458b-17b4-11ec-97b2-0ef8128329e3/volumes/kubernetes.io~aws-ebs/pvc-feb4a4b2-17b4-11ec-97b2-0ef8128329e3] at 2021-09-17 12:49:01.578486552 +0000 UTC by root@velero> to .\nverifying files in .\nfinished verifying 2 files in .\n, stderr=" controller=pod-volume-restore logSource="pkg/controller/pod_volume_restore_controller.go:399" name=ocp-30240-datavalidation-mig-1631882901-stage-jzq7g-j2nxn namespace=openshift-migration restore=openshift-migration/ocp-30240-datavalidation-mig-1631882901-stage-jzq7g
time="2021-09-17T12:49:24Z" level=info msg="Restore completed" controller=pod-volume-restore logSource="pkg/controller/pod_volume_restore_controller.go:323" name=ocp-30240-datavalidation-mig-1631882901-stage-jzq7g-j2nxn namespace=openshift-migration restore=openshift-migration/ocp-30240-datavalidation-mig-1631882901-stage-jzq7g


Expected results:
The migmigration resource should display a `ResticVerifyErrors` warning instead of a `VeleroStageRestorePartiallyFailed` warning.
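The expected condition can be sketched as follows. The field names match the condition format shown elsewhere in this report; the `restic_verify_condition` helper itself is hypothetical, not MTC's actual code:

```python
# Sketch: compose the expected ResticVerifyErrors warning condition from
# the number of failed pod volume restores and the restore resource name.
def restic_verify_condition(failed_restores: int, restore_name: str) -> dict:
    return {
        "category": "Warn",
        "durable": True,
        "status": "True",
        "type": "ResticVerifyErrors",
        "message": (
            f"There were verify errors found in {failed_restores} Restic "
            f"volume restores. See restore `{restore_name}` for details"
        ),
    }

cond = restic_verify_condition(1, "example-stage-restore")
print(cond["type"])  # ResticVerifyErrors
```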

Additional info:

Comment 1 Jaydip Gabani 2022-01-24 18:54:29 UTC
This PR fixes the issue: https://github.com/openshift/velero/pull/147

Comment 5 Prasad Joshi 2022-02-22 14:19:35 UTC
Verified with MTC 1.7.0 (pre-stage)

metadata_nvr: openshift-migration-operator-metadata-container-v1.7.0-33

image: registry.redhat.io/rhmtc/openshift-migration-controller-rhel8@sha256:c08211e7fa12e8a12ea07ed0f31bf24bdd4cc95b166a34d826e3240a0534e622


The migmigration resource is displaying the correct warning.

$ oc get migmigration ocp-30240-datavalidation-mig-1645536593 -o yaml

  - category: Warn
    durable: true
    lastTransitionTime: "2022-02-22T13:32:27Z"
    message: There were verify errors found in 1 Restic volume restores. See restore `ocp-30240-datavalidation-mig-1645536593-stage-hqzds` for details
    status: "True"
    type: ResticVerifyErrors

Moving status to Verified.

Comment 7 errata-xmlrpc 2022-03-24 06:32:27 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Migration Toolkit for Containers (MTC) 1.7.0 release advisory), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:1043

