Bug 1912246 - migration did not check the data digest when choose "Verify Copy" in direct volume migration
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Migration Toolkit for Containers
Classification: Red Hat
Component: General
Version: 1.4.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 1.4.0
Assignee: Dylan Murray
QA Contact: Xin jiang
Docs Contact: Avital Pinnick
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-01-04 09:27 UTC by whu
Modified: 2023-09-15 00:56 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-11 12:55:27 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2020:5329 0 None None None 2021-02-11 12:55:45 UTC

Description whu 2021-01-04 09:27:38 UTC
Description of problem:
In direct volume migration mode with the "Verify Copy" option checked, migrating an application whose volume data was corrupted during the migration did not produce `ResticVerifyErrors`.


Version-Release number of selected component (if applicable):
MTC 1.4.0
source cluster: ocp 4.4 aws
target cluster: ocp 4.7 aws

How reproducible:

Steps to Reproduce:
1. In the source cluster, deploy an nginx application that uses a PVC:
# oc process -f https://gitlab.cee.redhat.com/app-mig/cam-helper/raw/master/ocp-30240/nginx_with_pv_defaultsc_template.yml  -p NAMESPACE=ocp-30240-datavalidation | oc create -f -

2. Create a Migration Plan with default values. Check "Verify Copy" in the "Copy options" screen of the Persistent Volumes, and check "Use direct PV migration for filesystem copies" in Migration Options.

3. In the target cluster, create a pod that tries to corrupt the volume data:

cat <<EOF | oc create -f -
  apiVersion: v1
  kind: Pod
  metadata:
    name: pod-test
    namespace: ocp-30240-datavalidation
  spec:
    containers:
    - name: podtest
      image: alpine
      command: [ "/bin/sh", "-c", "--" ]
      args: [ "while true; do >/data/vol/index.html; done;" ]
      volumeMounts:
      - name: testvolume
        mountPath: /data/vol
    volumes:
    - name: testvolume
      persistentVolumeClaim:
        claimName: nginx-html
EOF

4. Trigger the migration and look for `ResticVerifyErrors` in the MigMigration resource.
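
A minimal sketch of how the check in step 4 could be scripted, assuming the MigMigration resource has been exported as JSON (for example with `oc get migmigration <name> -o json`) and assuming `ResticVerifyErrors` shows up as the `type` of an entry in `status.conditions`. The helper name `has_condition` and the trimmed sample document are hypothetical and only for illustration.

```python
import json


def has_condition(migmigration: dict, condition_type: str) -> bool:
    """Return True if status.conditions holds an entry of the given type."""
    conditions = migmigration.get("status", {}).get("conditions") or []
    return any(c.get("type") == condition_type for c in conditions)


# Hypothetical, heavily trimmed MigMigration document (not real output).
sample = json.loads("""
{
  "kind": "MigMigration",
  "status": {"conditions": [{"type": "Succeeded", "status": "True"}]}
}
""")

# For the scenario in this bug the check comes back negative,
# because no such condition was set on the resource.
found = has_condition(sample, "ResticVerifyErrors")
print(found)
```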

Actual results:
There was no `ResticVerifyErrors` in the MigMigration resource.

Expected results:
There should be a `ResticVerifyErrors` in the MigMigration resource.

Comment 1 Erik Nelson 2021-01-05 17:06:09 UTC
Is this test one that correctly corrupted the data and threw the ResticVerifyError in the past? Curious if that test is actually sufficient or not. Scott, any thoughts on that?

Comment 2 Scott Seago 2021-01-05 19:47:33 UTC
Yeah, I'm not sure what's going on here. Is the implication that restic is logging errors and we're not catching it? If so, then this is a bug against that. If the implication is that there are errors that neither restic nor MTC are catching, then this is unrelated to that and a completely new request. If we're being asked to look for and report on errors that restic is not already reporting, then this feels like something that's certainly out of scope for 1.4.0.

Basically, if this is not verified as a regression, we should probably push it post-1.4.0.

Comment 3 Xin jiang 2021-01-06 08:01:24 UTC
The issue is that the migration doesn't check the checksums of the source files when "Verify copy" is selected to verify data migrated with Filesystem copy.

Comment 4 Xin jiang 2021-01-06 08:17:24 UTC
For indirect migration, it does work. But for direct migration, it seems the checksum check for each of the source files is missing.

Comment 5 Scott Seago 2021-01-06 14:28:57 UTC
Ahh, yes. So the issue is that this is a feature that we have in indirect migration but has not (yet) been implemented in direct migration. Got it.

Comment 6 Dylan Murray 2021-01-13 21:06:58 UTC
For direct migration with rsync, I have a short-term solution that *should* catch this test scenario as described in the bug but we will want to include a future enhancement down the road.

Rsync performs transfer-level checksum verification out of the box, meaning that for every transfer it verifies a checksum to ensure the data wasn't corrupted in transit. Rsync also exposes a `--checksum` option, which compares checksums of files on the source with checksums of files on the destination and copies any file whose checksums differ so that the two match. This is different from a high-level "post-transfer" verification, where we would compute a checksum of the PV directory itself on the source and on the destination and compare the two. To do this, we would need to enhance the DVM transfer workflow to compare checksums of the two PVCs from source and destination after rsync has completed.
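
To make the "post-transfer" idea concrete, here is a rough sketch that hashes every file under two directory trees and reports the paths that differ. It is only an illustration under simplifying assumptions: both trees are reachable from one process, whereas in DVM the two PVCs live on different clusters, so the digests would have to be computed on each side and then compared. The function names `tree_digests` and `diff_trees` are hypothetical and not part of MTC.

```python
from __future__ import annotations

import hashlib
import tempfile
from pathlib import Path


def tree_digests(root: Path) -> dict[str, str]:
    """Map each regular file (path relative to root) to its SHA-256 hex digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 1 MiB chunks so large files are not loaded at once.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[path.relative_to(root).as_posix()] = h.hexdigest()
    return digests


def diff_trees(source: Path, destination: Path) -> list[str]:
    """Return relative paths missing on one side or with differing contents."""
    src, dst = tree_digests(source), tree_digests(destination)
    return sorted(p for p in src.keys() | dst.keys() if src.get(p) != dst.get(p))


# Small demo: identical copies first, then the destination file is truncated,
# similar to what the corrupting pod in the reproduction steps does.
with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    (Path(a) / "index.html").write_text("hello")
    (Path(b) / "index.html").write_text("hello")
    matching = diff_trees(Path(a), Path(b))
    (Path(b) / "index.html").write_text("")
    corrupted = diff_trees(Path(a), Path(b))
```

An empty result would mean the two trees hold the same file contents; any listed path is a candidate for a verification error.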

This latter approach will require significant changes, so I am proposing that for 1.4.0 the "verify" flag adds the `--checksum` flag to rsync, giving some additional checksum comparison over the default transfer-level checksums. We can revisit this in a future release to add further checksum comparisons.
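
The short-term proposal amounts to letting the verify option decide whether `--checksum` is appended to the rsync arguments. The sketch below shows that shape only; the function name `build_rsync_args`, its parameters, and the base option list (a subset of the options visible in the controller log later in this bug) are assumptions for illustration, and this is not the mig-controller code, which is written in Go.

```python
from typing import List, Optional


def build_rsync_args(source: str, destination: str, verify: bool = False,
                     bwlimit: Optional[int] = None) -> List[str]:
    """Assemble an rsync argument list, adding --checksum only when verify is set."""
    args = ["rsync"]
    if bwlimit is not None:
        args.append(f"--bwlimit={bwlimit}")
    args += ["--archive", "--delete", "--recursive", "--hard-links", "--partial"]
    if verify:
        # With --checksum rsync decides what to transfer by comparing file
        # checksums instead of relying on size and modification time.
        args.append("--checksum")
    args += [source, destination]
    return args


# Hypothetical source path and destination URL.
with_verify = build_rsync_args("/mnt/src/", "rsync://example-host/dst", verify=True)
without_verify = build_rsync_args("/mnt/src/", "rsync://example-host/dst")
```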

Comment 7 Erik Nelson 2021-01-14 01:23:28 UTC
Tracking the long term changes here as a 1.4.z candidate: https://issues.redhat.com/browse/MIG-504

Comment 8 Dylan Murray 2021-01-18 14:13:18 UTC
https://github.com/konveyor/mig-controller/pull/890

Comment 9 Dylan Murray 2021-01-19 14:55:06 UTC
https://github.com/konveyor/mig-operator/pull/553

Comment 13 Sergio 2021-01-25 15:42:54 UTC
Verified using MTC 1.4.0. AWS 3.11 -> AWS 4.5 (AWS S3)

openshift-migration-rhel7-operator@sha256:79f524931e7188bfbfddf1e3d23f491b627d691ef7849a42432c7aec2d5f8a54
    - name: MIG_CONTROLLER_REPO
      value: openshift-migration-controller-rhel8@sha256
    - name: MIG_CONTROLLER_TAG
      value: cdf1bd56e353f076693cb7373c0a876be8984593d664ee0d7e1aeae7a3c54c1f

When we check "Validate data" in the migration, the rsync command is executed with the --checksum flag.

We can see that in the migration-controller pod's logs. For instance, this is a command run with validate data and a limited rate:

{"level":"info","ts":1611587858.6720047,"logger":"direct|tqqph","msg":"Using Rsync command [rsync --bwlimit=2000 --archive --delete --recursive --hard-links --partial --info=COPY2,DEL2,REMOVE2,SKIP2,FLIST2,PROGRESS2,STATS2 --human-readable --port 2222 --log-file /dev/stdout --checksum /mnt/ocp-30240-datavalidation/nginx-html/ rsync://root.78.48/nginx-html]","direct":"openshift-migration/34387700-5f20-11eb-b0ca-a524f44d2dff-b5r2p"}
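
A small sketch of how such a log line could be checked automatically, assuming the controller keeps emitting one JSON object per line with the command inside square brackets in the `msg` field, as in the line above. The helper name `rsync_flags_from_log` and the shortened sample line (with placeholder paths and host) are hypothetical.

```python
import json
import shlex


def rsync_flags_from_log(line: str) -> list:
    """Extract the long options (tokens starting with --) from a logged rsync command."""
    entry = json.loads(line)
    msg = entry["msg"]
    # The command sits between the first "[" and the last "]" of the message.
    command = msg[msg.index("[") + 1: msg.rindex("]")]
    return [tok for tok in shlex.split(command) if tok.startswith("--")]


# Shortened, made-up log line in the same shape as the one quoted above.
log_line = (
    '{"level":"info","msg":"Using Rsync command '
    '[rsync --bwlimit=2000 --archive --port 2222 --checksum '
    '/mnt/src/ rsync://example-host/dst]"}'
)
flags = rsync_flags_from_log(log_line)
checksum_enabled = "--checksum" in flags
```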


Moved to VERIFIED.

Comment 15 errata-xmlrpc 2021-02-11 12:55:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Migration Toolkit for Containers (MTC) tool image release advisory 1.4.0), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5329

Comment 16 Red Hat Bugzilla 2023-09-15 00:56:57 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days

