Description of problem:
Data is sometimes inconsistent after executing blockcommit/blockpull/blockcopy. It is easier to reproduce when the operation is run for the first time against a newly created image.
Version-Release number of selected component (if applicable):
libvirt-9.3.0-2.el9.s390x
qemu-kvm-8.0.0-5.el9.s390x
How reproducible:
40%
Steps to Reproduce:
1. Prepare an s390x guest XML: rhel.xml.
2. Prepare a script.
# cat test.sh
VM=rhel
qemu-img create -f qcow2 /var/lib/libvirt/images/test.qcow2 500M
virsh define rhel.xml
virsh start $VM
for i in {1..4}; do
virsh snapshot-create-as $VM snap$i --disk-only --diskspec vdb,file=/var/lib/libvirt/images/test.snap$i --diskspec vda,snapshot=no --no-metadata
done
echo "------Get the hash value before blockpull------"
virsh console $VM
virsh blockpull $VM vdb --wait --verbose
echo "------Get the hash value after blockpull------"
virsh console $VM
echo "------Clear up------"
virsh undefine $VM
virsh destroy $VM
rm -rf /var/lib/libvirt/images/test.snap*
rm -rf /var/lib/libvirt/images/test.qcow2
3. Run the script in a loop and compare the hash values (output of the passing runs is elided).
# for i in {1..5}; do echo "------$i------"; sh test.sh; done
------1------
......
[root@localhost ~]# diff 1 2
[root@localhost ~]#
------Clear up------
Domain 'rhel' has been undefined
Domain 'rhel' destroyed
------2------
Formatting '/var/lib/libvirt/images/test.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=524288000 lazy_refcounts=off refcount_bits=16
Domain 'rhel' defined from rhel.xml
Domain 'rhel' started
Domain snapshot snap1 created
Domain snapshot snap2 created
Domain snapshot snap3 created
Domain snapshot snap4 created
------Get the hash value before blockpull------
Connected to domain 'rhel'
Escape character is ^] (Ctrl + ])
Red Hat Enterprise Linux 9.3 Beta (Plow)
Kernel 5.14.0-329.el9.s390x on an s390x
localhost login: root
Password:
Last login: Tue Jun 20 04:39:16 on ttysclp0
[root@localhost ~]# sha256sum /dev/vdb >1
[root@localhost ~]#
Block Pull: [100 %]
Pull complete
------Get the hash value after blockpull------
Connected to domain 'rhel'
Escape character is ^] (Ctrl + ])
[root@localhost ~]# sha256sum /dev/vdb >2
[root@localhost ~]# diff 1 2 ------>This time we get inconsistent data
1c1
< 45e03bb494c8c52909cfa0c331321d3eebcd45fb23e46075138d529a4b87e4f6 /dev/vdb
---
> 9c4998a4b6a77451dd135c0f9013b706079b4013868e1b23e603209cb7ae09fa /dev/vdb
[root@localhost ~]#
------Clear up------
Domain 'rhel' has been undefined
Domain 'rhel' destroyed
------3------
Formatting '/var/lib/libvirt/images/test.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=524288000 lazy_refcounts=off refcount_bits=16
Domain 'rhel' defined from rhel.xml
Domain 'rhel' started
Domain snapshot snap1 created
Domain snapshot snap2 created
Domain snapshot snap3 created
Domain snapshot snap4 created
------Get the hash value before blockpull------
Connected to domain 'rhel'
Escape character is ^] (Ctrl + ])
Red Hat Enterprise Linux 9.3 Beta (Plow)
Kernel 5.14.0-329.el9.s390x on an s390x
localhost login: root
Password:
Last login: Tue Jun 20 04:39:51 on ttysclp0
[root@localhost ~]# sha256sum /dev/vdb >1
[root@localhost ~]#
Block Pull: [100 %]
Pull complete
------Get the hash value after blockpull------
Connected to domain 'rhel'
Escape character is ^] (Ctrl + ])
[root@localhost ~]# sha256sum /dev/vdb >2
[root@localhost ~]# diff 1 2 ------>This time we get inconsistent data
1c1
< 71714bb9fa17a40e549479b064aaeb21f6eda64c4332d3b6587783f032ebb00d /dev/vdb
---
> a43f8af3d1d892abe2ccf2a3e4416e5e8cf09a06c75cc06e52d6073fb150515c /dev/vdb
[root@localhost ~]#
------Clear up------
Domain 'rhel' has been undefined
Domain 'rhel' destroyed
------4------
......
[root@localhost ~]# diff 1 2
[root@localhost ~]#
------Clear up------
Domain 'rhel' has been undefined
Domain 'rhel' destroyed
------5------
......
[root@localhost ~]# diff 1 2
[root@localhost ~]#
------Clear up------
Domain 'rhel' has been undefined
Domain 'rhel' destroyed
Actual results:
The data is sometimes inconsistent after a block operation.
Expected results:
The data is always consistent.
Additional info: The same test passes on x86_64.
While a change in the hash shows that the disk content differs, it is almost impossible to tell from the hash what actually caused the difference.
Since you are running an OS, I presume that the 'vdb' disk doesn't have a filesystem on it. Otherwise any metadata change the operating system makes to the filesystem would invalidate the hash.
Now, to see where on the disk the change happened, please retry (I don't have access to s390) with the following steps:
1) start the test as before
2) instead of taking a hash of 'vdb', use 'dd' to create a full image and store it on a different disk (e.g. dd if=/dev/vdb of=/tmp/image)
3) run whichever blockjob caused the problem
4) compare the image with the block device: (cmp -l /tmp/image /dev/vdb)
That way it will print what actually changed between the two points in time, including the offset and value of every differing byte (note that cmp -l prints the offsets as 1-based decimal byte numbers and the byte values in octal).
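The raw cmp -l output can be very long when many bytes differ, so a summary may be easier to read. The following is a minimal sketch only: summarize_diff is a hypothetical helper name, and the paths in the usage note are the ones from the steps above.

```shell
#!/bin/sh
# Hypothetical helper: condense `cmp -l` output for two files or devices.
# cmp -l prints one line per differing byte:
#   <1-based decimal byte number> <octal value in A> <octal value in B>
summarize_diff() {
    # Bail out early so an unreadable input is not reported as "identical".
    [ -r "$1" ] && [ -r "$2" ] || return 2
    # cmp exits non-zero when the inputs differ; that is expected here.
    cmp -l "$1" "$2" 2>/dev/null | awk '
        { n++; if (n == 1) first = $1; last = $1 }
        END {
            if (n == 0) { print "identical"; exit }
            # Convert the 1-based byte numbers to 0-based offsets.
            printf "%d differing bytes, offsets %d..%d\n", n, first - 1, last - 1
        }'
}
```

Inside the guest this would be called as summarize_diff /tmp/image /dev/vdb after the blockjob has finished. If the two inputs differ in length, the extra tail is not counted, because cmp stops at the end of the shorter one.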
When using dd to create a full image, I sometimes get:
# dd if=/dev/vdb of=/tmp/image
dd: writing to '/tmp/image': No space left on device
14367481+0 records in
14367480+0 records out
7356149760 bytes (7.4 GB, 6.9 GiB) copied, 25.8914 s, 284 MB/s
After debugging, I found that this happens when vdb is the bootable disk image, which has filesystems on it, rather than the disk we want to test. We never considered this because we had not tested on s390x before, and the position of the vdb disk is stable on x86_64. That is why our automated s390x jobs failed randomly.
Based on this, we need to update our automation scripts and test again. If the test passes, I will close this bug.
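One possible way to harden the automation, shown here only as a sketch, is to check the size of the device before hashing it. The expected value 524288000 is the size qemu-img reports for the 500M test image in the log above; size_bytes and is_test_disk are hypothetical helper names, and this check assumes the boot disk has a different size than the test disk.

```shell
#!/bin/sh
# Hypothetical guard: only hash a disk whose size matches the test image.
EXPECTED_BYTES=524288000   # "size=524288000" from the qemu-img create output

size_bytes() {
    if [ -b "$1" ]; then
        blockdev --getsize64 "$1"     # block device: size in bytes
    else
        wc -c < "$1" | tr -d ' '      # regular file, handy for dry runs
    fi
}

is_test_disk() {
    # $1 = device or file, $2 = optional expected size in bytes
    [ "$(size_bytes "$1")" = "${2:-$EXPECTED_BYTES}" ]
}
```

In the guest, the hash step could then become: is_test_disk /dev/vdb && sha256sum /dev/vdb >1, so that a run in which vdb is the boot disk is skipped instead of reported as a data mismatch.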
Thanks for your contribution.
The naming of /dev/vd* can indeed be rather random. If you want stable names, it is better to use /dev/disk/by-path/... or /dev/disk/by-uuid/... instead.
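To see which kernel name each stable name currently points to, the symlinks can be resolved inside the guest. This is a small sketch; list_stable_names is a hypothetical helper, and the directory defaults to /dev/disk/by-path.

```shell
#!/bin/sh
# Hypothetical helper: print "stable-name -> kernel-device" for every
# symlink in a directory (default: /dev/disk/by-path).
list_stable_names() {
    dir=${1:-/dev/disk/by-path}
    for link in "$dir"/*; do
        [ -L "$link" ] || continue    # skip non-symlinks and an unmatched glob
        printf '%s -> %s\n' "$link" "$(readlink -f "$link")"
    done
}
```

Running list_stable_names before and after a reboot shows whether the vd* assignment changed. A test script can then hash the by-path name of the test disk directly instead of /dev/vdb.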
Thanks for everyone's comments. According to the verification results, this is indeed not a bug; it is caused by the guest disk naming. Closing this bug as NOTABUG.