Bug 1678979

Summary: qemu-img convert abort when converting image with unaligned size (qemu-img: block/io.c:2134: bdrv_co_block_status: Assertion `*pnum && (((*pnum) % (align)) == 0) && align > offset - aligned_offset\' failed)
Product: Red Hat Enterprise Linux 8 Reporter: Tingting Mao <timao>
Component: qemu-kvmAssignee: Hanna Czenczek <hreitz>
Status: CLOSED ERRATA QA Contact: Tingting Mao <timao>
Severity: medium Docs Contact:
Priority: medium    
Version: 8.1CC: aefrat, areis, bugs, chayang, coli, ddepaula, ebenahar, eblake, eshames, jferlan, jsnow, juzhang, knoel, mtessun, nsoffer, rbalakri, timao, tnisan, virt-maint
Target Milestone: rcKeywords: Regression, Reopened
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: qemu-kvm-2.12.0-83.module+el8.1.0+3852+0ba8aef0 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1649788 Environment:
Last Closed: 2019-11-05 20:47:34 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Tingting Mao 2019-02-20 03:29:08 UTC
+++ This bug was initially created as a clone of Bug #1649788 +++

Description of problem:
Ran Import a VM with a memory snapshot to export domain via automation(meaning RESTAPI) fails with the following errors:

Engine:
2018-11-14 12:25:07,968+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-159) [vms_syncAction_3c7c04cf-e505-4ec4] EVENT_ID: IMPORTEXPORT_STARTING_EXPORT_VM(1,162), Starting export Vm vm_TestCase5134_1412135908 to export_domain


2018-11-14 12:25:10,343+02 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-89) [vms_syncAction_3c7c04cf-e505-4ec4] Command 'ExportVm' (i
d: '5c8e3683-b5ad-4132-bf1b-93bf9f43f3a0') waiting on child command id: '43f44a09-b4fb-4209-874a-ba0ffa42f810' type:'CopyImageGroup' to complete
2018-11-14 12:25:17,941+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-41) [] Failed in 'HSMGetAllTasksStatusesVDS' method
2018-11-14 12:25:17,950+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-41) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM h
ost_mixed_2 command HSMGetAllTasksStatusesVDS failed: low level Image copy failed: ()
2018-11-14 12:25:17,951+02 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-41) [] SPMAsyncTask::PollTask: Polling task '211b887c-bd55-44e8-97f0-1174ed34de60' 
(Parent Command 'CopyImageGroup', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned status 'finished', result 'success'.
2018-11-14 12:25:17,955+02 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-41) [] BaseAsyncTask::onTaskEndSuccess: Task '211b887c-bd55-44e8-97f0-1174ed34de60'
 (Parent Command 'CopyImageGroup', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended successfully.
2018-11-14 12:25:17,956+02 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-41) [] CommandAsyncTask::endActionIfNecessary: All tasks of command '54be2d09-6
671-4f42-bde1-a0065b0cf60f' has ended -> executing 'endAction'
2018-11-14 12:25:17,956+02 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-41) [] CommandAsyncTask::endAction: Ending action for '1' tasks (command ID: '5
4be2d09-6671-4f42-bde1-a0065b0cf60f'): calling endAction '.
2018-11-14 12:25:17,957+02 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-41) [] SPMAsyncTask::PollTask: Polling task '846e40c9-0971-4c05-be2a-1379d4dc2372' 
(Parent Command 'CopyImageGroup', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned status 'finished', result 'cleanSuccess'.
2018-11-14 12:25:17,959+02 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-302923) [] CommandAsyncTask::endCommandAction [within thread] context: Attempting to end
Action 'CopyImageGroup',
2018-11-14 12:25:17,968+02 ERROR [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-41) [] BaseAsyncTask::logEndTaskFailure: Task '846e40c9-0971-4c05-be2a-1379d4dc2372
' (Parent Command 'CopyImageGroup', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended with failure:
-- Result: 'cleanSuccess'
-- Message: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = low level Image copy failed: (), code = 261',
-- Exception: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = low level Image copy failed: (), code = 261'

VDSM - host_mixed_2 :
2018-11-14 12:25:06,691+0200 INFO  (jsonrpc/1) [vdsm.api] START moveImage(spUUID=u'0bbd940f-5c1a-4a04-b936-e76fe5ee71d8', srcDomUUID=u'4b7f9443-a53e-4655-8988-af46ee26be3f', dstDomUUID=u'1dc0cc6f-3fd7-47ce-beb0-
fe88e0464b38', imgUUID=u'4dd2f711-3378-4ac4-8b27-6662fcf65148', vmUUID='', op=1, postZero=u'false', force=u'false', discard=False) from=::ffff:10.35.162.7,46758, flow_id=vms_syncAction_3c7c04cf-e505-4ec4, task_i
d=846e40c9-0971-4c05-be2a-1379d4dc2372 (api:48)

2018-11-14 12:25:14,173+0200 DEBUG (tasks/1) [storage.operation] FAILED: <err> = bytearray(b"qemu-img: block/io.c:2134: bdrv_co_block_status: Assertion `*pnum && (((*pnum) % (align)) == 0) && align > offset - al
igned_offset\' failed.\n"); <rc> = -6 (operation:169)
2018-11-14 12:25:14,174+0200 ERROR (tasks/1) [storage.Image] Copy image error: image=4dd2f711-3378-4ac4-8b27-6662fcf65148, src domain=4b7f9443-a53e-4655-8988-af46ee26be3f, dst domain=1dc0cc6f-3fd7-47ce-beb0-fe88e0464b38 (image:498)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/image.py", line 489, in _interImagesCopy
    self._run_qemuimg_operation(operation)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/image.py", line 113, in _run_qemuimg_operation
    operation.run()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/qemuimg.py", line 337, in run
    for data in self._operation.watch():
  File "/usr/lib/python2.7/site-packages/vdsm/storage/operation.py", line 104, in watch
    self._finalize(b"", err)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/operation.py", line 178, in _finalize
    raise cmdutils.Error(self._cmd, rc, out, err)
Error: Command ['/usr/bin/taskset', '--cpu-list', '0-0', '/usr/bin/nice', '-n', '19', '/usr/bin/ionice', '-c', '3', '/usr/bin/qemu-img', 'convert', '-p', '-t', 'none', '-T', 'none', '-f', 'raw', u'/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge8__nfs__2/4b7f9443-a53e-4655-8988-af46ee26be3f/images/4dd2f711-3378-4ac4-8b27-6662fcf65148/c0a512d6-08f9-4c8c-a41b-cd1ae55b441f', '-O', 'raw', '-o', 'preallocation=falloc', u'/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_export__domain__storage__local__ge8/1dc0cc6f-3fd7-47ce-beb0-fe88e0464b38/images/4dd2f711-3378-4ac4-8b27-6662fcf65148/c0a512d6-08f9-4c8c-a41b-cd1ae55b441f'] failed with rc=-6 out='' err=bytearray(b"qemu-img: block/io.c:2134: bdrv_co_block_status: Assertion `*pnum && (((*pnum) % (align)) == 0) && align > offset - aligned_offset\' failed.\n")


2018-11-14 12:25:14,177+0200 ERROR (tasks/1) [storage.TaskManager.Task] (Task='846e40c9-0971-4c05-be2a-1379d4dc2372') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 336, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 1665, in moveImage
    force, discard)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/image.py", line 545, in move
    self._interImagesCopy(destDom, srcSdUUID, imgUUID, chains)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/image.py", line 499, in _interImagesCopy
    raise se.CopyImageError()
CopyImageError: low level Image copy failed: ()


Version-Release number of selected component (if applicable):
Engine: 4.3.0-0.0.master.20181101091940.git61310aa.el7
VDSM :  4.30.1-25.gitce9e416

How reproducible:
Occurred once so far running automation TestCase5134

Steps to Reproduce:
12:13:59 2018-11-14 12:13:59,088 INFO       Test Setup   1: Creating VM vm_TestCase5134_1412135908
12:14:23 2018-11-14 12:14:22,988 INFO       Test Setup   2: Creating snapshot snap_TestCase5134_1412142298 of VM vm_TestCase5134_1412135908
12:14:23 2018-11-14 12:14:22,991 INFO       Test Setup   3: [class] Add snapshot to VM vm_TestCase5134_1412135908 with {'persist_memory': False, 'description': 'snap_TestCase5134_1412142298', 'wait': True}
12:14:39 2018-11-14 12:14:39,100 INFO       Test Setup   4: Starting VM vm_TestCase5134_1412135908
12:14:39 2018-11-14 12:14:39,103 INFO       Test Setup   5: [class] Start VM vm_TestCase5134_1412135908 with {'wait_for_ip': False, 'pause': False, 'use_cloud_init': False, 'timeout': 600, 'wait_for_status': 'up'}
12:15:50 2018-11-14 12:15:50,673 INFO       Test Step   6: Starting cat process on VM vm_TestCase5134_1412135908
12:24:27 2018-11-14 12:24:27,576 INFO       Test Step   7: Creating snapshot snap_TestCase5134_1412155067 with RAM state
12:24:27 2018-11-14 12:24:27,580 INFO       Test Setup   8: [function] Add snapshot to VM vm_TestCase5134_1412135908 with {'persist_memory': True, 'description': 'snap_TestCase5134_1412155067', 'wait': True}
12:24:59 2018-11-14 12:24:58,990 INFO       Test Setup   9: Power off VM vm_TestCase5134_1412135908
12:24:59 2018-11-14 12:24:59,163 INFO       Test Setup  10: [function] Stop vm vm_TestCase5134_1412135908 with {'async': 'true'}
12:25:02 2018-11-14 12:25:02,876 INFO 002: storage/rhevmtests.storage.storage_snapshots.test_ram_snapshot.TestCase5134.test_import_vm_with_memory_state_snapshot[nfs]
12:25:02 2018-11-14 12:25:02,876 INFO     Import a VM that has memory state snapshot and ensure it resumes memory
12:25:02 2018-11-14 12:25:02,876 INFO     state from that snapshot successfully
12:25:02 2018-11-14 12:25:02,877 INFO STORAGE: NFS
12:25:02 2018-11-14 12:25:02,877 INFO       Test Step  11: Exporting VM vm_TestCase5134_1412135908 to domain export_domain
12:25:02 2018-11-14 12:25:02,879 INFO       Test Step  12: Export vm vm_TestCase5134_1412135908 to export storage domain with {'exclusive': 'false', 'storagedomain': 'export_domain', 'discard_snapshots': 'false', 'timeout': 600, 'async': False}
12:25:18 2018-11-14 12:25:18,035 ERROR Result: FAILED

Actual results:
Import a VM with a memory snapshot to export domain fails - vdsm/storage/image.py", line 499, in _interImagesCopy     raise se.CopyImageError() CopyImageError: low level Image copy failed: ()

Expected results:
Import a VM with a memory snapshot to export domain should work

Additional info:

Comment 3 Ademar Reis 2019-02-25 16:37:13 UTC
Simpler reproducer and further details are here: https://bugzilla.redhat.com/show_bug.cgi?id=1649788#c31

Comment 4 Eric Blake 2019-04-05 18:54:32 UTC
Upstream qemu 4.0 has patches to avoid the problem:
https://lists.gnu.org/archive/html/qemu-devel/2019-04/msg00136.html

Comment 5 Karen Noel 2019-04-06 02:00:59 UTC
(In reply to Eric Blake from comment #4)
> Upstream qemu 4.0 has patches to avoid the problem:
> https://lists.gnu.org/archive/html/qemu-devel/2019-04/msg00136.html

Pick up fix via rebase.

Comment 6 Tingting Mao 2019-04-07 13:32:51 UTC
Reproduced this bug as below:

Test with:
Local file system - XFS
qemu-kvm-3.1.0-20.module+el8+2904+e658c755
kernel-4.18.0-80.el8


Steps:

# truncate -s 11136 test.img

# qemu-io -c 'write -P 1 0 10K' test.img -f raw
wrote 10240/10240 bytes at offset 0
10 KiB, 1 ops; 0.0313 sec (318.725 KiB/sec and 31.8725 ops/sec)

# qemu-img info test.img 
image: test.img
file format: raw
virtual size: 11K (11264 bytes)
disk size: 12K

# df -T test.img 
Filesystem                             Type 1K-blocks     Used Available Use% Mounted on
/dev/mapper/rhel_ibm--x3250m6--11-home xfs  913432768 22272620 891160148   3% /home

# qemu-img convert -f raw -O raw -p -t none -T none test.img tgt.img -o preallocation=falloc
qemu-img: block/io.c:2158: bdrv_co_block_status: Assertion `*pnum && QEMU_IS_ALIGNED(*pnum, align) && align > offset - aligned_offset' failed.
Aborted (core dumped)

Comment 8 Tingting Mao 2019-06-05 09:12:33 UTC
Tried to verify this bug as below, but still hit the bug. So will assign back, please check. Thanks.


Tested with:
qemu-kvm-4.0.0-3.module+el8.1.0+3265+26c4ed71
kernel-4.18.0-95.el8


Steps:
# truncate -s 11136 test.img

# qemu-io -c 'write -P 1 0 10K' test.img -f raw

# qemu-img info test.img 
image: test.img
file format: raw
virtual size: 11K (11264 bytes)
disk size: 12K

# df -T test.img 
Filesystem                              Type 1K-blocks    Used Available Use% Mounted on
/dev/mapper/rhel_lenovo--sr630--02-home xfs  514894144 3625496 511268648   1% /home

# qemu-img convert -f raw -O raw -p -t none -T none test.img tgt.img -o preallocation=falloc
qemu-img: block/io.c:2123: bdrv_co_block_status: Assertion `*pnum && QEMU_IS_ALIGNED(*pnum, align) && align > offset - aligned_offset' failed.
Aborted (core dumped)

Comment 10 John Ferlan 2019-06-21 20:27:13 UTC
This I believe is (will be) fixed by bz 1588356 which it seems will be resolved using upstream commit:

https://git.qemu.org/?p=qemu.git;a=commit;h=9c3db310ff0b7473272ae8dce5e04e2f8a825390

Comment 11 Eric Blake 2019-06-21 20:43:13 UTC
(In reply to John Ferlan from comment #10)
> This I believe is (will be) fixed by bz 1588356 which it seems will be
> resolved using upstream commit:
> 
> https://git.qemu.org/?p=qemu.git;a=commit;
> h=9c3db310ff0b7473272ae8dce5e04e2f8a825390

Correct; there have been two batches of patches related to the same assertion in bdrv_co_block_status.
One batch related to the NBD drive (the patches mentioned in comment 4) in qemu 4.0;
the other batch related to O_DIRECT (triggered by -T none; the patch mentioned in comment 10) in qemu 4.1.

Comment 14 Tingting Mao 2019-07-19 07:31:49 UTC
Still hit this issue in 8.1 slow train. Reopen this bug in rhel8.1 slow train. Please adjust if needed. Thanks.


Tested with:
qemu-kvm-2.12.0-81.module+el8.1.0+3619+dfe1ae01
kernel-4.18.0-112.el8


Steps:
# truncate -s 11136 test.img
# qemu-io -c 'write -P 1 0 10K' test.img -f raw
# qemu-img convert -f raw -O raw -p -t none -T none test.img tgt.img -o preallocation=falloc
qemu-img: block/io.c:2134: bdrv_co_block_status: Assertion `*pnum && QEMU_IS_ALIGNED(*pnum, align) && align > offset - aligned_offset' failed.
Aborted (core dumped)

(gdb) bt
#0  0x00007f5252f3e8af in raise () from /lib64/libc.so.6
#1  0x00007f5252f28cc5 in abort () from /lib64/libc.so.6
#2  0x00007f5252f28b99 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
#3  0x00007f5252f36e66 in __assert_fail () from /lib64/libc.so.6
#4  0x00005583315c9292 in bdrv_co_block_status (bs=0x558333031c90, want_zero=want_zero@entry=true, offset=0, bytes=11264, 
    pnum=pnum@entry=0x7ffe10791ea0, map=map@entry=0x7f52510a5ef0, file=0x7f52510a5ef8) at block/io.c:2133
#5  0x00005583315c9053 in bdrv_co_block_status (bs=bs@entry=0x55833302b740, want_zero=want_zero@entry=true, offset=offset@entry=0, 
    bytes=bytes@entry=11264, pnum=pnum@entry=0x7ffe10791ea0, map=map@entry=0x0, file=0x0) at block/io.c:2145
#6  0x00005583315c937b in bdrv_co_block_status_above (file=0x0, map=0x0, pnum=0x7ffe10791ea0, bytes=11264, offset=0, want_zero=<optimized out>, 
    base=0x0, bs=<optimized out>) at block/io.c:2223
#7  bdrv_block_status_above_co_entry (opaque=0x7ffe10791df0) at block/io.c:2253
#8  0x000055833163ea03 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:116
#9  0x00007f5252f54480 in ?? () from /lib64/libc.so.6
#10 0x00007ffe10791620 in ?? ()
#11 0x0000000000000000 in ?? ()

Comment 19 Tingting Mao 2019-08-02 02:13:35 UTC
Verified this bug as below:

Tested with:
qemu-kvm-2.12.0-83.module+el8.1.0+3852+0ba8aef0
kernel-4.18.0-119.el8.x86_64

Steps:
# truncate -s 11136 test.img

# qemu-io -c 'write -P 1 0 10K' test.img -f raw

# qemu-img convert -f raw -O raw -p -t none -T none test.img tgt.img -o preallocation=falloc
    (100.00/100%)

Comment 21 errata-xmlrpc 2019-11-05 20:47:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:3345