RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1649788 - qemu-img convert abort when converting image with unaligned size (qemu-img: block/io.c:2134: bdrv_co_block_status: Assertion `*pnum && (((*pnum) % (align)) == 0) && align > offset - aligned_offset\' failed)
Summary: qemu-img convert abort when converting image with unaligned size (qemu-img: b...
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.6
Hardware: Unspecified
OS: Unspecified
medium
medium
Target Milestone: pre-dev-freeze
: ---
Assignee: Hanna Czenczek
QA Contact: Tingting Mao
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2018-11-14 14:17 UTC by Avihai
Modified: 2019-07-22 20:28 UTC (History)
16 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1678979 (view as bug list)
Environment:
Last Closed: 2019-07-22 20:28:53 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
engine and vdsm logs (669.91 KB, application/x-gzip)
2018-11-14 14:17 UTC, Avihai
no flags Details
engine and vdsm logs and complete rpm -qa (712.36 KB, application/gzip)
2018-12-11 09:01 UTC, Avihai
no flags Details
4.3.0.4_requested_images_and_engine_vdsm_logs (1.58 MB, application/gzip)
2019-02-18 12:57 UTC, Avihai
no flags Details

Description Avihai 2018-11-14 14:17:08 UTC
Created attachment 1505701 [details]
engine and vdsm logs

Description of problem:
Ran Import a VM with a memory snapshot to export domain via automation(meaning RESTAPI) fails with the following errors:

Engine:
2018-11-14 12:25:07,968+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-159) [vms_syncAction_3c7c04cf-e505-4ec4] EVENT_ID: IMPORTEXPORT_STARTING_EXPORT_VM(1,162), Starting export Vm vm_TestCase5134_1412135908 to export_domain


2018-11-14 12:25:10,343+02 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-89) [vms_syncAction_3c7c04cf-e505-4ec4] Command 'ExportVm' (i
d: '5c8e3683-b5ad-4132-bf1b-93bf9f43f3a0') waiting on child command id: '43f44a09-b4fb-4209-874a-ba0ffa42f810' type:'CopyImageGroup' to complete
2018-11-14 12:25:17,941+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-41) [] Failed in 'HSMGetAllTasksStatusesVDS' method
2018-11-14 12:25:17,950+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-41) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM h
ost_mixed_2 command HSMGetAllTasksStatusesVDS failed: low level Image copy failed: ()
2018-11-14 12:25:17,951+02 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-41) [] SPMAsyncTask::PollTask: Polling task '211b887c-bd55-44e8-97f0-1174ed34de60' 
(Parent Command 'CopyImageGroup', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned status 'finished', result 'success'.
2018-11-14 12:25:17,955+02 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-41) [] BaseAsyncTask::onTaskEndSuccess: Task '211b887c-bd55-44e8-97f0-1174ed34de60'
 (Parent Command 'CopyImageGroup', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended successfully.
2018-11-14 12:25:17,956+02 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-41) [] CommandAsyncTask::endActionIfNecessary: All tasks of command '54be2d09-6
671-4f42-bde1-a0065b0cf60f' has ended -> executing 'endAction'
2018-11-14 12:25:17,956+02 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-41) [] CommandAsyncTask::endAction: Ending action for '1' tasks (command ID: '5
4be2d09-6671-4f42-bde1-a0065b0cf60f'): calling endAction '.
2018-11-14 12:25:17,957+02 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-41) [] SPMAsyncTask::PollTask: Polling task '846e40c9-0971-4c05-be2a-1379d4dc2372' 
(Parent Command 'CopyImageGroup', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned status 'finished', result 'cleanSuccess'.
2018-11-14 12:25:17,959+02 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-302923) [] CommandAsyncTask::endCommandAction [within thread] context: Attempting to end
Action 'CopyImageGroup',
2018-11-14 12:25:17,968+02 ERROR [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-41) [] BaseAsyncTask::logEndTaskFailure: Task '846e40c9-0971-4c05-be2a-1379d4dc2372
' (Parent Command 'CopyImageGroup', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended with failure:
-- Result: 'cleanSuccess'
-- Message: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = low level Image copy failed: (), code = 261',
-- Exception: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = low level Image copy failed: (), code = 261'

VDSM - host_mixed_2 :
2018-11-14 12:25:06,691+0200 INFO  (jsonrpc/1) [vdsm.api] START moveImage(spUUID=u'0bbd940f-5c1a-4a04-b936-e76fe5ee71d8', srcDomUUID=u'4b7f9443-a53e-4655-8988-af46ee26be3f', dstDomUUID=u'1dc0cc6f-3fd7-47ce-beb0-
fe88e0464b38', imgUUID=u'4dd2f711-3378-4ac4-8b27-6662fcf65148', vmUUID='', op=1, postZero=u'false', force=u'false', discard=False) from=::ffff:10.35.162.7,46758, flow_id=vms_syncAction_3c7c04cf-e505-4ec4, task_i
d=846e40c9-0971-4c05-be2a-1379d4dc2372 (api:48)

2018-11-14 12:25:14,173+0200 DEBUG (tasks/1) [storage.operation] FAILED: <err> = bytearray(b"qemu-img: block/io.c:2134: bdrv_co_block_status: Assertion `*pnum && (((*pnum) % (align)) == 0) && align > offset - al
igned_offset\' failed.\n"); <rc> = -6 (operation:169)
2018-11-14 12:25:14,174+0200 ERROR (tasks/1) [storage.Image] Copy image error: image=4dd2f711-3378-4ac4-8b27-6662fcf65148, src domain=4b7f9443-a53e-4655-8988-af46ee26be3f, dst domain=1dc0cc6f-3fd7-47ce-beb0-fe88e0464b38 (image:498)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/image.py", line 489, in _interImagesCopy
    self._run_qemuimg_operation(operation)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/image.py", line 113, in _run_qemuimg_operation
    operation.run()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/qemuimg.py", line 337, in run
    for data in self._operation.watch():
  File "/usr/lib/python2.7/site-packages/vdsm/storage/operation.py", line 104, in watch
    self._finalize(b"", err)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/operation.py", line 178, in _finalize
    raise cmdutils.Error(self._cmd, rc, out, err)
Error: Command ['/usr/bin/taskset', '--cpu-list', '0-0', '/usr/bin/nice', '-n', '19', '/usr/bin/ionice', '-c', '3', '/usr/bin/qemu-img', 'convert', '-p', '-t', 'none', '-T', 'none', '-f', 'raw', u'/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge8__nfs__2/4b7f9443-a53e-4655-8988-af46ee26be3f/images/4dd2f711-3378-4ac4-8b27-6662fcf65148/c0a512d6-08f9-4c8c-a41b-cd1ae55b441f', '-O', 'raw', '-o', 'preallocation=falloc', u'/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_export__domain__storage__local__ge8/1dc0cc6f-3fd7-47ce-beb0-fe88e0464b38/images/4dd2f711-3378-4ac4-8b27-6662fcf65148/c0a512d6-08f9-4c8c-a41b-cd1ae55b441f'] failed with rc=-6 out='' err=bytearray(b"qemu-img: block/io.c:2134: bdrv_co_block_status: Assertion `*pnum && (((*pnum) % (align)) == 0) && align > offset - aligned_offset\' failed.\n")


2018-11-14 12:25:14,177+0200 ERROR (tasks/1) [storage.TaskManager.Task] (Task='846e40c9-0971-4c05-be2a-1379d4dc2372') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 336, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 1665, in moveImage
    force, discard)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/image.py", line 545, in move
    self._interImagesCopy(destDom, srcSdUUID, imgUUID, chains)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/image.py", line 499, in _interImagesCopy
    raise se.CopyImageError()
CopyImageError: low level Image copy failed: ()


Version-Release number of selected component (if applicable):
Engine: 4.3.0-0.0.master.20181101091940.git61310aa.el7
VDSM :  4.30.1-25.gitce9e416

How reproducible:
Occurred once so far running automation TestCase5134

Steps to Reproduce:
12:13:59 2018-11-14 12:13:59,088 INFO       Test Setup   1: Creating VM vm_TestCase5134_1412135908
12:14:23 2018-11-14 12:14:22,988 INFO       Test Setup   2: Creating snapshot snap_TestCase5134_1412142298 of VM vm_TestCase5134_1412135908
12:14:23 2018-11-14 12:14:22,991 INFO       Test Setup   3: [class] Add snapshot to VM vm_TestCase5134_1412135908 with {'persist_memory': False, 'description': 'snap_TestCase5134_1412142298', 'wait': True}
12:14:39 2018-11-14 12:14:39,100 INFO       Test Setup   4: Starting VM vm_TestCase5134_1412135908
12:14:39 2018-11-14 12:14:39,103 INFO       Test Setup   5: [class] Start VM vm_TestCase5134_1412135908 with {'wait_for_ip': False, 'pause': False, 'use_cloud_init': False, 'timeout': 600, 'wait_for_status': 'up'}
12:15:50 2018-11-14 12:15:50,673 INFO       Test Step   6: Starting cat process on VM vm_TestCase5134_1412135908
12:24:27 2018-11-14 12:24:27,576 INFO       Test Step   7: Creating snapshot snap_TestCase5134_1412155067 with RAM state
12:24:27 2018-11-14 12:24:27,580 INFO       Test Setup   8: [function] Add snapshot to VM vm_TestCase5134_1412135908 with {'persist_memory': True, 'description': 'snap_TestCase5134_1412155067', 'wait': True}
12:24:59 2018-11-14 12:24:58,990 INFO       Test Setup   9: Power off VM vm_TestCase5134_1412135908
12:24:59 2018-11-14 12:24:59,163 INFO       Test Setup  10: [function] Stop vm vm_TestCase5134_1412135908 with {'async': 'true'}
12:25:02 2018-11-14 12:25:02,876 INFO 002: storage/rhevmtests.storage.storage_snapshots.test_ram_snapshot.TestCase5134.test_import_vm_with_memory_state_snapshot[nfs]
12:25:02 2018-11-14 12:25:02,876 INFO     Import a VM that has memory state snapshot and ensure it resumes memory
12:25:02 2018-11-14 12:25:02,876 INFO     state from that snapshot successfully
12:25:02 2018-11-14 12:25:02,877 INFO STORAGE: NFS
12:25:02 2018-11-14 12:25:02,877 INFO       Test Step  11: Exporting VM vm_TestCase5134_1412135908 to domain export_domain
12:25:02 2018-11-14 12:25:02,879 INFO       Test Step  12: Export vm vm_TestCase5134_1412135908 to export storage domain with {'exclusive': 'false', 'storagedomain': 'export_domain', 'discard_snapshots': 'false', 'timeout': 600, 'async': False}
12:25:18 2018-11-14 12:25:18,035 ERROR Result: FAILED

Actual results:
Import a VM with a memory snapshot to export domain fails - vdsm/storage/image.py", line 499, in _interImagesCopy     raise se.CopyImageError() CopyImageError: low level Image copy failed: ()

Expected results:
Import a VM with a memory snapshot to export domain should work

Additional info:

Comment 1 Avihai 2018-11-14 14:29:23 UTC
Correction its export VM to export SD NOT import VM.

Description of problem:
Ran export a VM with a memory snapshot to export domain via automation(meaning RESTAPI) fails with the following errors:

Comment 2 Avihai 2018-11-14 15:09:15 UTC
Ran the same scenario manually 3 times :

- The 1st time I created the VM from template to an ISCSI SD - issue did not reproduce.

- The 2nd time I created the VM from a template to an NFS SD - issue reproduced!!
- The 3rd time I created the VM from a template to an GLUSTER SD - issue reproduced!! 
 

It looks like this issue occurs only on when creating the VM on a file storage flavor either NFS/Gluster.

Comment 3 Avihai 2018-11-14 15:39:34 UTC
Checked also on 4.2.7.4 and the same issue appears.

I think this issue was here for a while but as we only test ISCSI SD in Tier2 tests we missed it.

However, this test was run/verified on 4.1 (18/11/2016) so I assume it was tested /worked before.

Tal, how should we mark this bug ? regression ?

Comment 4 Avihai 2018-11-18 09:55:02 UTC
Ran the same scenario on 4.1.11.2-0.1 (with NFS/Gluster e VM from template ) which reproduced in 4.2.7.4 and  4.3.0-0.0.master.20181101091940.git61310aa.el7.

Marking this issue as a regression (from 4.1 -> 4.2)

Comment 5 Red Hat Bugzilla Rules Engine 2018-11-18 09:55:07 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 6 Elad 2018-11-18 09:56:24 UTC
Tal, as this is a regression, let's re-target to 4.2.8

Comment 7 Nir Soffer 2018-12-07 19:39:45 UTC
This a qemu-img bug:

Error: Command ['/usr/bin/taskset', '--cpu-list', '0-0', '/usr/bin/nice', '-n', '19', '/usr/bin/ionice', '-c', '3', '/usr/bin/qemu-img', 'convert', '-p', '-t', 'none', '-T', 'none', '-f', 'raw', u'/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge8__nfs__2/4b7f9443-a53e-4655-8988-af46ee26be3f/images/4dd2f711-3378-4ac4-8b27-6662fcf65148/c0a512d6-08f9-4c8c-a41b-cd1ae55b441f', '-O', 'raw', '-o', 'preallocation=falloc', u'/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_export__domain__storage__local__ge8/1dc0cc6f-3fd7-47ce-beb0-fe88e0464b38/images/4dd2f711-3378-4ac4-8b27-6662fcf65148/c0a512d6-08f9-4c8c-a41b-cd1ae55b441f'] failed with rc=-6 out='' err=bytearray(b"qemu-img: block/io.c:2134: bdrv_co_block_status: Assertion `*pnum && (((*pnum) % (align)) == 0) && align > offset - aligned_offset\' failed.\n")


This issue was discussed in devel mailing list here:
https://lists.ovirt.org/archives/list/devel@ovirt.org/thread/UGGNDS4SVJ2GHAQ57H2PZWMFC33ODLA4/#TF6BPZIIJYMFJLFFJYGADKBQ2FDH5Q7O

We filed this bug:
https://bugzilla.redhat.com/1589738

It was fixed on 2018-08-21:
https://bugzilla.redhat.com/show_bug.cgi?id=1589738#c7

Please test again with current qemu-img version and report which versions
were tested.

Please always attach the output of "rpm -qa" to a bug when you report the bug.

Comment 8 Nir Soffer 2018-12-07 19:43:17 UTC
Eric, is is possible that bug 1589738 sneaked into qemu-img-rhev?

Comment 9 Nir Soffer 2018-12-07 19:50:28 UTC
Avihay, can you explain how and when the vm snapshots were created? Are you 
testing old images that were created years ago, possibly with unaligned 
memory or configuration volumes?

Also please share the output of:

ls -l /rhev/data-center/mnt/[glusterSD/]server:_path/sd-id/images/img-id

We need to understand which volumes are not aligned and why.

Comment 10 Avihai 2018-12-11 09:00:08 UTC
(In reply to Nir Soffer from comment #9)
> Avihay, can you explain how and when the vm snapshots were created?
Sure, VM snapshots were created via UI .

1) Create a VM from a template from NFS/gluster SD .
2) Power-on the VM
3) Create vi UI a snapshot with RAM (default vi UI)
4) Power off VM
5) Export VM to export domain

> Are you 
> testing old images that were created years ago, possibly with unaligned 
> memory or configuration volumes?
No it's new images we are all using as template and it works with ISCSI SD so this does not look as the root cause here.
> 
> Also please share the output of:
> 
> ls -l /rhev/data-center/mnt/[glusterSD/]server:_path/sd-id/images/img-id
> 
> We need to understand which volumes are not aligned and why.
As gluster has issues now due to Bug 1657645 I reproduces the issue with NFS .
requested output:
[root@storage-ge8-vdsm2 ~]# ls -l /rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge8__nfs__0/ce0b257b-f3c7-4e8f-a724-40dccf0d6bb5/images/0b7dec1f-73d2-4799-8cef-64b3715602fe
total 1040
-rw-rw----. 1 vdsm kvm   11136 Dec 11 10:46 b36e4a3a-a826-484b-a2db-46ab61cfa50b
-rw-rw----. 1 vdsm kvm 1048576 Dec 11 10:46 b36e4a3a-a826-484b-a2db-46ab61cfa50b.lease
-rw-r--r--. 1 vdsm kvm     424 Dec 11 10:46 b36e4a3a-a826-484b-a2db-46ab61cfa50b.meta

Complete rpm -qa is attached with new reproduction (4.3) logs.

Comment 11 Avihai 2018-12-11 09:01:25 UTC
Created attachment 1513302 [details]
engine and vdsm logs and complete rpm -qa

Comment 12 Avihai 2018-12-11 11:41:19 UTC
These are the ram snapshot disks which are potentially problematic:

1) name = vm1_snapshot_memory , id = b82b875a-502c-42b7-985a-8c3025c6f148 , SD = nfs_0 (ce0b257b-f3c7-4e8f-a724-40dccf0d6bb5)

ls -l /rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge8__nfs__0/ce0b257b-f3c7-4e8f-a724-40dccf0d6bb5/images/b82b875a-502c-42b7-985a-8c3025c6f148
total 140716
-rw-rw----. 1 vdsm kvm 143037952 Dec 11 10:46 35bce438-cadc-47d3-b140-1f9457bbe176
-rw-rw----. 1 vdsm kvm   1048576 Dec 11 10:46 35bce438-cadc-47d3-b140-1f9457bbe176.lease
-rw-r--r--. 1 vdsm kvm       421 Dec 11 10:46 35bce438-cadc-47d3-b140-1f9457bbe176.meta

qemu-img info /rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge8__nfs__0/ce0b257b-f3c7-4e8f-a724-40dccf0d6bb5/images/b82b875a-502c-42b7-985a-8c3025c6f148/35bce438-cadc-47d3-b140-1f9457bbe176
image: /rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge8__nfs__0/ce0b257b-f3c7-4e8f-a724-40dccf0d6bb5/images/b82b875a-502c-42b7-985a-8c3025c6f148/35bce438-cadc-47d3-b140-1f9457bbe176
file format: raw
virtual size: 136M (143037952 bytes)
disk size: 136M

2) name = vm1_snapshot_metadata , id = 0b7dec1f-73d2-4799-8cef-64b3715602fe , SD = nfs_0 (ce0b257b-f3c7-4e8f-a724-40dccf0d6bb5)

 ls -l /rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge8__nfs__0/ce0b257b-f3c7-4e8f-a724-40dccf0d6bb5/images/0b7dec1f-73d2-4799-8cef-64b3715602fe
total 1040
-rw-rw----. 1 vdsm kvm   11136 Dec 11 10:46 b36e4a3a-a826-484b-a2db-46ab61cfa50b
-rw-rw----. 1 vdsm kvm 1048576 Dec 11 10:46 b36e4a3a-a826-484b-a2db-46ab61cfa50b.lease
-rw-r--r--. 1 vdsm kvm     424 Dec 11 10:46 b36e4a3a-a826-484b-a2db-46ab61cfa50b.meta

> qemu-img info /rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge8__nfs__0/ce0b257b-f3c7-4e8f-a724-40dccf0d6bb5/images/0b7dec1f-73d2-4799-8cef-64b3715602fe/b36e4a3a-a826-484b-a2db-46ab61cfa50b
image: /rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge8__nfs__0/ce0b257b-f3c7-4e8f-a724-40dccf0d6bb5/images/0b7dec1f-73d2-4799-8cef-64b3715602fe/b36e4a3a-a826-484b-a2db-46ab61cfa50b
file format: raw
virtual size: 11K (11264 bytes)
disk size: 12K

Comment 13 Avihai 2018-12-11 11:57:12 UTC
VDSM log:
2018-12-11 10:49:52,036+0200 DEBUG (tasks/4) [storage.operation] FAILED: <err> = bytearray(b"qemu-img: block/io.c:2134: bdrv_co_block_status: Assertion `*pnum && (((*pnum) % (align)) == 0) && align > offset - al
igned_offset\' failed.\n"); <rc> = -6 (operation:170)
2018-12-11 10:49:52,038+0200 ERROR (tasks/4) [storage.Image] Copy image error: image=0b7dec1f-73d2-4799-8cef-64b3715602fe, src domain=ce0b257b-f3c7-4e8f-a724-40dccf0d6bb5, dst domain=80c22003-d77c-45bd-9a29-2056
22b71394 (image:498)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/image.py", line 489, in _interImagesCopy
    self._run_qemuimg_operation(operation)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/image.py", line 113, in _run_qemuimg_operation
    operation.run()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/qemuimg.py", line 337, in run
    for data in self._operation.watch():
  File "/usr/lib/python2.7/site-packages/vdsm/storage/operation.py", line 106, in watch
    self._finalize(b"", err)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/operation.py", line 179, in _finalize
    raise cmdutils.Error(self._cmd, rc, out, err)
Error: Command ['/usr/bin/qemu-img', 'convert', '-p', '-t', 'none', '-T', 'none', '-f', 'raw', u'/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge8__nfs__0/ce0b257b-f3c7-4e
8f-a724-40dccf0d6bb5/images/0b7dec1f-73d2-4799-8cef-64b3715602fe/b36e4a3a-a826-484b-a2db-46ab61cfa50b', '-O', 'raw', '-o', 'preallocation=falloc', u'/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Stora
ge__NFS_export__domain__storage__local__ge8/80c22003-d77c-45bd-9a29-205622b71394/images/0b7dec1f-73d2-4799-8cef-64b3715602fe/b36e4a3a-a826-484b-a2db-46ab61cfa50b'] failed with rc=-6 out='' err=bytearray(b"qemu-i
mg: block/io.c:2134: bdrv_co_block_status: Assertion `*pnum && (((*pnum) % (align)) == 0) && align > offset - aligned_offset\' failed.\n")

Comment 14 Nir Soffer 2018-12-11 14:38:08 UTC
Returning needinfo for Eric.

Eric, see comment 7, is it possible that the upstream bug sneaked into
qemu-img-rhev?

Comment 15 Elad 2018-12-23 09:19:57 UTC
The bug occurs both on RHV 4.3 and 4.2.7 with RHEL7.6 and not on RHV 4.1 with RHEL7.5.
According to that and to comment #7, this smells like a regression in RHEL7.6.
As we're waiting for a response from Eric for 12 days already, I'm moving this bug to RHEL7 for further investigation.

Comment 18 Eric Blake 2019-01-18 20:15:55 UTC
(In reply to Nir Soffer from comment #7)
> This a qemu-img bug:
> 

> We filed this bug:
> https://bugzilla.redhat.com/1589738

Filed against Fedora; now we're asking about RHEV.

> 
> It was fixed on 2018-08-21:
> https://bugzilla.redhat.com/show_bug.cgi?id=1589738#c7
> 

Re-reading that report, the commit in question is:

commit e0b371ed5e2db079051139136fd0478728b6a58f
Author: Eric Blake <eblake>
Date:   Mon Jun 11 16:39:26 2018 -0500

    qemu-img: Fix assert when mapping unaligned raw file
    
    Commit a290f085 exposed a latent bug in qemu-img map introduced
    during the conversion of block status to be byte-based.  Earlier in
    commit 5e344dd8, the internal interface get_block_status() switched
    to take byte-based parameters, but still called a sector-based
    block layer function; as such, rounding was added in the lone
    caller to obey the contract.  However, commit 237d78f8 changed
    get_block_status() to truly be byte-based, at which point rounding
    to sector boundaries can result in calling bdrv_block_status() with
    'bytes == 0' (a coding error) when the boundary between data and a
    hole falls mid-sector (true for the past-EOF implicit hole present
    in POSIX files).  Fix things by removing the rounding that is now
    no longer necessary.
    
    See also https://bugzilla.redhat.com/1589738
    
    Fixes: 237d78f8
    Reported-by: Dan Kenigsberg <danken>
    Reported-by: Nir Soffer <nsoffer>
    Reported-by: Maor Lipchuk <mlipchuk>
    CC: qemu-stable
    Signed-off-by: Eric Blake <eblake>
    Signed-off-by: Kevin Wolf <kwolf>

Multiple commits mentioned there; here's the upstream version breakdowns:
$ git describe --contains --match v\* a290f085 5e344dd8 237d78f8 e0b371e
v2.12.0-rc0~49^2~30
v2.11.0-rc0~23^2~20
v2.11.0-rc0~23^2~19
v3.0.0-rc0~84^2~25

Or, the problem was latent in 2.11, exposed in 2.12, and fixed in 3.0.  Next is to determine which of these commits exist on various downstream builds (either directly, or as backports):

RHEL7.5: qemu-kvm-rhev-2.10.0-21.el7_5.7, based on 2.10 and thus does not contain bug or fix in isolation; likewise, grepping:
$ git log v2.10.0..qemu-kvm-rhev-2.10.0-21.el7_5.7 comes up empty for any of those patches.

RHEL7.6: qemu-kvm-rhev-2.12.0-18.el7_6.3, based on 2.12 and thus contains the bug unless a fix has been backported.  Grepping:
$ git log v2.12.0..qemu-kvm-rhev-2.12.0-18.el7_6.3 | grep e0b371e

turns up a hit; looking at context, I see:
commit bd6ce734563c0308b021c013f1e622ff185be60b
Author: Max Reitz <mreitz>
Date:   Mon Aug 6 16:35:52 2018 +0200

    qemu-img: Fix assert when mapping unaligned raw file
    
    RH-Author: Max Reitz <mreitz>
    Message-id: <20180806163553.13344-2-mreitz>
    Patchwork-id: 81647
    O-Subject: [RHV-7.6 qemu-kvm-rhev PATCH 1/2] qemu-img: Fix assert when mapping unaligned raw file
    RH-Acked-by: Eric Blake <eblake>
    RH-Acked-by: Kevin Wolf <kwolf>
    RH-Acked-by: John Snow <jsnow>
...
    (cherry picked from commit e0b371ed5e2db079051139136fd0478728b6a58f)

and with further checking, I found this was fixed for BZ 1601310 qemu-kvm-rhev-2.12.0-10.el7 in Aug 2018 (and a RHEL8 counterpart at BZ 1639374).

I'm suspecting this is therefore a duplicate of 1601310, although since that was fixed in Aug and this was opened in Nov, I have to wonder what version of qemu-kvm-rhev you were using to reproduce the problem.

Comment 19 Nir Soffer 2019-01-18 21:13:37 UTC
Eric, in attachment 1513302 [details] I found:

$ grep qemu-img logs/rpm_qa.txt 
qemu-img-rhev-2.12.0-18.el7_6.1.x86_64

In bug 1601310 we see:

qemu-img: qemu-img.c:2765: get_block_status: Assertion `bytes' failed.
Aborted (core dumped)

Here we see (in comment 13):

Command:
[
    '/usr/bin/qemu-img',
    'convert',
    '-p',
    '-t', 'none',
    '-T', 'none',
    '-f', 'raw',
    'source...',
    '-O', 'raw',
    '-o', 'preallocation=falloc',
    'destination...'
]

Error:
qemu-img: block/io.c:2134: bdrv_co_block_status: Assertion `*pnum && (((*pnum) % (align)) == 0) && align > offset - aligned_offset\' failed.\n

So this looks like a different assert.

Comment 20 Nir Soffer 2019-01-18 21:20:54 UTC
And in bug 1589738 we see:

/usr/bin/qemu-img map --output json /tmp/test2
[{ "start": 0, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 0},
{ "start": 4096, "length": 12288, "depth": 0, "zero": true, "data": false, "offset": 4096},
{ "start": 16384, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 16384},
qemu-img: /builddir/build/BUILD/qemu-2.12.0-rc1/qemu-img.c:2680: get_block_status: Assertion `bytes' failed.
Aborted (core dumped)

Which looks similar to bug 1601310, but this bug seems to be another issue.

Comment 22 Tingting Mao 2019-01-22 06:15:51 UTC
Hi,

I tried reproduce this bug in qemu as below. However, it did not success. Could anyone who gives some points? Thanks in advance.


Prepare env:

# nfsstat -m
/home/test/tgt_dir from $nfs_server1:/home/nfs_share
 Flags:    rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.73.x.x,local_lock=none,addr=10.66.x.x

/home/test/bug_rep from $nfs_server2:/home/nfs_share
 Flags:    rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp6,timeo=600,retrans=2,sec=sys,clientaddr=::1,local_lock=none,addr=::1



Scenario 1(Convert an unaligned file directly on NFS)

# truncate -s 11136 test.img

# qemu-img convert -f raw -O raw -p -t none -T none /home/test/bug_rep/timao_test/tmp/test.img /home/test/tgt_dir/tmp/tgt.img -o preallocation=falloc
    (100.00/100%)



Scenario 2(Create internal snapshots to the guest image before convert)

1. Boot from base.qcow2 file based on NFS
# /usr/libexec/qemu-kvm \
        -name 'guest-rhel7.5' \
        -machine q35 \
        -nodefaults \
        -vga qxl \
        -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/test/bug_rep/timao_test/base.qcow2 \
        -device virtio-blk-pci,id=virtio_blk_pci0,drive=drive_image1,bus=pcie.0,addr=05,bootindex=0 \
        -vnc :0 \
        -monitor stdio \
        -m 4096 \
        -smp 8 \
        -device virtio-net-pci,mac=9a:b5:b6:b1:b2:b3,id=idMmq1jH,vectors=4,netdev=idxgXAlm,bus=pcie.0,addr=0x9  \
        -netdev tap,id=idxgXAlm \
        -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/timao/monitor-qmpmonitor1-20180220-094308-h9I6hRsI,server,nowait \
        -mon chardev=qmp_id_qmpmonitor1,mode=control  \

2.  Create internal two internal snapshots:
# nc -U monitor-qmpmonitor1-20180220-094308-h9I6hRsI
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 12, "major": 2}, "package": "qemu-kvm-rhev-2.12.0-18.el7_6.1"}, "capabilities": []}}
{"execute": "qmp_capabilities"}
{"return": {}}

{"execute":"human-monitor-command","arguments":{"command-line":"savevm sn1"}}
{"timestamp": {"seconds": 1548122558, "microseconds": 140936}, "event": "STOP"}
{"timestamp": {"seconds": 1548122591, "microseconds": 864040}, "event": "RESUME"}
{"return": ""}

{"execute":"human-monitor-command","arguments":{"command-line":"info snapshots"}}
{"return": "List of snapshots present on all disks:\r\nID        TAG                 VM SIZE                DATE       VM CLOCK\r\n--        sn1                    3.0G 2019-01-21 21:02:38   14:04:23.015\r\n"}

{"execute":"human-monitor-command","arguments":{"command-line":"info snapshots"}}
{"return": "List of snapshots present on all disks:\r\nID        TAG                 VM SIZE                DATE       VM CLOCK\r\n--        sn1                    3.0G 2019-01-21 21:02:38   14:04:23.015\r\n"}

{"execute":"human-monitor-command","arguments":{"command-line":"savevm sn2"}}
{"timestamp": {"seconds": 1548122747, "microseconds": 943199}, "event": "STOP"}
{"timestamp": {"seconds": 1548122781, "microseconds": 821281}, "event": "RESUME"}
{"return": ""}

{"execute":"human-monitor-command","arguments":{"command-line":"info snapshots"}}
{"return": "List of snapshots present on all disks:\r\nID        TAG                 VM SIZE                DATE       VM CLOCK\r\n--        sn1                    3.0G 2019-01-21 21:02:38   14:04:23.015\r\n--        sn2                    3.0G 2019-01-21 21:05:47   14:06:59.091\r\n"}

3. Convert base.qcow2 to target image based on another NFS
# qemu-img convert -f qcow2 -O qcow2 base.qcow2 -o preallocation=falloc -t none -T none -p /home/test/tgt_dir/tgt.qcow2
    (100.00/100%)
# qemu-img info /home/test/tgt_dir/tgt.qcow2
image: /home/test/tgt_dir/tgt.qcow2
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 20G
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false



More info after converting:

What’s strange is that the compare result is not the identical and after booting the guest from tgt.qcow2 file, there is no snapshot any more. It should be another bug???

# qemu-img compare -f qcow2 -F qcow2 /home/test/bug_rep/timao_test/base.qcow2 /home/test/tgt_dir/tgt.qcow2 -p
Content mismatch at offset 1048576!

Boot guest from tgt.qcow2 file, and check internal snapshots file info:
# nc -U monitor-qmpmonitor1-20180220-094308-h9I6hRsI
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 12, "major": 2}, "package": "qemu-kvm-rhev-2.12.0-18.el7_6.1"}, "capabilities": []}}

{"execute": "qmp_capabilities"}
{"return": {}}

{"execute":"human-monitor-command","arguments":{"command-line":"info snapshots"}}
{"return": "There is no snapshot available.\r\n"}

Comment 23 Ademar Reis 2019-02-14 15:49:45 UTC
I'm reviewing this BZ and I'm confused about the status... Is the original assumption still valid? It was originally repoted as an automation blocker and regression, but comment #22 says it can't be reproduced anymore.

Can it still be reproduced in the original test automation environment, or are we talking about a new bug here?

We need to understand the real impact to prioritize it.

Comment 24 Elad 2019-02-14 16:13:50 UTC
We were able to reproduce it in our automation several times with different scenarios (see comment #10), not consistently. If I'm not mistaken, some of the test cases are blocked from running because of this bug and that's why it is marked as an AutomationBlocker.

Comment 25 Nir Soffer 2019-02-14 16:39:18 UTC
(In reply to Elad from comment #24)

Elad, can you enable the failed tests and reproduce again?

If it is still reproducible, please attach the image and .meta files
to the bug (the image is only 12k).

I think the issue in comment 22 is trying to convert an empty file:

    # truncate -s 11136 test.img

While the file reproducing this issue was:

    ls -l /rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge8__nfs__0/ce0b257b-f3c7-4e8f-a724-40dccf0d6bb5/images/0b7dec1f-73d2-4799-8cef-64b3715602fe
    total 1040
    -rw-rw----. 1 vdsm kvm   11136 Dec 11 10:46 b36e4a3a-a826-484b-a2db-46ab61cfa50b

    $ qemu-img info /rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge8__nfs__0/ce0b257b-f3c7-4e8f-a724-40dccf0d6bb5/images/0b7dec1f-73d2-4799-8cef-64b3715602fe/b36e4a3a-a826-484b-a2db-46ab61cfa50b
    image: /rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge8__nfs__0/ce0b257b-f3c7-4e8f-a724-40dccf0d6bb5/images/0b7dec1f-73d2-4799-8cef-64b3715602fe/b36e4a3a-a826-484b-a2db-46ab61cfa50b
    file format: raw
    virtual size: 11K (11264 bytes)
    disk size: 12K

Note: disk size != 0

Note also the difference in virtual size:

    ls:        11,136 bytes  (21.75 * 512)
    qemu-img:  11,264 bytes  (22 * 512)

I think this is an error on qemu side, rounding disk size up, exposing non-existing
area on the actual file. I think qemu should instead round down, truncating unaligned
bytes at the end of the image.

I wonder also how we got a raw file with unaligned size. I think we always ensure that
raw images are aligned to 512 bytes. The only case when images are not aligned is empty
qcow2 file, created in odd size.

Comment 27 Avihai 2019-02-18 12:56:13 UTC
Reproduces with the same scenario on 4.3.0.4-0.1.el7.

As requested image + meta files are also attached.

Image path before copy:
/rhev/data-center/0666bee6-782b-4775-8683-93209de805db/8b27f44c-be74-405e-a099-39cdef2f83bc/images/ae6a82d9-f63e-4d8a-8c78-31483713fd0e

Engine log:

2019-02-18 14:32:55,231+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-76) [] Failed in 'HSMGetAllTasksStatusesVDS' method
2019-02-18 14:32:55,247+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-76) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM host_mixed_2 command HSMGetAllTasksStatusesVDS failed: low level Image copy failed: ()
2019-02-18 14:32:55,248+02 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-76) [] SPMAsyncTask::PollTask: Polling task '2818ac85-9d20-4211-a1b2-8a8751f396aa' (Parent Command 'CopyImageGroup', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned status 'running'.
2019-02-18 14:32:55,248+02 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-76) [] SPMAsyncTask::PollTask: Polling task '33f3b730-1c46-410d-acd4-6e189e9e28a9' (Parent Command 'CopyImageGroup', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned status 'finished', result 'cleanSuccess'.
2019-02-18 14:32:55,256+02 ERROR [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-76) [] BaseAsyncTask::logEndTaskFailure: Task '33f3b730-1c46-410d-acd4-6e189e9e28a9' (Parent Command 'CopyImageGroup', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended with failure:
-- Result: 'cleanSuccess'
-- Message: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = low level Image copy failed: (), code = 261',
-- Exception: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = low level Image copy failed: (), code = 261'


VDSM log:
2019-02-18 14:32:53,599+0200 DEBUG (tasks/7) [storage.operation] FAILED: <err> = bytearray(b"qemu-img: block/io.c:2134: bdrv_co_block_status: Assertion `*pnum && (((*pnum) % (align)) == 0) && align > offset - aligned_offset\' failed.\n"); <rc> = -6 (operation:170)
2019-02-18 14:32:53,600+0200 ERROR (tasks/7) [storage.Image] Copy image error: image=ae6a82d9-f63e-4d8a-8c78-31483713fd0e, src domain=8b27f44c-be74-405e-a099-39cdef2f83bc, dst domain=83cbc4f2-cb4a-4ea7-ad2d-fe67a1b8478a (image:498)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/image.py", line 489, in _interImagesCopy
    self._run_qemuimg_operation(operation)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/image.py", line 113, in _run_qemuimg_operation
    operation.run()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/qemuimg.py", line 343, in run
    for data in self._operation.watch():
  File "/usr/lib/python2.7/site-packages/vdsm/storage/operation.py", line 106, in watch
    self._finalize(b"", err)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/operation.py", line 179, in _finalize
    raise cmdutils.Error(self._cmd, rc, out, err)
Error: Command ['/usr/bin/qemu-img', 'convert', '-p', '-t', 'none', '-T', 'none', '-f', 'raw', u'/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge4__nfs__0/8b27f44c-be74-405e-a099-39cdef2f83bc/images/ae6a82d9-f63e-4d8a-8c78-31483713fd0e/7fec9ccd-c649-4aff-9098-5953cb80f284', '-O', 'raw', '-o', 'preallocation=falloc', u'/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_export__domain__storage__local__ge4/83cbc4f2-cb4a-4ea7-ad2d-fe67a1b8478a/images/ae6a82d9-f63e-4d8a-8c78-31483713fd0e/7fec9ccd-c649-4aff-9098-5953cb80f284'] failed with rc=-6 out='' err=bytearray(b"qemu-img: block/io.c:2134: bdrv_co_block_status: Assertion `*pnum && (((*pnum) % (align)) == 0) && align > offset - aligned_offset\' failed.\n")

Comment 28 Avihai 2019-02-18 12:57:17 UTC
Created attachment 1535915 [details]
4.3.0.4_requested_images_and_engine_vdsm_logs

Comment 29 Nir Soffer 2019-02-18 13:14:17 UTC
Thanks Avihay!

This image is a memory metadata image (DISKTYP=MEMM), holding metadata about a snapshot.

$ cat 7fec9ccd-c649-4aff-9098-5953cb80f284.meta 
DOMAIN=8b27f44c-be74-405e-a099-39cdef2f83bc
CTIME=1550493132
FORMAT=RAW
DISKTYPE=MEMM
LEGALITY=LEGAL
SIZE=20
VOLTYPE=LEAF
DESCRIPTION={"DiskAlias":"vm_TestCase5134_1814254004_snapshot_metadata","DiskDescription":"Memory snapshot disk for snapshot 'snap_TestCase5134_1814274231' of VM 'vm_TestCase5134_1814254004' (VM ID: '513ac21a-9b12-43aa-a"}
IMAGE=ae6a82d9-f63e-4d8a-8c78-31483713fd0e
PUUID=00000000-0000-0000-0000-000000000000
MTIME=0
POOL_UUID=
TYPE=PREALLOCATED
GEN=0
EOF

Reproducing on Fedora 28 with virt-preview:

$ qemu-img convert -f raw -O raw -p -t none -T none 7fec9ccd-c649-4aff-9098-5953cb80f284 7fec9ccd-c649-4aff-9098-5953cb80f284-copy -o preallocation=falloc
qemu-img: /builddir/build/BUILD/qemu-3.1.0/block/io.c:2158: bdrv_co_block_status: Assertion `*pnum && QEMU_IS_ALIGNED(*pnum, align) && align > offset - aligned_offset' failed.
Aborted (core dumped)

Tested with:
$ qemu-img --version
qemu-img version 3.1.0 (qemu-3.1.0-4.fc28)
Copyright (c) 2003-2018 Fabrice Bellard and the QEMU Project developers

Reproducing on CentOS 7.6:

$ qemu-img convert -f raw -O raw -p -t none -T none 7fec9ccd-c649-4aff-9098-5953cb80f284 7fec9ccd-c649-4aff-9098-5953cb80f284-copy -o preallocation=falloc
qemu-img: block/io.c:2134: bdrv_co_block_status: Assertion `*pnum && (((*pnum) % (align)) == 0) && align > offset - aligned_offset' failed.
Aborted (core dumped)

$ qemu-img --version
qemu-img version 2.12.0 (qemu-kvm-ev-2.12.0-18.el7_6.3.1)
Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers

Regardless of the qemu-img issue, I think this is bug in RHV - this file should be
padded to aligned size, I'm checking why it was not aligned.

But even if we fix this in RHV, this issue may break existing RHV versions (4.2)
and images created by such versions.

Comment 30 Tingting Mao 2019-02-19 03:15:24 UTC
(In reply to Nir Soffer from comment #25)
> (In reply to Elad from comment #24)
> 
> Elad, can you enable the failed tests and reproduce again?
> 
> If it is still reproducible, please attach the image and .meta files
> to the bug (the image is only 12k).
> 
> I think the issue in comment 22 is trying to convert an empty file:
> 
>     # truncate -s 11136 test.img
> 
> While the file reproducing this issue was:
> 
>     ls -l
> /rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:
> _Storage__NFS_storage__local__ge8__nfs__0/ce0b257b-f3c7-4e8f-a724-
> 40dccf0d6bb5/images/0b7dec1f-73d2-4799-8cef-64b3715602fe
>     total 1040
>     -rw-rw----. 1 vdsm kvm   11136 Dec 11 10:46
> b36e4a3a-a826-484b-a2db-46ab61cfa50b
> 
>     $ qemu-img info
> /rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:
> _Storage__NFS_storage__local__ge8__nfs__0/ce0b257b-f3c7-4e8f-a724-
> 40dccf0d6bb5/images/0b7dec1f-73d2-4799-8cef-64b3715602fe/b36e4a3a-a826-484b-
> a2db-46ab61cfa50b
>     image:
> /rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:
> _Storage__NFS_storage__local__ge8__nfs__0/ce0b257b-f3c7-4e8f-a724-
> 40dccf0d6bb5/images/0b7dec1f-73d2-4799-8cef-64b3715602fe/b36e4a3a-a826-484b-
> a2db-46ab61cfa50b
>     file format: raw
>     virtual size: 11K (11264 bytes)
>     disk size: 12K
> 
> Note: disk size != 0

Thanks for your info. And I tried again, still did not hit the issue. Any more pointers?


Tested with:

# nfsstat -m | grep share -A 2
/home/test/bug_rep/source from 10.66.10.26:/home/nfs_share
 Flags:	rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.73.194.81,local_lock=none,addr=10.66.10.26

/home/test/bug_rep/target from 10.73.72.84:/home/nfs_share
 Flags:	rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.73.194.81,local_lock=none,addr=10.73.72.84

# qemu-img --version
qemu-img version 2.12.0 (qemu-kvm-rhev-2.12.0-18.el7_6.3)
Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers


Steps:

# truncate -s 11136 test.img

# qemu-io -c 'write -P 1 0 10K' test.img 
WARNING: Image format was not specified for 'test.img' and probing guessed raw.
         Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted.
         Specify the 'raw' format explicitly to remove the restrictions.
wrote 10240/10240 bytes at offset 0
10 KiB, 1 ops; 0.0014 sec (6.872 MiB/sec and 703.7298 ops/sec)

# ll test.img 
-rw-r--r-- 1 nfsnobody nfsnobody 11136 Feb 18 21:45 test.img

# qemu-img info test.img 
image: test.img
file format: raw
virtual size: 11K (11264 bytes)
disk size: 12K

# qemu-img convert -f raw -O raw -p -t none -T none /home/test/bug_rep/source/test.img /home/test/bug_rep/target/tgt.img -o preallocation=falloc
    (100.00/100%)

Comment 31 Nir Soffer 2019-02-19 18:20:22 UTC
(In reply to Tingting Mao from comment #30)
> Tested with:
> 
> # nfsstat -m | grep share -A 2
> /home/test/bug_rep/source from 10.66.10.26:/home/nfs_share
>  Flags:
> rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,
> timeo=600,retrans=2,sec=sys,clientaddr=10.73.194.81,local_lock=none,addr=10.
> 66.10.26

You are using NFS 4.1, which does not support sparseness. qemu uses
lseek(SEEK_HOLE) / lseek(SEEK_DATA) to find data and holes, and with 
NFS 4.1 it will find one data area from 0 to end of the image. This may
skip the code causing the segfault.

You should be able to reproduce this with:
- NFS 4.2
- GlusterFS
- xfs
- ext4

All options are supported by RHV and qemu-img should not segfault when
converting images located on these kinds of storage. I think that
verifying with local file system is good enough.
 
> Steps:
> 
> # truncate -s 11136 test.img
> 
> # qemu-io -c 'write -P 1 0 10K' test.img 

It is nicer to generate the test image dynamically, but best reproduce first
with the actual image generated by RHV, uploaded by Avihay in 
attachment 1535915 [details]:

The image is:
ae6a82d9-f63e-4d8a-8c78-31483713fd0e/7fec9ccd-c649-4aff-9098-5953cb80f284

> # qemu-img convert -f raw -O raw -p -t none -T none
> /home/test/bug_rep/source/test.img /home/test/bug_rep/target/tgt.img -o
> preallocation=falloc
>     (100.00/100%)

You forgot the -o preallocation option, please use:

    qemu-img convert -f raw -O raw -p -t none -T none 7fec9ccd-c649-4aff-9098-5953cb80f284 7fec9ccd-c649-4aff-9098-5953cb80f284-copy -o preallocation=falloc

Comment 32 Tingting Mao 2019-02-20 02:34:26 UTC
Thanks for Nir's info, I can reproduce this issue now.


Tested with:

Local file system(xfs)
qemu-kvm-rhev-2.12.0-18.el7_6.3
kernel-3.10.0-944.el7


Scenario 1 (Convert the unaligned raw file truncated by myself)

# truncate -s 11136 test.img

# qemu-io -c 'write -P 1 0 10K' test.img -f raw
wrote 10240/10240 bytes at offset 0
10 KiB, 1 ops; 0.0013 sec (7.489 MiB/sec and 766.8712 ops/sec)

# qemu-img info test.img 
image: test.img
file format: raw
virtual size: 11K (11264 bytes)
disk size: 12K

# ll test.img 
-rw-r--r-- 1 root root 11136 Feb 19 21:19 test.img

# df -T test.img 
Filesystem                            Type 1K-blocks    Used Available Use% Mounted on
/dev/mapper/rhel_hp--dl388g8--16-home xfs  222828032 5909412 216918620   3% /home

# qemu-img convert -f raw -O raw -p -t none -T none test.img tgt.img -o preallocation=falloc
qemu-img: block/io.c:2134: bdrv_co_block_status: Assertion `*pnum && (((*pnum) % (align)) == 0) && align > offset - aligned_offset' failed.
Aborted (core dumped)


Scenario 2 (Convert the unaligned raw file generated by RHV)

# df -T 7fec9ccd-c649-4aff-9098-5953cb80f284 
Filesystem                            Type 1K-blocks    Used Available Use% Mounted on
/dev/mapper/rhel_hp--dl388g8--16-home xfs  222828032 5932492 216895540   3% /home

# qemu-img convert -f raw -O raw -p -t none -T none 7fec9ccd-c649-4aff-9098-5953cb80f284 target.img -o preallocation=falloc
qemu-img: block/io.c:2134: bdrv_co_block_status: Assertion `*pnum && (((*pnum) % (align)) == 0) && align > offset - aligned_offset' failed.
Aborted (core dumped)

Comment 33 Ademar Reis 2019-02-25 16:37:40 UTC
(In reply to Nir Soffer from comment #29)
> But even if we fix this in RHV, this issue may break existing RHV versions
> (4.2)
> and images created by such versions.

But is this something that happens in practice, in customer scenarios, or is it something that happens only in a synthetic test environment? It looks like the latter to me, so I would like to drop blocker and 7.6.z flags.

Comment 34 Nir Soffer 2019-02-25 16:50:14 UTC
(In reply to Ademar Reis from comment #33)
I did not see any customer bug about it yet, but this is real customer scenario:
1. Create vm with enough disks/devices to create big enough conf volume (> 10k)
2. Create snapshot with memory (default)
3. Export the vm

Comment 35 Ademar Reis 2019-02-25 17:02:34 UTC
(In reply to Nir Soffer from comment #34)
> (In reply to Ademar Reis from comment #33)
> I did not see any customer bug about it yet, but this is real customer
> scenario:
> 1. Create vm with enough disks/devices to create big enough conf volume (>
> 10k)
> 2. Create snapshot with memory (default)
> 3. Export the vm

OK, adjusting bug status, we'll try to fix it in 7.7.

Comment 37 Maxim Levitsky 2019-05-15 13:09:01 UTC
I analyzed this bug:

First of all this is a bit different issue that #1589738 although conceptual similar.


To reproduce this, you need a file that is allocated, but has non sector aligned size.
For example:

 dd if=/dev/zero of=./test.img bs=1 count=11050

If looking at the file from sector based perspective, its last sector has a 'hole' beyond EOF, which is not sector size aligned.

Then run

qemu-img convert -f raw -O raw -p -T none  ./test.img ./copy.img

The -T none forces O_DIRECT on the input file, which is needed to reproduce this bug.

What happening is that:

* raw_getlength (in file-posix) allows file backed image to be of any length.
  with byte based interface.
  this IMHO is correct, since files can represent various things, and not only block based storage.


* when opened with O_DIRECT, the raw_probe_alignment sets the 'bs->bl.request_alignment' to underlying block size
  (on my ext4 system it guesses it to be 512)

  This is to ensure that O_DIRECT would work.
  This also seems to be correct to me conceptually.


* Then raw_co_block_status reads the actual allocated ranges, and returns the range from 0 to file size which is not block size aligned.

  Then bdrv_co_block_status explicitly checks that the returned range is sector aligned.

I think that the solution is to fix the bdrv_co_block_status to cope with the unaligned ranges returned.

What do you think?

Comment 38 Eric Blake 2019-05-15 13:34:31 UTC
The '-T none' clue is important - that is a manifestation of the file-posix.c driver using O_DIRECT with larger alignment but not rounding up its block status requests to match. Upstream patch proposed:
https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg03370.html

Comment 39 Hanna Czenczek 2019-05-15 20:17:25 UTC
(In reply to Eric Blake from comment #38)
> The '-T none' clue is important - that is a manifestation of the
> file-posix.c driver using O_DIRECT with larger alignment but not rounding up
> its block status requests to match. Upstream patch proposed:
> https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg03370.html

Then this BZ is probably a duplicate of BZ 1588356.

Max

Comment 40 Hanna Czenczek 2019-05-15 20:19:05 UTC
No, sorry, I spoke to soon.  BZ 1588356 is for RHEL 8 (AV), so we can use this to track RHV 7.

(I won’t object to assigning this BZ to me.)

Max

Comment 43 Nir Soffer 2019-06-19 21:01:08 UTC
Gordon, this was the bug I mixed with bug 1711643.


Note You need to log in before you can comment on or make changes to this bug.