Bug 1999051
Summary: Qemu VM with VirtIO gets paused directly after booting with: BLOCK_IO_ERROR, reason: "Invalid argument"

Product: Red Hat Enterprise Linux Advanced Virtualization | Reporter: Jean-Louis Dupond <jean-louis>
Component: qemu-kvm | Assignee: Virtualization Maintenance <virt-maint>
qemu-kvm sub component: Storage | QA Contact: qing.wang <qinwang>
Status: CLOSED DUPLICATE | Docs Contact:
Severity: urgent
Priority: medium | CC: coli, elima, jinzhao, juzhang, kchamart, kkiwi, qinwang, quintela, timao, virt-maint
Version: 8.4 | Keywords: Triaged
Target Milestone: rc
Target Release: 8.5
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: | Doc Type: If docs needed, set a value
Doc Text: | Story Points: ---
Clone Of: | Environment:
Last Closed: 2021-09-02 02:01:44 UTC | Type: Bug
Regression: --- | Mount Type: ---
Documentation: --- | CRM:
Verified Versions: | Category: ---
oVirt Team: --- | RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- | Target Upstream Version:
Embargoed:
Attachments: qemu args (1819048), libvirt xml (1819049), full libvirt debug log (1819057)
Description
Jean-Louis Dupond
2021-08-30 10:27:33 UTC
Created attachment 1819048 [details]
qemu args
Created attachment 1819049 [details]
libvirt xml
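
For reference, the contents of these two attachments can usually be regathered on a host with standard libvirt tooling; <domain> below is a placeholder for the VM name, not a value from this bug:

    $ virsh dumpxml <domain> > domain.xml         # the libvirt xml of the guest
    $ less /var/log/libvirt/qemu/<domain>.log     # libvirt logs the full qemu command line here on every start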
> image: /rhev/data-center/mnt/blockSD/17f5688c-11d0-4708-a52c-55ee43936f74/images/f81d418f-ba14-444a-aaf7-0f3510fe5fd0/63672aa8-4748-4ddc-923c-c06eb1b4ce3e
> file format: qcow2
> virtual size: 47.5 GiB (51011125248 bytes)
> disk size: 0 B

This 0 size is as weird as it can be. Can you make sure that there is enough space and that libvirt/qemu can write to that place on the destination?

> cluster_size: 65536
> Format specific information:

Thanks, Juan.

Hi, I know you've mentioned on #qemu (OFTC) that changing from 'virtio-blk' to 'virtio-scsi' has "fixed the issue" for you. Also, besides what Juan already asked above, one more thing: since you're doing migration, it can help to have the full QMP transactions from the source and destination hosts. You can obtain them in the following way:

(1) Install the `virt-admin` tool.

(2) Configure the log file to capture the output:

    $> virt-admin daemon-log-outputs "1:file:/var/log/libvirt/libvirtd.log"

(3) Then configure the dynamic log filters:

    $> virt-admin daemon-log-filters \
        "1:libvirt 1:qemu 1:conf 1:security 3:event 3:json 3:file 3:object 1:util"

That's it; no need to restart the libvirt daemon. Repeat your test, and attach the /var/log/libvirt/libvirtd.log from the source and the destination to the bug. But first check what Juan has asked in comment #3, please.

Created attachment 1819057 [details]
Full libvirt debug log for that VM
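
With the log filters above in place, the QMP traffic shows up in libvirtd.log as qemuMonitorJSONIOProcessLine debug lines, so the interesting events can be pulled out of a large debug log with a plain grep, for example:

    $ grep 'qemuMonitorJSONIOProcessLine' /var/log/libvirt/libvirtd.log   # QMP lines received from qemu
    $ grep 'BLOCK_IO_ERROR' /var/log/libvirt/libvirtd.log                 # just the I/O error events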
(In reply to Juan Quintela from comment #3)
> > image: /rhev/data-center/mnt/blockSD/17f5688c-11d0-4708-a52c-55ee43936f74/images/f81d418f-ba14-444a-aaf7-0f3510fe5fd0/63672aa8-4748-4ddc-923c-c06eb1b4ce3e
> > file format: qcow2
> > virtual size: 47.5 GiB (51011125248 bytes)
> > disk size: 0 B
>
> This 0 size is as weird as it can be. Can you make sure that there is enough
> space and that libvirt/qemu can write to that place on the destination?

There is plenty of space. If I just try to boot the VM on a non-upgraded hypervisor (so with qemu 5.2/libvirt 7.4), it boots fine, and that is on the same storage/network/...

The "disk size: 0 B" isn't a big issue, I think? The docs tell me: "How much space the image file occupies on the host file system (may be shown as 0 if this information is unavailable, e.g. because there is no file system)". As this is a Windows machine, that might be the cause? It is the same on the other VMs I checked as well.

Also, as mentioned above, I was able to get it working again by switching the disk from VirtIO to VirtIO-SCSI. So it really seems to be some kind of QEMU issue.

Please note that this happened first after a migration, but even a fresh start of the VM causes the same issue, so it is unrelated to the migration. I also tried to clone the VM that has this issue, and it was 100% reproducible on the clone.

Extracted from comment 1 (i.e. with a lot of the surrounding output removed):

"image": {"virtual-size": 51011125248, "filename": "/rhev/data-center/mnt/blockSD/17f5688c-11d0-4708-a52c-55ee43936f74/images/f81d418f-ba14-444a-aaf7-0f3510fe5fd0/63672aa8-4748-4ddc-923c-c06eb1b4ce3e", "cluster-size": 65536, "format": "qcow2", "actual-size": 0, "format-specific": {"type": "qcow2", "data": {"compat": "1.1", "compression-type": "zlib", "lazy-refcounts": false, "refcount-bits": 16, "corrupt": false, "extended-l2": false}}, "dirty-flag": false},

"image": {"virtual-size": 56237228032, "filename": "/rhev/data-center/mnt/blockSD/17f5688c-11d0-4708-a52c-55ee43936f74/images/f81d418f-ba14-444a-aaf7-0f3510fe5fd0/63672aa8-4748-4ddc-923c-c06eb1b4ce3e", "format": "host_device", "actual-size": 0, "dirty-flag": false},

You can see that the disk size is zero even for the block device. You can't create a qcow2 block device on something that has size 0.

Later, Juan.

This is on my system, a normal file, not a block device, but the qcow2 header will use the same size.

$ qemu-img create -f qcow2 /tmp/kk.img 10G
Formatting '/tmp/kk.img', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=10737418240 lazy_refcounts=off refcount_bits=16
$ ll /tmp/kk.img
-rw-r--r--. 1 quintela quintela 193K Aug 30 13:32 /tmp/kk.img
$ du /tmp/kk.img
196K    /tmp/kk.img
$ qemu-img info /tmp/kk.img
image: /tmp/kk.img
file format: qcow2
virtual size: 10 GiB (10737418240 bytes)
disk size: 196 KiB
cluster_size: 65536
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
    extended l2: false

Notice how the disk file uses 196K, and qemu-img sees 196 KiB as the disk size. Comment 1 shows a disk size of 0, and note that this here is a completely empty qcow2 disk image.

(In reply to Juan Quintela from comment #9)
> This is on my system, a normal file, not a block device, but the qcow2 header
> will use the same size.
>
> Notice how the disk file uses 196K, and qemu-img sees 196 KiB as the disk size.
> Comment 1 shows a disk size of 0, and note that this here is a completely empty
> qcow2 disk image.

Seems like this is some qemu-img reporting issue.
# qemu-img info /rhev/data-center/mnt/blockSD/17f5688c-11d0-4708-a52c-55ee43936f74/images/0353604b-fab6-4a9d-bf39-42cbf6d3e74c/eee5169b-f75f-4e93-983f-9d8033d3ae70
image: /rhev/data-center/mnt/blockSD/17f5688c-11d0-4708-a52c-55ee43936f74/images/0353604b-fab6-4a9d-bf39-42cbf6d3e74c/eee5169b-f75f-4e93-983f-9d8033d3ae70
file format: qcow2
virtual size: 47.5 GiB (51011125248 bytes)
disk size: 0 B
cluster_size: 65536
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
    extended l2: false

But then:

# dd if=/rhev/data-center/mnt/blockSD/17f5688c-11d0-4708-a52c-55ee43936f74/images/0353604b-fab6-4a9d-bf39-42cbf6d3e74c/eee5169b-f75f-4e93-983f-9d8033d3ae70 of=test.img
107741184+0 records in
107741184+0 records out
55163486208 bytes (55 GB, 51 GiB) copied, 277.342 s, 199 MB/s

# qemu-img info test.img
image: test.img
file format: qcow2
virtual size: 47.5 GiB (51011125248 bytes)
disk size: 51.4 GiB
cluster_size: 65536
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
    extended l2: false

So it is qemu(-img) reporting a wrong size when the qcow2 is on a block device. It is not a block-device issue itself.

Maybe this issue is related to:
Bug 1989717 - [regression] blk_update_request: I/O error when generating load on iSCSI
Bug 1994494 - VM remains in paused state when trying to write on a resized disk resides on iscsi

Could you please try removing the VM error policy? Does it then hit the crash in qemu-kvm: ../util/iov.c:428: qemu_iovec_init_extended: Assertion `mid_qiov->niov <= IOV_MAX' failed.

From the log of comment 1 [1], the image /rhev/data-center/mnt/blockSD/17f5688c-11d0-4708-a52c-55ee43936f74/images/f81d418f-ba14-444a-aaf7-0f3510fe5fd0/63672aa8-4748-4ddc-923c-c06eb1b4ce3e is used as a 'host_device'. For a block-device image, qemu-img cannot get the real size, as the doc says: "How much space the image file occupies on the host file system (may be shown as 0 if this information is unavailable, e.g. because there is no file system)".

[1] -blockdev '{"driver":"host_device","filename":"/rhev/data-center/mnt/blockSD/17f5688c-11d0-4708-a52c-55ee43936f74/images/f81d418f-ba14-444a-aaf7-0f3510fe5fd0/63672aa8-4748-4ddc-923c-c06eb1b4ce3e","aio":"native","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
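
As background on the "disk size: 0 B" figure: qemu-img derives it from the number of allocated blocks the host reports for the image path, and for a block device node that number is typically zero even though the device itself has a real size. A rough illustration with GNU stat and util-linux blockdev (the LV path is a placeholder):

    $ stat -c 'allocated: %b blocks of %B bytes' /tmp/kk.img     # regular file: matches the qemu-img "disk size"
    $ stat -c 'allocated: %b blocks of %B bytes' /dev/vg0/lv0    # device node: usually 0 blocks, hence "disk size: 0 B"
    $ blockdev --getsize64 /dev/vg0/lv0                          # the device's real size in bytes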
(In reply to Jean-Louis Dupond from comment #10)
> Seems like this is some qemu-img reporting issue.
> [...]
> So it is qemu(-img) reporting a wrong size when the qcow2 is on a block device.
> It is not a block-device issue itself.

I think this is because of the speculative preallocation algorithm of the file system. For XFS, you could check https://linux-xfs.oss.sgi.narkive.com/jjjfnyI1/faq-xfs-speculative-preallocation for more details.

(In reply to qing.wang from comment #12)
> Could you please try removing the VM error policy? Does it then hit the crash in
> qemu-kvm: ../util/iov.c:428: qemu_iovec_init_extended: Assertion
> `mid_qiov->niov <= IOV_MAX' failed.

You mean the error_policy on the disk device, right?

-blockdev '{"driver":"host_device","filename":"/rhev/data-center/mnt/blockSD/ea2fe42d-2fa1-410f-894d-29621bb08716/images/95854454-a9ca-4979-9bf1-cbe7024c93aa/b3653cc3-e722-4091-bc34-88ff500ee98a","aio":"native","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":null}' \
-device virtio-blk-pci,iothread=iothread1,bus=pci.0,addr=0x5,drive=libvirt-1-format,id=ua-95854454-a9ca-4979-9bf1-cbe7024c93aa,bootindex=1,write-cache=on,serial=95854454-a9ca-4979-9bf1-cbe7024c93aa \

Removed it, and it still throws the same error:

2021-08-31 07:58:18.694+0000: 375431: debug : qemuMonitorJSONIOProcessLine:222 : Line [{"timestamp": {"seconds": 1630396698, "microseconds": 694539}, "event": "BLOCK_IO_ERROR", "data": {"device": "", "nospace": false, "node-name": "libvirt-1-format", "reason": "Invalid argument", "operation": "read", "action": "report"}}]

It doesn't go into a paused state now, but just hangs. It doesn't crash with the message I reported in the other bug (the IOV_MAX one).

The hang action probably prevented the crash. Could you please try changing the device from virtio-blk-pci to scsi-hd? If it boots without any issue, that indicates the image is fine. If it boots successfully, I think this is still related to Bug 1994494. Or try the latest qemu-kvm (>= 29) and see whether this issue is fixed there.
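
For context on that suggestion: on the QEMU command line, switching a disk from virtio-blk to virtio-scsi means replacing the virtio-blk-pci device with a virtio-scsi controller plus a scsi-hd LUN attached to its bus. A minimal sketch (the controller id and PCI address are placeholders, not values from this VM):

    -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x5 \
    -device scsi-hd,bus=scsi0.0,drive=libvirt-1-format,bootindex=1,write-cache=on \

In oVirt/libvirt terms this corresponds to changing the disk interface from VirtIO to VirtIO-SCSI, the switch already reported earlier in the thread as making the problem disappear.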
(In reply to qing.wang from comment #17)
> The hang action probably prevented the crash.
> Could you please try changing the device from virtio-blk-pci to scsi-hd?
> If it boots without any issue, that indicates the image is fine.

-device scsi-hd,bus=ua-18fca1d1-a616-48e9-94c0-0705f9b29e31.0,channel=0,scsi-id=0,lun=0,device_id=95854454-a9ca-4979-9bf1-cbe7024c93aa,drive=libvirt-1-format,id=ua-95854454-a9ca-4979-9bf1-cbe7024c93aa,bootindex=1,write-cache=on,serial=95854454-a9ca-4979-9bf1-cbe7024c93aa \

Then the VM boots fine and everything seems to be normal. It indeed happens only on plain VirtIO, not on VirtIO-SCSI.

> If it boots successfully, I think this is still related to Bug 1994494.
> Or try the latest qemu-kvm (>= 29) and see whether this issue is fixed there.

As I am on CentOS Stream 8, I don't have a more recent version available. @

> As I am on CentOS Stream 8, I don't have a more recent version available.
> @
Maybe @etrunko could build a newer version?
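
As a general note, which qemu-kvm build a host is actually running can be confirmed with the standard RHEL/CentOS Stream packaging paths, for example:

    $ rpm -q qemu-kvm                      # installed package NVR, e.g. qemu-kvm-6.0.0-29.el8s
    $ /usr/libexec/qemu-kvm --version      # version string of the emulator binary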
As qemu-kvm-6.0.0-29.el8s was built today, I've tested it directly. I can confirm that this fixed the above issue!

(In reply to Jean-Louis Dupond from comment #19)
> > As I am on CentOS Stream 8, I don't have a more recent version available.
> > @
> Maybe @etrunko could build a newer version?

The qemu-kvm-6.0.0-29.el8s build is done: https://cbs.centos.org/koji/taskinfo?taskID=2528662

Thanks, Jean-Louis, for testing: https://bugzilla.redhat.com/show_bug.cgi?id=1999051#c21
This issue is closed as a duplicate of bug 1994494.

*** This bug has been marked as a duplicate of bug 1994494 ***