Bug 1070531
| Summary: | qmp can not give a reasonable hint when block_resize a scsi_debug disk with 10G | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Jun Li <juli> |
| Component: | qemu-kvm-rhev | Assignee: | Markus Armbruster <armbru> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Xueqiang Wei <xuwei> |
| Severity: | low | Docs Contact: | |
| Priority: | low | ||
| Version: | 7.0 | CC: | aliang, armbru, coli, hhuang, juli, juzhang, kchamart, knoel, mrezanin, ngu, pbonzini, qzhang, virt-maint, xfu, xuwei |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | qemu-kvm-rhev-2.12.0-1.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2020-07-08 07:08:26 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Still an issue upstream. Issue isn't specific to just raw block devices: for example qcow2_truncate (the function that block_resize maps to) has several error_report calls with fine grained error messages, but those never hit the qmp monitor and instead you'll see a similarly generic error. I've posted a patch upstream now which ensures that error_report messages will at least hit stderr, since right now if the user is interacting with qmp, the messages are just dropped. Minor improvement, but not what's requested here. I believe the proper solution is to change bdrv_truncate to take an Error **errp from callers... however changing that in all the drivers kinda sucks and sets an annoying precedent for whenever we want finer grained error info from deep down the call stack. I commented in another bug[2] that perhaps we could do away with most of the explicit Error passing and instead use a global 'last reported error' tracked with thread local storage, similar to what libvirt uses. That would potentially allow us to just replace error_report calls with proper qerror_report invocations when we want a message propagated up to qmp. I'm not too familiar with the qemu code though so maybe there are reasons that won't work, but I'm going to see if I can get a POC working. [1] http://marc.info/?l=qemu-devel&m=139457994925605&w=2 [2] https://bugzilla.redhat.com/show_bug.cgi?id=885099#c10 As mentioned in that other bug, qerror_report more or less does what I suggested, but it's not the preferred API these days. Sounds like passing Error up the call stack is the ideal solution Not upstream and not urgent, so 7.2 material at the least. But this is a fairly minor error reporting issue IMO. Certainly worth fixing upstream at some point, but I don't think it's worth a backport, figuring that every bdrv_truncate implementation will need to be touched which is probably 10 files or so. I'm not really keyed into the RHEL process, so resetting assignee This bug was filed against qemu-kvm, but comment#0 gives a qemu-kvm-rhev version. Does the bug reproduce with current qemu-kvm? What about current qemu-kvm-rhev? Hi Xiangchun, Could you reply comment10? Best Regards, Junyi 1)For scsi_debug disk as data disk.
Both qemu-kvm-rhev-2.6.0-15 and qemu-kvm-1.5.3-118.el7 still can be reproduced.
{ "execute": "block_resize", "arguments": { "device": "drive-data-disk", "size": 10737418240 }}
{"error": {"class": "GenericError", "desc": "Could not resize: Invalid argument"}}
2)For local file as data disk
Both qemu-kvm-rhev-2.6.0-15 and qemu-kvm-1.5.3-118.el7 work.
set 100T size
{ "execute": "block_resize", "arguments": { "device": "drive-data-disk", "size": 107374182400000}}
{"return": {}}
set 200T size
{ "execute": "block_resize", "arguments": { "device": "drive-data-disk", "size": 207374182400000}}
{"return": {}}
result: can get 100T and 200T size disk inside guest.
QMP command block_resize runs qmp_block_resize(). Like any QMP command handler, it returns an Error through its errp parameter on error. qmp_block_resize() first checks a number of error conditions, and if everything's okay, it runs bdrv_truncate(). bdrv_truncate() checks a few more error conditions, then runs the block driver's .bdrv_truncate() method. Both bdrv_truncate() and the .bdrv_truncate() method return negative errno on error. Failure modes and their errno values aren't specified. bdrv_truncate() itself returns -ENOMEDIUM when the block device has no medium (can only happen for devices with removable media, obviously), -ENOTSUP when the block driver doesn't provide a .bdrv_truncate() method, and -EACCES when the block device was opened read-only. The .bdrv_truncate() methods can of course return whatever they want. In particular, block driver "host_device"'s method raw_truncate() returns -EINVAL when asked to grow a block or character device special file. qmp_block_resize() needs to create an Error object when bdrv_truncate() fails. It creates specific error messages for the errno values it recognizes: -ENOMEDIUM, -ENOTSUP, -EACCES and -EBUSY. For any other errno value, it creates a generic "Could not resize: S" error message, where S is strerror(errno). This is what the reporter observed. Two possible solutions: 1. Create specific and suitable error messages for more errno values. Only possible for errno values that are used consistently by all block drivers. Their meaning should be specified in the method contract then. 2. As Cole observed in comment#2, some block drivers use error_report(). This is wrong. They probably do it because just returning -errno produces inferior error messages on the command line and in HMP. "Fixing" the wrongness by dropping the error_report() would regress error message quality there. We instead need to fix it by converting the bdrv_truncate() to take an Error *errp parameter and use error_setg() instead of error_report(). The more complete solution is 2. It's a lot of tedious work. Upstream first. Does the bug still reproduce with current qemu-kvm? I'm asking because bdrv_truncate() takes an Error ** argument since upstream commit ed3d2ec98a3, v2.10.0. (In reply to Markus Armbruster from comment #16) > Does the bug still reproduce with current qemu-kvm? > > I'm asking because bdrv_truncate() takes an Error ** argument since upstream > commit ed3d2ec98a3, v2.10.0. 1) For scsi_debug disk as data disk. qemu-kvm-rhev-2.12.0-7.el7 hit error as below: set 1G size { "execute": "block_resize", "arguments": { "device": "drive-data-disk", "size": 1073741824 }} {"return": {}} set 10G size { "execute": "block_resize", "arguments": { "device": "drive-data-disk", "size": 10737418240 }} {"error": {"class": "GenericError", "desc": "Cannot grow device files"}} qemu-kvm-1.5.3-156.el7_5.4 still can be reproduced. set 1G size { "execute": "block_resize", "arguments": { "device": "drive-data-disk", "size": 1073741824 }} {"return": {}} set 10G size { "execute": "block_resize", "arguments": { "device": "drive-data-disk", "size": 10737418240 }} {"error": {"class": "GenericError", "desc": "Could not resize: Invalid argument"}} 2)For local file as data disk Both qemu-kvm-rhev-2.12.0-7.el7 and qemu-kvm-1.5.3-156.el7_5.4 work. set 100T size { "execute": "block_resize", "arguments": { "device": "drive-data-disk", "size": 107374182400000}} {"return": {}} { "execute": "block_resize", "arguments": { "device": "drive-data-disk", "size": 107374182400000}} {"return": {}} result: can get 100T size disk inside guest. Output for qemu-kvm-rhev-2.12.0-7.el7 now matches expectations (I think). If you disagree, please provide expected output. Output for qemu-kvm-1.5.3-156.el7_5.4 doesn't match expectations. However, this bug is against qemu-kvm-rhev. If you think we should fix qemu-kvm, clone this bug. According to Comment 17 and Comment 18, verify this bug. And I think it is unnecessary to clone this bug, we seldom test this scenario. |
Description of problem: qmp can not give a reasonable hint when block_resize a scsi_debug disk with 10G. Version-Release number of selected component (if applicable): qemu-kvm-rhev-1.5.3-49.el7.x86_64 3.10.0-95.el7.x86_64 How reproducible: 100% Steps to Reproduce: 1. create a new disk via scsi_debug. # modprobe scsi_debug dev_size_mb=1024 lbpu=1 2.boot guest with this scsi_debug disk(/dev/sdb): /usr/libexec/qemu-kvm -M pc -m 4G -smp 2 -monitor stdio -boot menu=on,reboot-timeout=-1,strict=on -qmp tcp::8888,server,nowait -device virtio-scsi-pci,bus=pci.0,id=scsi0 -drive file=/home/juli/rhel-server-7.0-64.qcow2,if=none,format=qcow2,id=img,cache=none,werror=stop,rerror=stop -device scsi-hd,bus=scsi0.0,drive=img,id=sys-img -netdev tap,id=tap1,vhost=on,queues=4,script=/etc/ovs-ifup,downscript=/etc/ovs-ifdown -device virtio-net-pci,netdev=tap1,id=net1,mq=on,vectors=17,mac=18:03:73:39:a6:11 -vga qxl -global qxl-vga.revision=3 -spice port=5931,disable-ticketing \ -drive file=/dev/sdb,if=none,id=drive-data-disk,format=raw,cache=none,aio=native,werror=stop,rerror=stop,discard=on -device virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x8 -device scsi-hd,drive=drive-data-disk,bus=scsi1.0,id=data-disk 3.block_resize scsi_debug disk with 1G via qmp. { "execute": "block_resize", "arguments": { "device": "drive-data-disk", "size": 1073741824 }} 4.block_resize scsi_debug disk with 10G via qmp. { "execute": "block_resize", "arguments": { "device": "drive-data-disk", "size": 10737418240 }} Actual results: After step 4, check the result. { "execute": "block_resize", "arguments": { "device": "drive-data-disk", "size": 10737418240 }} {"error": {"class": "GenericError", "desc": "Could not resize: Invalid argument"}} Expected results: If block_resize 10G is too large, please give a reasonable hint. Such as "Could not resize: File too large" Additional info: Also test with local file. when file is too large, qmp will give a reasonable error hint. 1,create a data image. # qemu-img create -f raw /home/data-img 5G 2,boot guest with this data image. # /usr/libexec/qemu-kvm -M pc -m 4G -smp 2 -monitor stdio -boot menu=on,reboot-timeout=-1,strict=on -qmp tcp::8888,server,nowait -device virtio-scsi-pci,bus=pci.0,id=scsi0 -drive file=/home/juli/rhel-server-7.0-64.qcow2,if=none,format=qcow2,id=img,cache=none,werror=stop,rerror=stop -device scsi-hd,bus=scsi0.0,drive=img,id=sys-img -netdev tap,id=tap1,vhost=on,queues=4,script=/etc/ovs-ifup,downscript=/etc/ovs-ifdown -device virtio-net-pci,netdev=tap1,id=net1,mq=on,vectors=17,mac=18:03:73:39:a6:11 -vga qxl -global qxl-vga.revision=3 -spice port=5931,disable-ticketing -drive file=/home/data-img,if=none,id=drive-data-disk,format=raw,cache=none,aio=native,werror=stop,rerror=stop,discard=on -device virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x8 -device scsi-hd,drive=drive-data-disk,bus=scsi1.0,id=data-disk ---------- 3, block_resize with 100T, and qmp will give error hint. { "execute": "block_resize", "arguments": { "device": "drive-data-disk", "size": 107374182400000}} {"error": {"class": "GenericError", "desc": "Could not resize: File too large"}}