Bug 1279777
| Summary: | qemu becomes unresponsive when mirroring a drive | | |
| --- | --- | --- | --- |
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Francesco Romani <fromani> |
| Component: | qemu-kvm-rhev | Assignee: | Kevin Wolf <kwolf> |
| Status: | CLOSED NOTABUG | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 7.1 | CC: | acanan, amureini, areis, christian.grundmann, derez, dyuan, ecohen, famz, fromani, gklein, hhan, huding, jcody, juzhang, kgoldbla, knoel, kwolf, lsurette, michal.skrivanek, mst, nsoffer, pbonzini, pezhang, rbalakri, Rhev-m-bugs, tnisan, virt-maint, xfu, xuzhang, yanyang, yeylon, ylavi |
| Target Milestone: | rc | Keywords: | Regression |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | virt | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1261980 | Environment: | |
| Last Closed: | 2016-01-15 12:18:50 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1154205, 1170712, 1261980 | | |
Description
Francesco Romani
2015-11-10 09:42:45 UTC
Still waiting for a description of what you think that qemu is doing wrong here. As I said in one of the comments of the original bugs, "This qemu process looks alive", and I can't see any newer information that says otherwise. Also, my request that you try and attach a new monitor to the process is still open.

Also, why is this marked as a regression? In which qemu-kvm-rhev version did it work correctly according to your definition, and what is the exact difference between the behaviour then and now?

(In reply to Kevin Wolf from comment #4)
> Also, why is this marked as a regression? In which qemu-kvm-rhev version did it
> work correctly according to your definition, and what is the exact difference
> between the behaviour then and now?

It is marked as a regression because the original bug I cloned to make this one was marked as such. I don't claim there is a regression in QEMU given the information we have so far.

*** Bug 1261980 has been marked as a duplicate of this bug. ***

Tested with the following code:
-----------------------------------
rhevm-3.6.0.3-0.1.el6.noarch
vdsm-4.17.10.1-0.el7ev.noarch

Verified with the following steps:
----------------------------------
Steps to Reproduce:
1. Create a VM with one 5G thin-provisioned disk on an iSCSI SD
2. Boot the VM from a live CD such as TinyCorePlus
3. Write data to the first part of the disk: open a terminal inside the VM and run
   dd if=/dev/zero of=/dev/vda bs=1M count=2048

>>>>> During the dd to the disk the connection to the QEMU process is lost. This happens every time. We have bz https://bugzilla.redhat.com/show_bug.cgi?id=1279777 open for the qemu issue. THIS IS A BLOCKER!

If the connection was not lost at this point, I did step 4 and the connection to the qemu process was lost after step 5:

4. When the above command is finished, create a VM snapshot
5. Write data to the second part of the disk: open a terminal inside the VM and run
   dd if=/dev/zero of=/dev/vda bs=1M count=2048 seek=2048

I can only repeat once more my repeated request for more information as explained in comment 3. Unless you can break this down to misbehaviour on the qemu level, we can't debug this. "RHEV-M lost the connection" is a bug report as precise as "something on my system is broken". If you can't provide more detailed information on the qemu level, can you leave a failed qemu around (does it even exist any more? With the information provided here I don't know yet whether it's hanging, has exited, or something else) and provide me an SSH connection so I can at least have a look at the state of the qemu process after the problem occurred?

The comment 8 scenario doesn't seem to be related to mirroring. But FWIW this may be related to https://bugzilla.redhat.com/show_bug.cgi?id=1277922

(In reply to Kevin Alon Goldblatt from comment #8)
Did you retest with the fix of bug 1283987? Especially based on the last comment it may very well resolve your problem.

Retested with the following code:
vdsm-4.17.12-0.el7ev.noarch
rhevm-3.6.0.3-0.1.el6.noarch

Retested with the scenario in comment 8 >>>>> the same result.

When using gdb on the process and running I/O, the VM enters a Paused state and the console to the VM freezes. Without gdb on the qemu process, the qemu process dies/exits when running I/O.

Reproduced the scenario with kwolf.
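As an aside to the debugging requests above, a minimal sketch of how the qemu-level state could be captured on the host; the domain name `myvm` and the pgrep pattern are placeholders, not taken from this bug:

```
# Find the QEMU process serving the affected VM (adjust the pattern to the real domain name)
pid=$(pgrep -f 'qemu-kvm.*guest=myvm')

# Is the process still there, and is it running, sleeping or zombie?
grep State "/proc/$pid/status"

# Capture backtraces of all threads without killing the process
gdb -p "$pid" -batch -ex 'thread apply all bt'

# Check whether the monitor still answers and what the block jobs are doing
virsh qemu-monitor-command myvm --hmp 'info status'
virsh qemu-monitor-command myvm --hmp 'info block-jobs'
```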
(In reply to Kevin Alon Goldblatt from comment #13)
Did you retest with the fix of bug 1283987?

(In reply to Michal Skrivanek from comment #14)
> (In reply to Kevin Alon Goldblatt from comment #13)
> Did you retest with the fix of bug 1283987?

Yes, I used the same code as they indicated in bug 1283987 (Comment 16), which fixed the problem, namely:

From Comment 16:
===================================================
verify the issue on x86_64:
host info:
qemu-kvm-rhev-2.3.0-31.el7_2.4.x86_64
3.10.0-327.4.1.el7.x86_64

steps:
steps are the same as above

results:
no core dumped, when extend lv to 4GB, the guest can be installed successfully.
------------------------------------------------------

(In reply to Kevin Alon Goldblatt from comment #15)
> (In reply to Michal Skrivanek from comment #14)
> > (In reply to Kevin Alon Goldblatt from comment #13)
> > Did you retest with the fix of bug 1283987?
>
> Yes, I used the same code as they indicated in bug 1283987 (Comment 16),
> which fixed the problem, namely:
>
> From Comment 16:
> ===================================================
> verify the issue on x86_64:
> host info:
> qemu-kvm-rhev-2.3.0-31.el7_2.4.x86_64
> 3.10.0-327.4.1.el7.x86_64
>
> steps:
> steps are the same as above
>
> results:
> no core dumped, when extend lv to 4GB, the guest can be installed
> successfully.
> ------------------------------------------------------

To clarify my comment: I used the same code that was used in bug 1283987. In that bug, the problem associated with it was remedied. In this bug, the problem of either a frozen or exited qemu process remains.

In the setup that Kevin provided me and where he said the problem would reproduce (which didn't involve any mirroring), all I could see so far is that VMs get stopped because their LV is full. The LVs weren't automatically extended so that the VMs resume as you would expect, but the VMs would resume successfully if you requested it manually. So I think this is a VDSM bug. I haven't seen qemu processes disappearing, and considering that in contrast to the original report no mirroring was involved, I wonder whether the VDSM bug that Kevin was apparently reproducing is different from what was originally reported.

I tried to reproduce it in libvirt. I started a guest without block iotune and ran iozone to generate a high I/O rate in the guest. Then I did a common blockcopy:

# virsh blockcopy guest vda /var/lib/libvirt/images/new.qcow2 --verbose --wait

The blockcopy could reach 100%. Then I did the pivot job:

# virsh blockjob guest vda --pivot

It kept running and did not finish, while the guest remained accessible. When I stopped the iozone process in the guest, the pivot job finished soon afterwards and the guest was OK to use.

In RHEVM, I also prepared a guest running iozone and did a storage migration. The migration ran for a long time and finally reported failure, but the guest was accessible all the time.

When I set up block iotune in libvirt and RHEVM, such as 20 MB/s for total-bytes-sec, the pivot job and the storage migration finished successfully.

So I tend to consider this NOTABUG, because it is hard to get the images in sync while the guest is generating I/O at a high rate. In a production environment, we could use the blkdeviotune limits to make the storage migration succeed.

Based on comment #17 and comment #18, closing as NOTABUG. Please reopen if you have more details. Thanks.
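For completeness, a minimal sketch of the blkdeviotune workaround described in comment #18, assuming a running domain named `guest` with disk `vda`; the 20 MB/s figure matches the value mentioned above, and all names are illustrative rather than taken from the original setup:

```
# Cap guest I/O at roughly 20 MB/s so the mirror can converge
virsh blkdeviotune guest vda --total-bytes-sec 20971520 --live

# Run the copy and pivot as in the comment above
virsh blockcopy guest vda /var/lib/libvirt/images/new.qcow2 --verbose --wait
virsh blockjob guest vda --pivot

# Remove the limit again once the job has completed (0 means unlimited)
virsh blkdeviotune guest vda --total-bytes-sec 0 --live
```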