Description of problem:
While starting to play with the lvextend flow (basic flow), we hit a qemu bug. Instead of ENOSPC (no space) errors, we got EIO errors. This left the VM in a paused state from which we were unable to resume (continue) it at all.

This is not an image corruption case. When I killed the VM (the qemu process), I was simply able to run it again, but it failed once more in the same scenario. I managed to reproduce this on several VMs.

Please note that Dor saw this behaviour along with Dan. They identified it as qemu not trying to write to the qcow2 image after many extends: qemu does not attempt the actual write, and something is wrong with the metadata in RAM. Also, while debugging, they noticed that failures in the cluster_alloc check prevented qemu from even trying to write to the image.

Please note that in this situation, we hit a segmentation fault when we ran qemu-img check on that volume.

It's important to mention that this issue does not reproduce easily and is a bit evasive. However, when it occurs, the behaviour is very bad.

Environment details:
Package versions:
vdsm22-4.5-62.5.el5_5rhev2_2
kvm-83-164.el5_5.12
kernel 2.6.18-194.3.1.el5

Repro steps:
1) Make sure to have 2 running hosts connected to an iSCSI pool.
2) Create 4 VMs (running RHEL 5.5) with qcow2 disks (thinly provisioned). Make sure some use virtio and some use IDE, and run them (which server they run on does not matter).
3) Start 'dd' to the local file system (dd if=/dev/zero of=/tmp/dd1 bs=1M).
4) Try to migrate the VMs during their lvextend attempts.

Please note that it first occurred with the above RPMs. I then tried to reproduce it with the kvm test RPMs (*83-164.el5_5.12.qcowtest1.x86_64.rpm), and it DIDN'T REPRODUCE. However, when I reverted to the original latest z-stream RPMs, it DIDN'T reproduce again either.

I hope this information is useful enough for further analysis and a possible fix.
(In reply to comment #0)
> please note that Dor saw this behaviour along with Dan and identified it as
> qemu do not try to write to an image of qcow2 after many extends - qemu is not
> trying the actual write and something is wrong with the meta data
> in ram, also, while debugging they noticed that failures in cluser_alloc check
> prevented qemu of even trying to write to the image.

Can someone (Dan?) elaborate on this? I'm not even sure what you mean by this "cluster_alloc check".

Also, please note that there's a whole bunch of patches waiting for inclusion that may fix EIO cases or in-memory metadata corruptions. We'll need to try reproducing it once these fixes are in.
(In reply to comment #1)
> (In reply to comment #0)
> > please note that Dor saw this behaviour along with Dan and identified it as
> > qemu do not try to write to an image of qcow2 after many extends - qemu is not
> > trying the actual write and something is wrong with the meta data
> > in ram, also, while debugging they noticed that failures in cluser_alloc check
> > prevented qemu of even trying to write to the image.
>
> Can someone (Dan?) detail on this? I'm not even sure what you mean by this
> "cluster_alloc check".
>
> Also please note that there's a whole bunch of patches waiting for inclusion
> which may fix EIO cases or in-memory metadata corruptions. We'll need to try
> reproducing it once these fixes are in.

[hateya] I tested the above with your RPMs (the special QE build with the patches that deal with EIO cases, *83-164.el5_5.12.qcowtest1.x86_64.rpm), and I was not able to reproduce it. Nevertheless, when I reverted to the original RPMs, the problem didn't reproduce again either, so it doesn't say much anyway.

Dor, can you please elaborate on your insights from when you first saw the problem?
Dan does not... Maybe Dor?
This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unfortunately unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.
This request was erroneously denied for the current release of Red Hat Enterprise Linux. The error has been fixed and this request has been re-proposed for the current release.
(In reply to comment #5)
> Dan does not... Maybe Dor?

Dor: Ping?
(In reply to comment #2)
> (In reply to comment #1)
> > (In reply to comment #0)
> > > please note that Dor saw this behaviour along with Dan and identified it as
> > > qemu do not try to write to an image of qcow2 after many extends - qemu is not
> > > trying the actual write and something is wrong with the meta data
> > > in ram, also, while debugging they noticed that failures in cluser_alloc check
> > > prevented qemu of even trying to write to the image.
> >
> > Can someone (Dan?) detail on this? I'm not even sure what you mean by this
> > "cluster_alloc check".
> >
> > Also please note that there's a whole bunch of patches waiting for inclusion
> > which may fix EIO cases or in-memory metadata corruptions. We'll need to try
> > reproducing it once these fixes are in.
> [hateya] I tested the above with your rpms (with the patches that deals
> with EIO cases, special version to QE,
> *83-164.el5_5.12.qcowtest1.x86_64.rpm)
> and I was not able to reproduce, nevertheless, when I reverted
> to the original rpms, the problem didn't reproduce again, so, it doesn't say
> much anyway,
>
> Dor - can you please elaborate on your insights when your first saw the
> problem.

Dor, Haim needs the above info from you, thanks.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days