Bug 609022 - [kvm] qemu doesn't try to write to an image of qcow2 after many lvextends [NEEDINFO]
Status: CLOSED INSUFFICIENT_DATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kvm
Version: 5.5.z
Hardware: All
OS: Linux
Priority: low
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Kevin Wolf
QA Contact: Virtualization Bugs
Depends On:
Blocks: Rhel5KvmTier2
Reported: 2010-06-29 04:40 EDT by Haim
Modified: 2014-01-12 19:46 EST
CC List: 11 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-01-27 06:15:20 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
abaron: needinfo? (hateya)


Attachments

None
Description Haim 2010-06-29 04:40:09 EDT
Description of problem:

While starting to exercise the lvextend flow (the basic flow), we hit a qemu bug: instead of ENOSPC errors, we got EIO errors. As a result, the VM went into a paused state, and we were unable to resume (continue) it at all.

This is not an image corruption case: when I killed the VM (the qemu process), I was able to start it again, but it failed again in the same scenario (I managed to reproduce this on several VMs).

Please note that Dor saw this behaviour along with Dan and identified it as qemu not trying to write to a qcow2 image after many extends: qemu does not attempt the actual write, and something is wrong with the metadata in RAM. Also, while debugging, they noticed that failures in the cluster_alloc check prevented qemu from even trying to write to the image.

Please note that in this situation, we hit a segmentation fault when we ran qemu-img check on that volume.

It's important to mention that this issue is not easy to reproduce and is somewhat elusive, but when it occurs, the behaviour is very bad.

Environment details:

Package versions:

vdsm22-4.5-62.5.el5_5rhev2_2
kvm-83-164.el5_5.12
kernel 2.6.18-194.3.1.el5

Steps to reproduce:

1) Make sure you have 2 running hosts connected to an iSCSI storage pool.
2) Create 4 VMs (running RHEL 5.5) with thinly provisioned qcow2 disks; make
   sure some use virtio and some use IDE, and start them (on any host).
3) In the guests, start a 'dd' to the local file system (dd if=/dev/zero of=/tmp/dd1 bs=1M).
4) Migrate the VMs while their lvextend operations are in progress.

Please note that it first occurred with the above RPMs. I then tried to reproduce it with the kvm test RPMs (*83-164.el5_5.12.qcowtest1.x86_64.rpm) and it DIDN'T REPRODUCE; however, when I reverted to the original latest z-stream RPMs, it DIDN'T reproduce either.

Hope this information is useful enough for further analysis and a possible fix.
Comment 1 Kevin Wolf 2010-06-29 08:10:57 EDT
(In reply to comment #0)
> please note that Dor saw this behaviour along with Dan and identified it as
> qemu do not try to write to an image of qcow2 after many extends - qemu is not
> trying the actual write and something is wrong with the meta data
> in ram, also, while debugging they noticed that failures in cluser_alloc check
> prevented qemu of even trying to write to the image.

Can someone (Dan?) elaborate on this? I'm not even sure what you mean by this "cluster_alloc check".

Also please note that there's a whole bunch of patches waiting for inclusion which may fix EIO cases or in-memory metadata corruptions. We'll need to try reproducing it once these fixes are in.
Comment 2 Haim 2010-06-29 09:29:54 EDT
(In reply to comment #1)
> (In reply to comment #0)
> > please note that Dor saw this behaviour along with Dan and identified it as
> > qemu do not try to write to an image of qcow2 after many extends - qemu is not
> > trying the actual write and something is wrong with the meta data
> > in ram, also, while debugging they noticed that failures in cluser_alloc check
> > prevented qemu of even trying to write to the image.
> 
> Can someone (Dan?) detail on this? I'm not even sure what you mean by this
> "cluster_alloc check".
> 
> Also please note that there's a whole bunch of patches waiting for inclusion
> which may fix EIO cases or in-memory metadata corruptions. We'll need to try
> reproducing it once these fixes are in.    
  [hateya] I tested the above with your RPMs (the special QE build with the
   patches that deal with EIO cases, *83-164.el5_5.12.qcowtest1.x86_64.rpm)
   and was not able to reproduce it. However, when I reverted to the original
   RPMs, the problem didn't reproduce either, so this doesn't tell us much.

Dor, can you please elaborate on your insights from when you first saw the problem?
Comment 5 Dan Kenigsberg 2010-11-24 14:10:01 EST
Dan does not... Maybe Dor?
Comment 7 RHEL Product and Program Management 2011-01-11 15:27:22 EST
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated in the
current release, Red Hat is unfortunately unable to address this
request at this time. Red Hat invites you to ask your support
representative to propose this request, if appropriate and relevant,
in the next release of Red Hat Enterprise Linux.
Comment 8 RHEL Product and Program Management 2011-01-11 17:52:34 EST
This request was erroneously denied for the current release of
Red Hat Enterprise Linux.  The error has been fixed and this
request has been re-proposed for the current release.
Comment 9 Kevin Wolf 2011-01-14 11:55:45 EST
(In reply to comment #5)
> Dan does not... Maybe Dor?

Dor: Ping?
Comment 10 Ayal Baron 2011-01-15 08:44:45 EST
(In reply to comment #2)
> (In reply to comment #1)
> > (In reply to comment #0)
> > > please note that Dor saw this behaviour along with Dan and identified it as
> > > qemu do not try to write to an image of qcow2 after many extends - qemu is not
> > > trying the actual write and something is wrong with the meta data
> > > in ram, also, while debugging they noticed that failures in cluser_alloc check
> > > prevented qemu of even trying to write to the image.
> > 
> > Can someone (Dan?) detail on this? I'm not even sure what you mean by this
> > "cluster_alloc check".
> > 
> > Also please note that there's a whole bunch of patches waiting for inclusion
> > which may fix EIO cases or in-memory metadata corruptions. We'll need to try
> > reproducing it once these fixes are in.    
>   [hateya] I tested the above with your rpms (with the patches that deals 
>    with EIO cases, special version to QE,
> *83-164.el5_5.12.qcowtest1.x86_64.rpm) 
>    and I was not able to reproduce, nevertheless, when I reverted 
>    to the original rpms, the problem didn't reproduce again, so, it doesn't say 
>    much anyway, 
> 
> Dor - can you please elaborate on your insights when your first saw the
> problem.

Dor, Haim needs above info from you, thanks.
