Bug 609022 - [kvm] qemu doesn't try to write to an image of qcow2 after many lvextends
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kvm
Version: 5.5.z
Hardware: All
OS: Linux
Priority: low
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Kevin Wolf
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: Rhel5KvmTier2
Reported: 2010-06-29 08:40 UTC by Haim
Modified: 2023-09-14 01:21 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-01-27 11:15:20 UTC
Target Upstream Version:
Embargoed:



Description Haim 2010-06-29 08:40:09 UTC
Description of problem:

While starting to exercise the basic lvextend flow, we hit a qemu bug: instead of ENOSPC (no space) we got EIO errors. As a result the VM went into a paused state, and we were unable to resume (continue) it at all.
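
To illustrate why the errno matters here, the following is a minimal Python sketch of a write-error policy in which ENOSPC is the recoverable case (extend the volume, then continue the VM) and EIO is not. It is only an illustration and not qemu or vdsm code; the function name `classify_write_error` and the returned labels are hypothetical.

```python
import errno


def classify_write_error(err: int) -> str:
    """Map an errno from a failed guest write to a hypothetical action.

    Illustration only: the labels are invented for this sketch and do not
    correspond to real qemu or vdsm identifiers.
    """
    if err == errno.ENOSPC:
        # Out of space on the thin volume: the management layer can
        # lvextend the LV and then continue the VM.
        return "pause-and-extend"
    if err == errno.EIO:
        # Generic I/O error: extending the volume is not expected to help.
        return "io-error"
    return "other"


if __name__ == "__main__":
    for code in (errno.ENOSPC, errno.EIO):
        print(errno.errorcode[code], "->", classify_write_error(code))
```

Under such a policy, a VM paused on EIO has no obvious recovery step, which matches the unresumable pause described above.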

This is not an image corruption case: after I killed the VM (the qemu process) I was able to start it again, but it then failed again in the same scenario (I managed to reproduce it on several VMs).

Please note that Dor saw this behaviour together with Dan and identified it as qemu not attempting to write to a qcow2 image after many extends: qemu does not try the actual write, and something is wrong with the metadata in RAM. While debugging, they also noticed that failures in the cluster_alloc check prevented qemu from even trying to write to the image.

Please note that in that situation we also hit a segmentation fault when running qemu-img check on that volume.
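
For anyone scripting this check, a minimal Python sketch of how the crash could be told apart from an ordinary failing exit status is shown below. It assumes `qemu-img` is on the PATH; `image_path` is a placeholder, since the report does not give the volume path, and both function names are hypothetical.

```python
import signal
import subprocess


def describe_exit(returncode: int) -> str:
    """Describe a subprocess return code; negative values mean death by signal."""
    if returncode < 0:
        try:
            return "killed by " + signal.Signals(-returncode).name
        except ValueError:
            # Signal number without a symbolic name on this platform.
            return "killed by signal %d" % -returncode
    return "exited with status %d" % returncode


def run_qemu_img_check(image_path: str) -> str:
    """Run `qemu-img check` on image_path and describe how it ended.

    image_path is a placeholder for the affected volume.
    """
    result = subprocess.run(["qemu-img", "check", image_path])
    return describe_exit(result.returncode)
```

A segmentation fault in the checker would show up as "killed by SIGSEGV" rather than as a normal non-zero exit status.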

It's important to mention that this issue does not reproduce easily and is somewhat elusive, but when it occurs the behaviour is very bad.

environment details: 

package versions: 

vdsm22-4.5-62.5.el5_5rhev2_2
kvm-83-164.el5_5.12
2.6.18-194.3.1.el5

repro steps: 

1) make sure to have 2 running hosts connected to an iSCSI pool.
2) create 4 VMs (running rhel5.5) with thinly provisioned qcow2 disks, make
   sure some use virtIO and some use IDE, and run them (the specific server
   does not matter)
3) start 'dd' to local file system (dd if=/dev/zero of=/tmp/dd1 bs=1M) 
4) try to migrate vms during their lvextend attempts.  
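
The write load in step 3 can be approximated with a small Python sketch like the one below. It is an assumption-laden stand-in for `dd`, not part of the original test setup: the function name `write_zeros` and the `max_blocks` cap are hypothetical, and the cap exists only so the sketch is safe to run (the `dd` command in step 3 has no such limit).

```python
import os
import tempfile


def write_zeros(path: str, block_size: int = 1024 * 1024, max_blocks: int = 16) -> int:
    """Write zero-filled blocks to path, roughly like dd if=/dev/zero bs=1M.

    Stops after max_blocks blocks. Returns the number of bytes written;
    an OSError (for example ENOSPC or EIO) propagates to the caller.
    """
    block = bytes(block_size)
    written = 0
    with open(path, "wb") as f:
        for _ in range(max_blocks):
            written += f.write(block)
        f.flush()
        os.fsync(f.fileno())  # push the data to disk, not just the page cache
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "dd1")
        print(write_zeros(target, max_blocks=4), "bytes written")
```

Inside a guest, the errno carried by any OSError raised here shows what the guest actually saw when the underlying volume needed extending.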

Please note that the problem first occurred with the rpms above, so I tried to reproduce it with the kvm test rpms (*83-164.el5_5.12.qcowtest1.x86_64.rpm) and it DIDN'T REPRODUCE. However, when I reverted to the original latest z-stream rpms, it DIDN'T reproduce again either.

I hope this information is useful enough for further analysis and a possible fix.

Comment 1 Kevin Wolf 2010-06-29 12:10:57 UTC
(In reply to comment #0)
> please note that Dor saw this behaviour along with Dan and identified it as
> qemu do not try to write to an image of qcow2 after many extends - qemu is not
> trying the actual write and something is wrong with the meta data
> in ram, also, while debugging they noticed that failures in cluser_alloc check
> prevented qemu of even trying to write to the image.

Can someone (Dan?) elaborate on this? I'm not even sure what you mean by this "cluster_alloc check".

Also please note that there's a whole bunch of patches waiting for inclusion which may fix EIO cases or in-memory metadata corruptions. We'll need to try reproducing it once these fixes are in.

Comment 2 Haim 2010-06-29 13:29:54 UTC
(In reply to comment #1)
> (In reply to comment #0)
> > please note that Dor saw this behaviour along with Dan and identified it as
> > qemu do not try to write to an image of qcow2 after many extends - qemu is not
> > trying the actual write and something is wrong with the meta data
> > in ram, also, while debugging they noticed that failures in cluser_alloc check
> > prevented qemu of even trying to write to the image.
> 
> Can someone (Dan?) detail on this? I'm not even sure what you mean by this
> "cluster_alloc check".
> 
> Also please note that there's a whole bunch of patches waiting for inclusion
> which may fix EIO cases or in-memory metadata corruptions. We'll need to try
> reproducing it once these fixes are in.    
  [hateya] I tested the above with your rpms (the special QE build with the
   patches that deal with EIO cases, *83-164.el5_5.12.qcowtest1.x86_64.rpm)
   and was not able to reproduce. However, when I reverted to the original
   rpms, the problem did not reproduce either, so this does not tell us much
   anyway.

Dor - can you please elaborate on your insights from when you first saw the problem?

Comment 5 Dan Kenigsberg 2010-11-24 19:10:01 UTC
Dan does not... Maybe Dor?

Comment 7 RHEL Program Management 2011-01-11 20:27:22 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated in the
current release, Red Hat is unfortunately unable to address this
request at this time. Red Hat invites you to ask your support
representative to propose this request, if appropriate and relevant,
in the next release of Red Hat Enterprise Linux.

Comment 8 RHEL Program Management 2011-01-11 22:52:34 UTC
This request was erroneously denied for the current release of
Red Hat Enterprise Linux.  The error has been fixed and this
request has been re-proposed for the current release.

Comment 9 Kevin Wolf 2011-01-14 16:55:45 UTC
(In reply to comment #5)
> Dan does not... Maybe Dor?

Dor: Ping?

Comment 10 Ayal Baron 2011-01-15 13:44:45 UTC
(In reply to comment #2)
> (In reply to comment #1)
> > (In reply to comment #0)
> > > please note that Dor saw this behaviour along with Dan and identified it as
> > > qemu do not try to write to an image of qcow2 after many extends - qemu is not
> > > trying the actual write and something is wrong with the meta data
> > > in ram, also, while debugging they noticed that failures in cluser_alloc check
> > > prevented qemu of even trying to write to the image.
> > 
> > Can someone (Dan?) detail on this? I'm not even sure what you mean by this
> > "cluster_alloc check".
> > 
> > Also please note that there's a whole bunch of patches waiting for inclusion
> > which may fix EIO cases or in-memory metadata corruptions. We'll need to try
> > reproducing it once these fixes are in.    
>   [hateya] I tested the above with your rpms (with the patches that deals 
>    with EIO cases, special version to QE,
> *83-164.el5_5.12.qcowtest1.x86_64.rpm) 
>    and I was not able to reproduce, nevertheless, when I reverted 
>    to the original rpms, the problem didn't reproduce again, so, it doesn't say 
>    much anyway, 
> 
> Dor - can you please elaborate on your insights when your first saw the
> problem.

Dor, Haim needs the above info from you, thanks.

Comment 15 Red Hat Bugzilla 2023-09-14 01:21:35 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

