609022 – [kvm] qemu doesn't try to write to an image of qcow2 after many lvextends


Summary: [kvm] qemu doesn't try to write to an image of qcow2 after many lvextends

Keywords:
Status:	CLOSED INSUFFICIENT_DATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kvm
Sub Component:
Version:	5.5.z
Hardware:	All
OS:	Linux
Priority:	low
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Kevin Wolf
QA Contact:	Virtualization Bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	Rhel5KvmTier2

Reported:	2010-06-29 08:40 UTC by Haim
Modified:	2023-09-14 01:21 UTC
CC List:	11 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2011-01-27 11:15:20 UTC
Target Upstream Version:
Embargoed:
Dependent Products:


Description Haim 2010-06-29 08:40:09 UTC

Description of problem:

While starting to test the lvextend flow (basic flow), we hit a qemu bug: instead of ENOSPC, we got EIO errors. As a result, the VM went into a paused state, and we were unable to resume (continue) it at all.

This is not an image corruption case: when I killed the VM (the qemu process), I was able to run it again, but it failed again in the same scenario (I managed to reproduce it on several VMs).

Please note that Dor saw this behaviour together with Dan and identified it as qemu not trying to write to a qcow2 image after many extends: qemu does not attempt the actual write, and something is wrong with the metadata in RAM. Also, while debugging, they noticed that failures in the cluster_alloc check prevented qemu from even trying to write to the image.

Please note that in this situation we hit a segmentation fault when running qemu-img check on that volume.

It is important to mention that this issue is not easy to reproduce and is somewhat elusive, but when it occurs, the behaviour is very bad.

Environment details:

Package versions:

vdsm22-4.5-62.5.el5_5rhev2_2
kvm-83-164.el5_5.12
2.6.18-194.3.1.el5

Repro steps:

1) Make sure you have 2 running hosts connected to an iSCSI pool.
2) Create 4 VMs (running RHEL 5.5) with thinly provisioned qcow2 disks; make
   sure some use virtio and some use IDE, and run them (it does not matter
   which host they run on).
3) Start 'dd' to the local file system (dd if=/dev/zero of=/tmp/dd1 bs=1M).
4) Try to migrate the VMs while lvextend is in progress for their disks.

Please note that it first occurred with the above RPMs. I then tried to reproduce it with the kvm test RPMs (*83-164.el5_5.12.qcowtest1.x86_64.rpm) and it DIDN'T reproduce; however, when I reverted to the original latest z-stream RPMs, it DIDN'T reproduce either.

I hope this information is useful enough for further analysis and a possible fix.

Comment 1 Kevin Wolf 2010-06-29 12:10:57 UTC

(In reply to comment #0)
> please note that Dor saw this behaviour along with Dan and identified it as
> qemu do not try to write to an image of qcow2 after many extends - qemu is not
> trying the actual write and something is wrong with the meta data
> in ram, also, while debugging they noticed that failures in cluser_alloc check
> prevented qemu of even trying to write to the image.

Can someone (Dan?) elaborate on this? I'm not even sure what you mean by this "cluster_alloc check".

Also please note that there's a whole bunch of patches waiting for inclusion which may fix EIO cases or in-memory metadata corruptions. We'll need to try reproducing it once these fixes are in.

Comment 2 Haim 2010-06-29 13:29:54 UTC

(In reply to comment #1)
> (In reply to comment #0)
> > please note that Dor saw this behaviour along with Dan and identified it as
> > qemu do not try to write to an image of qcow2 after many extends - qemu is not
> > trying the actual write and something is wrong with the meta data
> > in ram, also, while debugging they noticed that failures in cluser_alloc check
> > prevented qemu of even trying to write to the image.
> 
> Can someone (Dan?) detail on this? I'm not even sure what you mean by this
> "cluster_alloc check".
> 
> Also please note that there's a whole bunch of patches waiting for inclusion
> which may fix EIO cases or in-memory metadata corruptions. We'll need to try
> reproducing it once these fixes are in.    
  [hateya] I tested the above with your RPMs (the special QE build with the
   patches that deal with EIO cases, *83-164.el5_5.12.qcowtest1.x86_64.rpm)
   and was not able to reproduce it. However, when I reverted to the original
   RPMs, the problem did not reproduce either, so this doesn't tell us much.

Dor, can you please elaborate on your insights from when you first saw the problem?

Comment 5 Dan Kenigsberg 2010-11-24 19:10:01 UTC

Dan does not... Maybe Dor?

Comment 7 RHEL Program Management 2011-01-11 20:27:22 UTC

This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated in the
current release, Red Hat is unfortunately unable to address this
request at this time. Red Hat invites you to ask your support
representative to propose this request, if appropriate and relevant,
in the next release of Red Hat Enterprise Linux.

Comment 8 RHEL Program Management 2011-01-11 22:52:34 UTC

This request was erroneously denied for the current release of
Red Hat Enterprise Linux.  The error has been fixed and this
request has been re-proposed for the current release.

Comment 9 Kevin Wolf 2011-01-14 16:55:45 UTC

(In reply to comment #5)
> Dan does not... Maybe Dor?

Dor: Ping?

Comment 10 Ayal Baron 2011-01-15 13:44:45 UTC

(In reply to comment #2)
> 
> Dor - can you please elaborate on your insights when your first saw the
> problem.

Dor, Haim needs the above info from you, thanks.

Comment 15 Red Hat Bugzilla 2023-09-14 01:21:35 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days
