Bug 616495 - KVM guest uses 100% cpu when LVM snapshot reaches 100% usage, then cannot re-activate LVM snapshot after lvextend
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: lvm2
Version: 5.5
Hardware: All Linux
Priority: low, Severity: medium
Target Milestone: rc
Assigned To: LVM and device-mapper development team
QA Contact: Corey Marthaler
Reported: 2010-07-20 11:29 EDT by Marc Sauton
Modified: 2010-07-21 12:32 EDT
Last Closed: 2010-07-21 06:19:41 EDT
Description Marc Sauton 2010-07-20 11:29:36 EDT
Description of problem:

KVM guest uses 100% cpu when LVM snapshot reaches 100% usage, then cannot re-activate LVM snapshot after lvextend
(not sure what component to select for this report)


Version-Release number of selected component (if applicable):

Red Hat Enterprise Linux Server release 5.5 (Tikanga)
Linux dirsec2-seg.lab.sjc.redhat.com 2.6.18-194.el5 #1 SMP Tue Mar 16 21:52:39 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
device-mapper-1.02.39-1.el5
lvm2-2.02.56-8.el5
kvm-qemu-img-83-164.el5
libvirt-0.6.3-33.el5 


How reproducible:
Happened once. I still cannot recover the LVM snapshot, and I haven't spent the time to try to reproduce the issue.


Steps to Reproduce:
1. Have a RHEL 5.5 x86_64 KVM host.
2. Have LVM and an LVM snapshot that is too small for the KVM guest image.
3. Have a KVM guest running RHEL 5 x86_64.
4. The KVM guest performs some task that fills the LVM snapshot to its maximum.

  
Actual results:
- lost control of KVM guest when LVM snapshot reached 100% usage
- the cpu allocated to the KVM guest was running at 100%
- forced shutdown of the KVM guest (could not virsh reboot)
- cannot re-activate LVM snapshot after lvextend usage


Expected results:


Additional info:

after resizing the snapshot with lvextend from 1G to 4G, could not re-activate the lvm snapshot:
lvchange  -a y /dev/VolGroup00/cs80el5x8664ms2cs8dash2DevelSnap1
  /dev/VolGroup00/cs80el5x8664ms2cs8dash2DevelSnap1: read failed after 0 of 4096 at 0: Input/output error
  Can't change snapshot logical volume "cs80el5x8664ms2cs8dash2DevelSnap1"

Both lvs and lvdisplay show the new size of 4G (the old value was 1G), but the snapshot is still reported as 100% full, which I did not expect after the extend:

lvs
  /dev/VolGroup00/cs80el5x8664ms2cs8dash2DevelSnap1: read failed after 0 of 4096 at 0: Input/output error
  LV                                                                      VG         Attr   LSize   Origin                                                  Snap%  Move Log Copy%  Convert
...
  cs80el5x8664ms2cs8dash2DevelSnap1                                       VolGroup00 Swi-Io   4.00G cs80el5x8664ms2cs8dash2modnssOcspHttpRHCS80devMaster    100.00


lvdisplay /dev/VolGroup00/cs80el5x8664ms2cs8dash2DevelSnap1
  /dev/VolGroup00/cs80el5x8664ms2cs8dash2DevelSnap1: read failed after 0 of 4096 at 0: Input/output error
  --- Logical volume ---
  LV Name                /dev/VolGroup00/cs80el5x8664ms2cs8dash2DevelSnap1
  VG Name                VolGroup00
  LV UUID                xEwr3d-XIFS-Ox8W-6LBe-eHcg-siJl-yHmhZ2
  LV Write Access        read/write
  LV snapshot status     INACTIVE destination for /dev/VolGroup00/cs80el5x8664ms2cs8dash2modnssOcspHttpRHCS80devMaster
  LV Status              available
  # open                 0
  LV Size                6.00 GB
  Current LE             192
  COW-table size         4.00 GB
  COW-table LE           128
  Snapshot chunk size    4.00 KB
  Segments               2
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:46

I am not sure how I can re-activate this snapshot.
I could not find much relevant information in the KB, on the intranet, or through a Google search.

Tried system-config-lvm out of curiosity in a VNC session, but got an exception:

[root@dirsec2-seg ~]# system-config-lvm
/usr/share/system-config-lvm/cylinder_items.py:1032: GtkWarning: gdk_pixbuf_scale_simple: assertion `dest_width > 0' failed
  scaled_pixbuf = self.pixbuf.scale_simple(pixmap_width, height, gtk.gdk.INTERP_BILINEAR)
Traceback (most recent call last):
  File "/usr/share/system-config-lvm/Volume_Tab_View.py", line 454, in on_tree_selection_changed
    self.on_best_fit(None)
  File "/usr/share/system-config-lvm/Volume_Tab_View.py", line 536, in on_best_fit
    self.display_view.draw()
  File "/usr/share/system-config-lvm/renderer.py", line 591, in draw
    self.display.draw(self.da, self.gc, (10, y_offset))
  File "/usr/share/system-config-lvm/cylinder_items.py", line 920, in draw
    self.cyl_upper.draw(pixmap, gc, (x, y))
  File "/usr/share/system-config-lvm/cylinder_items.py", line 305, in draw
    CylinderItem.draw(self, dc, gc, (x, y))
  File "/usr/share/system-config-lvm/cylinder_items.py", line 120, in draw
    child.draw(dc, gc, (x, y))
  File "/usr/share/system-config-lvm/cylinder_items.py", line 305, in draw
    CylinderItem.draw(self, dc, gc, (x, y))
  File "/usr/share/system-config-lvm/cylinder_items.py", line 120, in draw
    child.draw(dc, gc, (x, y))
  File "/usr/share/system-config-lvm/cylinder_items.py", line 311, in draw
    cyl_pix = self.cyl_gen.get_cyl(dc, self.get_width(), self.height)
  File "/usr/share/system-config-lvm/cylinder_items.py", line 1039, in get_cyl
    pixmap.draw_pixbuf(gc, scaled_pixbuf, 0, 0, 0, 0, -1, -1)
TypeError: GdkDrawable.draw_pixbuf() argument 2 must be gtk.gdk.Pixbuf, not None
The program 'system-config-lvm' received an X Window System error.
This probably reflects a bug in the program.
The error was 'BadAlloc (insufficient resources for operation)'.
  (Details: serial 26941 error_code 11 request_code 53 minor_code 0)
  (Note to programmers: normally, X errors are reported asynchronously;
   that is, you will receive the error a while after causing it.
   To debug your program, run it with the --sync command line
   option to change this behavior. You can then get a meaningful
   backtrace from your debugger if you break on the gdk_x_error() function.)
[root@dirsec2-seg ~]#
Comment 1 Daniel Berrange 2010-07-21 06:05:50 EDT
I'm pretty sure that if a snapshot or VG containing a snapshot reaches 100% utilization, it is effectively corrupt, because it can no longer store deltas for the ongoing changes on the master volume. This isn't a virt problem, so re-assigning to LVM due to the error messages.
Comment 2 Milan Broz 2010-07-21 06:19:41 EDT
If a snapshot reaches 100% it gets invalidated and all I/O to it returns an I/O error
(any 100% CPU usage is then just a consequence of the I/O errors).
You can only remove an invalidated snapshot; no other action is allowed.

This doesn't influence origin volume - you can still use it.

You have to extend the snapshot before it gets invalidated; afterwards it is impossible (some delta data are already lost).
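
The rule above can be sketched as a small decision function. This is only an illustration, not part of LVM: the name `snapshot_action` and the 80% threshold are assumptions, and the check on the fifth Attr character (`I` or `S` marking an invalid snapshot) reflects my reading of the lvs man page rather than anything stated in this bug.

```python
def snapshot_action(lv_attr, snap_percent, extend_threshold=80.0):
    """Suggest what to do with a snapshot, given two `lvs` columns.

    lv_attr is the Attr column (for example "swi-ao") and snap_percent
    is the Snap% column as a float. extend_threshold is a hypothetical
    policy value, not an LVM default.
    """
    # Assumption: the fifth attr character is the state field, where
    # "I" means invalid snapshot and "S" means invalid suspended snapshot.
    state = lv_attr[4] if len(lv_attr) > 4 else "-"
    if state in ("I", "S") or snap_percent >= 100.0:
        return "remove"  # invalidated: deltas are lost, only removal is left
    if snap_percent >= extend_threshold:
        return "extend"  # still valid: extend now, before it fills up
    return "ok"


# Attr and Snap% as shown for the snapshot in the lvs output of this report.
print(snapshot_action("Swi-Io", 100.00))  # remove
print(snapshot_action("swi-ao", 85.0))    # extend
print(snapshot_action("swi-ao", 10.0))    # ok
```

In practice "extend" would map to an lvextend on the snapshot and "remove" to an lvremove; the origin volume is untouched in both cases.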
Comment 3 Marc Sauton 2010-07-21 12:32:10 EDT
I "missed" the warnings in the KVM host's system log about the growing snapshot usage of the KVM guest.
I also did not know that an invalid snapshot can "only" be removed (I failed to find any documentation on this before opening this bz).
We may want to document this for either KVM/virt or LVM, because the error returned by lvchange is rather generic, and ending up with a bad KVM guest is not a good situation.
Should a KVM guest shut down before "corrupting" a file system (with a snapshot)?
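
Since the system log warnings are easy to miss, one option is to poll snapshot usage directly. Below is a minimal sketch, assuming the output of `lvs --noheadings --separator '|' -o vg_name,lv_name,origin,snap_percent` is fed to it. The function names, the 80% threshold, and the sample text are all hypothetical; the sample is hand-written for illustration and is not real output from the host in this report.

```python
def parse_snapshots(lvs_output):
    """Return (vg, lv, percent) for each snapshot line in the lvs output.

    Expects four '|'-separated fields per line: vg_name, lv_name,
    origin, snap_percent. Lines without an origin or a usage value
    (plain volumes) are skipped.
    """
    snapshots = []
    for line in lvs_output.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) != 4:
            continue
        vg, lv, origin, percent = fields
        if not origin or not percent:
            continue
        try:
            snapshots.append((vg, lv, float(percent)))
        except ValueError:
            continue  # unexpected number format, ignore the line
    return snapshots


def nearly_full(snapshots, warn_at=80.0):
    """Keep only the snapshots at or above the (hypothetical) threshold."""
    return [snap for snap in snapshots if snap[2] >= warn_at]


# Hand-written sample in the assumed format, not real output.
SAMPLE = """
  VolGroup00|guest_snap|guest_disk|85.30
  VolGroup00|guest_disk||
  VolGroup00|other_snap|other_disk|12.00
"""

for vg, lv, pct in nearly_full(parse_snapshots(SAMPLE)):
    print(f"WARNING: {vg}/{lv} is {pct:.1f}% full, extend it before it is invalidated")
```

Run from cron or a monitoring agent, a check like this gives a chance to extend the snapshot while it is still valid.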
