| Summary: | Self-healing causes KVM VMs to hard reboot and causes IO bottlenecks | | |
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | jblawn |
| Component: | replicate | Assignee: | Pranith Kumar K <pkarampu> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3.3-beta | CC: | gluster-bugs, jdarcy |
| Target Milestone: | --- | Target Release: | --- |
| Hardware: | x86_64 | OS: | Linux |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | --- |
| Regression: | --- | Mount Type: | fuse |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Attachments: | | | |
Created attachment 646
The self-healing process in 3.3beta no longer causes the VMs to become completely unresponsive due to the entire VM container (qemu image) being locked. However, when self-healing kicks off I now receive the following error from qemu-kvm, which causes the VMs to hard reboot:
qemu-kvm: virtio_ioport_write: unexpected address 0x13 value 0x1
Also, the IO is terrible: the healing process consumes nearly 100% of the disk IO on both servers. I have attempted to reduce the healing IO by setting the following:
cluster.data-self-heal-algorithm = diff
cluster.self-heal-window-size = 4 and 8 (reduced from 16)
There was very little difference.
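For reference, a quick sketch (not part of the original report) of how these options would typically be applied with the standard gluster CLI; the volume name test-vm is taken from the mount output further down and may differ in other setups:
# Sketch: applying the self-heal tuning options above via the gluster CLI.
# "test-vm" is the volume name from the mount output below; substitute your own.
gluster volume set test-vm cluster.data-self-heal-algorithm diff
gluster volume set test-vm cluster.self-heal-window-size 4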
System config:
ArchLinux
Linux kernel 3.0.3
qemu-kvm 0.15.0-2
libvirt 0.9.4-2
gluster 3.3beta2
VMs are Windows Server 2008 with the following disk configuration:
<disk type='file' device='disk'>
<driver name='qemu' type='raw' cache='writeback'/>
<source file='/gluster/WinTest2.img'/>
<target dev='vda' bus='virtio'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</disk>
Mount options for gluster client and VM filesystem:
/dev/mapper/testvg-vm on /vm type ext4 (rw,noatime,user_xattr)
test1:/test-vm on /gluster type fuse.glusterfs (rw,allow_other,default_permissions,max_read=131072)
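For completeness, a sketch of the mount commands behind the output above (device, volume, and mount points are taken from this report; options may need adjusting for other environments):
mount -o noatime,user_xattr /dev/mapper/testvg-vm /vm
mount -t glusterfs test1:/test-vm /gluster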
Please let me know if more information is needed, or if additional testing with debugging enabled would help.
hi Jeremy,
Could you please set self-heal-window-size to 1, set diagnostics.client-log-level and diagnostics.brick-log-level to DEBUG, run the same test again, and provide all the log files from the client and brick machines along with the dmesg output on the client machine.
Pranith

Attached the failed node's dmesg; nothing was logged to dmesg on the active node. This is a two-node cluster in which each node acts as both brick and client, self-mounting on each for HA, live migration, etc.

With the self-heal change the IO was still high, but this time neither VM crashed and no iowrite errors were logged. Performance was still definitely impacted; is there any way to throttle this down further?

The test started at local time Sep 2 00:48. The log files are too large to attach, so here are two URLs:
http://www.cctt.org/gluster-active-node.tar.gz
http://www.cctt.org/gluster-failed-node.tar.gz

Thanks!

(In reply to comment #3)
hi Jeremy,
Good to know that self-heal did not crash the VMs with a window size of 1; I made this the default in the master branch this morning. Self-heal has to sync changes from the source node to the stale node to keep them consistent, which is the reason for the I/O you are observing. We are still working on ways to make self-heal run without affecting regular traffic, i.e. self-heal operations will happen at very low priority.
I went through the logs and did not see anything out of the ordinary; you are the first to run into severe problems with the default configuration. Would it be possible to test the new change in your environment? That will tell us whether it needs any further improvement. I can provide a build with these changes.
Pranith

Pranith,
Yes, I can test again. I can pull the latest git and build from that.
Are there any other tunables we should look at for storing virtual machine images? Is 'cluster.data-self-heal-algorithm = diff' the preferred setting for this setup?
Thanks,
Jeremy

(In reply to comment #5)
hi Jeremy,
I have sent the build to you by email. Please let us know your findings.
Pranith.

IO performance returned to acceptable levels after setting self-heal-window-size=1, which is the default in the latest git build. There is still an issue when using virtio drivers with Windows guests during self-heal: the VM may time out and require a hard reboot. This does not happen when using a virtual IDE controller.

CHANGE: http://review.gluster.com/335 (Change-Id: Ib6730a708f008054fbd379889a0f6dd3b051b6ad) merged in master by Anand Avati (avati)

CHANGE: http://review.gluster.com/336 (Change-Id: Id8a1dffa3c3200234ad154d1749278a2d7c7021b) merged in master by Anand Avati (avati)
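For anyone hitting the virtio timeout described above, a minimal sketch of the IDE-based disk stanza the reporter's workaround implies (the source file is the one from this report; the target dev name is illustrative and libvirt will auto-assign a drive address):
<disk type='file' device='disk'>
  <driver name='qemu' type='raw' cache='writeback'/>
  <source file='/gluster/WinTest2.img'/>
  <target dev='hda' bus='ide'/>
</disk>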