Bug 1057645
Summary: | ownership of diskimage changes during livemigration, livemigration with kvm/libvirt fails | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | bernhard.glomm | ||||||||
Component: | libvirt | Assignee: | Libvirt Maintainers <libvirt-maint> | ||||||||
Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||||||
Severity: | urgent | Docs Contact: | |||||||||
Priority: | unspecified | ||||||||||
Version: | 20 | CC: | bernhard.glomm, berrange, bloch, boven, charlesalva, clalancette, crobinso, dyuan, gluster-bugs, gsun, itamar, jforbes, laine, libvirt-maint, pkarampu, sasundar, shyu, vbellur, veillard, virt-maint | ||||||||
Target Milestone: | --- | ||||||||||
Target Release: | --- | ||||||||||
Hardware: | Unspecified | ||||||||||
OS: | Linux | ||||||||||
Whiteboard: | |||||||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||||||
Doc Text: | Story Points: | --- | |||||||||
Clone Of: | |||||||||||
: | 1058032 | Environment: | |||||||||
Last Closed: | 2015-05-31 18:48:45 UTC | Type: | Bug | ||||||||
Regression: | --- | Mount Type: | --- | ||||||||
Documentation: | --- | CRM: | |||||||||
Verified Versions: | Category: | --- | |||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||
Embargoed: | |||||||||||
Bug Depends On: | |||||||||||
Bug Blocks: | 1058032, 1286213 | ||||||||||
Attachments: |
Description
bernhard.glomm
2014-01-24 14:35:27 UTC
At exactly 15:00 UTC I started the migration. Inside the VM I had this loop running:

for i in `seq 1 100`; do echo `date` >> /tmp/test.file; sleep 1; done

After 7 seconds it returned with:

echo: write error: Read-only file system

I checked all logfiles on both machines but found only this in srv-vms-mnt_atom01.log (the brick log) on the receiving side:

[2014-01-24 15:00:07.848387] W [client-rpc-fops.c:471:client3_3_open_cbk] 0-glfs_atom01-client-1: remote operation failed: Permission denied. Path: /atom01.img (74885dd0-6ff5-4ded-b5b4-d2f477e5bd6d)
[2014-01-24 15:00:07.848549] W [client-rpc-fops.c:471:client3_3_open_cbk] 0-glfs_atom01-client-0: remote operation failed: Permission denied. Path: /atom01.img (74885dd0-6ff5-4ded-b5b4-d2f477e5bd6d)
[2014-01-24 15:00:07.848590] W [fuse-bridge.c:2167:fuse_writev_cbk] 0-glusterfs-fuse: 341318: WRITE => -1 (Permission denied)
[2014-01-24 15:00:07.849288] W [fuse-bridge.c:2167:fuse_writev_cbk] 0-glusterfs-fuse: 341320: WRITE => -1 (Permission denied)
[2014-01-24 15:00:07.849535] W [fuse-bridge.c:2167:fuse_writev_cbk] 0-glusterfs-fuse: 341322: WRITE => -1 (Permission denied)
[2014-01-24 15:00:12.719313] W [fuse-bridge.c:2167:fuse_writev_cbk] 0-glusterfs-fuse: 341324: WRITE => -1 (Permission denied)
[2014-01-24 15:00:12.719530] W [fuse-bridge.c:2167:fuse_writev_cbk] 0-glusterfs-fuse: 341326: WRITE => -1 (Permission denied)
[2014-01-24 15:00:12.719866] W [fuse-bridge.c:2167:fuse_writev_cbk] 0-glusterfs-fuse: 341328: WRITE => -1 (Permission denied)
[2014-01-24 15:00:12.720111] W [fuse-bridge.c:2167:fuse_writev_cbk] 0-glusterfs-fuse: 341330: WRITE => -1 (Permission denied)
[2014-01-24 15:00:12.720360] W [fuse-bridge.c:2167:fuse_writev_cbk] 0-glusterfs-fuse: 341332: WRITE => -1 (Permission denied)

libvirt logs (/var/log/libvirt/qemu/atom01.log) on the sending side:

2014-01-24 14:53:33.052+0000: starting up
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/kvm-spice -name atom01 -S -M pc-i440fx-1.4 -m 1024 -smp 1,sockets=1,cores=1,threads=1 -uuid a20b4267-22b8-fc91-a4ea-938a5ad4a889 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/atom01.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-reboot -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/srv/vms/mnt_atom01/atom01.img,if=none,id=drive-virtio-disk0,format=raw,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -netdev tap,fd=27,id=hostnet0,vhost=on,vhostfd=28 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:fa:ce:31,bus=pci.0,addr=0x3,bootindex=1 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:1,password -vga std -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
W: kvm binary is deprecated, please use qemu-system-x86_64 instead
char device redirected to /dev/pts/2 (label charserial0)
2014-01-24 15:00:03.208+0000: shutting down
qemu: terminating on signal 15 from pid 2170
----------------------
libvirt logs on the receiving side:

2014-01-24 15:00:00.986+0000: starting up
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/kvm-spice -name atom01 -S -M pc-i440fx-1.4 -m 1024 -smp 1,sockets=1,cores=1,threads=1 -uuid a20b4267-22b8-fc91-a4ea-938a5ad4a889 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/atom01.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-reboot -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/srv/vms/mnt_atom01/atom01.img,if=none,id=drive-virtio-disk0,format=raw,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -netdev tap,fd=23,id=hostnet0,vhost=on,vhostfd=24 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:fa:ce:31,bus=pci.0,addr=0x3,bootindex=1 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:0,password -vga std -incoming tcp:0.0.0.0:49159 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
W: kvm binary is deprecated, please use qemu-system-x86_64 instead
char device redirected to /dev/pts/4 (label charserial0)

Would it be possible to attach a tgz of the glusterfs log directories from both nodes to this bug report?

Created attachment 856093
gluster logs during a migration
Attached are log files at DEBUG level, captured during a migration of the guest 'kvmhost'. The migration itself starts at 14:00:00 and succeeds, but the migrated guest then cannot access its image on the destination server. The setup is Ubuntu 13.04 with Gluster 3.4.1 from the Ubuntu PPA (semiosis). Also included is /var/lib/libvirt/qemu/kvmtest.log from the destination server.
Created attachment 856149
gluster logs of migration host1
virsh migrate --verbose --live --p2p --domain atom01 --desturi qemu+ssh://192.168.242.93/system
The disk image of atom01 resides on the Gluster volume "glfs_atom01":
root@ping[/0]:~ # gluster volume info glfs_atom01
Volume Name: glfs_atom01
Type: Replicate
Volume ID: f28f0f62-37b3-4b10-8e86-9b373f4c0e75
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 172.24.1.11:/ecopool/fs_atom01
Brick2: 172.24.1.13:/ecopool/fs_atom01
Options Reconfigured:
network.remote-dio: enable
storage.owner-uid: 107
storage.owner-gid: 104
diagnostics.client-log-level: DEBUG
root@ping[/0]:~ # id libvirt-qemu
uid=107(libvirt-qemu) gid=104(kvm) groups=104(kvm)
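The volume options storage.owner-uid: 107 and storage.owner-gid: 104 correspond to libvirt-qemu:kvm, as the `id` output shows. When chasing the ownership change described in this bug, it can help to check on each host whether the image is still owned by those IDs before and after a migration. The following is a minimal Python sketch under that assumption; the function names `image_owner_matches` and `describe_owner` are hypothetical helpers, not part of libvirt or Gluster, and the path and IDs are simply the ones from this report.

```python
import os


def image_owner_matches(path: str, uid: int, gid: int) -> bool:
    """Return True if the file at `path` is owned by uid:gid."""
    st = os.stat(path)
    return st.st_uid == uid and st.st_gid == gid


def describe_owner(path: str) -> str:
    """Return the numeric owner of `path` as 'uid:gid', e.g. '107:104'."""
    st = os.stat(path)
    return f"{st.st_uid}:{st.st_gid}"


if __name__ == "__main__":
    # Hypothetical example using the image path and IDs from this report.
    image = "/srv/vms/mnt_atom01/atom01.img"
    if os.path.exists(image):
        ok = image_owner_matches(image, 107, 104)
        status = "OK" if ok else "MISMATCH"
        print(f"{image}: owner {describe_owner(image)}, expected 107:104 -> {status}")
```

Running it on both hosts right after a migration would show whether either side has rewritten the owner (for example to 0:0, i.e. root:root).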
Created attachment 856150
gluster logs of migration host2
logs of the receiving host
The libvirt wiki states that during a migration, libvirt will change the ownership of the guest image unless it detects that the image is on a shared filesystem. Looking at the libvirt code, it has checks to detect NFS, GFS2 and SMB/CIFS, but not Gluster. Because libvirt does not detect that the storage is on a shared filesystem, the originating host performs a chown back to root:root at the end of a successful migration, while the destination host does a chown to libvirt-qemu:kvm. This is in fact a race condition, so the difference in behaviour between 3.4.0 and 3.4.1 could come down to timing differences.

http://wiki.libvirt.org/page/Migration_fails_because_disk_image_cannot_be_found

Workaround:
* Stop your guests.
* Stop libvirt-bin.
* Edit /etc/libvirt/qemu.conf: it contains a commented-out entry 'dynamic_ownership=1', which is the default. Change this to 0 and remove the comment.
* Do a chown to libvirt-qemu:kvm for all your stopped images.
* Start the libvirt-bin service again.
* Bring up the guests.
* Repeat on the other half of your cluster.
* Test a live migration; for me, they work again.

You now have to take care of setting the ownership of a guest image correctly yourself (presumably only once, when you create it).

Other possible solutions:
* JoeJulian suggested using libgfapi, which gives libvirt direct access to the volume without going through the filesystem. This is the preferred setup for libvirt+gluster and should also result in better I/O performance. I haven't tested this yet, but it's high on my to-do list.
* Submit a patch to libvirt so it can detect that the filesystem is Gluster. statfs() will only show "FUSE", but we could then use getxattr to check whether a Gluster-specific attribute is set (suggested by kkeithley), for example trusted.glusterfs.volume-id.

I can confirm the workaround above. It would be nice if libvirt could be patched as described so we won't have to worry about this problem in the future anymore.
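The detection proposed above comes down to two questions: is the path on a FUSE filesystem, and if so, is it a Gluster one. The following Python sketch illustrates that idea; it is not libvirt's code, and every name in it is hypothetical. One substitution is deliberate: Python's `os.statvfs()` does not expose the statfs() `f_type` field, so instead of comparing against the FUSE magic number the sketch looks up the path's mount point in `/proc/self/mounts`, where the Gluster FUSE client normally appears with type `fuse.glusterfs`. For other FUSE types it falls back to probing `trusted.glusterfs.volume-id`, the attribute suggested in the comment; whether a client mount exposes that attribute is an assumption, and reading `trusted.*` attributes normally requires CAP_SYS_ADMIN.

```python
import os
import re
from typing import Iterable, List, Optional, Tuple

# Type string the Gluster FUSE client shows in /proc/mounts.
GLUSTER_FUSE_TYPE = "fuse.glusterfs"
# Attribute suggested in the bug comment; whether a client mount exposes it
# (and whether the caller may read trusted.* attributes) is an assumption.
GLUSTER_XATTR = "trusted.glusterfs.volume-id"


def _unescape(field: str) -> str:
    """Decode octal escapes such as \\040 (space) used in /proc/mounts."""
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def parse_mounts(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (mount_point, fs_type) pairs from /proc/mounts-style lines."""
    mounts = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 3:
            mounts.append((_unescape(fields[1]), fields[2]))
    return mounts


def fs_type_of(path: str, mounts: List[Tuple[str, str]]) -> Optional[str]:
    """Return the type of the longest mount point containing absolute `path`.

    Later entries win on ties, since a later mount on the same directory
    shadows an earlier one.
    """
    best_len, best_type = -1, None
    for mnt, fstype in mounts:
        prefix = mnt.rstrip("/") + "/"
        if (path == mnt or path.startswith(prefix)) and len(mnt) >= best_len:
            best_len, best_type = len(mnt), fstype
    return best_type


def has_gluster_xattr(path: str) -> bool:
    """True if `path` carries GLUSTER_XATTR; False if absent or unreadable."""
    try:
        os.getxattr(path, GLUSTER_XATTR)
        return True
    except OSError:
        return False


def is_gluster_fs(path: str, mounts_file: str = "/proc/self/mounts") -> bool:
    """Heuristic check that `path` lives on a Gluster FUSE mount."""
    with open(mounts_file) as f:
        mounts = parse_mounts(f)
    fstype = fs_type_of(os.path.realpath(path), mounts)
    if fstype == GLUSTER_FUSE_TYPE:
        return True
    # Some other FUSE type: fall back to the xattr probe from the comment.
    if fstype is not None and fstype.startswith("fuse"):
        return has_gluster_xattr(path)
    return False
```

On the hosts in this report, `is_gluster_fs("/srv/vms/mnt_atom01/atom01.img")` would presumably return True, since that path is the Gluster FUSE mount the guest image lives on. The mount-table lookup is cheap and needs no privileges, which is why it runs first; the xattr probe only matters for FUSE mounts whose type string does not already identify Gluster.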
Based on comment https://bugzilla.redhat.com/show_bug.cgi?id=1057645#c7, assigning the bug to libvirt.

This message is a reminder that Fedora 20 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 20. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '20'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 20 reaches end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

libvirt migration has had a lot of bug fixes since F20, so I'm assuming this is fixed. Closing as CURRENTRELEASE; please reopen if anyone can reproduce with a newer Fedora.