Bug 822052

Summary: migration will cause guest IO failure when DST sebool is virt_use_nfs=off
Product: Red Hat Enterprise Linux 7 Reporter: zhpeng
Component: libvirt Assignee: Eric Blake <eblake>
Status: CLOSED CURRENTRELEASE QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 7.0 CC: acathrow, cwei, dallan, dyuan, eblake, mjenner, mzhan, weizhan, zpeng
Target Milestone: rc   
Target Release: 7.1   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: libvirt-1.1.1-3.el7 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-06-13 13:26:00 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 547546, 1652078    
Bug Blocks:    
Attachments:
Description Flags
guest io failure screenshot none

Description zhpeng 2012-05-16 08:22:05 UTC
Description of problem:
migration will cause guest IO failure when DST sebool is virt_use_nfs=off

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-0.12.1.2-2.292.el6.x86_64
libvirt-0.9.10-20.el6.x86_64
kernel-2.6.32-270.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1, prepare an NFS export and mount it on hosts A and B as a shared NFS pool.
2, virt_use_nfs = off (B)
   virt_use_nfs = on  (A)
3, define and start a guest on A
4, migrate guest from A to B
# migrate --live aaa qemu+ssh://10.66.7.230/system
The authenticity of host '10.66.7.230 (10.66.7.230)' can't be established.
RSA key fingerprint is d2:76:01:77:2f:5b4:bf:8f:1a:a1:92:94:c3:e3:2e.
Are you sure you want to continue connecting (yes/no)? yes
root@10.66.7.230's password:
error: internal error Process exited while reading console log output: char device redirected to /dev/pts/1
qemu-kvm: -drive file=/var/lib/libvirt/images/rhel6u2.img,if=none,id=drive-virtio-disk0,format=raw,cache=none: could not open disk image /var/lib/libvirt/images/rhel6u2.img: Permission denied
5, check the guest's screen; after a while it shows I/O failure messages (see attachment).
  


Actual results:
As shown in the steps above: the migration fails, and the guest on the source then reports I/O failures.

Expected results:
IMO libvirt should check the migration environment first and then decide whether to migrate. If the destination host has the SELinux boolean virt_use_nfs=off, libvirt should refuse the migration. A failed migration is acceptable, but a failed guest is not; in a production environment this is very dangerous.
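
A minimal sketch of the pre-flight check requested above, assuming the destination's boolean state is available as the text printed by `getsebool virt_use_nfs` (lines of the form `virt_use_nfs --> off`). The function names are hypothetical and this is not something libvirt does; it only illustrates the decision.

```python
import re


def parse_getsebool(output):
    """Parse lines such as 'virt_use_nfs --> off' into {name: bool}."""
    booleans = {}
    for line in output.splitlines():
        match = re.match(r"^\s*(\S+)\s+-->\s+(on|off)\s*$", line)
        if match:
            booleans[match.group(1)] = match.group(2) == "on"
    return booleans


def destination_allows_nfs(getsebool_output):
    """Return True only if virt_use_nfs is reported as 'on'.

    A missing or unparsable boolean is treated as 'off', so the
    (hypothetical) caller would refuse the migration in that case.
    """
    return parse_getsebool(getsebool_output).get("virt_use_nfs", False)
```

With the setup from step 2, host B would report `virt_use_nfs --> off` and the check would return False before any migration is attempted.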

Additional info:

Comment 1 zhpeng 2012-05-16 08:22:39 UTC
Created attachment 584892 [details]
guest io failure screenshot

Comment 4 Jiri Denemark 2012-07-20 13:48:39 UTC
I was able to reproduce the issue. It is caused by dynamic ownership on
destination host. Once qemu fails to start (because SELinux denies access to
the disk image), libvirtd resets the image ownership to root:root and NFS
denies access to it even though the file is already open.

We have to finally fix dynamic ownership to restore the original owner instead
of just resetting it to root:root.
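
The difference between the two behaviors can be modelled roughly as follows. This is only an in-memory illustration: a plain dict stands in for the filesystem, and the class and method names are hypothetical, not libvirt's security driver API.

```python
from dataclasses import dataclass, field

ROOT = ("root", "root")


@dataclass
class OwnerTracker:
    """Toy model of dynamic ownership; 'owners' maps path -> (user, group)."""

    owners: dict
    saved: dict = field(default_factory=dict)

    def set_label(self, path, new_owner):
        # Remember the original owner only the first time the file is touched.
        self.saved.setdefault(path, self.owners[path])
        self.owners[path] = new_owner

    def restore_reset(self, path):
        # Behavior described above: always fall back to root:root.
        self.saved.pop(path, None)
        self.owners[path] = ROOT

    def restore_original(self, path):
        # Desired behavior: put back whatever owner was recorded earlier.
        self.owners[path] = self.saved.pop(path, self.owners[path])
```

In this model, an image that was qemu:qemu before a failed start ends up root:root after `restore_reset` and stays qemu:qemu after `restore_original`.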

Comment 5 Jiri Denemark 2012-07-23 10:05:38 UTC
The root cause of this error is bug 547546. I'll keep this bug open (rather than closing it as dup) to track possible complications we need to handle during migration and also as a reminder that fixes for bug 547546 need to be verified during migration.

Comment 7 Xu Wang 2012-11-05 10:56:57 UTC
I got the same error in a migrate and domjobabort test scenario.
Details are as follows:

Version
libvirt-0.10.2-6.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.330.el6.x86_64
kernel-2.6.32-335.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
0. Prepare a migration environment

1. start domain "rhel_mig" on source host
   # virsh start rhel_mig
 
2. wait for the domain fully started.
 
3. migrate on source
   # virsh migrate --live rhel_mig qemu+ssh://${dest_host_ip}/system --verbose
 
4. before step 3 finishes, open another terminal and cancel the migration job
   # virsh domjobabort rhel_mig

Actual results:
After step 4, the migration job is canceled successfully, but there are buffer I/O errors in the guest, like those in the screenshot attached to this bug.

Comment 9 Xu Wang 2012-11-16 09:30:58 UTC
I got this error again: there are buffer I/O errors on the guest's screen, and commands in the guest don't work.
Details are as follows:

Version
libvirt-0.9.10-21.el6_3.6.x86_64
qemu-kvm-0.12.1.2-2.295.el6_3.8.x86_64
kernel-2.6.32-279.14.1.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
0. Prepare a migration environment

1. Define and start a network with the following testbr.xml:
<network>
  <name>testbr</name>
  <uuid>8da85d86-fbd9-c2a1-013b-f121e7c42c8a</uuid>
  <forward mode='nat'/>
  <bridge name='testbr' stp='on' delay='0' />
  <ip address='192.168.100.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.168.100.2' end='192.168.100.6' />
    </dhcp>
  </ip>
</network>

# virsh net-define testbr.xml
# virsh net-start testbr

2. Define and start a guest whose image is in the /mnt directory, replacing the interface section with the following XML:
<interface type='network'>
  <mac address='52:54:00:c5:66:80'/>
  <source network='testbr'/>
  <target dev='vnet0'/>
  <alias name='net0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>

3. Issue the following commands to migrate the guest
# virsh start rhel_mig
# ll -Z /mnt/rhel_mig.img
-rw-------. qemu qemu system_u:object_r:nfs_t:s0       /mnt/test_xuwan.img
# virsh migrate rhel_mig --live qemu+ssh://$target_ip/system
error: Network not found: no network with matching name 'testbr'
# ll -Z /mnt/rhel_mig.img
-rw-------. root root system_u:object_r:nfs_t:s0       /mnt/test_xuwan.img

Actual results:
After step 3, there are buffer I/O errors in the guest, like those in the screenshot attached to this bug.

Expected results:
After step 3, the migration fails, but the guest should keep running normally on the source.
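
The ownership change visible in the two `ll -Z` listings in step 3 can be checked mechanically. A small sketch, assuming the five-column layout shown in this report (mode, user, group, SELinux context, path); other `ls` versions print different columns, and the helper names are hypothetical.

```python
def owner_from_ls_z(line):
    """Return (user, group) from one line of the `ll -Z` output shown above."""
    fields = line.split()
    if len(fields) < 5:
        raise ValueError("unexpected ls -Z line: %r" % line)
    return fields[1], fields[2]


def ownership_changed(before_line, after_line):
    """Compare the owner recorded before and after the migration attempt."""
    return owner_from_ls_z(before_line) != owner_from_ls_z(after_line)


before = "-rw-------. qemu qemu system_u:object_r:nfs_t:s0       /mnt/test_xuwan.img"
after = "-rw-------. root root system_u:object_r:nfs_t:s0       /mnt/test_xuwan.img"
print(ownership_changed(before, after))
```

For the two listings captured in step 3 the owners differ (qemu:qemu before, root:root after), which is the condition that leads to the I/O errors on NFS.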

Comment 10 Jiri Denemark 2012-11-16 11:01:39 UTC
Xu, comments 7 and 9 are both hitting the same issue this bug is about. Basically, whenever you start a migration and the migration fails on the destination, libvirtd on the destination resets file ownership to root:root in an attempt to clean up after the failed start of the domain, and that causes I/O errors on files stored on NFS. Comments 7 and 9 differ only in the reason why the migration failed on the destination. In comment 7 it failed because you aborted it, and in comment 9 it failed because the required network was not found on the destination.

Comment 11 Eric Blake 2013-08-19 21:45:49 UTC
Bug 895826 is another reported instance where a failed migration invokes the relabeling cleanup. It proposes a patch that might solve the immediate symptoms (if migration fails, don't attempt relabels on the destination, because the source is still using the file) without needing the more complex fix of proper ref-counting and restoring permissions to their original settings in the first place.
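
The idea behind that simpler fix can be sketched like this. It is an illustration only, not the code from the proposed patch; the function name, the `incoming_migration` flag, and the dict standing in for file ownership are all assumptions.

```python
ROOT = ("root", "root")


def cleanup_failed_start(owners, disk_paths, incoming_migration):
    """Clean up after a failed domain start; return the paths that were reset.

    'owners' maps path -> (user, group) and stands in for the real files.
    """
    if incoming_migration:
        # The source host is still running the guest on these files,
        # so leave their ownership alone.
        return []
    for path in disk_paths:
        owners[path] = ROOT
    return list(disk_paths)
```

The trade-off is that nothing is restored after a failed incoming migration, which is safe for shared storage because the source still needs the files as they are.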

Comment 12 Eric Blake 2013-08-20 22:47:26 UTC
Upstream patch proposed:
https://www.redhat.com/archives/libvir-list/2013-August/msg01005.html

Comment 14 zhe peng 2013-09-02 09:03:12 UTC
I can reproduce this with:
libvirt-1.1.1-2.el7.x86_64
qemu-kvm-1.5.2-4.el7.x86_64
kernel-3.10.0-9.el7.x86_64

verify with build:
libvirt-1.1.1-3.el7.x86_64
qemu-kvm-1.5.2-4.el7.x86_64
kernel-3.10.0-9.el7.x86_64

steps:
1, prepare an NFS export and mount it on hosts A and B as a shared NFS pool.
2, virt_use_nfs = off (B)
   virt_use_nfs = on  (A)
3, define and start a guest on A
4, migrate guest from A to B
# migrate --live aaa qemu+ssh://$hostB_ip/system
root.106.30's password: 
error: internal error: process exited while connecting to monitor: char device redirected to /dev/pts/1 (label charserial0)
qemu-kvm: -drive file=/var/lib/libvirt/migrate/kvm-rhel6.4-x86_64-qcow2.img,if=none,id=drive-virtio-disk0,format=qcow2,cache=none: could not open disk image /var/lib/libvirt/migrate/kvm-rhel6.4-x86_64-qcow2.img: Permission denied

No I/O errors in the guest.
Check ownership of the images:
# ll -Z
-rw-r--r--. qemu qemu unconfined_u:object_r:virt_image_t:s0 kvm-rhel6.4-x86_64-qcow2.img

Ownership did not change to root:root; works as expected, moving to verified.

Comment 16 Ludek Smid 2014-06-13 13:26:00 UTC
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.