Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets there.

Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against the components "kernel", "kernel-rt", and "kpatch" are migrated only if still in "NEW" or "ASSIGNED".

If you cannot log in to RH Jira, please consult article #7032570. Failing that, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user-management inquiry. The e-mail creates a ServiceNow ticket with Red Hat.

Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", will have a little "two-footprint" icon next to it, and will direct you to the "RHEL project" in Red Hat Jira (issue links are of the form "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). The same link will be available in a blue banner at the top of the page informing you that the bug has been migrated.

Bug 860204

Summary: qemu-kvm: when storage is blocked on the source, VMs change state to paused on EIO during migration and then to up automatically on the destination
Product: Red Hat Enterprise Linux 6
Reporter: Dafna Ron <dron>
Component: qemu-kvm
Assignee: Juan Quintela <quintela>
Status: CLOSED WONTFIX
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Docs Contact:
Priority: high
Version: 6.3
CC: amureini, areis, chayang, dgilbert, fsimonce, hhuang, juzhang, knoel, michen, mkenneth, pbonzini, quintela, qzhang, rbalakri, rpacheco, tburke, virt-maint
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard: storage, virsh
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-02-24 14:34:52 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
  logs (no flags)

Description Dafna Ron 2012-09-25 09:14:18 UTC
Created attachment 616927 [details]
logs

Description of problem:

In a two-host cluster with NFS storage, I run VMs that are writing with no cache.
I block connectivity to the storage domain using iptables.
The VMs remain in the up state, which causes the engine to start migrating them.
During the migration, qemu changes the VMs' state to paused on EIO.
On the destination, the VM state changes to UP automatically.

Version-Release number of selected component (if applicable):

si18.1 and si16.2
qemu-img-rhev-0.12.1.2-2.295.el6_3.2.x86_64
qemu-kvm-rhev-0.12.1.2-2.295.el6_3.2.x86_64
qemu-kvm-rhev-tools-0.12.1.2-2.295.el6_3.2.x86_64
libvirt-0.9.10-21.el6_3.4.x86_64
vdsm-4.9.6-31.0.el6_3.x86_64 and 

How reproducible:

100%

Steps to Reproduce:
1. Run VMs on the SPM host in a two-host cluster with NFS storage; make sure they are writing.
2. Block connectivity to the storage domain using iptables.
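Step 2 can be done with a firewall rule like the following. This is an illustrative fragment, not taken from the original report; 10.35.64.100 stands in for the storage server's address.

```
# iptables-restore fragment (illustrative; replace 10.35.64.100 with the
# NFS server's address). Dropping OUTPUT traffic to the storage server
# makes in-flight writes eventually fail, which qemu surfaces as EIO.
*filter
-A OUTPUT -d 10.35.64.100 -j DROP
COMMIT
```

Deleting the rule afterwards (`iptables -D OUTPUT -d 10.35.64.100 -j DROP`) restores connectivity.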
  
Actual results:

The VMs start to migrate since they are still in the up state.
They change state to paused during migration and move to the up state on the destination.

Expected results:

If the VM state was changed to paused on EIO during migration, it should remain paused on the destination.

Additional info: logs

[root@localhost ~]# vdsClient -s 0 list table
b88a8bb6-605c-4ce5-b50a-9bfdefc6aa8d  11576  XP-1                 Migration Source*                        
dfa58ab9-912a-411d-9a8c-6c299ecae568  11283  RHEL-1               Migration Source*                        
980089e4-0867-4ba3-a911-7d91acada2f4  11431  RHEL-2               Migration Source                         
2099a02f-b302-4e6c-8162-193e2c5e54c6  11717  XP-2                 Migration Source                         
4e45d6c5-0cd2-4841-b25f-acba0599284c  12001  XP-3                 Migration Source*                        
69a7f6b4-cc45-4c67-b60f-ae65a206a1b9  11858  RHEL-3               Migration Source                         
[root@localhost ~]# virsh -r list
 Id    Name                           State
----------------------------------------------------
 7     RHEL-1                         running
 8     RHEL-2                         paused
 9     XP-1                           paused
 11    RHEL-3                         running
 12    XP-3                           running

[root@localhost ~]# vdsClient -s 0 list table
dfa58ab9-912a-411d-9a8c-6c299ecae568  11283  RHEL-1               Migration Source                         
4e45d6c5-0cd2-4841-b25f-acba0599284c  12001  XP-3                 Migration Source                         
69a7f6b4-cc45-4c67-b60f-ae65a206a1b9  11858  RHEL-3               Migration Source                         
[root@localhost ~]# virsh -r list
 Id    Name                           State
----------------------------------------------------
 7     RHEL-1                         running
 11    RHEL-3                         running
 12    XP-3                           running

[root@localhost ~]# virsh -r list
 Id    Name                           State
----------------------------------------------------
 7     RHEL-1                         running
 11    RHEL-3                         paused
 12    XP-3                           running

[root@localhost ~]# vdsClient -s 0 list table
dfa58ab9-912a-411d-9a8c-6c299ecae568  11283  RHEL-1               Migration Source                         
4e45d6c5-0cd2-4841-b25f-acba0599284c  12001  XP-3                 Migration Source                         
69a7f6b4-cc45-4c67-b60f-ae65a206a1b9  11858  RHEL-3               Migration Source        


destination: 

[root@gold-vdsd ~]# virsh -r list
 Id    Name                           State
----------------------------------------------------
 7     XP-3                           running
 8     RHEL-1                         running
 9     XP-2                           running
 10    XP-1                           running
 11    RHEL-3                         running
 12    RHEL-2                         paused

[root@gold-vdsd ~]# vdsClient -s 0 list table
b88a8bb6-605c-4ce5-b50a-9bfdefc6aa8d   3026  XP-1                 Up                                       
dfa58ab9-912a-411d-9a8c-6c299ecae568   2751  RHEL-1               Up                                       
980089e4-0867-4ba3-a911-7d91acada2f4   4488  RHEL-2               Paused                                   
2099a02f-b302-4e6c-8162-193e2c5e54c6   2889  XP-2                 Up                                       
4e45d6c5-0cd2-4841-b25f-acba0599284c   2601  XP-3                 Up                                       
69a7f6b4-cc45-4c67-b60f-ae65a206a1b9   3163  RHEL-3               Up

Comment 3 Juan Quintela 2012-09-28 11:58:40 UTC
Hi

This is a known issue.  There is only one way to fix it: add a section to the migration protocol that carries the _state_ of the virtual machine.

Just to be sure that I have understood the behaviour you want:

- source is running well
- we start migration (up in destination)
- write/read error on NFS server
- source machine stops with that error
- we want the destination to not start automatically (there is a possibility of data corruption), right?
- only real solution is to fix the error on source
- and migrate then

Is that right?
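The fix described above can be sketched as a small model. This is illustrative only, not QEMU code; every name in it is invented for the example. The idea is that the migration stream carries the source's runstate, and the destination applies that runstate instead of unconditionally starting the guest:

```python
RUNNING = "running"
PAUSED_EIO = "paused (I/O error)"

def build_stream(device_state, runstate):
    """Source side: serialize device state plus a runstate section."""
    return {"devices": device_state, "runstate": runstate}

def load_stream(stream):
    """Destination side: restore devices, then honor the transmitted
    runstate instead of auto-starting the guest."""
    vm = {"devices": stream["devices"]}
    # Without a runstate section the destination always starts the guest;
    # with it, a source paused on EIO stays paused after migration.
    vm["state"] = stream.get("runstate", RUNNING)
    return vm

# A guest that hit EIO mid-migration arrives paused on the destination.
vm = load_stream(build_stream({"ram": "..."}, PAUSED_EIO))
print(vm["state"])  # -> paused (I/O error)
```

The key design point is that the destination treats the transmitted state as authoritative, so the management layer no longer has to race to re-pause the guest after it comes up.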

Comment 5 Dor Laor 2012-10-17 08:57:41 UTC
(In reply to comment #3)
> Hi
> 
> This is a known issue.  There is only one way to fix it: add a section to
> the migration protocol that carries the _state_ of the virtual machine.
> 
> Just to be sure that I have understood the behaviour you want:
> 
> - source is running well
> - we start migration (up in destination)
> - write/read error on NFS server
> - source machine stops with that error
> - we want the destination to not start automatically (there is a possibility
> of data corruption), right?
> - only real solution is to fix the error on source
> - and migrate then
> 
> Is that right?

IMHO it doesn't matter whether the guest is restarted on the destination.
That was the whole point of management's decision to migrate it.
If the destination host has no storage connectivity either, then the guest should be automatically paused.

The problem is that this is a very subtle issue, and I would rather not migrate such guests in the EIO/ENOSPC case. It should work, but it is hard to get right. That's why it is safer just not to migrate such guests and have management kill them on the source.
At least until such a scenario has been tested 1000 times successfully.
Dor

Comment 10 Paolo Bonzini 2014-09-23 10:32:09 UTC
Is NFS being mounted soft?  If it is mounted hard, you shouldn't get EIO, and you will not be able to complete migration until the connection to the server is restored.

Soft-mounted NFS is not safe and is not supported.
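For reference, a hard NFS mount as Paolo recommends can be requested with options like the following. This fstab fragment is illustrative; the server, export path, and mount point are placeholders, not values from this bug.

```
# /etc/fstab entry (illustrative; server and paths are placeholders).
# "hard" retries I/O indefinitely instead of returning an error to qemu;
# "soft" gives up after timeo/retrans and surfaces EIO, which is what
# pauses the guest.
nfs-server:/export/data  /rhev/data-center/mnt  nfs  hard,intr,nfsvers=3  0 0
```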