Note: This bug is displayed in read-only format because
the product is no longer active in Red Hat Bugzilla.
Created attachment 616927[details]
logs
Description of problem:
In a two-host cluster with NFS storage, I run VMs that are writing with no cache.
I block connectivity to the storage domain using iptables.
The VMs remain in the Up state, which causes the engine to start migrating them.
During the migration, qemu changes the VM state to Paused on EIO.
On the destination, the VM state changes to Up automatically.
Version-Release number of selected component (if applicable):
si18.1 and si16.2
qemu-img-rhev-0.12.1.2-2.295.el6_3.2.x86_64
qemu-kvm-rhev-0.12.1.2-2.295.el6_3.2.x86_64
qemu-kvm-rhev-tools-0.12.1.2-2.295.el6_3.2.x86_64
libvirt-0.9.10-21.el6_3.4.x86_64
vdsm-4.9.6-31.0.el6_3.x86_64
How reproducible:
100%
Steps to Reproduce:
1. Run VMs on the SPM host in a two-host cluster with NFS storage; make sure they are writing.
2. Block connectivity to the storage domain using iptables.
3. Observe the VM states on the source and destination hosts.
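Step 2 can be performed on the source host with iptables rules like the following (a sketch; the NFS server address 10.0.0.5 is a placeholder, not taken from the attached logs):

```shell
# Address of the NFS storage server (placeholder).
NFS_SERVER=10.0.0.5

# Block all outgoing traffic from this host to the storage server,
# so in-flight guest writes fail and qemu pauses the VMs on EIO.
iptables -A OUTPUT -d "$NFS_SERVER" -j DROP

# ... reproduce the issue, then restore connectivity by deleting the rule:
iptables -D OUTPUT -d "$NFS_SERVER" -j DROP
```

Note that -A OUTPUT only drops traffic originating on this host; if the storage traffic goes through a bridge, a FORWARD rule may be needed instead.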
Actual results:
The VMs start to migrate since they are still in the Up state.
They change state to Paused during migration and move to the Up state on the destination.
Expected results:
If the VM state was changed to Paused on EIO during migration, it should remain Paused on the destination.
Additional info: logs
[root@localhost ~]# vdsClient -s 0 list table
b88a8bb6-605c-4ce5-b50a-9bfdefc6aa8d 11576 XP-1 Migration Source*
dfa58ab9-912a-411d-9a8c-6c299ecae568 11283 RHEL-1 Migration Source*
980089e4-0867-4ba3-a911-7d91acada2f4 11431 RHEL-2 Migration Source
2099a02f-b302-4e6c-8162-193e2c5e54c6 11717 XP-2 Migration Source
4e45d6c5-0cd2-4841-b25f-acba0599284c 12001 XP-3 Migration Source*
69a7f6b4-cc45-4c67-b60f-ae65a206a1b9 11858 RHEL-3 Migration Source
[root@localhost ~]# virsh -r list
Id Name State
----------------------------------------------------
7 RHEL-1 running
8 RHEL-2 paused
9 XP-1 paused
11 RHEL-3 running
12 XP-3 running
[root@localhost ~]# vdsClient -s 0 list table
dfa58ab9-912a-411d-9a8c-6c299ecae568 11283 RHEL-1 Migration Source
4e45d6c5-0cd2-4841-b25f-acba0599284c 12001 XP-3 Migration Source
69a7f6b4-cc45-4c67-b60f-ae65a206a1b9 11858 RHEL-3 Migration Source
[root@localhost ~]# virsh -r list
Id Name State
----------------------------------------------------
7 RHEL-1 running
11 RHEL-3 running
12 XP-3 running
[root@localhost ~]# virsh -r list
Id Name State
----------------------------------------------------
7 RHEL-1 running
11 RHEL-3 paused
12 XP-3 running
[root@localhost ~]# vdsClient -s 0 list table
dfa58ab9-912a-411d-9a8c-6c299ecae568 11283 RHEL-1 Migration Source
4e45d6c5-0cd2-4841-b25f-acba0599284c 12001 XP-3 Migration Source
69a7f6b4-cc45-4c67-b60f-ae65a206a1b9 11858 RHEL-3 Migration Source
destination:
[root@gold-vdsd ~]# virsh -r list
Id Name State
----------------------------------------------------
7 XP-3 running
8 RHEL-1 running
9 XP-2 running
10 XP-1 running
11 RHEL-3 running
12 RHEL-2 paused
[root@gold-vdsd ~]# vdsClient -s 0 list table
b88a8bb6-605c-4ce5-b50a-9bfdefc6aa8d 3026 XP-1 Up
dfa58ab9-912a-411d-9a8c-6c299ecae568 2751 RHEL-1 Up
980089e4-0867-4ba3-a911-7d91acada2f4 4488 RHEL-2 Paused
2099a02f-b302-4e6c-8162-193e2c5e54c6 2889 XP-2 Up
4e45d6c5-0cd2-4841-b25f-acba0599284c 2601 XP-3 Up
69a7f6b4-cc45-4c67-b60f-ae65a206a1b9 3163 RHEL-3 Up
Hi
This is a known issue. There is only one way to fix it, and that is to add to the migration protocol a section that carries the _state_ of the virtual machine.
Just to be sure that I have understood the behaviour you want:
- source is running well
- we start migration (up in destination)
- write/read error on NFS server
- source machine stops with that error
- we want destination to not start automatically (there is a possibility of data corruption), right?
- only real solution is to fix the error on source
- and migrate then
Is that right?
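The paused-on-EIO condition described above can be confirmed on the source host via libvirt's state reason (a sketch; the domain name RHEL-2 is taken from the logs in this report):

```shell
# Show not just the state but the reason libvirt recorded for it.
# A guest suspended by qemu on a failed write reports "paused (I/O error)",
# as opposed to e.g. "paused (migrating)" or "paused (user)".
virsh -r domstate RHEL-2 --reason
```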
(In reply to comment #3)
> Hi
>
> This is known issue. There is only one way to fix it, and that is to add to
> the protocol one section that says what is the _state_ of the virt machine.
>
> Just to be sure that I have understood the behaviour you want:
>
> - source is running well
> - we start migration (up in destination)
> - write/read error on NFS server
> - source machine stops with that error
> - we want destination to not start automatically (there is posibility of
> data corruption), right?
> - only real solution is to fix the error on source
> - and migrate then
>
> Is that right?
IMHO it doesn't matter if the guest is restarted on the destination.
That was the whole point behind management's decision to migrate it.
If the destination host has no storage connectivity either, then the guest should be automatically paused.
The problem is that this is a very subtle issue, and I would rather not migrate such guests in the EIO/ENOSPC case. It should work, but it is hard to get right. That's why it is safer just not to migrate such guests and have management kill them on the source.
At least until such a scenario has been tested 1000 times successfully.
Dor
Is NFS being mounted soft? If it is mounted hard, you shouldn't get EIO, and you will not be able to complete migration until the connection to the server is restored.
Soft-mounted NFS is not safe and is not supported.
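For context, the difference shows up in the mount options (a sketch; the export path and mountpoint are placeholders):

```shell
# Soft mount: after timeo/retrans expire, I/O returns EIO to the caller.
# This is what lets qemu see a write error and pause the guest.
mount -t nfs -o soft,timeo=600,retrans=2 nfs-server:/export /mnt/data

# Hard mount (the supported mode): I/O blocks until the server responds,
# so the guest never sees EIO and the migration simply stalls instead.
mount -t nfs -o hard,timeo=600 nfs-server:/export /mnt/data
```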