Bug 1285921

Summary: Can't migrate VMs with cache != none, but 'none' doesn't work with 4k native drives
Product: Red Hat Enterprise Linux 6
Reporter: Madison Kelly <mkelly>
Component: resource-agents
Assignee: Oyvind Albrigtsen <oalbrigt>
Status: CLOSED ERRATA
QA Contact: cluster-qe <cluster-qe>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 6.7
CC: agk, cluster-maint, djansa, fdinitto
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: resource-agents-3.9.5-30.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-05-10 19:15:33 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Madison Kelly 2015-11-27 00:46:48 UTC
Description of problem:

Systems with 4k native disks are not compatible with 'cache=none': the install fails because it can't see the disk. The cache therefore has to be set to 'writeback' or 'writethrough'. With either of those set, however, migration fails:

====
[root@node2 ~]# clusvcadm -M vm:server -m node2.ccrs.bcn
Trying to migrate vm:server to node2.ccrs.bcn...Failed; service running on original owner
====

Logs on host:

====
Nov 27 02:41:06 node1 rgmanager[32465]: Migrating vm:server to node2.ccrs.bcn
Nov 27 02:41:06 node1 rgmanager[684]: [vm] Migrate server to node2.ccrs.bcn failed:
Nov 27 02:41:06 node1 rgmanager[706]: [vm] error: Unsafe migration: Migration may lead to data corruption if disks use cache != none
Nov 27 02:41:06 node1 rgmanager[32465]: migrate on vm "server" returned 150 (unspecified)
Nov 27 02:41:06 node1 rgmanager[32465]: Migration of vm:server to node2.ccrs.bcn failed; return code 150
====


Version-Release number of selected component (if applicable):

resource-agents-3.9.5-24.el6_7.1.x86_64



How reproducible:

100%


Steps to Reproduce:
1. Build a machine using only 4k native drives
2. Create a VM with a dedicated clustered LV as storage and verify that 'cache=none' is not supported.
3. Use 'cache=write{back,through}', confirm VM now installs.
4. Try to migrate the VM.
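
For steps 2 and 3, the cache mode a guest is actually configured with can be read from its domain XML. The following is a minimal sketch, assuming the `cache='...'` attribute appears single-quoted on the disk `<driver>` element, as `virsh dumpxml` normally prints it. The function names are hypothetical and are not part of resource-agents.

```shell
# Hypothetical helper: print the cache= value of every disk driver element
# found in libvirt domain XML read from stdin.
# Disks without an explicit cache attribute are not reported by this sketch.
list_cache_modes() {
    grep -o "cache='[^']*'" | sed "s/^cache='//; s/'\$//"
}

# Hypothetical helper: exit 0 if at least one listed cache mode is not 'none'.
has_unsafe_cache() {
    list_cache_modes | grep -qv '^none$'
}

# Usage on a cluster node (domain name "server" taken from the report):
#   virsh dumpxml server | has_unsafe_cache && echo "migration will be refused without --unsafe"
```

The check only reads configuration, so it can be run before attempting step 4 to predict whether the "Unsafe migration" error is likely.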


Actual results:

Refuses to migrate


Expected results:

The VM migrates, possibly with a warning or alert of some sort. Perhaps a flush could be called after pausing the VM, prior to kicking over to the peer?


Additional info:

Using a cman + rgmanager cluster.

Comment 2 Oyvind Albrigtsen 2015-11-30 16:27:14 UTC
Tested and created pull request for working patch: https://github.com/ClusterLabs/resource-agents/pull/707

Comment 3 Oyvind Albrigtsen 2015-11-30 16:31:35 UTC
Set migrate_options="--unsafe" to allow migration where cache != none.

Comment 4 Madison Kelly 2015-11-30 16:36:55 UTC
I will grab a copy of the patched RA and test in the next day or two and report back.

Thanks!

Comment 5 Oyvind Albrigtsen 2016-01-08 09:09:56 UTC
Before:
# rpm -q resource-agents
resource-agents-3.9.5-24.el6_7.1.x86_64

# clusvcadm -M vm:vm1 -m host2
Trying to migrate vm:vm1 to host2...Failed; service running on original owner
# tail -f /var/log/cluster/rgmanager.log
[vm] error: Unsafe migration: Migration may lead to data corruption if disks use cache != none


After:
Add migrate_options="--unsafe" to cluster.conf and reload the configuration.

# rpm -q resource-agents
resource-agents-3.9.5-30.el6.x86_64
# clusvcadm -M vm:vm1 -m host2
Trying to migrate vm:vm1 to host2...Success
# tail -f /var/log/cluster/rgmanager.log
[vm] virsh migrate  --live --unsafe vm1 qemu+ssh://host2/system tcp:host2
Migration of vm:vm1 to host2 completed
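
The "After" log suggests that the value of migrate_options ends up on the `virsh migrate` command line. As a rough illustration only, the sketch below assembles a command string of the same shape. `build_migrate_cmd` is a hypothetical helper written for this note, not the code in the vm resource agent, and it only prints the command rather than running it.

```shell
# Hypothetical helper, not the actual resource agent code: assemble a
# virsh migrate command line shaped like the one in the "After" log.
# $1 = domain name, $2 = target host, $3 = extra options (e.g. "--unsafe")
build_migrate_cmd() {
    domain=$1
    target=$2
    extra=$3
    # ${extra:+ $extra} adds the options (with a leading space) only when set
    echo "virsh migrate --live${extra:+ $extra} $domain qemu+ssh://$target/system tcp:$target"
}

build_migrate_cmd vm1 host2 "--unsafe"
# prints: virsh migrate --live --unsafe vm1 qemu+ssh://host2/system tcp:host2
```

With an empty third argument the same helper prints the command without `--unsafe`, which corresponds to the "Before" case where libvirt refuses the migration.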

Comment 7 Madison Kelly 2016-01-22 00:12:59 UTC
This appears to now work. Thank you!

Comment 10 errata-xmlrpc 2016-05-10 19:15:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0735.html