Bug 1296406

Summary: VirtualDomain: add migration_speed and migration_downtime options
Product: Red Hat Enterprise Linux 7
Component: resource-agents
Version: 7.3
Reporter: Oyvind Albrigtsen <oalbrigt>
Assignee: Oyvind Albrigtsen <oalbrigt>
QA Contact: cluster-qe <cluster-qe>
CC: agk, cfeist, cluster-maint, fdinitto, mnovacek
Status: CLOSED ERRATA
Severity: unspecified
Priority: unspecified
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Type: Bug
Doc Type: Bug Fix
Fixed In Version: resource-agents-3.9.5-69.el7
Last Closed: 2016-11-04 00:00:42 UTC

Description Oyvind Albrigtsen 2016-01-07 08:02:36 UTC
Description of problem:
Add a "migration_downtime" option to allow live migration of machines under higher memory pressure, and a "migration_speed" option to set the maximum migration bandwidth (in MiB/s) for a domain that is being migrated to another host.

Additional info:
https://github.com/ClusterLabs/resource-agents/pull/629
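
As an illustrative sketch, both options are set as instance attributes on the VirtualDomain resource, e.g. with pcs (resource name and values here are examples only):

  # cap migration bandwidth at 5 MiB/s and tolerable downtime at 100 ms
  pcs resource update R-pool-10-34-70-94 migration_speed=5 migration_downtime=100

Per the verification logs in comment 6, the agent applies migration_speed via "virsh migrate-setspeed" before starting the migration, and migration_downtime via "virsh migrate-setmaxdowntime" once the migration is in progress.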

Comment 3 Oyvind Albrigtsen 2016-03-04 15:38:29 UTC
Tested patch and verified that it's working as expected.

Comment 4 Mike McCune 2016-03-28 23:14:23 UTC
This bug was accidentally moved from POST to MODIFIED due to an error in automation; please contact mmccune with any questions.

Comment 6 michal novacek 2016-09-08 14:42:23 UTC
I have verified that the new live migration parameters migration_speed and
migration_downtime can be set for VirtualDomain in
resource-agents-3.9.5-81.el7.x86_64

----

See cluster setup at the bottom. Virtual machine live migration also works
correctly without these new parameters.
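
The parameters were applied to the existing resource with pcs; a sketch of the kind of invocation used, with the values from the tests below:

  pcs resource update R-pool-10-34-70-94 migration_speed=5
  pcs resource update R-pool-10-34-70-94 migration_downtime=1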


migration_speed unset, takes about 10 seconds
...
Sep  8 16:18:04 kiff-03 VirtualDomain(R-pool-10-34-70-94)[19714]: INFO: Issuing graceful shutdown request for domain pool-10-34-70-94.
Sep  8 16:18:59 kiff-03 VirtualDomain(R-pool-10-34-70-94)[20945]: INFO: pool-10-34-70-94: Starting live migration to light-02.cluster-qe.lab.eng.brq.redhat.com (using: virsh --connect=qemu:///system --quiet migrate --live  pool-10-34-70-94 qemu+ssh://light-02.cluster-qe.lab.eng.brq.redhat.com/system ).
Sep  8 16:19:09 kiff-03 VirtualDomain(R-pool-10-34-70-94)[20945]: INFO: pool-10-34-70-94: live migration to light-02.cluster-qe.lab.eng.brq.redhat.com succeeded.
Sep  8 16:19:09 kiff-03 VirtualDomain(R-pool-10-34-70-94)[21163]: INFO: Domain pool-10-34-70-94 already stopped.


migration_speed=5 (5 MiB/s), takes about 70 seconds
...
Sep  8 16:13:37 kiff-03 VirtualDomain(R-pool-10-34-70-94)[15434]: INFO: pool-10-34-70-94: Setting live migration speed limit for pool-10-34-70-94 (using: virsh --connect=qemu:///system --quiet migrate-setspeed pool-10-34-70-94 5).
Sep  8 16:13:37 kiff-03 VirtualDomain(R-pool-10-34-70-94)[15434]: INFO: pool-10-34-70-94: Starting live migration to light-02.cluster-qe.lab.eng.brq.redhat.com (using: virsh --connect=qemu:///system --quiet migrate --live  pool-10-34-70-94 qemu+ssh://light-02.cluster-qe.lab.eng.brq.redhat.com/system ).
Sep  8 16:14:47 kiff-03 VirtualDomain(R-pool-10-34-70-94)[15434]: INFO: pool-10-34-70-94: live migration to light-02.cluster-qe.lab.eng.brq.redhat.com succeeded.
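
As an optional sanity check (same connection URI and domain name as above), the bandwidth cap applied via migrate-setspeed can be queried with virsh; the value is reported in MiB/s:

  virsh --connect=qemu:///system migrate-getspeed pool-10-34-70-94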

migration_downtime=1 (1 millisecond)

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Virtualization_Deployment_and_Administration_Guide/sect-KVM_live_migration-Live_KVM_migration_with_virsh.html
Will set a maximum tolerable downtime for a domain which is being live-migrated
to another host. The specified downtime is in milliseconds. The domain
specified must be the same domain that is being migrated.

Sep  8 16:38:15 kiff-03 VirtualDomain(R-pool-10-34-70-94)[6392]: INFO: Domain pool-10-34-70-94 already stopped.
Sep  8 16:38:42 kiff-03 VirtualDomain(R-pool-10-34-70-94)[7044]: INFO: pool-10-34-70-94: Starting live migration to light-02.cluster-qe.lab.eng.brq.redhat.com (using: virsh --connect=qemu:///system --quiet migrate --live  pool-10-34-70-94 qemu+ssh://light-02.cluster-qe.lab.eng.brq.redhat.com/system ).
Sep  8 16:38:44 kiff-03 VirtualDomain(R-pool-10-34-70-94)[7044]: INFO: pool-10-34-70-94: Setting live migration downtime for pool-10-34-70-94 (using: virsh --connect=qemu:///system --quiet migrate-setmaxdowntime pool-10-34-70-94 1).
Sep  8 16:38:52 kiff-03 VirtualDomain(R-pool-10-34-70-94)[7044]: INFO: pool-10-34-70-94: live migration to light-02.cluster-qe.lab.eng.brq.redhat.com succeeded.
Sep  8 16:38:52 kiff-03 VirtualDomain(R-pool-10-34-70-94)[7253]: INFO: Domain pool-10-34-70-94 already stopped.

----

>>(1)
[root@light-02 ~]# pcs config 
Cluster Name: STSRHTS19499
Corosync Nodes:
 light-02.cluster-qe.lab.eng.brq.redhat.com kiff-03.cluster-qe.lab.eng.brq.redhat.com
Pacemaker Nodes:
 kiff-03.cluster-qe.lab.eng.brq.redhat.com light-02.cluster-qe.lab.eng.brq.redhat.com

Resources:
 Clone: dlm-clone
  Meta Attrs: interleave=true ordered=true clone-max=2 
  Resource: dlm (class=ocf provider=pacemaker type=controld)
   Operations: start interval=0s timeout=90 (dlm-start-interval-0s)
               stop interval=0s timeout=100 (dlm-stop-interval-0s)
               monitor interval=30s on-fail=fence (dlm-monitor-interval-30s)
 Clone: clvmd-clone
  Meta Attrs: interleave=true ordered=true clone-max=2 
  Resource: clvmd (class=ocf provider=heartbeat type=clvm)
   Attributes: with_cmirrord=1
   Operations: start interval=0s timeout=90 (clvmd-start-interval-0s)
               stop interval=0s timeout=90 (clvmd-stop-interval-0s)
               monitor interval=30s on-fail=fence (clvmd-monitor-interval-30s)
 Clone: shared-group-clone
  Meta Attrs: clone-max=2 interleave=true 
  Group: shared-group
   Resource: shared-vg (class=ocf provider=heartbeat type=LVM)
    Attributes: exclusive=false partial_activation=false volgrpname=shared
    Operations: start interval=0s timeout=30 (shared-vg-start-interval-0s)
                stop interval=0s timeout=30 (shared-vg-stop-interval-0s)
                monitor interval=10 timeout=30 (shared-vg-monitor-interval-10)
   Resource: etc-libvirt (class=ocf provider=heartbeat type=Filesystem)
    Attributes: device=/dev/shared/etc0 directory=/etc/libvirt/qemu fstype=gfs2 options=
    Operations: start interval=0s timeout=60 (etc-libvirt-start-interval-0s)
                stop interval=0s timeout=60 (etc-libvirt-stop-interval-0s)
                monitor interval=30s (etc-libvirt-monitor-interval-30s)
   Resource: images (class=ocf provider=heartbeat type=Filesystem)
    Attributes: device=/dev/shared/images0 directory=/var/lib/libvirt/images fstype=gfs2 options=
    Operations: start interval=0s timeout=60 (images-start-interval-0s)
                stop interval=0s timeout=60 (images-stop-interval-0s)
                monitor interval=30s (images-monitor-interval-30s)
 Resource: R-pool-10-34-70-94 (class=ocf provider=heartbeat type=VirtualDomain)
  Attributes: hypervisor=qemu:///system config=/etc/libvirt/qemu/pool-10-34-70-94.xml migration_transport=ssh migration_speed=5
  Meta Attrs: allow-migrate=true priority=100 
  Utilization: cpu=2 hv_memory=1024
  Operations: start interval=0s timeout=120s (R-pool-10-34-70-94-start-interval-0s)
              stop interval=0s timeout=120s (R-pool-10-34-70-94-stop-interval-0s)
              monitor interval=10 timeout=30 (R-pool-10-34-70-94-monitor-interval-10)
              migrate_from interval=0 timeout=120s (R-pool-10-34-70-94-migrate_from-interval-0)
              migrate_to interval=0 timeout=120 (R-pool-10-34-70-94-migrate_to-interval-0)

Stonith Devices:
 Resource: fence-light-02 (class=stonith type=fence_ipmilan)
  Attributes: ipaddr=light-02-ilo login=admin passwd=admin pcmk_host_check=static-list pcmk_host_list=light-02.cluster-qe.lab.eng.brq.redhat.com delay=5
  Operations: monitor interval=60s (fence-light-02-monitor-interval-60s)
 Resource: fence-kiff-03 (class=stonith type=fence_ipmilan)
  Attributes: ipaddr=kiff-03-ilo login=admin passwd=admin pcmk_host_check=static-list pcmk_host_list=kiff-03.cluster-qe.lab.eng.brq.redhat.com
  Operations: monitor interval=60s (fence-kiff-03-monitor-interval-60s)
Fencing Levels:

Location Constraints:
  Resource: R-pool-10-34-70-94
    Disabled on: kiff-03.cluster-qe.lab.eng.brq.redhat.com (score:-INFINITY) (role: Started) (id:cli-ban-R-pool-10-34-70-94-on-kiff-03.cluster-qe.lab.eng.brq.redhat.com)
  Resource: clvmd-clone
    Disabled on: pool-10-34-70-94 (score:-INFINITY) (id:location-clvmd-clone-pool-10-34-70-94--INFINITY)
    Disabled on: pool-10-34-70-95 (score:-INFINITY) (id:location-clvmd-clone-pool-10-34-70-95--INFINITY)
  Resource: dlm-clone
    Disabled on: pool-10-34-70-94 (score:-INFINITY) (id:location-dlm-clone-pool-10-34-70-94--INFINITY)
    Disabled on: pool-10-34-70-95 (score:-INFINITY) (id:location-dlm-clone-pool-10-34-70-95--INFINITY)
  Resource: shared-group-clone
    Enabled on: light-02.cluster-qe.lab.eng.brq.redhat.com (score:INFINITY) (id:location-shared-group-clone-light-02.cluster-qe.lab.eng.brq.redhat.com-INFINITY)
    Enabled on: kiff-03.cluster-qe.lab.eng.brq.redhat.com (score:INFINITY) (id:location-shared-group-clone-kiff-03.cluster-qe.lab.eng.brq.redhat.com-INFINITY)
    Disabled on: pool-10-34-70-94 (score:-INFINITY) (id:location-shared-group-clone-pool-10-34-70-94--INFINITY)
    Disabled on: pool-10-34-70-95 (score:-INFINITY) (id:location-shared-group-clone-pool-10-34-70-95--INFINITY)
Ordering Constraints:
  start dlm-clone then start clvmd-clone (kind:Mandatory) (id:order-dlm-clone-clvmd-clone-mandatory)
  start clvmd-clone then start shared-group-clone (kind:Mandatory) (id:order-clvmd-clone-shared-group-clone-mandatory)
  start shared-group-clone then start R-pool-10-34-70-94 (kind:Mandatory) (id:order-shared-group-clone-R-pool-10-34-70-94-mandatory)
Colocation Constraints:
  clvmd-clone with dlm-clone (score:INFINITY) (id:colocation-clvmd-clone-dlm-clone-INFINITY)
  shared-group-clone with clvmd-clone (score:INFINITY) (id:colocation-shared-group-clone-clvmd-clone-INFINITY)
Ticket Constraints:

Alerts:
 Alert: forwarder (path=/usr/tests/sts-rhel7.3/pacemaker/alerts/alert_forwarder.py)
  Recipients:
   Recipient: forwarder-recipient (value=http://virt-009.cluster-qe.lab.eng.brq.redhat.com:37676/)

Resources Defaults:
 No defaults set
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: STSRHTS19499
 dc-version: 1.1.15-10.el7-e174ec8
 have-watchdog: false
 last-lrm-refresh: 1473338656
 no-quorum-policy: freeze
 stonith-enabled: true

Quorum:
  Options:

>>(2)
[root@light-02 ~]# pcs status
Cluster name: STSRHTS19499
Stack: corosync
Current DC: light-02.cluster-qe.lab.eng.brq.redhat.com (version 1.1.15-10.el7-e174ec8) - partition with quorum
Last updated: Thu Sep  8 16:14:18 2016          Last change: Thu Sep  8 16:13:37 2016 by root via crm_resource on kiff-03.cluster-qe.lab.eng.brq.redhat.com

2 nodes and 13 resources configured

Online: [ kiff-03.cluster-qe.lab.eng.brq.redhat.com light-02.cluster-qe.lab.eng.brq.redhat.com ]

Full list of resources:

 R-pool-10-34-70-94     (ocf::heartbeat:VirtualDomain): Started kiff-03.cluster-qe.lab.eng.brq.redhat.com
 fence-light-02 (stonith:fence_ipmilan):        Started kiff-03.cluster-qe.lab.eng.brq.redhat.com
 fence-kiff-03  (stonith:fence_ipmilan):        Started light-02.cluster-qe.lab.eng.brq.redhat.com
 Clone Set: dlm-clone [dlm]
     Started: [ kiff-03.cluster-qe.lab.eng.brq.redhat.com light-02.cluster-qe.lab.eng.brq.redhat.com ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ kiff-03.cluster-qe.lab.eng.brq.redhat.com light-02.cluster-qe.lab.eng.brq.redhat.com ]
 Clone Set: shared-group-clone [shared-group]
     Started: [ kiff-03.cluster-qe.lab.eng.brq.redhat.com light-02.cluster-qe.lab.eng.brq.redhat.com ]


Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

Comment 8 errata-xmlrpc 2016-11-04 00:00:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2174.html