Bug 1324566

Summary: Post-copy migration of non-shared storage (libvirt)
Product: Red Hat Enterprise Linux Advanced Virtualization
Component: libvirt
Reporter: Jiri Denemark <jdenemar>
Assignee: Virtualization Maintenance <virt-maint>
QA Contact: Fangge Jin <fjin>
Status: CLOSED WONTFIX
Severity: medium
Priority: low
Hardware: Unspecified
OS: Unspecified
CC: berrange, dgilbert, dyuan, fdeutsch, fjin, jdenemar, jsuchane, kchamart, mkletzan, pbonzini, rbalakri, xuzhang, zpeng
Target Milestone: rc
Keywords: FutureFeature, Reopened, Triaged
Doc Type: Enhancement
Type: Feature Request
Cloned As: 1386359 (view as bug list)
Last Closed: 2021-06-15 07:31:00 UTC
Bug Depends On: 1386359, 1644988

Description Jiri Denemark 2016-04-06 15:48:47 UTC
Description of problem:

Post-copy migration is supposed to always converge, but when migrating a domain with non-shared storage, the post-copy phase only starts once all disks have been migrated to the destination host. Storage migration itself does not use a post-copy approach, which means migrating such a domain may never finish even though post-copy was requested.

Both storage and memory need to be migrated in a post-copy fashion to ensure migration always converges.

Version-Release number of selected component (if applicable):

libvirt-1.3.3-1.el7

Comment 1 Jiri Denemark 2016-04-06 15:52:03 UTC
Paolo, you seemed to have an idea of which block jobs libvirt should use to implement post-copy storage migration. Could you describe your idea in detail?

Comment 3 Paolo Bonzini 2016-04-07 13:23:34 UTC
Sure! It's the opposite of the current NBD flow, which runs the NBD server on the destination and drive-mirror on the source.

Here, the NBD server runs on the source, the qcow2 image is created with the NBD server as the backing file, and block-stream is used on the destination to do post-copy migration.  Unfortunately you cannot switch from pre-copy to postcopy; you have to start the postcopy phase before doing "cont" on the destination, with no previous copy.

I think this makes it less desirable than for RAM.
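As a rough illustration, the flow described above might map onto QMP commands roughly like the following sketch. The device name, port, and exact argument shapes here are assumptions for illustration, not taken from the bug, and real QEMU versions differ in the details:

```python
import json

def qmp(cmd, **args):
    """Build a QMP command envelope as a plain dict."""
    msg = {"execute": cmd}
    if args:
        msg["arguments"] = args
    return msg

# Source host: stop vCPUs, then export the disk over NBD.
source_cmds = [
    qmp("stop"),
    qmp("nbd-server-start",
        addr={"type": "inet",
              "data": {"host": "0.0.0.0", "port": "10809"}}),
    qmp("nbd-server-add", device="drive-virtio-disk0"),
]

# Destination host: the local qcow2 would have been created with the NBD
# export as its backing file (e.g. qemu-img create -f qcow2
#   -b nbd://src-host:10809/drive-virtio-disk0 dst.qcow2),
# so block-stream pulls the backing data while the guest runs on top.
# Per comment 5, block-stream has to start before "cont".
dest_cmds = [
    qmp("block-stream", device="drive-virtio-disk0"),
    qmp("cont"),
]

print(json.dumps(dest_cmds[0]))
```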

Comment 4 Dr. David Alan Gilbert 2016-04-07 13:52:00 UTC
(In reply to Paolo Bonzini from comment #3)
> Sure! It's the opposite of the current NBD flow, which runs the NBD server
> on the destination and drive-mirror on the source.
> 
> Here, the NBD server runs on the source, the qcow2 image is created with the
> NBD server as the backing file, and block-stream is used on the destination
> to do post-copy migration.  Unfortunately you cannot switch from pre-copy to
> postcopy; you have to start the postcopy phase before doing "cont" on the
> destination, with no previous copy.
> 
> I think this makes it less desirable than for RAM.

With this scheme I don't understand how you know when you can start running the destination; I also don't understand the interaction with RAM, i.e. why you can't do the pre-copy phase for RAM.

Anyway, isn't the easier story here just to use the existing block write throttling to set a block write bandwidth lower than the network bandwidth? Unlike RAM, we've already got a throttle that should be able to limit based on what we need.
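For reference, a minimal sketch of that throttling idea: pick a disk write cap below the migration link's bandwidth and apply it via QMP's block_set_io_throttle (libvirt exposes the same knob through virsh blkdeviotune). The headroom factor and device name here are made-up illustrations:

```python
def write_throttle_bps(network_bps, headroom=0.8):
    """Disk write cap leaving the migration stream net headroom.
    The 0.8 factor is an arbitrary illustration, not a tuned value."""
    return int(network_bps * headroom)

def throttle_cmd(device, bps_wr):
    # QMP's block_set_io_throttle takes all six limits; 0 means unlimited.
    return {"execute": "block_set_io_throttle",
            "arguments": {"device": device,
                          "bps": 0, "bps_rd": 0, "bps_wr": bps_wr,
                          "iops": 0, "iops_rd": 0, "iops_wr": 0}}

# e.g. a 10 Gbit/s migration link is 1.25e9 bytes/s
cmd = throttle_cmd("drive-virtio-disk0", write_throttle_bps(1_250_000_000))
```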

Comment 5 Jiri Denemark 2016-04-07 13:56:26 UTC
Apparently (confirmed with Paolo on IRC), we'd have to start block-stream jobs after stopping vCPUs on the source and before starting the destination.

Comment 6 Jiri Denemark 2016-04-07 13:57:59 UTC
That said, I think throttling disk I/O is really a better idea.

What do OpenStack guys think about it, Daniel?

Comment 7 Daniel Berrangé 2016-04-07 14:21:46 UTC
The desirable thing about post-copy is that it guarantees completion in a finite amount of time without having to do any kind of calculations wrt guest dirtying rate vs constantly changing network bandwidth availability. If you use pre-copy with bandwidth throttling, you have the problem of figuring out what level of throttling to apply in order to ensure the guest completes in a finite, predictable time. This is pretty non-trivial unless you are super conservative and apply a very strict bandwidth limit, which in turn means you're probably slowing the guest down more than it actually needs to be. This is the prime reason OpenStack is much more enthusiastic about using post-copy than about throttling guest CPUs with cgroups or QEMU's auto-converge feature.


So from the POV of ease of management, having disks able to support post-copy in the same way as RAM is very desirable for OpenStack.
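To make the sizing problem concrete, here is a toy model (the numbers are invented, not from the bug): pre-copy converges only while transfer bandwidth exceeds the guest's dirty rate, and the finish time blows up as the two approach each other, which is exactly the estimation problem described above:

```python
def precopy_time(total_bytes, bandwidth, dirty_rate):
    """Rough seconds until the remaining dirty data drains to zero,
    or None if pre-copy can never converge. Bandwidth and dirty rate
    are in bytes/second and assumed constant, which real links aren't."""
    if bandwidth <= dirty_rate:
        return None  # post-copy, by contrast, would still finish
    return total_bytes / (bandwidth - dirty_rate)

GB = 1_000_000_000  # bytes
```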

Comment 8 Daniel Berrangé 2016-04-07 14:39:02 UTC
I'm thinking about the proposals on qemu-devel wrt extending the NBD server to support live backups, and how the NBD server would expose a fake block allocation bitmap to represent dirty blocks. I think that functionality could serve as the foundation for combined, switchable pre+post-copy for disks too.

First, have the NBD server always run on the source host. During the initial pre-copy phase, the NBD client on the target host would do one pass copying the whole dataset. Thereafter it would loop querying the fake "block allocation bitmap" from the source to get the list of blocks dirtied since the initial copy, and copy those.

When switching to post-copy, it would continue to fetch all remaining dirty blocks, but at any point it can make a request to fetch a specific block being accessed right now by the VM.
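As a toy sketch of that pre+post-copy loop, with everything hypothetical: plain lists stand in for the NBD transport, and pop_dirty() stands in for the proposed fake allocation-bitmap query:

```python
class SourceDisk:
    """Stand-in for the source-side NBD export plus the proposed
    dirty-block bitmap; guest_write models the guest dirtying blocks."""
    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.dirty = set()

    def guest_write(self, i, data):
        self.blocks[i] = data
        self.dirty.add(i)

    def read(self, i):
        return self.blocks[i]

    def pop_dirty(self):
        """One query of the fake 'block allocation bitmap'."""
        d, self.dirty = self.dirty, set()
        return sorted(d)


def migrate(src, faulted):
    dest = {}
    # Pre-copy: one full pass over the whole dataset.
    for i in range(len(src.blocks)):
        dest[i] = src.read(i)
    # ...then drain blocks dirtied meanwhile (looped, in reality).
    for i in src.pop_dirty():
        dest[i] = src.read(i)
    # Post-copy: the guest now runs on the destination; faulting blocks
    # are fetched on demand ahead of the remaining background copy.
    for i in faulted:
        dest[i] = src.read(i)
    for i in src.pop_dirty():
        dest[i] = src.read(i)
    return dest
```

In the real design the pre-copy drain and the guest writes run concurrently; this single-threaded sketch only shows the ordering of the phases.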

Comment 9 Jiri Denemark 2016-04-07 14:52:19 UTC
So it seems you agree that implementing storage migration using the currently available block-stream job is not something you'd want from libvirt. In that case we need to clone this BZ for QEMU requesting the new functionality.

Comment 10 Daniel Berrangé 2016-04-07 18:12:31 UTC
It would be nice if we could implement it using existing functionality but, from what Paolo describes, it doesn't sound like that is possible: you can do pre-copy or post-copy, but you can't switch from pre- to post-copy on the fly, which is what we'd need to be able to do to match the RAM handling. So it seems we likely need new QEMU functionality.

Comment 13 Dr. David Alan Gilbert 2020-05-05 10:01:43 UTC
Given that bug 1644988 (the QEMU active mirror block job code) is now done, it would be good to get this one going.

Comment 16 RHEL Program Management 2020-12-15 07:40:50 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 19 RHEL Program Management 2021-06-15 07:31:00 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.