Description of problem:

There is a two-node cluster. The hostnames of the nodes are 'sk010001' and 'sk010002'. Each node has two bonded network interfaces, one for the public network and one for the private (interconnect) network. Each node's hostname is the name assigned to its IP address on the public network.

Node1: sk010001
  bond0 (for public network) : 172.22.51.1   sk010001
  bond2 (for private network): 172.22.48.131 sk010001-hb

Node2: sk010002
  bond0 (for public network) : 172.22.51.2   sk010002
  bond2 (for private network): 172.22.48.132 sk010002-hb

The migration_mapping option is set in cluster.conf so that migration uses the private network. When doing a live migration, the traffic should use the -hb interfaces, but bond0 is used.

rgmanager uses the following command for live migration from sk010001 to sk010002:

  virsh migrate --live su21k003 qemu+ssh://sk010002-hb/system

But this is not enough. The guest image is still not transferred over the private network during migration. The --migrateuri option of 'virsh migrate' is needed for that, so the following command should be executed:

  virsh migrate --live su21k003 qemu+ssh://sk010002-hb/system tcp:172.22.48.132

The migration_mapping option in cluster.conf is only used to replace the hostname in the --desturi option. It does not affect the --migrateuri option at all.

I created a patch for this, though it has not been tested yet.

Version-Release number of selected component (if applicable):
rgmanager-2.0.52-6.el5

How reproducible:
Always

Steps to Reproduce:
1. Set the migration_mapping option to specify the interface on the private network.
2. Execute a migration on the cluster.
3. Watch the RX traffic on the destination node with ifconfig.

Actual results:
The private network is not used for migration.

Expected results:
The private network is used for migration.

Additional info:
My proposed patch requires a bugfix in the libvirt package (BZ595992 is the libvirt bug). Currently, the migration fails unless a port number is given in the --migrateuri option, as in tcp:172.22.48.132:5000. The libvirt fix enables the migration to finish properly without a specified port number.
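Step 3 of the reproduction steps can also be checked with byte counters instead of reading ifconfig output by eye. The following is only a sketch: the helper names rx_bytes and rx_delta are made up for illustration, and it assumes the destination node exposes per-interface counters under /sys/class/net/<interface>/statistics/rx_bytes.

```shell
#!/bin/sh
# Hypothetical helper: print the received-bytes counter of an interface.
# Assumes the Linux sysfs layout /sys/class/net/<if>/statistics/rx_bytes.
# The optional second argument overrides the sysfs root (useful for testing).
rx_bytes() {
    cat "${2:-/sys/class/net}/$1/statistics/rx_bytes"
}

# Hypothetical helper: print the number of bytes received between two samples.
rx_delta() {
    echo $(( $2 - $1 ))
}

# Usage on the destination node (sample, migrate, sample again):
#   before=$(rx_bytes bond2)
#   ... run the migration from the source node ...
#   after=$(rx_bytes bond2)
#   rx_delta "$before" "$after"
```

If the migration really uses the private network, the delta on bond2 should be roughly the size of the guest memory, while the delta on bond0 should stay small.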
Created attachment 416684 [details] proposed patch
My understanding is the destination is taken from the URI, so either there is a DNS or /etc/hosts mixup causing sk010002-hb to be confused with sk010002, or there is a bug in libvirt requiring it to be specified twice for some reason: http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html/Virtualization_Guide/sect-Virtualization-KVM_live_migration-Live_KVM_migration_with_virsh.html
The hostname libvirt is trying to migrate to is the output of 'virsh hostname' on the destination host, which in this case is sk010002. As was recently discussed upstream, this is a deficiency of the libvirt migration protocol.

Possible solutions:

- Have virsh always build a --migrateuri for qemu if the user doesn't specify one, using the hostname from the destination connection. Generating a port will suck though, and be much less safe than the destination host doing it. This is basically what virt-manager already does.
- Put a qemu-specific hack in virDomainMigratePrepare2 in libvirt.c, which takes the URI the destination threw back at us and splices in the hostname from the dest URI.
- Find some way to fix the remote libvirt driver so it sees the libvirt URI we are using on the source host. No idea if this is even possible.

Yeah, all these solutions suck pretty bad.
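To illustrate the first option, here is a rough shell sketch of deriving a default migrate URI from the destination connection URI. It is not libvirt or virsh code: uri_host and default_migrateuri are hypothetical names, IPv6 literals are not handled, and no port is generated, so it sidesteps the port problem mentioned above rather than solving it.

```shell
#!/bin/sh
# Hypothetical helper: extract the host part of a libvirt connection URI
# such as qemu+ssh://user@host:port/system. IPv6 literals are not handled.
uri_host() {
    local rest="${1#*://}"   # drop the scheme
    rest="${rest%%/*}"       # drop the path
    rest="${rest##*@}"       # drop an optional user@
    echo "${rest%%:*}"       # drop an optional :port
}

# Hypothetical helper: build a migrate URI that reuses the host of the
# destination connection URI. No port is chosen here.
default_migrateuri() {
    echo "tcp:$(uri_host "$1")"
}

default_migrateuri qemu+ssh://sk010002-hb/system
```

With the destination URI from this report, the sketch prints tcp:sk010002-hb, which is the value the guest data stream would need in order to go over the private network.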
Ok, so it's something we need to work around in rgmanager. Cole, do you know if this is still a problem in F12 or RHEL6 beta?
It's not really solved upstream yet, so it's a problem for all libvirt versions. Adding a way for the user to specify the --migrateuri option in rgmanager would be useful anyways: there may be times when the user explicitly does not want to use the same hostname/interface that the libvirt URI is using (which is why we have the option). Libvirt should still do the intuitive thing by default though.
Ok. We'll work around it in rgmanager. To that end, Masahiro's patch looks correct, though it might be slightly more correct to use the original hostname in the migrate-uri instead of the target hostname.
I.e. this would, I think, be the most correct:

  virsh migrate --live su21k003 \
      qemu+ssh://sk010002/system tcp:sk010002-hb
                 ^^^^^^^^

Masahiro's patch would provide the following, which is 100% acceptable I believe:

  virsh migrate --live su21k003 qemu+ssh://sk010002-hb/system tcp:sk010002-hb
                                           ^^^^^^^^^^^

Since the latter requires the least amount of code changes and the work is done, I vote that we use Masahiro's patch ;)
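For reference, the second form can be sketched in shell as follows. This is not the code from the attached patch. The helper names are hypothetical, and the mapping string format "member:target,member:target" is an assumption about how migration_mapping is written in cluster.conf.

```shell
#!/bin/sh
# Hypothetical helper: look up the migration target for a cluster member in a
# "member:target,member:target" mapping string (format assumed, see above).
map_target() {
    local member="$1" mapping="$2" pair
    local IFS=','
    for pair in $mapping; do
        if [ "${pair%%:*}" = "$member" ]; then
            echo "${pair#*:}"
            return 0
        fi
    done
    # No mapping entry: fall back to the member name itself.
    echo "$member"
}

# Hypothetical helper: print a virsh command that uses the mapped hostname for
# both the libvirt connection (desturi) and the data stream (migrateuri).
build_migrate_cmd() {
    local guest="$1" member="$2" mapping="$3" target
    target=$(map_target "$member" "$mapping")
    echo "virsh migrate --live $guest qemu+ssh://$target/system tcp:$target"
}

build_migrate_cmd su21k003 sk010002 "sk010001:sk010001-hb,sk010002:sk010002-hb"
```

Run as written, this prints the same command line as the second example above. Producing the first form instead would only require using the unmapped member name in the qemu+ssh:// part.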
http://git.fedorahosted.org/git?p=cluster.git;a=commit;h=94085eace39e248040cf7069c7294178c6f944ce
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2011-0134.html