770626 – Unable to override system UUID using libvirtd.conf's host_uuid

Summary: Unable to override system UUID using libvirtd.conf's host_uuid

Keywords:
Status:	CLOSED WORKSFORME
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	libvirt
Sub Component:
Version:	6.2
Hardware:	x86_64
OS:	Unspecified
Priority:	medium
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Peter Krempa
QA Contact:	Virtualization Bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2011-12-28 05:41 UTC by Madison Kelly
Modified:	2012-04-23 18:24 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2012-04-23 18:24:40 UTC
Target Upstream Version:
Embargoed:

Description Madison Kelly 2011-12-28 05:41:14 UTC

Description of problem:

I have two servers with Tyan S5510 mainboards. These share identical system UUIDs, which libvirt uses when setting the node's UUID. This blocks live migration because libvirt sees the same UUID on both nodes.

I tried overriding the system UUID using /etc/libvirt/libvirtd.conf's host_uuid, but even after stopping and starting the libvirtd daemon (in fact, even after a full reboot), 'virsh sysinfo | grep uuid' returns the same UUID as 'dmidecode -s system-uuid'.

It would appear that libvirtd is seeing the system UUID as valid, and thus ignoring the configuration file. This leaves migration of VMs impossible.

Version-Release number of selected component (if applicable):

libvirt-0.9.4-23.el6_2.1.x86_64

How reproducible:

100%

Steps to Reproduce:
1. Create a two-node cluster using two Tyan S5510 mainboards (http://tyan.com/product_SKU_spec.aspx?ProductType=MB&pid=698&SKU=600000217).
2. Confirm that both nodes show the same system UUID.
3. Try to live-migrate a VM to the other node. Syslog will show an '[vm] error: internal error Attempt to migrate guest to the same host <uuid>' error.
  
Actual results:

See "Additional info"

Expected results:

A user who sets libvirtd.conf's 'host_uuid' should be assumed to know what s/he is doing and thus, libvirtd should give that UUID priority over any system UUID.

Additional info:

[root@an-node01 ~]# virsh sysinfo | grep uuid
    <entry name='uuid'>03000200-0400-0500-0006-000700080009</entry>

[root@an-node01 ~]# diff -u /etc/libvirt/libvirtd.conf.orig /etc/libvirt/libvirtd.conf
--- /etc/libvirt/libvirtd.conf.orig	2011-12-27 22:29:01.243394880 -0500
+++ /etc/libvirt/libvirtd.conf	2011-12-27 22:59:02.738640396 -0500
@@ -365,4 +365,4 @@
 # NB This default all-zeros UUID will not work. Replace
 # it with the output of the 'uuidgen' command and then
 # uncomment this entry
-#host_uuid = "00000000-0000-0000-0000-000000000000"
+host_uuid = "31873B9E-1069-42CE-B950-137AE5EAA3D1"

[root@an-node01 ~]# /etc/init.d/libvirtd restart
Stopping libvirtd daemon:                                  [  OK  ]
Starting libvirtd daemon:                                  [  OK  ]

[root@an-node01 ~]# virsh sysinfo | grep uuid
    <entry name='uuid'>03000200-0400-0500-0006-000700080009</entry>

[root@an-node01 ~]# rpm -q libvirt
libvirt-0.9.4-23.el6_2.1.x86_64

[root@an-node01 ~]# cat /etc/issue
Red Hat Enterprise Linux Server release 6.2 (Santiago)
Kernel \r on an \m

Comment 1 Madison Kelly 2011-12-28 05:42:13 UTC

Note that I've set the priority to "high" as, in cases like mine, a cluster is left in an unresolvable state blocking HA VMs. If another priority should be set, please adjust.

Comment 3 Peter Krempa 2012-01-04 14:53:29 UTC

Hello digimer,

could you please check and compare the output of

virsh capabilities | grep uuid

on both of the nodes? The sysinfo command retrieves the data provided by the chassis/BIOS, but libvirtd's internal UUID, which is checked while migrating and can be set in the config file, is reported in the capabilities XML. I checked this on current upstream libvirt code, and setting an arbitrary UUID works there.

Thanks

Peter
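The comparison requested above can be scripted. The following is only a minimal sketch: the helper name first_uuid is made up for illustration, and the sketch assumes the host UUID is the first UUID-looking value on a line mentioning "uuid" in each command's output, as in the outputs quoted in this report.

```shell
#!/bin/sh
# first_uuid (hypothetical helper): read XML on stdin and print the first
# UUID-shaped value found on a line mentioning "uuid", lowercased.
# Matches both <uuid>...</uuid> (virsh capabilities) and
# <entry name='uuid'>...</entry> (virsh sysinfo).
first_uuid() {
    grep -i 'uuid' \
        | grep -Eio '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' \
        | head -n 1 \
        | tr 'A-F' 'a-f'
}

# Example use on a libvirt host (not executed here):
#   hw=$(virsh sysinfo | first_uuid)        # UUID reported by the BIOS
#   lv=$(virsh capabilities | first_uuid)   # UUID libvirtd uses internally
#   [ "$hw" = "$lv" ] || echo "libvirtd host UUID differs from BIOS UUID"
```

If the two values differ after host_uuid has been set, that is consistent with the override having taken effect, since only the capabilities value is expected to follow the config file.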

Comment 4 Madison Kelly 2012-01-04 14:58:01 UTC

I'll be happy to get you this. It might be later tonight/tomorrow though before I get back to the test cluster. Thanks for following up.

Comment 6 Madison Kelly 2012-01-05 02:04:34 UTC

Now this is odd. Identical servers, both fully up to date RHEL 6.2 installs. The first still shows the system board's non-unique UUID for 'virsh sysinfo' but the second doesn't. Both show the host_uuid value for 'virsh capabilities'. 


[root@an-node01 ~]# rpm -q libvirt
libvirt-0.9.4-23.el6_2.1.x86_64
[root@an-node01 ~]# virsh sysinfo | grep uuid
    <entry name='uuid'>03000200-0400-0500-0006-000700080009</entry>
[root@an-node01 ~]# virsh capabilities | grep uuid
    <uuid>31873b9e-1069-42ce-b950-137ae5eaa3d1</uuid>
[root@an-node01 ~]# dmidecode | grep UUID
	UUID: 03000200-0400-0500-0006-000700080009


[root@an-node02 ~]# rpm -q libvirt
libvirt-0.9.4-23.el6_2.1.x86_64
[root@an-node02 ~]# virsh sysinfo | grep uuid
    <entry name='uuid'>90b8d280-c9ff-4e0e-867e-6d4f7d915995</entry>
[root@an-node02 ~]# virsh capabilities | grep uuid
    <uuid>90b8d280-c9ff-4e0e-867e-6d4f7d915995</uuid>
[root@an-node02 ~]# dmidecode | grep UUID
	UUID: 03000200-0400-0500-0006-000700080009

Comment 7 Peter Krempa 2012-01-09 17:33:16 UTC

That's strange, but I can't reproduce that on my laptop (Thinkpad T61). I have some questions for you:
Is it easily reproducible? 
Is it preventing you from migrating guests between the two hosts? 
If yes, could you please test it on upstream libvirt?

Thanks

Peter

(Sorry for not catching up earlier. I wrote a response and forgot to submit it.)

Comment 8 Madison Kelly 2012-01-09 17:41:12 UTC

Hi Peter,

It is/was easy to reproduce, but it only happens when both nodes share the same systemboard UUID. I don't think this is a problem on Thinkpads, but it is on my boards. To replicate the fault, you would need a wrapper for dmidecode which returned the same UUID on two hosts. Then you should see the problem.

Yes, it does block migration which is how I realized there was a problem:

====
Dec 27 22:00:46 an-node01 rgmanager[2492]: Migrating vm:vm0001-dev to an-node02.alteeve.com
Dec 27 22:00:46 an-node01 rgmanager[22331]: [vm] Migrate vm0001-dev to an-node02.alteeve.com failed:
Dec 27 22:00:46 an-node01 rgmanager[22353]: [vm] error: internal error Attempt to migrate guest to the same host 00020003-0004-0005-0006-000700080009
Dec 27 22:00:46 an-node01 rgmanager[2492]: migrate on vm "vm0001-dev" returned 150 (unspecified)
Dec 27 22:00:46 an-node01 rgmanager[2492]: Migration of vm:vm0001-dev to an-node02.alteeve.com failed; return code 150
====

I've since torn down the test cluster to start another project. However, if you can't reproduce the problem, I can rebuild it. I would have to install the upstream on RHEL6, but so long as that doesn't devolve into dependency hell, I should be able to do it.

Comment 9 Dave Allan 2012-01-09 19:48:57 UTC

Peter, maybe it would appear with two VMs with a BIOS uuid specified?

Comment 11 jtd 2012-03-27 18:21:17 UTC

I've also experienced this problem on CentOS 6.2 systems using some Silicon Mechanics machines which have the same UUID, presumably because the vendor neglected to set them properly.

I can't tell from this discussion whether there is a plan to change any of the behaviors mentioned above.  It seems to me that it would be best if the host UUID emitted by 'virsh sysinfo' is the same as that shown in the capabilities output.  Is that going to get changed?  If not, how can we avoid this kind of confusion in the future?

Comment 12 Madison Kelly 2012-03-27 18:32:49 UTC

As an aside; I've since seen this problem reproduced on another class of mainboards (built by Tyan). In discussions with them, they sent me a DOS tool for setting a random UUID which I put onto a freedos disk. This set the system UUIDs properly, and as such my problem was resolved. However, it was still a problem until the fix was applied, and I am pretty sure many users won't be able to get a similar tool from all vendors.

@jtd

If your systems are based on Tyan boards (dmidecode might give you this info), I'd recommend calling Tyan support and explaining the issue. I would offer the tool directly, but I don't want to risk harming your system should the tool not be compatible.

Comment 13 jtd 2012-03-27 18:46:18 UTC

My systems aren't based on Tyan boards, but my primary concern is the inconsistency in libvirt's output, not whether my boards' UUIDs can be fixed.  Setting the board UUIDs is preferable to a software configuration change, but the ability to set host_uuid is a documented feature of libvirt and I'm seeking clarity on what the outcome of this bug report is going to be so I know whether to anticipate a fix in the future.

Comment 14 Madison Kelly 2012-03-27 18:51:47 UTC

Oh, I hear you and I agree that this is a feature in libvirt that needs to be fixed.

Knowing that it can sometimes take a while to get bugs fixed though, finding an interim work-around is sometimes necessary. To that end, I wrote a wrapper for dmidecode that reads libvirt's config file and, if the 'host_uuid' is set, it returns that UUID instead of the actual UUID. It's a messy work-around, but it does work...

The details are here: 

https://alteeve.com/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Setting_host_uuid_Didn.27t_Work.2C_What_Now.3F
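For readers who cannot reach the link, the idea of such a wrapper can be sketched roughly as follows. This is not the script from the tutorial. The path /usr/sbin/dmidecode.orig for the renamed real binary, the function names, and the choice to intercept only 'dmidecode -s system-uuid' are all assumptions made for illustration; how libvirt itself invokes dmidecode is not covered here.

```shell
#!/bin/sh
# Hypothetical locations; adjust for the local system.
CONF="${CONF:-/etc/libvirt/libvirtd.conf}"
REAL_DMIDECODE="${REAL_DMIDECODE:-/usr/sbin/dmidecode.orig}"

# Print the value of an uncommented host_uuid = "..." line in the given
# file, or nothing if the file is unreadable or the option is not set.
conf_host_uuid() {
    [ -r "$1" ] || return 0
    sed -n 's/^[[:space:]]*host_uuid[[:space:]]*=[[:space:]]*"\([^"]*\)".*$/\1/p' "$1" \
        | tail -n 1
}

# Answer "-s system-uuid" from the config when host_uuid is set;
# hand every other invocation to the real dmidecode unchanged.
dmidecode_wrapper() {
    wrapper_uuid=$(conf_host_uuid "$CONF")
    if [ "$1" = "-s" ] && [ "$2" = "system-uuid" ] && [ -n "$wrapper_uuid" ]; then
        printf '%s\n' "$wrapper_uuid"
    else
        "$REAL_DMIDECODE" "$@"
    fi
}

# An installed wrapper script would end with:
#   dmidecode_wrapper "$@"
```

Falling through to the real binary for everything else keeps the rest of dmidecode's output untouched, which limits how much the work-around can break.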

Comment 15 Dave Allan 2012-03-27 19:36:27 UTC

As far as I know, neither Peter nor I have hardware that reproduces this behavior. Would anybody who does have such hardware be willing to help debug it?

Comment 16 Madison Kelly 2012-03-27 20:19:33 UTC

Dave,

  I am away on business and won't have access to my lab until early April. Otherwise, I'd be happy to help.

  Perhaps you could induce the problem using the dmidecode wrapper script to always return '03000200-0400-0500-0006-000700080009', regardless of how it's called. That would effectively simulate the problem environment. 

  If jtd can't assist, and that work-around is not feasible, I can help when I return. I will need a reminder though, as I am certain to forget between now and then. A follow-up here on the 4th would be awesome.

Cheers

Comment 17 Dave Allan 2012-04-04 19:26:28 UTC

(In reply to comment #16)
> and then. A follow-up here on the 4th would be awesome.

Following up here :)

Comment 18 Peter Krempa 2012-04-13 13:54:45 UTC

I had another look at the code. The UUID provided in the host_uuid configuration variable is set as libvirtd's host UUID as the first thing after the config file is parsed (the only limitation is that it has to be a valid UUID by libvirt's standards, which are ... very light). My machine has a valid UUID and I'm able to override it by defining a custom one in the configuration:

# grep host_uuid /etc/libvirt/libvirtd.conf 
host_uuid = "13371337-1337-1337-1337-133713371337"

# virsh capabilities 
<capabilities>

  <host>
    <uuid>13371337-1337-1337-1337-133713371337</uuid>
...

So I don't think fixing this should take anything more than changing the host_uuid variable in the configuration and restarting the daemon. Migration should work like a charm afterwards, as the migration cookie is filled with the contents of host_uuid. If this doesn't work for you, we'll need more information from your system.
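Before putting a value into host_uuid, it can be sanity-checked with something like the sketch below. The function name is made up, and the rules it applies (canonical 8-4-4-4-12 hex form, not all zeros) are an assumption for illustration, not libvirt's actual validation. The all-zeros rejection follows the comment in libvirtd.conf quoted in the description.

```shell
#!/bin/sh
# looks_like_host_uuid (hypothetical helper): succeed if the argument is a
# canonical 8-4-4-4-12 hexadecimal UUID and is not the all-zeros placeholder.
# These rules are an assumption, not libvirt's own check.
looks_like_host_uuid() {
    candidate=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    if [ "$candidate" = "00000000-0000-0000-0000-000000000000" ]; then
        return 1
    fi
    printf '%s\n' "$candidate" \
        | grep -Eq '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
}

# Example use (not executed here):
#   new=$(uuidgen)
#   looks_like_host_uuid "$new" && echo "host_uuid = \"$new\""
```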

Comment 19 Madison Kelly 2012-04-20 00:40:23 UTC

I just got my system set back up and I can confirm that, in my case on a fully updated RHEL 6.2 machine, the issue is resolved.

Not sure how it got fixed, but thanks!

Comment 20 Peter Krempa 2012-04-23 18:24:40 UTC

Thanks for confirming that it works. I'm closing this bug as it appears that it's working now. Feel free to reopen it if the issue should appear again.
