Bug 770626
| Summary: | Unable to override system UUID using libvirtd.conf's host_uuid | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Madison Kelly <mkelly> |
| Component: | libvirt | Assignee: | Peter Krempa <pkrempa> |
| Status: | CLOSED WORKSFORME | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | 6.2 | CC: | acathrow, dallan, jtd, mzhan, rwu |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2012-04-23 18:24:40 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
|
Description (Madison Kelly, 2011-12-28 05:41:14 UTC)
Note that I've set the priority to "high" because, in cases like mine, a cluster is left in an unresolvable state, blocking HA VMs. If another priority is more appropriate, please adjust.

Hello digimer, could you please check and compare the output of `virsh capabilities | grep uuid` on both nodes? The sysinfo command retrieves the data provided by the chassis/BIOS, but libvirtd's internal UUID, which is checked during migration and can be set in the config file, is reported in the capabilities XML. I checked this against current upstream libvirt code, and setting an arbitrary UUID works there. Thanks, Peter

I'll be happy to get you this. It might be later tonight/tomorrow, though, before I get back to the test cluster. Thanks for following up.

Now this is odd. Identical servers, both fully up-to-date RHEL 6.2 installs. The first still shows the system board's non-unique UUID for `virsh sysinfo`, but the second doesn't. Both show the host_uuid value for `virsh capabilities`.
```shell
[root@an-node01 ~]# rpm -q libvirt
libvirt-0.9.4-23.el6_2.1.x86_64
[root@an-node01 ~]# virsh sysinfo | grep uuid
<entry name='uuid'>03000200-0400-0500-0006-000700080009</entry>
[root@an-node01 ~]# virsh capabilities | grep uuid
<uuid>31873b9e-1069-42ce-b950-137ae5eaa3d1</uuid>
[root@an-node01 ~]# dmidecode | grep UUID
UUID: 03000200-0400-0500-0006-000700080009
```

```shell
[root@an-node02 ~]# rpm -q libvirt
libvirt-0.9.4-23.el6_2.1.x86_64
[root@an-node02 ~]# virsh sysinfo | grep uuid
<entry name='uuid'>90b8d280-c9ff-4e0e-867e-6d4f7d915995</entry>
[root@an-node02 ~]# virsh capabilities | grep uuid
<uuid>90b8d280-c9ff-4e0e-867e-6d4f7d915995</uuid>
[root@an-node02 ~]# dmidecode | grep UUID
UUID: 03000200-0400-0500-0006-000700080009
```
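The mismatch shown in these transcripts can be checked in one step. A minimal sketch follows; the `sed` extraction patterns are my own, based on the XML shown above, not an official libvirt interface:

```shell
#!/bin/sh
# Sketch: compare the BIOS/chassis UUID (virsh sysinfo) with the UUID
# libvirtd actually uses for migration checks (virsh capabilities).
# The sed patterns assume the XML layout shown in this report.
sys=$(virsh sysinfo      | sed -n "s/.*<entry name='uuid'>\(.*\)<\/entry>.*/\1/p")
cap=$(virsh capabilities | sed -n 's/.*<uuid>\(.*\)<\/uuid>.*/\1/p' | head -n1)

if [ "$sys" = "$cap" ]; then
    echo "sysinfo and capabilities agree: $sys"
else
    echo "UUIDs differ (host_uuid override in effect, or firmware issue):"
    echo "  sysinfo:      $sys"
    echo "  capabilities: $cap"
fi
```

Run on each node, this makes it obvious whether a host_uuid override is actually being picked up.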
That's strange, but I can't reproduce that on my laptop (Thinkpad T61). I have some questions for you: Is it easily reproducible? Is it preventing you from migrating guests between the two hosts? If so, could you please test it on upstream libvirt? Thanks, Peter

(Sorry for not catching up earlier. I wrote a response and forgot to submit it.)

Hi Peter, it is/was easy to reproduce, but it only happens when both nodes share the same system-board UUID. I don't think this is a problem on Thinkpads, but it is on my boards. To replicate the fault, you would need a wrapper for dmidecode that returned the same UUID on two hosts. Then you should see the problem. Yes, it does block migration, which is how I realized there was a problem:

```
Dec 27 22:00:46 an-node01 rgmanager[2492]: Migrating vm:vm0001-dev to an-node02.alteeve.com
Dec 27 22:00:46 an-node01 rgmanager[22331]: [vm] Migrate vm0001-dev to an-node02.alteeve.com failed:
Dec 27 22:00:46 an-node01 rgmanager[22353]: [vm] error: internal error Attempt to migrate guest to the same host 00020003-0004-0005-0006-000700080009
Dec 27 22:00:46 an-node01 rgmanager[2492]: migrate on vm "vm0001-dev" returned 150 (unspecified)
Dec 27 22:00:46 an-node01 rgmanager[2492]: Migration of vm:vm0001-dev to an-node02.alteeve.com failed; return code 150
```

I've since torn down the test cluster to start another project. However, if you can't reproduce the problem, I can rebuild it. I would have to install the upstream version on RHEL 6, but so long as that doesn't devolve into dependency hell, I should be able to do it.

Peter, maybe it would appear with two VMs with a BIOS UUID specified?

I've also experienced this problem on CentOS 6.2 systems using some Silicon Mechanics machines which have the same UUID, presumably because the vendor neglected to set them properly. I can't tell from this discussion whether there is a plan to change any of the behaviors mentioned above.
It seems to me that it would be best if the host UUID emitted by `virsh sysinfo` were the same as that shown in the capabilities output. Is that going to be changed? If not, how can we avoid this kind of confusion in the future?

As an aside, I've since seen this problem reproduced on another class of mainboards (built by Tyan). In discussions with them, they sent me a DOS tool for setting a random UUID, which I put onto a FreeDOS disk. This set the system UUIDs properly, and as such my problem was resolved. However, it was still a problem until the fix was applied, and I am pretty sure many users won't be able to get a similar tool from all vendors.

@jtd: If your systems are based on Tyan boards (dmidecode might give you this info), I'd recommend calling Tyan support and explaining the issue. I would offer the tool directly, but I don't want to risk harming your system should the tool not be compatible.

My systems aren't based on Tyan boards, but my primary concern is the inconsistency in libvirt's output, not whether my boards' UUIDs can be fixed. Setting the board UUIDs is preferable to a software configuration change, but the ability to set host_uuid is a documented feature of libvirt, and I'm seeking clarity on the outcome of this bug report so I know whether to anticipate a fix in the future.

Oh, I hear you, and I agree that this is a feature in libvirt that needs to be fixed. Knowing that it can sometimes take a while to get bugs fixed, though, finding an interim work-around is sometimes necessary. To that end, I wrote a wrapper for dmidecode that reads libvirt's config file and, if host_uuid is set, returns that UUID instead of the actual UUID. It's a messy work-around, but it does work...
The details are here: https://alteeve.com/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Setting_host_uuid_Didn.27t_Work.2C_What_Now.3F

Neither I nor Peter, AFAIK, has hardware that reproduces this behavior. Would anybody who has such hardware be willing to help debug it?

Dave, I am away on business and won't have access to my lab until early April; otherwise, I'd be happy to help. Perhaps you could induce the problem using the dmidecode wrapper script, modified to always return '03000200-0400-0500-0006-000700080009' regardless of how it's called. That would effectively simulate the problem environment. If jtd can't assist and that work-around is not feasible, I can help when I return. I will need a reminder, though, as I am certain to forget between now and then. A follow-up here on the 4th would be awesome. Cheers

(In reply to comment #16)
> A follow-up here on the 4th would be awesome.

Following up here :) I had another look at the code. The UUID provided to libvirt in the configuration-file variable host_uuid= is set as libvirtd's host UUID as the first thing after the config file is parsed (the only limitation is that it has to be a valid UUID by libvirt's standards, which are very lenient). My machine has a valid UUID, and I'm able to override it by defining a custom one in the configuration:
```shell
# grep host_uuid /etc/libvirt/libvirtd.conf
host_uuid = "13371337-1337-1337-1337-133713371337"
# virsh capabilities
<capabilities>
  <host>
    <uuid>13371337-1337-1337-1337-133713371337</uuid>
...
```
so I don't think this problem should reach any further than changing the host_uuid variable in the configuration and restarting the daemon. Migration should work like a charm afterwards, as the migration cookie is filled with the contents of host_uuid. If this doesn't work for you, we'll need more information about your system.
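Peter's remedy above can be scripted per node. A sketch, assuming uuidgen (from util-linux) is available and the RHEL 6 `service` init script is in use:

```shell
#!/bin/sh
# Sketch: give this node a unique libvirt host UUID and restart the daemon.
# Run once per node; each node must end up with a different UUID.
CONF=/etc/libvirt/libvirtd.conf
NEW_UUID=$(uuidgen)

# Replace an existing (possibly commented-out) host_uuid line, or append one.
if grep -q '^#\?host_uuid' "$CONF"; then
    sed -i "s/^#\?host_uuid.*/host_uuid = \"$NEW_UUID\"/" "$CONF"
else
    echo "host_uuid = \"$NEW_UUID\"" >> "$CONF"
fi

service libvirtd restart
virsh capabilities | grep '<uuid>'   # should now report $NEW_UUID
```

Note this only changes the UUID libvirtd advertises; `virsh sysinfo` and `dmidecode` will still report the firmware value.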
I just got my system set back up, and I can confirm that, in my case on a fully updated RHEL 6.2 machine, the issue is resolved. Not sure how it got fixed, but thanks!

Thanks for confirming that it works. I'm closing this bug as it appears to be working now. Feel free to reopen it if the issue appears again.