Bug 884990

Summary: non deterministic bios dev naming in KVM guests
Product: Fedora
Version: 19
Component: biosdevname
Status: CLOSED EOL
Reporter: Dan Kenigsberg <danken>
Assignee: Narendra K <narendra_k>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
CC: bazulay, harald, iheim, jeder, jordan_hargrave, knoel, lpeer, matt_domsch, mebrown, mst, pbonzini, praveen_paladugu, vpavlin
Doc Type: Bug Fix
Blocks: 888529
Clones: 888529 (view as bug list)
Last Closed: 2015-02-17 14:36:22 UTC

Attachments:
patch disabling running-on-hypervisor check, instead check any devices have persistent names

Description Dan Kenigsberg 2012-12-07 08:30:56 UTC
Description of problem:
biosdevname fails to allocate stable names to NICs in KVM guests.
More details at
https://lists.fedorahosted.org/pipermail/vdsm-devel/2012-December/001842.html

Version-Release number of selected component (if applicable):
biosdevname-0.4.1-2.fc18.x86_64

How reproducible:
quite often

Steps to Reproduce:
1. Boot a Fedora 18 guest with multiple NICs under the KVM hypervisor.
2. Observe that the NICs receive random names.
  

On Thu, Dec 06, 2012 at 01:12:52PM +0200, Michael S. Tsirkin wrote:
> This is not a qemu issue. This is a biosdevname/VMware issue.
> biosdevname has this code:
> 
> /*
>   Algorithm suggested by:
>   http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1009458
> */
> 
> static int
> running_in_virtual_machine (void)
> {
>     u_int32_t eax=1U, ecx=0U;
> 
>     ecx = cpuid (eax, ecx);
>     if (ecx & 0x80000000U)
>        return 1;
>     return 0;
> }
> 
> So it just looks for a hypervisor.
> 
> It should look at the hypervisor leaf
> and either blacklist vmware specifically or whitelist kvm.
> 
> Please open (preferably urgent prio) bugzilla for biosdevname component
> so we can fix it in F18, cc me.
> I can write you a patch but maintainer needs to apply it.
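
(As an illustration of the suggested fix, a minimal sketch that reads the hypervisor leaf; the signature strings follow the de-facto CPUID convention, and the helper name is hypothetical, not biosdevname code:)

/* Sketch: distinguish hypervisors via CPUID leaf 0x40000000, whose
 * EBX:ECX:EDX carry a vendor signature, e.g. "KVMKVMKVM\0\0\0" for KVM
 * and "VMwareVMware" for VMware. */
#include <cpuid.h>
#include <stdio.h>
#include <string.h>

static int get_hypervisor_signature(char sig[13])
{
    unsigned int eax, ebx, ecx, edx;

    /* Leaf 1: ECX bit 31 is the "running under a hypervisor" bit. */
    __cpuid(1, eax, ebx, ecx, edx);
    if (!(ecx & 0x80000000U))
        return 0;   /* bare metal */

    /* Leaf 0x40000000: vendor signature in EBX:ECX:EDX. */
    __cpuid(0x40000000U, eax, ebx, ecx, edx);
    memcpy(sig + 0, &ebx, 4);
    memcpy(sig + 4, &ecx, 4);
    memcpy(sig + 8, &edx, 4);
    sig[12] = '\0';
    return 1;
}

int main(void)
{
    char sig[13];

    if (get_hypervisor_signature(sig))
        printf("hypervisor: %s\n", sig);   /* e.g. "KVMKVMKVM" on KVM */
    else
        printf("bare metal\n");
    return 0;
}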

Comment 1 Václav Pavlín 2012-12-07 10:04:07 UTC
Hi,
Why is this a biosdevname issue? I went through the thread, but I am not sure how the random naming is related to biosdevname; biosdevname should terminate itself if it is running in a virtual machine. Is the problem you see that it terminates, or that it doesn't terminate? This was not clear to me from the thread.

Michael, if you are sure it is a biosdevname problem, please provide the patch and I will apply it.

Thanks,
Vaclav

Comment 2 Michael S. Tsirkin 2012-12-07 12:35:26 UTC
I think the problem is that it terminates.
Why does it want to terminate on a VM?
The reason to terminate is apparently that VMware does not
emulate a PCI bus, so biosdevname makes no sense there? But KVM does
emulate PCI with consistent addresses, the same as a physical machine.
Generally I see two possible approaches:
- detect a pci device and do not terminate even on vm
- detect kvm and do not terminate even on vm

Thoughts?

Comment 3 jordan hargrave 2012-12-10 16:14:07 UTC
KVM guests do not have the knowledge (SMBIOS table, etc.) of the physical-machine-to-guest hardware mapping needed to know whether a NIC is really embedded or an add-in card.  biosdevname should just terminate if running in a guest OS, and all NICs should be named ethX.  Unfortunately it appears the CPUID instruction's ECX bit is not correctly implemented in some CPUs/kernels.

Comment 4 Michael S. Tsirkin 2012-12-10 16:21:20 UTC
Apparently VMware does not have a BIOS? KVM certainly does,
so it can provide all the necessary information to guests.
There is no reason that I can see to special-case KVM guests.

What happens on the physical machine seems completely irrelevant:
guests run on virtual machines not physical ones.

Comment 5 Michael S. Tsirkin 2012-12-10 16:36:15 UTC
And yes, all KVM releases set the hypervisor bit in CPUID.

Comment 6 Paolo Bonzini 2012-12-12 08:27:09 UTC
> KVM guests do not have the knowledge (SMBIOS table, etc.) of the 
> physical-machine-to-guest hardware mapping needed to know whether a NIC is 
> really embedded or an add-in card.

Guest hardware is never embedded; in KVM it can always be treated as a PCI add-in card, even if it is the passthrough of the host's embedded NIC.

> biosdevname should just terminate if running in a guest OS and all NICs should 
> be named ethX.

Wrong.  biosdevname should just check for VPD as usual.  It won't find it as of now, but if one day we implement it *in the host* the guest will get it for free.

Additionally, if a VPD-enabled network card is passed to the guest with PCI device assignment, it should get stable names *even now*.

Comment 7 Narendra K 2012-12-12 18:16:25 UTC
Biosdevname requires SMBIOS type 9 records to name PCI add-in adapters (and SMBIOS type 41 records to name onboard network interfaces). In the absence of type 9 records, it looks for the slot number from 'SltCap' (the PCI Express slot capabilities register).

On a Fedora 17 guest installed on a Fedora 17 host/hypervisor, it seemed like the guest BIOS did not have SMBIOS type 9 records (dmidecode -t 9 did not show any data). I also assigned a PCI device from the host to the guest, but it did not result in a corresponding type 9 record being created.

Biosdevname uses Vital Product Data to retrieve NIC Partition information on adapters which are NIC Partition capable.
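
(For reference, on a physical machine that does expose slot records, 'dmidecode -t 9' prints entries along these lines; the values here are illustrative:)

Handle 0x0009, DMI type 9, 17 bytes
System Slot Information
        Designation: PCIe Slot 2
        Type: x8 PCI Express
        Current Usage: In Use
        Length: Long
        ID: 2
        Characteristics:
                3.3 V is provided
        Bus Address: 0000:03:00.0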

Comment 8 Michael S. Tsirkin 2012-12-12 21:55:56 UTC
To comment 7:
First, unfortunately biosdevname bails out early after seeing it's
a VM, so it will not check any of this. It's a biosdevname bug and
I don't know why it does this; probably some VMware workaround.
Yes, current VMs by default don't have this BIOS record, but
using the command line one can add SHPC, which gives you
a PCI slot capability (this is what you mean by SltCap, yes?).
Also, if you assign a NIC with VPD, you can
get VPD today.


Second, once biosdevname has failed, udev should make more of an effort
to make the device name persistent based on the MAC address,
like it did for RHEL.
This was not done on the assumption that the case is uncommon, but it is
common for VMs. This second bit needs a udev BZ though.
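
(For context, the kind of MAC-based rule that udev generated on RHEL looks like this; the MAC address below is made up:)

# /etc/udev/rules.d/70-persistent-net.rules
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="52:54:00:12:34:56", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"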

Comment 9 Paolo Bonzini 2012-12-13 13:20:52 UTC
> Biosdevname requires SMBIOS type 9 records to name PCI add-in adapters

This can be added.  But the slot id in type 9 records would have nothing to do with the slot id of the host machine; it would just be a number from 0 to 31, equal to the PCI device number.

The question is _how_ do you want the persistent names to be built?  Should they match physical slots of the host, and thus work only for assigned devices, or should the guest just make them up for the sake of persistent naming?  Either choice has a tradeoff.  The latter is easy but it may look strange in the presence of an assigned device.

In any case, note that it is basically impossible to take an assigned PCI device that is "emNN" on the host and make it "emNN" in the guest too, because the device may come and go at any time after boot, long after SMBIOS type 41 records have been prepared.

> a pci slot capability (this is what you mean by SltCap yes?).

No, he means the physical slot number from the PCI Express capability.

Comment 10 Matt Domsch 2012-12-13 14:13:22 UTC
Michael - bailing out when it sees it's running in a VM may be considered a bug, I suppose, but it was quite intentional (no VMware conspiracy theories needed), and a quick 'git blame' shows I added that code in Feb 2011 (nearly 2 years ago).

At the time, there were no VM BIOSes (KVM, Xen, VMware, VirtualBox, Hyper-V, ...) that exposed SMBIOS type 9 (slot) or type 41 (embedded) device structures, nor was there any SltCap data (and back then, biosdevname wasn't reading the PCI Express capability field either).

The only fallback we had was the PCI IRQ Routing Table, which also tended to have arbitrary values in it for the slots.  This was causing "eth0" devices to appear in slot 123, which is completely non-intuitive for a VM with a single virtual NIC.

So, rather than present non-intuitive names for devices (the common case being a single NIC), and absent a better way to get information from the VM BIOS across all the different virt platforms, I punted and put in the running_in_vm() check.

If KVM can expose the SMBIOS and/or PCI SltCap data now, and that's reliable and reproducible, it's fair to reconsider.

As for udev persistent naming based on MAC address - that's exactly what biosdevname intends to avoid. If I move a card from Slot 1 to Slot 2, the name should move from p1p1 to p2p1, not retain the same name because some secondary lookup in udev forced it back to p1p1, which would then cause a renaming collision with the device biosdevname thinks is p1p1, and we'd be right back where we started this mess.

Comment 11 Paolo Bonzini 2012-12-17 16:41:17 UTC
> The only fallback we had was the PCI IRQ Routing Table, which too tended to 
> have arbitrary values in them for the slots.  This was causing the "eth0" 
> devices to appear in slot 123, which is complete non-intuitive for a VM with a 
> single virtual NIC.

What virt platform was this on?

Comment 12 Michael S. Tsirkin 2012-12-17 16:59:28 UTC
To comment 10:
OK that's good to know.

But it seems misguided: how are VMs special here?
Any box without SMBIOS and with a single PCI NIC should be handled in
exactly the same way as a VM, IMHO.
Also, as others pointed out, KVM is able to expose VPD and
SltCap for assigned devices, and if it does, biosdevname can use them, right?

You also see this issue:
> a renaming collision with the device biosdevname thinks is p1p1
However, we should be able to detect that biosdevname bailed out,
and therefore no collision is going to occur.
Besides, udev names used a scheme different from biosdevname's,
so no conflict seems possible.

Anyway, there's now a regression on old hypervisors which we can't fix
retroactively. Names changing across reboots is a bad experience.
What is your suggestion for a fix?

Comment 13 jordan hargrave 2013-01-06 09:39:45 UTC
Boxes without SMBIOS won't use biosdevname in Red Hat... it is disabled for systems that aren't running SMBIOS 2.5+.

I still think that guests should report ethX instead of emX/pX names, as the name in the guest may not relate to a specific NIC on the hypervisor.  If it is causing problems, then use biosdevname=0 on the kernel command line during installation of a guest.

Comment 14 Michael S. Tsirkin 2013-01-07 17:58:23 UTC
> the name in the guest may not relate to a specific NIC on the hypervisor.

E.g. if the NIC has VPD then it does.

Comment 15 Michael S. Tsirkin 2013-01-07 18:13:06 UTC
also
> Boxes without SMBIOS won't use biosdevname in Red Hat... it is disabled for systems that aren't running SMBIOS 2.5+.
> I still think that guests should report ethX instead of emX/pX names, as the name in the guest may not relate to a specific NIC on the hypervisor.  If it is causing problems, then use biosdevname=0 on the kernel command line during installation of a guest.

This will not help fix the issue.
The issue is that names are now unstable.
Fedora used to rely on the MAC to make names stable.
Now it uses biosdevname, but if that fails, the device gets a random,
non-stable name.

This makes Fedora unusable as a guest if one hot-unplugs/hot-plugs
the NIC a lot.
Since biosdevname detects a hypervisor and refuses to give a
stable name, there is nothing a hypervisor can do.
> Boxes without SMBIOS won't use biosdevname in Red Hat...
So there is no reason to blacklist a hypervisor then:
it does not have an SMBIOS.

Comment 16 Narendra K 2013-01-08 09:55:59 UTC
(In reply to comment #15)
> > If it is causing problems, then use biosdevname=0 on the kernel command line
> > during installation of a guest.
> 
> this will not help fix the issue.
> The issue is that names are now unstable.
> Fedora used to rely on the mac to make names stable.
> Now it uses biosdevname but if that fails, a random
> non stable name.
> 

On a host, when 'biosdevname=0' is passed, the '/etc/udev/rules.d/70-persistent-net.rules' file is generated, and it ensures that names are persistent across future reboots. In the guest, it looks like '70-persistent-net.rules' is not generated, which seems like an issue.

Comment 17 Michael S. Tsirkin 2013-01-08 13:50:01 UTC
Further, this should be automatic in the guest, without any need to tweak the kernel command line.

Comment 18 Narendra K 2013-01-09 10:34:58 UTC
In the guest, 'biosdevname=0' need not be passed, as biosdevname exits without suggesting any name (the current behavior). As a result, interfaces are named ethN, and the expected behavior is the generation of '/etc/udev/rules.d/70-persistent-net.rules' to ensure persistence of names across future reboots.

It seems like an issue with udev, then, which needs to be addressed in the udev scripts that generate 70-persistent-net.rules?

Comment 19 Michael S. Tsirkin 2013-01-09 12:10:31 UTC
I think there are two issues:
- If SMBIOS or VPD names a device, biosdevname should use that even in a VM.
  This will let the hypervisor control device naming in the future,
  making the names in guest and host match.

- A udev issue in case there's no SMBIOS and no VPD.
For the second issue, clone this BZ to udev?

Comment 20 jordan hargrave 2013-01-21 20:06:28 UTC
This is a WONTFIX for biosdevname... there's just no way of knowing which NIC the ethX device in the guest actually refers to on the physical hardware.

What happens if you dynamically add a NIC in QEMU?  The QEMU BIOS will already have built its type 9/41 structures at boot time; these are not dynamic.

biosdevname should always be disabled in VMs... if it is not, it is due to a bug/feature in CPUID that isn't reporting the hypervisor properly. I can add another test to disable on KVM as well, if there is a way to determine this from within the guest?

Comment 21 jordan hargrave 2013-01-21 20:07:32 UTC
The only option would be to name NICs something else: vm0, vm1, etc.

Comment 22 Michael S. Tsirkin 2013-01-21 22:39:29 UTC
If a NIC has VPD, then that is a very good, reliable way to know what
the NIC refers to on the physical hardware.
What you say about hotplug applies to actual hardware as well.
In other words, there is no real reason to special-case
VMs that I can see.

Comment 23 Paolo Bonzini 2013-01-24 09:41:24 UTC
> What happens if you dynamically add a NIC once in QEMU?  QEMU BIOS will 
> already have built type 9/41 structures at boot time, these are not dynamic.

You can create type 9/41 structures for empty slots, and populate the corresponding PCI slots later with hotplug.
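
(Much later, QEMU grew an option along these lines; purely as a sketch of the idea, assuming the -smbios type=41 support of newer QEMU, with made-up device ids:)

qemu-system-x86_64 \
    -netdev user,id=net0 \
    -device virtio-net-pci,netdev=net0,id=nic0 \
    -smbios type=41,designation='Onboard NIC 1',kind=ethernet,instance=1,pcidev=nic0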

Comment 24 Fedora End Of Life 2013-04-03 16:28:35 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could also affect pre-Fedora 19 development-cycle
bugs. We are very sorry. It will help us with cleanup during the Fedora 19 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19

Comment 25 Michael S. Tsirkin 2013-06-20 06:53:55 UTC
To Comment 10:
> So, rather than present non-intuitive names for devices (the common case
> being a single NIC), and absent a better way to get information
> from the VM BIOS across all the different virt platforms, I punted
> and put in the running_in_vm() check.

So this makes it impossible for the hypervisor to fix the problem by exposing better information, even if it wanted to.

Comment 26 Michael S. Tsirkin 2013-06-20 07:22:16 UTC
Also, would it work to simply fail if there are
no known NICs, instead of checking whether biosdevname is running on a HV?
This way a HV could fix the problem by updating its BIOS or whatever.
E.g. like the patch below (also attached):

diff -rup biosdevname-0.4.1-bak/src/bios_dev_name.c biosdevname-0.4.1/src/bios_dev_name.c
--- biosdevname-0.4.1-bak/src/bios_dev_name.c   2013-06-20 10:09:15.989303465 +0300
+++ biosdevname-0.4.1/src/bios_dev_name.c       2013-06-20 10:19:36.593337106 +0300
@@ -153,8 +153,6 @@ int main(int argc, char *argv[])
 
        if (!running_as_root())
                exit(3);
-       if (running_in_virtual_machine())
-               exit(4);
        cookie = setup_bios_devices(opts.namingpolicy, opts.prefix);
        if (!cookie) {
                rc = 1;
diff -rup biosdevname-0.4.1-bak/src/naming_policy.c biosdevname-0.4.1/src/naming_policy.c
--- biosdevname-0.4.1-bak/src/naming_policy.c   2013-06-20 10:09:15.988303465 +0300
+++ biosdevname-0.4.1/src/naming_policy.c       2013-06-20 10:18:34.041333715 +0300
@@ -13,7 +13,7 @@
 #include "state.h"
 #include "dmidecode/dmidecode.h"
 
-static void use_all_ethN(const struct libbiosdevname_state *state)
+static int use_all_ethN(const struct libbiosdevname_state *state)
 {
        struct bios_device *dev;
        unsigned int i=0;
@@ -26,9 +26,10 @@ static void use_all_ethN(const struct li
                        dev->bios_name = strdup(buffer);
                }
        }
+       return 0;
 }
 
-static void use_physical(const struct libbiosdevname_state *state, const char *prefix)
+static int use_physical(const struct libbiosdevname_state *state, const char *prefix)
 {
        struct bios_device *dev;
        char buffer[IFNAMSIZ];
@@ -37,6 +38,7 @@ static void use_physical(const struct li
        char interface[IFNAMSIZ];
        unsigned int portnum=0;
        int known=0;
+       int status=-1;
        struct pci_device *vf;
 
        list_for_each_entry(dev, &state->bios_devices, node) {
@@ -88,9 +90,11 @@ static void use_physical(const struct li
                        if (known) {
                                snprintf(buffer, sizeof(buffer), "%s%s%s", location, port, interface);
                                dev->bios_name = strdup(buffer);
+                               status = 0;
                        }
                }
        }
+       return status;
 }
 
 
@@ -99,11 +103,11 @@ int assign_bios_network_names(const stru
        int rc = 0;
        switch (policy) {
        case all_ethN:
-               use_all_ethN(state);
+               rc = use_all_ethN(state);
                break;
        case physical:
        default:
-               use_physical(state, prefix);
+               rc = use_physical(state, prefix);
                break;
        }
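
(Reading the patch: with the running_in_virtual_machine() check removed, use_physical() now returns nonzero unless at least one device received a persistent name, so the test "am I in a VM?" is effectively replaced by "did any device get a stable name?".)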

Comment 27 Michael S. Tsirkin 2013-06-20 07:23:35 UTC
Created attachment 763294 [details]
patch disabling running-on-hypervisor check, instead check any devices have persistent names

Comment 28 Václav Pavlín 2013-07-10 09:21:26 UTC
Jordan, could you please check Michael's patch?

Comment 29 jordan hargrave 2013-08-08 16:03:01 UTC
VPD still doesn't tell us what slot number the device is in; it only tells us the port number, even if VPD is implemented in KVM.  I don't think this patch is going to work.

Comment 30 Michael S. Tsirkin 2013-08-08 18:55:40 UTC
General rule: everything that real hardware implements, KVM can implement.
Even if it doesn't today, this does not mean a feature does not make sense
in a VM; it just means the emulation is not perfect.
So please don't put the logic "if I'm in a VM, assume xyz is not
implemented" anywhere.
Always code it as a direct check for "xyz is not implemented".

Comment 31 Fedora End Of Life 2015-01-09 17:30:56 UTC
This message is a notice that Fedora 19 is now at end of life. Fedora 
has stopped maintaining and issuing updates for Fedora 19. It is 
Fedora's policy to close all bug reports from releases that are no 
longer maintained. Approximately 4 (four) weeks from now this bug will
be closed as EOL if it remains open with a Fedora 'version' of '19'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 19 reached end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed, as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 32 Fedora End Of Life 2015-02-17 14:36:22 UTC
Fedora 19 changed to end-of-life (EOL) status on 2015-01-06. Fedora 19 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.