Bug 1804207

Summary: Libguestfs relies on /dev/sdX device enumeration order, kernel no longer enumerates them in order
Product: [Community] Virtualization Tools
Reporter: Richard W.M. Jones <rjones>
Component: libguestfs
Assignee: Richard W.M. Jones <rjones>
Status: CLOSED UPSTREAM
Severity: unspecified
Priority: unspecified
Version: unspecified
CC: lersek, mplch, ptoscano, tburke, yoguo
Hardware: Unspecified
OS: Unspecified
Last Closed: 2020-03-12 10:14:17 UTC
Type: Bug
Bug Blocks: 1785415

Description Richard W.M. Jones 2020-02-18 12:57:38 UTC
Description of problem:

In libguestfs, it's an unfortunate (in hindsight) ABI that we rely on
the order that /dev/sdX devices are enumerated being identical to the
order that the devices appear in the libvirt XML.  For example, that
the first device appears as /dev/sda, the second as /dev/sdb and so
on.

This was true until fairly recently.  What changed (in the Linux kernel)
is that it now probes devices asynchronously:

- [drivers] driver core: Probe devices asynchronously instead of the driver
  (Jeff Moyer) [1724965]
- [drivers] device core: Consolidate locking and unlocking of parent and device
  (Jeff Moyer) [1724965]
- [drivers] driver core: Establish order of operations for device_add and
  device_del via bitflag (Jeff Moyer)
- [drivers] driver core: Add missing dev->bus->need_parent_lock checks
  (Jeff Moyer) [1724965]
- [drivers] driver core: Move async_synchronize_full call (Jeff Moyer) [1724965]

This means that devices are no longer created in order, so the
/dev/sdX name assigned to a given disk can change from boot to boot.

This also affects supermin when it's looking for the root
device (see bug 1803191 for an example).  I don't know yet if
we should file a separate bug for supermin.

Version-Release number of selected component (if applicable):

libguestfs 1.41.8

How reproducible:

Quite infrequent, but easier to reproduce if you add a large
number of disks (e.g. > 100).

Steps to Reproduce:

We actually have a regression test that picks this up, see:

https://github.com/libguestfs/libguestfs/blob/56834875b25a604983b1aa90b15a01e6cc22c9bc/tests/disks/test-add-disks.c#L311

(Thanks Vitaly Kuznetsov for bug analysis)

Comment 1 Richard W.M. Jones 2020-02-18 12:59:47 UTC
*** Bug 1803191 has been marked as a duplicate of this bug. ***

Comment 2 Richard W.M. Jones 2020-02-20 14:52:06 UTC
First part of the fix is:
https://www.redhat.com/archives/libguestfs/2020-February/msg00220.html

This isn't quite the whole story.  It does appear that we will need to
either modify supermin or else modify libguestfs to supply the
root=UUID=XXX parameter to supermin.  See also this commit in supermin:
https://github.com/libguestfs/supermin/commit/cd5281beed0af7b57473e36f6fa275eaecde4f09