2038105 – RFE: Deliver virt-p2v as a UEFI bootable image

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 2038105 - RFE: Deliver virt-p2v as a UEFI bootable image

Summary: RFE: Deliver virt-p2v as a UEFI bootable image

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Enterprise Linux 9
Classification:	Red Hat
Component:	virt-p2v
Sub Component:
Version:	9.1
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	low
Target Milestone:	rc
Target Release:	---
Assignee:	Laszlo Ersek
QA Contact:	mxie@redhat.com
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1507872 (view as bug list)
Depends On:
Blocks:	1609618 1945835
TreeView+	depends on / blocked

Reported:	2022-01-07 10:05 UTC by tingting zheng
Modified:	2022-09-02 10:05 UTC (History)
CC List:	13 users (show)
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-09-02 10:05:03 UTC
Type:	Feature Request
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	1507872	1	medium	CLOSED	Physical machine can't boot into virt-p2v client when machine's BIOS is on UEFI mode	2022-01-07 14:39:34 UTC
Red Hat Bugzilla	1647915	1	None	None	None	2022-01-13 10:15:41 UTC
Red Hat Issue Tracker	RHELPLAN-107083	0	None	None	None	2022-01-07 10:10:19 UTC
Red Hat Issue Tracker	RHELPLAN-118637	0	None	None	None	2022-04-12 10:17:38 UTC

Description tingting zheng 2022-01-07 10:05:56 UTC

Description of problem:
RFE: Deliver virt-p2v as a UEFI binary.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Use virt-p2v-make-disk to make UEFI boot image into usb.
2. Boot virt-p2v client from image created from step 1.
3.

Actual results:


Expected results:
Virt-p2v client can be booted successfully.

Additional info:

Comment 2 Laszlo Ersek 2022-01-07 10:37:53 UTC

https://listman.redhat.com/archives/libguestfs/2022-January/msg00021.html

Comment 3 mxie@redhat.com 2022-01-07 13:16:06 UTC

There is a similar bug 1507872

Comment 4 Laszlo Ersek 2022-01-07 14:22:35 UTC

(In reply to mxie from comment #3)
> There is a similar bug 1507872

Ming Xie, that's a good catch, and I'm tempted to close this one as a duplicate of bug 1507872. However, both BZs target different major releases: RHEL8 vs. RHEL9, so closing this one immediately might not be good.

(Rich indicated previously in <20220107110827.GJ1127> that building virt-p2v on an older RHEL major release had benefits, so closing this BZ as a duplicate (or even as "WONTFIX"?) could be just right...)

Comment 5 Richard W.M. Jones 2022-01-07 14:32:59 UTC

*** Bug 1507872 has been marked as a duplicate of this bug. ***

Comment 6 Laszlo Ersek 2022-01-11 16:25:42 UTC

I don't think that this can (or *should*) be solved exclusively within "virt-p2v-make-disk":

- "virt-p2v-make-disk" is a wrapper around virt-builder. Turning a BIOS guest into a UEFI guest (after installation) is an unholy mess (which could potentially require re-partitioning -- the addition of the UEFI system partition!)

- As an example, "virt-builder --notes fedora-32" references "builder/templates/make-template.ml" from the guestfs-tools project, as the program that generated the base disk image.

- One step of "builder/templates/make-template.ml" (in guestfs-tools) is to invoke virt-install. This step is customized by "needs_uefi" (indirectly). Currently, "needs_uefi" is determined automatically, and based solely on the guest architecture -- ATM, only Aarch64 and Armv7 set "needs_uefi" to true, and everything else gets "false".

So, IMO, we first need to extend virt-builder so that it permit the user to force UEFI, with a command line option. In "make-template.ml", we also need to extend "make_index_fragment", to save the "UEFI-based or not" statement in a new, standalone key, in the index fragment. Perhaps the "UEFI-basedness" needs to be captured in the filename of the generated base image, too, using some automatic scheme.

Then we can build some UEFI templates (adding them to the public index).

Then virt-builder needs to be extended with a new flag, so that it *look* for a UEFI-based variant of a particular base disk (using the new key in the index fragments). Alternatively, perhaps this searching must refer only to the new filename scheme.

Finally, we should extend "virt-p2v-make-disk" with a new option. When that option is specified, the shell script shall invoke virt-builder such that virt-builder fetch a UEFI-based image, for starting out.

Rich: how does this sound? Thanks!

Comment 7 Richard W.M. Jones 2022-01-11 17:24:45 UTC

The bigger issue is that we don't expect RHEL users to build virt-p2v
themselves, we ship a selection of prebuilt images.  So what this bug is
really asking for is to prebuild an image that works on UEFI.  How we
do that is not really material for fixing this bug, so the bug description
shouldn't talk about virt-p2v-make disk at all.

_If_ we were to use virt-builder, then it does actually support UEFI
already -- eg. the aarch64 images are UEFI-based.  Nothing needs to
change in virt-builder.

But I don't think we would use it for this, we'd probably use whatever
is the current replacement for "brew spin-livecd" (which used the now
obsolete livecd-creator which was driven from a kickstart file).  I don't
know what the replacement is.  Image Builder maybe?

Comment 19 Laszlo Ersek 2022-03-02 16:36:53 UTC

Some findings:

(1) The "composer-cli" front-end can not connect to a remote osbuild-composer backend, only to a locally running systemd service. Networked access (to a hosted image builder) seems only possible via the web console that's provided by the "cockpit-composer" package -- and then the access means "web console", not CLI.

(2) The osbuild-composer package (= the image builder back-end) in Fedora 35 does not support the "image-installer" (= .iso) output format (as reported by "composer-cli compose types"). This is consistent with the "internal/distro/fedora33/distro.go" file in the upstream osbuild-composer repo, at commit 5e291bd98f79. The only distros for which the "image-installer" output seems to be enabled are:

rhel85/distro.go
rhel86/distro.go
rhel90/distro.go
rhel90beta/distro.go
test_distro/distro.go

Time to install a dedicated RHEL9 VM for this I guess. (Truth to be told, the documentation explicitly recommends this, I just thought I wouldn't need to split my development environment in yet more parts...)

Comment 20 Laszlo Ersek 2022-03-02 17:06:10 UTC

Can't seem to get this working on RHEL9 thus far (from the Nightly repo):

# dnf install osbuild-composer composer-cli
# systemctl enable --now osbuild-composer.socket

# systemctl status osbuild-composer.socket
active (listening)

# composer-cli compose types
ERROR: Types Error: Get "http://localhost/api/v1/compose/types": read unix @->/run/weldr/api.socket: read: connection reset by peer

# systemctl status osbuild-composer.socket
failed
osbuild-composer.socket: Failed with result 'service-start-limit-hit'

It was literally the first and only one request I tried to send. :/

osbuild-composer-45-1.el9.x86_64
weldr-client-35.5-2.el9.x86_64

The journal provides some more info: "Error initializing weldr API: host distro is not supported".

I think I'm stopping for (at least) today.

Comment 22 Laszlo Ersek 2022-03-03 11:44:28 UTC

The /etc/os-release file in my RHEL9 (nightly) guest identifies as "9.1". Based on the distro registry in the osbuild-composer git repo ATM, I think that's the problem -- 9.1 is not known by osbuild-composer yet.

Comment 23 Laszlo Ersek 2022-03-03 13:03:41 UTC

Managed to get "composer-cli compose types" working (and to get it print "image-installer"). These are the repo URLs to use:

http://download.devel.redhat.com/rhel-9/nightly/RHEL-9/latest-RHEL-9.0/compose/BaseOS/x86_64/os/
http://download.devel.redhat.com/rhel-9/nightly/RHEL-9/latest-RHEL-9.0/compose/AppStream/x86_64/os/

Of importance are:
- "latest-RHEL-9.0". Not "latest-RHEL-9"! (cf. comment 22)
- "nightly". "rel-eng" is insufficient: the latest 9.0 rel-eng compose does not contain weldr-client.

In other words, "rel-eng 9.0" is too old, "nightly 9.1" is too new, and "nightly 9.0" is the only one that works.

Comment 24 Laszlo Ersek 2022-03-03 13:29:48 UTC

The documentation <https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9-beta/html/composing_a_customized_rhel_system_image/index> does not explain that, after creating repository overrides under /etc/osbuild-composer/repositories, the osbuild-composer service needs to be stopped with systemctl (so that the next composer-cli invocation restart it via socket activation). Without this, the overrides do not take effect.

Thus far I've managed to submit a RHEL9 compose:

[root@osbuild-composer ~]# cat /etc/osbuild-composer/repositories/rhel-90.json 
{
    "x86_64": [
        {
            "name": "baseos",
            "baseurl": "http://download.devel.redhat.com/rhel-9/nightly/RHEL-9/latest-RHEL-9.0/compose/BaseOS/x86_64/os/",
            "check_gpg": false
        },
        {
            "name": "appstream",
            "baseurl": "http://download.devel.redhat.com/rhel-9/nightly/RHEL-9/latest-RHEL-9.0/compose/AppStream/x86_64/os/",
            "check_gpg": false
        }
    ]
}

[root@osbuild-composer ~]# systemctl stop osbuild-composer

[root@osbuild-composer ~]# cat virt-p2v.toml
name = "virt-p2v"
description = "virt-p2v boot media"
version = "0.0.1"
packages = []
modules = []
groups = []
distro = "rhel-90"

[root@osbuild-composer ~]# composer-cli blueprints push virt-p2v.toml
[root@osbuild-composer ~]# rm virt-p2v.toml

[root@osbuild-composer ~]# composer-cli compose start virt-p2v image-installer
Compose 9cb7e545-99ac-4148-81cf-e0a001f03525 added to the queue

[root@osbuild-composer ~]# composer-cli compose status
9cb7e545-99ac-4148-81cf-e0a001f03525 RUNNING  Thu Mar 3 14:19:54 2022 virt-p2v        0.0.1 image-installer  

According to the docs, the ready-made image will be downloadable with 

[root@osbuild-composer ~]# composer-cli compose image    9cb7e545-99ac-4148-81cf-e0a001f03525
[root@osbuild-composer ~]# composer-cli compose logs     9cb7e545-99ac-4148-81cf-e0a001f03525
[root@osbuild-composer ~]# composer-cli compose metadata 9cb7e545-99ac-4148-81cf-e0a001f03525

I've not found a command for watching the compose progress interactively.

Comment 26 Laszlo Ersek 2022-03-03 13:54:45 UTC

From browsing the documentation, I'm getting the following impression:

- we'll have to add the virt-p2v systemd service through a separate
  subpackage of virt-p2v

- according to comment 17, the service should be enabled as vendor
  preset at once (or we can modify the blueprint to enable it;
  [customizations.services])

- I think there is no way to pass an RPM on the command line, so we'll
  have to expose it through a dedicated / separate, networked repository

- this repository will have to be enabled (exposed to osbuild-composer)
  as a repository override, in
  "/etc/osbuild-composer/repositories/rhel-90.json"

- the blueprint will have to specify the subpackage by name, in
  [[packages]]


Beyond that, I expect there could be a conflict between the virt-p2v
service (which we want to execute on the bare metal host, from the ISO,
after boot), and Anaconda. Chapter 7. ("CREATING A BOOT ISO INSTALLER
IMAGE WITH IMAGE BUILDER") says:

> You can use Image Builder to create bootable ISO Installer images.
> These images consist of a tarball that contains a root file system.
> You can use the bootable ISO image to install the file system to a
> bare metal server.
>
> Image Builder builds a manifest that creates a boot ISO that contains
> the commit and a root file system.
>
> To create the ISO image, choose the new image type image-installer.
> Image Builder builds a .tar file, which contains:
> - a standard Anaconda installer ISO
> - an embedded RHEL system tarball
> - a default kickstart file that installs the commit with minimal
>   default requirements
>
> The created installer ISO image embeds a pre-configured system image
> that you can install directly to a bare metal server.

This is definitely not what we want -- we don't want Anaconda to even
start. It can be placed on the ISO (its presence is no issue), but the
virt-p2v service is what should be launched, not Anaconda.

... If we can identify the systemd service that starts Anaconda, we
could actually try disabling it, in [customizations.services]!

... Anyway my first image seems to have been built, so next, I'll
attempt to boot it in a virtual machine, and see what it does.

Comment 28 Christian Kellner 2022-03-03 14:07:35 UTC

(In reply to Laszlo Ersek from comment #26)
> Beyond that, I expect there could be a conflict between the virt-p2v
> service (which we want to execute on the bare metal host, from the ISO,
> after boot), and Anaconda. Chapter 7. ("CREATING A BOOT ISO INSTALLER
> IMAGE WITH IMAGE BUILDER") says:

I think what we need to do is create a new "image type" in 
  https://github.com/osbuild/osbuild-composer/tree/main/internal/distro/rhel90
based on something like 
  https://github.com/osbuild/osbuild-composer/blob/a351a17f5ecb96782b81c8af2eb168899767ef21/internal/distro/rhel90/distro.go#L1332
with a modified version of
  https://github.com/osbuild/osbuild-composer/blob/f0522e758b34c21aed56327b71142dc9c06c17d9/internal/distro/rhel90/pipelines.go#L215

Comment 31 Laszlo Ersek 2022-03-10 13:49:07 UTC

(In reply to Laszlo Ersek from comment #24)

> [root@osbuild-composer ~]# composer-cli compose image 9cb7e545-99ac-4148-81cf-e0a001f03525

This installer image boots, and contains a small / stripped down Anaconda instance.

The OS that it installs does not boot correctly ("can't mount root filesystem"). (Not that we need that, as we want virt-p2v rather than Anaconda; I just figured I'd test it given that I had already booted the installer ISO...)

Comment 33 Laszlo Ersek 2022-04-12 09:31:51 UTC

https://github.com/livecd-tools/livecd-tools

Comment 45 Laszlo Ersek 2022-04-22 08:34:10 UTC

It's a bug in livecd-creator.

At some point, livecd-creator switched to lazy unmounting (in case
normal umount fails). As far as I understand the git log, this was done
in order to mitigate the exact "nasty failure modes" that you documented
in the make-kickstart manual, for example, leaving the system running
livecd-creator with loop mounts hanging around, in case livecd-creator
is interrupted or it fails for some reason.

Unfortunately, this introduces an actual bug.

I've done two things. First, I've built a new ISO. I monitored the
install_root directory while livecd-creator was rnning. All the
base64-decoded files were created correctly *in that directory*.
However, livecd-creator, in the other terminal, complained about having
to lazy-unmount the filesystem.

That gave me an idea: I reviewed the log from comment 42 (where I
certainly didn't touch the install_root directory while livecd-creator
was running). And that log describes the same issue:

> umount: /var/tmp/imgcreate-s0a1sw7w/install_root/var/cache/dnf: target is busy.
> Unable to unmount /var/tmp/imgcreate-s0a1sw7w/install_root/var/cache/dnf normally, using lazy unmount
> lazy umount succeeded on /var/tmp/imgcreate-s0a1sw7w/install_root/var/cache/dnf
> Unmounting directory /var/tmp/imgcreate-s0a1sw7w/install_root
> umount: /var/tmp/imgcreate-s0a1sw7w/install_root: target is busy.
> Unmounting directory /var/tmp/imgcreate-s0a1sw7w/install_root failed, using lazy umount
> lazy umount succeeded on /var/tmp/imgcreate-s0a1sw7w/install_root
> Losetup remove /dev/loop0
> Checking filesystem /var/tmp/imgcreate-s0a1sw7w/tmp-68k6mkfd/ext3fs.img
> e2fsck 1.46.3 (27-Jul-2021) _livecd-p2v-x86_: recovering journal

It's nearly obvious in retrospect: we lose the contents of our
base64-decoded files because the filesystem is corrupted. Namely:

(1) Something keeps the "install_root" directory tree busy (could check
with "fuser -v"). Because of this, umount fails, so livecd-creator falls
back to "umount -l" (quite wrongly, in my opinion, as that's what masks
the actual issue!)

(2) "losetup -d" *succeeds* on /dev/loop0, even though the filesystem is
*not* unmounted just yet -- and that's because, ever helpfully, "losetup
-d" *default* to a lazy-detach! I got this in the losetup manual even on
RHEL7:

>   -d, --detach loopdev...
>          detach  the  file  or device associated with the specified
>          loop device(s). Note that since Linux v3.7 kernel uses "lazy
>          device destruction".  The detach operation does not return
>          EBUSY error  anymore  if device is actively used by system,
>          but it is marked by autoclear flag and destroyed later.

So whether or not the /dev/loop0 block device disappears at this point,
the backing file "ext3fs.img" is *still in use* when we advance to the
next step.

(3) e2fsck is started on "ext3fs.img", and the first thing e2fsck
reports is that the FS was not cleanly unmounted: "recovering journal".
That's most likely how we lose the base64-decoded stuff, as those are
the last writes to the filesystem (that we know of!) near the end of the
kickstart file.


I don't know if we can force "losetup -d" to *wait*, maybe that's not
possible anymore, but the lazy umount is totally a huge bug in
livecd-creator. "Gosh something I started is still using the filesystem,
who cares, let me just run fsck underneath".

I guess I'll try to set up livecd-creator from source and poke around in
it. The lazy umount should not be necessary at all. If some background
process is justifiedly using the install_root dir tree at the time, then
livecd-creator must wait for it.

Comment 47 Laszlo Ersek 2022-04-22 08:50:34 UTC

Livecd-tools commit 118092c8a287 ("Try normal umount before falling back to lazy umount.", 2010-09-11) confirms that the lazy-umount stuff is a last resort, and the related log messages are supposed to inform the user that something has gone wrong. Well in that case I don't understand why they just didn't leave livecd-creator to *FAIL* in the first place?

Earlier livecd-tools commit 73bd6807678f ("Do lazy umounts to reduce the impact of umount issues.", 2010-07-24) fingers bug 617844 as the original issue that lazy-umount was supposed to work around.

The latter commit also points at bug 509427, which describes about the same symptoms that we're now experiencing.

Comment 48 Laszlo Ersek 2022-04-22 08:51:15 UTC

My plan is to insert a "fuser -v" into livecd-creator...

Comment 49 Laszlo Ersek 2022-04-22 09:15:42 UTC

Known upstream bug: https://github.com/livecd-tools/livecd-tools/issues/221

Comment 50 Laszlo Ersek 2022-04-22 09:44:00 UTC

Suspect: DNF library bug that breaks livecd-creator.

I've added a time.sleep(3600) call to "imgcreate/fs.py" right after the first *eager* umount's failure, and when I hit that, I ran lsof.

The first umount failure refers to "/var/tmp/imgcreate-5k_smx6c/install_root/var/cache/dnf", and that directory is even bind-mounted into the "/var/tmp/imgcreate-5k_smx6c/install_root" tree (which is itself a loop-mounted ext4 fs).

At this point, lsof prints:

COMMAND PID TID TASKCMD USER FD  TYPE DEVICE SIZE/OFF   NODE NAME
python3 887             root  7r  REG    8,1 25320843 655370 /var/tmp/imgcreate-5k_smx6c/install_root/var/cache/dnf/fedora.solv (deleted)
python3 887             root  8r  REG    8,1 52531426 655389 /var/tmp/imgcreate-5k_smx6c/install_root/var/cache/dnf/fedora-filenames.solvx

I believe that these are DNF dependency solver work files. Because the PID in question (887) does not belong to a standalone dnf command, but to "livecd-creator", I think that livecd-creator uses a DNF library, and the failure to close the depsolver work files is in that library (or else, livecd-creator does not invoke the proper libsolver API to release the files).

Comment 51 Laszlo Ersek 2022-04-22 09:50:01 UTC

Would not be the first time where the DNF library forgets about references:
https://github.com/mitogen-hq/mitogen/issues/280
https://bugzilla.redhat.com/show_bug.cgi?id=1594016

Comment 52 Laszlo Ersek 2022-04-22 10:37:57 UTC

This is how the file descriptors get lost:

(stracing open,openat,dup2,dup,close):

openat(AT_FDCWD, "/var/tmp/imgcreate-k82ktf3s/install_root/var/cache/dnf/fedora.solv", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/var/tmp/imgcreate-k82ktf3s/install_root/var/cache/dnf/fedora-8d2b126b5dd5700c/repodata/repomd.xml", O_RDONLY) = 6
openat(AT_FDCWD, "/var/tmp/imgcreate-k82ktf3s/install_root/var/cache/dnf/fedora-8d2b126b5dd5700c/repodata/ab3b204e0ad65172b5de99e6077d3ff1379d63b955f98d5043eede346bb9dd3a-primary.xml.zck", O_RDONLY) = 7
close(6)                                = 0
close(7)                                = 0
openat(AT_FDCWD, "/var/tmp/imgcreate-k82ktf3s/install_root/var/cache/dnf/fedora.solv.1K06hS", O_RDWR|O_CREAT|O_EXCL, 0600) = 6
close(6)                                = 0
openat(AT_FDCWD, "/var/tmp/imgcreate-k82ktf3s/install_root/var/cache/dnf/fedora.solv.1K06hS", O_RDONLY) = 6
dup(6)                                  = 7
close(6)                                = 0
openat(AT_FDCWD, "/var/tmp/imgcreate-k82ktf3s/install_root/var/cache/dnf/fedora-filenames.solvx", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/var/tmp/imgcreate-k82ktf3s/install_root/var/cache/dnf/fedora-8d2b126b5dd5700c/repodata/2d456c5f8aed8acb7aa323deec76ba2d6e8b9cc1144aaad24f01fb0fecaab80d-filelists.xml.zck", O_RDONLY) = 6
close(6)                                = 0
openat(AT_FDCWD, "/var/tmp/imgcreate-k82ktf3s/install_root/var/cache/dnf/fedora-filenames.solvx.L4J0HM", O_RDWR|O_CREAT|O_EXCL, 0600) = 6
close(6)                                = 0
openat(AT_FDCWD, "/var/tmp/imgcreate-k82ktf3s/install_root/var/cache/dnf/fedora-filenames.solvx.L4J0HM", O_RDONLY) = 6
dup(6)                                  = 8
close(6)                                = 0

Comment 53 Laszlo Ersek 2022-04-22 12:58:23 UTC

First, let me explain the code path to the problematic dup() call. We
leak file descriptors to both "fedora.solv.1K06hS" and
"fedora-filenames.solvx.L4J0HM", but the outer parts of those call
chains are nearly identical, so it suffices to investigate only one.
This is with libdnf @ ddf5adfc31fb and libsolv @ 5e1ac6aaa91c.

So we start in libdnf "libdnf/libdnf/dnf-sack.cpp", either function
write_ext() -- for "fedora-filenames.solvx" -- or function write_main()
-- for "fedora.solv". Both functions follow the same high-level pattern.
I'll use write_main() here.

  write_main()                            [libdnf/libdnf/dnf-sack.cpp]
    mkstemp()
    fdopen("w+")
    fclose()
    /* [1] */
    fp = fopen("r")
    repo_add_solv(fp)                     [libsolv/src/repo_solv.c]
      repopagestore_read_or_setup_pages() [libsolv/src/repopage.c]
        dup()
    fclose(fp)

[1] The three lines above this mark are consistent with the
openat(O_RDWR|O_CREAT|O_EXCL) and the subsequent close() in comment#52.

Basically write_main() in libdnf calls repo_add_solv() from libsolv,
passing it the (FILE*) stdio stream that has the repository file open.

Here libdnf has absolutely no idea that internally to libsolv,
dup(fileno(fp)) is called, and the resultant file descriptor is stored
in a "Repopagestore.pagefd" field, for some performance-related black
magic (paging or whatnot, doesn't matter here).

Then, after repo_add_solv() returns to write_main(), libdnf calls
fclose(fp), and remains blissfully ignorant that the file *remains open*
in the same process.

The file descriptor should be closed when the "DNF Sack" object, derived
from GObject, is released:

dnf_sack_finalize()            [libdnf/libdnf/dnf-sack.cpp]
  pool_free()                  [libsolv/src/pool.c]
    pool_freeallrepos()        [libsolv/src/pool.c]
      repo_freedata()          [libsolv/src/repo.c]
        repodata_freedata()    [libsolv/src/repodata.c]
          repopagestore_free() [libsolv/src/repopage.c]
            close()

But in practice, it does not happen.

Comment 54 Laszlo Ersek 2022-04-22 13:31:38 UTC

dnf_sack_finalize() does get called, but store->pagefd in repopagestore_free() is (-1).

Comment 55 Laszlo Ersek 2022-04-22 15:52:47 UTC

OK, after debugging for hours with gdb, I think I'm closer to the bug in libdnf.

There are *three* calls altogether to dnf_sack_init(), and therefore to pool_create(). The pools created in order are:

#1: (Pool *) 0x555555df0ad0
#2: (Pool *) 0x555555d09320
#3: (Pool *) 0x555555ee3840

But pool_free() is only ever called for the last one (#3).

At that time, each of these pools has 1 repository attached (don't be misguided by the nrepos values below -- they are 1-based, not 0-based):

(gdb) print ((Pool *) 0x555555df0ad0)->nrepos
$103 = 2
(gdb) print ((Pool *) 0x555555d09320)->nrepos
$104 = 2
(gdb) print ((Pool *) 0x555555ee3840)->nrepos
$105 = 2

The repodata counts in these repositories differ. #1 has 2, #2 has none, #3 has 1 (again, these are 1-based values):

(gdb) print ((Pool *) 0x555555df0ad0)->repos[1]->nrepodata
$106 = 3
(gdb) print ((Pool *) 0x555555d09320)->repos[1]->nrepodata
$107 = 0
(gdb) print ((Pool *) 0x555555ee3840)->repos[1]->nrepodata
$108 = 2

Now, the next step would be to print the following field:

  ((Pool *) whatever)->repos[1]->repodata[N]->store.pagefd

where N ranges from 1 to 2 inclusive for Pool#1, it is skipped entirely for Pool#2 (no repodata at all), and N is 1 for Pool#3.

Unfortunately, due to some very tricky shenanigans with the "LIBSOLV_INTERNAL" macro, the repodata field (a pointer to an array) is actually hidden from GDB. I don't know how that happens. However, we can brute-force our way:


(gdb) print *(int *)((char*)(*(Repodata **)(((char *)((Pool *) 0x555555df0ad0)->repos[1]) + 80)) + 1 * 472 + 296)
$109 = 7
(gdb) print *(int *)((char*)(*(Repodata **)(((char *)((Pool *) 0x555555df0ad0)->repos[1]) + 80)) + 2 * 472 + 296)
$110 = 8

These are the leaked file descriptors. They belong to the one repo in Pool#1.


(gdb) print *(int *)((char*)(*(Repodata **)(((char *)((Pool *) 0x555555ee3840)->repos[1]) + 80)) + 1 * 472 + 296)
$111 = -1

And this belongs to Pool#3. This is the *only* pool upon which pool_free() is called (via dnf_sack_finalize()), and because *this is not the pool* where we actually have any file descriptors open, pool_free() does nothing here. This is the explanation for my comment 54, and for the fact that fds 7 and 8 are still leaked.

In summary: for the pool (Pool#1) that is used with write_ext() and write_main(), dnf_sack_finalize() is NEVER CALLED. That's the leak.

Because dnf_sack_finalize() is a GObject destructor, the bug is almost certainly in the livecd-creator python code, *OR* the python bindings for libdnf. The reference count is not decreased somewhere on the DNF Sack in question, and so the repos of the underlying Pool are not cleaned up.

Comment 56 Laszlo Ersek 2022-04-23 14:12:05 UTC

Mistery solved, as far as I can tell.

Consider the following call chain:

  install()               [livecd-tools/imgcreate/creator.py]
    dbo.close             [livecd-tools/imgcreate/dnfinst.py]
      dnf.Base.close      [dnf/dnf/base.py]
        reset(sack=True,  [dnf/dnf/base.py]
              repos=True,
              goal=True)
          _sack = None
          _goal = None
          _transaction = None

I expected the dnf_sack_finalize() call -- which would further lead to
the closure of the file descriptors, refer to comment 53 -- to occur
immediately on assigning None to "_sack".

However, I found that the reference count of the object pointed to by
"_sack", just before the assignment, was *not* 1. It was a much higher
number, namely:

  1 + 1 + package_count

I found this out by debugging livecd-creator with the "pdb" module, and
checking sys.getrefcount() on the various fields of the "dbo" object
("_sack", "_goal", "_transaction" etc).

Here's how that reference count comes to exist.

(a) The first reference comes from "_sack" itself:

  install()                [livecd-tools/imgcreate/creator.py]
    fill_sack()            [dnf/dnf/base.py]
      _build_sack()        [dnf/dnf/sack.py]
        Sack()
          sack_init()      [libdnf/python/hawkey/sack-py.cpp]
            dnf_sack_new() [libdnf/libdnf/dnf-sack.cpp]

(b) The second reference on "_sack" comes from "_goal" -- it is a
back-reference:

  install()                 [livecd-tools/imgcreate/creator.py]
    fill_sack()             [dnf/dnf/base.py]
       _goal = Goal(_sack)
         goal_init()        [libdnf/python/hawkey/goal-py.cpp]
           Py_INCREF(_sack)

(c) Then there is one reference to "_sack" *per package* in the
transaction. It's really difficult to track down, but it's goes like
this:

  install()                           [livecd-tools/imgcreate/creator.py]
    runInstall()                      [livecd-tools/imgcreate/dnfinst.py]
      resolve()                       [dnf/dnf/base.py]
        _goal2transaction()           [dnf/dnf/base.py]
          list_installs()             [libdnf/python/hawkey/goal-py.cpp]
            list_generic()            [libdnf/python/hawkey/goal-py.cpp]
              packagelist_to_pylist() [libdnf/python/hawkey/iutil-py.cpp]
                new_package()         [libdnf/python/hawkey/sack-py.cpp]
                  Py_BuildValue()
          ts.add_install()

list_installs() creates a list of packages that need to be installed by
DNF. In the loop body, which iterates over that list, Py_BuildValue()
uses the "O" format specifier, which increases the reference count on
"_sack".

Subsequently, there is a loop in Python language, in the
_goal2transaction() method, that iterates over the package list created
by list_installs(), and adds each package to the transaction
(ts.add_install()). After _goal2transaction() returns, this transaction
is assigned to "self._transaction".


So the big picture is this:

- the _sack field is the primary reference to the sack object

- the _goal field adds 1 back-reference to the sack object

- the _transaction field contains N packages, and each package adds one
  reference to the sack object

And to quote the defintion of a sack object
("libdnf/docs/hawkey/tutorial-py.rst"): "*Sack* is an abstraction for a
collection of packages."

That's how we have circular references! When the reset() method runs
(see again at the top):

- the primary reference to "_sack" disappears,

- the secondary reference to "_sack", via "_goal", also disappears,

- the "_transaction" field's reference to the packages disappears,

- however, the (now disappeared) sack's references to the packages
  remain in place, and each package's back-reference to the sack remains
  in place as well.

This "one sack -- many packages" set of cross-references
(which is basically a big set of reference cycles) remains in place,
and the whole think is unreachable from any live python object.

That's what is cleaned up, as I've learned, by Python's "generational
garbage collector" <https://stackify.com/python-garbage-collection/>.

The problem is that the generational garbage collector runs *too late*,
and so the file descriptors owned by the sack object hang around for too
long -- unpredictably long, even.

I found this out when I single-stepped the reset() method, even assigned
None to various fields manually, *slowly*, and in a different terminal,
I suddenly noticed that "lsof" reported the dangling file descriptors as
gone -- it seemed like the bug suddenly stopped reproducing.

The generational garbage collector's *unpredictable* activation is
alright, as long as it is only responsible for *memory*. But that's not
the case here!

The following patch forces a garbage collection after the above-noted
explicit references are removed, and so it makes the problem go away:

> diff --git a/imgcreate/creator.py b/imgcreate/creator.py
> index 491ad25ca7c3..6cb4ba46ccc7 100644
> --- a/imgcreate/creator.py
> +++ b/imgcreate/creator.py
> @@ -32,6 +32,7 @@ import tempfile
>  import shutil
>  import logging
>  import subprocess
> +import gc
>
>  import selinux
>  import dnf
> @@ -728,6 +729,7 @@ class ImageCreator(object):
>              raise CreatorError("Unable to install: %s" % (e,))
>          finally:
>              dbo.close()
> +            gc.collect()
>              os.unlink(dnf_conf)
>
>          # do some clean up to avoid lvm info leakage.  this sucks.

With this, the P2V ISO (BIOS build) actually boots to the GUI for me. :)

Now, the gc.collect() actually belongs to the very end of
dnf.Base.close(), after reset()! That's when it's possible to clean up
the file descriptors, and that's when they *should* be, regardless of
the DNF library client (livecd-creator or anything else).

We can still try to patch livecd-creator as well: the whole lazy-unmount
stuff should be removed. That can never be right, as demonstrated in the
earlier comments (it leads to filesystem corruption).

I'll try to submit upstream PRs sometime soon.

Comment 57 Laszlo Ersek 2022-04-23 14:16:55 UTC

Hmmm, an even better location for the gc.collect() call is the reset() method in "dnf/dnf/base.py", namely at the end of the method. reset() is called from more spots than just close(); and it is documented explicitly as "Make the Base object forget about various things". Because the paging file descriptors of libsolv are among those various things, that's where we need to force a garbage collection.

Comment 59 Laszlo Ersek 2022-04-24 06:03:05 UTC

BTW the SELinux failures (AVCs) we've also seen when booting the ISO created by livecd-tools are arguably rooted in the same issue. livecd-creator runs "setfiles", and setfiles writes a large amount of data (SELinux attributes / contexts) to the filesystem, before livecd-tools attempts to unmount the filesystem. When the unmount fails (and the lazy unmount and the lazy loop device unassigment do nothing useful) and the regular file underlying the loop block device is fsck'd (pun intended), and then captured into the ISO, those SELinux contexts have not been written out yet.

Comment 61 Laszlo Ersek 2022-04-24 13:39:44 UTC

Comprehensively, deterministically testing the DNF fix is very
difficult!

Consider the following test matrix ("fail" means that the first umount
attempt fails, and livecd-creator falls back to "lazy" umount -- which
in itself is wrong, and a misfeature):

#  livecd-tools           dnf                               result
-  ---------------------  --------------------------------  ---------
1  28.3-4.fc35.x86_64     4.11.1-2.fc35.noarch              fail
2  28.3-4.fc35.x86_64     upstream @ 96b08be69050           fail
3  28.3-4.fc35.x86_64     upstream @ 96b08be69050 + my fix  fail
4  upstream 4afde9352e82  4.11.1-2.fc35.noarch              pass/fail
5  upstream 4afde9352e82  upstream @ 96b08be69050           pass/fail
6  upstream 4afde9352e82  upstream @ 96b08be69050 + my fix  pass

The result in case #3 was confusing. I expected that my fix would work
there, and it didn't.

Reason for that: a genuine, independent bug in livecd-tools that
triggers the exact same symptom, but from a different origin. Namely,
BZ#2007045 has been solved upstream (in the latest commit at current
master: 4afde9352e82) but the fix has not been backported to Fedora 35
yet. With this bug present in Fedora's livecd-tools
(28.3-4.fc35.x86_64), the effect of my DNF fix is masked.

Then, the results in cases #4 and #5 were even more confusing: the
original symptom, which I had relied on through all my debugging,
suddenly stopped reproducing!

My hunch here was that the GC started behaving differently, and ended up
freeing the DnfSack object (tied up in circular references) that owned
the file descriptors *just in time* for livecd-creator to successfully
umount the filesystem. In order to test this hypothesis, I disabled
automatic garbage collection in livecd-tools (on top of commit
4afde9352e82), like this:

> diff --git a/tools/livecd-creator b/tools/livecd-creator
> index 291de10cbbf9..8d2c740c238b 100755
> --- a/tools/livecd-creator
> +++ b/tools/livecd-creator
> @@ -31,6 +31,8 @@ from dnf.exceptions import Error as DnfBaseError
>  import imgcreate
>  from imgcreate.errors import KickstartError
>
> +import gc
> +
>  class Usage(Exception):
>      def __init__(self, msg = None, no_error = False):
>          Exception.__init__(self, msg, no_error)
> @@ -261,5 +263,6 @@ def do_nss_libs_hack():
>      return hack
>
>  if __name__ == "__main__":
> +    gc.disable()
>      hack = do_nss_libs_hack()
>      sys.exit(main())

This was indeed sufficient to reliably reproduce the file descriptor
leak, and to show that my DNF fix was correct.

Comment 62 Laszlo Ersek 2022-04-24 15:00:35 UTC

- force garbage collection on dnf.Base.reset()
  https://github.com/rpm-software-management/dnf/pull/1825

- imgcreate/fs.py: abort on hard-unmount failure
  https://github.com/livecd-tools/livecd-tools/pull/227

Comment 63 Laszlo Ersek 2022-04-26 13:38:57 UTC

[p2v PATCH] virt-p2v-make-kickstart: add packages for making the P2V ISO UEFI-bootable
Message-Id: <20220426133823.15335-1-lersek>
https://listman.redhat.com/archives/libguestfs/2022-April/028749.html

Comment 64 Laszlo Ersek 2022-04-27 08:47:04 UTC

(In reply to Laszlo Ersek from comment #62)

> - imgcreate/fs.py: abort on hard-unmount failure
>   https://github.com/livecd-tools/livecd-tools/pull/227

upstream livecd-tools commit 9095966dd092 ("imgcreate/fs.py: abort on hard-unmount failure", 2022-04-26); part of release "livecd-tools-30.0".

(In reply to Laszlo Ersek from comment #63)
> [p2v PATCH] virt-p2v-make-kickstart: add packages for making the P2V ISO UEFI-bootable
> Message-Id: <20220426133823.15335-1-lersek>
> https://listman.redhat.com/archives/libguestfs/2022-April/028749.html

upstream virt-p2v commit 6a16343eba27.

Comment 65 Laszlo Ersek 2022-04-28 13:46:45 UTC

(In reply to Laszlo Ersek from comment #62)
> - force garbage collection on dnf.Base.reset()
>   https://github.com/rpm-software-management/dnf/pull/1825

upstream dnf commit range c1a407eba323..065c32efb8b1.

Comment 68 Laszlo Ersek 2022-05-09 15:30:19 UTC

*** Bug 1609618 has been marked as a duplicate of this bug. ***

Comment 74 Richard W.M. Jones 2022-05-10 09:25:01 UTC

FYI virt-p2v 1.42.1 has been released upstream and is available in Fedora
(not as an ISO but as a set of scripts for building it):
https://bodhi.fedoraproject.org/updates/FEDORA-2022-73352e7ddb (Rawhide)
https://bodhi.fedoraproject.org/updates/FEDORA-2022-37e09123cd (F36)
https://bodhi.fedoraproject.org/updates/FEDORA-2022-bcd142cb20 (F35)

Note You need to log in before you can comment on or make changes to this bug.