Bug 2320025

Summary: libguestfs builds embed mock uid/gid and timestamp in tar.gz files, making builds irreproducible
Product: [Fedora] Fedora
Component: libguestfs
Version: rawhide
Status: CLOSED ERRATA
Reporter: Zbigniew Jędrzejewski-Szmek <zbyszek>
Assignee: Richard W.M. Jones <rjones>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: rjones
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Fixed In Version: libguestfs-1.54.0-3.fc42
Last Closed: 2024-10-21 15:16:09 UTC
Type: Bug

Description Zbigniew Jędrzejewski-Szmek 2024-10-20 11:16:18 UTC
Description of problem:

This is about the reproducible builds effort (https://fedoraproject.org/wiki/Changes/ReproduciblePackageBuilds).

A rebuild of libguestfs shows:
    libguestfs-appliance-1.54.0-2.fc42.x86_64
        modified-S.5........ /usr/lib64/guestfs/supermin.d/base.tar.gz
        modified-S.5........ /usr/lib64/guestfs/supermin.d/daemon.tar.gz
        modified-S.5........ /usr/lib64/guestfs/supermin.d/init.tar.gz
        modified-S.5........ /usr/lib64/guestfs/supermin.d/udev-rules.tar.gz

Diffoscope says:
│ ├── ./usr/lib64/guestfs/supermin.d/udev-rules.tar.gz
│ │ ├── udev-rules.tar
│ │ │ ├── file list
│ │ │ │ @@ -1,4 +1,4 @@
│ │ │ │ -drwxr-xr-x   0 mockbuild  (1000) mock       (425)        0 2024-10-14 10:14:42.000000 etc/
│ │ │ │ -drwxr-xr-x   0 mockbuild  (1000) mock       (425)        0 2024-10-14 10:14:42.000000 etc/udev/
│ │ │ │ -drwxr-xr-x   0 mockbuild  (1000) mock       (425)        0 2024-10-14 10:14:42.000000 etc/udev/rules.d/
│ │ │ │ --rw-r--r--   0 mockbuild  (1000) mock       (425)      798 2023-11-16 10:48:23.000000 etc/udev/rules.d/99-guestfs-serial.rules
│ │ │ │ +drwxr-xr-x   0 mockbuild  (1000) mock       (135)        0 2024-10-14 11:09:52.000000 etc/
│ │ │ │ +drwxr-xr-x   0 mockbuild  (1000) mock       (135)        0 2024-10-14 11:09:52.000000 etc/udev/
│ │ │ │ +drwxr-xr-x   0 mockbuild  (1000) mock       (135)        0 2024-10-14 11:09:52.000000 etc/udev/rules.d/
│ │ │ │ +-rw-r--r--   0 mockbuild  (1000) mock       (135)      798 2023-11-16 10:48:23.000000 etc/udev/rules.d/99-guestfs-serial.rules

This is actually very similar to the problem we had with static archives (*.a), https://pagure.io/fedora-reproducible-builds/project/issue/7. There, we ended up creating a postprocessing step to clean up the file.

I think the end state should be that the tar files don't embed the uid/gid, and that the timestamp is either zeroed out or set to $SOURCE_DATE_EPOCH. This could happen either by changing the build process in the package itself, or by extending add-determinism to clean up all .tar.gz files. The question is which is easier and more maintainable. Would it be hard to change the build process in the package to include "--owner=root --mtime=$SOURCE_DATE_EPOCH"?

Comment 1 Zbigniew Jędrzejewski-Szmek 2024-10-20 11:18:23 UTC
Hmm, looking at the diff listing again, we'd probably want to clamp the mtimes, not override them completely. I.e. the timestamp on 99-guestfs-serial.rules should stay, but the timestamps on the directories should be clamped.
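
For illustration, the clamping described here could look roughly like the following with GNU tar. This is only a sketch, not the change that went into supermin or libguestfs: the temporary tree, the file name 99-example.rules and the SOURCE_DATE_EPOCH value are made up, and it assumes GNU tar 1.29 or later (for --clamp-mtime and --sort=name) plus GNU touch. GNU tar expects seconds since the epoch to be written with a leading "@", so the sketch passes the value in that form.

```shell
set -eu
# Arbitrary example value; a real build would inherit this from the build system.
export SOURCE_DATE_EPOCH=1700000000

# Throwaway tree that mimics the layout from the diffoscope listing (names are made up).
work=$(mktemp -d)
mkdir -p "$work/tree/etc/udev/rules.d"
echo 'example rule' > "$work/tree/etc/udev/rules.d/99-example.rules"

# Give the file an mtime older than SOURCE_DATE_EPOCH; the directories keep "now".
touch -d @1600000000 "$work/tree/etc/udev/rules.d/99-example.rules"

# --owner/--group/--numeric-owner: do not record the build user or group.
# --mtime with --clamp-mtime: only timestamps newer than the given time are replaced.
# --sort=name: fix the member order as well.
tar -C "$work/tree" -cf "$work/out.tar" \
    --sort=name --owner=0 --group=0 --numeric-owner \
    --mtime="@$SOURCE_DATE_EPOCH" --clamp-mtime \
    etc

# Show what ended up in the archive.
tar --numeric-owner -tvf "$work/out.tar"
```

In the listing, the directories should show the clamped time, while the rules file keeps its older mtime.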

Comment 2 Richard W.M. Jones 2024-10-20 17:03:35 UTC
base.tar.gz is generated by supermin here:

https://github.com/libguestfs/supermin/blob/685f1482ac10e98be6a93f76d7c0d74c00550e1e/src/mode_prepare.ml#L165

The other three files are generated during the libguestfs build here:

https://github.com/libguestfs/libguestfs/blob/e37768d8892d6f467c7834f8b142b89f8f0af7dc/appliance/Makefile.am#L116-L154

It would be possible to modify the tar commands.

--mtime shouldn't be a problem.

--owner/--group are likely to be fine too as we already lose the owner/group information
when we build the appliance.

Comment 3 Richard W.M. Jones 2024-10-20 18:03:05 UTC
I'm thinking we could change the upstream tar commands to add:

--owner=root ${SOURCE_DATE_EPOCH:+--mtime=$SOURCE_DATE_EPOCH}

which should only add the --mtime parameter if SOURCE_DATE_EPOCH is set, and
omit it otherwise, and will set --owner always.  Not sure if we need to use
--group=root or not.
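
A small sketch of how that expansion behaves, in case it helps review. The function name tar_repro_args is hypothetical and not part of the upstream patches. The "@" prefix is an assumption on my part: it is the form GNU tar documents for passing seconds since the epoch to --mtime.

```shell
# Hypothetical helper: prints the extra tar arguments that would be used.
tar_repro_args() {
    # ${VAR:+word} expands to word only if VAR is set and non-empty,
    # so --mtime disappears entirely when SOURCE_DATE_EPOCH is missing.
    echo --owner=root ${SOURCE_DATE_EPOCH:+--mtime=@$SOURCE_DATE_EPOCH}
}

# Without SOURCE_DATE_EPOCH only --owner is emitted.
unset SOURCE_DATE_EPOCH
tar_repro_args

# With SOURCE_DATE_EPOCH set (arbitrary example value) --mtime is added.
SOURCE_DATE_EPOCH=1700000000
tar_repro_args
```

Inside a Makefile.am recipe each "$" meant for the shell would have to be written as "$$".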

Comment 4 Richard W.M. Jones 2024-10-21 11:01:08 UTC
I sent a couple of patches upstream, please review.

They'll eventually appear here:
https://lists.libguestfs.org/archives/list/guestfs@lists.libguestfs.org/2024/10/

Comment 5 Fedora Update System 2024-10-21 13:43:49 UTC
FEDORA-2024-1cb20f9dc0 (supermin-5.3.5-2.fc42) has been submitted as an update to Fedora 42.
https://bodhi.fedoraproject.org/updates/FEDORA-2024-1cb20f9dc0

Comment 6 Fedora Update System 2024-10-21 15:16:09 UTC
FEDORA-2024-1cb20f9dc0 (supermin-5.3.5-2.fc42) has been pushed to the Fedora 42 stable repository.
If the problem still persists, please make a note of it in this bug report.

Comment 7 Fedora Update System 2024-10-21 16:55:29 UTC
FEDORA-2024-e7da4e7159 (libguestfs-1.54.0-3.fc42) has been submitted as an update to Fedora 42.
https://bodhi.fedoraproject.org/updates/FEDORA-2024-e7da4e7159

Comment 8 Richard W.M. Jones 2024-10-21 17:25:24 UTC
libguestfs-1.54.0-3.fc42 is built with supermin-5.3.5-2.fc42 and that should
(in theory at least) use the --owner, --group and --mtime options for all four
tarballs.
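
One way to check this on an installed package, sketched under assumptions: check_owner is a hypothetical helper, it relies on GNU tar's verbose listing format (second column is uid/gid when --numeric-owner is given), and the demo below runs on a throwaway tarball rather than on the real appliance files.

```shell
set -eu

# Hypothetical helper: succeeds only if every member of the tarball is listed as 0/0.
check_owner() {
    tar --numeric-owner -tvzf "$1" |
        awk '$2 != "0/0" { bad = 1 } END { exit bad }'
}

# Self-contained demo on a throwaway tarball (paths and contents are made up).
demo=$(mktemp -d)
mkdir -p "$demo/tree/etc"
echo hello > "$demo/tree/etc/example.conf"
tar -C "$demo/tree" --owner=0 --group=0 --numeric-owner \
    -czf "$demo/good.tar.gz" etc

if check_owner "$demo/good.tar.gz"; then
    echo "good.tar.gz: owner/group OK"
fi
```

Against the real package this would be run as, for example, check_owner /usr/lib64/guestfs/supermin.d/udev-rules.tar.gz for each of the four files.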

The libguestfs build logs are linked from this page:

https://koji.fedoraproject.org/koji/buildinfo?buildID=2572290

Comment 9 Fedora Update System 2024-10-21 18:31:09 UTC
FEDORA-2024-e7da4e7159 (libguestfs-1.54.0-3.fc42) has been pushed to the Fedora 42 stable repository.
If the problem still persists, please make a note of it in this bug report.