Bug 2245066 - fedora-toolbox:39 image is over 5GB due to missing hardlinking
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora Container Images
Classification: Fedora
Component: fedora-toolbox
Version: 39
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Debarshi Ray
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: AcceptedFreezeException
Depends On:
Blocks: F39FinalFreezeException 2216766
Reported: 2023-10-19 14:20 UTC by Jens Petersen
Modified: 2023-11-06 15:21 UTC
CC List: 13 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-10-30 17:47:26 UTC
Type: Bug
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Fedora Pagure | releng issue 11735 | 0 | None | None | None | 2023-10-26 13:58:46 UTC
Github | redhat-imaging imagefactory issues 412 | 0 | None | open | Hard link disappeared when built docker image | 2023-10-19 20:30:27 UTC
Github | redhat-imaging imagefactory pull 455 | 0 | None | open | Docker.py: Pass the use_ino option to fix hardlnks | 2023-10-25 22:45:46 UTC

Description Jens Petersen 2023-10-19 14:20:57 UTC
If one tries fedora-toolbox:39 (I am testing in an F39 VM),
one finds that the image is over 5.5GB in size!
This compares to under 1.7GB for fedora-toolbox:38 (already quite big IMHO).

This seems to be caused by hard links not being created in the image,
e.g. /usr/lib64/dri is 1.6GB.

I feel 5.5GB is an unacceptable size so I feel this is a potential blocker?
Though the actual on disk size might be less, if btrfs is clever I dunno?

But it needs to be fixed by Releng I think.

This is somewhat similar to https://pagure.io/releng/issue/11712 and https://github.com/redhat-imaging/imagefactory/issues/412

Originally reported at https://github.com/containers/toolbox/issues/1389
though I think rishi was already aware of this yesterday or earlier.

Comment 1 Debarshi Ray 2023-10-19 14:46:47 UTC
(In reply to Jens Petersen from comment #0)
> If one tries fedora-toolbox:39 (I am testing in an F39 VM),
> one finds that the image is over 5.5GB in size!
> This compares to under 1.7GB for fedora-toolbox:38 (already quite big
> IMHO).

Those are the uncompressed on-disk sizes from 'podman images', which is different from the compressed over-the-network sizes from 'skopeo inspect'.  Toolbx from Git main shows the compressed over-the-network sizes when it prompts you for the download, but it's not in Fedora, yet.

The compressed over-the-network sizes went from 302.2M for the fedora-toolbox:38 image to 1.1G for the fedora-toolbox:39 image.  That's ... a significant jump. :)

> This seems to be caused by hard links not being created in the image,
> e.g. /usr/lib64/dri is 1.6GB.

Yes, so far, I know of the Mesa userspace drivers in %{_libdir}/dri and the locale definitions in %{_prefix}/lib/locale/locale-archive*.

> I feel 5.5GB is an unacceptable size so I feel this is a potential blocker?
> Though the actual on disk size might be less, if btrfs is clever I dunno?

These are already the on-disk sizes from 'podman images'.  See above.
 
> But it needs to be fixed by Releng I think.
> 
> This is somewhat similar to https://pagure.io/releng/issue/11712 and
> https://github.com/redhat-imaging/imagefactory/issues/412
> 
> Originally reported at https://github.com/containers/toolbox/issues/1389
> though I think rishi was already aware of this yesterday or earlier.

Yes, I noticed the sudden bump in the image size a few weeks ago.  However, I am/was busy investigating and fixing other user-visible regressions (bug 2242874 and bug 2244503), so I didn't really look into it until yesterday.

I am not sure that we can "fix it in Toolbx", because I suspect that the problem is in the Image Factory pipeline.  See, e.g., https://github.com/redhat-imaging/imagefactory/issues/412 as you (and also Halfline) pointed out.

Comment 2 Debarshi Ray 2023-10-19 14:54:05 UTC
On one hand, I would like to bring the size down, but I don't know how feasible it would be to do it before Fedora 39 GA.

If this is indeed considered a blocker, then my suggestion would be to switch back to the Dockerfile-based images built with OpenShift Build Service for Fedora 39.  This would mean moving the ToolbxReleaseBlocker Change [1] to Fedora 40, and that's fine by me.

One thing to consider in the blocker debate is that, strictly speaking, we never actively tried to optimize the image's size.  Providing a good interactive command line user experience was considered more important, and finding ways to optimize the size got pushed down as "future work".  So, again, pedantically speaking, one could argue that this isn't a regression.  That said, a jump from 302.2M to 1.1G in network traffic is hard to hand-wave away.

[1] https://fedoraproject.org/wiki/Changes/ToolbxReleaseBlocker

Comment 3 Debarshi Ray 2023-10-19 14:56:32 UTC
(In reply to Debarshi Ray from comment #1)
> I am not sure that we can "fix it in Toolbx", because I suspect that the
> problem is in the Image Factory pipeline.  e.g.,
> https://github.com/redhat-imaging/imagefactory/issues/412 as you (and also
> Halfline) pointed out.

I forgot to mention that I already verified that it isn't a bug in the Podman stack.

If you install the RPMs later on the fedora base image, then the hard links are created as expected:
$ podman run --env TERM=$TERM -it --rm registry.fedoraproject.org/fedora:40 /bin/bash
...
# dnf swap glibc-minimal-langpack glibc-all-langpacks
...
# ls --inode /usr/lib/locale
...
# dnf install mesa-dri-drivers
...
# ls --inode /usr/lib64/dri
...

Plus, we are building the images with Image Factory so there's no Docker or Podman involved, as far as I know.
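
For anyone who wants to quantify the effect instead of eyeballing 'ls --inode' output, here is a minimal Python sketch. It is not part of Toolbx or Image Factory, and the function name hardlink_report is made up for illustration. It walks a directory, groups regular files by (device, inode), and compares the apparent size with the size obtained when each inode is counted only once.

```python
import os
import stat
from collections import defaultdict


def hardlink_report(root):
    """Compare apparent size with hard-link-aware size under `root`.

    Hypothetical helper for illustration only.
    """
    by_inode = defaultdict(list)  # (device, inode) -> list of file sizes
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: do not follow symlinks
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, devices, etc.
            by_inode[(st.st_dev, st.st_ino)].append(st.st_size)

    return {
        # Number of file names seen.
        "files": sum(len(sizes) for sizes in by_inode.values()),
        # Number of distinct inodes behind those names.
        "inodes": len(by_inode),
        # Size if every name were a separate copy (no hard links).
        "apparent_bytes": sum(sum(sizes) for sizes in by_inode.values()),
        # Size when each inode is counted once.
        "unique_bytes": sum(sizes[0] for sizes in by_inode.values()),
    }


if __name__ == "__main__":
    import sys

    print(hardlink_report(sys.argv[1] if len(sys.argv) > 1 else "."))
```

Run against a directory such as /usr/lib64/dri inside a container. If the hard links are intact, unique_bytes should come out much smaller than apparent_bytes; if every name has its own inode, the two values are equal.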

Comment 4 Jens Petersen 2023-10-19 15:16:41 UTC
Wonder if there are any hardlinks missing in the base image?

Comment 5 Adam Williamson 2023-10-19 15:39:27 UTC
I don't think there are any grounds for considering this a release blocker, as we drew things up. We didn't actually declare any maximum size for the toolbx image. That would be done by adding it to https://docs.fedoraproject.org/en-US/releases/f39/blocking/ with a max size, but...we didn't do that.

I think I'm OK with letting this slide to potentially be fixed post-GA, honestly, but that's just a squishy feeling. We *should* probably revise https://docs.fedoraproject.org/en-US/releases/f39/blocking/ to include the toolbx image, at least if it's delivered as part of the compose tree (it is...right?)

Comment 6 Adam Williamson 2023-10-19 15:53:48 UTC
https://pagure.io/fedora-pgm/pgm_docs/pull-request/51 adds toolbx images to the blocking list. I didn't include a max size for now; we can add one if it seems desirable.

Comment 7 Ray Strode [halfline] 2023-10-19 20:30:28 UTC
Just pointing out here that Owen has proposed a simple one-line change to Image Factory on the aforementioned GitHub issue that has a high likelihood of fixing this problem.

Comment 8 Jens Petersen 2023-10-20 02:14:35 UTC
I opened https://pagure.io/releng/issue/11735

Comment 9 Adam Williamson 2023-10-20 15:35:20 UTC
+3 in https://pagure.io/fedora-qa/blocker-review/issue/1417 , marking accepted.

Comment 10 Debarshi Ray 2023-10-25 22:48:40 UTC
I want to reiterate that we cannot "fix it in Toolbx".  This is a bug in the infrastructure for building Docker images, specifically Image Factory:
https://github.com/redhat-imaging/imagefactory/issues/412
https://github.com/redhat-imaging/imagefactory/pull/455

As per Peter's suggestion in https://pagure.io/releng/issue/11735, I built imagefactory-1.1.16-7.fc39 for F39 with the patch from Owen:
https://bodhi.fedoraproject.org/updates/FEDORA-2023-a8871574f4
https://src.fedoraproject.org/rpms/imagefactory/c/28c6a561b78dacdc6dc9e00240ea4a
https://koji.fedoraproject.org/koji/taskinfo?taskID=108105779

What's the next step?

Comment 11 Jens Petersen 2023-10-26 03:32:24 UTC
Moving to MODIFIED since https://pagure.io/releng/issue/11735 got closed

Comment 12 Adam Williamson 2023-10-26 21:21:25 UTC
So it looks like another update is needed; I re-opened https://pagure.io/releng/issue/11735 to track re-updating the builders.

Comment 13 Jens Petersen 2023-10-29 10:48:37 UTC
I believe this is now fixed as of yesterday.

Comment 14 Debarshi Ray 2023-10-30 17:46:42 UTC
Yes, I can confirm that the sizes of the fedora-toolbox:39 and fedora-toolbox:40 images are back within expected limits:


Testing with toolbox.git main:

[rishi@topinka ~]$ /opt/bin/toolbox create --release 40
Image required to create toolbox container.
Download registry.fedoraproject.org/fedora-toolbox:40 (362.2MB)? [y/N]: N

[rishi@topinka ~]$ /opt/bin/toolbox create --release 39
Image required to create toolbox container.
Download registry.fedoraproject.org/fedora-toolbox:39 (359.6MB)? [y/N]: N

[rishi@topinka ~]$ /opt/bin/toolbox create --release 38
Image required to create toolbox container.
Download registry.fedoraproject.org/fedora-toolbox:38 (302.2MB)? [y/N]: N

[rishi@topinka ~]$ /opt/bin/toolbox create --release 37
Image required to create toolbox container.
Download registry.fedoraproject.org/fedora-toolbox:37 (293.7MB)? [y/N]: N


Alternatively, with skopeo(1), since the toolbox(1) in Fedora only shows a hard-coded approximate size:

[rishi@topinka ~]$ skopeo inspect --format "{{range .LayersData}}{{.Size}} {{end}}" docker://registry.fedoraproject.org/fedora-toolbox:40
362173030 

[rishi@topinka ~]$ skopeo inspect --format "{{range .LayersData}}{{.Size}} {{end}}" docker://registry.fedoraproject.org/fedora-toolbox:39
359576852 

[rishi@topinka ~]$ skopeo inspect --format "{{range .LayersData}}{{.Size}} {{end}}" docker://registry.fedoraproject.org/fedora-toolbox:38
70993156 231193396 

[rishi@topinka ~]$ skopeo inspect --format "{{range .LayersData}}{{.Size}} {{end}}" docker://registry.fedoraproject.org/fedora-toolbox:37
69579116 224100728 

The older images have two numbers that need to be added, because they used to be layered images built with OpenShift Build Service and had two layers.
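
As a small worked example of that addition, the following Python sketch sums the space-separated layer sizes printed by the skopeo command above and formats the total the way the toolbox prompt appears to. The function names are hypothetical, and the use of decimal megabytes (1 MB = 1,000,000 bytes) is an assumption that happens to match the numbers shown in this comment.

```python
def total_layer_size(skopeo_output: str) -> int:
    """Sum the space-separated layer sizes (in bytes) from the skopeo output."""
    return sum(int(field) for field in skopeo_output.split())


def format_mb(size_bytes: int) -> str:
    """Format a byte count as decimal megabytes with one decimal place.

    Assumption: 1 MB = 1,000,000 bytes, which matches the sizes quoted above.
    """
    return f"{size_bytes / 1_000_000:.1f}MB"


if __name__ == "__main__":
    # Layer sizes copied from the skopeo output in this comment.
    samples = {
        "fedora-toolbox:40": "362173030 ",
        "fedora-toolbox:39": "359576852 ",
        "fedora-toolbox:38": "70993156 231193396 ",
        "fedora-toolbox:37": "69579116 224100728 ",
    }
    for image, output in samples.items():
        print(image, format_mb(total_layer_size(output)))
```

For the two-layer fedora-toolbox:38 image, 70993156 + 231193396 = 302186552 bytes, which agrees with the 302.2MB shown by toolbox.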

Comment 15 Debarshi Ray 2023-10-30 17:50:13 UTC
[rishi@topinka ~]$ /opt/bin/toolbox enter --release f39
⬢[rishi@toolbox ~]$ 
⬢[rishi@toolbox ~]$ cat /etc/fedora-release 
Fedora release 39 (Thirty Nine)
⬢[rishi@toolbox ~]$ 
⬢[rishi@toolbox ~]$ ls -1 --inode /usr/lib64/dri
20066105 armada-drm_dri.so
20066105 crocus_dri.so
20066105 etnaviv_dri.so
20066105 exynos_dri.so
20066105 hx8357d_dri.so
20066105 i915_dri.so
20066105 ili9225_dri.so
20066105 ili9341_dri.so
20066105 imx-dcss_dri.so
20066105 imx-drm_dri.so
20066105 imx-lcdif_dri.so
20066105 ingenic-drm_dri.so
20066105 iris_dri.so
20066105 kgsl_dri.so
20066105 kirin_dri.so
20066105 kms_swrast_dri.so
20066105 komeda_dri.so
20066105 lima_dri.so
20066105 mali-dp_dri.so
20066105 mcde_dri.so
20066105 mediatek_dri.so
20066105 meson_dri.so
20066105 mi0283qt_dri.so
20066105 msm_dri.so
20066105 mxsfb-drm_dri.so
20066105 nouveau_dri.so
20066110 nouveau_drv_video.so
20066105 panfrost_dri.so
20066105 pl111_dri.so
20066105 r300_dri.so
20066105 r600_dri.so
20066110 r600_drv_video.so
20066105 radeonsi_dri.so
20066110 radeonsi_drv_video.so
20066105 rcar-du_dri.so
20066105 repaper_dri.so
20066105 rockchip_dri.so
20066105 st7586_dri.so
20066105 st7735r_dri.so
20066105 stm_dri.so
20066105 sun4i-drm_dri.so
20066105 swrast_dri.so
20066105 tegra_dri.so
20066105 v3d_dri.so
20066105 vc4_dri.so
20066105 virtio_gpu_dri.so
20066110 virtio_gpu_drv_video.so
20066105 vmwgfx_dri.so
20066105 zink_dri.so
⬢[rishi@toolbox ~]$ 
⬢[rishi@toolbox ~]$ ls -1 --inode /usr/lib/locale/locale-archive*
20065193 /usr/lib/locale/locale-archive
20065193 /usr/lib/locale/locale-archive.real

Comment 16 Debarshi Ray 2023-10-31 13:47:02 UTC
I was doing some scratch builds of the fedora-toolbox:40 OCI image recently, and found that in some builds the hard links are absent from %{_libdir}/dri and %{_prefix}/lib/locale/locale-archive*, and hence those images are a lot bigger than expected.  e.g., compare these two:
https://koji.fedoraproject.org/koji/taskinfo?taskID=108340937
https://koji.fedoraproject.org/koji/taskinfo?taskID=108344786

The first tarball is a lot bigger than the second.

It's possible that some of the builders still have a buggy Image Factory, and I will keep an eye on this and pursue it.  However, since the official images on registry.fedoraproject.org are alright, I will leave this bug closed.
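
One way to check a given build without importing the image is to look at the tar entries directly. The Python sketch below is only an illustration: summarize_tar is a made-up name, and it assumes you already have a single root filesystem (layer) tarball in hand; how to pull that out of a particular Koji task's output is not covered here. It counts regular-file entries versus hard-link entries using the standard tarfile module.

```python
import io
import tarfile


def summarize_tar(fileobj):
    """Count regular-file and hard-link entries in a tar archive.

    Hypothetical helper: `fileobj` must be a seekable binary file object
    containing a (possibly compressed) tarball of a root filesystem.
    """
    regular_files = 0
    regular_bytes = 0
    hardlinks = 0
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for member in tar:
            if member.isreg():
                # A full copy of the file's data is stored in the archive.
                regular_files += 1
                regular_bytes += member.size
            elif member.islnk():
                # A hard link entry only references an earlier member.
                hardlinks += 1
    return {
        "regular_files": regular_files,
        "regular_bytes": regular_bytes,
        "hardlinks": hardlinks,
    }


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as f:
        print(summarize_tar(f))
```

A tarball from an affected builder would be expected to report few or no hard-link entries and a correspondingly larger regular_bytes total than one from a fixed builder.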

Comment 17 Debarshi Ray 2023-11-01 15:22:40 UTC
(In reply to Debarshi Ray from comment #16)
> I was doing some scratch builds of the fedora-toolbox:40 OCI image recently,
> and found that in some builds the hard links are absent from %{_libdir}/dri
> and %{_prefix}/lib/locale/locale-archive*, and hence those images are a lot
> bigger than expected.  e.g., compare these two:
> https://koji.fedoraproject.org/koji/taskinfo?taskID=108340937
> https://koji.fedoraproject.org/koji/taskinfo?taskID=108344786
> 
> The first tarball is a lot bigger than the second.
> 
> It's possible that some of the builders still have a buggy Image Factory,
> and I will keep an eye on this and pursue it.  However, since the official
> images on registry.fedoraproject.org are alright, I will leave this bug
> closed.

Kevin confirmed that he found 3 x86_64 builders that still had a buggy Image Factory.

