Bug 2033192

Summary: weldr-client doesn't convert --size to bytes before sending it to osbuild-composer
Product: Red Hat Enterprise Linux 8 Reporter: Marko Myllynen <myllynen>
Component: weldr-client Assignee: Brian Lane <bcl>
Status: CLOSED ERRATA QA Contact: Release Test Team <release-test-team-automation>
Severity: high Docs Contact: Eliane Ramos Pereira <elpereir>
Priority: medium    
Version: 8.6 CC: atodorov, bcl, elpereir, lmanasko, obudai
Target Milestone: rc Keywords: Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: weldr-client-35.9-2.el8 Doc Type: Bug Fix
Doc Text:
.The `--size` parameter of the `composer-cli compose start` command now treats its values as MiB

Previously, when using the `composer-cli compose start --size __size_value__ __blueprint_name__ __image_type__` command, the `composer-cli` tool treated the `--size` parameter values as byte units. This update fixes the issue, and the `--size` parameter values are now correctly used in the MiB format.
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-05-16 08:27:17 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
test output none

Description Marko Myllynen 2021-12-16 07:41:04 UTC
Description of problem:
Building an image using the toml file (and Ansible roles) from https://github.com/myllynen/rhel-image works with composer-cli-28.14.62-1.el8.x86_64 but fails using weldr-client-35.3-2.el8.x86_64. Tried both as root and as the dedicated user.

There are several warnings in the logs. Ideally we'd have clean logs for successful builds, as the warnings might cause confusion or false alarms among users. I'm not sure whether these are package-specific issues or something else. For example:

    Output:
    [/usr/lib/tmpfiles.d/grafana.conf:1] Unknown user 'grafana'.
    [/usr/lib/tmpfiles.d/journal-nocow.conf:26] Failed to resolve specifier: uninitialized /etc detected, skipping
    [/usr/lib/tmpfiles.d/rpcbind.conf:2] Unknown user 'rpc'.

The real issue seems to be:

    Failed to create file /sys/fs/selinux/checkreqprot: Read-only file system
    truncate: Invalid number: ‘18446744073709549623’: Value too large for defined data type
    Traceback (most recent call last):
      File "/run/osbuild/bin/org.osbuild.truncate", line 54, in <module>
        ret = main(args["tree"], args["options"])
      File "/run/osbuild/bin/org.osbuild.truncate", line 47, in main
        subprocess.run(["truncate", "--size", size, dest], check=True)
      File "/usr/lib64/python3.6/subprocess.py", line 438, in run
        output=stdout, stderr=stderr)
    subprocess.CalledProcessError: Command '['truncate', '--size', '18446744073709549623', '/run/osbuild/tree/disk.img']' returned non-zero exit status 1.

This should be easy to reproduce with the toml file in the Git repo. Thanks.

Version-Release number of selected component (if applicable):
composer-cli-28.14.62-1.el8.x86_64
weldr-client-35.3-2.el8.x86_64

Comment 1 Brian Lane 2021-12-16 17:09:49 UTC
This isn't related to composer-cli; all it does is pass the blueprint to the server. What versions of osbuild-composer and osbuild were you using when it worked, and what versions when it failed? That error traceback is Python, from osbuild's org.osbuild.truncate module.

Comment 2 Marko Myllynen 2021-12-17 17:13:01 UTC
Right, I had updated only weldr-client and then started seeing the issue. Now tested again with:

osbuild-43-1.el8.noarch
osbuild-composer-40-1.el8.x86_64
osbuild-composer-core-40-1.el8.x86_64
osbuild-composer-dnf-json-40-1.el8.x86_64
osbuild-composer-worker-40-1.el8.x86_64
osbuild-ostree-43-1.el8.noarch
osbuild-selinux-43-1.el8.noarch
python3-osbuild-43-1.el8.noarch
selinux-policy-3.14.3-85.el8.noarch
selinux-policy-targeted-3.14.3-85.el8.noarch
weldr-client-35.3-2.el8.x86_64

And see this:

Stage org.osbuild.sfdisk
Output:
[/usr/lib/tmpfiles.d/journal-nocow.conf:26] Failed to resolve specifier: uninitialized /etc detected, skipping
All rules containing unresolvable specifiers will be skipped.
Failed to create file /sys/fs/selinux/checkreqprot: Read-only file system
label: gpt
label-id: D209C89E-EA5E-4FBD-B161-B461CCE297E0
start="2048", size="2048", type="21686148-6449-6E6F-744E-656564454649", uuid="FAC7F1FB-3E8D-4137-A512-961DE09A5549", bootable
start="4096", size="204800", type="C12A7328-F81F-11D2-BA4B-00A0C93EC93B", uuid="68B2905B-DF3E-4FB3-80FA-49D1E773AA33"
start="208896", size="18446744073709343024", type="0FC63DAF-8483-4772-8E79-3D69D8477DE4", uuid="6264D520-3FB9-423F-8AB8-7A0A8E3D3562"
Sector 2048 already used.
Failed to add #1 partition: Numerical result out of range
Traceback (most recent call last):
  File "/run/osbuild/bin/org.osbuild.sfdisk", line 203, in <module>
    ret = main(args["devices"], args["options"])
  File "/run/osbuild/bin/org.osbuild.sfdisk", line 194, in main
    pt.write_to(device)
  File "/run/osbuild/bin/org.osbuild.sfdisk", line 151, in write_to
    check=True)
  File "/usr/lib64/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['sfdisk', '-q', '--no-tell-kernel', '/dev/loop0']' returned non-zero exit status 1.

If this is a hiccup with the toml file, it's a bit hard to tell from the above message, especially since it works on RHEL 8.5.

Thanks.

Comment 3 Ondřej Budai 2022-03-01 14:36:48 UTC
Hello Marko, can you post the blueprint that doesn't work for you as an attachment? I'm not sure if this bug is up-to-date with the repository you linked.

Comment 4 Marko Myllynen 2022-03-08 15:31:55 UTC
Thanks for looking into this.

Here's a permalink to the blueprint: https://github.com/myllynen/rhel-image/blob/d551286c8a7a2c42c3bb5c44f063abed6708d6a8/base-image.toml

The build succeeds with RHEL 8.5 packages but fails with these:

# rpm -qa | grep -i -e osbuild -e weldr | sort
osbuild-52-1.el8eng.noarch
osbuild-composer-46-1.el8.x86_64
osbuild-composer-core-46-1.el8.x86_64
osbuild-composer-dnf-json-46-1.el8.x86_64
osbuild-composer-worker-46-1.el8.x86_64
osbuild-luks2-52-1.el8eng.noarch
osbuild-lvm2-52-1.el8eng.noarch
osbuild-ostree-52-1.el8eng.noarch
osbuild-selinux-52-1.el8eng.noarch
osbuild-tools-52-1.el8eng.noarch
python3-osbuild-52-1.el8eng.noarch
weldr-client-35.5-1.el8.x86_64

Thanks.

Comment 5 Ondřej Budai 2022-04-19 15:27:09 UTC
Hi Marko,

sorry, I completely forgot about this. Can you add selinux-policy-targeted into the list of packages in the blueprint? We are currently observing a bug that causes osbuild-composer to not be able to depsolve packages with conditional dependencies.

Thanks,
Ondřej

Comment 6 Marko Myllynen 2022-04-26 05:18:23 UTC
Hi Ondřej,

I added selinux-policy-targeted into the list of packages in the blueprint but still see the same partition creation related issue.

I'm attaching a slightly more complete output from my test. I used the blueprint I linked above so this should be straightforward for you to reproduce.

Thanks.

Comment 7 Marko Myllynen 2022-04-26 05:19:24 UTC
Created attachment 1874995 [details]
test output

Comment 10 Ondřej Budai 2022-09-05 12:15:10 UTC
Hi @myllynen,

I tried your blueprint on a fresh RHEL 8.6 install, and it built alright:

$ rpm -qa | grep -i -e osbuild -e weldr | sort
osbuild-53-2.el8.noarch
osbuild-composer-46.3-1.el8_6.x86_64
osbuild-composer-core-46.3-1.el8_6.x86_64
osbuild-composer-dnf-json-46.3-1.el8_6.x86_64
osbuild-composer-worker-46.3-1.el8_6.x86_64
osbuild-luks2-53-2.el8.noarch
osbuild-lvm2-53-2.el8.noarch
osbuild-ostree-53-2.el8.noarch
osbuild-selinux-53-2.el8.noarch
python3-osbuild-53-2.el8.noarch
weldr-client-35.5-1.el8.x86_64

Can you retest? I'm not entirely sure what else to do here because you used packages that weren't shipped to customers, we only shipped the following ones:

dnf list --showduplicates weldr-client composer-cli osbuild osbuild-composer
Installed Packages
weldr-client.x86_64       35.5-1.el8         @rhel-8-for-x86_64-appstream-rpms
Available Packages
composer-cli.x86_64       28.14.23-1.el8     rhel-8-for-x86_64-appstream-rpms 
composer-cli.x86_64       28.14.23-5.el8_0   rhel-8-for-x86_64-appstream-rpms 
composer-cli.x86_64       28.14.23-7.el8_0   rhel-8-for-x86_64-appstream-rpms 
composer-cli.x86_64       28.14.30-1.el8     rhel-8-for-x86_64-appstream-rpms 
composer-cli.x86_64       28.14.42-1.el8     rhel-8-for-x86_64-appstream-rpms 
composer-cli.x86_64       28.14.42-2.el8_2   rhel-8-for-x86_64-appstream-rpms 
composer-cli.x86_64       28.14.55-2.el8     rhel-8-for-x86_64-appstream-rpms 
composer-cli.x86_64       28.14.58-1.el8     rhel-8-for-x86_64-appstream-rpms 
composer-cli.x86_64       28.14.62-1.el8     rhel-8-for-x86_64-appstream-rpms 
composer-cli.x86_64       28.14.68-1.el8     rhel-8-for-x86_64-appstream-rpms 
osbuild.noarch            18-3.el8           rhel-8-for-x86_64-appstream-rpms 
osbuild.noarch            27.2-1.el8         rhel-8-for-x86_64-appstream-rpms 
osbuild.noarch            27.3-2.el8_4       rhel-8-for-x86_64-appstream-rpms 
osbuild.noarch            35-3.el8           rhel-8-for-x86_64-appstream-rpms 
osbuild.noarch            53-2.el8           rhel-8-for-x86_64-appstream-rpms 
osbuild-composer.x86_64   20.1-1.el8         rhel-8-for-x86_64-appstream-rpms 
osbuild-composer.x86_64   28.4-1.el8         rhel-8-for-x86_64-appstream-rpms 
osbuild-composer.x86_64   28.6-1.el8_4       rhel-8-for-x86_64-appstream-rpms 
osbuild-composer.x86_64   28.7-1.el8_4       rhel-8-for-x86_64-appstream-rpms 
osbuild-composer.x86_64   33.2-1.el8         rhel-8-for-x86_64-appstream-rpms 
osbuild-composer.x86_64   46.1-1.el8         rhel-8-for-x86_64-appstream-rpms 
osbuild-composer.x86_64   46.3-1.el8_6       rhel-8-for-x86_64-appstream-rpms 
weldr-client.x86_64       35.5-1.el8         rhel-8-for-x86_64-appstream-rpms

Comment 11 Marko Myllynen 2022-09-05 16:04:19 UTC
I retried and still see the failure. Here are the steps I took this time:

1) Installed a completely new RHEL 8.6 Server test VM using RHEL 8.6 DVD ISO and Minimal install
2) Subscribed the system to RHSM
3) yum update and reboot
4) yum install composer-cli git-core osbuild-composer tar wget; systemctl enable osbuild-composer.socket and reboot
5) Used wget to fetch the reproducer toml file

Then used the following commands as root:

# composer-cli blueprints push base-image.toml
# composer-cli blueprints depsolve base-image
# composer-cli compose start --size 20480 base-image qcow2
# composer-cli compose info 82e3f27f-9ca9-4914-a3c6-25cf1a7fabd8 | grep FAILED
82e3f27f-9ca9-4914-a3c6-25cf1a7fabd8 FAILED   base-image      2022.01.24 qcow2            20480

The downloaded build logs again show the same XFS related message but nothing really helpful.

At this point, as an end-user, I can't see a way to investigate this further. Is it something to do with the VM, disk/partitioning/storage, the blueprint, the commands, SELinux, permissions, a missing dependency, or something else? There are no hints whatsoever in the logs.

Would it be possible that you ping me over the chat and I would then give you full access to this test VM so that you could then investigate this further hands-on?

Thanks.

Comment 12 Ondřej Budai 2022-09-07 10:23:19 UTC
Thanks, Marko, for giving me access. I found two issues.

Firstly, let's see the documentation for the --size parameter of composer-cli compose start:

--size uint   Size of image in MiB

The thing is that the Weldr API accepts the size parameter in bytes instead of MiB. The old composer-cli did convert the units from MiB to B before sending the compose request, but weldr-client doesn't do it. @bcl, can you fix that?

@elpereir, this would be valuable in the 8.7 known issues:

The --size parameter of composer-cli compose start is documented to be in MiB, but this is currently broken and composer-cli treats it as bytes instead. The workaround is to multiply the size by 1048576. The better workaround is to use customizations.filesystem, which allows more granular control over filesystems and accepts units like MiB or GiB.

Marko, the workaround for you is the same: either use --size 21474836480 when using the buggy weldr-client version, or switch to [customizations.filesystem] (preferred), see https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/composing_a_customized_rhel_system_image/creating-system-images-with-composer-command-line-interface_composing-a-customized-rhel-system-image#image-customizations_creating-system-images-with-composer-command-line-interface.
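For reference, the unit conversion behind the workaround can be sketched in a few lines of Python. This is only an illustration of the arithmetic; `mib_to_bytes` is a hypothetical helper name and is not taken from weldr-client or the old composer-cli.

```python
# Hypothetical helper, not code from weldr-client or composer-cli.
MIB = 1024 * 1024  # bytes per MiB, i.e. 1048576


def mib_to_bytes(size_mib: int) -> int:
    """Convert a size given in MiB to the byte count the Weldr API expects."""
    if size_mib < 0:
        raise ValueError("size must be non-negative")
    return size_mib * MIB


if __name__ == "__main__":
    # The value to pass to --size on an affected weldr-client version
    # when a 20480 MiB image is wanted.
    print(mib_to_bytes(20480))
```

For `--size 20480` this gives 21474836480, the value quoted in the workaround above.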

----

The next issue is that osbuild-composer allowed building an image so small that it couldn't fit an empty XFS root partition. The reason is that our 8.5 definitions shipped in the 8.6 version of osbuild-composer don't specify a minimum size for the / partition, which is a bug. Note that building 8.6 on 8.6 is fine because the minimum size is there.
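As a side note, the huge sizes in the earlier logs (18446744073709549623 for truncate and 18446744073709343024 for sfdisk) sit just below 2^64, which would be consistent with a small negative size wrapping around as an unsigned 64-bit integer. That mechanism is an assumption and is not confirmed anywhere in this report; the sketch below only shows the arithmetic, and `from_u64` is a hypothetical helper name.

```python
# Assumption: the sizes in the logs are negative numbers that wrapped around
# as unsigned 64-bit integers. This only demonstrates the arithmetic.
U64 = 2**64


def from_u64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as a signed 64-bit value."""
    return value - U64 if value >= 2**63 else value


if __name__ == "__main__":
    print(from_u64(18446744073709549623))  # size passed to truncate
    print(from_u64(18446744073709343024))  # size of the last sfdisk partition
```

Under that reading, the truncate size corresponds to -1993 and the last partition size to -208592 sectors, which fits a requested image size far smaller than the fixed partitions need.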

I considered whether we should fix this in 8.6, but since building 8.6 on 8.6 isn't affected and 8.5 is already EOL, I don't find it very important.

Note that this is fixed on 8.7/9.1 because this version is using unified image definitions (and thus partition tables) for all minor versions of both RHEL 8 and 9.

Comment 19 Eliane Ramos Pereira 2023-01-17 20:52:36 UTC
Hello @obudai 

Please, is this Known Issue still valid to be included in the RHEL 8.8 release?

Thank you so much.

Comment 20 Ondřej Budai 2023-01-25 11:02:04 UTC
@bcl Seems like the fix landed in weldr-client 35.7. Do you have plans for rebasing the client in 8.8 and 9.2?

Comment 21 Brian Lane 2023-01-25 16:42:21 UTC
Yeah, that should be safe. The new functions (eg. diff) fail gracefully when the server doesn't support them so a rebase should be ok.

Comment 28 Alexander Todorov 2023-02-20 10:24:28 UTC
# rpm -q weldr-client
weldr-client-35.9-2.el8.x86_64

# composer-cli compose start --size 20480 base-image qcow2
# composer-cli compose status
ID                                     Status     Time                       Blueprint         Version      Type               Size
8014a5a8-9f45-4c4d-bd90-52cee9ffa050   FINISHED   Mon Feb 20 11:22:38 2023   base-image        2022.01.24   qcow2              21474836480

Moving to verified.

Comment 30 errata-xmlrpc 2023-05-16 08:27:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Image Builder security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:2780