Bug 2123812

Summary: ostree installer composes fail in mock using systemd-nspawn
Product: Fedora
Reporter: Kevin Fenzi <kevin>
Component: distribution
Assignee: Kevin Fenzi <kevin>
Status: CLOSED WORKSFORME
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: rawhide
CC: dustymabe, jonathan, kevin, klember, lsedlar, lucab, miabbott, ngompa13, robertthomasfairley, travier, walters
Hardware: Unspecified
OS: Unspecified
Doc Type: No Doc Update
Last Closed: 2022-09-16 00:43:41 UTC
Type: Bug

Description Kevin Fenzi 2022-09-02 16:47:07 UTC
We recently switched rawhide to use the 'new' systemd-nspawn chroot method instead of the old chroot method. 

The ostree installer image composes now fail: 

2022-09-02 06:49:25,750: running /mnt/koji/compose/rawhide/Fedora-Rawhide-20220902.n.0/work/x86_64/Silverblue/lorax_templates/ostree-based-installer/lorax-configure-repo.tmpl
running /mnt/koji/compose/rawhide/Fedora-Rawhide-20220902.n.0/work/x86_64/Silverblue/lorax_templates/ostree-based-installer/lorax-configure-repo.tmpl
2022-09-02 06:49:25,767: running /mnt/koji/compose/rawhide/Fedora-Rawhide-20220902.n.0/work/x86_64/Silverblue/lorax_templates/ostree-based-installer/lorax-embed-repo.tmpl
running /mnt/koji/compose/rawhide/Fedora-Rawhide-20220902.n.0/work/x86_64/Silverblue/lorax_templates/ostree-based-installer/lorax-embed-repo.tmpl
2022-09-02 06:49:42,952: command output:
error: Writing content object: Setting xattrs: fsetxattr(security.selinux): Invalid argument

command output:
error: Writing content object: Setting xattrs: fsetxattr(security.selinux): Invalid argument

2022-09-02 06:49:42,952: command returned failure (1)
command returned failure (1)
2022-09-02 06:49:42,952: template command error in /mnt/koji/compose/rawhide/Fedora-Rawhide-20220902.n.0/work/x86_64/Silverblue/lorax_templates/ostree-based-installer/lorax-embed-repo.tmpl:
template command error in /mnt/koji/compose/rawhide/Fedora-Rawhide-20220902.n.0/work/x86_64/Silverblue/lorax_templates/ostree-based-installer/lorax-embed-repo.tmpl:
2022-09-02 06:49:42,952:   runcmd ostree --repo=/var/tmp/lorax/lorax.sfou2vis/installtree/ostree/repo pull --mirror fedora fedora/rawhide/x86_64/silverblue
  runcmd ostree --repo=/var/tmp/lorax/lorax.sfou2vis/installtree/ostree/repo pull --mirror fedora fedora/rawhide/x86_64/silverblue
2022-09-02 06:49:42,954:   subprocess.CalledProcessError: Command '['ostree', '--repo=/var/tmp/lorax/lorax.sfou2vis/installtree/ostree/repo', 'pull', '--mirror', 'fedora', 'fedora/rawhide/x86_64/silverblue']' returned non-zero exit status 1.
  subprocess.CalledProcessError: Command '['ostree', '--repo=/var/tmp/lorax/lorax.sfou2vis/installtree/ostree/repo', 'pull', '--mirror', 'fedora', 'fedora/rawhide/x86_64/silverblue']' returned non-zero exit status 1.
Traceback (most recent call last):
  File "/usr/sbin/lorax", line 223, in <module>
    main()
  File "/usr/sbin/lorax", line 204, in main
    lorax.run(dnfbase, opts.product, opts.version, opts.release,
  File "/usr/lib/python3.11/site-packages/pylorax/__init__.py", line 272, in run
    rb.install()
  File "/usr/lib/python3.11/site-packages/pylorax/treebuilder.py", line 148, in install
    self._runner.run(tmpl, **self.add_template_vars)
  File "/usr/lib/python3.11/site-packages/pylorax/ltmpl.py", line 149, in run
    self._run(commands)
  File "/usr/lib/python3.11/site-packages/pylorax/ltmpl.py", line 168, in _run
    f(*args)
  File "/usr/lib/python3.11/site-packages/pylorax/ltmpl.py", line 665, in runcmd
    stdout = runcmd_output(cmd)
             ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/pylorax/executils.py", line 373, in runcmd_output
    return execWithCapture(cmd[0], cmd[1:], **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/pylorax/executils.py", line 251, in execWithCapture
    return _run_program(argv, stdin=stdin, root=root, log_output=log_output, filter_stderr=filter_stderr,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/pylorax/executils.py", line 205, in _run_program
    raise subprocess.CalledProcessError(proc.returncode, argv, output)
subprocess.CalledProcessError: Command '['ostree', '--repo=/var/tmp/lorax/lorax.sfou2vis/installtree/ostree/repo', 'pull', '--mirror', 'fedora', 'fedora/rawhide/x86_64/silverblue']' returned non-zero exit status 1.
2022-09-02 06:49:42,957: Cleaning up tempdir - /var/tmp/lorax/lorax.sfou2vis

This might of course be a mock bug, an nspawn bug, a lorax bug, or something else, but I thought I would start here.

Comment 1 Colin Walters 2022-09-07 14:52:48 UTC
Are there AVC denials on the host system? It may be that previously we were operating in an install_t context with cap_mac_admin, but not anymore. (Another way to check this is to compare the `ps axZ` output of the running programs.)

Now, IMO our use of mock for things like this should be considered legacy. I think we should be building things using more standard container tools, namely podman and Kubernetes. For Fedora CoreOS we have heavily invested in having our build and test tooling run as a standard container in standardized ways. For *this specific* problem, we end up launching a transient VM inside the container, because this ensures strong isolation from the host. mock's privileged containers don't do that and will inherently lead to problems like this; nspawn doesn't help here.

Comment 2 Kevin Fenzi 2022-09-07 16:10:15 UTC
All the koji builders are in SELinux permissive mode. ;(

Comment 3 Colin Walters 2022-09-12 14:34:31 UTC
How tenable is it to use the old chroot mode just for this task?

What we're getting is EINVAL, which I'm pretty sure comes from here:
https://github.com/torvalds/linux/blob/80e78fcce86de0288793a0ef0f6acf37656ee4cf/security/selinux/hooks.c#L3189

Crucially, I think this error doesn't depend on whether the system is in SELinux permissive mode; it depends on whether the caller has CAP_MAC_ADMIN:
https://github.com/torvalds/linux/blob/80e78fcce86de0288793a0ef0f6acf37656ee4cf/security/selinux/hooks.c#L3136
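To make that reasoning concrete, here is a toy Python model of the decision in the linked hooks.c lines. It is a simplification, not the kernel code: it only covers a `security.selinux` value that is not a valid context under the loaded policy, and it omits the ordinary permission checks that follow. The function and parameter names are hypothetical.

```python
import errno


def model_selinux_setxattr(context_valid: bool,
                           has_cap_mac_admin: bool,
                           permissive: bool) -> int:
    """Toy model (hypothetical, simplified) of setting security.selinux.

    Returns 0 for success or a negative errno value. Later permission
    checks are deliberately omitted.
    """
    if context_valid:
        return 0  # the label maps to a known SID; proceed normally
    # Invalid context: only a caller holding CAP_MAC_ADMIN may store it anyway.
    if has_cap_mac_admin:
        return 0
    # `permissive` is never consulted on this path, which is the point.
    return -errno.EINVAL


if __name__ == "__main__":
    for caps in (False, True):
        for permissive in (False, True):
            rc = model_selinux_setxattr(False, caps, permissive)
            print(f"cap_mac_admin={caps} permissive={permissive} -> {rc}")
```

In this model, a caller without CAP_MAC_ADMIN gets `-errno.EINVAL` whether or not the host is permissive, which is consistent with the permissive koji builders still failing.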

It seems likely to me that nspawn is dropping CAP_MAC_ADMIN. Adding an invocation of `capsh --print` before the relevant command would likely tell us.
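As a sketch of the `capsh --print` check, the same information can be read without extra tools from the `CapEff` line of `/proc/self/status`, which holds the effective capability mask in hex. This assumes a Linux host; CAP_MAC_ADMIN is capability number 33 in `linux/capability.h`. The helper names are hypothetical; running the script inside the mock/nspawn chroot just before the `ostree pull` would show whether the capability survived.

```python
CAP_MAC_ADMIN = 33  # bit number from linux/capability.h


def effective_caps(status_text: str) -> int:
    """Return the CapEff mask parsed from /proc/<pid>/status contents."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_cap(mask: int, cap: int) -> bool:
    """True if capability number `cap` is set in the bitmask."""
    return bool((mask >> cap) & 1)


def report(path: str = "/proc/self/status") -> str:
    """Describe whether the current process holds CAP_MAC_ADMIN."""
    with open(path) as f:
        mask = effective_caps(f.read())
    state = "present" if has_cap(mask, CAP_MAC_ADMIN) else "MISSING"
    return f"CapEff=0x{mask:016x} cap_mac_admin {state}"


if __name__ == "__main__":
    try:
        print(report())
    except OSError as exc:  # e.g. not running on Linux
        print(f"cannot read process status: {exc}")
```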

In the medium term, we will make rpm-ostree work fully unprivileged; this was part of the big goal of https://github.com/coreos/rpm-ostree/issues/729 and we're pretty close, but not there yet.

So in the short term, I think we need to either use the old chroot or figure out how to get nspawn to give us the capability.

(Or, of course, also in the medium term, use podman/Kubernetes, which is how we should be running containers in production.)

Comment 4 Colin Walters 2022-09-12 18:42:32 UTC
Moving back to distribution to denote that this is not actionable by (rpm-)ostree in the short term and must be fixed in the infrastructure invoking us (whether that's mock/koji/nspawn/etc.).

Comment 5 Kevin Fenzi 2022-09-12 22:26:07 UTC
Alright. Thanks. 

So, currently there are only 2 places where we can control this:
1) The koji tag can be set to use old-chroot or nspawn. This will apply to basically everything for that branch.

2) We can adjust site-defaults.cfg on the builders. However, this will apply to every branch and everything built, but if we only change config_opts['nspawn_args'], it will be ignored by the non-nspawn branches.

So, we could add: 

config_opts['nspawn_args'] = ['--capability=cap_mac_admin']

But that would then apply to every build using nspawn. Is that something that would be bad to have enabled for all builds?
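Before deploying a setting like the one above, it may help to check which extra capabilities a given `nspawn_args` list would grant. The helper below is hypothetical, not part of mock; it assumes the list holds plain strings as in the example and that `--capability=` takes a comma-separated list of names, as systemd-nspawn documents. Names are lowercased only so the comparison is case-insensitive.

```python
def granted_capabilities(nspawn_args: list[str]) -> set[str]:
    """Collect capability names from --capability= entries (hypothetical helper)."""
    caps: set[str] = set()
    for arg in nspawn_args:
        if arg.startswith("--capability="):
            value = arg.split("=", 1)[1]
            caps.update(name.strip().lower() for name in value.split(",") if name.strip())
    return caps


nspawn_args = ["--capability=cap_mac_admin"]  # the value proposed above
print("cap_mac_admin" in granted_capabilities(nspawn_args))
```

Keeping the check separate from the config makes it easy to run against whatever site-defaults.cfg a particular group of builders ends up with.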

Alternatively, perhaps we could get Pungi to run ostree_installer runroot tasks with old-chroot passed to koji?
Adding Lsedlar for comment on that approach. :)

Comment 6 Lubomír Sedlář 2022-09-13 06:17:19 UTC
This is actually interesting. Pungi always submits ostree tasks with the --new-chroot option.
https://pagure.io/pungi/pull-request/411

If it no longer works, I don't see a problem with making it configurable.

Comment 7 Kevin Fenzi 2022-09-13 15:12:30 UTC
ostree is always new-chroot, but ostree_installer (which is what is breaking here) is not. From what I can tell, it just uses whatever default/setting koji has.

So, can we adjust ostree_installer phase to always use old chroot?

Comment 8 Neal Gompa 2022-09-13 19:07:10 UTC
Couldn't we use new-chroot for ostree_installer and have Pungi pass the nspawn args Kevin suggested in comment 5?

Comment 9 Lubomír Sedlář 2022-09-14 07:07:26 UTC
This would switch the option: https://pagure.io/pungi/pull-request/1636
Using new-chroot and customizing mock options is not possible from Pungi. `koji runroot` does not have that level of granularity.

Comment 10 Kevin Fenzi 2022-09-14 18:27:38 UTC
Thanks. After pondering some, I realized I can isolate the additional nspawn capability/arg to just the runroot builders. So, that's a much smaller area.

So, I'd like to try that first, and if it doesn't do the trick, go with the change in Pungi... I'll know more with tomorrow's compose.

I think it would be better to move everything we can to new-chroot.

Comment 11 Kevin Fenzi 2022-09-15 23:09:45 UTC
OK, that got them to compose.

Can someone test that they actually work? :)

Today's Rawhide compose: 20220915.n.1

Comment 13 Kevin Fenzi 2022-09-16 00:43:41 UTC
Cool. Then, I think we have worked around this and don't need the pungi change after all. 

Many thanks to everyone for tracking things down here.

Comment 14 Timothée Ravier 2022-10-19 16:16:05 UTC
I'm late to the party, but thanks a lot for fixing this one.