Bug 2075049 - Don't default to FCOS
Summary: Don't default to FCOS
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: rust-coreos-installer
Version: CentOS Stream
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Antonio Murdaca
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-04-13 13:55 UTC by Colin Walters
Modified: 2023-06-28 09:42 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-06-28 09:42:15 UTC
Type: Bug
Target Upstream Version:
Embargoed:



Links
| System | ID | Private | Priority | Status | Summary | Last Updated |
|---|---|---|---|---|---|---|
| Red Hat Issue Tracker | RHELPLAN-118836 | 0 | None | None | None | 2022-04-13 14:08:13 UTC |

Description Colin Walters 2022-04-13 13:55:29 UTC
As coreos-installer is now used by RHEL for Edge, I think it doesn't make sense to have "coreos-installer" default to installing Fedora CoreOS - at least when built for RHEL.

We've seen at least a few situations where people trying to use coreos-installer for RHEL CoreOS/OCP somehow get defaulted to trying to install FCOS, which has to be extremely surprising from their PoV.

(I'd bikeshed that we should introduce a wrapper binary fedora-coreos-installer that is literally just:

```
#!/bin/bash
exec coreos-installer --distro fcos "$@"
```
or so, then ship that in Fedora too.)

Comment 1 Antonio Murdaca 2022-05-05 13:57:51 UTC
I can help throw together a patch for this, Colin. Do we want to introduce a --distro flag and use the base URL for the given distro, if any? This way, shipping a wrapper in Fedora would be backward compatible too, and in RHEL we would just error out, requiring a flag or osmet?

Comment 2 Colin Walters 2022-05-05 14:30:14 UTC
I've only contributed a few patches to coreos-installer myself.  I think I'd propose as a strawman though:

- Add a Cargo feature `fcos-default` (default to...off?)
- Add a conditional in the Fedora spec `%if !0%{?rhel}` that turns it on
- Change the option processing code to have the "stream" argument under `if cfg!(feature = "fcos-default")`
- Change all code using StreamLocation::new to also be under cfg! and return an error if fcos-default is unset
  (as you say, we basically error if no osmet)
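
The strawman above could be sketched roughly as follows. This is a minimal illustration, not coreos-installer's actual code: `ImageSource`, `resolve_image_source`, `resolve_for_this_build`, the placeholder URL, and the precedence order (explicit URL, then osmet, then the FCOS default) are all assumptions; only the `fcos-default` feature name comes from the comment. The decision logic takes the feature state as a plain `bool` so it can be exercised without rebuilding with different Cargo features.

```rust
/// Where the install image comes from. Hypothetical type for illustration.
#[derive(Debug, PartialEq)]
enum ImageSource {
    /// Image embedded in the live environment (osmet).
    Osmet,
    /// Image explicitly requested by the user.
    Url(String),
    /// Built-in Fedora CoreOS stream default.
    FcosStream(String),
}

/// Placeholder only; NOT the real stream metadata location.
const FCOS_STREAM_BASE_URL: &str = "https://example.com/streams/";

/// Decide which image to install.
/// Assumed precedence: explicit URL, then osmet, then the FCOS default
/// (only if the build enabled it), otherwise an error.
fn resolve_image_source(
    fcos_default: bool,
    osmet_available: bool,
    image_url: Option<&str>,
) -> Result<ImageSource, String> {
    if let Some(url) = image_url {
        return Ok(ImageSource::Url(url.to_string()));
    }
    if osmet_available {
        return Ok(ImageSource::Osmet);
    }
    if fcos_default {
        return Ok(ImageSource::FcosStream(FCOS_STREAM_BASE_URL.to_string()));
    }
    Err("no image specified and no osmet image found; \
         pass --image-url or --image-file"
        .to_string())
}

/// Entry point a real build might use: the feature state is fixed at
/// compile time by the `fcos-default` Cargo feature.
fn resolve_for_this_build(
    osmet_available: bool,
    image_url: Option<&str>,
) -> Result<ImageSource, String> {
    resolve_image_source(cfg!(feature = "fcos-default"), osmet_available, image_url)
}
```

With the feature off (the proposed RHEL build), a bare install with no osmet image fails with an error instead of silently fetching FCOS; with it on (the proposed Fedora build), behavior stays as it is today.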

I think adding --distro is a harder and more involved task, and probably raises questions around how this
intersects with Edge and requires more design.  The strawman above just turns off the default for FCOS
when built for RHEL.

Comment 3 Antonio Murdaca 2022-05-05 14:42:36 UTC
(In reply to Colin Walters from comment #2)
> I've only contributed a few patches to coreos-installer myself.  I think I'd
> propose as a strawman though:
> 
> - Add a Cargo feature `fcos-default` (default to...off?)
> - Add a conditional in the Fedora spec `%if !0%{?rhel}` that turns it on
> - Change the option processing code to have the "stream" argument under `if
> cfg!(feature = "fcos-default")`
> - Change all code using StreamLocation::new to also be under cfg! and return
> an error if fcos-default is unset
>   (as you say, we basically error if no osmet)
> 
> I think adding --distro is a harder and more involved task, and probably
> raises questions around how this

Agreed. I wanted to triple-check that before just adding a Cargo feature, which is trivial compared to a new flag anyway.
If nobody has claimed the issue already, I can work on it :)

> intersects with Edge and requires more design.  The strawman above just
> turns off the default for FCOS
> when built for RHEL.

Comment 4 Benjamin Gilbert 2022-05-05 22:47:22 UTC
It's not that simple, unfortunately.  The Fedora CoreOS docs heavily recommend using the container, which is built upstream and doesn't have the opportunity to ship differentiated wrapper scripts.  Even if that weren't true, I think it's net _more_ confusing if different builds of coreos-installer have different defaults.  (It's legal to use coreos-installer on e.g. Fedora to install FCOS, or RHCOS, or whatever.)  So we'd need to solve this in a unified way upstream, which I think means adding a `--distro` flag and going through a deprecation period where we warn if the flag is missing and default to `fcos`.
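
A deprecation period like the one described above might look roughly like this. It is only a sketch under stated assumptions: `Distro`, `resolve_distro`, and the warning text are hypothetical, and `fcos` is the only value accepted here. A missing flag still resolves to `fcos` but carries a warning that the caller is expected to print.

```rust
/// Distros the hypothetical `--distro` flag understands in this sketch.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Distro {
    Fcos,
}

/// Resolve the value of `--distro`.
/// Returns the chosen distro plus an optional deprecation warning for the
/// caller to print to stderr.
fn resolve_distro(flag: Option<&str>) -> Result<(Distro, Option<String>), String> {
    match flag {
        Some("fcos") => Ok((Distro::Fcos, None)),
        Some(other) => Err(format!("unsupported distro '{}'", other)),
        // Deprecation period: keep the old behavior, but warn.
        None => Ok((
            Distro::Fcos,
            Some(
                "warning: --distro not specified, defaulting to fcos; \
                 this default is deprecated and will become an error"
                    .to_string(),
            ),
        )),
    }
}
```

Once the deprecation period ends, the `None` arm would turn into an `Err`, which makes the breaking change a one-line switch applied consistently in every flow.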

We'd also need to decide what non-`fcos` distros should do.  I agree that there's no reasonable default image source for RHCOS.  Does RHEL for Edge offer stream metadata that we can use to locate a default image?

Comment 5 Colin Walters 2022-05-05 23:09:26 UTC
> The Fedora CoreOS docs heavily recommend using the container, which is built upstream and doesn't have the opportunity to ship differentiated wrapper scripts. 

But since the binary already defaults to FCOS, what would be the problem with the container including a wrapper?

> Even if that weren't true, I think it's net _more_ confusing if different builds of coreos-installer have different defaults. 

I totally agree with this.  But I'm not arguing for different defaults, but for the RHEL build to have *no* default.

>  Does RHEL for Edge offer stream metadata that we can use to locate a default image?

I won't speak for them but AIUI it's really important to keep in mind that Image Builder is designed to make *custom* derived images,
and...I don't think we want to try to enumerate all of those in our shipped binary right?  I guess we could try
to support a drop-in config file in /usr/lib and /etc or so.
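
A drop-in lookup along those lines could be sketched as below. Everything here is an assumption for illustration: the file name `coreos-installer/default-image.conf` is hypothetical, and the precedence rule (earlier roots win, so passing `/etc` before `/usr/lib` lets an admin override a vendor default) simply follows the usual convention for such directories.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical drop-in file name, relative to each config root.
const CONFIG_NAME: &str = "coreos-installer/default-image.conf";

/// Return the first existing drop-in file, searching `roots` in order.
/// Callers would pass `/etc` before `/usr/lib` so that local
/// configuration overrides the vendor-shipped default.
fn find_default_config(roots: &[&Path]) -> Option<PathBuf> {
    roots
        .iter()
        .map(|root| root.join(CONFIG_NAME))
        .find(|candidate| candidate.is_file())
}
```

If no file is found, the caller would fall through to the "no default" error rather than guessing an image source.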

But it really seems simplest to just not have a default (unless osmet is detected).

Comment 6 Benjamin Gilbert 2022-05-05 23:30:07 UTC
>> The Fedora CoreOS docs heavily recommend using the container, which is built upstream and doesn't have the opportunity to ship differentiated wrapper scripts. 
> But since the binary already defaults to FCOS, what would be the problem with the container including a wrapper?

I understood you to be proposing that the wrapper would be a downstream packaging thing, and not in the upstream repo at all.  Even if we shipped it upstream, the goal is to avoid confusing non-FCOS users, right?  Some of them will use the upstream container.

>> Even if that weren't true, I think it's net _more_ confusing if different builds of coreos-installer have different defaults. 
> I totally agree with this.  But I'm not arguing for different defaults, but for the RHEL build to have *no* default.

I understand, but I don't think that completely solves the issue here.  Scripts (or users) that invoke `coreos-installer install /dev/qda` will start failing in some scenarios but not others.

> But it really seems simplest to just not have a default (unless osmet is detected).

I suppose that makes sense.  I'm not thrilled about the awkwardness of the --image-url flow, especially with the likely need for --insecure, but it does work.


Oh, there's also a corner case involving installation kargs.  coreos-installer-service will happily run in OS images that don't ship osmet.  I'm not sure if anyone uses that case right now, but we'd need to either disallow it or add something like a coreos.inst.distro karg.
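
For illustration, reading such a karg could look something like this. The `coreos.inst.distro` name is only the suggestion from the paragraph above, not an existing argument, and `distro_from_cmdline` is a hypothetical helper. The sketch splits on whitespace only (no quoting support) and assumes the last occurrence wins.

```rust
/// Return the value of the last non-empty `coreos.inst.distro=` argument
/// on a kernel command line, if any.
/// Simplification: arguments are split on whitespace; quoting is ignored.
fn distro_from_cmdline(cmdline: &str) -> Option<&str> {
    cmdline
        .split_whitespace()
        .filter_map(|arg| arg.strip_prefix("coreos.inst.distro="))
        .filter(|value| !value.is_empty())
        .last()
}
```

A service in an image without osmet could then refuse to run when this returns `None`, which covers the "disallow it" option as well.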

Comment 7 Benjamin Gilbert 2022-05-06 01:04:10 UTC
The `download` and `list-stream` subcommands will need a --distro argument too.

In a world where OS images are signed, it'd make sense to always require --distro, even if --image-file or --image-url is specified.  We can use that to select the correct verification keyring for the distro, rather than having a TLS CA situation where any distro can sign any image.
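
To make the keyring idea concrete, here is a minimal sketch. The `Distro` variants, the key identifiers, and both function names are made up for illustration, and no real signature verification is performed; the point is only that the set of trusted signers is chosen by distro, so a key trusted for one distro is not accepted for another.

```rust
/// Distros known to this sketch (hypothetical set).
#[derive(Debug, PartialEq, Clone, Copy)]
enum Distro {
    Fcos,
    Rhcos,
}

/// Per-distro keyring. The key IDs are placeholders, not real keys.
fn trusted_keys(distro: Distro) -> &'static [&'static str] {
    match distro {
        Distro::Fcos => &["FCOS-EXAMPLE-KEY"],
        Distro::Rhcos => &["RHCOS-EXAMPLE-KEY"],
    }
}

/// Accept an image signer only if its key is in the keyring of the distro
/// the user asked for.
fn signer_is_trusted(distro: Distro, signer_key_id: &str) -> bool {
    trusted_keys(distro).iter().any(|key| *key == signer_key_id)
}
```

This is why requiring `--distro` even with `--image-file` or `--image-url` would matter: without it there is no way to pick which keyring applies.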

Comment 8 Colin Walters 2022-05-06 01:26:22 UTC
> I understood you to be proposing that the wrapper would be a downstream packaging thing, and not in the upstream repo at all.  Even if we shipped it upstream, the goal is to avoid confusing non-FCOS users, right?  Some of them will use the upstream container.

Ah...non-FCOS users are confused today and that's how we got here right?  How would this proposal be more confusing?

But to expand on this, I am arguing for the creation of a `fedora-coreos-installer` script which does boil down to something like:
```
#!/bin/sh
exec coreos-installer --distro fcos "$@"
```

So I do agree we want --distro, but it'd *only* support "fcos" to start; we wouldn't *block* on trying to expand --distro to everything.  That can be a phase 2.

And broadly speaking we'd try to re-train people to invoke `fedora-coreos-installer` if that's what they want, and not just `coreos-installer`, maybe via the classic "echo 'Please use fedora-coreos-installer or coreos-installer --distro fcos'; sleep 2" approach.

In practice, this change might not happen for e.g. a year or more.  Or maybe never.  Dunno.

> I understand, but I don't think that completely solves the issue here.  Scripts (or users) that invoke `coreos-installer install /dev/qda` will start failing in some scenarios but not others.

Well, I think I would say "the issue" of installing FCOS would be solved.  Just failing (in the no-osmet case) with a useful error message I think is way, way better than installing FCOS in the case of Edge or RHCOS.

Could it be better?  Yes definitely.

> I suppose [no default] makes sense.  I'm not thrilled about the awkwardness of the --image-url flow, especially with the likely need for --insecure, but it does work.

Agree.

> We can use that to select the correct verification keyring for the distro, rather than having a TLS CA situation where any distro can sign any image.

I like this goal, but managing keyrings and such is a whole big task that I wouldn't want to block on versus the IMO simple approach of just not having a default for the CentOS/RHEL coreos-installer binary.

Comment 9 Benjamin Gilbert 2022-05-06 01:47:46 UTC
> Ah...non-FCOS users are confused today and that's how we got here right?  How would this proposal be more confusing?

Rather than behaving in a confusing but consistent way, we'd behave confusingly only in some flows, which is even more confusing.

> So I do agree we want --distro, but it'd *only* support "fcos" to start; we wouldn't *block* on trying to expand --distro to everything.  That can be a phase 2.

Right, I'm on board with that, but see the keyring discussion below.

>> I understand, but I don't think that completely solves the issue here.  Scripts (or users) that invoke `coreos-installer install /dev/qda` will start failing in some scenarios but not others.
> Well, I think I would say "the issue" of installing FCOS would be solved.  Just failing (in the no-osmet case) with a useful error message I think is way, way better than installing FCOS in the case of Edge or RHCOS.

Anyone who accidentally installs FCOS is not using the program correctly; they're failing to specify the image they want to install.  (Or, in some of the historical cases, we've broken something in the OS image.)  I agree that that's confusing, and should be improved, but it's basically a papercut.  You're proposing to fix that papercut by actively breaking FCOS users (in some flows) who are using the program as intended.  I agree that we probably need to do that, but we need a proper deprecation period and we need to do it consistently in all flows.

>> We can use that to select the correct verification keyring for the distro, rather than having a TLS CA situation where any distro can sign any image.
> I like this goal, but managing keyrings and such is a whole big task that I wouldn't want to block on versus the IMO simple approach of just not having a default for the CentOS/RHEL coreos-installer binary.

Right, I'm not proposing that we do the keyring management stuff now.  What I'm proposing is that we consider always requiring --distro, even in cases where that currently wouldn't change our behavior, so that we don't have to make another breaking change later.  I'm not 100% sold on the idea, but I think it's worth thinking about.

Comment 10 Antonio Murdaca 2023-06-28 09:42:15 UTC
Moving discussion upstream: https://github.com/coreos/coreos-installer/issues/1225

