Description of problem:
subscription-manager reports that it is running in container mode even though it is running on a bare-metal host:

"subscription-manager is disabled when running inside a container. Please refer to your host system for subscription management."

It was working normally before attempting to deploy OSP 17 on RHEL 9; that deployment failed because of other issues. "dnf clean all" and "rm -r /var/cache/dnf" were run after the deployment failed in an attempt to fix another issue.

Version-Release number of selected component (if applicable):
subscription-manager-1.29.21-1.el9.x86_64

Actual results:
subscription-manager commands on bare metal fail with the error "subscription-manager is disabled when running inside a container. Please refer to your host system for subscription management."

Expected results:
subscription-manager commands work normally on a bare-metal host.

Additional info:
A workaround for this is to run "export SMDEV_CONTAINER_OFF=False"
Hello,

To detect whether the current system is a container, subscription-manager currently checks for the presence of any of these files/directories:
- /run/.containerenv
- /.dockerenv
- /etc/rhsm-host/

Do any of these exist on your system? If so, are you really sure it is not a container of some sort?
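For reference, the detection is roughly equivalent to this shell test (a sketch only; the actual check is implemented in subscription-manager's Python code):

  # Sketch of the container detection (illustrative, not the real code):
  if [ -e /run/.containerenv ] || [ -e /.dockerenv ] || [ -e /etc/rhsm-host ]; then
      echo "running inside a container (as far as subscription-manager is concerned)"
  fi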
Yes, I am sure this is not a container host; it is a bare-metal host. It seems that /run/.containerenv is present on the system:

(undercloud) [stack@e26-h19-740xd ~]$ ls -a /run/.containerenv
/run/.containerenv
What's the content of /run/.containerenv? How old is that file (output of `ls -l`, please)? This file is supposed to exist only within containers, and it is created by podman. Again, are you really, really sure it is not running inside a container?

Alternative option: could it be that some currently running container, or some old container, was started with the host /run bind-mounted to the container /run?
(undercloud) [stack@e26-h19-740xd ~]$ sudo ls -l /run/.containerenv
-rwx------. 1 root root 0 Feb 25 06:25 /run/.containerenv
(undercloud) [stack@e26-h19-740xd ~]$ sudo cat /run/.containerenv
(undercloud) [stack@e26-h19-740xd ~]$

I am sure that it is not running inside a container. I checked Red Hat OpenStack versions 17 and 16.2, and both have the /run/.containerenv file on the undercloud host, which is not a container but a bare-metal host used to provision the overcloud. It should be noted that OpenStack spawns multiple containers during undercloud deployment, so one of those containers could have been started with the host /run bind-mounted to the container /run.
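For illustration, this kind of leak can be reproduced by hand (a sketch, assuming podman is available; the image name is just an example):

  # Start any container with the host /run bind-mounted; podman creates
  # /run/.containerenv inside the container, which here *is* the host /run:
  sudo podman run --rm -v /run:/run registry.access.redhat.com/ubi9/ubi true
  # The marker file is now left behind on the bare-metal host:
  ls -l /run/.containerenv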
OK, and what does running `systemd-detect-virt` as root print?
[stack@e26-h19-740xd ~]$ sudo systemd-detect-virt
podman
[stack@e26-h19-740xd ~]$
(In reply to schari from comment #6)
> [stack@e26-h19-740xd ~]$ sudo systemd-detect-virt
> podman

This shows how the current situation confuses all the tools that do their own detection of a container environment. /run/.containerenv is basically a "standard" way to detect podman, see e.g. https://github.com/containers/podman/issues/3586#issuecomment-661918679

For example, the fact that systemd itself detects your bare-metal installation as a container means that certain units may be started, or not, differently than in a non-container environment (see the "AssertVirtualization" and "ConditionVirtualization" keys in unit files).

This is definitely not a subscription-manager issue, but rather a system misconfiguration. Please reassign this to the OpenStack team for investigation, as they need to determine how container-only files leaked onto the host.
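A quick way to see how far this reaches, using only standard systemd tooling (nothing specific to this bug):

  # Units whose activation depends on the detected virtualization/container state:
  grep -rl ConditionVirtualization /usr/lib/systemd/system/
  # What systemd currently believes about the container state of this host:
  systemd-detect-virt --container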
(In reply to Pino Toscano from comment #7)
> Please reassign this to the OpenStack team for investigation, as they need
> to determine how container-only files leaked onto the host.

I asked Jiri Stransky (of the OpenStack team) about this earlier today, and he told me this is intended behaviour:
https://github.com/openstack/tripleo-heat-templates/blob/3184c3471b42d8a3cb58ebfe3cd00f2f2ef355b4/deployment/nova/nova-libvirt-common.yaml#L148

Hence, reassigning to the (correct?) OpenStack component.
Just to confirm, bind-mounting `/run:/run` is expected behavior throughout our container stack and is used in a number of services, especially those that interact with hardware. We'll need to look into how we can adjust the behavior of the ansible role [0], or figure out another way forward.

[0] https://github.com/openstack/ansible-role-redhat-subscription/blob/master/tasks/register.yml#L2
I just want to confirm: we want to export SMDEV_CONTAINER_OFF=True on the host while running subscription-manager, to have it bypass the container check, based on this [1]. I didn't test it, but it appears to me that setting SMDEV_CONTAINER_OFF to False will not bypass the check, unless I'm missing something here. At least, this is how I built the role variable [2] and the THT change [3]. Can you please confirm?

[1] https://github.com/candlepin/subscription-manager/pull/2652/files#diff-730d30b2294a82cb6496d2efa0fad270f4818a35433ff8fd49606a04964b9c2e
[2] https://review.opendev.org/c/openstack/ansible-role-redhat-subscription/+/831810/
[3] https://review.opendev.org/c/openstack/tripleo-heat-templates/+/831813
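For reference, on the host this boils down to something like the following (a sketch; the exact value handling is per the upstream PR above):

  # Tell subscription-manager to skip its container detection; per the
  # discussion here, =True bypasses the check while =False does not:
  export SMDEV_CONTAINER_OFF=True
  # -E preserves the exported variable across sudo:
  sudo -E subscription-manager status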
The approach taken in the patches looks good as a stopgap solution targeted specifically at RHSM, but it may not be sufficient over the long term: other processes on the host may also be using /run/.containerenv to detect whether they're running in a container, and break in a similar way. In the longer term I think we'll have to either:

(a) Stop mounting /run and mount some /run/... subdirs instead. This affects many containers though, and I'm not sure we'd always be able to provide a static list of /run/... subdirs. Also, we couldn't mount any subdirs which are created only *after* the container starts. So I'm not sure how achievable this option is.

(b) Collaborate with podman engineers on how we can keep mounting /run without "exporting" /run/.containerenv onto the host. E.g. allow us to somehow skip creation of .containerenv for our containers, or podman could change the way .containerenv is created: instead of simply writing a file into the /run dir in the container filesystem, podman could perhaps create a separate bind mount for the .containerenv file. That would then affect only the container, and not the host's /run dir.
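As a rough sketch of what (a) would mean per container (the directory names below are purely illustrative, not a real list):

  # Mount selected /run subdirectories instead of the whole /run, so the
  # container's own /run (and its .containerenv) stays private:
  podman run --rm \
      -v /run/libvirt:/run/libvirt \
      -v /run/openvswitch:/run/openvswitch \
      registry.access.redhat.com/ubi9/ubi true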
So far, I'm not aware of anything else broken by this behavior. If podman can prevent this file from being created on the host, that would be the best solution, but I don't know how feasible that is without changing the path, possibly moving it to / just like .dockerenv. I'm not sure we want to take the path of cherry-picking things under /run without being sure something is actually broken, as it might break other things.
Hi dvd,

Sorry, I didn't notice the needinfo. I made a mistake while reporting the bug originally: yes, I meant to say that SMDEV_CONTAINER_OFF=True will bypass the check, not SMDEV_CONTAINER_OFF=False.
Hello there,

Some more additions:
- deploying OSP-17 on el9
- using our QE Satellite host instead of Red Hat CDN

As soon as I get a container running with the /run:/run mount, dnf switches the repository list in redhat.repo back to the CDN, because "Subscription Manager is operating in container mode." This happens with any dnf command (clean, makecache, install, anything!!), and ends up crashing the OSP deploy.

After some more testing, removing all containers and cleaning up /run/.containerenv stops this unwanted behavior.

After even more poking, it seems we have an enabled plugin in dnf: libdnf-plugin-subscription-manager-1.29.26-3.el9_0.x86_64. Disabling it stops the rewriting of the redhat.repo file as well.

I'm wondering if we're not hitting a bug here, since subscription-manager should really check its configuration in /etc/rhsm/rhsm.conf instead of blindly overriding things based on some (wrong) assumption. WDYT?

Cheers,

C.
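PS: for the record, disabling the plugin can be done per command or persistently (standard dnf plugin mechanics):

  # One-off: run dnf without the subscription-manager plugin:
  sudo dnf --disableplugin=subscription-manager makecache
  # Persistent: set enabled=0 in the plugin's config file,
  # /etc/dnf/plugins/subscription-manager.conf:
  #   [main]
  #   enabled=0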
Hello Cédric,

(In reply to Cédric Jeanneret from comment #22)
> I'm wondering if we're not hitting a bug here,

Yes, the bug is the mounting of the host /run in containers, causing container-only files created in /run to appear on the host. This is *not* a problem of subscription-manager only: see also comment 7 for another example of a tool that does not behave as it should in a non-container environment. See also comment 12; I recommend going through all the comments/status of this bz.
Hello Pino,

imho, subscription-manager shouldn't touch the repository list if it thinks it's in a container, especially when there's a valid /etc/rhsm/rhsm.conf file available...

So, OK, bind-mounting /run may be bad, but we won't be able to change that fast enough to prevent issues in osp-17. And there are other considerations, such as: what happens if we start a container with /etc/yum.repos.d and /etc/dnf bind-mounted in it, and try to install a package? [on a permissive system - since SELinux would prevent any override of the repo list]

So, yes: imho, it's an actual issue in subscription-manager at this point. To the point that I've opened a dedicated BZ with the full context, explanations, and consequences of the repository list edits from within a container:
https://bugzilla.redhat.com/show_bug.cgi?id=2095316

OSP is affected badly, but I bet we won't be the only ones.

Cheers,

C.
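PS: to make that bind-mount scenario concrete, this is the kind of thing I mean (illustrative sketch only; whether the host repo list gets rewritten depends on the points discussed here):

  # Repo configuration bind-mounted from the host into a container, then a
  # package install triggers dnf (and its plugins) in there:
  sudo podman run --rm \
      -v /etc/yum.repos.d:/etc/yum.repos.d \
      -v /etc/dnf:/etc/dnf \
      registry.access.redhat.com/ubi9/ubi dnf -y install vim-minimal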
Stopping mounting /run and mounting some /run/... subdirs instead is unlikely to be a way forward for OSP. I suggest focusing on the podman options to not use /run for storing the .containerenv marker file.
(In reply to Cédric Jeanneret from comment #24)
> imho, subscription-manager shouldn't touch the repository list if it thinks
> it's in a container.

This is exactly what subscription-manager does since RHEL 9.0. The situation described in this bug is exactly the opposite of what you describe:
- the OSP bits run the various containers bind-mounting the host /run in the containers
- the container runtime (podman, though docker does something similar) creates a /run/.containerenv file to let the bits running in the container know about it
- since the container /run is actually the host /run, /run/.containerenv exists also on the host
- subscription-manager on the host thus detects the system as a container, and disables itself by default

The workaround I've seen so far was exporting SMDEV_CONTAINER_OFF=True (which is the "internal" environment variable to enable subscription-manager in containers) so that subscription-manager works.

> And there are
> other considerations, such as: what happens if we start a container with
> /etc/yum.repos.d and /etc/dnf bind-mounted in it, and try to install a
> package?

You are already hurting yourself with the bind-mount of the host /run in containers, so... don't hurt yourself even more badly :)

> So, yes: imho, it's an actual issue in subscription-manager at this point.

I still disagree with that: subscription-manager by default *disables* itself in containers; because of this situation, my wild guess is that the SMDEV_CONTAINER_OFF=True workaround is applied too broadly.

> OSP is affected badly, but I bet we won't be the only ones.

So far you have been the only one, at least to our (subscription-manager) knowledge. And to be fair, people have actually asked for a way to explicitly register containers as systems, mostly for internal development/testing.
(Tried to answer by mail; apparently that's a "nope" - so this may show up duplicated later...)

(In reply to Pino Toscano from comment #26)
> (In reply to Cédric Jeanneret from comment #24)
> > imho, subscription-manager shouldn't touch the repository list if it thinks
> > it's in a container.
>
> This is exactly what subscription-manager does since RHEL 9.0.

Well, it shouldn't? This change breaks things...

> The situation described in this bug is exactly the opposite of what you
> describe: [...]
> - subscription-manager on the host thus detects the system as a container,
> and disables itself by default

Errr... nope, that's the same thing. We're just adding a Satellite registration, and subscription-manager breaks everything by switching the source list back to the CDN!

> You are already hurting yourself with the bind-mount of the host /run in
> containers, so... don't hurt yourself even more badly :)

Well, how would you install a package inside a container when you intend to (quickly) build an image, then? I mean, yes, there's buildah, but a fast way to get an image is to actually run a container, do some commands, and export it...

> So far you have been the only one, at least to our (subscription-manager)
> knowledge. And to be fair, people have actually asked for a way to
> explicitly register containers as systems, mostly for internal
> development/testing.

Well, afaik el9 isn't GA yet, is it?
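For reference, the quick run-then-commit workflow I mean looks like this (image and package names are just examples):

  # Run a container, install things, then export the result as an image:
  podman run --name builder registry.access.redhat.com/ubi9/ubi \
      dnf install -y vim-minimal
  podman commit builder localhost/ubi9-with-vim
  podman rm builder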
> You are already hurting yourself with the bind-mount of the host /run in
> containers, so... don't hurt yourself even more badly :)

Docker placed that marker file under / instead of /run for a reason. Some "legacy" (in the containers world) services still use a flat /run/foo.{pid,sock,whatnot} model, where things get created/removed dynamically in the shared /run space instead of a dedicated /run/foo space. That doesn't play well when we containerize other things that depend on such services and/or communicate with them from containers. In the containers world we expect that to be /run/foo/pidfile, socket, etc. instead.

That also resembles the situation with the design choices made for isolating /dev/ptmx from containers: it is a host-only concept and does not "instantiate" with containers.
...Meant to pinpoint issues with /dev/ptmx and /dev bind mounts in containers. Who could have guessed that placing device nodes in the /dev root would turn out to be a sub-optimal design choice in the containers world? But we have to live with it now, and inevitably have /run and /dev in containers as well.
I reported a Podman issue: https://github.com/containers/podman/issues/14577

I think it would be best if the .containerenv file were bind-mounted into the container, rather than written out into the container filesystem. If done via bind mount, only the container should be affected.

Alternatively, Podman could allow us to disable containerenv file creation via something like a `--no-containerenv` option on `podman run` and `podman create`. Let's see what the Podman devs think about this.
Thanks Jiri!

I've added some more data. imho the "--no-containerenv" option is probably the best for both podman and us, since it's a low-impact code change and has no impact on the current, default behavior.

Moreover, after reading some more info about that file, we may face some not-so-nice issues anyway, since its content depends on the way the container is started; since we share /run everywhere, its content would be that of the most recently started container (I think?)... which may be a bad thing.

Anyway, let's follow things. It would also be good to:
- create a proper BZ targeting el9
- link that github issue in it
- mention "we're from OSP" - we have a kind of "support agreement" related to backports, though it may not be useful anymore with osp-17 on el9 (not sure if we're taking podman from el9, or still filtering via some specific repo/stream).

Would you mind taking care of that, since you're at it? :)

Cheers,

C.
There is now a fix in upstream Podman; I reported downstream bug 2097694 for a backport into RHEL 9. Can someone who reproduced the issue please provide version details (RHEL version, podman version) there in a comment?
Lemme have a look; I can take my QE job data, I guess, and push that info to the related BZ.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2022:6543