Bug 1615837 - config.json in ovirt-guest-agent container contains duplicate mount points
Summary: config.json in ovirt-guest-agent container contains duplicate mount points
Keywords:
Status: CLOSED EOL
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-guest-agent
Version: 2.1.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: Tomáš Golembiovský
QA Contact: Lukas Svaty
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-08-14 11:18 UTC by Joy Pu
Modified: 2022-03-16 08:19 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1616112 (view as bug list)
Environment:
Last Closed: 2021-05-11 10:04:13 UTC
oVirt Team: Virt
Target Upstream Version:
Embargoed:


Description Joy Pu 2018-08-14 11:18:24 UTC
Description of problem:

Logging in to the guest over SSH always fails with the following error:
PTY allocation request failed on channel 0


Version-Release number of selected component (if applicable):
rhel-atomic-rhevm-7.5.3-6.x86_64.rhevm.ova 

How reproducible:
100%


Steps to Reproduce:
1. Boot up the RHEV guest.
2. Try to log in to the guest with ssh:
# ssh cloud-user@$guest_ip


Actual results:
Cannot log in to the guest.

Expected results:
Login to the guest succeeds normally.


Additional info:

If we connect to the guest over VNC and then disable SELinux, SSH works.

Comment 2 Jonathan Lebon 2018-08-14 14:11:30 UTC
OK, yeah I can reproduce this with the RHEVM image, but *not* the qcow2. Disabling SELinux and attempting to SSH makes the following denial show up:

Aug 14 14:02:27 rhelah-rhbz1615837 kernel: type=1400 audit(1534255347.976:4): avc:  denied  { open } for  pid=1358 comm="sshd" path="/dev/pts/ptmx" dev="devpts" ino=2 scontext=system_u:system_r:sshd_t:s0-s0:c0.c1023 tcontext=system_u:object_r:devpts_t:s0 tclass=chr_file

Hmm, odd that it's specific to one of them. Need to check what else we change there other than the node agent.
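
For readers less used to AVC records, here is a minimal Python sketch that splits the denial above into its fields. The `parse_avc` helper and its regular expressions are illustrative assumptions, not part of any SELinux tooling; dedicated tools such as `ausearch` or `audit2why` are the usual way to inspect denials.

```python
import re

# Record copied from the denial above, with the syslog prefix removed.
AVC_LINE = (
    'type=1400 audit(1534255347.976:4): avc:  denied  { open } for  pid=1358 '
    'comm="sshd" path="/dev/pts/ptmx" dev="devpts" ino=2 '
    'scontext=system_u:system_r:sshd_t:s0-s0:c0.c1023 '
    'tcontext=system_u:object_r:devpts_t:s0 tclass=chr_file'
)


def parse_avc(line):
    """Return the denied permissions and key=value fields of one AVC line.

    Hypothetical helper for illustration only.
    """
    perms = re.search(r"avc:\s+denied\s+\{([^}]*)\}", line)
    if perms is None:
        raise ValueError("not an AVC denial record")
    # Everything after the permission set is key=value, optionally quoted.
    tail = line[perms.end():]
    pairs = re.findall(r'(\w+)=("[^"]*"|\S+)', tail)
    fields = {key: value.strip('"') for key, value in pairs}
    fields["permissions"] = perms.group(1).split()
    return fields


record = parse_avc(AVC_LINE)
print(record["comm"], record["permissions"], record["path"])
print(record["scontext"], "->", record["tcontext"], record["tclass"])
```

Read this way, the record says that `sshd` (domain `sshd_t`) was denied `open` on the character device `/dev/pts/ptmx`, which carries the label `devpts_t`.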

Comment 3 Jonathan Lebon 2018-08-14 14:12:43 UTC
Sorry, premature reassignment to selinux-policy (though input from SELinux folks is appreciated).

Comment 4 Micah Abbott 2018-08-14 14:16:50 UTC
I unpacked the .ova into a .qcow2 and booted it on my local libvirt stack.

When trying to SSH to the host, I saw the following:

$ ssh rhel-atomic-rhevm-7.5.3-6.x86_64.rhevm.vm0814b
Warning: Permanently added '192.168.124.83' (ECDSA) to the list of known hosts.
PTY allocation request failed on channel 0


I searched Bugzilla and found this BZ:

https://bugzilla.redhat.com/show_bug.cgi?id=1329326


If the 'ovirt-guest-agent' is mounting /dev into its container, it's possible we could be encountering something like this?

Comment 5 Jonathan Lebon 2018-08-14 14:24:58 UTC
Yeah, I also found this upstream issue which also corroborates this:

https://github.com/openshift/openshift-ansible/issues/1779

Doing `systemctl disable rhevm-guest-agent.service` and rebooting fixes it.

Sounds like it was fixed by https://github.com/moby/moby/pull/16639, though we might have regressed since then? Anyway, re-assigning to docker component.

Comment 6 Jonathan Lebon 2018-08-14 14:35:31 UTC
Ahh right, this is a system container. There's a runc version of the bug:
https://github.com/opencontainers/runc/issues/80

Which was fixed by this patch:
https://github.com/opencontainers/runc/pull/742

Which does seem to be in runc-1.0.0-36.rc5.dev.gitad0f525.el7.x86_64.

Comment 7 Jonathan Lebon 2018-08-14 15:30:40 UTC
Hmm, the config.json for that system container looks suspect:

```
"mounts": [
    ...
    {
        "destination": "/dev",
        "type": "tmpfs",
        "source": "tmpfs",
        "options": [
            "nosuid",
            "strictatime",
            "mode=755",
            "size=65536k"
        ]
    },
    {
        "destination": "/dev/pts",
        "type": "devpts",
        "source": "devpts",
        "options": [
            "nosuid",
            "noexec",
            "newinstance",
            "ptmxmode=0666",
            "mode=0620",
            "gid=5"
        ]
    },
    {
        "destination": "/dev/shm",
        "type": "tmpfs",
        "source": "shm",
        "options": [
            "nosuid",
            "noexec",
            "nodev",
            "mode=1777",
            "size=65536k"
        ]
    },
    {
        "destination": "/dev/mqueue",
        "type": "mqueue",
        "source": "mqueue",
        "options": [
            "nosuid",
            "noexec",
            "nodev"
        ]
    },
    ...
    {
        "destination": "/dev",
        "type": "bind",
        "source": "/dev",
        "options": [
            "rw",
            "bind"
        ]
    },
    {
        "destination": "/dev/pts",
        "type": "devpts",
        "source": "devpts",
        "options": [
            "nosuid",
            "noexec",
            "newinstance",
            "ptmxmode=0666",
            "mode=0620",
            "gid=5"
        ]
    }
],
```

Why do we have two mounts for both /dev and /dev/pts? Anyway, dropping those last two mounts fixes the issue, though I'm not sure why the logic in `needsSetupDev()` didn't see that there was a bind mount to /dev and thus that `setupPtmx()` should've been skipped. We should also make sure to understand why that bind mount was added in the first place.

Leaving this in the hands of the containers team now.
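
As a rough way to spot this kind of duplication, the following Python sketch lists every mount destination that appears more than once in an OCI-style `config.json`. The `duplicate_destinations` helper and the trimmed `SAMPLE` data (mount options omitted) are assumptions made for illustration; on a real system the dictionary would be loaded from the container's actual `config.json`.

```python
import json
from collections import defaultdict


def duplicate_destinations(config):
    """Map each mount destination listed more than once to its indexes.

    Hypothetical helper; expects a dict with a "mounts" list as in config.json.
    """
    seen = defaultdict(list)
    for index, mount in enumerate(config.get("mounts", [])):
        seen[mount["destination"]].append(index)
    return {dest: idxs for dest, idxs in seen.items() if len(idxs) > 1}


# Trimmed-down stand-in for the mounts shown above (options omitted).
SAMPLE = json.loads("""
{"mounts": [
  {"destination": "/dev", "type": "tmpfs", "source": "tmpfs"},
  {"destination": "/dev/pts", "type": "devpts", "source": "devpts"},
  {"destination": "/dev/shm", "type": "tmpfs", "source": "shm"},
  {"destination": "/dev/mqueue", "type": "mqueue", "source": "mqueue"},
  {"destination": "/dev", "type": "bind", "source": "/dev"},
  {"destination": "/dev/pts", "type": "devpts", "source": "devpts"}
]}
""")

for dest, idxs in duplicate_destinations(SAMPLE).items():
    print(dest, "appears at mount indexes", idxs)
```

The sketch only reports duplicates. Deciding which entry to keep is a separate question, since the later `/dev` entry is a bind mount of the host's `/dev` rather than a copy of the earlier tmpfs entry.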

Comment 8 Micah Abbott 2018-08-14 15:44:53 UTC
Tomas, do you know why the `config.json` has two mounts for /dev and /dev/pts for the `rhevm-guest-agent` system container?

Comment 9 Jonathan Lebon 2018-08-14 15:50:28 UTC
The originating commit is about 1.5 yrs old: http://pkgs.devel.redhat.com/cgit/rpms/ovirt-guest-agent-docker/commit/?h=rhevm-4.3-rhel-7&id=0b6b551fb336fc071b194077db2eb977f43469c3

So it sounds like it could be a regression in runc on how it's now handled? (Though that `config.json` should still be cleaned up.)

Comment 10 Giuseppe Scrivano 2018-08-14 16:16:29 UTC
Vinzenz, it looks like the config.json file for the ovirt-guest-agent node has some additional mounts that must be removed.  Could you please take a look at it?

Comment 12 Laurie Friedman 2018-08-14 19:45:13 UTC
Tomas Golembiovsky, Vinzenz Feenstra: What is the status of this issue?  It is blocking RHEL 7.5.3 Atomic Host (and therefore RHEL 7.5.3 GA).

Comment 13 Colin Walters 2018-08-14 19:55:03 UTC
I think we can skip respinning this RHEVM image for 7.5.3; let's definitely not block RHEL 7.5.3 GA on it.

Comment 14 Qian Cai 2018-08-14 20:50:39 UTC
I believe there is a regression in runc. See,

https://github.com/opencontainers/runc/issues/1866

Try to install

runc-1.0.0-27.rc5.dev.git4bb1fe4.el7

to confirm whether that version works. It is important to reboot the hosts before running, and to make sure that /dev/ptmx is a character file and NOT a symbolic link.
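
The /dev/ptmx check mentioned above could be scripted roughly as follows. This is a minimal sketch using only the Python standard library; the `describe_node` helper is a hypothetical name, and running `ls -l /dev/ptmx` on the host gives the same information.

```python
import os
import stat


def describe_node(path="/dev/ptmx"):
    """Classify path without following symlinks (hypothetical helper)."""
    mode = os.lstat(path).st_mode
    if stat.S_ISLNK(mode):
        return "symlink -> " + os.readlink(path)
    if stat.S_ISCHR(mode):
        return "character device"
    return "other"


# Only inspect /dev/ptmx when it exists on this machine.
if os.path.lexists("/dev/ptmx"):
    print("/dev/ptmx:", describe_node())
```

`os.lstat` is used instead of `os.stat` so that a symbolic link is reported as a link rather than as whatever it points to.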

Comment 15 Tomáš Golembiovský 2018-08-14 21:23:15 UTC
If there is a regression in runc, there is not much we can do. IIRC we do need /dev in ovirt-guest-agent to extract information about the host, so a fix will probably not be as easy as comment #7 suggests. I'll have a look nevertheless.

Comment 16 Laurie Friedman 2018-08-14 21:25:55 UTC
Colin - we need to respin the RHV OVA image for 7.5.3.  There is a high-touch kernel CVE that was just released that we need to fix in this OVA.
I agree if everything else is ready for Atomic Host 7.5.3 GA we can not block on this (assuming PM agrees).  But we do need to fix it ASAP for 7.5.3 because of the CVE.

Comment 17 Lokesh Mandvekar 2018-08-15 01:20:40 UTC
Joy, the runc build with the fix is available at https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=17871868 . PTAL.

Comment 18 Tomáš Golembiovský 2018-08-15 08:01:09 UTC
As I understand it, a respin of the container is not necessary at the moment since the runc regression was fixed. I will leave the bug open to track the potential cleanup of our config.json.

Comment 19 Micah Abbott 2018-08-15 15:37:42 UTC
The most recent 7.5.3 AH compose includes a fixed version of `runc` which appears to solve this issue.

I tested the same way as before, unpacking the RHEVM OVA and booting the QCOW2 locally.  SSH login was successful.

I'll defer to the Virt QE folks to mark this as VERIFIED, as they have a proper RHEV environment to test with.

Comment 20 Joy Pu 2018-08-27 02:09:26 UTC
Hi Micah,

Currently the RHEVM OVA is already fixed after the runc update. But as comment #18 said, this bug is now being used to track the config.json update, so I will follow it and update the bug status after the config.json is updated.

(In reply to Micah Abbott from comment #19)
> The most recent 7.5.3 AH compose includes a fixed version of `runc` which
> appears to solve this issue.
> 
> I tested the same way as before, unpacking the RHEVM OVA and booting the
> QCOW2 locally.  SSH login was successful.
> 
> I'll defer to the Virt QE folks to mark this as VERIFIED, as they have a
> proper RHEV environment to test with.

Comment 22 Ryan Barry 2019-01-21 14:53:41 UTC
Re-targeting to 4.3.1 since it is missing a patch, an acked blocker flag, or both

Comment 27 Tomáš Golembiovský 2021-05-11 10:04:13 UTC
The container image has been deprecated.

