Description of problem:
Logging in to the guest over SSH always fails with the following error:

  PTY allocation request failed on channel 0

Version-Release number of selected component (if applicable):
rhel-atomic-rhevm-7.5.3-6.x86_64.rhevm.ova

How reproducible:
100%

Steps to Reproduce:
1. Boot up the RHEV guest
2. Try to log in to the guest with ssh:
   # ssh cloud-user@$guest_ip

Actual results:
Cannot log in to the guest

Expected results:
Logs in to the guest normally

Additional info:
If we connect to the guest with VNC and then disable SELinux, SSH works.
OK, yeah I can reproduce this with the RHEVM image, but *not* the qcow2. Disabling SELinux and attempting to SSH makes the following denial show up:

Aug 14 14:02:27 rhelah-rhbz1615837 kernel: type=1400 audit(1534255347.976:4): avc: denied { open } for pid=1358 comm="sshd" path="/dev/pts/ptmx" dev="devpts" ino=2 scontext=system_u:system_r:sshd_t:s0-s0:c0.c1023 tcontext=system_u:object_r:devpts_t:s0 tclass=chr_file

Hmm, odd that it's specific to one of them. Need to check what else we change there other than the node agent.
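(Aside: when sifting through logs for denials like the one above, the fields can be pulled out mechanically. The helper below is a hypothetical illustration for this bug, not existing audit tooling; `parse_avc` and its regex are my own.)

```python
import re

# Hypothetical helper (not existing tooling): pull the interesting fields
# out of an AVC denial line when sifting through logs.
AVC_RE = re.compile(
    r'avc:\s+denied\s+\{\s*(?P<perm>[^}]+?)\s*\}'
    r'.*?comm="(?P<comm>[^"]+)"'
    r'.*?path="(?P<path>[^"]+)"'
    r'.*?scontext=(?P<scontext>\S+)'
    r'.*?tcontext=(?P<tcontext>\S+)'
    r'.*?tclass=(?P<tclass>\S+)'
)

def parse_avc(line):
    """Return a dict of fields from an AVC denial line, or None if no match."""
    m = AVC_RE.search(line)
    return m.groupdict() if m else None

# The exact denial seen in this bug:
denial = ('type=1400 audit(1534255347.976:4): avc: denied { open } '
          'for pid=1358 comm="sshd" path="/dev/pts/ptmx" dev="devpts" ino=2 '
          'scontext=system_u:system_r:sshd_t:s0-s0:c0.c1023 '
          'tcontext=system_u:object_r:devpts_t:s0 tclass=chr_file')

print(parse_avc(denial))
```

Running it on the denial above shows sshd being refused `open` on /dev/pts/ptmx (tclass chr_file), which is what points at the pty setup rather than sshd itself.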
Sorry, premature reassignment to selinux-policy (though input from SELinux folks is appreciated).
I unpacked the .ova into a .qcow2 and booted it on my local libvirt stack. When trying to SSH to the host, I saw the following:

$ ssh rhel-atomic-rhevm-7.5.3-6.x86_64.rhevm.vm0814b
Warning: Permanently added '192.168.124.83' (ECDSA) to the list of known hosts.
PTY allocation request failed on channel 0

I searched Bugzilla and found this BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1329326

If the 'ovirt-guest-agent' is mounting /dev into its container, it's possible we could be encountering something like this.
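For anyone else reproducing this: an .ova is just a tar archive, so the disk image can be pulled out with standard tooling. A minimal sketch (the `extract_disk` helper and filenames are illustrative, not part of any RHEV tooling):

```python
import tarfile

def extract_disk(ova_path, dest="."):
    """Extract the first disk image member from an OVA (tar) archive.

    Returns the member name that was extracted, or None if no disk
    image was found in the archive.
    """
    with tarfile.open(ova_path) as tar:
        for member in tar.getmembers():
            # OVAs typically carry a .vmdk or .qcow2 disk alongside the OVF.
            if member.name.endswith((".qcow2", ".vmdk", ".img")):
                tar.extract(member, path=dest)
                return member.name
    return None
```

After extraction, the qcow2 can be booted directly under libvirt/QEMU for the SSH test above.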
Yeah, I also found this upstream issue, which corroborates this: https://github.com/openshift/openshift-ansible/issues/1779

Doing `systemctl disable rhevm-guest-agent.service` and rebooting fixes it. Sounds like it was fixed by https://github.com/moby/moby/pull/16639, though we might have regressed since then? Anyway, re-assigning to the docker component.
Ahh right, this is a system container. There's a runc version of the bug: https://github.com/opencontainers/runc/issues/80, which was fixed by this patch: https://github.com/opencontainers/runc/pull/742. That fix does seem to be in runc-1.0.0-36.rc5.dev.gitad0f525.el7.x86_64.
Hmm, the config.json for that system container looks suspect:

```
"mounts": [
    ...
    {
        "destination": "/dev",
        "type": "tmpfs",
        "source": "tmpfs",
        "options": ["nosuid", "strictatime", "mode=755", "size=65536k"]
    },
    {
        "destination": "/dev/pts",
        "type": "devpts",
        "source": "devpts",
        "options": ["nosuid", "noexec", "newinstance", "ptmxmode=0666", "mode=0620", "gid=5"]
    },
    {
        "destination": "/dev/shm",
        "type": "tmpfs",
        "source": "shm",
        "options": ["nosuid", "noexec", "nodev", "mode=1777", "size=65536k"]
    },
    {
        "destination": "/dev/mqueue",
        "type": "mqueue",
        "source": "mqueue",
        "options": ["nosuid", "noexec", "nodev"]
    },
    ...
    {
        "destination": "/dev",
        "type": "bind",
        "source": "/dev",
        "options": ["rw", "bind"]
    },
    {
        "destination": "/dev/pts",
        "type": "devpts",
        "source": "devpts",
        "options": ["nosuid", "noexec", "newinstance", "ptmxmode=0666", "mode=0620", "gid=5"]
    }
],
```

Why do we have two mounts for both /dev and /dev/pts? Anyway, dropping those last two mounts fixes the issue, though I'm not sure why the logic in `needsSetupDev()` didn't see that there was a bind mount to /dev and thus that `setupPtmx()` should've been skipped. We should also make sure to understand why that bind mount was added in the first place.

Leaving this in the hands of the containers team now.
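For illustration, duplicate destinations like these are easy to flag mechanically. The snippet below is a hypothetical helper, not existing runc or ovirt tooling; the embedded config is a trimmed version of the mounts list above, and `duplicate_destinations` is my own name:

```python
import json
from collections import Counter

def duplicate_destinations(config):
    """Return mount destinations that appear more than once in a
    runc-style config dict (a smell like the one in this bug)."""
    counts = Counter(m["destination"] for m in config.get("mounts", []))
    return sorted(d for d, n in counts.items() if n > 1)

# Trimmed version of the suspect config above; point json.load at a real
# config.json instead for an actual container.
config = json.loads("""
{
  "mounts": [
    {"destination": "/dev", "type": "tmpfs", "source": "tmpfs"},
    {"destination": "/dev/pts", "type": "devpts", "source": "devpts"},
    {"destination": "/dev/shm", "type": "tmpfs", "source": "shm"},
    {"destination": "/dev/mqueue", "type": "mqueue", "source": "mqueue"},
    {"destination": "/dev", "type": "bind", "source": "/dev"},
    {"destination": "/dev/pts", "type": "devpts", "source": "devpts"}
  ]
}
""")

print(duplicate_destinations(config))  # ['/dev', '/dev/pts']
```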
Tomas, do you know why the `config.json` has two mounts for /dev and /dev/pts for the `rhevm-guest-agent` system container?
The originating commit is about 1.5 years old: http://pkgs.devel.redhat.com/cgit/rpms/ovirt-guest-agent-docker/commit/?h=rhevm-4.3-rhel-7&id=0b6b551fb336fc071b194077db2eb977f43469c3

So it sounds like it could be a regression in how runc now handles this. (Though that `config.json` should still be cleaned up.)
Vinzenz, it looks like the config.json file for the ovirt-guest-agent node has some additional mounts that must be removed. Could you please take a look at it?
Tomas Golembiovsky, Vinzenz Feenstra: What is the status of this issue? It is blocking RHEL 7.5.3 Atomic Host (and therefore RHEL 7.5.3 GA).
I think we can skip respinning this RHEVM image for 7.5.3; let's definitely not block RHEL 7.5.3 GA on it.
I believe there is a regression in runc. See https://github.com/opencontainers/runc/issues/1866

Try installing runc-1.0.0-27.rc5.dev.git4bb1fe4.el7 to confirm whether that version works. It is important to reboot the hosts beforehand and make sure that /dev/ptmx is a character file and NOT a symbolic link.
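As a quick sanity check for that last point, something like this reports what /dev/ptmx actually is (a throwaway script of mine, not part of runc or the test plan):

```python
import os
import stat

def classify(path):
    """Say whether path is a character device, a symlink, or something else.

    Uses lstat so a symlink is reported as such rather than followed --
    a buggy runc could leave /dev/ptmx as a symlink instead of a char device.
    """
    st = os.lstat(path)
    if stat.S_ISLNK(st.st_mode):
        return "symlink -> " + os.readlink(path)
    if stat.S_ISCHR(st.st_mode):
        return "character device"
    return "other"

if __name__ == "__main__" and os.path.exists("/dev/ptmx"):
    print("/dev/ptmx is a", classify("/dev/ptmx"))
```

On a healthy host this should report a character device; "symlink -> ..." would indicate the broken state described above.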
If there is a regression in runc there is not much we can do. IIRC we do need /dev in ovirt-guest-agent to extract information about the host, so a fix will probably not be as easy as comment #7 suggests. I'll have a look nevertheless.
Colin - we need to respin the RHV OVA image for 7.5.3. There is a high-touch kernel CVE that was just released that we need to fix in this OVA. I agree that if everything else is ready for Atomic Host 7.5.3 GA, we need not block on this (assuming PM agrees). But we do need to fix it ASAP for 7.5.3 because of the CVE.
Joy, the runc build with the fix is available at https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=17871868 . PTAL.
As I understand it, a respin of the container is not necessary at the moment, since the runc regression was fixed. I will leave the bug open to track the potential cleanup of our config.json.
The most recent 7.5.3 AH compose includes a fixed version of `runc` which appears to solve this issue. I tested the same way as before, unpacking the RHEVM OVA and booting the QCOW2 locally. SSH login was successful. I'll defer to the Virt QE folks to mark this as VERIFIED, as they have a proper RHEV environment to test with.
Hi Micah,

Currently the RHEVM OVA is already fixed after the runc update. But as comment #18 said, this bug is now being used to track the config.json update, so I will follow up and update the bug status after the config.json is updated.

(In reply to Micah Abbott from comment #19)
> The most recent 7.5.3 AH compose includes a fixed version of `runc` which
> appears to solve this issue.
>
> I tested the same way as before, unpacking the RHEVM OVA and booting the
> QCOW2 locally. SSH login was successful.
>
> I'll defer to the Virt QE folks to mark this as VERIFIED, as they have a
> proper RHEV environment to test with.
Re-targeting to 4.3.1 since it is missing a patch, an acked blocker flag, or both
The container image has been deprecated.