Description of problem:
In the web console, after connecting to a pod's terminal, if I jump over to the Logs tab, or jump out and come back to the Terminal tab, it shows the terminal as closed. I click Reconnect and it does not reconnect. I have tried this with different pods and I get the same result regardless.

Version-Release number of selected component (if applicable):
OCP 4.8, build 4.8.0-0.nightly-s390x-2021-03-22-155743

How reproducible:
Every time, with different pods.

Steps to Reproduce:
1. Create a deployment that runs stress-ng with an IO workload and open Pods -> Pod details -> Terminal.
2. After the container is created, check the Logs tab, then use the Terminal tab to run iostat.
3. Leave the terminal view, either by going to another project or by going back to check on the logs, then return to the Terminal tab. This is when the failure occurs.

Actual results:
The terminal disconnects and can no longer connect.

Expected results:
Should be able to switch back to the terminal and be connected.

Additional info:
Error message when I try to go back to the Terminal tab:

ERRO[0000] exec failed: container_linux.go:367: starting container process caused: open /dev/pts/4294967296: no such file or directory
command terminated with non-zero exit code: exit status 1
The terminal connection has closed.
connecting to openshift-apiserver

My workload shows the same output:

ERRO[0000] exec failed: container_linux.go:367: starting container process caused: open /dev/pts/4294967296: no such file or directory
command terminated with non-zero exit code: exit status 1
The terminal connection has closed.
connecting to iomixstress1

I am not sure which logs to provide, so please let me know what additional info you will need.
Johanna, could you please attach at least a screenshot of the error? I'm not sure whether that error appears in the console or is printed in the terminal. I checked for `container_linux.go` in our codebase and there is no such file, so I have a feeling the error comes from Kubernetes, OpenShift, or the pod itself.
Created attachment 1765518 [details]
Screenshot of broken terminal
Created attachment 1765708 [details]
Updated screenshot including worker node

This screenshot includes the message shown when trying to connect to a worker node terminal. For this one it did not work on the first try.
Created attachment 1765878 [details]
Screen recording
I tried on 4.8.0-0.nightly-2021-03-22-104536; re-visiting the pod terminal works for me. See the attached screen recording.
Thanks Yadan. I tried the same pod on my KVM environment, which is running build 4.8.0-0.nightly-s390x-2021-03-22-155743 (newer), so I would assume the same results. Unfortunately, I hit the issue again, just as before. I am running on a z15; not sure if that matters. What are you running on? Are there any specific logs I can take a look at to see why the terminal keeps breaking for me? Thank you for your help.
I hit the same issue on a z14 with z/VM and also version 4.8.0-0.nightly-s390x-2021-03-22-155743. I was able to access the terminal once, after the creation of the pod. After some time, I got the error as mentioned above.
Same result with oc rsh or oc exec.

Directly after pod deployment:

[root@m3558001 4_8]# oc create -f 4_8_LSO_POD_worker001_MP.yaml
pod/so-test02 created
[root@m3558001 4_8]# oc rsh so-test02
sh-4.2#

After a couple of minutes, the session is terminated and I am not able to log in again:

[root@m3558001 4_8]# oc rsh so-test02
ERRO[0000] exec failed: container_linux.go:367: starting container process caused: open /dev/pts/4294967296: no such file or directory
command terminated with exit code 1
It seems like a general pod issue. I would like to check the pod status with a couple of commands and confirm why the container is stopped:

* oc describe pod so-test02
* oc logs -f so-test02
[root@m3558001 4_8]# oc describe pod so-test02
Name:         so-test02
Namespace:    default
Priority:     0
Node:         worker-001.m3558001.lnxne.boe/10.107.1.56
Start Time:   Fri, 26 Mar 2021 10:30:50 +0100
Labels:       <none>
Annotations:  k8s.v1.cni.cncf.io/network-status:
                [{
                    "name": "",
                    "interface": "eth0",
                    "ips": [
                        "10.128.2.36"
                    ],
                    "default": true,
                    "dns": {}
                }]
              k8s.v1.cni.cncf.io/networks-status:
                [{
                    "name": "",
                    "interface": "eth0",
                    "ips": [
                        "10.128.2.36"
                    ],
                    "default": true,
                    "dns": {}
                }]
Status:       Running
IP:           10.128.2.36
IPs:
  IP:  10.128.2.36
Containers:
  solsotest02:
    Container ID:   cri-o://6bd322324327be641668ae7c8e0a89ce0eae0dd5ad6437fbce98bd0962129b12
    Image:          sys-loz-test-team-docker-local.artifactory.swg-devops.com/s390x_blank_base_image:3.0
    Image ID:       sys-loz-test-team-docker-local.artifactory.swg-devops.com/s390x_blank_base_image@sha256:62403430158da217a0056ac0b4f8dad258d06fba132bc59a0f96aaebbff69106
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Fri, 26 Mar 2021 10:30:53 +0100
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /lsodata from localpvcso2 (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-dlhjm (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  localpvcso2:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  localstorage-mp
    ReadOnly:   false
  default-token-dlhjm:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-dlhjm
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason          Age   From               Message
  ----    ------          ----  ----               -------
  Normal  Scheduled       17m   default-scheduler  Successfully assigned default/so-test02 to worker-001.m3558001.lnxne.boe
  Normal  AddedInterface  17m   multus             Add eth0 [10.128.2.36/23]
  Normal  Pulled          17m   kubelet            Container image "sys-loz-test-team-docker-local.artifactory.swg-devops.com/s390x_blank_base_image:3.0" already present on machine
  Normal  Created         17m   kubelet            Created container solsotest02
  Normal  Started         17m   kubelet            Started container solsotest02

No logs available for the pod.
Moving to the Node team since this does not look like a Console issue, but rather a general one.
Raising the severity of the bug since a base function is currently not working.
This looks suspiciously like https://github.com/moby/moby/issues/36467, which was supposed to be fixed in https://github.com/opencontainers/runc/pull/1727/. Kir, can you PTAL? It looks like a runc regression.
Hi Kir and node team, this bug is blocking a base function, as discovered by the multi-arch s390x team. Can we therefore consider setting the "Blocker+" flag on this bug?
Hi, this is also blocking workload development on 4.8. I had to go back to 4.7 to work around the issue. Anything we can do to move this one along would be great! Thank you.
It is also blocking me from creating a pod and doing some testing with block storage and other things like iSCSI, which I have to check inside the pod.
> ERRO[0000] exec failed: container_linux.go:367: starting container process caused: open /dev/pts/4294967296: no such file or directory

The number 4294967296 is -1. It seems that someone passes -1 to runc, and instead of treating it as an error, runc fails to open it. So the bug is at an upper level (but I will take a look at how runc interprets it -- maybe it needs to provide a more sensible error).
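For illustration only -- this is not the actual runc code, and openSlavePty is a made-up helper -- here is a minimal Go sketch of how a runtime would turn the pty number it was handed into a /dev/pts path and open it, which is why a bogus number immediately surfaces as "no such file or directory":

// Minimal sketch, not the actual runc code: the runtime formats the pty
// number it was handed into a /dev/pts path and opens it. A bogus number
// such as 4294967296 makes the open fail with ENOENT, which is exactly the
// error shown in the web console terminal.
package main

import (
	"fmt"
	"os"
)

// openSlavePty is a hypothetical helper used only for this illustration.
func openSlavePty(n uint64) (*os.File, error) {
	path := fmt.Sprintf("/dev/pts/%d", n)
	return os.OpenFile(path, os.O_RDWR, 0)
}

func main() {
	if _, err := openSlavePty(4294967296); err != nil {
		// open /dev/pts/4294967296: no such file or directory
		fmt.Println("exec failed:", err)
	}
}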
> The number 4294967296 is -1.

I was wrong, it is 1 shifted left by 32 bits.

(gdb) p /x 4294967296
$1 = 0x100000000
(gdb) p 1ULL << 32
$2 = 4294967296

Peter's analysis is correct. This is a regression in containerd/console, which was once fixed by https://github.com/containerd/console/pull/20 and then broken again (most probably by https://github.com/containerd/console/commit/f1b333f2c5050f2c71fcf782caa0b7ccb540bfcb). Proposed fix: https://github.com/containerd/console/pull/51
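To make the s390x angle concrete, here is a minimal, self-contained Go sketch (not the containerd/console code itself; the byte-buffer simulation stands in for the real TIOCGPTN ioctl) of what goes wrong when a 32-bit pty number is written into a 64-bit integer on a big-endian machine: the 4 bytes land in the high half, so pty 1 reads back as 1<<32.

// Minimal sketch, assuming the kernel's TIOCGPTN ioctl writes a 32-bit pty
// number in native byte order. A plain byte buffer is used instead of a real
// ioctl so the example runs anywhere.
package main

import (
	"encoding/binary"
	"fmt"
)

func main() {
	const ptyNumber uint32 = 1 // what the kernel actually reports

	// Pretend this 8-byte buffer is the 64-bit integer the buggy caller
	// handed to the ioctl.
	var buf [8]byte

	// The kernel writes only the first 4 bytes; s390x is big-endian.
	binary.BigEndian.PutUint32(buf[:4], ptyNumber)

	// Reading all 8 bytes back as one big-endian 64-bit value shifts the
	// pty number left by 32 bits: 1 becomes 4294967296.
	got := binary.BigEndian.Uint64(buf[:])
	fmt.Printf("expected /dev/pts/%d, got /dev/pts/%d\n", ptyNumber, got)
}

Running this prints "expected /dev/pts/1, got /dev/pts/4294967296", matching the path in the error message above.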
Filed runc issue: https://github.com/opencontainers/runc/issues/2896 Hope we'll be able to fix this in time for rc94.
@jhusta, is there a way for me to get access to an s390x environment? I'd like to run some tests. No need to have anything installed; just bare Linux is fine.
Maybe reach out to Red Hat support's Prashanth Sundararaman and see if they have Red Hat-internal s390x resources you can get access to?
Cc'ing Prashanth from multi-arch for awareness per Comment 21 and Comment 20
Cc'ing Doug Slavens from our multi-arch team for availability of s390x environment for Kir
I was able to test Kir's fix on my s390x environment with the Go tests, and it works fine.
This is fixed by https://github.com/opencontainers/runc/pull/2898, which is merged; the fix will be available in runc rc94. I have backported this to the rhaos-4.8 branch: https://github.com/projectatomic/runc/pull/46
I tried it on:

Client Version: 4.8.0-0.nightly-s390x-2021-04-19-052408
Server Version: 4.8.0-0.nightly-s390x-2021-04-19-052408
Kubernetes Version: v1.21.0-rc.0+2993be8

[core@worker-01 ~]$ cat /etc/os-release
NAME="Red Hat Enterprise Linux CoreOS"
VERSION="48.84.202104171019-0"

and it looks fine to me. I "exit" the console and it reconnects directly. Also, jumping off and back onto the console tab works fine.
Server Version: 4.8.0-0.nightly-s390x-2021-04-21-170513
Kubernetes Version: v1.21.0-rc.0+3ced7a9

I tested from the CLI and the console, and the terminal is now working with no issues. Thank you.
verified on version: 4.8.0-0.nightly-2021-04-22-225832
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438