Description of problem:
- Intermittent errors are observed while running the `oc debug -T node/NODE_NAME` command in a loop.

Version-Release number of selected component (if applicable):
- Observed in 4.7 and 4.8.

How reproducible:
- Random (not always).

Steps to Reproduce:
- for i in {1..50}; do oc get nodes -o name | xargs -n 1 -i sh -c 'oc debug -T {} -- chroot /host uptime'; sleep 10; done

Actual results:
- Sometimes the errors below appear on random nodes (not always the same nodes):
  [1] error: unable to upgrade connection: container container-00 not found in pod worker-2<LAB>-debug_<NS>
  [2] error: Internal error occurred: error attaching to container: container is not created or running

Expected results:
- The output of the command, in this case `uptime`.
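For reference, a minimal equivalent of the reproduction loop, assuming bash; it iterates the node names directly instead of combining xargs -n 1 with -i (a combination xargs itself flags as mutually exclusive in the output later in this report):

for i in {1..50}; do
  # hit every node once per iteration, then pause before the next round
  for node in $(oc get nodes -o name); do
    oc debug -T "$node" -- chroot /host uptime
  done
  sleep 10
done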
Can you provide more detailed output from those cases where this breaks?
@Maciej Szulik <maszulik> See the original support case description of the issue and its effect on the end customer (Nokia NOM), copied below:

What problem/issue/behavior are you having trouble with? What do you expect to see?

We had a requirement to launch a pod, execute the provided curl command, print the result, and exit (terminating the pod on exit). We used the "oc run" command for this purpose, and the command looks like:

oc run -it --rm --image=image-registry.openshift-image-registry.svc:5000/cal-shared-product/nmcal-helper-utils:v1.0 nmcal-helper-utils-123 -n cal-shared-product --restart=Never -- /bin/sh -c "<CURL_COMMAND>"

It works; however, we frequently see an error message printed during this operation even though the command execution completes successfully. The error message has two slight variants:

Error attaching, falling back to logs: Internal error occurred: error attaching to container: container is not created or running
Error attaching, falling back to logs: unable to upgrade connection: container nmcal-helper-utils-123 not found in pod nmcal-helper-utils-123_cal-shared-product

Expectation: if it is able to perform the operation successfully, why does it throw an error? This creates issues for us while processing the result.

<<<CG: The bug creates issues for Nokia NOM's automation scripts.>>>

I have also attached files containing the logs (with log levels 7 and 8) for both the successful and failure scenarios.

What is the business impact? Please also provide timeframe information.

Even though the command execution is successful, the error present in the output causes issues for our result processing.

Colum Gaynor - Senior Partner Success Manager, Global Account
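One possible mitigation on the automation side is sketched below. It assumes (this is not confirmed anywhere in this report) that the "Error attaching, falling back to logs" message is printed on the client's stderr, and it drops -t so the container output is not merged with client messages through a TTY; result parsing then only sees what the curl command printed:

# Sketch only: capture stdout for parsing and divert the client's stderr.
# "attach-warnings.log" is a placeholder file name used for illustration.
result=$(oc run -i --rm --restart=Never \
  --image=image-registry.openshift-image-registry.svc:5000/cal-shared-product/nmcal-helper-utils:v1.0 \
  nmcal-helper-utils-123 -n cal-shared-product \
  -- /bin/sh -c "<CURL_COMMAND>" 2>attach-warnings.log)
echo "$result"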
I'm working on the backports; the PRs will be landing today.
As soon as https://github.com/openshift/oc/pull/1270 merges, this should be available in 4.10.
@Maciej Szulik <maszulik> ----> THANK YOU VERY MUCH. This made my week! Colum Gaynor - Senior Partner Success Manager, Nokia Global Account
With the merged PR, I can still reproduce this issue:

[root@localhost oc]# oc version --client -oyaml
clientVersion:
  buildDate: "2022-10-25T04:39:50Z"
  compiler: gc
  gitCommit: 8df677dc147fe8297d90c4757154469a931bdb90
  gitTreeState: clean
  gitVersion: 4.10.0-202210250416.p0.g8df677d.assembly.stream-8df677d
  goVersion: go1.17.12
  major: ""
  minor: ""
  platform: linux/amd64
releaseClientVersion: 4.10.39

[root@localhost oc]# git log
commit 8df677dc147fe8297d90c4757154469a931bdb90 (HEAD -> release-4.10, origin/release-4.10)
Merge: 442535c4d 39057a282
Author: OpenShift Merge Robot <openshift-merge-robot.github.com>
Date: Thu Oct 20 09:20:56 2022 -0400

    Merge pull request #1270 from soltysh/bug2015119

    Bug 2015119: bump(k8s.io/kubectl) to pick up k/k#110764

for i in {1..50}; do oc get nodes -o name | xargs -n 1 -i sh -c 'oc debug -T {} -- chroot /host uptime'; sleep 10; done
xargs: warning: options --max-args and --replace/-I/-i are mutually exclusive, ignoring previous --max-args value
Starting pod/ip-10-0-131-116us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
03:54:22 up 1:22, 0 users, load average: 1.75, 1.76, 1.27
....
Removing debug pod ...
error: unable to upgrade connection: container container-00 not found in pod ip-10-0-203-69us-east-2computeinternal-debug_default
Starting pod/ip-10-0-219-219us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
04:02:22 up 1:25, 0 users, load average: 0.24, 0.29, 0.23
Removing debug pod ...
xargs: warning: options --max-args and --replace/-I/-i are mutually exclusive, ignoring previous --max-args value
Starting pod/ip-10-0-131-116us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
04:02:37 up 1:30, 0 users, load average: 0.86, 1.11, 1.13
Removing debug pod ...
Starting pod/ip-10-0-150-56us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.150.56
If you don't see a command prompt, try pressing enter.
Removing debug pod ...
error: unable to upgrade connection: container container-00 not found in pod ip-10-0-150-56us-east-2computeinternal-debug_default
Starting pod/ip-10-0-174-131us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
04:06:36 up 1:29, 0 users, load average: 0.00, 0.03, 0.05
Removing debug pod ...
Starting pod/ip-10-0-190-1us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
04:06:40 up 1:35, 0 users, load average: 1.41, 1.13, 0.97
The fix in this bug only improved error #2 from the initial description, i.e. "Error attaching, falling back to logs...", turning it from an error into a warning. The other error is correct: it explicitly indicates that the client started creating the connection before the container was available. Based on the above, moving this back to QA.
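For callers that still hit the remaining race, one client-side option (an illustrative sketch only, not part of the fix in this bug) is simply retrying the debug call when the attach fails because the connection was opened before the container existed:

# Hypothetical retry wrapper for the "unable to upgrade connection:
# container ... not found" race; retries the whole oc debug invocation.
debug_uptime() {
  local node=$1 attempt
  for attempt in 1 2 3; do
    oc debug -T "$node" -- chroot /host uptime && return 0
    sleep 5
  done
  return 1
}

for node in $(oc get nodes -o name); do debug_uptime "$node"; done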
Checked with:
oc version --client
Client Version: 4.10.41

The 'Error attaching' message no longer appears.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.10.42 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:8496