2015119 – Getting error while using `oc debug -T node/NODENAME`

Summary: Getting error while using `oc debug -T node/NODENAME`

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	oc
Sub Component:
Version:	4.7
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	4.10.z
Assignee:	Maciej Szulik
QA Contact:	zhou ying
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-10-18 12:32 UTC by schugh
Modified:	2022-11-22 07:19 UTC
CC List:	8 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-11-22 07:19:44 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift oc pull 1270	0	None	open	Bug 2015119: bump(k8s.io/kubectl) to pick up k/k#110764	2022-10-19 11:28:04 UTC
Red Hat Product Errata	RHBA-2022:8496	0	None	None	None	2022-11-22 07:19:48 UTC

Description schugh 2021-10-18 12:32:25 UTC

Description of problem:
- Observed some errors when running the `oc debug -T node/NODE_NAME` command in a loop

Version-Release number of selected component (if applicable):
- Checked in 4.7 and 4.8

How reproducible:
- Random (Not always)

Steps to Reproduce:
- for i in {1..50}; do oc get nodes -o name | xargs -n 1 -i sh -c 'oc debug  -T {} -- chroot /host uptime';sleep 10; done

Actual results:
- The following errors sometimes appear on random nodes (not on specific nodes):
[1] error: unable to upgrade connection: container container-00 not found in pod worker-2<LAB>-debug_<NS>
[2] error: Internal error occurred: error attaching to container: container is not created or running

Expected results:
- The output of the command, in this case `uptime`
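Because the failure is intermittent, it can help to tally how often it occurs. The following is a minimal sketch, not a verified reproducer: `run_on_node` and `tally_failures` are hypothetical helper names, and it assumes `oc debug` exits with a non-zero status whenever it prints one of the errors above. It reads node names in a plain `while` loop instead of combining `xargs -n 1` with `-i`.

```shell
#!/usr/bin/env bash
# Hypothetical hook: the command from "Steps to Reproduce", run for one node.
run_on_node() {
  oc debug -T "$1" -- chroot /host uptime
}

# Reads node names on stdin (one per line), runs run_on_node for each,
# and prints a summary line of the form "failed=N total=M".
tally_failures() {
  local node failed=0 total=0
  while IFS= read -r node; do
    [ -n "$node" ] || continue
    total=$((total + 1))
    # </dev/null keeps the command from consuming the node list on stdin.
    if ! run_on_node "$node" </dev/null; then
      failed=$((failed + 1))
      echo "FAILED: $node" >&2
    fi
  done
  echo "failed=$failed total=$total"
}

# Usage (requires a logged-in oc client):
#   for i in $(seq 1 50); do oc get nodes -o name | tally_failures; sleep 10; done
```

The names of failing nodes go to stderr, so the summary line on stdout stays easy to collect across iterations.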

Comment 1 Maciej Szulik 2021-10-18 13:17:15 UTC

Can you provide more detailed output from those cases where this breaks?

Comment 20 Colum Gaynor 2022-10-01 11:40:57 UTC

@Maciej Szulik <maszulik>

See the original support case description of the issue and its effect on the end customer (Nokia NOM), copied below:

What problem/issue/behavior are you having trouble with?  What do you expect to see?
We had a requirement to launch a pod, execute the provided curl command, print the result and exit (terminating the pod upon exit).
So we used the `oc run` command for this purpose, and our command looks like:
oc run -it --rm --image=image-registry.openshift-image-registry.svc:5000/cal-shared-product/nmcal-helper-utils:v1.0 nmcal-helper-utils-123 -n cal-shared-product --restart=Never -- /bin/sh -c "<CURL_COMMAND>"

It works; however, we frequently see an error message printed during this operation even though the command completes successfully.
The error message is (2 slight variants):

 Error attaching, falling back to logs: Internal error occurred: error attaching to container: container is not created or running
 Error attaching, falling back to logs: unable to upgrade connection: container nmcal-helper-utils-123 not found in pod nmcal-helper-utils-123_cal-shared-product

Expectation:
When the operation completes successfully, why does it throw an error?
This will create issues for us while processing the result. [CG: the bug creates issues for Nokia NOM's automation scripts]

I have also attached the files which contain the logs (with log levels 7 and 8) for both successful and failure scenarios.

What is the business impact? Please also provide timeframe information.
Even though the command executes successfully, the error message in the output will cause problems for our result processing.
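As a stopgap for scripts that parse this output, the notice could be filtered out before processing. This is only a sketch under stated assumptions: `strip_attach_noise` is a hypothetical helper name, the pattern is taken from the two message variants quoted above, and it assumes the notice always appears on a line of its own.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove the attach-failure notice from captured output.
# Reads stdin, writes every line that does not contain the notice to stdout.
strip_attach_noise() {
  # grep -v exits 1 when no lines remain; "|| true" keeps that from looking like a failure.
  grep -v -e 'Error attaching, falling back to logs:' || true
}

# Usage (command line abbreviated; see the full oc run command above):
#   oc run ... -- /bin/sh -c "<CURL_COMMAND>" 2>&1 | strip_attach_noise
```

Filtering on the message text is brittle if the wording changes between client versions, so treat it as a workaround rather than a fix.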

Colum Gaynor - Senior Partner Success Manager, Global Account

Comment 28 Maciej Szulik 2022-10-19 11:18:04 UTC

I'm working on backports, PRs will be landing today.

Comment 30 Maciej Szulik 2022-10-19 11:28:05 UTC

As soon as https://github.com/openshift/oc/pull/1270 merges, this should be available in 4.10.

Comment 32 Colum Gaynor 2022-10-22 13:35:19 UTC

@Maciej Szulik <maszulik> ----> THANK YOU VERY MUCH. This made my week!

Colum Gaynor - Senior Partner Success Manager, Nokia Global Account

Comment 36 zhou ying 2022-10-27 04:24:05 UTC

With the merged PR, I could still reproduce this issue:

[root@localhost oc]# oc version  --client -oyaml 
clientVersion:
  buildDate: "2022-10-25T04:39:50Z"
  compiler: gc
  gitCommit: 8df677dc147fe8297d90c4757154469a931bdb90
  gitTreeState: clean
  gitVersion: 4.10.0-202210250416.p0.g8df677d.assembly.stream-8df677d
  goVersion: go1.17.12
  major: ""
  minor: ""
  platform: linux/amd64
releaseClientVersion: 4.10.39

[root@localhost oc]# git log
commit 8df677dc147fe8297d90c4757154469a931bdb90 (HEAD -> release-4.10, origin/release-4.10)
Merge: 442535c4d 39057a282
Author: OpenShift Merge Robot <openshift-merge-robot.github.com>
Date:   Thu Oct 20 09:20:56 2022 -0400

    Merge pull request #1270 from soltysh/bug2015119
    
    Bug 2015119: bump(k8s.io/kubectl) to pick up k/k#110764

for i in {1..50}; do oc get nodes -o name | xargs -n 1 -i sh -c 'oc debug  -T {} -- chroot /host uptime';sleep 10; done
xargs: warning: options --max-args and --replace/-I/-i are mutually exclusive, ignoring previous --max-args value
Starting pod/ip-10-0-131-116us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
 03:54:22 up  1:22,  0 users,  load average: 1.75, 1.76, 1.27

....
Removing debug pod ...
error: unable to upgrade connection: container container-00 not found in pod ip-10-0-203-69us-east-2computeinternal-debug_default
Starting pod/ip-10-0-219-219us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
 04:02:22 up  1:25,  0 users,  load average: 0.24, 0.29, 0.23

Removing debug pod ...
xargs: warning: options --max-args and --replace/-I/-i are mutually exclusive, ignoring previous --max-args value
Starting pod/ip-10-0-131-116us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
 04:02:37 up  1:30,  0 users,  load average: 0.86, 1.11, 1.13


Removing debug pod ...
Starting pod/ip-10-0-150-56us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.150.56
If you don't see a command prompt, try pressing enter.

Removing debug pod ...
error: unable to upgrade connection: container container-00 not found in pod ip-10-0-150-56us-east-2computeinternal-debug_default
Starting pod/ip-10-0-174-131us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
 04:06:36 up  1:29,  0 users,  load average: 0.00, 0.03, 0.05

Removing debug pod ...
Starting pod/ip-10-0-190-1us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
 04:06:40 up  1:35,  0 users,  load average: 1.41, 1.13, 0.97

Comment 38 Maciej Szulik 2022-11-09 09:30:50 UTC

The fix in this bug only changes error #2 from the initial description, i.e.:

Error attaching, falling back to logs...

from an error to a warning.

The other error is correct: it explicitly points out that we started creating the connection before the container was available.

Based on the above, moving back to qa.
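Since the remaining error means the connection was attempted before the container was available, a caller that has to tolerate it could retry the command. A minimal sketch follows; `retry_cmd` is a hypothetical helper, the attempt count and delay are illustrative values, and it retries on any non-zero exit status, not only on this specific error.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry_cmd ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS runs have failed.
retry_cmd() {
  local attempts=$1 delay=$2 n=1
  shift 2
  while :; do
    "$@" && return 0                     # success: stop retrying
    [ "$n" -ge "$attempts" ] && return 1 # out of attempts: give up
    n=$((n + 1))
    sleep "$delay"
  done
}

# Usage (requires a logged-in oc client):
#   retry_cmd 3 5 oc debug -T node/NODENAME -- chroot /host uptime
```

Because the wrapper cannot tell a failed attach from a failure of the remote command itself, it is only suitable for commands that are safe to run more than once.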

Comment 40 zhou ying 2022-11-10 05:35:03 UTC

Checked with:
oc version --client
Client Version: 4.10.41

The 'Error attaching' message no longer appears.

Comment 43 errata-xmlrpc 2022-11-22 07:19:44 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.10.42 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:8496
