Bug 1913109 - oc debug of an init container no longer works
Summary: oc debug of an init container no longer works
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: oc
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.6.z
Assignee: Clayton Coleman
QA Contact: RamaKasturi
URL:
Whiteboard:
Depends On: 1909289
Blocks:
Reported: 2021-01-06 01:45 UTC by OpenShift BugZilla Robot
Modified: 2021-02-08 13:51 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Init container support was lost during changes to the oc debug command. Consequence: Init containers could not be debugged. Fix: Support for init containers was restored in the oc debug command. Result: Init containers can again be debugged with oc debug.
Clone Of:
Environment:
Last Closed: 2021-02-08 13:51:20 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift oc pull 686 | 0 | None | closed | Bug 1913109: Should be able to debug an init container | 2021-02-05 18:34:12 UTC
Red Hat Product Errata | RHSA-2021:0308 | 0 | None | None | None | 2021-02-08 13:51:37 UTC

Description OpenShift BugZilla Robot 2021-01-06 01:45:43 UTC
+++ This bug was initially created as a clone of Bug #1909289 +++

Sometime in the last few releases, init containers stopped being debuggable with `oc debug pod/foo -c <init_container_name>`.  The root cause is that the logic that waits for the pod's container to be running ignores init containers.

The fix is to adjust the wait logic to correctly read init containers. Also, I simplified and removed some logic that was subject to exiting early on errors that might be transient (for instance, the first image pull can fail and the second can succeed) and replaced those with warning messages.
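
As a rough illustration of the two changes described above, here is a minimal Go sketch. It is not the actual oc code, which this report does not show. `ContainerStatus` and `PodStatus` are simplified, hypothetical stand-ins for the Kubernetes status types, and `findStatus` and `containerReady` are names invented for this example. The sketch looks up the target container among init containers as well as regular containers, and reports a waiting reason (such as a failed image pull) as a warning rather than treating it as fatal.

```go
package main

import (
	"fmt"
	"os"
)

// ContainerStatus is a simplified, hypothetical stand-in for the Kubernetes
// container status type; only the fields this sketch needs are kept.
type ContainerStatus struct {
	Name          string
	Running       bool   // true once the container has started
	WaitingReason string // e.g. "ErrImagePull" while waiting; empty otherwise
}

// PodStatus is a simplified, hypothetical stand-in for the pod status type.
type PodStatus struct {
	InitContainerStatuses []ContainerStatus
	ContainerStatuses     []ContainerStatus
}

// findStatus searches init containers as well as regular containers.
// Consulting only ContainerStatuses is the kind of omission the report describes.
func findStatus(s PodStatus, name string) (ContainerStatus, bool) {
	for _, list := range [][]ContainerStatus{s.InitContainerStatuses, s.ContainerStatuses} {
		for _, cs := range list {
			if cs.Name == name {
				return cs, true
			}
		}
	}
	return ContainerStatus{}, false
}

// containerReady reports whether the named container is running. A waiting
// reason is returned as a warning string, not an error, so the caller can
// keep waiting in case the condition is transient.
func containerReady(s PodStatus, name string) (ready bool, warning string) {
	cs, ok := findStatus(s, name)
	if !ok {
		return false, ""
	}
	if cs.Running {
		return true, ""
	}
	if cs.WaitingReason != "" {
		return false, fmt.Sprintf("container %s is waiting: %s", name, cs.WaitingReason)
	}
	return false, ""
}

func main() {
	// Two successive observations of the same hypothetical debug pod:
	// the first image pull fails, the second attempt succeeds.
	observations := []PodStatus{
		{InitContainerStatuses: []ContainerStatus{{Name: "wait-for-host-port", WaitingReason: "ErrImagePull"}}},
		{InitContainerStatuses: []ContainerStatus{{Name: "wait-for-host-port", Running: true}}},
	}
	for _, obs := range observations {
		ready, warning := containerReady(obs, "wait-for-host-port")
		if warning != "" {
			fmt.Fprintln(os.Stderr, "warning:", warning)
		}
		if ready {
			fmt.Println("container is running; attaching")
			return
		}
	}
	fmt.Println("container never started")
}
```

In this sketch the failed first pull only produces a warning on stderr, and the loop goes on to see the container running on the second observation. A real wait loop would watch the pod through the API server and apply a timeout.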

May need a backport for 4.6.

--- Additional comment from nstielau on 2021-01-04 20:10:14 UTC ---

Sounds like this is a regression but not a new one.  Moving to blocker- to denote that we won't block the release on this.

Comment 1 zhou ying 2021-01-22 07:22:55 UTC
Tested with an oc built from the repo; I can't reproduce the issue now:

With the older oc, `oc debug -c <init-container>` hangs:
[root@dhcp-140-138 roottest]# oc debug po/openshift-kube-scheduler-ci-ln-jq80922-f76d1-xjrpn-master-0  -c wait-for-host-port
Starting pod/openshift-kube-scheduler-ci-ln-jq80922-f76d1-xjrpn-master-0-debug, command was: /usr/bin/timeout 30 /bin/bash -c echo -n "Waiting for port :10259 and :10251 to be released."
while [ -n "$(lsof -ni :10251)" -o -n "$(lsof -i :10259)" ]; do
  echo -n "."
  sleep 1
done



The oc built from the repo works well:
[root@dhcp-140-138 roottest]# /root/oc debug po/openshift-kube-scheduler-ci-ln-jq80922-f76d1-xjrpn-master-0  -c wait-for-host-port
Starting pod/openshift-kube-scheduler-ci-ln-jq80922-f76d1-xjrpn-master-0-debug, command was: /usr/bin/timeout 30 /bin/bash -c echo -n "Waiting for port :10259 and :10251 to be released."
while [ -n "$(lsof -ni :10251)" -o -n "$(lsof -i :10259)" ]; do
  echo -n "."
  sleep 1
done

Pod IP: 10.0.0.5
If you don't see a command prompt, try pressing enter.
sh-4.4# 
sh-4.4# ls
bin  boot  dev	etc  home  lib	lib64  lost+found  media  mnt  opt  proc  root	run  sbin  srv	sys  tmp  usr  var
sh-4.4# ps ax 
    PID TTY      STAT   TIME COMMAND
      1 pts/0    Ss     0:00 /bin/sh
      9 pts/0    R+     0:00 ps ax
sh-4.4# exit
exit

Removing debug pod ...

Comment 4 RamaKasturi 2021-02-03 06:18:51 UTC
Verified the bug with the payload and oc version below; I do not see any hang.

[knarra@knarra openshift-client-linux-4.6.0-0.nightly-2021-01-30-211400]$ ./oc version -o yaml
clientVersion:
  buildDate: "2021-01-30T16:34:42Z"
  compiler: gc
  gitCommit: 18d7461aca47e77cefb355339252a8d4c149188f
  gitTreeState: clean
  gitVersion: 4.6.0-202101301510.p0-18d7461
  goVersion: go1.15.5
  major: ""
  minor: ""
  platform: linux/amd64
openshiftVersion: 4.6.0-0.nightly-2021-01-30-211400
releaseClientVersion: 4.6.0-0.nightly-2021-01-30-211400
serverVersion:
  buildDate: "2021-01-28T07:35:27Z"
  compiler: gc
  gitCommit: e49167aad6a08046be6ab21ff13029110c76951d
  gitTreeState: clean
  gitVersion: v1.19.0+e49167a
  goVersion: go1.15.5
  major: "1"
  minor: "19"
  platform: linux/amd64

Do not see any hang:
======================
[knarra@knarra openshift-client-linux-4.6.0-0.nightly-2021-01-30-211400]$ ./oc debug po/openshift-kube-scheduler-xiuwang-sharegcp-gs6jh-m-0.c.openshift-qe.internal -c wait-for-host-port -n openshift-kube-scheduler
Starting pod/openshift-kube-scheduler-xiuwang-sharegcp-gs6jh-m-0copenshift-q-debug, command was: /usr/bin/timeout 30 /bin/bash -c echo -n "Waiting for port :10259 and :10251 to be released."
while [ -n "$(lsof -ni :10251)" -o -n "$(lsof -i :10259)" ]; do
  echo -n "."
  sleep 1
done

Pod IP: 10.0.0.7
If you don't see a command prompt, try pressing enter.

sh-4.4# ls
bin   dev  home  lib64	     media  opt   root	sbin  sys  usr
boot  etc  lib	 lost+found  mnt    proc  run	srv   tmp  var
sh-4.4# exit
exit

Removing debug pod ...


With the previous version of oc, I see that it hangs:
==================================================
[knarra@knarra openshift-client-linux-4.6.10]$ ./oc debug po/openshift-kube-scheduler-xiuwang-sharegcp-gs6jh-m-0.c.openshift-qe.internal -c wait-for-host-port -n openshift-kube-scheduler
Starting pod/openshift-kube-scheduler-xiuwang-sharegcp-gs6jh-m-0copenshift-q-debug, command was: /usr/bin/timeout 30 /bin/bash -c echo -n "Waiting for port :10259 and :10251 to be released."
while [ -n "$(lsof -ni :10251)" -o -n "$(lsof -i :10259)" ]; do
  echo -n "."
  sleep 1
done




^C
Removing debug pod ...

Based on the above, moving the bug to verified state.

Comment 6 errata-xmlrpc 2021-02-08 13:51:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.6.16 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0308

