Bug 1990014 - oc debug <pod-name> does not work for Windows pods
Summary: oc debug <pod-name> does not work for Windows pods
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: oc
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.10.0
Assignee: Ross Peoples
QA Contact: zhou ying
Depends On:
Reported: 2021-08-04 14:54 UTC by Mansi Kulkarni
Modified: 2022-03-11 18:15 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: oc debug previously assumed it was always targeting Linux-based containers and tried to run a Bash shell. Consequence: Attempting to debug a Windows container failed because Bash was not present in the container. Fix: oc debug now uses the pod's node selectors to determine the OS of the containers, and runs the cmd.exe shell for Windows containers. Result: oc debug now works on both Linux and Windows-based containers.
Clone Of:
Last Closed: 2022-03-11 18:15:11 UTC
Target Upstream Version:
mohashai: needinfo+


System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift oc pull 1028 | 0 | None | open | Bug 1990014: Throw error when debugging Windows nodes | 2022-01-18 16:44:28 UTC
Github | openshift oc pull 907 | 0 | None | None | None | 2021-08-23 18:47:11 UTC

Description Mansi Kulkarni 2021-08-04 14:54:49 UTC
Description of problem:

oc debug <pod-name> does not work for Windows pods.
The error thrown is: 
$ oc debug win-webserver-79878b949c-5hqw4

Starting pod/win-webserver-79878b949c-5hqw4-debug, command was: powershell.exe -command $listener = New-Object System.Net.HttpListener; $listener.Prefixes.Add('http://*:80/'); $listener.Start();Write-Host('Listening at http://*:80/'); while ($listener.IsListening) { $context = $listener.GetContext(); $response = $context.Response; $content='<html><body><H1>Red Hat OpenShift + Windows Container Workloads</H1></body></html>'; $buffer = [System.Text.Encoding]::UTF8.GetBytes($content); $response.ContentLength64 = $buffer.Length; $response.OutputStream.Write($buffer, 0, $buffer.Length); $response.Close(); };
failed to try resolving symlinks in path "\\var\\log\\pods\\default_win-webserver-79878b949c-5hqw4-debug_6db9ac83-3a9f-428e-9234-605b3d36927d\\windowswebserver\\0.log": CreateFile \var\log\pods\default_win-webserver-79878b949c-5hqw4-debug_6db9ac83-3a9f-428e-9234-605b3d36927d\windowswebserver\0.log: The system cannot find the file specified.
Removing debug pod ...

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Create a Windows workload, win-webserver, on a Windows node using the sample service and deployment: https://docs.openshift.com/container-platform/4.8/windows_containers/scheduling-windows-workloads.html#sample-windows-workload-deployment_scheduling-windows-workloads

2. Run oc debug win-webserver-79878b949c-5hqw4.

Actual results:
The command errors out and removes the debug pod.

Expected results:
The command opens a shell in the debug pod without errors.

Additional info:
After some investigation, the main issue turns out to be the default debug container command, /bin/sh, which should be cmd for Windows.
What the console's pod terminal does for Windows (https://github.com/openshift/console/blob/8c7a7e60edb4722d4a5069030025bcc238dba714/frontend/public/components/pod-exec.jsx#L60) could be the way forward.
The error also highlights a second issue that needs a fix: when the debug command errors out, getLogs() is called, and the way it is called is not compatible with Windows log collection.

Tested on 4.9; this will most likely need to be backported to other versions as well.

Comment 1 Mohammad Saif Shaikh 2021-08-09 16:58:46 UTC
@maszulik Why has this been passed back to the Windows Container team? This is an issue with how oc handles the debug command for Windows and has nothing to do with the Windows Machine Config Operator, which is what the WinC team owns.

Comment 2 Mansi Kulkarni 2021-08-11 14:25:49 UTC
After discussion with the workloads team, passing this bug back to that team.

Points to be considered for fixing this bug:

Windows pod specs are expected to set the following node selector and tolerations:
    nodeSelector:
      kubernetes.io/os: windows
      node.kubernetes.io/windows-build: '10.0.17763'
    tolerations:
    - key: "os"
      operator: "Equal"
      value: "windows"
      effect: "NoSchedule"
ref: https://kubernetes.io/docs/setup/production-environment/windows/user-guide-windows-containers/#ensuring-os-specific-workloads-land-on-the-appropriate-container-host

However, the node selector and tolerations above are recommended, not guaranteed to be present. In such cases we should make a best-effort attempt to detect Windows pods and fall back to /bin/sh when the OS can't be determined.
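A minimal Go sketch of the best-effort detection described above, assuming a simplified pod spec. The PodSpec and Toleration types and the isWindowsPod/debugCommand helpers are hypothetical stand-ins, not oc's actual code or the k8s.io/api types. The sketch checks the kubernetes.io/os node selector, then the os=windows toleration, and otherwise falls back to /bin/sh.

```go
package main

import "fmt"

// Toleration is a hypothetical, simplified stand-in for a Kubernetes
// toleration; only Key and Value are inspected by this sketch.
type Toleration struct {
	Key      string
	Operator string
	Value    string
	Effect   string
}

// PodSpec is a hypothetical, trimmed-down pod spec holding only the
// fields used for OS detection.
type PodSpec struct {
	NodeSelector map[string]string
	Tolerations  []Toleration
}

// isWindowsPod makes a best-effort guess at whether the pod targets Windows.
// It checks the well-known kubernetes.io/os node selector first, then the
// os=windows toleration recommended by the Kubernetes Windows guide.
func isWindowsPod(spec PodSpec) bool {
	// Indexing a nil map is safe in Go and yields "".
	if spec.NodeSelector["kubernetes.io/os"] == "windows" {
		return true
	}
	for _, t := range spec.Tolerations {
		if t.Key == "os" && t.Value == "windows" {
			return true
		}
	}
	return false
}

// debugCommand picks the debug container's shell: cmd.exe for pods detected
// as Windows, otherwise the existing /bin/sh default.
func debugCommand(spec PodSpec) []string {
	if isWindowsPod(spec) {
		return []string{"cmd.exe"}
	}
	return []string{"/bin/sh"}
}

func main() {
	selectorOnly := PodSpec{
		NodeSelector: map[string]string{"kubernetes.io/os": "windows"},
	}
	tolerationOnly := PodSpec{
		Tolerations: []Toleration{{Key: "os", Operator: "Equal", Value: "windows", Effect: "NoSchedule"}},
	}
	unknown := PodSpec{}

	fmt.Println(debugCommand(selectorOnly))   // [cmd.exe]
	fmt.Println(debugCommand(tolerationOnly)) // [cmd.exe]
	fmt.Println(debugCommand(unknown))        // [/bin/sh]
}
```

Checking the node selector first follows the Doc Text, which says the fix uses the pod's node selectors; the toleration check is the extra heuristic suggested in this comment, and anything else falls back to /bin/sh.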

Fixing oc debug for pods is also required for a feature the console team is adding to the admin console that allows debugging pod containers from the UI.

Comment 10 Ross Peoples 2022-01-18 21:44:24 UTC
This appears to be fixed in the latest oc version; I tried version 4.9.15 and it works.

Comment 13 zhou ying 2022-01-25 08:24:27 UTC
[root@localhost ~]# oc debug node/ip-10-0-137-166.us-east-2.compute.internal
error: cannot debug ip-10-0-137-166.us-east-2.compute.internal: can't debug Windows nodes
[root@localhost ~]# oc get clusterversion 
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2022-01-25-023600   True        False         64m     Cluster version is 4.10.0-0.nightly-2022-01-25-023600

Comment 14 Aravindh Puthiyaparambil 2022-01-25 16:05:16 UTC
@yinzhou, did you verify that `oc debug <podname>` works?
