Bug 1990014

Summary: oc debug <pod-name> does not work for Windows pods
Product: OpenShift Container Platform Reporter: Mansi Kulkarni <mankulka>
Component: oc Assignee: Ross Peoples <rpeoples>
oc sub component: oc QA Contact: zhou ying <yinzhou>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: aos-bugs, aravindh, maszulik, mfojtik, mohashai, team-winc, yinzhou
Version: 4.9 Flags: mohashai: needinfo+
Target Milestone: ---   
Target Release: 4.10.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: oc debug previously assumed it was always targeting Linux-based containers by trying to run a Bash shell. Consequence: Attempting to debug a Windows container failed if Bash was not present in the container. Fix: oc debug now uses pod selectors to determine the OS of the containers, and tries to run the cmd.exe shell for Windows containers. Result: oc debug now works on both Linux and Windows-based containers.
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-03-11 18:15:11 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Mansi Kulkarni 2021-08-04 14:54:49 UTC
Description of problem:

oc debug <pod-name> does not work for Windows pods. 
The error thrown is: 
$ oc debug win-webserver-79878b949c-5hqw4

Starting pod/win-webserver-79878b949c-5hqw4-debug, command was: powershell.exe -command $listener = New-Object System.Net.HttpListener; $listener.Prefixes.Add('http://*:80/'); $listener.Start();Write-Host('Listening at http://*:80/'); while ($listener.IsListening) { $context = $listener.GetContext(); $response = $context.Response; $content='<html><body><H1>Red Hat OpenShift + Windows Container Workloads</H1></body></html>'; $buffer = [System.Text.Encoding]::UTF8.GetBytes($content); $response.ContentLength64 = $buffer.Length; $response.OutputStream.Write($buffer, 0, $buffer.Length); $response.Close(); };
failed to try resolving symlinks in path "\\var\\log\\pods\\default_win-webserver-79878b949c-5hqw4-debug_6db9ac83-3a9f-428e-9234-605b3d36927d\\windowswebserver\\0.log": CreateFile \var\log\pods\default_win-webserver-79878b949c-5hqw4-debug_6db9ac83-3a9f-428e-9234-605b3d36927d\windowswebserver\0.log: The system cannot find the file specified.
Removing debug pod ...


Version-Release number of selected component (if applicable):
4.9

How reproducible:
Always

Steps to Reproduce:
1. A Windows workload win-webserver is created on a Windows node brought up using a sample service and deployment: https://docs.openshift.com/container-platform/4.8/windows_containers/scheduling-windows-workloads.html#sample-windows-workload-deployment_scheduling-windows-workloads

2. oc debug win-webserver-79878b949c-5hqw4 is executed


Actual results:
The command errors out and removes the debug pod.

Expected results:
The command lands into a debug pod without errors.

Additional info:
After some digging around, it turns out the main issue is that the default debug container command is /bin/sh, which should be cmd for Windows.
What oc exec is doing for Windows (https://github.com/openshift/console/blob/8c7a7e60edb4722d4a5069030025bcc238dba714/frontend/public/components/pod-exec.jsx#L60) could be the way forward.
Another issue highlighted by the error, which also needs a fix: when the debug command errors out, getLogs() is called, and the way it is called is not compatible with Windows log collection.
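
To illustrate the first point, here is a minimal sketch of choosing the default debug command by container OS. It is not the oc implementation: the mapping and the function name default_debug_command are hypothetical, and the OS string is assumed to have been determined elsewhere.

```python
from typing import List, Optional

# Hypothetical mapping for illustration only; not taken from the oc source.
DEFAULT_SHELL_BY_OS = {
    "linux": ["/bin/sh"],
    "windows": ["cmd.exe"],
}


def default_debug_command(pod_os: Optional[str]) -> List[str]:
    """Return the default interactive command for a debug container.

    Falls back to the Linux shell when the OS is unknown or not given.
    """
    key = (pod_os or "").strip().lower()
    # Copy the list so callers cannot mutate the shared defaults.
    return list(DEFAULT_SHELL_BY_OS.get(key, DEFAULT_SHELL_BY_OS["linux"]))


if __name__ == "__main__":
    print(default_debug_command("windows"))
    print(default_debug_command(None))
```

Keeping the fallback on /bin/sh preserves the existing behavior for Linux pods and for pods whose OS cannot be identified.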

Tested this on 4.9; this will most likely need to be backported to other versions as well.

Comment 1 Mohammad Saif Shaikh 2021-08-09 16:58:46 UTC
@maszulik Why has this been passed back to the Windows Container team? This is an issue with how oc handles the debug command for Windows and has nothing to do with the Windows Machine Config Operator, which is what the WinC team owns.

Comment 2 Mansi Kulkarni 2021-08-11 14:25:49 UTC
After the discussion with the workloads team, passing this bug back to the team.

Points to be considered for fixing this bug:

The Windows pod spec is expected to set the following node selector and tolerations for the host:
nodeSelector:
    kubernetes.io/os: windows
    node.kubernetes.io/windows-build: '10.0.17763'
tolerations:
    - key: "os"
      operator: "Equal"
      value: "windows"
      effect: "NoSchedule"
ref: https://kubernetes.io/docs/setup/production-environment/windows/user-guide-windows-containers/#ensuring-os-specific-workloads-land-on-the-appropriate-container-host

However, the above taints and tolerations are recommended and not guaranteed to be present. In such cases we should make a best effort for Windows pods and fall back to /bin/sh where the OS can't be determined.
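
The best-effort detection described above could look roughly like the following sketch. It assumes the pod spec is available as a plain dictionary shaped like the YAML above; the function name detect_pod_os is hypothetical, and treating the os=windows toleration as a hint is an assumption for illustration, not confirmed oc behavior.

```python
from typing import Any, Dict, Optional

# Node label used in the nodeSelector of the pod spec above.
OS_LABEL = "kubernetes.io/os"


def detect_pod_os(pod_spec: Dict[str, Any]) -> Optional[str]:
    """Best-effort guess of the pod's OS; returns None when it is unknown."""
    # 1. Prefer the explicit node selector, if present.
    node_selector = pod_spec.get("nodeSelector") or {}
    os_name = node_selector.get(OS_LABEL)
    if os_name:
        return str(os_name).lower()

    # 2. Otherwise treat an os=windows toleration as a hint (assumption).
    for toleration in pod_spec.get("tolerations") or []:
        if (
            toleration.get("key") == "os"
            and str(toleration.get("value", "")).lower() == "windows"
        ):
            return "windows"

    # 3. Nothing conclusive; the caller should fall back to /bin/sh.
    return None


if __name__ == "__main__":
    spec = {
        "nodeSelector": {"kubernetes.io/os": "windows"},
        "tolerations": [
            {"key": "os", "operator": "Equal",
             "value": "windows", "effect": "NoSchedule"},
        ],
    }
    shell = "cmd.exe" if detect_pod_os(spec) == "windows" else "/bin/sh"
    print(shell)
```

Returning None rather than guessing "linux" keeps the "OS can't be determined" case explicit, so the /bin/sh fallback stays a decision of the caller.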

Fixing the oc debug pod command is also required for a feature the console team is adding to the admin console that allows debugging pod containers from the UI.

Comment 10 Ross Peoples 2022-01-18 21:44:24 UTC
This appears to be fixed in the latest oc version; I tried version 4.9.15 and it works.

Comment 13 zhou ying 2022-01-25 08:24:27 UTC
[root@localhost ~]# oc debug node/ip-10-0-137-166.us-east-2.compute.internal
error: cannot debug ip-10-0-137-166.us-east-2.compute.internal: can't debug Windows nodes
[root@localhost ~]# oc get clusterversion 
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2022-01-25-023600   True        False         64m     Cluster version is 4.10.0-0.nightly-2022-01-25-023600

Comment 14 Aravindh Puthiyaparambil 2022-01-25 16:05:16 UTC
@yinzhou, did you verify that `oc debug <podname>` works?