Bug 1990014
| Summary: | oc debug <pod-name> does not work for Windows pods | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Mansi Kulkarni <mankulka> |
| Component: | oc | Assignee: | Ross Peoples <rpeoples> |
| oc sub component: | oc | QA Contact: | zhou ying <yinzhou> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | ||
| Priority: | high | CC: | aos-bugs, aravindh, maszulik, mfojtik, mohashai, team-winc, yinzhou |
| Version: | 4.9 | Flags: | mohashai: needinfo+ |
| Target Milestone: | --- | ||
| Target Release: | 4.10.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: |
Cause: oc debug previously assumed it was always targeting Linux-based containers by trying to run a Bash shell.
Consequence: Attempting to debug a Windows container failed if Bash was not present in the container.
Fix: oc debug now uses pod selectors to determine the OS of the containers, and tries to run the cmd.exe shell for Windows containers.
Result: oc debug now works on both Linux and Windows-based containers.
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2022-03-11 18:15:11 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
@maszulik Why has this been passed back to the Windows Container team? This is an issue with how oc handles the debug command for Windows and has nothing to do with the Windows Machine Config Operator, which is what the WinC team owns. After the discussion with the workloads team, passing this bug back to the team.
Points to be considered for fixing this bug:
A Windows pod spec is expected to set the node selector and tolerations for the host:
nodeSelector:
  kubernetes.io/os: windows
  node.kubernetes.io/windows-build: '10.0.17763'
tolerations:
- key: "os"
  operator: "Equal"
  value: "windows"
  effect: "NoSchedule"
ref: https://kubernetes.io/docs/setup/production-environment/windows/user-guide-windows-containers/#ensuring-os-specific-workloads-land-on-the-appropriate-container-host
However, the node selector and tolerations above are recommended, not guaranteed to be present. In such cases we should make a best effort for Windows pods and fall back to /bin/sh when the OS cannot be determined.
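The best-effort detection described above can be sketched roughly as follows. This is only an illustration in Python, not the actual oc implementation (oc is written in Go). The function names `guess_pod_os` and `default_debug_command` are hypothetical, and the pod spec is assumed to be a plain dictionary shaped like the YAML above.

```python
from typing import Any, Dict, List, Optional

WINDOWS = "windows"
LINUX = "linux"
OS_LABEL = "kubernetes.io/os"  # well-known Kubernetes node label


def guess_pod_os(pod_spec: Dict[str, Any]) -> Optional[str]:
    """Best-effort guess of the OS a pod targets (hypothetical helper).

    Returns "windows" or "linux", or None when the spec gives no hint.
    """
    # Prefer the node selector, since it names the OS directly.
    node_selector = pod_spec.get("nodeSelector") or {}
    os_name = node_selector.get(OS_LABEL)
    if os_name in (WINDOWS, LINUX):
        return os_name

    # Otherwise look for the recommended (but optional) os=windows toleration.
    for toleration in pod_spec.get("tolerations") or []:
        if toleration.get("key") == "os" and toleration.get("value") == WINDOWS:
            return WINDOWS

    return None


def default_debug_command(pod_spec: Dict[str, Any]) -> List[str]:
    """Pick the debug shell: cmd.exe for Windows pods, /bin/sh otherwise."""
    if guess_pod_os(pod_spec) == WINDOWS:
        return ["cmd.exe"]
    # Unknown or Linux: keep the existing /bin/sh default.
    return ["/bin/sh"]


if __name__ == "__main__":
    windows_pod = {"nodeSelector": {OS_LABEL: WINDOWS}}
    unlabeled_pod: Dict[str, Any] = {}
    print(default_debug_command(windows_pod))    # ['cmd.exe']
    print(default_debug_command(unlabeled_pod))  # ['/bin/sh']
```

A pod that carries neither hint is treated the same as a Linux pod, which keeps the current behavior for existing workloads.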
Fixing the oc debug pod command is also required for a feature the console team is adding to the admin console that allows debugging pod containers from the UI.
This appears to be fixed in the latest oc version, I tried version 4.9.15 and it works.
[root@localhost ~]# oc debug node/ip-10-0-137-166.us-east-2.compute.internal
error: cannot debug ip-10-0-137-166.us-east-2.compute.internal: can't debug Windows nodes
[root@localhost ~]# oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2022-01-25-023600   True        False         64m     Cluster version is 4.10.0-0.nightly-2022-01-25-023600
@yinzhou, did you verify that `oc debug <podname>` works?
Description of problem:
oc debug <pod-name> does not work for Windows pods. The error thrown is:
$ oc debug win-webserver-79878b949c-5hqw4
Starting pod/win-webserver-79878b949c-5hqw4-debug, command was: powershell.exe -command $listener = New-Object System.Net.HttpListener; $listener.Prefixes.Add('http://*:80/'); $listener.Start();Write-Host('Listening at http://*:80/'); while ($listener.IsListening) { $context = $listener.GetContext(); $response = $context.Response; $content='<html><body><H1>Red Hat OpenShift + Windows Container Workloads</H1></body></html>'; $buffer = [System.Text.Encoding]::UTF8.GetBytes($content); $response.ContentLength64 = $buffer.Length; $response.OutputStream.Write($buffer, 0, $buffer.Length); $response.Close(); };
failed to try resolving symlinks in path "\\var\\log\\pods\\default_win-webserver-79878b949c-5hqw4-debug_6db9ac83-3a9f-428e-9234-605b3d36927d\\windowswebserver\\0.log": CreateFile \var\log\pods\default_win-webserver-79878b949c-5hqw4-debug_6db9ac83-3a9f-428e-9234-605b3d36927d\windowswebserver\0.log: The system cannot find the file specified.
Removing debug pod ...

Version-Release number of selected component (if applicable): 4.9

How reproducible: Always

Steps to Reproduce:
1. A Windows workload win-webserver is created on a Windows node brought up using a sample service and deployment: https://docs.openshift.com/container-platform/4.8/windows_containers/scheduling-windows-workloads.html#sample-windows-workload-deployment_scheduling-windows-workloads
2. oc debug win-webserver-79878b949c-5hqw4 is executed

Actual results: The command errors out and removes the debug pod.

Expected results: The command lands in a debug pod without errors.

Additional info:
After some digging around, it turns out the main issue is that the default debug container command is /bin/sh, which should be cmd for Windows. What oc exec is doing for Windows (https://github.com/openshift/console/blob/8c7a7e60edb4722d4a5069030025bcc238dba714/frontend/public/components/pod-exec.jsx#L60) could be the way forward.
Another issue highlighted by the error also needs a fix: when the debug command errors out, getLogs() is called, and the way it is called is not compatible with Windows log collection.
Tested this on 4.9; this would most likely need to be backported to other versions as well.