Bug 1862643 - oc must-gather on Windows nodes doesn't collect container runtime logs
Summary: oc must-gather on Windows nodes doesn't collect container runtime logs
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Windows Containers
Version: 4.6
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
: 4.7.0
Assignee: Christian Glombek
QA Contact: gaoshang
URL:
Whiteboard:
Depends On:
Blocks: 1896691
TreeView+ depends on / blocked
 
Reported: 2020-07-31 21:58 UTC by David Mulford
Modified: 2021-05-17 17:31 UTC (History)
10 users (show)

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Provide an administrator a streaming view of event logs on Windows machines without them having to implement a client side reader. The kubelet API for querying the Linux journal is re-used for invoking the Get-WinEvent cmdlet in a PowerShell. Parameters that have no functional equivalence in Get-WinEvent are ignored when assembling the command. Only available to cluster admins.
Clone Of:
Environment:
Last Closed: 2021-05-17 17:31:54 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github openshift kubernetes pull 383 0 None closed Bug 1862643: UPSTREAM: 96120: kubelet: Expose a simple Get-WinEvent shim on the kubelet logs endpoint 2021-02-15 15:45:34 UTC
Github openshift must-gather pull 195 0 None closed Bug 1862643: Collect docker logs from WinEvent log on Windows machines 2021-02-15 15:45:34 UTC

Description David Mulford 2020-07-31 21:58:13 UTC
oc must-gather PR#165 doesn't collect container runtime logs

https://github.com/openshift/must-gather/pull/165

Actual results:

'oc must-gather' on Windows only pulls the kube-proxy, hybrid-overlay, and kubelet logs.

Expected results:

'oc must-gather' should also retrieve container runtime logs.

If we need to diagnose issues occurring on a Windows node, lack of container runtime logs will severely diminish support's and engineering's ability to troubleshoot the problem, without other means to get these logs (e.g. creating a bastion host).

Not having the runtime logs will likely lead to a negative customer experience due our inability to identify issues in a timely fashion.

Comment 1 Christian Glombek 2020-10-01 20:05:20 UTC
Apparently Docker Daemon is hardcoded to log to the WinEvent logs on Windows.

As a possible workaround, I've created a PR to enable streaming WinEvent logs from the nodes through the kubelet: https://github.com/openshift/kubernetes/pull/383

Comment 3 Christian Glombek 2020-11-12 13:20:36 UTC
Not for QA: https://github.com/openshift/kubernetes/pull/383 does not actually change adapt must-gather to gather the container runtime logs, that'll need a follow-up.
The functionality in https://github.com/openshift/kubernetes/pull/383 can be tested like this:

$ oc adm node-logs -l kubernetes.io/os=windows --path=journal -u docker

Comment 4 Christian Glombek 2020-11-12 13:21:01 UTC
*Note, not "Not"

Comment 5 gaoshang 2020-11-13 08:13:45 UTC
(In reply to Christian Glombek from comment #3)
> Not for QA: https://github.com/openshift/kubernetes/pull/383 does not
> actually change adapt must-gather to gather the container runtime logs,
> that'll need a follow-up.
> The functionality in https://github.com/openshift/kubernetes/pull/383 can be
> tested like this:
> 
> $ oc adm node-logs -l kubernetes.io/os=windows --path=journal -u docker

Hi Christian, thanks for your note, I tested it with OCP 4.7.0-0.nightly-2020-11-12-200927 on AWS which should contain https://github.com/openshift/kubernetes/pull/383, see bellow:

# oc adm release info registry.svc.ci.openshift.org/ocp/release:4.7.0-0.nightly-2020-11-12-200927 --commits | grep kubernetes
  kubernetes 1.19.2               
  cluster-autoscaler                             https://github.com/openshift/kubernetes-autoscaler                          698efa2f989b509c5c1a2549a531a08e7639bd9f
  hyperkube                                      https://github.com/openshift/kubernetes                                     4abb4a77838037b8dbb8e4ca34e63c4a129654c8
  kube-storage-version-migrator                  https://github.com/openshift/kubernetes-kube-storage-version-migrator       6deeefa7d0ef32abd3ff7df04642cb7b29fc7ffa
  kuryr-cni                                      https://github.com/openshift/kuryr-kubernetes                               125b82addbdd5a0fc4ed32ff242ca44fd65464c2
  kuryr-controller                               https://github.com/openshift/kuryr-kubernetes                               125b82addbdd5a0fc4ed32ff242ca44fd65464c2
  ovn-kubernetes                                 https://github.com/openshift/ovn-kubernetes                                 08266144ce7cbea606d260ab4b7c3b610016d720

But failed to collect container runtime logs, could you take a look at it?

# oc get node -l kubernetes.io/os=windows
NAME                                        STATUS   ROLES    AGE   VERSION
ip-10-0-147-28.us-east-2.compute.internal   Ready    worker   18m   v0.0.0-master+$Format:%h$

# oc adm node-logs -l kubernetes.io/os=windows --path=journal -u docker
error: journal output not available

Version:
# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2020-11-12-200927   True        False         41m     Cluster version is 4.7.0-0.nightly-2020-11-12-200927

windows-machine-config-operator git commit 337c8255a94c2964c3485839d63c9642548a740f

Comment 6 Christian Glombek 2020-11-13 09:18:52 UTC
Hi!
How long after the node came up did you try this? The node may look ready, while it's still being configured and
it may take a little while until the kubelet is actually up and running.

If you still have the cluster up, could you try the same command again?

If it's still the same, please try:

$ oc adm node-logs -l kubernetes.io/os=windows --path=journal -u docker -u Microsoft-Windows-LoadPerf

Comment 7 Christian Glombek 2020-11-13 09:30:34 UTC
Ohhh, I'm sorry, it just occured to me that the kubelet for windows is built from a git submodule in the WMCO repo, which hasn't been updated with this change
(https://github.com/openshift/windows-machine-config-operator/).

Unfortunately we have no way of testing the windows kubelet with this change for now - I will try to test the 4.7 win kubelet manually with WMCO. 

Could you instead simply check there are no regressions on the linux side with `oc adm node-logs` as well as `must-gather`?

Thank you!

Comment 8 gaoshang 2020-11-13 10:05:03 UTC
Got it, thanks, I'll verify this bug after we are able to build 4.7 kubelet.

I checked `oc adm must-gather` and `oc adm node-logs` with OCP 4.7.0-0.nightly-2020-11-12-200927, no regressions error found, thanks.

Comment 9 Christian Glombek 2020-11-30 04:40:05 UTC
The 4.7 kubelet code has landed in WMCO master branch (as of commit 776dff46a07462fad1e310f12f34110c66b78847).

With the windows-machine-config-operator built and deployed from there, the docker service log collection should now be testable with:

$ oc adm node-logs -l kubernetes.io/os=windows --path=journal -u docker

Comment 10 gaoshang 2020-11-30 09:27:27 UTC
This bug has been verified on OCP 4.7.0-0.nightly-2020-11-29-133728 and passed, thanks.

Version:
OCP 4.7.0-0.nightly-2020-11-29-133728
windows-machine-config-operator git commit 776dff46a07462fad1e310f12f34110c66b78847

Steps:
1, WMCO is running and Windows node bootstrapped.
2, Run oc adm node-logs, check Windows container runtime logs could be retrieved.

# oc adm node-logs -l kubernetes.io/os=windows --path=journal -u docker


   ProviderName: docker

TimeCreated           Id LevelDisplayName Message                                                                      
-----------           -- ---------------- -------                                                                      
11/21/2020 4:36:53 PM  1 Information      Starting up                                                                  
11/21/2020 4:36:53 PM  1 Information      Windows default isolation mode: process                                      
11/21/2020 4:36:53 PM  1 Information      Loading containers: start.                                                   
11/21/2020 4:36:53 PM  1 Information      Restoring existing overlay networks from HNS into docker                     
11/21/2020 4:36:57 PM  1 Information      Loading containers: done.                                                    
11/21/2020 4:36:57 PM  1 Information      starting telemetry service                                                   
11/21/2020 4:36:57 PM 11 Information      Docker daemon [version=19.03.11 commit=0da829ac52                            
                                          graphdriver(s)=windowsfilter]                                                
11/21/2020 4:36:57 PM 11 Warning          Unable to load license [error=unable to lookup licensing details without a   
                                          daemon.cluster]                                                              
11/21/2020 4:36:57 PM  1 Information      Daemon has completed initialization                                          
11/21/2020 4:36:57 PM  1 Information      API listen on //./pipe/docker_engine                                         
11/21/2020 4:38:16 PM  1 Information      Starting up                                                                  
11/21/2020 4:38:16 PM  1 Information      Windows default isolation mode: process                                      
11/21/2020 4:38:16 PM  1 Information      Loading containers: start.                                                   
11/21/2020 4:38:16 PM  1 Information      Restoring existing overlay networks from HNS into docker                     
11/21/2020 4:38:22 PM  1 Information      Loading containers: done.                                                    
11/21/2020 4:38:22 PM  1 Information      starting telemetry service                                                   
11/21/2020 4:38:22 PM 11 Information      Docker daemon [graphdriver(s)=windowsfilter version=19.03.11                 
                                          commit=0da829ac52]                                                           
11/21/2020 4:38:22 PM 11 Warning          Unable to load license [error=unable to lookup licensing details without a   
                                          daemon.cluster]                                                              
11/21/2020 4:38:22 PM  1 Information      Daemon has completed initialization                                          
11/21/2020 4:38:22 PM  1 Information      API listen on //./pipe/docker_engine                                         
11/30/2020 8:41:50 AM  1 Information      Starting up                                                                  
11/30/2020 8:41:50 AM  1 Information      Windows default isolation mode: process                                      
11/30/2020 8:41:51 AM  1 Information      Loading containers: start.                                                   
11/30/2020 8:41:51 AM  1 Information      Restoring existing overlay networks from HNS into docker                     
11/30/2020 8:41:57 AM  1 Information      Loading containers: done.                                                    
11/30/2020 8:41:57 AM 11 Warning          Unable to load license [error=unable to lookup licensing details without a   
                                          daemon.cluster]                                                              
11/30/2020 8:41:57 AM 11 Information      Docker daemon [version=19.03.11 commit=0da829ac52                            
                                          graphdriver(s)=windowsfilter]                                                
11/30/2020 8:41:57 AM  1 Information      starting telemetry service                                                   
11/30/2020 8:41:57 AM  1 Information      Daemon has completed initialization                                          
11/30/2020 8:41:58 AM  1 Information      API listen on //./pipe/docker_engine

Comment 11 Aravindh Puthiyaparambil 2020-11-30 19:16:38 UTC
Setting this back to assigned so that `oc adm must-gather` can be verified once https://github.com/openshift/must-gather/pull/195 merges.

Comment 12 Christian Glombek 2020-12-03 08:45:34 UTC
This'll need another round of verification from QA - this time `oc adm must-gather` should also be pulling the docker runtime logs from all Windows nodes

Comment 13 gaoshang 2020-12-04 08:21:33 UTC
(In reply to Christian Glombek from comment #12)
> This'll need another round of verification from QA - this time `oc adm
> must-gather` should also be pulling the docker runtime logs from all Windows
> nodes

QE testing passed on 4.7.0-0.nightly-2020-12-03-205004, will change status to VERIFIED when "ON_QA", thanks.

Version:
4.7.0-0.nightly-2020-12-03-205004
windows-machine-config-operator git commit 2d148b57ec9d546188f2f3a6a0e903286f260991

Steps:
1, WMCO is running and 2 Windows nodes bootstrapped.
2, Run must-gather, check it should pull the docker runtime logs from all Windows nodes

# oc adm must-gather --dest-dir /tmp/must-gather
...
[must-gather-ms5xc] OUT host_service_logs/windows/
[must-gather-ms5xc] OUT host_service_logs/windows/log_files/
[must-gather-ms5xc] OUT host_service_logs/windows/log_files/hybrid-overlay/
[must-gather-ms5xc] OUT host_service_logs/windows/log_files/hybrid-overlay/hybrid-overlay.log
[must-gather-ms5xc] OUT host_service_logs/windows/log_files/kube-proxy/
[must-gather-ms5xc] OUT host_service_logs/windows/log_files/kube-proxy/kube-proxy.exe.ERROR
[must-gather-ms5xc] OUT host_service_logs/windows/log_files/kube-proxy/kube-proxy.exe.INFO
[must-gather-ms5xc] OUT host_service_logs/windows/log_files/kube-proxy/kube-proxy.exe.WARNING
[must-gather-ms5xc] OUT host_service_logs/windows/log_files/kubelet/
[must-gather-ms5xc] OUT host_service_logs/windows/log_files/kubelet/kubelet.log
[must-gather-ms5xc] OUT host_service_logs/windows/log_winevent/
[must-gather-ms5xc] OUT host_service_logs/windows/log_winevent/docker_winevent.log
...

# tree /tmp/must-gather/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-0605b2f28717288cd0ddd81f15536e0053508a6d3239925cc767738c5ed8f2dd/host_service_logs/windows/
/tmp/must-gather/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-0605b2f28717288cd0ddd81f15536e0053508a6d3239925cc767738c5ed8f2dd/host_service_logs/windows/
├── log_files
│   ├── hybrid-overlay
│   │   └── hybrid-overlay.log
│   ├── kubelet
│   │   └── kubelet.log
│   └── kube-proxy
│       ├── kube-proxy.exe.ERROR
│       ├── kube-proxy.exe.INFO
│       └── kube-proxy.exe.WARNING
└── log_winevent
    └── docker_winevent.log

# cat /tmp/must-gather/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-0605b2f28717288cd0ddd81f15536e0053508a6d3239925cc767738c5ed8f2dd/host_service_logs/windows/log_winevent/docker_winevent.log 




   ProviderName: docker

   ProviderName: docker

TimeCreated          Id LevelDisplayName Message                                                                       
-----------          -- ---------------- -------                                                                       
12/4/2020 7:09:57 AM  1 Information      Starting up                                                                   
12/4/2020 7:09:57 AM  1 Information      Windows default isolation mode: process                                       
12/4/2020 7:09:57 AM  1 Information      Loading containers: start.                                                    
12/4/2020 7:09:57 AM  1 Information      Restoring existing overlay networks from HNS into docker                      
12/4/2020 7:10:01 AM  1 Information      Loading containers: done.                                                     
12/4/2020 7:10:01 AM  1 Information      starting telemetry service                                                    
12/4/2020 7:10:01 AM 11 Information      Docker daemon [version=19.03.12 commit=f295753ffd                             
                                         graphdriver(s)=windowsfilter]                                                 
12/4/2020 7:10:01 AM  1 Information      Daemon has completed initialization                                           
12/4/2020 7:10:01 AM  1 Information      API listen on //./pipe/docker_engine                                          
12/4/2020 7:10:41 AM  1 Information      stopping telemetry service                                                    
12/4/2020 7:11:27 AM  1 Information      Starting up                                                                   
12/4/2020 7:11:27 AM  1 Information      Windows default isolation mode: process                                       
12/4/2020 7:11:27 AM  1 Information      Loading containers: start.                                                    
12/4/2020 7:11:27 AM  1 Information      Restoring existing overlay networks from HNS into docker                      
12/4/2020 7:11:33 AM  1 Information      Loading containers: done.                                                     
12/4/2020 7:11:33 AM  1 Information      starting telemetry service                                                    
12/4/2020 7:11:33 AM 11 Information      Docker daemon [version=19.03.12 commit=f295753ffd                             
                                         graphdriver(s)=windowsfilter]                                                 
12/4/2020 7:11:33 AM  1 Information      Daemon has completed initialization                                           
12/4/2020 7:11:33 AM  1 Information      API listen on //./pipe/docker_engine                                          
12/4/2020 7:59:34 AM  1 Information      Starting up                                                                   
12/4/2020 7:59:34 AM  1 Information      Windows default isolation mode: process                                       
12/4/2020 7:59:35 AM  1 Information      Loading containers: start.                                                    
12/4/2020 7:59:35 AM  1 Information      Restoring existing overlay networks from HNS into docker                      
12/4/2020 7:59:44 AM  1 Information      Loading containers: done.                                                     
12/4/2020 7:59:44 AM  1 Information      starting telemetry service                                                    
12/4/2020 7:59:44 AM 11 Information      Docker daemon [commit=f295753ffd graphdriver(s)=windowsfilter                 
                                         version=19.03.12]                                                             
12/4/2020 7:59:44 AM  1 Information      Daemon has completed initialization                                           
12/4/2020 7:59:44 AM  1 Information      API listen on //./pipe/docker_engine                                          


TimeCreated          Id LevelDisplayName Message                                                                       
-----------          -- ---------------- -------                                                                       
12/4/2020 5:26:55 AM  1 Information      Starting up                                                                   
12/4/2020 5:26:55 AM  1 Information      Windows default isolation mode: process                                       
12/4/2020 5:26:55 AM  1 Information      Loading containers: start.                                                    
12/4/2020 5:26:55 AM  1 Information      Restoring existing overlay networks from HNS into docker                      
12/4/2020 5:26:58 AM  1 Information      Loading containers: done.                                                     
12/4/2020 5:26:58 AM  1 Information      starting telemetry service                                                    
12/4/2020 5:26:58 AM 11 Information      Docker daemon [commit=f295753ffd graphdriver(s)=windowsfilter                 
                                         version=19.03.12]                                                             
12/4/2020 5:26:58 AM  1 Information      Daemon has completed initialization                                           
12/4/2020 5:26:58 AM  1 Information      API listen on //./pipe/docker_engine                                          
12/4/2020 5:28:18 AM  1 Information      Starting up                                                                   
12/4/2020 5:28:19 AM  1 Information      Windows default isolation mode: process                                       
12/4/2020 5:28:19 AM  1 Information      Loading containers: start.                                                    
12/4/2020 5:28:19 AM  1 Information      Restoring existing overlay networks from HNS into docker                      
12/4/2020 5:28:25 AM  1 Information      Loading containers: done.                                                     
12/4/2020 5:28:25 AM  1 Information      starting telemetry service                                                    
12/4/2020 5:28:25 AM 11 Information      Docker daemon [graphdriver(s)=windowsfilter version=19.03.12                  
                                         commit=f295753ffd]                                                            
12/4/2020 5:28:25 AM  1 Information      Daemon has completed initialization                                           
12/4/2020 5:28:25 AM  1 Information      API listen on //./pipe/docker_engine                                          
12/4/2020 7:47:52 AM  1 Information      Starting up                                                                   
12/4/2020 7:47:52 AM  1 Information      Windows default isolation mode: process                                       
12/4/2020 7:47:52 AM  1 Information      Loading containers: start.                                                    
12/4/2020 7:47:52 AM  1 Information      Restoring existing overlay networks from HNS into docker                      
12/4/2020 7:48:02 AM  1 Information      Loading containers: done.                                                     
12/4/2020 7:48:02 AM  1 Information      starting telemetry service                                                    
12/4/2020 7:48:02 AM 11 Information      Docker daemon [version=19.03.12 commit=f295753ffd                             
                                         graphdriver(s)=windowsfilter]                                                 
12/4/2020 7:48:02 AM  1 Information      Daemon has completed initialization                                           
12/4/2020 7:48:02 AM  1 Information      API listen on //./pipe/docker_engine

Comment 14 gmarkley 2020-12-04 15:41:00 UTC
Marking Verified based on gaoshang prior comment.


Note You need to log in before you can comment on or make changes to this bug.