oc must-gather PR#165 doesn't collect container runtime logs https://github.com/openshift/must-gather/pull/165 Actual results: 'oc must-gather' on Windows only pulls the kube-proxy, hybrid-overlay, and kubelet logs. Expected results: 'oc must-gather' should also retrieve container runtime logs. If we need to diagnose issues occurring on a Windows node, lack of container runtime logs will severely diminish support's and engineering's ability to troubleshoot the problem, without other means to get these logs (e.g. creating a bastion host). Not having the runtime logs will likely lead to a negative customer experience due our inability to identify issues in a timely fashion.
Apparently Docker Daemon is hardcoded to log to the WinEvent logs on Windows. As a possible workaround, I've created a PR to enable streaming WinEvent logs from the nodes through the kubelet: https://github.com/openshift/kubernetes/pull/383
Not for QA: https://github.com/openshift/kubernetes/pull/383 does not actually change adapt must-gather to gather the container runtime logs, that'll need a follow-up. The functionality in https://github.com/openshift/kubernetes/pull/383 can be tested like this: $ oc adm node-logs -l kubernetes.io/os=windows --path=journal -u docker
*Note, not "Not"
(In reply to Christian Glombek from comment #3) > Not for QA: https://github.com/openshift/kubernetes/pull/383 does not > actually change adapt must-gather to gather the container runtime logs, > that'll need a follow-up. > The functionality in https://github.com/openshift/kubernetes/pull/383 can be > tested like this: > > $ oc adm node-logs -l kubernetes.io/os=windows --path=journal -u docker Hi Christian, thanks for your note, I tested it with OCP 4.7.0-0.nightly-2020-11-12-200927 on AWS which should contain https://github.com/openshift/kubernetes/pull/383, see bellow: # oc adm release info registry.svc.ci.openshift.org/ocp/release:4.7.0-0.nightly-2020-11-12-200927 --commits | grep kubernetes kubernetes 1.19.2 cluster-autoscaler https://github.com/openshift/kubernetes-autoscaler 698efa2f989b509c5c1a2549a531a08e7639bd9f hyperkube https://github.com/openshift/kubernetes 4abb4a77838037b8dbb8e4ca34e63c4a129654c8 kube-storage-version-migrator https://github.com/openshift/kubernetes-kube-storage-version-migrator 6deeefa7d0ef32abd3ff7df04642cb7b29fc7ffa kuryr-cni https://github.com/openshift/kuryr-kubernetes 125b82addbdd5a0fc4ed32ff242ca44fd65464c2 kuryr-controller https://github.com/openshift/kuryr-kubernetes 125b82addbdd5a0fc4ed32ff242ca44fd65464c2 ovn-kubernetes https://github.com/openshift/ovn-kubernetes 08266144ce7cbea606d260ab4b7c3b610016d720 But failed to collect container runtime logs, could you take a look at it? # oc get node -l kubernetes.io/os=windows NAME STATUS ROLES AGE VERSION ip-10-0-147-28.us-east-2.compute.internal Ready worker 18m v0.0.0-master+$Format:%h$ # oc adm node-logs -l kubernetes.io/os=windows --path=journal -u docker error: journal output not available Version: # oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.7.0-0.nightly-2020-11-12-200927 True False 41m Cluster version is 4.7.0-0.nightly-2020-11-12-200927 windows-machine-config-operator git commit 337c8255a94c2964c3485839d63c9642548a740f
Hi! How long after the node came up did you try this? The node may look ready, while it's still being configured and it may take a little while until the kubelet is actually up and running. If you still have the cluster up, could you try the same command again? If it's still the same, please try: $ oc adm node-logs -l kubernetes.io/os=windows --path=journal -u docker -u Microsoft-Windows-LoadPerf
Ohhh, I'm sorry, it just occured to me that the kubelet for windows is built from a git submodule in the WMCO repo, which hasn't been updated with this change (https://github.com/openshift/windows-machine-config-operator/). Unfortunately we have no way of testing the windows kubelet with this change for now - I will try to test the 4.7 win kubelet manually with WMCO. Could you instead simply check there are no regressions on the linux side with `oc adm node-logs` as well as `must-gather`? Thank you!
Got it, thanks, I'll verify this bug after we are able to build 4.7 kubelet. I checked `oc adm must-gather` and `oc adm node-logs` with OCP 4.7.0-0.nightly-2020-11-12-200927, no regressions error found, thanks.
The 4.7 kubelet code has landed in WMCO master branch (as of commit 776dff46a07462fad1e310f12f34110c66b78847). With the windows-machine-config-operator built and deployed from there, the docker service log collection should now be testable with: $ oc adm node-logs -l kubernetes.io/os=windows --path=journal -u docker
This bug has been verified on OCP 4.7.0-0.nightly-2020-11-29-133728 and passed, thanks. Version: OCP 4.7.0-0.nightly-2020-11-29-133728 windows-machine-config-operator git commit 776dff46a07462fad1e310f12f34110c66b78847 Steps: 1, WMCO is running and Windows node bootstrapped. 2, Run oc adm node-logs, check Windows container runtime logs could be retrieved. # oc adm node-logs -l kubernetes.io/os=windows --path=journal -u docker ProviderName: docker TimeCreated Id LevelDisplayName Message ----------- -- ---------------- ------- 11/21/2020 4:36:53 PM 1 Information Starting up 11/21/2020 4:36:53 PM 1 Information Windows default isolation mode: process 11/21/2020 4:36:53 PM 1 Information Loading containers: start. 11/21/2020 4:36:53 PM 1 Information Restoring existing overlay networks from HNS into docker 11/21/2020 4:36:57 PM 1 Information Loading containers: done. 11/21/2020 4:36:57 PM 1 Information starting telemetry service 11/21/2020 4:36:57 PM 11 Information Docker daemon [version=19.03.11 commit=0da829ac52 graphdriver(s)=windowsfilter] 11/21/2020 4:36:57 PM 11 Warning Unable to load license [error=unable to lookup licensing details without a daemon.cluster] 11/21/2020 4:36:57 PM 1 Information Daemon has completed initialization 11/21/2020 4:36:57 PM 1 Information API listen on //./pipe/docker_engine 11/21/2020 4:38:16 PM 1 Information Starting up 11/21/2020 4:38:16 PM 1 Information Windows default isolation mode: process 11/21/2020 4:38:16 PM 1 Information Loading containers: start. 11/21/2020 4:38:16 PM 1 Information Restoring existing overlay networks from HNS into docker 11/21/2020 4:38:22 PM 1 Information Loading containers: done. 11/21/2020 4:38:22 PM 1 Information starting telemetry service 11/21/2020 4:38:22 PM 11 Information Docker daemon [graphdriver(s)=windowsfilter version=19.03.11 commit=0da829ac52] 11/21/2020 4:38:22 PM 11 Warning Unable to load license [error=unable to lookup licensing details without a daemon.cluster] 11/21/2020 4:38:22 PM 1 Information Daemon has completed initialization 11/21/2020 4:38:22 PM 1 Information API listen on //./pipe/docker_engine 11/30/2020 8:41:50 AM 1 Information Starting up 11/30/2020 8:41:50 AM 1 Information Windows default isolation mode: process 11/30/2020 8:41:51 AM 1 Information Loading containers: start. 11/30/2020 8:41:51 AM 1 Information Restoring existing overlay networks from HNS into docker 11/30/2020 8:41:57 AM 1 Information Loading containers: done. 11/30/2020 8:41:57 AM 11 Warning Unable to load license [error=unable to lookup licensing details without a daemon.cluster] 11/30/2020 8:41:57 AM 11 Information Docker daemon [version=19.03.11 commit=0da829ac52 graphdriver(s)=windowsfilter] 11/30/2020 8:41:57 AM 1 Information starting telemetry service 11/30/2020 8:41:57 AM 1 Information Daemon has completed initialization 11/30/2020 8:41:58 AM 1 Information API listen on //./pipe/docker_engine
Setting this back to assigned so that `oc adm must-gather` can be verified once https://github.com/openshift/must-gather/pull/195 merges.
This'll need another round of verification from QA - this time `oc adm must-gather` should also be pulling the docker runtime logs from all Windows nodes
(In reply to Christian Glombek from comment #12) > This'll need another round of verification from QA - this time `oc adm > must-gather` should also be pulling the docker runtime logs from all Windows > nodes QE testing passed on 4.7.0-0.nightly-2020-12-03-205004, will change status to VERIFIED when "ON_QA", thanks. Version: 4.7.0-0.nightly-2020-12-03-205004 windows-machine-config-operator git commit 2d148b57ec9d546188f2f3a6a0e903286f260991 Steps: 1, WMCO is running and 2 Windows nodes bootstrapped. 2, Run must-gather, check it should pull the docker runtime logs from all Windows nodes # oc adm must-gather --dest-dir /tmp/must-gather ... [must-gather-ms5xc] OUT host_service_logs/windows/ [must-gather-ms5xc] OUT host_service_logs/windows/log_files/ [must-gather-ms5xc] OUT host_service_logs/windows/log_files/hybrid-overlay/ [must-gather-ms5xc] OUT host_service_logs/windows/log_files/hybrid-overlay/hybrid-overlay.log [must-gather-ms5xc] OUT host_service_logs/windows/log_files/kube-proxy/ [must-gather-ms5xc] OUT host_service_logs/windows/log_files/kube-proxy/kube-proxy.exe.ERROR [must-gather-ms5xc] OUT host_service_logs/windows/log_files/kube-proxy/kube-proxy.exe.INFO [must-gather-ms5xc] OUT host_service_logs/windows/log_files/kube-proxy/kube-proxy.exe.WARNING [must-gather-ms5xc] OUT host_service_logs/windows/log_files/kubelet/ [must-gather-ms5xc] OUT host_service_logs/windows/log_files/kubelet/kubelet.log [must-gather-ms5xc] OUT host_service_logs/windows/log_winevent/ [must-gather-ms5xc] OUT host_service_logs/windows/log_winevent/docker_winevent.log ... # tree /tmp/must-gather/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-0605b2f28717288cd0ddd81f15536e0053508a6d3239925cc767738c5ed8f2dd/host_service_logs/windows/ /tmp/must-gather/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-0605b2f28717288cd0ddd81f15536e0053508a6d3239925cc767738c5ed8f2dd/host_service_logs/windows/ ├── log_files │ ├── hybrid-overlay │ │ └── hybrid-overlay.log │ ├── kubelet │ │ └── kubelet.log │ └── kube-proxy │ ├── kube-proxy.exe.ERROR │ ├── kube-proxy.exe.INFO │ └── kube-proxy.exe.WARNING └── log_winevent └── docker_winevent.log # cat /tmp/must-gather/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-0605b2f28717288cd0ddd81f15536e0053508a6d3239925cc767738c5ed8f2dd/host_service_logs/windows/log_winevent/docker_winevent.log ProviderName: docker ProviderName: docker TimeCreated Id LevelDisplayName Message ----------- -- ---------------- ------- 12/4/2020 7:09:57 AM 1 Information Starting up 12/4/2020 7:09:57 AM 1 Information Windows default isolation mode: process 12/4/2020 7:09:57 AM 1 Information Loading containers: start. 12/4/2020 7:09:57 AM 1 Information Restoring existing overlay networks from HNS into docker 12/4/2020 7:10:01 AM 1 Information Loading containers: done. 12/4/2020 7:10:01 AM 1 Information starting telemetry service 12/4/2020 7:10:01 AM 11 Information Docker daemon [version=19.03.12 commit=f295753ffd graphdriver(s)=windowsfilter] 12/4/2020 7:10:01 AM 1 Information Daemon has completed initialization 12/4/2020 7:10:01 AM 1 Information API listen on //./pipe/docker_engine 12/4/2020 7:10:41 AM 1 Information stopping telemetry service 12/4/2020 7:11:27 AM 1 Information Starting up 12/4/2020 7:11:27 AM 1 Information Windows default isolation mode: process 12/4/2020 7:11:27 AM 1 Information Loading containers: start. 12/4/2020 7:11:27 AM 1 Information Restoring existing overlay networks from HNS into docker 12/4/2020 7:11:33 AM 1 Information Loading containers: done. 12/4/2020 7:11:33 AM 1 Information starting telemetry service 12/4/2020 7:11:33 AM 11 Information Docker daemon [version=19.03.12 commit=f295753ffd graphdriver(s)=windowsfilter] 12/4/2020 7:11:33 AM 1 Information Daemon has completed initialization 12/4/2020 7:11:33 AM 1 Information API listen on //./pipe/docker_engine 12/4/2020 7:59:34 AM 1 Information Starting up 12/4/2020 7:59:34 AM 1 Information Windows default isolation mode: process 12/4/2020 7:59:35 AM 1 Information Loading containers: start. 12/4/2020 7:59:35 AM 1 Information Restoring existing overlay networks from HNS into docker 12/4/2020 7:59:44 AM 1 Information Loading containers: done. 12/4/2020 7:59:44 AM 1 Information starting telemetry service 12/4/2020 7:59:44 AM 11 Information Docker daemon [commit=f295753ffd graphdriver(s)=windowsfilter version=19.03.12] 12/4/2020 7:59:44 AM 1 Information Daemon has completed initialization 12/4/2020 7:59:44 AM 1 Information API listen on //./pipe/docker_engine TimeCreated Id LevelDisplayName Message ----------- -- ---------------- ------- 12/4/2020 5:26:55 AM 1 Information Starting up 12/4/2020 5:26:55 AM 1 Information Windows default isolation mode: process 12/4/2020 5:26:55 AM 1 Information Loading containers: start. 12/4/2020 5:26:55 AM 1 Information Restoring existing overlay networks from HNS into docker 12/4/2020 5:26:58 AM 1 Information Loading containers: done. 12/4/2020 5:26:58 AM 1 Information starting telemetry service 12/4/2020 5:26:58 AM 11 Information Docker daemon [commit=f295753ffd graphdriver(s)=windowsfilter version=19.03.12] 12/4/2020 5:26:58 AM 1 Information Daemon has completed initialization 12/4/2020 5:26:58 AM 1 Information API listen on //./pipe/docker_engine 12/4/2020 5:28:18 AM 1 Information Starting up 12/4/2020 5:28:19 AM 1 Information Windows default isolation mode: process 12/4/2020 5:28:19 AM 1 Information Loading containers: start. 12/4/2020 5:28:19 AM 1 Information Restoring existing overlay networks from HNS into docker 12/4/2020 5:28:25 AM 1 Information Loading containers: done. 12/4/2020 5:28:25 AM 1 Information starting telemetry service 12/4/2020 5:28:25 AM 11 Information Docker daemon [graphdriver(s)=windowsfilter version=19.03.12 commit=f295753ffd] 12/4/2020 5:28:25 AM 1 Information Daemon has completed initialization 12/4/2020 5:28:25 AM 1 Information API listen on //./pipe/docker_engine 12/4/2020 7:47:52 AM 1 Information Starting up 12/4/2020 7:47:52 AM 1 Information Windows default isolation mode: process 12/4/2020 7:47:52 AM 1 Information Loading containers: start. 12/4/2020 7:47:52 AM 1 Information Restoring existing overlay networks from HNS into docker 12/4/2020 7:48:02 AM 1 Information Loading containers: done. 12/4/2020 7:48:02 AM 1 Information starting telemetry service 12/4/2020 7:48:02 AM 11 Information Docker daemon [version=19.03.12 commit=f295753ffd graphdriver(s)=windowsfilter] 12/4/2020 7:48:02 AM 1 Information Daemon has completed initialization 12/4/2020 7:48:02 AM 1 Information API listen on //./pipe/docker_engine
Marking Verified based on gaoshang prior comment.