Bug 1940499 - hybrid-overlay not logging properly before exiting due to an error [NEEDINFO]
Summary: hybrid-overlay not logging properly before exiting due to an error
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: low
Target Milestone: ---
Target Release: 4.8.0
Assignee: Alexander Constantinescu
QA Contact: gaoshang
URL:
Whiteboard:
Depends On:
Blocks: 1940566
Reported: 2021-03-18 14:57 UTC by Sebastian Soto
Modified: 2021-07-27 22:54 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1940566
Environment:
Last Closed: 2021-07-27 22:54:17 UTC
Target Upstream Version:
kkulkarn: needinfo? (aconstan)


Links
System ID Private Priority Status Summary Last Updated
Github openshift ovn-kubernetes pull 472 0 None closed 3-22-21 merge 2021-03-24 12:22:59 UTC
Red Hat Product Errata RHSA-2021:2438 0 None None None 2021-07-27 22:54:48 UTC

Description Sebastian Soto 2021-03-18 14:57:44 UTC
Description of problem:

hybrid-overlay does not log fatal errors when it is logging to a file
via the logfile flag. The fix makes it log the error properly before
the program exits.

We've been seeing a lot of support issues opened by people attempting to use features that are not available on certain Windows kernels.
Fixing this should reduce the number of issues opened.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Attempt to set up a Windows node on a VXLAN cluster with a Windows Server 2019 image

Actual results:
Hybrid overlay panics and does not log to the log file in /var/log/


Expected results:
Hybrid overlay exits and logs to the log file

Additional info:

Comment 3 Kedar Kulkarni 2021-03-25 15:06:50 UTC
Hi,

I checked with Sebastian today and he confirmed there are some changes that they need to pull into the Windows MCO. Once that is done, this bz will be easy to validate. He will be trying to pull in those changes asap. 

Till then keeping a NEEDINFO open on Dev. 

Thanks,
KK.

Comment 5 Sebastian Soto 2021-04-28 13:29:17 UTC
This has made its way into the WMCO.
If hybrid overlay crashes, an error should be present in the Windows node logs, which can be retrieved with:
`oc adm node-logs <NODE_NAME> --path=/hybrid-overlay/hybrid-overlay.log`

Comment 6 Sebastian Soto 2021-04-28 13:31:02 UTC
As a note, builds from master have this, but no released build does yet.

Comment 7 Anurag saxena 2021-04-30 15:17:56 UTC
@sgao Can you verify it since you would be up to date on WMCO changes? Let SDN team know if you need our help in any way

Comment 8 gaoshang 2021-05-06 15:41:49 UTC
Sure, this bug has been verified on OCP 4.8 + vSphere + Windows Server 2019 and passed, thanks.

Version-Release number of selected component (if applicable):
WMCO built from https://github.com/openshift/windows-machine-config-operator/commit/1ca41c250ff937d1543559ba19e805a7473d45bf
OCP version 4.8.0-0.nightly-2021-04-30-201824

Steps:

1. Install OCP 4.8 with ovn-kubernetes on vSphere, set hybridOverlayVXLANPort: 9898

2. Build WMCO and install it, refer to https://github.com/openshift/windows-machine-config-operator/blob/master/docs/HACKING.md

3. Create Windows machineset with Windows Server 2019

4. With this combination, hybrid-overlay hits an error. Check that hybrid-overlay exits and logs to the log file

$ oc get nodes -l kubernetes.io/os=windows -owide
NAME              STATUS                     ROLES    AGE     VERSION                            INTERNAL-IP     EXTERNAL-IP     OS-IMAGE                       KERNEL-VERSION    CONTAINER-RUNTIME
winworker-hsdrx   Ready,SchedulingDisabled   worker   6m49s   v1.21.0-rc.0.1190+e22a836a8b2659   172.31.249.32   172.31.249.32   Windows Server 2019 Standard   10.0.17763.1697   docker://19.3.14

$ oc adm node-logs winworker-hsdrx --path=/hybrid-overlay/hybrid-overlay.log
I0506 17:26:57.589456    1996 cert_rotation.go:137] Starting client certificate rotation controller
F0506 17:26:57.603194    1996 hybrid-overlay-node.go:53] this version of Windows does not support setting the VXLAN UDP port. Please make sure you install all the KB updates on your system.
F0506 17:26:57.603194    1996 hybrid-overlay-node.go:53] this version of Windows does not support setting the VXLAN UDP port. Please make sure you install all the KB updates on your system.
F0506 17:26:57.603194    1996 hybrid-overlay-node.go:53] this version of Windows does not support setting the VXLAN UDP port. Please make sure you install all the KB updates on your system.
F0506 17:26:57.603194    1996 hybrid-overlay-node.go:53] this version of Windows does not support setting the VXLAN UDP port. Please make sure you install all the KB updates on your system.

PS C:\Users\Administrator> Get-Service hybrid-overlay-node 

Status   Name               DisplayName
------   ----               -----------
Stopped  hybrid-overlay-... hybrid-overlay-node

Comment 11 errata-xmlrpc 2021-07-27 22:54:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

