Bug 1862495

Summary: Running hybrid-overlay-node as a Windows service exits abruptly on os.Exit()
Product: OpenShift Container Platform
Reporter: Mansi Kulkarni <mankulka>
Component: Windows Containers
Assignee: Mansi Kulkarni <mankulka>
Status: CLOSED ERRATA
QA Contact: gaoshang <sgao>
Severity: medium
Priority: medium
Version: 4.6
CC: aos-bugs, aravindh, gmarkley, rgudimet, ssoto
Target Milestone: ---   
Target Release: 4.6.0   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2020-10-27 16:21:52 UTC
Type: Bug

Description Mansi Kulkarni 2020-07-31 14:42:51 UTC
Description of problem:

When hybrid-overlay-node runs as a Windows service, it relies on os.Exit() to stop the service on svc.Stop or service shutdown. This terminates the process immediately and gives the code no chance to clean up, which differs from what happens when SIGINT is raised while running as a daemon or from the command line. The code needs to be updated to stop using os.Exit() and to exit the service gracefully.
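
To illustrate the requested behavior, here is a minimal Go sketch of one possible shape for a graceful stop. It is not the code from the PR. It models the service control requests with a plain channel instead of the real Windows service API so that it runs on any platform, and every name in it (controlCmd, cmdStop, cmdShutdown, runNode, serve) is hypothetical. The idea is that a stop or shutdown request cancels a context, the worker cleans up when it sees the cancellation, and the service only reports itself stopped after the worker has returned.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// controlCmd is a hypothetical stand-in for a service control request.
type controlCmd int

const (
	cmdStop controlCmd = iota
	cmdShutdown
)

// runNode stands in for the long-running node logic (hypothetical).
// It blocks until the context is cancelled, then records its cleanup.
func runNode(ctx context.Context, wg *sync.WaitGroup, events *[]string) {
	defer wg.Done()
	<-ctx.Done()
	*events = append(*events, "cleanup done")
}

// serve handles control requests. On stop or shutdown it cancels the
// context and waits for the worker to finish instead of calling os.Exit().
// It returns the ordered list of events so the sequence can be inspected.
func serve(requests <-chan controlCmd) []string {
	var events []string
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	var wg sync.WaitGroup
	wg.Add(1)
	go runNode(ctx, &wg, &events)

	for cmd := range requests {
		if cmd == cmdStop || cmd == cmdShutdown {
			events = append(events, "stop requested")
			break
		}
	}

	cancel()  // ask the worker to exit
	wg.Wait() // let it finish its cleanup before reporting stopped
	events = append(events, "service stopped")
	return events
}

func main() {
	requests := make(chan controlCmd, 1)
	requests <- cmdStop
	for _, e := range serve(requests) {
		fmt.Println(e)
	}
}
```

With this structure the same cancellation path could also be triggered by SIGINT when running from the command line, so both modes would share one cleanup routine.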

Version-Release number of selected component (if applicable):
4.6

How reproducible:
Always

Actual results:
Running hybrid-overlay-node as a Windows service exits abruptly on svc.Stop or shutdown.

Expected results:
Running hybrid-overlay-node as a Windows service exits gracefully on svc.Stop or shutdown.

Additional info:
Some ideas on how to fix this can be found in the PR discussion: https://github.com/ovn-org/ovn-kubernetes/pull/1514

This issue is being tracked upstream at: https://github.com/ovn-org/ovn-kubernetes/issues/1562

Comment 1 Mansi Kulkarni 2020-08-25 14:31:46 UTC
Created a PR at https://github.com/ovn-org/ovn-kubernetes/pull/1577. It has been approved and is waiting for LGTM.

Comment 2 Mansi Kulkarni 2020-08-27 14:39:50 UTC
This PR (https://github.com/ovn-org/ovn-kubernetes/pull/1577) has been merged upstream into https://github.com/ovn-org/ovn-kubernetes

Comment 4 Mansi Kulkarni 2020-08-31 16:23:05 UTC
The fix has been merged downstream into https://github.com/openshift/ovn-kubernetes via the merge PR: https://github.com/openshift/ovn-kubernetes/pull/243

Comment 5 gaoshang 2020-09-07 11:58:00 UTC
@mankulka Could you please give some hints on how to verify this bug? Or should I wait until the feature for running hybrid-overlay-node as a Windows service is finished before testing it? Thanks.

Comment 6 Mansi Kulkarni 2020-09-09 16:55:02 UTC
@sgao As this feature has not been implemented in WMCO yet, we can wait for the ticket that covers running hybrid-overlay-node as a Windows service (https://issues.redhat.com/browse/WINC-296) before testing it.

Comment 7 gaoshang 2020-10-10 16:26:15 UTC
This bug has been verified on OCP 4.6.0-0.nightly-2020-10-09-224055 and passed, thanks.

Version:
windows-machine-config-operator git commit b24e6404aea83c2e4be6da1a0a5b306f496f983d

Steps:
1. Try to stop hybrid-overlay-node
  
PS C:\Users\Administrator> Stop-Service hybrid-overlay-node
Stop-Service : Cannot stop service 'hybrid-overlay-node (hybrid-overlay-node)' because it has dependent services. It can only be stopped if the Force flag is set.
At line:1 char:1
+ Stop-Service hybrid-overlay-node
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (System.ServiceProcess.ServiceController:ServiceController) [Stop-Service], ServiceCommandException
    + FullyQualifiedErrorId : ServiceHasDependentServices,Microsoft.PowerShell.Commands.StopServiceCommand
 
PS C:\Users\Administrator> Get-Service hybrid-overlay-node -DependentServices

Status   Name               DisplayName
------   ----               -----------
Running  kube-proxy         kube-proxy

PS C:\Users\Administrator> Stop-Service kube-proxy
PS C:\Users\Administrator> Stop-Service hybrid-overlay-node
PS C:\Users\Administrator>

Comment 9 errata-xmlrpc 2020-10-27 16:21:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196