Bug 1921561

Summary: Stale pid files make ovn-kubernetes pod restart continuously
Product: OpenShift Container Platform
Reporter: Eduardo Minguez <eminguez>
Component: Networking
Assignee: Ben Bennett <bbennett>
Networking sub component: ovn-kubernetes
QA Contact: Anurag saxena <anusaxen>
Status: CLOSED DUPLICATE
Docs Contact:
Severity: unspecified
Priority: unspecified
CC: anbhat
Version: 4.4
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-02-03 15:40:35 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Eduardo Minguez 2021-01-28 07:56:47 UTC
Description of problem:

If for some reason (probably a race condition) ovs-vswitchd.pid contains an invalid PID, the ovs pod fails its liveness probe and goes into an unready state:

```
Warning  Unhealthy  23s (x3 over 33s)   kubelet, ci-ln-pg766xb-f76d1-cgnz2-master-2  Liveness probe failed: ovsdb-server is running with pid 2496
Pidfile for ovs-vswitchd (/var/run/openvswitch/ovs-vswitchd.pid) is stale
```
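
As a rough illustration of what "stale" means in the message above, the sketch below treats a pidfile as stale when it exists but the PID it records does not belong to a running process. This is not the actual ovs-ctl logic, which is not shown in this report; the function names are hypothetical and a POSIX system is assumed.

```python
import os


def read_pid(pidfile):
    """Return the integer PID stored in pidfile, or None if missing or unparsable."""
    try:
        with open(pidfile) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def pid_is_running(pid):
    """Probe for a live process with signal 0 (nothing is actually delivered)."""
    if pid is None or pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no process with this PID
    except PermissionError:
        return True   # process exists but belongs to another user
    return True


def pidfile_is_stale(pidfile):
    """Hypothetical check: the file exists but names no running process."""
    return os.path.exists(pidfile) and not pid_is_running(read_pid(pidfile))
```

A check like this cannot tell whether a live PID belongs to ovs-vswitchd or to an unrelated process that happens to have the same number, so a random value in the file may or may not be reported as stale.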

Version-Release number of selected component (if applicable):

This was found during an upgrade from 4.4.6 to 4.4.16.

How reproducible:

Alter the PID file to some random value and observe the pod logs.

Steps to Reproduce:
1. Overwrite the contents of /var/run/openvswitch/ovs-vswitchd.pid with some random value.
2. Observe the ovs pods.

Actual results:

The ovs pod went into an unready state with:

Warning  Unhealthy  23s (x3 over 33s)   kubelet, ci-ln-pg766xb-f76d1-cgnz2-master-2  Liveness probe failed: ovsdb-server is running with pid 2496
Pidfile for ovs-vswitchd (/var/run/openvswitch/ovs-vswitchd.pid) is stale

Expected results:

The PID file is overwritten with the correct PID and the pods keep running normally.
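
One way this expected behaviour could look is sketched below, under stated assumptions: the caller already knows the PID of the running ovs-vswitchd (how it is found is out of scope here), and `refresh_pidfile` is a hypothetical helper, not part of Open vSwitch. Writing to a temporary file and renaming it means a reader never sees a half-written pidfile, which matters if the invalid content really does come from a race.

```python
import os
import tempfile


def refresh_pidfile(pidfile, live_pid):
    """Rewrite pidfile so it records live_pid; return True if it was rewritten.

    Hypothetical helper: live_pid is assumed to be the PID of the daemon
    that is actually running.
    """
    try:
        with open(pidfile) as f:
            recorded = int(f.read().strip())
    except (OSError, ValueError):
        recorded = None  # missing, unreadable, or garbage content

    if recorded == live_pid:
        return False  # already correct, leave the file alone

    # Write a temp file in the same directory, then rename it over the
    # pidfile so the content is replaced in a single step.
    directory = os.path.dirname(os.path.abspath(pidfile))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        f.write(f"{live_pid}\n")
    os.chmod(tmp, 0o644)  # mkstemp creates the file as 0600
    os.replace(tmp, pidfile)
    return True
```

With a check-and-rewrite step like this in front of the liveness probe, a wrong PID in the file would be corrected instead of causing repeated restarts.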

Additional info:

Running `/usr/share/openvswitch/scripts/ovs-ctl status` in the pod seems to fix the issue.