Bug 1921561 - Stale pid files make ovn-kubernetes pod restart continuously
Summary: Stale pid files make ovn-kubernetes pod restart continuously
Keywords:
Status: CLOSED DUPLICATE of bug 1923753
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Ben Bennett
QA Contact: Anurag saxena
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-01-28 07:56 UTC by Eduardo Minguez
Modified: 2024-06-14 00:04 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-03 15:40:35 UTC
Target Upstream Version:
Embargoed:



Description Eduardo Minguez 2021-01-28 07:56:47 UTC
Description of problem:

If, for some reason (probably a race condition), ovs-vswitchd.pid contains an invalid PID, the liveness probe fails and the ovs pod goes into an unready state with:

```
Warning  Unhealthy  23s (x3 over 33s)   kubelet, ci-ln-pg766xb-f76d1-cgnz2-master-2  Liveness probe failed: ovsdb-server is running with pid 2496
Pidfile for ovs-vswitchd (/var/run/openvswitch/ovs-vswitchd.pid) is stale
```
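
For context, a pidfile is usually called stale when the PID it records no longer belongs to a running process. The following is a minimal sketch of such a check, not the logic that `ovs-ctl` or the liveness probe actually uses; the function name `pidfile_is_stale` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only.
# Exit status: 0 = pidfile is stale, 1 = process is alive, 2 = pidfile unreadable.
pidfile_is_stale() {
    pidfile=$1

    # A missing or unreadable pidfile is a different condition from a stale one.
    [ -r "$pidfile" ] || return 2

    pid=$(cat "$pidfile")

    # Anything that is not a plain positive integer cannot name a live process.
    case $pid in
        ''|*[!0-9]*) return 0 ;;
    esac

    # kill -0 delivers no signal; it only reports whether the PID can be signalled.
    # Assumption: we run as root or as the owner of the process, otherwise a
    # permission error would be misread as "stale".
    if kill -0 "$pid" 2>/dev/null; then
        return 1
    fi
    return 0
}
```

Inside the ovs container this could be called as `pidfile_is_stale /var/run/openvswitch/ovs-vswitchd.pid && echo stale`.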

Version-Release number of selected component (if applicable):

This was found during an upgrade from 4.4.6 to 4.4.16

How reproducible:

Alter the PID file to some random value and observe the pod logs.

Steps to Reproduce:
1. Overwrite the PID file (/var/run/openvswitch/ovs-vswitchd.pid) with some random value
2. Observe the ovs pods and their events
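
The first step could be sketched as below. This is an illustration under assumptions, not the exact commands used for this report: the helper name `corrupt_pidfile` and the `.bak` backup suffix are hypothetical, and the default bogus value 4194305 assumes a Linux node, where PIDs on 64-bit kernels do not exceed 4194304.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only: back up a pidfile, then
# overwrite it with a PID that should not belong to any running process.
corrupt_pidfile() {
    pidfile=$1
    # Assumption: 4194305 is above the largest PID a 64-bit Linux kernel assigns.
    bogus=${2:-4194305}

    # Keep a copy so the original content can be restored afterwards.
    cp -p "$pidfile" "$pidfile.bak" || return 1

    printf '%s\n' "$bogus" > "$pidfile"
}
```

Run inside the ovs container, the call would be `corrupt_pidfile /var/run/openvswitch/ovs-vswitchd.pid`; copying the `.bak` file back restores the original PID.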

Actual results:

The ovs pod went into an unready state with:

Warning  Unhealthy  23s (x3 over 33s)   kubelet, ci-ln-pg766xb-f76d1-cgnz2-master-2  Liveness probe failed: ovsdb-server is running with pid 2496
Pidfile for ovs-vswitchd (/var/run/openvswitch/ovs-vswitchd.pid) is stale

Expected results:

The PID file is overwritten with the correct PID and the pods run normally.

Additional info:

Running `/usr/share/openvswitch/scripts/ovs-ctl status` in the pod seems to fix the issue.
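
From outside the pod, that command could be run roughly as follows. This is only a sketch: the wrapper name `ovs_ctl_status_in_pod` is hypothetical, and the namespace and pod name are placeholders that have to be looked up on the affected cluster. The report does not say why running the status command clears the condition.

```shell
#!/bin/sh
# Hypothetical wrapper, for illustration only: run the ovs-ctl status command
# from this report inside a given pod via `oc exec`.
ovs_ctl_status_in_pod() {
    namespace=$1   # placeholder: namespace that holds the ovs pods
    pod=$2         # placeholder: name of the affected ovs pod

    oc -n "$namespace" exec "$pod" -- /usr/share/openvswitch/scripts/ovs-ctl status
}
```

Usage would look like `ovs_ctl_status_in_pod <namespace> <ovs-pod-name>`, with both arguments replaced by real values.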

