1921561 – Stale pid files makes ovn-kubernetes pod restart continuously

Summary: Stale pid files makes ovn-kubernetes pod restart continuously

Status:	CLOSED DUPLICATE of bug 1923753
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Version:	4.4
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	---
Assignee:	Ben Bennett
QA Contact:	Anurag saxena

Reported:	2021-01-28 07:56 UTC by Eduardo Minguez
Modified:	2024-06-14 00:04 UTC
CC List:	1 user
Last Closed:	2021-02-03 15:40:35 UTC

Description Eduardo Minguez 2021-01-28 07:56:47 UTC

Description of problem:

If, for some reason (probably a race condition), ovs-vswitchd.pid contains an invalid PID, the liveness probe fails and the ovs pod goes into an unready state:

```
Warning  Unhealthy  23s (x3 over 33s)   kubelet, ci-ln-pg766xb-f76d1-cgnz2-master-2  Liveness probe failed: ovsdb-server is running with pid 2496
Pidfile for ovs-vswitchd (/var/run/openvswitch/ovs-vswitchd.pid) is stale
```
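The probe message separates a pidfile whose PID belongs to a live process ("running with pid 2496") from one whose PID does not ("is stale"). The Python sketch below reproduces that distinction. It assumes, without confirmation from this report, that "stale" means the file exists but does not name a live process. `pid_alive` and `pidfile_status` are hypothetical helpers, not the ovs-ctl implementation.

```python
import os


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID exists.

    Signal 0 performs the existence/permission check without delivering a signal.
    """
    if pid <= 0:
        # 0 and negative values address process groups, never a single daemon.
        return False
    try:
        os.kill(pid, 0)
    except (ProcessLookupError, OverflowError):
        # No such process, or a number too large to be a PID at all.
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def pidfile_status(path: str) -> str:
    """Classify a pidfile as 'running', 'stale' or 'missing' (hypothetical helper)."""
    try:
        with open(path) as f:
            content = f.read().strip()
    except FileNotFoundError:
        return "missing"
    try:
        pid = int(content)
    except ValueError:
        # Empty or garbage contents can never name a live daemon.
        return "stale"
    return "running" if pid_alive(pid) else "stale"


if __name__ == "__main__":
    # Path taken from the probe message above.
    print(pidfile_status("/var/run/openvswitch/ovs-vswitchd.pid"))
```

Probing with signal 0 keeps the check read-only. A PID can be recycled by an unrelated process, though, so "running" here only means that *some* process holds that PID.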

Version-Release number of selected component (if applicable):

This was found during an upgrade from 4.4.6 to 4.4.16.

How reproducible:

Alter the PID file to some random value and observe the pod logs.

Steps to Reproduce:
1. Overwrite the PID file (/var/run/openvswitch/ovs-vswitchd.pid) with a random value
2. Observe the ovs pods

Actual results:

The ovs pod went into an unready state with:

```
Warning  Unhealthy  23s (x3 over 33s)   kubelet, ci-ln-pg766xb-f76d1-cgnz2-master-2  Liveness probe failed: ovsdb-server is running with pid 2496
Pidfile for ovs-vswitchd (/var/run/openvswitch/ovs-vswitchd.pid) is stale
```

Expected results:

The stale PID file is overwritten with the correct PID and the pods keep running normally.
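The expected behaviour is to replace a stale pidfile instead of failing on it. The Python sketch below shows one way to do that. It assumes a daemon claims the pidfile at startup and refuses only when another live process still owns it. It also swaps the contents atomically through a temporary file and `os.replace`. `claim_pidfile` is a hypothetical name. This is not how ovs-vswitchd actually manages its pidfile.

```python
import os
import tempfile
from typing import Optional


def _alive(pid: int) -> bool:
    """Signal-0 existence check; non-positive values are never a daemon PID."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except (ProcessLookupError, OverflowError):
        return False
    except PermissionError:
        return True
    return True


def claim_pidfile(path: str, pid: Optional[int] = None) -> int:
    """Write `pid` (default: this process) to `path`, replacing a stale entry.

    Hypothetical sketch: raises RuntimeError only if a *different* live
    process is recorded in the file.
    """
    pid = os.getpid() if pid is None else pid
    try:
        with open(path) as f:
            current: Optional[int] = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        current = None  # missing or unparsable: nothing to protect
    if current is not None and current != pid and _alive(current):
        raise RuntimeError(f"{path} is owned by live process {current}")
    # Write to a temp file in the same directory, then rename it over the
    # target so readers never observe a half-written pidfile.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        f.write(f"{pid}\n")
    os.chmod(tmp, 0o644)  # mkstemp creates 0600; pidfiles are usually world-readable
    os.replace(tmp, path)
    return pid
```

Renaming within the same directory keeps the replacement atomic on POSIX filesystems. A liveness probe reading the file mid-update therefore sees either the old PID or the new one, never an empty file.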

Additional info:

Running `/usr/share/openvswitch/scripts/ovs-ctl status` inside the pod appears to fix the issue.
