Description of problem: The keepalived liveness probe frequently fails during deployment with "/bin/bash: line 0: kill: `': not a pid or valid job spec". There's no indication in the logs that anything is actually wrong with keepalived, so I suspect there might be an issue with the liveness probe itself.
Version-Release number of selected component (if applicable): 4.8
How reproducible: Seems to be intermittent
Steps to Reproduce:
1. Deploy using dev-scripts
2. Check the journal on one of the nodes. Sometimes the message above will be present and the keepalived container will have been restarted.
Actual results: Liveness probe errors and unexpected keepalived restarts.
Expected results: Keepalived starts and runs normally.
Additional info: My working theory right now is that passing the pgrep output to kill in the liveness probe is occasionally getting tripped up. I want to try using pkill directly to avoid passing output through the shell.
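For reference, the error text quoted in the description is what bash prints when kill is handed an empty string, which is what a command substitution around pgrep produces when nothing matches. The following is a minimal sketch of that failure mode and of the pkill alternative, not the actual probe command. The process name `nosuchdaemon` is a hypothetical placeholder assumed not to be running, and signal 0 (existence check only) is used so the sketch never signals a real process.

```shell
#!/bin/sh
# Keep bash's error messages untranslated so they match the report.
export LC_ALL=C

# Hypothetical process name, assumed not to be running on this machine.
proc="nosuchdaemon"

# Pattern from the working theory: pass pgrep's output to kill.
# With no match, pgrep prints nothing and kill receives "".
# Run through `bash -c`, as a probe command would be.
err_kill=$(bash -c 'kill -s 0 "$(pgrep -o -x "$1")"' _ "$proc" 2>&1)
rc_kill=$?

# Alternative: let pkill do both the lookup and the signalling,
# so no PID has to travel through shell output.
pkill -0 -o -x "$proc"
rc_pkill=$?

echo "kill via pgrep: rc=$rc_kill msg=$err_kill"
echo "pkill directly: rc=$rc_pkill"
```

Both variants exit non-zero when the process is absent, so either would still fail a liveness check. The difference is only that pkill fails without the misleading "not a pid" message.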
I was mistaken about the problem here. It's just that it takes too long to populate keepalived.conf on the first start. I have a patch proposed to fix that.
After the fix, the bug did not reproduce.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.