Bug 1883610

Summary: Use more reliable ss in place of lsof
Product: OpenShift Container Platform Reporter: Suresh Kolichala <skolicha>
Component: Etcd Assignee: Sam Batschelet <sbatsche>
Status: CLOSED ERRATA QA Contact: ge liu <geliu>
Severity: low Docs Contact:
Priority: unspecified    
Version: 4.6   
Target Milestone: ---   
Target Release: 4.6.0   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2020-10-27 16:46:20 UTC Type: Bug

Description Suresh Kolichala 2020-09-29 17:09:55 UTC
Description of problem:
When etcd restarts, we check its listening ports to make sure the previous etcd static pod has exited. This check used `lsof`, but `ss` has been found to be more reliable for TCP ports: `lsof` does not report ports in TIME_WAIT, whereas `ss` does.
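
As an illustration only (this is not the code that shipped in the fix), the check can be sketched as a parser over `ss` output. The sketch assumes the etcd default ports 2379 and 2380, which this report does not name, and assumes the column layout printed by `ss -Htan` (state, Recv-Q, Send-Q, local address, peer address). The function names are hypothetical.

```python
import subprocess

# Assumption: etcd's default client (2379) and peer (2380) ports.
ETCD_PORTS = frozenset({2379, 2380})


def ports_in_use(ss_output, ports=ETCD_PORTS):
    """Return the subset of `ports` seen as a local port in `ss -Htan` output.

    Every TCP state counts, including TIME-WAIT, because `-a` lists all sockets.
    """
    found = set()
    for line in ss_output.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # blank or unexpected line
        local = fields[3]  # fourth column: local address and port
        port_text = local.rsplit(":", 1)[-1]  # also handles [::]:2379
        if port_text.isdigit() and int(port_text) in ports:
            found.add(int(port_text))
    return found


def previous_etcd_gone():
    """Run `ss` and report whether none of the assumed etcd ports is in use."""
    result = subprocess.run(
        ["ss", "-Htan"],  # -H no header, -t TCP, -a all states, -n numeric
        capture_output=True, text=True, check=True,
    )
    return not ports_in_use(result.stdout)
```

Keeping the parsing separate from the `ss` invocation lets the logic be exercised against captured output without a live node.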

Version-Release number of selected component (if applicable):


How reproducible:
Not easily reproducible

Steps to Reproduce:
1. Install or upgrade an OCP cluster
2. Make sure the new etcd comes up only after the previous etcd has exited.


Actual results:
If the lsof/ss check fails to detect the previous etcd, the new etcd can hit a port conflict and go into a crashloop.

Expected results:
No crashloops of etcd due to port conflicts.
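
One way to picture the expected behavior is a bounded wait before the new etcd starts. The following is a minimal sketch, not the actual fix: `wait_until_free`, its parameters, and the default timeout are all hypothetical, and `ports_busy` stands for whatever check reports that the etcd ports are still held.

```python
import time


def wait_until_free(ports_busy, timeout=60.0, interval=1.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll `ports_busy()` until it returns False or `timeout` seconds pass.

    Returns True once the ports are free, False if the deadline is reached.
    `sleep` and `clock` are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout
    while True:
        if not ports_busy():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A caller would start etcd only when this returns True, and fail with a clear message otherwise instead of racing the old process for the port.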

Additional info:

Comment 5 errata-xmlrpc 2020-10-27 16:46:20 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images) and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196