Bug 1806579

Summary: etcd static pod does not wait for the host port to be released
Product: OpenShift Container Platform
Reporter: David Eads <deads>
Component: Etcd Operator
Assignee: Sam Batschelet <sbatsche>
Status: CLOSED ERRATA
QA Contact: Ke Wang <kewang>
Severity: high
Priority: high
Version: 4.4
CC: kewang, lszaszki, skolicha
Target Release: 4.5.0
Hardware: Unspecified
OS: Unspecified
Doc Type: No Doc Update
Clones: 1809760
Last Closed: 2020-07-13 17:20:43 UTC
Type: Bug
Bug Blocks: 1809760

Description David Eads 2020-02-24 14:54:58 UTC
Because the ports will be held by a *different* static pod during 4.3 to 4.4 upgrades, waiting for them to be released cannot be done in an init container like the rest. Instead, it must be embedded in the etcd container itself.
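A minimal sketch of that pattern, assuming a bash entrypoint and that `lsof` exists in the etcd image (as the loop from the PR implies): the port wait runs inside the etcd container's own command and etcd is exec'd only afterwards, so nothing depends on an init container finishing while the old static pod still holds the ports. The function names, the optional timeout and the wrapper itself are hypothetical, not taken from the PR.

```shell
#!/usr/bin/env bash
# Hypothetical entrypoint sketch (names are illustrative, not from the PR):
# wait in the etcd container itself until the ports are free, then exec etcd.

PORTS=(2379 2380 9978)   # client, peer and metrics ports from the etcd flags

# Succeeds if anything currently holds the given TCP port on this host.
# Assumes lsof is installed in the image; a missing lsof reads as "free".
port_busy() {
  [ -n "$(lsof -ni :"$1" 2>/dev/null)" ]
}

# Poll once per second until none of PORTS is held.
# $1 = timeout in seconds (0, the default, waits forever like the PR's loop).
wait_for_ports_released() {
  local timeout=${1:-0} waited=0 port busy
  printf 'Waiting for ports %s to be released.' "${PORTS[*]}"
  while :; do
    busy=0
    for port in "${PORTS[@]}"; do
      if port_busy "$port"; then
        busy=1
        break
      fi
    done
    if [ "$busy" -eq 0 ]; then
      printf '\n'
      return 0
    fi
    if [ "$timeout" -gt 0 ] && [ "$waited" -ge "$timeout" ]; then
      printf ' timed out\n' >&2
      return 1
    fi
    printf '.'
    sleep 1
    waited=$((waited + 1))
  done
}

# When executed directly with a command (e.g. etcd and its flags), wait for
# the ports and then replace this shell with that command.
if [ "${BASH_SOURCE[0]}" = "$0" ] && [ "$#" -gt 0 ]; then
  set -euo pipefail
  wait_for_ports_released 0
  exec "$@"
fi
```

Using `exec` means etcd replaces the wrapper shell, so signals sent to the container reach etcd directly rather than a parent shell.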

Comment 3 Ke Wang 2020-03-23 05:43:41 UTC
Per the PR https://github.com/openshift/cluster-etcd-operator/pull/226, there should be a port collision check before etcd starts up, and it runs on every normal startup, not just during upgrades.

We did the following checks:
1. The etcd pod's startup command:
$ etcd_pod=$(oc get pod -n openshift-etcd | grep -i running | head -1 | cut -d " " -f1)
$ container_cmd=$(oc get pod "$etcd_pod" -n openshift-etcd -o json | jq '.spec.containers[1].command[2]')
$ echo -e $container_cmd | grep -A6 'conflict initcontainer'

# we cannot use the \"normal\" port conflict initcontainer because when we upgrade, the existing static pod will never yield,
# so we do the detection in etcd container itself.
echo -n \"Waiting for ports 2379, 2380 and 9978 to be released.\"
while [ -n \"$(lsof -ni :2379)$(lsof -ni :2380)$(lsof -ni :9978)\" ]; do
 echo -n \".\"
 sleep 1
done

2. etcd started up on the master node with the related ports configured:
sh-4.4# ps -ef |grep -v grep | grep 'etcd --initial'   
root       14894   14866 12 01:41 ?        00:30:36 etcd --initial-advertise-peer-urls=https://.......58:2380 --cert-file=/etc/kubernetes/static-pod-certs/secrets/etcd-all-serving/etcd-serving-ip-......-58.us-east-2.compute.internal.crt --key-file=/etc/kubernetes/static-pod-certs/secrets/etcd-all-serving/etcd-serving-ip-......-58.us-east-2.compute.internal.key --trusted-ca-file=/etc/kubernetes/static-pod-certs/configmaps/etcd-serving-ca/ca-bundle.crt --client-cert-auth=true --peer-cert-file=/etc/kubernetes/static-pod-certs/secrets/etcd-all-peer/etcd-peer-ip-......-58.us-east-2.compute.internal.crt --peer-key-file=/etc/kubernetes/static-pod-certs/secrets/etcd-all-peer/etcd-peer-ip-......-58.us-east-2.compute.internal.key --peer-trusted-ca-file=/etc/kubernetes/static-pod-certs/configmaps/etcd-peer-client-ca/ca-bundle.crt --peer-client-cert-auth=true --advertise-client-urls=https://.......58:2379 --listen-client-urls=https://0.0.0.0:2379 --listen-peer-urls=https://0.0.0.0:2380 --listen-metrics-urls=https://0.0.0.0:9978
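To repeat check 2 without reading the full `ps` output, a small sketch can confirm that something is listening on 2379, 2380 and 9978. It assumes the `ss` utility from iproute2 is available on the node and that the fourth column of `ss -ltn` is the local `address:port`; the function name is hypothetical. It only shows that the ports are bound, not which process owns them (`ss -ltnp` adds that).

```shell
#!/usr/bin/env bash
# Hypothetical check: confirm that etcd's client (2379), peer (2380) and
# metrics (9978) ports are in LISTEN state. It reads `ss -ltn` output on
# stdin so it can be tested without a running etcd.

REQUIRED_PORTS=(2379 2380 9978)

# Print every required port that has no listener; return 1 if any is missing.
missing_listeners() {
  local listening port missing=0
  # Column 4 is "Local Address:Port"; keep the text after the last colon,
  # which handles 0.0.0.0:2379, [::]:2380 and *:9978 alike.
  listening=$(awk '$1 == "LISTEN" { n = split($4, a, ":"); print a[n] }' | sort -u)
  for port in "${REQUIRED_PORTS[@]}"; do
    if ! grep -qx "$port" <<<"$listening"; then
      echo "$port"
      missing=1
    fi
  done
  return "$missing"
}
```

On a master node this might be run as `ss -ltn | missing_listeners && echo "etcd ports are listening"`.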

Comment 5 errata-xmlrpc 2020-07-13 17:20:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409