Bug 1806579 - etcd static pod does not wait for the host port to be released
Summary: etcd static pod does not wait for the host port to be released
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Etcd Operator
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.5.0
Assignee: Sam Batschelet
QA Contact: Ke Wang
URL:
Whiteboard:
Depends On:
Blocks: 1809760
 
Reported: 2020-02-24 14:54 UTC by David Eads
Modified: 2020-07-13 17:21 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1809760
Environment:
Last Closed: 2020-07-13 17:20:43 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-etcd-operator pull 226 0 None closed Bug 1806579: waits for ports before starting etcd member 2021-02-01 22:05:16 UTC
Red Hat Product Errata RHBA-2020:2409 0 None None None 2020-07-13 17:21:02 UTC

Description David Eads 2020-02-24 14:54:58 UTC
Because the port will be held by a *different* static pod during 4.3 to 4.4 upgrades, this cannot be done in an init container like the rest.  Instead, it must be embedded in the etcd container itself.

Comment 3 Ke Wang 2020-03-23 05:43:41 UTC
Per the PR https://github.com/openshift/cluster-etcd-operator/pull/226, there should be a port collision check before etcd starts up. This check runs on every normal startup, not just during an upgrade.

We performed the following checks:
1. The etcd pod startup command contains the port check:
$ etcd_pod=$(oc get pod -n openshift-etcd | grep -i running | head -1 | cut -d " " -f1)
$ container_cmd=$(oc get pod $etcd_pod -n openshift-etcd -o json | jq '.spec.containers[1].command[2]')
$ echo -e $container_cmd | grep -A6 'conflict initcontainer'

# we cannot use the \"normal\" port conflict initcontainer because when we upgrade, the existing static pod will never yield,
# so we do the detection in etcd container itself.
echo -n \"Waiting for ports 2379, 2380 and 9978 to be released.\"
while [ -n \"$(lsof -ni :2379)$(lsof -ni :2380)$(lsof -ni :9978)\" ]; do
 echo -n \".\"
 sleep 1
done

2. etcd started up and is listening on the related ports on the master node:
sh-4.4# ps -ef |grep -v grep | grep 'etcd --initial'   
root       14894   14866 12 01:41 ?        00:30:36 etcd --initial-advertise-peer-urls=https://.......58:2380 --cert-file=/etc/kubernetes/static-pod-certs/secrets/etcd-all-serving/etcd-serving-ip-......-58.us-east-2.compute.internal.crt --key-file=/etc/kubernetes/static-pod-certs/secrets/etcd-all-serving/etcd-serving-ip-......-58.us-east-2.compute.internal.key --trusted-ca-file=/etc/kubernetes/static-pod-certs/configmaps/etcd-serving-ca/ca-bundle.crt --client-cert-auth=true --peer-cert-file=/etc/kubernetes/static-pod-certs/secrets/etcd-all-peer/etcd-peer-ip-......-58.us-east-2.compute.internal.crt --peer-key-file=/etc/kubernetes/static-pod-certs/secrets/etcd-all-peer/etcd-peer-ip-......-58.us-east-2.compute.internal.key --peer-trusted-ca-file=/etc/kubernetes/static-pod-certs/configmaps/etcd-peer-client-ca/ca-bundle.crt --peer-client-cert-auth=true --advertise-client-urls=https://.......58:2379 --listen-client-urls=https://0.0.0.0:2379 --listen-peer-urls=https://0.0.0.0:2380 --listen-metrics-urls=https://0.0.0.0:9978
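
For reference, the wait loop from check 1 can be read more easily without the JSON escaping. The following is only a sketch of the same idea, not the operator's actual script: the function names port_in_use and wait_for_ports_released are hypothetical, and it assumes bash and lsof are available in the container, as the original snippet does.

```shell
#!/usr/bin/env bash

# port_in_use (hypothetical helper): succeeds if lsof reports any
# socket on the given port, mirroring the `lsof -ni :PORT` check above.
port_in_use() {
  [ -n "$(lsof -ni ":$1" 2>/dev/null)" ]
}

# wait_for_ports_released (hypothetical helper): polls once per second
# until none of the given ports is held by another process.
wait_for_ports_released() {
  local port busy
  echo -n "Waiting for ports $* to be released."
  while :; do
    busy=0
    for port in "$@"; do
      if port_in_use "$port"; then
        busy=1
        break
      fi
    done
    [ "$busy" -eq 0 ] && break
    echo -n "."
    sleep 1
  done
  echo
}
```

Calling wait_for_ports_released 2379 2380 9978 immediately before starting etcd in the same container would give the behaviour described in the Description: the check runs inside the etcd container rather than in an init container, so it still works when the old static pod holds the ports during an upgrade.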

Comment 5 errata-xmlrpc 2020-07-13 17:20:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

