Bug 1878774 - on the nodes there are zombies caused by etcd
Summary: on the nodes there are zombies caused by etcd
Keywords:
Status: CLOSED DUPLICATE of bug 1844727
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Etcd
Version: 4.6.z
Hardware: s390x
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Sam Batschelet
QA Contact: ge liu
URL:
Whiteboard:
Depends On:
Blocks: ocp-46-z-tracker
 
Reported: 2020-09-14 13:43 UTC by wvoesch
Modified: 2020-10-02 09:37 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-14 14:56:29 UTC
Target Upstream Version:
Embargoed:



Description wvoesch 2020-09-14 13:43:18 UTC
On several nodes I have observed zombie (defunct) processes, usually 2 or 3, that are children of the etcd process. Please see the additional info below.


I observed this on two separate clusters (two different environments) running these versions:
Cluster a) Z13: Version: 4.6.0-0.nightly-s390x-2020-08-27-080214 RHCOS: 46.82.202008261939-0 s8343022
Cluster b) Z13: Version: 4.6.0-0.nightly-s390x-2020-09-05-222506 RHCOS: 46.82.202009042339-0 s8343008

Please let me know what information you need for further debugging and I will gladly provide it.
Thank you.


Additional info:

root       16968       1  0 Sep07 ?        00:01:40 /usr/libexec/crio/conmon -b /var/run/containers/storage/overlay-containers/90eea5bc227afe29310c21e42e429f7e9cf1d79f432af26b51e376a94cca6ab1/userdata -c 90eea5bc227afe29310c21e42e429f7e9cf1d79f432af26b51e376a94cca6ab1 --exit-dir /var/run/crio/exits -l /var/log/pods/openshift-etcd_etcd-master-01.ocp-s8343008.lnxne.boe_dbdfcdbe67ee372db0780000f39086ce/etcd/0.log --log-level info -n k8s_etcd_etcd-master-01.ocp-s8343008.lnxne.boe_openshift-etcd_dbdfcdbe67ee372db0780000f39086ce_0 -P /var/run/containers/storage/overlay-containers/90eea5bc227afe29310c21e42e429f7e9cf1d79f432af26b51e376a94cca6ab1/userdata/conmon-pidfile -p /var/run/containers/storage/overlay-containers/90eea5bc227afe29310c21e42e429f7e9cf1d79f432af26b51e376a94cca6ab1/userdata/pidfile --persist-dir /var/lib/containers/storage/overlay-containers/90eea5bc227afe29310c21e42e429f7e9cf1d79f432af26b51e376a94cca6ab1/userdata -r /usr/bin/runc --runtime-arg --root=/run/runc --socket-dir-path /var/run/crio -u 90eea5bc227afe29310c21e42e429f7e9cf1d79f432af26b51e376a94cca6ab1 -s
root       16983   16968 20 Sep07 ?        1-10:27:05  \_ etcd --initial-advertise-peer-urls=https://10.107.1.111:2380 --cert-file=/etc/kubernetes/static-pod-certs/secrets/etcd-all-serving/etcd-serving-master-01.ocp-s8343008.lnxne.boe.crt --key-file=/etc/kubernetes/static-pod-certs/secrets/etcd-all-serving/etcd-serving-master-01.ocp-s8343008.lnxne.boe.key --trusted-ca-file=/etc/kubernetes/static-pod-certs/configmaps/etcd-serving-ca/ca-bundle.crt --client-cert-auth=true --peer-cert-file=/etc/kubernetes/static-pod-certs/secrets/etcd-all-peer/etcd-peer-master-01.ocp-s8343008.lnxne.boe.crt --peer-key-file=/etc/kubernetes/static-pod-certs/secrets/etcd-all-peer/etcd-peer-master-01.ocp-s8343008.lnxne.boe.key --peer-trusted-ca-file=/etc/kubernetes/static-pod-certs/configmaps/etcd-peer-client-ca/ca-bundle.crt --peer-client-cert-auth=true --advertise-client-urls=https://10.107.1.111:2379 --listen-client-urls=https://0.0.0.0:2379 --listen-peer-urls=https://0.0.0.0:2380 --listen-metrics-urls=https://0.0.0.0:9978
root      823821   16983  0 Sep07 ?        00:00:00      \_ [lsof] <defunct>
root      823822   16983  0 Sep07 ?        00:00:00      \_ [grep] <defunct>
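
For anyone who wants to check their own nodes for the same pattern, here is a minimal sketch that groups defunct processes by parent PID, like the lsof and grep entries under etcd in the listing above. It assumes a Linux /proc filesystem. The function names (parse_stat, zombies_by_parent, read_proc_stats) are illustrative only and are not part of etcd, CRI-O, or any OpenShift tooling.

```python
import os
from collections import defaultdict


def parse_stat(stat_text):
    """Return (pid, comm, state, ppid) from the text of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of splitting on whitespace.
    left, right = stat_text.rsplit(")", 1)
    pid_str, comm = left.split("(", 1)
    fields = right.split()
    return int(pid_str), comm, fields[0], int(fields[1])


def zombies_by_parent(stat_texts):
    """Group zombie processes (state 'Z') by their parent PID."""
    result = defaultdict(list)
    for text in stat_texts:
        pid, comm, state, ppid = parse_stat(text)
        if state == "Z":
            result[ppid].append((pid, comm))
    return dict(result)


def read_proc_stats(proc_root="/proc"):
    """Yield the stat text of every process visible under proc_root."""
    if not os.path.isdir(proc_root):
        return  # assumption: no procfs here, so there is nothing to report
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                yield fh.read()
        except OSError:
            # The process exited between listdir() and open().
            continue


if __name__ == "__main__":
    found = zombies_by_parent(read_proc_stats())
    for ppid, children in sorted(found.items()):
        names = ", ".join(f"{comm}({pid})" for pid, comm in children)
        print(f"parent {ppid}: {names}")
```

Run as root on the node, this prints one line per parent that currently has zombie children. Comparing the parent PID against the etcd PID from ps shows whether etcd is the process that is not reaping them.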

Comment 1 Sam Batschelet 2020-09-14 14:56:29 UTC

*** This bug has been marked as a duplicate of bug 1844727 ***

