
Bug 1592195

Summary: Logging deploy pods are in Error state while deploying OCP 3.10.0-0.66.0 with logging enabled.
Product: OpenShift Container Platform Reporter: Ashmitha Ambastha <asambast>
Component: Logging Assignee: Jeff Cantrill <jcantril>
Status: CLOSED INSUFFICIENT_DATA QA Contact: Anping Li <anli>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 3.10.0 CC: anli, aos-bugs, asambast, rmeggins
Target Milestone: ---   
Target Release: 3.10.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1592203 (view as bug list) Environment:
Last Closed: 2018-08-28 19:25:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1592203    
Attachments:
Description Flags
The inventory file used for the deployment none

Description Ashmitha Ambastha 2018-06-18 07:31:11 UTC
Created attachment 1452519
The inventory file used for the deployment

Description of problem:
-----------------------
The logging deploy pods are in Error state while deploying OCP 3.10.0-0.66.0 build.

Version-Release number of selected component (if applicable):  
OCP 3.10.0-0.66.0 

How reproducible: always

Steps to Reproduce:
-------------------
1. Deployed an OCP + CNS greenfield setup using the Ansible installer with 1 master node, 3 worker nodes and 1 infra node, with logging, metrics and gluster registry enabled (3 more nodes for gluster-registry).

2. Configured the setup with the required prerequisites.

3. Ran the prerequisites and deploy-cluster playbooks. Both completed successfully, so the deployment should have been successful.

4. However, the 3 logging-es-data-master deploy pods, the logging-curator deploy pod and the logging-kibana deploy pod are in Error state.

Actual results:
---------------

- Checked the pods:

# oc get pods --all-namespaces
NAMESPACE           NAME                                                  READY     STATUS             RESTARTS   AGE
default             docker-registry-1-pmg5t                               1/1       Running            0          4d
default             registry-console-1-8s9q9                              1/1       Running            0          4d
default             router-1-hpn4g                                        1/1       Running            0          4d
default             router-1-l5j6s                                        1/1       Running            0          4d
default             router-1-mhp4c                                        1/1       Running            0          4d
default             router-1-zpk57                                        1/1       Running            0          4d
glusterfs           glusterblock-storage-provisioner-dc-1-t95dm           1/1       Running            0          4d
glusterfs           glusterfs-storage-57m6j                               1/1       Running            1          4d
glusterfs           glusterfs-storage-72q7b                               1/1       Running            0          4d
glusterfs           glusterfs-storage-mr2hx                               1/1       Running            0          4d
glusterfs           heketi-storage-1-nbx8m                                1/1       Running            0          4d
infra-storage       glusterblock-registry-provisioner-dc-1-mlmg7          1/1       Running            0          4d
infra-storage       glusterfs-registry-9sk2r                              1/1       Running            0          4d
infra-storage       glusterfs-registry-qgdf5                              1/1       Running            0          4d
infra-storage       glusterfs-registry-z7pcf                              1/1       Running            0          4d
infra-storage       heketi-registry-1-rhlpg                               1/1       Running            0          4d
kube-system         master-api-dhcp47-98.lab.eng.blr.redhat.com           1/1       Running            0          4d
kube-system         master-controllers-dhcp47-98.lab.eng.blr.redhat.com   1/1       Running            0          4d
kube-system         master-etcd-dhcp47-98.lab.eng.blr.redhat.com          1/1       Running            0          4d
openshift-infra     hawkular-cassandra-1-tbmx6                            0/1       CrashLoopBackOff   1174       4d
openshift-infra     hawkular-cassandra-2-7rhbk                            0/1       CrashLoopBackOff   1028       4d
openshift-infra     hawkular-cassandra-3-pjzkg                            0/1       Running            1026       4d
openshift-infra     hawkular-metrics-cskn2                                0/1       Running            1190       4d
openshift-infra     hawkular-metrics-schema-cvnjf                         1/1       Running            0          4d
openshift-infra     heapster-wnpvm                                        0/1       Running            683        4d
openshift-logging   logging-curator-1-deploy                              0/1       Error              0          4d
openshift-logging   logging-es-data-master-4ixg5tg1-1-deploy              0/1       Error              0          4d
openshift-logging   logging-es-data-master-e6e1tgde-1-deploy              0/1       Error              0          4d
openshift-logging   logging-es-data-master-h4wbgcm3-1-deploy              0/1       Error              0          4d
openshift-logging   logging-fluentd-2cgsp                                 1/1       Running            0          4d
openshift-logging   logging-fluentd-6nkhx                                 1/1       Running            0          4d
openshift-logging   logging-fluentd-h5ggs                                 1/1       Running            0          4d
openshift-logging   logging-fluentd-mzcq9                                 1/1       Running            0          4d
openshift-logging   logging-fluentd-pfsdv                                 1/1       Running            0          4d
openshift-logging   logging-fluentd-trklp                                 1/1       Running            0          4d
openshift-logging   logging-fluentd-vdwt9                                 1/1       Running            0          4d
openshift-logging   logging-fluentd-xbctq                                 1/1       Running            0          4d
openshift-logging   logging-kibana-1-deploy                               0/1       Error              0          4d
openshift-node      sync-6q9v7                                            1/1       Running            0          4d
openshift-node      sync-8zgrf                                            1/1       Running            0          4d
openshift-node      sync-cmbmt                                            1/1       Running            0          4d
openshift-node      sync-fl4g9                                            1/1       Running            0          4d
openshift-node      sync-njpvm                                            1/1       Running            0          4d
openshift-node      sync-r4gng                                            1/1       Running            0          4d
openshift-node      sync-rd6wv                                            1/1       Running            0          4d
openshift-node      sync-vtsd6                                            1/1       Running            0          4d
openshift-sdn       ovs-6hcxn                                             1/1       Running            0          4d
openshift-sdn       ovs-6r6hr                                             1/1       Running            0          4d
openshift-sdn       ovs-mh8td                                             1/1       Running            0          4d
openshift-sdn       ovs-p6h94                                             1/1       Running            0          4d
openshift-sdn       ovs-t54ns                                             1/1       Running            0          4d
openshift-sdn       ovs-v5ksp                                             1/1       Running            0          4d
openshift-sdn       ovs-x72p8                                             1/1       Running            0          4d
openshift-sdn       ovs-xtrj2                                             1/1       Running            0          4d
openshift-sdn       sdn-7lfxw                                             1/1       Running            0          4d
openshift-sdn       sdn-bfrjl                                             1/1       Running            0          4d
openshift-sdn       sdn-bxhqx                                             1/1       Running            0          4d
openshift-sdn       sdn-f68qp                                             1/1       Running            0          4d
openshift-sdn       sdn-f9gbb                                             1/1       Running            0          4d
openshift-sdn       sdn-r488h                                             1/1       Running            0          4d
openshift-sdn       sdn-tg8m2                                             1/1       Running            0          4d
openshift-sdn       sdn-w7pb5                                             1/1       Running            0          4d


# oc describe pod logging-es-data-master-4ixg5tg1-1-deploy
Name:         logging-es-data-master-4ixg5tg1-1-deploy
Namespace:    openshift-logging
Node:         dhcp47-149.lab.eng.blr.redhat.com/10.70.47.149
Start Time:   Thu, 14 Jun 2018 01:34:43 +0530
Labels:       openshift.io/deployer-pod-for.name=logging-es-data-master-4ixg5tg1-1
Annotations:  openshift.io/deployment-config.name=logging-es-data-master-4ixg5tg1
              openshift.io/deployment.name=logging-es-data-master-4ixg5tg1-1
              openshift.io/scc=restricted
Status:       Failed
IP:           10.130.2.4
Containers:
  deployment:
    Container ID:   docker://57603b95cd4f227a2428f57840adee7d9fa61ce5bd86930176a16d891819274d
    Image:          brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/ose-deployer:v3.10.0-0.66.0.0
    Image ID:       docker-pullable://brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/ose-deployer@sha256:d5f9318a8cf2180565eaa870b44cab8e44be8121bdfe1ccdc08bcec27ba75d79
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Thu, 14 Jun 2018 01:35:06 +0530
      Finished:     Thu, 14 Jun 2018 01:45:28 +0530
    Ready:          False
    Restart Count:  0
    Environment:
      OPENSHIFT_DEPLOYMENT_NAME:       logging-es-data-master-4ixg5tg1-1
      OPENSHIFT_DEPLOYMENT_NAMESPACE:  openshift-logging
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from deployer-token-bq7tp (ro)
Conditions:
  Type           Status
  Initialized    True 
  Ready          False 
  PodScheduled   True 
Volumes:
  deployer-token-bq7tp:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  deployer-token-bq7tp
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node-role.kubernetes.io/infra=true
Tolerations:     <none>
Events:          <none>


Expected results: 
-----------------
The logging pods should be in Running and Ready (1/1) state.

Comment 1 Jeff Cantrill 2018-06-18 13:45:38 UTC
Can you please provide details regarding the size of your infra node on which you intend to run:

3 elasticsearch nodes @ 4G ram each
3 cassandra nodes @ 2G ram each
whatever else is considered infra

I suspect the issue is that you are trying to run a multinode metrics and logging setup on a single infra node that is incapable of this configuration.  For these 6 pods alone you will require a minimum of 18G of RAM along with the associated CPU.

Additionally, I see in your inventory the storage is grossly undersized.  We have production clusters for Elasticsearch that routinely run out of disk in under a week that are sized to be 200G per node.
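
The 18G figure above follows from simple arithmetic over the pod counts and per-pod memory quoted in this comment. A minimal sketch of that calculation, assuming only those quoted numbers (the variable names are illustrative and are not installer settings):

```shell
# Pod counts and per-pod RAM as quoted in the comment above.
es_nodes=3
es_ram_gb=4
cassandra_nodes=3
cassandra_ram_gb=2

# Total memory these six pods need, before any other infra workloads.
total_gb=$(( es_nodes * es_ram_gb + cassandra_nodes * cassandra_ram_gb ))
echo "Minimum RAM for Elasticsearch + Cassandra pods: ${total_gb}G"
```

Anything else scheduled on the infra node (routers, registry, and so on) would come on top of this total.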

Comment 2 Ashmitha Ambastha 2018-06-19 07:20:06 UTC
Details:
In the OCP infra node, 10.70.46.192, 
RAM = 32G, CPU=32, Cores per socket = 4

In the infra nodes (for Gluster-registry nodes - All 3)
RAM = 32G, CPU=1, Cores per socket = 1

Also, 
I checked if the logging and metrics pods are running on a single infra node. It doesn't seem like it.
For example, 
openshift-infra     hawkular-cassandra-1-tbmx6                            0/1       CrashLoopBackOff   1437       5d        10.129.0.8     dhcp46-245.lab.eng.blr.redhat.com
openshift-infra     hawkular-cassandra-2-7rhbk                            0/1       CrashLoopBackOff   1258       5d        10.129.2.4     dhcp47-80.lab.eng.blr.redhat.com
openshift-infra     hawkular-cassandra-3-pjzkg                            0/1       CrashLoopBackOff   1255       5d        10.131.2.10    dhcp46-192.lab.eng.blr.redhat.com

The hawkular-cassandra-1 pod is running on the infra node 10.70.46.245 and the other two pods are running on 2 different infra nodes.

The complete output showing which node each pod is running on is given below.

# oc get pods --all-namespaces -o wide
NAMESPACE           NAME                                                  READY     STATUS             RESTARTS   AGE       IP             NODE
default             docker-registry-1-pmg5t                               1/1       Running            0          5d        10.131.2.6     dhcp46-192.lab.eng.blr.redhat.com
default             registry-console-1-8s9q9                              1/1       Running            0          5d        10.131.0.3     dhcp47-75.lab.eng.blr.redhat.com
default             router-1-hpn4g                                        1/1       Running            0          5d        10.70.46.192   dhcp46-192.lab.eng.blr.redhat.com
default             router-1-l5j6s                                        1/1       Running            0          5d        10.70.47.80    dhcp47-80.lab.eng.blr.redhat.com
default             router-1-mhp4c                                        1/1       Running            0          5d        10.70.47.149   dhcp47-149.lab.eng.blr.redhat.com
default             router-1-zpk57                                        1/1       Running            0          5d        10.70.46.245   dhcp46-245.lab.eng.blr.redhat.com
glusterfs           glusterblock-storage-provisioner-dc-1-r4x5h           1/1       Running            0          10h       10.130.0.7     dhcp47-58.lab.eng.blr.redhat.com
glusterfs           glusterblock-storage-provisioner-dc-1-t95dm           0/1       Evicted            0          5d        <none>         dhcp47-98.lab.eng.blr.redhat.com
glusterfs           glusterfs-storage-57m6j                               1/1       Running            1          5d        10.70.47.75    dhcp47-75.lab.eng.blr.redhat.com
glusterfs           glusterfs-storage-72q7b                               1/1       Running            0          5d        10.70.47.39    dhcp47-39.lab.eng.blr.redhat.com
glusterfs           glusterfs-storage-mr2hx                               1/1       Running            0          5d        10.70.47.58    dhcp47-58.lab.eng.blr.redhat.com
glusterfs           heketi-storage-1-nbx8m                                1/1       Running            0          5d        10.130.2.2     dhcp47-149.lab.eng.blr.redhat.com
infra-storage       glusterblock-registry-provisioner-dc-1-mlmg7          1/1       Running            0          5d        10.128.2.4     dhcp47-39.lab.eng.blr.redhat.com
infra-storage       glusterfs-registry-9sk2r                              1/1       Running            0          5d        10.70.46.245   dhcp46-245.lab.eng.blr.redhat.com
infra-storage       glusterfs-registry-qgdf5                              1/1       Running            0          5d        10.70.47.80    dhcp47-80.lab.eng.blr.redhat.com
infra-storage       glusterfs-registry-z7pcf                              1/1       Running            0          5d        10.70.47.149   dhcp47-149.lab.eng.blr.redhat.com
infra-storage       heketi-registry-1-rhlpg                               1/1       Running            0          5d        10.131.0.2     dhcp47-75.lab.eng.blr.redhat.com
kube-system         master-api-dhcp47-98.lab.eng.blr.redhat.com           1/1       Running            0          5d        10.70.47.98    dhcp47-98.lab.eng.blr.redhat.com
kube-system         master-controllers-dhcp47-98.lab.eng.blr.redhat.com   1/1       Running            0          5d        10.70.47.98    dhcp47-98.lab.eng.blr.redhat.com
kube-system         master-etcd-dhcp47-98.lab.eng.blr.redhat.com          1/1       Running            0          5d        10.70.47.98    dhcp47-98.lab.eng.blr.redhat.com
openshift-infra     hawkular-cassandra-1-tbmx6                            0/1       CrashLoopBackOff   1437       5d        10.129.0.8     dhcp46-245.lab.eng.blr.redhat.com
openshift-infra     hawkular-cassandra-2-7rhbk                            0/1       CrashLoopBackOff   1258       5d        10.129.2.4     dhcp47-80.lab.eng.blr.redhat.com
openshift-infra     hawkular-cassandra-3-pjzkg                            0/1       CrashLoopBackOff   1255       5d        10.131.2.10    dhcp46-192.lab.eng.blr.redhat.com
openshift-infra     hawkular-metrics-cskn2                                0/1       Running            1458       5d        10.131.2.8     dhcp46-192.lab.eng.blr.redhat.com
openshift-infra     hawkular-metrics-schema-cvnjf                         1/1       Running            0          5d        10.128.2.5     dhcp47-39.lab.eng.blr.redhat.com
openshift-infra     heapster-wnpvm                                        0/1       Running            836        5d        10.131.2.9     dhcp46-192.lab.eng.blr.redhat.com
openshift-logging   logging-curator-1-deploy                              0/1       Error              0          5d        10.131.2.11    dhcp46-192.lab.eng.blr.redhat.com
openshift-logging   logging-es-data-master-4ixg5tg1-1-deploy              0/1       Error              0          5d        10.130.2.4     dhcp47-149.lab.eng.blr.redhat.com
openshift-logging   logging-es-data-master-e6e1tgde-1-deploy              0/1       Error              0          5d        10.131.2.14    dhcp46-192.lab.eng.blr.redhat.com
openshift-logging   logging-es-data-master-h4wbgcm3-1-deploy              0/1       Error              0          5d        10.129.0.6     dhcp46-245.lab.eng.blr.redhat.com
openshift-logging   logging-fluentd-2cgsp                                 1/1       Running            0          5d        10.128.2.6     dhcp47-39.lab.eng.blr.redhat.com
openshift-logging   logging-fluentd-6nkhx                                 1/1       Running            0          5d        10.131.2.13    dhcp46-192.lab.eng.blr.redhat.com
openshift-logging   logging-fluentd-h5ggs                                 1/1       Running            0          5d        10.131.0.5     dhcp47-75.lab.eng.blr.redhat.com
openshift-logging   logging-fluentd-mzcq9                                 1/1       Running            0          5d        10.129.0.5     dhcp46-245.lab.eng.blr.redhat.com
openshift-logging   logging-fluentd-pfsdv                                 1/1       Running            1          5d        10.128.0.8     dhcp47-98.lab.eng.blr.redhat.com
openshift-logging   logging-fluentd-trklp                                 1/1       Running            0          5d        10.129.2.6     dhcp47-80.lab.eng.blr.redhat.com
openshift-logging   logging-fluentd-vdwt9                                 1/1       Running            0          5d        10.130.0.6     dhcp47-58.lab.eng.blr.redhat.com
openshift-logging   logging-fluentd-xbctq                                 1/1       Running            0          5d        10.130.2.6     dhcp47-149.lab.eng.blr.redhat.com
openshift-logging   logging-kibana-1-deploy                               0/1       Error              0          5d        10.129.0.4     dhcp46-245.lab.eng.blr.redhat.com
openshift-node      sync-6q9v7                                            1/1       Running            0          5d        10.70.47.149   dhcp47-149.lab.eng.blr.redhat.com
openshift-node      sync-8zgrf                                            1/1       Running            0          5d        10.70.47.80    dhcp47-80.lab.eng.blr.redhat.com
openshift-node      sync-99xtp                                            1/1       Running            0          5h        10.70.47.98    dhcp47-98.lab.eng.blr.redhat.com
openshift-node      sync-cmbmt                                            1/1       Running            0          5d        10.70.46.245   dhcp46-245.lab.eng.blr.redhat.com
openshift-node      sync-fl4g9                                            1/1       Running            0          5d        10.70.47.75    dhcp47-75.lab.eng.blr.redhat.com
openshift-node      sync-r4gng                                            1/1       Running            0          5d        10.70.47.39    dhcp47-39.lab.eng.blr.redhat.com
openshift-node      sync-rd6wv                                            1/1       Running            0          5d        10.70.47.58    dhcp47-58.lab.eng.blr.redhat.com
openshift-node      sync-vtsd6                                            1/1       Running            0          5d        10.70.46.192   dhcp46-192.lab.eng.blr.redhat.com
openshift-sdn       ovs-6hcxn                                             1/1       Running            0          5d        10.70.47.149   dhcp47-149.lab.eng.blr.redhat.com
openshift-sdn       ovs-hpvts                                             1/1       Running            0          5h        10.70.47.98    dhcp47-98.lab.eng.blr.redhat.com
openshift-sdn       ovs-mh8td                                             1/1       Running            0          5d        10.70.46.245   dhcp46-245.lab.eng.blr.redhat.com
openshift-sdn       ovs-p6h94                                             1/1       Running            0          5d        10.70.47.58    dhcp47-58.lab.eng.blr.redhat.com
openshift-sdn       ovs-t54ns                                             1/1       Running            0          5d        10.70.47.75    dhcp47-75.lab.eng.blr.redhat.com
openshift-sdn       ovs-v5ksp                                             1/1       Running            0          5d        10.70.47.80    dhcp47-80.lab.eng.blr.redhat.com
openshift-sdn       ovs-x72p8                                             1/1       Running            0          5d        10.70.47.39    dhcp47-39.lab.eng.blr.redhat.com
openshift-sdn       ovs-xtrj2                                             1/1       Running            0          5d        10.70.46.192   dhcp46-192.lab.eng.blr.redhat.com
openshift-sdn       sdn-7lfxw                                             1/1       Running            0          5d        10.70.46.245   dhcp46-245.lab.eng.blr.redhat.com
openshift-sdn       sdn-9rv8n                                             1/1       Running            1          5h        10.70.47.98    dhcp47-98.lab.eng.blr.redhat.com
openshift-sdn       sdn-bfrjl                                             1/1       Running            0          5d        10.70.46.192   dhcp46-192.lab.eng.blr.redhat.com
openshift-sdn       sdn-bxhqx                                             1/1       Running            0          5d        10.70.47.80    dhcp47-80.lab.eng.blr.redhat.com
openshift-sdn       sdn-f68qp                                             1/1       Running            0          5d        10.70.47.58    dhcp47-58.lab.eng.blr.redhat.com
openshift-sdn       sdn-f9gbb                                             1/1       Running            0          5d        10.70.47.39    dhcp47-39.lab.eng.blr.redhat.com
openshift-sdn       sdn-tg8m2                                             1/1       Running            0          5d        10.70.47.149   dhcp47-149.lab.eng.blr.redhat.com
openshift-sdn       sdn-w7pb5                                             1/1       Running            0          5d        10.70.47.75    dhcp47-75.lab.eng.blr.redhat.com
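
One way to summarize the placement shown above is to count pods per node for the metrics and logging namespaces. The following is a rough sketch, not part of the original report; `pods_per_node` is a hypothetical helper name, and it assumes the `-o wide --all-namespaces` column layout shown above (NAMESPACE in column 1, NODE in column 8):

```shell
# Count pods per node for a comma-separated list of namespaces.
# Reads `oc get pods --all-namespaces -o wide` output on stdin.
pods_per_node() {
  awk -v namespaces="$1" '
    BEGIN {
      n = split(namespaces, list, ",")
      for (i = 1; i <= n; i++) want[list[i]] = 1
    }
    # The header row is skipped implicitly: "NAMESPACE" is not a wanted namespace.
    ($1 in want) { count[$8]++ }
    END { for (node in count) print count[node], node }
  ' | sort -k2
}

# Only query the cluster when the oc client is available.
if command -v oc >/dev/null 2>&1; then
  oc get pods --all-namespaces -o wide \
    | pods_per_node openshift-infra,openshift-logging
fi
```

Each output line is a pod count followed by the node name, which makes it easier to see whether the Cassandra and Elasticsearch deployer pods are concentrated on one node.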

Comment 3 Rich Megginson 2018-06-19 15:20:47 UTC
Please provide the output of

oc -n openshift-logging describe pod $podname

for all of the logging pods that are in the Error state.
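
The request above could be scripted roughly as follows. This is a sketch under assumptions: `error_pods` is a hypothetical helper, the STATUS value is taken to be the third column of single-namespace `oc get pods --no-headers` output, and the extra `oc logs` call goes beyond what was asked for; it is included only because deployer pod logs may show why a rollout failed.

```shell
# Print the names of pods whose STATUS column is "Error".
# Reads `oc get pods --no-headers` output for one namespace on stdin,
# where column 1 is NAME and column 3 is STATUS.
error_pods() {
  awk '$3 == "Error" { print $1 }'
}

# Only query the cluster when the oc client is available.
if command -v oc >/dev/null 2>&1; then
  for podname in $(oc -n openshift-logging get pods --no-headers | error_pods); do
    oc -n openshift-logging describe pod "$podname"
    # Assumption: logs may still be retrievable; ignore failures if they are not.
    oc -n openshift-logging logs "$podname" || true
  done
fi
```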

Comment 11 Jeff Cantrill 2018-08-28 19:25:48 UTC
Closing as INSUFFICIENT_DATA since there have been no changes to the information and we are unlikely to investigate further.