Bug 1592195
| Summary: | Logging deploy pods are in Error state while deploying OCP 3.10.0-0.66.0 with logging enabled. | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Ashmitha Ambastha <asambast> |
| Component: | Logging | Assignee: | Jeff Cantrill <jcantril> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Anping Li <anli> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.10.0 | CC: | anli, aos-bugs, asambast, rmeggins |
| Target Milestone: | --- | | |
| Target Release: | 3.10.z | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1592203 | Environment: | |
| Last Closed: | 2018-08-28 19:25:48 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1592203 | | |
| Attachments: | | | |

Can you please provide details regarding the size of the infra node on which you intend to run:

- 3 Elasticsearch nodes @ 4G RAM each
- 3 Cassandra nodes @ 2G RAM each
- whatever else is considered infra

I suspect the issue is that you are trying to run a multinode metrics and logging setup on a single infra node that is incapable of this configuration. For these 6 pods alone you will require a minimum of 18G of RAM along with the associated CPU. Additionally, I see in your inventory that the storage is grossly undersized. We have production Elasticsearch clusters sized at 200G per node that routinely run out of disk in under a week.

Details:

In the OCP infra node, 10.70.46.192: RAM = 32G, CPU = 32, cores per socket = 4
In the infra nodes (for the gluster-registry nodes, all 3): RAM = 32G, CPU = 1, cores per socket = 1

Also, I checked whether the logging and metrics pods are running on a single infra node. It doesn't seem like it. For example,

    openshift-infra  hawkular-cassandra-1-tbmx6  0/1  CrashLoopBackOff  1437  5d  10.129.0.8   dhcp46-245.lab.eng.blr.redhat.com
    openshift-infra  hawkular-cassandra-2-7rhbk  0/1  CrashLoopBackOff  1258  5d  10.129.2.4   dhcp47-80.lab.eng.blr.redhat.com
    openshift-infra  hawkular-cassandra-3-pjzkg  0/1  CrashLoopBackOff  1255  5d  10.131.2.10  dhcp46-192.lab.eng.blr.redhat.com

The hawkular-cassandra-1 pod is running on the infra node 10.70.46.245, and the other two pods are running on 2 different infra nodes. The complete output showing the node on which each pod is running is given below.

    # oc get pods --all-namespaces -o wide
    NAMESPACE  NAME  READY  STATUS  RESTARTS  AGE  IP  NODE
    default  docker-registry-1-pmg5t  1/1  Running  0  5d  10.131.2.6  dhcp46-192.lab.eng.blr.redhat.com
    default  registry-console-1-8s9q9  1/1  Running  0  5d  10.131.0.3  dhcp47-75.lab.eng.blr.redhat.com
    default  router-1-hpn4g  1/1  Running  0  5d  10.70.46.192  dhcp46-192.lab.eng.blr.redhat.com
    default  router-1-l5j6s  1/1  Running  0  5d  10.70.47.80  dhcp47-80.lab.eng.blr.redhat.com
    default  router-1-mhp4c  1/1  Running  0  5d  10.70.47.149  dhcp47-149.lab.eng.blr.redhat.com
    default  router-1-zpk57  1/1  Running  0  5d  10.70.46.245  dhcp46-245.lab.eng.blr.redhat.com
    glusterfs  glusterblock-storage-provisioner-dc-1-r4x5h  1/1  Running  0  10h  10.130.0.7  dhcp47-58.lab.eng.blr.redhat.com
    glusterfs  glusterblock-storage-provisioner-dc-1-t95dm  0/1  Evicted  0  5d  <none>  dhcp47-98.lab.eng.blr.redhat.com
    glusterfs  glusterfs-storage-57m6j  1/1  Running  1  5d  10.70.47.75  dhcp47-75.lab.eng.blr.redhat.com
    glusterfs  glusterfs-storage-72q7b  1/1  Running  0  5d  10.70.47.39  dhcp47-39.lab.eng.blr.redhat.com
    glusterfs  glusterfs-storage-mr2hx  1/1  Running  0  5d  10.70.47.58  dhcp47-58.lab.eng.blr.redhat.com
    glusterfs  heketi-storage-1-nbx8m  1/1  Running  0  5d  10.130.2.2  dhcp47-149.lab.eng.blr.redhat.com
    infra-storage  glusterblock-registry-provisioner-dc-1-mlmg7  1/1  Running  0  5d  10.128.2.4  dhcp47-39.lab.eng.blr.redhat.com
    infra-storage  glusterfs-registry-9sk2r  1/1  Running  0  5d  10.70.46.245  dhcp46-245.lab.eng.blr.redhat.com
    infra-storage  glusterfs-registry-qgdf5  1/1  Running  0  5d  10.70.47.80  dhcp47-80.lab.eng.blr.redhat.com
    infra-storage  glusterfs-registry-z7pcf  1/1  Running  0  5d  10.70.47.149  dhcp47-149.lab.eng.blr.redhat.com
    infra-storage  heketi-registry-1-rhlpg  1/1  Running  0  5d  10.131.0.2  dhcp47-75.lab.eng.blr.redhat.com
    kube-system  master-api-dhcp47-98.lab.eng.blr.redhat.com  1/1  Running  0  5d  10.70.47.98  dhcp47-98.lab.eng.blr.redhat.com
    kube-system  master-controllers-dhcp47-98.lab.eng.blr.redhat.com  1/1  Running  0  5d  10.70.47.98  dhcp47-98.lab.eng.blr.redhat.com
    kube-system  master-etcd-dhcp47-98.lab.eng.blr.redhat.com  1/1  Running  0  5d  10.70.47.98  dhcp47-98.lab.eng.blr.redhat.com
    openshift-infra  hawkular-cassandra-1-tbmx6  0/1  CrashLoopBackOff  1437  5d  10.129.0.8  dhcp46-245.lab.eng.blr.redhat.com
    openshift-infra  hawkular-cassandra-2-7rhbk  0/1  CrashLoopBackOff  1258  5d  10.129.2.4  dhcp47-80.lab.eng.blr.redhat.com
    openshift-infra  hawkular-cassandra-3-pjzkg  0/1  CrashLoopBackOff  1255  5d  10.131.2.10  dhcp46-192.lab.eng.blr.redhat.com
    openshift-infra  hawkular-metrics-cskn2  0/1  Running  1458  5d  10.131.2.8  dhcp46-192.lab.eng.blr.redhat.com
    openshift-infra  hawkular-metrics-schema-cvnjf  1/1  Running  0  5d  10.128.2.5  dhcp47-39.lab.eng.blr.redhat.com
    openshift-infra  heapster-wnpvm  0/1  Running  836  5d  10.131.2.9  dhcp46-192.lab.eng.blr.redhat.com
    openshift-logging  logging-curator-1-deploy  0/1  Error  0  5d  10.131.2.11  dhcp46-192.lab.eng.blr.redhat.com
    openshift-logging  logging-es-data-master-4ixg5tg1-1-deploy  0/1  Error  0  5d  10.130.2.4  dhcp47-149.lab.eng.blr.redhat.com
    openshift-logging  logging-es-data-master-e6e1tgde-1-deploy  0/1  Error  0  5d  10.131.2.14  dhcp46-192.lab.eng.blr.redhat.com
    openshift-logging  logging-es-data-master-h4wbgcm3-1-deploy  0/1  Error  0  5d  10.129.0.6  dhcp46-245.lab.eng.blr.redhat.com
    openshift-logging  logging-fluentd-2cgsp  1/1  Running  0  5d  10.128.2.6  dhcp47-39.lab.eng.blr.redhat.com
    openshift-logging  logging-fluentd-6nkhx  1/1  Running  0  5d  10.131.2.13  dhcp46-192.lab.eng.blr.redhat.com
    openshift-logging  logging-fluentd-h5ggs  1/1  Running  0  5d  10.131.0.5  dhcp47-75.lab.eng.blr.redhat.com
    openshift-logging  logging-fluentd-mzcq9  1/1  Running  0  5d  10.129.0.5  dhcp46-245.lab.eng.blr.redhat.com
    openshift-logging  logging-fluentd-pfsdv  1/1  Running  1  5d  10.128.0.8  dhcp47-98.lab.eng.blr.redhat.com
    openshift-logging  logging-fluentd-trklp  1/1  Running  0  5d  10.129.2.6  dhcp47-80.lab.eng.blr.redhat.com
    openshift-logging  logging-fluentd-vdwt9  1/1  Running  0  5d  10.130.0.6  dhcp47-58.lab.eng.blr.redhat.com
    openshift-logging  logging-fluentd-xbctq  1/1  Running  0  5d  10.130.2.6  dhcp47-149.lab.eng.blr.redhat.com
    openshift-logging  logging-kibana-1-deploy  0/1  Error  0  5d  10.129.0.4  dhcp46-245.lab.eng.blr.redhat.com
    openshift-node  sync-6q9v7  1/1  Running  0  5d  10.70.47.149  dhcp47-149.lab.eng.blr.redhat.com
    openshift-node  sync-8zgrf  1/1  Running  0  5d  10.70.47.80  dhcp47-80.lab.eng.blr.redhat.com
    openshift-node  sync-99xtp  1/1  Running  0  5h  10.70.47.98  dhcp47-98.lab.eng.blr.redhat.com
    openshift-node  sync-cmbmt  1/1  Running  0  5d  10.70.46.245  dhcp46-245.lab.eng.blr.redhat.com
    openshift-node  sync-fl4g9  1/1  Running  0  5d  10.70.47.75  dhcp47-75.lab.eng.blr.redhat.com
    openshift-node  sync-r4gng  1/1  Running  0  5d  10.70.47.39  dhcp47-39.lab.eng.blr.redhat.com
    openshift-node  sync-rd6wv  1/1  Running  0  5d  10.70.47.58  dhcp47-58.lab.eng.blr.redhat.com
    openshift-node  sync-vtsd6  1/1  Running  0  5d  10.70.46.192  dhcp46-192.lab.eng.blr.redhat.com
    openshift-sdn  ovs-6hcxn  1/1  Running  0  5d  10.70.47.149  dhcp47-149.lab.eng.blr.redhat.com
    openshift-sdn  ovs-hpvts  1/1  Running  0  5h  10.70.47.98  dhcp47-98.lab.eng.blr.redhat.com
    openshift-sdn  ovs-mh8td  1/1  Running  0  5d  10.70.46.245  dhcp46-245.lab.eng.blr.redhat.com
    openshift-sdn  ovs-p6h94  1/1  Running  0  5d  10.70.47.58  dhcp47-58.lab.eng.blr.redhat.com
    openshift-sdn  ovs-t54ns  1/1  Running  0  5d  10.70.47.75  dhcp47-75.lab.eng.blr.redhat.com
    openshift-sdn  ovs-v5ksp  1/1  Running  0  5d  10.70.47.80  dhcp47-80.lab.eng.blr.redhat.com
    openshift-sdn  ovs-x72p8  1/1  Running  0  5d  10.70.47.39  dhcp47-39.lab.eng.blr.redhat.com
    openshift-sdn  ovs-xtrj2  1/1  Running  0  5d  10.70.46.192  dhcp46-192.lab.eng.blr.redhat.com
    openshift-sdn  sdn-7lfxw  1/1  Running  0  5d  10.70.46.245  dhcp46-245.lab.eng.blr.redhat.com
    openshift-sdn  sdn-9rv8n  1/1  Running  1  5h  10.70.47.98  dhcp47-98.lab.eng.blr.redhat.com
    openshift-sdn  sdn-bfrjl  1/1  Running  0  5d  10.70.46.192  dhcp46-192.lab.eng.blr.redhat.com
    openshift-sdn  sdn-bxhqx  1/1  Running  0  5d  10.70.47.80  dhcp47-80.lab.eng.blr.redhat.com
    openshift-sdn  sdn-f68qp  1/1  Running  0  5d  10.70.47.58  dhcp47-58.lab.eng.blr.redhat.com
    openshift-sdn  sdn-f9gbb  1/1  Running  0  5d  10.70.47.39  dhcp47-39.lab.eng.blr.redhat.com
    openshift-sdn  sdn-tg8m2  1/1  Running  0  5d  10.70.47.149  dhcp47-149.lab.eng.blr.redhat.com
    openshift-sdn  sdn-w7pb5  1/1  Running  0  5d  10.70.47.75  dhcp47-75.lab.eng.blr.redhat.com

Please provide the output of oc -n openshift-logging describe pod $podname for all of the logging pods that are in the Error state.

Closing as INSUFFICIENT_DATA, as there are no changes to the information and we are unlikely to investigate further.

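The describe output requested above could be collected in one pass. The following is only a sketch: the function names error_pods and describe_error_pods are hypothetical helpers, not part of oc, and it assumes the default column order of oc get pods within a single namespace (NAME READY STATUS RESTARTS AGE).

```shell
#!/bin/sh
# Hypothetical helper: print the names of pods whose STATUS column is "Error".
# Expects `oc get pods --no-headers` output for ONE namespace on stdin,
# i.e. columns NAME READY STATUS RESTARTS AGE (assumption: default layout).
error_pods() {
    awk '$3 == "Error" { print $1 }'
}

# Hypothetical wrapper: describe every openshift-logging pod in the Error state.
# Nothing runs until this function is called explicitly.
describe_error_pods() {
    oc -n openshift-logging get pods --no-headers | error_pods |
    while read -r podname; do
        oc -n openshift-logging describe pod "$podname"
    done
}
```

After sourcing the snippet on a host that is logged in to the cluster, running describe_error_pods > logging-error-pods.txt would produce a single file suitable for attaching to the bug.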
Created attachment 1452519
The inventory file used for the deployment

Description of problem:
-----------------------
The logging deploy pods are in Error state while deploying the OCP 3.10.0-0.66.0 build.

Version-Release number of selected component (if applicable):
OCP 3.10.0-0.66.0

How reproducible:
always

Steps to Reproduce:
-------------------
1. Deployed an OCP + CNS greenfield setup using the ansible installer, with 1 master node, 3 worker nodes and 1 infra node, and with logging, metrics and gluster registry enabled (3 more nodes for gluster-registry).
2. Configured the setup with the required prerequisites.
3. Ran the prerequisite and deploy-cluster playbooks. Both passed, so the deployment should have been successful.
4. However, the 3 logging-es-data-master deploy pods, the logging-curator deploy pod and the logging-kibana deploy pod are in Error state.

Actual results:
---------------
Checked on the pods:

    # oc get pods --all-namespaces
    NAMESPACE  NAME  READY  STATUS  RESTARTS  AGE
    default  docker-registry-1-pmg5t  1/1  Running  0  4d
    default  registry-console-1-8s9q9  1/1  Running  0  4d
    default  router-1-hpn4g  1/1  Running  0  4d
    default  router-1-l5j6s  1/1  Running  0  4d
    default  router-1-mhp4c  1/1  Running  0  4d
    default  router-1-zpk57  1/1  Running  0  4d
    glusterfs  glusterblock-storage-provisioner-dc-1-t95dm  1/1  Running  0  4d
    glusterfs  glusterfs-storage-57m6j  1/1  Running  1  4d
    glusterfs  glusterfs-storage-72q7b  1/1  Running  0  4d
    glusterfs  glusterfs-storage-mr2hx  1/1  Running  0  4d
    glusterfs  heketi-storage-1-nbx8m  1/1  Running  0  4d
    infra-storage  glusterblock-registry-provisioner-dc-1-mlmg7  1/1  Running  0  4d
    infra-storage  glusterfs-registry-9sk2r  1/1  Running  0  4d
    infra-storage  glusterfs-registry-qgdf5  1/1  Running  0  4d
    infra-storage  glusterfs-registry-z7pcf  1/1  Running  0  4d
    infra-storage  heketi-registry-1-rhlpg  1/1  Running  0  4d
    kube-system  master-api-dhcp47-98.lab.eng.blr.redhat.com  1/1  Running  0  4d
    kube-system  master-controllers-dhcp47-98.lab.eng.blr.redhat.com  1/1  Running  0  4d
    kube-system  master-etcd-dhcp47-98.lab.eng.blr.redhat.com  1/1  Running  0  4d
    openshift-infra  hawkular-cassandra-1-tbmx6  0/1  CrashLoopBackOff  1174  4d
    openshift-infra  hawkular-cassandra-2-7rhbk  0/1  CrashLoopBackOff  1028  4d
    openshift-infra  hawkular-cassandra-3-pjzkg  0/1  Running  1026  4d
    openshift-infra  hawkular-metrics-cskn2  0/1  Running  1190  4d
    openshift-infra  hawkular-metrics-schema-cvnjf  1/1  Running  0  4d
    openshift-infra  heapster-wnpvm  0/1  Running  683  4d
    openshift-logging  logging-curator-1-deploy  0/1  Error  0  4d
    openshift-logging  logging-es-data-master-4ixg5tg1-1-deploy  0/1  Error  0  4d
    openshift-logging  logging-es-data-master-e6e1tgde-1-deploy  0/1  Error  0  4d
    openshift-logging  logging-es-data-master-h4wbgcm3-1-deploy  0/1  Error  0  4d
    openshift-logging  logging-fluentd-2cgsp  1/1  Running  0  4d
    openshift-logging  logging-fluentd-6nkhx  1/1  Running  0  4d
    openshift-logging  logging-fluentd-h5ggs  1/1  Running  0  4d
    openshift-logging  logging-fluentd-mzcq9  1/1  Running  0  4d
    openshift-logging  logging-fluentd-pfsdv  1/1  Running  0  4d
    openshift-logging  logging-fluentd-trklp  1/1  Running  0  4d
    openshift-logging  logging-fluentd-vdwt9  1/1  Running  0  4d
    openshift-logging  logging-fluentd-xbctq  1/1  Running  0  4d
    openshift-logging  logging-kibana-1-deploy  0/1  Error  0  4d
    openshift-node  sync-6q9v7  1/1  Running  0  4d
    openshift-node  sync-8zgrf  1/1  Running  0  4d
    openshift-node  sync-cmbmt  1/1  Running  0  4d
    openshift-node  sync-fl4g9  1/1  Running  0  4d
    openshift-node  sync-njpvm  1/1  Running  0  4d
    openshift-node  sync-r4gng  1/1  Running  0  4d
    openshift-node  sync-rd6wv  1/1  Running  0  4d
    openshift-node  sync-vtsd6  1/1  Running  0  4d
    openshift-sdn  ovs-6hcxn  1/1  Running  0  4d
    openshift-sdn  ovs-6r6hr  1/1  Running  0  4d
    openshift-sdn  ovs-mh8td  1/1  Running  0  4d
    openshift-sdn  ovs-p6h94  1/1  Running  0  4d
    openshift-sdn  ovs-t54ns  1/1  Running  0  4d
    openshift-sdn  ovs-v5ksp  1/1  Running  0  4d
    openshift-sdn  ovs-x72p8  1/1  Running  0  4d
    openshift-sdn  ovs-xtrj2  1/1  Running  0  4d
    openshift-sdn  sdn-7lfxw  1/1  Running  0  4d
    openshift-sdn  sdn-bfrjl  1/1  Running  0  4d
    openshift-sdn  sdn-bxhqx  1/1  Running  0  4d
    openshift-sdn  sdn-f68qp  1/1  Running  0  4d
    openshift-sdn  sdn-f9gbb  1/1  Running  0  4d
    openshift-sdn  sdn-r488h  1/1  Running  0  4d
    openshift-sdn  sdn-tg8m2  1/1  Running  0  4d
    openshift-sdn  sdn-w7pb5  1/1  Running  0  4d

    # oc describe pod logging-es-data-master-4ixg5tg1-1-deploy
    Name:           logging-es-data-master-4ixg5tg1-1-deploy
    Namespace:      openshift-logging
    Node:           dhcp47-149.lab.eng.blr.redhat.com/10.70.47.149
    Start Time:     Thu, 14 Jun 2018 01:34:43 +0530
    Labels:         openshift.io/deployer-pod-for.name=logging-es-data-master-4ixg5tg1-1
    Annotations:    openshift.io/deployment-config.name=logging-es-data-master-4ixg5tg1
                    openshift.io/deployment.name=logging-es-data-master-4ixg5tg1-1
                    openshift.io/scc=restricted
    Status:         Failed
    IP:             10.130.2.4
    Containers:
      deployment:
        Container ID:   docker://57603b95cd4f227a2428f57840adee7d9fa61ce5bd86930176a16d891819274d
        Image:          brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/ose-deployer:v3.10.0-0.66.0.0
        Image ID:       docker-pullable://brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/ose-deployer@sha256:d5f9318a8cf2180565eaa870b44cab8e44be8121bdfe1ccdc08bcec27ba75d79
        Port:           <none>
        Host Port:      <none>
        State:          Terminated
          Reason:       Error
          Exit Code:    1
          Started:      Thu, 14 Jun 2018 01:35:06 +0530
          Finished:     Thu, 14 Jun 2018 01:45:28 +0530
        Ready:          False
        Restart Count:  0
        Environment:
          OPENSHIFT_DEPLOYMENT_NAME:       logging-es-data-master-4ixg5tg1-1
          OPENSHIFT_DEPLOYMENT_NAMESPACE:  openshift-logging
        Mounts:
          /var/run/secrets/kubernetes.io/serviceaccount from deployer-token-bq7tp (ro)
    Conditions:
      Type           Status
      Initialized    True
      Ready          False
      PodScheduled   True
    Volumes:
      deployer-token-bq7tp:
        Type:        Secret (a volume populated by a Secret)
        SecretName:  deployer-token-bq7tp
        Optional:    false
    QoS Class:       BestEffort
    Node-Selectors:  node-role.kubernetes.io/infra=true
    Tolerations:     <none>
    Events:          <none>

Expected results:
-----------------
The logging pods should be in Running and Ready (1/1) state.
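
The expected state above can be checked mechanically against the pod listing. This is a rough sketch under stated assumptions: not_ready_pods is a hypothetical helper name, and the input is assumed to be oc get pods --all-namespaces --no-headers output with the default columns (NAMESPACE NAME READY STATUS RESTARTS AGE). It would also flag pods that finished normally, so the result still needs a manual look.

```shell
#!/bin/sh
# Hypothetical helper: list pods that are not "Running" with every container ready.
# Expects `oc get pods --all-namespaces --no-headers` output on stdin,
# i.e. columns NAMESPACE NAME READY STATUS RESTARTS AGE (assumption: default layout).
not_ready_pods() {
    awk '{
        # READY has the form <ready>/<total>, for example 0/1
        split($3, ready, "/")
        if ($4 != "Running" || ready[1] != ready[2])
            print $1 "/" $2, $3, $4
    }'
}
```

For example, oc get pods --all-namespaces --no-headers | not_ready_pods prints one line per offending pod in the form namespace/name READY STATUS, and an empty result would match the expected outcome described above.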