Bug 1442321 - [starter][starter-us-east-2][starter-us-west-1]Logging is not deployed successfully, curator and es pod are not started up
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Logging
Version: 3.x
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Abhishek Gupta
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-04-14 05:23 UTC by Junqi Zhao
Modified: 2018-10-18 16:05 UTC (History)
14 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-10-18 16:05:51 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
free-int kibana ui -- Application is not Avaiable (73.21 KB, image/png)
2017-04-24 05:43 UTC, Junqi Zhao
Kibana UI, 504 error (82.36 KB, image/png)
2017-04-25 07:13 UTC, Junqi Zhao
free-stg-3.5.5.8 env info (53.12 KB, text/plain)
2017-04-26 03:29 UTC, Junqi Zhao
Kibana starter-us-east-2 ui -- Application is not Avaiable (73.11 KB, image/png)
2017-04-27 00:54 UTC, Junqi Zhao
free-int kibana ui -- Application is not Avaiable (73.51 KB, image/png)
2017-04-27 00:54 UTC, Junqi Zhao
free-stg kibana ui -- Application is not Avaiable (72.73 KB, image/png)
2017-04-27 00:55 UTC, Junqi Zhao

Description Junqi Zhao 2017-04-14 05:23:34 UTC
Description of problem:
In the free-int environment, the curator pod does not start up and the es pods are not created.
# oc get pod -n logging
NAME                           READY     STATUS             RESTARTS   AGE
logging-curator-1-kvfbq        0/1       CrashLoopBackOff   1731       10d
logging-es-cjnirw75-1-deploy   0/1       Error              0          10d
logging-es-eqofe4tu-1-deploy   0/1       Error              0          10d
logging-fluentd-7wqjj          1/1       Running            5          10d
logging-fluentd-l2c1p          1/1       Running            5          10d
logging-fluentd-l3p81          1/1       Running            5          10d
logging-fluentd-lpg0j          1/1       Running            5          10d
logging-fluentd-mlfjh          1/1       Running            5          10d
logging-fluentd-wfgt3          1/1       Running            5          10d
logging-kibana-1-dqhfn         2/2       Running            1079       10d
logging-kibana-1-zp62r         2/2       Running            1079       10d

Version-Release number of selected component (if applicable):
OpenShift Master:v3.5.5.6 (online version 3.5.0.13)
Kubernetes Master:v1.5.2+43a9be4

How reproducible:
Always

Steps to Reproduce:
1. check logging pods' status by "oc get pod -n logging"
2.
3.

Actual results:
The curator pod does not start up, and only es deploy pods exist, all in Error status;
the es pods themselves are never created.

Expected results:
The curator and es pods should be running and healthy.

Additional info:
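A quick way to pull only the failing pods out of a listing like the one above is to filter on the READY and STATUS columns. A minimal sketch (the helper name is ours, not an oc subcommand; it only parses the plain-text columns):

```shell
#!/bin/sh
# unhealthy_pods: filter `oc get pod` output down to pods that are not
# fully ready or not in Running status. (Hypothetical helper; it parses
# the NAME, READY, and STATUS columns of the plain-text output.)
unhealthy_pods() {
  awk 'NR > 1 {
    split($2, r, "/")                      # READY column, e.g. "0/1"
    if (r[1] != r[2] || $3 != "Running")   # short on ready, or bad status
      print $1, $3
  }'
}

# Usage: oc get pod -n logging | unhealthy_pods
```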

Comment 1 Junqi Zhao 2017-04-14 05:36:29 UTC
curator pod log
# oc logs logging-curator-1-kvfbq -n logging 
Was not able to connect to Elasticearch at logging-es:9200 within 60 attempts

and there is no log output from the es deployer pods
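The curator startup check quoted above can be reproduced by hand from any pod that can reach the service. A minimal sketch of the same retry loop (the function name and defaults are ours; curator's real check lives inside its image):

```shell
#!/bin/sh
# probe_es: retry an HTTP connection the way curator's startup check
# does, reporting how many attempts it took. (Sketch only, using the
# service name and port from the log line above.)
probe_es() {
  host=$1 port=$2 attempts=$3
  i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -s -o /dev/null --max-time 2 "http://$host:$port/"; then
      echo "reachable after $i attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "unreachable after $attempts attempts"
  return 1
}

# e.g. from inside the cluster: probe_es logging-es 9200 60
```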

Comment 2 Junqi Zhao 2017-04-14 09:27:21 UTC
Same error in the starter-us-east-2 environment.

Comment 3 Jeff Cantrill 2017-04-17 13:54:44 UTC
Assuming you are trying to deploy logging on a 3.5 cluster: the deployer is no longer used; it was replaced by the openshift_logging Ansible role.  Please see https://github.com/jcantrill/origin-aggregated-logging/blob/1bbef826cb432f2a8c37577d1bd7f8fa52589e3b/docs/issues.md for the information to provide when filing an issue against logging.

Comment 4 Junqi Zhao 2017-04-18 00:19:46 UTC
@Jeff,

the logging stack was deployed by Stefanie Forrester <sedgar> from the Online team; I will assign this defect to her.

Comment 5 Junqi Zhao 2017-04-18 00:22:55 UTC
Hi Stefanie,

Please see Comment 3: we now use Ansible to deploy logging; the deployer method is no longer used. Please use Ansible to deploy logging next time.

Comment 6 Junqi Zhao 2017-04-18 03:27:01 UTC
Same issue on the free-stg environment.

Comment 8 Stefanie Forrester 2017-04-20 00:09:56 UTC
I tried a fresh deploy of logging in free-stg, but I'm getting an error from some of the pods.

2017-04-19 23:57:18 +0000 UTC   2017-04-19 23:57:18 +0000 UTC   2         logging-es-ofspyd2m     DeploymentConfig                                            Warning   FailedRetry         {deployments-controller }                               logging-es-ofspyd2m-1: About to stop retrying logging-es-ofspyd2m-1: couldn't create deployer pod for logging/logging-es-ofspyd2m-1: pods "logging-es-ofspyd2m-1-deploy" is forbidden: pod node label selector conflicts with its project node label selector

----------------------------------------
OpenShift Cluster Details

Cluster version: atomic-openshift-3.5.5.7-1.git.0.644a8c2.el7.x86_64
Cluster infrastructure: AWS
Number of infrastructure nodes and size: 3 infra nodes. 4 vCPU @2.30GHz, 16GB memory
Number of compute nodes and size: 4 compute nodes. Same instance size as infra nodes.
Logging Image versions:

logging-kibana: 3.5.0-3
logging-curator: 3.5.0-3
logging-elasticsearch: 3.5.0-3

Installer Details

How was logging installed: openshift_ansible role, called by Ops install script
Openshift installer version: openshift-ansible 3.6.9-1
Ansible inventory file: none, dynamic ec2 inventory
Ansible installer logs: attached as installer_ansible_log.txt

Comment 10 Junqi Zhao 2017-04-20 00:28:37 UTC
If you deploy logging with Ansible, there should not be pods like "logging-es-ofspyd2m-1-deploy". I think you can delete the logging stack and then re-deploy logging.

The deploy steps we usually use:
$ cd /usr/share/ansible/openshift-ansible/
$ ansible-playbook -vvv -i ${INVENTORY_FILE} playbooks/byo/openshift-cluster/openshift-logging.yml

You can refer to the following inventory file
[OSEv3:children]
masters

[masters]
${MASTER} openshift_public_hostname=${MASTER}

[OSEv3:vars]
ansible_ssh_user=root
ansible_ssh_private_key_file=${LIBRA-KEY-FILE}
deployment_type=openshift-enterprise
openshift_logging_install_logging=true

openshift_logging_kibana_hostname=kibana.${SUBDOMAIN}
openshift_logging_kibana_ops_hostname=kibana-ops.${SUBDOMAIN}
public_master_url=https://${MASTER}:8443
 

openshift_logging_image_prefix=registry.ops.openshift.com/openshift3/
openshift_logging_image_version=3.5.0

openshift_logging_namespace=logging

Comment 11 Stefanie Forrester 2017-04-21 14:21:37 UTC
Thanks, I think it's working now.

Comment 12 Junqi Zhao 2017-04-24 05:42:04 UTC
for free-int
logging url:
https://kibana.d800.free-int.openshiftapps.com

"Application is not available" happens, see the attached picture.Maybe it is a router error too, I checked the pod, they are running well.

NAME                          READY     STATUS    RESTARTS   AGE
logging-curator-1-cxj1b       1/1       Running   1          2d
logging-es-1559jcfj-1-2m3qd   1/1       Running   0          2d
logging-es-cciss2ei-1-ckcx2   1/1       Running   0          2d
logging-fluentd-0662g         1/1       Running   0          2d
logging-fluentd-2s1l9         1/1       Running   0          2d
logging-fluentd-78wxh         1/1       Running   0          2d
logging-fluentd-h7llk         1/1       Running   0          2d
logging-fluentd-mtlxw         1/1       Running   0          2d
logging-fluentd-snsfn         1/1       Running   0          2d
logging-kibana-1-1xqz7        2/2       Running   13         2d
logging-kibana-1-241hm        2/2       Running   13         2d

For starter-us-east-2 and free-stg, verification is blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1444715
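The two symptoms seen in this bug ("Application is not available" vs. a 504) point at different layers. A rough rule-of-thumb mapping (our heuristic, assuming the default OpenShift router behavior, where the "Application is not available" page is the router's 503 error backend):

```shell
#!/bin/sh
# route_hint: rough triage of HTTP status codes seen on a route.
# (Heuristic only, not an OpenShift API; 503 is the router's default
# response when a route has no healthy endpoints.)
route_hint() {
  case "$1" in
    200) echo "route and pod are both serving" ;;
    503) echo "router found no healthy endpoints behind the route" ;;
    504) echo "endpoint was selected but timed out upstream" ;;
    *)   echo "unclassified status $1" ;;
  esac
}

# Usage:
# route_hint "$(curl -s -o /dev/null -w '%{http_code}' https://kibana.d800.free-int.openshiftapps.com)"
```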

Comment 13 Junqi Zhao 2017-04-24 05:43:05 UTC
Created attachment 1273511 [details]
free-int kibana ui -- Application is not Avaiable

Comment 14 Stefanie Forrester 2017-04-24 14:51:04 UTC
CICD team reported the following error after deploying logging in free-stg:

  20m		20m		1	{kubelet ip-172-31-75-47.us-east-2.compute.internal}	spec.containers{elasticsearch}	Warning		Failed		Failed to start container with docker id d8c65d20a41a with error: Error response from daemon: {"message":"SELinux relabeling of /var/lib/origin/openshift.local.volumes/pods/c5669c44-2693-11e7-8200-027558dba8e1/volumes/kubernetes.io~configmap/elasticsearch-config is not allowed: \"disk quota exceeded\""}

We might be hitting the configured disk quota of 512Mi.

I also saw this error when I logged in on free-int:

2m        18m       58        logging-fluentd           DaemonSet                                                        Normal    FailedPlacement               {daemonset-controller }                  failed to place pod on "ip-172-31-62-45.ec2.internal": Node didn't have enough resource: pods, requested: 1, used: 40, capacity: 40
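The FailedPlacement event above already carries the numbers that matter (the node is at its pod capacity, 40/40). A small parsing sketch to pull those counts out of event text (the helper name is ours; the field layout is taken from the event quoted above):

```shell
#!/bin/sh
# pod_pressure: extract requested/used/capacity pod counts from a
# daemonset-controller FailedPlacement event line. (Parsing sketch
# based on the event format quoted in this comment.)
pod_pressure() {
  sed -n 's/.*requested: \([0-9][0-9]*\), used: \([0-9][0-9]*\), capacity: \([0-9][0-9]*\).*/requested=\1 used=\2 capacity=\3/p'
}

# Usage: oc get events -n logging | pod_pressure
```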

Comment 15 Matt Woodson 2017-04-24 19:56:47 UTC
We discovered a bug in the openshift_logging role in openshift-ansible.  I have re-deployed logging on all of the free clusters. These all appear to be working now.

Comment 17 Junqi Zhao 2017-04-25 07:13:16 UTC
Created attachment 1273843 [details]
Kibana UI, 504 error

Comment 21 Junqi Zhao 2017-04-26 03:29:41 UTC
Created attachment 1274066 [details]
free-stg-3.5.5.8 env info

Comment 24 Junqi Zhao 2017-04-27 00:54:10 UTC
Created attachment 1274486 [details]
Kibana starter-us-east-2 ui -- Application is not Avaiable

Comment 25 Junqi Zhao 2017-04-27 00:54:41 UTC
Created attachment 1274487 [details]
free-int kibana ui -- Application is not Avaiable

Comment 26 Junqi Zhao 2017-04-27 00:55:05 UTC
Created attachment 1274488 [details]
free-stg kibana ui -- Application is not Avaiable

Comment 27 Junqi Zhao 2017-04-27 01:24:39 UTC
Logging has been removed from all the Free environments; an email about this was sent to us a few minutes ago, so we cannot verify this now.

Comment 29 Xia Zhao 2017-05-04 09:35:14 UTC
Issue reproduced on dev-preview-int, adding the related prefix to this bug.

The behavior on dev-preview-int is: 
1) logging components were deployed, but no es pod is running:

# oc get po -n logging
NAME                      READY     STATUS    RESTARTS   AGE
logging-curator-1-5p00d   1/1       Running   14         3h
logging-fluentd-03j3w     1/1       Running   1          3h
logging-fluentd-5226h     1/1       Running   1          3h
logging-fluentd-lhz5r     1/1       Running   1          3h
logging-fluentd-t8cdq     1/1       Running   1          3h
logging-fluentd-tjj7m     1/1       Running   1          3h
logging-fluentd-xfhn8     1/1       Running   1          3h
logging-kibana-1-4jwh0    2/2       Running   10         3h
logging-kibana-1-bmbgg    2/2       Running   10         3h

2) the logging route is not accessible; "Application is not available" is shown when visiting it from a browser

Comment 30 Liming Zhou 2017-05-08 06:07:29 UTC
The issue still exists on dev-preview-int: the logging route is not accessible, and "Application is not available" is shown when visiting it from a browser.

The curator and kibana pods also fail to start. Output of "oc get pod -n logging":
 
NAME                          READY     STATUS              RESTARTS   AGE
logging-curator-2-plct3       0/1       CrashLoopBackOff    534        2d
logging-es-t7btqf5y-2-mv44q   1/1       Running             1          2d
logging-es-zkkcqgza-2-6pxp0   0/1       ContainerCreating   0          2d
logging-fluentd-1w2nz         1/1       Running             2          2d
logging-fluentd-6fn6w         1/1       Running             1          2d
logging-fluentd-bdg1s         1/1       Running             2          2d
logging-fluentd-j2c5c         1/1       Running             2          2d
logging-fluentd-lx9g7         1/1       Running             1          2d
logging-fluentd-svqdx         1/1       Running             1          2d
logging-kibana-1-8vj8m        1/2       CrashLoopBackOff    651        2d
logging-kibana-1-rt7cn        1/2       CrashLoopBackOff    649        2d

Comment 32 Jeff Cantrill 2017-05-11 15:51:17 UTC
Setting this back to ASSIGNED, but it looks like we need to narrow down the release stream to which this applies.  I see references to both 3.5 and 3.6; which is it?

Comment 34 Xia Zhao 2017-05-22 05:53:45 UTC
The issue still exists on dev-preview-int: the logging route is not accessible, and "Application is not available" is shown when visiting it from a browser.

Comment 35 Xia Zhao 2017-05-23 09:55:12 UTC
On dev-preview-prod, no "View Archive" button can be seen on the pods UI, so there is no entry point to the logging system there.

Comment 36 Xia Zhao 2017-05-23 10:11:23 UTC
Please ignore comment #35 since logging is not deployed there.

Comment 41 Peter Portante 2017-07-07 00:13:48 UTC
Can somebody concisely explain what bug this BZ is tracking?  I see all manner of errors along the way, but I am having a hard time understanding which particular bug makes this BZ worth tracking.

Additionally, this is marked as being for 3.x currently, which is really broad. Could we consider closing this BZ and opening another when there is a bug that needs to be tracked?

Comment 44 Junqi Zhao 2017-09-28 08:27:03 UTC
Can we close this defect, since we are not going to deploy logging in the Free tier?

Comment 45 Junqi Zhao 2017-09-30 02:48:40 UTC
Tested on free-stg, all logging pods are running well.

Removed [free-stg] from the title.

Comment 46 Junqi Zhao 2017-09-30 03:16:34 UTC
@Stefanie

Do we want to deploy logging on all free-tier clusters?

I see logging is only deployed on the following clusters:
free-stg, starter-us-east-2

but not deployed on:
free-int, starter-us-east-1, starter-ca-central-1, starter-us-west-1, starter-us-west-2

Comment 47 Peter Portante 2017-10-01 04:40:33 UTC
Logging does not look good on free-stg: the replica count is 0, logging pods are showing restarts, and the fluentd pods log errors complaining about a missing namespace_id field (a known issue).

On starter-us-east-2, all the fluentd pods are deployed, but nothing else.  It is as if the logging deployment failed during install.  Do we have the logs from this install?

Comment 48 Stefanie Forrester 2017-10-09 15:04:09 UTC
As far as I know, we still don't have any plans to deploy logging in starter. There are stability/scale issues that need to be addressed first.

Comment 49 Peter Portante 2017-10-09 17:36:46 UTC
If we have fluentd pods deployed, we deployed logging.  Looks like it was not deployed correctly though.

Comment 50 Junqi Zhao 2018-01-15 08:26:10 UTC
On the free-int cluster, the curator and es pods are in Running status now, but there is another issue:
https://bugzilla.redhat.com/show_bug.cgi?id=1534419
fluentd pods are in Error status on the master nodes.

Removed free-int from the title.

Comment 53 Jeff Cantrill 2018-10-18 16:05:51 UTC
Closing, as both of these clusters are functional.

