Description of problem:
On free-int environments, the curator pod does not start up and no es pods are generated.

# oc get pod -n logging
NAME                           READY     STATUS             RESTARTS   AGE
logging-curator-1-kvfbq        0/1       CrashLoopBackOff   1731       10d
logging-es-cjnirw75-1-deploy   0/1       Error              0          10d
logging-es-eqofe4tu-1-deploy   0/1       Error              0          10d
logging-fluentd-7wqjj          1/1       Running            5          10d
logging-fluentd-l2c1p          1/1       Running            5          10d
logging-fluentd-l3p81          1/1       Running            5          10d
logging-fluentd-lpg0j          1/1       Running            5          10d
logging-fluentd-mlfjh          1/1       Running            5          10d
logging-fluentd-wfgt3          1/1       Running            5          10d
logging-kibana-1-dqhfn         2/2       Running            1079       10d
logging-kibana-1-zp62r         2/2       Running            1079       10d

Version-Release number of selected component (if applicable):
OpenShift Master: v3.5.5.6 (online version 3.5.0.13)
Kubernetes Master: v1.5.2+43a9be4

How reproducible:
Always

Steps to Reproduce:
1. Check the logging pods' status with "oc get pod -n logging"

Actual results:
The curator pod does not start up; only es deploy pods exist, and they are in Error status. No es pods are generated.

Expected results:
The curator and es pods should be in a healthy status.

Additional info:
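A quick way to pick the failing pods out of output like the above is to filter on the STATUS column. This is only a sketch: the listing below is a subset pasted from this report, not a live cluster query.

```shell
# Sketch: filter a captured `oc get pod` listing for pods whose STATUS is not
# Running. The sample is pasted from this bug report, not pulled from a cluster.
sample='NAME READY STATUS RESTARTS AGE
logging-curator-1-kvfbq 0/1 CrashLoopBackOff 1731 10d
logging-es-cjnirw75-1-deploy 0/1 Error 0 10d
logging-es-eqofe4tu-1-deploy 0/1 Error 0 10d
logging-fluentd-7wqjj 1/1 Running 5 10d'

# Skip the header row (NR>1) and print name plus status for unhealthy pods.
bad=$(printf '%s\n' "$sample" | awk 'NR>1 && $3!="Running" {print $1, $3}')
printf '%s\n' "$bad"
```

On a real cluster the same filter can be fed from `oc get pod -n logging` directly.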
curator pod log:

# oc logs logging-curator-1-kvfbq -n logging
Was not able to connect to Elasticearch at logging-es:9200 within 60 attempts

There is no output from the es-deployer pod.
same error in starter-us-east-2 environment
Assuming you are trying to deploy logging on a 3.5 cluster, the deployer is no longer used; it was replaced by the openshift_logging ansible role. Please see https://github.com/jcantrill/origin-aggregated-logging/blob/1bbef826cb432f2a8c37577d1bd7f8fa52589e3b/docs/issues.md for the information to provide when filing an issue against logging.
@Jeff, logging was deployed by Stefanie Forrester <sedgar> from the online team; I will reassign this defect.
Hi Stefanie, please see Comment 3: we use ansible to deploy logging now; the deployer method is no longer used. Please use ansible to deploy logging next time.
Same issue on the free-stg environment.
I tried a fresh deploy of logging in free-stg, but I'm getting an error from some of the pods:

2017-04-19 23:57:18 +0000 UTC   2017-04-19 23:57:18 +0000 UTC   2   logging-es-ofspyd2m   DeploymentConfig   Warning   FailedRetry   {deployments-controller }   logging-es-ofspyd2m-1: About to stop retrying logging-es-ofspyd2m-1: couldn't create deployer pod for logging/logging-es-ofspyd2m-1: pods "logging-es-ofspyd2m-1-deploy" is forbidden: pod node label selector conflicts with its project node label selector

----------------------------------------
OpenShift Cluster Details
Cluster version: atomic-openshift-3.5.5.7-1.git.0.644a8c2.el7.x86_64
Cluster infrastructure: AWS
Number of infrastructure nodes and size: 3 infra nodes, 4 vCPU @2.30GHz, 16GB memory
Number of compute nodes and size: 4 compute nodes, same instance size as infra nodes

Logging Image versions:
logging-kibana: 3.5.0-3
logging-curator: 3.5.0-3
logging-elasticsearch: 3.5.0-3

Installer Details
How was logging installed: openshift_ansible role, called by Ops install script
OpenShift installer version: openshift-ansible 3.6.9-1
Ansible inventory file: none, dynamic ec2 inventory
Ansible installer logs: attached as installer_ansible_log.txt
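The "pod node label selector conflicts with its project node label selector" error means the pod's nodeSelector and the project's node selector assign different values to the same label key, so the scheduler forbids the pod. A minimal local sketch of that check; the keys and values here are made-up illustrations, not the real cluster's labels:

```shell
# Sketch of the conflict in the event above: project-level and pod-level
# selectors set the same label key to different values. Values are assumptions.
project_selector="region=infra"    # assumed project node selector
pod_selector="region=primary"      # assumed pod-level nodeSelector

# Split each "key=value" pair.
proj_key=${project_selector%%=*}; proj_val=${project_selector#*=}
pod_key=${pod_selector%%=*};      pod_val=${pod_selector#*=}

if [ "$proj_key" = "$pod_key" ] && [ "$proj_val" != "$pod_val" ]; then
  result="conflict"
else
  result="ok"
fi
echo "$result"
```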
If you use ansible to deploy logging, there should not be pods like "logging-es-ofspyd2m-1-deploy". I think you can delete the logging stack and then re-deploy logging.

The deploy steps we usually use:

$ cd /usr/share/ansible/openshift-ansible/
$ ansible-playbook -vvv -i ${INVENTORY_FILE} playbooks/byo/openshift-cluster/openshift-logging.yml

You can refer to the following inventory file:

[OSEv3:children]
masters

[masters]
${MASTER} openshift_public_hostname=${MASTER}

[OSEv3:vars]
ansible_ssh_user=root
ansible_ssh_private_key_file=${LIBRA-KEY-FILE}
deployment_type=openshift-enterprise
openshift_logging_install_logging=true
openshift_logging_kibana_hostname=kibana.${SUBDOMAIN}
openshift_logging_kibana_ops_hostname=kibana-ops.${SUBDOMAIN}
public_master_url=https://${MASTER}:8443
openshift_logging_image_prefix=registry.ops.openshift.com/openshift3/
openshift_logging_image_version=3.5.0
openshift_logging_namespace=logging
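A quick hedged check for the leftover deployer artifacts mentioned here (assumption: with the ansible-role install, no pod name should end in "-deploy"). The names below are pasted from comment 1 rather than read from a live cluster:

```shell
# Count pod names ending in "-deploy" (deployer pods). Names are copied from
# this bug; on a real cluster you would feed in `oc get pod -n logging` output.
pods='logging-es-cjnirw75-1-deploy
logging-es-eqofe4tu-1-deploy
logging-fluentd-7wqjj
logging-curator-1-kvfbq'

leftover=$(printf '%s\n' "$pods" | grep -c -- '-deploy$')
echo "leftover deployer pods: $leftover"
```

A non-zero count would suggest the old deployer path was used and the stack should be deleted and re-deployed with the ansible role.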
Thanks, I think it's working now.
For the free-int logging URL https://kibana.d800.free-int.openshiftapps.com, "Application is not available" is shown (see the attached picture). Maybe it is a router error too; I checked the pods and they are running well:

NAME                          READY     STATUS    RESTARTS   AGE
logging-curator-1-cxj1b       1/1       Running   1          2d
logging-es-1559jcfj-1-2m3qd   1/1       Running   0          2d
logging-es-cciss2ei-1-ckcx2   1/1       Running   0          2d
logging-fluentd-0662g         1/1       Running   0          2d
logging-fluentd-2s1l9         1/1       Running   0          2d
logging-fluentd-78wxh         1/1       Running   0          2d
logging-fluentd-h7llk         1/1       Running   0          2d
logging-fluentd-mtlxw         1/1       Running   0          2d
logging-fluentd-snsfn         1/1       Running   0          2d
logging-kibana-1-1xqz7        2/2       Running   13         2d
logging-kibana-1-241hm        2/2       Running   13         2d

For starter-us-east-2 and free-stg, it is blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1444715
Created attachment 1273511 [details] free-int kibana ui -- Application is not Available
CICD team reported the following error after deploying logging in free-stg:

20m   20m   1   {kubelet ip-172-31-75-47.us-east-2.compute.internal}   spec.containers{elasticsearch}   Warning   Failed   Failed to start container with docker id d8c65d20a41a with error: Error response from daemon: {"message":"SELinux relabeling of /var/lib/origin/openshift.local.volumes/pods/c5669c44-2693-11e7-8200-027558dba8e1/volumes/kubernetes.io~configmap/elasticsearch-config is not allowed: \"disk quota exceeded\""}

We might be hitting the configured disk quota of 512Mi.

I also saw this error when I logged in on free-int:

2m   18m   58   logging-fluentd   DaemonSet   Normal   FailedPlacement   {daemonset-controller }   failed to place pod on "ip-172-31-62-45.ec2.internal": Node didn't have enough resource: pods, requested: 1, used: 40, capacity: 40
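The FailedPlacement event comes down to simple arithmetic: the node already runs used=40 pods against a pod capacity of 40, so the one requested fluentd pod cannot fit. As a sketch using the numbers from the event above:

```shell
# Numbers taken from the FailedPlacement event reported in this comment.
used=40; capacity=40; requested=1

# A pod fits only if used + requested stays within the node's pod capacity.
if [ $((used + requested)) -gt "$capacity" ]; then
  placement="failed"
else
  placement="ok"
fi
echo "placement: $placement (used=$used requested=$requested capacity=$capacity)"
```

Since fluentd runs as a DaemonSet on every node, a node saturated at its pod capacity blocks the fluentd pod until capacity is raised or pods are evicted.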
We discovered a bug in the openshift_logging role in openshift-ansible. I have re-deployed logging on all of the free clusters. These all appear to be working now.
Created attachment 1273843 [details] Kibana UI, 504 error
Created attachment 1274066 [details] free-stg-3.5.5.8 env info
Created attachment 1274486 [details] Kibana starter-us-east-2 ui -- Application is not Available
Created attachment 1274487 [details] free-int kibana ui -- Application is not Available
Created attachment 1274488 [details] free-stg kibana ui -- Application is not Available
Logging has been removed from all the Free environments; an email about this was sent to us a few minutes ago, so we cannot verify it now.
Issue reproduced on dev-preview-int; adding the related prefix to this bug. The behavior on dev-preview-int is:

1) Logging components were deployed, but no es pod is running:

# oc get po -n logging
NAME                     READY     STATUS    RESTARTS   AGE
logging-curator-1-5p00d  1/1       Running   14         3h
logging-fluentd-03j3w    1/1       Running   1          3h
logging-fluentd-5226h    1/1       Running   1          3h
logging-fluentd-lhz5r    1/1       Running   1          3h
logging-fluentd-t8cdq    1/1       Running   1          3h
logging-fluentd-tjj7m    1/1       Running   1          3h
logging-fluentd-xfhn8    1/1       Running   1          3h
logging-kibana-1-4jwh0   2/2       Running   10         3h
logging-kibana-1-bmbgg   2/2       Running   10         3h

2) The logging route is not accessible; "Application is not available" is shown when visiting it from a browser.
The issue still exists on dev-preview-int: the logging route is not accessible, and "Application is not available" is shown when visiting it from a browser. The curator and kibana pods also failed to start:

# oc get pod -n logging
NAME                          READY     STATUS              RESTARTS   AGE
logging-curator-2-plct3       0/1       CrashLoopBackOff    534        2d
logging-es-t7btqf5y-2-mv44q   1/1       Running             1          2d
logging-es-zkkcqgza-2-6pxp0   0/1       ContainerCreating   0          2d
logging-fluentd-1w2nz         1/1       Running             2          2d
logging-fluentd-6fn6w         1/1       Running             1          2d
logging-fluentd-bdg1s         1/1       Running             2          2d
logging-fluentd-j2c5c         1/1       Running             2          2d
logging-fluentd-lx9g7         1/1       Running             1          2d
logging-fluentd-svqdx         1/1       Running             1          2d
logging-kibana-1-8vj8m        1/2       CrashLoopBackOff    651        2d
logging-kibana-1-rt7cn        1/2       CrashLoopBackOff    649        2d
Setting back to assigned, but it looks like we need to narrow down the release stream to which this applies. I see references to both 3.5 and 3.6. Which is it?
The issue still exists on dev-preview-int: the logging route is not accessible, and "Application is not available" is shown when visiting it from a browser.
On dev-preview-prod, no "View Archive" button can be seen on the pods UI, and thus there is no entry point to the logging system there.
Please ignore comment #35 since logging is not deployed there.
Can somebody concisely explain what this BZ is tracking? I see all manner of errors along the way, but I am having a hard time understanding what the particular bug is that makes this BZ worth tracking. Additionally, this is currently marked as being for 3.x, which is really broad. Could we consider closing this BZ and opening another when there is a specific bug that needs to be tracked?
Can we close this defect, since we are not going to deploy logging in the Free tier?
Tested on free-stg; all logging pods are running well. Deleting [free-stg] from the title.
@Stefanie Do we want to deploy logging on all free tier clusters? I see logging is only deployed on free-stg and starter-us-east-2, but not on free-int, starter-us-east-1, starter-ca-central-1, starter-us-west-1, or starter-us-west-2.
Logging does not look good on free-stg: the replica count is 0, logging pods are showing restarts, and there are error messages in the fluentd pods complaining of a missing namespace_id field (known issue). On starter-us-east-2, all the fluentd pods are deployed, but nothing else; it is as if the logging deployment failed during install. Do we have the logs from this install?
As far as I know, we still don't have any plans to deploy logging in starter. There are stability/scale issues that need to be addressed first.
If we have fluentd pods deployed, we deployed logging. Looks like it was not deployed correctly though.
On the free-int cluster, the curator and es pods are in Running status now, but there is another issue: https://bugzilla.redhat.com/show_bug.cgi?id=1534419 (fluentd pods are in Error status on master nodes). Removing free-int from the title.
Closing as both these clusters are functional