Bug 1327028

Summary:	aggregated logging / elasticsearch maintenance(https://github.com/openshift/origin-aggregated-logging/issues/42)
Product:	OpenShift Container Platform	Reporter:	Miheer Salunke <misalunk>
Component:	RFE	Assignee:	Luke Meyer <lmeyer>
Status:	CLOSED ERRATA	QA Contact:	chunchen <chunchen>
Severity:	high	Docs Contact:
Priority:	high
Version:	3.1.0	CC:	aos-bugs, erich, erjones, ewolinet, jialiu, jkaur, jokerman, lmeyer, misalunk, mmccomas, pep, vigoyal, wsun, xiazhao
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2016-09-27 09:37:51 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1267746

Description Miheer Salunke 2016-04-14 06:39:27 UTC

Description of problem:

We've installed the aggregated-logging Stack from https://github.com/openshift/origin-aggregated-logging/tree/master/deployment
Collecting and displaying logs now works fine, but the collected data keeps growing, and we can't invoke Elasticsearch's REST API to run cleanup or maintenance jobs. Even when I connect to the Elasticsearch pod and call
curl -X GET http://127.0.0.1:9200
the response is always
curl: (52) Empty reply from server

How do you maintain the Elasticsearch data? Is there a special token or secret that must be used to connect to Elasticsearch?

Version-Release number of selected component (if applicable):


How reproducible:
3.1

Steps to Reproduce:
1.Mentioned in description
2.
3.

Actual results:


Expected results:


Additional info:
https://github.com/openshift/origin-aggregated-logging/issues/42
Fix - https://github.com/openshift/origin-aggregated-logging/pull/57

Comment 3 Luke Meyer 2016-05-05 14:23:13 UTC

With 3.1.1 and 3.2.0 images (coming soon) we will be providing an update that also creates an administrative cert and ACL to allow manual maintenance of Elasticsearch. You can rsh into an ES pod and use the admin key/cert/ca from the logging-elasticsearch secret to run curl -X DELETE against indices and any other operation, manually. More explicit documentation on this is forthcoming.

This is a stopgap until the Curator solution is delivered post-3.2.0, which should happen as soon as we can get it through QE and out the door along with all the other major changes that are in Origin but missed 3.2.0.

Comment 4 Eric Jones 2016-05-19 16:50:26 UTC

*** Bug 1337633 has been marked as a duplicate of this bug. ***

Comment 5 Eric Jones 2016-05-19 16:56:50 UTC

@Luke, are those 3.1 images released yet? And if so, do we have that more explicit documentation yet?

Comment 6 Luke Meyer 2016-05-19 17:18:17 UTC

(In reply to Eric Jones from comment #5)
> @Luke, are those 3.1 images released yet?

Yes; if you redeploy with the latest 3.1.1 images, the deployer creates an admin cert in the elasticsearch secret.

> And if so, do we have that more
> explicit documentation yet?

Alas, no. Perhaps a project for me today.

The admin cert can be used either within the ES container or, when extracted, from anywhere that can reach the pod SDN.
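
As a rough illustration of the "extracted" case, the sketch below copies the three admin entries out of the logging-elasticsearch secret into local files. It assumes `oc` is already logged in to the logging project; the function name `es_admin_extract` and the `./es-admin` directory are made up for this example and are not part of the deployer.

```shell
#!/bin/bash
# Sketch: copy the admin credentials out of the logging-elasticsearch secret.
# Assumptions: `oc` is logged in to the logging project; es_admin_extract and
# the destination directory are illustrative names only.

es_admin_extract() {
    local dest="${1:-./es-admin}"
    local entry
    mkdir -p "$dest" || return 1
    for entry in admin-key admin-cert admin-ca; do
        # Secret values are stored base64-encoded. The go-template `index`
        # function is used because the entry names contain hyphens.
        oc get secret logging-elasticsearch \
            -o go-template="{{index .data \"$entry\"}}" \
            | base64 -d > "$dest/$entry" || return 1
    done
    # Keep the private key readable by the current user only.
    chmod 600 "$dest/admin-key"
}
```

After running `es_admin_extract ./es-admin`, the resulting files can be handed to curl's --key, --cert and --cacert options in the same way as the in-pod paths shown in comment 13.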

Comment 7 Eric Jones 2016-05-19 17:23:20 UTC

Thanks for the information Luke.

I have another question in the same vein: how bad would it be for the logging pods to simply delete the contents of the logging PVC?

And if it would be catastrophic, is the best bet (to clean up the log storage) just to wait on how to use these new images?

Comment 8 Luke Meyer 2016-05-19 18:01:07 UTC

If you scale down the ES deployment(s) and delete the PVC contents, everything should be fine when you scale them back up, aside from (obviously) not having any data (including losing any customizations to kibana profiles). Whether to wait depends on the severity of the situation and how much you care about keeping old logs...
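
A possible way to script the scale-down and scale-up around such a wipe is sketched below. It assumes the ES deployment configs carry the same component=es label as the pods (not confirmed in this bug) and that each one normally runs a single replica; `es_scale_all` and the `DRY_RUN` variable are names invented for the example. How the volume itself gets emptied depends on the storage backend and is left out.

```shell
#!/bin/bash
# Sketch: scale every ES deployment config to a given replica count.
# Assumption: the deployment configs are labelled component=es, like the pods.
# es_scale_all and DRY_RUN are illustrative names.

es_scale_all() {
    local replicas="$1" selector="${2:-component=es}"
    local dc
    for dc in $(oc get dc -l "$selector" -o name); do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Only print the command so it can be reviewed first.
            echo "oc scale $dc --replicas=$replicas"
        else
            oc scale "$dc" --replicas="$replicas"
        fi
    done
}
```

Typical use would be `DRY_RUN=1 es_scale_all 0` to review, then `es_scale_all 0`, empty the volume, and `es_scale_all 1` to bring the pods back. For the ops cluster, pass `component=es-ops` as the second argument.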

Comment 9 Eric Jones 2016-05-19 18:30:27 UTC

Thanks again Luke, that is exactly the answer I was hoping for.

Comment 11 Xia Zhao 2016-06-12 03:16:10 UTC

Set to verified since the logging-curator image is now available on Docker Hub for use.

Comment 13 ewolinet 2016-06-15 15:13:20 UTC

As of 3.2, an admin cert is available that a customer can use to manually delete old indices from Elasticsearch.

I will open up a docs PR to document how to do this.

1. Check that you have the following secret entries "admin-key", "admin-cert", "admin-ca" in the logging-elasticsearch secret
$ oc get secret/logging-elasticsearch -o yaml

1 a. If these entries are missing, you will need to rerun the logging-deployer at version 3.2.0 or later so that the deployer can generate these certificates and attach them to the secret. That process is described in the OpenShift docs[1].

2. Connect to an Elasticsearch pod in the cluster you are attempting to clean up:
Find a pod
$ oc get pods -l component=es
$ oc get pods -l component=es-ops

Connect to pod
$ oc rsh {your_es_pod}

2 a. From within an Elasticsearch pod you can issue the following to delete an index of your choice:
$ curl --key /etc/elasticsearch/keys/admin-key --cert /etc/elasticsearch/keys/admin-cert --cacert /etc/elasticsearch/keys/admin-ca -XDELETE "https://localhost:9200/{your_index}"

Further documentation is available on the ES Delete[2] and Delete by query[3] API pages.

[1] https://docs.openshift.org/latest/install_config/upgrading/manual_upgrades.html#manual-upgrading-efk-logging-stack
[2] https://www.elastic.co/guide/en/elasticsearch/reference/1.5/docs-delete.html
[3] https://www.elastic.co/guide/en/elasticsearch/reference/1.5/docs-delete-by-query.html
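
For repeated use, the curl call from step 2 a can be wrapped in small shell functions to paste into the pod session after `oc rsh`. This is only a sketch: the key paths and URL are the ones from the steps above, while the function names are invented here, and the listing call relies on Elasticsearch's `_cat/indices` API rather than anything specific to the logging images.

```shell
#!/bin/bash
# Sketch: helpers to run inside an Elasticsearch pod after `oc rsh`.
# Paths and curl options come from the steps above; function names are
# illustrative.

ES_KEYS=/etc/elasticsearch/keys
ES_URL=https://localhost:9200

# Run curl with the admin key, cert and CA; extra arguments are passed through.
es_curl() {
    curl -s --key "$ES_KEYS/admin-key" --cert "$ES_KEYS/admin-cert" \
         --cacert "$ES_KEYS/admin-ca" "$@"
}

# List indices (with a header row) to help decide what to remove.
es_list_indices() {
    es_curl -XGET "$ES_URL/_cat/indices?v"
}

# Delete one index by name. An empty name is refused so that a typo cannot
# send DELETE to the bare URL.
es_delete_index() {
    [ -n "$1" ] || { echo "usage: es_delete_index INDEX" >&2; return 1; }
    es_curl -XDELETE "$ES_URL/$1"
}
```

For example, run `es_list_indices` first, then `es_delete_index {your_index}` for each index you no longer need.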

Comment 14 ewolinet 2016-06-15 19:03:08 UTC

Docs PR opened here:
https://github.com/openshift/openshift-docs/pull/2291

Comment 15 Jaspreet Kaur 2016-06-16 11:16:17 UTC

Thanks for the workaround. This is definitely helpful.

Additional info :

With the delete API [1], cleanup is simpler if your indices are tagged with a date:
we can delete data whose index names start with 2016- or 2016-05-. The first deletes all data from 2016, the second all data from May 2016.

e.g.:
curl -XDELETE 'http://localhost:9200/logstash-2016-*'
curl -XDELETE 'http://localhost:9200/logstash-2016-05-*'

[1] https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete.html
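
When the cutoff is "older than N days" rather than a whole month or year, a prefix is not enough. The sketch below filters a list of index names by their date suffix. It assumes every dated index name ends in YYYY-MM-DD or YYYY.MM.DD, which may not match your naming scheme; `indices_older_than` is a name invented for this example.

```shell
#!/bin/bash
# Sketch: pick out date-suffixed indices older than a cutoff.
# Assumption: dated index names end in YYYY-MM-DD or YYYY.MM.DD.

# Reads index names on stdin and prints those dated strictly before $1.
indices_older_than() {
    local cutoff="${1//./-}" name stamp
    while read -r name; do
        # Take the trailing 10 characters and normalise dots to hyphens.
        stamp="${name: -10}"
        stamp="${stamp//./-}"
        # Skip names that do not end in a date.
        [[ "$stamp" =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]] || continue
        # Zero-padded ISO dates compare correctly as plain strings.
        if [[ "$stamp" < "$cutoff" ]]; then
            echo "$name"
        fi
    done
}
```

With GNU date, a 30-day cutoff could be computed as `$(date -d '30 days ago' +%F)` and passed as the argument; each printed name can then be given to a curl -XDELETE call one at a time.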

Comment 22 Luke Meyer 2016-07-15 19:45:45 UTC

https://github.com/openshift/openshift-docs/pull/2475 adds documentation describing the use of Curator for ES index maintenance. This is available in Origin and OSE 3.2.1. I expect the docs will be updated next week.

Comment 27 errata-xmlrpc 2016-09-27 09:37:51 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1933