1352486 – [Intsvc_public_275_281] Index deletion is not completely performed

Summary: [Intsvc_public_275_281] Index deletion is not completely performed

Keywords:
Status:	CLOSED NEXTRELEASE
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Logging
Sub Component:
Version:	3.3.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Rich Megginson
QA Contact:	chunchen
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2016-07-04 09:17 UTC by Xia Zhao
Modified:	2016-09-30 02:17 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-08-17 18:40:27 UTC
Target Upstream Version:
Embargoed:


Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	https://github.com/openshift/origin-aggregated-logging/issues/196	0	None	None	None	2020-05-28 09:33:59 UTC

Description Xia Zhao 2016-07-04 09:17:59 UTC

Problem description:
Index deletion is incomplete: an index older than its configured retention period is not deleted. Details are given in the sections below.

Version-Release number of selected component (if applicable):
openshift3/logging-curator         3.3.0               0f4e933a812a

How reproducible:
Always

Steps to Reproduce:
0. Note the current indices of interest in OpenShift. Here, the indices (after a second deletion run) are:
oc exec logging-curator-1-qv9pi -- curator --host logging-es --use_ssl --certificate /etc/curator/keys/ca --client-cert /etc/curator/keys/cert --client-key /etc/curator/keys/key --loglevel ERROR show indices --all-indices
.operations.2016.04.30
.operations.2016.07.04
project-dev.2016.07.03
project-dev.2016.07.04
project-prod.2016.06.06
project-prod.2016.07.04
project-qe.2016.06.27
project-qe.2016.07.04

1. Configure curator settings like this (by editing the configmap):

$ oc rsh logging-curator-1-c5odl
sh-4.2$ cat /etc/curator/settings/config.yaml 
# Logging example curator config file

# uncomment and use this to override the defaults from env vars
.defaults:
  delete:
    days: 30
  runhour: 0
  runminute: 0

# to keep ops logs for a different duration:
.operations:
  delete:
    months: 2

# example for a normal project
myapp:
  delete:
    weeks: 1

test:
  delete:
    days: 7

project-dev:
  delete:
    days: 1

project-qe:
  delete:
    days: 7

project-prod:
  delete:
    weeks: 4

2. Scale down curator pod:
oc scale rc logging-curator-1 --replicas=0
3. Scale up curator pod:
oc scale rc logging-curator-1 --replicas=1
4. Check the index deletion result:
oc exec logging-curator-1-sh7gc -- curator --host logging-es --use_ssl --certificate /etc/curator/keys/ca --client-cert /etc/curator/keys/cert --client-key /etc/curator/keys/key --loglevel ERROR show indices --all-indices

Actual Result:
The index "project-qe.2016.06.27", which is older than 7 days, still exists; the other expired indices were deleted:
.operations.2016.07.04
project-dev.2016.07.04
project-prod.2016.07.04
project-qe.2016.06.27
project-qe.2016.07.04


Expected Result:
Index project-qe.2016.06.27 should be deleted
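
As a rough cross-check of the expectation above, the age of each index can be computed from its date suffix. This is only a sketch: the helper name `index_age_days` is hypothetical, the reference date 2016-07-04 is taken from the report date, and the exact cutoff rule curator applies is not restated here.

```python
from datetime import date, datetime


def index_age_days(index_name, today, timestring="%Y.%m.%d"):
    """Return the age in days of an index named '<prefix>.YYYY.MM.DD'.

    Hypothetical helper for illustration; it only does date arithmetic
    and does not talk to Elasticsearch or curator.
    """
    # The last three dot-separated fields form the date suffix.
    parts = index_name.rsplit(".", 3)
    stamp = datetime.strptime(".".join(parts[1:]), timestring).date()
    return (today - stamp).days


if __name__ == "__main__":
    today = date(2016, 7, 4)  # assumed reference date (day the bug was reported)
    for name in ("project-qe.2016.06.27", "project-prod.2016.06.06"):
        print(name, index_age_days(name, today))
```

With the configured "days: 7" for project-qe, an index that is 7 days old falls in the same range as the indices of the other projects that were removed in this run.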

Additional info:
$ oc logs -f logging-curator-1-sh7gc
curator running [5] jobs
No indices matched provided args: {'regex': None, 'index': (), 'suffix': None, 'newer_than': None, 'closed_only': False, 'prefix': 'test.', 'time_unit': 'days', 'timestring': '%Y.%m.%d', 'exclude': (), 'older_than': 7, 'all_indices': False}
curator run finish
curator running [5] jobs
No indices matched provided args: {'regex': None, 'index': (), 'suffix': None, 'newer_than': None, 'closed_only': False, 'prefix': None, 'time_unit': 'days', 'timestring': '%Y.%m.%d', 'exclude': ('.searchguard*', '.kibana*', '.apiman_*', 'project-qe.*', 'myapp.*', '.operations.*', 'test.*', 'project-prod.*', 'project-dev.*'), 'older_than': 30, 'all_indices': False}
No indices matched provided args: {'regex': None, 'index': (), 'suffix': None, 'newer_than': None, 'closed_only': False, 'prefix': '.operations.', 'time_unit': 'months', 'timestring': '%Y.%m.%d', 'exclude': (), 'older_than': 2, 'all_indices': False}
No indices matched provided args: {'regex': None, 'index': (), 'suffix': None, 'newer_than': None, 'closed_only': False, 'prefix': 'project-dev.', 'time_unit': 'days', 'timestring': '%Y.%m.%d', 'exclude': (), 'older_than': 1, 'all_indices': False}
No indices matched provided args: {'regex': None, 'index': (), 'suffix': None, 'newer_than': None, 'closed_only': False, 'prefix': 'project-prod.', 'time_unit': 'days', 'timestring': '%Y.%m.%d', 'exclude': (), 'older_than': 28, 'all_indices': False}
No indices matched provided args: {'regex': None, 'index': (), 'suffix': None, 'newer_than': None, 'closed_only': False, 'prefix': 'test.', 'time_unit': 'days', 'timestring': '%Y.%m.%d', 'exclude': (), 'older_than': 7, 'all_indices': False}
curator run finish
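
The log above can be scanned for the prefixes that were actually processed. The following is a minimal sketch, assuming the "No indices matched provided args:" lines always carry a Python-literal dict as shown; the function name `prefixes_in_log` is hypothetical and the sample lines are abbreviated from the log. Note that at loglevel ERROR a job that did match indices prints nothing, so a missing prefix is only a hint on its own. In the second run, however, all five jobs are accounted for and none of them has the prefix 'project-qe.'.

```python
import ast

MARKER = "No indices matched provided args: "


def prefixes_in_log(lines):
    """Collect the 'prefix' value of every 'No indices matched' log line."""
    prefixes = []
    for line in lines:
        if line.startswith(MARKER):
            # The remainder of the line is a Python dict literal.
            args = ast.literal_eval(line[len(MARKER):])
            prefixes.append(args["prefix"])
    return prefixes


# Abbreviated sample modelled on the log in this report.
SAMPLE = [
    "curator running [5] jobs",
    "No indices matched provided args: {'prefix': None, 'time_unit': 'days', 'older_than': 30}",
    "No indices matched provided args: {'prefix': '.operations.', 'time_unit': 'months', 'older_than': 2}",
    "No indices matched provided args: {'prefix': 'project-dev.', 'time_unit': 'days', 'older_than': 1}",
    "No indices matched provided args: {'prefix': 'project-prod.', 'time_unit': 'days', 'older_than': 28}",
    "No indices matched provided args: {'prefix': 'test.', 'time_unit': 'days', 'older_than': 7}",
    "curator run finish",
]

if __name__ == "__main__":
    seen = prefixes_in_log(SAMPLE)
    print("prefixes seen:", seen)
    print("project-qe. processed:", "project-qe." in seen)
```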

Comment 1 Rich Megginson 2016-07-07 03:22:32 UTC

What is the timezone of your node, and the timezone of the curator pod?

Comment 2 Rich Megginson 2016-07-07 03:23:21 UTC

If the timezone is not UTC, can you create a node with UTC, then deploy logging to it, and verify that both the node and the curator pod are UTC?

Comment 3 Rich Megginson 2016-07-07 21:53:59 UTC

Did you use `oc edit configmap logging-curator` to edit the config map?

Comment 4 Rich Megginson 2016-07-08 02:33:14 UTC

Try to reproduce it with debugging enabled.

# oadm policy add-scc-to-user privileged system:serviceaccount:logging:aggregated-logging-curator

# oc get pods -l component=curator # set pod name to curpod
# oc exec $curpod -- cat /opt/app-root/src/run_cron.py > /tmp/run_cron.py

edit /tmp/run_cron.py - change INFO to DEBUG, and change ERROR to DEBUG

# oc edit dc logging-curator

add under volumeMounts:

        - mountPath: /opt/app-root/src/run_cron.py
          name: run-cron
          readOnly: true

add under volumes:

      - hostPath:
          path: /tmp/run_cron.py
        name: run-cron

# redeploy the dc

# oc deploy dc/logging-curator --latest

# wait for the new pod to start

# oc logs $newcurpod # see if it spews a lot of messages

NOTE: This will make the curator run take a lot longer - you will need to look for "curator run finish" in the curator pod log to know for sure if the run is complete

# oc logs $newcurpod | grep "curator run finish"

After it has run, capture the log with `oc logs $newcurpod > curator.log 2>&1` and attach curator.log to the bug

Comment 5 Rich Megginson 2016-07-08 03:31:02 UTC

ok - the problem is that curator doesn't handle it when you have two projects that have the exact same time unit and value.  In this case:

test:
  delete:
    days: 7

project-qe:
  delete:
    days: 7

What happens is that curator only processes one of these.  At the low level we are passing in this:

# curator .... --prefix test. --prefix project-qe.

Curator doesn't like having two prefixes, so it just drops one.
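
The difference between the two job layouts can be sketched as follows. This is an illustration only, not the code from run_cron.py or from the eventual fix: the function names and the in-memory `CONFIG` dict are hypothetical, and the job dicts merely mirror the 'prefix', 'time_unit' and 'older_than' keys visible in the curator log.

```python
from collections import defaultdict

# Hypothetical in-memory form of part of the config.yaml from the report.
CONFIG = {
    "test": {"delete": {"days": 7}},
    "project-qe": {"delete": {"days": 7}},
    "project-dev": {"delete": {"days": 1}},
}


def grouped_jobs(config):
    """Problematic layout: one job per (unit, value), with several prefixes."""
    groups = defaultdict(list)
    for project, ops in config.items():
        for unit, value in ops["delete"].items():
            groups[(unit, value)].append(project + ".")
    return [
        {"time_unit": unit, "older_than": value, "prefixes": prefixes}
        for (unit, value), prefixes in groups.items()
    ]


def per_project_jobs(config):
    """Alternative layout: one job per project, so exactly one prefix each."""
    return [
        {"time_unit": unit, "older_than": value, "prefix": project + "."}
        for project, ops in config.items()
        for unit, value in ops["delete"].items()
    ]


if __name__ == "__main__":
    for job in grouped_jobs(CONFIG):
        print("grouped:", job)
    for job in per_project_jobs(CONFIG):
        print("per project:", job)
```

In the grouped layout, "test" and "project-qe" share a single job with two prefixes, which is the situation described above; in the per-project layout each prefix gets its own curator invocation.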

Comment 6 Xia Zhao 2016-07-08 10:22:15 UTC

@Rich Thanks a lot for the instructions on enabling debugging for curator; I appreciate it.
-- I tried to get a working install of OSE to exercise it, but was blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1353804. Will do it later.

Answers to the questions:
1. The node and the curator pod are in the same time zone, and I've verified that curator runhour/runminute works fine.

2. Yes, I used `oc edit configmap logging-curator` to edit the config map; that is the only way I found to pass my curator config into the pod.

3. Yes, I have two indices with the same policy "days: 7". Thank you for pointing out that this is the key to the problem -- I didn't know about it before.

Thanks,
Xia

Comment 7 Rich Megginson 2016-07-09 03:41:08 UTC

PR https://github.com/openshift/origin-aggregated-logging/pull/197

Comment 8 Rich Megginson 2016-07-21 03:48:24 UTC

PR was merged

Comment 9 Rich Megginson 2016-07-21 21:55:52 UTC

New deployment and curator images are in brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888
I've verified that a 3.3 install uses the new deployment and curator images.

Comment 10 Peng Li 2016-07-22 09:14:47 UTC

verified with 'registry.qe.openshift.com/openshift3/logging-curator:3.3.0' and could still reproduce, I'll wait and verify it on next Monday.

Comment 11 Rich Megginson 2016-07-22 13:35:22 UTC

(In reply to Peng Li from comment #10)
> verified with 'registry.qe.openshift.com/openshift3/logging-curator:3.3.0'
> and could still reproduce, I'll wait and verify it on next Monday.

Can you verify the version of the curator image you are using?

Comment 12 Rich Megginson 2016-07-22 13:36:09 UTC

(In reply to Rich Megginson from comment #11)
> (In reply to Peng Li from comment #10)
> > verified with 'registry.qe.openshift.com/openshift3/logging-curator:3.3.0'
> > and could still reproduce, I'll wait and verify it on next Monday.
> 
> Can you verify the version of the curator image you are using?

What I mean is doing something like a `docker inspect` of the image UUID, or `oc describe` or something like that.  just 3.3.0 isn't enough.

Comment 13 Peng Li 2016-07-25 07:04:07 UTC

Verified; closing this bug. Test image ID:
# oc describe pod logging-curator-3-3tv9q
<--snip-->
    Image ID:		docker://sha256:2c88e1273c11bc5fa63b227e19fc2e03afcceae3fc049788848fea7b586480b6
<--snip-->

Tested with project-dev and project-qe both set to delete after 1 day, and then with both set to delete after 1 week.

Case 1: test deletion after 1 day
[root@ origin]# oc exec logging-curator-1-eqzai -- curator --host logging-es --use_ssl --certificate /etc/curator/keys/ca --client-cert /etc/curator/keys/cert --client-key /etc/curator/keys/key --loglevel ERROR show indices --all-indices

.searchguard.logging-es-fwtkjflr-1-6gl3o
.searchguard.logging-es-w67sao1r-1-utmij
project-dev.2016.07.24
project-dev.2016.07.25
project-qe.2016.07.24
project-qe.2016.07.25
[root@ origin]# oc get configmap/logging-curator -o yaml
apiVersion: v1
data:
  config.yaml: "# Logging example curator config file\n\n# uncomment and use this
    to override the defaults from env vars\n.defaults:\n  delete:\n    days: 30\n
    \ runhour: 06\n  runminute: 30\n\n# to keep ops logs for a different duration:\n.operations:\n
    \ delete:\n    weeks: 8\n\n# example for a normal project\nproject-dev:\n  delete:\n
    \   days1: 1\n\nproject-qe:\n  delete:\n    days1: 1        \n\n"
kind: ConfigMap
metadata:
  creationTimestamp: 2016-07-25T06:06:42Z
  labels:
    logging-infra: support
  name: logging-curator
  namespace: clogg
  resourceVersion: "6778"
  selfLink: /api/v1/namespaces/clogg/configmaps/logging-curator
  uid: f5647dc9-522d-11e6-8907-fa163e8cd48e
[root@openshift-131 ~]# oc deploy --latest logging-curator
Started deployment #2
[root@ origin]# oc exec logging-curator-2-zxx0a -- curator --host logging-es --use_ssl --certificate /etc/curator/keys/ca --client-cert /etc/curator/keys/cert --client-key /etc/curator/keys/key --loglevel ERROR show indices --all-indices

.searchguard.logging-es-fwtkjflr-1-6gl3o
.searchguard.logging-es-w67sao1r-1-utmij
project-dev.2016.07.25
project-qe.2016.07.25


Case 2: test deletion after 1 week
[root@ ~]# oc exec logging-curator-2-vfl8h -- curator --host logging-es --use_ssl --certificate /etc/curator/keys/ca --client-cert /etc/curator/keys/cert --client-key /etc/curator/keys/key --loglevel ERROR show indices --all-indices

.searchguard.logging-es-fwtkjflr-1-6gl3o
.searchguard.logging-es-w67sao1r-1-utmij
project-dev.2016.07.18
project-dev.2016.07.25
project-qe.2016.07.18
project-qe.2016.07.25
[root@ origin]# oc get configmap/logging-curator -o yaml
apiVersion: v1
data:
  config.yaml: "# Logging example curator config file\n\n# uncomment and use this
    to override the defaults from env vars\n.defaults:\n  delete:\n    days: 30\n
    \ runhour: 06\n  runminute: 30\n\n# to keep ops logs for a different duration:\n.operations:\n
    \ delete:\n    weeks: 8\n\n# example for a normal project\nproject-dev:\n  delete:\n
    \   weeks: 1\n\nproject-qe:\n  delete:\n    weeks: 1        \n\n"
kind: ConfigMap
metadata:
  creationTimestamp: 2016-07-25T06:06:42Z
  labels:
    logging-infra: support
  name: logging-curator
  namespace: clogg
  resourceVersion: "6778"
  selfLink: /api/v1/namespaces/clogg/configmaps/logging-curator
  uid: f5647dc9-522d-11e6-8907-fa163e8cd48e
[root@openshift-131 ~]# oc deploy --latest logging-curator
Started deployment #3
[root@ ~]# oc exec logging-curator-3-3tv9q -- curator --host logging-es --use_ssl --certificate /etc/curator/keys/ca --client-cert /etc/curator/keys/cert --client-key /etc/curator/keys/key --loglevel ERROR show indices --all-indices
.searchguard.logging-es-fwtkjflr-1-6gl3o
.searchguard.logging-es-w67sao1r-1-utmij
project-dev.2016.07.25
project-qe.2016.07.25

Comment 14 chunchen 2016-07-25 07:15:22 UTC

According to comment #13, mark it as verified.

Comment 15 Peng Li 2016-07-25 07:27:16 UTC

(In reply to Rich Megginson from comment #12)
> (In reply to Rich Megginson from comment #11)
> > (In reply to Peng Li from comment #10)
> > > verified with 'registry.qe.openshift.com/openshift3/logging-curator:3.3.0'
> > > and could still reproduce, I'll wait and verify it on next Monday.
> > 
> > Can you verify the version of the curator image you are using?
> 
> What I mean is doing something like a `docker inspect` of the image UUID, or
> `oc describe` or something like that.  just 3.3.0 isn't enough.

Thanks for your help, that should be the reason.
