Problem description:
Index deletion is not 100% performed; please refer to details in the following parts.

Version-Release number of selected component (if applicable):
openshift3/logging-curator 3.3.0 0f4e933a812a

How reproducible:
Always

Steps to Reproduce:

0. Note down the current indices of interest in openshift. Current indices after the 2nd deletion are:

oc exec logging-curator-1-qv9pi -- curator --host logging-es --use_ssl --certificate /etc/curator/keys/ca --client-cert /etc/curator/keys/cert --client-key /etc/curator/keys/key --loglevel ERROR show indices --all-indices
.operations.2016.04.30
.operations.2016.07.04
project-dev.2016.07.03
project-dev.2016.07.04
project-prod.2016.06.06
project-prod.2016.07.04
project-qe.2016.06.27
project-qe.2016.07.04

1. Configure curator settings like this (by editing the configmap):

$ oc rsh logging-curator-1-c5odl
sh-4.2$ cat /etc/curator/settings/config.yaml
# Logging example curator config file

# uncomment and use this to override the defaults from env vars
.defaults:
  delete:
    days: 30
  runhour: 0
  runminute: 0

# to keep ops logs for a different duration:
.operations:
  delete:
    months: 2

# example for a normal project
myapp:
  delete:
    weeks: 1

test:
  delete:
    days: 7

project-dev:
  delete:
    days: 1

project-qe:
  delete:
    days: 7

project-prod:
  delete:
    weeks: 4

2. Scale down the curator pod: oc scale rc logging-curator-1 --replicas=0

3. Scale up the curator pod: oc scale rc logging-curator-1 --replicas=1

4. Check the index deletion result:

oc exec logging-curator-1-sh7gc -- curator --host logging-es --use_ssl --certificate /etc/curator/keys/ca --client-cert /etc/curator/keys/cert --client-key /etc/curator/keys/key --loglevel ERROR show indices --all-indices

Actual Result:
The project-qe index which is older than 7 days ("project-qe.2016.06.27") still existed; the others were deleted:
.operations.2016.07.04
project-dev.2016.07.04
project-prod.2016.07.04
project-qe.2016.06.27
project-qe.2016.07.04

Expected Result:
Index project-qe.2016.06.27 should be deleted.

Additional info:

$ oc logs -f logging-curator-1-sh7gc
curator running [5] jobs
No indices matched provided args: {'regex': None, 'index': (), 'suffix': None, 'newer_than': None, 'closed_only': False, 'prefix': 'test.', 'time_unit': 'days', 'timestring': '%Y.%m.%d', 'exclude': (), 'older_than': 7, 'all_indices': False}
curator run finish
curator running [5] jobs
No indices matched provided args: {'regex': None, 'index': (), 'suffix': None, 'newer_than': None, 'closed_only': False, 'prefix': None, 'time_unit': 'days', 'timestring': '%Y.%m.%d', 'exclude': ('.searchguard*', '.kibana*', '.apiman_*', 'project-qe.*', 'myapp.*', '.operations.*', 'test.*', 'project-prod.*', 'project-dev.*'), 'older_than': 30, 'all_indices': False}
No indices matched provided args: {'regex': None, 'index': (), 'suffix': None, 'newer_than': None, 'closed_only': False, 'prefix': '.operations.', 'time_unit': 'months', 'timestring': '%Y.%m.%d', 'exclude': (), 'older_than': 2, 'all_indices': False}
No indices matched provided args: {'regex': None, 'index': (), 'suffix': None, 'newer_than': None, 'closed_only': False, 'prefix': 'project-dev.', 'time_unit': 'days', 'timestring': '%Y.%m.%d', 'exclude': (), 'older_than': 1, 'all_indices': False}
No indices matched provided args: {'regex': None, 'index': (), 'suffix': None, 'newer_than': None, 'closed_only': False, 'prefix': 'project-prod.', 'time_unit': 'days', 'timestring': '%Y.%m.%d', 'exclude': (), 'older_than': 28, 'all_indices': False}
No indices matched provided args: {'regex': None, 'index': (), 'suffix': None, 'newer_than': None, 'closed_only': False, 'prefix': 'test.', 'time_unit': 'days', 'timestring': '%Y.%m.%d', 'exclude': (), 'older_than': 7, 'all_indices': False}
curator run finish
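For context on the job args in the log above: curator matches dated indices by stripping the configured prefix off each index name, parsing the remainder with the timestring ('%Y.%m.%d'), and comparing that date against the older_than cutoff. A minimal sketch of that matching logic, with hypothetical helper names (this is an illustration, not curator's actual implementation):

```python
from datetime import datetime, timedelta

def indices_older_than(indices, prefix, days, now):
    """Return indices whose date suffix (parsed with the %Y.%m.%d
    timestring) is more than `days` days before `now`."""
    cutoff = now - timedelta(days=days)
    matched = []
    for name in indices:
        if not name.startswith(prefix):
            continue
        suffix = name[len(prefix):]
        try:
            stamp = datetime.strptime(suffix, "%Y.%m.%d")
        except ValueError:
            continue  # not a dated index, skip it
        if stamp < cutoff:
            matched.append(name)
    return matched

indices = ["project-qe.2016.06.27", "project-qe.2016.07.04"]
# With "now" at 2016-07-05 and older_than: 7, only the 06.27 index qualifies.
print(indices_older_than(indices, "project-qe.", 7, datetime(2016, 7, 5)))
```

So a `project-qe.` job with older_than 7 should have matched project-qe.2016.06.27; the log above shows that no such job ran at all.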
What is the timezone of your node, and the timezone of the curator pod?
If the timezone is not UTC, can you create a node with UTC, then deploy logging to it, and verify that both the node and the curator pod are UTC?
Did you use `oc edit configmap logging-curator` to edit the config map?
Try to reproduce it with debugging enabled:

# oadm policy add-scc-to-user privileged system:serviceaccount:logging:aggregated-logging-curator
# oc get pods -l component=curator
# set pod name to curpod
# oc exec $curpod -- cat /opt/app-root/src/run_cron.py > /tmp/run_cron.py

Edit /tmp/run_cron.py - change INFO to DEBUG, and change ERROR to DEBUG.

# oc edit dc logging-curator

Add under volumeMounts:

- mountPath: /opt/app-root/src/run_cron.py
  name: run-cron
  readOnly: true

Add under volumes:

- hostPath:
    path: /tmp/run_cron.py
  name: run-cron

Redeploy the dc:

# oc deploy dc/logging-curator --latest

Wait for the new pod to start, then see if it spews a lot of messages:

# oc logs $newcurpod

NOTE: This will make the curator run take a lot longer - you will need to look for "curator run finish" in the curator pod log to know for sure that the run is complete:

# oc logs $newcurpod | grep "curator run finish"

After it has run, grab the logs with `oc logs $newcurpod > curator.log 2>&1` and attach curator.log to the bug.
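The manual INFO/ERROR-to-DEBUG edit above can also be scripted; a sketch with sed, assuming the log levels appear in run_cron.py as the literal strings INFO and ERROR:

```shell
# Patch a local copy of run_cron.py, raising both log levels to DEBUG.
# (Fetch it first with: oc exec $curpod -- cat /opt/app-root/src/run_cron.py > run_cron.py)
sed -e 's/INFO/DEBUG/g' -e 's/ERROR/DEBUG/g' run_cron.py > /tmp/run_cron.py
```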
OK - the problem is that curator doesn't handle the case where two projects have the exact same time unit and value. In this case:

test:
  delete:
    days: 7

project-qe:
  delete:
    days: 7

What happens is that curator only processes one of these. At the low level we are passing in this:

# curator .... --prefix test. --prefix project-qe.

Curator doesn't like having two prefixes, so it just drops one.
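The dropped-prefix behavior can be illustrated with a minimal argparse sketch (an illustration of the failure mode only - curator's actual CLI framework and option handling may differ): a single-value option silently keeps only one of two occurrences, while an append-style option keeps both.

```python
import argparse

# Buggy shape: --prefix declared as a plain single-value option,
# so a second occurrence silently replaces the first.
buggy = argparse.ArgumentParser()
buggy.add_argument("--prefix")
args = buggy.parse_args(["--prefix", "test.", "--prefix", "project-qe."])
print(args.prefix)   # only one prefix survives: "project-qe."

# Fixed shape: an append-style option keeps every prefix it is given.
fixed = argparse.ArgumentParser()
fixed.add_argument("--prefix", action="append")
both = fixed.parse_args(["--prefix", "test.", "--prefix", "project-qe."])
print(both.prefix)   # ['test.', 'project-qe.']
```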
@Rich Thanks a lot for the instructions on enabling debugging for curator, much appreciated. I struggled to get a working install of OSE to exercise it, but was blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1353804. Will do it later.

Some answers to the questions:
1. The node and the curator pod are in the same time zone, and I've tested that curator runhour/runminute work fine.
2. Yes, I used `oc edit configmap logging-curator` to edit the config map; that is the only way I found to pass my curator config into the pod.
3. Yes, I have two indices with the same policy "days: 7". Thank you for pointing out that this is the key to the problem - I didn't know about it before.

Thanks, Xia
PR https://github.com/openshift/origin-aggregated-logging/pull/197
PR was merged
New deployment and curator images are in brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888. I've verified that a 3.3 install uses the new deployment and curator images.
verified with 'registry.qe.openshift.com/openshift3/logging-curator:3.3.0' and could still reproduce, I'll wait and verify it on next Monday.
(In reply to Peng Li from comment #10)
> verified with 'registry.qe.openshift.com/openshift3/logging-curator:3.3.0'
> and could still reproduce, I'll wait and verify it on next Monday.

Can you verify the version of the curator image you are using?
(In reply to Rich Megginson from comment #11)
> (In reply to Peng Li from comment #10)
> > verified with 'registry.qe.openshift.com/openshift3/logging-curator:3.3.0'
> > and could still reproduce, I'll wait and verify it on next Monday.
>
> Can you verify the version of the curator image you are using?

What I mean is doing something like a `docker inspect` of the image UUID, or `oc describe`, or something like that. Just "3.3.0" isn't enough.
Verified; closing this bug. Test image id:

# oc describe pod logging-curator-3-3tv9q
<--snipped-->
Image ID: docker://sha256:2c88e1273c11bc5fa63b227e19fc2e03afcceae3fc049788848fea7b586480b6
<--snipped-->

Tested with project-dev and project-qe both set to delete after 1 day, and with project-dev and project-qe both set to delete after 1 week.

Case 1: test delete after 1 day

[root@ origin]# oc exec logging-curator-1-eqzai -- curator --host logging-es --use_ssl --certificate /etc/curator/keys/ca --client-cert /etc/curator/keys/cert --client-key /etc/curator/keys/key --loglevel ERROR show indices --all-indices
.searchguard.logging-es-fwtkjflr-1-6gl3o
.searchguard.logging-es-w67sao1r-1-utmij
project-dev.2016.07.24
project-dev.2016.07.25
project-qe.2016.07.24
project-qe.2016.07.25

[root@ origin]# oc get configmap/logging-curator -o yaml
apiVersion: v1
data:
  config.yaml: "# Logging example curator config file\n\n# uncomment and use this to override the defaults from env vars\n.defaults:\n delete:\n days: 30\n \ runhour: 06\n runminute: 30\n\n# to keep ops logs for a different duration:\n.operations:\n \ delete:\n weeks: 8\n\n# example for a normal project\nproject-dev:\n delete:\n \ days1: 1\n\nproject-qe:\n delete:\n days1: 1 \n\n"
kind: ConfigMap
metadata:
  creationTimestamp: 2016-07-25T06:06:42Z
  labels:
    logging-infra: support
  name: logging-curator
  namespace: clogg
  resourceVersion: "6778"
  selfLink: /api/v1/namespaces/clogg/configmaps/logging-curator
  uid: f5647dc9-522d-11e6-8907-fa163e8cd48e

[root@openshift-131 ~]# oc deploy --latest logging-curator
Started deployment #2

[root@ origin]# oc exec logging-curator-2-zxx0a -- curator --host logging-es --use_ssl --certificate /etc/curator/keys/ca --client-cert /etc/curator/keys/cert --client-key /etc/curator/keys/key --loglevel ERROR show indices --all-indices
.searchguard.logging-es-fwtkjflr-1-6gl3o
.searchguard.logging-es-w67sao1r-1-utmij
project-dev.2016.07.25
project-qe.2016.07.25

Case 2: test delete after 1 week

[root@ ~]# oc exec logging-curator-2-vfl8h -- curator --host logging-es --use_ssl --certificate /etc/curator/keys/ca --client-cert /etc/curator/keys/cert --client-key /etc/curator/keys/key --loglevel ERROR show indices --all-indices
.searchguard.logging-es-fwtkjflr-1-6gl3o
.searchguard.logging-es-w67sao1r-1-utmij
project-dev.2016.07.18
project-dev.2016.07.25
project-qe.2016.07.18
project-qe.2016.07.25

[root@ origin]# oc get configmap/logging-curator -o yaml
apiVersion: v1
data:
  config.yaml: "# Logging example curator config file\n\n# uncomment and use this to override the defaults from env vars\n.defaults:\n delete:\n days: 30\n \ runhour: 06\n runminute: 30\n\n# to keep ops logs for a different duration:\n.operations:\n \ delete:\n weeks: 8\n\n# example for a normal project\nproject-dev:\n delete:\n \ weeks: 1\n\nproject-qe:\n delete:\n weeks: 1 \n\n"
kind: ConfigMap
metadata:
  creationTimestamp: 2016-07-25T06:06:42Z
  labels:
    logging-infra: support
  name: logging-curator
  namespace: clogg
  resourceVersion: "6778"
  selfLink: /api/v1/namespaces/clogg/configmaps/logging-curator
  uid: f5647dc9-522d-11e6-8907-fa163e8cd48e

[root@openshift-131 ~]# oc deploy --latest logging-curator
Started deployment #3

[root@ ~]# oc exec logging-curator-3-3tv9q -- curator --host logging-es --use_ssl --certificate /etc/curator/keys/ca --client-cert /etc/curator/keys/cert --client-key /etc/curator/keys/key --loglevel ERROR show indices --all-indices
.searchguard.logging-es-fwtkjflr-1-6gl3o
.searchguard.logging-es-w67sao1r-1-utmij
project-dev.2016.07.25
project-qe.2016.07.25
According to comment #13, marking this as verified.
(In reply to Rich Megginson from comment #12)
> (In reply to Rich Megginson from comment #11)
> > (In reply to Peng Li from comment #10)
> > > verified with 'registry.qe.openshift.com/openshift3/logging-curator:3.3.0'
> > > and could still reproduce, I'll wait and verify it on next Monday.
> >
> > Can you verify the version of the curator image you are using?
>
> What I mean is doing something like a `docker inspect` of the image UUID, or
> `oc describe` or something like that. just 3.3.0 isn't enough.

Thanks for your help, that should be the reason.