1688647 – Curator pod in error status: curator.exceptions.ConfigurationError

Summary: Curator pod in error status: curator.exceptions.ConfigurationError

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Logging
Sub Component:
Version:	4.1.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	4.1.0
Assignee:	Josef Karasek
QA Contact:	Anping Li
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2019-03-14 07:23 UTC by Qiaoling Tang
Modified:	2019-06-04 10:45 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-06-04 10:45:49 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift origin-aggregated-logging pull 1568	0	'None'	closed	Fix bz 1688647 envvar subsitution in curator config files	2021-01-19 17:20:39 UTC
Red Hat Product Errata	RHBA-2019:0758	0	None	None	None	2019-06-04 10:45:56 UTC

Description Qiaoling Tang 2019-03-14 07:23:11 UTC

Description of problem:
Deploy logging via community-operators and wait until all pods are running. Set the curator cronjob schedule to */5 * * * * and wait for the next scheduled run. The curator pod ends up in Error status, and its log contains this error message: curator.exceptions.ConfigurationError: Configuration: Client Configuration: Location: full configuration dictionary: Bad Value: "{'certificate': '${ES_CA}', 'client_cert': '${ES_CLIENT_CERT}', 'hosts': ['${ES_HOST}'], 'timeout': '${CURATOR_TIMEOUT}', 'use_ssl': True, 'master_only': False, 'port': '${ES_PORT}', 'ssl_no_validate': False, 'client_key': '${ES_CLIENT_KEY}'}", not a valid value for dictionary value @ data['client']['port']. Check configuration file.

$ oc get pod
NAME                                                  READY   STATUS    RESTARTS   AGE
cluster-logging-operator-5bc59f67bf-c6stq             1/1     Running   0          31m
curator-1552546200-9wlfw                              0/1     Error     0          3m52s
elasticsearch-clientdatamaster-0-1-555888f475-jzdkd   2/2     Running   0          6m57s

$ oc logs curator-1552546200-9wlfw
2019-03-14 06:50:14,269 INFO	Found curator configuration in [/etc/curator/settings/config.yaml]
2019-03-14 06:50:14,278 INFO	Converting config file.
/usr/lib/python2.7/site-packages/curator/utils.py:53: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  cfg = yaml.load(raw)
No handlers could be found for logger "curator.validators.SchemaCheck"
Traceback (most recent call last):
  File "/usr/bin/curator", line 9, in <module>
    load_entry_point('elasticsearch-curator==5.2.0', 'console_scripts', 'curator')()
  File "/usr/lib64/python2.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/usr/lib64/python2.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib64/python2.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/curator/cli.py", line 211, in cli
    run(config, action_file, dry_run)
  File "/usr/lib/python2.7/site-packages/curator/cli.py", line 106, in run
    client_args = process_config(config)
  File "/usr/lib/python2.7/site-packages/curator/config_utils.py", line 45, in process_config
    config = test_config(yaml_file)
  File "/usr/lib/python2.7/site-packages/curator/config_utils.py", line 19, in test_config
    'Client Configuration', 'full configuration dictionary').result()
  File "/usr/lib/python2.7/site-packages/curator/validators/schemacheck.py", line 68, in result
    self.test_what, self.location, self.badvalue, self.error)
curator.exceptions.ConfigurationError: Configuration: Client Configuration: Location: full configuration dictionary: Bad Value: "{'certificate': '${ES_CA}', 'client_cert': '${ES_CLIENT_CERT}', 'hosts': ['${ES_HOST}'], 'timeout': '${CURATOR_TIMEOUT}', 'use_ssl': True, 'master_only': False, 'port': '${ES_PORT}', 'ssl_no_validate': False, 'client_key': '${ES_CLIENT_KEY}'}", not a valid value for dictionary value @ data['client']['port']. Check configuration file.

$ oc get cm curator -o yaml
apiVersion: v1
data:
  actions.yaml: |
    # ---
    # Remember, leave a key empty if there is no value.  None will be a string,
    # not a Python "NoneType"
    #
    # Also remember that all examples have 'disable_action' set to True.  If you
    # want to use this action as a template, be sure to set this to False after
    # copying it.
    # actions:
    #   1:
    #     action: delete_indices
    #     description: >-
    #       Delete .operations indices older than 30 days.
    #       Ignore the error if the filter does not
    #       result in an actionable list of indices (ignore_empty_list).
    #       See https://www.elastic.co/guide/en/elasticsearch/client/curator/5.2/ex_delete_indices.html
    #     options:
    #       # Swallow curator.exception.NoIndices exception
    #       ignore_empty_list: True
    #       # In seconds, default is 300
    #       timeout_override: ${CURATOR_TIMEOUT}
    #       # Don't swallow any other exceptions
    #       continue_if_exception: False
    #       # Optionally disable action, useful for debugging
    #       disable_action: False
    #     # All filters are bound by logical AND
    #     filters:
    #     - filtertype: pattern
    #       kind: regex
    #       value: '^\.operations\..*$'
    #       exclude: False
    #     - filtertype: age
    #       # Parse timestamp from index name
    #       source: name
    #       direction: older
    #       timestring: '%Y.%m.%d'
    #       unit: days
    #       unit_count: 30
    #       exclude: False
  config.yaml: |
    # Logging example curator config file

    # uncomment and use this to override the defaults from env vars
    #.defaults:
    #  delete:
    #    days: 30

    # to keep ops logs for a different duration:
    #.operations:
    #  delete:
    #    weeks: 8

    # example for a normal project
    #myapp:
    #  delete:
    #    weeks: 1
  curator5.yaml: "---\nclient:\n  hosts:\n  - ${ES_HOST}\n  port: ${ES_PORT}\n  use_ssl:
    True\n  certificate: ${ES_CA}\n  client_cert: ${ES_CLIENT_CERT}\n  client_key:
    ${ES_CLIENT_KEY}\n  ssl_no_validate: False\n  timeout: ${CURATOR_TIMEOUT}\n  master_only:
    False\nlogging:\n  loglevel: ${CURATOR_LOG_LEVEL}\n  logformat: default\n  blacklist:
    ['elasticsearch', 'urllib3']\n  \n"
kind: ConfigMap
metadata:
  creationTimestamp: 2019-03-14T06:46:59Z
  name: curator
  namespace: openshift-logging
  ownerReferences:
  - apiVersion: logging.openshift.io/v1alpha1
    controller: true
    kind: ClusterLogging
    name: customresourcefluentd
    uid: f38fd146-4624-11e9-88a3-0afa9a7871ea
  resourceVersion: "168406"
  selfLink: /api/v1/namespaces/openshift-logging/configmaps/curator
  uid: f725bfdd-4624-11e9-90f1-065f5fd6ebc0
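
The curator5.yaml entry above still carries literal ${ES_PORT}, ${ES_HOST} and similar placeholders, and the traceback shows that the same literal strings reach the schema check. The following is a minimal sketch of expanding such placeholders from the environment before validation. It is not the change made in the linked pull request: expand_env and port_is_valid are hypothetical helper names, and the port range check only stands in for curator's real schema validation.

```python
import os
from string import Template

# Fragment of the client section of curator5.yaml as shown in the ConfigMap.
RAW_CONFIG = """\
client:
  hosts:
  - ${ES_HOST}
  port: ${ES_PORT}
  timeout: ${CURATOR_TIMEOUT}
"""


def expand_env(text, env=None):
    """Replace ${VAR} placeholders with values from env (default: os.environ).

    Raises KeyError when a placeholder has no value, so a missing variable
    fails loudly instead of reaching the validator as a literal string.
    """
    env = os.environ if env is None else env
    return Template(text).substitute(env)


def port_is_valid(value):
    """Hypothetical stand-in for the schema check: an integer in 1..65535."""
    try:
        return 1 <= int(value) <= 65535
    except ValueError:
        return False


if __name__ == "__main__":
    # Values copied from the CronJob env section in this report.
    env = {"ES_HOST": "elasticsearch", "ES_PORT": "9200", "CURATOR_TIMEOUT": "300"}
    print(port_is_valid("${ES_PORT}"))  # the unexpanded placeholder
    print(expand_env(RAW_CONFIG, env))
```

string.Template is used here because its ${NAME} syntax matches the placeholders in the file and substitute() refuses to leave a placeholder unresolved.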

Version-Release number of selected component (if applicable):
4.0.0-0.nightly-2019-03-13-233958

How reproducible:
Always

Steps to Reproduce:
1. Deploy logging via community operators.
2. Set the curator cronjob schedule to */5 * * * * and wait for a curator pod to start.
3. Check the curator pod status and log.

Actual results:


Expected results:


Additional info:

Comment 1 Qiaoling Tang 2019-03-14 07:24:18 UTC

$ oc get cj -o yaml
apiVersion: v1
items:
- apiVersion: batch/v1beta1
  kind: CronJob
  metadata:
    creationTimestamp: 2019-03-14T06:46:59Z
    labels:
      component: curator
      logging-infra: curator
      provider: openshift
    name: curator
    namespace: openshift-logging
    ownerReferences:
    - apiVersion: logging.openshift.io/v1alpha1
      controller: true
      kind: ClusterLogging
      name: customresourcefluentd
      uid: f38fd146-4624-11e9-88a3-0afa9a7871ea
    resourceVersion: "199391"
    selfLink: /apis/batch/v1beta1/namespaces/openshift-logging/cronjobs/curator
    uid: f74444bb-4624-11e9-90f1-065f5fd6ebc0
  spec:
    concurrencyPolicy: Allow
    failedJobsHistoryLimit: 1
    jobTemplate:
      metadata:
        creationTimestamp: null
      spec:
        backoffLimit: 0
        parallelism: 1
        template:
          metadata:
            creationTimestamp: null
            labels:
              component: curator
              logging-infra: curator
              provider: openshift
            name: curator
            namespace: openshift-logging
          spec:
            containers:
            - env:
              - name: K8S_HOST_URL
                value: https://kubernetes.default.svc.cluster.local
              - name: ES_HOST
                value: elasticsearch
              - name: ES_PORT
                value: "9200"
              - name: ES_CLIENT_CERT
                value: /etc/curator/keys/cert
              - name: ES_CLIENT_KEY
                value: /etc/curator/keys/key
              - name: ES_CA
                value: /etc/curator/keys/ca
              - name: CURATOR_DEFAULT_DAYS
                value: "30"
              - name: CURATOR_SCRIPT_LOG_LEVEL
                value: INFO
              - name: CURATOR_LOG_LEVEL
                value: ERROR
              - name: CURATOR_TIMEOUT
                value: "300"
              image: quay.io/openshift/origin-logging-curator5:v4.0
              imagePullPolicy: IfNotPresent
              name: curator
              resources:
                limits:
                  memory: 200Mi
                requests:
                  cpu: 200m
                  memory: 200Mi
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: File
              volumeMounts:
              - mountPath: /etc/curator/keys
                name: certs
                readOnly: true
              - mountPath: /etc/curator/settings
                name: config
                readOnly: true
            dnsPolicy: ClusterFirst
            restartPolicy: Never
            schedulerName: default-scheduler
            securityContext: {}
            serviceAccount: curator
            serviceAccountName: curator
            terminationGracePeriodSeconds: 600
            volumes:
            - configMap:
                defaultMode: 420
                name: curator
              name: config
            - name: certs
              secret:
                defaultMode: 420
                secretName: curator
    schedule: '*/5 * * * *'
    successfulJobsHistoryLimit: 1
    suspend: false
  status:
    lastScheduleTime: 2019-03-14T07:20:00Z
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
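
The CronJob above appears to define every variable that curator5.yaml references, which would suggest the values are available in the container and only the substitution step is missing. A small sketch to cross-check this, assuming the placeholder list and env names are copied by hand from the two outputs in this report (missing_variables is a hypothetical helper, not part of curator):

```python
import re

# Matches ${NAME} placeholders as they appear in curator5.yaml.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

# curator5.yaml content from the ConfigMap, with the \n escapes unfolded.
CURATOR5_YAML = """\
client:
  hosts:
  - ${ES_HOST}
  port: ${ES_PORT}
  use_ssl: True
  certificate: ${ES_CA}
  client_cert: ${ES_CLIENT_CERT}
  client_key: ${ES_CLIENT_KEY}
  ssl_no_validate: False
  timeout: ${CURATOR_TIMEOUT}
  master_only: False
logging:
  loglevel: ${CURATOR_LOG_LEVEL}
"""

# Env variable names from the CronJob container spec.
CRONJOB_ENV = [
    "K8S_HOST_URL", "ES_HOST", "ES_PORT", "ES_CLIENT_CERT", "ES_CLIENT_KEY",
    "ES_CA", "CURATOR_DEFAULT_DAYS", "CURATOR_SCRIPT_LOG_LEVEL",
    "CURATOR_LOG_LEVEL", "CURATOR_TIMEOUT",
]


def missing_variables(config_text, env_names):
    """Return sorted placeholder names in config_text that env_names lacks."""
    referenced = set(PLACEHOLDER.findall(config_text))
    return sorted(referenced - set(env_names))


if __name__ == "__main__":
    # An empty list means every placeholder has a matching env entry.
    print(missing_variables(CURATOR5_YAML, CRONJOB_ENV))
```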

Comment 5 Qiaoling Tang 2019-04-11 02:07:43 UTC

Verified in quay.io/openshift/origin-logging-curator5@sha256:83af9a080c3432d9b9dae82ea212f1fd00065981578877a6c8570f8ea4669383

Comment 7 errata-xmlrpc 2019-06-04 10:45:49 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758
