Bug 1648453

Summary: [3.11] Curator5 fails to start in 3.11 - missing package
Product: OpenShift Container Platform
Reporter: Wesley Hearn <whearn>
Component: Logging
Assignee: Rich Megginson <rmeggins>
Status: CLOSED ERRATA
QA Contact: Qiaoling Tang <qitang>
Severity: high
Docs Contact:
Priority: high
Version: 3.11.0
CC: anli, ansverma, aos-bugs, dapark, haowang, jcantril, jgoulding, jrosenta, kramdoss, lmeyer, lstanton, mbarnes, mnoguera, msaini, nils.ketelsen, ocasalsa, qitang, rbost, rmeggins, scortopa, sgarciam, snalawad, tmanor, wsun
Target Milestone: ---
Keywords: OpsBlocker, Regression
Target Release: 3.11.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ose-logging-curator5:v3.11.54
Doc Type: Bug Fix
Doc Text:
Cause: The curator image was built with the wrong version of the python-elasticsearch package. Consequence: The curator image would not start. Fix: Use the correct version of the python-elasticsearch package to build the curator image. Result: The curator image works as expected.
Story Points: ---
Clone Of:
: 1657560
Environment:
Last Closed: 2019-01-10 09:04:10 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1657560    

Description Wesley Hearn 2018-11-09 18:26:53 UTC
Description of problem:
When curator tries to run, it fails with a dependency error.

Version-Release number of selected component (if applicable):
v3.11

How reproducible:
Always

Steps to Reproduce:
1.
2.
3.

Actual results:


[root@engint-master-39a9e ~]# oc logs logging-curator-1541734200-zj5cq
2018-11-09 03:30:11,490 INFO	Found curator configuration in [/etc/curator/settings/config.yaml]
2018-11-09 03:30:11,500 INFO	Converting config file.
Traceback (most recent call last):
  File "/usr/bin/curator", line 5, in <module>
    from pkg_resources import load_entry_point
  File "/usr/lib/python2.7/site-packages/pkg_resources/__init__.py", line 3084, in <module>
    @_call_aside
  File "/usr/lib/python2.7/site-packages/pkg_resources/__init__.py", line 3070, in _call_aside
    f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/pkg_resources/__init__.py", line 3097, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/usr/lib/python2.7/site-packages/pkg_resources/__init__.py", line 651, in _build_master
    ws.require(__requires__)
  File "/usr/lib/python2.7/site-packages/pkg_resources/__init__.py", line 952, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/usr/lib/python2.7/site-packages/pkg_resources/__init__.py", line 839, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'elasticsearch<6.0.0,>=5.4.0' distribution was not found and is required by elasticsearch-curator
[root@engint-master-39a9e ~]# 



Expected results:
Curator should be running.

Additional info:
There is an open PR to fix this: https://github.com/openshift/origin-aggregated-logging/pull/1446
This BZ is just to track the fix.
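
For readers unfamiliar with the error: the traceback ends in DistributionNotFound because the installed python-elasticsearch does not fall inside the range 'elasticsearch<6.0.0,>=5.4.0' that elasticsearch-curator declares. The sketch below is a simplified stand-in for that range check, not the pkg_resources implementation. The helper names (parse_version, parse_requirement, satisfies) are hypothetical, versions are assumed to be plain dotted integers, and "6.0.0" is only an illustrative out-of-range value, since this report does not state which version was actually installed.

```python
import operator
import re

# Comparison operators supported by this simplified sketch.
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
}


def parse_version(text):
    """Turn '5.4.0' into (5, 4, 0); assumes purely numeric dotted versions."""
    return tuple(int(part) for part in text.split("."))


def parse_requirement(spec):
    """Split 'name<6.0.0,>=5.4.0' into (name, [(op, version_tuple), ...])."""
    match = re.match(r"^([A-Za-z0-9_.\-]+)(.*)$", spec)
    name, rest = match.group(1), match.group(2)
    clauses = []
    for clause in rest.split(","):
        clause = clause.strip()
        if not clause:
            continue
        op, version = re.match(r"^(<=|>=|==|<|>)(.+)$", clause).groups()
        clauses.append((op, parse_version(version.strip())))
    return name, clauses


def satisfies(installed, spec):
    """Return True if the installed version meets every clause in spec."""
    _, clauses = parse_requirement(spec)
    have = parse_version(installed)
    return all(OPS[op](have, want) for op, want in clauses)


REQUIREMENT = "elasticsearch<6.0.0,>=5.4.0"  # taken from the traceback

print(satisfies("5.4.0", REQUIREMENT))  # a version inside the range
print(satisfies("6.0.0", REQUIREMENT))  # hypothetical version outside the range
```

A version inside the range passes both clauses, while anything at or above 6.0.0, or below 5.4.0, fails at least one of them, which is the condition under which the curator entry point refuses to start.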

Comment 1 Rich Megginson 2018-11-09 18:29:25 UTC
We are trying to revert the curator image to use the curator package version 5.2 - but CI is currently broken - see https://github.com/openshift/release/issues/2106 and https://github.com/openshift/aos-cd-jobs/pull/1526

Comment 3 Rich Megginson 2018-11-09 21:48:10 UTC
ART needs to sync and rebuild the curator5 image

Comment 4 Takeshi Larsson 2018-11-27 08:42:32 UTC
I am still getting the same curator Python stack trace, with the exact same message, using the reverted 5.2.0 version in OCP 3.11.43.

Comment 8 Rich Megginson 2018-11-28 21:35:31 UTC
We discovered one more problem - we need to use the previous version of python-elasticsearch.  We have tagged python-elasticsearch-5.4.0-1.el7 into rhaos-3.11-rhel-7-candidate and rhaos-4.0-rhel-7-candidate and are waiting for ART to rebuild the composes with these packages and rebuild the curator images.

Comment 9 Matthew Barnes 2018-11-29 19:27:55 UTC
For anyone else hitting this, the v3.11.23 image works as a temporary workaround.

Comment 11 Qiaoling Tang 2018-12-04 02:54:21 UTC
Verified in ose-logging-curator5:v3.11.50

Comment 12 Serena Cortopassi 2018-12-06 10:41:03 UTC
(In reply to Matthew Barnes from comment #9)
> For anyone else hitting this, the v3.11.23 image works as a temporary
> workaround.

It seems that v3.11.23 is no longer available, am I right?

(In reply to Qiaoling Tang from comment #11)
> Verified in ose-logging-curator5:v3.11.50

Where did you find .50? I guess the latest tag right now points to v3.11.43 [1], which is the one I have on a fresh 3.11 installation, where the issue is still present.

[1] https://access.redhat.com/containers/?tab=tags#/registry.access.redhat.com/openshift3/ose-logging-curator5

Comment 13 Nils Ketelsen 2018-12-06 11:02:08 UTC
(In reply to Serena Cortopassi from comment #12)
> (In reply to Matthew Barnes from comment #9)
> > For anyone else hitting this, the v3.11.23 image works as a temporary
> > workaround.
> 
> > It seems that v3.11.23 is no longer available, am I right?

If you are using OpenShift Enterprise: as far as I know, it never was. The released versions were 3.11.16 and 3.11.43.

As a workaround, I created a one-time Job in my setup that cleans out the logs; my disks were running full and I needed immediate action. The Job uses the curator image from OpenShift Origin (your cluster needs to be able to pull from Docker Hub). The YAML file for the one-time Job is below. I guess the same image would also work with the standard CronJob set up by OpenShift Enterprise, but I have not tested that.

-----snip-----
apiVersion: batch/v1
kind: Job
metadata:
  creationTimestamp: null
  name: logging-curator-onetime
spec:
  backoffLimit: 0
  template:
    metadata:
      creationTimestamp: null
      labels:
        component: curator
        logging-infra: curator
        provider: openshift
      name: logging-curator-onetime
    spec:
      containers:
      - env:
        - name: K8S_HOST_URL
          value: https://kubernetes.default.svc.cluster.local
        - name: ES_HOST
          value: logging-es
        - name: ES_PORT
          value: "9200"
        - name: ES_CLIENT_CERT
          value: /etc/curator/keys/cert
        - name: ES_CLIENT_KEY
          value: /etc/curator/keys/key
        - name: ES_CA
          value: /etc/curator/keys/ca
        - name: CURATOR_DEFAULT_DAYS
          value: "7"
        - name: CURATOR_SCRIPT_LOG_LEVEL
          value: DEBUG
        - name: CURATOR_LOG_LEVEL
          value: ERROR
        - name: CURATOR_TIMEOUT
          value: "300"
        image: openshift/origin-logging-curator5:v3.11
        imagePullPolicy: IfNotPresent
        name: curator
        resources:
          limits:
            memory: 256Mi
          requests:
            cpu: 100m
            memory: 256Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/curator/keys
          name: certs
          readOnly: true
        - mountPath: /etc/curator/settings
          name: config
          readOnly: true
      dnsPolicy: ClusterFirst
      nodeSelector:
        node-role.kubernetes.io/infra: "true"
      restartPolicy: Never
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: aggregated-logging-curator
      serviceAccountName: aggregated-logging-curator
      terminationGracePeriodSeconds: 30
      volumes:
      - name: certs
        secret:
          defaultMode: 420
          secretName: logging-curator
      - configMap:
          defaultMode: 420
          name: logging-curator
        name: config

Comment 14 Rich Megginson 2018-12-06 16:47:28 UTC
The tentative release date for the updated image is Dec. 11

Comment 17 Qiaoling Tang 2018-12-10 08:45:29 UTC
*** Bug 1657560 has been marked as a duplicate of this bug. ***

Comment 20 Rich Megginson 2018-12-10 15:52:15 UTC
AFAICT the fix is in openshift3/ose-logging-curator5:v3.11.54-1

>docker run -it openshift3/ose-logging-curator5:v3.11.54-1 bash
...
bash-4.2$ curator --help
Usage: curator [OPTIONS] ACTION_FILE

  Curator for Elasticsearch indices.

  See http://elastic.co/guide/en/elasticsearch/client/curator/current

Options:
  --config PATH  Path to configuration file. Default: ~/.curator/curator.yml
  --dry-run      Do not perform any changes.
  --version      Show the version and exit.
  --help         Show this message and exit.


If I use a broken version I get an error message:

>docker run -it openshift3/ose-logging-curator5:v3.11.51-3 bash
...
bash-4.2$ curator --help
Traceback (most recent call last):
  File "/usr/bin/curator", line 5, in <module>
    from pkg_resources import load_entry_point
  File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 3007, in <module>
    working_set.require(__requires__)
  File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 728, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 626, in resolve
    raise DistributionNotFound(req)
pkg_resources.DistributionNotFound: elasticsearch>=5.4.0,<6.0.0

I've looked at the list of python packages from the last known good version of curator v3.11.23-1 and compared them to v3.11.54-1.  The only difference is the python-setuptools package, which seems to have reverted from python-setuptools-17.1.1-4.el7.noarch in v3.11.23-1 to python-setuptools-0.9.8-7.el7.noarch in 3.11.54?
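
A comparison like the one described above can be sketched in a few lines of Python. This is only an illustration: the helper names (split_nvr, diff_packages) are hypothetical, the input is assumed to be package names in the name-version-release.arch form, and the two sample lists contain just the python-setuptools entries quoted in this comment rather than the full package lists of either image.

```python
def split_nvr(package):
    """Split 'name-version-release.arch' at the last two hyphens."""
    name, version, release = package.rsplit("-", 2)
    return name, version + "-" + release


def diff_packages(old, new):
    """Return {name: (old_version, new_version)} for packages that differ.

    A value of None means the package is absent from that list.
    """
    old_map = dict(split_nvr(p) for p in old)
    new_map = dict(split_nvr(p) for p in new)
    changes = {}
    for name in sorted(set(old_map) | set(new_map)):
        before, after = old_map.get(name), new_map.get(name)
        if before != after:
            changes[name] = (before, after)
    return changes


# Sample input: only the entries quoted in this comment, not full image lists.
known_good = ["python-setuptools-17.1.1-4.el7.noarch"]  # v3.11.23-1
candidate = ["python-setuptools-0.9.8-7.el7.noarch"]    # v3.11.54-1

for name, (before, after) in diff_packages(known_good, candidate).items():
    print("%s: %s -> %s" % (name, before, after))
```

Feeding it the full package list from each image, one package per list entry, would show every package whose version changed, appeared, or disappeared between the two builds.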

Comment 21 Rich Megginson 2018-12-10 16:14:25 UTC
@Qiaoling - can you verify ose-logging-curator5:v3.11.54 - we really need to get this fix into the errata that is shipping tomorrow.

Comment 25 Anping Li 2018-12-11 10:36:12 UTC
The advisory https://errata.devel.redhat.com/advisory/38178 contains ose-logging-curator5:v3.11.51, but the fix is in ose-logging-curator5:v3.11.54, so the bug is not fixed in advisory/38178. Moving to MODIFIED.

Comment 39 Qiaoling Tang 2018-12-25 01:33:51 UTC
Verified in openshift3/ose-logging-curator5/images/v3.11.59-2.

Steps and results are the same as comment 18

Comment 42 errata-xmlrpc 2019-01-10 09:04:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0024