Bug 1671315
Summary: Kibana and Curator pod names include "-ops"

Product: OpenShift Container Platform
Component: Logging
Version: 3.11.0
Target Release: 3.11.z
Status: CLOSED ERRATA
Severity: medium
Priority: high
Hardware: Unspecified
OS: Unspecified
Reporter: Ivana Saranova <isaranov>
Assignee: Rich Megginson <rmeggins>
QA Contact: Anping Li <anli>
CC: aos-bugs, isaranov, jcantril, jzmeskal, qitang, rmeggins, sradco
Doc Type: Bug Fix
Doc Text:
Cause: The logging playbooks didn't work with ansible 2.7; the include_role and import_role behavior changed between 2.6 and 2.7, which broke logging.
Consequence: Strange errors, such as pods with "-ops" suffixes even when not deploying the ops cluster.
Fix: Use include_role instead of import_role in the logging playbooks and roles.
Result: The logging ansible code works on both ansible 2.6 and ansible 2.7.
Last Closed: 2019-06-26 09:07:54 UTC
Type: Bug
Description (Ivana Saranova, 2019-01-31 11:12:59 UTC)
Please attach your inventory file and vars.yaml file, and attach the ansible log(s). Also please include the version of openshift-ansible, e.g. rpm -q openshift-ansible.

Created attachment 1536594 [details]: ansible log file

Created attachment 1536595 [details]: inventory file

Created attachment 1536596 [details]: ansible facts for the master node
Using openshift-ansible-3.11.69-1.git.0.2ff281f.el7.noarch:

    [root@master0 ~]# oc get pods
    NAME                                      READY  STATUS   RESTARTS  AGE
    logging-curator-ops-1550633400-78zsm      0/1    Error    0         3h
    logging-es-data-master-alz7d6n1-1-deploy  0/1    Error    0         8h
    logging-fluentd-nn5kf                     1/1    Running  0         8h
    logging-kibana-ops-1-n69g9                2/2    Running  0         8h

I see Shirly provided the logs and openshift-ansible version faster than me. My logs, inventory file and openshift-ansible version are the same, so you can use them. Thanks Shirly!

From which repo did you get ansible-2.7.7?

In terms of the "-ops" components problem, I was able to reproduce this with openshift-ansible 3.11.69 and ansible 2.7.7. I was not able to reproduce with the latest internally available version of openshift-ansible (http://download-node-02.eng.bos.redhat.com/rcm-guest/puddles/RHAOS/AtomicOpenShift/3.11/building/) using ansible 2.6. I'm going to try with openshift-ansible latest and ansible 2.7.7 to rule out any ansible version differences. As far as the "elasticsearch is already being rolled out" problem, that is a genuine bug in all versions of openshift-ansible that needs a fix.

I can reproduce the "-ops" problem when using ansible 2.7.7 in both the old and new versions of openshift-ansible.

PR for master: https://github.com/openshift/openshift-ansible/pull/11219
PR for release-3.11: https://github.com/openshift/openshift-ansible/pull/11220

I would encourage you to test the release-3.11 branch PR with your deployment - they are simple edits you can make to your openshift-ansible role files for logging.

Can I set
openshift_logging_curator_ops_deployment: false
openshift_logging_kibana_ops_deployment: false
as a workaround for the ops issue for now?

(In reply to Shirly Radco from comment #15)
> Can I set
> openshift_logging_curator_ops_deployment: false
> openshift_logging_kibana_ops_deployment: false
> as a workaround for the ops issue for now?

Sure, if it works, use it.
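For reference, this is roughly how that workaround would look in the logging variables file; a minimal sketch only, assuming the deployment variables live in a vars.yaml, with openshift_logging_install_logging included purely for context:

```yaml
# Hypothetical vars.yaml fragment. Only the two *_ops_deployment variables
# come from the comment above; the surrounding key is illustrative.
openshift_logging_install_logging: true

# Workaround: suppress the spurious "-ops" Curator and Kibana deployments
# until the import_role/include_role fix lands.
openshift_logging_curator_ops_deployment: false
openshift_logging_kibana_ops_deployment: false
```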
I still see a problem with the route to kibana. It still includes "ops":

    [root@master0 ~]# oc get pods
    NAME                                      READY  STATUS   RESTARTS  AGE
    logging-es-data-master-k1yf82xf-1-deploy  1/1    Running  0         55s
    logging-es-data-master-k1yf82xf-1-l99sx   1/2    Running  0         46s
    logging-fluentd-vfz2f                     1/1    Running  0         1m
    logging-kibana-1-6bmzf                    2/2    Running  0         2m
    logging-mux-1-n7bhv                       1/1    Running  0         1m

    [root@master0 ~]# oc get svc
    NAME                   TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
    logging-es             ClusterIP  172.30.175.225  <none>        9200/TCP   3m
    logging-es-cluster     ClusterIP  None            <none>        9300/TCP   3m
    logging-es-prometheus  ClusterIP  172.30.25.170   <none>        443/TCP    3m
    logging-kibana         ClusterIP  172.30.156.255  <none>        443/TCP    2m
    logging-mux            ClusterIP  172.30.239.17   10.35.18.137  24284/TCP  2m

    [root@master0 ~]# oc get routes
    NAME            HOST/PORT                           PATH  SERVICES        PORT   TERMINATION         WILDCARD
    logging-es      es.eng.lab.tlv.redhat.com                 logging-es      <all>  reencrypt           None
    logging-kibana  kibana-ops.eng.lab.tlv.redhat.com         logging-kibana  <all>  reencrypt/Redirect  None

Hello, do we know with what version of openshift-ansible this will be released?

Upon further investigation, it appears to be due to the difference in how "import_role" and "include_role" work between ansible 2.6 and ansible 2.7. We use include_role with elasticsearch (https://github.com/openshift/openshift-ansible/blob/release-3.11/roles/openshift_logging/tasks/install_logging.yaml#L85), and elasticsearch does not have this "-ops" problem. But we use import_role with kibana and curator (https://github.com/openshift/openshift-ansible/blob/release-3.11/roles/openshift_logging/tasks/install_logging.yaml#L236), which does have the "-ops" problem.
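To illustrate the difference, here is a minimal sketch of the kind of change involved; the role name follows openshift-ansible, but the variable and condition are illustrative rather than the exact code from install_logging.yaml:

```yaml
# Before: a static import. import_role behavior changed between ansible 2.6
# and 2.7, and under 2.7 the vars set for the conditional "-ops" deployment
# could end up affecting the regular deployment as well.
- import_role:
    name: openshift_logging_kibana
  vars:
    kibana_name: logging-kibana-ops          # illustrative variable
  when: openshift_logging_use_ops | default(false) | bool

# After: a dynamic include. The vars are scoped to this one invocation and
# the task is evaluated at runtime, so the "when:" condition skips it
# cleanly when no ops cluster is deployed. Behaves the same on 2.6 and 2.7.
- include_role:
    name: openshift_logging_kibana
  vars:
    kibana_name: logging-kibana-ops          # illustrative variable
  when: openshift_logging_use_ops | default(false) | bool
```

The actual edits span the logging roles; see the PRs linked in the comments for the exact changes.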
It would take a fairly large effort to change this code to work with both ansible 2.6 and ansible 2.7, and to verify that no regressions were introduced. The problem is only with ansible 2.7, and openshift-ansible does not support ansible 2.7. Is there any way you could run the ovirt/rhv playbooks with ansible 2.6? If not, why?

I have filed a pull request to change the logging roles to use include_role instead of import_role. I have done minimal testing with ansible 2.6 and ansible 2.7, and it seems to work well and solves the "-ops" problem.

(In reply to Rich Megginson from comment #21)
> I have filed a pull request to change the logging roles to use include_role
> instead of import_role. I have done minimal testing with ansible 2.6 and
> ansible 2.7 and it seems to work well, and solves the "-ops" problem.

Rich, that's great! Can you please link your PR here? But I also have to say that I have not experienced the -ops problem after applying your previous patch (https://github.com/openshift/openshift-ansible/pull/11220/files) and running the deployment playbook. That is while using Ansible 2.7.7.

(In reply to Jan Zmeskal from comment #22)
> Rich, that's great! Can you please link your PR here?

Sure - it's already linked in the external trackers section above: https://github.com/openshift/openshift-ansible/pull/11262

> But I also have to say that I have not experienced the -ops problem after
> applying your previous patch
> (https://github.com/openshift/openshift-ansible/pull/11220/files) and
> running the deployment playbook. That is while using Ansible 2.7.7.

You didn't see this? https://bugzilla.redhat.com/show_bug.cgi?id=1671315#c18

> You didn't see this? https://bugzilla.redhat.com/show_bug.cgi?id=1671315#c18

Right, I see it, I just did not notice.
Verified with:

    # rpm -qa | grep ansible
    ansible-2.7.7-1.el7ae.noarch
    openshift-ansible-roles-3.11.98-1.git.0.3cfa7c3.el7.noarch
    openshift-ansible-playbooks-3.11.98-1.git.0.3cfa7c3.el7.noarch
    openshift-ansible-3.11.98-1.git.0.3cfa7c3.el7.noarch
    openshift-ansible-docs-3.11.98-1.git.0.3cfa7c3.el7.noarch

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1605