Bug 1671315
Summary: Kibana and Curator pod names include "-ops"

Product: OpenShift Container Platform
Component: Logging
Version: 3.11.0
Target Release: 3.11.z
Status: CLOSED ERRATA
Severity: medium
Priority: high
Hardware: Unspecified
OS: Unspecified
Reporter: Ivana Saranova <isaranov>
Assignee: Rich Megginson <rmeggins>
QA Contact: Anping Li <anli>
CC: aos-bugs, isaranov, jcantril, jzmeskal, qitang, rmeggins, sradco
Doc Type: Bug Fix
Doc Text:
Cause: The logging playbooks didn't work with ansible 2.7; the include_role and import_role behavior changed between 2.6 and 2.7, which broke logging.
Consequence: Strange errors, such as pods with "-ops" suffixes even when not deploying the ops cluster.
Fix: Use include_role instead of import_role in the logging playbooks and roles.
Result: The logging ansible code works on both ansible 2.6 and ansible 2.7.
Last Closed: 2019-06-26 09:07:54 UTC
Type: Bug
Description (Ivana Saranova, 2019-01-31 11:12:59 UTC)
Please attach your inventory file and vars.yaml file, and attach the ansible log(s). Also please include the version of openshift-ansible, e.g. rpm -q openshift-ansible.

Created attachment 1536594 [details]: ansible log file

Created attachment 1536595 [details]: inventory file

Created attachment 1536596 [details]: ansible facts for the master node
Using openshift-ansible-3.11.69-1.git.0.2ff281f.el7.noarch:

    [root@master0 ~]# oc get pods
    NAME                                      READY  STATUS   RESTARTS  AGE
    logging-curator-ops-1550633400-78zsm      0/1    Error    0         3h
    logging-es-data-master-alz7d6n1-1-deploy  0/1    Error    0         8h
    logging-fluentd-nn5kf                     1/1    Running  0         8h
    logging-kibana-ops-1-n69g9                2/2    Running  0         8h

I see Shirly provided the logs and openshift-ansible version faster than me. My logs, inventory file and openshift-ansible version are the same, so you can use them. Thanks Shirly!

From which repo did you get ansible-2.7.7?

In terms of the "-ops" components problem, I was able to reproduce this with openshift-ansible 3.11.69 and ansible 2.7.7. I was not able to reproduce with the latest internally available version of openshift-ansible (http://download-node-02.eng.bos.redhat.com/rcm-guest/puddles/RHAOS/AtomicOpenShift/3.11/building/) using ansible 2.6. I'm going to try with openshift-ansible latest and ansible 2.7.7 to rule out any ansible version differences. As far as the "elasticsearch is already being rolled out" problem, that is a genuine bug in all versions of openshift-ansible that needs a fix.

I can reproduce the "-ops" problem when using ansible 2.7.7 in both the old and new versions of openshift-ansible.

PR for master: https://github.com/openshift/openshift-ansible/pull/11219
PR for release-3.11: https://github.com/openshift/openshift-ansible/pull/11220

I would encourage you to test the release-3.11 branch PR with your deployment - they are simple edits you can make to your openshift-ansible role files for logging.

Can I set
openshift_logging_curator_ops_deployment: false
openshift_logging_kibana_ops_deployment: false
as a workaround for the ops issue for now?

(In reply to Shirly Radco from comment #15)
> Can I set
> openshift_logging_curator_ops_deployment: false
> openshift_logging_kibana_ops_deployment: false
> as a workaround for the ops issue for now?

Sure, if it works, use it.
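For reference, this is roughly how that workaround would look in the logging variables file; a minimal sketch only, assuming the deployment variables live in a vars.yaml, with openshift_logging_install_logging included purely for context:

```yaml
# Hypothetical vars.yaml fragment. Only the two *_ops_deployment variables
# come from the comment above; the surrounding key is illustrative.
openshift_logging_install_logging: true

# Workaround: suppress the spurious "-ops" Curator and Kibana deployments
# until the import_role/include_role fix lands.
openshift_logging_curator_ops_deployment: false
openshift_logging_kibana_ops_deployment: false
```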
I still see a problem with the route to kibana. It still includes "ops":

    [root@master0 ~]# oc get pods
    NAME                                      READY  STATUS   RESTARTS  AGE
    logging-es-data-master-k1yf82xf-1-deploy  1/1    Running  0         55s
    logging-es-data-master-k1yf82xf-1-l99sx   1/2    Running  0         46s
    logging-fluentd-vfz2f                     1/1    Running  0         1m
    logging-kibana-1-6bmzf                    2/2    Running  0         2m
    logging-mux-1-n7bhv                       1/1    Running  0         1m

    [root@master0 ~]# oc get svc
    NAME                   TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
    logging-es             ClusterIP  172.30.175.225  <none>        9200/TCP   3m
    logging-es-cluster     ClusterIP  None            <none>        9300/TCP   3m
    logging-es-prometheus  ClusterIP  172.30.25.170   <none>        443/TCP    3m
    logging-kibana         ClusterIP  172.30.156.255  <none>        443/TCP    2m
    logging-mux            ClusterIP  172.30.239.17   10.35.18.137  24284/TCP  2m

    [root@master0 ~]# oc get routes
    NAME            HOST/PORT                           PATH  SERVICES        PORT   TERMINATION         WILDCARD
    logging-es      es.eng.lab.tlv.redhat.com                 logging-es      <all>  reencrypt           None
    logging-kibana  kibana-ops.eng.lab.tlv.redhat.com         logging-kibana  <all>  reencrypt/Redirect  None

Hello, do we know with what version of openshift-ansible this will be released?

Upon further investigation, it appears to be due to the difference in how "import_role" and "include_role" work between ansible 2.6 and ansible 2.7. We use include_role with elasticsearch (https://github.com/openshift/openshift-ansible/blob/release-3.11/roles/openshift_logging/tasks/install_logging.yaml#L85), and elasticsearch does not have this "-ops" problem. But we use import_role with kibana and curator (https://github.com/openshift/openshift-ansible/blob/release-3.11/roles/openshift_logging/tasks/install_logging.yaml#L236), which does have the "-ops" problem.
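To illustrate the difference, here is a minimal sketch of the kind of change involved; the role name follows openshift-ansible, but the variable and condition are illustrative rather than the exact code from install_logging.yaml:

```yaml
# Before: a static import. import_role behavior changed between ansible 2.6
# and 2.7, and under 2.7 the vars set for the conditional "-ops" deployment
# could end up affecting the regular deployment as well.
- import_role:
    name: openshift_logging_kibana
  vars:
    kibana_name: logging-kibana-ops          # illustrative variable
  when: openshift_logging_use_ops | default(false) | bool

# After: a dynamic include. The vars are scoped to this one invocation and
# the task is evaluated at runtime, so the "when:" condition skips it
# cleanly when no ops cluster is deployed. Behaves the same on 2.6 and 2.7.
- include_role:
    name: openshift_logging_kibana
  vars:
    kibana_name: logging-kibana-ops          # illustrative variable
  when: openshift_logging_use_ops | default(false) | bool
```

The actual edits span the logging roles; see the PRs linked in the comments for the exact changes.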
It would take a fairly large effort to change this code to work with both ansible 2.6 and ansible 2.7, and to verify that no regressions were introduced. The problem is only with ansible 2.7, and openshift-ansible does not support ansible 2.7. Is there any way you could run the ovirt/rhv playbooks with ansible 2.6? If not, why?

I have filed a pull request to change the logging roles to use include_role instead of import_role. I have done minimal testing with ansible 2.6 and ansible 2.7, and it seems to work well and solves the "-ops" problem.

(In reply to Rich Megginson from comment #21)
> I have filed a pull request to change the logging roles to use include_role
> instead of import_role. I have done minimal testing with ansible 2.6 and
> ansible 2.7 and it seems to work well, and solves the "-ops" problem.

Rich, that's great! Can you please link your PR here? But I also have to say that I have not experienced the -ops problem after applying your previous patch (https://github.com/openshift/openshift-ansible/pull/11220/files) and running the deployment playbook. That is while using Ansible 2.7.7.

(In reply to Jan Zmeskal from comment #22)
> Rich, that's great! Can you please link your PR here?

Sure - it's already linked in the external trackers section above: https://github.com/openshift/openshift-ansible/pull/11262

> But I also have to say that I have not experienced the -ops problem after
> applying your previous patch
> (https://github.com/openshift/openshift-ansible/pull/11220/files) and
> running the deployment playbook. That is while using Ansible 2.7.7.

You didn't see this? https://bugzilla.redhat.com/show_bug.cgi?id=1671315#c18

> You didn't see this? https://bugzilla.redhat.com/show_bug.cgi?id=1671315#c18

Right, I see it, I just did not notice.
Verified with:

    # rpm -qa | grep ansible
    ansible-2.7.7-1.el7ae.noarch
    openshift-ansible-roles-3.11.98-1.git.0.3cfa7c3.el7.noarch
    openshift-ansible-playbooks-3.11.98-1.git.0.3cfa7c3.el7.noarch
    openshift-ansible-3.11.98-1.git.0.3cfa7c3.el7.noarch
    openshift-ansible-docs-3.11.98-1.git.0.3cfa7c3.el7.noarch

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1605