Bug 1540893

Summary: Failed to deploy logging-mux when ops is enabled
Product: OpenShift Container Platform
Reporter: Anping Li <anli>
Component: Logging
Assignee: Rich Megginson <rmeggins>
Status: CLOSED ERRATA
QA Contact: Anping Li <anli>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.7.1
CC: aos-bugs, nhosoi, rmeggins
Target Milestone: ---
Keywords: Regression
Target Release: 3.7.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Attempting to use an ansible 2.4 feature in ansible 2.3
Consequence: maximum recursion depth exceeded in cmp errors
Fix: Make sure to use the right ansible features with the correct version of ansible
Result: mux can be installed correctly with ansible
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-04-05 09:37:08 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Anping Li 2018-02-01 09:42:23 UTC
Description of problem:
Deploying logging with mux fails with "maximum recursion depth exceeded in cmp".

Version-Release number of selected component (if applicable):
openshift-ansible-3.7.23
ansible 2.4.2.0

How reproducible:
always

Steps to Reproduce:
1. Deploy logging with mux, using the following inventory variables:
openshift_logging_mux_client_mode=maximal
openshift_logging_use_mux=true
openshift_logging_es_ops_pvc_storage_class_name=standard
openshift_logging_es_ops_pvc_dynamic=true
openshift_logging_use_ops=true
openshift_logging_es_pvc_storage_class_name=standard
openshift_logging_es_pvc_dynamic=true
openshift_logging_es_memory_limit=1Gi
openshift_logging_install_logging=true

#
TASK [openshift_logging_mux : Add mux namespaces] *****************************************************************************************************************************************************************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_logging_mux/tasks/main.yaml:216
ok: [host-8-242-21.host.centralci.eng.rdu2.redhat.com] => (item=mux-undefined) => {"changed": false, "item": "mux-undefined", "results": {"cmd": "/usr/local/bin/oc get namespace mux-undefined -o json", "results": {"apiVersion": "v1", "kind": "Namespace", "metadata": {"annotations": {"openshift.io/description": "", "openshift.io/display-name": "", "openshift.io/node-selector": "", "openshift.io/sa.scc.mcs": "s0:c16,c5", "openshift.io/sa.scc.supplemental-groups": "1000250000/10000", "openshift.io/sa.scc.uid-range": "1000250000/10000"}, "creationTimestamp": "2018-02-01T08:55:30Z", "name": "mux-undefined", "resourceVersion": "24064", "selfLink": "/api/v1/namespaces/mux-undefined", "uid": "a807d32a-072d-11e8-8f01-fa163eec0a3d"}, "spec": {"finalizers": ["openshift.io/origin", "kubernetes"]}, "status": {"phase": "Active"}}, "returncode": 0}, "state": "present"}

TASK [openshift_logging_mux : Delete temp directory] **************************************************************************************************************************************************************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_logging_mux/tasks/main.yaml:223
ok: [host-8-242-21.host.centralci.eng.rdu2.redhat.com] => {"changed": false, "path": "/tmp/openshift-logging-ansible-BrTeid", "state": "absent"}

TASK [openshift_logging : include_role] ***************************************************************************************************************************************************************************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_logging/tasks/install_logging.yaml:297
ERROR! Unexpected Exception, this is probably a bug: maximum recursion depth exceeded in cmp
to see the full traceback, use -vvv

Expected Result:
Mux can be deployed with logging

Additional info:

Comment 1 Noriko Hosoi 2018-02-01 20:43:19 UTC
Hi @Anping,

Interesting finding.  I tried it myself using the upstream 3.7 branches (O-A-L & O-A) and could reproduce the error you reported, although it occurred in openshift_logging_curator.  I suspect it is not an issue with mux, but with the way "include_role" is executed?

TASK [openshift_logging_curator : Delete temp directory] ***********************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_logging_curator/tasks/main.yaml:123
Using module file /usr/lib/python2.7/site-packages/ansible/modules/files/file.py
<localhost> ESTABLISH LOCAL CONNECTION FOR USER: origin
<localhost> EXEC /bin/sh -c 'echo ~ && sleep 0'
<localhost> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /home/origin/.ansible/tmp/ansible-tmp-1517515481.35-137831890118654 `" && echo ansible-tmp-1517515481.35-137831890118654="` echo /home/origin/.ansible/tmp/ansible-tmp-1517515481.35-137831890118654 `" ) && sleep 0'
<localhost> PUT /tmp/tmpzVoKkD TO /home/origin/.ansible/tmp/ansible-tmp-1517515481.35-137831890118654/file.py
<localhost> EXEC /bin/sh -c 'chmod u+x /home/origin/.ansible/tmp/ansible-tmp-1517515481.35-137831890118654/ /home/origin/.ansible/tmp/ansible-tmp-1517515481.35-137831890118654/file.py && sleep 0'
<localhost> EXEC /bin/sh -c 'sudo -H -S -n -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-tsuzboxpvwsnfjevtacblkiumkbehvcw; /usr/bin/python /home/origin/.ansible/tmp/ansible-tmp-1517515481.35-137831890118654/file.py; rm -rf "/home/origin/.ansible/tmp/ansible-tmp-1517515481.35-137831890118654/" > /dev/null 2>&1'"'"' && sleep 0'
[...]

TASK [openshift_logging : include_role] ****************************************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_logging/tasks/install_logging.yaml:293
Traceback (most recent call last):
  File "/usr/lib64/python2.7/multiprocessing/queues.py", line 266, in _feed
    send(obj)
RuntimeError: maximum recursion depth exceeded while calling a Python object

Comment 2 Noriko Hosoi 2018-02-01 20:50:30 UTC
Well, it was too early to say "it may not be mux", since install_logging.yaml:293 points to this:
roles/openshift_logging/tasks/install_logging.yaml
292 ## Mux
293 - include_role:
294     name: openshift_logging_mux
295   vars:
296     generated_certs_dir: "{{openshift.common.config_base}}/logging"
297     openshift_logging_mux_ops_host: "{{ ( openshift_logging_use_ops | bool ) | ternary('logging-es-ops', 'logging-es') }}"
298     openshift_logging_mux_namespace: "{{ openshift_logging_namespace }}"
299     openshift_logging_mux_master_url: "{{ openshift_logging_master_url }}"
300     openshift_logging_mux_image_pull_secret: "{{ openshift_logging_image_pull_secret }}"
301   when:
302   - openshift_logging_use_mux | bool

Comment 3 Rich Megginson 2018-02-01 20:57:21 UTC
There was a bug in the openshift-ansible 3.7 branch around the include_role directive. Let me see if I can find it.

Comment 4 Rich Megginson 2018-02-01 21:09:08 UTC
I think it is related to https://github.com/openshift/openshift-ansible/pull/6724

Comment 5 Noriko Hosoi 2018-02-01 21:50:16 UTC
Thank you, Rich!

I see this commit introduced "static: true"
  commit 0ee90f016f33fa18df7df4d73a251c7e8618e7de (origin/pr/6613)
    [release-3.7] Migrate to static: true for include_role

And PR 6724 is going to revert it.

The ansible doc [1] says:
  Note
    Handlers are made available to the whole play.
    Before 2.4, as with include, this task could be static or dynamic, If static it implied that it won’t need templating nor loops nor conditionals and will show included tasks in the --list options. Ansible would try to autodetect what is needed, but you can set static to yes or no at task level to control this.
    After 2.4, you can use import_role for ‘static’ behaviour and this action for ‘dynamic’ one.

And I noticed "include_role" was replaced with "import_role" in the master branch.  But there's no problem in installing mux with ops using the master branches.  Does this imply import_role is not necessarily equivalent to static include_role?  (sorry, i'm a bit confused... :)

[1] http://docs.ansible.com/ansible/latest/include_role_module.html
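
For reference on the question above, here is a minimal sketch of the two Ansible 2.4 forms that the quoted note distinguishes. It is an illustration only and does not reproduce the content of the commit or the PR. The role name and the condition are taken from the install_logging.yaml excerpt in Comment 2; the task names are hypothetical.

```yaml
# Illustrative sketch only; both forms as written here assume Ansible 2.4 or later.

# Dynamic: the role is loaded at run time, when this task is reached.
# The `when` condition is evaluated once, on the include task itself.
- name: Include mux role dynamically (hypothetical task name)
  include_role:
    name: openshift_logging_mux
  when:
  - openshift_logging_use_mux | bool

# Static: the role is expanded when the playbook is parsed.
# The `when` condition is applied to every task inside the role.
- name: Import mux role statically (hypothetical task name)
  import_role:
    name: openshift_logging_mux
  when:
  - openshift_logging_use_mux | bool
```

Because import_role is resolved at parse time, it cannot be driven by a loop, whereas include_role can. Whether the pre-2.4 static hint behaves the same as import_role in every respect is not established by this sketch.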

Comment 6 Rich Megginson 2018-02-01 22:08:29 UTC
I'm not sure. With openshift-ansible 3.8 and later they moved to support ansible 2.4, but they still had to support ansible 2.3 with openshift-ansible 3.7. I don't know how they resolved this for openshift-ansible 3.7, but at least our logging CI is now using a version of openshift-ansible that does not have this problem: our builds/installs/tests pass on the release-3.7 branch.  So it may be that the problem was fixed at some point in openshift-ansible 3.7, but the appropriate upstream/downstream openshift-ansible 3.7 packages have not yet been built and released.

Comment 7 Noriko Hosoi 2018-02-01 22:17:47 UTC
Good news!

> our builds/installs/tests pass on release-3.7 branch.
Could you tell us which repo/release-3.7 branch includes the fix?

Or maybe what we should keep an eye on?

Comment 8 Noriko Hosoi 2018-02-01 22:56:27 UTC
Note: I've verified O-A release-3.7 with pr/6724 solves the error -- 
RuntimeError: maximum recursion depth exceeded while calling a Python object

Comment 9 Anping Li 2018-02-02 02:29:23 UTC
@Noriko
The PR is in v3.7.25 and later. Tests pass when using openshift-ansible:v3.7.26.  Could you move the bug to ON_QA?

Comment 13 errata-xmlrpc 2018-04-05 09:37:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0636