Bug 1568714

Summary: [UPGRADES] Client stops producing output during OC upgrade
Product: Red Hat OpenStack
Reporter: Yurii Prokulevych <yprokule>
Component: python-tripleoclient
Assignee: mathieu bultel <mbultel>
Status: CLOSED ERRATA
QA Contact: Yurii Prokulevych <yprokule>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 13.0 (Queens)
CC: augol, ccamacho, cjanisze, hbrock, jschluet, jslagle, jstransk, mandreou, mbracho, mbultel, mburns, mcornea, rbartal
Target Milestone: beta
Keywords: Triaged
Target Release: 13.0 (Queens)
Hardware: All
OS: All
Whiteboard:
Fixed In Version: python-tripleoclient-9.2.1-4.el7ost openstack-tripleo-common-8.6.1-4.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-06-27 13:52:00 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1485415, 1575620, 1607143

Description Yurii Prokulevych 2018-04-18 07:20:51 UTC
Description of problem:
-----------------------
While running an overcloud upgrade, the client sporadically stops producing any output.

echo "Runing major upgrade deploy_steps_playbook.yaml playbook for Compute role"
openstack overcloud upgrade run \
        --nodes Compute --playbook deploy_steps_playbook.yaml 2>&1

Runing major upgrade deploy_steps_playbook.yaml playbook for Compute role
Waiting for messages on queue 'upgrade' with no timeout.
Started Mistral Workflow tripleo.package_update.v1.update_nodes. Execution ID: 1a6b67a9-dc85-4b0b-9afd-c3b9207e4d43
[u'Using /tmp/ansible-mistral-actionuIHhzS/ansible.cfg as config file',
 u' [WARNING]: Skipping unexpected key (hostvars) in group (_meta), only "vars",',
 u'"children" and "hosts" are valid',
 u"[DEPRECATION WARNING]: The use of 'include' for tasks has been deprecated. Use ",
 u"'import_tasks' for static inclusions or 'include_tasks' for dynamic inclusions.",
 u' This feature will be removed in a future release. Deprecation warnings can be ',
 u'disabled by setting deprecation_warnings=False in ansible.cfg.',
 u'[DEPRECATION WARNING]: include is kept for backwards compatibility but usage is',
 u' discouraged. The module documentation details page may explain more about this',
 u' rationale.. This feature will be removed in a future release. Deprecation ',
 u'warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.',
...
 u'TASK [Start containers for step 5] *********************************************',
 u'ok: [192.168.24.10] => {"censored": "the output has been hidden due to the fact that \'no_log: true\' was specified for this result", "changed": false}']
...
And that's all.

Checking the Mistral execution reports SUCCESS:
mistral execution-get 1a6b67a9-dc85-4b0b-9afd-c3b9207e4d43
+--------------------+----------------------------------------+
| Field              | Value                                  |
+--------------------+----------------------------------------+
| ID                 | 1a6b67a9-dc85-4b0b-9afd-c3b9207e4d43   |
| Workflow ID        | b1ab6ffe-0719-4540-a4b7-167243809e32   |
| Workflow name      | tripleo.package_update.v1.update_nodes |
| Workflow namespace |                                        |
| Description        |                                        |
| Task Execution ID  | <none>                                 |
| State              | SUCCESS                                |
| State info         | None                                   |
| Created at         | 2018-04-18 06:27:02                    |
| Updated at         | 2018-04-18 06:33:53                    |
+--------------------+----------------------------------------+

mistral task-list -f yaml 1a6b67a9-dc85-4b0b-9afd-c3b9207e4d43 
- Created at: '2018-04-18 06:27:02'
  Execution ID: 1a6b67a9-dc85-4b0b-9afd-c3b9207e4d43
  ID: 1056d47c-08d7-4e69-9022-cafd9cd3b559
  Name: download_config
  State: SUCCESS
  State info: null
  Updated at: '2018-04-18 06:27:04'
  Workflow name: tripleo.package_update.v1.update_nodes
  Workflow namespace: ''
- Created at: '2018-04-18 06:27:04'
  Execution ID: 1a6b67a9-dc85-4b0b-9afd-c3b9207e4d43
  ID: 7aa29740-ef5d-48be-b74c-be4b4cafc13e
  Name: node_update
  State: SUCCESS
  State info: null
  Updated at: '2018-04-18 06:33:51'
  Workflow name: tripleo.package_update.v1.update_nodes
  Workflow namespace: ''
- Created at: '2018-04-18 06:27:04'
  Execution ID: 1a6b67a9-dc85-4b0b-9afd-c3b9207e4d43
  ID: f7b19d74-b211-44b0-995a-b2305ccc7661
  Name: get_private_key
  State: SUCCESS
  State info: null
  Updated at: '2018-04-18 06:27:04'
  Workflow name: tripleo.package_update.v1.update_nodes
  Workflow namespace: ''
- Created at: '2018-04-18 06:33:51'
  Execution ID: 1a6b67a9-dc85-4b0b-9afd-c3b9207e4d43
  ID: 805eaac1-5e9f-4cb7-bb91-b21a41d37d6e
  Name: node_update_passed
  State: SUCCESS
  State info: null
  Updated at: '2018-04-18 06:33:51'
  Workflow name: tripleo.package_update.v1.update_nodes
  Workflow namespace: ''
- Created at: '2018-04-18 06:33:51'
  Execution ID: 1a6b67a9-dc85-4b0b-9afd-c3b9207e4d43
  ID: d497b0bb-17c9-41b0-aaa2-705fd5fc0d6b
  Name: notify_zaqar
  State: SUCCESS
  State info: null
  Updated at: '2018-04-18 06:33:52'
  Workflow name: tripleo.package_update.v1.update_nodes
  Workflow namespace: ''
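
To see where the time went in this workflow, the "Created at" and "Updated at" values in the task list can be compared. The following is a minimal sketch with the timestamps copied by hand from the output above; `TASKS` and `task_durations` are illustrative names and are not part of Mistral or python-tripleoclient.

```python
from datetime import datetime

# (task name, "Created at", "Updated at") copied from the task list above.
TASKS = [
    ("download_config", "2018-04-18 06:27:02", "2018-04-18 06:27:04"),
    ("node_update", "2018-04-18 06:27:04", "2018-04-18 06:33:51"),
    ("get_private_key", "2018-04-18 06:27:04", "2018-04-18 06:27:04"),
    ("node_update_passed", "2018-04-18 06:33:51", "2018-04-18 06:33:51"),
    ("notify_zaqar", "2018-04-18 06:33:51", "2018-04-18 06:33:52"),
]

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def task_durations(tasks):
    """Return {task name: whole seconds between creation and last update}."""
    durations = {}
    for name, created, updated in tasks:
        start = datetime.strptime(created, TIMESTAMP_FORMAT)
        end = datetime.strptime(updated, TIMESTAMP_FORMAT)
        durations[name] = int((end - start).total_seconds())
    return durations


if __name__ == "__main__":
    for name, seconds in task_durations(TASKS).items():
        print("{:<20} {:>4}s".format(name, seconds))
```

With these values, node_update accounts for 407 seconds of the run, and notify_zaqar was last updated at 06:33:52, one second before the execution itself was last updated. Every task in the workflow reports SUCCESS even though the client printed nothing further.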


Version-Release number of selected component (if applicable):
-------------------------------------------------------------
openstack-mistral-executor-6.0.1-0.20180319140929.eb59183.el7ost.noarch
puppet-mistral-12.3.1-0.20180221121107.c04206e.el7ost.noarch
python2-mistralclient-3.3.0-1.el7ost.noarch
openstack-mistral-common-6.0.1-0.20180319140929.eb59183.el7ost.noarch
openstack-mistral-engine-6.0.1-0.20180319140929.eb59183.el7ost.noarch
openstack-mistral-api-6.0.1-0.20180319140929.eb59183.el7ost.noarch
python2-mistral-lib-0.4.0-1.el7ost.noarch
python-mistral-6.0.1-0.20180319140929.eb59183.el7ost.noarch

ansible-tripleo-ipsec-8.1.1-0.20180308133440.8f5369a.el7ost.noarch
openstack-tripleo-validations-8.4.0-1.el7ost.noarch
openstack-tripleo-image-elements-8.0.0-2.el7ost.noarch
openstack-tripleo-puppet-elements-8.0.0-1.el7ost.noarch
openstack-tripleo-common-8.5.1-0.20180326153322.91f52e9.el7ost.noarch
openstack-tripleo-common-containers-8.5.1-0.20180326153322.91f52e9.el7ost.noarch
puppet-tripleo-8.3.2-0.20180327181746.el7ost.noarch
openstack-tripleo-heat-templates-8.0.2-0.20180327213846.el7ost.noarch
openstack-tripleo-ui-8.3.1-2.el7ost.noarch

python-tripleoclient-9.2.0-2.el7ost.noarch

How reproducible:
-----------------
Occasionally; the failure occurs on a random task/role/node.

Steps to Reproduce:
-------------------
1. Upgrade UC to 2018-04-10.2
2. Set up repos on OC and prepare containers
3. Run the major upgrade one playbook at a time for each role, e.g.:
    openstack overcloud upgrade run \
        --nodes Controller --playbook upgrade_steps_playbook.yaml
    openstack overcloud upgrade run \
        --nodes Controller --playbook deploy_steps_playbook.yaml
    openstack overcloud upgrade run \
        --nodes Controller --playbook post_upgrade_steps_playbook.yaml

    Then move to the next role (Ceph, Compute, etc.).
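
The sequence in step 3 can be written down as a small generator of command lines. This is only a sketch of the ordering described above: `PLAYBOOKS` and `upgrade_commands` are illustrative names, and the role names passed in are whatever the deployment actually uses (only Controller and Compute appear as `--nodes` values in this report).

```python
# Playbook order taken from step 3 above.
PLAYBOOKS = [
    "upgrade_steps_playbook.yaml",
    "deploy_steps_playbook.yaml",
    "post_upgrade_steps_playbook.yaml",
]


def upgrade_commands(roles):
    """Return one argv list per (role, playbook) pair, role by role."""
    commands = []
    for role in roles:
        for playbook in PLAYBOOKS:
            commands.append([
                "openstack", "overcloud", "upgrade", "run",
                "--nodes", role,
                "--playbook", playbook,
            ])
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them.
    for argv in upgrade_commands(["Controller", "Compute"]):
        print(" ".join(argv))
```

Each returned list could be handed to `subprocess.run` as-is; the sketch only prints them, so it is safe to run anywhere.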

Actual results:
---------------
The client might stop writing output while running upgrade-related tasks.

Expected results:
-----------------
The client's output is not interrupted, or the client fails with a clear message.
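
As an illustration of "fails with a clear message", the sketch below waits on a queue with a timeout instead of waiting forever. It is not the python-tripleoclient implementation and says nothing about the root cause of this bug: a standard-library `queue.Queue` stands in for the messaging queue, and the payload shape (`status`, `message`), `NoMessageTimeout`, and `drain_until_done` are all assumptions made for the example.

```python
import queue


class NoMessageTimeout(RuntimeError):
    """Raised when no message arrives within the allowed time."""


def drain_until_done(messages, timeout):
    """Yield payloads until one carries a terminal status.

    Raises NoMessageTimeout if nothing arrives for `timeout` seconds,
    so the caller gets an explicit error rather than a silent hang.
    """
    while True:
        try:
            payload = messages.get(timeout=timeout)
        except queue.Empty:
            raise NoMessageTimeout(
                "no message received for {} seconds; "
                "check the workflow execution state".format(timeout)
            )
        yield payload
        # Hypothetical terminal states for this example.
        if payload.get("status") in ("SUCCESS", "FAILED"):
            return


if __name__ == "__main__":
    q = queue.Queue()
    q.put({"status": "RUNNING", "message": "TASK [Start containers for step 5]"})
    try:
        for item in drain_until_done(q, timeout=0.5):
            print(item["message"])
    except NoMessageTimeout as exc:
        print("ERROR: {}".format(exc))
```

The demo puts a single non-terminal message on the queue, so it prints that message and then the timeout error instead of blocking.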

Additional info:
----------------
Virtual setup: 3 controllers + 2 computes + 3 Ceph nodes

Comment 2 Jon Schlueter 2018-04-25 13:33:08 UTC
Removing master patches since stable/queens patches have landed.

Comment 6 Lukas Bezdicka 2018-05-09 15:40:14 UTC
*** Bug 1571858 has been marked as a duplicate of this bug. ***

Comment 7 Lukas Bezdicka 2018-05-09 15:41:16 UTC
*** Bug 1572825 has been marked as a duplicate of this bug. ***

Comment 8 Raviv Bar-Tal 2018-05-10 11:04:23 UTC
*** Bug 1575620 has been marked as a duplicate of this bug. ***

Comment 10 Yurii Prokulevych 2018-05-23 10:48:59 UTC
Verified with:
- python-tripleoclient-9.2.1-9.el7ost.noarch
- openstack-tripleo-common-8.6.1-12.el7ost.noarch

Comment 12 errata-xmlrpc 2018-06-27 13:52:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086