Bug 1493131 - OSP11 -> OSP12 upgrade: post-upgrade undercloud upgrade fail: ERROR error running the validation groups ['post-upgrade']
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-common
Version: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: beta
Target Release: 12.0 (Pike)
Assignee: Toure Dunnon
QA Contact: Marius Cornea
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-09-19 12:43 UTC by Marius Cornea
Modified: 2018-02-05 19:15 UTC (History)

Fixed In Version: openstack-tripleo-common-7.6.3-0.20171022171807.f27b723.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-13 22:10:20 UTC
Target Upstream Version:


Attachments
manually run post-upgrade validations (16.62 KB, text/plain)
2017-10-04 12:54 UTC, Marios Andreou


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1716625 0 None None None 2017-10-05 08:41:01 UTC
OpenStack gerrit 503002 0 None MERGED Fix chown command in sudoers file 2020-07-24 07:42:52 UTC
OpenStack gerrit 509712 0 None MERGED Fix chown command in sudoers file 2020-07-24 07:42:52 UTC
Red Hat Product Errata RHEA-2017:3462 0 normal SHIPPED_LIVE Red Hat OpenStack Platform 12.0 Enhancement Advisory 2018-02-16 01:43:25 UTC

Description Marius Cornea 2017-09-19 12:43:08 UTC
Description of problem:

2017-09-19 08:40:24,295 INFO: Starting and waiting for validation groups ['post-upgrade'] 

2017-09-19 08:40:43,138 ERROR: ERROR error running the validation groups ['post-upgrade']   {"stderr": "\nWe trust you have received the usual lecture from the local System\nAdministrator. It usually boils down to these three things:\n\n    #1) Respect the privacy of others.\n    #2) Think before you type.\n    #3) With great power comes great responsibility.\n\nsudo: no tty present and no askpass program specified\n", "stdout": ""} {"stderr": "\nWe trust you have received the usual lecture from the local System\nAdministrator. It usually boils down to these three things:\n\n    #1) Respect the privacy of others.\n    #2) Think before you type.\n    #3) With great power comes great responsibility.\n\nsudo: no tty present and no askpass program specified\n", "stdout": ""} {"stderr": "\nWe trust you have received the usual lecture from the local System\nAdministrator. It usually boils down to these three things:\n\n    #1) Respect the privacy of others.\n    #2) Think before you type.\n    #3) With great power comes great responsibility.\n\nsudo: no tty present and no askpass program specified\n", "stdout": ""} Mistral execution ID: 6d083696-bbb4-45dd-8eb1-20cb67f4e881
2017-09-19 08:40:43,140 INFO: 


Version-Release number of selected component (if applicable):
[stack@undercloud-0 ~]$ rpm -qa | grep validation
openstack-tripleo-validations-7.3.1-0.20170907082220.efe8a72.el7ost.noarch
[stack@undercloud-0 ~]$ rpm -qa | grep instack-undercloud
instack-undercloud-7.4.1-0.20170912115418.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP11 undercloud 
2. Upgrade to OSP12

Actual results:
The upgrade completes, but the post-upgrade validations fail:

2017-09-19 08:40:24,295 INFO: Starting and waiting for validation groups ['post-upgrade'] 

2017-09-19 08:40:43,138 ERROR: ERROR error running the validation groups ['post-upgrade']   {"stderr": "\nWe trust you have received the usual lecture from the local System\nAdministrator. It usually boils down to these three things:\n\n    #1) Respect the privacy of others.\n    #2) Think before you type.\n    #3) With great power comes great responsibility.\n\nsudo: no tty present and no askpass program specified\n", "stdout": ""} {"stderr": "\nWe trust you have received the usual lecture from the local System\nAdministrator. It usually boils down to these three things:\n\n    #1) Respect the privacy of others.\n    #2) Think before you type.\n    #3) With great power comes great responsibility.\n\nsudo: no tty present and no askpass program specified\n", "stdout": ""} {"stderr": "\nWe trust you have received the usual lecture from the local System\nAdministrator. It usually boils down to these three things:\n\n    #1) Respect the privacy of others.\n    #2) Think before you type.\n    #3) With great power comes great responsibility.\n\nsudo: no tty present and no askpass program specified\n", "stdout": ""} Mistral execution ID: 6d083696-bbb4-45dd-8eb1-20cb67f4e881
2017-09-19 08:40:43,140 INFO: 
#############################################################################
Undercloud upgrade complete.

The file containing this installation's passwords is at
/home/stack/undercloud-passwords.conf.

There is also a stackrc file at /home/stack/stackrc.

These files are needed to interact with the OpenStack services, and should be
secured.

#############################################################################


Expected results:
The upgrade log is clean, with no errors.

Additional info:

Comment 1 Marios Andreou 2017-09-21 13:30:18 UTC
So there is this BZ https://bugzilla.redhat.com/show_bug.cgi?id=1020147 where there is discussion about whether having requiretty as the default provides more security or not.

This is not causing an error, i.e. the undercloud upgrade is completing OK. AFAICS the validation that is failing is https://github.com/openstack/tripleo-validations/blob/47c1c06562a6ac9e91cec146d1f09dbc056363ed/validations/undercloud-service-status.yaml#L14 . It is the only one of the three 'post-upgrade' validations that has a 'become' (the other two are openstack-endpoints.yaml and stack-health.yaml, if you grep 'post-upgrade' in tripleo-validations/validations/).
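The grep described above can be sketched in Python. This is only a rough illustration, assuming each validation is a YAML playbook that lists its groups as "- <group>" items and sets a top-level "become: true" when it needs sudo. The function names and the plain-text scan (instead of a real YAML parser) are assumptions made for this sketch, not part of tripleo-validations.

```python
import re
from pathlib import Path

# Matches a line such as "  become: true" or "become: yes".
BECOME_RE = re.compile(r"^\s*become:\s*(true|yes)\s*$", re.IGNORECASE | re.MULTILINE)


def uses_become(yaml_text):
    """Return True if the playbook text enables privilege escalation."""
    return bool(BECOME_RE.search(yaml_text))


def in_group(yaml_text, group):
    """Return True if the playbook text lists the group as a '- group' item."""
    pattern = re.compile(r"^\s*-\s*" + re.escape(group) + r"\s*$", re.MULTILINE)
    return bool(pattern.search(yaml_text))


def validations_using_become(validations, group):
    """Given {filename: text}, return the sorted filenames in `group` that use become."""
    return sorted(
        name
        for name, text in validations.items()
        if in_group(text, group) and uses_become(text)
    )


def load_validations(directory):
    """Read every *.yaml file in a directory into a {filename: text} mapping."""
    return {path.name: path.read_text() for path in Path(directory).glob("*.yaml")}
```

Pointing load_validations() at a checkout of tripleo-validations/validations/ and asking for the 'post-upgrade' group should list only the validations that need sudo, which is where the "no tty present" error can surface.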

We can:
    --> ignore this completely,
    --> disable that one validation,
    --> disable all validations with CONF.enable_validations,
    --> detect and exit if we find requiretty in sudoers (yuk, no, let's not?)

Marking as triaged for now; will discuss this on the upgrades scrum today.

Comment 6 Marios Andreou 2017-10-04 12:54:51 UTC
Created attachment 1334222 [details]
manually run post-upgrade validations

Comment 7 Marios Andreou 2017-10-04 13:03:29 UTC
o/ thanks mcornea for making the box available. After much prodding I found this https://review.openstack.org/#/c/486147/1/sudoers which appears to be causing this issue. After manually reverting that in /etc/sudoers.d/tripleo-common I could run the validations again.

The error "sudo: no tty present and no askpass program specified" happens when trying to run any validation. For example, I could recreate it by running the validations manually as described in [1] (openstack workflow execution create tripleo.validations.v1.run_groups); the output is attached to this BZ [2]. The same was seen when manually running the pre-deployment validations, so I suspect it affects all validations.
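To check quickly whether a given upgrade log shows this sudo failure, something like the following Python sketch could be used. It assumes the ERROR line has the shape quoted in the description: a prefix, then one JSON object per validation with "stderr" and "stdout" keys, then the Mistral execution ID. The helper names are made up for illustration.

```python
import json

SUDO_NO_TTY = "sudo: no tty present and no askpass program specified"


def extract_results(log_line):
    """Return every JSON object embedded in a single log line, in order."""
    decoder = json.JSONDecoder()
    results = []
    pos = log_line.find("{")
    while pos != -1:
        try:
            # raw_decode parses one JSON value starting at pos and
            # reports the index where that value ends.
            obj, end = decoder.raw_decode(log_line, pos)
        except json.JSONDecodeError:
            # Not valid JSON at this brace; try the next one.
            pos = log_line.find("{", pos + 1)
            continue
        if isinstance(obj, dict):
            results.append(obj)
        pos = log_line.find("{", end)
    return results


def sudo_blocked(results):
    """For each validation result, report whether stderr shows the sudo error."""
    return [SUDO_NO_TTY in result.get("stderr", "") for result in results]
```

If every entry returned by sudo_blocked() is True, the validations never actually ran, which points at the sudoers configuration rather than at any individual validation.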

After changing /etc/sudoers.d/tripleo-common and reverting the fix from /#/c/486147/ I could run these validations again. 

Adding needinfo on TC for workflows wdyt... are we going to do a straight revert or does someone have time to check this more carefully and see if we can preserve some of the benefits of the fix landed in /#/c/486147/ . This is very time sensitive so I will propose a revert tomorrow if I don't hear otherwise here,

thanks

[1] https://docs.openstack.org/tripleo-docs/latest/install/validations/validations.html#running-a-group-of-validations
[2] https://bugzilla.redhat.com/attachment.cgi?id=1334222

Comment 8 Brad P. Crochet 2017-10-04 13:51:32 UTC
The patch in question was made in response to a CVE. So, it would be better if we can fix as opposed to just an outright revert. Reassigning to tdunnon for a fix.

Comment 9 Carlos Camacho 2017-10-04 13:54:33 UTC
Hey Marios,

I have seen this issue before when deploying.

I filed this a while ago: https://bugs.launchpad.net/tripleo/+bug/1714917

Maybe it is related; we could disable pipelining mode when installing Ansible.

Comment 10 Brad P. Crochet 2017-10-04 20:39:23 UTC
@ccamacho Thanks for the heads-up on that. I don't think it is related though. I think we had a mismatch between the sudoers file and the code. I or @tdunnon will be posting a patch shortly.

Comment 11 Toure Dunnon 2017-10-04 21:07:02 UTC
Fix has been posted.

Comment 12 Marios Andreou 2017-10-05 08:31:57 UTC
Brad & Toure, thanks for looking into it so quickly. It looks like this is fixed by https://review.openstack.org/#/c/503002/ in which case let's add that one to the tracker (we'll need to track it into stable/pike with this BZ). I added some comments to your review too; not sure we will need that one, but wdyt?

Comment 13 Brad P. Crochet 2017-10-05 12:46:25 UTC
https://review.openstack.org/#/c/509639/ still needs to be applied. The patch you referred to lessens the security, which I'm not a fan of. I would much rather see this fixed "right" than "quick".

Comment 16 Dan Trainor 2017-11-08 00:50:48 UTC
Using an OSP11 Undercloud with no Overcloud deployed, I attempted to verify that an Undercloud upgrade from 11 -> 12 using upstream documentation[0] with Pike-specific notes passes.

The post-upgrade validation failure as reported no longer occurs, however two additional failures do occur:

1) sudo deprecation notice on DEFAULT_SUDO_FLAGS
2) The "Check stack resource statuses" task fails because it assumes there's an Overcloud named "overcloud" that exists

2017-11-07 19:40:53,586 INFO: Starting and waiting for validation groups ['post-upgrade'] 
2017-11-07 19:41:15,261 ERROR: ERROR error running the validation groups ['post-upgrade']   {"stderr": "[DEPRECATION WARNING]: DEFAULT_SUDO_FLAGS option, In favor of become which is a\n generic framework . This feature will be removed in version 2.8. Deprecation \nwarnings can be disabled by setting deprecation_warnings=False in ansible.cfg.\n", "stdout": "Task 'fail' failed:\nHost: localhost\nMessage: The `HorizonPublic` endpoint is not defined in the `EndpointMap` of the deployed stack. This means Horizon may not have been deployed correctly.\n\nFailure! The validation failed for all hosts:\n* localhost\n"} {"stderr": "[DEPRECATION WARNING]: DEFAULT_SUDO_FLAGS option, In favor of become which is a\n generic framework . This feature will be removed in version 2.8. Deprecation \nwarnings can be disabled by setting deprecation_warnings=False in ansible.cfg.\n", "stdout": "Task 'Check stack resource statuses' failed:\nHost: localhost\nMessage: An unhandled exception occurred while running the lookup plugin 'stack_resources'. Error was a <class 'heatclient.exc.HTTPNotFound'>, original message: ERROR: The Stack (overcloud) could not be found.\n\nFailure! The validation failed for all hosts:\n* localhost\n"} Mistral execution ID: 85b87bf9-55c8-4a0a-807e-ef21d6eb2f31
2017-11-07 19:41:15,263 INFO: 
#############################################################################
Undercloud upgrade complete.

The file containing this installation's passwords is at
/home/stack/undercloud-passwords.conf.

There is also a stackrc file at /home/stack/stackrc.

These files are needed to interact with the OpenStack services, and should be
secured.

#############################################################################




---
[0] https://docs.openstack.org/tripleo-docs/latest/install/post_deployment/upgrade.html

Comment 17 Marios Andreou 2017-11-08 11:06:22 UTC
Hi Dan, thanks for checking that. There is one more failure in that output, for the Horizon public endpoint ("The `HorizonPublic` endpoint is not defined in the `EndpointMap` of the deployed stack"). So the validations fail, but during the review process we decided to let the upgrade continue when any of them fail, since these validations are the very last thing to happen during the upgrade run. In this particular case I think those validations would pass if you had deployed an overcloud.


The sudo deprecation notice is a new one that I hadn't noticed before. We should look into it at some point and file a new BZ for it, since it sounds like it will hit us with Ansible 2.8. For this particular BZ, though, the fact that the validations are now running, albeit failing since you don't have an overcloud, fixes the reported issue.
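To tell "the validation ran and failed" apart from "the validation could not run at all", the failed task and its message can be pulled out of the "stdout" field quoted in comment 16. The following is a minimal sketch, assuming stdout keeps the "Task '...' failed: / Host: / Message:" layout seen in that output; the regular expression and function name are illustrative only.

```python
import re

# Layout assumed from the stdout quoted in this bug:
#   Task '<name>' failed:
#   Host: <host>
#   Message: <single-line message>
FAILED_TASK_RE = re.compile(
    r"Task '(?P<task>[^']+)' failed:\n"
    r"Host: (?P<host>[^\n]+)\n"
    r"Message: (?P<message>[^\n]+)"
)


def failed_tasks(stdout):
    """Return a list of {'task', 'host', 'message'} dicts found in stdout."""
    return [match.groupdict() for match in FAILED_TASK_RE.finditer(stdout)]
```

A non-empty result means Ansible reached the task itself, so sudo is no longer what is getting in the way; an empty stdout alongside the sudo error in stderr would indicate the original problem.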

Comment 18 Dan Trainor 2017-11-08 16:10:58 UTC
Excellent, thanks for the feedback Marios. I agree with your comments and will file bugs for HorizonPublic and the sudo deprecation notice.

Comment 22 errata-xmlrpc 2017-12-13 22:10:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462

