Bug 1326111 - rhel-osp-director: 7.3->8.0 upgrade, during major-upgrade-pacemaker-converge.yaml step: Execution of '/usr/bin/openstack project list --quiet --format csv --long' returned 1: An unexpected error prevented the server from fulfilling your request
Summary: rhel-osp-director: 7.3->8.0 upgrade, during major-upgrade-pacemaker-converge....
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 8.0 (Liberty)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: async
Target Release: 8.0 (Liberty)
Assignee: Marios Andreou
QA Contact: Alexander Chuzhoy
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-04-11 20:54 UTC by Alexander Chuzhoy
Modified: 2016-12-29 16:53 UTC
CC: 12 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-09-02 12:42:36 UTC
Target Upstream Version:


Attachments
keystone log interesting parts (42.35 KB, text/plain), 2016-04-20 14:22 UTC, Marios Andreou
journalctl full trace from os-collect-config error during upgrade (28.83 KB, text/plain), 2016-04-20 14:36 UTC, Marios Andreou

Description Alexander Chuzhoy 2016-04-11 20:54:48 UTC
rhel-osp-director: 7.3->8.0 upgrade, during major-upgrade-pacemaker-converge.yaml step:  Execution of '/usr/bin/openstack project list --quiet --format csv --long' returned 1: An unexpected error prevented the server from fulfilling your request


Environment:
openstack-tripleo-heat-templates-0.8.14-7.el7ost.noarch
openstack-tripleo-heat-templates-kilo-0.8.14-7.el7ost.noarch
openstack-puppet-modules-7.0.17-1.el7ost.noarch
instack-undercloud-2.2.7-4.el7ost.noarch



Steps to reproduce:
1. Deploy 7.3 and populate.
2. Upgrade to 8.0 (via sat5).

Result:
During the step with the /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-pacemaker-converge.yaml template, the upgrade fails:


Warning: Scope(Haproxy::Config[haproxy]): haproxy: The $merge_options parameter will default to true in the next major release. Please review the documentation regarding the implications.                          
Error: Could not prefetch keystone_tenant provider 'openstack': Execution of '/usr/bin/openstack project list --quiet --format csv --long' returned 1: An unexpected error prevented the server from fulfilling your request. (HTTP 500) (Request-ID: req-842c4377-e60f-40fa-a079-9a9892407472)                                                                                                                                           
Error: Execution of '/usr/bin/openstack project create --format shell services --enable --description Tenant for the openstack services --domain Default' returned 1: Could not find resource Default                
Error: /Stage[main]/Keystone::Roles::Admin/Keystone_tenant[services]/ensure: change from absent to present failed: Execution of '/usr/bin/openstack project create --format shell services --enable --description Tenant for the openstack services --domain Default' returned 1: Could not find resource Default                                                                                                                         
Error: Could not prefetch keystone_role provider 'openstack': Execution of '/usr/bin/openstack role list --quiet --format csv' returned 1: An unexpected error prevented the server from fulfilling your request. (HTTP 500) (Request-ID: req-c00fe00b-b597-4538-917f-681dee425550)                                                                                                                                                       
Error: Execution of '/usr/bin/openstack role create --format shell admin' returned 1: An unexpected error prevented the server from fulfilling your request. (HTTP 500) (Request-ID: req-c0cc647e-2113-4e67-9147-fc401c712993)                                                                                                                                                                                                            
Error: /Stage[main]/Keystone::Roles::Admin/Keystone_role[admin]/ensure: change from absent to present failed: Execution of '/usr/bin/openstack role create --format shell admin' returned 1: An unexpected error prevented the server from fulfilling your request. (HTTP 500) (Request-ID: req-c0cc647e-2113-4e67-9147-fc401c712993)                                                                                                     
Error: Execution of '/usr/bin/openstack project create --format shell openstack --enable --description admin tenant --domain Default' returned 1: Could not find resource Default
Error: /Stage[main]/Keystone::Roles::Admin/Keystone_tenant[openstack]/ensure: change from absent to present failed: Execution of '/usr/bin/openstack project create --format shell openstack --enable --description admin tenant --domain Default' returned 1: Could not find resource Default
Error: Could not prefetch keystone_user provider 'openstack': Execution of '/usr/bin/openstack user list --quiet --format csv --long' returned 1: An unexpected error prevented the server from fulfilling your request. (HTTP 500) (Request-ID: req-e6a2a220-c4ea-42f0-99de-5a9128e1e2f0)
Error: Execution of '/usr/bin/openstack user create --format shell admin --enable --password fdcjuGcqMTcbBDewsZrHjgvps --email root@localhost --domain Default' returned 1: Could not find resource Default
Error: /Stage[main]/Keystone::Roles::Admin/Keystone_user[admin]/ensure: change from absent to present failed: Execution of '/usr/bin/openstack user create --format shell admin --enable --password fdcjuGcqMTcbBDewsZrHjgvps --email root@localhost --domain Default' returned 1: Could not find resource Default
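The pattern in this dump is one HTTP 500 from keystone followed by cascading "Could not find resource Default" failures (once the server errors, the client cannot look up the Default domain either). A minimal triage sketch for telling the cases apart is below; the curl usage and the $KEYSTONE_VIP variable are hypothetical, not part of the upgrade procedure:

```shell
# Sketch: map an HTTP status code from keystone to a triage hint,
# matching the failure modes seen in the log above.
keystone_status_hint() {
    case "$1" in
        2??) echo "ok" ;;
        401) echo "auth failed (check credentials/token)" ;;
        500) echo "server error (check keystone.log, e.g. DB access denied)" ;;
        *)   echo "unexpected status $1" ;;
    esac
}

# Hypothetical usage against the keystone endpoint on the VIP:
#   code=$(curl -s -o /dev/null -w '%{http_code}' "http://$KEYSTONE_VIP:5000/v3")
#   keystone_status_hint "$code"
```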

Comment 4 Marios Andreou 2016-04-12 14:30:47 UTC
Update, as requested earlier on IRC: thrash and I spent some time poking at the environment here. There is indeed an error in keystone.log on control0:

2016-04-12 11:51:56.760 25719 ERROR keystone.common.wsgi   File "/usr/lib64/python2.7/site-packages/MySQLdb/__init__.py", line 81, in Connect
2016-04-12 11:51:56.760 25719 ERROR keystone.common.wsgi     return Connection(*args, **kwargs)
2016-04-12 11:51:56.760 25719 ERROR keystone.common.wsgi   File "/usr/lib64/python2.7/site-packages/MySQLdb/connections.py", line 187, in __init__
2016-04-12 11:51:56.760 25719 ERROR keystone.common.wsgi     super(Connection, self).__init__(*args, **kwargs2)
2016-04-12 11:51:56.760 25719 ERROR keystone.common.wsgi OperationalError: (_mysql_exceptions.OperationalError) (1045, "Access denied for user 'keystone'@'192.168.100.12' (using password: YES)")
2016-04-12 11:51:56.760 25719 ERROR keystone.common.wsgi 
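The 1045 "Access denied" above means keystone is presenting a DB password MySQL does not accept. A quick triage sketch (not part of the upgrade procedure; file paths and hieradata key names are environment-specific assumptions) is to extract the password keystone actually uses from its database connection URL and compare it against what Puppet laid down:

```shell
# Sketch: pull the password component out of a SQLAlchemy-style DB URL,
# e.g. mysql://keystone:SECRET@192.168.100.12/keystone -> SECRET
db_pass_from_url() {
    echo "$1" | sed -n 's|^[a-z+]*://[^:]*:\([^@]*\)@.*|\1|p'
}

# Hypothetical usage on a controller (key/path names may differ by version):
#   conn=$(sudo awk -F' *= *' '/^connection/ {print $2; exit}' /etc/keystone/keystone.conf)
#   db_pass_from_url "$conn"
#   sudo grep -i 'keystone.*password' /etc/puppet/hieradata/controller.yaml
```

A mismatch between the two values would point at the password-regeneration problem discussed in the later comments.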

FWIW I used the same versions today as in the description above: 

[stack@instack ~]$ rpm -qa | grep "tripleo-heat-templates\|puppet-modules\|instack-undercloud"
openstack-puppet-modules-7.0.17-1.el7ost.noarch
instack-undercloud-2.2.7-4.el7ost.noarch
openstack-tripleo-heat-templates-0.8.14-7.el7ost.noarch
openstack-tripleo-heat-templates-kilo-0.8.14-7.el7ost.noarch

and completed the upgrade without hitting this (poodle, virt, 3 control / 1 compute, net-iso). We are trying to reproduce on the same environment at the moment.

Comment 5 Alexander Chuzhoy 2016-04-12 20:16:43 UTC
An update was attempted with:
openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates \
  -e /usr/share/openstack-tripleo-heat-templates/overcloud-resource-registry-puppet.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml \
  -e /home/stack/network-environment.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-pacemaker-converge.yaml \
  -e /home/stack/comp.yaml


Failed:
ERROR: Authentication failed: Authentication required



Checking the reason, I see the infamous:
Warning: Scope(Haproxy::Config[haproxy]): haproxy: The $merge_options parameter will default to true in the next major release. Please review the documentation regarding the implications.
Error: Could not prefetch keystone_tenant provider 'openstack': Execution of '/usr/bin/openstack project list --quiet --format csv --long' returned 1: An unexpected error prevented the server from fulfilling your request. (HTTP 500) (Request-ID: req-4863c7ed-00d5-4f7d-a586-73a0a22f0471)
Error: Execution of '/usr/bin/openstack project create --format shell services --enable --description Tenant for the openstack services --domain Default' returned 1: Could not find resource Default
Error: /Stage[main]/Keystone::Roles::Admin/Keystone_tenant[services]/ensure: change from absent to present failed: Execution of '/usr/bin/openstack project create --format shell services --enable --description Tenant for the openstack services --domain Default' returned 1: Could not find resource Default
Error: Could not prefetch keystone_role provider 'openstack': Execution of '/usr/bin/openstack role list --quiet --format csv' returned 1: An unexpected error prevented the server from fulfilling your request. (HTTP 500) (Request-ID: req-6fb5f53a-85b7-48a9-b2b7-9b264ef73cb5)
Error: Execution of '/usr/bin/openstack role create --format shell admin' returned 1: An unexpected error prevented the server from fulfilling your request. (HTTP 500) (Request-ID: req-c4d835a0-ec16-4bd1-a197-273f8ca303c2)
Error: /Stage[main]/Keystone::Roles::Admin/Keystone_role[admin]/ensure: change from absent to present failed: Execution of '/usr/bin/openstack role create --format shell admin' returned 1: An unexpected error prevented the server from fulfilling your request. (HTTP 500) (Request-ID: req-c4d835a0-ec16-4bd1-a197-273f8ca303c2)
Error: Execution of '/usr/bin/openstack project create --format shell openstack --enable --description admin tenant --domain Default' returned 1: Could not find resource Default
Error: /Stage[main]/Keystone::Roles::Admin/Keystone_tenant[openstack]/ensure: change from absent to present failed: Execution of '/usr/bin/openstack project create --format shell openstack --enable --description admin tenant --domain Default' returned 1: Could not find resource Default
Error: Could not prefetch keystone_user provider 'openstack': Execution of '/usr/bin/openstack user list --quiet --format csv --long' returned 1: An unexpected error prevented the server from fulfilling your request. (HTTP 500) (Request-ID: req-95a6f3f6-010e-4050-aa3c-5156d721f222)
Error: Execution of '/usr/bin/openstack user create --format shell admin --enable --password yAseqzgdaBdwa2nhpT4KrgRjF --email root@localhost --domain Default' returned 1: Could not find resource Default
Error: /Stage[main]/Keystone::Roles::Admin/Keystone_user[admin]/ensure: change from absent to present failed: Execution of '/usr/bin/openstack user create --format shell admin --enable --password yAseqzgdaBdwa2nhpT4KrgRjF --email root@localhost --domain Default' returned 1: Could not find resource Default
Warning: /Stage[main]/Keystone::Roles::Admin/Keystone_user_role[admin@openstack]: Skipping because of failed dependencies

Comment 6 Marios Andreou 2016-04-13 05:37:43 UTC
(In reply to Alexander Chuzhoy from comment #5)
> An updated attempted with:
>  openstack overcloud deploy  --templates
> /usr/share/openstack-tripleo-heat-templates -e  
> /usr/share/openstack-tripleo-heat-templates/overcloud-resource-registry-pup
> pet.yaml  -e 
> /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.
> yaml  -e 
> /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.
> yaml  -e  /usr/sh
> are/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.
> yaml -e
> /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.
> yaml -e /home/stack/network-e
> nvironment.yaml -e
> /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-
> pacemaker-converge.yaml -e /home/stack/comp.yaml
> 
> 
> Failed:
> ERROR: Authentication failed: Authentication required
> 
> 

To be clear, this ^^^ was on the same environment where the issue was first seen, right? If I understood correctly, it didn't reproduce on another box with the same setup (sat5, packages, etc.).

Comment 7 Mike Burns 2016-04-18 13:09:20 UTC
Sasha, is this reproducible anywhere else? Can you try, and if not, do you object to closing this out?

Comment 8 Alexander Chuzhoy 2016-04-19 15:59:17 UTC
Reproduced.

Comment 9 Graeme Gillies 2016-04-19 22:14:12 UTC
I reproduced this as well. Looking at the boxes, it seems like the database connection is using the wrong VIP (it looks like the VIPs change as part of the upgrade?).

Comment 11 Marios Andreou 2016-04-20 14:22:46 UTC
Created attachment 1149107 [details]
keystone log interesting parts

Comment 12 Marios Andreou 2016-04-20 14:32:57 UTC
Still not clear why this is happening, but some update/debug info below. @ggillies I don't think what you're seeing is the same thing, since we don't expect the VIPs to move on upgrade, and they don't in the case of the env for this bug.

From director's side it fails exactly at https://code.engineering.redhat.com/gerrit/#/c/67039/1/puppet/manifests/overcloud_controller_pacemaker.pp, which is where the keystone setup happens. It's not clear what is specific about this environment (sat5 seems to be the common denominator for repro, @sasha, right?) to induce the error, which looks to be an authentication error. I think the relevant part in the keystone log is:

2016-04-19 15:37:56.039 13537 ERROR keystone.common.wsgi [req-964871a5-60b2-4974-86e0-7489b8ea98d3 - - - - -] (_mysql_exceptions.OperationalError) (1045, "Access denied for user 'keystone'@'192.168.100.14' (using password: YES)")

A fuller trace with "_mysql_exceptions.OperationalError" is in the attachment (https://bugzilla.redhat.com/attachment.cgi?id=1149107).

I see the versions of the keystone* packages on the controller match what I have in my just-upgraded setup. I wonder if there is a problem upgrading from a specific version of keystone. @sasha, can you check what the just-deployed overcloud has for packages (rpm -qa would be great) so we can see if that is the difference?

On the environment I also see a difference in tripleo-overcloud-passwords (it is missing a couple of entries I have in my env, like OVERCLOUD_RABBITMQ_PASSWORD or HAPROXY_STATS_PASSWORD), but the undercloud there has the same version of tripleoclient as mine, so I don't know what causes the discrepancy. What's more, the rabbit password seems to be configured correctly on ctrl0 in /etc/rabbitmq/rabbitmq.config ("{default_pass, <<"TnBhNMb9wuuWJrqcsk3fXyTcq">>}"), and in the trace at https://bugzilla.redhat.com/attachment.cgi?id=1149107, just before the error, keystone connects to rabbitmq fine: "2016-04-19 15:05:56.190 13540 INFO oslo.messaging._drivers.impl_rabbit [req-2e35e0da-668f-47d4-8114-b48c408dd9e0 - - - - -] Connected to AMQP server on 192.168.100.14:5672".
Sasha, do you know why tripleo-overcloud-passwords is different here? Did you change it manually?

Sasha, were you able to recover the cluster (in particular keystone) last time, or did you end up reprovisioning?


thanks, marios

Comment 13 Marios Andreou 2016-04-20 14:36:24 UTC
Created attachment 1149113 [details]
journalctl full trace from os-collect-config error during upgrade

Comment 14 Alexander Chuzhoy 2016-04-20 20:18:18 UTC
The issue is intermittent as I was able to upgrade with sat5.

Comment 15 Graeme Gillies 2016-04-20 22:24:33 UTC
(In reply to marios from comment #12)
> Still not clear why this is happening yet, but some update/debug info below.
> @ggillies I don't think what you're seeing is the same thing since we don't
> expect the vips to move on upgrade and they don't in the case of the env for
> this bug.
> 
> From director's side it fails exactly at
> https://code.engineering.redhat.com/gerrit/#/c/67039/1/puppet/manifests/
> overcloud_controller_pacemaker.pp and that's where we get the keystone
> setup. It's not clear what is specific about this environment (sat5 seems to
> be the common denominator for repro @sasha right?) to induce the error,
> which looks to be an authorization error. I think the relevant part in the
> keystone log is like:
> 
> 2016-04-19 15:37:56.039 13537 ERROR keystone.common.wsgi
> [req-964871a5-60b2-4974-86e0-7489b8ea98d3 - - - - -]
> (_mysql_exceptions.OperationalError) (1045, "Access denied for user
> 'keystone'@'192.168.100.14' (using password: YES)")
> 
> fuller trace with "(_mysql_exceptions.OperationalError" for that is in the
> attachment https://bugzilla.redhat.com/attachment.cgi?id=1149107  )
> 
> I see the versions of keystone* packages on the controller match what I have
> in my just upgraded setup. I wonder if there is a problem upgrading from a
> specific version of keystone. @sasha can you check what the just deployed
> overcloud has for packages ... (rpm -qa would be great) so we can see if
> that is the difference.
> 
> On the environment I also see a difference in the
> tripleo-overcloud-passwords (missing a couple of entries I have in my env,
> like OVERCLOUD_RABBITMQ_PASSWORD or HAPROXY_STATS_PASSWORD ) but the
> undercluod there has the same version tripleoclient as I have so I don't
> know what causes that discrepancy. what's more the rabbit pass seems to be
> configured correctly in ctrl0 /etc/rabbitmq/rabbitmq.config "{default_pass,
> <<"TnBhNMb9wuuWJrqcsk3fXyTcq">>}" and if you see the trace at
> https://bugzilla.redhat.com/attachment.cgi?id=1149107 from keystone log just
> before the error we see it connecting to rabbitmq fine like "2016-04-19
> 15:05:56.190 13540 INFO oslo.messaging._drivers.impl_rabbit
> [req-2e35e0da-668f-47d4-8114-b48c408dd9e0 - - - - -] Connected to AMQP
> server on 192.168.100.14:5672".
> Sasha do you know why the tripleo-overcloud-passwords is different here did
> you change it manually?
> 
> Sasha were you able to  recover the cluster (in particular keystone) last
> time or end up reprovisioning?  
> 
> 
> thanks, marios

Hi Marios,

Which environment are you talking about here? In the environment where we reproduced this, we definitely have OVERCLOUD_RABBITMQ_PASSWORD set in our tripleo-overcloud-passwords file (just checked). I seem to be able to reproduce this issue 100% of the time by doing the following:

1) Do a new overcloud deploy with a template change that causes it to fail.
2) Fix the template error and do another overcloud deploy; this will now hit the error above.

Likewise, doing an upgrade first with an error and then attempting to continue with the error fixed causes the issue as well.

Regards,

Graeme

Comment 16 Marios Andreou 2016-04-21 07:22:20 UTC
Hi Graeme, thanks for the extra info. Sorry, indeed I should have specified: I was talking about the box on which Sasha hit this issue (IRC me for login details if you want to have a look/compare).

For Sasha, the common denominator for repro seems to be simply 'upgrade 7.3 to 8 with sat5', and even then my understanding is it isn't 100% of the time.

So in your case, you deploy an overcloud 'with a template error' (are you just trying to induce an arbitrary error for testing, for example?). At which point does the original deploy fail? I mean, do the services come up/start? Is the overcloud functional at that point? You mentioned VIPs having moved/been recreated in comment #9, so it must have gotten as far as the puppet manifest/controller post-config if they were created on deploy and then moved on the update?

After it fails, you fix (or not) the template issue and update the original failed stack deployment, at which point you see the "Execution of '/usr/bin/openstack project list --quiet --format csv --long' returned 1" keystone issue. Did you try to recover the original setup before running the new update? In general, for the upgrades workflow we recommend that the pcs cluster is at least recovered (not in maintenance mode, services running) before re-running a failed upgrade step (stack update). I mention upgrades because earlier, in comment #10, you said this was part of upgrades testing. Which repos are you using for your upgrades testing, btw: puddle/poodle/subscription? Did you also hit this following the upgrades workflow, or only with the repro from comment #15?

thanks, marios

Comment 17 Marios Andreou 2016-04-21 08:28:04 UTC
@sasha I see a discrepancy in the HEAT_STACK_DOMAIN_PASSWORD of the environment you gave me access to for this bug. Do you have any idea why this would be (did the tripleo-overcloud-passwords file change during one of the earlier upgrade steps)?


from tripleo-overcloud-passwords:
OVERCLOUD_HEAT_STACK_DOMAIN_PASSWORD=endkthAaPEc43vQEN8YHhUUBw

[stack@instack ~]$ for i in $(nova list|grep controller|grep ctlplane|awk -F' ' '{ print $12 }'|awk -F'=' '{ print $2 }'); do ssh heat-admin@$i 'echo ********; hostname ; echo ********* ; sudo grep -rni ".*domain.*password" /etc/*; echo ""' ; done
********
overcloud-controller-0.localdomain
*********
grep: /etc/extlinux.conf: No such file or directory
/etc/heat/heat.conf:180:#stack_domain_admin_password = <None>
/etc/heat/heat.conf:181:stack_domain_admin_password = endkthAaPEc43vQEN8YHhUUBw
/etc/heat/heat.conf.rpmnew:184:#stack_domain_admin_password = <None>
/etc/libvirt/qemu.conf:134:# per-domain XML config does not already provide a password. To
/etc/puppet/hieradata/controller.yaml:352:heat::keystone::domain::domain_password: Pfu26vnCsgvyGBwGuPqAWrFpz
/etc/puppet/hieradata/controller.yaml:360:heat_stack_domain_admin_password: Pfu26vnCsgvyGBwGuPqAWrFpz

********
overcloud-controller-1.localdomain
*********
grep: /etc/extlinux.conf: No such file or directory
/etc/heat/heat.conf:180:#stack_domain_admin_password = <None>
/etc/heat/heat.conf:181:stack_domain_admin_password = Pfu26vnCsgvyGBwGuPqAWrFpz
/etc/heat/heat.conf.rpmnew:184:#stack_domain_admin_password = <None>
/etc/libvirt/qemu.conf:134:# per-domain XML config does not already provide a password. To
/etc/puppet/hieradata/controller.yaml:352:heat::keystone::domain::domain_password: Pfu26vnCsgvyGBwGuPqAWrFpz
/etc/puppet/hieradata/controller.yaml:360:heat_stack_domain_admin_password: Pfu26vnCsgvyGBwGuPqAWrFpz

********
overcloud-controller-2.localdomain
*********
grep: /etc/extlinux.conf: No such file or directory
/etc/heat/heat.conf:180:#stack_domain_admin_password = <None>
/etc/heat/heat.conf:181:stack_domain_admin_password = Pfu26vnCsgvyGBwGuPqAWrFpz
/etc/heat/heat.conf.rpmnew:184:#stack_domain_admin_password = <None>
/etc/libvirt/qemu.conf:134:# per-domain XML config does not already provide a password. To
/etc/puppet/hieradata/controller.yaml:352:heat::keystone::domain::domain_password: Pfu26vnCsgvyGBwGuPqAWrFpz
/etc/puppet/hieradata/controller.yaml:360:heat_stack_domain_admin_password: Pfu26vnCsgvyGBwGuPqAWrFpz

[stack@instack ~]$

Comment 18 Marios Andreou 2016-04-21 15:23:14 UTC
(In reply to marios from comment #17)
> @sasha I see a discrepancy in the HEAT_STACK_DOMAIN_PASSWORD of the
> environment you gave me access to for this bug - do you have any idea why
> this would be (did the tripleo-overcloud-passwords file change from one of
> the earlier upgrades steps?). 

> 
> from tripleo-overcloud-passwords:
> OVERCLOUD_HEAT_STACK_DOMAIN_PASSWORD=endkthAaPEc43vQEN8YHhUUBw

In fact there are more (thanks gfidente: there was a bug last week related to the tripleo-overcloud-passwords file being regenerated when the deploy was run from a different directory?). Nova, as an example:


[stack@instack ~]$ cat tripleo-overcloud-passwords 
OVERCLOUD_NOVA_PASSWORD=XcnhRNUfsqUfWP39P4qgGVshQ

but a different password is configured for nova api:

[stack@instack ~]$ for i in $(nova list|grep controller|grep ctlplane|awk -F' ' '{ print $12 }'|awk -F'=' '{ print $2 }'); do ssh heat-admin@$i 'echo ********; hostname ; echo ********* ; sudo grep -rni ".*nova.*password" /etc/*; echo ""' ; done
********
overcloud-controller-0.localdomain
*********
grep: /etc/extlinux.conf: No such file or directory
/etc/neutron/neutron.conf:402:# nova_admin_password =
/etc/neutron/neutron.conf.rpmnew:384:# nova_admin_password =
/etc/puppet/hieradata/controller.yaml:465:nova::api::admin_password: s7ANCCd8MPXTxcVC4ApNPN8zX
/etc/puppet/hieradata/controller.yaml:477:nova::db::mysql::password: s7ANCCd8MPXTxcVC4ApNPN8zX
/etc/puppet/hieradata/controller.yaml:481:nova::network::neutron::neutron_admin_password: 48DumaJDkzwV88wAD8QhFwPVt
/etc/puppet/hieradata/controller.yaml:483:nova::rabbit_password: TnBhNMb9wuuWJrqcsk3fXyTcq

********
overcloud-controller-1.localdomain
*********
grep: /etc/extlinux.conf: No such file or directory
/etc/neutron/neutron.conf:402:# nova_admin_password =
/etc/neutron/neutron.conf.rpmnew:384:# nova_admin_password =
/etc/puppet/hieradata/controller.yaml:465:nova::api::admin_password: s7ANCCd8MPXTxcVC4ApNPN8zX
/etc/puppet/hieradata/controller.yaml:477:nova::db::mysql::password: s7ANCCd8MPXTxcVC4ApNPN8zX
/etc/puppet/hieradata/controller.yaml:481:nova::network::neutron::neutron_admin_password: 48DumaJDkzwV88wAD8QhFwPVt
/etc/puppet/hieradata/controller.yaml:483:nova::rabbit_password: TnBhNMb9wuuWJrqcsk3fXyTcq

********
overcloud-controller-2.localdomain
*********
grep: /etc/extlinux.conf: No such file or directory
/etc/neutron/neutron.conf:402:# nova_admin_password =
/etc/neutron/neutron.conf.rpmnew:384:# nova_admin_password =
/etc/puppet/hieradata/controller.yaml:465:nova::api::admin_password: s7ANCCd8MPXTxcVC4ApNPN8zX
/etc/puppet/hieradata/controller.yaml:477:nova::db::mysql::password: s7ANCCd8MPXTxcVC4ApNPN8zX
/etc/puppet/hieradata/controller.yaml:481:nova::network::neutron::neutron_admin_password: 48DumaJDkzwV88wAD8QhFwPVt
/etc/puppet/hieradata/controller.yaml:483:nova::rabbit_password: TnBhNMb9wuuWJrqcsk3fXyTcq
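The drift above (undercloud password file vs. deployed hieradata) can be checked entry by entry. A minimal sketch, assuming the KEY=value format of tripleo-overcloud-passwords and the hieradata paths shown in the grep output; the $CTRL_IP variable and the usage lines are hypothetical:

```shell
# Sketch: read one KEY=value entry from the undercloud's
# tripleo-overcloud-passwords file.
pass_from_file() {
    # usage: pass_from_file FILE KEY
    sed -n "s/^$2=//p" "$1"
}

# Hypothetical usage from the undercloud, comparing against what a
# controller actually has configured in hieradata:
#   expected=$(pass_from_file ~/tripleo-overcloud-passwords OVERCLOUD_NOVA_PASSWORD)
#   actual=$(ssh heat-admin@$CTRL_IP \
#       "sudo awk '/nova::api::admin_password:/ {print \$2}' /etc/puppet/hieradata/controller.yaml")
#   [ "$expected" = "$actual" ] || echo "MISMATCH for OVERCLOUD_NOVA_PASSWORD"
```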


> 
> [stack@instack ~]$ for i in $(nova list|grep controller|grep ctlplane|awk
> -F' ' '{ print $12 }'|awk -F'=' '{ print $2 }'); do ssh heat-admin@$i 'echo
> ********; hostname ; echo ********* ; sudo grep -rni ".*domain.*password"
> /etc/*; echo ""' ; done
> ********
> overcloud-controller-0.localdomain
> *********
> grep: /etc/extlinux.conf: No such file or directory
> /etc/heat/heat.conf:180:#stack_domain_admin_password = <None>
> /etc/heat/heat.conf:181:stack_domain_admin_password =
> endkthAaPEc43vQEN8YHhUUBw
> /etc/heat/heat.conf.rpmnew:184:#stack_domain_admin_password = <None>
> /etc/libvirt/qemu.conf:134:# per-domain XML config does not already provide
> a password. To
> /etc/puppet/hieradata/controller.yaml:352:heat::keystone::domain::
> domain_password: Pfu26vnCsgvyGBwGuPqAWrFpz
> /etc/puppet/hieradata/controller.yaml:360:heat_stack_domain_admin_password:
> Pfu26vnCsgvyGBwGuPqAWrFpz
> 
> ********
> overcloud-controller-1.localdomain
> *********
> grep: /etc/extlinux.conf: No such file or directory
> /etc/heat/heat.conf:180:#stack_domain_admin_password = <None>
> /etc/heat/heat.conf:181:stack_domain_admin_password =
> Pfu26vnCsgvyGBwGuPqAWrFpz
> /etc/heat/heat.conf.rpmnew:184:#stack_domain_admin_password = <None>
> /etc/libvirt/qemu.conf:134:# per-domain XML config does not already provide
> a password. To
> /etc/puppet/hieradata/controller.yaml:352:heat::keystone::domain::
> domain_password: Pfu26vnCsgvyGBwGuPqAWrFpz
> /etc/puppet/hieradata/controller.yaml:360:heat_stack_domain_admin_password:
> Pfu26vnCsgvyGBwGuPqAWrFpz
> 
> ********
> overcloud-controller-2.localdomain
> *********
> grep: /etc/extlinux.conf: No such file or directory
> /etc/heat/heat.conf:180:#stack_domain_admin_password = <None>
> /etc/heat/heat.conf:181:stack_domain_admin_password =
> Pfu26vnCsgvyGBwGuPqAWrFpz
> /etc/heat/heat.conf.rpmnew:184:#stack_domain_admin_password = <None>
> /etc/libvirt/qemu.conf:134:# per-domain XML config does not already provide
> a password. To
> /etc/puppet/hieradata/controller.yaml:352:heat::keystone::domain::
> domain_password: Pfu26vnCsgvyGBwGuPqAWrFpz
> /etc/puppet/hieradata/controller.yaml:360:heat_stack_domain_admin_password:
> Pfu26vnCsgvyGBwGuPqAWrFpz
> 
> [stack@instack ~]$

Comment 19 Marios Andreou 2016-04-21 15:24:25 UTC
(In reply to marios from comment #18)
> (In reply to marios from comment #17)
> > @sasha I see a discrepancy in the HEAT_STACK_DOMAIN_PASSWORD of the
> > environment you gave me access to for this bug - do you have any idea why
> > this would be (did the tripleo-overcloud-passwords file change from one of
> > the earlier upgrades steps?). 
> 
> > 
> > from tripleo-overcloud-passwords:
> > OVERCLOUD_HEAT_STACK_DOMAIN_PASSWORD=endkthAaPEc43vQEN8YHhUUBw
> 
> in fact there are more (thanks gfidente - there was a bug last week related

Correction, sorry: that should have been jistr, not gfidente.

Comment 20 Graeme Gillies 2016-04-22 00:05:03 UTC
(In reply to marios from comment #16)
> Hi Graeme - thanks for the extra info. Sorry indeed I should have specified;
> I was talking about the box on which Sasha hit this issue (irc for login
> details if you want to have a look/compare).
> 
> For sasha the common denominator for repro seems to be simply 'upgrade 7.3
> to 8 with sat5' - and even then my understanding is it isn't 100% of the
> time.
> 
> So in your case, you deploy an overcloud 'with a template error' (are you
> just trying to induce an arbitrary error for testing for example?). At which
> point does the original deploy fail? I mean do the services come up/start/is
> the overcloud functional at that point? You mentioned VIPs having
> moved/recreated in comment #9 so it must have gotten as far as the puppet
> manifest/controller post config if they were created on deploy and then
> moved on the update? 
> 
> After it fails, you fix/not the template issue and update the orignal failed
> stack deployment to see the "Execution of '/usr/bin/openstack project list
> --quiet --format csv --long' returned 1 this keystone issue". Did you try
> and recover the original setup before running the new update - in general
> for the upgrades workflow we recommend that the pcs cluster is at least
> recovered (not in maintenance mode, services running) before trying to
> re-run a failed upgrade step (stack update). I mention upgrades because
> earlier in comment #10 you said this was part of upgrades testing. Which
> repos are you using for your upgrades testing btw -
> puddle/poodle/subscription ? Did you also hit this following the upgrades
> workflow or only with the repro from comment #15?
> 
> thanks, marios

The scenario in which we reproduced this was organic (we actually did accidentally have template errors), but when attempting to reproduce it deliberately we noticed that for us it requires a template deployment error first (though that doesn't mean it's the only way to reproduce it).

In our case it was failing at the step that looks like

overcloud-ControllerNodesPostDeployment-xmxqudx3njvt-ControllerOvercloudServicesDeployment_Step4-7iiqcv4gxrmc

Hope this helps.

Regards,

Graeme

Comment 21 Giulio Fidente 2016-04-22 17:01:45 UTC
hi Graeme, 

I am trying to understand how to reproduce this. I read the following:

1) Do a new overcloud deploy with a template change that causes it to fail
2) fix the template error and do another overcloud deploy, this will now hit the error above

In 1) did you mean to use 'openstack overcloud deploy' against a pre-existing overcloud deployed with 7.3 or a pre-existing overcloud deployed with 8?

Also, the specific template error you introduce is relevant in this case, because if it tricks heat into thinking that the network resources are in a failed state, it will try to recreate them on the next attempt... and that could change your VIP as a side effect, which is what you pointed out in comment #9. So which error did you introduce to make it fail the first time?

Comment 23 Jaromir Coufal 2016-07-12 14:12:48 UTC
Let's just verify; it seems there was an issue with the password being re-generated.

Comment 24 Marios Andreou 2016-08-08 12:09:43 UTC
What is the status here? Can we close this bug, or is it still a thing?

Comment 26 Marios Andreou 2016-08-10 08:28:45 UTC
Hi John, this is precisely what I was trying to call out with my comment #24 and the needinfo on the original bz reporter. There is a lot of discussion above, but I think the root cause of the issue here is suspected to be that discussed in comment #18 and comment #19 - i.e. the deploy wasn't using the overcloud passwords because it was being run in a different directory (or in any case in a directory that didn't contain the overcloud passwords file).

There isn't anything to test atm and no fixed in/by. I think we should close this bug, but I wanted to check with the original bug reporter too.
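Given that suspected root cause, a simple guard before re-running a deploy or upgrade step would be to confirm the original passwords file is present in the working directory, so the client reuses it rather than generating fresh passwords. A sketch (the file name follows the comments above; this is not an official tripleoclient check):

```shell
# Sketch: succeed only if the original deployment's passwords file is in
# the directory the deploy will be run from.
check_passfile() {
    # returns 0 (success) only if the password file exists in "$1"
    [ -f "$1/tripleo-overcloud-passwords" ]
}

# Hypothetical usage before 'openstack overcloud deploy ...':
#   check_passfile "$PWD" || {
#       echo "tripleo-overcloud-passwords not found in $PWD; cd to the original deploy dir" >&2
#       exit 1
#   }
```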

Comment 27 Marios Andreou 2016-08-10 08:31:50 UTC
o/ sasha, can you please check comments 24-26? Can we close this bug?

Comment 28 Marios Andreou 2016-08-10 08:34:05 UTC
(In reply to marios from comment #26)
> Hi John, this is precisely what I was trying to call out with my comment #24
> and the needinfo on the original bz reporter. There is a lot of discussion
> above but I think the root cause of the issue here is suspected to be that
> discussed in comment #18 and comment #19 - i.e. the deploy wasn't using the
> overcloud passwords because it was being run in a different directory (or in
> any case in a directory thaHi John, this is precisely what I was trying to
> call out with my comment #24 and the needinfo on the original bz reporter.
> There is a lot of discussion above but I think the root cause of the issue
> here is suspected to be that discussed in comment #18 and comment #19 - i.e.
> the deploy wasn't using the overcloud passwords because it was being run in
> a different directory (or in any case in a directory that didn't contain the
> overcloud passwords file).

Apologies for the copy/paste foo above; hopefully it mostly made sense, but just in case:


Hi John, this is precisely what I was trying to call out with my comment #24 and the needinfo on the original bz reporter. There is a lot of discussion above but I think the root cause of the issue here is suspected to be that discussed in comment #18 and comment #19 - i.e. the deploy wasn't using the overcloud passwords because it was being run in a different directory (or in any case in a directory that didn't contain the overcloud passwords file).

There isn't anything to test atm and no fixed in/by. I think we should close this bug, but I wanted to check with the original bug reporter too.

Comment 29 Marios Andreou 2016-09-02 12:42:36 UTC
Closing this... @jcoufal, let me know if you disagree; see comment 24 & comment 28.

Comment 30 Jaromir Coufal 2016-09-06 18:03:37 UTC
Sounds good. Reopen if the issue reproduces and the identified root cause is different than the one mentioned above.

