rhel-osp-director: 7.3->8.0 upgrade, during major-upgrade-pacemaker-converge.yaml step: Execution of '/usr/bin/openstack project list --quiet --format csv --long' returned 1: An unexpected error prevented the server from fulfilling your request

Environment:
openstack-tripleo-heat-templates-0.8.14-7.el7ost.noarch
openstack-tripleo-heat-templates-kilo-0.8.14-7.el7ost.noarch
openstack-puppet-modules-7.0.17-1.el7ost.noarch
instack-undercloud-2.2.7-4.el7ost.noarch

Steps to reproduce:
1. Deploy 7.3 and populate.
2. Upgrade to 8.0 (via sat5).

Result:
During the step with the /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-pacemaker-converge.yaml template, the upgrade fails:

Warning: Scope(Haproxy::Config[haproxy]): haproxy: The $merge_options parameter will default to true in the next major release. Please review the documentation regarding the implications.
Error: Could not prefetch keystone_tenant provider 'openstack': Execution of '/usr/bin/openstack project list --quiet --format csv --long' returned 1: An unexpected error prevented the server from fulfilling your request. (HTTP 500) (Request-ID: req-842c4377-e60f-40fa-a079-9a9892407472)
Error: Execution of '/usr/bin/openstack project create --format shell services --enable --description Tenant for the openstack services --domain Default' returned 1: Could not find resource Default
Error: /Stage[main]/Keystone::Roles::Admin/Keystone_tenant[services]/ensure: change from absent to present failed: Execution of '/usr/bin/openstack project create --format shell services --enable --description Tenant for the openstack services --domain Default' returned 1: Could not find resource Default
Error: Could not prefetch keystone_role provider 'openstack': Execution of '/usr/bin/openstack role list --quiet --format csv' returned 1: An unexpected error prevented the server from fulfilling your request. (HTTP 500) (Request-ID: req-c00fe00b-b597-4538-917f-681dee425550)
Error: Execution of '/usr/bin/openstack role create --format shell admin' returned 1: An unexpected error prevented the server from fulfilling your request. (HTTP 500) (Request-ID: req-c0cc647e-2113-4e67-9147-fc401c712993)
Error: /Stage[main]/Keystone::Roles::Admin/Keystone_role[admin]/ensure: change from absent to present failed: Execution of '/usr/bin/openstack role create --format shell admin' returned 1: An unexpected error prevented the server from fulfilling your request. (HTTP 500) (Request-ID: req-c0cc647e-2113-4e67-9147-fc401c712993)
Error: Execution of '/usr/bin/openstack project create --format shell openstack --enable --description admin tenant --domain Default' returned 1: Could not find resource Default
Error: /Stage[main]/Keystone::Roles::Admin/Keystone_tenant[openstack]/ensure: change from absent to present failed: Execution of '/usr/bin/openstack project create --format shell openstack --enable --description admin tenant --domain Default' returned 1: Could not find resource Default
Error: Could not prefetch keystone_user provider 'openstack': Execution of '/usr/bin/openstack user list --quiet --format csv --long' returned 1: An unexpected error prevented the server from fulfilling your request. (HTTP 500) (Request-ID: req-e6a2a220-c4ea-42f0-99de-5a9128e1e2f0)
Error: Execution of '/usr/bin/openstack user create --format shell admin --enable --password fdcjuGcqMTcbBDewsZrHjgvps --email root@localhost --domain Default' returned 1: Could not find resource Default
Error: /Stage[main]/Keystone::Roles::Admin/Keystone_user[admin]/ensure: change from absent to present failed: Execution of '/usr/bin/openstack user create --format shell admin --enable --password fdcjuGcqMTcbBDewsZrHjgvps --email root@localhost --domain Default' returned 1: Could not find resource Default
Update, as requested earlier on irc: myself and thrash spent some time poking at the environment here. There is indeed an error in the keystone.log on control0:

2016-04-12 11:51:56.760 25719 ERROR keystone.common.wsgi   File "/usr/lib64/python2.7/site-packages/MySQLdb/__init__.py", line 81, in Connect
2016-04-12 11:51:56.760 25719 ERROR keystone.common.wsgi     return Connection(*args, **kwargs)
2016-04-12 11:51:56.760 25719 ERROR keystone.common.wsgi   File "/usr/lib64/python2.7/site-packages/MySQLdb/connections.py", line 187, in __init__
2016-04-12 11:51:56.760 25719 ERROR keystone.common.wsgi     super(Connection, self).__init__(*args, **kwargs2)
2016-04-12 11:51:56.760 25719 ERROR keystone.common.wsgi OperationalError: (_mysql_exceptions.OperationalError) (1045, "Access denied for user 'keystone'@'192.168.100.12' (using password: YES)")
2016-04-12 11:51:56.760 25719 ERROR keystone.common.wsgi

FWIW I used the same versions today as in the description above:

[stack@instack ~]$ rpm -qa | grep "tripleo-heat-templates\|puppet-modules\|instack-undercloud"
openstack-puppet-modules-7.0.17-1.el7ost.noarch
instack-undercloud-2.2.7-4.el7ost.noarch
openstack-tripleo-heat-templates-0.8.14-7.el7ost.noarch
openstack-tripleo-heat-templates-kilo-0.8.14-7.el7ost.noarch

and completed the upgrade without hitting this (poodle, virt, 3 control HA, 1 compute, net-iso). We are trying to reproduce on the same environment at the moment.
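The access-denied error above is a plain credentials mismatch: the password keystone is configured with no longer matches what MariaDB expects for the keystone user. A rough diagnostic sketch (the connection string below is made up for illustration, not taken from the affected controllers):

```shell
# Hedged sketch: pull the DB password out of a keystone.conf-style
# [database] connection line. SECRET123 is a sample value, not real.
conf_line='connection = mysql://keystone:SECRET123@192.168.100.10/keystone'
db_pass=$(printf '%s\n' "$conf_line" | sed -n 's|.*mysql://keystone:\([^@]*\)@.*|\1|p')
echo "$db_pass"
# On a live controller one would then verify the credential directly, e.g.:
#   mysql -h <db_vip> -u keystone -p"$db_pass" -e 'SELECT 1;' keystone
# A 1045 "Access denied" there confirms the password in keystone.conf no
# longer matches what is stored in MariaDB.
```

This only localizes the mismatch; it does not say which side (the config file or the DB grant) was changed out from under the other.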
An update attempted with:

openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates -e /usr/share/openstack-tripleo-heat-templates/overcloud-resource-registry-puppet.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml -e /home/stack/network-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-pacemaker-converge.yaml -e /home/stack/comp.yaml

Failed:
ERROR: Authentication failed: Authentication required

Checking the reason - I see the infamous: nt_check=false.

Warning: Scope(Haproxy::Config[haproxy]): haproxy: The $merge_options parameter will default to true in the next major release. Please review the documentation regarding the implications.
Error: Could not prefetch keystone_tenant provider 'openstack': Execution of '/usr/bin/openstack project list --quiet --format csv --long' returned 1: An unexpected error prevented the server from fulfilling your request. (HTTP 500) (Request-ID: req-4863c7ed-00d5-4f7d-a586-73a0a22f0471)
Error: Execution of '/usr/bin/openstack project create --format shell services --enable --description Tenant for the openstack services --domain Default' returned 1: Could not find resource Default
Error: /Stage[main]/Keystone::Roles::Admin/Keystone_tenant[services]/ensure: change from absent to present failed: Execution of '/usr/bin/openstack project create --format shell services --enable --description Tenant for the openstack services --domain Default' returned 1: Could not find resource Default
Error: Could not prefetch keystone_role provider 'openstack': Execution of '/usr/bin/openstack role list --quiet --format csv' returned 1: An unexpected error prevented the server from fulfilling your request. (HTTP 500) (Request-ID: req-6fb5f53a-85b7-48a9-b2b7-9b264ef73cb5)
Error: Execution of '/usr/bin/openstack role create --format shell admin' returned 1: An unexpected error prevented the server from fulfilling your request. (HTTP 500) (Request-ID: req-c4d835a0-ec16-4bd1-a197-273f8ca303c2)
Error: /Stage[main]/Keystone::Roles::Admin/Keystone_role[admin]/ensure: change from absent to present failed: Execution of '/usr/bin/openstack role create --format shell admin' returned 1: An unexpected error prevented the server from fulfilling your request. (HTTP 500) (Request-ID: req-c4d835a0-ec16-4bd1-a197-273f8ca303c2)
Error: Execution of '/usr/bin/openstack project create --format shell openstack --enable --description admin tenant --domain Default' returned 1: Could not find resource Default
Error: /Stage[main]/Keystone::Roles::Admin/Keystone_tenant[openstack]/ensure: change from absent to present failed: Execution of '/usr/bin/openstack project create --format shell openstack --enable --description admin tenant --domain Default' returned 1: Could not find resource Default
Error: Could not prefetch keystone_user provider 'openstack': Execution of '/usr/bin/openstack user list --quiet --format csv --long' returned 1: An unexpected error prevented the server from fulfilling your request. (HTTP 500) (Request-ID: req-95a6f3f6-010e-4050-aa3c-5156d721f222)
Error: Execution of '/usr/bin/openstack user create --format shell admin --enable --password yAseqzgdaBdwa2nhpT4KrgRjF --email root@localhost --domain Default' returned 1: Could not find resource Default
Error: /Stage[main]/Keystone::Roles::Admin/Keystone_user[admin]/ensure: change from absent to present failed: Execution of '/usr/bin/openstack user create --format shell admin --enable --password yAseqzgdaBdwa2nhpT4KrgRjF --email root@localhost --domain Default' returned 1: Could not find resource Default
Warning: /Stage[main]/Keystone::Roles::Admin/Keystone_user_role[admin@openstack]: Skipping because of failed dependencies
(In reply to Alexander Chuzhoy from comment #5)
> An update attempted with:
> openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates
> -e /usr/share/openstack-tripleo-heat-templates/overcloud-resource-registry-puppet.yaml
> -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml
> -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml
> -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml
> -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml
> -e /home/stack/network-environment.yaml
> -e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-pacemaker-converge.yaml
> -e /home/stack/comp.yaml
>
> Failed:
> ERROR: Authentication failed: Authentication required

To be clear, this ^^^ was on the same environment where the issue was first seen, right? If I understood correctly, it didn't reproduce on another box with the same setup (sat5), packages etc.
Sasha, is this reproducible anywhere else? Can you try, and if not, do you object to closing this out?
Reproduced.
I reproduced this as well. Looking at the boxes, it seems like the database connection is using the wrong VIP (it looks like the VIPs change as part of the upgrade?)
Created attachment 1149107 [details] keystone log interesting parts
Still not clear why this is happening yet, but some update/debug info below. @ggillies I don't think what you're seeing is the same thing, since we don't expect the VIPs to move on upgrade, and they don't in the case of the env for this bug.

From director's side it fails exactly at https://code.engineering.redhat.com/gerrit/#/c/67039/1/puppet/manifests/overcloud_controller_pacemaker.pp and that's where we get the keystone setup. It's not clear what is specific about this environment (sat5 seems to be the common denominator for repro, @sasha right?) to induce the error, which looks to be an authorization error. I think the relevant part in the keystone log is:

2016-04-19 15:37:56.039 13537 ERROR keystone.common.wsgi [req-964871a5-60b2-4974-86e0-7489b8ea98d3 - - - - -] (_mysql_exceptions.OperationalError) (1045, "Access denied for user 'keystone'@'192.168.100.14' (using password: YES)")

The fuller trace with "(_mysql_exceptions.OperationalError" for that is in the attachment (https://bugzilla.redhat.com/attachment.cgi?id=1149107).

I see the versions of the keystone* packages on the controller match what I have in my just-upgraded setup. I wonder if there is a problem upgrading from a specific version of keystone. @sasha can you check what the just-deployed overcloud has for packages (rpm -qa would be great) so we can see if that is the difference?

On the environment I also see a difference in the tripleo-overcloud-passwords file (it is missing a couple of entries I have in my env, like OVERCLOUD_RABBITMQ_PASSWORD or HAPROXY_STATS_PASSWORD), but the undercloud there has the same version of tripleoclient as I have, so I don't know what causes that discrepancy.

What's more, the rabbit pass seems to be configured correctly in ctrl0 /etc/rabbitmq/rabbitmq.config: "{default_pass, <<"TnBhNMb9wuuWJrqcsk3fXyTcq">>}". And if you look at the trace at https://bugzilla.redhat.com/attachment.cgi?id=1149107 from the keystone log, just before the error we see it connecting to rabbitmq fine:

2016-04-19 15:05:56.190 13540 INFO oslo.messaging._drivers.impl_rabbit [req-2e35e0da-668f-47d4-8114-b48c408dd9e0 - - - - -] Connected to AMQP server on 192.168.100.14:5672

Sasha, do you know why the tripleo-overcloud-passwords file is different here - did you change it manually?

Sasha, were you able to recover the cluster (in particular keystone) last time, or did you end up reprovisioning?

thanks, marios
Created attachment 1149113 [details] journalctl full trace from os-collect-config error during upgrade
The issue is intermittent, as I was able to upgrade with sat5.
(In reply to marios from comment #12) > Still not clear why this is happening yet, but some update/debug info below. > @ggillies I don't think what you're seeing is the same thing since we don't > expect the vips to move on upgrade and they don't in the case of the env for > this bug. > > From director's side it fails exactly at > https://code.engineering.redhat.com/gerrit/#/c/67039/1/puppet/manifests/ > overcloud_controller_pacemaker.pp and that's where we get the keystone > setup. It's not clear what is specific about this environment (sat5 seems to > be the common denominator for repro @sasha right?) to induce the error, > which looks to be an authorization error. I think the relevant part in the > keystone log is like: > > 2016-04-19 15:37:56.039 13537 ERROR keystone.common.wsgi > [req-964871a5-60b2-4974-86e0-7489b8ea98d3 - - - - -] > (_mysql_exceptions.OperationalError) (1045, "Access denied for user > 'keystone'@'192.168.100.14' (using password: YES)") > > fuller trace with "(_mysql_exceptions.OperationalError" for that is in the > attachment https://bugzilla.redhat.com/attachment.cgi?id=1149107 ) > > I see the versions of keystone* packages on the controller match what I have > in my just upgraded setup. I wonder if there is a problem upgrading from a > specific version of keystone. @sasha can you check what the just deployed > overcloud has for packages ... (rpm -qa would be great) so we can see if > that is the difference. > > On the environment I also see a difference in the > tripleo-overcloud-passwords (missing a couple of entries I have in my env, > like OVERCLOUD_RABBITMQ_PASSWORD or HAPROXY_STATS_PASSWORD ) but the > undercluod there has the same version tripleoclient as I have so I don't > know what causes that discrepancy. 
> What's more, the rabbit pass seems to be configured correctly in ctrl0
> /etc/rabbitmq/rabbitmq.config "{default_pass, <<"TnBhNMb9wuuWJrqcsk3fXyTcq">>}"
> and if you see the trace at https://bugzilla.redhat.com/attachment.cgi?id=1149107
> from keystone log just before the error we see it connecting to rabbitmq fine
> like "2016-04-19 15:05:56.190 13540 INFO oslo.messaging._drivers.impl_rabbit
> [req-2e35e0da-668f-47d4-8114-b48c408dd9e0 - - - - -] Connected to AMQP
> server on 192.168.100.14:5672".
> Sasha do you know why the tripleo-overcloud-passwords is different here did
> you change it manually?
>
> Sasha were you able to recover the cluster (in particular keystone) last
> time or end up reprovisioning?
>
> thanks, marios

Hi Marios,

Which environment are you talking about here? In the environment where we reproduced this, we definitely have OVERCLOUD_RABBITMQ_PASSWORD set in our tripleo-overcloud-passwords file (just checked).

I seem to be able to reproduce this issue 100% by doing the following:
1) Do a new overcloud deploy with a template change that causes it to fail.
2) Fix the template error and do another overcloud deploy; this will now hit the error above.

Likewise, doing an upgrade first with an error and then attempting to continue with the error fixed causes the issue as well.

Regards,
Graeme
Hi Graeme - thanks for the extra info. Sorry, indeed I should have specified; I was talking about the box on which Sasha hit this issue (irc for login details if you want to have a look/compare).

For Sasha the common denominator for repro seems to be simply 'upgrade 7.3 to 8 with sat5' - and even then my understanding is it isn't 100% of the time.

So in your case, you deploy an overcloud 'with a template error' (are you just trying to induce an arbitrary error for testing, for example?). At which point does the original deploy fail? I mean, do the services come up/start, and is the overcloud functional at that point? You mentioned VIPs having moved/been recreated in comment #9, so it must have gotten as far as the puppet manifest/controller post config if they were created on deploy and then moved on the update?

After it fails, you fix (or not) the template issue and update the original failed stack deployment to see this "Execution of '/usr/bin/openstack project list --quiet --format csv --long' returned 1" keystone issue. Did you try and recover the original setup before running the new update? In general, for the upgrades workflow we recommend that the pcs cluster is at least recovered (not in maintenance mode, services running) before trying to re-run a failed upgrade step (stack update). I mention upgrades because earlier in comment #10 you said this was part of upgrades testing. Which repos are you using for your upgrades testing, btw - puddle/poodle/subscription? Did you also hit this following the upgrades workflow, or only with the repro from comment #15?

thanks, marios
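On the recovery point just mentioned (the pcs cluster should be out of maintenance mode before re-running a failed stack update), a minimal pre-flight check could look like the sketch below. The captured property output is a made-up sample, not from this environment, and `pcs property show maintenance-mode` is the assumed way to obtain it on a controller:

```shell
# Hedged sketch: refuse to re-run the upgrade step while the pacemaker
# cluster is still in maintenance mode. The sample string stands in for
# real `pcs property show maintenance-mode` output.
pcs_props='maintenance-mode: true'
mode=$(printf '%s\n' "$pcs_props" | awk -F': ' '/maintenance-mode/ {print $2}')
if [ "$mode" = "true" ]; then
  msg="cluster still in maintenance mode: recover it before re-running the stack update"
else
  msg="cluster out of maintenance mode: ok to retry"
fi
echo "$msg"
```

On a real controller one would feed the live pcs output in place of the sample string before deciding to retry the stack update.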
@sasha I see a discrepancy in the HEAT_STACK_DOMAIN_PASSWORD of the environment you gave me access to for this bug - do you have any idea why this would be (did the tripleo-overcloud-passwords file change from one of the earlier upgrade steps?).

from tripleo-overcloud-passwords:
OVERCLOUD_HEAT_STACK_DOMAIN_PASSWORD=endkthAaPEc43vQEN8YHhUUBw

[stack@instack ~]$ for i in $(nova list|grep controller|grep ctlplane|awk -F' ' '{ print $12 }'|awk -F'=' '{ print $2 }'); do ssh heat-admin@$i 'echo ********; hostname ; echo ********* ; sudo grep -rni ".*domain.*password" /etc/*; echo ""' ; done
********
overcloud-controller-0.localdomain
*********
grep: /etc/extlinux.conf: No such file or directory
/etc/heat/heat.conf:180:#stack_domain_admin_password = <None>
/etc/heat/heat.conf:181:stack_domain_admin_password = endkthAaPEc43vQEN8YHhUUBw
/etc/heat/heat.conf.rpmnew:184:#stack_domain_admin_password = <None>
/etc/libvirt/qemu.conf:134:# per-domain XML config does not already provide a password. To
/etc/puppet/hieradata/controller.yaml:352:heat::keystone::domain::domain_password: Pfu26vnCsgvyGBwGuPqAWrFpz
/etc/puppet/hieradata/controller.yaml:360:heat_stack_domain_admin_password: Pfu26vnCsgvyGBwGuPqAWrFpz

********
overcloud-controller-1.localdomain
*********
grep: /etc/extlinux.conf: No such file or directory
/etc/heat/heat.conf:180:#stack_domain_admin_password = <None>
/etc/heat/heat.conf:181:stack_domain_admin_password = Pfu26vnCsgvyGBwGuPqAWrFpz
/etc/heat/heat.conf.rpmnew:184:#stack_domain_admin_password = <None>
/etc/libvirt/qemu.conf:134:# per-domain XML config does not already provide a password. To
/etc/puppet/hieradata/controller.yaml:352:heat::keystone::domain::domain_password: Pfu26vnCsgvyGBwGuPqAWrFpz
/etc/puppet/hieradata/controller.yaml:360:heat_stack_domain_admin_password: Pfu26vnCsgvyGBwGuPqAWrFpz

********
overcloud-controller-2.localdomain
*********
grep: /etc/extlinux.conf: No such file or directory
/etc/heat/heat.conf:180:#stack_domain_admin_password = <None>
/etc/heat/heat.conf:181:stack_domain_admin_password = Pfu26vnCsgvyGBwGuPqAWrFpz
/etc/heat/heat.conf.rpmnew:184:#stack_domain_admin_password = <None>
/etc/libvirt/qemu.conf:134:# per-domain XML config does not already provide a password. To
/etc/puppet/hieradata/controller.yaml:352:heat::keystone::domain::domain_password: Pfu26vnCsgvyGBwGuPqAWrFpz
/etc/puppet/hieradata/controller.yaml:360:heat_stack_domain_admin_password: Pfu26vnCsgvyGBwGuPqAWrFpz

[stack@instack ~]$
(In reply to marios from comment #17)
> @sasha I see a discrepancy in the HEAT_STACK_DOMAIN_PASSWORD of the
> environment you gave me access to for this bug - do you have any idea why
> this would be (did the tripleo-overcloud-passwords file change from one of
> the earlier upgrades steps?).
>
> from tripleo-overcloud-passwords:
> OVERCLOUD_HEAT_STACK_DOMAIN_PASSWORD=endkthAaPEc43vQEN8YHhUUBw

In fact there are more (thanks gfidente - there was a bug last week related to the tripleo-passwords file being regenerated as the deploy was run from a different directory?). Nova as an example:

[stack@instack ~]$ cat tripleo-overcloud-passwords
OVERCLOUD_NOVA_PASSWORD=XcnhRNUfsqUfWP39P4qgGVshQ

but a different password is configured for nova api:

[stack@instack ~]$ for i in $(nova list|grep controller|grep ctlplane|awk -F' ' '{ print $12 }'|awk -F'=' '{ print $2 }'); do ssh heat-admin@$i 'echo ********; hostname ; echo ********* ; sudo grep -rni ".*nova.*password" /etc/*; echo ""' ; done
********
overcloud-controller-0.localdomain
*********
grep: /etc/extlinux.conf: No such file or directory
/etc/neutron/neutron.conf:402:# nova_admin_password =
/etc/neutron/neutron.conf.rpmnew:384:# nova_admin_password =
/etc/puppet/hieradata/controller.yaml:465:nova::api::admin_password: s7ANCCd8MPXTxcVC4ApNPN8zX
/etc/puppet/hieradata/controller.yaml:477:nova::db::mysql::password: s7ANCCd8MPXTxcVC4ApNPN8zX
/etc/puppet/hieradata/controller.yaml:481:nova::network::neutron::neutron_admin_password: 48DumaJDkzwV88wAD8QhFwPVt
/etc/puppet/hieradata/controller.yaml:483:nova::rabbit_password: TnBhNMb9wuuWJrqcsk3fXyTcq

********
overcloud-controller-1.localdomain
*********
grep: /etc/extlinux.conf: No such file or directory
/etc/neutron/neutron.conf:402:# nova_admin_password =
/etc/neutron/neutron.conf.rpmnew:384:# nova_admin_password =
/etc/puppet/hieradata/controller.yaml:465:nova::api::admin_password: s7ANCCd8MPXTxcVC4ApNPN8zX
/etc/puppet/hieradata/controller.yaml:477:nova::db::mysql::password: s7ANCCd8MPXTxcVC4ApNPN8zX
/etc/puppet/hieradata/controller.yaml:481:nova::network::neutron::neutron_admin_password: 48DumaJDkzwV88wAD8QhFwPVt
/etc/puppet/hieradata/controller.yaml:483:nova::rabbit_password: TnBhNMb9wuuWJrqcsk3fXyTcq

********
overcloud-controller-2.localdomain
*********
grep: /etc/extlinux.conf: No such file or directory
/etc/neutron/neutron.conf:402:# nova_admin_password =
/etc/neutron/neutron.conf.rpmnew:384:# nova_admin_password =
/etc/puppet/hieradata/controller.yaml:465:nova::api::admin_password: s7ANCCd8MPXTxcVC4ApNPN8zX
/etc/puppet/hieradata/controller.yaml:477:nova::db::mysql::password: s7ANCCd8MPXTxcVC4ApNPN8zX
/etc/puppet/hieradata/controller.yaml:481:nova::network::neutron::neutron_admin_password: 48DumaJDkzwV88wAD8QhFwPVt
/etc/puppet/hieradata/controller.yaml:483:nova::rabbit_password: TnBhNMb9wuuWJrqcsk3fXyTcq

> [stack@instack ~]$ for i in $(nova list|grep controller|grep ctlplane|awk
> -F' ' '{ print $12 }'|awk -F'=' '{ print $2 }'); do ssh heat-admin@$i 'echo
> ********; hostname ; echo ********* ; sudo grep -rni ".*domain.*password"
> /etc/*; echo ""' ; done
> ********
> overcloud-controller-0.localdomain
> *********
> grep: /etc/extlinux.conf: No such file or directory
> /etc/heat/heat.conf:180:#stack_domain_admin_password = <None>
> /etc/heat/heat.conf:181:stack_domain_admin_password = endkthAaPEc43vQEN8YHhUUBw
> /etc/heat/heat.conf.rpmnew:184:#stack_domain_admin_password = <None>
> /etc/libvirt/qemu.conf:134:# per-domain XML config does not already provide a password. To
> /etc/puppet/hieradata/controller.yaml:352:heat::keystone::domain::domain_password: Pfu26vnCsgvyGBwGuPqAWrFpz
> /etc/puppet/hieradata/controller.yaml:360:heat_stack_domain_admin_password: Pfu26vnCsgvyGBwGuPqAWrFpz
>
> ********
> overcloud-controller-1.localdomain
> *********
> grep: /etc/extlinux.conf: No such file or directory
> /etc/heat/heat.conf:180:#stack_domain_admin_password = <None>
> /etc/heat/heat.conf:181:stack_domain_admin_password = Pfu26vnCsgvyGBwGuPqAWrFpz
> /etc/heat/heat.conf.rpmnew:184:#stack_domain_admin_password = <None>
> /etc/libvirt/qemu.conf:134:# per-domain XML config does not already provide a password. To
> /etc/puppet/hieradata/controller.yaml:352:heat::keystone::domain::domain_password: Pfu26vnCsgvyGBwGuPqAWrFpz
> /etc/puppet/hieradata/controller.yaml:360:heat_stack_domain_admin_password: Pfu26vnCsgvyGBwGuPqAWrFpz
>
> ********
> overcloud-controller-2.localdomain
> *********
> grep: /etc/extlinux.conf: No such file or directory
> /etc/heat/heat.conf:180:#stack_domain_admin_password = <None>
> /etc/heat/heat.conf:181:stack_domain_admin_password = Pfu26vnCsgvyGBwGuPqAWrFpz
> /etc/heat/heat.conf.rpmnew:184:#stack_domain_admin_password = <None>
> /etc/libvirt/qemu.conf:134:# per-domain XML config does not already provide a password. To
> /etc/puppet/hieradata/controller.yaml:352:heat::keystone::domain::domain_password: Pfu26vnCsgvyGBwGuPqAWrFpz
> /etc/puppet/hieradata/controller.yaml:360:heat_stack_domain_admin_password: Pfu26vnCsgvyGBwGuPqAWrFpz
>
> [stack@instack ~]$
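Given mismatches like the nova and heat ones above, a quick way to surface every regenerated entry is to compare the key=value pairs of an old and a new passwords file side by side. The sketch below uses made-up sample files, not the real tripleo-overcloud-passwords:

```shell
# Hedged sketch: print every key whose value differs between two
# tripleo-overcloud-passwords-style files (sample data only).
old=$(mktemp); new=$(mktemp)
cat > "$old" <<'EOF'
OVERCLOUD_NOVA_PASSWORD=aaa
OVERCLOUD_HEAT_STACK_DOMAIN_PASSWORD=bbb
EOF
cat > "$new" <<'EOF'
OVERCLOUD_NOVA_PASSWORD=ccc
OVERCLOUD_HEAT_STACK_DOMAIN_PASSWORD=bbb
EOF
# First pass loads the old values into an array; the second pass reports
# any key whose value changed.
mismatched=$(awk -F= 'NR==FNR {a[$1]=$2; next} $1 in a && a[$1] != $2 {print $1}' "$old" "$new")
echo "$mismatched"
rm -f "$old" "$new"
```

In practice one side of the comparison could be reconstructed from the deployed hieradata (as in the grep loops above) to see exactly which services were handed regenerated passwords.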
(In reply to marios from comment #18)
> (In reply to marios from comment #17)
> > @sasha I see a discrepancy in the HEAT_STACK_DOMAIN_PASSWORD of the
> > environment you gave me access to for this bug - do you have any idea why
> > this would be (did the tripleo-overcloud-passwords file change from one of
> > the earlier upgrades steps?).
> >
> > from tripleo-overcloud-passwords:
> > OVERCLOUD_HEAT_STACK_DOMAIN_PASSWORD=endkthAaPEc43vQEN8YHhUUBw
>
> in fact there are more (thanks gfidente - there was a bug last week related

jistr, correction, sorry
(In reply to marios from comment #16)
> Hi Graeme - thanks for the extra info. Sorry indeed I should have specified;
> I was talking about the box on which Sasha hit this issue (irc for login
> details if you want to have a look/compare).
>
> For sasha the common denominator for repro seems to be simply 'upgrade 7.3
> to 8 with sat5' - and even then my understanding is it isn't 100% of the
> time.
>
> So in your case, you deploy an overcloud 'with a template error' (are you
> just trying to induce an arbitrary error for testing for example?). At which
> point does the original deploy fail? I mean do the services come up/start/is
> the overcloud functional at that point? You mentioned VIPs having
> moved/recreated in comment #9 so it must have gotten as far as the puppet
> manifest/controller post config if they were created on deploy and then
> moved on the update?
>
> After it fails, you fix/not the template issue and update the orignal failed
> stack deployment to see the "Execution of '/usr/bin/openstack project list
> --quiet --format csv --long' returned 1 this keystone issue". Did you try
> and recover the original setup before running the new update - in general
> for the upgrades workflow we recommend that the pcs cluster is at least
> recovered (not in maintenance mode, services running) before trying to
> re-run a failed upgrade step (stack update). I mention upgrades because
> earlier in comment #10 you said this was part of upgrades testing. Which
> repos are you using for your upgrades testing btw -
> puddle/poodle/subscription ? Did you also hit this following the upgrades
> workflow or only with the repro from comment #15?
>
> thanks, marios

So the scenarios in which we reproduced this were organic (we actually did have errors by accident), but when attempting to reproduce it we noticed that, for us, it requires a template deployment error first (though that doesn't mean that's the only way to reproduce it).

In our case it was failing at the step that looks like:

overcloud-ControllerNodesPostDeployment-xmxqudx3njvt-ControllerOvercloudServicesDeployment_Step4-7iiqcv4gxrmc

Hope this helps.

Regards,
Graeme
hi Graeme, I am trying to understand how to reproduce this. I read the following:

1) Do a new overcloud deploy with a template change that causes it to fail.
2) Fix the template error and do another overcloud deploy; this will now hit the error above.

In 1), did you mean to use 'openstack overcloud deploy' against a pre-existing overcloud deployed with 7.3, or a pre-existing overcloud deployed with 8?

Also, the specific template error you introduce is relevant in this case, because if it tricks heat into thinking that the network resources are in a failed state, it will try to recreate them on the further attempt... and that could change your VIP as a side effect, which is what you pointed out in comment #9. So which error do you introduce to make it fail the first time?
Let's just verify; it seems that there was an issue with passwords being re-generated.
What is the status here - can we close this bug, or is it still a thing?
Hi John, this is precisely what I was trying to call out with my comment #24 and the needinfo on the original bz reporter. There is a lot of discussion above, but I think the root cause of the issue here is suspected to be that discussed in comment #18 and comment #19 - i.e. the deploy wasn't using the overcloud passwords because it was being run in a different directory (or in any case in a directory that didn't contain the overcloud passwords file). There isn't anything to test atm and no fixed-in/by. I think we should close this bug, but I wanted to check with the original bug reporter too.
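If the suspected root cause holds (a deploy re-run from a directory without the passwords file, so the client regenerated them), a simple guard before any re-run would catch it early. The check below is only a sketch under that assumption, demonstrated in a throwaway temp dir where the file is absent:

```shell
# Hedged sketch: only proceed with a re-deploy when the working directory
# contains the tripleo-overcloud-passwords file from the original deploy.
workdir=$(mktemp -d)
cd "$workdir"
if [ -f tripleo-overcloud-passwords ]; then
  verdict="ok: passwords file present, safe to re-run openstack overcloud deploy"
else
  verdict="refusing to deploy: tripleo-overcloud-passwords not found in $PWD"
fi
echo "$verdict"
cd / && rmdir "$workdir"
```

Wrapping the deploy command in a check like this would have turned the silent password regeneration into an immediate, obvious failure.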
o/ sasha, can you please check comments 24-26 - can we close this bug?
(In reply to marios from comment #26)
> Hi John, this is precisely what I was trying to call out with my comment #24
> and the needinfo on the original bz reporter. There is a lot of discussion
> above but I think the root cause of the issue here is suspected to be that
> discussed in comment #18 and comment #19 - i.e. the deploy wasn't using the
> overcloud passwords because it was being run in a different directory (or in
> any case in a directory that didn't contain the overcloud passwords file).

Apologies for the copy/paste foo above, hopefully it mostly made sense o_O, but in case:

Hi John, this is precisely what I was trying to call out with my comment #24 and the needinfo on the original bz reporter. There is a lot of discussion above, but I think the root cause of the issue here is suspected to be that discussed in comment #18 and comment #19 - i.e. the deploy wasn't using the overcloud passwords because it was being run in a different directory (or in any case in a directory that didn't contain the overcloud passwords file). There isn't anything to test atm and no fixed-in/by. I think we should close this bug, but I wanted to check with the original bug reporter too.
Closing this... @jcoufal, let me know if you disagree; see comment 24 & comment 28.
Sounds good; reopen if the issue reproduces and the identified root cause is different than the one mentioned above.