Bug 1320612

Summary: Authorization Failed: Unable to establish a connection at the end of overcloud "successful" install
Product: Red Hat OpenStack Reporter: rlopez
Component: rhosp-directorAssignee: Angus Thomas <athomas>
Status: CLOSED NOTABUG QA Contact: Arik Chernetsky <achernet>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 8.0 (Liberty)CC: dbecker, jcoufal, jdennis, mburns, mcornea, morazi, nkinder, rhel-osp-director-maint, rlopez, srevivo
Target Milestone: ---   
Target Release: 10.0 (Newton)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-10-11 12:55:57 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description rlopez 2016-03-23 15:59:01 UTC
Description of problem:

Using osp8 beta 8, when installing the overcloud the stack completes but ends with:

Stack overcloud CREATE_COMPLETE
PKI initialization in init-keystone is deprecated and will be removed.
Warning: Permanently added '192.0.2.21' (ECDSA) to the list of known hosts.
No handlers could be found for logger "oslo_config.cfg"
2016-03-22 22:57:23.925 44140 WARNING keystone.cmd.cli [-] keystone-manage pki_setup is not recommended for production use.
The following cert files already exist, use --rebuild to remove the existing files before regenerating:
/etc/keystone/ssl/certs/ca.pem already exists
/etc/keystone/ssl/private/signing_key.pem already exists
/etc/keystone/ssl/certs/signing_cert.pem already exists
Connection to 192.0.2.21 closed.
Authorization Failed: Unable to establish connection to http://10.19.137.121:5000/v2.0/tokens

When attempting a few commands when sourced overcloudrc

$ source overcloudrc 

$ nova list
No handlers could be found for logger "keystoneauth.identity.generic.base"
ERROR (ConnectFailure): Unable to establish connection to http://10.19.137.121:5000/v2.0/tokens

$ nova service-list
No handlers could be found for logger "keystoneauth.identity.generic.base"
ERROR (ConnectFailure): Unable to establish connection to http://10.19.137.121:5000/v2.0/tokens


$ openstack endpoint list
Discovering versions from the identity service failed when creating the password plugin. Attempting to determine version from URL.
Unable to establish connection to http://10.19.137.121:5000/v2.0/tokens


Looking at the os-collect-config log of controller-0:

# journalctl -u os-collect-config | grep -e ERROR
Mar 22 22:33:49 overcloud-controller-0.localdomain os-collect-config[5486]: ice: /Stage[main]/Pacemaker::Stonith/Exec[Disable STONITH]/returns: executed successfully\u001b[0m\n\u001b[mNotice: /Stage[main]/Tripleo::Loadbalancer/Haproxy::Listen[redis]/Concat::Fragment[haproxy-redis_listen_block]/File[/var/lib/puppet/concat/_etc_haproxy_haproxy.cfg/fragments/20-redis-00_haproxy-redis_listen_block]/ensure: defined content as '{md5}5606f5658028844284781f92ea4595ea'\u001b[0m\n\u001b[mNotice: /Stage[main]/Haproxy/Haproxy::Instance[haproxy]/Haproxy::Config[haproxy]/Concat[/etc/haproxy/haproxy.cfg]/Exec[concat_/etc/haproxy/haproxy.cfg]/returns: executed successfully\u001b[0m\n\u001b[mNotice: /Stage[main]/Haproxy/Haproxy::Instance[haproxy]/Haproxy::Config[haproxy]/Concat[/etc/haproxy/haproxy.cfg]/Exec[concat_/etc/haproxy/haproxy.cfg]: Triggered 'refresh' from 41 events\u001b[0m\n\u001b[mNotice: /Stage[main]/Haproxy/Haproxy::Instance[haproxy]/Haproxy::Config[haproxy]/Concat[/etc/haproxy/haproxy.cfg]/File[/etc/haproxy/haproxy.cfg]/content: content changed '{md5}1f337186b0e1ba5ee82760cb437fb810' to '{md5}f8964e40f5be369103730123076dce57'\u001b[0m\n\u001b[mNotice: /File[/etc/haproxy/haproxy.cfg]/seluser: seluser changed 'unconfined_u' to 'system_u'\u001b[0m\n\u001b[mNotice: Finished catalog run in 70.37 seconds\u001b[0m\n", "deploy_stderr": "Could not retrieve fact='apache_version', resolution='<anonymous>': undefined method `[]' for nil:NilClass\nCould not retrieve fact='apache_version', resolution='<anonymous>': undefined method `[]' for nil:NilClass\n\u001b[1;31mWarning: Scope(Class[Mongodb::Server]): Replset specified, but no replset_members or replset_config provided.\u001b[0m\n\u001b[1;31mWarning: Scope(Haproxy::Config[haproxy]): haproxy: The $merge_options parameter will default to true in the next major release. Please review the documentation regarding the implications.\u001b[0m\n\u001b[1;31mError: Could not prefetch mysql_user provider 'mysql': Execution of '/usr/bin/mysql -NBe SELECT CONCAT(User, '@',Host) AS User FROM mysql.user' returned 1: ERROR 2002 (HY000): Can't connect to local MySQL
Mar 22 22:33:49 overcloud-controller-0.localdomain os-collect-config[5486]: server through socket '/var/lib/mysql/mysql.sock' (2)\u001b[0m\n\u001b[1;31mError: Could not prefetch mysql_database provider 'mysql': Execution of '/usr/bin/mysql -NBe show databases' returned 1: ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)\u001b[0m\n", "deploy_status_code": 0}
Mar 22 22:33:49 overcloud-controller-0.localdomain os-collect-config[5486]: Error: Could not prefetch mysql_user provider 'mysql': Execution of '/usr/bin/mysql -NBe SELECT CONCAT(User, '@',Host) AS User FROM mysql.user' returned 1: ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)
Mar 22 22:33:49 overcloud-controller-0.localdomain os-collect-config[5486]: Error: Could not prefetch mysql_database provider 'mysql': Execution of '/usr/bin/mysql -NBe show databases' returned 1: ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)
Mar 22 22:37:01 overcloud-controller-0.localdomain os-collect-config[5486]: u001b[0m\n\u001b[mNotice: Finished catalog run in 117.63 seconds\u001b[0m\n", "deploy_stderr": "Could not retrieve fact='apache_version', resolution='<anonymous>': undefined method `[]' for nil:NilClass\nCould not retrieve fact='apache_version', resolution='<anonymous>': undefined method `[]' for nil:NilClass\n\u001b[1;31mWarning: Scope(Class[Mongodb::Server]): Replset specified, but no replset_members or replset_config provided.\u001b[0m\n\u001b[1;31mWarning: Scope(Haproxy::Config[haproxy]): haproxy: The $merge_options parameter will default to true in the next major release. Please review the documentation regarding the implications.\u001b[0m\n\u001b[1;31mError: Could not prefetch mysql_user provider 'mysql': Execution of '/usr/bin/mysql -NBe SELECT CONCAT(User, '@',Host) AS User FROM mysql.user' returned 1: ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)\u001b[0m\n\u001b[1;31mError: Could not prefetch mysql_database provider 'mysql': Execution of '/usr/bin/mysql -NBe show databases' returned 1: ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)\u001b[0m\n", "deploy_status_code": 0}
Mar 22 22:37:01 overcloud-controller-0.localdomain os-collect-config[5486]: Error: Could not prefetch mysql_user provider 'mysql': Execution of '/usr/bin/mysql -NBe SELECT CONCAT(User, '@',Host) AS User FROM mysql.user' returned 1: ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)
Mar 22 22:37:01 overcloud-controller-0.localdomain os-collect-config[5486]: Error: Could not prefetch mysql_database provider 'mysql': Execution of '/usr/bin/mysql -NBe show databases' returned 1: ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)


Within that same controller 0 this is what I see relating to the SQL process:

# ps -ef | grep mysql
root     41042 38174  0 15:48 pts/0    00:00:00 grep --color=auto mysql
root     42837     1  0 Mar22 ?        00:00:00 /bin/sh /usr/bin/mysqld_safe --defaults-file=/etc/my.cnf --pid-file=/var/run/mysql/mysqld.pid --socket=/var/lib/mysql/mysql.sock --datadir=/var/lib/mysql --log-error=/var/log/mysqld.log --user=mysql --open-files-limit=16384 --wsrep-cluster-address=gcomm://overcloud-controller-0,overcloud-controller-1,overcloud-controller-2
mysql    43729 42837  2 Mar22 ?        00:22:19 /usr/libexec/mysqld --defaults-file=/etc/my.cnf --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --wsrep-provider=/usr/lib64/galera/libgalera_smm.so --wsrep-cluster-address=gcomm://overcloud-controller-0,overcloud-controller-1,overcloud-controller-2 --log-error=/var/log/mysqld.log --open-files-limit=16384 --pid-file=/var/run/mysql/mysqld.pid --socket=/var/lib/mysql/mysql.sock --port=3306 --wsrep_start_position=00000000-0000-0000-0000-000000000000:-1



This particular IP address http://10.19.137.121:5000/v2.0/tokens , I find in 3 locations:

 $ grep -r 10.19.137 /home/stack/
/home/stack/templates/network-environment.yaml:  ExternalAllocationPools: [{'start': '10.19.137.121', 'end': '10.19.137.151'}]
/home/stack/overcloudrc:export no_proxy=,10.19.137.121
/home/stack/overcloudrc:export OS_AUTH_URL=http://10.19.137.121:5000/v2.0

Within my controller ndoes I do not see this 10.19.137.121 address anywhere. Should I?

My external (br-ex) for my controller nodes (3 total) are as follows:

controller0: 10.19.137.124/21
controller1: 10.19.137.122/21
controller2: 10.19.137.123/21

I found this BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1311382 that mentions the "No handlers could be found for logger "oslo_config.cfg" "

However, while the BZ is closed it doesn't actually mention how it was fixed.

Version-Release number of selected component (if applicable):
osp8 beta8

How reproducible:

I've been able to reproduce this multiple times already. 2 times in osp beta7 and now I just recently moved to osp beta8.


Steps to Reproduce:
1. Deploy undercloud
2. Deploy overcloud
3.

Comment 2 Nathan Kinder 2016-03-23 16:09:48 UTC
(In reply to rlopez from comment #0)
> I found this BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1311382 that
> mentions the "No handlers could be found for logger "oslo_config.cfg" "
> 
> However, while the BZ is closed it doesn't actually mention how it was fixed.

This bug mentions that the real issue was in bug#1310956.  There are some root cause details there.

This should not be field against the openstack-keystone component.  I'm moving this to be filed against rhel-osp-director, but you should take a look at the bug referenced above to see if it's the same issue.

Comment 3 rlopez 2016-03-23 16:15:02 UTC
(In reply to Nathan Kinder from comment #2)
> (In reply to rlopez from comment #0)
> > I found this BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1311382 that
> > mentions the "No handlers could be found for logger "oslo_config.cfg" "
> > 
> > However, while the BZ is closed it doesn't actually mention how it was fixed.
> 
> This bug mentions that the real issue was in bug#1310956.  There are some
> root cause details there.
> 
> This should not be field against the openstack-keystone component.  I'm
> moving this to be filed against rhel-osp-director, but you should take a
> look at the bug referenced above to see if it's the same issue.

Thanks Nathan for that very prompt reply. I will check out that bug you referenced.

Comment 4 rlopez 2016-03-23 16:27:07 UTC
(In reply to rlopez from comment #3)
> (In reply to Nathan Kinder from comment #2)
> > (In reply to rlopez from comment #0)
> > > I found this BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1311382 that
> > > mentions the "No handlers could be found for logger "oslo_config.cfg" "
> > > 
> > > However, while the BZ is closed it doesn't actually mention how it was fixed.
> > 
> > This bug mentions that the real issue was in bug#1310956.  There are some
> > root cause details there.
> > 
> > This should not be field against the openstack-keystone component.  I'm
> > moving this to be filed against rhel-osp-director, but you should take a
> > look at the bug referenced above to see if it's the same issue.
> 
> Thanks Nathan for that very prompt reply. I will check out that bug you
> referenced.

Looking at the bug, I don't think its the same problem but obviously not sure.

A couple things, in that bug mentioned he has errors with the overcloud install. My install is status create_completed

$ heat stack-list
+--------------------------------------+------------+-----------------+---------------------+--------------+
| id                                   | stack_name | stack_status    | creation_time       | updated_time |
+--------------------------------------+------------+-----------------+---------------------+--------------+
| 0d837169-b514-4cc5-bd34-e712a41f7656 | overcloud  | CREATE_COMPLETE | 2016-03-22T22:13:00 | None         |
+--------------------------------------+------------+-----------------+---------------------+--------------+

heat resource-list overcloud does not list any errors. all resources show CREATE_COMPLETE.

Comment 5 Marius Cornea 2016-03-23 16:38:00 UTC
Can you check that you get connectivity(ping, curl on port 5000) to 10.19.137.121 from the undercloud node? There is a requirement for the undercloud to be able to reach the overcloud external network.

Comment 6 rlopez 2016-03-23 16:53:44 UTC
(In reply to Marius Cornea from comment #5)
> Can you check that you get connectivity(ping, curl on port 5000) to
> 10.19.137.121 from the undercloud node? There is a requirement for the
> undercloud to be able to reach the overcloud external network.

Marius,

I can't ping 10.19.137.121 at all. It's as if its not assigned to anything.

My external (br-ex) for my controller nodes (3 total) are as follows:

controller0: 10.19.137.124/21
controller1: 10.19.137.122/21
controller2: 10.19.137.123/21

Comment 7 Marius Cornea 2016-03-23 17:00:10 UTC
10.19.137.121 is the public VIP and should be set by a pacemaker resource on one of the controllers (check 'pcs status'). If it's correctly set should be able to see a 10.19.137.121/32 address on one of them.

Comment 8 rlopez 2016-03-23 17:24:57 UTC
Marius,

So on the undercloud is where I attempted the ping of 10.19.137.121 that doesn't ping.

At the controller level, yes I can ping 10.19.137.121


Within controller-0 curl gives:

# curl http://10.19.137.121:5000
{"versions": {"values": [{"status": "stable", "updated": "2015-03-30T00:00:00Z", "media-types": [{"base": "application/json", "type": "application/vnd.openstack.identity-v3+json"}], "id": "v3.4", "links": [{"href": "http://10.19.137.121:5000/v3/", "rel": "self"}]}, {"status": "stable", "updated": "2014-04-17T00:00:00Z", "media-types": [{"base": "application/json", "type": "application/vnd.openstack.identity-v2.0+json"}], "id": "v2.0", "links": [{"href": "http://10.19.137.121:5000/v2.0/", "rel": "self"}, {"href": "http://docs.openstack.org/", "type": "text/html", "rel": "describedby"}]}]}}

# pcs status
Cluster name: tripleo_cluster
Last updated: Wed Mar 23 17:20:21 2016		Last change: Tue Mar 22 22:52:23 2016 by root via cibadmin on overcloud-controller-0
Stack: corosync
Current DC: overcloud-controller-0 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
3 nodes and 112 resources configured

Online: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]

Full list of resources:

 Clone Set: haproxy-clone [haproxy]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 ip-192.0.2.21	(ocf::heartbeat:IPaddr2):	Started overcloud-controller-0
 ip-172.16.1.11	(ocf::heartbeat:IPaddr2):	Started overcloud-controller-1
 ip-10.19.137.121	(ocf::heartbeat:IPaddr2):	Started overcloud-controller-2
 ip-172.16.2.10	(ocf::heartbeat:IPaddr2):	Started overcloud-controller-0
 ip-172.16.1.10	(ocf::heartbeat:IPaddr2):	Started overcloud-controller-1
 ip-172.16.3.10	(ocf::heartbeat:IPaddr2):	Started overcloud-controller-2

[Rest of Output Omitted]

I can see that the 10.19.137.121 address is part of overcloud-controller-2


So is my issue that my undercloud can't reach 10.19.137.121?

Comment 9 Marius Cornea 2016-03-23 17:27:47 UTC
(In reply to rlopez from comment #8)
> Marius,
> 
> So on the undercloud is where I attempted the ping of 10.19.137.121 that
> doesn't ping.
> 
> At the controller level, yes I can ping 10.19.137.121
> 
> 
> Within controller-0 curl gives:
> 
> # curl http://10.19.137.121:5000
> {"versions": {"values": [{"status": "stable", "updated":
> "2015-03-30T00:00:00Z", "media-types": [{"base": "application/json", "type":
> "application/vnd.openstack.identity-v3+json"}], "id": "v3.4", "links":
> [{"href": "http://10.19.137.121:5000/v3/", "rel": "self"}]}, {"status":
> "stable", "updated": "2014-04-17T00:00:00Z", "media-types": [{"base":
> "application/json", "type":
> "application/vnd.openstack.identity-v2.0+json"}], "id": "v2.0", "links":
> [{"href": "http://10.19.137.121:5000/v2.0/", "rel": "self"}, {"href":
> "http://docs.openstack.org/", "type": "text/html", "rel": "describedby"}]}]}}
> 
> # pcs status
> Cluster name: tripleo_cluster
> Last updated: Wed Mar 23 17:20:21 2016		Last change: Tue Mar 22 22:52:23
> 2016 by root via cibadmin on overcloud-controller-0
> Stack: corosync
> Current DC: overcloud-controller-0 (version 1.1.13-10.el7_2.2-44eb2dd) -
> partition with quorum
> 3 nodes and 112 resources configured
> 
> Online: [ overcloud-controller-0 overcloud-controller-1
> overcloud-controller-2 ]
> 
> Full list of resources:
> 
>  Clone Set: haproxy-clone [haproxy]
>      Started: [ overcloud-controller-0 overcloud-controller-1
> overcloud-controller-2 ]
>  ip-192.0.2.21	(ocf::heartbeat:IPaddr2):	Started overcloud-controller-0
>  ip-172.16.1.11	(ocf::heartbeat:IPaddr2):	Started overcloud-controller-1
>  ip-10.19.137.121	(ocf::heartbeat:IPaddr2):	Started overcloud-controller-2
>  ip-172.16.2.10	(ocf::heartbeat:IPaddr2):	Started overcloud-controller-0
>  ip-172.16.1.10	(ocf::heartbeat:IPaddr2):	Started overcloud-controller-1
>  ip-172.16.3.10	(ocf::heartbeat:IPaddr2):	Started overcloud-controller-2
> 
> [Rest of Output Omitted]
> 
> I can see that the 10.19.137.121 address is part of overcloud-controller-2
> 
> 
> So is my issue that my undercloud can't reach 10.19.137.121?

Yes, the undercloud needs to be able to reach 10.19.137.121.

Comment 10 rlopez 2016-03-23 17:30:38 UTC
Once I fix this undercloud problem not being able to hit 10.19.137.121, do I need to restart any services?

Thank you so much btw for all your help.

Comment 11 Marius Cornea 2016-03-23 17:37:34 UTC
(In reply to rlopez from comment #10)
> Once I fix this undercloud problem not being able to hit 10.19.137.121, do I
> need to restart any services?
> 
> Thank you so much btw for all your help.

I'd recommend you to redeploy after you fix the connectivity issue(heat stack-delete overcloud and rerun openstack overcloud deploy command).

Comment 12 rlopez 2016-03-23 18:24:14 UTC
(In reply to Marius Cornea from comment #11)
> (In reply to rlopez from comment #10)
> > Once I fix this undercloud problem not being able to hit 10.19.137.121, do I
> > need to restart any services?
> > 
> > Thank you so much btw for all your help.
> 
> I'd recommend you to redeploy after you fix the connectivity issue(heat
> stack-delete overcloud and rerun openstack overcloud deploy command).

Marius,

How is the vip assigned? The reason I ask is I don't have anything in my undercloud.conf with regards to the 10.19.137.0/24 address and that 10.19.137.121 is coming from the external allocation pool.

Is this correct? Should the vip be an IP from the provisioning network not external network?

Comment 13 Marius Cornea 2016-03-23 18:31:56 UTC
(In reply to rlopez from comment #12)
> (In reply to Marius Cornea from comment #11)
> > (In reply to rlopez from comment #10)
> > > Once I fix this undercloud problem not being able to hit 10.19.137.121, do I
> > > need to restart any services?
> > > 
> > > Thank you so much btw for all your help.
> > 
> > I'd recommend you to redeploy after you fix the connectivity issue(heat
> > stack-delete overcloud and rerun openstack overcloud deploy command).
> 
> Marius,
> 
> How is the vip assigned? The reason I ask is I don't have anything in my
> undercloud.conf with regards to the 10.19.137.0/24 address and that
> 10.19.137.121 is coming from the external allocation pool.
> 
> Is this correct? Should the vip be an IP from the provisioning network not
> external network?

Yes, that's correct - it is the first address of the ExternalAllocationPools range in the network-environment.yaml

Comment 14 rlopez 2016-03-24 04:04:14 UTC
Marius,

Deleted the overcloud and attempted to reinstall but that still failed.

I redeployed my undercloud and then the overcloud and it was successful! Thank you for assisting. Essentially my issue was I didn't properly set the vlan to my blades  hence 10.19.137.121 was only accessible within the controller nodes and not able to communicate with the udnercloud.


Roger

Comment 15 Mike Burns 2016-04-07 21:36:02 UTC
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.