Bug 1659866

Summary: Attempts to save initial overcloud state using "tempest cleanup --init-saved-state" fail for multiple reasons
Product: Red Hat OpenStack
Reporter: Ganesh Kadam <gkadam>
Component: openstack-tempest
Assignee: Chandan Kumar <chkumar>
Status: CLOSED WORKSFORME
QA Contact: Martin Kopec <mkopec>
Severity: urgent
Priority: urgent
Version: 13.0 (Queens)
CC: apevec, chkumar, lhh, slinaber, udesale
Keywords: Triaged
Hardware: x86_64
OS: Linux
Last Closed: 2019-01-14 16:19:00 UTC
Type: Bug

Description Ganesh Kadam 2018-12-17 05:05:14 UTC
Description of problem:

One of our customers is trying to save the initial overcloud state in their RHOSP 13 environment, but the tempest cleanup --init-saved-state command is failing for multiple reasons:

1. Tempest still tries to reach Keystone over the adminURL endpoint, which is not reachable from the undercloud. The identity section of tempest.conf was updated to force Tempest to use the publicURL, but Tempest appears to ignore this (at least for the cleanup --init-saved-state operation); see the example configuration after the log excerpt below.

2. The customer was able to update the Python scripts to get past the first issue, but Tempest then tries to list the current roles via the Keystone v2 API, which has been deprecated since Queens:
~~~
2018-12-01 01:17:18.481 589448 INFO tempest.lib.common.rest_client [req-7079564c-3525-48ce-9a3e-654c4cf94280 ] Request (main): 404 GET http://10.0.3.10:5000/v2.0/OS-KSADM/roles 0.150s
~~~
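
For reference, here is a minimal sketch of the [identity] settings in tempest.conf that should point Tempest at the public Keystone v3 endpoint; the URL below is a placeholder taken from the logs in this report, and the exact values depend on the deployment:

~~~
# tempest.conf excerpt (sketch; the URL is a placeholder for this environment)
[identity]
# Keystone v3 endpoint on the public API network
uri_v3 = http://10.0.3.10:5000/v3
# Use the v3 API so the deprecated v2 OS-KSADM calls are avoided
auth_version = v3
# Request the public endpoints from the service catalog instead of adminURL
v3_endpoint_type = publicURL
~~~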

As for the configuration, the customer is generating tempest.conf manually.

Note that the first issue will not be seen if Tempest is run directly from a controller that can reach the adminURL; in this case, the customer is running Tempest from the undercloud, which has no connectivity to the adminURL.

The point is that the tempest.conf configuration clearly points to the publicURL of Keystone, but this is in fact being ignored by Tempest.

~~~
Command: tempest cleanup --init-saved-state

Errors in the logs show both the attempt to connect over the Internal API network (aka adminURL) and the use of the Keystone v2 endpoint, which is deprecated in RHOSP Queens:

2018-12-12 22:29:11.050 3598 DEBUG tempest.cmd.cleanup_service [-] List count, 2 Domains after reconcile list /usr/lib/python2.7/site-packages/tempest/cmd/cleanup_service.py:897
2018-12-12 22:30:11.112 3598 WARNING urllib3.connectionpool [-] Retrying (Retry(total=9, connect=None, read=None, redirect=5, status=None)) after connection broken by 'ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f1a1a220290>, 'Connection to 10.0.3.10 timed out. (connect timeout=60)')': /v2.0/OS-KSADM/roles: ConnectTimeoutError: (<urllib3.connection.HTTPConnection object at 0x7f1a1a220290>, 'Connection to 10.0.3.10 timed out. (connect timeout=60)')
2018-12-12 22:31:11.137 3598 WARNING urllib3.connectionpool [-] Retrying (Retry(total=8, connect=None, read=None, redirect=5, status=None)) after connection broken by 'ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f1a1a220310>, 'Connection to 10.0.3.10 timed out. (connect timeout=60)')': /v2.0/OS-KSADM/roles: ConnectTimeoutError: (<urllib3.connection.HTTPConnection object at 0x7f1a1a220310>, 'Connection to 10.0.3.10 timed out. (connect timeout=60)')
2018-12-12 22:32:11.197 3598 WARNING urllib3.connectionpool [-] Retrying (Retry(total=7, connect=None, read=None, redirect=5, status=None)) after connection broken by 'ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f1a1a220850>, 'Connection to 10.0.3.10 timed out. (connect timeout=60)')': /v2.0/OS-KSADM/roles: ConnectTimeoutError: (<urllib3.connection.HTTPConnection object at 0x7f1a1a220850>, 'Connection to 10.0.3.10 timed out. (connect timeout=60)')
2018-12-12 22:33:11.254 3598 WARNING urllib3.connectionpool [-] Retrying (Retry(total=6, connect=None, read=None, redirect=5, status=None)) after connection broken by 'ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f1a1a220690>, 'Connection to 10.0.3.10 timed out. (connect timeout=60)')': /v2.0/OS-KSADM/roles: ConnectTimeoutError: (<urllib3.connection.HTTPConnection object at 0x7f1a1a220690>, 'Connection to 10.0.3.10 timed out. (connect timeout=60)')
2018-12-12 22:34:11.315 3598 WARNING urllib3.connectionpool [-] Retrying (Retry(total=5, connect=None, read=None, redirect=5, status=None)) after connection broken by 'ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f1a1a220a50>, 'Connection to 10.0.3.10 timed out. (connect timeout=60)')': /v2.0/OS-KSADM/roles: ConnectTimeoutError: (<urllib3.connection.HTTPConnection object at 0x7f1a1a220a50>, 'Connection to 10.0.3.10 timed out. (connect timeout=60)')
~~~

Version-Release number of selected component (if applicable):

$ rpm -qa | grep tempest
python2-tempest-18.0.0-2.el7ost.noarch
python2-tempestconf-1.1.5-0.20180326143753.f9d956f.el7ost.noarch
openstack-tempest-18.0.0-2.el7ost.noarch


Actual results:
When Tempest is run from the undercloud, tempest cleanup --init-saved-state fails.

Expected results:
Ability to run Tempest from the undercloud, with tempest cleanup --init-saved-state completing successfully.

Comment 2 Martin Kopec 2019-01-07 09:54:29 UTC
The biggest problem I see is the version of python-tempestconf. The tool has gone through a big refactoring, and many improvements were implemented in python-tempestconf-2.0.0 and higher.
So if you used a tempest.conf generated by this old version of python-tempestconf, it is possible that it contained wrongly set credentials or endpoint URLs, which could result in the issues you described.
I'd recommend using the newest python-tempestconf package and trying again.
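For example, a hypothetical update step, assuming the RHOSP 13 repositories are enabled on the host where Tempest runs:

~~~
$ sudo yum update python2-tempestconf openstack-tempest python2-tempest
$ rpm -q python2-tempestconf    # expect 2.0.0 or newer
~~~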
I'm going to try to reproduce this issue using the package versions you used and I'll let you know how it goes.

Comment 4 Martin Kopec 2019-01-14 16:19:00 UTC
I have tried to reproduce this issue using the latest packages:
rpm -qa | grep tempest
openstack-tempest-18.0.0-5.el7ost.noarch
python2-horizon-tests-tempest-0.0.1-0.20180219094157.a23f407.el7ost.noarch
python2-tempest-18.0.0-5.el7ost.noarch
python2-tempestconf-2.0.0-1.el7ost.noarch

as well as the packages mentioned in the bug description:
$ rpm -qa | grep tempest
python2-tempest-18.0.0-2.el7ost.noarch
python2-tempestconf-1.1.5-0.20180326143753.f9d956f.el7ost.noarch
openstack-tempest-18.0.0-2.el7ost.noarch

but I couldn't reproduce it.

As I mentioned earlier, the biggest problem I see is the old version of python-tempestconf, which went through a major refactoring in version 2.0.0 and higher. I'd recommend updating the tempest packages.
Another problem, which I couldn't take into account in my testing, is that the customer generated tempest.conf manually. Please try to follow this documentation [1].
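
As a rough sketch of the documented workflow (the workspace path, deployer-input file name, and credential variables are assumptions and will differ per environment):

~~~
$ source ~/overcloudrc                      # overcloud admin credentials
$ tempest init ~/mytempest                  # create a Tempest workspace
$ cd ~/mytempest
$ discover-tempest-config --deployer-input ~/tempest-deployer-input.conf \
      --debug --create identity.uri $OS_AUTH_URL identity.admin_password $OS_PASSWORD
~~~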

I'm going to close this bug, but if you experience the issue again, please reopen it.
If you reopen the bug, please provide the following information:
 - tempest.conf used (the best way to share is via attachments)
 - list of packages installed ($ rpm -qa | grep tempest)
 - the exact steps that lead to the error (from creating the Tempest workspace and generating tempest.conf to running Tempest)

[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html-single/openstack_integration_test_suite_guide/index

Best regards,
Martin Kopec