Bug 1420536 - Refresh of infrastructure provider fails with bad request with OSP director as provider
Summary: Refresh of infrastructure provider fails with bad request with OSP director a...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Providers
Version: 5.7.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: GA
Target Release: 5.8.0
Assignee: Tzu-Mainn Chen
QA Contact: Ola Pavlenko
URL:
Whiteboard: openstack
Duplicates: 1421421 (view as bug list)
Depends On:
Blocks: 1415544 1420916
Reported: 2017-02-08 22:20 UTC by michael_rasoulian
Modified: 2020-04-15 15:14 UTC (History)

Fixed In Version: 5.8.0.1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1415544
Clones: 1420916 1420919 (view as bug list)
Environment:
Last Closed: 2017-06-12 16:42:37 UTC
Category: ---
Cloudforms Team: Openstack
Target Upstream Version:
Embargoed:


Attachments
EVM log for 4.2 CF appliance (4.04 MB, text/plain)
2017-02-08 22:20 UTC, michael_rasoulian
EVM log for 4.1 appliance (2.64 MB, text/plain)
2017-02-08 22:21 UTC, michael_rasoulian
heat resource-list output n 50 (172.28 KB, text/plain)
2017-02-09 16:03 UTC, michael_rasoulian
heat resource-list output n 2 (124.07 KB, text/plain)
2017-02-09 16:03 UTC, michael_rasoulian
evm.log after one-line change (12.60 MB, text/plain)
2017-02-09 17:59 UTC, michael_rasoulian
fog.log after one-line change (55.55 KB, text/plain)
2017-02-09 18:00 UTC, michael_rasoulian
"rpm -qa | grep openstack" output (2.83 KB, text/plain)
2017-02-09 18:01 UTC, michael_rasoulian

Description michael_rasoulian 2017-02-08 22:20:25 UTC
Created attachment 1248676 [details]
EVM log for 4.2 CF appliance

Description of problem:
After successfully adding OSP director as an infrastructure provider (credentials validate successfully), the refresh fails and no infrastructure data is populated.  The error given is:

<Excon::Error::BadRequest: Expected(200) <=> Actual(400 Bad Request)
excon.error.response

Attached is the evm_4.2.log with more error details.

This issue is not seen in CF 4.1 with the same configuration.  Attached is the evm_4.1.log which refreshes cleanly.


Version-Release number of selected component (if applicable):
CF 4.2

How reproducible:
Refresh is never successful.

Steps to Reproduce:
1. Add OSP director as an infrastructure provider, selecting either AMQP or Ceilometer for Events.
2. Once added, attempt a refresh of the infrastructure provider to collect and populate the data.

Actual results:
Error in status seen in "Last Refresh"

Expected results:
Refresh status successful

Additional info:

Comment 2 michael_rasoulian 2017-02-08 22:21:15 UTC
Created attachment 1248677 [details]
EVM log for 4.1 appliance

Comment 3 Tzu-Mainn Chen 2017-02-09 15:30:59 UTC
Hi!  It looks like the error is caused by something we changed between 4.1 and 4.2 to make the infra provider's refresh more efficient: we added a filter to the stack resource list query.  However, I can't seem to reproduce the error you're getting.

Could you tell me what version of OSP you're using, and then run the following through the openstack CLI?


nova list
heat resource-list -n 50 --filter 'physical_resource_id=<nova server id 1>;physical_resource_id=<nova server id 2>;physical_resource_id=<etc>'

And then run it again with '-n 2'?


So for example:


[stack@instack ~]$ nova list
+--------------------------------------+------------------------+--------+------------+-------------+---------------------+
| ID                                   | Name                   | Status | Task State | Power State | Networks            |
+--------------------------------------+------------------------+--------+------------+-------------+---------------------+
| fcd55792-2134-4e2f-844a-0511a074d71b | overcloud-compute-0    | ACTIVE | -          | Running     | ctlplane=192.0.2.16 |
| dc883110-979a-4706-bf87-cd5954383a64 | overcloud-controller-0 | ACTIVE | -          | Running     | ctlplane=192.0.2.12 |


[stack@instack ~]$ heat resource-list -n 50 --filter 'physical_resource_id=fcd55792-2134-4e2f-844a-0511a074d71b;physical_resource_id=dc883110-979a-4706-bf87-cd5954383a64' overcloud
WARNING (shell) "heat resource-list" is deprecated, please use "openstack stack resource list" instead
+---------------+--------------------------------------+---------------------+-----------------+----------------------+--------------------------------------------------+
| resource_name | physical_resource_id                 | resource_type       | resource_status | updated_time         | stack_name                                       |
+---------------+--------------------------------------+---------------------+-----------------+----------------------+--------------------------------------------------+
| Controller    | dc883110-979a-4706-bf87-cd5954383a64 | OS::TripleO::Server | CREATE_COMPLETE | 2017-02-08T20:24:36Z | overcloud-Controller-irlqbti2ss23-0-io26vdivgb32 |
| NovaCompute   | fcd55792-2134-4e2f-844a-0511a074d71b | OS::TripleO::Server | CREATE_COMPLETE | 2017-02-08T20:24:38Z | overcloud-Compute-xc2ksahjmtfu-0-kf2r4yfbyc3x    |
+---------------+--------------------------------------+---------------------+-----------------+----------------------+--------------------------------------------------+
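
For more than a couple of servers, the filter argument is easier to generate than to type. The following is only a sketch: physical_resource_filter is a hypothetical helper name (not part of ManageIQ or any OpenStack client), and the IDs are the two from the example nova list output above.

```ruby
# Hypothetical helper: join nova server IDs into the ';'-separated
# filter string used in the heat command above.
def physical_resource_filter(server_ids)
  server_ids.map { |id| "physical_resource_id=#{id}" }.join(';')
end

# IDs copied from the example `nova list` output.
server_ids = %w[
  fcd55792-2134-4e2f-844a-0511a074d71b
  dc883110-979a-4706-bf87-cd5954383a64
]

filter = physical_resource_filter(server_ids)

# Print the full command so it can be pasted into a shell.
puts "heat resource-list -n 50 --filter '#{filter}' overcloud"
```

Substitute the IDs from your own nova list output and the name of your stack.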

Comment 4 michael_rasoulian 2017-02-09 15:47:26 UTC
We are using OSP 9.0.


[osp_admin@director ~]$ nova list
+--------------------------------------+-------------------------+--------+------------+-------------+--------------------------+
| ID                                   | Name                    | Status | Task State | Power State | Networks                 |
+--------------------------------------+-------------------------+--------+------------+-------------+--------------------------+
| 2b669962-e924-49f6-abb2-90dcc05e9a2d | overcloud-cephstorage-0 | ACTIVE | -          | Running     | ctlplane=192.168.120.140 |
| 8c6ddb1f-c9a5-4693-87e4-7d8ac587067d | overcloud-cephstorage-1 | ACTIVE | -          | Running     | ctlplane=192.168.120.127 |
| 18b7d918-13aa-4b9d-af26-68f96c4472db | overcloud-cephstorage-2 | ACTIVE | -          | Running     | ctlplane=192.168.120.126 |
| 1e0c5d0e-f13d-49cf-81c7-04cefc4d1c86 | overcloud-compute-0     | ACTIVE | -          | Running     | ctlplane=192.168.120.129 |
| 53ac570d-cf52-45ba-9a7e-f0d14b9d6ab6 | overcloud-compute-1     | ACTIVE | -          | Running     | ctlplane=192.168.120.141 |
| 127f31bf-6891-4d34-96e6-429f2fa3297c | overcloud-controller-0  | ACTIVE | -          | Running     | ctlplane=192.168.120.139 |
| ae3e5d17-0544-42fb-9c34-ae9ca6a8d913 | overcloud-controller-1  | ACTIVE | -          | Running     | ctlplane=192.168.120.128 |
| 0908613d-8847-4e28-b2fd-595c3fdbac17 | overcloud-controller-2  | ACTIVE | -          | Running     | ctlplane=192.168.120.146 |
+--------------------------------------+-------------------------+--------+------------+-------------+--------------------------+


When I run the 2nd command as printed above, I get a "too few arguments" error.  If I include the stack name, I get:

[osp_admin@director ~]$ heat resource-list overcloud -n 50 --filter 'physical_resource_id=<nova server id 1>;physical_resource_id=<nova server id 2>;physical_resource_id=<etc>'
WARNING (shell) "heat resource-list" is deprecated, please use "openstack stack resource list" instead
ERROR: type object 'Resource' has no attribute 'physical_resource_id'

Comment 5 Tzu-Mainn Chen 2017-02-09 15:56:08 UTC
Ah, yep, sorry about that.  Can you attach the output of just 'heat resource-list -n 50 overcloud'?

Comment 6 michael_rasoulian 2017-02-09 16:03:04 UTC
Created attachment 1248856 [details]
heat resource-list output n 50

Comment 7 michael_rasoulian 2017-02-09 16:03:33 UTC
Created attachment 1248857 [details]
heat resource-list output n 2

Comment 8 Tzu-Mainn Chen 2017-02-09 16:15:12 UTC
Okay - I'm not sure I know why the heat resource-list query fails, but I think I know what will fix it.  The 4.2 evm.log shows that the fog error involves this API call:

/v1/f626a28f15474e07af7734d60999f045/stacks/overcloud-CephStorageNodesPostDeployment-l6ur5gyetwlz-ExtraConfig-k5qw34aixb5g-ExtraDeployments-3e7a3d66tlz7/beb83f53-c7f3-4221-b3e1-caaa338f1ec7/resources

I'm guessing that the recursive resource query fails on that for some reason I don't understand.

The good news is that in 4.2 this recursive query goes 50 deep, when in fact we only need to go 2 deep to get the information we need.  The resource specified in the failing API call is present when you specify a depth of 50, but *not* when you specify a depth of 2.

Would it be possible to try and apply this one-line change to your CF 4.2 instance?

https://github.com/ManageIQ/manageiq/pull/13748

If that works, I can try and get it included in 4.2.1.
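
To illustrate why a smaller depth can sidestep the failing call, here is a toy model of a depth-limited resource listing. It is a sketch under assumptions: the Stack struct, the list_resources function and all stack and resource names below are invented for illustration, and this is not how Heat or fog implement nested_depth.

```ruby
# Toy model of nested stacks; every name here is invented for illustration.
Stack = Struct.new(:name, :resources, :children)

# Collect resource names from a stack and its nested stacks, descending
# at most `nested_depth` levels below the top-level stack.
def list_resources(stack, nested_depth)
  found = stack.resources.dup
  return found if nested_depth.zero?

  stack.children.each do |child|
    found.concat(list_resources(child, nested_depth - 1))
  end
  found
end

# The server resource sits two levels below the top-level stack...
member = Stack.new('controller-0', ['Controller'], [])
group  = Stack.new('controller-group', ['0'], [member])

# ...while the extra-deployments stack sits three levels below it.
deep   = Stack.new('extra-deployments', ['ExtraDeployment'], [])
config = Stack.new('extra-config', ['ExtraConfig'], [deep])
post   = Stack.new('post-deployment', ['PostDeployment'], [config])

top = Stack.new('overcloud', %w[ControllerGroup PostDeploymentStep], [group, post])

shallow = list_resources(top, 2)   # reaches 'Controller', never visits `deep`
full    = list_resources(top, 50)  # visits every nested stack

puts "depth 2:  #{shallow.inspect}"
puts "depth 50: #{full.inspect}"
```

In this model a depth of 2 still reaches the server resource but never descends into the deeper stack, which mirrors the difference seen between the two attached resource-list outputs.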

Comment 9 arkady kanevsky 2017-02-09 16:39:12 UTC
When is 4.2.1 expected to be released?

Looks like a simple fix of replacing
heat resource-list with
openstack stack resource list
should fix the problem.

How did it pass QE?
Did anybody test CF-4.2 with OSP9 or newer?

Comment 10 Tzu-Mainn Chen 2017-02-09 16:46:54 UTC
Hi Arkady.  It's actually not a question of replacing a command; both of those CLI commands are equivalent to the single fog command we're using, and won't cause an error here.

We've definitely tested 4.2 on OSP9 and are currently developing against OSP10, and we've never seen this error.  However, we can't test every possible overcloud deployment, and it looks like this is one that caused an issue.  (Although this is an infra provider refresh issue, the error happens when the infra provider is trying to analyze its deployed overcloud.)

The good news is I'm pretty sure the one-line fix will resolve the issue, but it would be great if we could get some confirmation.  4.2.1 GA is targeted for February 22nd.

Comment 11 michael_rasoulian 2017-02-09 16:57:18 UTC
I'll test the fix shortly and report back with the refresh results.

Comment 12 michael_rasoulian 2017-02-09 17:24:06 UTC
I'm still getting the same refresh error with the one-line fix implemented.

Here's where I made the change:

[root@localhost infra_manager]# pwd
/var/www/miq/vmdb/app/models/manageiq/providers/openstack/infra_manager
[root@localhost infra_manager]# grep orchestration_service.list_resources refresh_parser.rb
        @orchestration_service.list_resources(:stack => stack, :nested_depth => 2, :physical_resource_id => server_ids).body['resources']

Comment 13 Tzu-Mainn Chen 2017-02-09 17:37:58 UTC
Strange - can you attach evm.log and fog.log?

Comment 14 michael_rasoulian 2017-02-09 17:59:31 UTC
Created attachment 1248871 [details]
evm.log after one-line change

Comment 15 michael_rasoulian 2017-02-09 18:00:13 UTC
Created attachment 1248872 [details]
fog.log after one-line change

Comment 16 michael_rasoulian 2017-02-09 18:01:42 UTC
Created attachment 1248873 [details]
"rpm -qa | grep openstack" output

Detail on our installed packages

Comment 17 Tzu-Mainn Chen 2017-02-09 18:02:20 UTC
Actually, never mind - I figured it out.  The Heat filtering option is a fairly recent feature; I believed it was added in Mitaka, which corresponds to OSP9, but perhaps it actually slipped to the following release.

To confirm: the line you changed before, can you update to the following?

        @orchestration_service.list_resources(:stack => stack, :nested_depth => 2).body['resources']

Sorry for the confusion.
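
If only the server-backed resources are needed once the server-side filter is gone, the same narrowing could in principle be done on the client after the unfiltered list comes back. This is a sketch of that idea only, not necessarily what the eventual fix does: select_server_resources is a hypothetical helper, the 'ExtraDeployments' entry and its ID are made up, and the hash keys simply follow the column names in the heat resource-list output.

```ruby
require 'set'

# Hypothetical helper, not part of ManageIQ: keep only the resources whose
# physical_resource_id matches one of the known nova server IDs.
def select_server_resources(resources, server_ids)
  wanted = server_ids.to_set
  resources.select { |r| wanted.include?(r['physical_resource_id']) }
end

# Server IDs taken from the nova list output in comment 4.
server_ids = %w[
  127f31bf-6891-4d34-96e6-429f2fa3297c
  1e0c5d0e-f13d-49cf-81c7-04cefc4d1c86
]

# Stand-in for the 'resources' array of an unfiltered response body.
resources = [
  { 'resource_name' => 'Controller',
    'physical_resource_id' => '127f31bf-6891-4d34-96e6-429f2fa3297c' },
  { 'resource_name' => 'NovaCompute',
    'physical_resource_id' => '1e0c5d0e-f13d-49cf-81c7-04cefc4d1c86' },
  { 'resource_name' => 'ExtraDeployments',            # made-up non-server entry
    'physical_resource_id' => 'not-a-nova-server-id' }
]

servers = select_server_resources(resources, server_ids)
puts servers.map { |r| r['resource_name'] }.inspect
```

Filtering this way costs a larger response, but it does not depend on the Heat version supporting the filter parameter.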

Comment 18 michael_rasoulian 2017-02-09 19:24:40 UTC
That worked!  However, it seems I had to restart the appliance for the change to take effect.  The refresh was successful after the reboot.

Is the filtering referenced anywhere else that would potentially cause another issue?

Comment 19 Tzu-Mainn Chen 2017-02-09 19:31:33 UTC
Nope, the filtering only happens in this one place in the infra provider.  I've already tested this fix against OSP10 successfully as well, so I'll create a PR and get it into 4.2.1.  Thanks for your patience!

Comment 21 Tzu-Mainn Chen 2017-02-13 15:56:04 UTC
*** Bug 1421421 has been marked as a duplicate of this bug. ***

Comment 22 Ronnie Rasouli 2017-03-07 11:54:15 UTC
Verified on 5.8.0.3 with RHOS9.

