Bug 1808888

Summary: Ironic bare-metal discovery not working with idrac driver type (failed to get power state - WSMan request failed)
Product: Red Hat OpenStack
Reporter: Nenad Peric <nperic>
Component: openstack-ironic
Assignee: RHOS Maint <rhos-maint>
Status: CLOSED CANTFIX
QA Contact: Alistair Tonner <atonner>
Severity: unspecified
Priority: unspecified
Docs Contact:
Version: 16.0 (Train)
CC: bfournie, christopher_dearborn, dtantsur, ietingof, mburns
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-03-02 13:32:48 UTC
Type: Bug

Description Nenad Peric 2020-03-01 21:26:13 UTC
Description of problem:

When enrolling overcloud bare-metal nodes, the process fails when the idrac pm_type is used.
 

Version-Release number of selected component (if applicable):

python3-ironic-inspector-client-3.7.0-0.20190923163033.d95a4cd.el8ost.noarch
puppet-ironic-15.4.1-0.20191022165413.8fe6978.el8ost.noarch


How reproducible:

Every time


Steps to Reproduce:

Set pm_type: idrac for the nodes in ~/nodes.json (an illustrative snippet follows the import command below).

Try to add the nodes with:

openstack overcloud node import ~/nodes.json
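
The nodes.json entries look roughly like this (illustrative only; the name, credentials and MAC below are placeholders):

{
    "nodes": [
        {
            "name": "node-0",
            "pm_type": "idrac",
            "pm_addr": "10.19.136.1",
            "pm_user": "root",
            "pm_password": "<idrac-password>",
            "mac": ["aa:bb:cc:dd:ee:ff"]
        }
    ]
}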



Actual results:

[{'result': 'Node 2870b3e3-d654-4aac-8b1c-8f3d8120dbc1 did not reach state "manageable", the state is "enroll", error: Failed to get power state for node 2870b3e3-d654-4aac-8b1c-8f3d8120dbc1. Error: DRAC operation failed. Reason: WSMan request failed'}, {'result': 'Node d6440e61-4a61-430a-ae4c-644dff6b0fef did not reach state "manageable", the state is "enroll", error: Failed to get power state for node d6440e61-4a61-430a-ae4c-644dff6b0fef. Error: DRAC operation failed. Reason: WSMan request failed'}, {'result': 'Node 6f9fcedd-0711-494f-9017-db60f5e7147e did not reach state "manageable", the state is "enroll", error: Failed to get power state for node 6f9fcedd-0711-494f-9017-db60f5e7147e. Error: DRAC operation failed. Reason: WSMan request failed'}]
{'result': 'Failure caused by error in tasks: send_message\n\n  send_message [task_ex_id=1c6b2cc6-cf5e-40f1-954f-e1ef30377103] -> Workflow failed due to message status. Status:FAILED Message:({\'result\': \'Node 2870b3e3-d654-4aac-8b1c-8f3d8120dbc1 did not reach state "manageable", the state is "enroll", error: Failed to get power state for node 2870b3e3-d654-4aac-8b1c-8f3d8120dbc1. Error: DRAC operation failed. Reason: WSMan request failed\'}, {\'result\': \'Node d6440e61-4a61-430a-ae4c-644dff6b0fef did not reach state "manageable", the state is "enroll", error: Failed to get power state for node d6440e61-4a61-430a-ae4c-644dff6b0fef. Error: DRAC operation failed. Reason: WSMan request failed\'}, {\'result\': \'Node 6f9fcedd-0711-494f-9017-db60f5e7147e did not reach state "manageable", the state is "enroll", error: Failed to get power state for node 6f9fcedd-0711-494f-9017-db60f5e7147e. Error: DRAC operation failed. Reason: WSMan request failed\'})\n    [wf_ex_id=6b8e31a9-6e86-4184-be63-57a1915dc63d, idx=0]: Workflow failed due to message status. Status:FAILED Message:({\'result\': \'Node 2870b3e3-d654-4aac-8b1c-8f3d8120dbc1 did not reach state "manageable", the state is "enroll", error: Failed to get power state for node 2870b3e3-d654-4aac-8b1c-8f3d8120dbc1. Error: DRAC operation failed. Reason: WSMan request failed\'}, {\'result\': \'Node d6440e61-4a61-430a-ae4c-644dff6b0fef did not reach state "manageable", the state is "enroll", error: Failed to get power state for node d6440e61-4a61-430a-ae4c-644dff6b0fef. Error: DRAC operation failed. Reason: WSMan request failed\'}, {\'result\': \'Node 6f9fcedd-0711-494f-9017-db60f5e7147e did not reach state "manageable", the state is "enroll", error: Failed to get power state for node 6f9fcedd-0711-494f-9017-db60f5e7147e. Error: DRAC operation failed. Reason: WSMan request failed\'})\n', 'status': 'FAILED', 'message': [{'result': 'Node 2870b3e3-d654-4aac-8b1c-8f3d8120dbc1 did not reach state "manageable", the state is "enroll", error: Failed to get power state for node 2870b3e3-d654-4aac-8b1c-8f3d8120dbc1. Error: DRAC operation failed. Reason: WSMan request failed'}, {'result': 'Node d6440e61-4a61-430a-ae4c-644dff6b0fef did not reach state "manageable", the state is "enroll", error: Failed to get power state for node d6440e61-4a61-430a-ae4c-644dff6b0fef. Error: DRAC operation failed. Reason: WSMan request failed'}, {'result': 'Node 6f9fcedd-0711-494f-9017-db60f5e7147e did not reach state "manageable", the state is "enroll", error: Failed to get power state for node 6f9fcedd-0711-494f-9017-db60f5e7147e. Error: DRAC operation failed. Reason: WSMan request failed'}]}


Main part being:

error: Failed to get power state for node 


When pm_type is set to IPMI (pm_type: ipmi), however, it works:

3 node(s) successfully moved to the "manageable" state.
Successfully registered node UUID a7ab6673-e2d1-4f75-8d04-19210817e411
Successfully registered node UUID 84ecf90c-1b09-44b0-9459-8ae6ee3ddb58
Successfully registered node UUID bd4d53f6-ad0b-48f2-bfcf-f8ee136509be


The hardware in both cases is the same: blades in a Dell m1000e chassis.

Expected results:

Enrollment and introspection should work with iDRAC if that is still a supported pm_type.


Additional info:

If idrac is no longer supported for some reason, it should be removed from the sample undercloud.conf, and a warning about this problem should be added to the documentation as well.
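
For reference, the sample undercloud.conf setting in question is enabled_hardware_types, which currently lists idrac among the defaults, roughly:

enabled_hardware_types = ipmi,redfish,ilo,idrac

(The exact default list may differ per release.)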

Comment 1 Bob Fournier 2020-03-02 01:52:16 UTC
Please provide an sosreport when the problem occurs. iDRAC is currently supported.

Comment 2 Nenad Peric 2020-03-02 10:09:08 UTC
Since this is easily reproducible I can do it right now.

But before I do so, are there any other things you wish to collect (apart from sosreport)?

Would it be beneficial if I left the systems in a broken state for a day or two, in case someone wants to log in and check it out?
I cannot leave them like this for much longer than that, unfortunately, since we need the environment deployed...

Thanks!

Comment 4 Dmitry Tantsur 2020-03-02 13:32:48 UTC
ironic-conductor.log.1:5331:2020-03-01 16:14:51.963 8 ERROR dracclient.wsman [req-481413fd-cb58-4c76-9223-c089032e586d 634ba25f2ea14d0bb00b43b517bb3740 a5ec179025db483ca19a2847dfeb6a9a - default default] A SSLError error occurred while  communicating with 10.19.136.1, attempt 3 of 3: requests.exceptions.SSLError: HTTPSConnectionPool(host='10.19.136.1', port=443): Max retries exceeded with url: /wsman (Caused by SSLError(SSLError(1, '[SSL: DH_KEY_TOO_SMALL] dh key too small (_ssl.c:897)'),))


Note the "dh key too small". We've seen it with another vendor: apparently RHEL 8 no longer accepts weak certificates that have previously been accepted. There is nothing we can do about it. Could you try updating/regenerating the TLS certificate on the server side?

You may be able to set drac_protocol in the node's driver_info to "http" to use an insecure connection, but it depends on whether the server will accept it (it probably won't).
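
Something along these lines for one of the enrolled nodes (a sketch; drac_port most likely needs to change to 80 together with the protocol, and whether the iDRAC answers WSMan over plain HTTP at all is the open question):

openstack baremetal node set 2870b3e3-d654-4aac-8b1c-8f3d8120dbc1 \
    --driver-info drac_protocol=http \
    --driver-info drac_port=80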

As a last resort, switch to IPMI. If you don't need any advanced features, it should work just fine.
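
That can be done either by re-running the import with pm_type: ipmi in nodes.json, or roughly like this for the already-enrolled nodes (a sketch; the ipmi_* values are placeholders, and the IPMI credentials may differ from the WSMan ones):

openstack baremetal node set 2870b3e3-d654-4aac-8b1c-8f3d8120dbc1 \
    --driver ipmi \
    --driver-info ipmi_address=10.19.136.1 \
    --driver-info ipmi_username=<user> \
    --driver-info ipmi_password=<password>
openstack baremetal node manage 2870b3e3-d654-4aac-8b1c-8f3d8120dbc1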

Comment 5 Nenad Peric 2020-03-02 14:04:59 UTC
Hmm, that is a certificate on the Dell chassis, or rather on the blades.
Those certs might be older ones, which means I would have to find a way to change/update the certificate on the blade(s).

Since I only need this for a simple overcloud deployment, I will switch to IPMI; no exotic features are needed in my case.

Thanks for debugging!