Bug 1416622 - Deployment always fails while using shared ILO port.
Summary: Deployment always fails while using shared ILO port.
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-ironic
Version: 10.0 (Newton)
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: async
Target Release: ---
Assignee: Dmitry Tantsur
QA Contact: Raviv Bar-Tal
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-01-26 04:42 UTC by VIKRANT
Modified: 2020-03-11 15:39 UTC
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-04-26 16:18:48 UTC
Target Upstream Version:



Links
Launchpad 1660609 (last updated 2017-01-31 12:31:54 UTC)
Red Hat Knowledge Base (Solution) 2889341 (last updated 2017-01-26 05:09:15 UTC)

Description VIKRANT 2017-01-26 04:42:59 UTC
Description of problem:

OSP 10 deployment on physical nodes whose iLO is configured to use a shared LOM port.

eno1 talks to the iLO network.
eno52 is the provisioning interface.

The iLO of each server physically shares its eno1 port; both use the same RJ45 socket.


Version-Release number of selected component (if applicable):
RHEL OSP 10

How reproducible:
Every time, in the customer's environment.

Steps to Reproduce:
1.  Run the deployment using a simple configuration with 3 controller and 3 compute nodes.
2.  The deployment fails.

Actual results:
The deployment fails at different stages across attempts.

Expected results:
The deployment should complete successfully.

Additional info:

1) Here is the deployment command:

~~~
openstack overcloud deploy --templates \
  -e /home/stack/templates/overcloud/network-environment.yaml \
  -e /home/stack/templates/overcloud/firstboot.yaml \
  -e /home/stack/templates/overcloud/timezone.yaml \
  -e /home/stack/templates/overcloud/rhel-registration/environment-rhel-registration.yaml \
  -e /home/stack/templates/overcloud/rhel-registration/rhel-registration-resource-registry.yaml \
  -e /home/stack/templates/overcloud/cinder-solidfire-environment.yaml \
  -e /home/stack/templates/overcloud/logging-environment.yaml \
  --stack overcloud \
  --control-scale 3 \
  --compute-scale 4 \
  --ceph-storage-scale 0 \
  --ntp-server pool.ntp.org \
  --neutron-network-type vxlan \
  --neutron-tunnel-types vxlan \
  --timeout 600
~~~

2) Introspection seems to be more resilient than deployment to short gaps in the availability of the iLO IP address (during reboots, etc.).

I do see errors (in the nodes' last_error field) such as:

~~~
Failed to change power state to 'power on'. Error: HTTPSConnectionPool(host='xx.xx.30.46', port=443): Max retries exceeded with url: /ribcl (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x5825a10>: Failed to establish a new connection: [Errno 113] EHOSTUNREACH',))

or

Failed to change power state to 'power on'. Error: iLO get_power_status failed, error: EOF occurred in violation of protocol (_ssl.c:579)
~~~

but it then recovers when the introspection image is up and running.
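To illustrate point 2), here is a minimal Python 3 sketch of the kind of tolerance that seems to help: keep probing the iLO's HTTPS port (443, as in the errors above) with capped exponential backoff before declaring it unreachable. It assumes the gaps are short and that a plain TCP connect is a good enough proxy for "the iLO is back". wait_for_bmc and its parameters are hypothetical, not part of Ironic or proliantutils, and the sketch is not meant to describe how Ironic itself retries power operations.

~~~python
import socket
import time


def wait_for_bmc(host, port=443, attempts=6, initial_delay=1.0,
                 max_delay=10.0, connect_timeout=3.0, sleep=time.sleep):
    """Hypothetical helper: return True as soon as a TCP connection to
    host:port succeeds, or False after `attempts` failed tries.

    Only TCP reachability is checked; no RIBCL request is sent and the
    power state is not queried. `sleep` is injectable so the backoff
    schedule can be tested without actually waiting.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            # Succeeds once the iLO answers the TCP handshake again.
            with socket.create_connection((host, port), timeout=connect_timeout):
                return True
        except OSError as exc:
            # While the shared port is down this is typically
            # [Errno 113] EHOSTUNREACH on Linux; refused connections and
            # timeouts (also OSError subclasses) are handled the same way.
            print("attempt %d/%d: %s:%d unreachable (%s)"
                  % (attempt, attempts, host, port, exc))
            if attempt < attempts:
                sleep(delay)
                # Exponential backoff, capped at max_delay.
                delay = min(delay * 2, max_delay)
    return False
~~~

A caller would run wait_for_bmc(ilo_address) before a power operation, where ilo_address is a placeholder (the real address is redacted in this report). The cap on the delay keeps one long outage from stretching each wait indefinitely, while attempts bounds the total time spent before giving up.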

Comment 8 Dmitry Tantsur 2017-01-31 12:31:55 UTC
Thanks for your report. I'm glad you've had some luck with pxe_ipmitool. We don't have much experience with the iLO drivers, so I've escalated this to the proliantutils developers. I'm assigning it to myself to track it.

