1416622 – Deployment always fails while using shared ILO port.

Summary: Deployment always fails while using shared ILO port.

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-ironic
Sub Component:
Version:	10.0 (Newton)
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	async
Target Release:	---
Assignee:	Dmitry Tantsur
QA Contact:	Raviv Bar-Tal
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2017-01-26 04:42 UTC by VIKRANT
Modified:	2020-03-11 15:39 UTC
CC List:	10 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2017-04-26 16:18:48 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Launchpad	1660609	0	None	None	None	2017-01-31 12:31:54 UTC
Red Hat Knowledge Base (Solution)	2889341	0	None	None	None	2017-01-26 05:09:15 UTC

Description VIKRANT 2017-01-26 04:42:59 UTC

Description of problem:

The OSP 10 deployment runs on physical nodes whose iLO is configured to use a shared LOM port.

eno1 talks to the iLO network.
eno52 is the provisioning interface.

The iLO on each of these servers physically shares the eno1 port: both use the same RJ45 socket.


Version-Release number of selected component (if applicable):
RHEL OSP 10

How reproducible:
Every time, for the customer.

Steps to Reproduce:
1.  Run the deployment using a simple configuration with 3 controller and 3 compute nodes.
2.  The deployment fails.

Actual results:
The deployment fails at a different stage on each attempt.

Expected results:
The deployment should complete successfully.

Additional info:

1) Here is the deployment command:

~~~
openstack overcloud deploy --templates \
  -e /home/stack/templates/overcloud/network-environment.yaml \
  -e /home/stack/templates/overcloud/firstboot.yaml \
  -e /home/stack/templates/overcloud/timezone.yaml \
  -e /home/stack/templates/overcloud/rhel-registration/environment-rhel-registration.yaml \
  -e /home/stack/templates/overcloud/rhel-registration/rhel-registration-resource-registry.yaml \
  -e /home/stack/templates/overcloud/cinder-solidfire-environment.yaml \
  -e /home/stack/templates/overcloud/logging-environment.yaml \
  --stack overcloud \
  --control-scale 3 \
  --compute-scale 4 \
  --ceph-storage-scale 0 \
  --ntp-server pool.ntp.org \
  --neutron-network-type vxlan \
  --neutron-tunnel-types vxlan \
  --timeout 600
~~~

2) The introspection process seems to be more resilient to short gaps in the availability of the iLO IP address, for example during reboots.

I do see errors (last_error on nodes) such as: 

~~~
Failed to change power state to 'power on'. Error: HTTPSConnectionPool(host='xx.xx.30.46', port=443): Max retries exceeded with url: /ribcl (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x5825a10>: Failed to establish a new connection: [Errno 113] EHOSTUNREACH',))

or

Failed to change power state to 'power on'. Error: iLO get_power_status failed, error: EOF occurred in violation of protocol (_ssl.c:579)
~~~

However, the node then recovers once the introspection image is up and running.
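
To illustrate what "resilient to short gaps" could mean here, the following is a minimal sketch of retrying a power action with exponential backoff while the iLO address is briefly unreachable. It is not how ironic or proliantutils handle this: `TransientPowerError`, `retry_power_action`, and all parameter names are hypothetical and exist only for this illustration.

~~~python
import time


class TransientPowerError(Exception):
    """Hypothetical stand-in for a connection-level failure (for example
    EHOSTUNREACH or an SSL EOF) while talking to the iLO."""


def retry_power_action(action, attempts=5, initial_delay=1.0, backoff=2.0,
                       sleep=time.sleep):
    """Call action() until it succeeds or the attempts are used up.

    Only TransientPowerError is retried; any other exception propagates
    immediately. The delay between attempts grows by the backoff factor.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except TransientPowerError:
            if attempt == attempts:
                raise  # give up: the gap was longer than we can tolerate
            sleep(delay)
            delay *= backoff


if __name__ == "__main__":
    # Simulated iLO that is unreachable for the first two calls.
    state = {"calls": 0}

    def power_on():
        state["calls"] += 1
        if state["calls"] < 3:
            raise TransientPowerError("[Errno 113] EHOSTUNREACH")
        return "power on"

    # Pass a no-op sleep so the demonstration does not actually wait.
    print(retry_power_action(power_on, sleep=lambda seconds: None))
~~~

With a shared LOM port the iLO can drop off the network whenever the host NIC resets, so whether a gap is survivable depends on the total retry window being longer than the gap.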

Comment 8 Dmitry Tantsur 2017-01-31 12:31:55 UTC

Thanks for your report. I'm glad you had some luck with pxe_ipmitool. We don't have much experience with the iLO drivers, so I've escalated it to the proliantutils developers. I'm assigning the bug to myself to track it.