Description of problem:
An OSP 10 deployment on physical nodes whose iLO is configured to use the shared LOM port.
eno1 is connected to the iLO network.
eno52 is the provisioning interface.
On these servers the iLO physically shares the eno1 port: both use the same RJ45 socket.
Version-Release number of selected component (if applicable):
RHEL OSP 10
How reproducible:
Every time for the customer.
Steps to Reproduce:
1. Run the deployment using a simple configuration with 3 controller and 3 compute nodes.
2. The deployment fails.
Actual results:
The deployment fails at different stages across attempts.
Expected results:
The deployment should complete successfully.
Additional info:
1) Here is the deployment command:
~~~
openstack overcloud deploy --templates \
-e /home/stack/templates/overcloud/network-environment.yaml \
-e /home/stack/templates/overcloud/firstboot.yaml \
-e /home/stack/templates/overcloud/timezone.yaml \
-e /home/stack/templates/overcloud/rhel-registration/environment-rhel-registration.yaml \
-e /home/stack/templates/overcloud/rhel-registration/rhel-registration-resource-registry.yaml \
-e /home/stack/templates/overcloud/cinder-solidfire-environment.yaml \
-e /home/stack/templates/overcloud/logging-environment.yaml \
--stack overcloud \
--control-scale 3 \
--compute-scale 4 \
--ceph-storage-scale 0 \
--ntp-server pool.ntp.org \
--neutron-network-type vxlan \
--neutron-tunnel-types vxlan \
--timeout 600
~~~
2) The introspection process seems to be more resilient than the deployment to short gaps in the availability of the iLO IP address (during reboots, for example).
I do see errors (last_error on nodes) such as:
~~~
Failed to change power state to 'power on'. Error: HTTPSConnectionPool(host='xx.xx.30.46', port=443): Max retries exceeded with url: /ribcl (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x5825a10>: Failed to establish a new connection: [Errno 113] EHOSTUNREACH',))
or
Failed to change power state to 'power on'. Error: iLO get_power_status failed, error: EOF occurred in violation of protocol (_ssl.c:579)
~~~
but it then recovers when the introspection image is up and running.
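To illustrate what tolerating such short gaps could look like, here is a minimal sketch of a retry wrapper in POSIX shell. It is not how ironic or proliantutils handle power-state calls; the function name `retry` and its two arguments (attempt count, delay in seconds) are hypothetical and exist only for this example.

```shell
# Hypothetical helper (not part of ironic or proliantutils):
# run a command up to $1 times, sleeping $2 seconds between failed attempts.
retry() {
    retry_attempts=$1
    retry_delay=$2
    shift 2
    retry_i=1
    while [ "$retry_i" -le "$retry_attempts" ]; do
        # Stop as soon as the command succeeds.
        "$@" && return 0
        # Sleep only if another attempt will follow.
        if [ "$retry_i" -lt "$retry_attempts" ]; then
            sleep "$retry_delay"
        fi
        retry_i=$((retry_i + 1))
    done
    # Every attempt failed.
    return 1
}
```

For example, `retry 20 5 ping -c 1 -W 2 ILO_ADDRESS` (with ILO_ADDRESS standing in for a node's iLO IP) would probe the address up to 20 times, 5 seconds apart, which is one way to measure how long the iLO stays unreachable while the shared port renegotiates during a reboot.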
Thanks for your report. I'm glad you've had some luck with pxe_ipmitool. We don't have much experience with the iLO drivers, so I've escalated this to the proliantutils developers. I'm assigning the bug to myself to track it.