Created attachment 1168828 [details]
messages from one of the computes

Description of problem:

After implementing https://review.openstack.org/327807 to get past introspection, I moved on to the overcloud deployment.

1. During the overcloud deployment (which eventually fails) I notice the following on the nodes (which boot from the same modified images as introspection): about 330 seconds after the OS is up, there **seems** to be a disconnection from the iSCSI target, and the operating system is unable to reach the 'disks'. I'm not sure whether that's the last of my problems in the deployment, but we certainly need to figure this out.

I have tried a newer be2iscsi Emulex driver (10.7.110.34) and see the same results. As mentioned, the above occurs during the overcloud deployment using the modified OSPd images. I have installed RH7.2 on the same hardware, and I do not see the same problem there.

2. I don't think it's related (maybe I should open a separate ticket for this one), but I notice the following in /var/log/messages:

Jun 16 15:14:54 localhost ironic-python-agent: 2016-06-16 15:14:54.655 2797 INFO root [-] Hardware manager found: ironic_python_agent.hardware:GenericHardwareManager
Jun 16 15:14:54 localhost ironic-python-agent: 2016-06-16 15:14:54.655 2797 INFO ironic_python_agent.inspector [-] Inspection is disabled, skipping
Jun 16 15:14:55 localhost ironic-python-agent: 2016-06-16 15:14:55.092 2797 CRITICAL ironic-python-agent [-] AttributeError: 'module' object has no attribute 'BackOffLoopingCall'
Jun 16 15:14:55 localhost ironic-python-agent: 2016-06-16 15:14:55.092 2797 ERROR ironic-python-agent Traceback (most recent call last):
Jun 16 15:14:55 localhost ironic-python-agent: 2016-06-16 15:14:55.092 2797 ERROR ironic-python-agent File "/usr/bin/ironic-python-agent", line 10, in <module>
Jun 16 15:14:55 localhost ironic-python-agent: 2016-06-16 15:14:55.092 2797 ERROR ironic-python-agent sys.exit(run())
Jun 16 15:14:55 localhost ironic-python-agent: 2016-06-16 15:14:55.092 2797 ERROR ironic-python-agent File "/usr/lib/python2.7/site-packages/ironic_python_agent/cmd/agent.py", line 47, in run
Jun 16 15:14:55 localhost ironic-python-agent: 2016-06-16 15:14:55.092 2797 ERROR ironic-python-agent CONF.hardware_initialization_delay).run()
Jun 16 15:14:55 localhost ironic-python-agent: 2016-06-16 15:14:55.092 2797 ERROR ironic-python-agent File "/usr/lib/python2.7/site-packages/ironic_python_agent/agent.py", line 311, in run
Jun 16 15:14:55 localhost ironic-python-agent: 2016-06-16 15:14:55.092 2797 ERROR ironic-python-agent node_uuid=uuid)
Jun 16 15:14:55 localhost ironic-python-agent: 2016-06-16 15:14:55.092 2797 ERROR ironic-python-agent File "/usr/lib/python2.7/site-packages/ironic_python_agent/ironic_api_client.py", line 84, in lookup_node
Jun 16 15:14:55 localhost ironic-python-agent: 2016-06-16 15:14:55.092 2797 ERROR ironic-python-agent timer = loopingcall.BackOffLoopingCall(
Jun 16 15:14:55 localhost ironic-python-agent: 2016-06-16 15:14:55.092 2797 ERROR ironic-python-agent AttributeError: 'module' object has no attribute 'BackOffLoopingCall'
Jun 16 15:14:55 localhost ironic-python-agent: 2016-06-16 15:14:55.092 2797 ERROR ironic-python-agent
Jun 16 15:14:55 localhost systemd: openstack-ironic-python-agent.service: main process exited, code=exited, status=1/FAILURE
Jun 16 15:14:55 localhost systemd: Unit openstack-ironic-python-agent.service entered failed state.
Jun 16 15:14:55 localhost systemd: openstack-ironic-python-agent.service failed.

Version-Release number of selected component (if applicable):
Redhat OSP8

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
This bug is related to 1283436 but is not the same.
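The AttributeError in the traceback suggests that the `loopingcall` module installed in the image predates `BackOffLoopingCall`. A minimal sketch of a check that could be run inside the image is shown here. It assumes the agent imports `loopingcall` from `oslo_service` (the traceback only shows the short name), and `has_attribute` is a hypothetical helper, not part of ironic-python-agent.

```python
import importlib


def has_attribute(module_name, attribute):
    """Return True if the module can be imported and exposes the attribute."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module is not installed at all in this environment.
        return False
    return hasattr(module, attribute)


if __name__ == "__main__":
    # Assumption: the agent's `loopingcall` is oslo_service.loopingcall.
    available = has_attribute("oslo_service.loopingcall", "BackOffLoopingCall")
    print("BackOffLoopingCall available: %s" % available)
```

If this reports False while the library is installed, the installed library is probably older than the agent expects, which would point at a package mismatch in the image rather than at the iSCSI problem.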
Hi,

Two things I did have improved the situation:
- I noticed that running iscsistart -b several times halts the system, so we need to add code that makes sure it is executed only once.
- I have changed the templates to the latest RH templates (this resolved the 'callback exception').

Now this is where I am and need assistance.

When running the overcloud deployment I notice the following behaviour: two PXE boots.
- On the first boot, it loads deploy_kernel and deploy_ramdisk; note that deploy_ramdisk is my modified image. Everything looks fine, then it reboots.
- On the second boot, it loads 'kernel' and 'ramdisk' (see below).

What I am seeing on the second boot is that it drops to dracut and is unable to proceed. I don't know how to modify the 'ramdisk' file (I was unable to extract it the same way as I did with deploy_ramdisk).

[root@undercloud httpboot]# cd 485e57a4-eeb5-49c1-a379-ed7d4e92afe7/
[root@undercloud 485e57a4-eeb5-49c1-a379-ed7d4e92afe7]# ll
total 432632
-rw-r--r--. 1 ironic ironic 1049 Jun 21 18:48 config
-rw-r--r--. 5 ironic ironic 5153536 Jun 21 18:08 deploy_kernel
-rw-r--r--. 5 ironic ironic 392371696 Jun 21 18:08 deploy_ramdisk
-rw-r--r--. 5 ironic ironic 5153408 Jun 15 17:32 kernel
-rw-r--r--. 5 ironic ironic 40324447 Jun 15 17:32 ramdisk

The attached screenshots show the two different boots.

Please advise!
Thanks.
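The "execute iscsistart -b only once" requirement could be sketched with a marker-file guard like the one below. This is an illustration under assumptions, not the code that went into the image: `run_once` and the marker path are hypothetical, and the marker is assumed to live on a filesystem that is cleared at every boot (such as /run).

```python
import errno
import os
import subprocess


def run_once(command, marker_path):
    """Run `command` unless `marker_path` already exists.

    Returns True if the command was run, False if it was skipped.
    """
    try:
        # O_CREAT | O_EXCL creates the marker atomically and fails with
        # EEXIST if an earlier caller already created it.
        fd = os.open(marker_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except OSError as exc:
        if exc.errno == errno.EEXIST:
            return False
        raise
    os.close(fd)
    # The marker is left in place even if the command fails, so the
    # command is never attempted a second time.
    subprocess.check_call(command)
    return True


# Hypothetical usage (marker path is an assumption):
#   run_once(["iscsistart", "-b"], "/run/iscsistart-b.done")
```

Leaving the marker in place after a failure is a deliberate choice here, since the reported symptom is that repeated runs halt the system; a setup that prefers retries would need to remove the marker on failure.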
Created attachment 1171052 [details] Overcloud first compute boot
Created attachment 1171053 [details] Overcloud second compute boot.
Created attachment 1171054 [details] Before dropping to dracut. Note that I don't see the be2iscsi driver load.
So the question to ask is this: where does this other image come from, and can it be accessed so that we can modify it the way you modified the first image? Can someone in eng please advise?
I have a way ( I got frank ) of editing the overcloud images. The procedure is to edit the overcloud qcow, which is then built by `dib` (disk image builder), I think. The problem I'm facing is that the output initramfs does not include the be2iscsi drivers. I did try to add the module to the image, but failed, since it seems that the kernel in the overcloud qcow is missing some dependencies.
Hi Yossi. Can you please attach the error messages for the dependency failures? The missing be2iscsi drivers in the overcloud images seem to be the missing link in this entire installation failure. Would that be a proper assessment?
Hi Paul. No logs: the compute fails to boot (on the second boot) and drops to dracut. Your assessment is correct afaik.
Hi Yossi, I was facing a similar issue. I used an overcloud.qcow2 image updated with the modules mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1283436#c19 and generated new initramfs and kernel images from it, e.g.

virt-builder --get-kernel /var/lib/libvirt/images/overcloud-full.qcow2

You could also use disk image builder. I found that it is important to have the iSCSI and multipath modules in the initramfs of the overcloud image.
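Whether a generated initramfs actually contains the needed modules can be checked from the file listing that dracut's lsinitrd tool prints. The sketch below is only an illustration: `missing_modules` and `list_initramfs` are hypothetical helpers, and apart from be2iscsi the module names in the usage comment are assumed examples, since the exact list from the linked comment is not reproduced in this bug.

```python
import subprocess

# Kernel module files may be stored compressed inside the initramfs.
MODULE_SUFFIXES = (".ko.xz", ".ko.gz", ".ko")


def missing_modules(listing, required):
    """Return the names in `required` that have no module file in `listing`."""
    present = set()
    for line in listing.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The path is the last field of an `ls -l` style line.
        filename = fields[-1].rsplit("/", 1)[-1]
        for suffix in MODULE_SUFFIXES:
            if filename.endswith(suffix):
                # File names may use '-' where the module name uses '_'.
                present.add(filename[:-len(suffix)].replace("-", "_"))
                break
    return [name for name in required
            if name.replace("-", "_") not in present]


def list_initramfs(image_path):
    """Return the listing printed by `lsinitrd` for the given image."""
    output = subprocess.check_output(["lsinitrd", image_path])
    return output.decode("utf-8", "replace")


# Hypothetical usage (module list partly assumed):
#   missing_modules(list_initramfs("ramdisk"),
#                   ["be2iscsi", "iscsi_tcp", "dm_multipath"])
```

An empty result would mean every listed module has a matching file in the image; it does not prove the modules load, only that they were packed into the initramfs.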
Hi Amit,

We reached the same conclusion: the drivers need to be in the initramfs.
- I'm not familiar with a way to add the drivers to the initramfs without DiB. Can you elaborate on how you did that with virt-builder?
- In case you missed it, we also realised that the ironic python agent has to be modified, otherwise introspection fails (on diskless hardware). This was merged in master, see https://review.openstack.org/#/c/327807/
It seems that the problem is that a specific proprietary driver is needed, and it is available in a specific kernel (3.10.0-327.el7). The latest kernels do not work and are not supported. Custom workarounds can be done, but this is not a long-term supportable solution, and new features cannot be taken advantage of if the image is pinned to a specific kernel.

Options:
- Contact the hardware provider (HP, as Yuval said) to ask for better support and updates of that driver for current and future kernels.
- If that's not possible, investigate the possibility of getting the source code for that driver, so custom builds can be done per kernel update.
- Investigate the possibility of using alternative drivers with better support.

Feedback from the partner requested.
Is there a support case logged with HPE? or Emulex? We'll need it to get them engaged. Thanks.
*** Bug 1322430 has been marked as a duplicate of this bug. ***
We recently checked the overcloud images, and the native be2iscsi driver is present there, so they should be using it. We passed the driver information to Yossi and Joey, and we are waiting for their feedback on whether it is possible to use that driver instead of the HP one.
Follow up email to comment #21 sent to confirm driver presence.
So, we'll need to download the latest overcloud images and give it a shot. I'll try that soon and will update. Thanks !
*** Bug 1283436 has been marked as a duplicate of this bug. ***
Changing the component, as it seems like the proposed fix is not within Ironic, and the Ironic problem in the original report seems resolved. Thanks!
Yossi, does this bug essentially duplicate https://bugzilla.redhat.com/show_bug.cgi?id=1276147 at this stage? It seems like both are about be2iscsi now.
Yes, it does indeed seem to be a dup.
Thanks! I'll close this one to keep our backlog clean. *** This bug has been marked as a duplicate of bug 1276147 ***