Description of problem:
HP ProLiant Gen9 servers with pre-configured RAID 1 won't deploy with RHEL 7.5 images (which, AFAIK, is the only option for RHOSP-13 at the moment). We're tracking the issue from the kernel POV in BZ 1608955. However, since OpenStack relies on Ironic for the deployment, we'd like to raise awareness and get feedback on whether the issue can be worked around at the RHOSP level by any means, potentially around any kernel constraints and/or in collaboration with the latest RHEL kernel changes.

Initial feedback on the issue:
~~~
When deploying with the RAID controller (in UEFI mode), introspection completes successfully, but when looking at the introspection data I can't see that the director recognizes the RAID I have configured on the server (I configured the RAID in the UEFI setup). When it comes to the deployment itself, my server fails to finish the iPXE process and keeps looping (restarts the server and tries to iPXE again). I think it doesn't work because there is a conflict between the configuration on the server (the RAID) and the introspection data (two separate disks). I have set the boot_mode capability on the server to uefi.
~~~

Follow-up:
~~~
*) Introspection does not work in UEFI mode with iPXE (neither with the clean 'stock' init ramdisk, nor with the modified ramdisk with drivers).
*) Introspection works in Legacy mode.
   >> However, in Legacy mode, introspection finds two disks when it should see just one (i.e. 2 disks in RAID 1).
~~~

Additional info:
There's been another test on RHOSP-10, with RHEL 7.3, and the result is the same (introspection fails in a RAID 1 setup). Hence, the list of tested kernels:
*) 7.5 kernel 3.10.0-862
*) 7.3 kernel 3.10.0-514
Tests with PXE (vs iPXE) in UEFI mode were still to be done.
Update: we're past introspection now. After they re-installed the Director node, the 'alloc highmem for initrd' failure is gone:

~~~~
1. We used Lenny's kernel + original initramfs = Introspection fails (no highmem error).
2. We used original kernel + original initramfs = Introspection fails (no highmem error).
3. We used original kernel + edited initramfs = Introspection successful (no highmem error), but the output of 'openstack baremetal introspection data save 1a4e30da-b6dc-499d-ba87-0bd8a3819bc0 | jq ".inventory.disks"' produces 2 disks (/dev/sda and /dev/sdb). Is this a normal situation? Or does it need to produce only one disk (/dev/sda), given that the RAID controller is enabled?
~~~~

Now we're trying to have them configure the iLO driver properly, so that it recognizes the SW RAID:

http://specs.openstack.org/openstack/ironic-specs/specs/approved/ironic-generic-raid-interface.html
https://docs.openstack.org/ironic/latest/admin/raid.html#raid
https://docs.openstack.org/ironic/latest/admin/drivers/ilo.html

Any straightforward must-dos would be extremely helpful, as the information is spread all over the place (see the three docs above). As I understand it, they must use the iLO driver if they want to use SW RAID. Is that correct?

From BZ 1494361, we see that SW RAID was *not* an option with RHOSP-10. Do we have a generic way to push SW RAID in RHOSP-13, or must they set the iLO driver as per the upstream documentation quoted above?
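To make the disk-count check reproducible for anyone following along, here's a small sketch of what we're looking at. The JSON below is a made-up stand-in for the customer's saved introspection data (sizes are placeholders), mimicking the two-disk result:

```shell
# Stand-in for the output of:
#   openstack baremetal introspection data save <uuid> | jq ".inventory.disks"
cat > /tmp/inventory_disks.json <<'EOF'
[
  {"name": "/dev/sda", "size": 500107862016},
  {"name": "/dev/sdb", "size": 500107862016}
]
EOF

# With working hardware RAID 1 we'd expect this to print 1, not 2.
disk_count=$(jq 'length' /tmp/inventory_disks.json)
echo "disks reported: $disk_count"
```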
Hi! The ironic driver has no effect on how in-band introspection recognizes disks. The documentation you're linking to is about creating HW RAID. Ironic introspection merely uses the output of lsblk, so I guess the question is why lsblk does not recognize the RAID.
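For reference, here's a rough sketch of the kind of check the introspection ramdisk effectively relies on. The lsblk output below is simulated (I don't have the customer's system, and ironic-python-agent's real invocation uses more columns), matching their report of two separate disks:

```shell
# Simulated `lsblk -o NAME,TYPE` output on the affected node:
# the OS exposes two separate disks despite the configured RAID 1.
lsblk_out="NAME TYPE
sda  disk
sdb  disk"

# Only rows with TYPE=disk become candidate disks in the inventory,
# so this prints /dev/sda and /dev/sdb.
echo "$lsblk_out" | awk '$2 == "disk" { print "/dev/" $1 }'
```

So as long as lsblk on the node shows two `disk` rows, introspection will report two disks; the driver configuration never enters the picture.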
(In reply to Dmitry Tantsur from comment #2) Hey! :) Is there any form of logging we can enable in order to see what exactly is going on there? By the way, do we support SW RAID to begin with?
Hi all,

Software RAID is not currently supported by Ironic in any of the OSP releases. Hardware RAID will work if the kernel supports it and lsblk returns the RAID disks as normal disks. However, hardware RAID is not officially supported by OSP and requires a support exception.

I'm not sure what exactly is happening though. Is it software RAID, hardware RAID or hybrid hardware-assisted RAID? I see references both to "software RAID" and to a "RAID controller". Ironic only works with RAID that is somehow abstracted away by the operating system. So if the operating system sees and reports two disks, Ironic will report and use two disks.

Could you please clarify the exact RAID type used and your expectations of Ironic?
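As a side note: when Ironic does end up with two reported disks, you can at least pin which one it deploys to using root device hints. A minimal sketch (the node UUID and hint value are placeholders; the actual `openstack baremetal node set` call is shown as a comment since it needs a live undercloud):

```shell
# The root_device property is plain JSON; build and sanity-check it locally.
hint='{"name": "/dev/sda"}'
echo "$hint" | jq -e '.name == "/dev/sda"' > /dev/null && echo "hint looks valid"

# Then apply it to the node (placeholder UUID):
# openstack baremetal node set <node-uuid> --property root_device="$hint"
```

This does not make the RAID visible, it only controls which of the reported disks Ironic writes the image to.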
Hi Dmitry,

Let's clarify the RAID terminology.

Hardware RAID
A hardware-based array manages the RAID subsystem independently from the host. It **presents a single disk** per RAID array to the host.

Software RAID
Software RAID implements the various RAID levels in the **kernel disk (block device) code**. It offers the cheapest possible solution, as expensive disk controller cards or hot-swap chassis are not required.

In between these two, we have a thing called Firmware RAID, also colloquially known as 'fake' RAID: 'fake' because it's a RAID controller that cannot provide real Hardware RAID but instead just tells the OS to configure a Software RAID.

Firmware RAID
Firmware RAID, also known as ATARAID, is a **type of software RAID** where the RAID sets can be configured using a firmware-based menu.

## source: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/storage_administration_guide/ch-raid

Aviv (TAM) passed along the information that the customer has this card:
https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c04406959

From the link above, the card's [Key features] list says: Smart Array **RAID engine running in OS driver**. Additionally, judging from the core problem, we can say that their HP RAID controller (card) does (a variation of) Software RAID. So, the customer has a RAID controller that **canNOT provide Hardware RAID**.

Using these definitions, I would say that we do support real HW RAID in OSP, simply because at the OS level you see it as a normal disk (just one device!). The controller hides everything else, and from the OS you really cannot differentiate between a regular disk and the device provided to you by the controller.

I would argue that *if* it works at all, they might configure a SW RAID (with the links I provided above and/or in combination with the ansible driver you pushed on the prio thread internally [1]). Should that work, they would of course need an SE (Support Exception).
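To illustrate the distinction with simulated (not customer-provided) lsblk-style output: with real HW RAID the OS sees one device, while with firmware/'fake' RAID it sees both members, plus whatever md/dm device the OS assembles on top of them:

```shell
# True hardware RAID 1: the controller hides the members.
hw="NAME TYPE
sda  disk"

# Firmware ('fake') RAID 1: both members visible to the OS,
# with an md device (name is illustrative) assembled on top.
fw="NAME  TYPE
sda   disk
sdb   disk
md126 raid1"

echo "HW RAID disks seen: $(echo "$hw" | awk '$2 == "disk"' | wc -l)"   # 1
echo "FW RAID disks seen: $(echo "$fw" | awk '$2 == "disk"' | wc -l)"   # 2
```

That two-vs-one difference is exactly what introspection is reflecting back at us.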
If we all agree on this, I think we can just close this Bug. [1] http://tripleo.org/install/advanced_deployment/ansible_deploy_interface.html
> In between these two, we have a thing called Firmware RAID, also colloquially known as 'fake' RAID: 'fake' because it's a RAID controller that cannot provide real Hardware RAID but instead just tells the OS to configure a Software RAID.

Right, this is what I meant by "hybrid"; I forgot the right word.

> I would argue that *if* it works at all, they might configure a SW RAID (with the links I provided above and/or in combination with the ansible driver you pushed on the prio thread internally [1]). Should that work, they would of course need an SE (Support Exception).

So, software RAID of any kind (purely software or firmware) will not work automatically, because lsblk will still report two disks. Furthermore, on deployment Ironic wipes the target block device, so any traces of software RAID will be gone.

Indeed, using a heavily customized deployment process with the ansible deploy interface may help here. Please sync with Ramon on whether he may allow a support exception in this case (I'm +1 from a dev standpoint).
Hi! Can someone please provide the current status on two separate issues:

1. Introspection returns 2 disks instead of one. I understand it's annoying, but is it critical for you? I'm afraid it may be too difficult technically to make introspection consider software/firmware RAID.

2. Deployment fails at the iPXE stage. Note that this cannot be caused by wrong introspection data; there must be something else. Could you make sure the node actually boots in UEFI mode by watching its screen during boot? Would it be possible to screencast the console during the boot?
Hi Dmitry. I thought #2 was clear from: (Irina Petrova from comment #1)

> Update: we're past introspection now.
>
> After they have re-installed the Director node, the 'alloc highmem for
> initrd' failure is gone:
>
> ~~~~
> 1. We used Lenny's kernel + original initramfs = Introspection fails (no
> highmem error).
> 2. We used original kernel + original initramfs = Introspection fails (no
> highmem error).
> 3. We used original kernel + edited initramfs = Introspection successful (no
> highmem error), but the output of 'openstack baremetal introspection
> data save 1a4e30da-b6dc-499d-ba87-0bd8a3819bc0 | jq ".inventory.disks"'
> produces 2 disks (/dev/sda and /dev/sdb). Is this a normal situation? or
> does it need to produce only one disk (/dev/sda)? because the raid
> controller is enabled.
> ~~~~

That is:
Problem: Deployment fails at the iPXE stage.
Resolution: re-install the Undercloud and retry. Introspection succeeds.
RCA: Unknown.

As for #1, 'Introspection returns 2 disks instead of one', I haven't heard a word from them since I told them we don't actually support that out of the box. However, I just noticed a new rhos-prio thread about this. I'll update you if I get any new info.
I think this is no longer an issue? Can we close this, or do we still want to pursue it?
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days