Bug 1834298 - Document how to provide drivers on oVirt Node where DUD feature is not supported by Anaconda
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-node
Classification: oVirt
Component: Documentation
Version: master
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ovirt-4.4.7-2
Target Release: ---
Assignee: Steve Goodman
QA Contact: shiyi lei
URL:
Whiteboard:
Duplicates: 1885932
Depends On:
Blocks:
Reported: 2020-05-11 13:36 UTC by Marco Marino
Modified: 2021-08-08 11:57 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-08-08 11:57:45 UTC
oVirt Team: Node
Embargoed:
sgoodman: needinfo-
pm-rhel: ovirt-4.4+
pm-rhel: testing_ack+


Attachments
Dracut Timeout Error (393.78 KB, image/jpeg)
2020-05-11 13:36 UTC, Marco Marino
More details about dracut error (97.22 KB, image/jpeg)
2020-05-12 07:11 UTC, Marco Marino


Links
Red Hat Knowledge Base (Solution) 5799761, last updated 2021-02-14 07:53:36 UTC

Description Marco Marino 2020-05-11 13:36:40 UTC
Created attachment 1687324
Dracut Timeout Error

Description of problem:
I'm trying to install an oVirt host from the ISO image (2020050620.el8.iso) on a Dell R710 server with this RAID controller:
03:00.0 RAID bus controller: Broadcom / LSI MegaRAID SAS 1078 (rev 04)
I had problems because this controller is deprecated in RHEL 8, and I worked around that with a DUD: I prepared a separate USB disk (XFS) with the megaraid_sas driver (http://elrepo.reloumirrors.net/dud/el8/x86_64/dd-megaraid_sas-07.707.51.00-1.el8_1.elrepo.iso ).
After the megaraid module was loaded, the oVirt installer found a 1.82 TB disk (which is correct, because I have one RAID1 volume of that size on two disks). I then tried to install, but hit two kinds of errors:

[First Installation process]
The installation completed without problems, but when I restarted the server I got dracut errors (an initqueue timeout, then a dracut console after a while). Please refer to the attachment for details.

[Second installation process]
I tried several times to reinstall the oVirt host from the ISO file, but I got a different error, basically an LVM thin error (see the "Actual results" section below for details). This error seems to happen when another disk layout is already present.

As a side note, installing CentOS 8 works without problems (also with LVM thin provisioning) once the megaraid_sas driver has been installed using the DUD.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Install an oVirt host from a USB drive with the ISO image (https://resources.ovirt.org/pub/ovirt-4.4-pre/iso/ovirt-node-ng-installer/4.4.0-2020050620/el8/ovirt-node-ng-installer-4.4.0-2020050620.el8.iso) on a server with a previously installed operating system.
2. Delete the old partition scheme and use the new scheme suggested by oVirt in the Anaconda GUI
3. Launch the installation process

Actual results:
ERROR: LVM Thin Provisioning partitioning scheme is required. For autoinstall via kickstart with LVM Thin Provisioning check options --thinpool and --grow. Please consult documentation for details 



Expected results:
Installation process completed without problems


Additional info:

Comment 1 Sandro Bonazzola 2020-05-11 13:57:27 UTC
Can you please attach logs?

Comment 2 Marco Marino 2020-05-11 14:05:28 UTC
(In reply to Sandro Bonazzola from comment #1)
> Can you please attach logs?

What kind of log? When installing on a new disk (without an existing partitioning scheme), errors appear only during reboot, as I said above. You can refer to the attachment to see the error messages.

Comment 3 Marco Marino 2020-05-12 07:11:40 UTC
Created attachment 1687550
More details about dracut error

Comment 4 Marco Marino 2020-05-12 07:15:33 UTC
Update: I solved one part of the problem. In the storage menu during installation, I have to select "LVM Thin Provisioning" instead of "LVM"; then I can use the automatically generated disk layout. With this, the installation finishes without problems, but when I reboot I get the "initqueue timeout" error, followed by a dracut console after a while. I added another image to show more details about the error.

Comment 5 i.klimov 2020-09-09 15:17:24 UTC
Good day!
Have you found a solution to this problem yet?

I faced a similar issue on an IBM Flex System x220 with an LSI SAS2004 controller. Googling and trying fixes have so far not resolved the boot problem after installing oVirt Node with a DUD. After rebooting, the system does not see the RAID and drops to emergency mode, reporting that the LVM layer does not exist.

Comment 6 cshao 2020-09-10 01:58:45 UTC
Hi Shiyi, 
This bug occurred after installing oVirt Node using DUD during installation, could you help to reproduce it?

Comment 7 i.klimov 2020-09-10 09:45:00 UTC
(In reply to cshao from comment #6)
> Hi Shiyi, 
> This bug occurred after installing oVirt Node using DUD during installation,
> could you help to reproduce it?

Hello!
I solved the problem as follows:
1) Install oVirt Node, loading the DUD
2) After the installation is complete, do not reboot the system, but go to the Alt + F3 console
3) chroot /mnt/sysimage
4) In parallel, install oVirt Node on a virtual machine (VirtualBox)
5) Install the required DUD driver for your controller from the ISO at https://mirror.yandex.ru/elrepo/dud/el8/x86_64/ using rpm
6) Add the necessary module to the initramfs: dracut --force --add-drivers mpt3sas
7) Copy the result to USB or over the network to the server with oVirt 4.4.x
8) Replace the kernel file in /boot/ and in /boot/ovirt*

After these steps, oVirt Node 4.4.1 started on the IBM Flex System x220 server with the LSI SAS 2004 controller.

Comment 8 Sandro Bonazzola 2020-09-23 09:39:56 UTC
Closing as can't fix, but the workaround in comment #7 may help.

Comment 9 Stefano Stagnaro 2020-10-05 14:01:22 UTC
(In reply to Sandro Bonazzola from comment #8)
> closing as can't fix but workaround in comment #7 may help.

Hi Sandro. Since it's not a remote scenario and it happened twice in consulting recently, I think it is worth having a solution in the KB for this.

Thank you.

Comment 10 Sandro Bonazzola 2020-11-17 16:17:16 UTC
Steve, let's document this flow.

1) Installed oVirt Node with DUD loading
2) After the installation is complete, do not reboot the system, but go to the Alt + F3 console
3) chroot /mnt/sysimage
4) Install oVirt Node on a virtual machine in parallel (virt-manager/cockpit on different host)
5) Install the required DUD driver from hardware vendor
6) Add the necessary module to the initramfs: dracut --force --add-drivers <your hardware drivers>
7) Copy to USB or over the network to the host being installed
8) Change the kernel file in /boot/ and in /boot/ovirt*

Let's cross-check that this works.
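
Step 8 touches two places: the image directly under /boot/ and the copy under /boot/ovirt*. A minimal sketch for listing both, assuming the standard EL8 naming initramfs-<kernel version>.img and a node directory named ovirt-node-ng-*; the function name initramfs_targets is made up for illustration and is not part of oVirt Node:

```shell
#!/bin/sh
# Hypothetical helper: list every initramfs image for one kernel version
# under a boot directory, i.e. the top-level copy and the copy inside each
# ovirt-node-ng-* directory.
# Usage: initramfs_targets BOOT_DIR KVER
initramfs_targets() {
    boot_dir=$1
    kver=$2
    for img in "$boot_dir/initramfs-$kver.img" \
               "$boot_dir"/ovirt-node-ng-*/"initramfs-$kver.img"; do
        # An unmatched glob stays literal, so keep only paths that exist.
        [ -f "$img" ] && printf '%s\n' "$img"
    done
    return 0
}
```

Inside the chroot this could be called as `initramfs_targets /boot <kernel version>`; every path it prints is an image that has to contain the driver.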

Comment 11 RHEL Program Management 2020-11-17 16:17:25 UTC
The documentation text flag should only be set after 'doc text' field is provided. Please provide the documentation text and set the flag to '?' again.

Comment 12 Sandro Bonazzola 2020-11-17 16:19:30 UTC
*** Bug 1885932 has been marked as a duplicate of this bug. ***

Comment 13 Steve Goodman 2020-12-03 09:00:51 UTC
It appears to me that this should be a KB article, not part of the regular documentation.

Comment 15 Sandro Bonazzola 2020-12-09 12:46:58 UTC
I would consider having a section in install documentation. We don't really maintain a KB section on oVirt website.

Comment 21 Steve Goodman 2021-04-21 09:58:40 UTC
DUD = Driver Update Disk

Comment 25 Steve Goodman 2021-05-27 12:30:22 UTC
Didi, can you help? See comment 24.

Comment 26 Yedidyah Bar David 2021-06-02 05:35:37 UTC
I tried using the latest ELRepo megaraid_sas driver with the latest ovirt-node image, and it does not work - ELRepo's driver was compiled against CentOS Linux 8.3's kernel, and node is now using Stream, which is too new for it.

I'll try again with previous node and update. If it works, should be good enough for RHEL/RHV users and for ovirt-node users that can remain on 4.4.5. Keeping needinfo for now.

For 4.4.6 and going forward, the procedure for oVirt will likely remain mostly the same as what we do for this bug, but it will require some more time/work for ELRepo (and other projects/people maintaining drivers not shipped in Stream's kernel) and Stream people to come up with some solution. See also the still on-going thread on centos-devel starting at [1] (but note that the archive is split by month and [1] is from 5 months ago) "[CentOS-devel] RFC: Stream Kernel SIG Proposal"

[1] https://lists.centos.org/pipermail/centos-devel/2021-January/076208.html

Comment 27 Yedidyah Bar David 2021-06-03 08:56:03 UTC
Based on the above, I tried to reproduce/verify, and the procedure worked for me.

I only tested this on a VM, which didn't have any (virtual) hardware for which I didn't already have drivers, so this isn't a complete verification.

At the time of writing, elrepo does not support CentOS Stream - there are on-going discussions about this - so I used the latest node image that was based on CentOS Linux 8.3 [1].

For testing, even though I didn't have hardware for it, I used 3w-9xxx from elrepo [2]. elrepo already has rpms for 8.4, which might work (for now) with Stream, but I didn't find a DUD iso image with the rpm there.

At first I tried also with megaraid_sas as was asked/mentioned in this bug, but eventually decided to give up, as it's IMO even less relevant than 3w-9xxx, on a VM - because the vanilla kernel already has a megaraid_sas driver, and the reason that people used elrepo's version was due to support for specific hardware that's disabled in the vanilla kernel.

I managed to do everything during the installation, and didn't need a separate VM for creating the new initrd image.

In the following procedure, I mark [DEBUG] steps that are optional, mainly useful for understanding/debugging/verification.

1. I created a VM (with virt-manager) with a disk and two CDs - first with [1], second with [2]. With real hardware, you'll probably have to use two USB sticks or something like that.

2. I booted the VM from the first CD, then moved, in the boot menu, to the line for installing node

3. To make it easier to do stuff on this VM, especially copy/paste, I wanted to be able to ssh to it during installation. So I pressed TAB to get to the kernel command line, and added ' inst.sshd' to its end (note the leading space).

4. Then press Enter to start the installation.

5. Go through the installation GUI and configure stuff as applicable. I wanted to ssh to it, so enabled networking (network interface was Off by default). Then continue and install, reach the final screen, but do not reboot yet.

6. Get a shell on the machine. Either by ssh to it, or by moving to a text console - e.g. <Ctrl><Alt><F2>. There:

7. [DEBUG] Run:
# lsmod | grep 3w
3w_9xxx                49152  0

This proves that the DUD "worked" - the rpm was installed, and driver loaded to the kernel.

8. [DEBUG] Run:

# find / -ls | grep -i 3w.9 | less

This finds, for me, also:

/mnt/sysimage/run/install/DD-1/kmod-3w-9xxx-2.26.02.014-5.el8_3.elrepo.x86_64.rpm
/mnt/sysimage/run/install/DD-2/kmod-3w-9xxx-2.26.02.014-5.el8_3.elrepo.x86_64.rpm

which I suppose are left-overs from the code handling the DUD. To be on the safe side,
I will not use them, but manually mount the ISO.

9. Run:

# mkdir /mnt/dud
# mount -r /dev/sr1 /mnt/dud

You might need to replace /dev/sr1 with the device you used for the DUD.

10. Copy the rpm inside the DUD to the target machine's disk, e.g.:

# cp /mnt/dud/rpms/x86_64/kmod-3w-9xxx-2.26.02.014-5.el8_3.elrepo.x86_64.rpm /mnt/sysroot/root/

11. Run:

# chroot /mnt/sysroot

The next steps are inside the chroot - inside the target machine's disk.
Please note that this disk is "managed" by the logic of ovirt-node, so most of your changes will be lost eventually. However, we do make sure to update the initrd image and keep it where node expects.

12. [DEBUG] Run:

# ls -ltr /boot/initramfs-*
-rw-------. 1 root root 88717417 Jun  2 14:29 /boot/initramfs-4.18.0-240.15.1.el8_3.x86_64.img

# lsinitrd /boot/initramfs-4.18.0-240.15.1.el8_3.x86_64.img | grep 3w
-rw-r--r--   1 root     root         4092 Feb 22 15:57 usr/lib/modules/4.18.0-240.15.1.el8_3.x86_64/kernel/arch/x86/crypto/twofish-x86_64-3way.ko.xz

So - the current image does not include 3w-9xxx.

# ls -ltrd /boot/ovirt-node*
drwxr-xr-x. 2 root root 240 Jun  3 07:26 /boot/ovirt-node-ng-4.4.5.1-0.20210323.0+1

# ls -l /boot/ovirt-node-ng-4.4.5.1-0.20210323.0+1
total 100036
-rw-r--r--. 1 root root   189466 Mar  1 17:24 config-4.18.0-240.15.1.el8_3.x86_64
-rw-------. 1 root root 88717417 Jun  3 07:26 initramfs-4.18.0-240.15.1.el8_3.x86_64.img
-rw-------. 1 root root  4034607 Mar  1 17:24 System.map-4.18.0-240.15.1.el8_3.x86_64
-rwxr-xr-x. 1 root root  9485448 Mar  1 17:24 vmlinuz-4.18.0-240.15.1.el8_3.x86_64

This is node's copy of the kernel/initrd/config/symbols.

13. [DEBUG] Backup the current initrd images, e.g.:

# cp -p /boot/initramfs-4.18.0-240.15.1.el8_3.x86_64.img /boot/initramfs-4.18.0-240.15.1.el8_3.x86_64.img.bck1
# cp -p /boot/ovirt-node-ng-4.4.5.1-0.20210323.0+1/initramfs-4.18.0-240.15.1.el8_3.x86_64.img /boot/ovirt-node-ng-4.4.5.1-0.20210323.0+1/initramfs-4.18.0-240.15.1.el8_3.x86_64.img.bck1

14. Install the drivers rpm from the copy we made in step 10. If you have network connectivity, you can install directly from the network (dnf accepts a url). E.g.:

# dnf install /root/kmod-3w-9xxx-2.26.02.014-5.el8_3.elrepo.x86_64.rpm

15. [DEBUG] This particular rpm also includes:

# cat /etc/dracut.conf.d/3w-9xxx.conf
add_drivers+=" 3w-9xxx "

which makes dracut automatically include it in images it will generate. I am going to ignore this and pass the option manually anyway.

16. Create a new image, forcefully adding the driver. E.g.:

# dracut --force --add-drivers 3w-9xxx --kver 4.18.0-240.15.1.el8_3.x86_64

17. [DEBUG] Check the results:

# ls -ltr /boot/initramfs-4.18.0-240.15.1.el8_3.x86_64.img*
-rw-------. 1 root root 88717417 Jun  2 14:29 /boot/initramfs-4.18.0-240.15.1.el8_3.x86_64.img.bck1
-rw-------. 1 root root 88739013 Jun  2 17:47 /boot/initramfs-4.18.0-240.15.1.el8_3.x86_64.img

We see that the new image is a bit larger, as expected.

# lsinitrd /boot/initramfs-4.18.0-240.15.1.el8_3.x86_64.img | grep 3w
-rw-r--r--   1 root     root         4092 Feb 22 15:57 usr/lib/modules/4.18.0-240.15.1.el8_3.x86_64/kernel/arch/x86/crypto/twofish-x86_64-3way.ko.xz
drwxr-xr-x   2 root     root            0 Feb 22 15:57 usr/lib/modules/4.18.0-240.15.1.el8_3.x86_64/weak-updates/3w-9xxx
lrwxrwxrwx   1 root     root           55 Feb 22 15:57 usr/lib/modules/4.18.0-240.15.1.el8_3.x86_64/weak-updates/3w-9xxx/3w-9xxx.ko -> ../../../4.18.0-240.el8.x86_64/extra/3w-9xxx/3w-9xxx.ko
drwxr-xr-x   2 root     root            0 Feb 22 15:57 usr/lib/modules/4.18.0-240.el8.x86_64/extra/3w-9xxx
-rw-r--r--   1 root     root        80121 Nov 10  2020 usr/lib/modules/4.18.0-240.el8.x86_64/extra/3w-9xxx/3w-9xxx.ko

And that it includes the driver. Good.

18. Copy the image to node's directory. E.g.:

# cp -p /boot/initramfs-4.18.0-240.15.1.el8_3.x86_64.img /boot/ovirt-node-ng-4.4.5.1-0.20210323.0+1/initramfs-4.18.0-240.15.1.el8_3.x86_64.img

19. Press 'exit<Enter>' twice - first to exit the chroot, second to exit the shell (ssh)

20. Go back to the GUI screen (e.g. <Ctrl><Alt><F5>, IIRC) and press the button to reboot.

21. [DEBUG] To make sure that we indeed boot with the driver:

Go to the relevant line in the boot menu (there should be only one, normally, on a new setup)

Press <Ctrl><x>

Move to the line starting with 'linux'

Press <End>, add there ' rd.break'. This will cause the boot process to break in the middle, and get you a shell. There:

# lsmod | grep 3w

Output should be empty.

# modprobe 3w-9xxx

Should succeed.

# lsmod | grep 3w

Output should include the driver.

# exit

Should reboot.

Hopefully, if you have real hardware that needs this driver, the initrd will automatically load the driver for you.
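
The lsmod checks in steps 7 and 21 match on the substring 3w. A stricter check compares the whole module name; note that lsmod prints 3w_9xxx (underscore) for the module that modprobe accepts as 3w-9xxx, as the output in step 7 shows. A minimal sketch, assuming a shell where sed and awk are available (not verified for the rd.break shell); the function name module_loaded is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named module appears in `lsmod`
# output read from stdin. lsmod prints module names with underscores,
# so hyphens in the requested name are normalized first.
# Usage: lsmod | module_loaded 3w-9xxx
module_loaded() {
    name=$(printf '%s' "$1" | sed 's/-/_/g')
    # Compare the first column exactly instead of grepping a substring.
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }'
}
```

For example, `lsmod | module_loaded 3w-9xxx && echo loaded` prints "loaded" only when the driver itself is present.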

[1] https://resources.ovirt.org/pub/ovirt-4.4/iso/ovirt-node-ng-installer/4.4.5-2021032318/el8/ovirt-node-ng-installer-4.4.5-2021032318.el8.iso

[2] https://elrepo.org/linux/dud/el8/x86_64/dd-3w-9xxx-2.26.02.014-5.el8_3.elrepo.iso
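
In steps 12 and 17, `grep 3w` also matches twofish-x86_64-3way.ko.xz, so a non-empty result does not by itself prove that the driver is in the image. A minimal sketch of a stricter check, assuming the listing format shown in the lsinitrd output above and a module file named <module>.ko, optionally compressed; the function name initrd_has_module is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: succeed if an lsinitrd listing on stdin contains
# <module>.ko, optionally compressed (.ko.xz, .ko.gz, .ko.zst).
# Usage: lsinitrd IMAGE | initrd_has_module 3w-9xxx
initrd_has_module() {
    awk -v m="$1" '
        {
            # For symlink lines ("path -> target") test the link path.
            path = $0
            sub(/ -> .*/, "", path)
            # Keep only the file name after the last slash.
            n = split(path, parts, "/")
            file = parts[n]
            sub(/\.(xz|gz|zst)$/, "", file)
            if (file == m ".ko") found = 1
        }
        END { exit !found }'
}
```

The module name has to be given as it appears in the file name (3w-9xxx here, with a hyphen).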

Comment 28 Yedidyah Bar David 2021-06-03 09:07:25 UTC
I see that I had a few typos here and there, I hope you can understand. Most notably:

- Step 13 is not [DEBUG]. It's not mandatory per se, but very good to have.

- In step 21: 'Press <Ctrl><x>' should be 'Press <e>', and after typing ' rd.break' you should press <Ctrl><x>

Sorry.
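
With these corrections, the non-optional part of the work inside the chroot comes down to steps 13, 16 and 18. A minimal sketch that strings them together, assuming it runs inside the chroot after the driver rpm from step 14 is installed; the function names run and rebuild_node_initramfs and the DRY_RUN variable are made up for illustration, and only the cp and dracut command lines are taken from comment 27:

```shell
#!/bin/sh
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper combining steps 13, 16 and 18 of comment 27.
# Usage: rebuild_node_initramfs DRIVER KVER NODE_BOOT_DIR
rebuild_node_initramfs() {
    driver=$1
    kver=$2
    node_dir=$3
    img=/boot/initramfs-$kver.img
    node_img=$node_dir/initramfs-$kver.img

    # Step 13: back up both current images.
    run cp -p "$img" "$img.bck1" || return 1
    run cp -p "$node_img" "$node_img.bck1" || return 1
    # Step 16: regenerate the image, forcing the driver in.
    run dracut --force --add-drivers "$driver" --kver "$kver" || return 1
    # Step 18: copy the new image to where node expects it.
    run cp -p "$img" "$node_img"
}
```

Running it first with DRY_RUN=1 prints the four commands, so they can be compared with the steps in comment 27 before anything is changed, for example `DRY_RUN=1; rebuild_node_initramfs 3w-9xxx 4.18.0-240.15.1.el8_3.x86_64 /boot/ovirt-node-ng-4.4.5.1-0.20210323.0+1`.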

Comment 29 Steve Goodman 2021-07-08 17:30:10 UTC
(In reply to Yedidyah Bar David from comment #27)

> 20. Go back to the GUI screen (e.g. <Ctrl><Alt><F5>, IIRC) and press the button to reboot.

On my RHEL 8.4 CSB machine, <Ctrl><Alt><F5> didn't work. Neither did <Ctrl><Alt><F7> as per [1]. But <Ctrl><Alt><F1> _did_ work.

Anyway, updates based on your comments are now in Gitlab. Please review:

https://gitlab.cee.redhat.com/rhci-documentation/docs-Red_Hat_Enterprise_Virtualization/-/merge_requests/1955

[1] https://askubuntu.com/questions/547290/how-do-i-get-out-of-ctrl-alt-f3

Comment 30 Yedidyah Bar David 2021-07-20 07:32:02 UTC
(In reply to Steve Goodman from comment #29)
> (In reply to Yedidyah Bar David from comment #27)
> 
> > 20. Go back to the GUI screen (e.g. <Ctrl><Alt><F5>, IIRC) and press the button to reboot.
> 
> On my RHEL 8.4 CSB machine, <Ctrl><Alt><F5> didn't work. Neither did
> <Ctrl><Alt><F7> as per [1]. But <Ctrl><Alt><F1> _did_ work.
> 
> Anyway, updates based on your comments are now in Gitlab. Please review:
> 
> https://gitlab.cee.redhat.com/rhci-documentation/docs-Red_Hat_Enterprise_Virtualization/-/merge_requests/1955
> 
> [1] https://askubuntu.com/questions/547290/how-do-i-get-out-of-ctrl-alt-f3

The use of virtual consoles differs between distributions and between use-cases.
What you see on your machine is RHEL during normal use (with a GUI), which is
different from what its (GUI) installer does and from Ubuntu.

Not sure F5 was correct, though - it would be better to have someone from QE verify
this properly, on real unsupported hardware for which we can find/make a DUD.

Comment 31 Steve Goodman 2021-07-27 14:37:32 UTC
Moving to ON_QA.

Shiyi,

Please see comment 27 for info on how Didi created a DUD so that you can test this procedure.

Let me know if you have any questions.

The latest preview is here:

https://jenkins.dxp.redhat.com/job/CCS/job/ccs-mr-preview/3595/artifact/assembly-Installing_Red_Hat_Virtualization_as_a_self-hosted_engine_using_the_command_line/preview/index.html#Advanced_RHVH_Install_SHE_cli_deploy

See "Installing a DUD driver on a host without installer support"

The merge request is here:
https://gitlab.cee.redhat.com/rhci-documentation/docs-Red_Hat_Enterprise_Virtualization/-/merge_requests/1955?diff_head=true#49bd4e4ad7b6aa30864251508c997a0e6a23cb53

