Bug 1737926

Summary: [downstream clone - 4.3.6] Setting FIPS parameter from the engine will make the host unable to reboot if /boot resides on a separate partition (as in RHV-H case)
Product: Red Hat Enterprise Virtualization Manager Reporter: RHV bug bot <rhv-bugzilla-bot>
Component: ovirt-host-deploy Assignee: Simone Tiraboschi <stirabos>
Status: CLOSED ERRATA QA Contact: Pavol Brilla <pbrilla>
Severity: high Docs Contact:
Priority: high    
Version: 4.3.5 CC: cshao, dfediuck, dougsland, lleistne, lsurette, lsvaty, mavital, mperina, nlevy, pelauter, qiyuan, Rhev-m-bugs, sbonazzo, srevivo, tburke, weiwang, yaniwang, yturgema
Target Milestone: ovirt-4.3.6 Keywords: ZStream
Target Release: 4.3.6   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ovirt-host-deploy-1.8.1 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1736873 Environment:
Last Closed: 2019-10-10 15:39:10 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Integration RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1736873    
Bug Blocks: 1745961    

Description RHV bug bot 2019-08-06 11:38:34 UTC
+++ This bug is a downstream clone. The original bug is: +++
+++   bug 1736873 +++
======================================================================

Description of problem:
Once set into FIPS mode, RHV-H fails to reboot.

On the serial console we see:
[   13.854806] dracut: FATAL: FIPS integrity test failed
[   13.859879] dracut: Refusing to continue
[   13.823293] dracut-pre-pivot[1171]: Warning: /boot/.vmlinuz-3.10.0-1062.el7.x86_64.hmac does not exist
[   15.502617] System halted.

but .vmlinuz-3.10.0-1062.el7.x86_64.hmac is present on the filesystem:

[root@dell-r210ii-10 ~]# ls -l /boot/.vmlinuz-3.10.0-1062.el7.x86_64.hmac
-rw-r--r--. 1 root root 167  1 ago 15.35 /boot/.vmlinuz-3.10.0-1062.el7.x86_64.hmac
[root@dell-r210ii-10 ~]# rpm -qf /boot/.vmlinuz-3.10.0-1062.el7.x86_64.hmac
kernel-3.10.0-1062.el7.x86_64

although:
[root@dell-r210ii-10 ~]# FIPSCHECK_DEBUG=error fipscheck  /boot/vmlinuz-3.10.0-1062.el7.x86_64 
fipscheck: Hmac mismatch on file '/boot/vmlinuz-3.10.0-1062.el7.x86_64' : No such file or directory



Version-Release number of selected component (if applicable):
[root@dell-r210ii-10 boot]# nodectl info
layers: 
  rhvh-4.3.5.2-0.20190722.0: 
    rhvh-4.3.5.2-0.20190722.0+1
bootloader: 
  default: rhvh-4.3.5.2-0.20190722.0 (3.10.0-1062.el7.x86_64)
  entries: 
    rhvh-4.3.5.2-0.20190722.0 (3.10.0-1062.el7.x86_64): 
      index: 0
      title: rhvh-4.3.5.2-0.20190722.0 (3.10.0-1062.el7.x86_64)
      kernel: /boot/rhvh-4.3.5.2-0.20190722.0+1/vmlinuz-3.10.0-1062.el7.x86_64
      args: "ro nofb quiet default_hugepagesz=1GB hugepagesz=1GB hugepages=4 hugepagesz=2M hugepages=1024console=tty0 crashkernel=auto rd.lvm.lv=rhvh_dell-r210ii-10/swap rd.lvm.lv=rhvh_dell-r210ii-10/rhvh-4.3.5.2-0.20190722.0+1 console=ttyS1,115200 LANG=en_US.UTF-8 img.bootid=rhvh-4.3.5.2-0.20190722.0+1"
      initrd: /boot/rhvh-4.3.5.2-0.20190722.0+1/initramfs-3.10.0-1062.el7.x86_64.img
      root: /dev/rhvh_dell-r210ii-10/rhvh-4.3.5.2-0.20190722.0+1
current_layer: rhvh-4.3.5.2-0.20190722.0+1



How reproducible:
2 out of 2 hosts

Steps to Reproduce:
1. deploy RHV-H
2. add the host to the engine choosing: Kernel parameters -> FIPS 
3. reboot the host

Actual results:
The host fails to boot with:
[   13.854806] dracut: FATAL: FIPS integrity test failed
[   13.859879] dracut: Refusing to continue
[   13.823293] dracut-pre-pivot[1171]: Warning: /boot/.vmlinuz-3.10.0-1062.el7.x86_64.hmac does not exist
[   15.502617] System halted.

Expected results:
the host successfully reboots

Additional info:

(Originally by Simone Tiraboschi)

Comment 1 RHV bug bot 2019-08-06 11:38:37 UTC
Additional info:
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/security_guide/chap-federal_standards_and_regulations explicitly says:

"If your /boot or /boot/EFI/ partitions reside on separate partitions, add the boot=<partition> (where <partition> stands for /boot) parameter to the kernel command line as well."

and this is definitely our case on RHV-H.
I'm wondering whether host-deploy simply skips that step.

(Originally by Simone Tiraboschi)

Comment 2 RHV bug bot 2019-08-06 11:38:40 UTC
The issue is probably here: https://github.com/oVirt/ovirt-engine/blob/master/frontend/webadmin/modules/uicommonweb/src/main/java/org/ovirt/engine/ui/uicommonweb/models/hosts/KernelCmdlineUtil.java#L110

The engine computes the whole kernel parameter line and, for the FIPS case, it adds only fips=1 without also setting boot=UUID=... as documented at https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/security_guide/chap-federal_standards_and_regulations

so the host will not reboot in FIPS mode if /boot is not on the root partition, as in the RHV-H case.

Unfortunately, I fear that the engine doesn't really know the UUID of the boot partition, so it's probably not a straightforward fix.

(Originally by Simone Tiraboschi)

Comment 3 RHV bug bot 2019-08-06 11:38:42 UTC
Workaround:
Detect the UUID of the /boot partition (the value missing from the kernel command line) with something like:
  findmnt --output=UUID --noheadings --target=/boot

and, instead of just ticking the FIPS checkbox, edit the Kernel command line field and enter
  fips=1 boot=UUID=<boot_p_uuid>
as in the attached screenshot
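
The two workaround steps above can be sketched in Python as follows. This is only an illustration, not part of any oVirt tool: the function names get_boot_uuid and build_fips_cmdline are hypothetical, and it assumes Python 3.7+ and that findmnt is installed on the host.

```python
import subprocess


def get_boot_uuid(mount_point="/boot"):
    """Return the UUID of the filesystem backing mount_point.

    Runs the same findmnt command shown in the workaround.
    """
    result = subprocess.run(
        ["findmnt", "--output=UUID", "--noheadings", "--target=" + mount_point],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def build_fips_cmdline(boot_uuid):
    """Build the string to enter in the Kernel command line field."""
    if not boot_uuid:
        raise ValueError("findmnt returned no UUID")
    return "fips=1 boot=UUID={}".format(boot_uuid)
```

On the host, build_fips_cmdline(get_boot_uuid()) would yield the string to paste into the engine UI.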

(Originally by Simone Tiraboschi)

Comment 4 RHV bug bot 2019-08-06 11:38:44 UTC
Created attachment 1600017 [details]
Workaround

(Originally by Simone Tiraboschi)

Comment 5 RHV bug bot 2019-08-06 11:38:46 UTC
Fixing it on the engine side is quite complex because the engine doesn't directly know the UUID of the boot partition of each host.
What we can do is intercept the kernel parameters string in host-deploy here:
https://github.com/oVirt/ovirt-host-deploy/blob/master/src/plugins/ovirt-host-deploy/kernel/kernel.py#L93
check whether 'fips=1' is there with no 'boot=' and, in that case, detect the UUID of the boot partition and inject the missing parameter.
Once the host reboots, host-monitoring should pick up the new kernel command line, so the engine will know about it from then on.

Please note that the Python host-deploy is going to be deprecated and replaced by a pure Ansible implementation in 4.4, so this fix will have to be re-applied there.

(Originally by Simone Tiraboschi)

Comment 6 RHV bug bot 2019-08-06 11:38:48 UTC
Reducing severity to high and postponing to 4.3.6, since a simple workaround exists (comment #3).

(Originally by Sandro Bonazzola)

Comment 8 Pavol Brilla 2019-09-12 05:31:54 UTC
Testing on RHVH as it was discovered there:

# imgbase w
You are on rhvh-4.3.6.2-0.20190904.0+1


# cat /proc/cmdline 
BOOT_IMAGE=/rhvh-4.3.6.2-0.20190904.0+1/vmlinuz-3.10.0-1062.1.1.el7.x86_64 root=...... img.bootid=rhvh-4.3.6.2-0.20190904.0+1 boot=UUID=e66ac913-2353-4e17-bf97-1197133128de fips=1
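
The same verification can be scripted. The sketch below is an assumption about how one might automate it, not an existing QA tool, and fips_cmdline_ok is a hypothetical name: it checks that a kernel command line carries both fips=1 and a non-empty boot= parameter.

```python
def fips_cmdline_ok(cmdline):
    """Return True if cmdline has fips=1 and a non-empty boot= parameter."""
    params = {}
    for token in cmdline.split():
        # Split on the first '=' only, so boot=UUID=... keeps 'UUID=...' as value.
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params.get("fips") == "1" and bool(params.get("boot"))
```

On a running host the input would be the content of /proc/cmdline.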

Comment 10 errata-xmlrpc 2019-10-10 15:39:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:3026