Bug 1875745 - [BM][IPI] rootDeviceHints not honoured for worker nodes
Keywords:
Status: CLOSED DUPLICATE of bug 1871653
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Bare Metal Hardware Provisioning
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Assignee: Steven Hardy
QA Contact: Amit Ugol
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-09-04 09:10 UTC by Yurii Prokulevych
Modified: 2020-09-04 20:46 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-04 20:46:54 UTC
Target Upstream Version:
Embargoed:



Description Yurii Prokulevych 2020-09-04 09:10:55 UTC
Description of problem:
-----------------------
During OCP deployment, the rootDeviceHints specified in install-config.yaml are not honoured.

Excerpt from install-config.yaml:

      - name: openshift-worker-0-1
        role: worker
        bmc:
          address: redfish://...
          disableCertificateVerification: True
          username: ***
          password: ***
        bootMACAddress: 52:54:00:82:cf:e7
        rootDeviceHints:
          minSizeGigabytes: 40

The node has 6 disks:
---------------------
    Storage:
      Hctl:           0:0:0:0
      Model:          QEMU HARDDISK
      Name:           /dev/sda
      Rotational:     true
      Serial Number:  drive-scsi0-0-0-0
      Size Bytes:     26843545600
      Vendor:         QEMU
      Hctl:           0:0:0:5
      Model:          QEMU HARDDISK
      Name:           /dev/sdb
      Rotational:     true
      Serial Number:  drive-scsi0-0-0-5
      Size Bytes:     10737418240
      Vendor:         QEMU
      Hctl:           0:0:0:4
      Model:          QEMU HARDDISK
      Name:           /dev/sdc
      Rotational:     true
      Serial Number:  drive-scsi0-0-0-4
      Size Bytes:     10737418240
      Vendor:         QEMU
      Hctl:           0:0:0:3
      Model:          QEMU HARDDISK
      Name:           /dev/sdd
      Rotational:     true
      Serial Number:  drive-scsi0-0-0-3
      Size Bytes:     10737418240
      Vendor:         QEMU
      Hctl:           0:0:0:2
      Model:          QEMU HARDDISK
      Name:           /dev/sde
      Rotational:     true
      Serial Number:  drive-scsi0-0-0-2
      Size Bytes:     10737418240
      Vendor:         QEMU
      Hctl:           0:0:0:1
      Model:          QEMU HARDDISK
      Name:           /dev/sdf
      Rotational:     true
      Serial Number:  drive-scsi0-0-0-1
      Size Bytes:     48318382080
      Vendor:         QEMU

Based on the hint, a disk of at least 40 GB has to be chosen; only /dev/sdf (48318382080 bytes) qualifies.
However, checking the node shows the OS installed on /dev/sda (25G):

[core@worker-0-0 ~]$ lsblk
NAME                         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                            8:0    0   25G  0 disk
├─sda1                         8:1    0  384M  0 part /boot
├─sda2                         8:2    0  127M  0 part /boot/efi
├─sda3                         8:3    0    1M  0 part
├─sda4                         8:4    0 24.4G  0 part
│ └─coreos-luks-root-nocrypt 253:0    0 24.4G  0 dm   /sysroot
└─sda5                         8:5    0   65M  0 part
sdb                            8:16   0   10G  0 disk
sdc                            8:32   0   10G  0 disk
sdd                            8:48   0   10G  0 disk
sde                            8:64   0   10G  0 disk
sdf                            8:80   0   45G  0 disk
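
As a sanity check on the expectation above, here is a minimal sketch that applies a minimum-size filter to the disk inventory listed in this report. It is not the selection logic used by the provisioning stack; the function name is hypothetical, and whether the hint counts a gigabyte as 2**30 or 10**9 bytes is treated as an assumption, so both are tried.

```python
# Disk inventory copied from the hardware details in this report
# (device name -> size in bytes).
DISKS = {
    "/dev/sda": 26843545600,
    "/dev/sdb": 10737418240,
    "/dev/sdc": 10737418240,
    "/dev/sdd": 10737418240,
    "/dev/sde": 10737418240,
    "/dev/sdf": 48318382080,
}

GIB = 2**30  # assumption: the hint treats a "gigabyte" as 2**30 bytes
GB = 10**9   # alternative reading: decimal gigabytes


def disks_matching_min_size(disks, min_size_gigabytes, unit=GIB):
    """Return the sorted names of disks that are at least the requested size.

    Hypothetical helper for illustration only.
    """
    threshold = min_size_gigabytes * unit
    return sorted(name for name, size in disks.items() if size >= threshold)


if __name__ == "__main__":
    # minSizeGigabytes: 40, as in the install-config.yaml excerpt
    print(disks_matching_min_size(DISKS, 40))
    print(disks_matching_min_size(DISKS, 40, unit=GB))
```

Under either reading of "gigabyte", /dev/sdf is the only disk that satisfies minSizeGigabytes: 40 on this node.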

Events for worker's BMH objects:
--------------------------------
Events:
  Type    Reason                  Age   From                         Message
  ----    ------                  ----  ----                         -------
  Normal  Registered              51m   metal3-baremetal-controller  Registered new host
  Normal  BMCAccessValidated      51m   metal3-baremetal-controller  Verified access to BMC
  Normal  InspectionStarted       51m   metal3-baremetal-controller  Hardware inspection started
  Normal  InspectionComplete      46m   metal3-baremetal-controller  Hardware inspection completed
  Normal  ProfileSet              46m   metal3-baremetal-controller  Hardware profile set: unknown
  Normal  ProvisioningStarted     46m   metal3-baremetal-controller  Image provisioning started for http://..../rhcos-46.82.202008260918-0-compressed.x86_64.qcow2
  Normal  ProvisioningError       46m   metal3-baremetal-controller  Image provisioning failed: node 513f836e-e3e6-4b8f-a824-ba7c9077aa84 command status errored: {'type': 'ImageWriteError', 'code': 500, 'message': 'Error writing image to device', 'details': 'Writing image to device /dev/sdb failed with exit code 1. stdout: write_image.sh: Erasing existing GPT and MBR data structures from /dev/sdb\nCreating new GPT entries.\nGPT data structures destroyed! You may now partition the disk using fdisk or\nother utilities.\nwrite_image.sh: Imaging /tmp/rhcos-46.82.202008260918-0-compressed.x86_64.qcow2 to /dev/sdb\n. stderr: 33+0 records in\n33+0 records out\n16896 bytes (17 kB, 16 KiB) copied, 0.00156113 s, 10.8 MB/s\n33+0 records in\n33+0 records out\n16896 bytes (17 kB, 16 KiB) copied, 0.00101555 s, 16.6 MB/s\nqemu-img: /dev/sdb: error while converting host_device: Device is too small\n'}
  Normal  DeprovisioningStarted   46m   metal3-baremetal-controller  Image deprovisioning started
  Normal  DeprovisioningComplete  45m   metal3-baremetal-controller  Image deprovisioning completed
  Normal  ProvisioningStarted     45m   metal3-baremetal-controller  Image provisioning started for http://.../rhcos-46.82.202008260918-0-compressed.x86_64.qcow2
  Normal  ProvisioningComplete    41m   metal3-baremetal-controller  Image provisioning completed for http://.../rhcos-46.82.202008260918-0-compressed.x86_64.qcow2
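
The ProvisioningError event above is hard to read in its raw form. A minimal sketch for pulling out the target device, the exit code, and the qemu-img reason from such a message follows. The function name and the regular expressions are assumptions written against the wording of this one event, not a documented message format, and the sample string is an abbreviated excerpt of it.

```python
import re


def summarize_image_write_error(message):
    """Extract device, exit code and qemu-img reason from an event message.

    Hypothetical helper; the patterns follow the wording seen in this report.
    """
    write = re.search(
        r"Writing image to device (\S+) failed with exit code (\d+)", message
    )
    # The event text contains a literal backslash-n, so stop at that,
    # at a real newline, at a closing quote, or at the end of the string.
    reason = re.search(r"qemu-img: \S+: (.+?)(?:\\n|\n|'|$)", message)
    return {
        "device": write.group(1) if write else None,
        "exit_code": int(write.group(2)) if write else None,
        "reason": reason.group(1) if reason else None,
    }


# Abbreviated excerpt of the ProvisioningError event in this report.
EVENT_EXCERPT = (
    r"'details': 'Writing image to device /dev/sdb failed with exit code 1. "
    r"qemu-img: /dev/sdb: error while converting host_device: "
    r"Device is too small\n'}"
)

if __name__ == "__main__":
    print(summarize_image_write_error(EVENT_EXCERPT))
```

For this event the summary points at /dev/sdb, one of the 10G disks, which is consistent with the first provisioning attempt ignoring the size hint.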

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
4.6.0-0.nightly-2020-08-31-220837


How reproducible:
-----------------
100%

Steps to Reproduce:
-------------------
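1. Deploy OCP 4.6 with rootDeviceHints set for the worker hosts in install-config.yaml
2. After the deployment finishes, check with lsblk which disk the worker was installed on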

Actual results:
---------------
Deployment finished, but the worker was deployed to the wrong disk (/dev/sda, 25G)


Expected results:
-----------------
Deployment finishes and the worker is deployed to the disk matching the hint (/dev/sdf, 45G)


Additional info:
----------------
Virtual deployment: 3 masters + 2 workers + provisionhost.
Masters with 2 disks (1st - 25 GB, 2nd - 45 GB)
Workers with 6 disks (1st - 25 GB, 2nd-5th - 10 GB each, 6th - 45 GB)

Comment 2 Doug Hellmann 2020-09-04 20:46:54 UTC
According to http://file.emea.redhat.com/~yprokule/SOSReports/OCP/RHBZ-1875745/bmh-openshift-worker-0-0.yml it looks like the rootDeviceHints are not being copied to the worker host definition at all. That's the same problem being fixed for https://bugzilla.redhat.com/show_bug.cgi?id=1871653 so I am going to mark this ticket as a duplicate of the other.

*** This bug has been marked as a duplicate of bug 1871653 ***
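
The diagnosis in the comment above (hints missing from the worker host definition) can be checked mechanically. A minimal sketch, assuming the BareMetalHost object has been exported as JSON (for example with `oc get bmh <name> -o json`) and that the hints live under spec.rootDeviceHints; the function name and the two sample objects are hypothetical and are not taken from the linked SOS report.

```python
import json


def root_device_hints(bmh):
    """Return spec.rootDeviceHints from a BareMetalHost dict, or None.

    Hypothetical helper; None covers both a missing and an empty hints block.
    """
    spec = bmh.get("spec") or {}
    return spec.get("rootDeviceHints") or None


# Hypothetical host definitions for illustration only.
WITH_HINTS = json.loads(
    '{"kind": "BareMetalHost",'
    ' "spec": {"bootMACAddress": "52:54:00:82:cf:e7",'
    '          "rootDeviceHints": {"minSizeGigabytes": 40}}}'
)
WITHOUT_HINTS = json.loads(
    '{"kind": "BareMetalHost",'
    ' "spec": {"bootMACAddress": "52:54:00:82:cf:e7"}}'
)

if __name__ == "__main__":
    for label, host in (("with", WITH_HINTS), ("without", WITHOUT_HINTS)):
        print(label, root_device_hints(host))
```

A host for which this returns None while install-config.yaml sets rootDeviceHints would match the behaviour described in this bug.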

