Bug 2043296

Summary: Ignition fails when reusing existing statically-keyed LUKS volume
Product: OpenShift Container Platform
Reporter: Benjamin Gilbert <bgilbert>
Component: RHCOS
Assignee: Benjamin Gilbert <bgilbert>
Status: CLOSED ERRATA
QA Contact: Micah Abbott <miabbott>
Severity: medium
Docs Contact:
Priority: medium
Version: 4.9
CC: dornelas, hhei, jligon, miabbott, mrussell, nstielau
Target Milestone: ---
Keywords: Regression
Target Release: 4.10.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: When reusing an existing statically-keyed LUKS volume during provisioning, the encryption key is not correctly written to disk.
Consequence: Reusing a statically-keyed LUKS volume fails with a "missing persisted keyfile" error.
Fix: Ignition correctly writes keys for reused LUKS volumes.
Result: Existing statically-keyed LUKS volumes can be reused during provisioning.
Story Points: ---
Clone Of:
Clones: 2043299
Environment:
Last Closed: 2022-03-10 16:41:21 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 2043297
Bug Blocks: 2043299

Description Benjamin Gilbert 2022-01-21 00:27:19 UTC
It's possible to reprovision an existing node and reuse existing LUKS volumes that are statically keyed.  For example, if a node is installed twice with this config:

variant: openshift
version: 4.9.0
metadata:
  name: static-key
  labels:
    machineconfiguration.openshift.io/role: worker
storage:
  disks:
  - device: /dev/sdb
    wipe_table: true
    partitions:
    - number: 1
  luks:
  - name: disk1
    device: /dev/sdb1
    key_file:
      inline: mykey
  filesystems:
  - device: /dev/disk/by-id/dm-name-disk1
    format: xfs
    path: /var/mnt/encrypted_test
    with_mount_unit: true

the encrypted data on /dev/sdb will be preserved across the reinstall. However, in 4.9+ Ignition fails on the second install with:

    CRITICAL : Ignition failed: creating crypttab entries: missing persisted keyfile for [...]
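
If the serial console output has been saved to a file, the failure can be detected mechanically. The following is a minimal sketch, not part of the original report: the function name and the log file are hypothetical, and the report does not say how (or whether) the console was captured.

```shell
# has_keyfile_error LOGFILE
# Succeeds (exit 0) if LOGFILE contains the Ignition failure quoted above,
# fails (exit 1) otherwise.
# LOGFILE is a hypothetical capture of the VM's serial console.
has_keyfile_error() {
    grep -q 'creating crypttab entries: missing persisted keyfile' "$1"
}
```

For example, `has_keyfile_error console.log && echo "hit bug 2043296"`.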

Comment 1 RHCOS Bug Bot 2022-01-21 18:45:21 UTC
This bug has been reported fixed in a new RHCOS build and is ready for QE verification.  To mark the bug verified, set the Verified field to Tested.  This bug will automatically move to MODIFIED once the fix has landed in a new bootimage.

Comment 2 Micah Abbott 2022-01-24 18:00:22 UTC
I used libvirt to reproduce this on RHCOS 4.9 and to verify the fix with RHCOS 410.84.202201241233-0.


I created an Ignition config from the Butane example above, adding a user and an SSH key:

`butane -o bz2043296.json -r -p -s bz2043296.priv.bu`
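
For readers unfamiliar with the flags: `-o` names the output file, `-r` forces raw Ignition output instead of wrapping it in a MachineConfig (which the `openshift` variant would otherwise produce), `-p` pretty-prints the JSON, and `-s` fails on any warning. A rough sketch of the same step with a cheap sanity check follows; the function names are hypothetical, and the check is only a text match, not a real validation of the config.

```shell
# transpile BUTANE_FILE IGNITION_FILE
# Same invocation as above, wrapped in a function.
transpile() {
    butane -o "$2" -r -p -s "$1"
}

# config_mentions_luks IGNITION_FILE
# Text-level check (an assumption of this sketch, not an Ignition feature):
# the generated JSON should contain a "luks" section and the volume name
# "disk1" from the Butane example.
config_mentions_luks() {
    grep -q '"luks"' "$1" && grep -q '"disk1"' "$1"
}
```

Usage: `transpile bz2043296.priv.bu bz2043296.json && config_mentions_luks bz2043296.json`.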



Confirming on RHCOS 4.9
=========================
1. Create snapshot of a RHCOS 4.9 image

qemu-img create -f qcow2 -F qcow2 -o backing_file=rhcos-49.84.202201212103-0-qemu.x86_64.qcow2 rhcos-49.84.202201212103-0-qemu.x86_64.jan24.qcow2

2. Create an empty qcow2 disk to use as a secondary disk

qemu-img create -f qcow2 2043296.qcow2 10G

3. Set some variables

VM_NAME=rhcos49
VCPUS=2
RAM_MB=4096
IMAGE=/path/to/rhcos-49.84.202201212103-0-qemu.x86_64.jan24.qcow2
LUKS_DISK=/path/to/2043296.qcow2
IGNITION_CONFIG=/path/to/bz2043296.json

4. Run the virt-install command

virt-install --connect="qemu:///system" --name="${VM_NAME}" --vcpus="${VCPUS}" --memory="${RAM_MB}" --import --disk path="${IMAGE}",format=qcow2,bus=virtio --disk path="${LUKS_DISK}",format=qcow2,bus=virtio --qemu-commandline="-fw_cfg name=opt/com.coreos/config,file=${IGNITION_CONFIG}" --os-variant rhel8.4 --graphics=none

5. Confirm initial install was successful
6. Destroy/undefine the VM

virsh destroy "${VM_NAME}" && virsh undefine "${VM_NAME}"

7. Delete + recreate the root disk

rm "${IMAGE}"
qemu-img create -f qcow2 -F qcow2 -o backing_file=rhcos-49.84.202201212103-0-qemu.x86_64.qcow2 rhcos-49.84.202201212103-0-qemu.x86_64.jan24.qcow2

8. Re-run virt-install

virt-install --connect="qemu:///system" --name="${VM_NAME}" --vcpus="${VCPUS}" --memory="${RAM_MB}" --import --disk path="${IMAGE}",format=qcow2,bus=virtio --disk path="${LUKS_DISK}",format=qcow2,bus=virtio --qemu-commandline="-fw_cfg name=opt/com.coreos/config,file=${IGNITION_CONFIG}" --os-variant rhel8.4 --graphics=none

9. Observe the failure message on the console
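
The steps above repeat the same three actions (create root overlay, boot, tear down), so they can be wrapped in small functions. This is only a sketch: the function names and the `DRY_RUN` switch are hypothetical additions, while the commands and variables (`VM_NAME`, `VCPUS`, `RAM_MB`, `IMAGE`, `LUKS_DISK`, `IGNITION_CONFIG`) are the ones from the steps.

```shell
# run CMD [ARGS...]
# Hypothetical helper: with DRY_RUN=1, print the command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# make_root_disk BACKING OVERLAY
# Create a fresh qcow2 overlay on top of the pristine RHCOS image (steps 1 and 7).
make_root_disk() {
    run qemu-img create -f qcow2 -F qcow2 -o "backing_file=$1" "$2"
}

# launch_vm
# Boot with the root overlay, the persistent LUKS disk, and the Ignition
# config passed via fw_cfg (steps 4 and 8).
launch_vm() {
    run virt-install --connect="qemu:///system" --name="${VM_NAME}" \
        --vcpus="${VCPUS}" --memory="${RAM_MB}" --import \
        --disk "path=${IMAGE},format=qcow2,bus=virtio" \
        --disk "path=${LUKS_DISK},format=qcow2,bus=virtio" \
        --qemu-commandline="-fw_cfg name=opt/com.coreos/config,file=${IGNITION_CONFIG}" \
        --os-variant rhel8.4 --graphics=none
}

# reset_root BACKING
# Remove the VM and its root disk, then recreate the root disk.
# LUKS_DISK is deliberately left untouched so the second boot reuses it
# (steps 6 and 7).
reset_root() {
    run virsh destroy "${VM_NAME}" && run virsh undefine "${VM_NAME}"
    run rm -f "${IMAGE}"
    make_root_disk "$1" "${IMAGE}"
}
```

With the variables from step 3 set, the reproduction becomes `launch_vm`, then `reset_root rhcos-49.84.202201212103-0-qemu.x86_64.qcow2`, then `launch_vm` again. Setting `DRY_RUN=1` first prints the commands without touching any VM or disk.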


Verifying fix with RHCOS 4.10
==============================
1. Create snapshot of a RHCOS 4.10 image

qemu-img create -f qcow2 -F qcow2 -o backing_file=rhcos-410.84.202201241233-0-qemu.x86_64.qcow2 rhcos-410.84.202201241233-0-qemu.x86_64.jan24.qcow2

2. Create an empty qcow2 disk to use as a secondary disk

qemu-img create -f qcow2 2043296-rhcos410.qcow2 10G

3. Set some variables

VM_NAME=rhcos410
VCPUS=2
RAM_MB=4096
IMAGE=/path/to/rhcos-410.84.202201241233-0-qemu.x86_64.jan24.qcow2
LUKS_DISK=/path/to/2043296-rhcos410.qcow2
IGNITION_CONFIG=/path/to/bz2043296.json

4. Run the virt-install command

virt-install --connect="qemu:///system" --name="${VM_NAME}" --vcpus="${VCPUS}" --memory="${RAM_MB}" --import --disk path="${IMAGE}",format=qcow2,bus=virtio --disk path="${LUKS_DISK}",format=qcow2,bus=virtio --qemu-commandline="-fw_cfg name=opt/com.coreos/config,file=${IGNITION_CONFIG}" --os-variant rhel8.4 --graphics=none

5. Confirm initial install was successful
6. Destroy/undefine the VM

virsh destroy "${VM_NAME}" && virsh undefine "${VM_NAME}"

7. Delete + recreate the root disk

rm "${IMAGE}"
qemu-img create -f qcow2 -F qcow2 -o backing_file=rhcos-410.84.202201241233-0-qemu.x86_64.qcow2 rhcos-410.84.202201241233-0-qemu.x86_64.jan24.qcow2

8. Re-run virt-install

virt-install --connect="qemu:///system" --name="${VM_NAME}" --vcpus="${VCPUS}" --memory="${RAM_MB}" --import --disk path="${IMAGE}",format=qcow2,bus=virtio --disk path="${LUKS_DISK}",format=qcow2,bus=virtio --qemu-commandline="-fw_cfg name=opt/com.coreos/config,file=${IGNITION_CONFIG}" --os-variant rhel8.4 --graphics=none

9. Observe a successful install + reuse of the secondary disk
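
One way to confirm the reuse from inside the booted node is to check that the volume has a crypttab entry and a persisted keyfile. The sketch below is an illustration, not what was run for this verification. It assumes Ignition persists keyfiles under /etc/luks/<name>; the function names are hypothetical, and the checks need to run as root on the node.

```shell
# crypttab_has_entry NAME [FILE]
# Succeeds if FILE (default /etc/crypttab) has a line whose first field is NAME.
crypttab_has_entry() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }' "${2:-/etc/crypttab}"
}

# keyfile_present NAME [DIR]
# Succeeds if a non-empty keyfile for NAME exists in DIR.
# The default DIR of /etc/luks is an assumption of this sketch.
keyfile_present() {
    [ -s "${2:-/etc/luks}/$1" ]
}
```

On the node: `crypttab_has_entry disk1 && keyfile_present disk1 && findmnt /var/mnt/encrypted_test`.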

Comment 3 RHCOS Bug Bot 2022-01-28 16:09:27 UTC
The fix for this bug has landed in a bootimage bump, as tracked in bug 2043297 (now in status MODIFIED).  Moving this bug to MODIFIED.

Comment 6 HuijingHei 2022-02-04 10:10:44 UTC
Verification passed with rhcos-410.84.202202040840-0-qemu.x86_64.qcow2, following the steps in Comment 2.

The status of the automated test is tracked in https://github.com/coreos/ignition/issues/1306

Comment 8 errata-xmlrpc 2022-03-10 16:41:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056