Bug 1940966 - [4.7.z] prjquota is dropped from rootflags if rootfs is reprovisioned
Summary: [4.7.z] prjquota is dropped from rootflags if rootfs is reprovisioned
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.7
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.7.z
Assignee: Jonathan Lebon
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On: 1935174 1940704
Blocks:
Reported: 2021-03-19 16:30 UTC by Jonathan Lebon
Modified: 2021-04-12 23:23 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The `prjquota` kernel argument was dropped if rootfs reprovisioning (e.g. LUKS) was enabled. Consequence: This breaks OCP features around disk space quota management. Fix: The `prjquota` kernel argument is now retained even if the root filesystem is reprovisioned. Result: OCP features dependent on that rootfs mount option now work.
Clone Of: 1940704
Environment:
Last Closed: 2021-04-12 23:22:56 UTC
Target Upstream Version:
Embargoed:



Links
System ID                              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata RHBA-2021:1075  0        None      None    None     2021-04-12 23:23:10 UTC

Description Jonathan Lebon 2021-03-19 16:30:31 UTC
+++ This bug was initially created as a clone of Bug #1940704 +++

Description of problem:

When the rootfs is reprovisioned, such as when LUKS disk encryption is configured, the default karg "rootflags=prjquota" is dropped. This mount option is required to enable container storage quotas (see https://github.com/coreos/coreos-assembler/pull/303/commits/6103effbd006bb6109467830d6a3e42dd847668d and BZ 1658386).



Version-Release number of selected component (if applicable):

4.7.1 (47.83.202103041352-0)


How reproducible:  100%


Steps to Reproduce:
1.  Deploy RHCOS with Tang encryption (https://docs.openshift.com/container-platform/4.7/installing/install_config/installing-customizing.html#installation-special-config-encrypt-disk-tang_installing-customizing)

Actual results:

The rootfs is mounted with the 'noquota' option:

# findmnt /var
TARGET SOURCE                                     FSTYPE OPTIONS
/var   /dev/mapper/root[/ostree/deploy/rhcos/var] xfs    rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota


Expected results:

The rootfs is mounted with the 'prjquota' option:

# findmnt /var
TARGET SOURCE                                     FSTYPE OPTIONS
/var   /dev/mapper/root[/ostree/deploy/rhcos/var] xfs    rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota
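The actual-versus-expected comparison can be scripted. The following is a minimal sketch, assuming a hypothetical helper named `has_mount_option` (it is not part of any tool mentioned in this bug), that tests whether a comma-separated options string like the one printed by findmnt contains a given option:

```shell
# has_mount_option OPTIONS OPTION
# Succeeds if the comma-separated OPTIONS string contains OPTION as a whole
# entry. Hypothetical helper, for illustration only.
has_mount_option() {
    case ",$1," in
        *,"$2",*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use on a live node (not run here):
#   has_mount_option "$(findmnt -n -o OPTIONS /var)" prjquota && echo "quota ok"
```

Wrapping the string in commas before matching means the check only matches a whole option entry, not a substring of a longer one.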



Additional info:

Benjamin pointed me at https://github.com/coreos/fedora-coreos-config/blob/rhcos-4.7/overlay.d/05core/usr/lib/dracut/modules.d/40ignition-ostree/coreos-rootflags.sh


Simple reproducer for testing on a single node:


{
  "ignition": {
    "config": {
      "merge": [
        {
          "source": "http://myserver/rhcos/test.json"
        }
      ]
    },
    "version": "3.2.0"
  },
  "storage": {
    "filesystems": [
      {
        "device": "/dev/mapper/root",
        "format": "xfs",
        "label": "root",
        "wipeFilesystem": true
      }
    ],
    "luks": [
      {
        "clevis": {
          "tang": [
            {
              "thumbprint": "my_thumbprint",
              "url": "http://mytangserver"
            }
          ]
        },
        "device": "/dev/disk/by-partlabel/root",
        "name": "root",
        "options": [
          "--cipher",
          "aes-cbc-essiv:sha256"
        ],
        "wipeVolume": true
      }
    ]
  }
}


test.json is just a simple Ignition config that provides a password hash and SSH key so I can log in to the test system.

--- Additional comment from Benjamin Gilbert on 2021-03-18 19:29:52 EDT ---

Assigning to jlebon for initial evaluation.

--- Additional comment from Jonathan Lebon on 2021-03-19 11:45:06 EDT ---

At the "Ignition distro integration" level, the assumption is that as soon as you reprovision the root filesystem, you're choosing your own adventure. So e.g. you should be able to add whatever mount options you'd like.

There are at least two problems here though:
1. So far `prjquota` has mostly been an implementation detail we haven't exposed to users. This would require documenting it everywhere root reprovisioning is documented.
2. Changing mount options right now for the rootfs is painful. We don't support the `mountOptions` flag (https://github.com/coreos/fedora-coreos-config/issues/805) nor Ignition kargs (but soon: https://github.com/coreos/ignition/issues/1168). So right now, this would require using `rpm-ostree kargs` in a systemd service as documented in https://docs.fedoraproject.org/en-US/fedora-coreos/kernel-args/.

It's tempting to try to get https://github.com/coreos/fedora-coreos-config/issues/805 into 4.7, because at a technical level it's pretty trivial to do. That would leave (1), i.e. updating all the documented MachineConfigs and RCC snippets to include a mount_options/mountOptions. Though specifically for prjquota, it doesn't seem like there's any harm in just *always* turning it on at the OS level if the rootfs is XFS, regardless of whether it was reprovisioned or not. AFAICT there aren't really any overhead or performance issues associated with this (I mean... this *has* been the default for a long time). So then it remains an implementation detail and avoids documentation churn. Or another way to frame this is that it simplifies the rootflags messaging to just: "by default, we use prjquota if the rootfs is XFS".
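The proposed rule can be sketched as a small shell function. This only illustrates the stated policy; it is not the contents of coreos-rootflags.sh or of any linked pull request, and the function name `default_rootflags` is hypothetical.

```shell
# default_rootflags FSTYPE
# Prints the default root mount flags for the given filesystem type:
# "prjquota" for XFS, nothing for any other type.
# Hypothetical helper, for illustration only.
default_rootflags() {
    case "$1" in
        xfs) echo "prjquota" ;;
        *) ;;
    esac
}
```

The point of the sketch is that the decision depends only on the filesystem type, not on whether the rootfs was reprovisioned.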

And when we implement https://github.com/coreos/fedora-coreos-config/issues/805 (which is generally useful), anyone who for whatever reason *doesn't* want prjquota can just do `mountOptions: []` (short-term, they can fall back to using `rpm-ostree kargs` to modify the `rootflags` karg).
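To see which `rootflags` value a node actually booted with, the kernel command line can be inspected. A minimal sketch, assuming a hypothetical helper named `rootflags_from_cmdline` that takes the command line as a string:

```shell
# rootflags_from_cmdline CMDLINE
# Prints the value of the last rootflags= argument found in a kernel command
# line string, or an empty line if there is none.
# Hypothetical helper; the body runs in a subshell so that disabling globbing
# and the loop variables do not leak into the caller.
rootflags_from_cmdline() (
    set -f                      # do not glob-expand words of the command line
    flags=""
    for arg in $1; do           # intentional word splitting on whitespace
        case "$arg" in
            rootflags=*) flags="${arg#rootflags=}" ;;
        esac
    done
    printf '%s\n' "$flags"
)

# Example use on a live node (not run here):
#   rootflags_from_cmdline "$(cat /proc/cmdline)"
```

If the value needs changing in the short term, `rpm-ostree kargs` accepts `--append`, `--delete`, and `--replace` arguments; the change takes effect on the next boot.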

I'll take a look at this.

Comment 1 Jonathan Lebon 2021-03-19 16:31:16 UTC
Waiting on https://github.com/coreos/fedora-coreos-config/pull/903 before backporting.

Comment 2 Micah Abbott 2021-03-25 19:07:07 UTC
Backported here - https://github.com/coreos/fedora-coreos-config/pull/905
Included in openshift/os here - https://github.com/openshift/os/pull/520
Fed into the build pipeline here - https://gitlab.cee.redhat.com/coreos/redhat-coreos/-/merge_requests/1240

Landed in RHCOS 47.83.202103251640-0
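A node's RHCOS version can be compared against the build named above. The sketch below assumes that the third dot-separated field of the version string is a build timestamp and that the comparison is only meaningful within the 47.83 stream; the helper names `rhcos_build_stamp` and `has_prjquota_fix` are hypothetical.

```shell
# rhcos_build_stamp VERSION
# Prints the timestamp field of a version such as 47.83.202103251640-0.
# Hypothetical helper, for illustration only.
rhcos_build_stamp() {
    stamp="${1#*.*.}"               # drop the two leading stream fields
    printf '%s\n' "${stamp%%-*}"    # drop the trailing -N suffix
}

# has_prjquota_fix VERSION
# Succeeds if VERSION is at or after the build that this bug reports as
# containing the fix (47.83.202103251640-0). Assumes a 47.83 version string.
has_prjquota_fix() {
    [ "$(rhcos_build_stamp "$1")" -ge 202103251640 ]
}
```

For example, `has_prjquota_fix 47.83.202103041352-0` fails, which matches the affected version given in the description.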

Comment 4 Michael Nguyen 2021-03-26 15:59:40 UTC
Verified on 47.83.202103251640-0. Overrode the boot image with the AMI from the boot image bump https://github.com/openshift/installer/pull/4791 on 4.7.0-0.nightly-2021-03-26-105314.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-03-26-105314   True        False         10m     Cluster version is 4.7.0-0.nightly-2021-03-26-105314
$ oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-130-216.us-west-2.compute.internal   Ready    master   36m   v1.20.0+bafe72f
ip-10-0-140-126.us-west-2.compute.internal   Ready    worker   23m   v1.20.0+bafe72f
ip-10-0-160-193.us-west-2.compute.internal   Ready    worker   24m   v1.20.0+bafe72f
ip-10-0-169-235.us-west-2.compute.internal   Ready    master   36m   v1.20.0+bafe72f
ip-10-0-194-63.us-west-2.compute.internal    Ready    master   34m   v1.20.0+bafe72f
ip-10-0-212-179.us-west-2.compute.internal   Ready    worker   23m   v1.20.0+bafe72f
$ oc debug node/ip-10-0-130-216.us-west-2.compute.internal
Starting pod/ip-10-0-130-216us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# rpm-ostree status
State: idle
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0ad297b22e7b96e04e45aefcc57f571361c87bdc3110e692bb239f2dfbe64050
              CustomOrigin: Managed by machine-config-operator
                   Version: 47.83.202103251640-0 (2021-03-25T16:44:03Z)

  ostree://3fdd1488024f054e39b1be508781d535d1ac7ed423bb3b4b656c2f345934220d
                   Version: 47.83.202103251640-0 (2021-03-25T16:44:03Z)
sh-4.4# cryptsetup luksDump /dev/disk/by-partlabel/root
LUKS header information
Version:       	2
Epoch:         	6
Metadata area: 	16384 [bytes]
Keyslots area: 	16744448 [bytes]
UUID:          	40fb8592-a819-412e-8dc6-25c58c915edf
Label:         	(no label)
Subsystem:     	(no subsystem)
Flags:       	(no flags)

Data segments:
  0: crypt
	offset: 16777216 [bytes]
	length: (whole device)
	cipher: aes-cbc-essiv:sha256
	sector: 512 [bytes]

Keyslots:
  1: luks2
	Key:        256 bits
	Priority:   normal
	Cipher:     aes-cbc-essiv:sha256
	Cipher key: 256 bits
	PBKDF:      argon2i
	Time cost:  5
	Memory:     1048576
	Threads:    4
	Salt:       50 79 32 2f 29 ec 5a 33 8a 05 17 47 80 89 bf 2d 
	            63 54 ce e7 dd 99 70 23 bf b6 28 74 22 65 68 6c 
	AF stripes: 4000
	AF hash:    sha256
	Area offset:163840 [bytes]
	Area length:131072 [bytes]
	Digest ID:  0
Tokens:
  0: clevis
	Keyslot:  1
Digests:
  0: pbkdf2
	Hash:       sha256
	Iterations: 214520
	Salt:       23 0f c2 81 42 2c ca 5b 82 0a 3e 9b e7 af 61 5d 
	            ec af 0d c4 12 65 4d e4 94 5c 8d 92 07 2d 54 29 
	Digest:     e3 54 29 61 44 be 29 39 68 da 62 01 da e5 0c 8f 
	            c7 17 32 59 02 56 f2 ab 32 d6 fb f0 a9 0e 13 31 
sh-4.4# lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
nvme0n1     259:0    0   120G  0 disk  
|-nvme0n1p1 259:1    0     1M  0 part  
|-nvme0n1p2 259:2    0   127M  0 part  
|-nvme0n1p3 259:3    0   384M  0 part  /boot
`-nvme0n1p4 259:4    0 119.5G  0 part  
  `-root    253:0    0 119.5G  0 crypt /sysroot
sh-4.4# findmnt /var | more
TARGET SOURCE                                     FSTYPE OPTIONS
/var   /dev/mapper/root[/ostree/deploy/rhcos/var] xfs    rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota
sh-4.4# clevis luks list -d /dev/disk/by-partlabel/root
1: sss '{"t":1,"pins":{"tang":[{"url":"http://34.217.25.205"}]}}'
sh-4.4# exit
exit
sh-4.2# exit
exit

Removing debug pod ...
$ oc debug node/ip-10-0-140-126.us-west-2.compute.internal
Starting pod/ip-10-0-140-126us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
nvme0n1     259:0    0   120G  0 disk  
|-nvme0n1p1 259:1    0     1M  0 part  
|-nvme0n1p2 259:2    0   127M  0 part  
|-nvme0n1p3 259:3    0   384M  0 part  /boot
`-nvme0n1p4 259:4    0 119.5G  0 part  
  `-root    253:0    0 119.5G  0 crypt /sysroot
sh-4.4# rpm-ostree status
State: idle
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0ad297b22e7b96e04e45aefcc57f571361c87bdc3110e692bb239f2dfbe64050
              CustomOrigin: Managed by machine-config-operator
                   Version: 47.83.202103251640-0 (2021-03-25T16:44:03Z)

  ostree://3fdd1488024f054e39b1be508781d535d1ac7ed423bb3b4b656c2f345934220d
                   Version: 47.83.202103251640-0 (2021-03-25T16:44:03Z)
sh-4.4# findmnt /var | more
TARGET SOURCE                                     FSTYPE OPTIONS
/var   /dev/mapper/root[/ostree/deploy/rhcos/var] xfs    rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota
sh-4.4# cryptsetup luksDump /dev/disk/by-partlabel/root
LUKS header information
Version:       	2
Epoch:         	6
Metadata area: 	16384 [bytes]
Keyslots area: 	16744448 [bytes]
UUID:          	32f7868b-af6c-45d2-8d80-b041fed469d2
Label:         	(no label)
Subsystem:     	(no subsystem)
Flags:       	(no flags)

Data segments:
  0: crypt
	offset: 16777216 [bytes]
	length: (whole device)
	cipher: aes-cbc-essiv:sha256
	sector: 512 [bytes]

Keyslots:
  1: luks2
	Key:        256 bits
	Priority:   normal
	Cipher:     aes-cbc-essiv:sha256
	Cipher key: 256 bits
	PBKDF:      argon2i
	Time cost:  4
	Memory:     852086
	Threads:    2
	Salt:       01 2a 15 b9 81 ea 5b 1a e2 41 26 05 2b 81 74 64 
	            19 51 61 81 4e 46 55 28 b2 70 5d 45 51 72 6e 54 
	AF stripes: 4000
	AF hash:    sha256
	Area offset:163840 [bytes]
	Area length:131072 [bytes]
	Digest ID:  0
Tokens:
  0: clevis
	Keyslot:  1
Digests:
  0: pbkdf2
	Hash:       sha256
	Iterations: 217366
	Salt:       1a 51 2d 2d 06 42 96 2a de ef 7a 79 2f d9 57 38 
	            7d 50 8a 33 9f 65 f6 ba f4 83 01 57 73 a3 b4 d9 
	Digest:     49 0c 7c 64 de 41 2e 38 6c 2a b9 24 22 5e 5f 03 
	            9b 31 5d fd d8 4c 58 60 4c 34 04 5f e0 84 14 34 
sh-4.4# clevis luks list /dev/disk/by-partlabel/root
Did not specify a device!

Usage: clevis luks list -d DEV [-s SLT]

Lists pins bound to a LUKSv1 or LUKSv2 device:

  -d DEV  The LUKS device to list bound pins

  -s SLOT The slot number to list

sh-4.4# clevis luks list -d /dev/disk/by-partlabel/root
1: sss '{"t":1,"pins":{"tang":[{"url":"http://34.217.25.205"}]}}'
sh-4.4# exit
exit
sh-4.2# exit
exit

Removing debug pod ...
$ oc -n openshift-machine-api get machinesets/mnguyen47bootimage-cmmk5-worker-us-west-2a -o yaml | grep ami-
            id: ami-0617611237b58ac93

Comment 9 errata-xmlrpc 2021-04-12 23:22:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.7.6 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1075

