Bug 2038240 - Error when configuring a file using permissions bigger than decimal 511 (octal 0777)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: ---
Target Release: 4.10.0
Assignee: Zack Zlotnik
QA Contact: Sergio
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-01-07 15:34 UTC by Sergio
Modified: 2022-03-10 16:37 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-10 16:37:38 UTC
Target Upstream Version:




Links
System                  ID                                           Private  Priority  Status  Summary                                            Last Updated
Github                  openshift machine-config-operator pull 2910  0        None      open    Bug 2038240: Error if files have special bits set  2022-01-12 17:02:50 UTC
Red Hat Product Errata  RHSA-2022:0056                               0        None      None    None                                               2022-03-10 16:37:54 UTC

Description Sergio 2022-01-07 15:34:23 UTC
Description of problem:
When we create a MachineConfig resource to deploy a file with permissions numerically bigger than decimal 511 (octal 0777), the MachineConfigPool becomes degraded, the config daemon reports a config drift error, and the pool cannot be recovered by deleting the MC and editing the desiredConfig value.


Version-Release number of MCO (Machine Config Operator) (if applicable):
$ oc get co machine-config
NAME             VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
machine-config   4.10.0-0.nightly-2022-01-07-004348   True        False         False      5h37m   


Platform (AWS, VSphere, Metal, etc.):
AWS


Are you certain that the root cause of the issue being reported is the MCO (Machine Config Operator)?
(Y/N/Not sure): Y

How reproducible:
Always

Did you catch this issue by running a Jenkins job? If yes, please list:
1. Jenkins job:

2. Profile:

Steps to Reproduce:
1. Create a MachineConfig resource deploying a file using a mode bigger than decimal 511 (octal 0777)

cat << EOF | oc create -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: mco-test-file-permissions
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
      - contents:
          source: data:,MCO%20test%20file%20permissions%0A
        path: /etc/mco-test-file-permissions
        mode: 512
EOF


Actual results:
The worker pool is degraded, and we can see this message in the config daemon logs

I0107 11:50:22.717823    1612 daemon.go:1198] Validating against pending config rendered-worker-d262b13e390b5082d2cf843819138dba
E0107 11:50:22.723320    1612 writer.go:135] Marking Degraded due to: unexpected on-disk state validating against rendered-worker-d262b13e390b5082d2cf843819138dba: mode mismatch for file: "/etc/mco-test-file-permissions"; expected: ----------/512/01000; received: ----------/0/0


Expected results:
If permissions numerically bigger than 511 (0777) are allowed, no error should happen and the permissions should be set properly.

If permissions numerically bigger than 511(0777) are not allowed, a validation should be done such that we don't get into an error that cannot be recovered. This validation should report the right cause of the problem (the MachineConfig resource defining forbidden permissions in a file).
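A minimal sketch of the second option, assuming a hypothetical helper named checkFileMode (it is not an MCO or Ignition function): it rejects any mode that sets bits outside 0777 and names the offending file, so the failure points at the MachineConfig instead of surfacing later as config drift.

```go
package main

import "fmt"

// checkFileMode is a hypothetical helper, not part of the MCO or Ignition.
// It rejects modes that carry bits outside the nine permission bits (0777).
func checkFileMode(path string, mode int) error {
	if mode < 0 || mode > 0777 {
		return fmt.Errorf("file %q: mode %d (octal %#o) sets bits outside 0777", path, mode, mode)
	}
	return nil
}

func main() {
	// 512 is the mode from the reproducer; 420 is decimal for octal 0644.
	for _, mode := range []int{512, 420} {
		if err := checkFileMode("/etc/mco-test-file-permissions", mode); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Printf("accepted: %#o\n", mode)
	}
}
```

Running a check like this before the config is rendered would let the MachineConfig be refused up front; where such a check belongs in the MCO is left open here.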

Additional info:

Notice that if permissions numerically bigger than 511(0777) are not allowed, then users cannot configure things like the sticky bit, decimal 1023(octal 1777), or the setuid and setgid bits.

Comment 1 Zack Zlotnik 2022-01-10 22:52:02 UTC
Overall, this behavior isn't new; it has existed for some time but has been (mostly) dormant. What's changed is that we're executing this code more often and writing better test cases. Under the hood, here's what's happening:

1. When the file is created by the MCD [1], the os.Chmod [2] function ignores the sticky, setuid, and setgid bits when they are given in the UNIX layout, because only the 9 least significant bits (read / write / execute for user, group, and other) are considered standard UNIX permissions [3]. See [6] for a Go Playground link which illustrates this.
2. Because of this, the permissions are set, but the special mode bits are not.
3. When the config drift detection code runs, it identifies a mismatch between what's on disk vs. what the MachineConfig specifies. In Sergio's provided example which sets 01000, we can see that stat'ing the file shows that its permission and mode bits are set to 0 because of the truncation.

In addition to #1, one can see that Golang has its own internal representation of file mode bits [3]. This means that the Golang equivalent of octal 01777, which os.Chmod would set correctly, is 04000777 (1049087 decimal). While passing this into os.Chmod works, it fails Ignition validation [5], which rejects any file mode value greater than 07777. To my understanding, Golang keeps its own internal representation for portability across OSes [7], so this particular behavior is considered a feature.

This appears to be consistent with the current behavior in Ignition [4], meaning that if one were to use Ignition in an out-of-MCO context, the file mode would not be set correctly.

=========

Refs:
1. https://github.com/openshift/machine-config-operator/blob/release-4.10/pkg/daemon/update.go#L1567-L1600
2. https://cs.opensource.google/go/go/+/refs/tags/go1.17.6:src/os/file.go;l=521-539
3. https://cs.opensource.google/go/go/+/refs/tags/go1.17.6:src/io/fs/fs.go;l=166-167
4. https://github.com/coreos/ignition/blob/v2.7.0/internal/exec/util/file.go#L163-L180
5. https://github.com/coreos/ignition/blob/v2.7.0/config/v3_2/types/mode.go#L21-L26
6. https://go.dev/play/p/iLVNsA3Kf_y
7. https://github.com/golang/go/issues/25539

Comment 2 Zack Zlotnik 2022-01-11 14:24:54 UTC
After explaining this to a non-engineer, I came up with a more succinct and easier-to-follow summary of my comment above.

- File permissions (read / write / execute for user, group, and other) are conferred by the three rightmost octal digits, e.g., the 755 in 0755.
- Special file modes (sticky bit, setuid, setgid) are conferred by a fourth digit to the left of those three, e.g., the 1 in 01755.
- What's happening is that Golang's internal representation of file modes places the special file mode bits in a different location (e.g., 04000755 for 01755).
- When os.Chmod() attempts to apply the file mode bits, it discards anything to the left of the file permission bits (e.g., the 1 in 01755, which becomes 0755) since it ultimately calls the .Perm() method on the FileMode object to do this.
- While the os.Chmod() function (or rather, the ones it delegates to) can set the special file mode bits, it will only do so if they're set in the different location referenced above. So some additional logic is needed to determine if any special file mode bits are set and to adjust the internal Golang representation accordingly.

Comment 3 Zack Zlotnik 2022-01-11 16:12:12 UTC
I was able to reproduce this issue within Ignition, so I've also opened a bug against Ignition: https://github.com/coreos/ignition/issues/1301.

Comment 4 Sergio 2022-01-26 13:39:23 UTC
Verified using 
$ oc get co machine-config 
NAME             VERSION                         AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
machine-config   4.10.0-0.ci-2022-01-26-000911   True        False         False      3h57m   


- When the MC with the wrong permissions was created, the worker pool became Degraded
- When the offending MC was deleted, the worker pool stopped being Degraded
- When an MC using valid permissions (0777) was created, the nodes were configured properly and without errors

Comment 8 errata-xmlrpc 2022-03-10 16:37:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

