Description of problem: After replacing key in masters in an OpenShift cluster, user can still ssh into the masters with old key. Version-Release number of selected component (if applicable): 4.6 How reproducible: Always Steps to reproduce: Create a cluster Generate a new key pair $oc get machineconfig 99-master-ssh -o yaml > update-ssh-master.yaml edit the yaml and replace the key under sshAuthorizedKeys with new key $oc apply -f update-ssh-master.yaml ssh into the master with NEW key - success Actual results: ssh into the master with OLD key - still successful Expected results: ssh into the master with OLD key - should fail Additional info: On masters, .ssh/authorized_keys.d/ignition file contains the old key. It was pointed out that https://github.com/openshift/machine-config-operator/blob/e0e622950e56/pkg/daemon/update.go#L1318 shows that the new key is written to authorized_keys instead of authorized_keys.d/ignition.
Ignition writes SSH keys to .ssh/authorized_keys.d/ignition, not .ssh/authorized_keys. On RHCOS, sshd is configured with that path as an additional AuthorizedKeysFile. But the MCO updates .ssh/authorized_keys instead. The result is that SSH keys initially provisioned via Ignition cannot be removed with the MCO. It might make sense for MCO to manage .ssh/authorized_keys.d/ignition instead of authorized_keys, but an automatic migration would be needed.
I completely forgot about the behavior change in Ignition 2.x (spec 3). The problem is limited to OpenShift 4.6, and previous versions were fine. In Ignition 0.x (spec 2), Ignition wrote keys to .ssh/authorized_keys.d/coreos-ignition and then synced them to ~/.ssh/authorized_keys. The operating system didn't look at .ssh/authorized_keys.d/coreos-ignition, so it was harmless for the MCO to update authorized_keys directly. Ignition 2.x no longer updates authorized_keys. Instead, it relies on sshd to read .ssh/authorized_keys.d/ignition directly. We'll need to deduplicate our use of these two files. Assuming we don't want to change the historical practice of the MCO managing authorized_keys, I think it'd be fine for the MCO to delete .ssh/authorized_keys.d/ignition after merging its contents into authorized_keys. (Or, equivalently, after first reconciling the MachineConfig?) RHCOS currently special-cases .ssh/authorized_keys.d/ignition and doesn't read other files in authorized_keys.d, including coreos-ignition. FCOS has tooling which automatically reads _every_ file in authorized_keys.d. There are no current plans to include that tooling in RHCOS, but as a precaution, it might be safer for the MCO to delete .ssh/authorized_keys.d/coreos-ignition as well.
Filip, I think that's probably a separate issue. RHCOS reads SSH public keys out of both ~/.ssh/authorized_keys and ~/.ssh/authorized_keys.d/ignition.
Why is this a private bug and why are people creating private comments? I don't see anything confidental here. I think this is something in Bugzilla defaulting to private bugs. Does anyone object to me un-marking it with Group: redhat? I do think this is an important bug, and actually a security issue in some cases so this should be a 4.6 blocker. One option is to change Ignition on RHCOS to write to the old location for now; it seems simpler than changing the MCO to effectively undo what Ignition is doing, but that path also seems fine. (We have messy upgrade issues here too because we need to handle the case where a cluster was installed with < 4.6, then upgraded and where the keys are still in the main file and not the ignition file, if we choose to keep it)
(In reply to Colin Walters from comment #6) > Why is this a private bug and why are people creating private comments? I > don't see anything confidental here. > I think this is something in Bugzilla defaulting to private bugs. Does > anyone object to me un-marking it with Group: redhat? > > I do think this is an important bug, and actually a security issue in some > cases so this should be a 4.6 blocker. > > One option is to change Ignition on RHCOS to write to the old location for > now; it seems simpler than changing the MCO to effectively undo what > Ignition is doing, but that path also seems fine. I also think rolling back ignition till we implement a better migration strategy (which I concur is feasible) would be desiderable for the MCO > > (We have messy upgrade issues here too because we need to handle the case > where a cluster was installed with < 4.6, then upgraded and where the keys > are still in the main file and not the ignition file, if we choose to keep > it)
I'm unilaterally lifting the private flag.
Internal discussion resulted in this: Option 1: Revert just RHCOS: === Revert back to writing `authorized_keys` for RHCOS for now. + Fixes the immediate problem - Doesn't help OKD/FCOS Option 2: MCO merge: === Patch the MCO to work with both cases and to handle upgrades: MCO deletes `.ssh/authorized_keys.d/ignition` after merging its contents into `authorized_keys`. + Better fix - More work and lots going on in MCO Also MCO writes /etc/ssh/sshd_config.d/01-disable-authorized_keys.d.conf to disable Afterburn SSH keys for OKD. RHCOS will ignore it. Option 3: "Belt and Suspenders" (both option 1 and option 2) === RHCOS wouldn’t have to change, but in OKD+FCOS we could test the changes out. Basically do both MCO work _and_ have Ignition build in RHCOS with the flag to disable authorized_keys.d. + Lets us try out the change in OKD first - More work (but not much) Option 4: Revert writing the `/ignition file` in FCOS *and* RHCOS === Luca points out this is documented in the spec: https://github.com/coreos/ignition/blob/58f67d61975c799cc9322aa07b044099b52d5837/doc/configuration-v3_0.md (I hope no one really argues this)
Colin, re comment 6, I recommended to the original reporter that they file a private bug until the security consequences could be evaluated. I hadn't realized at that time that the problem only affected 4.6.
We've decided on option 1 for now, with the intention to revisit after 4.6, and without precluding the issue from being addressed in OKD as time permits. Until that happens, we should document for OKD that install-time SSH keys cannot be removed via a MachineConfig.
This needs a bootimage update. Otherwise Ignition will write to ~/.ssh/authorized_keys.d/ignition, and then we'll immediately update into a new RHCOS release that won't read that file.
Verified with 4.6.0-fc.4 ``` $ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.6.0-fc.4 True False 163m Cluster version is 4.6.0-fc.4 $ oc get nodes NAME STATUS ROLES AGE VERSION ip-10-0-134-86.ec2.internal Ready master 3h3m v1.19.0-rc.2+514f31a ip-10-0-148-100.ec2.internal Ready worker 173m v1.19.0-rc.2+514f31a ip-10-0-150-160.ec2.internal Ready master 3h5m v1.19.0-rc.2+514f31a ip-10-0-180-51.ec2.internal Ready worker 173m v1.19.0-rc.2+514f31a ip-10-0-218-54.ec2.internal Ready worker 173m v1.19.0-rc.2+514f31a ip-10-0-223-205.ec2.internal Ready master 3h4m v1.19.0-rc.2+514f31a $ oc get mc NAME GENERATEDBYCONTROLLER IGNITIONVERSION AGE 00-master 130947243313dcfa8a4f0ef487f458f923df1128 3.1.0 3h3m 00-worker 130947243313dcfa8a4f0ef487f458f923df1128 3.1.0 3h3m 01-master-container-runtime 130947243313dcfa8a4f0ef487f458f923df1128 3.1.0 3h3m 01-master-kubelet 130947243313dcfa8a4f0ef487f458f923df1128 3.1.0 3h3m 01-worker-container-runtime 130947243313dcfa8a4f0ef487f458f923df1128 3.1.0 3h3m 01-worker-kubelet 130947243313dcfa8a4f0ef487f458f923df1128 3.1.0 3h3m 99-master-generated-registries 130947243313dcfa8a4f0ef487f458f923df1128 3.1.0 3h3m 99-master-ssh 3.1.0 3h9m 99-worker-generated-registries 130947243313dcfa8a4f0ef487f458f923df1128 3.1.0 3h3m 99-worker-ssh 3.1.0 3h9m rendered-master-b0985b6c15f0f96c125e32b7a5b1a081 130947243313dcfa8a4f0ef487f458f923df1128 3.1.0 3h3m rendered-worker-27ca4280a07e7e99a33a3347cfff1f6b 130947243313dcfa8a4f0ef487f458f923df1128 3.1.0 3h3m rendered-worker-8daaa44d8998b182fd082a2f3dac2feb 130947243313dcfa8a4f0ef487f458f923df1128 3.1.0 123m zz-test 123m $ oc debug node/ip-10-0-134-86.ec2.internal -- chroot /host cat /var/home/core/.ssh/authorized_keys Starting pod/ip-10-0-134-86ec2internal-debug ... To use host binaries, run `chroot /host` ssh-rsa AAAAB3Nza.... Removing debug pod ... $ cat update-ssh-master.yaml apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: master name: 99-new-master-sshkey spec: config: ignition: version: 3.1.0 passwd: users: - name: core sshAuthorizedKeys: - | ssh-ed25519 AAAAC3.... $ oc apply -f update-ssh-master.yaml machineconfig.machineconfiguration.openshift.io/99-new-master-sshkey created $ oc get mc NAME GENERATEDBYCONTROLLER IGNITIONVERSION AGE 00-master 130947243313dcfa8a4f0ef487f458f923df1128 3.1.0 3h7m 00-worker 130947243313dcfa8a4f0ef487f458f923df1128 3.1.0 3h7m 01-master-container-runtime 130947243313dcfa8a4f0ef487f458f923df1128 3.1.0 3h7m 01-master-kubelet 130947243313dcfa8a4f0ef487f458f923df1128 3.1.0 3h7m 01-worker-container-runtime 130947243313dcfa8a4f0ef487f458f923df1128 3.1.0 3h7m 01-worker-kubelet 130947243313dcfa8a4f0ef487f458f923df1128 3.1.0 3h7m 99-master-generated-registries 130947243313dcfa8a4f0ef487f458f923df1128 3.1.0 3h7m 99-master-ssh 3.1.0 3h13m 99-new-master-sshkey 3.1.0 14s 99-worker-generated-registries 130947243313dcfa8a4f0ef487f458f923df1128 3.1.0 3h7m 99-worker-ssh 3.1.0 3h13m rendered-master-14f4c9b8751656e373cce056401dbf98 130947243313dcfa8a4f0ef487f458f923df1128 3.1.0 9s rendered-master-b0985b6c15f0f96c125e32b7a5b1a081 130947243313dcfa8a4f0ef487f458f923df1128 3.1.0 3h7m rendered-worker-27ca4280a07e7e99a33a3347cfff1f6b 130947243313dcfa8a4f0ef487f458f923df1128 3.1.0 3h7m rendered-worker-8daaa44d8998b182fd082a2f3dac2feb 130947243313dcfa8a4f0ef487f458f923df1128 3.1.0 128m zz-test 128m $ oc debug node/ip-10-0-134-86.ec2.internal -- chroot /host cat /var/home/core/.ssh/authorized_keys Starting pod/ip-10-0-134-86ec2internal-debug ... To use host binaries, run `chroot /host` ssh-rsa AAAAB3NzaC1y.... ssh-ed25519 AAAAC3Nza.... Removing debug pod ... ```
https://github.com/openshift/machine-config-operator/pull/2087 proposes to modify the MCO to disable reading authorized_keys.d fragments on FCOS. RHCOS and OKD behavior would then remain as in option 1 (reading and writing only authorized_keys) for the long term. The tradeoffs with this approach are described in https://github.com/openshift/machine-config-operator/pull/2087#pullrequestreview-489234956.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196