Bug 1868062

Summary: New key written to authorized_keys instead of ignition allowing user to continue to ssh into machines with old key
Product: OpenShift Container Platform Reporter: To Hung Sze <tsze>
Component: RHCOSAssignee: Benjamin Gilbert <bgilbert>
Status: CLOSED ERRATA QA Contact: Michael Nguyen <mnguyen>
Severity: high Docs Contact:
Priority: urgent    
Version: 4.6CC: amurdaca, bbreard, bgilbert, fbrychta, imcleod, jligon, kgarriso, miabbott, nstielau, walters, wking, yanyang, yprokule
Target Milestone: ---Keywords: DeliveryBlocker
Target Release: 4.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-10-27 16:27:31 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1871795    

Description To Hung Sze 2020-08-11 14:39:35 UTC
Description of problem:
After replacing key in masters in an OpenShift cluster, user can still ssh into the masters with old key.

Version-Release number of selected component (if applicable):
4.6

How reproducible: Always

Steps to reproduce:
Create a cluster
Generate a new key pair

$oc get machineconfig 99-master-ssh -o yaml > update-ssh-master.yaml

edit the yaml and replace the key under sshAuthorizedKeys with new key

$oc apply -f update-ssh-master.yaml

ssh into the master with NEW key - success

Actual results:
ssh into the master with OLD key - still successful


Expected results:
ssh into the master with OLD key - should fail


Additional info:
On masters, .ssh/authorized_keys.d/ignition file contains the old key.

It was pointed out that 
https://github.com/openshift/machine-config-operator/blob/e0e622950e56/pkg/daemon/update.go#L1318 shows that the new key is written to authorized_keys instead of authorized_keys.d/ignition.

Comment 1 Benjamin Gilbert 2020-08-11 14:46:05 UTC
Ignition writes SSH keys to .ssh/authorized_keys.d/ignition, not .ssh/authorized_keys.  On RHCOS, sshd is configured with that path as an additional AuthorizedKeysFile.  But the MCO updates .ssh/authorized_keys instead.  The result is that SSH keys initially provisioned via Ignition cannot be removed with the MCO.

It might make sense for MCO to manage .ssh/authorized_keys.d/ignition instead of authorized_keys, but an automatic migration would be needed.

Comment 2 Benjamin Gilbert 2020-08-11 16:49:07 UTC
I completely forgot about the behavior change in Ignition 2.x (spec 3).  The problem is limited to OpenShift 4.6, and previous versions were fine.

In Ignition 0.x (spec 2), Ignition wrote keys to .ssh/authorized_keys.d/coreos-ignition and then synced them to ~/.ssh/authorized_keys.  The operating system didn't look at .ssh/authorized_keys.d/coreos-ignition, so it was harmless for the MCO to update authorized_keys directly.

Ignition 2.x no longer updates authorized_keys.  Instead, it relies on sshd to read .ssh/authorized_keys.d/ignition directly.

We'll need to deduplicate our use of these two files.  Assuming we don't want to change the historical practice of the MCO managing authorized_keys, I think it'd be fine for the MCO to delete .ssh/authorized_keys.d/ignition after merging its contents into authorized_keys.  (Or, equivalently, after first reconciling the MachineConfig?)

RHCOS currently special-cases .ssh/authorized_keys.d/ignition and doesn't read other files in authorized_keys.d, including coreos-ignition.  FCOS has tooling which automatically reads _every_ file in authorized_keys.d.  There are no current plans to include that tooling in RHCOS, but as a precaution, it might be safer for the MCO to delete .ssh/authorized_keys.d/coreos-ignition as well.

Comment 4 Benjamin Gilbert 2020-08-13 14:47:48 UTC
Filip, I think that's probably a separate issue.  RHCOS reads SSH public keys out of both ~/.ssh/authorized_keys and ~/.ssh/authorized_keys.d/ignition.

Comment 6 Colin Walters 2020-08-17 14:12:26 UTC
Why is this a private bug and why are people creating private comments?  I don't see anything confidental here.
I think this is something in Bugzilla defaulting to private bugs.  Does anyone object to me un-marking it with Group: redhat?

I do think this is an important bug, and actually a security issue in some cases so this should be a 4.6 blocker.

One option is to change Ignition on RHCOS to write to the old location for now; it seems simpler than changing the MCO to effectively undo what Ignition is doing, but that path also seems fine.

(We have messy upgrade issues here too because we need to handle the case where a cluster was installed with < 4.6, then upgraded and where the keys are still in the main file and not the ignition file, if we choose to keep it)

Comment 7 Antonio Murdaca 2020-08-17 14:18:47 UTC
(In reply to Colin Walters from comment #6)
> Why is this a private bug and why are people creating private comments?  I
> don't see anything confidental here.
> I think this is something in Bugzilla defaulting to private bugs.  Does
> anyone object to me un-marking it with Group: redhat?
> 
> I do think this is an important bug, and actually a security issue in some
> cases so this should be a 4.6 blocker.
> 
> One option is to change Ignition on RHCOS to write to the old location for
> now; it seems simpler than changing the MCO to effectively undo what
> Ignition is doing, but that path also seems fine.

I also think rolling back ignition till we implement a better migration strategy (which I concur is feasible) would be desiderable for the MCO

> 
> (We have messy upgrade issues here too because we need to handle the case
> where a cluster was installed with < 4.6, then upgraded and where the keys
> are still in the main file and not the ignition file, if we choose to keep
> it)

Comment 8 Colin Walters 2020-08-18 16:44:39 UTC
I'm unilaterally lifting the private flag.

Comment 9 Colin Walters 2020-08-18 17:41:31 UTC
Internal discussion resulted in this:

Option 1: Revert just RHCOS:
===

Revert back to writing `authorized_keys` for RHCOS for now.

+ Fixes the immediate problem
- Doesn't help OKD/FCOS

Option 2: MCO merge:
===

Patch the MCO to work with both cases and to handle upgrades:

MCO deletes `.ssh/authorized_keys.d/ignition` after merging its contents into `authorized_keys`.

+ Better fix
- More work and lots going on in MCO

Also MCO writes /etc/ssh/sshd_config.d/01-disable-authorized_keys.d.conf to disable Afterburn SSH keys for OKD.  RHCOS will ignore it.


Option 3: "Belt and Suspenders" (both option 1 and option 2)
===

RHCOS wouldn’t have to change, but in OKD+FCOS we could test the changes out.
Basically do both MCO work _and_ have Ignition build in RHCOS with the flag to disable authorized_keys.d.

+ Lets us try out the change in OKD first
- More work (but not much)

Option 4: Revert writing the `/ignition file` in FCOS *and* RHCOS
===

Luca points out this is documented in the spec: https://github.com/coreos/ignition/blob/58f67d61975c799cc9322aa07b044099b52d5837/doc/configuration-v3_0.md

(I hope no one really argues this)

Comment 10 Benjamin Gilbert 2020-08-20 02:07:01 UTC
Colin, re comment 6, I recommended to the original reporter that they file a private bug until the security consequences could be evaluated.  I hadn't realized at that time that the problem only affected 4.6.

Comment 11 Benjamin Gilbert 2020-08-20 19:11:48 UTC
We've decided on option 1 for now, with the intention to revisit after 4.6, and without precluding the issue from being addressed in OKD as time permits.  Until that happens, we should document for OKD that install-time SSH keys cannot be removed via a MachineConfig.

Comment 14 Benjamin Gilbert 2020-08-25 11:45:16 UTC
This needs a bootimage update.  Otherwise Ignition will write to ~/.ssh/authorized_keys.d/ignition, and then we'll immediately update into a new RHCOS release that won't read that file.

Comment 16 Micah Abbott 2020-09-08 17:35:49 UTC
Verified with 4.6.0-fc.4 

```
$ oc get clusterversion                                                                       
NAME      VERSION      AVAILABLE   PROGRESSING   SINCE   STATUS                                                                               
version   4.6.0-fc.4   True        False         163m    Cluster version is 4.6.0-fc.4                                                        

$ oc get nodes                                                                                
NAME                           STATUS   ROLES    AGE    VERSION                                                                               
ip-10-0-134-86.ec2.internal    Ready    master   3h3m   v1.19.0-rc.2+514f31a                                                                  
ip-10-0-148-100.ec2.internal   Ready    worker   173m   v1.19.0-rc.2+514f31a                                                                  
ip-10-0-150-160.ec2.internal   Ready    master   3h5m   v1.19.0-rc.2+514f31a                                                                  
ip-10-0-180-51.ec2.internal    Ready    worker   173m   v1.19.0-rc.2+514f31a                                                                                                                                                                                                                
ip-10-0-218-54.ec2.internal    Ready    worker   173m   v1.19.0-rc.2+514f31a                                                                                                                                                                                                                
ip-10-0-223-205.ec2.internal   Ready    master   3h4m   v1.19.0-rc.2+514f31a                                                                                                                                                                                                                

$ oc get mc                                                                                   
NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE                           
00-master                                          130947243313dcfa8a4f0ef487f458f923df1128   3.1.0             3h3m                          
00-worker                                          130947243313dcfa8a4f0ef487f458f923df1128   3.1.0             3h3m                          
01-master-container-runtime                        130947243313dcfa8a4f0ef487f458f923df1128   3.1.0             3h3m                                                                                                                                                                        
01-master-kubelet                                  130947243313dcfa8a4f0ef487f458f923df1128   3.1.0             3h3m                                                                                                                                                                        
01-worker-container-runtime                        130947243313dcfa8a4f0ef487f458f923df1128   3.1.0             3h3m                                                                                                                                                                        
01-worker-kubelet                                  130947243313dcfa8a4f0ef487f458f923df1128   3.1.0             3h3m                                                                                                                                                                        
99-master-generated-registries                     130947243313dcfa8a4f0ef487f458f923df1128   3.1.0             3h3m                                                                                                                                                                        
99-master-ssh                                                                                 3.1.0             3h9m
99-worker-generated-registries                     130947243313dcfa8a4f0ef487f458f923df1128   3.1.0             3h3m
99-worker-ssh                                                                                 3.1.0             3h9m                                                                                                                                                                        rendered-master-b0985b6c15f0f96c125e32b7a5b1a081   130947243313dcfa8a4f0ef487f458f923df1128   3.1.0             3h3m                                                                                                                                                                        rendered-worker-27ca4280a07e7e99a33a3347cfff1f6b   130947243313dcfa8a4f0ef487f458f923df1128   3.1.0             3h3m                                                                                                                                                                        
rendered-worker-8daaa44d8998b182fd082a2f3dac2feb   130947243313dcfa8a4f0ef487f458f923df1128   3.1.0             123m
zz-test                                                                                                         123m


$ oc debug node/ip-10-0-134-86.ec2.internal -- chroot /host cat /var/home/core/.ssh/authorized_keys
Starting pod/ip-10-0-134-86ec2internal-debug ...
To use host binaries, run `chroot /host`                                                                                                      
ssh-rsa AAAAB3Nza....
                                                                                                                                                                                                                                                                                            
Removing debug pod ...                          


$ cat update-ssh-master.yaml 
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-new-master-sshkey
spec:
  config:
    ignition:
      version: 3.1.0
    passwd:
      users:
      - name: core
        sshAuthorizedKeys:
        - |
          ssh-ed25519 AAAAC3....

$ oc apply -f update-ssh-master.yaml 
machineconfig.machineconfiguration.openshift.io/99-new-master-sshkey created

$ oc get mc
NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                          130947243313dcfa8a4f0ef487f458f923df1128   3.1.0             3h7m
00-worker                                          130947243313dcfa8a4f0ef487f458f923df1128   3.1.0             3h7m
01-master-container-runtime                        130947243313dcfa8a4f0ef487f458f923df1128   3.1.0             3h7m
01-master-kubelet                                  130947243313dcfa8a4f0ef487f458f923df1128   3.1.0             3h7m
01-worker-container-runtime                        130947243313dcfa8a4f0ef487f458f923df1128   3.1.0             3h7m
01-worker-kubelet                                  130947243313dcfa8a4f0ef487f458f923df1128   3.1.0             3h7m
99-master-generated-registries                     130947243313dcfa8a4f0ef487f458f923df1128   3.1.0             3h7m
99-master-ssh                                                                                 3.1.0             3h13m
99-new-master-sshkey                                                                          3.1.0             14s
99-worker-generated-registries                     130947243313dcfa8a4f0ef487f458f923df1128   3.1.0             3h7m
99-worker-ssh                                                                                 3.1.0             3h13m
rendered-master-14f4c9b8751656e373cce056401dbf98   130947243313dcfa8a4f0ef487f458f923df1128   3.1.0             9s
rendered-master-b0985b6c15f0f96c125e32b7a5b1a081   130947243313dcfa8a4f0ef487f458f923df1128   3.1.0             3h7m
rendered-worker-27ca4280a07e7e99a33a3347cfff1f6b   130947243313dcfa8a4f0ef487f458f923df1128   3.1.0             3h7m
rendered-worker-8daaa44d8998b182fd082a2f3dac2feb   130947243313dcfa8a4f0ef487f458f923df1128   3.1.0             128m
zz-test                                                                                                         128m

$ oc debug node/ip-10-0-134-86.ec2.internal -- chroot /host cat /var/home/core/.ssh/authorized_keys
Starting pod/ip-10-0-134-86ec2internal-debug ...
To use host binaries, run `chroot /host`
ssh-rsa AAAAB3NzaC1y....

ssh-ed25519 AAAAC3Nza....


Removing debug pod ...
```

Comment 17 Benjamin Gilbert 2020-09-24 22:15:45 UTC
https://github.com/openshift/machine-config-operator/pull/2087 proposes to modify the MCO to disable reading authorized_keys.d fragments on FCOS.  RHCOS and OKD behavior would then remain as in option 1 (reading and writing only authorized_keys) for the long term.  The tradeoffs with this approach are described in https://github.com/openshift/machine-config-operator/pull/2087#pullrequestreview-489234956.

Comment 19 errata-xmlrpc 2020-10-27 16:27:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196