Description of problem:

Old fernet keys are retained on the controller after mistral reaches
max_active_keys=5 (the default).

Steps to Reproduce:
1. Deploy an undercloud and deploy the overcloud, leaving the default setting
   ManageKeystoneFernetKeys: true
2. Rotate the fernet keys until max_active_keys is reached. At this point the
   original keys 1 and 2 are no longer tracked by mistral.

(undercloud) [stack@undercloud-0 ~]$ openstack workflow execution create tripleo.fernet_keys.v1.rotate_fernet_keys '{"container": "overcloud"}'
(undercloud) [stack@undercloud-0 ~]$ mistral task-get-result d389ced4-4b55-4288-9f76-7a510fe30ac4
{
    "/etc/keystone/fernet-keys/0": {
        "content": "cVbVzTVtQLDzGCcfYR-58BhsqtbIy75g1GRZm969Uh8="
    },
    "/etc/keystone/fernet-keys/3": {
        "content": "zwkkJG5lr0e46RnXOqOz2xf7MFgrXARmhTsgWBlIZmI="
    },
    "/etc/keystone/fernet-keys/4": {
        "content": "1rriYXTcXcw_XnPEY8QlHjAib1JQxUoilYhHIBqv8qg="
    },
    "/etc/keystone/fernet-keys/5": {
        "content": "pKozY6zC1uYUormSP3RwqIknPD7R22crBjfmj0M_jvY="
    },
    "/etc/keystone/fernet-keys/6": {
        "content": "yH21T5dKCd08g1ZF_EcnElmMXPD3rfKkTzJoV1rj4Dc="
    }
}

Actual results:
The old untracked fernet keys are retained on the controller and are not
cleaned up.

[root@controller-0 fernet-keys]# ls -la /var/lib/config-data/puppet-generated/keystone/etc/keystone/fernet-keys/
total 28
-rw-------. 1 42425 42425 44 Oct 15 16:09 0
-rw-------. 1 42425 42425 44 Oct 13 00:30 1
-rw-------. 1 42425 42425 44 Oct 13 02:18 2
-rw-------. 1 42425 42425 44 Oct 15 16:09 3
-rw-------. 1 42425 42425 44 Oct 15 16:09 4
-rw-------. 1 42425 42425 44 Oct 15 16:09 5
-rw-------. 1 42425 42425 44 Oct 15 16:09 6

Expected results:
The overcloud controller only has the current fernet keys managed by mistral
from the undercloud.
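To drive step 2 quickly, a loop along these lines can be used to trigger
repeated rotations. This is only a sketch: the fixed sleep is an assumption,
and polling the execution state with "openstack workflow execution show <ID>"
would be more robust than a blind wait.

(undercloud) [stack@undercloud-0 ~]$ for i in $(seq 1 6); do
>     openstack workflow execution create tripleo.fernet_keys.v1.rotate_fernet_keys '{"container": "overcloud"}'
>     sleep 120    # crude wait for the previous execution to finish
> done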
I believe Fernet bugs are best handled by the Security DFG
(In reply to Jeremy Agee from comment #0)
> [root@controller-0 fernet-keys]# ls -la /var/lib/config-data/puppet-generated/keystone/etc/keystone/fernet-keys/
> total 28
> -rw-------. 1 42425 42425 44 Oct 15 16:09 0
> -rw-------. 1 42425 42425 44 Oct 13 00:30 1
> -rw-------. 1 42425 42425 44 Oct 13 02:18 2
> -rw-------. 1 42425 42425 44 Oct 15 16:09 3
> -rw-------. 1 42425 42425 44 Oct 15 16:09 4
> -rw-------. 1 42425 42425 44 Oct 15 16:09 5
> -rw-------. 1 42425 42425 44 Oct 15 16:09 6

There is a clue here in the timestamps. On my OSP13 test deployment (where
fernet rotation works fine), the timestamp on all of the keys is exactly the
same. The reason is that every time a rotation workflow is executed, all of
the old keys are deleted on the controller before the current keys are copied
down. You can see this in the ansible playbook used for fernet rotation here:

https://github.com/openstack/tripleo-common/blob/f7f2d33170a8d152bcde87211ff438fe4a16cad8/playbooks/rotate-keys.yaml#L17-L28

In your example above, keys 1 and 2 have a different date/time on them. As you
can see in the playbook, there should be an 'rm -rf' of the entire directory
where fernet keys are kept during workflow execution.

It would be useful to see the output from the playbook. You can get this by
running a rotation task, waiting for it to complete, then running the
following:

openstack workflow execution output show <ID>
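For reference, the delete-then-copy behavior in that playbook amounts to
roughly the following shell steps. This is only a sketch of the expected
effect, not the actual implementation (which is the linked ansible playbook);
KEY_REPO is just a convenience variable here:

    # rough shell equivalent of the playbook's cleanup step (destructive!)
    KEY_REPO=/var/lib/config-data/puppet-generated/keystone/etc/keystone/fernet-keys
    rm -rf "$KEY_REPO"      # every old key is removed...
    mkdir -p "$KEY_REPO"
    # ...then the current set of mistral-managed keys is written back out,
    # which is why all surviving keys should carry the same timestamp after
    # a rotation.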
Additionally, it might be useful to increase the verbosity of the output that
we get when executing the ansible playbook. This can be done by updating the
workbook definition used for fernet rotation. On the undercloud:

---------------------------------------------------------------------------
$ openstack workbook definition show tripleo.fernet_keys.v1 > /tmp/fernet-workbook
$ vi /tmp/fernet-workbook
  (change the "verbosity" value for the "deploy_keys" task to "5")
$ openstack workbook update /tmp/fernet-workbook
---------------------------------------------------------------------------

After the workbook is updated, run another fernet rotation workflow execution
and wait for it to complete. When it has finished, you can get the verbose
output from the ansible playbook execution by running the following command:

openstack workflow execution show <ID>

When troubleshooting is complete, you will want to reset the verbosity to the
default of 0.
I have not got to the bottom of this just yet, but I have a few more clues
about what is going on.

There are two areas used for fernet keys on the controller nodes:

Initial keys - /var/lib/config-data/keystone/etc/keystone/fernet-keys
Key repo     - /var/lib/config-data/puppet-generated/keystone/etc/keystone/fernet-keys

If you look in the "initial keys" area, it will only have keys 0 and 1. These
are the initial keys that are generated when you first deploy the overcloud.
They do not change as a part of the fernet rotation workflow.

The key rotation workflow works in the "key repo" area. If you trigger
rotation enough times to go past the configured number of max keys, you will
find that this area works as you expect (you will have key 0, plus max-1 other
keys). For example, my test environment (with the default max of 5 keys) has
these keys after a few rotations: 0, 7, 8, 9, 10.

If you look inside the keystone container on a controller, it seems to do some
sort of merging of these two areas. On my test deployment, the container has
these keys: 0, 1, 6, 7, 8, 9, 10

All of the key numbers that are present in the "key repo" have date/timestamps
matching their counterparts in /etc/keystone/fernet-keys in the keystone
container. Key 1 has a date/timestamp that matches the key from the "initial
keys" area. This leads me to believe that we first take the "initial keys",
then overlay the "key repo" keys on top (which overwrites key 0 with the newer
copy, since it exists in both areas).

The real mystery for me is why key 6 still exists within the container. This
could be a fluke from my environment, or it could be evidence of a bug. More
investigation is needed.
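A quick way to compare the three locations side by side is something like the
following; the container name "keystone" is an assumption and may differ on
your deployment:

    ls -la /var/lib/config-data/keystone/etc/keystone/fernet-keys                     # initial keys
    ls -la /var/lib/config-data/puppet-generated/keystone/etc/keystone/fernet-keys    # key repo
    docker exec keystone ls -la /etc/keystone/fernet-keys                             # inside the container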
I think there was a red herring in my previous update, caused by bug 1639495.
I think we can ignore the fact that keys 2-5 were cleaned up on my system, as
the above mentioned bug prevented those keys from ever being copied into the
"key repo". The key rotation workflow is doing its job correctly.

From what I can tell, the keystone container itself is responsible for copying
the keys from the key repo into its active configuration area at container
startup time. This is related to how the keystone container is built, which is
handled by kolla.

If you look at the keystone container config on the overcloud controller node
(in /var/lib/tripleo-config/docker-container-startup-config-step_3.json), you
can see that the area where the "key repo" lives is not actually bind mounted
into the active configuration area of the keystone container:

"/var/lib/config-data/puppet-generated/keystone/:/var/lib/kolla/config_files/src:ro"

When the keystone container starts, the entrypoint runs various tasks before
the service is actually started. The handling of configuration appears to be
controlled by /var/lib/kolla/config_files/config.json in the container itself.
If you look at this file in the keystone container, you will see this snippet:

    "config_files": [
        {
            "dest": "/",
            "merge": true,
            "preserve_properties": true,
            "source": "/var/lib/kolla/config_files/src/*"
        }
    ]

This shows that config files will be copied from /var/lib/kolla/config_files/src
in the container into the root area (the directory structure in the "src" area
contains everything needed to form the absolute path of the destination). I
think that the problem here is that "merge" is true. This setting likely makes
sense for the rest of keystone's config files, but it may be what is causing
old fernet keys to be left in the container.
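A toy demonstration of why merge=true can leave stale files behind; this is
plain shell mimicking the overlay behavior, not kolla's actual implementation:

    mkdir -p /tmp/src /tmp/dest
    touch /tmp/dest/stale-key /tmp/src/current-key
    # merge=true behaves like an overlay copy: files absent from src survive in dest
    cp -a /tmp/src/. /tmp/dest/
    ls /tmp/dest                                   # -> current-key  stale-key
    # merge=false replaces the destination wholesale
    rm -rf /tmp/dest && cp -a /tmp/src /tmp/dest
    ls /tmp/dest                                   # -> current-key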
I have found a work-around for this issue, which we can use as the basis for a
fix. My theory in comment #5 was correct.

To solve it, we can edit the kolla config for keystone to first copy the
fernet-keys directory with merge set to false, then merge any other config as
usual. This can be done manually by editing
/var/lib/kolla/config_files/keystone.json on your overcloud controller nodes
to add another item to the "config_files" list. The resulting file that I used
on my Rocky installation looks like this:

{
    "command": "/usr/sbin/httpd -DFOREGROUND",
    "config_files": [
        {
            "dest": "/etc/keystone/fernet-keys",
            "merge": false,
            "preserve_properties": true,
            "source": "/var/lib/kolla/config_files/src/etc/keystone/fernet-keys"
        },
        {
            "dest": "/",
            "merge": true,
            "preserve_properties": true,
            "source": "/var/lib/kolla/config_files/src/*"
        }
    ]
}

The result is that the fernet keys in the container match our key repo area,
which is what we want. The next step is finding out who is responsible for the
creation and deployment of /var/lib/kolla/config_files/keystone.json on the
overcloud nodes so we can modify the template.
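After applying the edit, the container needs a restart so the entrypoint
re-copies the config. Something like the following should verify the result;
the container name and the use of docker (rather than podman) are assumptions
that depend on your release:

    docker restart keystone
    docker exec keystone ls -la /etc/keystone/fernet-keys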
A patch for this has been submitted upstream for tripleo-heat-templates: https://review.openstack.org/618604
An upstream backport patch for this has been submitted for stable/rocky: https://review.openstack.org/626753/
Upstream changes have merged.
Downstream build created. Moving bug to MODIFIED.
If this bug requires doc text for errata release, please set the 'Doc Type' and provide draft text according to the template in the 'Doc Text' field. The documentation team will review, edit, and approve the text. If this bug does not require doc text, please set the 'requires_doc_text' flag to -.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0446