Description of problem:

Some of the OpenStack containers are needlessly restarted during an overcloud redeploy/update with no configuration change. I am attaching a diff between two files: the file named "before" contains a list of all running containers with their start times before the overcloud redeploy, and the file named "after" contains the same list once the overcloud was redeployed/updated.

$ diff before after
7,10c7,10
< ceilometer_agent_notification: 2018-08-06 10:42:45 +0000 UTC
< cinder_api: 2018-08-06 10:42:43 +0000 UTC
< cinder_api_cron: 2018-08-06 10:42:49 +0000 UTC
< cinder_scheduler: 2018-08-06 10:42:47 +0000 UTC
---
> ceilometer_agent_notification: 2018-08-06 15:24:57 +0000 UTC
> cinder_api: 2018-08-06 15:24:56 +0000 UTC
> cinder_api_cron: 2018-08-06 15:25:00 +0000 UTC
> cinder_scheduler: 2018-08-06 15:24:59 +0000 UTC
18,21c18,21
< heat_api: 2018-08-06 10:42:43 +0000 UTC
< heat_api_cfn: 2018-08-06 10:42:50 +0000 UTC
< heat_api_cron: 2018-08-06 10:42:50 +0000 UTC
< heat_engine: 2018-08-06 10:42:47 +0000 UTC
---
> heat_api: 2018-08-06 15:24:56 +0000 UTC
> heat_api_cfn: 2018-08-06 15:25:02 +0000 UTC
> heat_api_cron: 2018-08-06 15:25:01 +0000 UTC
> heat_engine: 2018-08-06 15:24:58 +0000 UTC
24,26c24,26
< keystone: 2018-08-06 10:40:02 +0000 UTC
< keystone_cron: 2018-08-06 10:40:03 +0000 UTC
< logrotate_crond: 2018-08-06 10:42:51 +0000 UTC
---
> keystone: 2018-08-06 15:21:44 +0000 UTC
> keystone_cron: 2018-08-06 15:21:45 +0000 UTC
> logrotate_crond: 2018-08-06 15:25:03 +0000 UTC
33,37c33,37
< nova_api: 2018-08-06 10:42:53 +0000 UTC
< nova_api_cron: 2018-08-06 10:42:44 +0000 UTC
< nova_conductor: 2018-08-06 10:42:48 +0000 UTC
< nova_consoleauth: 2018-08-06 10:42:44 +0000 UTC
< nova_metadata: 2018-08-06 10:42:53 +0000 UTC
---
> nova_api: 2018-08-06 15:25:04 +0000 UTC
> nova_api_cron: 2018-08-06 15:24:56 +0000 UTC
> nova_conductor: 2018-08-06 15:25:00 +0000 UTC
> nova_consoleauth: 2018-08-06 15:24:57 +0000 UTC
> nova_metadata: 2018-08-06 15:25:05 +0000 UTC
39,41c39,41
< nova_scheduler: 2018-08-06 10:42:43 +0000 UTC
< nova_vnc_proxy: 2018-08-06 10:42:46 +0000 UTC
< panko_api: 2018-08-06 10:42:54 +0000 UTC
---
> nova_scheduler: 2018-08-06 15:24:55 +0000 UTC
> nova_vnc_proxy: 2018-08-06 15:24:57 +0000 UTC
> panko_api: 2018-08-06 15:25:06 +0000 UTC
65,68c65,68
< ceilometer_agent_notification: 2018-08-06 10:42:50 +0000 UTC
< cinder_api: 2018-08-06 10:42:49 +0000 UTC
< cinder_api_cron: 2018-08-06 10:42:55 +0000 UTC
< cinder_scheduler: 2018-08-06 10:42:53 +0000 UTC
---
> ceilometer_agent_notification: 2018-08-06 15:24:55 +0000 UTC
> cinder_api: 2018-08-06 15:24:53 +0000 UTC
> cinder_api_cron: 2018-08-06 15:24:59 +0000 UTC
> cinder_scheduler: 2018-08-06 15:24:57 +0000 UTC
76,79c76,79
< heat_api: 2018-08-06 10:42:49 +0000 UTC
< heat_api_cfn: 2018-08-06 10:42:56 +0000 UTC
< heat_api_cron: 2018-08-06 10:42:55 +0000 UTC
< heat_engine: 2018-08-06 10:42:52 +0000 UTC
---
> heat_api: 2018-08-06 15:24:54 +0000 UTC
> heat_api_cfn: 2018-08-06 15:25:01 +0000 UTC
> heat_api_cron: 2018-08-06 15:25:00 +0000 UTC
> heat_engine: 2018-08-06 15:24:57 +0000 UTC
82,84c82,84
< keystone: 2018-08-06 10:40:09 +0000 UTC
< keystone_cron: 2018-08-06 10:40:10 +0000 UTC
< logrotate_crond: 2018-08-06 10:42:57 +0000 UTC
---
> keystone: 2018-08-06 15:21:41 +0000 UTC
> keystone_cron: 2018-08-06 15:21:42 +0000 UTC
> logrotate_crond: 2018-08-06 15:25:01 +0000 UTC
91,95c91,95
< nova_api: 2018-08-06 10:42:59 +0000 UTC
< nova_api_cron: 2018-08-06 10:42:49 +0000 UTC
< nova_conductor: 2018-08-06 10:42:54 +0000 UTC
< nova_consoleauth: 2018-08-06 10:42:50 +0000 UTC
< nova_metadata: 2018-08-06 10:42:59 +0000 UTC
---
> nova_api: 2018-08-06 15:25:02 +0000 UTC
> nova_api_cron: 2018-08-06 15:24:54 +0000 UTC
> nova_conductor: 2018-08-06 15:24:58 +0000 UTC
> nova_consoleauth: 2018-08-06 15:24:55 +0000 UTC
> nova_metadata: 2018-08-06 15:25:03 +0000 UTC
97,99c97,99
< nova_scheduler: 2018-08-06 10:42:48 +0000 UTC
< nova_vnc_proxy: 2018-08-06 10:42:51 +0000 UTC
< panko_api: 2018-08-06 10:43:00 +0000 UTC
---
> nova_scheduler: 2018-08-06 15:24:53 +0000 UTC
> nova_vnc_proxy: 2018-08-06 15:24:56 +0000 UTC
> panko_api: 2018-08-06 15:25:04 +0000 UTC
123,126c123,126
< ceilometer_agent_notification: 2018-08-06 10:43:10 +0000 UTC
< cinder_api: 2018-08-06 10:43:09 +0000 UTC
< cinder_api_cron: 2018-08-06 10:43:14 +0000 UTC
< cinder_scheduler: 2018-08-06 10:43:12 +0000 UTC
---
> ceilometer_agent_notification: 2018-08-06 15:24:56 +0000 UTC
> cinder_api: 2018-08-06 15:24:54 +0000 UTC
> cinder_api_cron: 2018-08-06 15:25:01 +0000 UTC
> cinder_scheduler: 2018-08-06 15:24:58 +0000 UTC
134,137c134,137
< heat_api: 2018-08-06 10:43:09 +0000 UTC
< heat_api_cfn: 2018-08-06 10:43:16 +0000 UTC
< heat_api_cron: 2018-08-06 10:43:15 +0000 UTC
< heat_engine: 2018-08-06 10:43:11 +0000 UTC
---
> heat_api: 2018-08-06 15:24:54 +0000 UTC
> heat_api_cfn: 2018-08-06 15:25:03 +0000 UTC
> heat_api_cron: 2018-08-06 15:25:02 +0000 UTC
> heat_engine: 2018-08-06 15:24:57 +0000 UTC
140,142c140,142
< keystone: 2018-08-06 10:40:11 +0000 UTC
< keystone_cron: 2018-08-06 10:40:15 +0000 UTC
< logrotate_crond: 2018-08-06 10:43:17 +0000 UTC
---
> keystone: 2018-08-06 15:21:58 +0000 UTC
> keystone_cron: 2018-08-06 15:22:02 +0000 UTC
> logrotate_crond: 2018-08-06 15:25:04 +0000 UTC
149,153c149,153
< nova_api: 2018-08-06 10:43:18 +0000 UTC
< nova_api_cron: 2018-08-06 10:43:09 +0000 UTC
< nova_conductor: 2018-08-06 10:43:13 +0000 UTC
< nova_consoleauth: 2018-08-06 10:43:10 +0000 UTC
< nova_metadata: 2018-08-06 10:43:19 +0000 UTC
---
> nova_api: 2018-08-06 15:25:05 +0000 UTC
> nova_api_cron: 2018-08-06 15:24:55 +0000 UTC
> nova_conductor: 2018-08-06 15:25:00 +0000 UTC
> nova_consoleauth: 2018-08-06 15:24:55 +0000 UTC
> nova_metadata: 2018-08-06 15:25:06 +0000 UTC
155,158c155,158
< nova_scheduler: 2018-08-06 10:43:08 +0000 UTC
< nova_vnc_proxy: 2018-08-06 10:43:11 +0000 UTC
< openstack-cinder-volume-docker-0: 2018-08-06 10:44:50 +0000 UTC
< panko_api: 2018-08-06 10:43:19 +0000 UTC
---
> nova_scheduler: 2018-08-06 15:24:54 +0000 UTC
> nova_vnc_proxy: 2018-08-06 15:24:57 +0000 UTC
> openstack-cinder-volume-docker-0: 2018-08-06 15:26:33 +0000 UTC
> panko_api: 2018-08-06 15:25:06 +0000 UTC
179c179
< logrotate_crond: 2018-08-06 10:42:55 +0000 UTC
---
> logrotate_crond: 2018-08-06 15:24:28 +0000 UTC
181,184c181,184
< nova_compute: 2018-08-06 10:42:55 +0000 UTC
< nova_libvirt: 2018-08-06 10:39:57 +0000 UTC
< nova_migration_target: 2018-08-06 10:42:55 +0000 UTC
< nova_virtlogd: 2018-08-06 10:39:57 +0000 UTC
---
> nova_compute: 2018-08-06 15:24:28 +0000 UTC
> nova_libvirt: 2018-08-06 15:21:31 +0000 UTC
> nova_migration_target: 2018-08-06 15:24:28 +0000 UTC
> nova_virtlogd: 2018-08-06 15:21:30 +0000 UTC
188c188
< logrotate_crond: 2018-08-06 10:42:49 +0000 UTC
---
> logrotate_crond: 2018-08-06 15:24:20 +0000 UTC
190,193c190,193
< nova_compute: 2018-08-06 10:42:48 +0000 UTC
< nova_libvirt: 2018-08-06 10:39:52 +0000 UTC
< nova_migration_target: 2018-08-06 10:42:48 +0000 UTC
< nova_virtlogd: 2018-08-06 10:39:52 +0000 UTC
---
> nova_compute: 2018-08-06 15:24:20 +0000 UTC
> nova_libvirt: 2018-08-06 15:21:22 +0000 UTC
> nova_migration_target: 2018-08-06 15:24:20 +0000 UTC
> nova_virtlogd: 2018-08-06 15:21:22 +0000 UTC

Version-Release number of selected component (if applicable):
OSP13 - rhosp-director-images-13.0-20180803.1.el7ost.noarch

How reproducible:
always

Steps to Reproduce:
1. Deploy the overcloud
2. Redeploy the same overcloud with no configuration change

Actual results:
Some OpenStack containers are restarted needlessly, even though there was no configuration change.

Expected results:
No containers are restarted, because no configuration change was made.

Additional info:
Containers are restarted when they become unhealthy; Docker restarts them automatically. Could you provide some logs so we can check whether the containers were unhealthy before their restarts? That would let us figure out whether this is something to do with the applications themselves.
I have observed this behavior as well (after Marian mentioned it to me). The reproducer used is the following (run after a successful deployment):

#!/bin/bash
tripleo-ansible-inventory --static-yaml-inventory inv.yaml
ansible -f1 -i inv.yaml -m shell --become -a "docker ps --format=\"{{.Names}}: {{.CreatedAt}}\" | sort" overcloud > before
./overcloud_deploy.sh
ansible -f1 -i inv.yaml -m shell --become -a "docker ps --format=\"{{.Names}}: {{.CreatedAt}}\" | sort" overcloud > after

So I took one of the containers that is affected (keystone) and ran docker inspect before and after:

docker inspect keystone > keystone-before.txt
...redeploy with no changes...
docker inspect keystone > keystone-after.txt

Diffing the two text files brings up at least one point that needs more analysis:

-    "TRIPLEO_CONFIG_HASH=6802e3cfc6fd2a7257269bd40acfd34c",
+    "TRIPLEO_CONFIG_HASH=e1378deafc578eb34b5ce93b167a360f",

(There are a bunch of bind-mount changes as well, but those seem to be due only to the non-guaranteed ordering.)
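A quick way to narrow the "before"/"after" listings down to just the restarted containers is to compare the two sorted files. This is a hypothetical helper, not part of the original reproducer; it assumes bash (for process substitution) and the "name: created-at" line format produced by the reproducer above, with sample data standing in for the real files:

```shell
# Work in a scratch directory with fabricated sample data standing in for the
# real "before"/"after" files written by the reproducer.
cd "$(mktemp -d)"
printf 'keystone: 2018-08-06 10:40:02\nnova_api: 2018-08-06 10:42:53\n' > before
printf 'keystone: 2018-08-06 10:40:02\nnova_api: 2018-08-06 15:25:04\n' > after

# Lines present only in "after" belong to containers whose CreatedAt changed,
# i.e. containers that were restarted; strip the timestamp to get the names.
comm -13 <(sort before) <(sort after) | cut -d: -f1   # -> nova_api
```

`comm -13` suppresses lines unique to "before" and lines common to both, leaving only the changed entries.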
OK, so one of the reasons that *some* containers restart is that the cron config file changes at every run. The reason for this is the following snippet:

[root@controller-0 usr]# cat /var/lib/config-data/puppet-generated/keystone/var/spool/cron/keystone
# HEADER: This file was autogenerated at 2018-08-07 11:44:57 +0000 by puppet.
# HEADER: While it can still be managed manually, it is definitely not recommended.
# HEADER: Note particularly that the comments starting with 'Puppet Name' should
# HEADER: not be deleted, as doing so could cause duplicate cron jobs.
# Puppet Name: keystone-manage token_flush
PATH=/bin:/usr/bin:/usr/sbin
SHELL=/bin/sh
1 * * * * keystone-manage token_flush >>/var/log/keystone/keystone-tokenflush.log 2>&1

The #{Time.now} call in the HEADER lines triggers an md5 change in the cron file, which is part of the keystone volumes over which we calculate the md5 hash, and so we restart the container. From a quick look this seems hardcoded into puppet (which we ship inside the container), so we will probably need to find a way to exclude '^#.*HEADER.*' lines from the md5 calculation (or find another way). Looking at the puppet code, the 'parsedfile' provider has this behaviour as well, so if we use that we will be affected there too.
So we really have three separate issues here. Below is the list of all the containers that may restart needlessly (at least what I have observed in my tests):

A) cron category:
ceilometer_agent_notification
cinder_api
cinder_api_cron
cinder_scheduler
heat_api
heat_api_cfn
heat_api_cron
heat_engine
keystone
keystone_cron
logrotate_crond
nova_api
nova_api_cron
nova_conductor
nova_consoleauth
nova_metadata
nova_scheduler
nova_vnc_proxy
openstack-cinder-volume-docker-0
panko_api

These end up being restarted because the config volume for the container contains a cron file, and cron files are generated with a timestamp inside:

$ cat /var/lib/config-data/puppet-generated/keystone/var/spool/cron/keystone
# HEADER: This file was autogenerated at 2018-08-07 11:44:57 +0000 by puppet.
# HEADER: While it can still be managed manually, it is definitely not recommended.
# HEADER: Note particularly that the comments starting with 'Puppet Name' should
# HEADER: not be deleted, as doing so could cause duplicate cron jobs.
# Puppet Name: keystone-manage token_flush
PATH=/bin:/usr/bin:/usr/sbin
SHELL=/bin/sh
1 * * * * keystone-manage token_flush >>/var/log/keystone/keystone-tokenflush.log 2>&1

The timestamp is unfortunately hardcoded into puppet, in both the cron provider and the parsedfile provider:
https://github.com/puppetlabs/puppet/blob/master/lib/puppet/provider/cron/crontab.rb#L127
https://github.com/puppetlabs/puppet/blob/master/lib/puppet/provider/parsedfile.rb#L104

One possible fix would be to change the tar command in docker-puppet.py to something like this:

tar -c -f - /var/lib/config-data/${NAME} --mtime='1970-01-01' | tar xO | sed '/^#.*HEADER.*/d'

Note that the 'tar xO' is needed in order to force text output and not use the native tar format (if you don't, the sed won't work).
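As a sanity check of the idea above, here is a small self-contained sketch (assumptions: GNU tar and coreutils md5sum; the directory names and file contents are fabricated for the demo). It shows that two config trees differing only in the puppet HEADER timestamp hash identically once the HEADER lines are filtered out:

```shell
# Two fake config trees that differ only in the "# HEADER: ... autogenerated
# at <timestamp>" line, mimicking what puppet regenerates on every run.
tmp=$(mktemp -d)
mkdir -p "$tmp/a" "$tmp/b"
printf '# HEADER: This file was autogenerated at 2018-08-07 11:44:57 +0000 by puppet.\n1 * * * * keystone-manage token_flush\n' > "$tmp/a/keystone"
printf '# HEADER: This file was autogenerated at 2018-08-08 09:00:01 +0000 by puppet.\n1 * * * * keystone-manage token_flush\n' > "$tmp/b/keystone"

# Same pipeline shape as the proposed docker-puppet.py change: tar the tree,
# extract the file contents to stdout, drop HEADER lines, then hash.
cfg_hash() {
    tar -C "$1" -c -f - . | tar -x -O -f - | sed '/^#.*HEADER.*/d' | md5sum | awk '{print $1}'
}

[ "$(cfg_hash "$tmp/a")" = "$(cfg_hash "$tmp/b")" ] && echo "hashes match"
```

Without the sed filter the two hashes differ, which is exactly the spurious restart trigger described above.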
B) swift category:
swift_account_auditor
swift_account_reaper
swift_account_replicator
swift_account_server
swift_container_auditor
swift_container_replicator
swift_container_server
swift_container_updater
swift_object_auditor
swift_object_expirer
swift_object_replicator
swift_object_server
swift_object_updater
swift_proxy
swift_rsync

The swift containers restart because, when recalculating the md5 over the /var/lib/config-data/puppet-generated/swift folder, we also include:
B.1) /etc/swift/backups/..., a folder which over time collects backups of the ring files
B.2) /etc/swift/*.gz - the *.gz files seem to change over time

I see two potential fixes here:
B.3) We simply exclude the rings and their backups from the tar commands:
EXCL="--exclude='*/etc/swift/backups/*' --exclude='*/etc/swift/*.gz'"
tar -c -f - /var/lib/config-data/${NAME} --mtime='1970-01-01' $EXCL | tar xO | sed '/^#.*HEADER.*/d'

B.4) We stop storing the rings under /etc/swift and store them in some other place? I lack the necessary swift knowledge to add anything too useful here.

C) libvirt category:
nova_compute
nova_libvirt
nova_migration_target
nova_virtlogd

This one seems to be due to the fact that the /etc/libvirt/passwd.db file contains a timestamp, or somehow is just different after a redeploy:

[root@compute-1 nova_libvirt]# git diff cb2441bb1caf7572ccfd870561dcc29d7819ba04..0c7441f30926b111603ce4d4b60c6000fe49d290 .
diff --git a/puppet-generated/nova_libvirt/etc/libvirt/passwd.db b/puppet-generated/nova_libvirt/etc/libvirt/passwd.db
index 66e2024..6ffa492 100644
Binary files a/puppet-generated/nova_libvirt/etc/libvirt/passwd.db and b/puppet-generated/nova_libvirt/etc/libvirt/passwd.db differ

(I have not exactly understood why yet, because in my quick test without TLS a simple 'saslpasswd2 -d -a libvirt -u overcloud migration' did not change the md5 for me, but maybe I am missing something.)

Excluding this file from the tar command is probably a bad idea, because if you do indeed change the password of the migration user then you probably do want the libvirtd container to restart (actually you likely want only that one to restart, as opposed to all nova* containers on the compute, which is what happens now).

Just to make sure that the above theories are all correct, I used the following tar commands:

EXCL="--exclude='*/etc/swift/backups/*' --exclude='*/etc/swift/*.gz' --exclude '*/etc/libvirt/passwd.db'"
tar -c -f - /var/lib/config-data/${NAME} --mtime='1970-01-01' $EXCL | tar xO | sed '/^#.*HEADER.*/d' | md5sum | awk '{print $1}' > /var/lib/config-data/${NAME}.md5sum
tar -c -f - /var/lib/config-data/puppet-generated/${NAME} --mtime='1970-01-01' $EXCL | tar xO | sed '/^#.*HEADER.*/d' | md5sum | awk '{print $1}' > /var/lib/config-data/puppet-generated/${NAME}.md5sum

And I observed no spurious restarts.
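One caveat with the EXCL variable above: single quotes embedded in a plain string variable survive word splitting and end up as literal characters inside the exclude patterns, so the patterns can silently fail to match. A bash array avoids that. Here is a self-contained sketch (the directory layout is fabricated for the demo; GNU tar is assumed):

```shell
# Fake swift config tree with a ring, a ring backup, and a plain config file.
tmp=$(mktemp -d)
mkdir -p "$tmp/etc/swift/backups"
echo 'ring data'   > "$tmp/etc/swift/object.ring.gz"
echo 'old ring'    > "$tmp/etc/swift/backups/1533600000.object.ring.gz"
echo 'some config' > "$tmp/etc/swift/swift.conf"

# Array form: each --exclude=PATTERN expands as exactly one word, with no
# stray quote characters left inside the pattern.
EXCL=(--exclude='*/etc/swift/backups/*' --exclude='*/etc/swift/*.gz')

# swift.conf survives; the ring and its backup are excluded from the archive.
tar -C "$tmp" -c -f - "${EXCL[@]}" . | tar -t -f -
```

With the string form, `$EXCL` would pass patterns like `'*/etc/swift/*.gz'` (quotes included) to tar, which match nothing.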
(In reply to Michele Baldessari from comment #4)
> So the swift containers restart because when recalculating the md5 over the
> /var/lib/config-data/puppet-generated/swift folder we also include:
> B.1) /etc/swift/backups/... which is a folder which over time collects
> backup of the ringfiles
> B.2) /etc/swift/*.gz it seems that the *.gz files seem to change over time
>
> I see two potential fixes here:
> B.3) We simply exclude rings and their backups from the tar commands:
> EXCL="--exclude='*/etc/swift/backups/*' --exclude='*/etc/swift/*.gz'"
> tar -c -f - /var/lib/config-data/${NAME} --mtime='1970-01-01' $EXCL | tar xO
> | sed '/^#.*HEADER.*/d'
>
> B.4) We stop storing the rings under /etc/swift and we store them in some
> other place ?

Spoke to Giulio: the rings (either the backup ones or the ones in /etc/swift/*.ring.gz) do not need to trigger any container restart when they change, so option B.3 should be viable.
(In reply to Michele Baldessari from comment #4)
> C) libvirt category: [...]
> This one seems to be due to the fact that the /etc/libvirt/passwd.db file
> contains a timestamp or somehow is just different after a redeploy
> [...]
> (I have not exactly understood why yet because in my quick test without tls
> a simple 'saslpasswd2 -d -a libvirt -u overcloud migration' did not change
> the md5 for me, but maybe I am missing something)

Checking md5sum and stat from before and after an overcloud deploy:

* before overcloud deploy

[root@compute-0 libvirt]# cat 1
-rw-r-----. 1 root root 12288 Aug  8 16:22 passwd.db
71ede9ac578c3e22efe1d0d2d68cb1ff  passwd.db
  File: ‘passwd.db’
  Size: 12288      Blocks: 24         IO Block: 4096   regular file
Device: fc02h/64514d    Inode: 14733371    Links: 1
Access: (0640/-rw-r-----)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:virt_etc_t:s0
Access: 2018-08-09 06:17:59.725717427 +0000
Modify: 2018-08-08 16:22:49.161938000 +0000
Change: 2018-08-08 16:29:45.598883277 +0000
 Birth: -

* after overcloud deploy

[root@compute-0 libvirt]# cat 2
-rw-r-----. 1 root root 12288 Aug  9 06:41 passwd.db
80b03ead0f5b13ea05ab6462f2ef7762  passwd.db
  File: ‘passwd.db’
  Size: 12288      Blocks: 24         IO Block: 4096   regular file
Device: fc02h/64514d    Inode: 14758991    Links: 1
Access: (0640/-rw-r-----)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:virt_etc_t:s0
Access: 2018-08-09 06:52:22.338206274 +0000
Modify: 2018-08-09 06:41:50.306298000 +0000
Change: 2018-08-09 06:48:46.525213436 +0000
 Birth: -

The relevant puppet code is:

 93         if !empty($tls_password) {
 94           $libvirt_sasl_command = "echo \"\${TLS_PASSWORD}\" | saslpasswd2 -p -a libvirt -u overcloud migration"
 95           $libvirt_auth_ensure = present
 96           $libvirt_auth_conf = "
 97 [credentials-overcloud]
 98 authname=migration@overcloud
 99 password=${tls_password}
100
101 [auth-libvirt-default]
102 credentials=overcloud
103 "
104         }
105         else {
106           $libvirt_sasl_command = 'saslpasswd2 -d -a libvirt -u overcloud migration'
107           $libvirt_auth_ensure = absent
108           $libvirt_auth_conf = ''
109         }
110
111         exec{ 'set libvirt sasl credentials':
112           environment => ["TLS_PASSWORD=${tls_password}"],
113           command     => $libvirt_sasl_command,
114           path        => ['/sbin', '/usr/sbin', '/bin', '/usr/bin'],
115           require     => File['/etc/sasl2/libvirt.conf'],
116           tag         => ['libvirt_tls_password']
117         }

When running the same $libvirt_sasl_command manually with the same password, the md5sum/inode does not change:

[root@controller-0 libvirt]# md5sum passwd.db && stat passwd.db
9424d45a095a94d5c1d5a6d1510a4d53  passwd.db
  File: ‘passwd.db’
  Size: 12288      Blocks: 16         IO Block: 4096   regular file
Device: fc02h/64514d    Inode: 14876433    Links: 1
Access: (0640/-rw-r-----)  Uid: (    0/    root)   Gid: (    0/    root)
Context: unconfined_u:object_r:virt_etc_t:s0
Access: 2018-08-09 07:09:10.992618579 +0000
Modify: 2018-08-09 07:09:10.991618574 +0000
Change: 2018-08-09 07:09:10.991618574 +0000
 Birth: -

[root@controller-0 libvirt]# echo "1234567" | saslpasswd2 -p -a libvirt -u overcloud migration
[root@controller-0 libvirt]# md5sum passwd.db && stat passwd.db
9424d45a095a94d5c1d5a6d1510a4d53  passwd.db
  File: ‘passwd.db’
  Size: 12288      Blocks: 16         IO Block: 4096   regular file
Device: fc02h/64514d    Inode: 14876433    Links: 1
Access: (0640/-rw-r-----)  Uid: (    0/    root)   Gid: (    0/    root)
Context: unconfined_u:object_r:virt_etc_t:s0
Access: 2018-08-09 09:25:10.490826254 +0000
Modify: 2018-08-09 09:25:10.489826249 +0000
Change: 2018-08-09 09:25:10.489826249 +0000
 Birth: -
(In reply to Martin Schuppert from comment #6)
> checking md5sum and stat from before and after an overcloud deploy [...]
> When running the same $libvirt_sasl_command manually with the same pwd, the
> md5sum/inode does not change.

The issue is that in the config container the passwd.db gets created via the triggered saslpasswd2 commands:

Aug  9 04:47:20 compute-0 puppet-user[10]: (/Stage[main]/Tripleo::Profile::Base::Nova::Libvirt/File[/etc/sasl2/libvirt.conf]/content) content changed '{md5}09c4fa846e8e27bfa3ab3325900d63ea' to '{md5}2f138c0278e1b666ec77a6d8ba3054a1'
Aug  9 04:47:20 compute-0 journal: Notice: /Stage[main]/Tripleo::Profile::Base::Nova::Libvirt/File[/etc/sasl2/libvirt.conf]/content: content changed '{md5}09c4fa846e8e27bfa3ab3325900d63ea' to '{md5}2f138c0278e1b666ec77a6d8ba3054a1'
Aug  9 04:47:20 compute-0 saslpasswd2: error deleting entry from sasldb: BDB0073 DB_NOTFOUND: No matching key/data pair found
Aug  9 04:47:20 compute-0 saslpasswd2: error deleting entry from sasldb: BDB0073 DB_NOTFOUND: No matching key/data pair found
Aug  9 04:47:20 compute-0 saslpasswd2: error deleting entry from sasldb: BDB0073 DB_NOTFOUND: No matching key/data pair found
Aug  9 04:47:20 compute-0 puppet-user[10]: (/Stage[main]/Tripleo::Profile::Base::Nova::Libvirt/Exec[set libvirt sasl credentials]/returns) executed successfully
Aug  9 04:47:20 compute-0 journal: Notice: /Stage[main]/Tripleo::Profile::Base::Nova::Libvirt/Exec[set libvirt sasl credentials]/returns: executed successfully
Aug  9 04:47:20 compute-0 puppet-user[10]: (/Stage[main]/Tripleo::Profile::Base::Nova::Libvirt/File[/etc/libvirt/auth.conf]/ensure) defined content as '{md5}4095b54c2cd67ed91cb56856c63babb9'
Aug  9 04:47:20 compute-0 journal: Notice: /Stage[main]/Tripleo::Profile::Base::Nova::Libvirt/File[/etc/libvirt/auth.conf]/ensure: defined content as '{md5}4095b54c2cd67ed91cb56856c63babb9'
Aug  9 04:47:20 compute-0 puppet-user[10]: (/Stage[main]/Tripleo::Profile::Base::Nova::Migration::Target/File[/etc/nova/migration/authorized_keys]/content) content changed '{md5}dff145cb4e519333c0096aae8de2e77c' to '{md5}e927faa6270b0bb9684bad3f346c4295'
Aug  9 04:47:20 compute-0 journal: Notice: /Stage[main]/Tripleo::Profile::Base::Nova::Migration::Target/File[/etc/nova/migration/authorized_keys]/content: content changed '{md5}dff145cb4e519333c0096aae8de2e77c' to '{md5}e927faa6270b0bb9684bad3f346c4295'

Also, if we do not have TLS enabled and there is no passwd.db, 'saslpasswd2 -d' creates one.
Martin tested whether we actually need to restart the libvirt containers when passwd.db changes, and the answer is no:

(overcloud) [stack@undercloud-0 ~]$ openstack server list --long
| ID | Name | Status | Task State | Power State | Networks | Image Name | Image ID | Flavor Name | Flavor ID | Availability Zone | Host | Properties |
| 93f012fe-bf61-49b0-af2e-25b56cfd6083 | test | ACTIVE | None | Running | private=192.168.0.3 | cirros | 823525a2-dd89-4d5a-9e70-183132bf2d6c | m1.small | aed75444-94e3-445d-a30f-6881f8dd72dd | nova | compute-0.redhat.local | |

1) change the password on both computes

()[root@compute-1 libvirt]# cat auth.conf
[credentials-overcloud]
authname=migration@overcloud
password=hAhyktxV9hr8gE2JbCdC2gDjD

[auth-libvirt-default]
credentials=overcloud

()[root@compute-0 libvirt]# sasldblistusers2 -f passwd.db
migration@overcloud: userPassword

()[root@compute-1 libvirt]# echo "12345678" | saslpasswd2 -p -a libvirt -u overcloud migration

()[root@compute-0 libvirt]# cat auth.conf
[credentials-overcloud]
authname=migration@overcloud
password=12345678

[auth-libvirt-default]
credentials=overcloud

(overcloud) [stack@undercloud-0 ~]$ nova host-evacuate-live compute-0.redhat.local
(overcloud) [stack@undercloud-0 ~]$ openstack server list --long
| 93f012fe-bf61-49b0-af2e-25b56cfd6083 | test | ACTIVE | None | Running | private=192.168.0.3 | cirros | 823525a2-dd89-4d5a-9e70-183132bf2d6c | m1.small | aed75444-94e3-445d-a30f-6881f8dd72dd | nova | compute-1.redhat.local | |

The live migration succeeded: the instance moved to compute-1.

2) make it fail with a different password in auth.conf and passwd.db on the source and destination compute

()[root@compute-0 libvirt]# echo "abcdefgh" | saslpasswd2 -p -a libvirt -u overcloud migration

(overcloud) [stack@undercloud-0 ~]$ openstack server list --long
| 93f012fe-bf61-49b0-af2e-25b56cfd6083 | test | MIGRATING | migrating | Running | private=192.168.0.3 | cirros | 823525a2-dd89-4d5a-9e70-183132bf2d6c | m1.small | aed75444-94e3-445d-a30f-6881f8dd72dd | nova | compute-1.redhat.local | |

(overcloud) [stack@undercloud-0 ~]$ openstack server list --long
| 93f012fe-bf61-49b0-af2e-25b56cfd6083 | test | ACTIVE | None | Running | private=192.168.0.3 | cirros | 823525a2-dd89-4d5a-9e70-183132bf2d6c | m1.small | aed75444-94e3-445d-a30f-6881f8dd72dd | nova | compute-1.redhat.local | |

This time the migration failed and the instance stayed on compute-1:

2018-08-10 06:26:33.664 1 ERROR nova.virt.libvirt.driver [req-30ee5623-83b2-40e5-81c3-0dd76bf15a38 62d292c6128940eb8d78887f612845bc 291cc1a1c7724531b2cb894047e05fac - default default] [instance: 93f012fe-bf61-49b0-af2e-25b56cfd6083] Live Migration failure: operation failed: Failed to connect to remote libvirt URI qemu+tls://compute-0.internalapi.redhat.local/system: authentication failed: authentication failed: libvirtError: operation failed: Failed to connect to remote libvirt URI qemu+tls://compute-0.internalapi.redhat.local/system: authentication failed: authentication failed
2018-08-10 06:26:33.697 1 ERROR nova.virt.libvirt.driver [req-30ee5623-83b2-40e5-81c3-0dd76bf15a38 62d292c6128940eb8d78887f612845bc 291cc1a1c7724531b2cb894047e05fac - default default] [instance: 93f012fe-bf61-49b0-af2e-25b56cfd6083] Migration operation has aborted
I'm a bit concerned by this approach. Won't we just hit the same problem again in the future unless, somehow, we can cover this in CI? Ultimately I think the root cause of this issue is that the existing state is not available when running puppet apply, resulting in non-existent files being recreated. I think https://review.openstack.org/562876 is the simple, and most correct, solution.
(In reply to Ollie Walsh from comment #9)
> I'm a bit concerned by this approach. Won't we just hit the same problem
> again in the future unless, somehow, we can cover this in CI?

I share your concern in general. I dislike the idea of somehow maintaining an exclude list (especially in docker-puppet.py). And yes, we should somehow cover this in CI. I could not come up with a better-contained fix so far.

> Ultimately I think the root cause of this issue is that the existing state
> is not available when running puppet apply, resulting in non-existent files
> being recreated. I think https://review.openstack.org/562876 is the simple,
> and most correct, solution.

That is one part of the story, but not all of it? Unless I am misunderstanding things, even if you mapped /etc/libvirt/passwd.db inside the docker-puppet container before running it, you'd still get an md5 mismatch, because this Berkeley DB file contains a timestamp which gets changed as soon as you re-enforce state on that file. It also would not fix the swift stuff, as those files get pulled in via the undercloud, get updated, and change the md5 anyway. It would fix the cron/parsedfile thing though, which is currently rather ugly to deal with via tar. I kind of like your approach in https://review.openstack.org/562876 for that. I also understand Jiri's concern in that review. I don't know... https://review.openstack.org/#/c/590008/ now works in my tests but makes anyone's eyes bleed tbh, so I am more than happy to hear better ideas. If we could restore your https://review.openstack.org/562876 approach, we'd at least remove the sed hack in my review. We'd still have some excludes here and there to take care of, though.
(In reply to Michele Baldessari from comment #10)
> (In reply to Ollie Walsh from comment #9)
> > Ultimately I think the root cause of this issue is that the existing state
> > is not available when running puppet apply, resulting in non-existent files
> > being recreated. I think https://review.openstack.org/562876 is the simple,
> > and most correct, solution.
>
> That is one part of the story, but not all of it? Unless I am
> misunderstanding things, even if you mapped /etc/libvirt/passwd.db inside
> the docker-puppet container before running it, you'd still get an md5
> mismatch, because this Berkeley DB file contains a timestamp which gets
> changed as soon as you re-enforce state on that file.

There doesn't seem to be a timestamp in the passwd.db:

[root@compute-0 libvirt]# echo "12345678" | saslpasswd2 -p -a libvirt -u overcloud migration
[root@compute-0 libvirt]# md5sum passwd.db
1f584b05074002da163a31997c1498b9  passwd.db
[root@compute-0 libvirt]# echo "12345678" | saslpasswd2 -p -a libvirt -u overcloud migration
[root@compute-0 libvirt]# md5sum passwd.db
1f584b05074002da163a31997c1498b9  passwd.db
[root@compute-0 libvirt]# db_dump -p ./passwd.db
VERSION=3
format=print
type=hash
db_pagesize=4096
HEADER=END
 migration\00overcloud\00userPassword
 12345678
DATA=END

As long as we have the same db file, it seems we are ok.
(In reply to Martin Schuppert from comment #11)
> (In reply to Michele Baldessari from comment #10)
> > (In reply to Ollie Walsh from comment #9)
> > > Ultimately I think the root cause of this issue is that the existing state
> > > is not available when running puppet apply, resulting in non-existent files
> > > being recreated. I think https://review.openstack.org/562876 is the simple,
> > > and most correct, solution.
> >
> > That is one part of the story, but not all of it? Unless I am
> > misunderstanding things, even if you mapped /etc/libvirt/passwd.db inside
> > the docker-puppet container before running it, you'd still get an md5
> > mismatch, because this Berkeley DB file contains a timestamp which gets
> > changed as soon as you re-enforce state on that file.
>
> There doesn't seem to be a timestamp in the passwd.db:
>
> [root@compute-0 libvirt]# echo "12345678" | saslpasswd2 -p -a libvirt -u
> overcloud migration
> [root@compute-0 libvirt]# md5sum passwd.db
> 1f584b05074002da163a31997c1498b9  passwd.db
> [root@compute-0 libvirt]# echo "12345678" | saslpasswd2 -p -a libvirt -u
> overcloud migration
> [root@compute-0 libvirt]# md5sum passwd.db
> 1f584b05074002da163a31997c1498b9  passwd.db
> [root@compute-0 libvirt]# db_dump -p ./passwd.db
> VERSION=3
> format=print
> type=hash
> db_pagesize=4096
> HEADER=END
>  migration\00overcloud\00userPassword
>  12345678
> DATA=END
>
> As long as we have the same db file, it seems we are ok.

Ack, then that would be covered, indeed.
I wonder if we could make an exclude list per-service? Like we pass puppet_tags [1] when we want Puppet to run more than the default tags, we could also pass exclude_files to exclude a list of files from the puppet-generated directory.

The issue with the timestamped header in cron files is a bit less convenient though. I think it's a bug in Puppet (it should be idempotent IMO), and it looks like it's not possible to switch the header off easily. That one we might need to deal with in docker-puppet.py directly, as Michele proposed. Alternatively we could try to subclass the crontab provider and override the header [2]. For context, IIUC this particular issue also would not be solved by the data carry-over patch [3], so we'd need to hack around it either way.

I still think we should try to avoid doing [3] if at all possible. Once we start carrying config state this way, we will no longer be able to guarantee that e.g. a fresh deployment and an upgraded deployment have the same config files. We'd open ourselves up to a new class of bugs and lose one of the benefits of containerization.

According to my current understanding of the issue, per-service exclude lists and special handling for `# HEADER` bits would get my vote. (And perhaps we can convince Puppeteers to at least remove the timestamp from headers in the long run, allowing us to drop the hack?)

[1] https://github.com/openstack/tripleo-heat-templates/blob/bf7a452de07d2d71ad97b25e8fc46ead42f0f634/docker/services/nova-libvirt.yaml#L217
[2] https://github.com/puppetlabs/puppet/blob/1fe4d9584e03f4366fb4be7ecafb4fc8fb833dfd/lib/puppet/provider/cron/crontab.rb#L126
[3] https://review.openstack.org/562876
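To make the per-service exclude list idea concrete, here is a rough sketch of how it could plug into the checksum step of docker-puppet.py. Everything here is hypothetical: the function name, the `exclude_files` parameter, and the glob matching are illustrative, not the actual tripleo-heat-templates interface.

```python
import fnmatch
import hashlib
import os


def config_checksum(config_dir, exclude_files=()):
    """md5 over the puppet-generated config tree, skipping excluded paths.

    exclude_files entries are glob patterns matched against paths relative
    to config_dir, e.g. "etc/libvirt/passwd.db". Files matching a pattern
    do not contribute to the checksum, so a change in them alone will not
    force a container restart.
    """
    digest = hashlib.md5()
    for root, _dirs, files in sorted(os.walk(config_dir)):
        for name in sorted(files):
            path = os.path.join(root, name)
            rel = os.path.relpath(path, config_dir)
            if any(fnmatch.fnmatch(rel, pat) for pat in exclude_files):
                continue  # volatile file: leave it out of the hash
            with open(path, 'rb') as f:
                digest.update(f.read())
    return digest.hexdigest()
```

A service template could then declare something like `exclude_files: ['etc/libvirt/passwd.db']` next to its puppet_tags, and the redeploy would only restart the container when a non-excluded file actually changed.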
Looks like the puppet parsedfile provider has the same issue. I had expected it to leave the header unchanged if there are no updates but it looks like it writes it out every time (just glancing at the source - https://github.com/puppetlabs/puppet/blob/master/lib/puppet/provider/parsedfile.rb#L380).
(In reply to Jiri Stransky from comment #13) > I wonder if we could make an exclude list per-service? Like we pass > puppet_tags [1] when we want Puppet to run more than default tags, we could > also pass exclude_files to exclude a list of files from puppet-generated > directory. Yeah that is probably one of the best options at this stage. Do we all agree here that this is the way forward? Or do we want the quick exclude in first since it is likely more self-contained and we also work in parallel for a more generic exclude list? Maybe the former for OSP13 and the latter for OSP14? > The issue with timestamped header in cron files is a bit less convenient > though. I think it's a bug in Puppet (it should be idempotent IMO), and it > looks like it's not possible to switch the header off easily. That one we > might need to deal with in docker-puppet.py directly as Michele proposed. > Alternatively we could try to subclass the crontab provider and override the > header [2]. For context, IIUC this particular issue also would not be solved > by the data carry-over patch [3], so we'd need to hack around it either way. Probably not about [3] (see below). I kinda like option [2] (I had not thought about it). So these subclasses of crontab/parsefiles would live in puppet-tripleo, correct? I guess we should first try our luck upstream. > I still think we should try to avoid doing [3] it if at all possible. Once > we start carrying config state this way, we will no longer be able to > guarantee that e.g. fresh deployment and upgraded deployment have the same > config files. We'd open ourselves up to a new class of bugs and lose one of > the benefits of containerization. Yeah I think your concern here is very valid and convincing > According to my current understanding of the issue, per-service exclude > lists and special handling for `# HEADER` bits would get my vote. 
(And > perhaps we can convince Puppeteers to at remove the timestamp from headers > in the long run, allowing us to drop the hack?) So here is the thing I tried and it kind of surprised me, but the HEADER with timestamp does not retrigger the cron resource updating all the time: [root@rhel8a ~]# cat > cron-test.pp cron { 'helloworld': command => "/puppet/pls.sh", user => root, hour => '*', minute => '*/5', require => File['/puppet/pls.sh'] } [root@rhel8a ~]# vim cron-test.pp [root@rhel8a ~]# puppet apply^C [root@rhel8a ~]# cat cron-test.pp cron { 'helloworld': command => "echo foo", user => root, hour => '*', minute => '*/5', } [root@rhel8a ~]# puppet apply --detailed-exitcodes cron-test.pp Notice: Compiled catalog for rhel8a.internal in environment production in 0.08 seconds Notice: /Stage[main]/Main/Cron[helloworld]/ensure: created Notice: Applied catalog in 0.06 seconds [root@rhel8a ~]# cat /var/spool/cron/root # HEADER: This file was autogenerated at 2018-08-16 11:49:45 -0400 by puppet. # HEADER: While it can still be managed manually, it is definitely not recommended. # HEADER: Note particularly that the comments starting with 'Puppet Name' should # HEADER: not be deleted, as doing so could cause duplicate cron jobs. # Puppet Name: helloworld */5 * * * * echo foo [root@rhel8a ~]# puppet apply --detailed-exitcodes cron-test.pp Notice: Compiled catalog for rhel8a.internal in environment production in 0.08 seconds Notice: Applied catalog in 0.05 seconds So the way I see it puppet is idempotent here (although I think we can all agree that the timestamp is a bit lame, the case for getting a patch upstream might be a bit lessened, since idempotency seems to work just fine. Not sure how it happens as I did not investigate too much) I.e. [3] would indeed solve this issue here, I think. > > [1] > https://github.com/openstack/tripleo-heat-templates/blob/ > bf7a452de07d2d71ad97b25e8fc46ead42f0f634/docker/services/nova-libvirt. 
> yaml#L217 > [2] > https://github.com/puppetlabs/puppet/blob/ > 1fe4d9584e03f4366fb4be7ecafb4fc8fb833dfd/lib/puppet/provider/cron/crontab. > rb#L126 > [3] https://review.openstack.org/562876
How about the reverse? I.e. we have an include list of the files to preserve between puppet runs. We could also catch any issues, I think: run puppet twice, once with no existing state preserved (except the include list), and again with all state preserved. If there are any differences, then fail the deployment (in CI at least).
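A rough sketch of what such a CI gate could look like, assuming the two puppet runs write their generated config into two separate directories. The function name and the overall harness are illustrative, not an existing TripleO tool:

```python
import os


def tree_diff(dir_a, dir_b):
    """Relative paths present in only one tree or whose contents differ.

    Used to compare the output of a puppet run with no prior state against
    a run with full prior state; a non-empty result means the config is
    not reproducible and the CI job should fail.
    """
    def listing(root):
        files = {}
        for base, _dirs, names in os.walk(root):
            for name in names:
                path = os.path.join(base, name)
                rel = os.path.relpath(path, root)
                with open(path, 'rb') as f:
                    files[rel] = f.read()
        return files

    a, b = listing(dir_a), listing(dir_b)
    # Missing keys yield None on one side, so one-sided files also show up.
    return sorted(rel for rel in set(a) | set(b) if a.get(rel) != b.get(rel))
```

In a CI job this would be wrapped in something like `if tree_diff(run_fresh, run_stateful): sys.exit(1)`, making the non-idempotent files visible in the job output instead of silently restarting containers in production.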
(In reply to Michele Baldessari from comment #15)
> (In reply to Jiri Stransky from comment #13)
> > I wonder if we could make an exclude list per-service? Like we pass
> > puppet_tags [1] when we want Puppet to run more than the default tags, we
> > could also pass exclude_files to exclude a list of files from the
> > puppet-generated directory.
>
> Yeah, that is probably one of the best options at this stage. Do we all
> agree here that this is the way forward? Or do we want the quick exclude in
> first, since it is likely more self-contained, and also work in parallel on
> a more generic exclude list? Maybe the former for OSP13 and the latter for
> OSP14?

I like the proposed approach.

> > The issue with the timestamped header in cron files is a bit less convenient
> > though. I think it's a bug in Puppet (it should be idempotent IMO), and it
> > looks like it's not possible to switch the header off easily. That one we
> > might need to deal with in docker-puppet.py directly as Michele proposed.
> > Alternatively we could try to subclass the crontab provider and override the
> > header [2]. For context, IIUC this particular issue also would not be solved
> > by the data carry-over patch [3], so we'd need to hack around it either way.
>
> Probably not true about [3] (see below). I kind of like option [2] (I had
> not thought about it). So these subclasses of crontab/parsedfile would live
> in puppet-tripleo, correct? I guess we should first try our luck upstream.

Indeed, monkey patching the modules that inject a header with timestamps would be one option, but honestly I do not find it much cleaner than the sed you've implemented in docker-puppet.py. We can legitimately (?) assume the injected header format is consistent across puppet modules and catch them all with the sed, while monkey patching would force us to work on a case-by-case basis.
(sorry for the delay, back from PTO now)

+1 for sed for the header removal. The implementation is already proposed, and it catches everything rather than having to patch each affected Puppet type. Michele's implementation only affects the checksumming; we don't edit the files which then end up mounted into containers, so it's pretty safe, I think.

Regarding a list to keep vs. a list to exclude, it's a tough call, I think. A list to keep will probably be nicer in the end, but it will require more work to get done; if we go that way we should probably involve all DFGs so that we can cover all services. I think this could be doable (and quite nice? :) ) if:

* When the parameter ("keep_files"?) isn't specified, we keep all files. That way we allow gradual migration from the current approach to the new one.
* We should probably allow regexes as keep_files entries, rather than simple strings, to make sure the keep_files parameter has sufficient power to cover all use cases.

I like it, but it requires more effort :)
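For illustration, the checksum-only header filtering could look roughly like this in Python. This is a hypothetical equivalent of the sed hack being discussed, not the actual docker-puppet.py code; the function name and regex are assumptions. The key property is the one noted above: only the checksum input is filtered, while the on-disk file that gets mounted into the container keeps its header.

```python
import hashlib
import re

# Matches the timestamped lines puppet's crontab/parsedfile providers emit,
# e.g.: "# HEADER: This file was autogenerated at 2018-08-16 11:49:45 -0400 by puppet."
HEADER_RE = re.compile(rb'^# HEADER: .*\n?', re.MULTILINE)


def checksum_ignoring_header(content):
    """md5 of file content with the volatile puppet HEADER lines removed.

    The file itself is never rewritten; only the bytes fed to the hash
    are filtered, so two runs that differ solely in the HEADER timestamp
    produce the same checksum and do not trigger a container restart.
    """
    return hashlib.md5(HEADER_RE.sub(b'', content)).hexdigest()
```

Any real change to the managed entries (the `# Puppet Name:` blocks and the job lines) still changes the checksum, so genuine config updates continue to restart the container as intended.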
Thanks for all the feedback. I'll do one more round of testing of the WIP review and then propose it so that we can bring it to OSP13. I'll probably need some help to do the cleaner approach with the whitelist/blacklist for OSP14/15.
Verified on puddle 2018-10-02.1

No containers restart on stack update. Used the same script to verify:

[stack@undercloud-0 ~]$ diff after before
[stack@undercloud-0 ~]$
*** Bug 1642178 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:3587