Description of problem:

What we know so far: The client upgraded from OSP10 to OSP13 on December 6. After that, everything worked fine until 6 containers on one compute node could no longer be restarted. The error we had was:

Jan 20 13:26:38 compute13 journal: INFO:__main__:Deleting /etc/neutron/neutron.conf
Jan 20 13:26:38 compute13 journal: INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/neutron/neutron.conf to /etc/neutron/neutron.conf
Jan 20 13:26:38 compute13 journal: ERROR:__main__:Unexpected error:
Jan 20 13:26:38 compute13 journal: Traceback (most recent call last):
Jan 20 13:26:38 compute13 journal: File "/usr/local/bin/kolla_set_configs", line 411, in main
Jan 20 13:26:38 compute13 journal: execute_config_strategy(config)
Jan 20 13:26:38 compute13 journal: File "/usr/local/bin/kolla_set_configs", line 377, in execute_config_strategy
Jan 20 13:26:38 compute13 journal: copy_config(config)
Jan 20 13:26:38 compute13 journal: File "/usr/local/bin/kolla_set_configs", line 306, in copy_config
Jan 20 13:26:38 compute13 journal: config_file.copy()
Jan 20 13:26:38 compute13 journal: File "/usr/local/bin/kolla_set_configs", line 150, in copy
Jan 20 13:26:38 compute13 journal: self._merge_directories(source, dest)
Jan 20 13:26:38 compute13 journal: File "/usr/local/bin/kolla_set_configs", line 97, in _merge_directories
Jan 20 13:26:38 compute13 journal: os.path.join(dest, to_copy))
Jan 20 13:26:38 compute13 journal: File "/usr/local/bin/kolla_set_configs", line 97, in _merge_directories
Jan 20 13:26:38 compute13 journal: os.path.join(dest, to_copy))
Jan 20 13:26:38 compute13 journal: File "/usr/local/bin/kolla_set_configs", line 97, in _merge_directories
Jan 20 13:26:38 compute13 journal: os.path.join(dest, to_copy))
Jan 20 13:26:38 compute13 journal: File "/usr/local/bin/kolla_set_configs", line 92, in _merge_directories
Jan 20 13:26:38 compute13 journal: self._set_properties(source, dest)
Jan 20 13:26:38 compute13 journal: File "/usr/local/bin/kolla_set_configs", line 117, in _set_properties
Jan 20 13:26:38 compute13 journal: self._set_properties_from_file(source, dest)
Jan 20 13:26:38 compute13 journal: File "/usr/local/bin/kolla_set_configs", line 122, in _set_properties_from_file
Jan 20 13:26:38 compute13 journal: shutil.copystat(source, dest)
Jan 20 13:26:38 compute13 journal: File "/usr/lib64/python2.7/shutil.py", line 98, in copystat
Jan 20 13:26:38 compute13 journal: os.utime(dst, (st.st_atime, st.st_mtime))
Jan 20 13:26:38 compute13 journal: OSError: [Errno 30] Read-only file system: '/etc/pki/ca-trust/extracted'

The containers with the issue:

19f836ead8ba sat:5000/osp-osp13_containers-neutron-sriov-agent:13.0-89 "kolla_start" 12 days ago Restarting (2) 26 minutes ago neutron_sriov_agent
b156ed0fbe14 sat:5000/osp-osp13_containers-nova-compute-hotfix:13.0-98-1703225 "kolla_start" 12 days ago Restarting (2) 26 minutes ago nova_compute
44e83ede071d sat:5000/osp-osp13_containers-nova-compute-hotfix:13.0-98-1703225 "kolla_start" 12 days ago Restarting (2) 26 minutes ago nova_migration_target
f851155eaafd sat:5000/osp-osp13_containers-cron:13.0-90 "kolla_start" 12 days ago Restarting (2) 26 minutes ago logrotate_crond
1e4426c5ead3 sat:5000/osp-osp13_containers-nova-libvirt:13.0-101 "kolla_start" 12 days ago Restarting (2) 26 minutes ago nova_libvirt
e43197378dc1 sat:5000/osp-osp13_containers-nova-libvirt:13.0-101 "kolla_start" 12 days ago Restarting (2) 26 minutes ago nova_virtlogd

The solution was found with the help of engineering. For an unknown reason, the /etc/pki directory was copied into the subfolders of /var/lib/config-data/puppet-generated/<container>/. As a result, when the containers were started, the bind that is supposed to be made with /etc/pki/ca-trust/extracted was made with the copy inside /var/lib/config-data/puppet-generated/<container>/etc/pki/ca-trust/extracted instead. Removing that folder from each subfolder solved the issue and allowed us to start the containers.

One thing to note is that the directory was not copied into /var/lib/config-data/puppet-generated/iscsid/, and that container was running just fine.

The theory so far is that /etc/pki was modified for some reason on that single compute node and, because nothing in the code restricts it, the folder was copied into the puppet-generated folder to be fed to the containers.

I have logs from that compute node for analysis. If anything else is needed, please let me know. I'm also asking the client to check internally which OpenStack operations were performed around January 8th, to explain why this directory was changed.

Version-Release number of selected component (if applicable):
puppet-tripleo-8.4.1-14.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. cp -r /etc/pki /var/lib/config-data/puppet-generated/<subfolder>/etc/
2. docker restart container
3. docker container stuck in Restarting state

Actual results:
docker container stuck in Restarting state

Expected results:
docker containers start normally

Additional info:
The hotfix does not remove the files. Once a customer hits the problem, they need to clean it up per the KCS and deploy with the hotfix to ensure it does not occur again.
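To help locate leftover copies on an affected node before cleaning up, here is a minimal read-only sketch. It is not the KCS procedure. It assumes the stray copy shows up as etc/pki/ca-trust/extracted under each container subfolder, as described in this report. The function name find_stray_pki is hypothetical, and the script only lists paths, it does not delete anything.

```python
import os

# Base path taken from the bug description; adjust if the deployment differs.
PUPPET_GENERATED = "/var/lib/config-data/puppet-generated"

# Relative path of the stray copy named in the error (assumption: same layout
# in every affected container subfolder).
STRAY_SUBPATH = os.path.join("etc", "pki", "ca-trust", "extracted")


def find_stray_pki(root=PUPPET_GENERATED):
    """Return the sorted list of <root>/<container>/etc/pki/ca-trust/extracted
    directories that exist. Nothing is modified."""
    stray = []
    if not os.path.isdir(root):
        return stray
    for container in sorted(os.listdir(root)):
        candidate = os.path.join(root, container, STRAY_SUBPATH)
        if os.path.isdir(candidate):
            stray.append(candidate)
    return stray


if __name__ == "__main__":
    for path in find_stray_pki():
        print(path)
```

An empty result on a node suggests no stray copies at that path. Any path that is printed should be reviewed against the KCS before it is removed.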
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2718
*** Bug 1847507 has been marked as a duplicate of this bug. ***