Bug 1794119 - [OSP13] Docker containers stuck in restarting. Error is OSError: [Errno 30] Read-only file system: '/etc/pki/ca-trust/extracted'
Summary: [OSP13] Docker containers stuck in restarting. Error is OSError: [Errno 30] Read-only file system: '/etc/pki/ca-trust/extracted'
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Alex Schultz
QA Contact: Sasha Smolyak
URL:
Whiteboard:
Duplicates: 1847507
Depends On:
Blocks:
 
Reported: 2020-01-22 17:16 UTC by ggrimaux
Modified: 2023-12-15 17:13 UTC
CC List: 15 users

Fixed In Version: openstack-tripleo-heat-templates-8.4.1-52.el7ost
Doc Type: Bug Fix
Doc Text:
Before this update, files that were mounted as read-only on the host were incorrectly collected during container configuration generation. This caused the containers to remain in a `restarting` state and generated a `read-only file system` error. With this update, read-only files are no longer collected during container configuration generation, and the containers restart correctly. If the error still occurs, you must manually clean up the old files before re-deploying. For more information, see: https://access.redhat.com/solutions/5048941
Clone Of:
Environment:
Last Closed: 2020-06-24 11:33:20 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1860607 0 None None None 2020-01-22 21:18:51 UTC
OpenStack gerrit 703873 0 None MERGED Update ro excludes 2021-02-02 07:35:22 UTC
Red Hat Issue Tracker OSP-11848 0 None None None 2021-12-15 11:17:37 UTC
Red Hat Knowledge Base (Solution) 4950631 0 None None None 2020-04-01 16:36:40 UTC
Red Hat Knowledge Base (Solution) 5048941 0 None None None 2020-05-05 13:03:49 UTC
Red Hat Product Errata RHBA-2020:2718 0 None None None 2020-06-24 11:33:54 UTC

Description ggrimaux 2020-01-22 17:16:44 UTC
Description of problem:
What we know so far:
The client upgraded from OSP10 to OSP13 on December 6.
Things worked fine after the upgrade, until six containers on one compute node could no longer be restarted.

The error we had was:
Jan 20 13:26:38 compute13 journal: INFO:__main__:Deleting /etc/neutron/neutron.conf
Jan 20 13:26:38 compute13 journal: INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/neutron/neutron.conf to /etc/neutron/neutron.conf
Jan 20 13:26:38 compute13 journal: ERROR:__main__:Unexpected error:
Jan 20 13:26:38 compute13 journal: Traceback (most recent call last):
Jan 20 13:26:38 compute13 journal:  File "/usr/local/bin/kolla_set_configs", line 411, in main
Jan 20 13:26:38 compute13 journal:    execute_config_strategy(config)
Jan 20 13:26:38 compute13 journal:  File "/usr/local/bin/kolla_set_configs", line 377, in execute_config_strategy
Jan 20 13:26:38 compute13 journal:    copy_config(config)
Jan 20 13:26:38 compute13 journal:  File "/usr/local/bin/kolla_set_configs", line 306, in copy_config
Jan 20 13:26:38 compute13 journal:    config_file.copy()
Jan 20 13:26:38 compute13 journal:  File "/usr/local/bin/kolla_set_configs", line 150, in copy
Jan 20 13:26:38 compute13 journal:    self._merge_directories(source, dest)
Jan 20 13:26:38 compute13 journal:  File "/usr/local/bin/kolla_set_configs", line 97, in _merge_directories
Jan 20 13:26:38 compute13 journal:    os.path.join(dest, to_copy))
Jan 20 13:26:38 compute13 journal:  File "/usr/local/bin/kolla_set_configs", line 97, in _merge_directories
Jan 20 13:26:38 compute13 journal:    os.path.join(dest, to_copy))
Jan 20 13:26:38 compute13 journal:  File "/usr/local/bin/kolla_set_configs", line 97, in _merge_directories
Jan 20 13:26:38 compute13 journal:    os.path.join(dest, to_copy))
Jan 20 13:26:38 compute13 journal:  File "/usr/local/bin/kolla_set_configs", line 92, in _merge_directories
Jan 20 13:26:38 compute13 journal:    self._set_properties(source, dest)
Jan 20 13:26:38 compute13 journal:  File "/usr/local/bin/kolla_set_configs", line 117, in _set_properties
Jan 20 13:26:38 compute13 journal:    self._set_properties_from_file(source, dest)
Jan 20 13:26:38 compute13 journal:  File "/usr/local/bin/kolla_set_configs", line 122, in _set_properties_from_file
Jan 20 13:26:38 compute13 journal:    shutil.copystat(source, dest)
Jan 20 13:26:38 compute13 journal:  File "/usr/lib64/python2.7/shutil.py", line 98, in copystat
Jan 20 13:26:38 compute13 journal:    os.utime(dst, (st.st_atime, st.st_mtime))
Jan 20 13:26:38 compute13 journal: OSError: [Errno 30] Read-only file system: '/etc/pki/ca-trust/extracted'

The containers with the issue:
19f836ead8ba sat:5000/osp-osp13_containers-neutron-sriov-agent:13.0-89 "kolla_start" 12 days ago Restarting (2) 26 minutes ago neutron_sriov_agent
b156ed0fbe14 sat:5000/osp-osp13_containers-nova-compute-hotfix:13.0-98-1703225 "kolla_start" 12 days ago Restarting (2) 26 minutes ago nova_compute
44e83ede071d sat:5000/osp-osp13_containers-nova-compute-hotfix:13.0-98-1703225 "kolla_start" 12 days ago Restarting (2) 26 minutes ago nova_migration_target
f851155eaafd sat:5000/osp-osp13_containers-cron:13.0-90 "kolla_start" 12 days ago Restarting (2) 26 minutes ago logrotate_crond
1e4426c5ead3 sat:5000/osp-osp13_containers-nova-libvirt:13.0-101 "kolla_start" 12 days ago Restarting (2) 26 minutes ago nova_libvirt
e43197378dc1 sat:5000/osp-osp13_containers-nova-libvirt:13.0-101 "kolla_start" 12 days ago Restarting (2) 26 minutes ago nova_virtlogd

The solution was found with the help of engineering.

For an unknown reason, the /etc/pki directory was copied into subfolders under /var/lib/config-data/puppet-generated/<container>/.
So when the containers were started, the bind mount that is supposed to use the host's /etc/pki/ca-trust/extracted was instead made from the copy inside /var/lib/config-data/puppet-generated/<container>/etc/pki/ca-trust/extracted.

Removing that folder from each subfolder solved the issue and allowed us to start the containers.

One thing to note is that /etc/pki was not copied into /var/lib/config-data/puppet-generated/iscsid/, and that container was running just fine.
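
As a rough sketch of what that cleanup amounted to (the glob pattern and the nova_compute name below are illustrative; on a real node follow the KCS article linked in the Doc Text rather than this sketch):

# Sketch only: list any stray /etc/pki copies under the puppet-generated
# config directories (only the affected containers had one; iscsid did not).
for d in /var/lib/config-data/puppet-generated/*/etc/pki; do
    [ -d "$d" ] && echo "stray copy found: $d"
done
# After reviewing the list, remove each stray copy and restart the
# corresponding container, e.g. for nova_compute:
#   rm -rf /var/lib/config-data/puppet-generated/nova_compute/etc/pki
#   docker restart nova_compute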

The theory so far is that /etc/pki was modified for some reason on that single compute node, and because the config-generation code placed no restriction on what it collects, the directory was copied into the puppet-generated folders and fed to the containers.
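
For context, the fix that eventually merged ("Update ro excludes" in the gerrit link above) adds exactly that kind of restriction: read-only host paths are excluded when changed files are collected into the puppet-generated directories. Roughly, assuming an rsync-style copy (the staging path, the container name and the exact options here are illustrative, not the actual tripleo-heat-templates code):

# Illustration only -- not the real config-generation code.
container=nova_compute      # hypothetical example
src=/tmp/changed-files      # hypothetical staging dir for generated config
rsync -a \
    --exclude='/etc/pki/ca-trust/extracted' \
    "${src}/" "/var/lib/config-data/puppet-generated/${container}/"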

I have logs from that compute node for analysis.
If anything else is needed please let me know.

I'm also asking the client to check internally what OpenStack operations were performed around January 8th, to explain why this directory was changed.

Version-Release number of selected component (if applicable):
puppet-tripleo-8.4.1-14.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. cp -r /etc/pki /var/lib/config-data/puppet-generated/<subfolder>/etc/
2. docker restart <container>
3. The container is now stuck in the Restarting state.

Actual results:
docker container stuck in Restarting state

Expected results:
docker containers start normally

Additional info:

Comment 16 Alex Schultz 2020-05-19 13:11:08 UTC
The hotfix does not remove the files. Once a customer hits the problem, they need to clean it up per the KCS and deploy with the hotfix to ensure it does not occur again.

Comment 25 errata-xmlrpc 2020-06-24 11:33:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2718

Comment 26 Alex Schultz 2020-07-01 13:00:46 UTC
*** Bug 1847507 has been marked as a duplicate of this bug. ***

