Bug 1847507

Summary: [OSP13] Docker containers stuck in restarting. OSError: [Errno 16] Device or resource busy: '/etc/hosts'
Product: Red Hat OpenStack
Reporter: Irina Petrova <ipetrova>
Component: openstack-containers
Assignee: Dan Prince <dprince>
Status: CLOSED DUPLICATE
QA Contact: Marius Cornea <mcornea>
Severity: high
Priority: high
Version: 13.0 (Queens)
CC: apetrich, aschultz, augol, jjoyce, m.andre, nchandek
Target Milestone: ---
Keywords: Triaged, ZStream
Target Release: ---
Hardware: x86_64
OS: Linux
Last Closed: 2020-07-01 13:00:46 UTC

Description Irina Petrova 2020-06-16 14:09:41 UTC
This bug was initially created as a copy of Bug #1794119

I am copying this bug because:
The issue might be similar, if not the same. Cloning/opening a new bug for verification.


Description of problem:
What we know so far:
The client upgraded from OSP10 to OSP13 on December 6.
After that, everything worked fine until 6 containers on one compute node could no longer be restarted.

The error we had was:
Jan 20 13:26:38 compute13 journal: INFO:__main__:Deleting /etc/neutron/neutron.conf
Jan 20 13:26:38 compute13 journal: INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/neutron/neutron.conf to /etc/neutron/neutron.conf
Jan 20 13:26:38 compute13 journal: ERROR:__main__:Unexpected error:
Jan 20 13:26:38 compute13 journal: Traceback (most recent call last):
Jan 20 13:26:38 compute13 journal:  File "/usr/local/bin/kolla_set_configs", line 411, in main
Jan 20 13:26:38 compute13 journal:    execute_config_strategy(config)
Jan 20 13:26:38 compute13 journal:  File "/usr/local/bin/kolla_set_configs", line 377, in execute_config_strategy
Jan 20 13:26:38 compute13 journal:    copy_config(config)
Jan 20 13:26:38 compute13 journal:  File "/usr/local/bin/kolla_set_configs", line 306, in copy_config
Jan 20 13:26:38 compute13 journal:    config_file.copy()
Jan 20 13:26:38 compute13 journal:  File "/usr/local/bin/kolla_set_configs", line 150, in copy
Jan 20 13:26:38 compute13 journal:    self._merge_directories(source, dest)
Jan 20 13:26:38 compute13 journal:  File "/usr/local/bin/kolla_set_configs", line 97, in _merge_directories
Jan 20 13:26:38 compute13 journal:    os.path.join(dest, to_copy))
Jan 20 13:26:38 compute13 journal:  File "/usr/local/bin/kolla_set_configs", line 97, in _merge_directories
Jan 20 13:26:38 compute13 journal:    os.path.join(dest, to_copy))
Jan 20 13:26:38 compute13 journal:  File "/usr/local/bin/kolla_set_configs", line 97, in _merge_directories
Jan 20 13:26:38 compute13 journal:    os.path.join(dest, to_copy))
Jan 20 13:26:38 compute13 journal:  File "/usr/local/bin/kolla_set_configs", line 92, in _merge_directories
Jan 20 13:26:38 compute13 journal:    self._set_properties(source, dest)
Jan 20 13:26:38 compute13 journal:  File "/usr/local/bin/kolla_set_configs", line 117, in _set_properties
Jan 20 13:26:38 compute13 journal:    self._set_properties_from_file(source, dest)
Jan 20 13:26:38 compute13 journal:  File "/usr/local/bin/kolla_set_configs", line 122, in _set_properties_from_file
Jan 20 13:26:38 compute13 journal:    shutil.copystat(source, dest)
Jan 20 13:26:38 compute13 journal:  File "/usr/lib64/python2.7/shutil.py", line 98, in copystat
Jan 20 13:26:38 compute13 journal:    os.utime(dst, (st.st_atime, st.st_mtime))
Jan 20 13:26:38 compute13 journal: OSError: [Errno 30] Read-only file system: '/etc/pki/ca-trust/extracted'

The containers with the issue:
19f836ead8ba sat:5000/osp-osp13_containers-neutron-sriov-agent:13.0-89 "kolla_start" 12 days ago Restarting (2) 26 minutes ago neutron_sriov_agent
b156ed0fbe14 sat:5000/osp-osp13_containers-nova-compute-hotfix:13.0-98-1703225 "kolla_start" 12 days ago Restarting (2) 26 minutes ago nova_compute
44e83ede071d sat:5000/osp-osp13_containers-nova-compute-hotfix:13.0-98-1703225 "kolla_start" 12 days ago Restarting (2) 26 minutes ago nova_migration_target
f851155eaafd sat:5000/osp-osp13_containers-cron:13.0-90 "kolla_start" 12 days ago Restarting (2) 26 minutes ago logrotate_crond
1e4426c5ead3 sat:5000/osp-osp13_containers-nova-libvirt:13.0-101 "kolla_start" 12 days ago Restarting (2) 26 minutes ago nova_libvirt
e43197378dc1 sat:5000/osp-osp13_containers-nova-libvirt:13.0-101 "kolla_start" 12 days ago Restarting (2) 26 minutes ago nova_virtlogd

The solution was found with the help of engineering.

For an unknown reason, the /etc/pki directory was copied into subfolders of /var/lib/config-data/puppet-generated/<container>/.
So when the containers were started, the copy in /var/lib/config-data/puppet-generated/<container>/etc/pki/ca-trust/extracted was applied on top of the /etc/pki/ca-trust/extracted bind mount. That mount is read-only inside the container, so kolla_set_configs failed with the OSError shown above.
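To make that failure mode concrete, here is a minimal Python sketch. It assumes, as the "Copying /var/lib/kolla/config_files/src/etc/neutron/neutron.conf to /etc/neutron/neutron.conf" log line suggests, that each path under a container's config source is copied to the same path under /. It walks a puppet-generated subfolder and reports every path that would land on, or inside, a destination mounted read-only in the container. The function name `colliding_paths` and the `PROTECTED` list are hypothetical; /etc/pki/ca-trust/extracted comes from the traceback and /etc/hosts from the bug summary.

```python
import os

# Hypothetical list of container paths assumed to be read-only bind mounts.
# /etc/pki/ca-trust/extracted is taken from the traceback; /etc/hosts from
# the bug summary. Neither list is exhaustive.
PROTECTED = ("/etc/pki/ca-trust/extracted", "/etc/hosts")


def colliding_paths(config_src, protected=PROTECTED):
    """Return container paths under config_src that hit a protected mount.

    Assumes each file or directory at config_src/<rel> is copied to /<rel>,
    mirroring the neutron.conf copy seen in the journal log.
    """
    hits = []
    for root, dirs, files in os.walk(config_src):
        for name in dirs + files:
            rel = os.path.relpath(os.path.join(root, name), config_src)
            dest = "/" + rel.replace(os.sep, "/")
            for mount in protected:
                # A path collides if it is the mount itself or lies beneath it.
                if dest == mount or dest.startswith(mount + "/"):
                    hits.append(dest)
                    break
    return sorted(hits)
```

For example, `colliding_paths("/var/lib/config-data/puppet-generated/iscsid")` should return an empty list on a healthy node, while a subfolder carrying a stray etc/pki tree would list /etc/pki/ca-trust/extracted and everything beneath it.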

Removing the copied etc/pki folder from each of those subfolders solved the issue and allowed us to start the containers.
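The workaround can be sketched in Python as follows. `remove_stray_pki` is a hypothetical helper, not part of TripleO: it looks for an etc/pki directory inside each subfolder of /var/lib/config-data/puppet-generated and deletes it with `shutil.rmtree` only when `dry_run` is False. It assumes no service legitimately keeps an etc/pki tree in its puppet-generated folder, so review the dry-run output (and keep a backup) before deleting anything.

```python
import os
import shutil

PUPPET_GENERATED = "/var/lib/config-data/puppet-generated"


def remove_stray_pki(base=PUPPET_GENERATED, dry_run=True):
    """Find (and optionally delete) <base>/<container>/etc/pki directories.

    Returns the list of offending directories, sorted by container name.
    Nothing is deleted unless dry_run is False.
    """
    found = []
    for container in sorted(os.listdir(base)):
        pki = os.path.join(base, container, "etc", "pki")
        # Skip symlinks so rmtree is never pointed at a link target.
        if os.path.isdir(pki) and not os.path.islink(pki):
            found.append(pki)
            if not dry_run:
                shutil.rmtree(pki)
    return found
```

Calling `remove_stray_pki()` first only reports the affected subfolders (on the node in this bug, iscsid would not appear); calling it again with `dry_run=False` removes them, after which the containers can be restarted.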

Note that it was not copied into /var/lib/config-data/puppet-generated/iscsid/, and the iscsid container was running fine.

The theory so far is that /etc/pki was modified for some reason on that single compute node and, because nothing in the code prevents it, the directory was copied into the puppet-generated folders that are fed to the containers.
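One way to picture the missing restriction this theory points at is a copy step with a deny-list. The Python sketch below is purely illustrative: the bug does not say which code populates the puppet-generated folders, and `copy_config_tree` and `EXCLUDED` are hypothetical names. It wraps `shutil.copytree` with an `ignore` callback that skips etc/pki and etc/hosts relative to the source root.

```python
import os
import shutil

# Hypothetical deny-list: relative paths that should never be copied into a
# puppet-generated folder because the container bind-mounts them instead.
EXCLUDED = {os.path.join("etc", "pki"), os.path.join("etc", "hosts")}


def copy_config_tree(src, dst, excluded=EXCLUDED):
    """Copy src to dst like shutil.copytree, skipping excluded relative paths."""

    def ignore(directory, names):
        # copytree calls this once per directory with the entries it contains;
        # return the names that must not be copied.
        rel_dir = os.path.relpath(directory, src)
        skipped = []
        for name in names:
            rel = os.path.normpath(os.path.join(rel_dir, name))
            if rel in excluded:
                skipped.append(name)
        return skipped

    shutil.copytree(src, dst, ignore=ignore)
```

Matching on paths relative to the source root, rather than on bare names, keeps unrelated files called "hosts" or "pki" deeper in the tree from being skipped.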

I have logs from that compute node for analysis.
If anything else is needed, please let me know.

I'm also asking the client to find out internally which OpenStack operations were performed around January 8th that could explain why this directory changed.

Version-Release number of selected component (if applicable):
puppet-tripleo-8.4.1-14.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. cp -r /etc/pki /var/lib/config-data/puppet-generated/<subfolder>/etc/
2. docker restart <container>
3. The container gets stuck in the Restarting state

Actual results:
The Docker container is stuck in the Restarting state.

Expected results:
The Docker containers start normally.

Additional info:

Comment 3 Alex Schultz 2020-07-01 13:00:46 UTC
Please ensure that /var/lib/config-data/puppet-generated/*/etc/hosts does not exist. This is bug 1794119, whose fix was released just last week.
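That check can be scripted with a glob over the same pattern. The Python sketch below uses a hypothetical helper name, `stray_hosts_files`, and is not a TripleO tool; an empty result means no puppet-generated subfolder carries an etc/hosts file.

```python
import glob
import os


def stray_hosts_files(base="/var/lib/config-data/puppet-generated"):
    """Return puppet-generated/*/etc/hosts paths that exist (should be none)."""
    # The "*" expands to every container subfolder, like the shell pattern;
    # glob only returns paths that actually exist on disk.
    pattern = os.path.join(base, "*", "etc", "hosts")
    return sorted(glob.glob(pattern))
```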

*** This bug has been marked as a duplicate of bug 1794119 ***