Bug 1791165 - Undercloud installation failure in step "Start containers for step 1" due to "cannot chdir: Permission denied"
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 16.0 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: RHOS Maint
QA Contact: Sasha Smolyak
URL:
Whiteboard:
Depends On: 1793598
Blocks:
 
Reported: 2020-01-15 04:21 UTC by Shatadru Bandyopadhyay
Modified: 2023-09-07 21:30 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1793598 (view as bug list)
Environment:
Last Closed: 2020-04-16 19:57:28 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-28315 0 None None None 2023-09-07 21:30:46 UTC

Description Shatadru Bandyopadhyay 2020-01-15 04:21:12 UTC
Description of problem:

Undercloud installation failure in step "Start containers for step 1" due to "cannot chdir: Permission denied"

Version-Release number of selected component (if applicable):


How reproducible:
Always


Actual results:
Failing with "cannot chdir: Permission denied"

Expected results:
Undercloud install should work

Additional info:

# openstack undercloud install fails with the error below.

The last log output is:
TASK [Debug output for task: Start containers for step 1] *************************************************************************************************
2020-01-08 16:50:47.678 57001 WARNING tripleoclient.v1.tripleo_deploy.Deploy [  ] Wednesday 08 January 2020  16:50:47 +0100 (0:00:02.275)       0:21:21.291 *****
2020-01-08 16:50:47.726 57001 WARNING tripleoclient.v1.tripleo_deploy.Deploy [  ] fatal: [hostname]: FAILED! => {
2020-01-08 16:50:47.727 57001 WARNING tripleoclient.v1.tripleo_deploy.Deploy [  ]     "failed_when_result": true,
2020-01-08 16:50:47.727 57001 WARNING tripleoclient.v1.tripleo_deploy.Deploy [  ]     "outputs.stdout_lines | default([]) | union(outputs.stderr_lines | default([]))": [
2020-01-08 16:50:47.727 57001 WARNING tripleoclient.v1.tripleo_deploy.Deploy [  ]         "cannot chdir: Permission denied",

In the messages file, this failure corresponds to the following invocation:
Jan  8 16:50:46 hostname python3[71223]: ansible-paunch Invoked with config=/var/lib/tripleo-config/hashed-container-startup-config-step_1.json config_id=['tripleo_step1'] action=apply container_cli=podman container_log_stdout_path=/var/log/containers/stdouts healthcheck_disabled=False managed_by=tripleo-Undercloud debug=False log_file=/var/log/paunch.log

The paunch.log shows these errors:

2020-01-08 16:50:46.834 71223 WARNING paunch [  ] Did not find container with "['podman', 'ps', '-a', '--filter', 'label=container_name=memcached', '--filter', 'label=config_id=tripleo_step1', '--format', '{{.Names}}']" - retrying without config_id
2020-01-08 16:50:46.872 71223 WARNING paunch [  ] Did not find container with "['podman', 'ps', '-a', '--filter', 'label=container_name=memcached', '--format', '{{.Names}}']"
2020-01-08 16:50:46.910 71223 ERROR paunch [  ] Error running ['podman', 'create', '--name', 'memcached', '--label', 'config_id=tripleo_step1', '--label', 'container_name=memcached', '--label', 'managed_by=tripleo-Undercloud', '--label', 'config_data={"command": ["/bin/bash", "-c", "source /etc/sysconfig/memcached; /usr/bin/memcached -p ${PORT} -u ${USER} -m ${CACHESIZE} -c ${MAXCONN} $OPTIONS"], "healthcheck": {"test": "/openstack/healthcheck"}, "image": "hostname.ctlplane.localdomain:8787/rhosp-beta/openstack-memcached:16.0-62", "net": "host", "privileged": false, "restart": "always", "start_order": 0, "volumes": ["/etc/hosts:/etc/hosts:ro", "/etc/localtime:/etc/localtime:ro", "/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro", "/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro", "/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro", "/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro", "/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro", "/dev/log:/dev/log", "/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro", "/etc/puppet:/etc/puppet:ro", "/var/lib/config-data/memcached/etc/sysconfig/memcached:/etc/sysconfig/memcached:ro"]}', '--conmon-pidfile=/var/run/memcached.pid', '--detach=true', '--log-driver', 'k8s-file', '--log-opt', 'path=/var/log/containers/stdouts/memcached.log', '--net=host', '--privileged=false', '--volume=/etc/hosts:/etc/hosts:ro', '--volume=/etc/localtime:/etc/localtime:ro', '--volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro', '--volume=/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro', '--volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro', '--volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro', '--volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro', '--volume=/dev/log:/dev/log', '--volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro', 
'--volume=/etc/puppet:/etc/puppet:ro', '--volume=/var/lib/config-data/memcached/etc/sysconfig/memcached:/etc/sysconfig/memcached:ro', '--cpuset-cpus=0,1,2,3', 'hostname.ctlplane.localdomain:8787/rhosp-beta/openstack-memcached:16.0-62', '/bin/bash', '-c', 'source /etc/sysconfig/memcached; /usr/bin/memcached -p ${PORT} -u ${USER} -m ${CACHESIZE} -c ${MAXCONN} $OPTIONS']. [1]

2020-01-08 16:50:46.910 71223 ERROR paunch [  ] stdout:
2020-01-08 16:50:46.911 71223 ERROR paunch [  ] stderr: cannot chdir: Permission denied
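When triaging a long paunch.log, it can help to pull out only the stderr payloads of ERROR lines. The following is a minimal sketch that assumes the line layout seen in the excerpt above (date, time, PID, level, "paunch [  ]", message). The function name stderr_messages and the regular expression are illustrative and are not part of paunch.

```python
import re

# Assumed layout, taken from the excerpt above:
# "<date> <time> <pid> <LEVEL> paunch [  ] <message>"
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) paunch \[\s*\] (?P<message>.*)$"
)


def stderr_messages(lines):
    """Return the non-empty 'stderr:' payloads from ERROR-level paunch lines."""
    found = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if not match or match.group("level") != "ERROR":
            continue
        message = match.group("message")
        if message.startswith("stderr:"):
            payload = message[len("stderr:"):].strip()
            if payload:
                found.append(payload)
    return found


# The two lines quoted above, used as sample input.
sample = [
    "2020-01-08 16:50:46.910 71223 ERROR paunch [  ] stdout:",
    "2020-01-08 16:50:46.911 71223 ERROR paunch [  ] stderr: cannot chdir: Permission denied",
]
print(stderr_messages(sample))
```

To run it against a real file, pass an open file object, for example stderr_messages(open("/var/log/paunch.log")).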


Things tried so far:

- Set SELinux to Permissive.
- Set 'become: true' in the playbook that was failing.
- Cleaned up older containers and images.

Comment 2 Alex Schultz 2020-01-15 16:32:10 UTC
Can you have the customer run 'sudo podman system migrate' and rerun the install? We're not currently able to reproduce this issue. Can the customer provide any additional details about the initial configuration of the undercloud? Was any specific system hardening or other configuration applied prior to attempting the install?

Can the customer also check the file permissions in /var/lib/containers/ and /var/lib/containers/storage/? Specifically, they can run "sudo find /var/lib/containers/storage/{overlay-layers,overlay-images,overlay-containers,mounts,libpod,tmp} -ls"
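As a complement to the find command, a short script can list the mode and ownership of a directory and each of its ancestors, since chdir fails with "Permission denied" when search (execute) permission is missing on any component of the path. This is only a diagnostic sketch: the helper name describe_path_chain is hypothetical, and nothing in this report confirms that directory permissions are the cause here.

```python
import os
import stat


def describe_path_chain(path):
    """Return (directory, mode string, uid, gid) for path and every ancestor, root first."""
    current = os.path.abspath(path)
    chain = []
    while True:
        chain.append(current)
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent

    rows = []
    for directory in reversed(chain):
        try:
            info = os.stat(directory)
        except OSError as exc:
            # Missing or inaccessible component: record the reason instead of failing.
            rows.append((directory, "unreadable: " + str(exc.strerror), None, None))
            continue
        rows.append((directory, stat.filemode(info.st_mode), info.st_uid, info.st_gid))
    return rows


if __name__ == "__main__":
    # Path taken from the comment above; run as root to avoid false negatives.
    for row in describe_path_chain("/var/lib/containers/storage"):
        print(row)
```

A directory whose mode string lacks "x" for the relevant user, group, or other class would be worth a closer look.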

Comment 3 Alex Schultz 2020-01-15 16:39:23 UTC
Also, see https://bugzilla.redhat.com/show_bug.cgi?id=1768355 which sounds similar to this error.

Comment 5 Alex Schultz 2020-01-20 21:35:13 UTC
I haven't been able to reproduce the issue with a clean install. However, I was given access to a box showing the issue, and we attempted an upgrade to podman 1.6, which is what OSP16 GA will use. This appears to have cleared up the issue, but I'm continuing to try to reproduce it. Unfortunately, that means the RC might not work for some folks at this time. I'm still trying to track down specifically what is causing the issue, as I've seen it work on a retry or if you manually pull the containers.

Comment 6 Srinivas Atmakuri 2020-01-21 05:55:56 UTC
Hi,

Another customer is also facing a similar issue on RHOSP-15.

Comment 8 Alex Schultz 2020-01-21 16:05:11 UTC
Since this is affecting both OSP15 and the OSP16 RC, it's likely an issue with podman 1.4, which we currently ship. I believe 1.6 will be coming out in the next few weeks, which may address this issue. I'll raise a bz against podman to see if we can get further details. There doesn't appear to be anything that OSP can do about this at the moment.
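To see quickly whether a host is still on the older podman series, the output of 'podman --version' can be compared against 1.6. The sketch below only parses a version string. The helper names and the patch-level sample strings are hypothetical, and the 1.6 threshold simply reflects the version mentioned in this thread, not a confirmed fix boundary.

```python
import re


def parse_podman_version(text):
    """Extract (major, minor, patch) from text such as 'podman version 1.4.2'."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if not match:
        raise ValueError("no version number found in: %r" % text)
    # A missing patch component is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def is_at_least(version, minimum=(1, 6, 0)):
    """Return True when version is the assumed minimum (1.6.0) or newer."""
    return version >= minimum


# Hypothetical sample strings; real input would come from 'podman --version'.
for sample in ("podman version 1.4.2", "podman version 1.6.4"):
    version = parse_podman_version(sample)
    print(sample, "->", version, "ok" if is_at_least(version) else "older than 1.6")
```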

Comment 9 Alex Schultz 2020-04-16 19:57:28 UTC
Closing this out, as we have shipped podman 1.6 and I haven't seen any issues like this since. If this is still an issue, please reopen this bug and provide updated logs.

