Description of problem: Trying to test OSP Director 13 controller update from rhosp-release 13.0.7 to 13.0.11 and ran into below issue "Running container: nova_api_ensure_cell0_database_url", "$ docker ps -a --filter label=container_name=nova_api_ensure_cell0_database_url --filter label=config_id=tripleo_step3 --format {{.Names}}", "Did not find container with \"['docker', 'ps', '-a', '--filter', 'label=container_name=nova_api_ensure_cell0_database_url', '--filter', 'label=config_id=tripleo_step3', '--format', '{{.Names}}']\" - retrying without config_id", "$ docker ps -a --filter label=container_name=nova_api_ensure_cell0_database_url --format {{.Names}}", "nova_api_ensure_cell0_database_url", "$ docker run --name nova_api_ensure_cell0_database_url --label config_id=tripleo_step3 --label container_name=nova_api_ensure_cell0_database_url --label managed_by=paunch --label config_data={\"start_order\": 3, \"image\": \"registry.access.redhat.com/rhosp13/openstack-nova-api:13.0-114\", \"environment\": [\"TRIPLEO_CONFIG_HASH=249fea517d1efae4911e08838277b246\"], \"command\": \"/usr/bin/bootstrap_host_exec nova_api /nova_api_ensure_cell0_database_url.sh\", \"user\": \"root\", \"volumes\": [\"/etc/hosts:/etc/hosts:ro\", \"/etc/localtime:/etc/localtime:ro\", \"/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro\", \"/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro\", \"/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro\", \"/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro\", \"/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro\", \"/dev/log:/dev/log\", \"/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro\", \"/etc/puppet:/etc/puppet:ro\", \"/var/log/containers/nova:/var/log/nova\", \"/var/log/containers/httpd/nova-api:/var/log/httpd\", \"/var/lib/config-data/nova/etc/my.cnf.d/tripleo.cnf:/etc/my.cnf.d/tripleo.cnf:ro\", \"/var/lib/config-data/nova/etc/nova/:/etc/nova/:ro\", 
\"/var/log/containers/nova:/var/log/nova\", \"/var/lib/config-data/puppet-generated/nova/:/var/lib/kolla/config_files/src:ro\", \"/var/lib/docker-config-scripts/nova_api_ensure_cell0_database_url.sh:/nova_api_ensure_cell0_database_url.sh:ro\"], \"net\": \"host\", \"detach\": false} --env=TRIPLEO_CONFIG_HASH=249fea517d1efae4911e08838277b246 --net=host --user=root --volume=/etc/hosts:/etc/hosts:ro --volume=/etc/localtime:/etc/localtime:ro --volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro --volume=/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro --volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro --volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro --volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro --volume=/dev/log:/dev/log --volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro --volume=/etc/puppet:/etc/puppet:ro --volume=/var/log/containers/nova:/var/log/nova --volume=/var/log/containers/httpd/nova-api:/var/log/httpd --volume=/var/lib/config-data/nova/etc/my.cnf.d/tripleo.cnf:/etc/my.cnf.d/tripleo.cnf:ro --volume=/var/lib/config-data/nova/etc/nova/:/etc/nova/:ro --volume=/var/log/containers/nova:/var/log/nova --volume=/var/lib/config-data/puppet-generated/nova/:/var/lib/kolla/config_files/src:ro --volume=/var/lib/docker-config-scripts/nova_api_ensure_cell0_database_url.sh:/nova_api_ensure_cell0_database_url.sh:ro --cpuset-cpus=0,1,2,3 registry.access.redhat.com/rhosp13/openstack-nova-api:13.0-114 /usr/bin/bootstrap_host_exec nova_api /nova_api_ensure_cell0_database_url.sh", "/usr/bin/docker-current: Error response from daemon: Conflict. The container name \"/nova_api_ensure_cell0_database_url\" is already in use by container 580df6343e9af347fdf157e1f00fe37e0155c02ed368263ce8fc08466fcf7824. 
You have to remove (or rename) that container to be able to reuse that name..", "See '/usr/bin/docker-current run --help'.", "Error running ['docker', 'run', '--name', u'nova_api_ensure_cell0_database_url', '--label', 'config_id=tripleo_step3', '--label', 'container_name=nova_api_ensure_cell0_database_url', '--label', 'managed_by=paunch', '--label', 'config_data={\"start_order\": 3, \"image\": \"registry.access.redhat.com/rhosp13/openstack-nova-api:13.0-114\", \"environment\": [\"TRIPLEO_CONFIG_HASH=249fea517d1efae4911e08838277b246\"], \"command\": \"/usr/bin/bootstrap_host_exec nova_api /nova_api_ensure_cell0_database_url.sh\", \"user\": \"root\", \"volumes\": [\"/etc/hosts:/etc/hosts:ro\", \"/etc/localtime:/etc/localtime:ro\", \"/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro\", \"/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro\", \"/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro\", \"/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro\", \"/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro\", \"/dev/log:/dev/log\", \"/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro\", \"/etc/puppet:/etc/puppet:ro\", \"/var/log/containers/nova:/var/log/nova\", \"/var/log/containers/httpd/nova-api:/var/log/httpd\", \"/var/lib/config-data/nova/etc/my.cnf.d/tripleo.cnf:/etc/my.cnf.d/tripleo.cnf:ro\", \"/var/lib/config-data/nova/etc/nova/:/etc/nova/:ro\", \"/var/log/containers/nova:/var/log/nova\", \"/var/lib/config-data/puppet-generated/nova/:/var/lib/kolla/config_files/src:ro\", \"/var/lib/docker-config-scripts/nova_api_ensure_cell0_database_url.sh:/nova_api_ensure_cell0_database_url.sh:ro\"], \"net\": \"host\", \"detach\": false}', '--env=TRIPLEO_CONFIG_HASH=249fea517d1efae4911e08838277b246', '--net=host', '--user=root', '--volume=/etc/hosts:/etc/hosts:ro', '--volume=/etc/localtime:/etc/localtime:ro', '--volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro', 
'--volume=/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro', '--volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro', '--volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro', '--volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro', '--volume=/dev/log:/dev/log', '--volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro', '--volume=/etc/puppet:/etc/puppet:ro', '--volume=/var/log/containers/nova:/var/log/nova', '--volume=/var/log/containers/httpd/nova-api:/var/log/httpd', '--volume=/var/lib/config-data/nova/etc/my.cnf.d/tripleo.cnf:/etc/my.cnf.d/tripleo.cnf:ro', '--volume=/var/lib/config-data/nova/etc/nova/:/etc/nova/:ro', '--volume=/var/log/containers/nova:/var/log/nova', '--volume=/var/lib/config-data/puppet-generated/nova/:/var/lib/kolla/config_files/src:ro', '--volume=/var/lib/docker-config-scripts/nova_api_ensure_cell0_database_url.sh:/nova_api_ensure_cell0_database_url.sh:ro', '--cpuset-cpus=0,1,2,3', 'registry.access.redhat.com/rhosp13/openstack-nova-api:13.0-114', '/usr/bin/bootstrap_host_exec', 'nova_api', '/nova_api_ensure_cell0_database_url.sh']. [125]", "stderr: /usr/bin/docker-current: Error response from daemon: Conflict. The container name \"/nova_api_ensure_cell0_database_url\" is already in use by container 580df6343e9af347fdf157e1f00fe37e0155c02ed368263ce8fc08466fcf7824. You have to remove (or rename) that container to be able to reuse that name.."

Version-Release number of selected component (if applicable):

13.0.7 overcloud deployment versions:
openstack-tripleo-heat-templates-8.3.1-54.el7ost
rhosp-director-images-13.0-20190627.1.el7ost
rhosp-director-images-ipa-13.0-20190627.1.el7ost

Update versions:
openstack-tripleo-heat-templates-8.4.1-42.el7ost

How reproducible:

When trying to update overcloud from rhosp-release 13.0.7 to 13.0.11.

Steps to Reproduce:
1.
Install a fresh undercloud 13.0.11 and swap openstack-tripleo-heat-templates with openstack-tripleo-heat-templates-8.3.1-54.el7ost.
2. Install rhosp-director-images-13.0-20190627.1.el7ost and rhosp-director-images-ipa-13.0-20190627.1.el7ost and update glance with the 13.0.7 overcloud images.
3. Deploy the 13.0.7 overcloud using the overcloud_images.yaml [1] mentioned in additional info and running "openstack overcloud deploy --templates -e /home/stack/templates/overcloud_images.yaml --ntp-server <ntp-ip>".
4. Once the deployment is up, update openstack-tripleo-heat-templates to openstack-tripleo-heat-templates-8.4.1-42.el7ost.
5. Run the below commands:
   a. source stackrc
   b. sudo openstack overcloud container image prepare --namespace=registry.access.redhat.com/rhosp13 --prefix=openstack- --tag-from-label {version}-{release} --output-env-file=/home/stack/templates/overcloud_images.yaml
   c. openstack overcloud update prepare --templates -e /home/stack/templates/overcloud_images.yaml --ntp-server <ntp-ip>
   d. openstack overcloud update run --nodes Controller

Actual results:

Conflict. The container name \"/nova_api_ensure_cell0_database_url\" is already in use by container 580df6343e9af347fdf157e1f00fe37e0155c02ed368263ce8fc08466fcf7824. You have to remove (or rename) that container to be able to reuse that name.."

Expected results:

Successfully update the overcloud controller nodes from 13.0.7 to 13.0.11.

Additional info:

It looks like in openstack-tripleo-heat-templates-8.3.1-54.el7ost the "nova_api_ensure_cell0_database_url" container starts at "step: 5", whereas in openstack-tripleo-heat-templates-8.4.1-42.el7ost the start of the "nova_api_ensure_cell0_database_url" container moved to "step: 3". Because of this, during the controller update, "nova_api_ensure_cell0_database_url" from 13.0.7 was still present when the update reached step 3, and when the latest openstack-tripleo-heat-templates tried to start "nova_api_ensure_cell0_database_url" at step 3, it caused the issue: Conflict.
The container name \"/nova_api_ensure_cell0_database_url\" is already in use by container 580df6343e9af347fdf157e1f00fe37e0155c02ed368263ce8fc08466fcf7824. You have to remove (or rename) that container to be able to reuse that name.."

[1] overcloud_images.yaml:

parameter_defaults:
  DockerAodhApiImage: registry.access.redhat.com/rhosp13/openstack-aodh-api:13.0-76
  DockerAodhConfigImage: registry.access.redhat.com/rhosp13/openstack-aodh-api:13.0-76
  DockerAodhEvaluatorImage: registry.access.redhat.com/rhosp13/openstack-aodh-evaluator:13.0-76
  DockerAodhListenerImage: registry.access.redhat.com/rhosp13/openstack-aodh-listener:13.0-75
  DockerAodhNotifierImage: registry.access.redhat.com/rhosp13/openstack-aodh-notifier:13.0-76
  DockerCeilometerCentralImage: registry.access.redhat.com/rhosp13/openstack-ceilometer-central:13.0-73
  DockerCeilometerComputeImage: registry.access.redhat.com/rhosp13/openstack-ceilometer-compute:13.0-75
  DockerCeilometerConfigImage: registry.access.redhat.com/rhosp13/openstack-ceilometer-central:13.0-73
  DockerCeilometerNotificationImage: registry.access.redhat.com/rhosp13/openstack-ceilometer-notification:13.0-75
  DockerCinderApiImage: registry.access.redhat.com/rhosp13/openstack-cinder-api:13.0-79
  DockerCinderConfigImage: registry.access.redhat.com/rhosp13/openstack-cinder-api:13.0-79
  DockerCinderSchedulerImage: registry.access.redhat.com/rhosp13/openstack-cinder-scheduler:13.0-81
  DockerCinderVolumeImage: registry.access.redhat.com/rhosp13/openstack-cinder-volume:13.0-79
  DockerClustercheckConfigImage: registry.access.redhat.com/rhosp13/openstack-mariadb:13.0-77
  DockerClustercheckImage: registry.access.redhat.com/rhosp13/openstack-mariadb:13.0-77
  DockerCrondConfigImage: registry.access.redhat.com/rhosp13/openstack-cron:13.0-82
  DockerCrondImage: registry.access.redhat.com/rhosp13/openstack-cron:13.0-82
  DockerGlanceApiConfigImage: registry.access.redhat.com/rhosp13/openstack-glance-api:13.0-78
  DockerGlanceApiImage: registry.access.redhat.com/rhosp13/openstack-glance-api:13.0-78
  DockerGnocchiApiImage: registry.access.redhat.com/rhosp13/openstack-gnocchi-api:13.0-76
  DockerGnocchiConfigImage: registry.access.redhat.com/rhosp13/openstack-gnocchi-api:13.0-76
  DockerGnocchiMetricdImage: registry.access.redhat.com/rhosp13/openstack-gnocchi-metricd:13.0-77
  DockerGnocchiStatsdImage: registry.access.redhat.com/rhosp13/openstack-gnocchi-statsd:13.0-76
  DockerHAProxyConfigImage: registry.access.redhat.com/rhosp13/openstack-haproxy:13.0-79
  DockerHAProxyImage: registry.access.redhat.com/rhosp13/openstack-haproxy:13.0-79
  DockerIscsidConfigImage: registry.access.redhat.com/rhosp13/openstack-iscsid:13.0-74
  DockerIscsidImage: registry.access.redhat.com/rhosp13/openstack-iscsid:13.0-74
  DockerKeystoneConfigImage: registry.access.redhat.com/rhosp13/openstack-keystone:13.0-74
  DockerKeystoneImage: registry.access.redhat.com/rhosp13/openstack-keystone:13.0-74
  DockerMemcachedConfigImage: registry.access.redhat.com/rhosp13/openstack-memcached:13.0-76
  DockerMemcachedImage: registry.access.redhat.com/rhosp13/openstack-memcached:13.0-76
  DockerMysqlClientConfigImage: registry.access.redhat.com/rhosp13/openstack-mariadb:13.0-77
  DockerMysqlConfigImage: registry.access.redhat.com/rhosp13/openstack-mariadb:13.0-77
  DockerMysqlImage: registry.access.redhat.com/rhosp13/openstack-mariadb:13.0-77
  DockerNeutronDHCPImage: registry.access.redhat.com/rhosp13/openstack-neutron-dhcp-agent:13.0-85
  DockerNeutronL3AgentImage: registry.access.redhat.com/rhosp13/openstack-neutron-l3-agent:13.0-83
  DockerNeutronMetadataImage: registry.access.redhat.com/rhosp13/openstack-neutron-metadata-agent:13.0-86
  DockerNovaApiImage: registry.access.redhat.com/rhosp13/openstack-nova-api:13.0-84
  DockerNovaComputeImage: registry.access.redhat.com/rhosp13/openstack-nova-compute:13.0-92
  DockerNovaConductorImage: registry.access.redhat.com/rhosp13/openstack-nova-conductor:13.0-82
  DockerNovaConfigImage: registry.access.redhat.com/rhosp13/openstack-nova-api:13.0-84
  DockerNovaConsoleauthImage: registry.access.redhat.com/rhosp13/openstack-nova-consoleauth:13.0-82
  DockerNovaLibvirtConfigImage: registry.access.redhat.com/rhosp13/openstack-nova-compute:13.0-92
  DockerNovaLibvirtImage: registry.access.redhat.com/rhosp13/openstack-nova-libvirt:13.0-95
  DockerNovaMetadataImage: registry.access.redhat.com/rhosp13/openstack-nova-api:13.0-84
  DockerNovaPlacementConfigImage: registry.access.redhat.com/rhosp13/openstack-nova-placement-api:13.0-83
  DockerNovaPlacementImage: registry.access.redhat.com/rhosp13/openstack-nova-placement-api:13.0-83
  DockerNovaSchedulerImage: registry.access.redhat.com/rhosp13/openstack-nova-scheduler:13.0-84
  DockerNovaVncProxyImage: registry.access.redhat.com/rhosp13/openstack-nova-novncproxy:13.0-85
  DockerOpenvswitchImage: registry.access.redhat.com/rhosp13/openstack-neutron-openvswitch-agent:13.0-84
  DockerPankoApiImage: registry.access.redhat.com/rhosp13/openstack-panko-api:13.0-76
  DockerPankoConfigImage: registry.access.redhat.com/rhosp13/openstack-panko-api:13.0-76
  DockerRabbitmqConfigImage: registry.access.redhat.com/rhosp13/openstack-rabbitmq:13.0-78
  DockerRabbitmqImage: registry.access.redhat.com/rhosp13/openstack-rabbitmq:13.0-78
  DockerRedisConfigImage: registry.access.redhat.com/rhosp13/openstack-redis:13.0-79
  DockerRedisImage: registry.access.redhat.com/rhosp13/openstack-redis:13.0-79
  DockerSwiftAccountImage: registry.access.redhat.com/rhosp13/openstack-swift-account:13.0-74
  DockerSwiftConfigImage: registry.access.redhat.com/rhosp13/openstack-swift-proxy-server:13.0-76
  DockerSwiftContainerImage: registry.access.redhat.com/rhosp13/openstack-swift-container:13.0-77
  DockerSwiftObjectImage: registry.access.redhat.com/rhosp13/openstack-swift-object:13.0-74
  DockerSwiftProxyImage: registry.access.redhat.com/rhosp13/openstack-swift-proxy-server:13.0-76
  DockerNeutronSriovImage: registry.access.redhat.com/rhosp13/openstack-neutron-sriov-agent:13.0-83
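The step-order mismatch described in Additional info can be sketched in a few lines. This is an illustrative simulation only (the dict and helper names are made up, not paunch or docker code): a one-shot container created at step 5 by the old templates still holds its name, so docker refuses to create it again when the new templates run it at step 3.

```python
# Illustrative simulation of the failure mode; not paunch or docker code.

def start_containers(existing, config_id, names):
    """Try to create each named container; like docker, refuse to reuse
    a name that is still held by a leftover container."""
    conflicts = []
    for name in names:
        if name in existing:
            conflicts.append(
                'Conflict. The container name "/%s" is already in use.' % name)
        else:
            existing[name] = config_id
    return conflicts

# With the 13.0.7 templates the container was created at step 5 and, on
# this environment, never removed afterwards:
containers = {"nova_api_ensure_cell0_database_url": "tripleo_step5"}

# The newer templates try to create the same name at step 3:
errors = start_containers(containers, "tripleo_step3",
                          ["nova_api_ensure_cell0_database_url"])
print(errors[0])
# Conflict. The container name "/nova_api_ensure_cell0_database_url" is already in use.
```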
Created attachment 1670327 [details] ansible-error.json
Hey guys, any update on this? Thanks.
Hi, so we've seen that error before and it was usually due to the container not being properly removed in the first place. > It looks like in openstack-tripleo-heat-templates-8.3.1-54.el7ost, > "nova_api_ensure_cell0_database_url" container starts at "step: 5", > whereas in openstack-tripleo-heat-templates-8.4.1-42.el7ost starting > of "nova_api_ensure_cell0_database_url" container moved to "step: > 3". This is change id I7b5f6e0a2c8ba77fd575cf1a1003a1553f96efff and it's only in z7, z6 doesn't have it and z8 has already the next change (switch to step3) > Due to which during the controller update, > "nova_api_ensure_cell0_database_url" from 13.0.7 was still running > when step is at 3 and when latest openstack-tripleo-heat-temaplates > are trying to start "nova_api_ensure_cell0_database_url" at step 3, > this is causing the issue: Conflict. The container name > \"/nova_api_ensure_cell0_database_url\" is already in use by > container > 580df6343e9af347fdf157e1f00fe37e0155c02ed368263ce8fc08466fcf7824. You > have to remove (or rename) that container to be able to reuse that > name.." Those containers should be ephemeral. My guess here is that it was not properly removed by docker and that causes that issue. As I said I've seen that before, and the problem is that issue during delete doesn't cause an error, so it stay unnoticed until we play the update. It's only a guess because: - we need more logs: - complete output of the udpate run - sos-report of the overcloud node where the error happened - sos-report of the undercloud. - the way it was tested is not supported as far as I can tell. In all cases, I've triggered a job that does the update from z7 to z11[1] in order to have a reference point. If it's really caused by I7b5f6e0a2c8ba77fd575cf1a1003a1553f96efff we will know right away as this will reproduce systematically. If it's a removal issue then only your log will tell. 
I'll have the results on Monday; in the meantime, if you can provide the above logs, that would be helpful. Thanks,
Created attachment 1672132 [details] update run output
Hi,

Thanks for the response. 1813642-ansible.log (attached to this bug) has the complete output of the update run. As for the sos-reports of the overcloud and undercloud after the failure, we uploaded them to the FTP server "dropbox.redhat.com" (path /incoming) as sosreport-overcloud-controller-0-1813642-2020-03-20-orbwngp.tar.xz (overcloud controller) and sosreport-undercloud13-1813642-2020-03-20-zdbbguh.tar.xz (undercloud).

Thanks,
Sai
Hey Sofer Athlan-Guyot, any update on this?
Hi, so I was able to reproduce this in a lab. I will be able to analyse further tomorrow and hopefully come up with a definitive answer. Regards,
Hi,

So here is the chain of events:

- update tripleo heat templates to openstack-tripleo-heat-templates-11.3.2-0.20200324120625.c3a8eb4.el8ost
- which switches nova_api_ensure_cell0_database_url from step5 to step3

In the logs:

- in the paunch logs we have a series of deletions:

2020-03-24 23:07:07.996 55867 DEBUG paunch [ ] $ docker rm nova_api_ensure_default_cell
2020-03-24 23:07:08.051 55867 DEBUG paunch [ ] nova_api_ensure_default_cell
2020-03-24 23:07:08.051 55867 DEBUG paunch [ ]
2020-03-24 23:07:08.052 55867 DEBUG paunch [ ] $ docker inspect --type container --format {{index .Config.Labels "config_data"}} nova_api_map_cell0
2020-03-24 23:07:08.116 55867 DEBUG paunch [ ] {"start_order": 1, "command": "/usr/bin/bootstrap_host_exec nova_api su nova -s /bin/bash -c '/usr/bin/nova-manage cell_v2 map_cell0'", "user": "root", "volumes": ["/etc/hosts:/etc/hosts:ro", "/etc/localtime:/etc/localtime:ro", "/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro", "/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro", "/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro", "/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro", "/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro", "/dev/log:/dev/log", "/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro", "/etc/puppet:/etc/puppet:ro", "/var/log/containers/nova:/var/log/nova", "/var/log/containers/httpd/nova-api:/var/log/httpd", "/var/lib/config-data/nova/etc/my.cnf.d/tripleo.cnf:/etc/my.cnf.d/tripleo.cnf:ro", "/var/lib/config-data/nova/etc/nova/:/etc/nova/:ro"], "image": "192.168.24.1:8787/rhosp13/openstack-nova-api:2019-07-29-grades", "detach": false, "net": "host"}
2020-03-24 23:07:08.117 55867 DEBUG paunch [ ]
2020-03-24 23:07:08.125 55867 DEBUG paunch [ ] Deleting container (changed config_data): nova_api_map_cell0
2020-03-24 23:07:08.125 55867 DEBUG paunch [ ] $ docker stop nova_api_map_cell0
2020-03-24 23:07:08.180 55867 DEBUG paunch [ ] nova_api_map_cell0
2020-03-24 23:07:08.181 55867 DEBUG paunch [ ]
2020-03-24 23:07:08.181 55867 DEBUG paunch [ ] $ docker rm nova_api_map_cell0
2020-03-24 23:07:08.242 55867 DEBUG paunch [ ] nova_api_map_cell0

but nothing about nova_api_ensure_cell0_database_url being deleted, and no error whatsoever. Then it tries to run it and fails, because it hasn't been deleted in the first place:

2020-03-24 23:08:13.269 55867 DEBUG paunch [ ] $ docker run --name nova_api_ensure_cell0_database_url --label config_id=tripleo_step3 --label container_name=nova_api_ensure_cell0_database_url --label managed_by=paunch --label config_data={"start_order": 3, "image": "192.168.24.1:8787/rh-osbs/rhosp13-openstack-nova-api:20200323.1", "environment": ["TRIPLEO_CONFIG_HASH=2bce43a73f636ff057f68b65bdd839cf"], "command": "/usr/bin/bootstrap_host_exec nova_api /nova_api_ensure_cell0_database_url.sh", "user": "root", "volumes": ["/etc/hosts:/etc/hosts:ro", "/etc/localtime:/etc/localtime:ro", "/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro", "/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro", "/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro", "/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro", "/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro", "/dev/log:/dev/log", "/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro", "/etc/puppet:/etc/puppet:ro", "/var/log/containers/nova:/var/log/nova", "/var/log/containers/httpd/nova-api:/var/log/httpd", "/var/lib/config-data/nova/etc/my.cnf.d/tripleo.cnf:/etc/my.cnf.d/tripleo.cnf:ro", "/var/lib/config-data/nova/etc/nova/:/etc/nova/:ro", "/var/log/containers/nova:/var/log/nova", "/var/lib/config-data/puppet-generated/nova/:/var/lib/kolla/config_files/src:ro", "/var/lib/docker-config-scripts/nova_api_ensure_cell0_database_url.sh:/nova_api_ensure_cell0_database_url.sh:ro"], "net": "host", "detach": false} --env=TRIPLEO_CONFIG_HASH=2bce43a73f636ff057f68b65bdd839cf --net=host --user=root --volume=/etc/hosts:/etc/hosts:ro --volume=/etc/localtime:/etc/localtime:ro --volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro --volume=/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro --volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro --volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro --volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro --volume=/dev/log:/dev/log --volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro --volume=/etc/puppet:/etc/puppet:ro --volume=/var/log/containers/nova:/var/log/nova --volume=/var/log/containers/httpd/nova-api:/var/log/httpd --volume=/var/lib/config-data/nova/etc/my.cnf.d/tripleo.cnf:/etc/my.cnf.d/tripleo.cnf:ro --volume=/var/lib/config-data/nova/etc/nova/:/etc/nova/:ro --volume=/var/log/containers/nova:/var/log/nova --volume=/var/lib/config-data/puppet-generated/nova/:/var/lib/kolla/config_files/src:ro --volume=/var/lib/docker-config-scripts/nova_api_ensure_cell0_database_url.sh:/nova_api_ensure_cell0_database_url.sh:ro --cpuset-cpus=0,1,2,3,4,5,6,7 192.168.24.1:8787/rh-osbs/rhosp13-openstack-nova-api:20200323.1 /usr/bin/bootstrap_host_exec nova_api /nova_api_ensure_cell0_database_url.sh
2020-03-24 23:08:13.315 55867 DEBUG paunch [ ] /usr/bin/docker-current: Error response from daemon: Conflict. The container name "/nova_api_ensure_cell0_database_url" is already in use by container d959dfe22826f52363ce4f1d9dcea454b4331ebaf9bf1f63b6b43f0a7f3d5427. You have to remove (or rename) that container to be able to reuse that name..

So the interesting bit in this is "config_id". It is set rightfully to tripleo_step3, but ... this information is encoded in the container label. So on the live environment we have:

[root@controller-2 ~]# docker inspect nova_api_ensure_cell0_database_url|jq '.[]|.Config.Labels.config_id'
"tripleo_step5"

So I wonder if this is why paunch doesn't try to delete it in the first place.
The workaround you've applied is correct (renaming the container). You could as well delete it without risk, because in the end that's what we should do (detect that it has moved to step3 and ensure the delete action happens there as well). I'm discussing the next course of action right now. Thanks,
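The cleanup decision described above can be sketched as a tiny helper. This is hypothetical illustration code, not paunch: it compares the config_id label reported by `docker inspect` with the config_id of the step currently being applied, and emits a `docker rm` for any leftover container.

```python
# Hypothetical helper mirroring the workaround; not paunch code.

def leftover_cleanup(name, labels, current_config_id):
    """Return the docker command that would clear the leftover
    container, or None when the labels already match."""
    if labels.get("config_id") == current_config_id:
        return None
    return ["docker", "rm", name]

# Labels seen on the live controller (config_id still at step 5):
labels = {"managed_by": "paunch", "config_id": "tripleo_step5"}
cmd = leftover_cleanup("nova_api_ensure_cell0_database_url",
                       labels, "tripleo_step3")
print(" ".join(cmd))  # docker rm nova_api_ensure_cell0_database_url
```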
Sofer, I found the problem in the env you gave to me.

Two things:

- the version of Paunch was outdated: python-paunch-2.5.0-4.el7ost.noarch, while the latest is python-paunch-2.5.3-3.el7ost.noarch, which includes the patches that fix https://bugzilla.redhat.com/show_bug.cgi?id=1790792 (the same bug as you hit presently, I think).
- a backport in Paunch was missing (not sure you need it in your case, but it's good to have it): https://code.engineering.redhat.com/gerrit/189510

So now I wonder why the paunch rpm wasn't updated on the overclouds?
Hi Emilien,

(In reply to Emilien Macchi from comment #10)
> Sofer, I found the problem in the env you gave to me.
>
> 2 things:
>
> - the version of Paunch was outdated, python-paunch-2.5.0-4.el7ost.noarch
> while the last one is python-paunch-2.5.3-3.el7ost.noarch and it includes
> the patches that fix https://bugzilla.redhat.com/show_bug.cgi?id=1790792
> (same bug as you hit presently I think).

Oh, you've worked on controller-0, while the update started with controller-2. On ctl-2:

rpm -qa | grep paunch
python-paunch-2.5.3-3.el7ost.noarch

and it was updated during the update:

/var/log/messages:Mar 24 22:43:41 controller-2 yum[935175]: Updated: python-paunch-2.5.3-3.el7ost.noarch

and before paunch was triggered (see the timeline in #9). So paunch was 2.5.3-3 at the time of the error.

> - A backport in Paunch was missing (not sure you need it in your case, but
> it's good to have it): https://code.engineering.redhat.com/gerrit/189510

Let's try to work on this again today; not sure this patch would help.

Thanks,
Hi Sofer, any update on this?
Sofer (or Emilien), can we get some updates to this BZ at your earliest convenience?
Hi folks,

The last time I looked at this BZ, I realized that a backport into Paunch was missing. I went ahead and did it, and today I built python-paunch-2.5.3-4.el7ost, which should include all the needed backports in OSP13. I would like us to retry this scenario and pull the latest paunch to see if we can reproduce the issue. Like I said in previous comments, a bunch of issues related to this BZ were fixed in a paunch version newer than what was on controller-2 when the update failed.
Thanks, Emilien, for the update.

1. I see the latest python-paunch available today is python-paunch-2.5.3-3.el7ost in the rhel-7-server-openstack-13-rpms repository. When do you think we will have the python-paunch-2.5.3-4.el7ost RPM published in the repo?

2. Is there any ETA on whoever is trying to reproduce this issue to see if it fixes all the issues or not? As this is a blocker for us, a reasonable ETA so we can plan accordingly on our side would be helpful.

Thanks.
(In reply to Sunny Verma from comment #16)
> Thanks Emillien for update.
>
> 1. I see latest python-paunch available today is
> python-paunch-2.5.3-3.el7ost on rhel-7-server-openstack-13-rpms repository.
>
> When do you think we will have python-paunch-2.5.3-4.el7ost RPM published in
> the repo?

AFAIK OSP13z12 GA is scheduled for June 3rd.

> 2. Is there any ETA on whoever is trying to reproduce this issue to see if
> it fixes all the issues or not? As this is a blocker of us, a reasonable ETA
> so we can plan accordingly from our side would be helpful.
>
> Thanks.

I don't know on my side. Sofer, please let me know if DF needs to help on that one.
Hi,

So we tested paunch at the version mentioned, but the problem still happened. Based on the analysis in c#9, we are going to:

1. make a first Queens-only patch whose sole purpose is to make sure that nova_api_ensure_cell0_database_url is deleted before reaching the paunch stage;
2. make a more long-term solution where those containers are indeed ephemeral and get destroyed after being run.

1. is implemented in the new review attached to that bz.

Thanks,
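Item 2 above (making one-shot bootstrap containers truly ephemeral) can be sketched with the docker CLI's real `--rm` flag, which removes a container as soon as its process exits so its name can never collide on a later run. The helper below is hypothetical illustration code, not tripleo code; it only builds the argv.

```python
# Hypothetical sketch of an ephemeral bootstrap run; not tripleo code.

def one_shot_argv(image, name, command):
    """Build a `docker run` argv for a self-removing bootstrap task:
    `--rm` deletes the container when the command exits."""
    return ["docker", "run", "--rm", "--name", name, image] + command

argv = one_shot_argv(
    "registry.access.redhat.com/rhosp13/openstack-nova-api:13.0-114",
    "nova_api_ensure_cell0_database_url",
    ["/usr/bin/bootstrap_host_exec", "nova_api",
     "/nova_api_ensure_cell0_database_url.sh"],
)
print(argv[2])  # --rm
```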
*** Bug 1837872 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2718