Description of problem:
Running a minor update fails with a "service endpoint with name neutron_ovs_agent already exists" error. The update is not removing the old containers before creating the new ones.

How reproducible:
The customer is currently stuck at this step during an update. We have suggested renaming the container to get past the issue, as we have already encountered it in a lab environment. I have attached the logs to a private comment.
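A minimal sketch of the rename workaround mentioned above (the container name is the one from the error; substitute whichever container the error reports):

~~~
# Stop and rename the leftover container so the update can recreate it under
# the original name.
docker stop neutron_ovs_agent
docker rename neutron_ovs_agent neutron_ovs_agent_old
# Re-run the failed update step, then remove the renamed copy once it succeeds.
docker rm neutron_ovs_agent_old
~~~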
Still happening in z11:

~~~
(undercloud) [stack@director ~]$ openstack stack failures list overcloud --long | grep "service endpoint"
"stderr: /usr/bin/docker-current: Error response from daemon: service endpoint with name logrotate_crond already exists.",
"stderr: /usr/bin/docker-current: Error response from daemon: service endpoint with name neutron_dhcp already exists.",
"stderr: /usr/bin/docker-current: Error response from daemon: service endpoint with name neutron_metadata_agent already exists.",
"stderr: /usr/bin/docker-current: Error response from daemon: service endpoint with name neutron_l3_agent already exists.",
~~~

This happens with Pacemaker containers as well, which is a bit more difficult to figure out, since the update doesn't fail until you start the compute nodes and they can't reach the VIP.
Hi, I think the error here is that the container *fails* to be deleted by paunch. The problem is hard to detect because that failure doesn't cause the process to stop. Those steps are the ones we have in common with deployment, and we've seen this in other BZs. Moving this to dfg:df for further analysis, but we will need:
- an sos-report of the node where this happens;
- the complete output of the update process (the delete error should be visible there as well).
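A hedged example of collecting the requested data (the update command and flags shown are illustrative of the OSP13-era workflow and may differ in your environment):

~~~
# On the affected overcloud node: generate an sos-report.
sosreport --batch

# On the undercloud: capture the complete update output by teeing the run.
openstack overcloud update run --nodes Compute 2>&1 | tee overcloud-update-compute.log
~~~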
We believe this may be solved by applying this patch: https://review.opendev.org/#/c/704656/
Hey Luke, I extracted the src RPM and can see the container-update.py script contained in it. However, I don't see it being copied onto the filesystem anywhere, so I'm not sure how this all comes together for container-update.py to be executed and help with the issue. Are you able to provide some insight there? I know we're updating tripleo-common on Director, but ultimately this script will need to be available to the overcloud nodes or to Ansible during the update. So do we do something to put this script on the overcloud nodes during the deployment?
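A quick way to check where the script ships and whether it is present on a node (the package name and paths here are assumptions based on the comment above, not confirmed):

~~~
# Inspect the built RPM payload for the script (filename pattern is an example).
rpm -qlp openstack-tripleo-common-*.rpm | grep container-update.py
# Check whether an installed system already carries it.
rpm -ql openstack-tripleo-common | grep container-update.py
# Or search an overcloud node directly.
find / -name container-update.py 2>/dev/null
~~~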
Hey guys,

For this one, it seems Paunch is the package that needs updating. The line that needs backporting is probably this one:
https://github.com/openstack/paunch/blob/master/paunch/runner.py#L295

Prior to that change, the rm call didn't have the -f:
https://github.com/openstack/paunch/blob/stable/stein/paunch/runner.py#L290

I'm updating Paunch on my lab environment, where I am currently hitting this issue while enabling DVR, so I will update runner.py to include the -f:

~~~
def remove_container(self, container):
    self.execute([self.docker_cmd, 'stop', container], self.log)
    cmd = [self.docker_cmd, 'rm', '-f', container]
    cmd_stdout, cmd_stderr, returncode = self.execute(cmd, self.log)
    if returncode != 0:
        self.log.error('Error removing container: %s' % container)
        self.log.error(cmd_stderr)
~~~
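As a standalone illustration of what the extra -f changes (the container name and image below are placeholders, not from this environment): plain docker rm refuses to delete a container that is still running, while docker rm -f removes it regardless of its state.

~~~
# Start a throwaway container (placeholder image and name).
docker run -d --name demo_rm_force registry.example.com/some/image sleep 600
docker rm demo_rm_force      # fails: the container is still running
docker rm -f demo_rm_force   # succeeds: force-removes it whatever state it is in
~~~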
This didn't work, but looking at what paunch is doing when I run:

~~~
paunch --debug apply --file /var/lib/tripleo-config/hashed-docker-container-startup-config-step_3.json --config-id tripleo_step3 --managed-by tripleo-Controller
~~~

This reproduces the issue:

~~~
Did not find container with "['docker', 'ps', '-a', '--filter', 'label=container_name=nova_libvirt', '--format', '{{.Names}}']"
$ docker run --name nova_libvirt --label config_id=tripleo_step3 --label container_name=nova_libvirt --label managed_by=paunch --label config_data={"start_order": 1, "ulimit": ["nofile=131072", "nproc=126960"], "image": "192.168.24.1:8787/rhosp13/openstack-nova-libvirt:13.0-134", "pid": "host", "environment": ["KOLLA_CONFIG_STRATEGY=COPY_ALWAYS", "TRIPLEO_CONFIG_HASH=83064576a84887d9c3c48eb59627bce6"], "cpuset_cpus": "all", "volumes": ["/etc/hosts:/etc/hosts:ro", "/etc/localtime:/etc/localtime:ro", "/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro", "/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro", "/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro", "/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro", "/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro", "/dev/log:/dev/log", "/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro", "/etc/puppet:/etc/puppet:ro", "/var/lib/kolla/config_files/nova_libvirt.json:/var/lib/kolla/config_files/config.json:ro", "/var/lib/config-data/puppet-generated/nova_libvirt/:/var/lib/kolla/config_files/src:ro", "/etc/ceph:/var/lib/kolla/config_files/src-ceph:ro", "/lib/modules:/lib/modules:ro", "/dev:/dev", "/run:/run", "/sys/fs/cgroup:/sys/fs/cgroup", "/var/lib/nova:/var/lib/nova:shared", "/etc/libvirt:/etc/libvirt", "/var/run/libvirt:/var/run/libvirt", "/var/lib/libvirt:/var/lib/libvirt", "/var/log/containers/libvirt:/var/log/libvirt", "/var/log/libvirt/qemu:/var/log/libvirt/qemu:ro", "/var/lib/vhost_sockets:/var/lib/vhost_sockets", "/sys/fs/selinux:/sys/fs/selinux"], "net": "host", "privileged": true, "restart": "always"} --detach=true --env=KOLLA_CONFIG_STRATEGY=COPY_ALWAYS --env=TRIPLEO_CONFIG_HASH=83064576a84887d9c3c48eb59627bce6 --net=host --pid=host --ulimit=nofile=131072 --ulimit=nproc=126960 --privileged=true --restart=always --volume=/etc/hosts:/etc/hosts:ro --volume=/etc/localtime:/etc/localtime:ro --volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro --volume=/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro --volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro --volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro --volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro --volume=/dev/log:/dev/log --volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro --volume=/etc/puppet:/etc/puppet:ro --volume=/var/lib/kolla/config_files/nova_libvirt.json:/var/lib/kolla/config_files/config.json:ro --volume=/var/lib/config-data/puppet-generated/nova_libvirt/:/var/lib/kolla/config_files/src:ro --volume=/etc/ceph:/var/lib/kolla/config_files/src-ceph:ro --volume=/lib/modules:/lib/modules:ro --volume=/dev:/dev --volume=/run:/run --volume=/sys/fs/cgroup:/sys/fs/cgroup --volume=/var/lib/nova:/var/lib/nova:shared --volume=/etc/libvirt:/etc/libvirt --volume=/var/run/libvirt:/var/run/libvirt --volume=/var/lib/libvirt:/var/lib/libvirt --volume=/var/log/containers/libvirt:/var/log/libvirt --volume=/var/log/libvirt/qemu:/var/log/libvirt/qemu:ro --volume=/var/lib/vhost_sockets:/var/lib/vhost_sockets --volume=/sys/fs/selinux:/sys/fs/selinux 192.168.24.1:8787/rhosp13/openstack-nova-libvirt:13.0-134
/usr/bin/docker-current: Error response from daemon: Conflict. The container name "/nova_libvirt" is already in use by container 5bf51fcb28755c18a64077c68c02774fc2ca486da03c3c3a3ae54ad504e42844. You have to remove (or rename) that container to be able to reuse that name.. See '/usr/bin/docker-current run --help'.
Error running ['docker', 'run', '--name', 'nova_libvirt', '--label', 'config_id=tripleo_step3', '--label', 'container_name=nova_libvirt', '--label', 'managed_by=paunch', '--label', 'config_data={"start_order": 1, "ulimit": ["nofile=131072", "nproc=126960"], "image": "192.168.24.1:8787/rhosp13/openstack-nova-libvirt:13.0-134", "pid": "host", "environment": ["KOLLA_CONFIG_STRATEGY=COPY_ALWAYS", "TRIPLEO_CONFIG_HASH=83064576a84887d9c3c48eb59627bce6"], "cpuset_cpus": "all", "volumes": ["/etc/hosts:/etc/hosts:ro", "/etc/localtime:/etc/localtime:ro", "/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro", "/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro", "/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro", "/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro", "/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro", "/dev/log:/dev/log", "/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro", "/etc/puppet:/etc/puppet:ro", "/var/lib/kolla/config_files/nova_libvirt.json:/var/lib/kolla/config_files/config.json:ro", "/var/lib/config-data/puppet-generated/nova_libvirt/:/var/lib/kolla/config_files/src:ro", "/etc/ceph:/var/lib/kolla/config_files/src-ceph:ro", "/lib/modules:/lib/modules:ro", "/dev:/dev", "/run:/run", "/sys/fs/cgroup:/sys/fs/cgroup", "/var/lib/nova:/var/lib/nova:shared", "/etc/libvirt:/etc/libvirt", "/var/run/libvirt:/var/run/libvirt", "/var/lib/libvirt:/var/lib/libvirt", "/var/log/containers/libvirt:/var/log/libvirt", "/var/log/libvirt/qemu:/var/log/libvirt/qemu:ro", "/var/lib/vhost_sockets:/var/lib/vhost_sockets", "/sys/fs/selinux:/sys/fs/selinux"], "net": "host", "privileged": true, "restart": "always"}', '--detach=true', '--env=KOLLA_CONFIG_STRATEGY=COPY_ALWAYS', '--env=TRIPLEO_CONFIG_HASH=83064576a84887d9c3c48eb59627bce6', '--net=host', '--pid=host', '--ulimit=nofile=131072', '--ulimit=nproc=126960', '--privileged=true', '--restart=always', '--volume=/etc/hosts:/etc/hosts:ro', '--volume=/etc/localtime:/etc/localtime:ro', '--volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro', '--volume=/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro', '--volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro', '--volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro', '--volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro', '--volume=/dev/log:/dev/log', '--volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro', '--volume=/etc/puppet:/etc/puppet:ro', '--volume=/var/lib/kolla/config_files/nova_libvirt.json:/var/lib/kolla/config_files/config.json:ro', '--volume=/var/lib/config-data/puppet-generated/nova_libvirt/:/var/lib/kolla/config_files/src:ro', '--volume=/etc/ceph:/var/lib/kolla/config_files/src-ceph:ro', '--volume=/lib/modules:/lib/modules:ro', '--volume=/dev:/dev', '--volume=/run:/run', '--volume=/sys/fs/cgroup:/sys/fs/cgroup', '--volume=/var/lib/nova:/var/lib/nova:shared', '--volume=/etc/libvirt:/etc/libvirt', '--volume=/var/run/libvirt:/var/run/libvirt', '--volume=/var/lib/libvirt:/var/lib/libvirt', '--volume=/var/log/containers/libvirt:/var/log/libvirt', '--volume=/var/log/libvirt/qemu:/var/log/libvirt/qemu:ro', '--volume=/var/lib/vhost_sockets:/var/lib/vhost_sockets', '--volume=/sys/fs/selinux:/sys/fs/selinux', '192.168.24.1:8787/rhosp13/openstack-nova-libvirt:13.0-134']. [125]
stdout:
stderr: /usr/bin/docker-current: Error response from daemon: Conflict. The container name "/nova_libvirt" is already in use by container 5bf51fcb28755c18a64077c68c02774fc2ca486da03c3c3a3ae54ad504e42844. You have to remove (or rename) that container to be able to reuse that name.. See '/usr/bin/docker-current run --help'.
~~~

It searches for the container using the container_name label, which returns nothing:

~~~
[root@overcloud-compute-6 ~]# docker ps -a --filter label=container_name=nova_libvirt --format {{.Names}}
[root@overcloud-compute-6 ~]#
~~~

So is something wrong with the container_name label?
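A quick way to see exactly which labels a container carries, and therefore why the filter above matches nothing, is to dump the label map directly (container name taken from the output above; works for any container):

~~~
docker inspect --format '{{json .Config.Labels}}' nova_libvirt
~~~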
The container_name label seems to be working for neutron_ovs_agent for me this time:

~~~
[root@overcloud-compute-6 ~]# docker ps -a --filter label=container_name=neutron_ovs_agent --format {{.Names}}
neutron_ovs_agent
[root@overcloud-compute-6 ~]# docker inspect $(docker ps -a --filter label=container_name=neutron_ovs_agent --format {{.Names}}) | grep container_name
"container_name": "neutron_ovs_agent",
~~~

Whereas it is missing from the nova_libvirt container:

~~~
[root@overcloud-compute-6 ~]# docker inspect nova_libvirt | grep container_name
[root@overcloud-compute-6 ~]#
~~~

If I stop and rename the container and then re-run the paunch command:

~~~
paunch --debug apply --file /var/lib/tripleo-config/hashed-docker-container-startup-config-step_3.json --config-id tripleo_step3 --managed-by tripleo-Controller
~~~

it now has the label:

~~~
[root@overcloud-compute-6 ~]# docker inspect nova_libvirt | grep container_name
"container_name": "nova_libvirt",
~~~

So the issue is the missing label, but I don't yet know why.
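For anyone following along, the stop/rename/re-apply sequence described above looks roughly like this (a sketch; nova_libvirt_old is an arbitrary temporary name):

~~~
docker stop nova_libvirt
docker rename nova_libvirt nova_libvirt_old
paunch --debug apply --file /var/lib/tripleo-config/hashed-docker-container-startup-config-step_3.json --config-id tripleo_step3 --managed-by tripleo-Controller
docker inspect nova_libvirt | grep container_name   # the label should now be present
docker rm nova_libvirt_old                          # remove the renamed copy once satisfied
~~~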
A summary of this issue:

Paunch debug has been utilised in various scenarios to troubleshoot issues with containers. Additionally, in some cases, it has been used to apply hotfixed RPMs in containerized services. Paunch debug does NOT apply container labels in its current form:
https://github.com/openstack/paunch/blob/stable/queens/paunch/__init__.py#L139-L147

Consequently, any container that has been relaunched with Paunch debug will be missing the container_name label:

~~~
[root@overcloud-compute-6 ~]# docker inspect neutron_ovs_agent | grep container_name
[root@overcloud-compute-6 ~]#
~~~

We can reproduce the issue by doing essentially the same thing the overcloud deploy does, using the following command:

~~~
[root@overcloud-compute-6 ~]# paunch apply --file neutron_ovs_agent.json --config-id tripleo_step4 --managed-by tripleo-Compute
Did not find container with "['docker', 'ps', '-a', '--filter', 'label=container_name=neutron_ovs_agent', '--filter', 'label=config_id=tripleo_step4', '--format', '{{.Names}}']" - retrying without config_id
Did not find container with "['docker', 'ps', '-a', '--filter', 'label=container_name=neutron_ovs_agent', '--format', '{{.Names}}']"
Error running ['docker', 'run', '--name', 'neutron_ovs_agent', '--label', 'config_id=tripleo_step4', '--label', 'container_name=neutron_ovs_agent', '--label', 'managed_by=paunch', '--label', 'config_data={"start_order": 10, "ulimit": ["nofile=16384"], "healthcheck": {"test": "/openstack/healthcheck 5672"}, "image": "registry.access.redhat.com/rhosp13/openstack-neutron-openvswitch-agent:latest-hotfix-bz1819055", "pid": "host", "environment": ["KOLLA_CONFIG_STRATEGY=COPY_ALWAYS", "TRIPLEO_CONFIG_HASH=d358ea7a3382ea0902b0f073dcb3a5ba"], "volumes": ["/etc/hosts:/etc/hosts:ro", "/etc/localtime:/etc/localtime:ro", "/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro", "/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro", "/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro", "/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro", "/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro", "/dev/log:/dev/log", "/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro", "/etc/puppet:/etc/puppet:ro", "/var/log/containers/neutron:/var/log/neutron", "/var/lib/kolla/config_files/neutron_ovs_agent.json:/var/lib/kolla/config_files/config.json:ro", "/var/lib/config-data/puppet-generated/neutron/:/var/lib/kolla/config_files/src:ro", "/var/lib/docker-config-scripts/neutron_ovs_agent_launcher.sh:/neutron_ovs_agent_launcher.sh:ro", "/lib/modules:/lib/modules:ro", "/run/openvswitch:/run/openvswitch"], "net": "host", "privileged": true, "restart": "always"}', '--detach=true', '--env=KOLLA_CONFIG_STRATEGY=COPY_ALWAYS', '--env=TRIPLEO_CONFIG_HASH=d358ea7a3382ea0902b0f073dcb3a5ba', '--net=host', '--pid=host', '--ulimit=nofile=16384', '--health-cmd=/openstack/healthcheck 5672', '--privileged=true', '--restart=always', '--volume=/etc/hosts:/etc/hosts:ro', '--volume=/etc/localtime:/etc/localtime:ro', '--volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro', '--volume=/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro', '--volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro', '--volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro', '--volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro', '--volume=/dev/log:/dev/log', '--volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro', '--volume=/etc/puppet:/etc/puppet:ro', '--volume=/var/log/containers/neutron:/var/log/neutron', '--volume=/var/lib/kolla/config_files/neutron_ovs_agent.json:/var/lib/kolla/config_files/config.json:ro', '--volume=/var/lib/config-data/puppet-generated/neutron/:/var/lib/kolla/config_files/src:ro', '--volume=/var/lib/docker-config-scripts/neutron_ovs_agent_launcher.sh:/neutron_ovs_agent_launcher.sh:ro', '--volume=/lib/modules:/lib/modules:ro', '--volume=/run/openvswitch:/run/openvswitch', '--cpuset-cpus=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31', 'registry.access.redhat.com/rhosp13/openstack-neutron-openvswitch-agent:latest-hotfix-bz1819055']. [125]
stdout:
stderr: /usr/bin/docker-current: Error response from daemon: Conflict. The container name "/neutron_ovs_agent" is already in use by container 2d0cb2627111762795972de33394eb7c46719ec0b66dcf6e8d4405f67443d60e. You have to remove (or rename) that container to be able to reuse that name.. See '/usr/bin/docker-current run --help'.
~~~

Here, we can see that the issue has been reproduced. If we take the json file for neutron_ovs_agent:

~~~
paunch debug --file /var/lib/tripleo-config/hashed-docker-container-startup-config-step_4.json --container neutron_ovs_agent --action dump-json > neutron_ovs_agent.json
~~~

stop and delete the existing container:

~~~
docker stop neutron_ovs_agent neutron_l3_agent && docker rm neutron_ovs_agent neutron_l3_agent
~~~

and re-launch it using paunch apply with our json file:

~~~
[root@overcloud-compute-6 ~]# paunch apply --file neutron_ovs_agent-hotfix-bz1819055/neutron_ovs_agent.json --config-id tripleo_step4 --managed-by tripleo-Compute
Did not find container with "['docker', 'ps', '-a', '--filter', 'label=container_name=neutron_ovs_agent', '--filter', 'label=config_id=tripleo_step4', '--format', '{{.Names}}']" - retrying without config_id
Did not find container with "['docker', 'ps', '-a', '--filter', 'label=container_name=neutron_ovs_agent', '--format', '{{.Names}}']"
~~~

we can see that the label has now been applied:

~~~
[root@overcloud-compute-6 ~]# docker inspect neutron_ovs_agent | grep container_name
"container_name": "neutron_ovs_agent",
~~~

So this whole Bugzilla is likely related to the use of Paunch debug and the missing line for adding labels:
https://github.com/openstack/paunch/blob/stable/queens/paunch/builder/compose1.py#L81

We can fix it by adding that line to this file, in this section:
https://github.com/openstack/paunch/blob/stable/queens/paunch/__init__.py#L139-L147

For anyone experiencing this issue, you can stop and delete the container, which will allow the deployment to succeed and will also fix the label issue going forward. You will only hit the issue if a container has been rebuilt with paunch debug. I'll update the solution article as well.
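As a rough follow-up check (not from the original comment), a loop like the one below can flag containers that lack the container_name label and were therefore most likely relaunched with paunch debug at some point; it simply greps each container's docker inspect output for the label key:

~~~
for c in $(docker ps -a --format '{{.Names}}'); do
  # Containers created by a normal paunch apply carry a container_name label;
  # ones rebuilt with "paunch debug" do not, so the grep finds nothing for them.
  docker inspect "$c" | grep -q '"container_name"' || echo "missing container_name label: $c"
done
~~~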
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2718