Description of problem:
Overcloud nodes always try to pull docker images from the 192.168.24.1 registry (even if the undercloud IP address is 192.168.0.1):

(undercloud) [stack@undercloud75 ~]$ grep -v ^# undercloud.conf | grep -v ^$
[DEFAULT]
local_ip = 192.168.0.1/24
undercloud_public_host = 192.168.0.2
undercloud_admin_host = 192.168.0.3
undercloud_service_certificate = /etc/pki/instack-certs/undercloud.pem
undercloud_service_certificate = /etc/pki/instack-certs/undercloud.pem
masquerade_network = 192.168.0.0/24
undercloud_public_vip = 192.168.0.2
undercloud_admin_vip = 192.168.0.3
container_images_file = /home/stack/containers-prepare-parameter.yaml
undercloud_ntp_servers = clock.redhat.com
docker_insecure_registries = docker-registry.engineering.redhat.com,brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888
[auth]
[ctlplane-subnet]
cidr = 192.168.0.0/24
dhcp_start = 192.168.0.5
dhcp_end = 192.168.0.24
inspection_iprange = 192.168.0.100,192.168.0.120
gateway = 192.168.0.1
masquerade = true

(undercloud) [stack@undercloud75 ~]$ cat containers-prepare-parameter.yaml
# Generated with the following on 2018-09-11T16:52:20.938778
#
#   openstack tripleo container image prepare default --output-env-file /home/stack/containers-prepare-parameter.yaml --local-push-destination
#
parameter_defaults:
  ContainerImagePrepare:
  - push_destination: true
    set:
      ceph_image: rhceph
      ceph_namespace: brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888
      ceph_tag: 3-12
      name_prefix: openstack-
      name_suffix: ''
      namespace: docker-registry.engineering.redhat.com/rhosp14
      neutron_driver: null
      openshift_base_image: ose
      openshift_cockpit_image: registry-console
      openshift_cockpit_namespace: registry.access.redhat.com/openshift3
      openshift_cockpit_tag: v3.9
      openshift_etcd_image: etcd
      openshift_etcd_namespace: registry.access.redhat.com/rhel7
      openshift_etcd_tag: 2018-09-06.1
      openshift_gluster_block_image: rhgs-gluster-block-prov-rhel7
      openshift_gluster_image: rhgs-server-rhel7
      openshift_gluster_namespace: registry.access.redhat.com/rhgs3
      openshift_gluster_tag: latest
      openshift_heketi_image: rhgs-volmanager-rhel7
      openshift_heketi_namespace: registry.access.redhat.com/rhgs3
      openshift_heketi_tag: latest
      openshift_namespace: registry.access.redhat.com/openshift3
      openshift_tag: v3.9
      tag: 2018-09-06.1

(undercloud) [stack@undercloud75 ~]$ cat overcloud_deploy.sh
#!/bin/bash
openstack overcloud deploy \
--templates /usr/share/openstack-tripleo-heat-templates \
--libvirt-type kvm \
--ntp-server clock.redhat.com \
-e /home/stack/network-environment.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
-e /home/stack/ceph.yaml \
-e /home/stack/virt/debug.yaml \
-e /home/stack/virt/nodes_data.yaml \
-e /home/stack/virt/docker-images.yaml \
-e /home/stack/dns/dns.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ssl/enable-tls.yaml \
-e /home/stack/public_vip.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-ip.yaml \
-e /home/stack/inject-trust-anchor-hiera.yaml \
-e /home/stack/containers-prepare-parameter.yaml \

Version-Release number of selected component (if applicable):
openstack-tripleo-common-containers-9.3.1-0.20180831204016.bb0582a.el7ost.noarch
python2-tripleo-common-9.3.1-0.20180831204016.bb0582a.el7ost.noarch
openstack-tripleo-common-9.3.1-0.20180831204016.bb0582a.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy overcloud with an undercloud that has local_ip different than 192.168.24.1

Actual results:
Overcloud deployment fails because overcloud nodes try to pull images from 192.168.24.1:8787.

Snippet from /var/log/messages:

Sep 11 19:52:11 overcloud-controller-0 dockerd-current: time="2018-09-11T19:52:11.769883177-04:00" level=warning msg="Error getting v2 registry: Get https://192.168.24.1:8787/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"
Sep 11 19:52:11 overcloud-controller-0 dockerd-current: time="2018-09-11T19:52:11.770359547-04:00" level=error msg="Attempting next endpoint for pull after error: Get https://192.168.24.1:8787/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"
Sep 11 19:52:11 overcloud-controller-0 dockerd-current: time="2018-09-11T19:52:11.770398653-04:00" level=debug msg="Trying to pull 192.168.24.1:8787/rhosp12/openstack-keystone-docker from http://192.168.24.1:8787 v2"
Sep 11 19:52:11 overcloud-controller-0 dockerd-current: time="2018-09-11T19:52:11.781106940-04:00" level=warning msg="Error getting v2 registry: Get https://192.168.24.1:8787/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"
Sep 11 19:52:11 overcloud-controller-0 dockerd-current: time="2018-09-11T19:52:11.781179509-04:00" level=error msg="Attempting next endpoint for pull after error: Get https://192.168.24.1:8787/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"
Sep 11 19:52:11 overcloud-controller-0 dockerd-current: time="2018-09-11T19:52:11.781215121-04:00" level=debug msg="Trying to pull 192.168.24.1:8787/rhosp12/openstack-redis-docker from http://192.168.24.1:8787 v2"
Sep 11 19:52:11 overcloud-controller-0 dockerd-current: time="2018-09-11T19:52:11.781393741-04:00" level=warning msg="Error getting v2 registry: Get https://192.168.24.1:8787/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"
Sep 11 19:52:11 overcloud-controller-0 dockerd-current: time="2018-09-11T19:52:11.781442808-04:00" level=error msg="Attempting next endpoint for pull after error: Get https://192.168.24.1:8787/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"
Sep 11 19:52:11 overcloud-controller-0 dockerd-current: time="2018-09-11T19:52:11.781474833-04:00" level=debug msg="Trying to pull 192.168.24.1:8787/rhosp12/openstack-iscsid-docker from http://192.168.24.1:8787 v2"

Expected results:
Overcloud nodes pull images from 192.168.0.1, which is the undercloud's IP address.

Additional info:
The workaround is to edit your containers-prepare-parameter.yaml to set push_destination: 192.168.0.1:8787

Is this workaround enough to remove the blocker status?

The prepare code doesn't have access to the undercloud.conf, so it currently assumes that the registry is bound to the same IP as br-ctlplane[1]

Could you please attach the output of "ip addr" to help with finding a better approach?
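For clarity, the suggested edit would presumably look something like the following (a minimal sketch using the addresses from this report; only the push_destination line changes, the generated `set:` block stays as-is and is abbreviated here):

parameter_defaults:
  ContainerImagePrepare:
  - push_destination: 192.168.0.1:8787   # explicit undercloud registry instead of `true`
    set:
      namespace: docker-registry.engineering.redhat.com/rhosp14
      name_prefix: openstack-
      tag: 2018-09-06.1
      # ... remaining keys unchanged from the generated containers-prepare-parameter.yaml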
(In reply to Steve Baker from comment #2)
> The workaround is to edit your containers-prepare-parameter.yaml to set
> push_destination: 192.168.0.1:8787
>
> Is this workaround enough to remove the blocker status?
>
> The prepare code doesn't have access to the undercloud.conf, so it currently
> assumes that the registry is bound to the same IP as br-ctlplane[1]
>
> Could you please attach the output of "ip addr" to help with finding a
> better approach?

Last time I tried push_destination: 192.168.0.1:8787 as a workaround it didn't work.

As a workaround for now we run this before the overcloud deployment:

openstack tripleo container image prepare -e /home/stack/containers-prepare-parameter.yaml --roles-file /usr/share/openstack-tripleo-heat-templates/roles_data.yaml --output-env-file /home/stack/virt/docker-images.yaml

This is the output of ip a:

[stack@undercloud75 ~]$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:d0:59:54 brd ff:ff:ff:ff:ff:ff
    inet 10.19.184.224/24 brd 10.19.184.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 2620:52:0:13b8:5054:ff:fed0:5954/64 scope global mngtmpaddr dynamic
       valid_lft 2591822sec preferred_lft 604622sec
    inet6 fe80::5054:ff:fed0:5954/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovs-system state UP group default qlen 1000
    link/ether 52:54:00:cf:ee:73 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5054:ff:fecf:ee73/64 scope link
       valid_lft forever preferred_lft forever
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether b6:d2:63:f9:0d:f5 brd ff:ff:ff:ff:ff:ff
5: br-ctlplane: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 52:54:00:cf:ee:73 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.1/24 brd 192.168.0.255 scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet 192.168.0.3/32 scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet 192.168.0.2/32 scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fecf:ee73/64 scope link
       valid_lft forever preferred_lft forever
6: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:1b:d3:c7:2c brd ff:ff:ff:ff:ff:ff
    inet 172.31.0.1/24 scope global docker0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:1bff:fed3:c72c/64 scope link
       valid_lft forever preferred_lft forever
45: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether f6:b3:96:61:bc:40 brd ff:ff:ff:ff:ff:ff
There must be something else going on here, since there is no mention of 192.168.24.1 in that "ip a" output.
I tried to reproduce this issue today and wasn't able to. I am going to close the bug; re-open if needed.
Lately I am facing a similar issue, where the overcloud deployment tries to pull the images from the undercloud's public IP. That IP is present and pingable, but the ceph-ansible logs complain about not being able to pull the images:

  stderr: |-
    Trying to pull 10.101.52.7:8787/rhceph-beta/rhceph-4-rhel8:4-4...Failed
    error pulling image "10.101.52.7:8787/rhceph-beta/rhceph-4-rhel8:4-4": unable to pull 10.101.52.7:8787/rhceph-beta/rhceph-4-rhel8:4-4: unable to pull image: Error determining manifest MIME type for docker://10.101.52.7:8787/rhceph-beta/rhceph-4-rhel8:4-4: pinging docker registry returned: Get https://10.101.52.7:8787/v2/: http: server gave HTTP response to HTTPS client
  stderr_lines: <omitted>
  stdout: ''

Meanwhile the images can be searched via the other br-ctlplane IP, 10.101.52.5. `ip -4 addr` returns:

br-ctlplane: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    inet 10.101.52.5/24 brd 10.101.52.255 scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet 10.101.52.6/32 scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet 10.101.52.7/32 scope global br-ctlplane
       valid_lft forever preferred_lft forever

Any idea where else to check for this, or is it expected behavior?

PS: I'm going to attempt the workaround to see the difference.. in case I'm following the wrong bug, let me know.. :-)
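One way to narrow this down (illustrative commands, not taken from the report above) is to check, from the failing node, which br-ctlplane addresses the registry answers on and which <ip>:8787 entries podman treats as insecure (plain-HTTP) registries:

# Does the registry answer plain HTTP on each br-ctlplane address?
for ip in 10.101.52.5 10.101.52.6 10.101.52.7; do
  printf '%s -> ' "$ip"
  curl -s -o /dev/null -w '%{http_code}\n' "http://$ip:8787/v2/"
done

# Which 8787 registry entries are configured for podman on this node?
grep -A3 '8787' /etc/containers/registries.conf

The "server gave HTTP response to HTTPS client" error usually means the client attempted TLS because that exact host:port is not listed as an insecure registry, which would fit a mismatch between the address the prepare step picked (10.101.52.7) and the one written into the insecure-registry configuration (10.101.52.5).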
Forgot to mention: the issue is reported from RHOSP15, i.e. Stein, with:

openstack-tripleo-common-containers-10.8.2-0.20190913130445.4754dea.el8ost.noarch
python3-tripleo-common-10.8.2-0.20190913130445.4754dea.el8ost.noarch
openstack-tripleo-common-10.8.2-0.20190913130445.4754dea.el8ost.noarch

parameter_defaults:
  ContainerImagePrepare:
  - tag_from_label: "{version}-{release}"
    push_destination: true
    excludes:
    - nova-api
    set:
      ceph_image: rhceph-4-rhel8
      ceph_namespace: registry.redhat.io/rhceph-beta
      ceph_tag: 4-4
      namespace: registry.redhat.io/rhosp15-rhel8
      name_prefix: openstack-
      name_suffix: ''
      neutron_driver: ovn
      tag: latest
      barbican_tag: latest
      barbican_api_image: barbican-api
      barbican_keystone_image: barbican-keystone
      barbican_worker_image: barbican-worker
  - push_destination: true
    includes:
    - nova-api
    set:
      namespace: registry.redhat.io/rhosp15-rhel8
      tag: 15.0-69
  ContainerImageRegistryCredentials:
    registry.redhat.io:
The IP detection logic did not change in Stein; it still looks for the "first" address on br-ctlplane[1]. I wonder if the order of addresses returned differs from what the `ip addr` command shows (maybe the order in python is undetermined, but the ip command sorts by address?).

In Train (OSP-16) this logic changes to a hostname entry lookup[2], so this shouldn't be an issue in the future.

[1] https://opendev.org/openstack/tripleo-common/src/branch/stable/stein/tripleo_common/image/image_uploader.py#L101
[2] https://opendev.org/openstack/tripleo-common/src/branch/stable/train/tripleo_common/image/image_uploader.py#L101
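To illustrate why the ordering matters, here is a rough sketch of that kind of "first address on br-ctlplane" lookup. This is a paraphrase for illustration only, not a verbatim copy of the tripleo-common code at [1], and it assumes the netifaces library is available:

import netifaces

def guess_undercloud_registry(port=8787):
    """Return host:port for the local registry, assuming it is bound to
    whatever IPv4 address is reported first on br-ctlplane."""
    addr = 'localhost'
    if 'br-ctlplane' in netifaces.interfaces():
        addrs = netifaces.ifaddresses('br-ctlplane').get(netifaces.AF_INET, [])
        if addrs:
            # Takes the first entry as returned by the library; this is not
            # guaranteed to be the primary /24 address that `ip addr` lists
            # first, e.g. when the bridge also carries /32 VIP addresses.
            addr = addrs[0]['addr']
    return '%s:%s' % (addr, port)

print(guess_undercloud_registry())

If the /32 VIP entries happen to be returned before the primary /24 address, a lookup like this would explain ending up with 10.101.52.7:8787 instead of 10.101.52.5:8787 in the RHOSP15 case above.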