Bug 1627946 - Overcloud nodes always try to pull docker images from 192.168.24.1 registry (even if undercloud IP address is 192.168.0.1)
Summary: Overcloud nodes always try to pull docker images from 192.168.24.1 registry (even if undercloud IP address is 192.168.0.1)
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 14.0 (Rocky)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Steve Baker
QA Contact: Gurenko Alex
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-09-11 23:55 UTC by Marius Cornea
Modified: 2019-12-19 20:52 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-09-27 17:57:57 UTC
Target Upstream Version:
Embargoed:



Description Marius Cornea 2018-09-11 23:55:05 UTC
Description of problem:
Overcloud nodes always try to pull docker images from the 192.168.24.1 registry (even if the undercloud IP address is 192.168.0.1):

(undercloud) [stack@undercloud75 ~]$ grep -v ^# undercloud.conf  | grep -v ^$ 
[DEFAULT]
local_ip = 192.168.0.1/24
undercloud_public_host = 192.168.0.2
undercloud_admin_host = 192.168.0.3
undercloud_service_certificate = /etc/pki/instack-certs/undercloud.pem
undercloud_service_certificate = /etc/pki/instack-certs/undercloud.pem
masquerade_network = 192.168.0.0/24
undercloud_public_vip = 192.168.0.2
undercloud_admin_vip = 192.168.0.3
container_images_file = /home/stack/containers-prepare-parameter.yaml
undercloud_ntp_servers = clock.redhat.com
docker_insecure_registries = docker-registry.engineering.redhat.com,brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888
[auth]
[ctlplane-subnet]
cidr = 192.168.0.0/24
dhcp_start = 192.168.0.5
dhcp_end = 192.168.0.24
inspection_iprange = 192.168.0.100,192.168.0.120
gateway = 192.168.0.1
masquerade = true


(undercloud) [stack@undercloud75 ~]$ cat containers-prepare-parameter.yaml 
# Generated with the following on 2018-09-11T16:52:20.938778
#
#   openstack tripleo container image prepare default --output-env-file /home/stack/containers-prepare-parameter.yaml --local-push-destination
#

parameter_defaults:
  ContainerImagePrepare:
  - push_destination: true
    set:
      ceph_image: rhceph
      ceph_namespace: brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888
      ceph_tag: 3-12
      name_prefix: openstack-
      name_suffix: ''
      namespace: docker-registry.engineering.redhat.com/rhosp14
      neutron_driver: null
      openshift_base_image: ose
      openshift_cockpit_image: registry-console
      openshift_cockpit_namespace: registry.access.redhat.com/openshift3
      openshift_cockpit_tag: v3.9
      openshift_etcd_image: etcd
      openshift_etcd_namespace: registry.access.redhat.com/rhel7
      openshift_etcd_tag: 2018-09-06.1
      openshift_gluster_block_image: rhgs-gluster-block-prov-rhel7
      openshift_gluster_image: rhgs-server-rhel7
      openshift_gluster_namespace: registry.access.redhat.com/rhgs3
      openshift_gluster_tag: latest
      openshift_heketi_image: rhgs-volmanager-rhel7
      openshift_heketi_namespace: registry.access.redhat.com/rhgs3
      openshift_heketi_tag: latest
      openshift_namespace: registry.access.redhat.com/openshift3
      openshift_tag: v3.9
      tag: 2018-09-06.1
    


(undercloud) [stack@undercloud75 ~]$ cat overcloud_deploy.sh 
#!/bin/bash

openstack overcloud deploy \
--templates /usr/share/openstack-tripleo-heat-templates \
--libvirt-type kvm \
--ntp-server clock.redhat.com \
-e /home/stack/network-environment.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
-e /home/stack/ceph.yaml \
-e /home/stack/virt/debug.yaml \
-e /home/stack/virt/nodes_data.yaml \
-e /home/stack/virt/docker-images.yaml \
-e /home/stack/dns/dns.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ssl/enable-tls.yaml \
-e /home/stack/public_vip.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-ip.yaml \
-e /home/stack/inject-trust-anchor-hiera.yaml \
-e /home/stack/containers-prepare-parameter.yaml \


Version-Release number of selected component (if applicable):
openstack-tripleo-common-containers-9.3.1-0.20180831204016.bb0582a.el7ost.noarch
python2-tripleo-common-9.3.1-0.20180831204016.bb0582a.el7ost.noarch
openstack-tripleo-common-9.3.1-0.20180831204016.bb0582a.el7ost.noarch


How reproducible:
100%

Steps to Reproduce:
1. Deploy the overcloud with an undercloud whose local_ip is different from 192.168.24.1.


Actual results:
Overcloud deployment fails because overcloud nodes try to pull images from 192.168.24.1:8787:

snippet from /var/log/messages:

Sep 11 19:52:11 overcloud-controller-0 dockerd-current: time="2018-09-11T19:52:11.769883177-04:00" level=warning msg="Error getting v2 registry: Get https://192.168.24.1:8787/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"
Sep 11 19:52:11 overcloud-controller-0 dockerd-current: time="2018-09-11T19:52:11.770359547-04:00" level=error msg="Attempting next endpoint for pull after error: Get https://192.168.24.1:8787/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"
Sep 11 19:52:11 overcloud-controller-0 dockerd-current: time="2018-09-11T19:52:11.770398653-04:00" level=debug msg="Trying to pull 192.168.24.1:8787/rhosp12/openstack-keystone-docker from http://192.168.24.1:8787 v2"
Sep 11 19:52:11 overcloud-controller-0 dockerd-current: time="2018-09-11T19:52:11.781106940-04:00" level=warning msg="Error getting v2 registry: Get https://192.168.24.1:8787/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"
Sep 11 19:52:11 overcloud-controller-0 dockerd-current: time="2018-09-11T19:52:11.781179509-04:00" level=error msg="Attempting next endpoint for pull after error: Get https://192.168.24.1:8787/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"
Sep 11 19:52:11 overcloud-controller-0 dockerd-current: time="2018-09-11T19:52:11.781215121-04:00" level=debug msg="Trying to pull 192.168.24.1:8787/rhosp12/openstack-redis-docker from http://192.168.24.1:8787 v2"
Sep 11 19:52:11 overcloud-controller-0 dockerd-current: time="2018-09-11T19:52:11.781393741-04:00" level=warning msg="Error getting v2 registry: Get https://192.168.24.1:8787/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"
Sep 11 19:52:11 overcloud-controller-0 dockerd-current: time="2018-09-11T19:52:11.781442808-04:00" level=error msg="Attempting next endpoint for pull after error: Get https://192.168.24.1:8787/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"
Sep 11 19:52:11 overcloud-controller-0 dockerd-current: time="2018-09-11T19:52:11.781474833-04:00" level=debug msg="Trying to pull 192.168.24.1:8787/rhosp12/openstack-iscsid-docker from http://192.168.24.1:8787 v2"


Expected results:
Overcloud nodes pull images from 192.168.0.1, which is the undercloud's IP address.

Additional info:

Comment 2 Steve Baker 2018-09-26 21:56:30 UTC
The workaround is to edit your containers-prepare-parameter.yaml to set push_destination: 192.168.0.1:8787
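For example, a minimal sketch of the edit, based on the containers-prepare-parameter.yaml shown in the description (only push_destination changes from true to an explicit host:port; a trimmed subset of the generated keys is shown):

parameter_defaults:
  ContainerImagePrepare:
  - push_destination: "192.168.0.1:8787"
    set:
      namespace: docker-registry.engineering.redhat.com/rhosp14
      name_prefix: openstack-
      name_suffix: ''
      tag: 2018-09-06.1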

Is this workaround enough to remove the blocker status?

The prepare code doesn't have access to the undercloud.conf, so it currently assumes that the registry is bound to the same IP as br-ctlplane [1].

Could you please attach the output of "ip addr" to help with finding a better approach?

Comment 3 Marius Cornea 2018-09-26 22:14:53 UTC
(In reply to Steve Baker from comment #2)
> The workaround is to edit your containers-prepare-parameter.yaml to set
> push_destination: 192.168.0.1:8787
> 
> Is this workaround enough to remove the blocker status?
> 
> The prepare code doesn't have access to the undercloud.conf, so it currently
> assumes that the registry is bound to the same IP as br-ctlplane[1]
> 
> Could you please attach the output of "ip addr" to help with finding a
> better approach?

The last time I tried push_destination: 192.168.0.1:8787 as a workaround it didn't work. As a workaround for now, we run this before the overcloud deployment:
openstack tripleo container image prepare -e /home/stack/containers-prepare-parameter.yaml --roles-file /usr/share/openstack-tripleo-heat-templates/roles_data.yaml --output-env-file /home/stack/virt/docker-images.yaml


This is the output of ip a:

[stack@undercloud75 ~]$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:d0:59:54 brd ff:ff:ff:ff:ff:ff
    inet 10.19.184.224/24 brd 10.19.184.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 2620:52:0:13b8:5054:ff:fed0:5954/64 scope global mngtmpaddr dynamic 
       valid_lft 2591822sec preferred_lft 604622sec
    inet6 fe80::5054:ff:fed0:5954/64 scope link 
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovs-system state UP group default qlen 1000
    link/ether 52:54:00:cf:ee:73 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5054:ff:fecf:ee73/64 scope link 
       valid_lft forever preferred_lft forever
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether b6:d2:63:f9:0d:f5 brd ff:ff:ff:ff:ff:ff
5: br-ctlplane: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 52:54:00:cf:ee:73 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.1/24 brd 192.168.0.255 scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet 192.168.0.3/32 scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet 192.168.0.2/32 scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fecf:ee73/64 scope link 
       valid_lft forever preferred_lft forever
6: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 02:42:1b:d3:c7:2c brd ff:ff:ff:ff:ff:ff
    inet 172.31.0.1/24 scope global docker0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:1bff:fed3:c72c/64 scope link 
       valid_lft forever preferred_lft forever
45: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether f6:b3:96:61:bc:40 brd ff:ff:ff:ff:ff:ff

Comment 4 Steve Baker 2018-09-27 03:35:43 UTC
There must be something else going on here, since there is no mention of 192.168.24.1 in that "ip a" output.

Comment 5 Marius Cornea 2018-09-27 17:57:57 UTC
I tried to reproduce this issue today and wasn't able to. I am going to close the bug and re-open it if needed.

Comment 6 Salman Khan 2019-12-19 11:57:47 UTC
Lately I have been facing a similar issue: the overcloud deployment tries to pull images from the undercloud's public IP, which is present and pingable, but the ceph-ansible logs complain that the images cannot be pulled:

stderr: |-
    Trying to pull 10.101.52.7:8787/rhceph-beta/rhceph-4-rhel8:4-4...Failed
    error pulling image "10.101.52.7:8787/rhceph-beta/rhceph-4-rhel8:4-4": unable to pull 10.101.52.7:8787/rhceph-beta/rhceph-4-rhel8:4-4: unable to pull image: Error determining manifest MIME type for docker://10.101.52.7:8787/rhceph-beta/rhceph-4-rhel8:4-4: pinging docker registry returned: Get https://10.101.52.7:8787/v2/: http: server gave HTTP response to HTTPS client
  stderr_lines: <omitted>
  stdout: ''


Meanwhile, the images can be searched from the other br-ctlplane IP, 10.101.52.5. `ip -4 addr` returns:

br-ctlplane: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    inet 10.101.52.5/24 brd 10.101.52.255 scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet 10.101.52.6/32 scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet 10.101.52.7/32 scope global br-ctlplane
       valid_lft forever preferred_lft forever


Any idea where else to check for this, or is this expected behavior?
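One thing worth checking (a guess on my part, not confirmed): that error usually means the client attempted HTTPS against a plain-HTTP registry, i.e. 10.101.52.7:8787 may be missing from the nodes' insecure-registry list. A sketch of the corresponding /etc/containers/registries.conf entry (v1 TOML format, as used on RHEL 8):

[registries.insecure]
registries = ['10.101.52.5:8787', '10.101.52.7:8787']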

PS: I'm going to attempt the workaround to see the difference. Let me know if I'm following the wrong bug. :-)

Comment 7 Salman Khan 2019-12-19 12:01:06 UTC
Forgot to mention: the issue is reported on RHOSP15, i.e. Stein.

The package versions and the ContainerImagePrepare configuration in use:

openstack-tripleo-common-containers-10.8.2-0.20190913130445.4754dea.el8ost.noarch
python3-tripleo-common-10.8.2-0.20190913130445.4754dea.el8ost.noarch
openstack-tripleo-common-10.8.2-0.20190913130445.4754dea.el8ost.noarch



parameter_defaults:
  ContainerImagePrepare:
  - tag_from_label: "{version}-{release}"
    push_destination: true
    excludes:
    - nova-api
    set:
      ceph_image: rhceph-4-rhel8
      ceph_namespace: registry.redhat.io/rhceph-beta
      ceph_tag: 4-4
      namespace: registry.redhat.io/rhosp15-rhel8
      name_prefix: openstack-
      name_suffix: ''
      neutron_driver: ovn
      tag: latest
      barbican_tag: latest
      barbican_api_image: barbican-api
      barbican_keystone_image: barbican-keystone
      barbican_worker_image: barbican-worker
  - push_destination: true
    includes:
    - nova-api
    set:
      namespace: registry.redhat.io/rhosp15-rhel8
      tag: 15.0-69
  ContainerImageRegistryCredentials:
    registry.redhat.io:

Comment 8 Steve Baker 2019-12-19 20:52:55 UTC
The IP detection logic did not change in Stein; it still looks for the "first" address on br-ctlplane [1]. I wonder if the order of addresses returned differs from what the `ip addr` command shows. (Maybe the order in Python is undetermined, while the ip command sorts by address?)
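Roughly, a minimal sketch of what that detection amounts to (not the exact tripleo-common code; assuming a netifaces-style lookup):

import netifaces  # third-party library

def first_ctlplane_ip(iface='br-ctlplane'):
    # netifaces reports addresses in whatever order the kernel returns
    # them; with extra /32 VIPs on the bridge (e.g. 10.101.52.6 and
    # 10.101.52.7), index 0 is not guaranteed to be the /24 address
    # that `ip addr` happens to print first.
    addrs = netifaces.ifaddresses(iface).get(netifaces.AF_INET, [])
    if not addrs:
        raise RuntimeError('no IPv4 address on %s' % iface)
    return addrs[0]['addr']

registry = '%s:8787' % first_ctlplane_ip()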

In Train (OSP-16) this logic changes to a hostname entry lookup [2], so this shouldn't be an issue in the future.

[1] https://opendev.org/openstack/tripleo-common/src/branch/stable/stein/tripleo_common/image/image_uploader.py#L101
[2] https://opendev.org/openstack/tripleo-common/src/branch/stable/train/tripleo_common/image/image_uploader.py#L101
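For illustration, a hedged sketch of what a hostname-based lookup looks like (the function here is hypothetical, not the real tripleo-common API):

import socket

def undercloud_registry_host():
    # Resolving the machine's own FQDN (e.g. via the /etc/hosts entry
    # written during the undercloud install) does not depend on the
    # order of addresses on br-ctlplane.
    return socket.getfqdn()

registry = '%s:8787' % undercloud_registry_host()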

