Bug 1753676 - Overcloud deploy does not handle login to container repository
Summary: Overcloud deploy does not handle login to container repository
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 15.0 (Stein)
Hardware: Unspecified
OS: Linux
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Alex Schultz
QA Contact: Sasha Smolyak
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-09-19 14:58 UTC by James E. LaBarre
Modified: 2020-03-05 12:00 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-10.6.2-0.20191202200455.41d9f8a.el8ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-05 12:00:13 UTC
Target Upstream Version:
Embargoed:


Attachments
generated customization.yaml from deploy (712 bytes, text/plain)
2019-09-19 14:58 UTC, James E. LaBarre
node-info.yaml used in deploy (702 bytes, text/plain)
2019-09-19 14:59 UTC, James E. LaBarre
overcloud_images.yaml used for deploy (479 bytes, text/plain)
2019-09-19 15:00 UTC, James E. LaBarre
roles_data.yaml used for deploy (17.60 KB, text/x-matlab)
2019-09-19 15:01 UTC, James E. LaBarre
tripleo-overcloud-passwords.yaml used for deploy (408 bytes, text/plain)
2019-09-19 15:01 UTC, James E. LaBarre
Overcloud deploy test run, 2019-Nov-05 (1.15 MB, text/plain)
2019-11-05 21:05 UTC, James E. LaBarre
Overcloud deploy log, Nov 11 (563.74 KB, text/plain)
2019-11-11 22:32 UTC, James E. LaBarre
containers-prepare-parameter file for overcloud deploy (522 bytes, text/plain)
2019-11-11 22:32 UTC, James E. LaBarre


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1849747 0 None None None 2019-10-24 23:21:43 UTC
OpenStack gerrit 691131 0 'None' MERGED Add second fact to ensure type safty 2020-12-21 14:56:42 UTC
Red Hat Product Errata RHBA-2020:0643 0 None None None 2020-03-05 12:00:34 UTC

Description James E. LaBarre 2019-09-19 14:58:55 UTC
Created attachment 1616774 [details]
generated customization.yaml from deploy

Description of problem:
When running openstack overcloud deploy, the playbook tries to pull container images from registry.redhat.io.  This server requires a login in order to access images.  However, even after running a 'podman login registry.redhat.io' from the director node before running the 'openstack overcloud deploy', the nodes being set up do not get logged into the container registry, and therefore cannot retrieve containers ("invalid username/password" error).

Version-Release number of selected component (if applicable):
RC-0.9 and above

How reproducible:
always

Steps to Reproduce:
1. successfully deploy an OSP15 undercloud 
2. work through the steps to configure images and baremetal nodes (including introspection) for an overcloud, up to and just before the "openstack overcloud deploy" (this particular configuration was minimal: one Controller and one ppc64le Compute)
3. (optionally) run "podman login -u <RH access username> registry.redhat.io"
4. run "openstack overcloud deploy ..." with its appropriate parameters (our details shown below)

Actual results:
As soon as the overcloud node attempts to run a "podman pull ..." of the containers, it fails with:
"unable to pull registry.redhat.io/rhosp15-rhel8/openstack-cinder-volume:15.0: unable to pull image: Error determining manifest MIME type for docker://registry.redhat.io/rhosp15-rhel8/openstack-cinder-volume:15.0: unable to retrieve auth token: invalid username/password",


Expected results:
Podman will be able to retrieve and install all containers needed for an overcloud deploy

Additional info:

Our configuration on this test installation:

*  x86_64 host for 2 VMs, ansible playbooks for deploy are run from here (until we re-try the deploy from the director node).  Running VirtBMC to provide out-of-band management emulation to the Controller node VM

*  Director node running as VM on the x86_64 host node above.

*  Controller node running as a VM on x86_64 host listed above.  Uses VirtBMC on hosting server to emulate its BMC

*  Power8 system (physical, not VM) as Compute (ppc64le) node.

Deploy command run:

openstack overcloud deploy --templates -e /home/stack/templates/node-info.yml -e /home/stack/templates/overcloud_images.yaml -e /home/stack/templates/tripleo-overcloud-passwords.yaml -e /home/stack/templates/customization.yaml --disable-validations -r /home/stack/templates/roles_data.yaml --ntp-server clock.corp.redhat.com

Comment 1 James E. LaBarre 2019-09-19 14:59:53 UTC
Created attachment 1616776 [details]
node-info.yaml used in deploy

Comment 2 James E. LaBarre 2019-09-19 15:00:41 UTC
Created attachment 1616777 [details]
overcloud_images.yaml used for deploy

Comment 3 James E. LaBarre 2019-09-19 15:01:16 UTC
Created attachment 1616778 [details]
roles_data.yaml used for deploy

Comment 4 James E. LaBarre 2019-09-19 15:01:40 UTC
Created attachment 1616779 [details]
tripleo-overcloud-passwords.yaml used for deploy

Comment 5 James E. LaBarre 2019-09-19 18:16:48 UTC
Some clarification on steps I had tested:

From the host for my VMs, where I was running our own playbooks to deploy the undercloud and overcloud from, I was looking for command-line settings to forward the authentication to the playbooks.  Nothing found for that.

After I found I could run the deploy up *to* the "openstack overcloud deploy" step, I tried running the "podman login ..." command from there, to see if the expectation was that a customer would authenticate on a deployed undercloud, then the overcloud deploy would carry over the authentication.  This was not successful either.

Authenticating on the individual nodes won't work because the overcloud deploy will wipe out any installation on the individual nodes.

The important issue is what would be the expected Customer procedure, and is it properly documented on the public site.

Comment 6 Alex Schultz 2019-09-19 19:55:55 UTC
You can specify credentials for a registry using the ContainerImageRegistryCredentials parameter. 

Example:

parameter_defaults:
  ContainerImageRegistryCredentials:
    registry.redhat.io:
      myuser: mypassword


See also https://bugzilla.redhat.com/show_bug.cgi?id=1716627#c3

This should be referenced in the documentation, https://bugzilla.redhat.com/show_bug.cgi?id=1723969

Comment 8 James E. LaBarre 2019-09-23 19:31:09 UTC
(In reply to Alex Schultz from comment #7)
> Actually I found it in the docs.
> https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/15-beta/html/director_installation_and_usage/preparing-for-director-installation#container-image-preparation-parameters
> 
> Did you have this specified and it didn't work?

Tested this a couple of ways: from a host system running an ansible playbook to do all the overcloud deploy scripts, and running the "openstack overcloud deploy" from the Director node itself.  Even though the overcloud should have pulled in the login information when it ran, the overcloud deploy is not finding it, or is not passing it on to the nodes.  Neither the controller nor the compute have had any containers installed.

Comment 9 James E. LaBarre 2019-09-23 19:56:01 UTC
I just tested pulling the container manually on the controller and the compute nodes (x86_64 and ppc64le respectively).  Logged into the node as heat-admin, then ran "sudo podman login -u <username> registry.redhat.io".  Gave it my password at the prompt, at which point it said "Login Succeeded!".  Then pulled the image the deploy wanted to download, with the same command and path the deploy failed at: "sudo podman pull registry.redhat.io/rhosp15-rhel8/openstack-cinder-volume:15.0".  This pulled down successfully, and a "sudo podman image list" shows:

REPOSITORY                                                 TAG    IMAGE ID       CREATED       SIZE
registry.redhat.io/rhosp15-rhel8/openstack-cinder-volume   15.0   90498b64119f   10 days ago   1.16 GB

(for the x86_64 controller)


REPOSITORY                                                 TAG    IMAGE ID       CREATED       SIZE
registry.redhat.io/rhosp15-rhel8/openstack-cinder-volume   15.0   3e7c1eec0e27   10 days ago   1.29 GB

(for the Power8 compute)


I would presume this means my login does have proper access for the container.  

Does the /home/stack/containers-prepare-parameter.yaml file need to get called by the overcloud deploy?  (the format of it doesn't look right for the overcloud parameters)

Comment 10 Alex Schultz 2019-09-23 20:00:37 UTC
Yes you need to pass -e /home/stack/containers-prepare-parameter.yaml as part of the deployment.
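
As a rough illustration of the point above, the deploy command can be assembled so that the credentials file is always passed with -e. This is only a sketch: build_deploy_command is a hypothetical helper, not part of the openstack client, and the paths are the ones quoted in this report.

```python
import shlex


def build_deploy_command(env_files, roles_file, ntp_server):
    """Assemble an 'openstack overcloud deploy' argument list (hypothetical helper)."""
    cmd = ["openstack", "overcloud", "deploy", "--templates"]
    for path in env_files:
        # Every environment file is passed with its own -e flag.
        cmd += ["-e", path]
    cmd += ["--disable-validations", "-r", roles_file, "--ntp-server", ntp_server]
    return cmd


env_files = [
    "/home/stack/templates/node-info.yml",
    "/home/stack/templates/overcloud_images.yaml",
    "/home/stack/templates/tripleo-overcloud-passwords.yaml",
    "/home/stack/templates/customization.yaml",
    # The file carrying the registry settings must be listed explicitly.
    "/home/stack/containers-prepare-parameter.yaml",
]

cmd = build_deploy_command(
    env_files,
    "/home/stack/templates/roles_data.yaml",
    "clock.corp.redhat.com",
)
print(shlex.join(cmd))
```

The list form can be handed to subprocess.run() without shell quoting concerns; shlex.join() is used here only to print a copy-pasteable command line.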

Comment 11 Kha Do 2019-09-26 19:01:51 UTC
Setting ContainerImageRegistryCredentials and setting "push_destination: true" in the containers-prepare-parameter.yaml file lets the director download the images and store them on the director. If push_destination is not set, the login credentials are not used on the nodes unless "ContainerImageRegistryLogin: true" is also set. But when that parameter is set, you get the following error instead:

fatal: [overcloud-controller-0]: FAILED! => {"msg": "The conditional check 'container_registry_logins_json | length) > 0' failed. The error was: template error while templating string: unexpected ')'. String: {% if container_registry_logins_json | length) > 0 %} True {% else %} False {% endif %}\n\nThe error appears to be in '/var/lib/mistral/overcloud/Controller/host_prep_tasks.yaml': line 737, column 5, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n          kdo: password\n  - name: Convert logins json to dict\n    ^ here\n"}
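
The template error above comes from a closing parenthesis with no matching opener in the conditional. The following is a minimal sketch of that check in plain Python. It is not how Jinja2 or Ansible parse expressions, and the "fixed" string is only the presumed corrected form of the conditional.

```python
def unbalanced_paren_index(expression):
    """Return the index of the first unmatched parenthesis, or None if balanced."""
    open_positions = []
    for index, char in enumerate(expression):
        if char == "(":
            open_positions.append(index)
        elif char == ")":
            if not open_positions:
                return index  # closing parenthesis with no opener
            open_positions.pop()
    # Any opener still on the stack was never closed.
    return open_positions[0] if open_positions else None


# The conditional as quoted in the error message.
broken = "container_registry_logins_json | length) > 0"
# Presumed corrected form (assumption: the opening parenthesis was dropped).
fixed = "(container_registry_logins_json | length) > 0"

for expression in (broken, fixed):
    position = unbalanced_paren_index(expression)
    status = "balanced" if position is None else f"unmatched at index {position}"
    print(f"{expression!r}: {status}")
```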

Comment 15 Kevin Carter 2019-09-26 22:36:28 UTC
This issue is caused by a typo. An upstream review to resolve it has been committed and can be tracked here: [ https://review.opendev.org/685185 ].

Comment 16 Kha Do 2019-09-27 14:11:44 UTC
With the missing parenthesis added, you end up with the following error:

TASK [Convert logins json to dict] *********************************************
Friday 27 September 2019  10:05:36 -0400 (0:00:00.052)       0:02:36.839 ****** 
fatal: [overcloud-controller-0]: FAILED! => {"msg": "Unexpected templating type error occurred on ({{ container_registry_logins_json | from_json }}): the JSON object must be str, bytes or bytearray, not 'dict'"}
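
The type error above is what Python's json.loads() raises when it is handed a dict instead of a string, which suggests the logins value had already been decoded by the time from_json ran. Below is a minimal sketch of a conversion that accepts either form. normalize_logins is a hypothetical name, and this is not the code from the upstream review, only an illustration of the idea.

```python
import json


def normalize_logins(value):
    """Return registry logins as a dict, given either a JSON string or a dict."""
    if isinstance(value, (str, bytes, bytearray)):
        # Only decode when the value is still serialized JSON.
        value = json.loads(value)
    if not isinstance(value, dict):
        raise TypeError(
            f"expected a mapping of registry logins, got {type(value).__name__}"
        )
    return value


logins = {"registry.redhat.io": {"myuser": "mypassword"}}

# Decoding an already-decoded dict fails with a TypeError like the one quoted above.
try:
    json.loads(logins)
except TypeError as error:
    print(f"json.loads on a dict fails: {error}")

# Both input forms normalize to the same dict.
print(normalize_logins(json.dumps(logins)) == normalize_logins(logins))
```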

Comment 17 Kevin Carter 2019-09-27 19:48:34 UTC
Can you test again with the following review [ https://review.opendev.org/#/c/685469 ]? We believe this should ensure the data is handled correctly no matter how it is provided to the heat template.

Comment 18 Kha Do 2019-09-30 12:26:36 UTC
Yes, that allowed the nodes to login to the container registry.

Comment 19 James E. LaBarre 2019-11-05 18:39:46 UTC
Nope, tried that patch on my system, it still fails to retrieve the containers.  I have gone so far as to do a "podman login" to make sure the Director will recognize the login.  I can do a "podman pull" of a container manually on the Director, so I know the account is valid, and that the container is also a valid one.

Comment 20 Alex Schultz 2019-11-05 20:35:33 UTC
Please provide logs of the most recent issue as we believe we have resolved the ansible error.

Comment 21 James E. LaBarre 2019-11-05 21:05:05 UTC
Created attachment 1633084 [details]
Overcloud deploy test run, 2019-Nov-05

Command used to run deploy:

openstack overcloud deploy \
 --templates \
 -e "/home/stack/templates/node-info.yml" \
 -e "/home/stack/templates/overcloud_images.yaml" \
 -e "/home/stack/templates/tripleo-overcloud-passwords.yaml" \
 -e "/home/stack/templates/customization.yaml" \
 -e "/home/stack/containers-prepare-parameter.yaml" \
 --disable-validations \
 -r /home/stack/templates/roles_data.yaml \
 --ntp-server clock.corp.redhat.com

Comment 22 James E. LaBarre 2019-11-11 22:29:53 UTC
I thought maybe the problem was in the podman-baremetal-ansible.yaml file, where "ContainerImageRegistryLogin" has "default: false" and should have been set to "true".  I edited that in the copy of the file the prior failed setup had put in place (trying a hard-coded edit before hunting down the configuration location), and also put the login information into containers-prepare-parameter.yaml and passed it as part of the overcloud deploy command (... -e /home/stack/containers-prepare-parameter.yaml ).  Neither change helped.

I had run a "podman login" to registry.redhat.io before trying a deploy, and verified a "podman pull registry.redhat.io/rhosp15-rhel8/openstack-cinder-volume:15.0" would be able to pull down the container, which it did.  Within that same terminal, right after trying a manual pull, I ran the overcloud deploy again, and it still fails with the "unable to retrieve auth token: invalid username/password".

I will attach the logs and a redacted containers-prepare-parameter.yaml file after this comment

Comment 23 James E. LaBarre 2019-11-11 22:32:00 UTC
Created attachment 1635100 [details]
Overcloud deploy log, Nov 11

Command line:   openstack overcloud deploy --templates -e "/home/stack/containers-prepare-parameter.yaml" -e "/home/stack/templates/node-info.yml" -e"/home/stack/templates/overcloud_images.yaml" -e"/home/stack/templates/tripleo-overcloud-passwords.yaml" -e"/home/stack/templates/customization.yaml" --disable-validations -r /home/stack/templates/roles_data.yaml --ntp-server clock.corp.redhat.com

Comment 24 James E. LaBarre 2019-11-11 22:32:52 UTC
Created attachment 1635102 [details]
containers-prepare-parameter file for overcloud deploy

Comment 25 Alex Schultz 2019-11-11 23:05:52 UTC
You wouldn't modify the podman-baremetal-ansible.yaml. ContainerImageRegistryLogin is a parameter that should be set to true in the file that contains the credentials.

Example (similar to the docs):

parameter_defaults:
  ContainerImageRegistryLogin: true
  ContainerImageRegistryCredentials:
    registry.redhat.io:
      myuser: 'p@55w0rd!'
    registry.internalsite.com:
      myuser2: '0th3rp@55w0rd!'
    '192.0.2.1:8787':
      myuser3: '@n0th3rp@55w0rd!'


I still haven't had a chance to reproduce this, but it sounds like we might need to improve the docs.

Comment 27 Jad Haj Yahya 2020-01-23 16:17:00 UTC
Used the following with openstack-tripleo-heat-templates-10.6.3-0.20191218080442.6978a62.el8ost.noarch:

parameter_defaults:
  ContainerImagePrepare:
  - push_destination: true
    set:
      ceph_image: rhceph-4.0-rhel8
      ceph_namespace: docker-registry.upshift.redhat.com/ceph
      ceph_tag: latest
      name_prefix: rhosp15-openstack-
      name_suffix: ''
      namespace: rhos-qe-mirror-tlv.usersys.redhat.com:5002/rh-osbs
      neutron_driver: ovn
      tag: 20200115.1

parameter_defaults:
  NeutronMechanismDrivers: ovn
  ContainerImagePrepare:
  - set:
      name_prefix: openstack-
      namespace: registry.redhat.io/rhosp15-rhel8
      tag: latest
  ContainerImageRegistryCredentials:
    registry.redhat.io:
            user : 'pass'


and hit the same error as above:

 unable to pull registry.redhat.io/rhosp15-rhel8/openstack-cinder-volume:latest: unable to pull image: Error initializing source docker://registry.redhat.io/rhosp15-rhel8/openstack-cinder-volume:latest: unable to retrieve auth token: invalid username/password"]

Am I missing something?

Comment 28 Alex Schultz 2020-01-23 16:35:15 UTC
You need to also include ContainerImageRegistryLogin: true if push_destination is not included. See Bug 1792486
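
The rule stated here can be sketched as a small check over an already-parsed parameter_defaults mapping. missing_registry_login is a hypothetical helper, not a TripleO validation, and it encodes only what this thread says: credentials are not used on the overcloud nodes unless ContainerImageRegistryLogin is true, and entries with push_destination set do not need it.

```python
def missing_registry_login(parameter_defaults):
    """Return True when credentials are set but the nodes would not use them."""
    credentials = parameter_defaults.get("ContainerImageRegistryCredentials") or {}
    if not credentials:
        return False  # nothing to log in with, so nothing is missing
    if parameter_defaults.get("ContainerImageRegistryLogin") is True:
        return False  # nodes are told to log in
    entries = parameter_defaults.get("ContainerImagePrepare") or []
    # An entry without push_destination leaves the nodes pulling from the remote registry.
    return any(not entry.get("push_destination") for entry in entries)


# Shaped like the second environment in comment 27 (placeholder credentials).
without_login = {
    "NeutronMechanismDrivers": "ovn",
    "ContainerImagePrepare": [
        {
            "set": {
                "name_prefix": "openstack-",
                "namespace": "registry.redhat.io/rhosp15-rhel8",
                "tag": "latest",
            }
        }
    ],
    "ContainerImageRegistryCredentials": {"registry.redhat.io": {"user": "pass"}},
}
# Same environment with the login flag added.
with_login = dict(without_login, ContainerImageRegistryLogin=True)

print(missing_registry_login(without_login))  # True
print(missing_registry_login(with_login))     # False
```

Running a check like this before the deploy would flag the failing configuration without waiting for the first podman pull on a node.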

Comment 29 Jad Haj Yahya 2020-01-23 21:31:46 UTC
Deployed OC using:
parameter_defaults:
  NeutronMechanismDrivers: ovn
  ContainerImagePrepare:
  - set:
      name_prefix: openstack-
      namespace: registry.redhat.io/rhosp15-rhel8
      tag: latest
  ContainerImageRegistryLogin: true
  ContainerImageRegistryCredentials:
    registry.redhat.io:
            user: 'pass'

Comment 30 Alex McLeod 2020-02-19 12:43:59 UTC
If this bug requires doc text for errata release, please set the 'Doc Type' and provide draft text according to the template in the 'Doc Text' field. The documentation team will review, edit, and approve the text.

If this bug does not require doc text, please set the 'requires_doc_text' flag to '-'.

Comment 31 Alex Schultz 2020-02-19 16:15:56 UTC
Documentation has been updated as part of Bug 1792486

Comment 32 James E. LaBarre 2020-02-21 20:36:22 UTC
I am presuming this had already been set, as the "requires_doc_text" flag is already "-".

I looked over the OSP16 documentation at https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.0/html-single/director_installation_and_usage/preparing-for-director-installation#container-image-preparation-parameters and it looks OK.  Hopefully that's not merely because I understand it better now, but I think it's adequately explaining the configuration of the login settings.

Comment 33 Alex Schultz 2020-02-21 21:28:32 UTC
Yes that's the updated documentation per Bug 1792486. There will likely be additional updates per the use case presented as part of Bug 1805117. It probably needs to be clarified that you only need to set ContainerImageRegistryLogin: true if you will be fetching containers on the overcloud systems from a remote registry that requires authentication. Using push_destination: true does not require this to be set to true.

Comment 34 James E. LaBarre 2020-02-24 13:51:57 UTC
I would still like to test that "push_destination" setting with Power though.  I had tried it before but had bad parameters elsewhere.  For 16.1 I want to verify if that variation needs special settings (limited to the one small configuration to validate on for now).

Comment 36 errata-xmlrpc 2020-03-05 12:00:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0643

