Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1811798

Summary: [osp16] Random error to registry.redhat.io -> requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://registry.redhat.io/v2/rhosp-rhel8/openstack-
Product: Red Hat OpenStack
Reporter: Chris Janiszewski <cjanisze>
Component: openstack-tripleo-common
Assignee: Adriano Petrich <apetrich>
Status: CLOSED DUPLICATE
QA Contact: David Rosenfeld <drosenfe>
Severity: medium
Docs Contact:
Priority: medium
Version: 16.0 (Train)
CC: aschultz, dsorrent, mburns, msecaur, rrubins, slinaber
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-06-08 19:16:10 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Chris Janiszewski 2020-03-09 19:44:05 UTC
Description of problem:
I tend to get these random Authorization errors with registry.redhat.io

Example:
2020-03-09 14:56:10,433 78845 ERROR tripleo_common.image.image_uploader [  ] [undercloud-osp16.ctlplane.home.lab:8787/rhosp-rhel8/openstack-heat-api:16.0-80] Failed uploading the target image
2020-03-09 14:56:10,454 78697 ERROR root [  ] Image prepare failed: 401 Client Error: Unauthorized for url: https://registry.redhat.io/v2/rhosp-rhel8/openstack-tempest/blobs/sha256:681dc5037699793998bac6e1011552d4c6735fd40482abd5f99d72321a183efe
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib64/python3.6/concurrent/futures/process.py", line 175, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/usr/lib64/python3.6/concurrent/futures/process.py", line 153, in _process_chunk
    return [fn(*args) for args in chunk]
  File "/usr/lib64/python3.6/concurrent/futures/process.py", line 153, in <listcomp>
    return [fn(*args) for args in chunk]
  File "/usr/lib/python3.6/site-packages/tripleo_common/image/image_uploader.py", line 2311, in upload_task
    return uploader.upload_image(task)
  File "/usr/lib/python3.6/site-packages/tripleo_common/image/image_uploader.py", line 1379, in upload_image
    multi_arch=t.multi_arch
  File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 292, in wrapped_f
    return self.call(f, *args, **kw)
  File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 358, in call
    do = self.iter(retry_state=retry_state)
  File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 331, in iter
    raise retry_exc.reraise()
  File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 167, in reraise
    raise self.last_attempt.result()
  File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 361, in call
    result = fn(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/tripleo_common/image/image_uploader.py", line 1727, in _copy_registry_to_registry
    request=r
  File "/usr/lib/python3.6/site-packages/tripleo_common/image/image_uploader.py", line 562, in check_status
    request.raise_for_status()
  File "/usr/lib/python3.6/site-packages/requests/models.py", line 940, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://registry.redhat.io/v2/rhosp-rhel8/openstack-tempest/blobs/sha256:681dc5037699793998bac6e1011552d4c6735fd40482abd5f99d72321a183efe

This one happened during an undercloud upgrade, but I have seen these errors on both the undercloud and the overcloud, during installs and upgrades alike.

I then typically re-run the same action with no changes and it succeeds.
I suspect registry.redhat.io is getting overwhelmed with requests or something along those lines. Is there any way we could fix these issues on the registry, or alternatively add more retries whenever we pull from the registry?


Version-Release number of selected component (if applicable):
OSP16

How reproducible:
Random

Steps to Reproduce:
1. Upgrade or install undercloud or overcloud
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Alex Schultz 2020-03-09 22:35:17 UTC
You'll have to raise an issue with the owners of registry.redhat.io. We already attempt multiple retries for requests. This seems to point to issues with the authentication mechanism and not necessarily anything in the code.

Comment 2 Chris Janiszewski 2020-03-10 14:31:54 UTC
I understand the problem is not necessarily with the OSP bits. I don't recall us ever having these issues before moving to registry.redhat.io; the previous registry did not require authentication.
This has been a huge issue for the field team deploying OSP16. Is there any way we could increase the timeout or the number of retries? Alternatively, a way to detach the container upload process from the rest of the installation would help as well. At least we could fail early and have an easier way to identify the cause.
The process for creating local registries with podman is not really described anywhere, and the steps that used to work with docker no longer apply. Please advise.

Comment 3 Alex Schultz 2020-03-10 14:40:43 UTC
You can run the prepare process prior to running the overcloud deployment. You can run `openstack tripleo container image prepare` by hand to populate the registry on the undercloud.

1) openstack tripleo container image prepare default --local-push-destination --output-env-file containers.yaml
2) edit containers.yaml to add credentials
3) sudo openstack tripleo container image prepare -e containers.yaml
4) perform deploy and include -e containers.yaml

Additionally if you're having issues with the registry.redhat.io, a satellite server (which we do document/recommend) would be another option for a local source. Moving the bz over to the release delivery folks who may be able to provide additional information on how to raise issues with registry.redhat.io. We already do provide multiple retries as part of the process but it's not going to solve this problem.
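
For anyone scripting this, steps 1, 3 and 4 above could be wrapped roughly as follows. This is only a sketch: `prepare_commands` and `run` are hypothetical helper names, the `openstack overcloud deploy --templates` invocation is an assumed placeholder for whatever deploy command a site really uses, and step 2 (adding credentials to containers.yaml) still has to be done by hand between the first two commands.

```python
import shlex
import subprocess

ENV_FILE = "containers.yaml"  # file name taken from the steps above


def prepare_commands(env_file=ENV_FILE):
    """Return the commands for steps 1, 3 and 4 as argument lists."""
    generate = [
        "openstack", "tripleo", "container", "image", "prepare", "default",
        "--local-push-destination", "--output-env-file", env_file,
    ]
    populate = [
        "sudo", "openstack", "tripleo", "container", "image", "prepare",
        "-e", env_file,
    ]
    # Assumption: the real deploy command carries many site-specific
    # arguments; only the "-e containers.yaml" part comes from the steps.
    deploy = ["openstack", "overcloud", "deploy", "--templates", "-e", env_file]
    return generate, populate, deploy


def run(cmd):
    """Echo a command, run it, and raise CalledProcessError on failure."""
    print("+", " ".join(shlex.quote(part) for part in cmd))
    subprocess.run(cmd, check=True)


# Example use (not executed here):
#   generate, populate, deploy = prepare_commands()
#   run(generate)   # step 1
#   ...edit containers.yaml to add credentials (step 2)...
#   run(populate)   # step 3, fails early if the registry pull fails
#   run(deploy)     # step 4
```

Running the populate step on its own means a registry failure shows up before the deployment starts rather than in the middle of it.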

Comment 4 Darin Sorrentino 2020-03-10 15:19:31 UTC
Alex,
 If we wanted to avoid calling out to registry.redhat.io during the Overcloud deployment, wouldn't the process be this instead:

1) openstack tripleo container image prepare default --local-push-destination --output-env-file containers.yaml
2) edit containers.yaml to add credentials
3) sudo openstack tripleo container image prepare -e containers.yaml | tee -a ~stack/templates/local_container_images.yaml
4) perform deploy and include -e ~stack/templates/local_container_images.yaml in place of the container.yaml or container-images-prepare.yaml (from the docs)

If we include the containers.yaml as it is in the overcloud deployment, with the push-destination set to true, won't it still attempt to update the local container images?

Comment 5 Chris Janiszewski 2020-03-10 15:27:45 UTC
Here is another example of this issue:
(undercloud) [stack@undercloud-osp16 rebuild_image]$ sudo buildah bud -t undercloud-osp16.ctlplane.home.lab:8787/rhceph-4-rhel8-custom
STEP 1: FROM registry.redhat.io/rhceph/rhceph-4-rhel8
error creating build container: Error initializing source docker://registry.redhat.io/rhceph/rhceph-4-rhel8:latest: unable to retrieve auth token: invalid username/password: unauthorized: Please login to the Red Hat Registry using your Customer Portal credentials. Further instructions can be found here: https://access.redhat.com/RegistryAuthentication
(undercloud) [stack@undercloud-osp16 rebuild_image]$ sudo podman login registry.redhat.io
Authenticating with existing credentials...
Existing credentials are valid. Already logged in to registry.redhat.io

Comment 6 Alex Schultz 2020-03-10 15:30:35 UTC
(In reply to Darin Sorrentino from comment #4)
> Alex,
>  If we wanted to avoid calling out to registry.redhat.io during the
> Overcloud deployment, wouldn't the process be this instead:
> 
> 1) openstack tripleo container image prepare default
> --local-push-destination --output-env-file containers.yaml
> 2) edit containers.yaml to add credentials
> 3) sudo openstack tripleo container image prepare -e containers.yaml | tee
> -a ~stack/templates/local_container_images.yaml
> 4) perform deploy and include -e
> ~stack/templates/local_container_images.yaml in place of the container.yaml
> or container-images-prepare.yaml (from the docs)
> 
> If we include the containers.yaml as it is in the overcloud deployment, with
> the push-destination set to true, won't it still attempt to update the local
> container images?

It'll check the versions but it won't update because they already exist. It really depends. For many customers this check is not a problem; the only way to truly disconnect an environment is to use a satellite server infrastructure.

Comment 7 Alex Schultz 2020-03-10 15:32:01 UTC
(In reply to Chris Janiszewski from comment #5)
> Here is another example of this issue:
> (undercloud) [stack@undercloud-osp16 rebuild_image]$ sudo buildah bud -t
> undercloud-osp16.ctlplane.home.lab:8787/rhceph-4-rhel8-custom
> STEP 1: FROM registry.redhat.io/rhceph/rhceph-4-rhel8
> error creating build container: Error initializing source
> docker://registry.redhat.io/rhceph/rhceph-4-rhel8:latest: unable to retrieve
> auth token: invalid username/password: unauthorized: Please login to the Red
> Hat Registry using your Customer Portal credentials. Further instructions
> can be found here: https://access.redhat.com/RegistryAuthentication
> (undercloud) [stack@undercloud-osp16 rebuild_image]$ sudo podman login
> registry.redhat.io
> Authenticating with existing credentials...
> Existing credentials are valid. Already logged in to registry.redhat.io

It's likely you need to use `buildah login`. This would be a bug against buildah if it still doesn't work.

Comment 8 Matthew Secaur 2020-04-29 15:50:26 UTC
Hi, Alex,

The problem with the solution you've provided in Comment #3 is that this error can still happen during your Step #1 (i.e. openstack tripleo container image prepare default --local-push-destination --output-env-file containers.yaml). I see this happen a lot in a lab environment in Pune inside the Red Hat network. If a single container (or maybe just a blob) takes more than 5 minutes to download, then subsequent requests will fail. I also managed to re-create this issue by driving the load on the server up very high before running the 'openstack tripleo container image prepare' command (which is just another way of making the transfer take longer than 5 minutes).

In my Ansible playbooks that deploy OSP16 in our lab, I ended up just running the 'openstack tripleo container image prepare' command twice, since it almost always fails the first time. On the second run, most of the containers are downloaded already, so it typically works fine then.
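
Outside of Ansible, the same "run it twice" workaround could be sketched in Python roughly like this. `run_with_reruns` is a hypothetical helper, not part of tripleo-common, and the default of two attempts simply mirrors the workaround described above rather than a tuned value.

```python
import subprocess
import time


def run_with_reruns(cmd, attempts=2, delay=0.0):
    """Run cmd until it exits 0 or attempts run out; return attempts used."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            # Optional pause before the next run.
            time.sleep(delay)
    raise RuntimeError(
        f"{cmd!r} failed {attempts} times "
        f"(last exit code {result.returncode})"
    )


# Example use (not executed here):
#   run_with_reruns(
#       ["sudo", "openstack", "tripleo", "container", "image", "prepare",
#        "-e", "containers.yaml"]
#   )
```

This only helps because images fetched by the first run do not need to be downloaded again; it works around the failure rather than fixing it.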

This continues to be a problem when we are using registry.redhat.io.

Comment 9 Alex Schultz 2020-04-29 15:54:04 UTC
Yes, the 5 minute blob issue has been fixed via https://review.opendev.org/#/c/713923/ and should be out in a future minor release. I will have to track down the bz for this issue.
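
As an illustration only, and not a description of the change in the review linked above: assuming the 401 comes from a credential that expires while a long transfer is still running (this thread does not confirm the exact cause), a client could re-authenticate once and repeat the failed request. `Unauthorized`, `fetch` and `authenticate` are hypothetical stand-ins for the real HTTP error and registry calls.

```python
class Unauthorized(Exception):
    """Stands in for an HTTP 401 response from the registry."""


def fetch_with_reauth(fetch, authenticate, max_reauth=1):
    """Call fetch(token); on Unauthorized, get a new token and try again.

    fetch: callable taking a token and returning the response body.
    authenticate: callable returning a fresh token.
    max_reauth: how many times to re-authenticate before giving up.
    """
    token = authenticate()
    reauths = 0
    while True:
        try:
            return fetch(token)
        except Unauthorized:
            if reauths >= max_reauth:
                # Re-authenticating did not help, so surface the error.
                raise
            reauths += 1
            token = authenticate()
```

Capping the number of re-authentications keeps genuinely bad credentials from looping forever, since in that case every attempt returns 401.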

Comment 10 Alex Schultz 2020-06-08 19:16:10 UTC

*** This bug has been marked as a duplicate of bug 1813520 ***