Bug 1813520 - [RHOSP-16] Undercloud deployment is failing with HTTPError: 401 Client Error: Unauthorized for url
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-common
Version: 16.0 (Train)
Hardware: All
OS: All
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Alex Schultz
QA Contact: David Rosenfeld
URL:
Whiteboard:
Duplicates: 1811798 1821490
Depends On:
Blocks:
 
Reported: 2020-03-14 07:11 UTC by Nilesh
Modified: 2023-09-07 22:26 UTC (History)

Fixed In Version: openstack-tripleo-common-11.3.3-0.20200321092338.da2cc62.el8ost
Doc Type: Bug Fix
Doc Text:
This update fixes authentication timeouts caused by slow transfer of container images. Previously, undercloud and overcloud pulls against container sources that require authentication could fail, and generate a 401 error, if the image transfer exceeded five minutes. Now, if the container fetching process exceeds 5 minutes, the code attempts to re-authenticate, preventing the timeout.
Clone Of:
Environment:
Last Closed: 2020-05-14 12:16:18 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Launchpad | 1867981 | 0 | None | None | None | 2020-03-18 20:08:07 UTC
OpenStack gerrit | 713724 | 0 | None | MERGED | Improve authentication retries for slow transfers | 2021-02-15 16:09:27 UTC
OpenStack gerrit | 713923 | 0 | None | MERGED | Improve authentication retries for slow transfers | 2021-02-15 16:09:27 UTC
Red Hat Issue Tracker | OSP-28378 | 0 | None | None | None | 2023-09-07 22:26:09 UTC
Red Hat Product Errata | RHBA-2020:2114 | 0 | None | None | None | 2020-05-14 12:16:36 UTC

Comment 2 Alex Schultz 2020-03-16 23:10:29 UTC
We've seen 401s from the registry itself (we do retry), but unfortunately that is outside of our control. If you rerun, does it fail on the same layer each time?

Comment 5 Alex Schultz 2020-03-18 14:37:48 UTC
I'm currently trying to reproduce the issue, but I don't appear to be able to. Is there a proxy being used in this environment?

Comment 6 Alex Schultz 2020-03-18 14:39:53 UTC
Oh I just noticed the undercloud.conf. The following options are in the wrong section:

custom_env_files = /home/stack/templates/custom-undercloud-params.yaml
container_images_file = /home/stack/containers-prepare-parameter.yaml

They are under [ctlplane-subnet] and not [DEFAULT], so they aren't being picked up and no auth is being used. Please try moving them to the correct ini section.
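
To illustrate why the section matters, here is a minimal sketch using Python's standard-library configparser. This is an assumption for illustration only: the undercloud installer reads undercloud.conf with its own configuration loader, not with this code, and the two config strings below are trimmed-down hypothetical files. The general ini behaviour is the same, though: an option placed under another section is not visible as a [DEFAULT] option.

```python
import configparser

# Hypothetical, trimmed-down undercloud.conf with the option misplaced.
MISPLACED = """
[DEFAULT]

[ctlplane-subnet]
container_images_file = /home/stack/containers-prepare-parameter.yaml
"""

# Same option moved to the section where it is expected.
FIXED = """
[DEFAULT]
container_images_file = /home/stack/containers-prepare-parameter.yaml

[ctlplane-subnet]
"""


def default_option(text, name):
    """Return the value of `name` from [DEFAULT], or None if it is absent."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser.defaults().get(name)


if __name__ == "__main__":
    for label, text in (("misplaced", MISPLACED), ("fixed", FIXED)):
        print(label, default_option(text, "container_images_file"))
```

With the misplaced file the lookup finds nothing in [DEFAULT], which matches the symptom above: the containers-prepare-parameter.yaml file, and the registry credentials it carries, are never loaded.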

Comment 15 Alex Schultz 2020-03-18 16:16:10 UTC
I've replicated the issue. If low network throughput causes the layer fetching to exceed the lifetime of the authentication token, it fails with something like:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/cliff/app.py", line 401, in run_subcommand
    result = cmd.run(parsed_args)
  File "/usr/lib/python3.6/site-packages/tripleoclient/command.py", line 32, in run
    super(Command, self).run(parsed_args)
  File "/usr/lib/python3.6/site-packages/osc_lib/command/command.py", line 41, in run
    return super(Command, self).run(parsed_args)
  File "/usr/lib/python3.6/site-packages/cliff/command.py", line 185, in run
    return_code = self.take_action(parsed_args) or 0
  File "/usr/lib/python3.6/site-packages/tripleoclient/v1/container_image.py", line 965, in take_action
    cleanup=parsed_args.cleanup, lock=lock)
  File "/usr/lib/python3.6/site-packages/tripleo_common/image/kolla_builder.py", line 235, in container_images_prepare_multi
    uploader.upload()
  File "/usr/lib/python3.6/site-packages/tripleo_common/image/image_uploader.py", line 272, in upload
    uploader.run_tasks()
  File "/usr/lib/python3.6/site-packages/tripleo_common/image/image_uploader.py", line 2252, in run_tasks
    for result in p.map(upload_task, self.upload_tasks):
  File "/usr/lib64/python3.6/concurrent/futures/process.py", line 366, in _chain_from_iterable_of_lists
    for element in iterable:
  File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 586, in result_iterator
    yield fs.pop().result()
  File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
OSError: [rhosp-rhel8/openstack-swift-account] Write Failure: 401 Client Error: Unauthorized for url: https://registry.redhat.io/v2/rhosp-rhel8/openstack-swift-account/blobs/sha256:01d76065b50cd19077e1b2d2aafb0dd332ecfd6f8d02088dd87242de97e72a43


I was able to replicate this by using iproute-tc to limit the bandwidth to ~4 Mbit/s down and ~1 Mbit/s up (using the script from https://wiki.gentoo.org/wiki/Traffic_shaping).


I'll need to figure out the correct place to work around this issue.
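
As a rough back-of-the-envelope check, the sketch below estimates how much data fits through the throttled link before a token expires. It assumes the five-minute token lifetime given in this report's Doc Text, treats 1 Mbit as 10^6 bits, and ignores protocol overhead and parallel layer downloads; the function name is hypothetical.

```python
def max_bytes_before_expiry(rate_mbit, token_lifetime_s=300):
    """Bytes transferable at `rate_mbit` (Mbit/s) within one token lifetime.

    Assumes 1 Mbit = 1,000,000 bits and no protocol overhead.
    """
    bytes_per_second = rate_mbit * 1_000_000 / 8
    return bytes_per_second * token_lifetime_s


if __name__ == "__main__":
    limit = max_bytes_before_expiry(4)
    print("%.0f MB" % (limit / 1_000_000))
```

At ~4 Mbit/s that works out to roughly 150 MB, so under these assumptions any single layer larger than that cannot finish downloading before the token expires.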

Comment 16 Alex Schultz 2020-03-18 20:52:39 UTC
Right now the workaround would be to not use "push_destination: true" when installing the undercloud. That should allow you to at least get an undercloud installed. From there you could fetch the containers manually and push them to the registry. Unfortunately, it seems that the transfer from registry.redhat.io is taking longer than 5 minutes for the various layers, so the token expires and the process fails.
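
The general idea of re-authenticating once a token is older than its lifetime can be sketched as follows. This is not the tripleo-common implementation; the class name, the injected `authenticate` and `do_request` callables, and the `(status, body)` return shape are all hypothetical, chosen so the sketch runs without any network access.

```python
import time

TOKEN_LIFETIME = 300  # seconds; the five-minute limit described in this bug


class ReauthSession:
    """Hypothetical helper that refreshes a bearer token before it expires."""

    def __init__(self, authenticate, clock=time.monotonic,
                 lifetime=TOKEN_LIFETIME):
        self._authenticate = authenticate  # callable returning a new token
        self._clock = clock                # injectable for testing
        self._lifetime = lifetime
        self._token = None
        self._issued_at = None

    def token(self):
        """Return a token, re-authenticating if it is missing or too old."""
        now = self._clock()
        if self._token is None or now - self._issued_at >= self._lifetime:
            self._token = self._authenticate()
            self._issued_at = now
        return self._token

    def fetch(self, do_request):
        """Call do_request(token); on a 401, re-authenticate once and retry."""
        status, body = do_request(self.token())
        if status == 401:
            self._token = None  # force a fresh token for the retry
            status, body = do_request(self.token())
        return status, body
```

Checking the token age before each layer request avoids most expiries, and the single retry on 401 covers the case where the registry invalidates the token earlier than expected.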

Comment 21 Emilien Macchi 2020-04-21 20:06:13 UTC
*** Bug 1821490 has been marked as a duplicate of this bug. ***

Comment 25 errata-xmlrpc 2020-05-14 12:16:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2114

Comment 26 Alex Schultz 2020-06-08 19:16:10 UTC
*** Bug 1811798 has been marked as a duplicate of this bug. ***

