Description of problem:
The bug detailed in https://bugs.launchpad.net/tripleo/+bug/1634195 is causing failures in multiple CI jobs and is impacting rhos-delivery's import of newton into OSP 10.
The upstream (u/s) issue is slated for ocata-1, but it is already impacting newton imports.
How reproducible:
This has reproduced multiple times in CI, in both minimal and HA virt deployments.
Steps to Reproduce:
1. We notice that this reproduces at a higher rate when the undercloud is under memory pressure.
See private comments for details / links
Actual results:
failed deployment of overcloud
Expected results:
overcloud deploys without this error
Additional info:
#37 @ https://review.rdoproject.org/etherpad/p/rdo-internal-issues
This bug is in progress. I'm trying two different approaches:
1) adding defaults to the swiftclient connection to manage retries
https://review.openstack.org/#/c/389124/
2) modifying the workflow to perform the retries
https://review.openstack.org/#/c/389124
Because the issue is intermittent in CI, I need a period of continuous CI runs to see whether it goes away.
According to myoung, the problem no longer occurs since the CI jobs were migrated to different hardware, so we can no longer tell whether an applied fix works.