Bug 1386308 - Creating the default plan can fail when communicating with swift. It should be retried
Summary: Creating the default plan can fail when communicating with swift. It should be retried
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 10.0 (Newton)
Assignee: Ryan Brady
QA Contact: Arik Chernetsky
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-10-18 15:42 UTC by Matt Young
Modified: 2016-11-02 17:53 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-02 17:53:14 UTC
Target Upstream Version:




Links
Launchpad 1634195 (last updated 2016-10-18 15:42:01 UTC)

Description Matt Young 2016-10-18 15:42:01 UTC
Description of problem:

The bug detailed in https://bugs.launchpad.net/tripleo/+bug/1634195 is causing failures in multiple CI jobs and is impacting rhos-delivery's import of Newton into OSP 10.

The upstream issue is slated for ocata-1, but it is impacting Newton imports now.

How reproducible:

This has reproduced multiple times in both minimal and HA virt deployments in CI.


Steps to Reproduce:
1. The failure is intermittent; we notice in particular that it reproduces at a higher rate when there is memory pressure on the undercloud.

See the private comments for details and links.

Actual results:

failed deployment of overcloud

Expected results:

overcloud deploys without this error

Additional info:

#37 @ https://review.rdoproject.org/etherpad/p/rdo-internal-issues

Comment 1 Ryan Brady 2016-10-20 12:36:54 UTC
This bug is in progress. I'm trying two different approaches (sketched after this list):

1) adding defaults to the swiftclient connection to manage retries
https://review.openstack.org/#/c/389124/

2) modifying the workflow to perform the retries
https://review.openstack.org/#/c/389124
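
For reference, a minimal sketch of what approach 1 could look like: python-swiftclient's Connection already accepts retry and backoff arguments, so the change is mostly a matter of passing explicit defaults. The auth URL and credentials below are placeholders, not the actual undercloud values.

    from swiftclient.client import Connection

    # Approach 1 sketch: let swiftclient retry failed requests itself.
    # Endpoint and credentials are illustrative placeholders only.
    swift = Connection(
        authurl='http://192.0.2.1:5000/v2.0',
        user='admin',
        key='secret',
        retries=5,           # re-issue each failed request up to 5 times
        starting_backoff=1,  # wait 1 second before the first retry
        max_backoff=64,      # cap the exponential backoff at 64 seconds
    )

Approach 2 would retry at the workflow level instead. A rough sketch, where call_with_retries is a hypothetical helper rather than an existing tripleo-common function:

    import time

    def call_with_retries(fn, attempts=5, delay=1, backoff=2):
        """Call fn(); on failure, sleep and retry with exponential backoff."""
        for attempt in range(1, attempts + 1):
            try:
                return fn()
            except Exception:
                if attempt == attempts:
                    raise  # out of attempts, surface the last error
                time.sleep(delay)
                delay *= backoff

    # e.g. wrapping the swift call that creates the plan container:
    # call_with_retries(lambda: swift.put_container('overcloud'))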


As the issue occurs intermittently in CI, I need a period of continuous CI runs to see whether it goes away.

Comment 2 Ryan Brady 2016-11-02 17:53:14 UTC
According to myoung, the problem no longer occurs since the CI jobs were migrated to different hardware. We can no longer tell whether an applied fix works or not.

