Bug 1386308

Summary: Creating the default plan can fail when communicating with swift. It should be retried
Product: Red Hat OpenStack
Reporter: Matt Young <matyoung>
Component: openstack-tripleo
Assignee: Ryan Brady <rbrady>
Status: CLOSED CANTFIX
QA Contact: Arik Chernetsky <achernet>
Severity: high
Priority: high
Version: 10.0 (Newton)
CC: dmatthew, jcoufal, jschluet, jslagle, mburns, rbrady, rhel-osp-director-maint, whayutin
Target Milestone: ---
Keywords: Automation, AutomationBlocker, Triaged
Target Release: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Last Closed: 2016-11-02 17:53:14 UTC
Type: Bug

Description Matt Young 2016-10-18 15:42:01 UTC
Description of problem:

The bug detailed in https://bugs.launchpad.net/tripleo/+bug/1634195 is causing failures in multiple CI jobs and is impacting rhos-delivery's import of Newton into OSP 10.

The upstream issue is slated for ocata-1, but it is affecting Newton imports now.

How reproducible:

This has reproduced multiple times in CI, in both minimal and HA virt deployments.


Steps to Reproduce:
1. In particular, we notice that this reproduces at a higher rate when the undercloud is under memory pressure.

See private comments for details / links

Actual results:

The overcloud deployment fails.

Expected results:

The overcloud deploys without this error.

Additional info:

#37 @ https://review.rdoproject.org/etherpad/p/rdo-internal-issues

Comment 1 Ryan Brady 2016-10-20 12:36:54 UTC
This bug is in progress. I'm trying two different approaches:

1) adding default retry settings to the swiftclient connection so that it manages retries itself
https://review.openstack.org/#/c/389124/

2) modifying the workflow to perform the retries
https://review.openstack.org/#/c/389124
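
Both approaches come down to retrying a call that fails intermittently. The following is only a minimal sketch of that idea in plain Python, not the code from either review: `TransientError`, `call_with_retries`, `make_flaky`, and the parameter names and defaults are all hypothetical and chosen for illustration.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for an intermittent failure when talking to Swift."""


def call_with_retries(operation, retries=5, starting_backoff=1.0,
                      max_backoff=64.0, sleep=time.sleep):
    """Call operation(); on TransientError, retry with exponential backoff.

    Makes at most retries + 1 attempts, then re-raises the last error.
    """
    backoff = starting_backoff
    attempt = 0
    while True:
        try:
            return operation()
        except TransientError:
            if attempt >= retries:
                raise
            attempt += 1
            sleep(backoff)
            # Double the wait after each failure, up to max_backoff.
            backoff = min(backoff * 2, max_backoff)


def make_flaky(failures):
    """Build a hypothetical operation that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    def create_default_plan():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TransientError("swift unavailable")
        return "plan created"

    return create_default_plan, state
```

For example, `call_with_retries(make_flaky(2)[0])` fails twice, waits between attempts, and succeeds on the third call. Whether the retry lives in the client connection or in the workflow mainly changes where the loop runs, not its shape.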


Because the issue appears only intermittently in CI, I need a period of continuous CI runs to see whether it goes away.

Comment 2 Ryan Brady 2016-11-02 17:53:14 UTC
According to myoung, the problem no longer occurs now that the CI jobs have been migrated to different hardware, so we can no longer tell whether an applied fix works.