Bug 1386308

Summary:	Creating the default plan can fail when communicating with swift. It should be retried
Product:	Red Hat OpenStack	Reporter:	Matt Young <matyoung>
Component:	openstack-tripleo	Assignee:	Ryan Brady <rbrady>
Status:	CLOSED CANTFIX	QA Contact:	Arik Chernetsky <achernet>
Severity:	high	Docs Contact:
Priority:	high
Version:	10.0 (Newton)	CC:	dmatthew, jcoufal, jschluet, jslagle, mburns, rbrady, rhel-osp-director-maint, whayutin
Target Milestone:	---	Keywords:	Automation, AutomationBlocker, Triaged
Target Release:	10.0 (Newton)
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2016-11-02 17:53:14 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Matt Young 2016-10-18 15:42:01 UTC

Description of problem:

The bug detailed in https://bugs.launchpad.net/tripleo/+bug/1634195 is causing failures in multiple CI jobs, and is impacting rhos-delivery's import of newton into OSP 10.

The upstream (u/s) fix is slated for ocata-1, but the issue is already impacting newton imports.

How reproducible:

This has reproduced multiple times in both minimal and HA virt deployments in CI


Steps to Reproduce:
1. No deterministic steps are known. The failure reproduces at a noticeably higher rate when the undercloud is under memory pressure.

See private comments for details / links

Actual results:

failed deployment of overcloud

Expected results:

overcloud deploys without this error

Additional info:

#37 @ https://review.rdoproject.org/etherpad/p/rdo-internal-issues

Comment 1 Ryan Brady 2016-10-20 12:36:54 UTC

This bug is in progress.  I'm trying two different approaches:

1) adding defaults to the swiftclient connection to manage retries
https://review.openstack.org/#/c/389124/

2) modifying the workflow to perform the retries
https://review.openstack.org/#/c/389124


As the issue is found intermittently in CI, I need to do a period of continuous runs in CI to see if the issue goes away.
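
For illustration only, the retry idea behind both approaches can be sketched as a small wrapper that re-runs a flaky call with exponential backoff. This is not the code from either review; `TransientError`, `call_with_retries`, and the parameter names are hypothetical and stand in for whatever error and call the real plan-creation step uses.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for a transient swift failure (e.g. a timeout)."""


def call_with_retries(func, retries=5, starting_backoff=1.0, max_backoff=64.0,
                      sleep=time.sleep):
    """Call func(), retrying on TransientError with exponential backoff.

    Makes at most retries + 1 attempts. The delay doubles after each
    failure, capped at max_backoff. The last error is re-raised once the
    retry budget is exhausted.
    """
    backoff = starting_backoff
    for attempt in range(retries + 1):
        try:
            return func()
        except TransientError:
            if attempt == retries:
                raise  # out of retries: surface the original failure
            sleep(backoff)
            backoff = min(backoff * 2, max_backoff)
```

A caller would wrap the swift operation, for example `call_with_retries(lambda: create_default_plan())`, where `create_default_plan` is again a placeholder name. Only errors known to be transient should be retried, so that genuine failures still surface quickly.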

Comment 2 Ryan Brady 2016-11-02 17:53:14 UTC

According to myoung, the problem no longer occurs since the CI jobs were migrated to different hardware. As a result, we can no longer tell whether an applied fix works.