Bug 1882785 - Multi-Arch CI Jobs destroy libvirt network but occasionally leave it defined
Summary: Multi-Arch CI Jobs destroy libvirt network but occasionally leave it defined
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Multi-Arch
Version: 4.6
Hardware: s390x
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.7.0
Assignee: Jeremy Poulin
QA Contact: Rafael Fonseca
URL:
Whiteboard:
Depends On:
Blocks: 1910158
 
Reported: 2020-09-25 16:53 UTC by Jeremy Poulin
Modified: 2021-02-24 15:21 UTC (History)
3 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1910158
Environment:
Last Closed: 2021-02-24 15:21:16 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Github openshift release pull 14479 0 None closed Bug 1882785: Clean-up leased environments and detect broken state before running remote libvirt jobs. 2021-01-19 23:15:42 UTC
Red Hat Product Errata RHSA-2020:5633 0 None None None 2021-02-24 15:21:56 UTC

Description Jeremy Poulin 2020-09-25 16:53:57 UTC
Description of problem:
This doesn't happen very often, but occasionally teardown fails to fully undefine the libvirt network used for a CI job. This is bad because subsequent jobs that lease that subnet will fail to create a cluster, leaving the lease completely broken until someone manually intervenes.

This seems to be happening far more often with 4.6.
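A cleanup step along the lines of what the linked PR describes (detect leftover networks on the lease before running a job, then destroy/undefine them) might look like this sketch. The `ci-net` name prefix and the helper names are hypothetical, and it assumes `virsh` is on the PATH:

```python
import subprocess

def leftover_networks(net_list_output, prefix):
    """Parse `virsh net-list --all` output and return the names of
    networks matching a (hypothetical) CI lease naming prefix."""
    names = []
    for line in net_list_output.splitlines()[2:]:  # skip the two header lines
        fields = line.split()
        if fields and fields[0].startswith(prefix):
            names.append(fields[0])
    return names

def undefine_network(name):
    """Best-effort teardown: destroy the network if active, then undefine it
    so it no longer blocks future jobs on the same subnet."""
    # net-destroy fails harmlessly if the network is already inactive
    subprocess.run(["virsh", "net-destroy", name], check=False)
    subprocess.run(["virsh", "net-undefine", name], check=True)
```

A pre-job hook could then call `leftover_networks` on the output of `virsh net-list --all` and undefine anything left behind by a previous run, rather than failing mid-install.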

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Dan Li 2020-09-28 13:16:17 UTC
Hi Jeremy, which Target Release are you targeting for this bug (4.6 or 4.7)? Currently it is considered "Untriaged" and it would be great if we can provide a target release.

Comment 2 Dan Li 2020-09-28 17:20:17 UTC
Hi Jeremy, one more logistics question - will this bug be resolved before the end of this Sprint (October 3rd)? If not, can we add "UpcomingSprint"?

Comment 3 Jeremy Poulin 2020-09-28 17:30:53 UTC
Hi Dan - very unlikely to be resolved this week.

We're still in the monitoring phase to determine what causes the underlying problem, so I have added the UpcomingSprint label.

Comment 4 Dan Li 2020-10-19 20:23:46 UTC
Adding "UpcomingSprint" tag as Jeremy is OOTO and this bug is unlikely to be resolved before the end of this Sprint (Oct 24th)

Comment 5 Dan Li 2020-11-12 00:58:03 UTC
Hi Jeremy, will this bug be resolved before the end of this sprint (Nov 14th)? If not, can we add "UpcomingSprint"?

Comment 6 Jeremy Poulin 2020-11-12 15:21:09 UTC
There isn't a clear path forward on how to fix this yet, but this will likely be targeted for post step-registry migration.

Comment 7 Dan Li 2020-12-02 18:45:24 UTC
Hi Jeremy, will this bug be resolved before the end of this sprint (Dec 5th)? If not, can we add "UpcomingSprint"?

Comment 8 Jeremy Poulin 2020-12-02 21:39:02 UTC
This is affected by the work that Deep is doing with the step registry migration. I don't think it will make it into next sprint, but I can see it becoming higher priority once we knock out some of the major stability improvements. Marking this "UpcomingSprint"

Comment 9 Dan Li 2020-12-15 18:30:00 UTC
Hi Jeremy,

I am doing this exercise one week early because most people are out next week. 

1. Do you think this bug will be resolved before the end of this sprint (December 26th)? If not, I'd like to add "UpcomingSprint"
2. Do you think this bug's Target Release is still 4.7.0? If it does not target 4.7, can we set it to blank value "---"?

Comment 10 Jeremy Poulin 2020-12-17 23:36:31 UTC
Adding UpcomingSprint and setting the delivery to blank. I did have a conversation with the test platform team about this bug, but right now it remains lower priority than all our other work.
https://coreos.slack.com/archives/CBN38N3MW/p1608140054245400

Comment 14 errata-xmlrpc 2021-02-24 15:21:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

