Bug 1882785 - Multi-Arch CI Jobs destroy libvirt network but occasionally leave it defined
Summary: Multi-Arch CI Jobs destroy libvirt network but occasionally leave it defined
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Multi-Arch
Version: 4.6
Hardware: s390x
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.7.0
Assignee: Jeremy Poulin
QA Contact: Rafael Fonseca
URL:
Whiteboard:
Depends On:
Blocks: 1910158
 
Reported: 2020-09-25 16:53 UTC by Jeremy Poulin
Modified: 2021-02-24 15:21 UTC (History)
3 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1910158
Environment:
Last Closed: 2021-02-24 15:21:16 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Github openshift release pull 14479 0 None closed Bug 1882785: Clean-up leased environments and detect broken state before running remote libvirt jobs. 2021-01-19 23:15:42 UTC
Red Hat Product Errata RHSA-2020:5633 0 None None None 2021-02-24 15:21:56 UTC

Description Jeremy Poulin 2020-09-25 16:53:57 UTC
Description of problem:
This doesn't happen very often, but occasionally teardown fails to fully undefine the libvirt network used for a CI job. This is bad because subsequent jobs that lease that subnet will fail to create a cluster, leaving the lease completely broken until someone manually intervenes.

This seems to be happening far more often with 4.6.
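A cleanup step along the lines of what the linked PR describes (detect leftover networks on the lease before running a job, then destroy/undefine them) might look like this sketch. The `ci-net` name prefix and the helper names are hypothetical, and it assumes `virsh` is on the PATH:

```python
import subprocess

def leftover_networks(net_list_output, prefix):
    """Parse `virsh net-list --all` output and return the names of
    networks matching a (hypothetical) CI lease naming prefix."""
    names = []
    for line in net_list_output.splitlines()[2:]:  # skip the two header lines
        fields = line.split()
        if fields and fields[0].startswith(prefix):
            names.append(fields[0])
    return names

def undefine_network(name):
    """Best-effort teardown: destroy the network if active, then undefine it
    so it no longer blocks future jobs on the same subnet."""
    # net-destroy fails harmlessly if the network is already inactive
    subprocess.run(["virsh", "net-destroy", name], check=False)
    subprocess.run(["virsh", "net-undefine", name], check=True)
```

A pre-job hook could then call `leftover_networks` on the output of `virsh net-list --all` and undefine anything left behind by a previous run, rather than failing mid-install.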

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Dan Li 2020-09-28 13:16:17 UTC
Hi Jeremy, which Target Release are you targeting for this bug (4.6 or 4.7)? Currently it is considered "Untriaged" and it would be great if we can provide a target release.

Comment 2 Dan Li 2020-09-28 17:20:17 UTC
Hi Jeremy, one more logistics question - will this bug be resolved before the end of this Sprint (October 3rd)? If not, can we add "UpcomingSprint"?

Comment 3 Jeremy Poulin 2020-09-28 17:30:53 UTC
Hi Dan - very unlikely to be resolved this week.

We're still in the monitoring phase to determine what causes the underlying problem, so I have added the UpcomingSprint label.

Comment 4 Dan Li 2020-10-19 20:23:46 UTC
Adding "UpcomingSprint" tag as Jeremy is OOTO and this bug is unlikely to be resolved before the end of this Sprint (Oct 24th)

Comment 5 Dan Li 2020-11-12 00:58:03 UTC
Hi Jeremy, will this bug be resolved before the end of this sprint (Nov 14th)? If not, can we add "UpcomingSprint"?

Comment 6 Jeremy Poulin 2020-11-12 15:21:09 UTC
There isn't a clear path forward on how to fix this yet, but this will likely be targeted for post step-registry migration.

Comment 7 Dan Li 2020-12-02 18:45:24 UTC
Hi Jeremy, will this bug be resolved before the end of this sprint (Dec 5th)? If not, can we add "UpcomingSprint"?

Comment 8 Jeremy Poulin 2020-12-02 21:39:02 UTC
This is affected by the work that Deep is doing with the step registry migration. I don't think it will make it into next sprint, but I can see it becoming higher priority once we knock out some of the major stability improvements. Marking this "UpcomingSprint"

Comment 9 Dan Li 2020-12-15 18:30:00 UTC
Hi Jeremy,

I am doing this exercise one week early because most people are out next week. 

1. Do you think this bug will be resolved before the end of this sprint (December 26th)? If not, I'd like to add "UpcomingSprint"
2. Do you think this bug's Target Release is still 4.7.0? If it does not target 4.7, can we set it to blank value "---"?

Comment 10 Jeremy Poulin 2020-12-17 23:36:31 UTC
Adding UpcomingSprint and setting the delivery to blank. I did have a conversation with the test platform team about this bug, but right now it remains lower priority than all our other work.
https://coreos.slack.com/archives/CBN38N3MW/p1608140054245400

Comment 14 errata-xmlrpc 2021-02-24 15:21:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

