2014220 – After PVCs are deleted PVs are recreated multiple times causing different issues

Summary: After PVCs are deleted PVs are recreated multiple times causing different issues

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Storage
Sub Component:
Version:	4.8
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	aos-storage-staff@redhat.com
QA Contact:	Wei Duan
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-10-14 15:59 UTC by Mario Vázquez
Modified:	2024-12-20 21:24 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-10-20 13:23:48 UTC
Target Upstream Version:
Embargoed:
Flags:	jsafrane: needinfo-

Description Mario Vázquez 2021-10-14 15:59:37 UTC

Description of problem:

Using LSO (Local Storage Operator), we have multiple available PVs. When we bind a PVC to a PV for the first time, everything works fine. When we delete the workload (Deployments + PVCs), we see that the diskmaker recreates the PVs multiple times.

This is causing two issues:

1. The PVs are created and deleted multiple times. At some point a PV becomes available before it is deleted again; if there are PVCs requesting storage, they bind to that PV, which the diskmaker then deletes, moving the PV into "Terminating" status.
2. If you delete the workload after issue 1 has happened, the PVs are likely to get stuck in "Released" state.

We are seeing PVs being recreated more than 5 times.

Version-Release number of selected component (if applicable):

4.8.13

How reproducible:

Always

Steps to Reproduce:

Using the reproducer here: https://gist.github.com/mvazquezc/0dd0dabb673a6d822cd8fa8ccdefe9e0

Issue 1:

1. Run deploy.sh and wait until all pods are running
2. Run clean.sh and wait until all pods are gone
3. Check PVs and you will see that diskmaker is continuously recreating them
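Step 3 can be checked mechanically: a PV that is deleted and recreated by the diskmaker comes back under the same name but as a new object, so it gets a new `metadata.uid`. The following is a minimal sketch, assuming the PV list has already been polled repeatedly (for example with `oc get pv -o json`) and reduced to `(name, uid)` pairs; the function name `count_recreations` and the input shape are hypothetical and not part of LSO.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical input shape: one list of (pv_name, pv_uid) pairs per poll,
# e.g. taken from metadata.name / metadata.uid of `oc get pv -o json` items.
Snapshot = List[Tuple[str, str]]


def count_recreations(snapshots: Iterable[Snapshot]) -> Dict[str, int]:
    """Return, per PV name, how many times it reappeared with a new UID.

    A PV that keeps its UID across polls counts 0; every additional UID
    seen for the same name counts as one recreation.
    """
    uids_seen: Dict[str, List[str]] = defaultdict(list)
    for snapshot in snapshots:
        for name, uid in snapshot:
            if uid not in uids_seen[name]:
                uids_seen[name].append(uid)
    return {name: len(uids) - 1 for name, uids in uids_seen.items()}


if __name__ == "__main__":
    # Illustrative data only, not taken from the attached diskmaker logs.
    polls = [
        [("local-pv-example", "uid-1")],
        [("local-pv-example", "uid-2")],
        [],  # PV briefly absent between delete and recreate
        [("local-pv-example", "uid-3")],
    ]
    print(count_recreations(polls))  # prints {'local-pv-example': 2}
```

Under the expected results in this report (one recreation per released PV with the Delete reclaim policy), each name should show at most 1 after a single deploy/clean cycle; the reported behavior corresponds to counts above 5.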

Issue 2:

1. Run deploy.sh and wait until all pods are running
2. Run clean.sh and wait until all pods are gone
3. Check PVs and you will see that diskmaker is continuously recreating them
4. Run deploy.sh and wait until all pods are running
5. Check the PVs; you will see some in "Terminating" state
6. Run clean.sh and wait until all pods are gone
7. You will see PVs stuck in "Released" state
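The two symptoms of Issue 2 can be told apart from the PV objects themselves. As far as we know, the "Terminating" shown in the STATUS column of `oc get pv` is printed when `metadata.deletionTimestamp` is set, regardless of `status.phase`; "Released" is the phase itself. The sketch below assumes PV items shaped like those returned by `oc get pv -o json`, and a caller that tracks how long each PV has been in Released across polls. The names `display_status`, `issue2_symptoms`, and the 300-second `STUCK_RELEASED_SECONDS` threshold are hypothetical choices for illustration.

```python
from typing import Any, Dict, List

# Hypothetical threshold: how long a PV may sit in Released before we call it "stuck".
STUCK_RELEASED_SECONDS = 300


def display_status(pv: Dict[str, Any]) -> str:
    """Approximate the STATUS column of `oc get pv`: Terminating if a
    deletion timestamp is set, otherwise status.phase."""
    if pv.get("metadata", {}).get("deletionTimestamp"):
        return "Terminating"
    return pv.get("status", {}).get("phase", "Unknown")


def issue2_symptoms(items: List[Dict[str, Any]],
                    released_for: Dict[str, float]) -> Dict[str, List[str]]:
    """Group PV names by the two Issue 2 symptoms.

    `released_for` maps PV name -> seconds the PV has been observed in
    Released (assumed to be tracked by the caller across polls).
    """
    result: Dict[str, List[str]] = {"terminating": [], "stuck_released": []}
    for pv in items:
        name = pv["metadata"]["name"]
        status = display_status(pv)
        if status == "Terminating":
            result["terminating"].append(name)
        elif status == "Released" and released_for.get(name, 0.0) >= STUCK_RELEASED_SECONDS:
            result["stuck_released"].append(name)
    return result


if __name__ == "__main__":
    # Illustrative objects only, shaped like items from `oc get pv -o json`.
    pvs = [
        {"metadata": {"name": "local-pv-a", "deletionTimestamp": "2021-10-19T18:00:00Z"},
         "status": {"phase": "Bound"}},
        {"metadata": {"name": "local-pv-b"}, "status": {"phase": "Released"}},
        {"metadata": {"name": "local-pv-c"}, "status": {"phase": "Available"}},
    ]
    print(issue2_symptoms(pvs, released_for={"local-pv-b": 600.0}))
    # prints {'terminating': ['local-pv-a'], 'stuck_released': ['local-pv-b']}
```

A Bound PV with a deletion timestamp matches step 5 (a PVC bound to a PV the diskmaker is deleting); a PV that stays Released past the threshold matches step 7.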

Actual results:

PVs are continuously re-created
PVs are stuck in Released state

Expected results:

PVs are re-created once (Delete reclaim policy)
PVs are not stuck in Released state

Additional info:

I have an environment where I can reproduce the issue 100% of the time; ping me if you want access to it or want to have a session.

Comment 1 Jan Safranek 2021-10-15 13:55:25 UTC

The symptoms look a bit different, but they should be fixed by #2008088, which should land in 4.8 soon-ish.

*** This bug has been marked as a duplicate of bug 2008088 ***

Comment 2 Mario Vázquez 2021-10-15 15:20:57 UTC

Hey @Jan, it's not fixed by #2008088.

This was actually reproduced using the code that fixes #2008088.

Could you re-open?

Comment 3 Jan Safranek 2021-10-15 16:07:29 UTC

I was able to reproduce it with today's 4.8 LSO: I saw PVs being created and immediately deleted. With LSO from the current 4.8.z errata, I was not able to reproduce the issue.

I am very interested in the diskmaker logs, then.

Comment 4 Jan Safranek 2021-10-19 14:22:25 UTC

Friendly reminder: we're still waiting for diskmaker logs (from a build with #2008088 fixed).

Comment 5 Mario Vázquez 2021-10-19 18:35:26 UTC

Hey @jsafrane, apologies for the delayed response.

I just attached a tar.gz file with three log files:

- diskmakerlogs-before-workload.log -> I deleted the diskmaker when the workload was not deployed and the PVs were available; these are the logs at that point.
- diskmakerlogs-after-workload-before-cleanup.log -> Diskmaker logs after all workload pods were running and the PVs were bound to their respective PVCs.
- diskmakerlogs-after-workload-after-cleanup.log -> Diskmaker logs after the workload was removed from the cluster and after the PVs stopped being re-created (it took around 4m30s for the diskmaker to stop re-creating PVs).

One of the PVs that got re-created several times was local-pv-a154f0f1.

I used the CatalogSource "quay.io/gnufied/gnufied-index:oct4-1120", which, if I understand correctly, was the dev build that QE used to verify the fix on 4.8 (diskmaker image: quay.io/gnufied/local-diskmaker:oct4-1120).

I'm going to test now with LSO from 4.8.14 and will update the BZ.

Comment 7 Mario Vázquez 2021-10-19 18:52:41 UTC

Tested again with the latest LSO from 4.8.14:

local-storage-operator.4.8.0-202110011559   Local Storage   4.8.0-202110011559              Succeeded

I was able to reproduce the issue.

Attached is a new tar.gz with the same logs.

Comment 9 Mario Vázquez 2021-10-20 13:23:48 UTC

Tested with the latest LSO from 4.8.15:

local-storage-operator.4.8.0-202110121407   Local Storage   4.8.0-202110121407   local-storage-operator.4.8.0-202110011559   Succeeded

The issue cannot be reproduced. Closing now.
