Bug 2014220 - After PVCs are deleted PVs are recreated multiple times causing different issues
Summary: After PVCs are deleted PVs are recreated multiple times causing different issues
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: aos-storage-staff@redhat.com
QA Contact: Wei Duan
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-10-14 15:59 UTC by Mario Vázquez
Modified: 2024-12-20 21:24 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-20 13:23:48 UTC
Target Upstream Version:
Embargoed:
jsafrane: needinfo-



Description Mario Vázquez 2021-10-14 15:59:37 UTC
Description of problem:

Using LSO, we have multiple available PVs. When we bind a PVC to a PV for the first time, everything works fine. When we delete the workload (deployments + PVCs), we see that the PVs are recreated multiple times by the diskmaker.

This is causing two issues:

1. The PVs are created and deleted multiple times. At some point a PV becomes available before being deleted again, and if there are PVCs requesting storage, they will bind to that PV. Diskmaker then deletes the PV, which moves it into "Terminating" status.
2. If you delete the workload after issue 1 has happened, the PVs will likely get stuck in "Released" state.

We are seeing PVs being recreated >5 times.


Version-Release number of selected component (if applicable):

4.8.13

How reproducible:

Always

Steps to Reproduce:

Use the reproducer at https://gist.github.com/mvazquezc/0dd0dabb673a6d822cd8fa8ccdefe9e0

Issue 1:

1. Run deploy.sh and wait until all pods are running
2. Run clean.sh and wait until all pods are gone
3. Check PVs and you will see that diskmaker is continuously recreating them
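
One way to quantify the re-creation seen in step 3 is to compare PV UIDs over time: Kubernetes assigns a new `metadata.uid` whenever an object is deleted and created again, even when the name stays the same. The following is a minimal Python sketch, not part of the reproducer. It assumes you collect `(name, uid)` pairs yourself, for example by polling `oc get pv -o json`; the function name `count_recreations` and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def count_recreations(observations: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Return, per PV name, how many times the PV was re-created.

    Each observation is a (pv_name, metadata_uid) pair. A PV that keeps its
    name but appears with a UID not seen before was deleted and created again.
    """
    seen: Dict[str, List[str]] = defaultdict(list)
    for name, uid in observations:
        if uid not in seen[name]:
            seen[name].append(uid)
    # The first UID is the original object; every further UID is a re-creation.
    return {name: len(uids) - 1 for name, uids in seen.items()}


if __name__ == "__main__":
    # Hypothetical observations; names and UIDs are made up for illustration.
    samples = [
        ("local-pv-example", "uid-1"),
        ("local-pv-example", "uid-1"),
        ("local-pv-example", "uid-2"),
        ("local-pv-example", "uid-3"),
        ("local-pv-other", "uid-9"),
    ]
    print(count_recreations(samples))
```

With a Delete reclaim policy, a count of 1 per released PV would match the expected behaviour described in this report; higher counts match the reported symptom.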

Issue 2:

1. Run deploy.sh and wait until all pods are running
2. Run clean.sh and wait until all pods are gone
3. Check PVs and you will see that diskmaker is continuously recreating them
4. Run deploy.sh and wait until all pods are running
5. Check PVs, you will see some in "Terminating" state
6. Run clean.sh and wait until all pods are gone
7. You will see PVs stuck in "Released" state
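
The checks in steps 5 and 7 can be scripted. The sketch below is an illustration under stated assumptions, not a tool from the reproducer: it takes the parsed JSON of `oc get pv -o json` and lists PVs that have a `metadata.deletionTimestamp` set (shown as "Terminating" by the CLI) or a `status.phase` of `Released`. The function name `find_problem_pvs` and the sample objects are hypothetical.

```python
from typing import Any, Dict, List


def find_problem_pvs(pv_list: Dict[str, Any]) -> Dict[str, List[str]]:
    """Group PV names by problem state from a parsed `oc get pv -o json` document."""
    result: Dict[str, List[str]] = {"Terminating": [], "Released": []}
    for item in pv_list.get("items", []):
        meta = item.get("metadata", {})
        name = meta.get("name", "<unknown>")
        if meta.get("deletionTimestamp"):
            # Deletion was requested but the object still exists.
            result["Terminating"].append(name)
        elif item.get("status", {}).get("phase") == "Released":
            result["Released"].append(name)
    return result


if __name__ == "__main__":
    # Hypothetical, trimmed-down PV objects for illustration only.
    sample = {
        "items": [
            {"metadata": {"name": "local-pv-1"}, "status": {"phase": "Bound"}},
            {
                "metadata": {
                    "name": "local-pv-2",
                    "deletionTimestamp": "2021-10-14T16:00:00Z",
                },
                "status": {"phase": "Bound"},
            },
            {"metadata": {"name": "local-pv-3"}, "status": {"phase": "Released"}},
        ]
    }
    print(find_problem_pvs(sample))
```

On a live cluster, the input could be produced with `json.loads` on the output of `oc get pv -o json`.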

Actual results:

PVs are continuously re-created
PVs are stuck in Released state

Expected results:

PVs are re-created once (Delete reclaim policy)
PVs are not stuck in Released state

Additional info:

I have an environment where I can reproduce the issue 100% of the time, ping me if you want to access it / have a session.

Comment 1 Jan Safranek 2021-10-15 13:55:25 UTC
The symptoms look a bit different, but should be fixed by #2008088, which should land in 4.8 soon-ish.

*** This bug has been marked as a duplicate of bug 2008088 ***

Comment 2 Mario Vázquez 2021-10-15 15:20:57 UTC
Hey @Jan, it's not fixed by #2008088.

This was actually reproduced using the code that fixes #2008088.

Could you re-open?

Comment 3 Jan Safranek 2021-10-15 16:07:29 UTC
I was able to reproduce it with today's 4.8 LSO: I saw PVs being created and immediately deleted. With LSO from the current 4.8.z errata I was not able to reproduce the issue.

I am very interested in diskmaker logs then.

Comment 4 Jan Safranek 2021-10-19 14:22:25 UTC
Friendly reminder: we're still waiting for diskmaker logs (from a build with #2008088 fixed).

Comment 5 Mario Vázquez 2021-10-19 18:35:26 UTC
Hey @jsafrane, apologies for the delay in responding.

I just attached a tar.gz file with three log files:

- diskmakerlogs-before-workload.log -> I deleted the diskmaker when the workload was not deployed and the PVs were available; these are the logs at that point.
- diskmakerlogs-after-workload-before-cleanup.log -> Diskmaker logs after all workload pods were running and the PVs were bound to their respective PVCs.
- diskmakerlogs-after-workload-after-cleanup.log -> Diskmaker logs after the workload was removed from the cluster and after PVs stopped being re-created (it took around 4m30s for diskmaker to stop re-creating PVs).

One of the PVs that got re-created several times was local-pv-a154f0f1.


I used the CatalogSource "quay.io/gnufied/gnufied-index:oct4-1120", which iiuc was the dev build that QE used to verify the fix on 4.8 (diskmaker image: quay.io/gnufied/local-diskmaker:oct4-1120).


I'm going to test now with LSO from 4.8.14 and will update the BZ.

Comment 7 Mario Vázquez 2021-10-19 18:52:41 UTC
Tested again with the latest LSO from 4.8.14:

local-storage-operator.4.8.0-202110011559   Local Storage   4.8.0-202110011559              Succeeded

I was able to reproduce the issue.

Attached is a new tar.gz with the same logs.

Comment 9 Mario Vázquez 2021-10-20 13:23:48 UTC
Tested with the latest LSO from 4.8.15:

local-storage-operator.4.8.0-202110121407   Local Storage   4.8.0-202110121407   local-storage-operator.4.8.0-202110011559   Succeeded

The issue cannot be reproduced. Closing now.

