Bug 2014220

Summary: After PVCs are deleted PVs are recreated multiple times causing different issues
Product: OpenShift Container Platform
Component: Storage
Storage sub component: Local Storage Operator
Version: 4.8
Status: CLOSED NOTABUG
Severity: high
Priority: unspecified
Reporter: Mario Vázquez <mavazque>
Assignee: aos-storage-staff <aos-storage-staff>
QA Contact: Wei Duan <wduan>
CC: aos-bugs, jsafrane
Keywords: Reopened
Flags: jsafrane: needinfo-
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Type: Bug
Last Closed: 2021-10-20 13:23:48 UTC

Description Mario Vázquez 2021-10-14 15:59:37 UTC
Description of problem:

Using LSO, we have multiple available PVs. When we bind a PVC to a PV for the first time, everything works fine. When we delete the workload (deployments + PVCs), we see that the PVs are recreated multiple times by the diskmaker.

This is causing two issues:

1. The PVs are created and deleted multiple times. At some point a PV becomes available shortly before diskmaker deletes it again; if PVCs are requesting storage at that moment, they bind to that PV, which diskmaker then deletes, moving the PV into "Terminating" status.
2. If you delete the workload after issue 1 has happened, the PVs will likely get stuck in "Released" state.

We are seeing PVs being recreated >5 times.
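
One way to put a number on "recreated" is to watch the PVs (for example with `oc get pv --watch`) and count how many distinct `metadata.uid` values show up under the same PV name, since a deleted and recreated object gets a new UID. The following is only a minimal sketch of that counting step: the function name, the `(name, uid)` input shape, and the sample PV names and UIDs are hypothetical and not taken from the cluster in this report.

```python
from collections import defaultdict


def count_recreations(observations):
    """Count how often each PV name reappeared with a new UID.

    `observations` is an iterable of (pv_name, uid) pairs in the order
    they were seen. A PV that kept a single UID has 0 recreations.
    """
    uids_by_name = defaultdict(list)
    for name, uid in observations:
        # Record each UID once per name; repeated sightings are ignored.
        if uid not in uids_by_name[name]:
            uids_by_name[name].append(uid)
    return {name: len(uids) - 1 for name, uids in uids_by_name.items()}


# Hypothetical sample data, for illustration only.
observations = [
    ("pv-example-a", "uid-1"),
    ("pv-example-a", "uid-1"),
    ("pv-example-a", "uid-2"),
    ("pv-example-a", "uid-3"),
    ("pv-example-b", "uid-9"),
]
print(count_recreations(observations))
# prints {'pv-example-a': 2, 'pv-example-b': 0}
```

With the Delete reclaim policy, the expected count per PV after one cleanup would be 1; the behaviour described here would show up as counts above 5.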


Version-Release number of selected component (if applicable):

4.8.13

How reproducible:

Always

Steps to Reproduce:

Use the reproducer at https://gist.github.com/mvazquezc/0dd0dabb673a6d822cd8fa8ccdefe9e0

Issue 1:

1. Run deploy.sh and wait until all pods are running
2. Run clean.sh and wait until all pods are gone
3. Check PVs and you will see that diskmaker is continuously recreating them

Issue 2:

1. Run deploy.sh and wait until all pods are running
2. Run clean.sh and wait until all pods are gone
3. Check PVs and you will see that diskmaker is continuously recreating them
4. Run deploy.sh and wait until all pods are running
5. Check PVs, you will see some in "Terminating" state
6. Run clean.sh and wait until all pods are gone
7. You will see PVs stuck in "Released" state
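
The "Check PVs" steps above could be scripted against the output of `oc get pv -o json`. The sketch below assumes the usual PersistentVolume fields: a PV with `metadata.deletionTimestamp` set is what the CLI displays as "Terminating", and `status.phase` carries "Released". The function name and the sample PV names are hypothetical, and this is not part of the linked reproducer.

```python
import json


def classify_pvs(pv_list_json):
    """Group PV names into the two problem states from this report.

    `pv_list_json` is the JSON text of a PV list, as printed by
    `oc get pv -o json`. PVs in any other state are ignored.
    """
    result = {"Terminating": [], "Released": []}
    for pv in json.loads(pv_list_json).get("items", []):
        meta = pv.get("metadata", {})
        name = meta.get("name", "<unnamed>")
        if meta.get("deletionTimestamp"):
            # Deletion was requested but the object still exists.
            result["Terminating"].append(name)
        elif pv.get("status", {}).get("phase") == "Released":
            result["Released"].append(name)
    return result


# Hypothetical sample list, for illustration only.
sample = json.dumps({
    "items": [
        {"metadata": {"name": "pv-example-a",
                      "deletionTimestamp": "2021-10-14T15:00:00Z"},
         "status": {"phase": "Bound"}},
        {"metadata": {"name": "pv-example-b"},
         "status": {"phase": "Released"}},
        {"metadata": {"name": "pv-example-c"},
         "status": {"phase": "Available"}},
    ]
})
print(classify_pvs(sample))
# prints {'Terminating': ['pv-example-a'], 'Released': ['pv-example-b']}
```

Running a check like this after step 5 and again after step 7 would record which PVs ended up in each state.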

Actual results:

PVs are continuously re-created
PVs are stuck in Released state

Expected results:

PVs are re-created once (Delete reclaim policy)
PVs are not stuck in Released state

Additional info:

I have an environment where I can reproduce the issue 100% of the time, ping me if you want to access it / have a session.

Comment 1 Jan Safranek 2021-10-15 13:55:25 UTC
The symptoms look a bit different, but should be fixed by #2008088, which should land in 4.8 soon-ish.

*** This bug has been marked as a duplicate of bug 2008088 ***

Comment 2 Mario Vázquez 2021-10-15 15:20:57 UTC
Hey @Jan, it's not fixed by #2008088.

This was actually reproduced using the code that fixes #2008088.

Could you re-open?

Comment 3 Jan Safranek 2021-10-15 16:07:29 UTC
I was able to reproduce it with today's 4.8 LSO: I saw PVs being created and immediately deleted. With LSO from the current 4.8.z errata I was not able to reproduce the issue.

I am very interested in diskmaker logs then.

Comment 4 Jan Safranek 2021-10-19 14:22:25 UTC
Friendly reminder: we're still waiting for diskmaker logs (from a build with #2008088 fixed).

Comment 5 Mario Vázquez 2021-10-19 18:35:26 UTC
Hey @jsafrane, apologies for the delayed response.

I just attached a tar.gz file with three log files:

- diskmakerlogs-before-workload.log -> I deleted the diskmaker when the workload was not deployed and PVs were available; these are the logs at that point.
- diskmakerlogs-after-workload-before-cleanup.log -> Diskmaker logs after all workload pods were running and PVs were bound to their respective pvcs.
- diskmakerlogs-after-workload-after-cleanup.log -> Diskmaker logs after workload was removed from the cluster and after PVs stopped being re-created (It took around 4m30s for diskmaker to stop re-creating PVs).

One of the PVs that got re-created several times was local-pv-a154f0f1.


I used the CatalogSource "quay.io/gnufied/gnufied-index:oct4-1120" which iiuc was the dev build that QE used to verify the fix on 4.8 (diskmaker image: quay.io/gnufied/local-diskmaker:oct4-1120)


I'm going to test now with LSO from 4.8.14 and will update the BZ.

Comment 7 Mario Vázquez 2021-10-19 18:52:41 UTC
Tested again with the latest LSO from 4.8.14:

local-storage-operator.4.8.0-202110011559   Local Storage   4.8.0-202110011559              Succeeded

I was able to reproduce the issue.

Attached is a new tar.gz with the same logs.

Comment 9 Mario Vázquez 2021-10-20 13:23:48 UTC
Tested with the latest LSO from 4.8.15:

local-storage-operator.4.8.0-202110121407   Local Storage   4.8.0-202110121407   local-storage-operator.4.8.0-202110011559   Succeeded

The issue cannot be reproduced. Closing now.