Bug 2109662 - [RHCS][GSS][OCS/ODF] Cephfs file lock is not being released after pod restart [NEEDINFO]
Summary: [RHCS][GSS][OCS/ODF] Cephfs file lock is not being released after pod restart
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: csi-driver
Version: 4.10
Hardware: All
OS: All
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Nobody
QA Contact: krishnaram Karthick
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-07-21 18:26 UTC by Anton Mark
Modified: 2023-08-09 16:37 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-06-06 13:23:46 UTC
Embargoed:
mrajanna: needinfo? (amark)



Description Anton Mark 2022-07-21 18:26:58 UTC
Description of problem:
Possible stale file lock issue with CephFS PV used by an IBM MQ Queue Manager running in OpenShift Container Platform.

Version-Release number of selected component (if applicable):
OCS/ODF 4.8, 4.9 and 4.10
RHCS 5.0 (16.2.0-152.el8cp)
RHCS 4.2 (14.2.11-208.el8cp)

How reproducible:
Can be reproduced by killing the mds, mon and osd pods. It also occurs during OCS/ODF operator upgrades, though not predictably.

Steps to Reproduce:
- Provision/install an OCP 4.8, 4.9 or 4.10 cluster.
- Install the corresponding version of OCS/ODF (details of how we configured OCS are included below).
- Install ibm-operator-catalog and the IBM MQ operator: https://www.ibm.com/docs/en/ibm-mq/9.3?topic=iumorho-installing-mq-operator-using-red-hat-openshift-web-console
- Install at least 2 multi-instance (MI) queue managers (QMs). This only increases the chance of reproducing the issue; 1 should suffice.
- Upgrade the OCS/ODF operator.



Actual results:
When the error occurs, the active container has restarted but its active lock has not been released. Neither running container holds the lock, yet neither can acquire it. The standby remains the standby because it sees the active lock as taken. The container that was previously active comes up, fails to get any of the 3 locks, restarts, and tries again, indefinitely.
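
For readers unfamiliar with lock-based active/standby election, the following is a minimal sketch of the general pattern, not IBM MQ's actual implementation. The function names and the lock file path are hypothetical. It uses flock() advisory locks so it can be exercised in a single process; the primitive MQ really uses on CephFS is not stated in this report and may differ.

```python
import fcntl
import os


def try_acquire(path):
    """Try to take an exclusive, non-blocking advisory lock on `path`.

    Returns the open file descriptor if the lock was obtained,
    or None if another open file description already holds it.
    (Hypothetical helper for illustration only.)
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Lock is held elsewhere: the caller stays (or becomes) standby.
        os.close(fd)
        return None
    return fd


def release(fd):
    """Drop the lock and close the descriptor."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```

In this pattern the lock is expected to disappear when the holder's descriptor is closed or the holder's client session ends. The symptom described above corresponds to `try_acquire` returning None for every instance even though no running instance holds the lock.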


Expected results:
The active lock should be released so that the standby container is able to assume the master role when needed.


Additional info:
Reproducer can be provided.

Comment 19 Madhu Rajanna 2022-08-23 07:44:16 UTC
> From the supportshell, I cannot see the PV mentioned above. Can you point me in the right direction or upload the latest ODF must-gather?

local-pv-5ac92a54.yaml                         pvc-42b13dbb-eaa4-437d-8090-b810f3ec61fd.yaml  pvc-ab97441e-51b7-4027-b6a7-bfa427fdc67b.yaml
local-pv-b1da38b.yaml                          pvc-591c34f6-ba4d-4d1b-bc24-e83a6f0262e9.yaml  pvc-b10cb295-eb30-4b58-81b1-6811937bf313.yaml
local-pv-cf57db52.yaml                         pvc-611d68c6-d612-4c0a-a045-80e8d4200267.yaml  pvc-baa3af25-226f-45ec-b01a-e6b05acf08d9.yaml
local-pv-e88b8859.yaml                         pvc-7054dbc1-4704-40b3-8b51-d5605796ef5d.yaml  pvc-c586c046-4736-424f-9aeb-54fbc4b29fab.yaml
local-pv-f7989845.yaml                         pvc-8616bd57-3de9-4f8a-aad1-238632c5010c.yaml  pvc-e730a504-70d7-4e0b-b0fe-4059ac71ba38.yaml
pvc-1413481c-4c76-4d5f-8479-748aab8a9304.yaml  pvc-87e7d939-dfbb-4f36-bcb9-804563b0df15.yaml  pvc-eeaeb83b-a8b8-4e92-ad6d-8abda7d73119.yaml
pvc-2eecbaf0-952e-4a8c-8da7-69a40f23c746.yaml  pvc-a68972e9-5d33-497b-9a4b-885355615e58.yaml  pvc-f8cf7550-1387-483d-a316-c53e76579556.yaml
pvc-3d7b7f22-c45f-4c72-85a5-635d27062fd5.yaml  pvc-ab0bdea1-4c3d-4d07-91a2-cd3e9d3b1216.yaml  registry-storage.yaml

cat 0070-410-must-gather-2.tar.gz/must-gather/registry-redhat-io-odf4-ocs-must-gather-rhel8-sha256-65c4b98de58f052d1e2650fc356a3e828364b048289a38186564c95b1a5f7a85/cluster-scoped-resources/core/persistentvolumes/
local-pv-4ffb47ae.yaml                         pvc-572312e4-67a3-4317-8355-10cf0e4b6a87.yaml  pvc-b1650d3f-69f0-4f90-86f0-59e8839490e3.yaml
local-pv-7e58ce2d.yaml                         pvc-602cc60a-57c6-4b38-887d-67546cb8c518.yaml  pvc-ce340ea4-edb1-4067-9b56-eb28dddf2d70.yaml
local-pv-acf29df.yaml                          pvc-64fe8bf7-4ef6-4802-8f08-83d7b53554fd.yaml  pvc-da12c4bc-2653-46e4-ad77-901082034ffa.yaml
local-pv-d0fc14c8.yaml                         pvc-7d542bc5-d8e5-4a87-b5e8-a181d7b70d83.yaml  registry-storage.yaml
local-pv-decf0004.yaml                         pvc-9c801387-126e-41a9-8f87-7df921f5847e.yaml  
pvc-1922b6e2-f884-420c-a15e-0fdb44f11e8d.yaml  pvc-b0019ba7-889e-4e07-acbc-090875686531.yaml

