Bug 1447312 - [3.3] During cluster issues, node left mounts on the nodes referring to Persistent Storage, these mounts removed all the data from the PVs
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.3.1
Assigned To: hchen
QA Contact: Wenqi He
Duplicates: 1447765
Depends On:
Blocks: 1447763 1447764 1447765 1447767 1447768
Reported: 2017-05-02 08:10 EDT by Vladislav Walek
Modified: 2017-06-15 14:38 EDT (History)
3 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: During cluster issues, stale mounts referring to persistent storage were left behind on the nodes, and node cleanup of those mounts removed all the data from the PVs. Consequence: Data on the PVs was deleted. Fix: Cleaning up pods on the nodes no longer deletes data on PVs. Result: Data on the PVs is no longer deleted during pod cleanup.
Story Points: ---
Clone Of:
Clones: 1447763
Environment:
Last Closed: 2017-06-15 14:38:21 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Vladislav Walek 2017-05-02 08:10:33 EDT
Description of problem:

We lost a node yesterday, so the remaining nodes became overloaded, started paging, etc., and began shutting down/starting pods.

Once things had calmed down, we noticed that all our PVs were empty.

The issue appeared to be leftover mounts, e.g.:

[root@node5 ~]# df -h|grep jenkins
node3:/nfs/pv0014               20G  7.9G   13G  40% /var/lib/origin/openshift.local.volumes/pods/57cc2362-2a72-11e7-8826-00505680f0  s.io~nfs/jenkins-ci-pv0001
node3:/nfs/jenkins              20G  7.9G   13G  40% /var/lib/origin/openshift.local.volumes/pods/5aa261d7-2a72-11e7-8826-00505680f0  s.io~nfs/jenkins-pv0001

Even touching a file in a PV was followed by its immediate removal. Once we unmounted the old mounts, we were able to re-populate the PVs with the data from backup.
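The report does not say exactly how the stale mounts were found; a minimal sketch of that step, assuming the default OCP 3.x kubelet volume directory (the function name and the decision to only list, never auto-unmount, are ours):

```shell
#!/bin/sh
# Default pod-volume directory on an OCP 3.x node (assumption for this sketch).
VOLDIR=/var/lib/origin/openshift.local.volumes/pods

# Filter `mount`-style lines ("src on mountpoint type fstype (opts)") from
# stdin and print only mount points under the kubelet pod-volume directory.
list_stale_mounts() {
    awk -v d="$VOLDIR" '$2 == "on" && index($3, d) == 1 { print $3 }'
}

# On a live node one would run:   mount | list_stale_mounts
# and only after confirming no running pod still uses a given mount point:
#   umount <mountpoint>
```

Piping `mount` output through a filter rather than grepping `df` avoids matching on the PV name, so it also catches leftovers for pods you did not know about.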

Version-Release number of selected component (if applicable):
OpenShift Container Platform 3.3.0

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:


Comment 7 Eric Paris 2017-05-11 11:52:44 EDT
*** Bug 1447765 has been marked as a duplicate of this bug. ***
Comment 8 Eric Paris 2017-05-11 11:53:09 EDT
https://github.com/openshift/ose/pull/728
Comment 10 Wenqi He 2017-05-24 04:18:54 EDT
Verified on v3.3.1.28

Steps:
1. Create an NFS PV and create an app using the jenkins-persistent template.
2. Once the Jenkins pod is running, bring down the node by stopping its node service.
3. The pod is redeployed to another schedulable node.
4. After the pod is running on the other node, start the node service on the original node again.
5. On the original node, verify that the Jenkins pod volume is successfully unmounted and the pod directory is removed, while all data in the persistent volume is still intact.
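The final check in step 5 can be sketched as a small helper; the pod-volume path matches the `df` output earlier in this bug, but the helper name and arguments are ours, not the verifier's actual commands:

```shell
#!/bin/sh
# Default pod-volume directory on an OCP 3.x node (assumption for this sketch).
POD_DIR=/var/lib/origin/openshift.local.volumes/pods

# Returns 0 (pass) when the pod's volume directory is gone from the node
# AND the NFS export backing the PV still contains data.
cleanup_ok() {
    pod_uid=$1 nfs_export=$2
    [ ! -d "$POD_DIR/$pod_uid" ] && [ -n "$(ls -A "$nfs_export" 2>/dev/null)" ]
}

# Example (pod UID from `oc get pod <name> -o yaml`, export path from the PV):
#   cleanup_ok 57cc2362-2a72-11e7-8826-00505680f0 /nfs/pv0014 && echo "fix verified"
```

Before the fix, the second condition is what failed: the pod directory was removed, but the recursive cleanup also emptied the NFS export behind it.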
Comment 12 errata-xmlrpc 2017-06-15 14:38:21 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1425
