Bug 1447312

Summary: [3.3] During cluster issues, stale mounts referring to persistent storage were left on the nodes, and pod cleanup through these mounts removed all the data from the PVs
Product: OpenShift Container Platform
Reporter: Vladislav Walek <vwalek>
Component: Storage
Assignee: hchen
Status: CLOSED ERRATA
QA Contact: Wenqi He <wehe>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.3.0
CC: aos-bugs, bchilds, eparis
Target Milestone: ---
Target Release: 3.3.1
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: During cluster issues, stale mounts referring to persistent storage were left on the nodes, and pod cleanup removed all the data from the PVs through these mounts. Consequence: Data on the PVs was deleted. Fix: Cleaning up pods on the nodes no longer deletes data on PVs. Result: Data on the PVs is no longer deleted during pod cleanup.
Story Points: ---
Clone Of:
Clones: 1447763 (view as bug list)
Environment:
Last Closed: 2017-06-15 18:38:21 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1447763, 1447764, 1447765, 1447767, 1447768

Description Vladislav Walek 2017-05-02 12:10:33 UTC
Description of problem:

We lost a node yesterday, so the remaining nodes became overloaded, started paging, and began shutting down/starting pods.

Once things had calmed down, we noticed that all our PVs were empty.

The issue seemed to be leftover mounts, e.g.:

[root@node5 ~]# df -h|grep jenkins
node3:/nfs/pv0014               20G  7.9G   13G  40% /var/lib/origin/openshift.local.volumes/pods/57cc2362-2a72-11e7-8826-00505680f0  s.io~nfs/jenkins-ci-pv0001
node3:/nfs/jenkins              20G  7.9G   13G  40% /var/lib/origin/openshift.local.volumes/pods/5aa261d7-2a72-11e7-8826-00505680f0  s.io~nfs/jenkins-pv0001

Even touching a file in the PV was followed by its immediate removal. Once we unmounted the stale mounts, we were able to re-populate the PVs with data from backup.
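
For reference, a hedged sketch of this kind of recovery: locate the leftover kubelet volume mounts and unmount them before restoring data. The pod UID and PV name below are placeholders, not the exact values from this incident:

  # list leftover NFS mounts under the kubelet volume directory
  mount | grep /var/lib/origin/openshift.local.volumes/pods

  # unmount each stale entry, then restore PV data from backup
  umount /var/lib/origin/openshift.local.volumes/pods/<pod-uid>/volumes/kubernetes.io~nfs/<pv-name>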

Version-Release number of selected component (if applicable):
OpenShift Container Platform 3.3.0

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 7 Eric Paris 2017-05-11 15:52:44 UTC
*** Bug 1447765 has been marked as a duplicate of this bug. ***

Comment 8 Eric Paris 2017-05-11 15:53:09 UTC
https://github.com/openshift/ose/pull/728
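
Per the Doc Text above, the fix makes pod cleanup on the nodes stop deleting PV data; in other words, a volume directory must never be recursively deleted while it is still a live mount. A simplified shell illustration of that principle (not the actual patch; the path placeholders are hypothetical):

  # illustration only: never rm -rf through a live mount
  voldir=/var/lib/origin/openshift.local.volumes/pods/<pod-uid>/volumes/kubernetes.io~nfs/<pv-name>
  if mountpoint -q "$voldir"; then
      umount "$voldir"        # detach the NFS export first
  fi
  rm -rf "$voldir"            # removal now only touches the empty local directory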

Comment 10 Wenqi He 2017-05-24 08:18:54 UTC
Verified on v3.3.1.28

Steps:
1. Create an NFS PV and create an app from the jenkins-persistent template (see the sketch after these steps).
2. Once the Jenkins pod is running, bring down the node by stopping its node service.
3. The pod will be redeployed to another schedulable node.
4. After the pod is deployed to the other node, restart the stopped node service.
5. On the original node, verify that the Jenkins pod volume is successfully unmounted and the pod directory is removed, while all data in the persistent volume is still there.
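
A hedged sketch of steps 1-2; the NFS server, export path, and PV name are illustrative placeholders:

  # hypothetical NFS-backed PV
  oc create -f - <<EOF
  apiVersion: v1
  kind: PersistentVolume
  metadata:
    name: jenkins-pv0001
  spec:
    capacity:
      storage: 5Gi
    accessModes:
    - ReadWriteOnce
    nfs:
      server: nfs.example.com
      path: /nfs/pv0001
  EOF

  # deploy Jenkins from the persistent template
  oc new-app jenkins-persistent

  # on the node running the pod, stop the node service
  systemctl stop atomic-openshift-node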

Comment 12 errata-xmlrpc 2017-06-15 18:38:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1425