Bug 1463393 - Upgrade process may result in data deletion if running older versions of openshift.
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 3.2.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 3.6.z
Assignee: Scott Dodson
QA Contact: Anping Li
URL:
Whiteboard:
Duplicates: 1443623
Depends On:
Blocks:
 
Reported: 2017-06-20 18:49 UTC by Bradley Childs
Modified: 2023-09-14 03:59 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-15 12:35:56 UTC
Target Upstream Version:
Embargoed:



Description Bradley Childs 2017-06-20 18:49:49 UTC
Description of problem: There was a previous bug where, if a volume failed to unmount (timeout or otherwise), the kubelet would recursively delete the mount point regardless. This led to data being deleted on volumes.

This has been fixed so that the recursive delete no longer crosses file systems. However, if a user running an old kubelet upgrades from a version of OpenShift missing the fix via the documented upgrade path, this data-deletion code path may be triggered when the node is drained, and data loss may still occur. For example: https://bugzilla.redhat.com/show_bug.cgi?id=1443623
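
For illustration only (this is not the actual kubelet code), the class of fix amounts to refusing to cross a filesystem boundary while cleaning up a pod directory, so a still-mounted volume is skipped instead of deleted. A minimal shell sketch, assuming GNU coreutils and an example pod directory path:

  POD_DIR="/var/lib/origin/openshift.local.volumes/pods/example-uid"   # example path
  parent_dev=$(stat -c %d "$POD_DIR") || exit 1
  for entry in "$POD_DIR"/*; do
    [ -e "$entry" ] || continue
    if [ "$(stat -c %d "$entry")" != "$parent_dev" ]; then
      echo "skipping $entry: different filesystem, probably still mounted" >&2
      continue
    fi
    # GNU rm's --one-file-system additionally refuses to cross mounts below this point
    rm -rf --one-file-system -- "$entry"
  done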

Talking with sjennings on IRC, he said that for z-stream releases, while risky, we could replace the node binary and restart the service instead of draining the node. This approach would not be advisable for major releases, or even as a permanent upgrade process for z-streams, and should only be used as required.

To add context to sjennings' concern about using it permanently: he said that the kubelet may have trouble rebuilding its state from the previous version if paths or file naming changed.


The data deletion bug is fixed in these versions:
v3.2.1.31 
v3.3.1.28
v3.4.1.27 
v3.5.5.18 
v3.6

Any z-stream older than those listed should do the binary replace, at least up to the version listed; after that, the node-drain upgrade process can be followed.
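
A minimal sketch of that interim step for an RPM-based node, assuming the OCP 3.x package and service name (atomic-openshift-node); the exact target version must come from the list above for your stream:

  # update the node package in place to the fixed z-stream, *without* draining first
  yum update -y atomic-openshift-node-3.2.1.31\*   # example for the 3.2 stream
  systemctl restart atomic-openshift-node
  # only once the node is running the fixed version, proceed with the normal
  # drain-based upgrade playbooks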

Additionally, the upgrade docs need to be updated to instruct users that they must upgrade to the most recent z-stream patch before upgrading to the next major release (3.2.1.29 --> 3.2.1.31, then 3.3, 3.4, 3.5, etc.).



REF:
https://docs.openshift.com/container-platform/3.3/install_config/upgrading/automated_upgrades.html
https://bugzilla.redhat.com/show_bug.cgi?id=1447764
https://bugzilla.redhat.com/show_bug.cgi?id=1447312
https://bugzilla.redhat.com/show_bug.cgi?id=1447767

Comment 1 Scott Dodson 2017-06-20 19:21:43 UTC
The proposed solution is that, as part of the node upgrade, we update the package or container and restart the service if the current version falls into one of the problematic ranges, then perform the normal upgrade procedures.

3.2.0 < $current_version < 3.2.1.31
3.3.0 < $current_version < 3.3.1.28
3.4.0 < $current_version < 3.4.1.27
3.5.0 < $current_version < 3.5.5.18
$current_version < 3.5.5.18 AND $target_version >= 3.6

This means that if the version is in one of the problematic ranges, the node process will be restarted twice during an upgrade: once for this special task, and then once as part of the normal drain-and-upgrade process.
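
A rough sketch of that gate, assuming the atomic-openshift-node RPM carries the node version (the package name and query are assumptions; the playbook may determine the version differently):

  # ver_lt A B: true if version A sorts before version B (GNU sort -V)
  ver_lt() { [ "$1" != "$2" ] && [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]; }

  current=$(rpm -q --qf '%{VERSION}' atomic-openshift-node)   # e.g. 3.2.1.29
  case "$current" in
    3.2.*) fixed=3.2.1.31 ;;
    3.3.*) fixed=3.3.1.28 ;;
    3.4.*) fixed=3.4.1.27 ;;
    3.5.*) fixed=3.5.5.18 ;;
    *)     fixed= ;;
  esac
  if [ -n "$fixed" ] && ver_lt "$current" "$fixed"; then
    echo "in-place update and restart required before draining this node"
  fi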

Comment 2 Eric Paris 2017-06-20 20:36:21 UTC
Isn't it the new node that starts, finds things in /var/lib/origin-volumes (or whatever), and actually does the bad rm? If so, isn't this just fixed by updating to the latest? It doesn't matter where you come from; it only matters where you go to.

Also, if that is the case, we could more easily work around things by stopping the node, running umount -l on everything we still have mounted, then starting the new node. That way the new node couldn't see those mounts and delete them. (It also could never properly unmount or re-use them, but that's a different problem, and one I'd rather have instead of known data deletion...)
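
A minimal sketch of that workaround, assuming the OCP 3.x service name and the default pod-volumes directory (both are examples; adjust for the installation):

  systemctl stop atomic-openshift-node
  # lazily detach anything still mounted under the pod-volumes tree so the new
  # node never sees those mounts (and therefore cannot delete their contents)
  findmnt -rn -o TARGET | grep '^/var/lib/origin/openshift.local.volumes/' | \
    while read -r m; do umount -l "$m"; done
  systemctl start atomic-openshift-node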

Comment 3 Hemant Kumar 2017-06-20 22:31:26 UTC
After some pre-defined interval, stopping the node will automatically trigger migration of pods to another node. The migration may trigger the code that cleans up orphaned pod directories and thereby trigger the bug.

Comment 4 Hemant Kumar 2017-06-20 23:31:50 UTC
Minor correction - while stopping the atomic-openshift-node process will trigger migration of pods to another node, it will not trigger the orphan pod directory deletion code (because the node process is not running!). Sorry, I had a brain f**t for a moment.


Anyway, stopping the node will not stop the containers running on it, so we would be performing unmounts on volumes that are still in use.

Comment 5 Eric Paris 2017-06-21 12:46:51 UTC
The playbooks run `drain` before they stop the node. So there will be no running pods.

Comment 6 Hemant Kumar 2017-06-21 13:15:08 UTC
Yes, but drain causes pod eviction, and that triggers the CleanOrphanPodDir routine (for all evicted pods). Evicted pods are treated differently than deleted pods, and somehow we are calling CleanOrphanPodDir for all evicted pods. I have only verified this through experimentation; I haven't dug through the code yet.

So if an NFS volume somehow didn't unmount properly during drain, the CleanOrphanPodDir routine *will* delete data on versions without the fix. We have 2 bugs here:

a. Volumes not getting unmounted on pod eviction. 
b. Data getting deleted on CleanOrphanPodDir call. 

So, it matters which version we are upgrading from as well.

Comment 7 Scott Dodson 2017-06-21 18:53:41 UTC
The decision is that we'll do the forced upgrade and then follow with the normal upgrade steps afterwards. I think this means we'll need to refactor the playbook to move the drain operation into the openshift_node_upgrade role, but I think we should do that anyway. I'll get someone working on this tomorrow.

Comment 8 Bradley Childs 2017-06-21 20:53:40 UTC
*** Bug 1443623 has been marked as a duplicate of this bug. ***

Comment 9 Scott Dodson 2017-06-23 13:34:33 UTC
I just realized the proposal would not account for blue-green upgrades, where the service is never upgraded; instead, new hosts are provisioned running the new version and the old hosts are removed from the cluster in this manner (per the docs; we don't actually automate this):

$ oadm manage-node --selector=color=blue --evacuate
$ oc delete node --selector=color=blue

I assume this would trigger the problem?

Comment 10 Hemant Kumar 2017-06-26 14:33:02 UTC
You mean the old hosts are removed from the cluster without ever being upgraded? In that case - yes, those old hosts must have their version updated (at least to a fixed z-stream version) before draining them.
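
A sketch of the safe ordering for the blue/green case, reusing the commands from comment 9 (the package name and target version are examples; pick the fixed z-stream for the stream the blue hosts are running):

  # on each blue host: update the node in place to a fixed z-stream, then restart
  yum update -y atomic-openshift-node-3.5.5.18\*
  systemctl restart atomic-openshift-node
  # only then evacuate and remove the blue hosts
  oadm manage-node --selector=color=blue --evacuate
  oc delete node --selector=color=blue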

Comment 11 Scott Dodson 2017-08-15 12:35:56 UTC
This was handled via a release notes entry that advises users to perform a manual update prior to upgrading because it was determined that amending the playbooks was prohibitively complex.

Comment 12 Red Hat Bugzilla 2023-09-14 03:59:30 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

