Bug 1122271 - oo-admin-move (and other rsync invocations) should preserve sparse files
Summary: oo-admin-move (and other rsync invocations) should preserve sparse files
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 2.1.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Brenton Leanhardt
QA Contact: libra bugs
URL:
Whiteboard:
Depends On: 1122166
Blocks:
 
Reported: 2014-07-22 20:14 UTC by Brenton Leanhardt
Modified: 2014-09-11 20:06 UTC
CC List: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, sparse files were not preserved when gears moved between nodes, which caused unnecessary disk usage. This bug fix adds the proper flags to related rsync operations which help prevent the issue going forward. After applying this fix, a cartridge migration is required and the MCollective service must be restarted. See https://access.redhat.com/solutions/1179673 for steps to fix existing gears previously using sparse files that have exceeded their quota as a result of this issue.
Clone Of: 1122166
Environment:
Last Closed: 2014-09-11 20:06:36 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2014:1183 0 normal SHIPPED_LIVE Red Hat OpenShift Enterprise 2.1.6 bug fix and enhancement update 2014-09-12 00:06:21 UTC

Description Brenton Leanhardt 2014-07-22 20:14:15 UTC
+++ This bug was initially created as a clone of Bug #1122166 +++

Description of problem:

When a gear containing sparse files is moved with oo-admin-move, sparseness is not preserved, which causes unnecessary disk usage.  This can also cause a gear to exceed its quota, since quotas are disabled during the move.

Version-Release number of selected component (if applicable):

rubygem-openshift-origin-msg-broker-mcollective-1.26.2-1.el6oso.noarch

How reproducible:

Always

Steps to Reproduce:
1.  create an app
2.  ssh into the app
3.  dd if=/dev/zero of=~/app-root/data/test seek=1000 bs=1M count=1
4.  check the quota and "du -k ~/app-root/data/test"
5.  oo-admin-move the gear
6.  check the quota and "du -k ~/app-root/data/test" again
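
For reference, a quick way to confirm that step 3 actually produced a sparse file (a sketch using standard coreutils; the gap between apparent size and allocated blocks is what the move should preserve):

    # Apparent size is ~1001M (seek=1000 plus count=1 at bs=1M), but only
    # about 1 MiB of blocks is allocated; a large gap confirms sparseness.
    ls -lh ~/app-root/data/test
    du -k ~/app-root/data/test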

Actual results:

The quota and the disk usage of the test file in particular will both increase by 1000 MB after the move.

Expected results:

The quota should be approximately the same before and after the move.

Additional info:

There are several other places in the code where rsync is called. These should all be evaluated to determine whether adding "-S" (and possibly "-H") is appropriate.
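
For illustration, a hedged sketch of the flags in question (the paths and $GEAR_UUID variable are hypothetical; this is not the exact invocation oo-admin-move uses):

    # -a  archive mode (recursion; preserves permissions, ownership, times)
    # -S  (--sparse) turn runs of zeros back into holes on the destination
    # -H  (--hard-links) preserve hard links among the transferred files
    rsync -a -S -H /var/lib/openshift/$GEAR_UUID/ target-node:/var/lib/openshift/$GEAR_UUID/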

--- Additional comment from Andy Grimm on 2014-07-22 13:05:19 EDT ---

Note that this issue was exposed by the same situation that caused https://bugzilla.redhat.com/show_bug.cgi?id=1122084 to be filed, but there are two separate issues here.

--- Additional comment from Andy Grimm on 2014-07-22 15:56:04 EDT ---

PR: https://github.com/openshift/origin-server/pull/5640

Needs testing.

Comment 1 Brenton Leanhardt 2014-08-26 00:44:45 UTC
Builds:

openshift-origin-cartridge-jbosseap-2.16.3.5-1
openshift-origin-cartridge-jbossews-1.22.3.6-1
openshift-origin-cartridge-jenkins-client-1.19.3.4-1
openshift-origin-cartridge-python-1.23.4.7-1
openshift-origin-node-util-1.22.17.1-1
rubygem-openshift-origin-node-1.23.9.18-1

Upstream commits:

commit 2ab30b8e421adbd20a02efe3c2006479ef8598b6
Author: Andy Grimm <agrimm>
Date:   Tue Jul 22 15:51:41 2014 -0400

    Bug 1122166 - Preserve sparse files during rsync operations

Comment 4 Anping Li 2014-08-26 07:09:43 UTC
Verified and passed on puddle-2-1-2014-08-25. For the all-zero sparse file, the file is stored sparsely to save space. For the sparse file containing random bytes, both the quota and the disk usage remain the same before and after the move.

1. For all-zero sparse files:

Before Move:
[sjbossews-hanli1dom.oserhn20-20140825.com.cn data]> ls -lah
-rw-------. 1 53fbfdc8fa838ef802000017 53fbfdc8fa838ef802000017 1001M Aug 26 02:12 test

[sjbossews-hanli1dom.oserhn20-20140825.com.cn data]> du -k ~/app-root/data/test
1024    /var/lib/openshift/53fbfdc8fa838ef802000017//app-root/data/test

[sjbossews-hanli1dom.oserhn20-20140825.com.cn data]> quota -ws
Disk quotas for user 53fbfdc8fa838ef802000017 (uid 4285):
     Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
/dev/mapper/vg_rhel65-lv_root   24488       0   1024M            1901       0   80000

After Move:
[sjbossews-hanli1dom.oserhn20-20140825.com.cn 53fbfdc8fa838ef802000017]> quota -s
Disk quotas for user 53fbfdc8fa838ef802000017 (uid 4285):
     Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
/dev/mapper/vg_rhel65-lv_root
                  23472       0   1024M            1902       0   80000
[sjbossews-hanli1dom.oserhn20-20140825.com.cn 53fbfdc8fa838ef802000017]> du -k ~/app-root/data/test
0       /var/lib/openshift/53fbfdc8fa838ef802000017//app-root/data/test

[sjbossews-hanli1dom.oserhn20-20140825.com.cn data]> ls -lah
-rw-------. 1 53fbfdc8fa838ef802000017 53fbfdc8fa838ef802000017 1001M Aug 26 02:12 test


2. For a sparse file containing random bytes:

Before Move:
[sjbossews-hanli1dom.oserhn20-20140825.com.cn data]> quota -ws
Disk quotas for user 53fbfdc8fa838ef802000017 (uid 4285):
     Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
/dev/mapper/vg_rhel65-lv_root   24500       0   1024M            1902       0   80000

[sjbossews-hanli1dom.oserhn20-20140825.com.cn data]> ls -lah test1
-rw-------. 1 53fbfdc8fa838ef802000017 53fbfdc8fa838ef802000017 1001M Aug 26 02:39 test1

[sjbossews-hanli1dom.oserhn20-20140825.com.cn data]> du test1
1024    test1


After Move:

[sjbossews-hanli1dom.oserhn20-20140825.com.cn data]> du test1
1024    test1

[sjbossews-hanli1dom.oserhn20-20140825.com.cn data]> ls -lah test1
-rw-------. 1 53fbfdc8fa838ef802000017 53fbfdc8fa838ef802000017 1001M Aug 26 02:39 test1

[sjbossews-hanli1dom.oserhn20-20140825.com.cn data]> quota -ws
Disk quotas for user 53fbfdc8fa838ef802000017 (uid 4285):
     Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
/dev/mapper/vg_rhel65-lv_root   24500       0   1024M            1902       0   80000

Comment 5 Brenton Leanhardt 2014-08-27 20:40:28 UTC
I've added a script for admins to fix gears that hit this bug:

https://github.com/openshift/openshift-extras/pull/444/files

It's based on the script that Online used. The main changes are the added command-line options and that it is not specific to fuse cartridges. My plan is to create a brief KBase article that points to the file in the openshift-extras repository.
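
For context, the core of a re-sparsification pass is just rewriting each affected file with its holes restored. A minimal sketch of that idea (this is not the actual resparsify.sh, which also finds affected gears and checks quotas; the variable name is illustrative):

    # Rewrite the file so runs of zeros become holes again, then swap it
    # into place. -p keeps mode/ownership/timestamps; the gear should be
    # stopped first so nothing writes to the file mid-copy.
    cp -p --sparse=always "$f" "$f.resparse" && mv "$f.resparse" "$f"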

If you could test out the script as part of this bug it would be a big help.

Comment 6 Anping Li 2014-08-28 10:32:32 UTC
This script works fine.
It can report gears exceeding quota and fix their sparse files. Once fixed, the app can be started and accessed, and the quota is approximately the same as before the move.

For the quota-exceeded report:
[root@node1 ~]# ./resparsify.sh 
Searching for gears exceeding quota...
53fefd0408c291528f000090 is exceeding quota. See -h for options to attempt resparsification.
Done.


For fixing the sparse files:
[root@node1 ~]# ./resparsify.sh -u 53fefd0408c291528f000090 
Fixing 53fefd0408c291528f000090
Disk quotas for user 53fefd0408c291528f000090 (uid 6903): 
     Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
/dev/mapper/vg_rhel65-lv_root
                1050528*      0 1048576             216       0   80000        
Disk quotas for user 53fefd0408c291528f000090 (uid 6903): 
     Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
/dev/mapper/vg_rhel65-lv_root
                    924       0 1048576             216       0   80000        
Done.

Comment 7 Brenton Leanhardt 2014-08-28 12:58:47 UTC
Thanks for the help Anping!

Comment 9 errata-xmlrpc 2014-09-11 20:06:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1183.html

