Bug 1122166 - oo-admin-move (and other rsync invocations) should preserve sparse files
Summary: oo-admin-move (and other rsync invocations) should preserve sparse files
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Pod
Version: 1.x
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Derek Carr
QA Contact: libra bugs
URL:
Whiteboard:
Depends On:
Blocks: 1122271
 
Reported: 2014-07-22 17:04 UTC by Andy Grimm
Modified: 2016-11-08 03:47 UTC
7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1122271 (view as bug list)
Environment:
Last Closed: 2014-10-10 00:49:24 UTC
Target Upstream Version:
Embargoed:



Description Andy Grimm 2014-07-22 17:04:20 UTC
Description of problem:

When a gear containing sparse files is moved with oo-admin-move, sparseness is not preserved, which causes unnecessary disk usage.  This can also cause a gear to exceed its quota, since quotas are disabled during the move.

Version-Release number of selected component (if applicable):

rubygem-openshift-origin-msg-broker-mcollective-1.26.2-1.el6oso.noarch

How reproducible:

Always

Steps to Reproduce:
1.  create an app
2.  ssh into the app
3.  dd if=/dev/zero of=~/app-root/data/test seek=1000 bs=1M count=1
4.  check the quota and "du -k ~/app-root/data/test"
5.  oo-admin-move the gear
6.  check the quota and "du -k ~/app-root/data/test" again
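The dd invocation in step 3 writes 1 MiB at a 1000 MiB offset, producing a sparse file whose apparent size (~1001 MiB) is far larger than its allocated blocks (~1 MiB). A minimal local sketch of the effect, no gear required (the /tmp path is just for illustration):

```shell
# Seeking 1000 MiB past the start before writing leaves a hole;
# only the final 1 MiB of data actually allocates disk blocks.
dd if=/dev/zero of=/tmp/sparse-test seek=1000 bs=1M count=1 2>/dev/null

# Apparent size, as ls reports it: ~1001M
ls -lh /tmp/sparse-test

# Allocated blocks, which is what du (and disk quotas) count: ~1024K
du -k /tmp/sparse-test
```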

Actual results:

The quota usage and the disk usage of the test file both increase by roughly 1000 MB after the move.

Expected results:

The quota should be approximately the same before and after the move.

Additional info:

There are several other places in the code where rsync is called. These should all be evaluated to determine whether adding "-S" (and possibly "-H") is appropriate.

Comment 1 Andy Grimm 2014-07-22 17:05:19 UTC
Note that this issue was exposed by the same situation that caused https://bugzilla.redhat.com/show_bug.cgi?id=1122084 to be filed, but there are two separate issues here.

Comment 2 Andy Grimm 2014-07-22 19:56:04 UTC
PR: https://github.com/openshift/origin-server/pull/5640

Needs testing.

Comment 3 Derek Carr 2014-07-25 18:03:36 UTC
Testing Andy's PR

Comment 4 Derek Carr 2014-07-25 20:48:00 UTC
Tested proposed PR:

1. create 2 districts with same profile
2. add a node in each district
3. create an app
4. ssh into app

[myapp-bugtest.dev.rhcloud.com 53d2f6f5ce191294cf000001]\> dd if=/dev/zero of=~/app-root/data/test seek=1000 bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00147788 s, 710 MB/s
[myapp-bugtest.dev.rhcloud.com 53d2f6f5ce191294cf000001]\> du -k ~/app-root/data/test 
1024	/var/lib/openshift/53d2f6f5ce191294cf000001//app-root/data/test
[myapp-bugtest.dev.rhcloud.com 53d2f6f5ce191294cf000001]\> 

5. run oo-admin-move

bash-4.1# oo-admin-move --gear_uuid 53d2f6f5ce191294cf000001 --destination_district_uuid 53d2f5ffd45f6bcfc3000001 -i ip-10-47-172-53
URL: http://myapp-bugtest.dev.rhcloud.com
Login: decarr
App UUID: 53d2f6f5ce191294cf000001
Gear UUID: 53d2f6f5ce191294cf000001
DEBUG: Destination district uuid '53d2f5ffd45f6bcfc3000001' is being ignored in favor of destination container ip-10-47-172-53
DEBUG: Source district uuid: 295470732931138548924416
DEBUG: Destination district uuid: a8ccd2da145b11e4aff43a41578b65d0
DEBUG: Getting existing app 'myapp' status before moving
DEBUG: Gear component 'jbossas-7' was running
DEBUG: Stopping existing app cartridge 'jbossas-7' before moving
DEBUG: Force stopping existing app before moving
DEBUG: Reserved uid '1489' on district: 'a8ccd2da145b11e4aff43a41578b65d0'
DEBUG: Gear platform is 'linux'
DEBUG: Creating new account for gear 'myapp' on ip-10-47-172-53
DEBUG: Moving content for app 'myapp', gear 'myapp' to ip-10-47-172-53
Identity added: /etc/openshift/rsync_id_rsa (/etc/openshift/rsync_id_rsa)
Warning: Permanently added '10.144.192.76' (RSA) to the list of known hosts.
Warning: Permanently added '10.47.172.53' (RSA) to the list of known hosts.
Agent pid 221
unset SSH_AUTH_SOCK;
unset SSH_AGENT_PID;
echo Agent pid 221 killed;
DEBUG: Moving system components for app 'myapp', gear 'myapp' to ip-10-47-172-53
Identity added: /etc/openshift/rsync_id_rsa (/etc/openshift/rsync_id_rsa)
Agent pid 243
unset SSH_AUTH_SOCK;
unset SSH_AGENT_PID;
echo Agent pid 243 killed;
DEBUG: Starting cartridge 'jbossas-7' in 'myapp' after move on ip-10-47-172-53
DEBUG: Fixing DNS and mongo for gear 'myapp' after move
DEBUG: Changing server identity of 'myapp' from 'ip-10-144-192-76' to 'ip-10-47-172-53'
DEBUG: Deconfiguring old app 'myapp' on ip-10-144-192-76 after move
Successfully moved gear with uuid '53d2f6f5ce191294cf000001' of app 'myapp' from 'ip-10-144-192-76' to 'ip-10-47-172-53'

6. validate disk usage on new app

[myapp-bugtest.dev.rhcloud.com 53d2f6f5ce191294cf000001]\> dd if=/dev/zero of=~/app-root/data/test seek=1000 bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00136305 s, 769 MB/s
[myapp-bugtest.dev.rhcloud.com 53d2f6f5ce191294cf000001]\> du -k ~/app-root/data/test
1024	/var/lib/openshift/53d2f6f5ce191294cf000001//app-root/data/test
[myapp-bugtest.dev.rhcloud.com 53d2f6f5ce191294cf000001]\> 

Things looked good.

Comment 5 Liang Xia 2014-07-28 04:46:49 UTC
Checked on devenv_5011: the quota is the same before and after the move, but the disk usage of the test file increases by roughly 1000 MB after the move.

Steps used are listed below:

1. create a district with small profile
2. add 2 nodes to the district
3. create an app
4. ssh into app

[diy-lxia.dev.rhcloud.com 53d607bc50f94c7a9f000001]\> dd if=/dev/zero of=~/app-root/data/test seek=1000 bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00118324 s, 886 MB/s
[diy-lxia.dev.rhcloud.com 53d607bc50f94c7a9f000001]\> du -k ~/app-root/data/test 
1024	 /var/lib/openshift/53d607bc50f94c7a9f000001//app-root/data/test
[diy-lxia.dev.rhcloud.com 53d607bc50f94c7a9f000001]\> quota 
Disk quotas for user 53d607bc50f94c7a9f000001 (uid 2044): 
     Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
     /dev/xvda1    1720       0 1048576             168       0   80000 

5. oo-admin-move the gear

[root@ip-10-167-165-175 ~]# oo-broker oo-admin-move --gear_uuid 53d607bc50f94c7a9f000001 -i ip-10-167-165-175
URL: http://diy-lxia.dev.rhcloud.com
Login: lxia
App UUID: 53d607bc50f94c7a9f000001
Gear UUID: 53d607bc50f94c7a9f000001
DEBUG: Source district uuid: 969464524087737874120704
DEBUG: Destination district uuid: 969464524087737874120704
DEBUG: Getting existing app 'diy' status before moving
DEBUG: Gear component 'diy-0.1' was running
DEBUG: Stopping existing app cartridge 'diy-0.1' before moving
DEBUG: Force stopping existing app before moving
DEBUG: Gear platform is 'linux'
DEBUG: Creating new account for gear 'diy' on ip-10-167-165-175
DEBUG: Moving content for app 'diy', gear 'diy' to ip-10-167-165-175
Identity added: /etc/openshift/rsync_id_rsa (/etc/openshift/rsync_id_rsa)
Warning: Permanently added '10.186.168.48' (RSA) to the list of known hosts.
Warning: Permanently added '10.167.165.175' (RSA) to the list of known hosts.
Agent pid 112
unset SSH_AUTH_SOCK;
unset SSH_AGENT_PID;
echo Agent pid 112 killed;
DEBUG: Moving system components for app 'diy', gear 'diy' to ip-10-167-165-175
Identity added: /etc/openshift/rsync_id_rsa (/etc/openshift/rsync_id_rsa)
Agent pid 134
unset SSH_AUTH_SOCK;
unset SSH_AGENT_PID;
echo Agent pid 134 killed;
DEBUG: Starting cartridge 'diy-0.1' in 'diy' after move on ip-10-167-165-175
DEBUG: Fixing DNS and mongo for gear 'diy' after move
DEBUG: Changing server identity of 'diy' from 'ip-10-186-168-48' to 'ip-10-167-165-175'
DEBUG: Deconfiguring old app 'diy' on ip-10-186-168-48 after move
Successfully moved gear with uuid '53d607bc50f94c7a9f000001' of app 'diy' from 'ip-10-186-168-48' to 'ip-10-167-165-175'

6. validate disk usage on new app

[diy-lxia.dev.rhcloud.com 53d607bc50f94c7a9f000001]\> du -k ~/app-root/data/test
1025028	/var/lib/openshift/53d607bc50f94c7a9f000001//app-root/data/test
[diy-lxia.dev.rhcloud.com 53d607bc50f94c7a9f000001]\> quota
Disk quotas for user 53d607bc50f94c7a9f000001 (uid 2044): 
     Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
     /dev/xvda1 1026752       0 1048576             170       0   80000

As you can see, the quota limit is the same before and after the move (both 1048576), but the disk usage of the test file increases by roughly 1000 MB after the move (from 1024 KB to 1025028 KB).

Comment 6 Derek Carr 2014-07-28 13:52:59 UTC
The merge didn't complete for some reason. Kicking off the Jenkins process again to merge Andy's pull request.

Comment 7 openshift-github-bot 2014-07-29 17:50:31 UTC
Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/2ab30b8e421adbd20a02efe3c2006479ef8598b6
Bug 1122166 - Preserve sparse files during rsync operations

Comment 8 Liang Xia 2014-07-30 08:13:06 UTC
Checked on devenv_5027: the quota is the same before and after the move, but the disk usage of the test file decreases to 0 after the move.
Is this expected, or is it another issue caused by the above fix?

Steps used are listed below:

1. create a district with small profile
2. add 2 nodes to the district
3. create an app
4. ssh into app

[diy-lxia.dev.rhcloud.com 53d8a0c1cfbc997441000001]\>  dd if=/dev/zero of=~/app-root/data/test seek=1000 bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00119314 s, 879 MB/s
[diy-lxia.dev.rhcloud.com 53d8a0c1cfbc997441000001]\> du -k ~/app-root/data/test 
1024	 /var/lib/openshift/53d8a0c1cfbc997441000001//app-root/data/test
[diy-lxia.dev.rhcloud.com 53d8a0c1cfbc997441000001]\> ls -lh app-root/data/test 
-rw-------. 1 53d8a0c1cfbc997441000001 53d8a0c1cfbc997441000001 1001M Jul 30 03:39 app-root/data/test
[diy-lxia.dev.rhcloud.com 53d8a0c1cfbc997441000001]\> quota 
Disk quotas for user 53d8a0c1cfbc997441000001 (uid 1867): 
     Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
     /dev/xvda1    1720       0 1048576             168       0   80000 

5. oo-admin-move the gear

[root@ip-10-136-48-153 ~]# oo-broker oo-admin-move --gear_uuid 53d8a0c1cfbc997441000001 -i ip-10-167-177-145
URL: http://diy-lxia.dev.rhcloud.com
Login: lxia
App UUID: 53d8a0c1cfbc997441000001
Gear UUID: 53d8a0c1cfbc997441000001
DEBUG: Source district uuid: defbe8c417bb11e4892b1ee109ab2a63
DEBUG: Destination district uuid: defbe8c417bb11e4892b1ee109ab2a63
DEBUG: Getting existing app 'diy' status before moving
DEBUG: Gear component 'diy-0.1' was running
DEBUG: Stopping existing app cartridge 'diy-0.1' before moving
DEBUG: Force stopping existing app before moving
DEBUG: Gear platform is 'linux'
DEBUG: Creating new account for gear 'diy' on ip-10-167-177-145
DEBUG: Moving content for app 'diy', gear 'diy' to ip-10-167-177-145
Identity added: /etc/openshift/rsync_id_rsa (/etc/openshift/rsync_id_rsa)
Warning: Permanently added '10.136.48.153' (RSA) to the list of known hosts.
Agent pid 112
unset SSH_AUTH_SOCK;
unset SSH_AGENT_PID;
echo Agent pid 112 killed;
DEBUG: Moving system components for app 'diy', gear 'diy' to ip-10-167-177-145
Identity added: /etc/openshift/rsync_id_rsa (/etc/openshift/rsync_id_rsa)
Agent pid 134
unset SSH_AUTH_SOCK;
unset SSH_AGENT_PID;
echo Agent pid 134 killed;
DEBUG: Starting cartridge 'diy-0.1' in 'diy' after move on ip-10-167-177-145
DEBUG: Fixing DNS and mongo for gear 'diy' after move
DEBUG: Changing server identity of 'diy' from 'ip-10-136-48-153' to 'ip-10-167-177-145'
DEBUG: Deconfiguring old app 'diy' on ip-10-136-48-153 after move
Successfully moved gear with uuid '53d8a0c1cfbc997441000001' of app 'diy' from 'ip-10-136-48-153' to 'ip-10-167-177-145'


6. validate disk usage on new app

[diy-lxia.dev.rhcloud.com 53d8a0c1cfbc997441000001]\> du -k ~/app-root/data/test
0	/var/lib/openshift/53d8a0c1cfbc997441000001//app-root/data/test
[diy-lxia.dev.rhcloud.com 53d8a0c1cfbc997441000001]\> ls -lh app-root/data/test 
-rw-------. 1 53d8a0c1cfbc997441000001 53d8a0c1cfbc997441000001 1001M Jul 30 03:39 app-root/data/test
[diy-lxia.dev.rhcloud.com 53d8a0c1cfbc997441000001]\> quota
Disk quotas for user 53d8a0c1cfbc997441000001 (uid 1867): 
     Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
     /dev/xvda1     700       0 1048576             169       0   80000

Comment 9 Andy Grimm 2014-07-30 12:44:12 UTC
Since you wrote zeros to the end of the sparse file, rsync merged those zeros into the sparse region of the file during the sync, so it is not surprising that the usage went down. It would be a better test to write random bytes to the end of the sparse file; then the usage before and after would be the same.

Coincidentally, we just had a customer issue raised that brought a potentially negative consequence of this patch to my attention.  For databases such as mongo, it is standard procedure for them to intentionally preallocate some files to ensure that the data is contiguous on the filesystem.  When "rsync -S" is used, rsync does not pay attention to whether the source file was sparsely allocated on disk.  It always tries to make target files sparse if possible.  (In other words, it is like "cp --sparse=always" rather than "cp --sparse=auto").  This makes me think that maybe we should be using tar instead, since its "-S" option only makes target files sparse if the source file was sparse.  This would be a larger change, though.
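The "cp" analogy above can be demonstrated directly. A sketch (paths hypothetical, assuming GNU coreutils): "--sparse=always" punches holes in a deliberately preallocated file of zeros, while "--sparse=auto" only makes the copy sparse when the source itself is sparse; GNU tar's "-S" behaves like the auto case.

```shell
# A preallocated file of zeros, like a mongod journal: 16 MiB actually on disk.
dd if=/dev/zero of=/tmp/prealloc bs=1M count=16 2>/dev/null

# auto (cp's default): holes are made only if the SOURCE appears sparse,
# so the copy of a preallocated file stays fully allocated.
cp --sparse=auto /tmp/prealloc /tmp/copy-auto

# always: any run of zeros becomes a hole, regardless of source allocation.
# This matches what "rsync -S" does to preallocated files.
cp --sparse=always /tmp/prealloc /tmp/copy-always

# copy-auto allocates ~16384 KB; copy-always allocates almost nothing.
du -k /tmp/copy-auto /tmp/copy-always
```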

Comment 10 Derek Carr 2014-07-30 14:53:27 UTC
Per Andy's comment, I think the fix as supplied is better than the current product behavior. Having a gear hit its quota has a larger operational impact than MongoDB having its preallocated journal files made sparse. I think we should investigate a tar-based solution as a future enhancement.

Comment 11 Liang Xia 2014-07-31 03:25:22 UTC
Checked on devenv-stage_937, and verified that both the quota and the disk usage remain the same before and after the move when random bytes are written to the end of the sparse file. Moving the bug to VERIFIED.

Steps used are listed below:

1. create a district with small profile
2. add 2 nodes to the district
3. create an app
4. ssh into app

[diy-lxia.dev.rhcloud.com data]\> dd if=/dev/urandom of=./test1 seek=1000 bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.243243 s, 4.3 MB/s
[diy-lxia.dev.rhcloud.com data]\> dd if=/dev/urandom of=./test2 bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.244365 s, 4.3 MB/s
[diy-lxia.dev.rhcloud.com data]\> du -k test*
1024	test1
1024	test2
[diy-lxia.dev.rhcloud.com data]\> quota
Disk quotas for user 53d9e863235c2e0b52000006 (uid 1585): 
     Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
     /dev/xvda1    2748       0 1048576             170       0   80000

5. oo-admin-move the gear

[root@ip-10-179-187-164 ~]# oo-broker oo-admin-move --gear_uuid 53d9e863235c2e0b52000006 -i ip-10-179-187-164
URL: http://diy-lxia.dev.rhcloud.com
Login: lxia
App UUID: 53d9e863235c2e0b52000006
Gear UUID: 53d9e863235c2e0b52000006
DEBUG: Source district uuid: 728756879810492677226496
DEBUG: Destination district uuid: 728756879810492677226496
DEBUG: Getting existing app 'diy' status before moving
DEBUG: Gear component 'diy-0.1' was running
DEBUG: Stopping existing app cartridge 'diy-0.1' before moving
DEBUG: Force stopping existing app before moving
DEBUG: Gear platform is 'linux'
DEBUG: Creating new account for gear 'diy' on ip-10-179-187-164
DEBUG: Moving content for app 'diy', gear 'diy' to ip-10-179-187-164
Identity added: /etc/openshift/rsync_id_rsa (/etc/openshift/rsync_id_rsa)
Warning: Permanently added '10.142.175.99' (RSA) to the list of known hosts.
Warning: Permanently added '10.179.187.164' (RSA) to the list of known hosts.
Agent pid 114
unset SSH_AUTH_SOCK;
unset SSH_AGENT_PID;
echo Agent pid 114 killed;
DEBUG: Moving system components for app 'diy', gear 'diy' to ip-10-179-187-164
Identity added: /etc/openshift/rsync_id_rsa (/etc/openshift/rsync_id_rsa)
Agent pid 136
unset SSH_AUTH_SOCK;
unset SSH_AGENT_PID;
echo Agent pid 136 killed;
DEBUG: Starting cartridge 'diy-0.1' in 'diy' after move on ip-10-179-187-164
DEBUG: Fixing DNS and mongo for gear 'diy' after move
DEBUG: Changing server identity of 'diy' from 'ip-10-142-175-99' to 'ip-10-179-187-164'
DEBUG: Deconfiguring old app 'diy' on ip-10-142-175-99 after move
Successfully moved gear with uuid '53d9e863235c2e0b52000006' of app 'diy' from 'ip-10-142-175-99' to 'ip-10-179-187-164'


6. validate disk usage on new app

[diy-lxia.dev.rhcloud.com data]\> du -k test*
1024	test1
1024	test2
[diy-lxia.dev.rhcloud.com data]\> ls -lh 
total 2.0M
-rw-------. 1 53d9e863235c2e0b52000006 53d9e863235c2e0b52000006 1001M Jul 31 03:12 test1
-rw-------. 1 53d9e863235c2e0b52000006 53d9e863235c2e0b52000006  1.0M Jul 31 03:12 test2
[diy-lxia.dev.rhcloud.com data]\> quota
Disk quotas for user 53d9e863235c2e0b52000006 (uid 1585): 
     Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
     /dev/xvda1    2748       0 1048576             170       0   80000

