Description of problem:
When a gear containing sparse files is moved with oo-admin-move, sparseness is not preserved, which causes unnecessary disk usage. This can also cause a gear to exceed its quota, since quotas are disabled during the move.

Version-Release number of selected component (if applicable):
rubygem-openshift-origin-msg-broker-mcollective-1.26.2-1.el6oso.noarch

How reproducible:
Always

Steps to Reproduce:
1. create an app
2. ssh into the app
3. dd if=/dev/zero of=~/app-root/data/test seek=1000 bs=1M count=1
4. check the quota and "du -k ~/app-root/data/test"
5. oo-admin-move the gear
6. check the quota and "du -k ~/app-root/data/test" again

Actual results:
The quota and the disk usage of the test file in particular will both increase by 1000 MB after the move.

Expected results:
The quota should be approximately the same before and after the move.

Additional info:
There are several other places in the code where rsync is called. These should all be evaluated to determine whether adding "-S" (and maybe even "-H") is appropriate.
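For anyone who wants to see the sparse-file effect outside of a gear, the following is a minimal local sketch, not part of the original report. It assumes GNU coreutils (dd, stat, du) and a filesystem that supports holes. It uses a scratch directory from mktemp and a smaller hole (seek=100) than the reproducer, purely to keep it quick.

```shell
# Create a 101 MiB file that starts with a 100 MiB hole, then compare sizes.
# The scratch path and the smaller seek value are illustrative assumptions.
workdir=$(mktemp -d)
sparse="$workdir/test"

# seek=100 skips 100 blocks of 1 MiB before writing one block, leaving a hole.
dd if=/dev/zero of="$sparse" seek=100 bs=1M count=1 2>/dev/null

apparent_bytes=$(stat -c %s "$sparse")                     # logical file length
apparent_kb=$(du -k --apparent-size "$sparse" | cut -f1)   # same length, in KiB
allocated_kb=$(du -k "$sparse" | cut -f1)                  # blocks actually allocated

echo "apparent: ${apparent_kb} KiB, allocated: ${allocated_kb} KiB"

rm -rf "$workdir"
```

The gap between the two numbers is what a copy that does not preserve sparseness fills in with real blocks on the destination.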
Note that this issue was exposed by the same situation that caused https://bugzilla.redhat.com/show_bug.cgi?id=1122084 to be filed, but these are two separate issues.
PR: https://github.com/openshift/origin-server/pull/5640 Needs testing.
Testing Andy's PR
Tested proposed PR:
1. create 2 districts with same profile
2. add a node in each district
3. create an app
4. ssh into app
    [myapp-bugtest.dev.rhcloud.com 53d2f6f5ce191294cf000001]> dd if=/dev/zero of=~/app-root/data/test seek=1000 bs=1M count=1
    1+0 records in
    1+0 records out
    1048576 bytes (1.0 MB) copied, 0.00147788 s, 710 MB/s
    [myapp-bugtest.dev.rhcloud.com 53d2f6f5ce191294cf000001]> du -k ~/app-root/data/test
    1024    /var/lib/openshift/53d2f6f5ce191294cf000001//app-root/data/test
    [myapp-bugtest.dev.rhcloud.com 53d2f6f5ce191294cf000001]>
5. run oo-admin-move
    bash-4.1# oo-admin-move --gear_uuid 53d2f6f5ce191294cf000001 --destination_district_uuid 53d2f5ffd45f6bcfc3000001 -i ip-10-47-172-53
    URL: http://myapp-bugtest.dev.rhcloud.com
    Login: decarr
    App UUID: 53d2f6f5ce191294cf000001
    Gear UUID: 53d2f6f5ce191294cf000001
    DEBUG: Destination district uuid '53d2f5ffd45f6bcfc3000001' is being ignored in favor of destination container ip-10-47-172-53
    DEBUG: Source district uuid: 295470732931138548924416
    DEBUG: Destination district uuid: a8ccd2da145b11e4aff43a41578b65d0
    DEBUG: Getting existing app 'myapp' status before moving
    DEBUG: Gear component 'jbossas-7' was running
    DEBUG: Stopping existing app cartridge 'jbossas-7' before moving
    DEBUG: Force stopping existing app before moving
    DEBUG: Reserved uid '1489' on district: 'a8ccd2da145b11e4aff43a41578b65d0'
    DEBUG: Gear platform is 'linux'
    DEBUG: Creating new account for gear 'myapp' on ip-10-47-172-53
    DEBUG: Moving content for app 'myapp', gear 'myapp' to ip-10-47-172-53
    Identity added: /etc/openshift/rsync_id_rsa (/etc/openshift/rsync_id_rsa)
    Warning: Permanently added '10.144.192.76' (RSA) to the list of known hosts.
    Warning: Permanently added '10.47.172.53' (RSA) to the list of known hosts.
    Agent pid 221
    unset SSH_AUTH_SOCK;
    unset SSH_AGENT_PID;
    echo Agent pid 221 killed;
    DEBUG: Moving system components for app 'myapp', gear 'myapp' to ip-10-47-172-53
    Identity added: /etc/openshift/rsync_id_rsa (/etc/openshift/rsync_id_rsa)
    Agent pid 243
    unset SSH_AUTH_SOCK;
    unset SSH_AGENT_PID;
    echo Agent pid 243 killed;
    DEBUG: Starting cartridge 'jbossas-7' in 'myapp' after move on ip-10-47-172-53
    DEBUG: Fixing DNS and mongo for gear 'myapp' after move
    DEBUG: Changing server identity of 'myapp' from 'ip-10-144-192-76' to 'ip-10-47-172-53'
    DEBUG: Deconfiguring old app 'myapp' on ip-10-144-192-76 after move
    Successfully moved gear with uuid '53d2f6f5ce191294cf000001' of app 'myapp' from 'ip-10-144-192-76' to 'ip-10-47-172-53'
6. validate disk usage on new app
    [myapp-bugtest.dev.rhcloud.com 53d2f6f5ce191294cf000001]> dd if=/dev/zero of=~/app-root/data/test seek=1000 bs=1M count=1
    1+0 records in
    1+0 records out
    1048576 bytes (1.0 MB) copied, 0.00136305 s, 769 MB/s
    [myapp-bugtest.dev.rhcloud.com 53d2f6f5ce191294cf000001]> du -k ~/app-root/data/test
    1024    /var/lib/openshift/53d2f6f5ce191294cf000001//app-root/data/test
    [myapp-bugtest.dev.rhcloud.com 53d2f6f5ce191294cf000001]>
Things looked good.
Checked on devenv_5011. The quota limit is the same before and after the move, but the disk usage of the test file in particular increases by about 1000 MB after the move. Steps used are listed below:
1. create a district with small profile
2. add 2 nodes to the district
3. create an app
4. ssh into app
    [diy-lxia.dev.rhcloud.com 53d607bc50f94c7a9f000001]> dd if=/dev/zero of=~/app-root/data/test seek=1000 bs=1M count=1
    1+0 records in
    1+0 records out
    1048576 bytes (1.0 MB) copied, 0.00118324 s, 886 MB/s
    [diy-lxia.dev.rhcloud.com 53d607bc50f94c7a9f000001]> du -k ~/app-root/data/test
    1024    /var/lib/openshift/53d607bc50f94c7a9f000001//app-root/data/test
    [diy-lxia.dev.rhcloud.com 53d607bc50f94c7a9f000001]> quota
    Disk quotas for user 53d607bc50f94c7a9f000001 (uid 2044):
         Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
         /dev/xvda1    1720       0 1048576             168       0   80000
5. oo-admin-move the gear
    [root@ip-10-167-165-175 ~]# oo-broker oo-admin-move --gear_uuid 53d607bc50f94c7a9f000001 -i ip-10-167-165-175
    URL: http://diy-lxia.dev.rhcloud.com
    Login: lxia
    App UUID: 53d607bc50f94c7a9f000001
    Gear UUID: 53d607bc50f94c7a9f000001
    DEBUG: Source district uuid: 969464524087737874120704
    DEBUG: Destination district uuid: 969464524087737874120704
    DEBUG: Getting existing app 'diy' status before moving
    DEBUG: Gear component 'diy-0.1' was running
    DEBUG: Stopping existing app cartridge 'diy-0.1' before moving
    DEBUG: Force stopping existing app before moving
    DEBUG: Gear platform is 'linux'
    DEBUG: Creating new account for gear 'diy' on ip-10-167-165-175
    DEBUG: Moving content for app 'diy', gear 'diy' to ip-10-167-165-175
    Identity added: /etc/openshift/rsync_id_rsa (/etc/openshift/rsync_id_rsa)
    Warning: Permanently added '10.186.168.48' (RSA) to the list of known hosts.
    Warning: Permanently added '10.167.165.175' (RSA) to the list of known hosts.
    Agent pid 112
    unset SSH_AUTH_SOCK;
    unset SSH_AGENT_PID;
    echo Agent pid 112 killed;
    DEBUG: Moving system components for app 'diy', gear 'diy' to ip-10-167-165-175
    Identity added: /etc/openshift/rsync_id_rsa (/etc/openshift/rsync_id_rsa)
    Agent pid 134
    unset SSH_AUTH_SOCK;
    unset SSH_AGENT_PID;
    echo Agent pid 134 killed;
    DEBUG: Starting cartridge 'diy-0.1' in 'diy' after move on ip-10-167-165-175
    DEBUG: Fixing DNS and mongo for gear 'diy' after move
    DEBUG: Changing server identity of 'diy' from 'ip-10-186-168-48' to 'ip-10-167-165-175'
    DEBUG: Deconfiguring old app 'diy' on ip-10-186-168-48 after move
    Successfully moved gear with uuid '53d607bc50f94c7a9f000001' of app 'diy' from 'ip-10-186-168-48' to 'ip-10-167-165-175'
6. validate disk usage on new app
    [diy-lxia.dev.rhcloud.com 53d607bc50f94c7a9f000001]> du -k ~/app-root/data/test
    1025028    /var/lib/openshift/53d607bc50f94c7a9f000001//app-root/data/test
    [diy-lxia.dev.rhcloud.com 53d607bc50f94c7a9f000001]> quota
    Disk quotas for user 53d607bc50f94c7a9f000001 (uid 2044):
         Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
         /dev/xvda1 1026752       0 1048576             170       0   80000
As shown, the quota limit is the same before and after the move (1048576 in both cases), but the disk usage of the test file increases by about 1000 MB after the move (from 1024K to 1025028K).
The merge didn't complete for some reason. Kicking off Jenkins process again to merge Andy's pull.
Commit pushed to master at https://github.com/openshift/origin-server
https://github.com/openshift/origin-server/commit/2ab30b8e421adbd20a02efe3c2006479ef8598b6
Bug 1122166 - Preserve sparse files during rsync operations
Checked on devenv_5027. The quota limit is the same before and after the move, but the disk usage of the test file in particular decreases to 0 after the move. Is this expected, or is it another issue caused by the above fix? Steps used are listed below:
1. create a district with small profile
2. add 2 nodes to the district
3. create an app
4. ssh into app
    [diy-lxia.dev.rhcloud.com 53d8a0c1cfbc997441000001]> dd if=/dev/zero of=~/app-root/data/test seek=1000 bs=1M count=1
    1+0 records in
    1+0 records out
    1048576 bytes (1.0 MB) copied, 0.00119314 s, 879 MB/s
    [diy-lxia.dev.rhcloud.com 53d8a0c1cfbc997441000001]> du -k ~/app-root/data/test
    1024    /var/lib/openshift/53d8a0c1cfbc997441000001//app-root/data/test
    [diy-lxia.dev.rhcloud.com 53d8a0c1cfbc997441000001]> ls -lh app-root/data/test
    -rw-------. 1 53d8a0c1cfbc997441000001 53d8a0c1cfbc997441000001 1001M Jul 30 03:39 app-root/data/test
    [diy-lxia.dev.rhcloud.com 53d8a0c1cfbc997441000001]> quota
    Disk quotas for user 53d8a0c1cfbc997441000001 (uid 1867):
         Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
         /dev/xvda1    1720       0 1048576             168       0   80000
5. oo-admin-move the gear
    [root@ip-10-136-48-153 ~]# oo-broker oo-admin-move --gear_uuid 53d8a0c1cfbc997441000001 -i ip-10-167-177-145
    URL: http://diy-lxia.dev.rhcloud.com
    Login: lxia
    App UUID: 53d8a0c1cfbc997441000001
    Gear UUID: 53d8a0c1cfbc997441000001
    DEBUG: Source district uuid: defbe8c417bb11e4892b1ee109ab2a63
    DEBUG: Destination district uuid: defbe8c417bb11e4892b1ee109ab2a63
    DEBUG: Getting existing app 'diy' status before moving
    DEBUG: Gear component 'diy-0.1' was running
    DEBUG: Stopping existing app cartridge 'diy-0.1' before moving
    DEBUG: Force stopping existing app before moving
    DEBUG: Gear platform is 'linux'
    DEBUG: Creating new account for gear 'diy' on ip-10-167-177-145
    DEBUG: Moving content for app 'diy', gear 'diy' to ip-10-167-177-145
    Identity added: /etc/openshift/rsync_id_rsa (/etc/openshift/rsync_id_rsa)
    Warning: Permanently added '10.136.48.153' (RSA) to the list of known hosts.
    Agent pid 112
    unset SSH_AUTH_SOCK;
    unset SSH_AGENT_PID;
    echo Agent pid 112 killed;
    DEBUG: Moving system components for app 'diy', gear 'diy' to ip-10-167-177-145
    Identity added: /etc/openshift/rsync_id_rsa (/etc/openshift/rsync_id_rsa)
    Agent pid 134
    unset SSH_AUTH_SOCK;
    unset SSH_AGENT_PID;
    echo Agent pid 134 killed;
    DEBUG: Starting cartridge 'diy-0.1' in 'diy' after move on ip-10-167-177-145
    DEBUG: Fixing DNS and mongo for gear 'diy' after move
    DEBUG: Changing server identity of 'diy' from 'ip-10-136-48-153' to 'ip-10-167-177-145'
    DEBUG: Deconfiguring old app 'diy' on ip-10-136-48-153 after move
    Successfully moved gear with uuid '53d8a0c1cfbc997441000001' of app 'diy' from 'ip-10-136-48-153' to 'ip-10-167-177-145'
6. validate disk usage on new app
    [diy-lxia.dev.rhcloud.com 53d8a0c1cfbc997441000001]> du -k ~/app-root/data/test
    0    /var/lib/openshift/53d8a0c1cfbc997441000001//app-root/data/test
    [diy-lxia.dev.rhcloud.com 53d8a0c1cfbc997441000001]> ls -lh app-root/data/test
    -rw-------. 1 53d8a0c1cfbc997441000001 53d8a0c1cfbc997441000001 1001M Jul 30 03:39 app-root/data/test
    [diy-lxia.dev.rhcloud.com 53d8a0c1cfbc997441000001]> quota
    Disk quotas for user 53d8a0c1cfbc997441000001 (uid 1867):
         Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
         /dev/xvda1     700       0 1048576             169       0   80000
Since you wrote zeros to the end of the sparse file, rsync will merge those zeros into the sparse region of the file during the sync, so it is not surprising that the usage would go down. It would be a better test to write random bytes to the end of the sparse file; then the usage before and after would be the same.

Coincidentally, we just had a customer issue raised that brought a potentially negative consequence of this patch to my attention. For databases such as mongo, it is standard procedure to intentionally preallocate some files to ensure that the data is contiguous on the filesystem. When "rsync -S" is used, rsync does not pay attention to whether the source file was sparsely allocated on disk. It always tries to make target files sparse if possible. (In other words, it is like "cp --sparse=always" rather than "cp --sparse=auto".)

This makes me think that maybe we should be using tar instead, since its "-S" option only makes target files sparse if the source file was sparse. This would be a larger change, though.
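The preallocation concern can be illustrated locally with the cp analogy from the comment above. This is a rough sketch rather than a test of rsync itself: it assumes GNU cp and du, uses a hypothetical file name ("journal") in a mktemp scratch directory, and stands in an 8 MiB zero-filled file for a preallocated database file.

```shell
# Compare how cp treats a fully allocated, zero-filled file under the two
# sparse modes. File names and sizes here are illustrative assumptions.
workdir=$(mktemp -d)
prealloc="$workdir/journal"

# Write 8 MiB of real zero blocks: the file is fully allocated, not sparse.
dd if=/dev/zero of="$prealloc" bs=1M count=8 2>/dev/null

# auto: only keep holes that the source already had.
cp --sparse=auto "$prealloc" "$workdir/copy_auto"
# always: turn every run of zeros into a hole, allocated or not.
cp --sparse=always "$prealloc" "$workdir/copy_always"

sync   # flush pending writes so du reflects what is allocated on disk

auto_bytes=$(stat -c %s "$workdir/copy_auto")
always_bytes=$(stat -c %s "$workdir/copy_always")
auto_kb=$(du -k "$workdir/copy_auto" | cut -f1)
always_kb=$(du -k "$workdir/copy_always" | cut -f1)

echo "auto:   ${auto_bytes} bytes long, ${auto_kb} KiB allocated"
echo "always: ${always_bytes} bytes long, ${always_kb} KiB allocated"

rm -rf "$workdir"
```

Both copies have the same length, but the "always" copy should lose most or all of its allocated blocks, which is the behavior described for "rsync -S" and the reason a preallocated file would stop being contiguous after a move.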
Per Andy's comment, I think the fix as supplied is better than the current product behavior. Having a gear hit quota has a larger operational impact than mongo having sparse journals. I think we should investigate a tar solution as a future feature enhancement.
Checked on devenv-stage_937. Verified that both the quota usage and the disk usage stay the same before and after the move when random bytes are written to the end of the sparse file. Moving bug to VERIFIED. Steps used are listed below:
1. create a district with small profile
2. add 2 nodes to the district
3. create an app
4. ssh into app
    [diy-lxia.dev.rhcloud.com data]> dd if=/dev/urandom of=./test1 seek=1000 bs=1M count=1
    1+0 records in
    1+0 records out
    1048576 bytes (1.0 MB) copied, 0.243243 s, 4.3 MB/s
    [diy-lxia.dev.rhcloud.com data]> dd if=/dev/urandom of=./test2 bs=1M count=1
    1+0 records in
    1+0 records out
    1048576 bytes (1.0 MB) copied, 0.244365 s, 4.3 MB/s
    [diy-lxia.dev.rhcloud.com data]> du -k test*
    1024    test1
    1024    test2
    [diy-lxia.dev.rhcloud.com data]> quota
    Disk quotas for user 53d9e863235c2e0b52000006 (uid 1585):
         Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
         /dev/xvda1    2748       0 1048576             170       0   80000
5. oo-admin-move the gear
    [root@ip-10-179-187-164 ~]# oo-broker oo-admin-move --gear_uuid 53d9e863235c2e0b52000006 -i ip-10-179-187-164
    URL: http://diy-lxia.dev.rhcloud.com
    Login: lxia
    App UUID: 53d9e863235c2e0b52000006
    Gear UUID: 53d9e863235c2e0b52000006
    DEBUG: Source district uuid: 728756879810492677226496
    DEBUG: Destination district uuid: 728756879810492677226496
    DEBUG: Getting existing app 'diy' status before moving
    DEBUG: Gear component 'diy-0.1' was running
    DEBUG: Stopping existing app cartridge 'diy-0.1' before moving
    DEBUG: Force stopping existing app before moving
    DEBUG: Gear platform is 'linux'
    DEBUG: Creating new account for gear 'diy' on ip-10-179-187-164
    DEBUG: Moving content for app 'diy', gear 'diy' to ip-10-179-187-164
    Identity added: /etc/openshift/rsync_id_rsa (/etc/openshift/rsync_id_rsa)
    Warning: Permanently added '10.142.175.99' (RSA) to the list of known hosts.
    Warning: Permanently added '10.179.187.164' (RSA) to the list of known hosts.
    Agent pid 114
    unset SSH_AUTH_SOCK;
    unset SSH_AGENT_PID;
    echo Agent pid 114 killed;
    DEBUG: Moving system components for app 'diy', gear 'diy' to ip-10-179-187-164
    Identity added: /etc/openshift/rsync_id_rsa (/etc/openshift/rsync_id_rsa)
    Agent pid 136
    unset SSH_AUTH_SOCK;
    unset SSH_AGENT_PID;
    echo Agent pid 136 killed;
    DEBUG: Starting cartridge 'diy-0.1' in 'diy' after move on ip-10-179-187-164
    DEBUG: Fixing DNS and mongo for gear 'diy' after move
    DEBUG: Changing server identity of 'diy' from 'ip-10-142-175-99' to 'ip-10-179-187-164'
    DEBUG: Deconfiguring old app 'diy' on ip-10-142-175-99 after move
    Successfully moved gear with uuid '53d9e863235c2e0b52000006' of app 'diy' from 'ip-10-142-175-99' to 'ip-10-179-187-164'
6. validate disk usage on new app
    [diy-lxia.dev.rhcloud.com data]> du -k test*
    1024    test1
    1024    test2
    [diy-lxia.dev.rhcloud.com data]> ls -lh
    total 2.0M
    -rw-------. 1 53d9e863235c2e0b52000006 53d9e863235c2e0b52000006 1001M Jul 31 03:12 test1
    -rw-------. 1 53d9e863235c2e0b52000006 53d9e863235c2e0b52000006 1.0M Jul 31 03:12 test2
    [diy-lxia.dev.rhcloud.com data]> quota
    Disk quotas for user 53d9e863235c2e0b52000006 (uid 1585):
         Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
         /dev/xvda1    2748       0 1048576             170       0   80000
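The difference between the earlier zero-filled test (usage dropped to 0) and this random-data test (usage unchanged) can be approximated without a gear move. The sketch below is an assumption-laden stand-in: it uses "cp --sparse=always" in place of "rsync -S", following the analogy made earlier in this bug, with GNU coreutils, a mktemp scratch directory, and a smaller hole (seek=100) than the real tests.

```shell
# Two files with the same hole but different tails: zeros versus random bytes.
# cp --sparse=always is used here as a stand-in for rsync -S (an assumption).
workdir=$(mktemp -d)

dd if=/dev/zero    of="$workdir/zero_tail" seek=100 bs=1M count=1 2>/dev/null
dd if=/dev/urandom of="$workdir/rand_tail" seek=100 bs=1M count=1 2>/dev/null

cp --sparse=always "$workdir/zero_tail" "$workdir/zero_copy"
cp --sparse=always "$workdir/rand_tail" "$workdir/rand_copy"

sync   # flush pending writes so du reflects what is allocated on disk

zero_copy_kb=$(du -k "$workdir/zero_copy" | cut -f1)
rand_copy_kb=$(du -k "$workdir/rand_copy" | cut -f1)

echo "zero tail copy:   ${zero_copy_kb} KiB allocated"
echo "random tail copy: ${rand_copy_kb} KiB allocated"

rm -rf "$workdir"
```

The zero tail is absorbed into the hole, so that copy ends up with little or nothing allocated, while the random tail cannot be represented as a hole and keeps its 1 MiB of blocks.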