Bug 1360317 - [GSS] glusterfs doesn't respect cluster.min-free-disk on remove-brick operation
Summary: [GSS] glusterfs doesn't respect cluster.min-free-disk on remove-brick operation
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: distribute
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.3.0
Assignee: Susant Kumar Palai
QA Contact: Prasad Desala
URL:
Whiteboard:
Depends On: 1441508 1473132 1473133
Blocks: 1369781 1417145
 
Reported: 2016-07-26 12:26 UTC by Mateusz Mazur
Modified: 2020-12-11 12:16 UTC
CC List: 14 users

Fixed In Version: glusterfs-3.8.4-25
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-09-21 04:28:23 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:2774 0 normal SHIPPED_LIVE glusterfs bug fix and enhancement update 2017-09-21 08:16:29 UTC

Description Mateusz Mazur 2016-07-26 12:26:01 UTC
Description of problem:

During a remove-brick operation on a distribute volume, the cluster.min-free-disk option is not respected, which is problematic when the bricks have unequal capacities.
This leads to "No space left on device" errors on the smaller bricks. Logs:

[2016-07-26 10:39:06.589909] W [MSGID: 114031] [client-rpc-fops.c:904:client3_3_writev_cbk] 0-test-client-1: remote operation failed [No space left on device]
[2016-07-26 10:39:06.589976] E [MSGID: 109023] [dht-rebalance.c:1124:dht_migrate_file] 0-test-dht: Migrate file failed: /test_file.424: failed to migrate data
[2016-07-26 10:39:06.590375] W [MSGID: 114031] [client-rpc-fops.c:904:client3_3_writev_cbk] 0-test-client-1: remote operation failed [No space left on device]
[2016-07-26 10:39:06.590408] E [MSGID: 109023] [dht-rebalance.c:1124:dht_migrate_file] 0-test-dht: Migrate file failed: /test_file.420: failed to migrate data


Version-Release number of selected component (if applicable):

glusterfs-3.7.1-16.el7rhs.x86_64
glusterfs-fuse-3.7.1-16.el7rhs.x86_64
glusterfs-client-xlators-3.7.1-16.el7rhs.x86_64
glusterfs-api-3.7.1-16.el7rhs.x86_64
glusterfs-server-3.7.1-16.el7rhs.x86_64
glusterfs-libs-3.7.1-16.el7rhs.x86_64
glusterfs-cli-3.7.1-16.el7rhs.x86_64


How reproducible:

Always, when the bricks have different sizes.


Steps to Reproduce:

Two GlusterFS servers with IPs:
- 10.209.2.164 - gluster-test1
- 10.209.2.165 - gluster-test2

XFS volumes mounted on each server:
- /srv/storage1 - 4GB
- /srv/storage2 - 8GB
- /srv/storage3 - 10GB


1. Create distribute volume 'test':

~ # gluster volume create test 10.209.2.164:/srv/storage1/test 10.209.2.165:/srv/storage1/test 10.209.2.164:/srv/storage2/test 10.209.2.165:/srv/storage2/test 10.209.2.164:/srv/storage3/test 10.209.2.165:/srv/storage3/test


2. Set option 'cluster.min-free-disk' to '2GB':

~ # gluster volume set test cluster.min-free-disk 2GB


3. Set option 'cluster.weighted-rebalance' to 'on'

~ # gluster volume set test cluster.weighted-rebalance on


4. Start volume 'test'

~ # gluster volume start test


5. Mount volume 'test' on /mnt/test


6. Generate 25GB of test data with the 'sysbench' tool:

/mnt/test # sysbench --test=fileio --file-total-size=25G --file-num=512 prepare


7. After that, 'df' looks like this:

root ~ # df -h /srv/storage1/ /srv/storage2/ /srv/storage3/
Filesystem      Size  Used Avail Use% Mounted on
/dev/vdc        4.1G  2.1G  2.0G  51% /srv/storage1
/dev/vdd        8.1G  4.0G  4.1G  50% /srv/storage2
/dev/vde         10G  6.1G  4.0G  61% /srv/storage3

root ~ # df -h /srv/storage1/ /srv/storage2/ /srv/storage3/
Filesystem      Size  Used Avail Use% Mounted on
/dev/vdc        4.1G  2.1G  2.0G  51% /srv/storage1
/dev/vdd        8.1G  5.2G  2.9G  65% /srv/storage2
/dev/vde         10G  6.0G  4.1G  60% /srv/storage3


8. Start removing the 8GB brick:

~ # gluster volume remove-brick test 10.209.2.164:/srv/storage2/test start


9. After that, remove-brick reports the job as 'completed' (with failures), but some files still exist on 10.209.2.164:/srv/storage2/test and two other bricks are now full:

root ~ # gluster volume remove-brick test 10.209.2.164:/srv/storage2/test status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost              113         5.5GB           301            69             0            completed              66.00

root ~ # df -h /srv/storage1/ /srv/storage2/ /srv/storage3/
Filesystem      Size  Used Avail Use% Mounted on
/dev/vdc        4.1G  1.1G  3.0G  27% /srv/storage1
/dev/vdd        8.1G  2.2G  5.9G  28% /srv/storage2
/dev/vde         10G  3.7G  6.4G  37% /srv/storage3

root ~ # df -h /srv/storage1/ /srv/storage2/ /srv/storage3/
Filesystem      Size  Used Avail Use% Mounted on
/dev/vdc        4.1G  4.0G   18M 100% /srv/storage1
/dev/vdd        8.1G  8.0G   13M 100% /srv/storage2
/dev/vde         10G  6.4G  3.7G  64% /srv/storage3


10. Logs from '/var/log/glusterfs/test-rebalance.log':

[2016-07-26 10:39:06.671197] I [dht-rebalance.c:1002:dht_migrate_file] 0-test-dht: /test_file.474: attempting to move from test-client-6 to test-client-1
[2016-07-26 10:39:07.019233] I [MSGID: 109022] [dht-rebalance.c:1316:dht_migrate_file] 0-test-dht: completed migration of /test_file.427 from subvolume test-client-6 to test-client-0
[2016-07-26 10:39:07.023070] I [dht-rebalance.c:1002:dht_migrate_file] 0-test-dht: /test_file.475: attempting to move from test-client-6 to test-client-3
[2016-07-26 10:39:07.033200] E [MSGID: 109023] [dht-rebalance.c:672:__dht_check_free_space] 0-test-dht: data movement attempted from node (test-client-6) to node (test-client-3) which does not have required free space for (/test_file.475)
[2016-07-26 10:39:07.036136] I [dht-rebalance.c:1002:dht_migrate_file] 0-test-dht: /test_file.476: attempting to move from test-client-6 to test-client-3
[2016-07-26 10:39:07.046853] E [MSGID: 109023] [dht-rebalance.c:672:__dht_check_free_space] 0-test-dht: data movement attempted from node (test-client-6) to node (test-client-3) which does not have required free space for (/test_file.476)
[2016-07-26 10:39:07.049549] I [dht-rebalance.c:1002:dht_migrate_file] 0-test-dht: /test_file.478: attempting to move from test-client-6 to test-client-3
[2016-07-26 10:39:07.059378] E [MSGID: 109023] [dht-rebalance.c:672:__dht_check_free_space] 0-test-dht: data movement attempted from node (test-client-6) to node (test-client-3) which does not have required free space for (/test_file.478)
[2016-07-26 10:39:07.061465] I [dht-rebalance.c:1002:dht_migrate_file] 0-test-dht: /test_file.484: attempting to move from test-client-6 to test-client-0
[2016-07-26 10:39:07.768400] W [MSGID: 114031] [client-rpc-fops.c:904:client3_3_writev_cbk] 0-test-client-1: remote operation failed [No space left on device]
[2016-07-26 10:39:07.768632] E [MSGID: 109023] [dht-rebalance.c:1124:dht_migrate_file] 0-test-dht: Migrate file failed: /test_file.496: failed to migrate data
[2016-07-26 10:39:07.770780] W [MSGID: 114031] [client-rpc-fops.c:904:client3_3_writev_cbk] 0-test-client-1: remote operation failed [No space left on device]
[2016-07-26 10:39:07.770842] E [MSGID: 109023] [dht-rebalance.c:1124:dht_migrate_file] 0-test-dht: Migrate file failed: /test_file.474: failed to migrate data
[2016-07-26 10:39:07.783232] I [dht-rebalance.c:1002:dht_migrate_file] 0-test-dht: /test_file.485: attempting to move from test-client-6 to test-client-3
[2016-07-26 10:39:07.788292] I [dht-rebalance.c:1002:dht_migrate_file] 0-test-dht: /test_file.490: attempting to move from test-client-6 to test-client-1
[2016-07-26 10:39:07.795349] E [MSGID: 109023] [dht-rebalance.c:672:__dht_check_free_space] 0-test-dht: data movement attempted from node (test-client-6) to node (test-client-3) which does not have required free space for (/test_file.485)
[2016-07-26 10:39:07.799550] I [dht-rebalance.c:1002:dht_migrate_file] 0-test-dht: /test_file.491: attempting to move from test-client-6 to test-client-1
[2016-07-26 10:39:07.802247] E [MSGID: 109023] [dht-rebalance.c:672:__dht_check_free_space] 0-test-dht: data movement attempted from node (test-client-6) to node (test-client-1) which does not have required free space for (/test_file.490)
[2016-07-26 10:39:07.806331] I [dht-rebalance.c:1002:dht_migrate_file] 0-test-dht: /test_file.492: attempting to move from test-client-6 to test-client-1
[2016-07-26 10:39:07.813135] E [MSGID: 109023] [dht-rebalance.c:672:__dht_check_free_space] 0-test-dht: data movement attempted from node (test-client-6) to node (test-client-1) which does not have required free space for (/test_file.491)
[2016-07-26 10:39:07.817428] I [dht-rebalance.c:1002:dht_migrate_file] 0-test-dht: /test_file.497: attempting to move from test-client-6 to test-client-3
[2016-07-26 10:39:07.820004] E [MSGID: 109023] [dht-rebalance.c:672:__dht_check_free_space] 0-test-dht: data movement attempted from node (test-client-6) to node (test-client-1) which does not have required free space for (/test_file.492)
[2016-07-26 10:39:07.823201] I [dht-rebalance.c:1002:dht_migrate_file] 0-test-dht: /test_file.511: attempting to move from test-client-6 to test-client-3
[2016-07-26 10:39:07.830852] E [MSGID: 109023] [dht-rebalance.c:672:__dht_check_free_space] 0-test-dht: data movement attempted from node (test-client-6) to node (test-client-3) which does not have required free space for (/test_file.497)
[2016-07-26 10:39:07.835856] E [MSGID: 109023] [dht-rebalance.c:672:__dht_check_free_space] 0-test-dht: data movement attempted from node (test-client-6) to node (test-client-3) which does not have required free space for (/test_file.511)
[2016-07-26 10:39:08.010471] I [MSGID: 109022] [dht-rebalance.c:1316:dht_migrate_file] 0-test-dht: completed migration of /test_file.484 from subvolume test-client-6 to test-client-0
[2016-07-26 10:39:08.415897] I [MSGID: 109022] [dht-rebalance.c:1316:dht_migrate_file] 0-test-dht: completed migration of /test_file.508 from subvolume test-client-4 to test-client-1
[2016-07-26 10:39:08.416988] I [MSGID: 109028] [dht-rebalance.c:3063:gf_defrag_status_get] 0-test-dht: Rebalance is completed. Time taken is 66.00 secs
[2016-07-26 10:39:08.417036] I [MSGID: 109028] [dht-rebalance.c:3067:gf_defrag_status_get] 0-test-dht: Files migrated: 113, size: 5924454400, lookups: 301, failures: 0, skipped: 69
[2016-07-26 10:39:08.417495] W [glusterfsd.c:1219:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dc5) [0x7fba9dc3fdc5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7fba9f2a8785] -->/usr/sbin/glusterfs(cleanup_and_exit+0x69) [0x7fba9f2a8609] ) 0-: received signum (15), shutting down


Actual results:

During a remove-brick operation on a 'distribute' volume with unequal capacity bricks, the smaller bricks are overfilled and hit 'No space left on device'.


Expected results:

The remove-brick operation should respect 'cluster.min-free-disk', which should prevent bricks from being overfilled.
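
To make the expectation concrete, here is a minimal sketch (illustrative only, not glusterfs code; the function name and numbers are hypothetical) of a destination check that refuses a migration which would leave the brick below min-free-disk:

/* Illustrative only, not glusterfs source: decide whether a
 * destination brick can take a file without dropping below the
 * configured min-free-disk reservation. */
#include <stdint.h>
#include <stdio.h>

static int
can_accept_file (uint64_t avail_bytes, uint64_t file_bytes,
                 uint64_t min_free_bytes)
{
        /* Refuse if copying the file would leave less than
         * min-free-disk available on the destination. */
        return avail_bytes >= file_bytes + min_free_bytes;
}

int
main (void)
{
        uint64_t gb = 1024ULL * 1024 * 1024;

        /* 2GB free, 2GB min-free-disk, 1.5GB file: a size-only check
         * would allow this migration, a min-free-disk aware check
         * refuses it. */
        printf ("accept: %d\n",
                can_accept_file (2 * gb, 3 * gb / 2, 2 * gb));
        return 0;
}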

Comment 4 Raghavendra G 2017-04-11 10:14:40 UTC
__dht_check_free_space (dht-rebalance.c), as can be seen below, checks whether the destination has enough space to accommodate the file being migrated. Nowhere does it check whether the destination would still be left with the free space configured in 'min-free-disk'. In addition, since parallel migrations can target the same brick (either from the same rebalance process - multithreaded rebalance - or from multiple rebalance processes), we can end up approving several concurrent migrations of different files when there is only enough free space to hold the largest of them; for example, two 1.5GB files can each pass the check against 2GB of free space, and one of them then fails with ENOSPC during migration.

check_avail_space:
        if (((dst_statfs.f_bavail * dst_statfs.f_bsize) /
              GF_DISK_SECTOR_SIZE) < stbuf->ia_blocks) {
                gf_msg (this->name, GF_LOG_ERROR, 0,
                        DHT_MSG_MIGRATE_FILE_FAILED,
                        "data movement attempted from node (%s) to node (%s) "
                        "which does not have required free space for (%s)",
                        from->name, to->name, loc->path);
                ret = -1;
                goto out;
        }

We should either:

* use a buffer to account for parallel migrations (for example, a min-free-disk value relatively larger than the largest file size), or
* make check_free_space resilient against parallel migrations (for example, by atomically decrementing the free space obtained from statfs by the size of each file being migrated); a rough sketch of this second approach follows.
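
For illustration only, a rough sketch of the reservation idea (hypothetical names, not glusterfs code), assuming a single rebalance process whose migration threads share one reservation counter per destination:

/* Illustrative only, not glusterfs source. Each destination keeps a
 * count of bytes already promised to in-flight migrations, so that
 * parallel migrations cannot all claim the same free space. */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
        pthread_mutex_t lock;
        uint64_t        reserved;  /* bytes promised to in-flight files */
} dst_reservation_t;

/* avail_bytes is the latest statfs result for the destination and
 * min_free_bytes the configured min-free-disk. Returns 1 and records
 * the reservation if the file fits, 0 otherwise. */
static int
reserve_space (dst_reservation_t *dst, uint64_t avail_bytes,
               uint64_t file_bytes, uint64_t min_free_bytes)
{
        int ok = 0;

        pthread_mutex_lock (&dst->lock);
        if (avail_bytes >= dst->reserved + file_bytes + min_free_bytes) {
                dst->reserved += file_bytes;
                ok = 1;
        }
        pthread_mutex_unlock (&dst->lock);
        return ok;
}

/* Called when the migration completes or is abandoned. */
static void
release_space (dst_reservation_t *dst, uint64_t file_bytes)
{
        pthread_mutex_lock (&dst->lock);
        dst->reserved -= file_bytes;
        pthread_mutex_unlock (&dst->lock);
}

int
main (void)
{
        dst_reservation_t dst = { PTHREAD_MUTEX_INITIALIZER, 0 };
        uint64_t gb = 1024ULL * 1024 * 1024;

        /* With 2GB free and no min-free-disk configured, two parallel
         * 1.5GB migrations cannot both be admitted. */
        printf ("first:  %d\n", reserve_space (&dst, 2 * gb, 3 * gb / 2, 0));
        printf ("second: %d\n", reserve_space (&dst, 2 * gb, 3 * gb / 2, 0));
        release_space (&dst, 3 * gb / 2);
        return 0;
}

Across multiple rebalance processes such a counter would have to live on the brick itself (for example in an xattr), which is part of why the simpler buffer-space approach may be preferable.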

regards,
Raghavendra

Comment 5 Atin Mukherjee 2017-04-11 12:20:40 UTC
upstream patch : https://review.gluster.org/#/c/17034/

Comment 22 Prasad Desala 2017-08-02 12:57:53 UTC
Verified this BZ on glusterfs version 3.8.4-36.el7rhgs.x86_64.
The issue is fixed; remove-brick now takes the cluster.min-free-disk value into account during file migration.

Moving this BZ to Verified state.

Comment 24 errata-xmlrpc 2017-09-21 04:28:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774

