Bug 1285797
| Summary: | tiering: T files getting created, even after disk quota exceeds | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Anil Shah <ashah> |
| Component: | tier | Assignee: | Bug Updates Notification Mailing List <rhs-bugs> |
| Status: | CLOSED ERRATA | QA Contact: | Anil Shah <ashah> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | rhgs-3.1 | CC: | asrivast, byarlaga, dlambrig, josferna, rcyriac, rhs-bugs, rkavunga, sankarshan, storage-qa-internal, vmallika |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | RHGS 3.1.2 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-3.7.5-15 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1290677 (view as bug list) | Environment: | |
| Last Closed: | 2016-03-01 05:58:59 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1260783, 1290677, 1295359 | | |
Description
Anil Shah
2015-11-26 13:51:31 UTC
Also, these files are showing up on the mount point with their permissions displayed as "?". I suspect the problem is that when the disk quota is exceeded, creation of the new file fails on the hot tier, while creation of the linkto file (an internal fop, for which quota skips enforcement) succeeds on the cold tier, so only T (linkto) files end up being created.
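The stray entries described above are DHT linkto ("T") files. The following is not part of the original report, just a minimal sketch of how a suspected linkto entry can be confirmed on whichever brick it appears; the brick path /rhs/brick1/fsync (a cold-tier brick in the volume layout shown further down) and the file name trans.avi are only examples:

```sh
# On the brick, linkto files show up as zero-byte entries whose mode is
# ---------T (only the sticky bit set).
ls -l /rhs/brick1/fsync

# A linkto file carries the trusted.glusterfs.dht.linkto xattr, which names
# the subvolume that is expected to hold the actual data.
getfattr -n trusted.glusterfs.dht.linkto -e text /rhs/brick1/fsync/trans.avi
```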
========== THERE COULD BE A POSSIBLE SPLIT BRAIN TOO ==========

1) Created a dist-rep over dist-rep volume.
2) Mounted it over FUSE and created two 700 MB files and two 1.2 GB files.
3) Attached a 2x2 tier, with each brick of about 1 GB, meaning the hot tier is about 2 GB.
4) Disabled the watermark.
5) Appended to all the files to trigger a promotion.
6) Saw the promotion heat and the links created in the hot tier for the promotion (see log line#180 till line#339). Only one file got promoted (mosa700.avi); the following errors were logged due to lack of disk space in the hot tier (which is absolutely fine):

```
[2015-12-01 13:52:00.876106] E [MSGID: 109023] [dht-rebalance.c:721:__dht_check_free_space] 0-fsync-tier-dht: data movement attempted from node (fsync-cold-dht) to node (fsync-hot-dht) which does not have required free space for (//trans.avi)
[2015-12-01 13:52:00.898462] E [MSGID: 109023] [dht-rebalance.c:721:__dht_check_free_space] 0-fsync-tier-dht: data movement attempted from node (fsync-cold-dht) to node (fsync-hot-dht) which does not have required free space for (//gola.avi)
[2015-12-01 13:52:26.694075] E [MSGID: 109023] [dht-rebalance.c:721:__dht_check_free_space] 0-fsync-tier-dht: data movement attempted from node (fsync-cold-dht) to node (fsync-hot-dht) which does not have required free space for (//mm700.avi)
```

OBSERVATION#1: link files are created for all the failed files in the hot tier.

7) Then set the log level to trace.
8) Now, with mosa700.avi demoted, I created (scp'ed) a 1.2 GB file onto this mount and in parallel heated all the cold files.
9) During this time I observed the following logs (refer to log line#485 onwards), retrying after setting the log level to trace:

```
[2015-12-01 13:58:29.922001] I [MSGID: 109028] [dht-rebalance.c:3608:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 741.00 secs
[2015-12-01 13:58:29.922042] I [MSGID: 109028] [dht-rebalance.c:3612:gf_defrag_status_get] 0-glusterfs: Files migrated: 11, size: 0, lookups: 11, failures: 0, skipped: 0
[2015-12-01 13:58:33.370249] I [MSGID: 109028] [dht-rebalance.c:3608:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 745.00 secs
[2015-12-01 13:58:36.972573] I [MSGID: 109028] [dht-rebalance.c:3608:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 748.00 secs
[2015-12-01 14:00:00.212550] E [MSGID: 109023] [dht-rebalance.c:721:__dht_check_free_space] 0-fsync-tier-dht: data movement attempted from node (fsync-cold-dht) to node (fsync-hot-dht) which does not have required free space for (//trans.avi)
[2015-12-01 14:00:00.240233] E [MSGID: 109023] [dht-rebalance.c:721:__dht_check_free_space] 0-fsync-tier-dht: data movement attempted from node (fsync-cold-dht) to node (fsync-hot-dht) which does not have required free space for (//gola.avi)
[2015-12-01 14:00:00.245615] I [MSGID: 109038] [tier.c:530:tier_migrate_using_query_file] 0-fsync-tier-dht: Reached cycle migration limit.migrated bytes 1486189890 files 2
[2015-12-01 13:58:36.994649] I [MSGID: 109028] [dht-rebalance.c:3608:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 748.00 secs
The message "I [MSGID: 109028] [dht-rebalance.c:3612:gf_defrag_status_get] 0-glusterfs: Files migrated: 11, size: 0, lookups: 11, failures: 0, skipped: 0" repeated 3 times between [2015-12-01 13:58:29.922042] and [2015-12-01 13:58:36.994651]
[2015-12-01 14:01:52.535352] I [MSGID: 109028] [dht-rebalance.c:3608:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 944.00 secs
[2015-12-01 14:01:52.535397] I [MSGID: 109028] [dht-rebalance.c:3612:gf_defrag_status_get] 0-glusterfs: Files migrated: 15, size: 0, lookups: 18, failures: 0, skipped: 0
[2015-12-01 14:01:52.561891] I [MSGID: 109028] [dht-rebalance.c:3608:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 944.00 secs
[2015-12-01 14:01:52.561893] I [MSGID: 109028] [dht-rebalance.c:3612:gf_defrag_status_get] 0-glusterfs: Files migrated: 15, size: 0, lookups: 18, failures: 0, skipped: 0
[2015-12-01 14:02:52.845777] I [MSGID: 109028] [dht-rebalance.c:3608:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 1004.00 secs
[2015-12-01 14:02:52.845819] I [MSGID: 109028] [dht-rebalance.c:3612:gf_defrag_status_get] 0-glusterfs: Files migrated: 18, size: 0, lookups: 21, failures: 0, skipped: 0
[2015-12-01 14:02:52.876059] I [MSGID: 109028] [dht-rebalance.c:3608:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 1004.00 secs
[2015-12-01 14:02:52.876061] I [MSGID: 109028] [dht-rebalance.c:3612:gf_defrag_status_get] 0-glusterfs: Files migrated: 18, size: 0, lookups: 21, failures: 0, skipped: 0
[2015-12-01 14:06:00.312002] E [MSGID: 108008] [afr-read-txn.c:89:afr_read_txn_refresh_done] 0-fsync-replicate-0: Failing GETXATTR on gfid c4038106-02b9-48d1-8c11-6826a6b408bb: split-brain observed. [Input/output error]
[2015-12-01 14:06:00.312269] W [MSGID: 109023] [dht-rebalance.c:1281:dht_migrate_file] 0-fsync-tier-dht: Migrate file failed://gtrans.avi: failed to get xattr from fsync-hot-dht (Input/output error)
[2015-12-01 14:06:00.314588] W [dict.c:612:dict_ref] (-->/lib64/libglusterfs.so.0(syncop_fsetxattr+0x1a4) [0x7feebcc6fe14] -->/usr/lib64/glusterfs/3.7.5/xlator/cluster/distribute.so(dht_fsetxattr+0xcb) [0x7feeaeee93fb] -->/lib64/libglusterfs.so.0(dict_ref+0x79) [0x7feebcc202a9] ) 0-dict: dict is NULL [Invalid argument]
[2015-12-01 14:06:00.315395] W [MSGID: 114031] [client-rpc-fops.c:1980:client3_3_fsetxattr_cbk] 0-fsync-client-1: remote operation failed
[2015-12-01 14:06:00.316094] W [MSGID: 114031] [client-rpc-fops.c:1980:client3_3_fsetxattr_cbk] 0-fsync-client-6: remote operation failed [No space left on device]
[2015-12-01 14:06:00.316206] W [MSGID: 114031] [client-rpc-fops.c:1980:client3_3_fsetxattr_cbk] 0-fsync-client-7: remote operation failed [No space left on device]
[2015-12-01 14:06:00.316680] W [MSGID: 109023] [dht-rebalance.c:592:__dht_rebalance_create_dst_file] 0-fsync-tier-dht: //trans.avi: failed to set xattr on fsync-hot-dht (No space left on device)
[2015-12-01 14:06:00.322215] E [MSGID: 108008] [afr-transaction.c:1981:afr_transaction] 0-fsync-replicate-0: Failing SETXATTR on gfid c4038106-02b9-48d1-8c11-6826a6b408bb: split-brain observed. [Input/output error]
[2015-12-01 14:06:00.322521] E [MSGID: 109023] [dht-rebalance.c:907:__dht_rebalance_open_src_file] 0-fsync-tier-dht: failed to set xattr on //gtrans.avi in fsync-hot-dht (Input/output error)
[2015-12-01 14:06:00.322545] E [MSGID: 109023] [dht-rebalance.c:1306:dht_migrate_file] 0-fsync-tier-dht: Migrate file failed: failed to open //gtrans.avi on fsync-hot-dht
[2015-12-01 14:06:00.329383] E [MSGID: 109023] [dht-rebalance.c:721:__dht_check_free_space] 0-fsync-tier-dht: data movement attempted from node (fsync-cold-dht) to node (fsync-hot-dht) which does not have required free space for (//trans.avi)
[2015-12-01 14:06:00.354235] E [MSGID: 109023] [dht-rebalance.c:721:__dht_check_free_space] 0-fsync-tier-dht: data movement attempted from node (fsync-cold-dht) to node (fsync-hot-dht) which does not have required free space for (//gola.avi)
[2015-12-01 14:06:00.358709] I [MSGID: 109038] [tier.c:530:tier_migrate_using_query_file] 0-fsync-tier-dht: Reached cycle migration limit.migrated bytes 1486189890 files 2
```

Heal info and volume layout:

```
exit^C
[root@yarrow glusterfs]# exit
logout
Connection to yarrow closed.

[root@zod ~]# gluster v heal fsync info
Brick yarrow:/dummy/brick104/fsync_hot
/gtrans.avi
Number of entries: 1

Brick zod:/dummy/brick104/fsync_hot
/gtrans.avi
Number of entries: 1

Brick yarrow:/dummy/brick105/fsync_hot
Number of entries: 0

Brick zod:/dummy/brick105/fsync_hot
Number of entries: 0

[root@zod ~]# gluster v info fsync

Volume Name: fsync
Type: Tier
Volume ID: 862b28d6-329e-4ad4-8e32-0dd5e62a2670
Status: Started
Number of Bricks: 8
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: yarrow:/dummy/brick104/fsync_hot
Brick2: zod:/dummy/brick104/fsync_hot
Brick3: yarrow:/dummy/brick105/fsync_hot
Brick4: zod:/dummy/brick105/fsync_hot
Cold Tier:
Cold Tier Type : Distribute
Number of Bricks: 4
Brick5: zod:/rhs/brick1/fsync
Brick6: yarrow:/rhs/brick1/fsync
Brick7: zod:/rhs/brick2/fsync
Brick8: yarrow:/rhs/brick2/fsync
Options Reconfigured:
diagnostics.brick-log-level: TRACE
cluster.tier-mode: test
features.ctr-enabled: on
performance.readdir-ahead: on
```
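Not part of the original report: since the heal info above flags /gtrans.avi on both replicas of the hot-tier brick pair, the following is a minimal sketch of how such an entry is typically inspected further, assuming the volume name and brick paths shown above:

```sh
# List only the entries AFR has actually marked as split-brain (if any).
gluster volume heal fsync info split-brain

# Compare the AFR changelog xattrs of the flagged file on both replica bricks
# (run on yarrow and on zod respectively); non-zero pending counters on each
# brick blaming the other are what a split-brain looks like.
getfattr -d -m trusted.afr. -e hex /dummy/brick104/fsync_hot/gtrans.avi
```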
Able to reproduce the bug on the fixed build glusterfs-3.7.5-13.el7rhgs.x86_64. Moving this bug back to the ASSIGNED state.

Upstream patch: http://review.gluster.org/#/c/13102/

Bug verified on build glusterfs-3.7.5-14.el7rhgs.x86_64.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0193.html
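Not from the original report: a rough sketch of the kind of re-check the verification implies, namely that exceeding the quota no longer leaves stray T files behind. The 1 GB quota limit, the dd sizes, and the mount point /mnt/fsync are assumptions for illustration; the brick paths are the ones from the volume layout above.

```sh
# Enable quota and set a small limit on the volume root (1 GB assumed here).
gluster volume quota fsync enable
gluster volume quota fsync limit-usage / 1GB

# Fill the volume past the limit from the client mount, then try one more
# create; it is expected to fail with "Disk quota exceeded".
dd if=/dev/zero of=/mnt/fsync/fill.bin bs=1M count=1200
dd if=/dev/zero of=/mnt/fsync/extra.bin bs=1M count=200

# With the fix, no zero-byte linkto entries (mode 1000, shown as ---------T)
# should be left behind for the failed create on either tier's bricks.
# Run on each server node with the paths that exist there.
find /rhs/brick*/fsync /dummy/brick10*/fsync_hot -maxdepth 1 -perm 1000 -size 0
```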