Description of problem:

This issue is raised to track one of the two tier daemon crashes reported in BZ#1288003. This bug will be used to track this core: dhcp37-111.core.5424.

Pasting description from BZ#1288003:

A huge number of 'demotion failed' error messages are seen on a few nodes. No demotion is seen although there are multiple files eligible for demotion and the watermark levels have been crossed. The hot tier size is 100GB, with the low watermark set at 10GB and the high watermark at 30GB. Currently, the size of the hot tier has crossed 30GB and no files are being demoted yet.

A single distributed-dispersed volume with 12 bricks was configured, to which a distributed-replicated (4 bricks) hot tier was attached. The crash was seen a few hours after configuring the volume and doing some IO.

Please note that the gluster cluster is configured on RHEL 6.7.

sosreport and core files can be found here:
http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1288003/

[root@dhcp37-121 ~]# gluster vol info

Volume Name: tiering-test-vol-01
Type: Tier
Volume ID: 8afb30c0-bd3e-4248-b4ba-8a6bfe8d237e
Status: Started
Number of Bricks: 16
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: 10.70.37.111:/rhs/brick4/leg1
Brick2: 10.70.37.154:/rhs/brick4/leg1
Brick3: 10.70.37.121:/rhs/brick4/leg1
Brick4: 10.70.37.191:/rhs/brick4/leg1
Cold Tier:
Cold Tier Type : Distributed-Disperse
Number of Bricks: 2 x (4 + 2) = 12
Brick5: 10.70.37.191:/rhs/brick1/leg1
Brick6: 10.70.37.121:/rhs/brick1/leg1
Brick7: 10.70.37.154:/rhs/brick1/leg1
Brick8: 10.70.37.111:/rhs/brick1/leg1
Brick9: 10.70.37.140:/rhs/brick15/leg1
Brick10: 10.70.37.132:/rhs/brick15/leg1
Brick11: 10.70.37.180:/rhs/brick15/leg1
Brick12: 10.70.37.48:/rhs/brick15/leg1
Brick13: 10.70.37.191:/rhs/brick2/leg1
Brick14: 10.70.37.121:/rhs/brick2/leg1
Brick15: 10.70.37.154:/rhs/brick2/leg1
Brick16: 10.70.37.111:/rhs/brick2/leg1
Options Reconfigured:
cluster.tier-promote-frequency: 7200
cluster.tier-max-files: 1000
cluster.write-freq-threshold: 3
cluster.watermark-hi: 30
cluster.watermark-low: 10
cluster.tier-demote-frequency: 120
cluster.tier-mode: cache
features.ctr-enabled: on
performance.readdir-ahead: on

<<<<<<< BT from dhcp37-111.core.5424 >>>>>>>

#0  0x00007f6d271ee625 in raise () from /lib64/libc.so.6
#1  0x00007f6d271efe05 in abort () from /lib64/libc.so.6
#2  0x00007f6d2722c537 in __libc_message () from /lib64/libc.so.6
#3  0x00007f6d27231f4e in malloc_printerr () from /lib64/libc.so.6
#4  0x00007f6d27232353 in malloc_consolidate () from /lib64/libc.so.6
#5  0x00007f6d27235c28 in _int_malloc () from /lib64/libc.so.6
#6  0x00007f6d27236b1c in malloc () from /lib64/libc.so.6
#7  0x00007f6d2888b7b2 in __gf_default_malloc () at mem-pool.h:106
#8  glusterfs_lkowner_buf_get () at globals.c:329
#9  0x00007f6d28870188 in lkowner_utoa (lkowner=0x7f6d2625f970) at common-utils.c:2407
#10 0x00007f6d2888ece2 in gf_proc_dump_call_stack (call_stack=0x7f6d2625f718, key_buf=<value optimized out>) at stack.c:167
#11 0x00007f6d2888f04e in gf_proc_dump_pending_frames (call_pool=0x7f6d299727a0) at stack.c:210
#12 0x00007f6d2888dafb in gf_proc_dump_info (signum=<value optimized out>, ctx=0x7f6d29950010) at statedump.c:825
#13 0x00007f6d28d1d10d in glusterfs_sigwaiter (arg=<value optimized out>) at glusterfsd.c:2020
#14 0x00007f6d2793aa51 in start_thread () from /lib64/libpthread.so.0
#15 0x00007f6d272a493d in clone () from /lib64/libc.so.6

Version-Release number of selected component (if applicable):


How reproducible:
No pattern found yet.

Steps to Reproduce:
1. Configure an 8-node gluster cluster.
2. Configure a 2 x (4 + 2) distributed-dispersed volume.
3. Attach a distributed-replicated hot tier.
4. Make tiering-related parameter changes to the volume; values are set as shown in the vol info output above.
5. Run IO.

Actual results:
Demotion failures are seen along with the crash.

Expected results:
No crashes or promotion/demotion failures should be seen.

Additional info:
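For reference, the reproduction steps above can be sketched with the standard gluster CLI, using the volume name, brick hosts, and option values taken from the vol info output above. This is a sketch only: the exact attach-tier syntax varies between glusterfs releases, and the commands must be run against a live trusted storage pool.

```shell
# Step 2: create the 2 x (4 + 2) distributed-dispersed cold tier
# (the 12 cold-tier bricks from the vol info above)
gluster volume create tiering-test-vol-01 disperse 6 redundancy 2 \
    10.70.37.191:/rhs/brick1/leg1 10.70.37.121:/rhs/brick1/leg1 \
    10.70.37.154:/rhs/brick1/leg1 10.70.37.111:/rhs/brick1/leg1 \
    10.70.37.140:/rhs/brick15/leg1 10.70.37.132:/rhs/brick15/leg1 \
    10.70.37.180:/rhs/brick15/leg1 10.70.37.48:/rhs/brick15/leg1 \
    10.70.37.191:/rhs/brick2/leg1 10.70.37.121:/rhs/brick2/leg1 \
    10.70.37.154:/rhs/brick2/leg1 10.70.37.111:/rhs/brick2/leg1
gluster volume start tiering-test-vol-01

# Step 3: attach the 2 x 2 distributed-replicated hot tier
gluster volume attach-tier tiering-test-vol-01 replica 2 \
    10.70.37.111:/rhs/brick4/leg1 10.70.37.154:/rhs/brick4/leg1 \
    10.70.37.121:/rhs/brick4/leg1 10.70.37.191:/rhs/brick4/leg1

# Step 4: apply the tiering options listed under "Options Reconfigured"
gluster volume set tiering-test-vol-01 cluster.tier-mode cache
gluster volume set tiering-test-vol-01 cluster.tier-promote-frequency 7200
gluster volume set tiering-test-vol-01 cluster.tier-demote-frequency 120
gluster volume set tiering-test-vol-01 cluster.watermark-hi 30
gluster volume set tiering-test-vol-01 cluster.watermark-low 10
gluster volume set tiering-test-vol-01 cluster.write-freq-threshold 3
gluster volume set tiering-test-vol-01 cluster.tier-max-files 1000
```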
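To inspect the attached core directly, a typical gdb session looks like the following (paths are assumptions based on the core name above; the matching glusterfs debuginfo packages must be installed for the frame resolution shown in the backtrace):

```shell
# Pull debug symbols matching the installed glusterfs build (yum-utils)
debuginfo-install glusterfs

# Load the core against the daemon binary that produced it
gdb /usr/sbin/glusterfsd dhcp37-111.core.5424

# Inside gdb:
#   (gdb) bt                    # backtrace of the crashing thread
#   (gdb) thread apply all bt   # backtraces of all threads
```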
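Note on the backtrace: the abort happens inside glibc malloc (frames #3-#6, malloc_printerr/malloc_consolidate) while the daemon is writing a statedump (frames #10-#12, gf_proc_dump_info), which usually points to earlier heap corruption rather than a bug in the dump path itself. The same gf_proc_dump_info path can be exercised on a live daemon to probe this, e.g.:

```shell
# Trigger a statedump for the volume's daemons
# (dump files land under /var/run/gluster/ by default)
gluster volume statedump tiering-test-vol-01

# Or send SIGUSR1 directly to a specific gluster process, which
# invokes the gf_proc_dump_info handler seen in the backtrace
kill -USR1 <pid-of-glusterfsd>
```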