Bug 1273728 - Crash while bringing down the bricks and self heal
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: tier
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.1.2
Assigned To: Joseph Elwin Fernandes
Keywords: ZStream
Depends On:
Blocks: 1260783 1260923
Reported: 2015-10-21 02:41 EDT by Bhaskarakiran
Modified: 2016-11-23 18:12 EST (History)
CC List: 8 users

See Also:
Fixed In Version: glusterfs-3.7.5-7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2016-03-01 00:43:55 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments: None
Description Bhaskarakiran 2015-10-21 02:41:34 EDT
Description of problem:

No core file was generated, but tier.log shows the crash:

pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 
2015-10-21 06:12:00
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.5

Steps performed:

Created a 1x(8+4) disperse volume and attached a replica-2 tier.
Started I/O (file creation and a Linux kernel untar).
Brought down tier bricks and EC bricks one at a time and triggered heal.
Checked tier status for promotions and demotions.
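The steps above can be sketched as gluster CLI commands. This is a dry-run sketch: `GLUSTER` echoes each command instead of executing it, so no glusterd is needed. The host names, volume name, and brick paths are hypothetical, and the exact `attach-tier` syntax may vary across 3.7.x releases.

```shell
#!/bin/bash
# Dry-run sketch of the reproduction steps. GLUSTER echoes each
# command instead of running it, so no glusterd is required.
# Host names and brick paths below are hypothetical.
GLUSTER="echo gluster"

# 1x(8+4) disperse (EC) volume: 12 bricks, redundancy 4.
$GLUSTER volume create ecvol disperse 12 redundancy 4 \
    server{1..12}:/bricks/ec/b1
$GLUSTER volume start ecvol

# Attach a replica-2 hot tier.
$GLUSTER volume attach-tier ecvol replica 2 \
    server1:/bricks/ssd/hot server2:/bricks/ssd/hot

# After killing a brick's glusterfsd and bringing it back,
# trigger a full heal and check promotion/demotion counters.
$GLUSTER volume heal ecvol full
$GLUSTER volume tier ecvol status
```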

Version-Release number of selected component (if applicable):
glusterfs 3.7.5 (per the package-string in the crash log above)

How reproducible:
Seen once 

Steps to Reproduce:
As in description

Actual results:
The process crashed with signal 11 (SIGSEGV); see the tier.log excerpt above.

Expected results:
No crash should be seen

Additional info:
Attaching the tier log file.
Comment 3 Joseph Elwin Fernandes 2015-11-24 06:19:20 EST
1) Tested the following but couldn't reproduce the crash:
   a) Created a volume with 1000 files already on it.
   b) Attached a hot tier and created another 1000 files.
[root@fedora1 test]# gluster vol info
Volume Name: test
Type: Tier
Volume ID: bb7a3b77-063d-4334-9e60-862ce4f90bd0
Status: Started
Number of Bricks: 10
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: fedora1:/home/ssd/small_brick3/s3
Brick2: fedora1:/home/ssd/small_brick2/s2
Brick3: fedora1:/home/ssd/small_brick1/s1
Brick4: fedora1:/home/ssd/small_brick0/s0
Cold Tier:
Cold Tier Type : Disperse
Number of Bricks: 1 x (4 + 2) = 6
Brick5: fedora1:/home/disk/d1
Brick6: fedora1:/home/disk/d2
Brick7: fedora1:/home/disk/d3
Brick8: fedora1:/home/disk/d4
Brick9: fedora1:/home/disk/d5
Brick10: fedora1:/home/disk/d6
Options Reconfigured:
diagnostics.brick-log-level: TRACE
cluster.self-heal-daemon: enable
cluster.disperse-self-heal-daemon: enable
cluster.tier-mode: test
features.record-counters: on
features.ctr-enabled: on
performance.readdir-ahead: on
[root@fedora1 test]# 

 c) During promotion and demotion, stopped and restarted the EC bricks.
    Did not find any crash.
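Step (c), cycling an EC brick while the tier daemon is promoting and demoting, can be sketched the same way. Again a dry-run (commands are echoed, not executed); the brick path is taken from the volume info above, and `start ... force` restarts only bricks that are down.

```shell
#!/bin/bash
# Dry-run sketch of step (c): stop one cold-tier (EC) brick while the
# tier daemon promotes/demotes, then restart it and trigger heal.
# Commands are echoed, not executed.
GLUSTER="echo gluster"

# Locate the brick's glusterfsd PID in the status output, then kill it.
$GLUSTER volume status test fedora1:/home/disk/d1
# kill -TERM <pid from the status table>   # brick goes offline

# "start ... force" restarts only the bricks that are down.
$GLUSTER volume start test force

# Trigger heal and watch promotions/demotions.
$GLUSTER volume heal test full
$GLUSTER volume tier test status
```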

2) The code path where this crash was previously seen has completely changed in this patch: https://code.engineering.redhat.com/gerrit/#/c/61006/
Similar crashes were seen previously, and the above fix is expected to address them.

Changing the status to ON_QA.
Comment 6 errata-xmlrpc 2016-03-01 00:43:55 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

