Bug 1273728

Summary: Crash while bringing down the bricks and self heal
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Bhaskarakiran <byarlaga>
Component: tier
Assignee: Joseph Elwin Fernandes <josferna>
Status: CLOSED ERRATA
QA Contact: Neha <nerawat>
Severity: urgent
Priority: urgent
Docs Contact:
Version: rhgs-3.1
CC: asrivast, dlambrig, mzywusko, nchilaka, rhs-bugs, sankarshan, sashinde, storage-qa-internal
Target Milestone: ---
Keywords: ZStream
Target Release: RHGS 3.1.2
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.7.5-7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-03-01 05:43:55 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1260783, 1260923

Description Bhaskarakiran 2015-10-21 06:41:34 UTC
Description of problem:
========================

No core file was generated, but tier.log shows the crash:

pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 
2015-10-21 06:12:00
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.5
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb2)[0x7f1d10368002]
/lib64/libglusterfs.so.0(gf_print_trace+0x31d)[0x7f1d1038448d]
/lib64/libc.so.6(+0x35670)[0x7f1d0ea56670]
/lib64/libc.so.6(+0x1628b1)[0x7f1d0eb838b1]
/usr/lib64/libgfdb.so.0(gf_sql_query_function+0x101)[0x7f1d018d7481]
/usr/lib64/libgfdb.so.0(gf_sqlite3_find_unchanged_for_time+0xd0)[0x7f1d018d8f70]
/usr/lib64/libgfdb.so.0(find_unchanged_for_time+0x41)[0x7f1d018d2da1]
/usr/lib64/glusterfs/3.7.5/xlator/cluster/tier.so(+0x561f5)[0x7f1d01f021f5]
/usr/lib64/glusterfs/3.7.5/xlator/cluster/tier.so(+0x56d23)[0x7f1d01f02d23]
/usr/lib64/glusterfs/3.7.5/xlator/cluster/tier.so(+0x599d8)[0x7f1d01f059d8]
/lib64/libpthread.so.0(+0x7dc5)[0x7f1d0f1d0dc5]
/lib64/libc.so.6(clone+0x6d)[0x7f1d0eb171cd]
---------
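The tier.so frames above are unresolved offsets. If the matching glusterfs-debuginfo package is installed, the (+0x...) offsets from the trace can be passed directly to addr2line to get source locations; the invocation below is illustrative and not part of the original report:

addr2line -f -C -e /usr/lib64/glusterfs/3.7.5/xlator/cluster/tier.so 0x561f5 0x56d23 0x599d8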

Steps performed:

Created a 1x(8+4) disperse volume and attached a replica-2 hot tier.
Started IO (file creation and a Linux kernel untar).
Brought down tier bricks and EC bricks one at a time and triggered heal.
Checked tier status for promotions and demotions.
(A sketch of the equivalent CLI commands is given below.)
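For reference, a minimal sketch of the CLI sequence these steps describe. The volume name (testvol), hostnames, and brick paths are hypothetical, since the exact layout from the original run is not recorded in this report, and attach-tier syntax varies slightly across 3.7.x builds:

# create a 1x(8+4) disperse volume: 8 data bricks + 4 redundancy bricks
gluster volume create testvol disperse-data 8 redundancy 4 \
    server{1..12}:/bricks/ec/b1
gluster volume start testvol

# attach a replica-2 hot tier
gluster volume attach-tier testvol replica 2 \
    server1:/bricks/hot/b1 server2:/bricks/hot/b1

# after bringing a brick down, trigger and monitor heal
gluster volume heal testvol
gluster volume heal testvol info

# check promotion/demotion activity
# (on older 3.7 builds: gluster volume rebalance testvol tier status)
gluster volume tier testvol status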


Version-Release number of selected component (if applicable):
=============================================================
3.7.5-0.3

How reproducible:
=================
Seen once 

Steps to Reproduce:
As in the description above.

Actual results:
===============
Crash

Expected results:
=================
No crash should be seen

Additional info:
================
Attaching the tier log file.

Comment 3 Joseph Elwin Fernandes 2015-11-24 11:19:20 UTC
1) Tested the following but couldn't reproduce this:
   a) Created a volume with 1000 files in it already
   b) Attached a hot tier and created another 1000 files.
[root@fedora1 test]# gluster vol info
 
Volume Name: test
Type: Tier
Volume ID: bb7a3b77-063d-4334-9e60-862ce4f90bd0
Status: Started
Number of Bricks: 10
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: fedora1:/home/ssd/small_brick3/s3
Brick2: fedora1:/home/ssd/small_brick2/s2
Brick3: fedora1:/home/ssd/small_brick1/s1
Brick4: fedora1:/home/ssd/small_brick0/s0
Cold Tier:
Cold Tier Type : Disperse
Number of Bricks: 1 x (4 + 2) = 6
Brick5: fedora1:/home/disk/d1
Brick6: fedora1:/home/disk/d2
Brick7: fedora1:/home/disk/d3
Brick8: fedora1:/home/disk/d4
Brick9: fedora1:/home/disk/d5
Brick10: fedora1:/home/disk/d6
Options Reconfigured:
diagnostics.brick-log-level: TRACE
cluster.self-heal-daemon: enable
cluster.disperse-self-heal-daemon: enable
cluster.tier-mode: test
features.record-counters: on
features.ctr-enabled: on
performance.readdir-ahead: on
[root@fedora1 test]# 
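For context, the non-default options in the output above map to plain volume-set commands, sketched below. My understanding (not stated in the comment) is that features.ctr-enabled is normally turned on automatically when a tier is attached, and that cluster.tier-mode "test" makes the migration daemon cycle continuously instead of following the default cache policy:

gluster volume set test cluster.tier-mode test
gluster volume set test features.record-counters on
gluster volume set test diagnostics.brick-log-level TRACE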

 c) During promotion and demotion, stopped and restarted the EC bricks
    (see the sketch below); didn't find any crash.
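A typical way to stop and restart an individual brick (hypothetical PID placeholder; the exact commands are not quoted in the comment) is to look up the brick's PID via volume status, kill it, and later force-start the volume to bring offline bricks back:

# find the PID of one EC brick and stop it
gluster volume status test | grep /home/disk/d1
kill <brick-pid>

# ...let promotions/demotions run, then restart offline bricks
gluster volume start test force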

2) The code path where this crash occurred has been completely reworked by this patch: https://code.engineering.redhat.com/gerrit/#/c/61006/
Similar crashes were seen previously in
https://bugzilla.redhat.com/show_bug.cgi?id=1258144
https://bugzilla.redhat.com/show_bug.cgi?id=1273347
and the above patch is expected to fix those as well.

Changing the status to ON_QA.

Comment 6 errata-xmlrpc 2016-03-01 05:43:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0193.html