Bug 1273728 - Crash while bringing down the bricks and self heal
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: tier
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.1.2
Assigned To: Joseph Elwin Fernandes
Keywords: ZStream
Depends On:
Blocks: 1260783 1260923
Reported: 2015-10-21 02:41 EDT by Bhaskarakiran
Modified: 2016-11-23 18:12 EST (History)
CC List: 8 users

See Also:
Fixed In Version: glusterfs-3.7.5-7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2016-03-01 00:43:55 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments: None
Description Bhaskarakiran 2015-10-21 02:41:34 EDT
Description of problem:

No core file was generated, but tier.log shows the crash:

pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 
2015-10-21 06:12:00
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.5

Steps performed:

Created a 1x(8+4) disperse volume and attached a replica-2 tier.
Started I/O (file creation and a Linux kernel untar).
Brought down tier bricks and EC bricks one at a time and triggered heal.
Checked tier status for promotions and demotions.
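The steps above can be sketched as gluster CLI commands. This is a dry-run sketch: `GLUSTER` echoes each command instead of executing it, so no glusterd is needed. The host names, volume name, and brick paths are hypothetical, and the exact `attach-tier` syntax may vary across 3.7.x releases.

```shell
#!/bin/bash
# Dry-run sketch of the reproduction steps. GLUSTER echoes each
# command instead of running it, so no glusterd is required.
# Host names and brick paths below are hypothetical.
GLUSTER="echo gluster"

# 1x(8+4) disperse (EC) volume: 12 bricks, redundancy 4.
$GLUSTER volume create ecvol disperse 12 redundancy 4 \
    server{1..12}:/bricks/ec/b1
$GLUSTER volume start ecvol

# Attach a replica-2 hot tier.
$GLUSTER volume attach-tier ecvol replica 2 \
    server1:/bricks/ssd/hot server2:/bricks/ssd/hot

# After killing a brick's glusterfsd and bringing it back,
# trigger a full heal and check promotion/demotion counters.
$GLUSTER volume heal ecvol full
$GLUSTER volume tier ecvol status
```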

Version-Release number of selected component (if applicable):
glusterfs 3.7.5 (per the package-string in the crash log above)

How reproducible:
Seen once 

Steps to Reproduce:
As in description

Actual results:
The process crashed with signal 11 (SIGSEGV); see the tier.log excerpt above.

Expected results:
No crash should be seen

Additional info:
Attaching the tier log file.
Comment 3 Joseph Elwin Fernandes 2015-11-24 06:19:20 EST
1) Tested the following but couldn't reproduce the crash:
   a) Created a volume with 1000 files already on it.
   b) Attached a hot tier and created another 1000 files.
[root@fedora1 test]# gluster vol info
Volume Name: test
Type: Tier
Volume ID: bb7a3b77-063d-4334-9e60-862ce4f90bd0
Status: Started
Number of Bricks: 10
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: fedora1:/home/ssd/small_brick3/s3
Brick2: fedora1:/home/ssd/small_brick2/s2
Brick3: fedora1:/home/ssd/small_brick1/s1
Brick4: fedora1:/home/ssd/small_brick0/s0
Cold Tier:
Cold Tier Type : Disperse
Number of Bricks: 1 x (4 + 2) = 6
Brick5: fedora1:/home/disk/d1
Brick6: fedora1:/home/disk/d2
Brick7: fedora1:/home/disk/d3
Brick8: fedora1:/home/disk/d4
Brick9: fedora1:/home/disk/d5
Brick10: fedora1:/home/disk/d6
Options Reconfigured:
diagnostics.brick-log-level: TRACE
cluster.self-heal-daemon: enable
cluster.disperse-self-heal-daemon: enable
cluster.tier-mode: test
features.record-counters: on
features.ctr-enabled: on
performance.readdir-ahead: on
[root@fedora1 test]# 

 c) During promotion and demotion, stopped and restarted the EC bricks.
    Did not find any crash.
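Step (c), cycling an EC brick while the tier daemon is promoting and demoting, can be sketched the same way. Again a dry-run (commands are echoed, not executed); the brick path is taken from the volume info above, and `start ... force` restarts only bricks that are down.

```shell
#!/bin/bash
# Dry-run sketch of step (c): stop one cold-tier (EC) brick while the
# tier daemon promotes/demotes, then restart it and trigger heal.
# Commands are echoed, not executed.
GLUSTER="echo gluster"

# Locate the brick's glusterfsd PID in the status output, then kill it.
$GLUSTER volume status test fedora1:/home/disk/d1
# kill -TERM <pid from the status table>   # brick goes offline

# "start ... force" restarts only the bricks that are down.
$GLUSTER volume start test force

# Trigger heal and watch promotions/demotions.
$GLUSTER volume heal test full
$GLUSTER volume tier test status
```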

2) The code path where this crash was previously seen has completely changed in this patch: https://code.engineering.redhat.com/gerrit/#/c/61006/
Similar crashes were seen previously, and the above fix is expected to address them.

Changing the status to ON_QA.
Comment 6 errata-xmlrpc 2016-03-01 00:43:55 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

