Bug 1336764 - Bricks don't come online after reboot [Brick Full]
Summary: Bricks don't come online after reboot [Brick Full]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: posix
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: RHGS 3.2.0
Assignee: Ashish Pandey
QA Contact: Karan Sandha
URL:
Whiteboard:
Duplicates: 1361517
Depends On:
Blocks: 1351522 1360679 1364354 1364365
 
Reported: 2016-05-17 12:14 UTC by Karan Sandha
Modified: 2017-03-23 05:31 UTC (History)
5 users

Fixed In Version: glusterfs-3.8.4-1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1360679
Environment:
Last Closed: 2017-03-23 05:31:15 UTC




Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2017:0486 normal SHIPPED_LIVE Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update 2017-03-23 09:18:45 UTC

Description Karan Sandha 2016-05-17 12:14:01 UTC
************** This bug is a clone of upstream bug 1333341


Description of problem:
Rebooted the second brick node and started renaming files on brick1, which was full. The second brick did not come online after the reboot; the following error was seen in the brick logs:
"Creation of unlink directory failed"

sosreport kept at rhsqe-repo.lab.eng.blr.redhat.com://var/www/html/sosreports/<bugid>
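
For triage, the state of the brick can be checked directly on the affected node. A minimal sketch; the brick path below matches this setup and may differ elsewhere:

# Check free space and free inodes on the brick filesystem;
# the failing mkdir needs at least one free block and inode
df -h /rhs/brick1
df -i /rhs/brick1

# Check whether the unlink directory already exists under the brick's .glusterfs tree
ls -ld /rhs/brick1/arbiter/.glusterfs/unlink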

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Create replica 3 volume and mount the volume on client using fuse.
2. Create files using:
for (( i=1; i <= 50; i++ )); do
    dd if=/dev/zero of=file$i count=1000 bs=5M status=progress
done
3. After the creation is done, reboot the second brick node.
4. Start renaming the files to test$i (a sample rename loop is sketched after the log excerpt below).
5. When the second brick comes back up, it fails with the errors below:

[2016-05-05 14:37:45.826772] E [MSGID: 113096] [posix.c:6443:posix_create_unlink_dir] 0-arbiter-posix: Creating directory /rhs/brick1/arbiter/.glusterfs/unlink failed [No space left on device]
[2016-05-05 14:37:45.826856] E [MSGID: 113096] [posix.c:6866:init] 0-arbiter-posix: Creation of unlink directory failed
[2016-05-05 14:37:45.826880] E [MSGID: 101019] [xlator.c:433:xlator_init] 0-arbiter-posix: Initialization of volume 'arbiter-posix' failed, review your volfile again
[2016-05-05 14:37:45.826925] E [graph.c:322:glusterfs_graph_init] 0-arbiter-posix: initializing translator failed
[2016-05-05 14:37:45.826943] E [graph.c:661:glusterfs_graph_activate] 0-graph: init failed
[2016-05-05 14:37:45.828349] W [glusterfsd.c:1251:cleanup_and_exit] (-->/usr/sbin/glusterfsd(mgmt_getspec_cbk+0x331) [0x7f6ba63797d1] -->/usr/sbin/glusterfsd(glusterfs_process_volfp+0x120) [0x7f6ba6374150] -->/usr/sbin/glusterfsd(cleanup_and_exit+0x69) [0x7f6ba6373739] ) 0-: received signum (0), shutting down
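
The rename in step 4 can be driven with a loop like the following (a sketch, run from the fuse mount point; file names follow step 2):

for (( i=1; i <= 50; i++ )); do
    mv file$i test$i
done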


 
Actual results:
[root@dhcp43-167 arbiter]# gluster volume status
Status of volume: arbiter
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dhcp42-58.lab.eng.blr.redhat.com:/rhs
/brick1/arbiter                             N/A       N/A        N       N/A  
Brick dhcp43-142.lab.eng.blr.redhat.com:/rh
s/brick1/arbiter                            49157     0          Y       2120 
Brick dhcp43-167.lab.eng.blr.redhat.com:/rh
s/brick1/arbiter                            49156     0          Y       2094 
NFS Server on localhost                     2049      0          Y       2679 
Self-heal Daemon on localhost               N/A       N/A        Y       3172 
NFS Server on dhcp42-58.lab.eng.blr.redhat.
com                                         2049      0          Y       2195 
Self-heal Daemon on dhcp42-58.lab.eng.blr.r
edhat.com                                   N/A       N/A        Y       2816 
NFS Server on dhcp43-142.lab.eng.blr.redhat
.com                                        2049      0          Y       3072 
Self-heal Daemon on dhcp43-142.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       3160 
 
Task Status of Volume arbiter
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: arbiternfs
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dhcp42-58.lab.eng.blr.redhat.com:/rhs
/brick2/arbiternfs                          N/A       N/A        N       N/A  
Brick dhcp43-142.lab.eng.blr.redhat.com:/rh
s/brick2/arbiternfs                         49158     0          Y       2128 
Brick dhcp43-167.lab.eng.blr.redhat.com:/rh
s/brick2/arbiternfs                         49157     0          Y       2109 
NFS Server on localhost                     2049      0          Y       2679 
Self-heal Daemon on localhost               N/A       N/A        Y       3172 
NFS Server on dhcp42-58.lab.eng.blr.redhat.
com                                         2049      0          Y       2195 
Self-heal Daemon on dhcp42-58.lab.eng.blr.r
edhat.com                                   N/A       N/A        Y       2816 
NFS Server on dhcp43-142.lab.eng.blr.redhat
.com                                        2049      0          Y       3072 
Self-heal Daemon on dhcp43-142.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       3160 
 
Task Status of Volume arbiternfs
------------------------------------------------------------------------------
There are no active volume tasks
 
*************************************************************
[root@dhcp43-142 arbiter]# gluster volume heal arbiternfs info
Brick dhcp42-58.lab.eng.blr.redhat.com:/rhs/brick2/arbiternfs
Status: Transport endpoint is not connected
Number of entries: -

Brick dhcp43-142.lab.eng.blr.redhat.com:/rhs/brick2/arbiternfs
/file4 
/file5 
Status: Connected
Number of entries: 2

Brick dhcp43-167.lab.eng.blr.redhat.com:/rhs/brick2/arbiternfs
/file4 
/file5 
Status: Connected
Number of entries: 2

[root@dhcp43-142 arbiter]# 
[root@dhcp43-142 arbiter]# gluster volume heal arbiter info
Brick dhcp42-58.lab.eng.blr.redhat.com:/rhs/brick1/arbiter
Status: Transport endpoint is not connected
Number of entries: -

Brick dhcp43-142.lab.eng.blr.redhat.com:/rhs/brick1/arbiter
/ - Possibly undergoing heal

Status: Connected
Number of entries: 1

Brick dhcp43-167.lab.eng.blr.redhat.com:/rhs/brick1/arbiter
/ 
Status: Connected
Number of entries: 1

[root@dhcp43-142 arbiter]# 

Expected results:
The bricks should come back online after the reboot, and the files should be renamed successfully.

Additional info:

Comment 6 Pranith Kumar K 2016-08-02 08:10:54 UTC
*** Bug 1361517 has been marked as a duplicate of this bug. ***

Comment 7 Atin Mukherjee 2016-08-09 04:17:37 UTC
Upstream mainline patch http://review.gluster.org/15030 is merged.

Comment 9 Atin Mukherjee 2016-09-17 14:33:15 UTC
Upstream mainline : http://review.gluster.org/15030
Upstream 3.8 : http://review.gluster.org/15093

And the fix is available in rhgs-3.2.0 as part of rebase to GlusterFS 3.8.4.
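
On an updated node, the rebased build can be confirmed with standard queries (a sketch; the expected version follows the Fixed In Version field above):

rpm -q glusterfs-server   # expect glusterfs-server-3.8.4-1 or later
gluster --version | head -n1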

Comment 12 Karan Sandha 2016-10-04 11:47:30 UTC
Verified the bug on 

[root@dhcp47-144 brick0]# gluster --version
glusterfs 3.8.4 built on Sep 29 2016 12:20:30
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.

Steps:
1. Create replica 3 volume and mount the volume on client using fuse.
2. Create files using:
for (( i=1; i <= 50; i++ )); do
    dd if=/dev/zero of=file$i count=1000 bs=5M status=progress
done
3. After the creation is done, reboot the second brick node.
4. Start renaming the files to test$i.
5. When the second brick comes back up, check the brick logs for the errors seen earlier.

No error or warning messages were observed.
The bricks functioned as expected.
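
A post-reboot check along the following lines confirms the verified behaviour (a sketch; the volume name matches this setup, and the brick log file name is assumed to follow the usual convention of the brick path with slashes replaced by dashes):

# All bricks should report Online = Y
gluster volume status arbiter

# The brick log should no longer contain the unlink-directory failure (expect 0)
grep -c "Creation of unlink directory failed" /var/log/glusterfs/bricks/rhs-brick1-arbiter.log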

Comment 14 errata-xmlrpc 2017-03-23 05:31:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html

