Bug 1333341 - Bricks did not come online after reboot [Disk Full]
Summary: Bricks did not come online after reboot [Disk Full]
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: posix
Version: 3.7.11
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Ashish Pandey
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1361517
 
Reported: 2016-05-05 10:25 UTC by Karan Sandha
Modified: 2017-03-08 11:01 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1361517 (view as bug list)
Environment:
Last Closed: 2017-03-08 11:01:40 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Karan Sandha 2016-05-05 10:25:13 UTC
Description of problem:
Rebooted brick2 and started renaming the files while brick1 was full. Brick2 did not come online after the reboot, and errors were seen in the brick logs:
"Creation of unlink directory failed"

sosreport kept at rhsqe-repo.lab.eng.blr.redhat.com://var/www/html/sosreports/<bugid>

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Create a replica 3 volume and mount it on a client using FUSE.
2. Create files using:
for (( i=1; i<=50; i++ )); do
    dd if=/dev/zero of=file$i count=1000 bs=5M status=progress
done
3. After the creation is done, reboot the node hosting the second brick.
4. Start renaming the files to test$i (a scripted version of steps 2 and 4 is sketched after the log excerpt below).
5. When the second brick comes back up, it fails with the errors below:

[2016-05-05 14:37:45.826772] E [MSGID: 113096] [posix.c:6443:posix_create_unlink_dir] 0-arbiter-posix: Creating directory /rhs/brick1/arbiter/.glusterfs/unlink failed [No space left on device]
[2016-05-05 14:37:45.826856] E [MSGID: 113096] [posix.c:6866:init] 0-arbiter-posix: Creation of unlink directory failed
[2016-05-05 14:37:45.826880] E [MSGID: 101019] [xlator.c:433:xlator_init] 0-arbiter-posix: Initialization of volume 'arbiter-posix' failed, review your volfile again
[2016-05-05 14:37:45.826925] E [graph.c:322:glusterfs_graph_init] 0-arbiter-posix: initializing translator failed
[2016-05-05 14:37:45.826943] E [graph.c:661:glusterfs_graph_activate] 0-graph: init failed
[2016-05-05 14:37:45.828349] W [glusterfsd.c:1251:cleanup_and_exit] (-->/usr/sbin/glusterfsd(mgmt_getspec_cbk+0x331) [0x7f6ba63797d1] -->/usr/sbin/glusterfsd(glusterfs_process_volfp+0x120) [0x7f6ba6374150] -->/usr/sbin/glusterfsd(cleanup_and_exit+0x69) [0x7f6ba6373739] ) 0-: received signum (0), shutting down
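
The failure chain in the log follows from a single ENOSPC: at startup the posix translator tries to create <brick>/.glusterfs/unlink, the full filesystem rejects the mkdir, xlator_init fails, and the brick process shuts down. The underlying effect is easy to demonstrate on a throwaway loopback filesystem; this is an illustrative sketch only (paths and sizes are assumptions, not taken from this report):

# Build a tiny filesystem, fill it, then attempt a mkdir on it.
dd if=/dev/zero of=/tmp/tiny.img bs=1M count=16
mkfs.ext4 -F /tmp/tiny.img
mkdir -p /mnt/tiny && mount -o loop /tmp/tiny.img /mnt/tiny
dd if=/dev/zero of=/mnt/tiny/filler bs=1M || true   # writes until ENOSPC
mkdir /mnt/tiny/unlink    # fails: "No space left on device"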


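For convenience, steps 2 and 4 can be scripted as below. This is a minimal sketch; the mount point /mnt/arbiter is an assumption, not taken from the report:

#!/bin/bash
# Assumed FUSE mount point of the replica 3 volume.
cd /mnt/arbiter

# Step 2: write ~5 GB per file (1000 x 5M blocks) until the brick fills up.
for (( i=1; i<=50; i++ )); do
    dd if=/dev/zero of=file$i count=1000 bs=5M status=progress
done

# Step 4: rename the files while the second brick is rebooting.
for (( i=1; i<=50; i++ )); do
    mv file$i test$i
done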
 
Actual results:
[root@dhcp43-167 arbiter]# gluster volume status
Status of volume: arbiter
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dhcp42-58.lab.eng.blr.redhat.com:/rhs
/brick1/arbiter                             N/A       N/A        N       N/A  
Brick dhcp43-142.lab.eng.blr.redhat.com:/rh
s/brick1/arbiter                            49157     0          Y       2120 
Brick dhcp43-167.lab.eng.blr.redhat.com:/rh
s/brick1/arbiter                            49156     0          Y       2094 
NFS Server on localhost                     2049      0          Y       2679 
Self-heal Daemon on localhost               N/A       N/A        Y       3172 
NFS Server on dhcp42-58.lab.eng.blr.redhat.
com                                         2049      0          Y       2195 
Self-heal Daemon on dhcp42-58.lab.eng.blr.r
edhat.com                                   N/A       N/A        Y       2816 
NFS Server on dhcp43-142.lab.eng.blr.redhat
.com                                        2049      0          Y       3072 
Self-heal Daemon on dhcp43-142.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       3160 
 
Task Status of Volume arbiter
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: arbiternfs
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dhcp42-58.lab.eng.blr.redhat.com:/rhs
/brick2/arbiternfs                          N/A       N/A        N       N/A  
Brick dhcp43-142.lab.eng.blr.redhat.com:/rh
s/brick2/arbiternfs                         49158     0          Y       2128 
Brick dhcp43-167.lab.eng.blr.redhat.com:/rh
s/brick2/arbiternfs                         49157     0          Y       2109 
NFS Server on localhost                     2049      0          Y       2679 
Self-heal Daemon on localhost               N/A       N/A        Y       3172 
NFS Server on dhcp42-58.lab.eng.blr.redhat.
com                                         2049      0          Y       2195 
Self-heal Daemon on dhcp42-58.lab.eng.blr.r
edhat.com                                   N/A       N/A        Y       2816 
NFS Server on dhcp43-142.lab.eng.blr.redhat
.com                                        2049      0          Y       3072 
Self-heal Daemon on dhcp43-142.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       3160 
 
Task Status of Volume arbiternfs
------------------------------------------------------------------------------
There are no active volume tasks
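Since the brick fails only because the mkdir of .glusterfs/unlink returns ENOSPC, a plausible workaround (not verified in this report) is to free some space on the full brick and then force-start the volume, which restarts only the bricks that are down:

# On the affected node, confirm the brick filesystem is full:
df -h /rhs/brick1/arbiter

# After freeing some space on the brick, restart the down brick:
gluster volume start arbiter force
gluster volume status arbiter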
 
*************************************************************
[root@dhcp43-142 arbiter]# gluster volume heal arbiternfs info
Brick dhcp42-58.lab.eng.blr.redhat.com:/rhs/brick2/arbiternfs
Status: Transport endpoint is not connected
Number of entries: -

Brick dhcp43-142.lab.eng.blr.redhat.com:/rhs/brick2/arbiternfs
/file4 
/file5 
Status: Connected
Number of entries: 2

Brick dhcp43-167.lab.eng.blr.redhat.com:/rhs/brick2/arbiternfs
/file4 
/file5 
Status: Connected
Number of entries: 2

[root@dhcp43-142 arbiter]# 
[root@dhcp43-142 arbiter]# gluster volume heal arbiter info
Brick dhcp42-58.lab.eng.blr.redhat.com:/rhs/brick1/arbiter
Status: Transport endpoint is not connected
Number of entries: -

Brick dhcp43-142.lab.eng.blr.redhat.com:/rhs/brick1/arbiter
/ - Possibly undergoing heal

Status: Connected
Number of entries: 1

Brick dhcp43-167.lab.eng.blr.redhat.com:/rhs/brick1/arbiter
/ 
Status: Connected
Number of entries: 1

[root@dhcp43-142 arbiter]# 
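Once the down brick is back online, the self-heal daemon should pick up the pending entries shown above; a heal can also be triggered manually with the standard gluster CLI, listed here for completeness:

gluster volume heal arbiter
gluster volume heal arbiternfs
gluster volume heal arbiter info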

Expected results:
The bricks should come up after the reboot, and the files should be renamed successfully.

Additional info:

Comment 1 Kaushal 2017-03-08 11:01:40 UTC
This bug is being closed because GlusterFS-3.7 has reached its end-of-life.

Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS.
If this bug still exists in newer GlusterFS releases, please reopen this bug against the newer release.

