Bug 1333341

Summary: Bricks did not come online after reboot. [Disk Full]
Product: [Community] GlusterFS
Reporter: Karan Sandha <ksandha>
Component: posix
Assignee: Ashish Pandey <aspandey>
Status: CLOSED EOL
QA Contact:
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.7.11
CC: aspandey, bugs, pkarampu, ravishankar
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 1361517 (view as bug list)
Environment:
Last Closed: 2017-03-08 11:01:40 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1361517

Description Karan Sandha 2016-05-05 10:25:13 UTC
Description of problem:
Rebooted the node hosting brick2 and started renaming the files on brick1, which is full. brick2 did not come online after the reboot, and errors were seen in the brick logs:
"Creation of unlink directory failed"

sosreport kept at rhsqe-repo.lab.eng.blr.redhat.com://var/www/html/sosreports/<bugid>

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Create a replica 3 volume and mount the volume on a client using FUSE.
2. Create files using:
   for (( i=1; i <= 50; i++ ))
   do
       dd if=/dev/zero of=file$i count=1000 bs=5M status=progress
   done
3. After the creation is done, reboot the node hosting the second brick.
4. Start renaming the files to test$i..n.
5. When the second brick comes back up, it fails with the errors below.

[2016-05-05 14:37:45.826772] E [MSGID: 113096] [posix.c:6443:posix_create_unlink_dir] 0-arbiter-posix: Creating directory /rhs/brick1/arbiter/.glusterfs/unlink failed [No space left on device]
[2016-05-05 14:37:45.826856] E [MSGID: 113096] [posix.c:6866:init] 0-arbiter-posix: Creation of unlink directory failed
[2016-05-05 14:37:45.826880] E [MSGID: 101019] [xlator.c:433:xlator_init] 0-arbiter-posix: Initialization of volume 'arbiter-posix' failed, review your volfile again
[2016-05-05 14:37:45.826925] E [graph.c:322:glusterfs_graph_init] 0-arbiter-posix: initializing translator failed
[2016-05-05 14:37:45.826943] E [graph.c:661:glusterfs_graph_activate] 0-graph: init failed
[2016-05-05 14:37:45.828349] W [glusterfsd.c:1251:cleanup_and_exit] (-->/usr/sbin/glusterfsd(mgmt_getspec_cbk+0x331) [0x7f6ba63797d1] -->/usr/sbin/glusterfsd(glusterfs_process_volfp+0x120) [0x7f6ba6374150] -->/usr/sbin/glusterfsd(cleanup_and_exit+0x69) [0x7f6ba6373739] ) 0-: received signum (0), shutting down
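The create-and-rename sequence from steps 2 and 4 above can be sketched as a standalone script. This is a minimal sketch only: MOUNT and NFILES are placeholders (in the real reproduction MOUNT is the FUSE mount point of the replica 3 volume, NFILES is 50, and the dd sizes are large enough to fill the bricks), and the reboot of the second brick node between the two loops is not shown.

```shell
#!/bin/bash
# Sketch of the reproduction's create and rename loops.
# MOUNT is a placeholder; in the real test it is the FUSE mount
# of the replica 3 volume (e.g. a mount of the "arbiter" volume).
MOUNT="${MOUNT:-$(mktemp -d)}"
NFILES="${NFILES:-5}"   # original reproduction used 50 files

cd "$MOUNT" || exit 1

# Step 2: create files (original used bs=5M count=1000 to fill the bricks).
for (( i=1; i <= NFILES; i++ )); do
    dd if=/dev/zero of="file$i" count=1 bs=1M status=none
done

# Step 4: rename file$i -> test$i; in the reproduction this runs
# while the second brick node is rebooting.
for (( i=1; i <= NFILES; i++ )); do
    mv "file$i" "test$i"
done
```

On the real volume the renames proceed against the two surviving bricks, and the failure is seen only when the rebooted brick process tries to initialize on the full filesystem.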


 
Actual results:
[root@dhcp43-167 arbiter]# gluster volume status
Status of volume: arbiter
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dhcp42-58.lab.eng.blr.redhat.com:/rhs
/brick1/arbiter                             N/A       N/A        N       N/A  
Brick dhcp43-142.lab.eng.blr.redhat.com:/rh
s/brick1/arbiter                            49157     0          Y       2120 
Brick dhcp43-167.lab.eng.blr.redhat.com:/rh
s/brick1/arbiter                            49156     0          Y       2094 
NFS Server on localhost                     2049      0          Y       2679 
Self-heal Daemon on localhost               N/A       N/A        Y       3172 
NFS Server on dhcp42-58.lab.eng.blr.redhat.
com                                         2049      0          Y       2195 
Self-heal Daemon on dhcp42-58.lab.eng.blr.r
edhat.com                                   N/A       N/A        Y       2816 
NFS Server on dhcp43-142.lab.eng.blr.redhat
.com                                        2049      0          Y       3072 
Self-heal Daemon on dhcp43-142.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       3160 
 
Task Status of Volume arbiter
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: arbiternfs
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dhcp42-58.lab.eng.blr.redhat.com:/rhs
/brick2/arbiternfs                          N/A       N/A        N       N/A  
Brick dhcp43-142.lab.eng.blr.redhat.com:/rh
s/brick2/arbiternfs                         49158     0          Y       2128 
Brick dhcp43-167.lab.eng.blr.redhat.com:/rh
s/brick2/arbiternfs                         49157     0          Y       2109 
NFS Server on localhost                     2049      0          Y       2679 
Self-heal Daemon on localhost               N/A       N/A        Y       3172 
NFS Server on dhcp42-58.lab.eng.blr.redhat.
com                                         2049      0          Y       2195 
Self-heal Daemon on dhcp42-58.lab.eng.blr.r
edhat.com                                   N/A       N/A        Y       2816 
NFS Server on dhcp43-142.lab.eng.blr.redhat
.com                                        2049      0          Y       3072 
Self-heal Daemon on dhcp43-142.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       3160 
 
Task Status of Volume arbiternfs
------------------------------------------------------------------------------
There are no active volume tasks
 
*************************************************************
[root@dhcp43-142 arbiter]# gluster volume heal arbiternfs info
Brick dhcp42-58.lab.eng.blr.redhat.com:/rhs/brick2/arbiternfs
Status: Transport endpoint is not connected
Number of entries: -

Brick dhcp43-142.lab.eng.blr.redhat.com:/rhs/brick2/arbiternfs
/file4 
/file5 
Status: Connected
Number of entries: 2

Brick dhcp43-167.lab.eng.blr.redhat.com:/rhs/brick2/arbiternfs
/file4 
/file5 
Status: Connected
Number of entries: 2

[root@dhcp43-142 arbiter]# 
[root@dhcp43-142 arbiter]# 
[root@dhcp43-142 arbiter]# 
[root@dhcp43-142 arbiter]# gluster volume heal arbiter info
Brick dhcp42-58.lab.eng.blr.redhat.com:/rhs/brick1/arbiter
Status: Transport endpoint is not connected
Number of entries: -

Brick dhcp43-142.lab.eng.blr.redhat.com:/rhs/brick1/arbiter
/ - Possibly undergoing heal

Status: Connected
Number of entries: 1

Brick dhcp43-167.lab.eng.blr.redhat.com:/rhs/brick1/arbiter
/ 
Status: Connected
Number of entries: 1

[root@dhcp43-142 arbiter]# 

Expected results:
The bricks should be up and running, and the files should have been renamed.

Additional info:

Comment 1 Kaushal 2017-03-08 11:01:40 UTC
This bug is being closed because GlusterFS-3.7 has reached its end-of-life.

Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS.
If this bug still exists in newer GlusterFS releases, please reopen this bug against the newer release.