Bug 1360679 - Bricks doesn't come online after reboot [ Brick Full ]
Summary: Bricks doesn't come online after reboot [ Brick Full ]
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: posix
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Ashish Pandey
QA Contact:
URL:
Whiteboard:
Depends On: 1336764
Blocks: 1364354 1364365
Reported: 2016-07-27 10:23 UTC by Ashish Pandey
Modified: 2017-03-08 08:32 UTC (History)

Fixed In Version: 3.10.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1336764
Clones: 1364354 1364365
Environment:
Last Closed: 2017-03-08 08:32:01 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Comment 1 Ashish Pandey 2016-07-27 10:25:52 UTC
Description of problem:
Rebooted brick2 and started renaming files on brick1, which is full. brick2 did not come online after the reboot. The brick logs showed the error:
"Creation of unlink directory failed"

sosreport kept at rhsqe-repo.lab.eng.blr.redhat.com://var/www/html/sosreports/<bugid>

Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. Create a replica 3 volume and mount it on a client using FUSE.
2. Create files using:
for (( i=1; i <= 50; i++ ))
do
    dd if=/dev/zero of=file$i count=1000 bs=5M status=progress
done
3. After the creation is done, reboot the second brick.
4. Start renaming the files to test$i..n
5. When the second brick comes up, it fails with the errors below.

[2016-05-05 14:37:45.826772] E [MSGID: 113096] [posix.c:6443:posix_create_unlink_dir] 0-arbiter-posix: Creating directory /rhs/brick1/arbiter/.glusterfs/unlink failed [No space left on device]
[2016-05-05 14:37:45.826856] E [MSGID: 113096] [posix.c:6866:init] 0-arbiter-posix: Creation of unlink directory failed
[2016-05-05 14:37:45.826880] E [MSGID: 101019] [xlator.c:433:xlator_init] 0-arbiter-posix: Initialization of volume 'arbiter-posix' failed, review your volfile again
[2016-05-05 14:37:45.826925] E [graph.c:322:glusterfs_graph_init] 0-arbiter-posix: initializing translator failed
[2016-05-05 14:37:45.826943] E [gr

Comment 2 Vijay Bellur 2016-07-27 11:12:47 UTC
REVIEW: http://review.gluster.org/15030 (posix: Do not move and recreate .glusterfs/unlink directory) posted (#1) for review on master by Ashish Pandey (aspandey)

Comment 3 Vijay Bellur 2016-07-28 05:39:03 UTC
REVIEW: http://review.gluster.org/15030 (posix: Do not move and recreate .glusterfs/unlink directory) posted (#2) for review on master by Ashish Pandey (aspandey)

Comment 4 Vijay Bellur 2016-07-28 10:48:11 UTC
REVIEW: http://review.gluster.org/15030 (posix: Do not move and recreate .glusterfs/unlink directory) posted (#3) for review on master by Ashish Pandey (aspandey)

Comment 5 Vijay Bellur 2016-08-01 10:43:01 UTC
REVIEW: http://review.gluster.org/15030 (posix: Do not move and recreate .glusterfs/unlink directory) posted (#4) for review on master by Ashish Pandey (aspandey)

Comment 6 Vijay Bellur 2016-08-02 09:40:38 UTC
REVIEW: http://review.gluster.org/15030 (posix: Do not move and recreate .glusterfs/unlink directory) posted (#5) for review on master by Ashish Pandey (aspandey)

Comment 7 Vijay Bellur 2016-08-02 11:11:28 UTC
REVIEW: http://review.gluster.org/15030 (posix: Do not move and recreate .glusterfs/unlink directory) posted (#6) for review on master by Ashish Pandey (aspandey)

Comment 8 Vijay Bellur 2016-08-04 11:34:41 UTC
REVIEW: http://review.gluster.org/15030 (posix: Do not move and recreate .glusterfs/unlink directory) posted (#7) for review on master by Ashish Pandey (aspandey)

Comment 9 Vijay Bellur 2016-08-08 09:05:24 UTC
COMMIT: http://review.gluster.org/15030 committed in master by Pranith Kumar Karampuri (pkarampu) 
------
commit a432e7bc80dee70a48ccc5f04f5574cdce18a3a5
Author: Ashish Pandey <aspandey>
Date:   Wed Jul 27 15:49:25 2016 +0530

    posix: Do not move and recreate .glusterfs/unlink directory
    
    Problem:
    At the time of start of a volume, it is checked if
    .glusterfs/unlink exist or not. If it does, move it
    to landfill and recreate unlink directory. If a volume
    is mounted and we write data on it till we face ENOSPC,
    restart of that volume fails as it will not be able to
    create unlink dir. mkdir will fail with ENOSPC.
    This will not allow volume to restart.
    
    Solution:
    If .glusterfs/unlink directory exist, don't move it to
    landfill. Delete all the entries inside it.
    
    Change-Id: Icde3fb36012f2f01aeb119a2da042f761203c11f
    BUG: 1360679
    Signed-off-by: Ashish Pandey <aspandey>
    Reviewed-on: http://review.gluster.org/15030
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    Tested-by: Pranith Kumar Karampuri <pkarampu>
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
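
The startup behaviour described in the commit message above can be sketched in shell. This is only an illustration of the logic, not the actual change, which is in the posix translator's C code. The function name prepare_unlink_dir and the brick path argument are hypothetical, and the sketch assumes a find that supports -mindepth and -delete (GNU and BSD find do).

```shell
#!/bin/sh
# Hypothetical helper, for illustration only: mirrors the logic described in
# the commit message, not the real C implementation in the posix translator.
prepare_unlink_dir() {
    brick_path=$1
    unlink_dir="$brick_path/.glusterfs/unlink"

    if [ -d "$unlink_dir" ]; then
        # The directory already exists: empty it in place instead of moving
        # it to landfill and recreating it, so no mkdir is needed.
        find "$unlink_dir" -mindepth 1 -delete
    else
        # First start on this brick: create the directory.
        mkdir -p "$unlink_dir"
    fi
}
```

Because the existing directory is reused, a restart on a full brick does not have to create a new directory, which is the step that failed with ENOSPC.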

