Bug 763656 (GLUSTER-1924)

Summary: If a node is out of disk space "volume is created" but no proper error message
Product: [Community] GlusterFS
Component: cli
Version: 3.1.0
Hardware: All
OS: Linux
Status: CLOSED DUPLICATE
Severity: medium
Priority: medium
Reporter: Harshavardhana <fharshav>
Assignee: tcp
CC: amarts, cww, gluster-bugs, pkarampu, vijay
Doc Type: Bug Fix
Documentation: DNR

Description Harshavardhana 2010-10-12 00:57:39 UTC
[root@platform ~]# gluster volume create test2 10.1.10.202:/storage
Creation of volume test2 has been unsuccessful

[root@platform ~]# gluster volume info test2 
Volume Name: test2
Type: Distribute
Status: Created
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 10.1.10.202:/storage

After freeing up the disk space, we cannot clear the stale volume and reuse the same storage brick in a new volume:

[root@platform ~]# gluster volume create test5 10.1.10.202:/storage
Creation of volume test5 has been unsuccessful
Brick: 10.1.10.202:/storage already in use
[root@platform ~]# 

Deleting the stale volume also fails:
[root@platform ~]# gluster volume delete test2
Deleting volume will erase all information about the volume.Do you want to continue? (y/n) y
Deleting volume test2 has been unsuccessful
Deleting Volume test2 failed

Comment 1 tcp 2011-01-20 00:57:42 UTC
A couple of things that I would want to know:

1. What do you mean by "a node is out of disk space"? Do you mean a backend filesystem that is full?

In that case, I did a small test as shown below, and I did not get any error:

---------
root@comrade:/export# dd if=/dev/zero of=diskfile count=1 bs=10M
1+0 records in
1+0 records out
10485760 bytes (10 MB) copied, 0.0217879 s, 481 MB/s

root@comrade:/export# ls -lh diskfile 
-rw-r--r-- 1 root root 10M 2011-01-20 09:12 diskfile

root@comrade:/export# losetup /dev/loop0 diskfile 

root@comrade:/export# mkfs -t ext4 /dev/loop0
mke2fs 1.41.11 (14-Mar-2010)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
Stride=0 blocks, Stripe width=0 blocks
2560 inodes, 10240 blocks
512 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=10485760
2 block groups
8192 blocks per group, 8192 fragments per group
1280 inodes per group
Superblock backups stored on blocks: 
        8193

Writing inode tables: done                            
Creating journal (1024 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 38 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

root@comrade:/export# mount /dev/loop0 /mnt

root@comrade:/export# df -h /mnt
Filesystem            Size  Used Avail Use% Mounted on
/dev/loop0            9.7M  1.1M  8.1M  12% /mnt

root@comrade:/export# cd /mnt

root@comrade:/mnt# ls
lost+found/

root@comrade:/mnt# yes > yesfile
yes: standard output: No space left on device
yes: write error

root@comrade:/mnt# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/loop0            9.7M  9.7M     0 100% /mnt


root@comrade:/mnt# gluster volume create diskvol comrade:/mnt
Creation of volume diskvol has been successful. Please start the volume to access data.

root@comrade:/mnt# gluster volume info

Volume Name: diskvol
Type: Distribute
Status: Created
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: comrade:/mnt
root@comrade:/mnt# gluster volume start diskvol
Starting volume diskvol has been successful
-------------

2. What do you mean by "clearing space"? Are you deleting some files in the backend filesystem to get some free space?

Comment 2 Vijay Bellur 2011-01-20 01:14:45 UTC
(In reply to comment #1)
> A couple of things that I would want to know:
> 
> 1. What do you mean by "a node is out of disk space"? Do you mean a backend
> filesystem that is full?
> 

I think the reference is to a full root filesystem, or to the partition that contains /etc; glusterd stores its state information in /etc/glusterd.


> 
> 2. What do you mean by "clearing space"? Are you deleting some files in the
> backend filesystem to get some free space?

I think that is what is implied.

Comment 3 tcp 2011-01-20 08:18:34 UTC
If it is the root filesystem that is full, the problem takes a different turn.
I'll see if glusterfs can handle it more gracefully, but the user
could be hitting a million other problems (unrelated to glusterfs) in such a condition.

Comment 4 Harshavardhana 2011-01-26 22:35:13 UTC
(In reply to comment #3)
> If it is the root filesystem that is full, the problem takes a different turn.
> I'll see if glusterfs can handle it more gracefully, but the user
> could be hitting a million other problems (unrelated to glusterfs) in such a
> condition.


There is a bigger disconnect in the code: creating "test2" reports that the creation was unsuccessful, yet "volume info" shows the volume as "Created". Any further operation on it then fails. Checking for disk space and returning a documented error is needed.

Sysadmin problems are a different matter; I am not even going that far. I just need the gluster CLI to handle "out of disk space" cases gracefully, not add more agony.
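
The kind of pre-check being asked for could look roughly like the sketch below. It is a standalone illustration rather than glusterd code; the /etc/glusterd path comes from comment #2, and the free-space threshold is an arbitrary value chosen only for the example.

---------
/*
 * Standalone sketch only -- not glusterd code.  It shows the kind of
 * disk-space pre-check being requested: before committing a volume
 * create, verify that the directory where glusterd keeps its state has
 * some free space, and fail early with a clear message if it does not.
 */
#include <stdio.h>
#include <sys/statvfs.h>

#define STATE_DIR      "/etc/glusterd"       /* assumed state directory  */
#define MIN_FREE_BYTES (1024ULL * 1024ULL)   /* arbitrary 1 MB threshold */

static int have_enough_space(const char *dir)
{
        struct statvfs vfs;

        if (statvfs(dir, &vfs) != 0) {
                perror("statvfs");
                return 0;
        }
        /* f_bavail: free blocks available to unprivileged processes */
        return (unsigned long long)vfs.f_bavail * vfs.f_frsize >= MIN_FREE_BYTES;
}

int main(void)
{
        if (!have_enough_space(STATE_DIR)) {
                fprintf(stderr, "volume create aborted: not enough free space under %s\n",
                        STATE_DIR);
                return 1;
        }
        printf("enough free space under %s, proceeding\n", STATE_DIR);
        return 0;
}
---------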

Comment 5 tcp 2011-02-01 10:18:42 UTC
I was able to recreate the problem in my setup today. The root cause is that there is no rollback when a commit operation fails. I think it is a more general problem. We can handle this in different ways:

1. Have a rollback for management ops (a rough sketch of this approach follows below).
2. Build the capability to handle partially completed commands, i.e. commands that have succeeded on some nodes but not on others; Vijay suggested using volume sync.
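
A minimal sketch of option 1, with invented step names (this is not glusterd code): each stage of a management operation carries an undo action, and a failure in any stage unwinds the stages that already completed, in reverse order, so no half-created state is left behind.

---------
#include <stdio.h>

typedef struct {
        const char *name;
        int  (*commit)(void);
        void (*rollback)(void);
} mgmt_step_t;

/* Hypothetical stages of a "volume create" commit */
static int  write_volfiles(void)       { printf("write volfiles\n");          return 0; }
static void remove_volfiles(void)      { printf("remove volfiles\n");         }
static int  store_volinfo(void)        { printf("store volinfo\n");           return -1; /* simulate ENOSPC */ }
static void erase_volinfo(void)        { printf("erase volinfo\n");           }
static int  add_to_volume_list(void)   { printf("add volume to list\n");      return 0; }
static void del_from_volume_list(void) { printf("remove volume from list\n"); }

static mgmt_step_t steps[] = {
        { "write volfiles",     write_volfiles,     remove_volfiles      },
        { "store volinfo",      store_volinfo,      erase_volinfo        },
        { "add to volume list", add_to_volume_list, del_from_volume_list },
};

int main(void)
{
        int n = (int)(sizeof(steps) / sizeof(steps[0]));

        for (int i = 0; i < n; i++) {
                if (steps[i].commit() != 0) {
                        fprintf(stderr, "step '%s' failed, rolling back\n", steps[i].name);
                        /* Undo everything that already succeeded, newest first */
                        for (int j = i - 1; j >= 0; j--)
                                steps[j].rollback();
                        return 1;
                }
        }
        printf("volume create committed\n");
        return 0;
}
---------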

Comment 6 tcp 2011-03-25 09:18:24 UTC
RCA and possible fix:
------------------------

In glusterd_op_create_volume(), the volume being created is added to the list of active volumes before volfile creation has been checked for successful completion.

The fix is to add the volume to the active list only after all the checks pass; a sketch of the reordering follows below.

This bug might be related to bug 763620.
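
The reordering can be shown with the toy example below. It is a self-contained sketch with invented names, not the actual glusterd_op_create_volume() code; it only demonstrates why registering the volume before the volfile check leaves a phantom volume behind, and why registering it afterwards does not.

---------
#include <stdio.h>

#define MAX_VOLS 8

static const char *active_volumes[MAX_VOLS];
static int         n_active;

/* Stand-in for volfile generation; pretend the disk is full. */
static int generate_volfiles(const char *volname)
{
        fprintf(stderr, "generate_volfiles(%s): No space left on device\n", volname);
        return -1;
}

static void add_to_active_list(const char *volname)
{
        if (n_active < MAX_VOLS)
                active_volumes[n_active++] = volname;
}

/* Buggy ordering: the volume stays visible even though the create failed. */
static int create_volume_buggy(const char *volname)
{
        add_to_active_list(volname);
        return generate_volfiles(volname);
}

/* Fixed ordering: register the volume only once all checks have passed. */
static int create_volume_fixed(const char *volname)
{
        if (generate_volfiles(volname) != 0)
                return -1;
        add_to_active_list(volname);
        return 0;
}

int main(void)
{
        create_volume_buggy("test2");
        create_volume_fixed("test5");
        printf("volumes left visible after the failed creates: %d\n", n_active);
        return 0;
}
---------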

Comment 7 Pranith Kumar K 2011-03-25 23:58:21 UTC

*** This bug has been marked as a duplicate of bug 1888 ***

Comment 8 Amar Tumballi 2011-04-13 05:21:44 UTC
The error is handled now, hence no further documentation is required.