Bug 763656 (GLUSTER-1924)
Summary: | If a node is out of disk space "volume is created" but no proper error message | |
---|---|---|---
Product: | [Community] GlusterFS | Reporter: | Harshavardhana <fharshav>
Component: | cli | Assignee: | tcp
Status: | CLOSED DUPLICATE | QA Contact: |
Severity: | medium | Docs Contact: |
Priority: | medium | |
Version: | 3.1.0 | CC: | amarts, cww, gluster-bugs, pkarampu, vijay
Target Milestone: | --- | |
Target Release: | --- | |
Hardware: | All | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | DNR | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description
Harshavardhana 2010-10-12 00:57:39 UTC
A couple of things that I would want to know:

1. What do you mean by "a node is out of disk space"? Do you mean a backend filesystem which is full? In that case, I did a small test as shown below, and I did not get any error:

---------
root@comrade:/export# dd if=/dev/zero of=diskfile count=1 bs=10M
1+0 records in
1+0 records out
10485760 bytes (10 MB) copied, 0.0217879 s, 481 MB/s
root@comrade:/export# ls -lh diskfile
-rw-r--r-- 1 root root 10M 2011-01-20 09:12 diskfile
root@comrade:/export# losetup /dev/loop0 diskfile
root@comrade:/export# mkfs -t ext4 /dev/loop0
mke2fs 1.41.11 (14-Mar-2010)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
Stride=0 blocks, Stripe width=0 blocks
2560 inodes, 10240 blocks
512 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=10485760
2 block groups
8192 blocks per group, 8192 fragments per group
1280 inodes per group
Superblock backups stored on blocks:
        8193

Writing inode tables: done
Creating journal (1024 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 38 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
root@comrade:/export# mount /dev/loop0 /mnt
root@comrade:/export# df -h /mnt
Filesystem      Size  Used Avail Use% Mounted on
/dev/loop0      9.7M  1.1M  8.1M  12% /mnt
root@comrade:/export# cd /mnt
root@comrade:/mnt# ls
lost+found/
root@comrade:/mnt# yes > yesfile
yes: standard output: No space left on device
yes: write error
root@comrade:/mnt# df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/loop0      9.7M  9.7M     0 100% /mnt
root@comrade:/mnt# gluster volume create diskvol comrade:/mnt
Creation of volume diskvol has been successful. Please start the volume to access data.
root@comrade:/mnt# gluster volume info

Volume Name: diskvol
Type: Distribute
Status: Created
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: comrade:/mnt
root@comrade:/mnt# gluster volume start diskvol
Starting volume diskvol has been successful
-------------

2. What do you mean by "clearing space"? Are you deleting some files in the backend filesystem to get some free space?

(In reply to comment #1)
> A couple of things that I would want to know:
>
> 1. What do you mean by "a node is out of disk space"? Do you mean a backend
> filesystem which is full?

I think the reference is to a full root filesystem, or a full partition containing /etc; glusterd stores its state information in /etc/glusterd.

> 2. What do you mean by "clearing space"? Are you deleting some files in the
> backend filesystem to get some free space?

I think that is what is implied.

If it is the root filesystem that is full, the problem takes a different turn. I'll see if glusterfs can handle it more gracefully, but the user could be hitting a million other problems (unrelated to glusterfs) in such a condition.

(In reply to comment #3)
> If it is the root filesystem that is full, the problem takes a different turn.
> I'll see if glusterfs can handle it more gracefully, but the user could be
> hitting a million other problems (unrelated to glusterfs) in such a condition.

There is a bigger disconnect in the code: during creation, "test2" replies that the operation was unsuccessful, yet "volume info" says the volume is "Created". Any other operation you then try on it fails. Checking for disk space and returning a documented error is needed. Sysadmin problems are a different matter; I am not even going that far. I just need the gluster CLI to handle "disk out of space" cases gracefully, not to add more agony.
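The disk-space check asked for above could look roughly like the following. This is only a minimal sketch, assuming a hypothetical helper that verifies free space on the filesystem holding glusterd's state directory (/etc/glusterd in 3.1.x) before any volfiles are written; the function name, threshold, and wiring are illustrative and are not the actual glusterd code:

---------
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/statvfs.h>

/* Hypothetical pre-check (not the real glusterd API): verify that the
 * filesystem holding glusterd's state directory has at least min_bytes
 * free before any volfiles are written.  Returns 0 on success, -1 with
 * errno set (ENOSPC when space is insufficient). */
static int
check_state_dir_space (const char *state_dir, unsigned long long min_bytes)
{
        struct statvfs vfs;

        if (statvfs (state_dir, &vfs) != 0)
                return -1;                      /* errno set by statvfs() */

        unsigned long long avail = (unsigned long long) vfs.f_bavail *
                                   (unsigned long long) vfs.f_frsize;
        if (avail < min_bytes) {
                errno = ENOSPC;
                return -1;
        }
        return 0;
}

int
main (void)
{
        /* /etc/glusterd is where 3.1.x keeps its state; the 1 MB
         * threshold here is an arbitrary illustrative value. */
        if (check_state_dir_space ("/etc/glusterd", 1024 * 1024) != 0) {
                fprintf (stderr, "volume create refused: %s\n",
                         strerror (errno));
                return 1;
        }
        printf ("enough free space to write volfiles\n");
        return 0;
}
---------

The point is simply that ENOSPC could be translated into an explicit, documented CLI error instead of the volume being reported as created.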
I was able to recreate the problem in my setup today. The root cause is that there is no rollback when a commit operation fails. I think it is a more generic problem. We can handle this in different ways:

1. Have a rollback for management ops.
2. Build the capability to handle partially completed commands, i.e. commands that have succeeded on some nodes but not on others; Vijay suggested using "volume sync" for this.

RCA and possible fix:
------------------------
In glusterd_op_create_volume(), the volume currently being created is added to the list of active volumes before the creation of the volfiles has been checked for successful completion. The fix is to move the addition of the volume into the active list to after all the checks pass (see the sketch at the end of this report).

This bug might be related to bug 763620.

*** This bug has been marked as a duplicate of bug 1888 ***

Error is handled now. Hence no more documentation is required.
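To make the RCA above concrete, here is a minimal, self-contained sketch of the reordering it describes. None of the names below come from the glusterd source; the struct and the helper functions are placeholders standing in for brick setup, volfile generation, and state persistence:

---------
#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-in for glusterd's per-volume record. */
struct volinfo {
        const char *name;
        bool        in_active_list;
};

/* Placeholder helpers; in reality these would create brick directories,
 * generate volfiles (which can fail with ENOSPC when the state
 * partition is full), and persist the volume's on-disk state. */
static int create_brick_dirs (struct volinfo *v) { (void) v; return 0; }
static int generate_volfiles (struct volinfo *v) { (void) v; return 0; }
static int store_volinfo     (struct volinfo *v) { (void) v; return 0; }

static void
add_to_active_list (struct volinfo *v)
{
        v->in_active_list = true;   /* makes the volume visible to "volume info" */
}

/* The fix: add the volume to the active list only after every preceding
 * step has succeeded, so a failed create can no longer show up as
 * "Created" while every further operation on it fails. */
static int
op_create_volume (struct volinfo *v)
{
        if (create_brick_dirs (v) != 0)
                return -1;
        if (generate_volfiles (v) != 0)   /* previously the volume was already listed by this point */
                return -1;
        if (store_volinfo (v) != 0)
                return -1;

        add_to_active_list (v);
        return 0;
}

int
main (void)
{
        struct volinfo v = { "diskvol", false };

        printf ("create %s: %s\n", v.name,
                op_create_volume (&v) == 0 ? "successful" : "failed");
        return 0;
}
---------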