Bug 989753 - Inconsistent glusterfs nfs client errors, truncated files, and zero-byte files as quota hard limit is reached
Status: CLOSED DUPLICATE of bug 998893
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterd
Version: 2.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Assigned To: Bug Updates Notification Mailing List
QA Contact: Sudhir D
Keywords: ZStream
Depends On:
Blocks:
Reported: 2013-07-29 16:41 EDT by Dustin Black
Modified: 2014-01-03 03:41 EST

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-09-30 04:35:47 EDT
Type: Bug


Attachments: None

Description Dustin Black 2013-07-29 16:41:03 EDT
Description of problem:
Writing to a volume with a quota set will result in inconsistent client errors and file output when approaching the quota hard limit.

Version-Release number of selected component (if applicable):


How reproducible:
A problem with similar symptoms can be reproduced fairly consistently.

Steps to Reproduce:
1. Create a 3x2 volume across 6 nodes
2. Set a 1GB hard limit on the volume
3. Mount the volume on a client via NFS
4. Run a looped dd to new files
   # for i in {1..100}; do sudo dd if=/dev/urandom of=/mnt/test1/file${i} bs=1024k count=100; done
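
To make skipped or failed files easier to spot, the loop in step 4 could record the dd exit status for each file instead of relying on the scrolling dd output. The following is only a sketch: the function name write_files and its arguments are illustrative and not part of the original test, and sudo is omitted.

```shell
# Hypothetical helper (not from the original report): write COUNT files
# named file1..fileCOUNT into DIR with dd and print one line per file
# with the dd exit status, so that failures are easy to find afterwards.
write_files() {
    dir="$1"      # target directory, e.g. the NFS mount point
    count="$2"    # number of files to write
    bs="$3"       # dd block size
    blocks="$4"   # dd block count per file
    i=1
    while [ "$i" -le "$count" ]; do
        status=0
        # dd's own messages are discarded; only the exit status is kept
        dd if=/dev/urandom of="$dir/file$i" bs="$bs" count="$blocks" 2>/dev/null || status=$?
        printf 'file%s exit=%s\n' "$i" "$status"
        i=$((i + 1))
    done
}
```

Against the setup above this would be called as: write_files /mnt/test1 100 1024k 100. A non-zero exit status marks a file whose open, write, or close failed.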

Actual results:
From the client's perspective, dd appears to "skip" creating some files. The client receives "Unknown error 527" messages. At times, zero-byte files are created.

Expected results:
Upon hitting the quota hard limit, the "quota exceeded" message should be delivered to the client, and no new writes should be allowed beyond that point.



Additional info:

Write with a for loop from the NFS client:

$ for i in {1..100}; do sudo dd if=/dev/urandom of=/mnt/test1/file${i} bs=1024k count=100; done
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 18.8092 s, 5.6 MB/s
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 11.9144 s, 8.8 MB/s
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 11.98 s, 8.8 MB/s
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 11.7062 s, 9.0 MB/s
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 11.0241 s, 9.5 MB/s
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 12.6756 s, 8.3 MB/s
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 12.3488 s, 8.5 MB/s
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 12.1213 s, 8.7 MB/s
dd: closing output file ‘/mnt/test1/file9’: Disk quota exceeded
dd: opening ‘/mnt/test1/file10’: Disk quota exceeded
dd: closing output file ‘/mnt/test1/file11’: Disk quota exceeded
dd: closing output file ‘/mnt/test1/file12’: Unknown error 527
dd: opening ‘/mnt/test1/file13’: Disk quota exceeded
dd: opening ‘/mnt/test1/file14’: Disk quota exceeded
dd: opening ‘/mnt/test1/file15’: Disk quota exceeded
dd: opening ‘/mnt/test1/file16’: Disk quota exceeded
...


file10 is not created (from the client's perspective), but file11 is complete and file12 is truncated:
$ ls -ltrh /mnt/test1
total 1.1G
-rw-r--r--. 1 root root 100M Jul 29 12:38 file1
-rw-r--r--. 1 root root 100M Jul 29 12:39 file2
-rw-r--r--. 1 root root 100M Jul 29 12:39 file3
-rw-r--r--. 1 root root 100M Jul 29 12:39 file4
-rw-r--r--. 1 root root 100M Jul 29 12:39 file5
-rw-r--r--. 1 root root 100M Jul 29 12:39 file6
-rw-r--r--. 1 root root 100M Jul 29 12:40 file7
-rw-r--r--. 1 root root 100M Jul 29 12:40 file8
-rw-r--r--. 1 root root 100M Jul 29 12:40 file9
-rw-r--r--. 1 root root 100M Jul 29 12:40 file11
-rw-r--r--. 1 root root  26M Jul 29 12:40 file12
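
The same check can be scripted rather than read off the ls output. This is a minimal sketch, assuming the files are named file1..fileN as in the test loop; the function name classify_files is hypothetical. Because it looks up each name directly, a file that is absent from a directory listing but still reachable by name will be reported as present.

```shell
# Hypothetical helper (not from the original report): report the state of
# file1..fileCOUNT in DIR relative to the expected size in bytes.
classify_files() {
    dir="$1"        # directory to inspect, e.g. the mount point
    count="$2"      # highest file number to check
    expected="$3"   # expected size of a complete file, in bytes
    i=1
    while [ "$i" -le "$count" ]; do
        f="$dir/file$i"
        if [ ! -e "$f" ]; then
            state=missing
        else
            # arithmetic expansion strips any padding wc may emit
            size=$(( $(wc -c < "$f") ))
            if [ "$size" -eq 0 ]; then
                state=zero-byte
            elif [ "$size" -lt "$expected" ]; then
                state=truncated
            else
                state=complete
            fi
        fi
        printf 'file%s %s\n' "$i" "$state"
        i=$((i + 1))
    done
}
```

For the 100 MB files written above, the call would be: classify_files /mnt/test1 12 104857600.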


In another run, "Unknown error 527" is encountered, and file{1..12} are created, with file10 and file11 truncated and file12 a zero-byte file.

$ for i in {1..100}; do sudo dd if=/dev/urandom of=/mnt/test1/file${i} bs=1024k count=100; done
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 20.1309 s, 5.2 MB/s
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 12.2676 s, 8.5 MB/s
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 11.2514 s, 9.3 MB/s
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 12.7113 s, 8.2 MB/s
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 12.3967 s, 8.5 MB/s
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 12.0581 s, 8.7 MB/s
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 11.2427 s, 9.3 MB/s
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 11.5575 s, 9.1 MB/s
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 13.8252 s, 7.6 MB/s
dd: closing output file ‘/mnt/test1/file10’: Unknown error 527
dd: closing output file ‘/mnt/test1/file11’: Unknown error 527
dd: closing output file ‘/mnt/test1/file12’: Unknown error 527
dd: opening ‘/mnt/test1/file13’: Disk quota exceeded
dd: opening ‘/mnt/test1/file14’: Disk quota exceeded
dd: opening ‘/mnt/test1/file15’: Disk quota exceeded
...


$ ls -ltrh /mnt/test1
total 1.1G
-rw-r--r--. 1 root root 100M Jul 29 12:45 file1
-rw-r--r--. 1 root root 100M Jul 29 12:45 file2
-rw-r--r--. 1 root root 100M Jul 29 12:45 file3
-rw-r--r--. 1 root root 100M Jul 29 12:45 file4
-rw-r--r--. 1 root root 100M Jul 29 12:45 file5
-rw-r--r--. 1 root root 100M Jul 29 12:46 file6
-rw-r--r--. 1 root root 100M Jul 29 12:46 file7
-rw-r--r--. 1 root root 100M Jul 29 12:46 file8
-rw-r--r--. 1 root root 100M Jul 29 12:46 file9
-rw-r--r--. 1 root root  76M Jul 29 12:46 file10
-rw-r--r--. 1 root root  95M Jul 29 12:47 file11
-rw-r--r--. 1 root root    0 Jul 29 12:47 file12


In another run, while polling quota with 'gluster volume quota <volname> list' every 6 seconds on all 6 nodes simultaneously (not sure if that's relevant), the client received *only* "Unknown error 527" errors (no "quota exceeded" errors), and each error left a zero-byte file. In this instance, I can watch a new file grow in size and then suddenly get truncated to zero bytes.


$ for i in {1..100}; do sudo dd if=/dev/urandom of=/mnt/test1/file${i} bs=1024k count=100; done
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 21.8209 s, 4.8 MB/s
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 12.5002 s, 8.4 MB/s
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 11.7088 s, 9.0 MB/s
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 12.5297 s, 8.4 MB/s
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 13.6834 s, 7.7 MB/s
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 15.8345 s, 6.6 MB/s
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 11.8176 s, 8.9 MB/s
dd: closing output file ‘/mnt/test1/file8’: Disk quota exceeded
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 11.8152 s, 8.9 MB/s
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 11.2016 s, 9.4 MB/s
dd: closing output file ‘/mnt/test1/file11’: Unknown error 527
dd: closing output file ‘/mnt/test1/file12’: Unknown error 527
dd: closing output file ‘/mnt/test1/file13’: Unknown error 527
dd: closing output file ‘/mnt/test1/file14’: Unknown error 527
dd: closing output file ‘/mnt/test1/file15’: Unknown error 527
dd: closing output file ‘/mnt/test1/file16’: Unknown error 527
dd: closing output file ‘/mnt/test1/file17’: Unknown error 527
dd: closing output file ‘/mnt/test1/file18’: Unknown error 527
dd: writing ‘/mnt/test1/file19’: Unknown error 527
54+0 records in
53+0 records out
55574528 bytes (56 MB) copied, 4.50534 s, 12.3 MB/s
dd: closing output file ‘/mnt/test1/file20’: Unknown error 527
dd: closing output file ‘/mnt/test1/file21’: Unknown error 527
dd: closing output file ‘/mnt/test1/file22’: Unknown error 527
dd: closing output file ‘/mnt/test1/file23’: Unknown error 527
dd: closing output file ‘/mnt/test1/file24’: Unknown error 527
dd: closing output file ‘/mnt/test1/file25’: Unknown error 527
dd: closing output file ‘/mnt/test1/file26’: Unknown error 527
dd: closing output file ‘/mnt/test1/file27’: Unknown error 527
dd: writing ‘/mnt/test1/file28’: Unknown error 527
17+0 records in
16+0 records out
16777216 bytes (17 MB) copied, 1.35491 s, 12.4 MB/s
dd: closing output file ‘/mnt/test1/file29’: Unknown error 527
dd: closing output file ‘/mnt/test1/file30’: Unknown error 527
dd: closing output file ‘/mnt/test1/file31’: Unknown error 527
dd: closing output file ‘/mnt/test1/file32’: Unknown error 527
dd: closing output file ‘/mnt/test1/file33’: Unknown error 527


$ ls -ltrh /mnt/test1
total 1.1G
-rw-r--r--. 1 root root 100M Jul 29 15:18 file1
-rw-r--r--. 1 root root 100M Jul 29 15:19 file2
-rw-r--r--. 1 root root 100M Jul 29 15:19 file3
-rw-r--r--. 1 root root 100M Jul 29 15:19 file4
-rw-r--r--. 1 root root 100M Jul 29 15:19 file5
-rw-r--r--. 1 root root 100M Jul 29 15:20 file6
-rw-r--r--. 1 root root 100M Jul 29 15:20 file7
-rw-r--r--. 1 root root 100M Jul 29 15:20 file8
-rw-r--r--. 1 root root 100M Jul 29 15:20 file9
-rw-r--r--. 1 root root 100M Jul 29 15:20 file10
-rw-r--r--. 1 root root  99M Jul 29 15:21 file11
-rw-r--r--. 1 root root    0 Jul 29 15:21 file12
-rw-r--r--. 1 root root    0 Jul 29 15:21 file13
-rw-r--r--. 1 root root    0 Jul 29 15:21 file14
-rw-r--r--. 1 root root    0 Jul 29 15:21 file15
-rw-r--r--. 1 root root    0 Jul 29 15:21 file16
-rw-r--r--. 1 root root    0 Jul 29 15:21 file17
...


I have also seen that after running 'rm -f /mnt/test1/file*' following an iteration of this test and waiting a few moments, the "missing" files may suddenly show up on the client side. In other words, some files (like file10 in the first example above) don't appear in an 'ls' of the mount point; I then 'rm' all the files, and an 'ls' run shortly afterwards shows the missing files to be present. When this happens, I have seen that the missing files were written to only one member of a replica pair. Running 'ls' from the client against the specific missing files shows that they are present. Running 'gluster volume quota <volname> list' against the volume reflects the space consumed by the files that are missing from the client's perspective.
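
One way to confirm that a file exists on only one member of a replica pair is to compare file listings taken from the two bricks. The sketch below assumes each listing was saved beforehand as a plain text file with one filename per line (for example from 'ls -1' on each brick directory); the function name replica_diff and the listing files are hypothetical and not part of the original test.

```shell
# Hypothetical helper (not from the original report): compare two saved
# brick listings (one filename per line) and print the names that appear
# in only one of them.
replica_diff() {
    first_sorted=$(mktemp)
    second_sorted=$(mktemp)
    # comm requires sorted input
    sort "$1" > "$first_sorted"
    sort "$2" > "$second_sorted"
    # -23 keeps lines unique to the first file, -13 lines unique to the second
    comm -23 "$first_sorted" "$second_sorted" | sed 's/^/only-in-first: /'
    comm -13 "$first_sorted" "$second_sorted" | sed 's/^/only-in-second: /'
    rm -f "$first_sorted" "$second_sorted"
}
```

Empty output means both listings contain the same names; any line printed names a file present on only one brick of the pair.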
Comment 1 Dustin Black 2013-07-29 16:43:12 EDT
A similar problem has been experienced with the native client, so this may not be specific to NFS.

In the case of the native client, the error message 'Bad file descriptor' is given instead of 'Unknown error 527' and intermediate truncated files are seen.

$ ls -ltrh /mnt/test2
total 1.1G
-rw-r--r--. 1 root root 100M Jul 29 16:35 file1
-rw-r--r--. 1 root root 100M Jul 29 16:35 file2
-rw-r--r--. 1 root root 100M Jul 29 16:36 file3
-rw-r--r--. 1 root root 100M Jul 29 16:36 file4
-rw-r--r--. 1 root root 100M Jul 29 16:36 file5
-rw-r--r--. 1 root root 100M Jul 29 16:36 file6
-rw-r--r--. 1 root root 100M Jul 29 16:36 file7
-rw-r--r--. 1 root root 100M Jul 29 16:36 file8
-rw-r--r--. 1 root root 100M Jul 29 16:36 file9
-rw-r--r--. 1 root root 100M Jul 29 16:37 file10
-rw-r--r--. 1 root root  24M Jul 29 16:37 file11
-rw-r--r--. 1 root root 1.7M Jul 29 16:37 file12
