Bug 894837

Summary:

Transient / Intermittent ENOSPC errors with BTRFS and F18

Product:

[Fedora] Fedora

Reporter:

Reartes Guillermo <rtguille>

Component:

kernel

Assignee:

Zach Brown <zab>

Status:

CLOSED UPSTREAM

QA Contact:

Fedora Extras Quality Assurance <extras-qa>

Severity:

high

Docs Contact:

Priority:

unspecified

Version:

CC:

gansalmon, itamar, jonathan, josef, kernel-maint, madhu.chinakonda, shyu, stephent98, sweil

Target Milestone:

---

Keywords:

Triaged

Target Release:

---

Hardware:

x86_64

OS:

Linux

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2013-09-23 20:38:33 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Attachments:

Description	Flags
dmesg of the affected system, which is the same system in which i noticed the transient ENOSPC	none
messages with kernel call traces (3.6.10)	none
messages with kernel call traces (3.7.2)	none

Description Reartes Guillermo 2013-01-13 20:11:59 UTC

Description of problem:

Sometimes F18 installed to a btrfs file-system gives intermittent / spurious / transient ENOSPC errors. A sync or waiting some time generally restores the ability to continue writing. Currently it happens at low space or full file-system condition. But i still believe that btrfs should not emit these failures.

Version-Release number of selected component (if applicable):
Kernel: 3.6.10-4.fc18.x86_64

How reproducible:
Sometimes.

Steps to reproduce:

0. Boot anaconda without any kernel parameter
1. Enter SOFTWARE SELECTION (using CDROM as INSTALLATION SOURCE)
2. Select: Development and Creativity Workstation (AND Also select ALL Ad-dons)
   Both Installation Options and 'anaconda.log' says the needed space is: 8.68gb
3. Enter STORAGE: INSTALLATION DESTINATION and select one disk (mine is 23gb)
4. Do MANUAL PARTITIONING setting the 'scheme' to BTRFS
5. Partition it with /, /boot as BTRFS sub-volumes, swap of 1024mb.
   Note: the btrfs specific partition size is 12.7gb.
6. Once Anaconda accepts the storage configuration, 'begin installation'
7. Anaconda starts installing, set a root password and wait.
8. Anaconda installs without any error.
9. Reboot and log-in to the newly installed system, do a 'yum update' and reboot.

10. Boot, log-in and open a terminal (KONSOLE) and execute:
 
$ dd if=/dev/zero of=./nukefile.out
dd: writing to ‘./nukefile.out’: No space left on device
6142658+0 records in
6142657+0 records out
3145040384 bytes (3.1 GB) copied, 108.714 s, 28.9 MB/s
$ rm nukefile.out 

>> OK
>> I do have 3.1gb free space.

$ dd if=/dev/zero of=./nukefile.out
dd: writing to ‘./nukefile.out’: No space left on device
6146098+0 records in
6146097+0 records out
3146801664 bytes (3.1 GB) copied, 114.161 s, 27.6 MB/s
$ rm nukefile.out 

>> OK

$ dd if=/dev/zero of=./nukefile.out
dd: writing to ‘./nukefile.out’: No space left on device
6143554+0 records in
6143553+0 records out
3145499136 bytes (3.1 GB) copied, 123.208 s, 25.5 MB/s
$ rm nukefile.out 

>> OK

$ dd if=/dev/zero of=./nukefile.out
dd: writing to ‘./nukefile.out’: No space left on device
1+0 records in
0+0 records out
0 bytes (0 B) copied, 0.0024955 s, 0.0 kB/s
$ rm nukefile.out 

>> BAD!
>> I should have 3.1gb of free space! where is it!!!

$ dd if=/dev/zero of=./nukefile.out
dd: writing to ‘./nukefile.out’: No space left on device
3066+0 records in
3065+0 records out
1569280 bytes (1.6 MB) copied, 0.0171611 s, 91.4 MB/s

>> BAD!
>> I tried again and i was given a bit more this time but
>> it is still wrong!

$ dd if=/dev/zero of=./nukefile2.out
dd: writing to ‘./nukefile2.out’: No space left on device
1+0 records in
0+0 records out
0 bytes (0 B) copied, 0.00231884 s, 0.0 kB/s

$ ls -l nuk*
-rw-rw-r--. 1 test test       0 Jan 12 09:53 nukefile2.out
-rw-rw-r--. 1 test test 1569280 Jan 12 09:52 nukefile.out
$ rm nukefile.out 
$ rm nukefile2.out 

$ dd if=/dev/zero of=./nukefile2.out
dd: writing to ‘./nukefile2.out’: No space left on device
3066+0 records in
3065+0 records out
1569280 bytes (1.6 MB) copied, 0.0125578 s, 125 MB/s
$ mv nukefile2.out nukefile3.out 
$ rm nukefile3.out 

>> BAD! but let's WAIT SOME TIME

$ dd if=/dev/zero of=./nukefile.out
dd: writing to ‘./nukefile.out’: No space left on device
6143570+0 records in
6143569+0 records out
3145507328 bytes (3.1 GB) copied, 116.753 s, 26.9 MB/s
$ rm nukefile.out 

>> OK, so i had to wait... Hummm, if there is pending I/O
>> i can understand a drop in performance but not an 
>> spurious/transient ENOSPC.

$ dd if=/dev/zero of=./nukefile.out
dd: writing to ‘./nukefile.out’: No space left on device
203898+0 records in
203897+0 records out
104395264 bytes (104 MB) copied, 0.709579 s, 147 MB/s
$ rm nukefile.out 

>> BAD! Again!

$ dd if=/dev/zero of=./nukefile.out
dd: writing to ‘./nukefile.out’: No space left on device
203898+0 records in
203897+0 records out
104395264 bytes (104 MB) copied, 0.717544 s, 145 MB/s
$ rm nukefile.out 

$ dd if=/dev/zero of=./nukefile.out
dd: writing to ‘./nukefile.out’: No space left on device
203898+0 records in
203897+0 records out
104395264 bytes (104 MB) copied, 0.710479 s, 147 MB/s
$ rm nukefile.out 

>> BAD! Let's do a SYNC, to see if it is really caused
>> by pending I/O

$ sync
$ dd if=/dev/zero of=./nukefile.out
dd: writing to ‘./nukefile.out’: No space left on device
6143522+0 records in
6143521+0 records out
3145482752 bytes (3.1 GB) copied, 113.037 s, 27.8 MB/s
$ rm nukefile.out 

>> OK, maybe it is.

$ dd if=/dev/zero of=./nukefile.out
dd: writing to ‘./nukefile.out’: No space left on device
1+0 records in
0+0 records out
0 bytes (0 B) copied, 0.00339174 s, 0.0 kB/s
$ rm nukefile.out 

$ sync
$ dd if=/dev/zero of=./nukefile.out
dd: writing to ‘./nukefile.out’: No space left on device
6143562+0 records in
6143561+0 records out
3145503232 bytes (3.1 GB) copied, 103.462 s, 30.4 MB/s
$ rm nukefile.out 

>> OK, Again, after a SYNC it will work ok.
>> NOW, let's try as ROOT

# sync
# btrfs fi show /dev/sda2 
Label: 'fedora'  uuid: d240a925-dd0c-488e-9e6f-3b6adc9dee78
        Total devices 1 FS bytes used 8.29GB
        devid    1 size 12.70GB used 12.70GB path /dev/sda2

Btrfs Btrfs v0.19

# btrfs fi df / 
Data: total=10.63GB, used=7.71GB
System, DUP: total=32.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.00GB, used=591.98MB

# dd if=/dev/zero of=./nukefile.out
dd: writing to ‘./nukefile.out’: No space left on device
6125154+0 records in
6125153+0 records out
3136078336 bytes (3.1 GB) copied, 118.785 s, 26.4 MB/s
# sync

>> OK for Now.

# rm nukefile.out 
rm: remove regular file ‘nukefile.out’? y
# dd if=/dev/zero of=./nukefile.out
dd: writing to ‘./nukefile.out’: No space left on device
6128066+0 records in
6128065+0 records out
3137569280 bytes (3.1 GB) copied, 119.868 s, 26.2 MB/s

# rm nukefile.out 
rm: remove regular file ‘nukefile.out’? y
# dd if=/dev/zero of=./nukefile.out
dd: writing to ‘./nukefile.out’: No space left on device
6127570+0 records in
6127569+0 records out
3137315328 bytes (3.1 GB) copied, 105.101 s, 29.9 MB/s

# rm nukefile.out 
rm: remove regular file ‘nukefile.out’? y
# dd if=/dev/zero of=./nukefile.out
dd: writing to ‘./nukefile.out’: No space left on device
1+0 records in
0+0 records out
0 bytes (0 B) copied, 0.00227143 s, 0.0 kB/s

>> OK, it happened again. Do an strace.

# strace dd if=/dev/zero of=./nukefile.out
execve("/usr/bin/dd", ["dd", "if=/dev/zero", "of=./nukefile.out"], [/* 32 vars */]) = 0
brk(0)                                  = 0x1018000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff00354f000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=152887, ...}) = 0
mmap(NULL, 152887, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7ff003529000
close(3)                                = 0
open("/lib64/librt.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240\"\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=44272, ...}) = 0
mmap(NULL, 2128984, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7ff003321000
mprotect(0x7ff003328000, 2093056, PROT_NONE) = 0
mmap(0x7ff003527000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6000) = 0x7ff003527000
close(3)                                = 0
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20\33\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2067976, ...}) = 0
mmap(NULL, 3896312, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7ff002f69000
mprotect(0x7ff003116000, 2097152, PROT_NONE) = 0
mmap(0x7ff003316000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1ad000) = 0x7ff003316000
mmap(0x7ff00331c000, 17400, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7ff00331c000
close(3)                                = 0
open("/lib64/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\360j\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=141296, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff002f68000
mmap(NULL, 2208808, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7ff002d4c000
mprotect(0x7ff002d62000, 2097152, PROT_NONE) = 0
mmap(0x7ff002f62000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x16000) = 0x7ff002f62000
mmap(0x7ff002f64000, 13352, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7ff002f64000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff002d4b000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff002d49000
arch_prctl(ARCH_SET_FS, 0x7ff002d49740) = 0
mprotect(0x7ff003316000, 16384, PROT_READ) = 0
mprotect(0x7ff002f62000, 4096, PROT_READ) = 0
mprotect(0x7ff003527000, 4096, PROT_READ) = 0
mprotect(0x60c000, 4096, PROT_READ)     = 0
mprotect(0x3b62020000, 4096, PROT_READ) = 0
munmap(0x7ff003529000, 152887)          = 0
set_tid_address(0x7ff002d49a10)         = 10728
set_robust_list(0x7ff002d49a20, 24)     = 0
rt_sigaction(SIGRTMIN, {0x7ff002d52650, [], SA_RESTORER|SA_SIGINFO, 0x7ff002d5b000}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {0x7ff002d526d0, [], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x7ff002d5b000}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
rt_sigaction(SIGUSR1, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGINT, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGUSR1, {0x403fc0, [INT USR1], SA_RESTORER, 0x7ff002f9ec20}, NULL, 8) = 0
rt_sigaction(SIGINT, {0x403fb0, [INT USR1], SA_RESTORER|SA_NODEFER|SA_RESETHAND, 0x7ff002f9ec20}, NULL, 8) = 0
brk(0)                                  = 0x1018000
brk(0x1039000)                          = 0x1039000
brk(0)                                  = 0x1039000
open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=104789808, ...}) = 0
mmap(NULL, 104789808, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7feffc959000
close(3)                                = 0
open("/dev/zero", O_RDONLY)             = 3
dup2(3, 0)                              = 0
close(3)                                = 0
lseek(0, 0, SEEK_CUR)                   = 0
open("./nukefile.out", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
dup2(3, 1)                              = 1
close(3)                                = 0
clock_gettime(CLOCK_MONOTONIC, {39711, 530718654}) = 0
read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = -1 ENOSPC (No space left on device)
open("/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=2444, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff00354e000
read(3, "# Locale name alias data base.\n#"..., 4096) = 2444
read(3, "", 4096)                       = 0
close(3)                                = 0
munmap(0x7ff00354e000, 4096)            = 0
open("/usr/share/locale/en_US.UTF-8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en_US.utf8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en_US/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en.UTF-8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en.utf8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib64/charset.alias", O_RDONLY|O_NOFOLLOW) = -1 ENOENT (No such file or directory)
write(2, "dd: ", 4dd: )                     = 4
write(2, "writing to \342\200\230./nukefile.out\342\200\231", 31writing to ‘./nukefile.out’) = 31
open("/usr/share/locale/en_US.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en_US.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en_US/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
write(2, ": No space left on device", 25: No space left on device) = 25
write(2, "\n", 1
)                       = 1
close(0)                                = 0
close(1)                                = 0
clock_gettime(CLOCK_MONOTONIC, {39711, 536341588}) = 0
write(2, "1+0 records in\n0+0 records out\n", 311+0 records in
0+0 records out
) = 31
write(2, "0 bytes (0 B) copied", 200 bytes (0 B) copied)    = 20
write(2, ", 0.00562293 s, 0.0 kB/s\n", 25, 0.00562293 s, 0.0 kB/s
) = 25
close(2)                                = 0
exit_group(1)                           = ?
+++ exited with 1 +++

>> Now, after some seconds try again.

# dd if=/dev/zero of=./nukefile2.out
dd: writing to ‘./nukefile2.out’: No space left on device
6125634+0 records in
6125633+0 records out
3136324096 bytes (3.1 GB) copied, 106.686 s, 29.4 MB/s

>> Now it can do it...

# rm nukefile.out 
rm: remove regular empty file ‘nukefile.out’? y
# rm nukefile2.out 
rm: remove regular file ‘nukefile2.out’? y

Actual results:

>> Well, BTRFS seems to give spurious or transient ENOSPC errors under load.
>> IMHO, this is dangerous file-system behavior, it should lose performance
>> but not give ENOSPC at certain times, otherwise writing is a lottery.

Expected results:

>> No spurious / transient / intermittent ENOSPC errors.
>> If there are pending I/O operations, then wait for those instead
>> of giving ENOSPC.

Additional info:

I opened this bug-report strictly for the transient / intermittent / spurious ENOSPC errors that BTRFS gives sometimes.

Comment 1 Reartes Guillermo 2013-01-14 01:17:59 UTC

Created attachment 677981 [details]
dmesg of the affected system, which is the same system in which i noticed the transient ENOSPC

I added another disk and then booted the guest, nothing more.
ABRT says the kernel is tainted but it should not be. (maybe something like old Bug 744887). So i could not use ABRT to report this. 

~~~~~~~~~~~~~~~~~~~~~~~~~~~

 WARNING: at fs/btrfs/extent-tree.c:6323 btrfs_alloc_free_block+0x376/0x380 [btrfs]()

Jan 12 19:53:55 localhost kernel: [73834.376794] Pid: 14675, comm: btrfs-endio-wri Tainted: G        W    3.6.10-4.fc18.x86_64 #1
Jan 12 19:53:55 localhost kernel: [73834.376798] Call Trace:
Jan 12 19:53:55 localhost kernel: [73834.376811]  [<ffffffff8105c8df>] warn_slowpath_common+0x7f/0xc0
Jan 12 19:53:55 localhost kernel: [73834.376818]  [<ffffffff8105c93a>] warn_slowpath_null+0x1a/0x20
Jan 12 19:53:55 localhost kernel: [73834.376851]  [<ffffffffa0031a06>] btrfs_alloc_free_block+0x376/0x380 [btrfs]
Jan 12 19:53:55 localhost kernel: [73834.376888]  [<ffffffffa005dc23>] ? read_extent_buffer+0xc3/0x120 [btrfs]
Jan 12 19:53:55 localhost kernel: [73834.376888]  [<ffffffffa001da9a>] __btrfs_cow_block+0x12a/0x510 [btrfs]
Jan 12 19:53:55 localhost kernel: [73834.376888]  [<ffffffffa001df77>] btrfs_cow_block+0xf7/0x200 [btrfs]
Jan 12 19:53:55 localhost kernel: [73834.377123]  [<ffffffffa00221a7>] btrfs_search_slot+0x3e7/0x8f0 [btrfs]
Jan 12 19:53:55 localhost kernel: [73834.377160]  [<ffffffffa0092c64>] ? btrfs_qgroup_record_ref+0x44/0x90 [btrfs]
Jan 12 19:53:55 localhost kernel: [73834.377196]  [<ffffffffa0092c64>] ? btrfs_qgroup_record_ref+0x44/0x90 [btrfs]
Jan 12 19:53:55 localhost kernel: [73834.377230]  [<ffffffffa0036466>] btrfs_lookup_csum+0x76/0x180 [btrfs]
Jan 12 19:53:55 localhost kernel: [73834.377262]  [<ffffffffa003991f>] ? btree_set_page_dirty+0x3f/0x50 [btrfs]
Jan 12 19:53:55 localhost kernel: [73834.377295]  [<ffffffffa0037630>] btrfs_csum_file_blocks+0xd0/0x670 [btrfs]
Jan 12 19:53:55 localhost kernel: [73834.377332]  [<ffffffffa0043935>] add_pending_csums.isra.35+0x45/0x60 [btrfs]
Jan 12 19:53:55 localhost kernel: [73834.377367]  [<ffffffffa0048290>] btrfs_finish_ordered_io+0x260/0x420 [btrfs]
Jan 12 19:53:55 localhost kernel: [73834.377403]  [<ffffffffa0048465>] finish_ordered_fn+0x15/0x20 [btrfs]
Jan 12 19:53:55 localhost kernel: [73834.377439]  [<ffffffffa00684d6>] worker_loop+0x136/0x580 [btrfs]
Jan 12 19:53:55 localhost kernel: [73834.377478]  [<ffffffffa00683a0>] ? btrfs_queue_worker+0x300/0x300 [btrfs]
Jan 12 19:53:55 localhost kernel: [73834.377485]  [<ffffffff8107fde3>] kthread+0x93/0xa0
Jan 12 19:53:55 localhost kernel: [73834.377494]  [<ffffffff8162cb84>] kernel_thread_helper+0x4/0x10
Jan 12 19:53:55 localhost kernel: [73834.377501]  [<ffffffff8107fd50>] ? kthread_freezable_should_stop+0x70/0x70
Jan 12 19:53:55 localhost kernel: [73834.377507]  [<ffffffff8162cb80>] ? gs_change+0x13/0x13
Jan 12 19:53:55 localhost kernel: [73834.377511] ---[ end trace 50d9d2acf26a96b8 ]--

Comment 2 Reartes Guillermo 2013-01-14 06:37:55 UTC

I created another btrfs filesystem on the disk i added to the guest. The new disk is 41gb, so it is not small.

The new filesystem is mounted in /btdata1, and by doing the same procedure: 

# dd if=/dev/zero of=/btdata1/nukefile.out
dd: writing to ‘/btdata1/nukefile.out’: No space left on device
80101506+0 records in
80101505+0 records out
41011970560 bytes (41 GB) copied, 2321.98 s, 17.7 MB/s

# rm nukefile.out
rm: remove regular file ‘nukefile.out’? y

While i do not get ENOSPC, when i delete the file i get multiples kernel call traces... I cannot report it via ABRT, because it still thinks the kernel is tainted. Should i open another bug-report for the call traces?

Comment 3 Reartes Guillermo 2013-01-15 01:08:04 UTC

Created attachment 678530 [details]
messages with kernel call traces (3.6.10)

Comment 4 Reartes Guillermo 2013-01-15 01:09:06 UTC

Created attachment 678531 [details]
messages with kernel call traces (3.7.2)

Comment 5 Reartes Guillermo 2013-01-16 19:56:58 UTC

I installed another kvm guest on another kvm host (also F17)
This guest has two disks instead of one, so i used a different
btrfs multi-volume option. In this case /boot is not a subvolume.
I did not see any call trace with this configuration, but i do 
experience transient / spurious / intermittent ENOSPC.

The kernel version is still: 3.7.2-201.fc18.x86_64
(NOTE to SELF: KVM GUEST: FN-TSTx-1)

# btrfs fi show /dev/vda2
Label: 'fedora'  uuid: 300a7cc8-3cc6-47fb-81ed-0291fa0b2b27
        Total devices 2 FS bytes used 31.52GB
        devid    2 size 39.01GB used 39.00GB path /dev/vdb1
        devid    1 size 39.01GB used 39.01GB path /dev/vda2

Btrfs Btrfs v0.19

# btrfs fi df  /
Data, RAID0: total=73.97GB, used=34.28GB
Data: total=8.00MB, used=0.00
System, RAID1: total=8.00MB, used=12.00KB
System: total=4.00MB, used=0.00
Metadata, RAID1: total=2.00GB, used=243.00MB
Metadata: total=8.00MB, used=0.00

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# dd if=/dev/zero of=/nukefile.out
dd: writing to ‘/nukefile.out’: No space left on device
102227138+0 records in
102227137+0 records out
52340294144 bytes (52 GB) copied, 1641.97 s, 31.9 MB/s

>> I waited some time +15 minutes

# dd if=/dev/zero of=/nukefile8.out
dd: writing to ‘/nukefile8.out’: No space left on device
33642+0 records in
33641+0 records out
17224192 bytes (17 MB) copied, 0.0901648 s, 191 MB/s

>> WoW, there was still more space left...
>> Do a 'sync' and try again, maybe there is more juice to extract from
>> the fruit if one compresses it harder...

# sync
# dd if=/dev/zero of=/nukefile9.out
dd: writing to ‘/nukefile9.out’: No space left on device
2570+0 records in
2569+0 records out
1315328 bytes (1.3 MB) copied, 0.00975233 s, 135 MB/s

>> Humm, i got some extra drops of space....

# sync
# dd if=/dev/zero of=/nukefile10.out
dd: writing to ‘/nukefile10.out’: No space left on device
1+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000866384 s, 0.0 kB/s

>> Now it appears as truly ENOSPC.
>> Wait some time, do a 'sync' and try again.

# sync
# dd if=/dev/zero of=/nukefile11.out
dd: writing to ‘/nukefile11.out’: No space left on device
850+0 records in
849+0 records out
434688 bytes (435 kB) copied, 0.00718903 s, 60.5 MB/s

>> Well, less than a droplet of space, but not zero.

# sync
# dd if=/dev/zero of=/nukefile12.out
dd: writing to ‘/nukefile12.out’: No space left on device
1+0 records in
0+0 records out
0 bytes (0 B) copied, 0.00121953 s, 0.0 kB/s

>> is it the true ENOSPC?

# sync
# sync
# dd if=/dev/zero of=/nukefile13.out
dd: writing to ‘/nukefile13.out’: No space left on device
33146+0 records in
33145+0 records out
16970240 bytes (17 MB) copied, 0.080919 s, 210 MB/s

>> NO! It was not. So, if i wait some minutes, do some 'sync'
>> then i can write a bit more!!

# df -k
Filesystem     1K-blocks     Used Available Use% Mounted on
devtmpfs         1002328        0   1002328   0% /dev
tmpfs            1013620       80   1013540   1% /dev/shm
tmpfs            1013620     2204   1011416   1% /run
tmpfs            1013620        0   1013620   0% /sys/fs/cgroup
/dev/vda2        1013620     1820   1011800   1% /tmp
/dev/vda2       81819648 78188368      1536 100% /
/dev/vda1        1007896    83836    872860   9% /boot

>> It seems it could squeeze up to a megabyte by this
>> method. 

# sync  
# sync
# sync
# sync
# dd if=/dev/zero of=/nukefile14.out
dd: writing to ‘/nukefile14.out’: No space left on device
3554+0 records in
3553+0 records out
1819136 bytes (1.8 MB) copied, 0.0119488 s, 152 MB/s


# df -k
Filesystem     1K-blocks     Used Available Use% Mounted on
devtmpfs         1002328        0   1002328   0% /dev
tmpfs            1013620       80   1013540   1% /dev/shm
tmpfs            1013620     2220   1011400   1% /run
tmpfs            1013620        0   1013620   0% /sys/fs/cgroup
/dev/vda2        1013620     1820   1011800   1% /tmp
/dev/vda2       81819648 78189904         0 100% /
/dev/vda1        1007896    83836    872860   9% /boot

>> Well, this should be the 'true' ENOSPC for /.
>> But is it?? It appeas that it is.

# sync
# sync
# dd if=/dev/zero of=/nukefile15.out
dd: writing to ‘/nukefile15.out’: No space left on device
1+0 records in
0+0 records out
0 bytes (0 B) copied, 0.0012356 s, 0.0 kB/s

>> Ok, then i deleted ok all the nukefiles*, the 'rm' command
>> worked ok. Now i tried again.

# rm nukefile*
# dd if=/dev/zero of=/nukefile17.out
dd: writing to ‘/nukefile17.out’: No space left on device
102292290+0 records in
102292289+0 records out
52373651968 bytes (52 GB) copied, 1617.21 s, 32.4 MB/s

# sync
# sync
# sync
# dd if=/dev/zero of=/nukefile18.out
dd: writing to ‘/nukefile18.out’: No space left on device
29058+0 records in
29057+0 records out
14877184 bytes (15 MB) copied, 0.0892675 s, 167 MB/s

# sync
# sync
# sync
# dd if=/dev/zero of=/nukefile19.out
dd: writing to ‘/nukefile19.out’: No space left on device
506+0 records in
505+0 records out
258560 bytes (259 kB) copied, 0.0055089 s, 46.9 MB/s

# sync
# sync
# sync
# dd if=/dev/zero of=/nukefile20.out
dd: writing to ‘/nukefile20.out’: No space left on device
1+0 records in
0+0 records out
0 bytes (0 B) copied, 0.00127036 s, 0.0 kB/s

# sync
# sync
# sync
# dd if=/dev/zero of=/nukefile21.out
dd: writing to ‘/nukefile21.out’: No space left on device
1+0 records in
0+0 records out
0 bytes (0 B) copied, 0.00121454 s, 0.0 kB/s

# df -k /
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/vda2       81819648 78189920         0 100% /

# btrfs fi show /dev/vda2 
Label: 'fedora'  uuid: 300a7cc8-3cc6-47fb-81ed-0291fa0b2b27
        Total devices 2 FS bytes used 74.27GB
        devid    2 size 39.01GB used 39.00GB path /dev/vdb1
        devid    1 size 39.01GB used 39.01GB path /dev/vda2

Btrfs Btrfs v0.19

# btrfs fi df / 
Data, RAID0: total=73.97GB, used=73.97GB
Data: total=8.00MB, used=8.00MB
System, RAID1: total=8.00MB, used=12.00KB
System: total=4.00MB, used=0.00
Metadata, RAID1: total=2.00GB, used=299.66MB
Metadata: total=8.00MB, used=0.00

>> So, why could i not reach this with a single 'dd' command?

Comment 6 Josef Bacik 2013-09-23 20:38:33 UTC

Enospc is a moving target in btrfs.  Please try and reproduce on btrfs-next, if you can reproduce there file a bugzilla at bugzilla.kernel.org and set the component to btrfs.