Bug 894837
Summary: | Transient / Intermittent ENOSPC errors with BTRFS and F18 | | |
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Reartes Guillermo <rtguille> |
Component: | kernel | Assignee: | Zach Brown <zab> |
Status: | CLOSED UPSTREAM | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | high | Priority: | unspecified |
Version: | 18 | Keywords: | Triaged |
Hardware: | x86_64 | OS: | Linux |
Type: | Bug | Doc Type: | Bug Fix |
Last Closed: | 2013-09-23 20:38:33 UTC | CC: | gansalmon, itamar, jonathan, josef, kernel-maint, madhu.chinakonda, shyu, stephent98, sweil |
Description
Reartes Guillermo
2013-01-13 20:11:59 UTC
Created attachment 677981
dmesg of the affected system, which is the same system on which I noticed the transient ENOSPC. I added another disk and then booted the guest, nothing more. ABRT says the kernel is tainted, but it should not be (maybe something like old Bug 744887), so I could not use ABRT to report this.

~~~~~~~~~~~~~~~~~~~~~~~~~~~
WARNING: at fs/btrfs/extent-tree.c:6323 btrfs_alloc_free_block+0x376/0x380 [btrfs]()
Jan 12 19:53:55 localhost kernel: [73834.376794] Pid: 14675, comm: btrfs-endio-wri Tainted: G W 3.6.10-4.fc18.x86_64 #1
Jan 12 19:53:55 localhost kernel: [73834.376798] Call Trace:
Jan 12 19:53:55 localhost kernel: [73834.376811] [<ffffffff8105c8df>] warn_slowpath_common+0x7f/0xc0
Jan 12 19:53:55 localhost kernel: [73834.376818] [<ffffffff8105c93a>] warn_slowpath_null+0x1a/0x20
Jan 12 19:53:55 localhost kernel: [73834.376851] [<ffffffffa0031a06>] btrfs_alloc_free_block+0x376/0x380 [btrfs]
Jan 12 19:53:55 localhost kernel: [73834.376888] [<ffffffffa005dc23>] ? read_extent_buffer+0xc3/0x120 [btrfs]
Jan 12 19:53:55 localhost kernel: [73834.376888] [<ffffffffa001da9a>] __btrfs_cow_block+0x12a/0x510 [btrfs]
Jan 12 19:53:55 localhost kernel: [73834.376888] [<ffffffffa001df77>] btrfs_cow_block+0xf7/0x200 [btrfs]
Jan 12 19:53:55 localhost kernel: [73834.377123] [<ffffffffa00221a7>] btrfs_search_slot+0x3e7/0x8f0 [btrfs]
Jan 12 19:53:55 localhost kernel: [73834.377160] [<ffffffffa0092c64>] ? btrfs_qgroup_record_ref+0x44/0x90 [btrfs]
Jan 12 19:53:55 localhost kernel: [73834.377196] [<ffffffffa0092c64>] ? btrfs_qgroup_record_ref+0x44/0x90 [btrfs]
Jan 12 19:53:55 localhost kernel: [73834.377230] [<ffffffffa0036466>] btrfs_lookup_csum+0x76/0x180 [btrfs]
Jan 12 19:53:55 localhost kernel: [73834.377262] [<ffffffffa003991f>] ? btree_set_page_dirty+0x3f/0x50 [btrfs]
Jan 12 19:53:55 localhost kernel: [73834.377295] [<ffffffffa0037630>] btrfs_csum_file_blocks+0xd0/0x670 [btrfs]
Jan 12 19:53:55 localhost kernel: [73834.377332] [<ffffffffa0043935>] add_pending_csums.isra.35+0x45/0x60 [btrfs]
Jan 12 19:53:55 localhost kernel: [73834.377367] [<ffffffffa0048290>] btrfs_finish_ordered_io+0x260/0x420 [btrfs]
Jan 12 19:53:55 localhost kernel: [73834.377403] [<ffffffffa0048465>] finish_ordered_fn+0x15/0x20 [btrfs]
Jan 12 19:53:55 localhost kernel: [73834.377439] [<ffffffffa00684d6>] worker_loop+0x136/0x580 [btrfs]
Jan 12 19:53:55 localhost kernel: [73834.377478] [<ffffffffa00683a0>] ? btrfs_queue_worker+0x300/0x300 [btrfs]
Jan 12 19:53:55 localhost kernel: [73834.377485] [<ffffffff8107fde3>] kthread+0x93/0xa0
Jan 12 19:53:55 localhost kernel: [73834.377494] [<ffffffff8162cb84>] kernel_thread_helper+0x4/0x10
Jan 12 19:53:55 localhost kernel: [73834.377501] [<ffffffff8107fd50>] ? kthread_freezable_should_stop+0x70/0x70
Jan 12 19:53:55 localhost kernel: [73834.377507] [<ffffffff8162cb80>] ? gs_change+0x13/0x13
Jan 12 19:53:55 localhost kernel: [73834.377511] ---[ end trace 50d9d2acf26a96b8 ]--
~~~~~~~~~~~~~~~~~~~~~~~~~~~

I created another btrfs filesystem on the disk I added to the guest. The new disk is 41 GB, so it is not small. The new filesystem is mounted on /btdata1. Repeating the same procedure:

~~~~~~~~~~~~~~~~~~~~~~~~~~~
# dd if=/dev/zero of=/btdata1/nukefile.out
dd: writing to ‘/btdata1/nukefile.out’: No space left on device
80101506+0 records in
80101505+0 records out
41011970560 bytes (41 GB) copied, 2321.98 s, 17.7 MB/s
# rm nukefile.out
rm: remove regular file ‘nukefile.out’? y
~~~~~~~~~~~~~~~~~~~~~~~~~~~

While I do not get the transient ENOSPC here, deleting the file produces multiple kernel call traces. I cannot report them via ABRT, because it still thinks the kernel is tainted. Should I open another bug report for the call traces?

Created attachment 678530
messages with kernel call traces (3.6.10)
Created attachment 678531
messages with kernel call traces (3.7.2)
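Every reproduction in this report follows the same cycle: write zeros with `dd if=/dev/zero of=FILE` until the kernel returns ENOSPC, then `sync`, optionally wait, and try again with a new file. The following is a minimal Python sketch of that cycle, not the reporter's tooling: the function names `fill_until_enospc` and `fill_sync_retry`, the `max_bytes` safety cap and the `pause_s` delay are hypothetical additions, and the 512-byte block size is an assumption chosen to match dd's default (80101505 records × 512 = 41011970560 bytes, as in the /btdata1 run).

```python
import errno
import os
import time

# dd's default block size is 512 bytes; the record counts in this report
# match it (80101505 records * 512 = 41011970560 bytes).
BLOCK = b"\0" * 512


def fill_until_enospc(path, max_bytes=None):
    """Write zero blocks to `path` until the kernel returns ENOSPC.

    Roughly equivalent to `dd if=/dev/zero of=path`. `max_bytes` is a
    hypothetical safety cap (not part of the original procedure) so the
    sketch can be tried without filling a real filesystem.
    Returns the number of bytes written.
    """
    written = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        while max_bytes is None or written < max_bytes:
            try:
                written += os.write(fd, BLOCK)
            except OSError as exc:
                if exc.errno == errno.ENOSPC:
                    break  # "No space left on device"
                raise
    finally:
        os.close(fd)
    return written


def fill_sync_retry(directory, attempts=5, pause_s=0.0, max_bytes=None):
    """Repeat the fill into fresh files, calling sync() between attempts.

    Returns the bytes written per attempt. On the affected system this
    would be expected to show the shrinking-but-nonzero pattern reported
    in this bug rather than 0 after the first ENOSPC.
    """
    results = []
    for i in range(attempts):
        path = os.path.join(directory, f"nukefile{i}.out")
        results.append(fill_until_enospc(path, max_bytes))
        os.sync()            # flush dirty data, like the manual `sync` calls
        time.sleep(pause_s)  # optional wait between attempts
    return results
```

Without `max_bytes` this fills the target filesystem exactly as the dd commands do, so it should only be pointed at a scratch btrfs filesystem. Recording every attempt's byte count in one list makes the "ENOSPC, then more space after sync" behaviour easy to paste into an upstream report.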
I installed another KVM guest on another KVM host (also F17). This guest has two disks instead of one, so I used a different btrfs multi-volume option. In this case /boot is not a subvolume. I did not see any call trace with this configuration, but I do experience transient / spurious / intermittent ENOSPC. The kernel version is still 3.7.2-201.fc18.x86_64.

(NOTE to SELF: KVM GUEST: FN-TSTx-1)

~~~~~~~~~~~~~~~~~~~~~~~~~~~
# btrfs fi show /dev/vda2
Label: 'fedora' uuid: 300a7cc8-3cc6-47fb-81ed-0291fa0b2b27
Total devices 2 FS bytes used 31.52GB
devid 2 size 39.01GB used 39.00GB path /dev/vdb1
devid 1 size 39.01GB used 39.01GB path /dev/vda2
Btrfs Btrfs v0.19
# btrfs fi df /
Data, RAID0: total=73.97GB, used=34.28GB
Data: total=8.00MB, used=0.00
System, RAID1: total=8.00MB, used=12.00KB
System: total=4.00MB, used=0.00
Metadata, RAID1: total=2.00GB, used=243.00MB
Metadata: total=8.00MB, used=0.00
~~~~~~~~~~~~~~~~~~~~~~~~~~~

~~~~~~~~~~~~~~~~~~~~~~~~~~~
# dd if=/dev/zero of=/nukefile.out
dd: writing to ‘/nukefile.out’: No space left on device
102227138+0 records in
102227137+0 records out
52340294144 bytes (52 GB) copied, 1641.97 s, 31.9 MB/s
>> I waited some time (more than 15 minutes).
# dd if=/dev/zero of=/nukefile8.out
dd: writing to ‘/nukefile8.out’: No space left on device
33642+0 records in
33641+0 records out
17224192 bytes (17 MB) copied, 0.0901648 s, 191 MB/s
>> Wow, there was still more space left...
>> Do a 'sync' and try again; maybe there is more juice to extract from
>> the fruit if one compresses it harder...
# sync
# dd if=/dev/zero of=/nukefile9.out
dd: writing to ‘/nukefile9.out’: No space left on device
2570+0 records in
2569+0 records out
1315328 bytes (1.3 MB) copied, 0.00975233 s, 135 MB/s
>> Hmm, I got some extra drops of space...
# sync
# dd if=/dev/zero of=/nukefile10.out
dd: writing to ‘/nukefile10.out’: No space left on device
1+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000866384 s, 0.0 kB/s
>> Now it appears to be a true ENOSPC.
~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> Wait some time, do a 'sync' and try again.
# sync
# dd if=/dev/zero of=/nukefile11.out
dd: writing to ‘/nukefile11.out’: No space left on device
850+0 records in
849+0 records out
434688 bytes (435 kB) copied, 0.00718903 s, 60.5 MB/s
>> Well, less than a droplet of space, but not zero.
# sync
# dd if=/dev/zero of=/nukefile12.out
dd: writing to ‘/nukefile12.out’: No space left on device
1+0 records in
0+0 records out
0 bytes (0 B) copied, 0.00121953 s, 0.0 kB/s
>> Is it the true ENOSPC?
# sync
# sync
# dd if=/dev/zero of=/nukefile13.out
dd: writing to ‘/nukefile13.out’: No space left on device
33146+0 records in
33145+0 records out
16970240 bytes (17 MB) copied, 0.080919 s, 210 MB/s
>> NO! It was not. So, if I wait some minutes and run 'sync' a few times,
>> then I can write a bit more!
# df -k
Filesystem 1K-blocks Used Available Use% Mounted on
devtmpfs 1002328 0 1002328 0% /dev
tmpfs 1013620 80 1013540 1% /dev/shm
tmpfs 1013620 2204 1011416 1% /run
tmpfs 1013620 0 1013620 0% /sys/fs/cgroup
/dev/vda2 1013620 1820 1011800 1% /tmp
/dev/vda2 81819648 78188368 1536 100% /
/dev/vda1 1007896 83836 872860 9% /boot
>> It seems it could squeeze out up to a megabyte by this
>> method.
# sync
# sync
# sync
# sync
# dd if=/dev/zero of=/nukefile14.out
dd: writing to ‘/nukefile14.out’: No space left on device
3554+0 records in
3553+0 records out
1819136 bytes (1.8 MB) copied, 0.0119488 s, 152 MB/s
# df -k
Filesystem 1K-blocks Used Available Use% Mounted on
devtmpfs 1002328 0 1002328 0% /dev
tmpfs 1013620 80 1013540 1% /dev/shm
tmpfs 1013620 2220 1011400 1% /run
tmpfs 1013620 0 1013620 0% /sys/fs/cgroup
/dev/vda2 1013620 1820 1011800 1% /tmp
/dev/vda2 81819648 78189904 0 100% /
/dev/vda1 1007896 83836 872860 9% /boot
>> Well, this should be the 'true' ENOSPC for /.
>> But is it?? It appears that it is.
~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~
# sync
# sync
# dd if=/dev/zero of=/nukefile15.out
dd: writing to ‘/nukefile15.out’: No space left on device
1+0 records in
0+0 records out
0 bytes (0 B) copied, 0.0012356 s, 0.0 kB/s
>> OK, then I deleted all the nukefile* files; the 'rm' command
>> worked fine. Now I tried again.
# rm nukefile*
# dd if=/dev/zero of=/nukefile17.out
dd: writing to ‘/nukefile17.out’: No space left on device
102292290+0 records in
102292289+0 records out
52373651968 bytes (52 GB) copied, 1617.21 s, 32.4 MB/s
# sync
# sync
# sync
# dd if=/dev/zero of=/nukefile18.out
dd: writing to ‘/nukefile18.out’: No space left on device
29058+0 records in
29057+0 records out
14877184 bytes (15 MB) copied, 0.0892675 s, 167 MB/s
# sync
# sync
# sync
# dd if=/dev/zero of=/nukefile19.out
dd: writing to ‘/nukefile19.out’: No space left on device
506+0 records in
505+0 records out
258560 bytes (259 kB) copied, 0.0055089 s, 46.9 MB/s
# sync
# sync
# sync
# dd if=/dev/zero of=/nukefile20.out
dd: writing to ‘/nukefile20.out’: No space left on device
1+0 records in
0+0 records out
0 bytes (0 B) copied, 0.00127036 s, 0.0 kB/s
# sync
# sync
# sync
# dd if=/dev/zero of=/nukefile21.out
dd: writing to ‘/nukefile21.out’: No space left on device
1+0 records in
0+0 records out
0 bytes (0 B) copied, 0.00121454 s, 0.0 kB/s
# df -k /
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/vda2 81819648 78189920 0 100% /
# btrfs fi show /dev/vda2
Label: 'fedora' uuid: 300a7cc8-3cc6-47fb-81ed-0291fa0b2b27
Total devices 2 FS bytes used 74.27GB
devid 2 size 39.01GB used 39.00GB path /dev/vdb1
devid 1 size 39.01GB used 39.01GB path /dev/vda2
Btrfs Btrfs v0.19
# btrfs fi df /
Data, RAID0: total=73.97GB, used=73.97GB
Data: total=8.00MB, used=8.00MB
System, RAID1: total=8.00MB, used=12.00KB
System: total=4.00MB, used=0.00
Metadata, RAID1: total=2.00GB, used=299.66MB
Metadata: total=8.00MB, used=0.00
>> So, why could I not reach this state with a single 'dd' command?
~~~~~~~~~~~~~~~~~~~~~~~~~~~

ENOSPC is a moving target in btrfs.
Please try to reproduce this on btrfs-next; if you can reproduce it there, file a bug at bugzilla.kernel.org and set the component to btrfs.
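The final `btrfs fi df /` output can be cross-checked against `btrfs fi show`. If each block group's total is weighted by the number of copies its profile stores on disk (two for RAID1, one for RAID0 and single), the result accounts for essentially all of the space that `fi show` lists as used on the two devices. That suggests both devices were already fully allocated to chunks, so further writes could only land in free space inside chunks that already existed; the first `fi show` in the second session reports the same per-device usage (39.00 GB and 39.01 GB). A small Python check of that arithmetic follows, using figures copied from the report. The unit interpretation (1 "GB" = 1024 "MB" as printed by btrfs-progs v0.19) and the `COPIES` table are assumptions of this sketch.

```python
# Figures copied from the final `btrfs fi show` / `btrfs fi df /` output.
# Assumption: btrfs-progs v0.19 labels binary units "GB"/"MB", 1 GB = 1024 MB.
MB_PER_GB = 1024.0

# (block group, profile, logical total in GB)
chunks = [
    ("Data", "RAID0", 73.97),
    ("Data", "single", 8.00 / MB_PER_GB),
    ("System", "RAID1", 8.00 / MB_PER_GB),
    ("System", "single", 4.00 / MB_PER_GB),
    ("Metadata", "RAID1", 2.00),
    ("Metadata", "single", 8.00 / MB_PER_GB),
]

# On-disk copies per logical byte: RAID1 mirrors, RAID0 only stripes.
COPIES = {"single": 1, "RAID0": 1, "RAID1": 2}

device_size = {"devid 1": 39.01, "devid 2": 39.01}
device_used = {"devid 1": 39.01, "devid 2": 39.00}


def raw_allocated(chunks):
    """Raw device space (GB) consumed by the listed chunks."""
    return sum(total * COPIES[profile] for _, profile, total in chunks)


raw = raw_allocated(chunks)
used = sum(device_used.values())
unallocated = sum(device_size.values()) - used
print(f"chunks, weighted by copies: {raw:.2f} GB")
print(f"device 'used' per fi show:  {used:.2f} GB")
print(f"unallocated on devices:     {unallocated:.2f} GB")
```

Both totals come out at about 78.0 GB with roughly 0.01 GB left unallocated, which is consistent with `Data, RAID0: used=73.97GB` equalling its total once the last bytes inside the data chunks were consumed.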