599485 – sequential dd will cause btrfs panic

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 599485 - sequential dd will cause btrfs panic

Summary: sequential dd will cause btrfs panic

Keywords:
Status:	CLOSED DUPLICATE of bug 663749
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	6.0
Hardware:	x86_64
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Josef Bacik
QA Contact:	Red Hat Kernel QE team
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	584197
TreeView+	depends on / blocked

Reported:	2010-06-03 10:39 UTC by Eryu Guan
Modified:	2011-02-01 18:30 UTC (History)
CC List:	2 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2011-02-01 18:30:05 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)
kernel panic console log when btrfs is under dd pressure (11.00 KB, text/plain) 2010-06-03 10:39 UTC, Eryu Guan	no flags	Details
View All

Description Eryu Guan 2010-06-03 10:39:45 UTC

Created attachment 419323 [details]
kernel panic console log when btrfs is under dd pressure

Description of problem:
btrfs will cause kernel panic under dd pressure

Version-Release number of selected component (if applicable):
# uname -a
Linux amd-drachma-01.lab.bos.redhat.com 2.6.32-27.el6.x86_64 #1 SMP Tue May 18 12:25:26 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
# rpm -q btrfs-progs
btrfs-progs-0.19-10.el6.x86_64


How reproducible:
100%

Steps to Reproduce:
dd if=/dev/null of=/home/img0 bs=1G seek=50
losetup /dev/loop0 /home/img0
mkfs.btrfs /dev/loop0
mkdir -p /mnt/btrfs
mount /dev/loop0 /mnt/btrfs
for i in `seq 1 30`;do echo "No. $i dd";dd if=/dev/zero of=/mnt/btrfs/ddtest bs=1k count=$((1024*1024));done

About after 2-6 dd, kernel will panic.

Or schedule test /kernel/filesystems/btrfs/dd-pressure
  
Actual results:
kernel panic

Expected results:
no panic, pass this test

Additional info:
The format of attached console log may be messed up, that's what I got from console

Comment 4 Eryu Guan 2010-06-04 03:25:24 UTC

I'll do more tests on latest kernel. Installing new test system now.

Comment 5 Eryu Guan 2010-06-04 07:58:48 UTC

On Kernel 2.6.32-33.el6.x86_64, so far no panic was found, bug a huge performance regression. See No.5 8 12 dd test below.

dmesg
INFO: task btrfs-transacti:8187 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
btrfs-transac D ffff88027fe77600     0  8187      2 0x00000080
 ffff880274fb7bd0 0000000000000046 0000000000000000 0000000071fb5501
 ffff880678ee5c00 ffff880677b6b470 ffff880679227800 00000000fffe9fbf
 ffff8802744066b0 ffff880274fb7fd8 00000000000103b8 ffff8802744066b0
Call Trace:
 [<ffffffff81198870>] ? sync_buffer+0x0/0x50
 [<ffffffff814d8193>] io_schedule+0x73/0xc0
 [<ffffffff811988b0>] sync_buffer+0x40/0x50
 [<ffffffff814d8a0f>] __wait_on_bit+0x5f/0x90
 [<ffffffff81198870>] ? sync_buffer+0x0/0x50
 [<ffffffff814d8ab8>] out_of_line_wait_on_bit+0x78/0x90
 [<ffffffff81090a90>] ? wake_bit_function+0x0/0x50
 [<ffffffff81198866>] __wait_on_buffer+0x26/0x30
 [<ffffffffa0205658>] write_dev_supers+0x148/0x3b0 [btrfs]
 [<ffffffffa0205acb>] write_all_supers+0x20b/0x280 [btrfs]
 [<ffffffffa0205b53>] write_ctree_super+0x13/0x20 [btrfs]
 [<ffffffffa020b057>] btrfs_commit_transaction+0x4c7/0x700 [btrfs]
 [<ffffffff81090a50>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa020a4a9>] ? start_transaction+0x129/0x180 [btrfs]
 [<ffffffffa020501b>] transaction_kthread+0x20b/0x240 [btrfs]
 [<ffffffffa0204e10>] ? transaction_kthread+0x0/0x240 [btrfs]
 [<ffffffff810906e6>] kthread+0x96/0xa0
 [<ffffffff810141ca>] child_rip+0xa/0x20
 [<ffffffff81090650>] ? kthread+0x0/0xa0
 [<ffffffff810141c0>] ? child_rip+0x0/0x20
INFO: task dd:8240 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
dd            D ffff88027fe75200     0  8240   8137 0x00000080
 ffff880675d6bc78 0000000000000082 0000000000000000 ffff880677b6b470
 ffff8802759d0000 ffffea0010c975f8 ffff880675d6bc08 00000000fffea28a
 ffff880676809ad0 ffff880675d6bfd8 00000000000103b8 ffff880676809ad0
Call Trace:
 [<ffffffff8110be40>] ? sync_page+0x0/0x50
 [<ffffffff814d8193>] io_schedule+0x73/0xc0
 [<ffffffff8110be7d>] sync_page+0x3d/0x50
 [<ffffffff814d8a0f>] __wait_on_bit+0x5f/0x90
 [<ffffffff8110bc4e>] ? find_get_page+0x1e/0xa0
 [<ffffffff8110c033>] wait_on_page_bit+0x73/0x80
 [<ffffffff81090a90>] ? wake_bit_function+0x0/0x50
 [<ffffffffa0215508>] prepare_pages+0x248/0x270 [btrfs]
 [<ffffffffa02179ea>] btrfs_file_write+0x56a/0x820 [btrfs]
 [<ffffffff81013c8e>] ? apic_timer_interrupt+0xe/0x20
 [<ffffffff81169b68>] vfs_write+0xb8/0x1a0
 [<ffffffff810d3782>] ? audit_syscall_entry+0x252/0x280
 [<ffffffff8116a5a1>] sys_write+0x51/0x90
 [<ffffffff81013172>] system_call_fastpath+0x16/0x1b


No. 1 dd test
+ dd if=/dev/zero of=/mnt/btrfs/ddtest bs=1k count=1048576
1048576+0 records in
1048576+0 records out
1073741824 bytes (1.1 GB) copied, 4.0581 s, 265 MB/s
+ for i in '`seq 1 30`'
+ myecho 'No. 2 dd test'
+ echo 'No. 2 dd test'
+ tee -a /tmp/tmp.KjTvAR
No. 2 dd test
+ dd if=/dev/zero of=/mnt/btrfs/ddtest bs=1k count=1048576
1048576+0 records in
1048576+0 records out
1073741824 bytes (1.1 GB) copied, 4.42274 s, 243 MB/s
+ for i in '`seq 1 30`'
+ myecho 'No. 3 dd test'
+ echo 'No. 3 dd test'
+ tee -a /tmp/tmp.KjTvAR
No. 3 dd test
+ dd if=/dev/zero of=/mnt/btrfs/ddtest bs=1k count=1048576
1048576+0 records in
1048576+0 records out
1073741824 bytes (1.1 GB) copied, 4.4505 s, 241 MB/s
+ for i in '`seq 1 30`'
+ myecho 'No. 4 dd test'
+ echo 'No. 4 dd test'
+ tee -a /tmp/tmp.KjTvAR
No. 4 dd test
+ dd if=/dev/zero of=/mnt/btrfs/ddtest bs=1k count=1048576
1048576+0 records in
1048576+0 records out
1073741824 bytes (1.1 GB) copied, 10.7871 s, 99.5 MB/s
+ for i in '`seq 1 30`'
+ myecho 'No. 5 dd test'
+ echo 'No. 5 dd test'
+ tee -a /tmp/tmp.KjTvAR
No. 5 dd test
+ dd if=/dev/zero of=/mnt/btrfs/ddtest bs=1k count=1048576

1048576+0 records in
1048576+0 records out
1073741824 bytes (1.1 GB) copied, 1386.24 s, 775 kB/s  <============775 KB/s
+ for i in '`seq 1 30`'
+ myecho 'No. 6 dd test'
+ echo 'No. 6 dd test'
+ tee -a /tmp/tmp.KjTvAR
No. 6 dd test
+ dd if=/dev/zero of=/mnt/btrfs/ddtest bs=1k count=1048576
1048576+0 records in
1048576+0 records out
1073741824 bytes (1.1 GB) copied, 4.34202 s, 247 MB/s
+ for i in '`seq 1 30`'
+ myecho 'No. 7 dd test'
+ echo 'No. 7 dd test'
+ tee -a /tmp/tmp.KjTvAR
No. 7 dd test
+ dd if=/dev/zero of=/mnt/btrfs/ddtest bs=1k count=1048576
1048576+0 records in
1048576+0 records out
1073741824 bytes (1.1 GB) copied, 8.01839 s, 134 MB/s
+ for i in '`seq 1 30`'
+ myecho 'No. 8 dd test'
+ echo 'No. 8 dd test'
+ tee -a /tmp/tmp.KjTvAR
No. 8 dd test
+ dd if=/dev/zero of=/mnt/btrfs/ddtest bs=1k count=1048576
1048576+0 records in
1048576+0 records out
1073741824 bytes (1.1 GB) copied, 1496.97 s, 717 kB/s  <============== 717 KB/s
+ for i in '`seq 1 30`'
+ myecho 'No. 9 dd test'
+ echo 'No. 9 dd test'
+ tee -a /tmp/tmp.KjTvAR
No. 9 dd test
+ dd if=/dev/zero of=/mnt/btrfs/ddtest bs=1k count=1048576
1048576+0 records in
1048576+0 records out
1073741824 bytes (1.1 GB) copied, 4.43113 s, 242 MB/s
+ for i in '`seq 1 30`'
+ myecho 'No. 10 dd test'
+ echo 'No. 10 dd test'
+ tee -a /tmp/tmp.KjTvAR
No. 10 dd test
+ dd if=/dev/zero of=/mnt/btrfs/ddtest bs=1k count=1048576
1048576+0 records in
1048576+0 records out
1073741824 bytes (1.1 GB) copied, 4.34923 s, 247 MB/s
+ for i in '`seq 1 30`'
+ myecho 'No. 11 dd test'
+ echo 'No. 11 dd test'
+ tee -a /tmp/tmp.KjTvAR
No. 11 dd test
+ dd if=/dev/zero of=/mnt/btrfs/ddtest bs=1k count=1048576
1048576+0 records in
1048576+0 records out
1073741824 bytes (1.1 GB) copied, 12.5895 s, 85.3 MB/s
+ for i in '`seq 1 30`'
+ myecho 'No. 12 dd test'
+ tee -a /tmp/tmp.KjTvAR
+ echo 'No. 12 dd test'
No. 12 dd test
+ dd if=/dev/zero of=/mnt/btrfs/ddtest bs=1k count=1048576
1048576+0 records in
1048576+0 records out
1073741824 bytes (1.1 GB) copied, 1412.57 s, 760 kB/s  <============== 760 KB/s

Comment 9 Eryu Guan 2010-07-15 08:00:11 UTC

(In reply to comment #0)
> Steps to Reproduce:
> dd if=/dev/null of=/home/img0 bs=1G seek=50
> losetup /dev/loop0 /home/img0
> mkfs.btrfs /dev/loop0
> mkdir -p /mnt/btrfs
> mount /dev/loop0 /mnt/btrfs
> for i in `seq 1 30`;do echo "No. $i dd";dd if=/dev/zero of=/mnt/btrfs/ddtest
> bs=1k count=$((1024*1024));done

This only happens when partition holding img0 is a small one, say 3G, and creating large sparse image on it, here is a 50G sparse image.

On a 3G partition, I tried different image size and got different result.
2G image : PASS
4G image : PANIC
Confirmed this on 2.6.32-44.el6.x86_64


 ------------[ cut here ]------------
kernel BUG at fs/btrfs/extent-tree.c:5149!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/kernel/mm/ksm/run
CPU 4
Modules linked in: btrfs zlib_deflate libcrc32c autofs4 sunrpc ipv6 dm_mirror dm_region_hash dm_log kvm_amd kvm bnx2 serio_raw amd64_edac_mod edac_core edac_mce_amd sg i2c_piix4 shpchp ipmi_si ipmi_msghandler hpilo ext4 mbcache jbd2 sr_mod cdrom qla2xxx scsi_transport_fc scsi_tgt hpsa cciss ata_generic pata_acpi pata_serverworks radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mod [last unloaded: freq_table]

Modules linked in: btrfs zlib_deflate libcrc32c autofs4 sunrpc ipv6 dm_mirror dm_region_hash dm_log kvm_amd kvm bnx2 serio_raw amd64_edac_mod edac_core edac_mce_amd sg i2c_piix4 shpchp ipmi_si ipmi_msghandler hpilo ext4 mbcache jbd2 sr_mod cdrom qla2xxx scsi_transport_fc scsi_tgt hpsa cciss ata_generic pata_acpi pata_serverworks radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mod [last unloaded: freq_table]
Pid: 3543, comm: btrfs-endio-wri Not tainted 2.6.32-44.el6.x86_64 #1 ProLiant DL785 G6
RIP: 0010:[<ffffffffa0442b1a>]  [<ffffffffa0442b1a>] run_clustered_refs+0x54a/0x7f0 [btrfs]
RSP: 0018:ffff88081da19b10  EFLAGS: 00010282
RAX: 00000000fffffffb RBX: ffff88081d5423c0 RCX: ffff88141e111228
RDX: 0000000000000008 RSI: ffff88141e008b50 RDI: 0000000000000000
RBP: ffff88081da19bf0 R08: 0000000000000001 R09: 0000000000000000
R10: ffff880c1e61e000 R11: ffff88141e1f85e0 R12: ffff88081da26540
R13: ffff88081d542340 R14: ffff880c1e58a000 R15: ffff88141e1f85e0
FS:  00007ff755bad700(0000) GS:ffff88103e600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000450040 CR3: 0000000001001000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process btrfs-endio-wri (pid: 3543, threadinfo ffff88081da18000, task ffff88081db48ae0)
Stack:
 0000000000000000 0000000000000000 ffff880800000001 0000000000000000
<0> ffff880c1d658000 ffff88081e33bec0 0000000000000000 ffff88081da19c30
<0> 0000000000000000 ffff88081e02c880 ffff88081e33be10 ffff88081e33bec8
Call Trace:
 [<ffffffffa0488a01>] ? btrfs_delayed_ref_pending+0x61/0x80 [btrfs]
 [<ffffffffa0442e88>] btrfs_run_delayed_refs+0xc8/0x220 [btrfs]
 [<ffffffffa044f8da>] __btrfs_end_transaction+0x6a/0x230 [btrfs]
 [<ffffffffa0438fd2>] ? btrfs_block_rsv_release+0x32/0x50 [btrfs]
 [<ffffffffa044fad0>] btrfs_end_transaction+0x10/0x20 [btrfs]
 [<ffffffffa04530ea>] btrfs_finish_ordered_io+0x1ea/0x2b0 [btrfs]
 [<ffffffff8111ef5d>] ? test_clear_page_writeback+0x8d/0x150
 [<ffffffffa04531ca>] btrfs_writepage_end_io_hook+0x1a/0x20 [btrfs]
 [<ffffffffa046e21b>] end_bio_extent_writepage+0x13b/0x180 [btrfs]
 [<ffffffff814d8c3c>] ? schedule_timeout+0x19c/0x2f0
 [<ffffffff8119bb9d>] bio_endio+0x1d/0x40
 [<ffffffffa0449e34>] end_workqueue_fn+0xf4/0x130 [btrfs]
 [<ffffffffa047a031>] worker_loop+0x141/0x5b0 [btrfs]
 [<ffffffffa0479ef0>] ? worker_loop+0x0/0x5b0 [btrfs]
 [<ffffffff810909e6>] kthread+0x96/0xa0
 [<ffffffff810141ca>] child_rip+0xa/0x20
 [<ffffffff81090950>] ? kthread+0x0/0xa0
 [<ffffffff810141c0>] ? child_rip+0x0/0x20
Code: 00 eb f9 48 8b 75 a0 48 8b 7d a8 ba b0 00 00 00 e8 ec 3c 02 00 48 8b 55 80 48 8b 75 a0 48 8b 7d a8 e8 db 3b 02 00 e9 0e ff ff ff <0f> 0b eb fe 0f 0b eb fe 0f 0b eb fe 48 8b 4b 20 48 8b 53 18 4c
RIP  [<ffffffffa0442b1a>] run_clustered_refs+0x54a/0x7f0 [btrfs]
 RSP <ffff88081da19b10>
---[ end trace 18323aaaf8edea40 ]---
Kernel panic - not syncing: Fatal exception
Pid: 3543, comm: btrfs-endio-wri Tainted: G      D    2.6.32-44.el6.x86_64 #1
Call Trace:
 [<ffffffff814d7823>] panic+0x78/0x137
 [<ffffffff814db8f4>] oops_end+0xe4/0x100
 [<ffffffff8101731b>] die+0x5b/0x90
 [<ffffffff814db1a4>] do_trap+0xc4/0x160
 [<ffffffff81014ec5>] do_invalid_op+0x95/0xb0
 [<ffffffffa0442b1a>] ? run_clustered_refs+0x54a/0x7f0 [btrfs]
 [<ffffffffa042e79a>] ? btrfs_free_path+0x2a/0x40 [btrfs]
 [<ffffffff81013f5b>] invalid_op+0x1b/0x20
 [<ffffffffa0442b1a>] ? run_clustered_refs+0x54a/0x7f0 [btrfs]
 [<ffffffffa0488a01>] ? btrfs_delayed_ref_pending+0x61/0x80 [btrfs]
 [<ffffffffa0442e88>] btrfs_run_delayed_refs+0xc8/0x220 [btrfs]
 [<ffffffffa044f8da>] __btrfs_end_transaction+0x6a/0x230 [btrfs]
 [<ffffffffa0438fd2>] ? btrfs_block_rsv_release+0x32/0x50 [btrfs]
 [<ffffffffa044fad0>] btrfs_end_transaction+0x10/0x20 [btrfs]
 [<ffffffffa04530ea>] btrfs_finish_ordered_io+0x1ea/0x2b0 [btrfs]
 [<ffffffff8111ef5d>] ? test_clear_page_writeback+0x8d/0x150
 [<ffffffffa04531ca>] btrfs_writepage_end_io_hook+0x1a/0x20 [btrfs]
 [<ffffffffa046e21b>] end_bio_extent_writepage+0x13b/0x180 [btrfs]
 [<ffffffff814d8c3c>] ? schedule_timeout+0x19c/0x2f0
 [<ffffffff8119bb9d>] bio_endio+0x1d/0x40
 [<ffffffffa0449e34>] end_workqueue_fn+0xf4/0x130 [btrfs]

Comment 11 Josef Bacik 2010-07-28 17:56:13 UTC

Ok this is just because of BTRFS's enospc handling.  There are places where we just don't deal with getting back ENOSPC properly.  The fix for this is going to be more involved than is going to be possible for 6.0, pushing this out to 6.1.

Comment 12 Eryu Guan 2010-08-01 09:42:19 UTC

I tested on 2.6.32-55.el6 again and problem still exists

Comment 14 RHEL Program Management 2011-01-07 04:44:33 UTC

This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.

Comment 15 Suzanne Logcher 2011-01-07 16:04:04 UTC

This request was erroneously denied for the current release of Red Hat
Enterprise Linux.  The error has been fixed and this request has been
re-proposed for the current release.

Comment 17 Josef Bacik 2011-01-07 18:23:38 UTC

Yeah it's still an issue, since its an error back from the lower levels, we don't handle that well at all atm.  I'll fix this as i do the big btrfs backport.

Comment 18 RHEL Program Management 2011-02-01 05:37:18 UTC

This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.

Comment 19 RHEL Program Management 2011-02-01 18:23:36 UTC

This request was erroneously denied for the current release of
Red Hat Enterprise Linux.  The error has been fixed and this
request has been re-proposed for the current release.

Comment 20 Josef Bacik 2011-02-01 18:30:05 UTC


*** This bug has been marked as a duplicate of bug 663749 ***

Note You need to log in before you can comment on or make changes to this bug.