Bug 746414 - Kernel panic/system crash on btrfs subvolume / snapshot deletions on RHEL 6.1 & 6.2beta
Summary: Kernel panic/system crash on btrfs subvolume / snapshot deletions on RHEL 6.1...
Keywords:
Status: CLOSED DUPLICATE of bug 698324
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Josef Bacik
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks: 767187
Reported: 2011-10-15 14:28 UTC by Amr Hamdy
Modified: 2012-06-07 14:19 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-06-07 14:19:54 UTC
Target Upstream Version:


Attachments
Backup script to reproduce the problem (819 bytes, application/x-sh)
2011-10-15 14:30 UTC, Amr Hamdy
Crash dump created automatically with RHEL 6.1 kernel (63 bytes, text/plain)
2011-10-15 14:47 UTC, Amr Hamdy
Crash dump created automatically with RHEL 6.2 beta kernel (63 bytes, text/plain)
2011-10-15 14:48 UTC, Amr Hamdy

Description Amr Hamdy 2011-10-15 14:28:13 UTC
Description of problem:
I have a backup script that stores data on a btrfs volume, takes btrfs snapshots daily, updates the data, and keeps a maximum number of snapshots by deleting the old ones.
After the script has run multiple times (more than 15), deleting several old snapshots, for example by running
btrfs subvolume delete /data/_backup/backup.8
and then running it again for a few other subvolumes/snapshots, crashes the system with a kernel panic and a crash dump, followed by an automatic reboot.
I tried this on another machine: I freshly created the filesystem, copied the data, ran the script multiple times, and tried to delete multiple/all of the snapshots, and the same problem happened.

I finally found a temporary solution by pulling the latest kernel from the git repos (linux-HEAD-37cf951), compiling it, and rebooting the system. The problem no longer occurs. I have tested it aggressively and I am now sure that this bug is fixed in this bleeding-edge kernel.

The used data size is 2.6TB and the number of files is ~2,500,000. The total btrfs volume size is 4TB.

The crash dumps from the 6.1 kernel and the 6.2 beta kernel and the script are attached. The rsync lines in the script are commented out to reproduce the bug faster.

Version-Release number of selected component (if applicable):
The problem occurred on these kernels:
kernel-2.6.32-131.12.1.el6.x86_64
kernel-2.6.32-131.17.1.el6.x86_64
kernel-2.6.32-202.el6.x86_64

It is solved on:
kernel-3.1.0_rc9-1.x86_64  compiled from linux-HEAD-37cf951 
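
To check whether a machine is running one of the kernel builds listed above, something like the following could be used. This is only a sketch: the function name is made up for illustration, it assumes that `uname -r` prints the release string without the `kernel-` package prefix, and it matches only the three builds named in this report, so other builds may be affected as well.

```shell
#!/bin/sh
# Return 0 if the given kernel release string is one of the three builds
# named in this report, 1 otherwise. Defaults to the running kernel.
# (is_reported_kernel is a hypothetical helper name.)
is_reported_kernel() {
    release="${1:-$(uname -r)}"
    case "$release" in
        2.6.32-131.12.1.el6.x86_64|2.6.32-131.17.1.el6.x86_64|2.6.32-202.el6.x86_64)
            return 0 ;;
        *)
            return 1 ;;
    esac
}
```

For example, `is_reported_kernel && echo "running a kernel build named in this report"`.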

How reproducible:
Create a btrfs volume larger than 3.5TB. Mount it, under /data for example, without any special options. Populate it with 2.6TB of random data in ~2,500,000 files; the average file size is ~1MB.

Steps to Reproduce:
1. Take 25 snapshots of the whole volume: btrfs subvolume snapshot /data/ /data/_backup/snap.1 , and so on, 25 times, up to btrfs subvolume snapshot /data/ /data/_backup/snap.25 . Make some small data modification between each snapshot.
2. Run this script to delete them in random order:
for i in $(ls -1 /data/_backup | sort -R); do btrfs subvolume delete "/data/_backup/$i"; sleep 5; done
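
Step 1 above can be scripted instead of typing the snapshot command 25 times. The following is a minimal sketch, not the attached backup script: the function name and the marker file used as the "small data modification" are made up for illustration, and the BTRFS variable exists only so the loop can be dry-run with BTRFS=echo.

```shell
#!/bin/sh
# Sketch: take N snapshots of a mounted btrfs volume, changing a little
# data between each one. Override BTRFS (e.g. BTRFS=echo) for a dry run.
BTRFS="${BTRFS:-btrfs}"

# make_snapshots is a hypothetical helper name.
make_snapshots() {
    src="$1"      # mounted btrfs volume, e.g. /data
    dest="$2"     # existing directory that holds the snapshots, e.g. /data/_backup
    count="$3"    # number of snapshots to take, e.g. 25
    i=1
    while [ "$i" -le "$count" ]; do
        # Small modification so that consecutive snapshots differ
        # (.snapshot-marker is a hypothetical file name).
        date > "$src/.snapshot-marker"
        $BTRFS subvolume snapshot "$src" "$dest/snap.$i" || return 1
        i=$((i + 1))
    done
}
```

With the layout from this report, the call would be `make_snapshots /data /data/_backup 25`, run as root.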

  
Actual results:
The system crashes within a few seconds.

Expected results:
It should work fine, and those subvolumes should be deleted without a problem.

Additional info:

Comment 1 Amr Hamdy 2011-10-15 14:30:32 UTC
Created attachment 528323 [details]
Backup script to reproduce the problem

Comment 3 Amr Hamdy 2011-10-15 14:47:03 UTC
Created attachment 528325 [details]
Crash dump created automatically with RHEL 6.1 kernel

Comment 4 Amr Hamdy 2011-10-15 14:48:40 UTC
Created attachment 528326 [details]
Crash dump created automatically with RHEL 6.2 beta kernel

Comment 5 Amr Hamdy 2011-11-09 15:51:45 UTC
Hi,
Any updates?
Is there a beta kernel we can test?

Thanks

Comment 6 Josef Bacik 2011-11-09 16:10:18 UTC
This has been put off until 6.3; I cannot release beta kernels with the fix yet.

Comment 7 Josef Bacik 2011-11-09 16:10:59 UTC
I should say that I've tested it with my backport and the problem is fixed, so it will be fixed in 6.3.

Comment 8 Amr Hamdy 2011-12-21 21:59:47 UTC
Hi Josef,
Can we get a test/alpha/beta kernel with the fix?

Comment 10 Amr Hamdy 2012-04-25 10:12:00 UTC
Has this been solved in RHEL 6.3 beta?

Comment 11 Rodrigo A B Freire 2012-05-07 14:03:48 UTC
Hi Amr,

Unfortunately, the fix has not made it into RHEL 6.3 yet.

However, we have a workaround for this:

1) Unmount your btrfs volume
2) Check it with btrfsck
3) Remount it
4) Then, delete your snapshot.

Hope that helps,

- RF
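
The four workaround steps above could be wrapped in a small script along these lines. This is a sketch under assumptions: the function names are made up, the device path has to be supplied by the caller because the report never names it, and setting DRY_RUN only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of the workaround: unmount, check, remount, then delete a snapshot.
# Set DRY_RUN to a non-empty value to print the commands without running them.

# run and delete_after_fsck are hypothetical helper names.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$@"
    else
        "$@"
    fi
}

delete_after_fsck() {
    dev="$1"    # block device holding the btrfs volume (not given in the report)
    mnt="$2"    # mount point, e.g. /data
    snap="$3"   # snapshot to delete, e.g. /data/_backup/backup.8
    run umount "$mnt" || return 1
    run btrfsck "$dev" || return 1
    run mount "$dev" "$mnt" || return 1
    run btrfs subvolume delete "$snap"
}
```

A dry run such as `DRY_RUN=1; delete_after_fsck /dev/sdX1 /data /data/_backup/backup.8` (with /dev/sdX1 as a placeholder device) shows the commands in the order they would be executed.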

Comment 13 Josef Bacik 2012-06-07 14:19:54 UTC

*** This bug has been marked as a duplicate of bug 698324 ***

