Bug 746414

Summary: Kernel panic/system crash on btrfs subvolume / snapshot deletions on RHEL 6.1 & 6.2beta
Product: Red Hat Enterprise Linux 6
Reporter: Amr Hamdy <amr.el.sharnoby>
Component: kernel
Assignee: Josef Bacik <jbacik>
Status: CLOSED DUPLICATE
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high
Docs Contact:
Priority: unspecified
Version: 6.1
CC: jbacik, rfreire, rwheeler
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-06-07 14:19:54 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:
Bug Blocks: 767187
Attachments:
Description | Flags
Backup script to reproduce the problem | none
Crash dump created automatically with RHEL 6.1 kernel | none
Crash dump created automatically with RHEL 6.2 beta kernel | none

Description Amr Hamdy 2011-10-15 14:28:13 UTC
Description of problem:
I have a backup script that stores data on a btrfs volume, takes btrfs snapshots daily, updates the data, and keeps a maximum number of snapshots by deleting the oldest ones.
After the script has run multiple times (more than 15), deleting several old snapshots by running, for example,
btrfs subvolume delete /data/_backup/backup.8
and repeating the command for a few other subvolumes/snapshots crashes the system: kernel panic, crash dump, then automatic reboot.
I tried this on another machine: I created the filesystem from scratch, copied the data, ran the script multiple times, and tried to delete multiple/all of the snapshots. The same problem occurred.

I finally found a temporary solution: I pulled the latest kernel from the git repository (linux-HEAD-37cf951), compiled it, and rebooted the system. The problem no longer occurs. I have tested this aggressively and am confident that the bug is fixed in this bleeding-edge kernel.

The used data size is 2.6TB and the number of files is ~2,500,000. The total btrfs volume size is 4TB.

The crash dumps from the 6.1 and 6.2 beta kernels and the script are attached. The rsync lines in the script are commented out to reproduce the bug faster.

Version-Release number of selected component (if applicable):
The problem occurred on these kernels:
kernel-2.6.32-131.12.1.el6.x86_64
kernel-2.6.32-131.17.1.el6.x86_64
kernel-2.6.32-202.el6.x86_64

It is fixed on:
kernel-3.1.0_rc9-1.x86_64, compiled from linux-HEAD-37cf951

How reproducible:
Create a btrfs volume larger than 3.5TB. Mount it, under /data for example, without any special options. Populate it with 2.6TB of random data in ~2,500,000 files; the average file size is ~1MB.

Steps to Reproduce:
1. Take 25 snapshots of the whole volume: run btrfs subvolume snapshot /data/ /data/_backup/snap.1 and repeat through btrfs subvolume snapshot /data/ /data/_backup/snap.25. Make a small data modification between snapshots.
2. Run this loop to delete them in random order:
for i in $(ls -1 /data/_backup | sort -R); do btrfs subvolume delete "/data/_backup/$i"; sleep 5; done
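
Step 1 could be scripted roughly as follows. This is only a sketch, not the attached backup script: the function names, the DRY_RUN switch, and the marker file are assumptions introduced for illustration, and the paths default to the ones used in this report.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN is set, otherwise run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Take N snapshots (default 25) of SRC into DEST, with a small change between each.
make_snapshots() {
    count=${1:-25}
    src=${SRC:-/data}
    dest=${DEST:-/data/_backup}
    i=1
    while [ "$i" -le "$count" ]; do
        # Small modification so that consecutive snapshots differ.
        run touch "$src/.snap-marker.$i"
        run btrfs subvolume snapshot "$src/" "$dest/snap.$i"
        i=$((i + 1))
    done
}
```

Running DRY_RUN=1 make_snapshots 25 prints the commands without touching the filesystem; drop DRY_RUN to execute them as root.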

  
Actual results:
The system crashes within a few seconds.

Expected results:
The deletions should succeed, and the subvolumes should be removed without any problem.

Additional info:

Comment 1 Amr Hamdy 2011-10-15 14:30:32 UTC
Created attachment 528323
Backup script to reproduce the problem

Comment 3 Amr Hamdy 2011-10-15 14:47:03 UTC
Created attachment 528325
Crash dump created automatically with RHEL 6.1 kernel

Comment 4 Amr Hamdy 2011-10-15 14:48:40 UTC
Created attachment 528326
Crash dump created automatically with RHEL 6.2 beta kernel

Comment 5 Amr Hamdy 2011-11-09 15:51:45 UTC
Hi,
Any updates?
Is there a beta kernel we can test?

Thanks

Comment 6 Josef Bacik 2011-11-09 16:10:18 UTC
This has been put off until 6.3; I cannot release beta kernels with the fix yet.

Comment 7 Josef Bacik 2011-11-09 16:10:59 UTC
I should say that I've tested it with my backport and the problem is fixed, so it will be fixed in 6.3.

Comment 8 Amr Hamdy 2011-12-21 21:59:47 UTC
Hi Josef,
Can we get a test/alpha/beta kernel with the fix?

Comment 10 Amr Hamdy 2012-04-25 10:12:00 UTC
Has this been fixed in the RHEL 6.3 beta?

Comment 11 Rodrigo A B Freire 2012-05-07 14:03:48 UTC
Hi Amr,

Unfortunately, the fix has not yet made it into RHEL 6.3.

However, we have a workaround for this:

1) Unmount your btrfs volume
2) Check it with btrfsck
3) Remount it
4) Then, delete your snapshot.
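
The four steps above might look like this on the command line. This is a sketch under stated assumptions: the function names and the DRY_RUN switch are hypothetical, /dev/sdX is a placeholder for the real block device, and the mount point and snapshot path are the ones from the original report.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN is set, otherwise run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Unmount, check, remount, then delete one snapshot; stop at the first failure.
# Arguments: block device, mount point, snapshot path.
delete_snapshot_after_check() {
    dev=$1
    mnt=$2
    snap=$3
    run umount "$mnt" &&
    run btrfsck "$dev" &&
    run mount "$dev" "$mnt" &&
    run btrfs subvolume delete "$snap"
}
```

For example, DRY_RUN=1 delete_snapshot_after_check /dev/sdX /data /data/_backup/backup.8 prints the four commands in order without executing them.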

Hope that helps,

- RF

Comment 13 Josef Bacik 2012-06-07 14:19:54 UTC

*** This bug has been marked as a duplicate of bug 698324 ***