Bug 707686

Summary: Kernel md raid code crash on amanda backup
Product: [Fedora] Fedora
Reporter: Trever Adams <trever>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED NOTABUG
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 15
CC: gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-10-14 14:07:40 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Attachments:
Description | Flags
screen shot of oops | none
Additional crash that looks slightly different | none
This may or may not have what was requested | none
Another backtrace from a freeze that may or may not be the same bug | none
This has things not seen in others | none
a few more backtraces | none

Description Trever Adams 2011-05-25 17:26:18 UTC
Created attachment 500885 [details]
screen shot of oops

Description of problem:
This may or may not be related to bug 704462. Whenever I run amdump with the virtual tapes on an md raid device, the system crashes. I have attached a screenshot of the crash, since that is the only way I can capture it.

Version-Release number of selected component (if applicable):
kernel-2.6.38.6-27.fc15.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a raid device
2. Layer the storage as disk -> raid -> lvm -> ext4
3. Set up amanda with virtual tapes on that raid device
4. amdump yourdumpname
5. Crash
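
The steps above do not give the exact commands used. A minimal sketch of one way such a stack could be built follows, written as a dry-run Python helper that only prints the commands. The device names (/dev/sdb1, /dev/sdc1, /dev/md0), RAID level 1, the volume group and logical volume names, the mount point, and the function name are all assumptions for illustration, not taken from this report.

```python
import shlex


def build_stack_commands(disks, md_dev="/dev/md0", vg="vg_amanda",
                         lv="lv_vtapes", mount_point="/srv/vtapes"):
    """Return the commands for disk -> raid -> lvm -> ext4 (hypothetical names)."""
    lv_path = f"/dev/{vg}/{lv}"
    return [
        # Step 1: create the md RAID device (RAID 1 is an assumption).
        ["mdadm", "--create", md_dev, "--level=1",
         f"--raid-devices={len(disks)}", *disks],
        # Step 2: layer LVM on the array, then ext4 on the logical volume.
        ["pvcreate", md_dev],
        ["vgcreate", vg, md_dev],
        ["lvcreate", "-l", "100%FREE", "-n", lv, vg],
        ["mkfs.ext4", lv_path],
        ["mount", lv_path, mount_point],
    ]


if __name__ == "__main__":
    # Dry run only: print the commands instead of executing them.
    for cmd in build_stack_commands(["/dev/sdb1", "/dev/sdc1"]):
        print(shlex.join(cmd))
```

The helper deliberately does not execute anything, because these commands destroy data on the named devices. Steps 3 and 4 then proceed as listed, with the amanda virtual tape directory placed on the mounted filesystem.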

Comment 1 Trever Adams 2011-05-29 18:09:31 UTC
Created attachment 501637 [details]
Additional crash that looks slightly different

This may or may not be the same crash, but this one is much cleaner. (Other problems have since been fixed, which should leave fewer unrelated problems in the trace.)

Comment 2 Chuck Ebbert 2011-06-15 14:33:21 UTC
This looks like a spinlock deadlock.

A program called rs:main is calling sys_futex() which eventually ends up calling _raw_spin_lock(), and it looks like that is deadlocking with a worker thread trying to dispatch a disk request. We'd really need to see several screens of the earlier messages to see exactly where it's deadlocking.

Comment 3 Trever Adams 2011-06-15 15:44:49 UTC
I would gladly do this, but the machine hard locks when this lockup happens, so I cannot scroll back. If you can provide me with instructions on somehow capturing all of this, I will do my best to do it.
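
One common way to capture kernel messages from a machine that hard locks is netconsole, which sends kernel log output as UDP packets to another host. A minimal sketch of a receiver for that second host is below. The port 6666 (netconsole's default target port), the log file name, and the function name are assumptions, and nothing in this report confirms that netconsole was used here. It also only helps if the network path on the crashing machine still works at the time of the lockup; a serial console is the usual alternative.

```python
import socket


def capture_netconsole(port=6666, log_path="netconsole.log", max_packets=None):
    """Append UDP datagrams received on `port` to `log_path`.

    Stops after `max_packets` datagrams, or runs until interrupted if None.
    Returns the number of datagrams written.
    """
    received = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock, \
            open(log_path, "ab") as log:
        sock.bind(("0.0.0.0", port))
        while max_packets is None or received < max_packets:
            data, _sender = sock.recvfrom(65535)
            log.write(data)
            log.flush()  # keep the file current in case the sender dies
            received += 1
    return received


if __name__ == "__main__":
    capture_netconsole()
```

On the crashing machine, netconsole would then be loaded with something like `modprobe netconsole netconsole=@/eth0,6666@192.168.1.10/`, where the interface name and the receiver address are hypothetical placeholders.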

Comment 4 Trever Adams 2011-06-18 15:50:14 UTC
Created attachment 505396 [details]
This may or may not have what was requested

I am not sure if this is related or not. It does have much more information.

Comment 5 Trever Adams 2011-06-20 13:34:01 UTC
Created attachment 505612 [details]
Another backtrace from a freeze that may or may not be the same bug

Comment 6 Trever Adams 2011-06-20 13:42:06 UTC
I should mention that any backtraces after June 16 at 6:16 AM MDT are from kernel-2.6.38.8-32.fc15.x86_64.

Comment 7 Trever Adams 2011-06-20 13:55:38 UTC
Created attachment 505619 [details]
This has things not seen in others

Comment 8 Trever Adams 2011-06-20 14:14:01 UTC
Created attachment 505622 [details]
a few more backtraces

I do not think I will post any more. While there are some unique parts, there appears to be a core that is repeated over and over. I imagine the trouble is there.

Comment 9 Trever Adams 2011-07-05 19:46:36 UTC
I switched from the Realtek 8169 to an Intel e100e PCIe card. I have not been able to reproduce any of these problems since, even under very heavy load. The processor is also much more idle: it was nearly fully used with the 8169, whereas with the latter card it is about 30-70% idle most of the time, and quite often more than 50%.

I do not know if the 8169 chipset is just broken or if the driver is, but the problem lies with one of the two.

Comment 10 Trever Adams 2011-10-14 14:07:40 UTC
I am closing this as NOTABUG as it is a kernel driver problem/hardware problem with Realtek 8169 and the like.