Bug 768416 - Machine locks up
Summary: Machine locks up
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 16
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-12-16 15:42 UTC by Pascal Patry
Modified: 2012-02-07 19:10 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-02-07 19:10:04 UTC
Type: ---


Attachments
/var/log/messages (60.95 KB, text/plain)
2011-12-16 15:42 UTC, Pascal Patry
/var/log/messages (52.42 KB, text/plain)
2011-12-28 06:14 UTC, Pascal Patry

Description Pascal Patry 2011-12-16 15:42:14 UTC
Created attachment 547848
/var/log/messages

Description of problem:
Machine locks up after 24 to 48 hours of uptime.

Version-Release number of selected component (if applicable):
Fedora 16 - Kernel 3.1.5-1

How reproducible:
I haven't noticed any trigger other than time.

Additional info:
The relevant part of /var/log/messages is attached.

I used to get _raw_spin_lock issues on kernel 3.1.0; this problem started to occur after I updated.

Comment 1 Josh Boyer 2011-12-16 16:05:50 UTC
Can you recreate this without the nvidia module loaded?

Comment 2 Pascal Patry 2011-12-16 16:21:29 UTC
Short answer: Yes, long answer..

I have another machine, running on:
Linux sheol 2.6.33.6-147.fc13.x86_64 #1 SMP

Yes, I agree, it's a bit old, but it easily reaches uptimes of more than 200 days. That computer doesn't have the same hardware, but it has the same silent graphics card and uses the exact same nvidia module. I know that the module taints the kernel, and that the two kernels are different versions, but it has proved to be quite stable.

If you really want me to disable that module and reproduce it, I can do it.

Comment 3 Josh Boyer 2011-12-16 16:29:39 UTC
(In reply to comment #2)
> Short answer: Yes, long answer..
> 
> I have another machine, running on:
> Linux sheol 2.6.33.6-147.fc13.x86_64 #1 SMP

That's irrelevant to this bug report, sorry.

> If you really want me to disable that module and reproduce it, I can do it.

Disabling the nvidia module and reproducing on the 3.1.5 kernel is really the only way to make progress here.

Comment 4 Pascal Patry 2011-12-16 16:32:59 UTC
Sure. I also grabbed the debug package to get more info. I'll post as soon as I have reproduced it.

Comment 5 Pascal Patry 2011-12-28 06:14:22 UTC
Created attachment 549789
/var/log/messages

As promised, this is the /var/log/messages, including the kernel stack trace for this problem, captured without 'nvidia' tainting the kernel. It took 11 days before locking up.

Kernel is 3.1.5-2.fc16.x86_64

Comment 6 Josh Boyer 2012-01-04 20:37:36 UTC
We have a similar oops in _raw_spin_lock from a different user in bug 771559.  They hit this quite a while after they resumed from a suspend.  Did you happen to also resume from a suspend/hibernate at some point during the uptime?

Comment 7 Pascal Patry 2012-01-04 23:16:02 UTC
No, this workstation never goes to sleep/suspend. It runs 24/7 and doesn't even have a screen saver...

Neither user interaction nor load is necessary. Most of the time it locks up while in use, but it has also happened overnight. I have also hit it after ~38 hours of uptime.

Comment 8 Pascal Patry 2012-01-14 21:39:18 UTC
Still reproducible with the latest kernel package (3.1.8-2.fc16.x86_64).

Comment 9 Pascal Patry 2012-01-14 21:49:12 UTC
Looks like someone put their finger on this issue a few days ago:
https://lkml.org/lkml/2012/1/9/114

Comment 10 Pascal Patry 2012-02-07 18:35:39 UTC
Currently on 3.2.2-1.fc16.x86_64 with an uptime of six and a half days. No issues to report yet. If the problem was really caused by the issue linked in comment #9, then 3.2.2 has the fix and I shouldn't be able to reproduce it.

Comment 11 Josh Boyer 2012-02-07 19:10:04 UTC
Agreed.  Let's close this one out for now.  If you see it again on 3.2.2 or newer, please reopen.
