Bug 768416

Summary:

Machine locks up

Product:

[Fedora] Fedora

Reporter:

Pascal Patry <iscy>

Component:

kernel

Assignee:

Kernel Maintainer List <kernel-maint>

Status:

CLOSED ERRATA

QA Contact:

Fedora Extras Quality Assurance <extras-qa>

Severity:

unspecified

Priority:

unspecified

CC:

gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda

Hardware:

x86_64

OS:

Linux

Doc Type:

Bug Fix

Last Closed:

2012-02-07 19:10:04 UTC

Attachments:

Description	Flags
/var/log/messages	none
/var/log/messages	none

Description Pascal Patry 2011-12-16 15:42:14 UTC

Created attachment 547848
/var/log/messages

Description of problem:
Machine locks up after 24 to 48 hours of uptime.

Version-Release number of selected component (if applicable):
Fedora 16 - Kernel 3.1.5-1

How reproducible:
I haven't noticed any trigger other than time.

Additional info:
Interesting part of /var/log/messages has been attached.

I used to get _raw_spin_lock issues on Kernel 3.1.0 and since I updated, this problem started to occur.

Comment 1 Josh Boyer 2011-12-16 16:05:50 UTC

Can you recreate this without the nvidia module loaded?

Comment 2 Pascal Patry 2011-12-16 16:21:29 UTC

Short answer: Yes, long answer..

I have another machine, running on:
Linux sheol 2.6.33.6-147.fc13.x86_64 #1 SMP

Yes, I agree, a bit old, but it easily reaches uptimes of more than 200 days. That computer doesn't have the same hardware, but it has the same silent graphics card and uses the exact same nvidia module. I know that it taints the kernel, and that the two kernels are different versions, but it has proved itself to be quite stable.

If you really want me to disable that module and reproduce it, I can do it.

Comment 3 Josh Boyer 2011-12-16 16:29:39 UTC

(In reply to comment #2)
> Short answer: Yes, long answer..
> 
> I have another machine, running on:
> Linux sheol 2.6.33.6-147.fc13.x86_64 #1 SMP

That's irrelevant to this bug report, sorry.

> If you really want me to disable that module and reproduce it, I can do it.

Disabling the nvidia module and reproducing on the 3.1.5 kernel is really the only way to make progress here.

Comment 4 Pascal Patry 2011-12-16 16:32:59 UTC

Sure, I also grabbed the debug pkg to have more info. I'll post as soon as I have reproduced it.

Comment 5 Pascal Patry 2011-12-28 06:14:22 UTC

Created attachment 549789
/var/log/messages

As promised, this is the /var/log/messages including the kernel stack of this problem without 'nvidia' tainting the Kernel. It took 11 days before locking up.

Kernel is 3.1.5-2.fc16.x86_64

Comment 6 Josh Boyer 2012-01-04 20:37:36 UTC

We have a similar oops in _raw_spin_lock from a different user in bug 771559.  They hit this quite a while after they resumed from a suspend.  Did you happen to also resume from a suspend/hibernate at some point during the uptime?

Comment 7 Pascal Patry 2012-01-04 23:16:02 UTC

No, this workstation never goes to sleep/suspend. It runs 24/7 and doesn't even have a screen saver...

User interaction and/or system load is not necessary either. Most of the time, it locks up while being used, but it has also happened overnight. I also got it after ~38 hours of uptime.

Comment 8 Pascal Patry 2012-01-14 21:39:18 UTC

Still reproducible with latest kernel pkg (3.1.8-2.fc16.x86_64).

Comment 9 Pascal Patry 2012-01-14 21:49:12 UTC

Looks like someone put his finger on this issue a few days ago:
https://lkml.org/lkml/2012/1/9/114

Comment 10 Pascal Patry 2012-02-07 18:35:39 UTC

Currently on 3.2.2-1.fc16.x86_64 with an uptime of six and a half days. No issues to report yet. If the problem was really caused by the issue linked in comment #9, then 3.2.2 has the fix and I shouldn't be able to reproduce it.

Comment 11 Josh Boyer 2012-02-07 19:10:04 UTC

Agreed.  Let's close this one out for now.  If you see it again on 3.2.2 or newer, please reopen.