759776 – kernel 3.1.2-1.fc16.x86_64 gets recursive faults after "Assertion `((reloc->r_info) & 0xffffffff) == 8' failed!"

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	16
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Kernel Maintainer List
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	kernel_hibernate

Reported:	2011-12-03 18:24 UTC by Michal Jaegermann
Modified:	2012-05-14 17:56 UTC
CC List:	6 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2012-05-14 17:56:19 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
Traces from a series of faults with 3.1.2-1.fc16.x86_64 kernel (25.51 KB, text/plain) 2011-12-03 18:24 UTC, Michal Jaegermann
dmesg from a normal boot on the same hardware with 3.1.2-1.fc16.x86_64 kernel (67.56 KB, text/plain) 2011-12-03 18:25 UTC, Michal Jaegermann

Description Michal Jaegermann 2011-12-03 18:24:32 UTC

Created attachment 540123 [details]
Traces from a series of faults with 3.1.2-1.fc16.x86_64 kernel

Description of problem:

For no obvious reason I suddenly got "kernel:[ 7067.129040] general protection fault: 0000 [#1] SMP".  dmesg revealed a series of recursive faults punctuated by "Fixing recursive fault but reboot is needed!" and indeed, shortly afterwards, the laptop (an Asus K52Jc) locked up entirely and required a hard reboot. Before the freeze I found in dmesg: "umount[7441]: Inconsistency detected by ld.so: ../sysdeps/x86_64/dl-machine.h: 460: elf_machine_rela_relative: Assertion `((reloc->r_info) & 0xffffffff) == 8' failed!".
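
For readers unfamiliar with the ld.so assertion quoted above: in the standard ELF64 layout, the low 32 bits of r_info hold the relocation type and the high 32 bits hold the symbol index, and type 8 on x86-64 is R_X86_64_RELATIVE. The following is only a minimal sketch of that check under those assumptions; the function names are hypothetical and this is not the glibc code.

```python
# Sketch of the check behind the ld.so assertion, assuming the ELF64
# r_info layout (symbol index in the high 32 bits, type in the low 32 bits).
# All function names here are hypothetical, chosen for illustration only.

R_X86_64_RELATIVE = 8  # relocation type number defined by the x86-64 psABI


def elf64_r_sym(r_info: int) -> int:
    """Return the symbol index stored in the high 32 bits of r_info."""
    return (r_info >> 32) & 0xFFFFFFFF


def elf64_r_type(r_info: int) -> int:
    """Return the relocation type stored in the low 32 bits of r_info."""
    return r_info & 0xFFFFFFFF


def is_relative_reloc(r_info: int) -> bool:
    """Mirror the asserted condition: ((r_info) & 0xffffffff) == 8."""
    return elf64_r_type(r_info) == R_X86_64_RELATIVE


if __name__ == "__main__":
    good = 8                 # a relative relocation with symbol index 0
    flipped = good ^ 0x10    # the same entry with one bit changed
    print(is_relative_reloc(good), is_relative_reloc(flipped))
```

An entry that the dynamic linker expects to be relative but whose type field reads anything other than 8 would trip this check, which is at least consistent with the relocation data having been altered in memory.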

Traces from these faults, and also dmesg from a normal boot, are attached.

abrtd was running at the time but failed to catch anything.

Version-Release number of selected component (if applicable):
kernel-3.1.2-1.fc16.x86_64

How reproducible:
No idea.  It has happened once so far.

Additional info:
This happened on a kernel that had been running for a while after a "thaw" boot, while I was experimenting with "hibernate" and trying to get it to work (it can be coaxed into working but, together with suspend, it fails dismally in the default configuration; see bug 697150).

Got that surprise shortly after upgrading to Fedora 16.  Before that this machine was running Fedora 14 and I do not recall any similar incident then.

Comment 1 Michal Jaegermann 2011-12-03 18:25:54 UTC

Created attachment 540124 [details]
dmesg from a normal boot on the same hardware with 3.1.2-1.fc16.x86_64 kernel

Comment 2 Stanislaw Gruszka 2011-12-03 22:01:39 UTC

We have hibernate memory corruption problems with the i915 driver on RHEL6 (bug 746169). I didn't try to reproduce the problem on an upstream/Fedora kernel, but I think this could be the same issue. As a workaround you can use i915.modeset=0 or avoid hibernate; I think suspend-to-ram is not affected.
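
When testing the i915.modeset=0 workaround, it can help to confirm that the parameter actually reached the running kernel. The following is a minimal sketch, assuming the boot command line is readable from /proc/cmdline; the helper name kernel_param is hypothetical, and splitting on whitespace is a simplification that ignores quoted values.

```python
# Sketch: look up a parameter on the kernel command line.
# kernel_param is a hypothetical helper, not part of any existing tool.

def kernel_param(cmdline: str, name: str):
    """Return the last value given for `name`, '' for a bare flag,
    or None if the parameter does not appear at all."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name:
            value = val if sep else ""
    return value


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            cmdline = f.read()
    except OSError:
        cmdline = ""  # not on Linux, or /proc is unavailable
    print("i915.modeset =", kernel_param(cmdline, "i915.modeset"))
```

A result of "0" would indicate that the workaround is in effect for the current boot; None means the parameter was not passed.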

Comment 3 Michal Jaegermann 2011-12-04 03:39:39 UTC

(In reply to comment #2)
> We have hibernate memory corruption problems with i915 driver on RHEL6 (bug
> 746169). I didn't try to reproduce problem on upstream/fedora kernel, but I
> think this could be the same issue.

Hm, maybe.  But on RHEL6 these are 2.6.32-... kernels.  Right?  On this laptop I went through the whole series of Fedora 14 2.6.35... kernels, and this is the first time I have been hit by something like that.  Maybe I was just lucky?

> As workaround you can use i915.modeset=0,
> or not use hibernate, I think suspend-to-ram is not affected.

That laptop also has built-in Nvidia graphics, but nouveau refuses to find any displays, and so far I have not bothered with the binary-only Nvidia driver.

Comment 4 Dave Jones 2012-03-22 16:51:43 UTC

[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.

Comment 5 Dave Jones 2012-03-22 16:55:56 UTC

[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.

Comment 6 Dave Jones 2012-03-22 17:06:41 UTC

[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.

Comment 7 Michal Jaegermann 2012-03-22 22:42:52 UTC

(In reply to comment #6)
> kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
> Please retest with this update.
(Oh, one "please" instead of three would be enough :-)

I tried the "hibernate-thaw" cycle a few times with the 3.3.0-4.fc16 kernel on the same laptop as before, and so far nothing bad has happened.  OTOH this is not an error that showed up with any consistency.  If you think that there are good reasons for this bug to be gone, then close this report and I will hope not to have reasons to reopen it. :-)

Comment 8 Dave Jones 2012-03-23 14:22:26 UTC

The i915 bug is still unfixed, so the problem is likely still lurking; you have just been lucky so far.

Comment 9 Josh Boyer 2012-03-28 17:59:38 UTC

[Mass hibernate bug update]

Dave Airlie has found an issue causing some corruption in the i915 fbdev after a resume from hibernate.  I have included his patch in this scratch build:

http://koji.fedoraproject.org/koji/taskinfo?taskID=3940545

This will probably not solve all of the issues being tracked at the moment, but it is worth testing when the build completes.  If this seems to clear up the issues you see with hibernate, please report your results in the bug.

Comment 10 Michal Jaegermann 2012-03-28 19:13:13 UTC

(In reply to comment #9)
> If this seems to clear up the
> issues you see with hibernate, please report your results in the bug.

There is, unfortunately, a catch.  The problem I observed happens only sporadically and unpredictably.  So, as noted in comment #7, kernel 3.3.0-4.fc16 also appeared to "clear" the bug, but Dave Jones said in comment 8 that this was only a lucky illusion.

Still, I will try the build you mention, but how do I know that I am not just "lucky" again unless I trigger the issue?

Comment 11 Michal Jaegermann 2012-03-29 05:54:35 UTC

I have now gone through six "hibernate-thaw" cycles with kernel-3.3.0-7.1.fc16, one after another but interspersed with small bits of desktop activity, and so far nothing bad has happened.  I will leave it to others to interpret this observation. :-)
