Red Hat Bugzilla – Bug 759776
kernel 3.1.2-1.fc16.x86_64 gets recursive faults after "Assertion `((reloc->r_info) & 0xffffffff) == 8' failed!"
Last modified: 2012-05-14 13:56:19 EDT
Created attachment 540123
Traces from a series of faults with 3.1.2-1.fc16.x86_64 kernel
Description of problem:
For no obvious reason I suddenly got "kernel:[ 7067.129040] general protection fault: 0000 [#1] SMP". dmesg revealed a series of recursive faults punctuated by "Fixing recursive fault but reboot is needed!" and indeed, shortly afterwards, the laptop (an Asus K52Jc) locked up entirely and required a hard reboot. Before the freeze I found in dmesg: "umount: Inconsistency detected by ld.so: ../sysdeps/x86_64/dl-machine.h: 460: elf_machine_rela_relative: Assertion `((reloc->r_info) & 0xffffffff) == 8' failed!".
Traces from these faults, and also dmesg from a normal boot, are attached.
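For context on the ld.so message: the asserted expression masks the low 32 bits of `r_info`, which on x86_64 hold the relocation type, and compares them with 8, the value of `R_X86_64_RELATIVE`. The following is a minimal Python sketch of that check, assuming the standard ELF64 layout (symbol index in the high 32 bits, type in the low 32 bits). The helper names and the sample "corrupted" value are hypothetical and are not taken from the attached traces.

```python
# Relocation type constant for x86_64 (value 8 in the ELF ABI).
R_X86_64_RELATIVE = 8


def elf64_r_type(r_info: int) -> int:
    """Return the relocation type: the low 32 bits of r_info."""
    return r_info & 0xFFFFFFFF


def elf64_r_sym(r_info: int) -> int:
    """Return the symbol index: the high 32 bits of r_info."""
    return (r_info >> 32) & 0xFFFFFFFF


def is_relative_reloc(r_info: int) -> bool:
    """Mirror the asserted condition: ((r_info) & 0xffffffff) == 8."""
    return elf64_r_type(r_info) == R_X86_64_RELATIVE


# A well-formed relative relocation: symbol index 0, type 8.
good = 0x0000000000000008

# Hypothetical illustration only: one flipped bit changes the type field.
corrupted = good ^ 0x10

print(is_relative_reloc(good))       # condition holds
print(is_relative_reloc(corrupted))  # condition fails, as in the assertion
```

A failure of this check inside a binary as basic as umount would be consistent with in-memory corruption of the relocation table rather than with a broken file on disk, although the sketch itself cannot distinguish between the two.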
abrtd was running at the time but failed to catch anything.
Version-Release number of selected component (if applicable):
No idea. It has happened only once so far.
The kernel had been running for a while after a "thaw" boot; I was experimenting with hibernate, trying to get it to work (it can be coaxed into working but, together with suspend, fails dismally in a default configuration - bug 697150).
Got that surprise shortly after upgrading to Fedora 16. Before that this machine was running Fedora 14 and I do not recall any similar incident then.
Created attachment 540124
dmesg from a normal boot on the same hardware with 3.1.2-1.fc16.x86_64 kernel
We have hibernate memory corruption problems with the i915 driver on RHEL6 (bug 746169). I didn't try to reproduce the problem on an upstream/Fedora kernel, but I think this could be the same issue. As a workaround you can use i915.modeset=0 or avoid hibernate; I think suspend-to-ram is not affected.
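One way to confirm that such a workaround is actually in effect after a reboot is to inspect the kernel command line. The following is a minimal Python sketch, assuming the parameter is passed on the boot command line (which Linux exposes in /proc/cmdline). The helper name `kernel_param` and the sample command line are hypothetical, and returning the last occurrence of a repeated parameter is an assumption of this sketch, not a statement about kernel behaviour.

```python
from typing import Optional


def kernel_param(cmdline: str, name: str) -> Optional[str]:
    """Return the value of `name=value` from a kernel command line string.

    Returns None if the parameter is absent. If it appears more than once,
    the last occurrence is returned (an assumption made for this sketch).
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if sep and key == name:
            value = val
    return value


# Made-up example command line, for illustration only.
sample = "ro root=/dev/sda1 rhgb quiet i915.modeset=0"
print(kernel_param(sample, "i915.modeset"))

# On a running Linux system the real command line could be read like this:
# with open("/proc/cmdline") as f:
#     print(kernel_param(f.read(), "i915.modeset"))
```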
(In reply to comment #2)
> We have hibernate memory corruption problems with the i915 driver on RHEL6 (bug
> 746169). I didn't try to reproduce the problem on an upstream/Fedora kernel, but I
> think this could be the same issue.
Hm, maybe. But on RHEL6 these are 2.6.32-... kernels, right? I went through the whole series of Fedora 14 2.6.35... kernels on this laptop and this is the first time I have been hit by something like that. Maybe I was just lucky?
> As a workaround you can use i915.modeset=0
> or avoid hibernate; I think suspend-to-ram is not affected.
That laptop also has built-in Nvidia video, but nouveau refuses to find any displays, and so far I have not bothered with the Nvidia binary-only stuff.
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.
(In reply to comment #6)
> kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
> Please retest with this update.
(Oh, one "please" instead of three would be enough :-)
I tried the "hibernate-thaw" cycle a few times with the 3.3.0-4.fc16 kernel on the same laptop as before and so far nothing bad has happened. OTOH this is not an error that showed up with absolute consistency. If you think there are good reasons for this bug to be gone, then close this report and I will hope not to have reasons to reopen it. :-)
The i915 bug is still unfixed, so it's likely the problem is still lurking; you just got lucky so far.
[Mass hibernate bug update]
Dave Airlie has found an issue causing some corruption in the i915 fbdev after a resume from hibernate. I have included his patch in this scratch build:
This will probably not solve all of the issues being tracked at the moment, but it is worth testing when the build completes. If this seems to clear up the issues you see with hibernate, please report your results in the bug.
(In reply to comment #9)
> If this seems to clear up the
> issues you see with hibernate, please report your results in the bug.
There is, unfortunately, a catch. The problem I observed happens only sporadically and unpredictably. So, as noted in comment #7, kernel 3.3.0-4.fc16 also appeared to "clear" that bug, but Dave Jones said in comment 8 that this was only a lucky illusion.
Still, I will try the build you mention, but how do I know that I am not just "lucky" again until I trigger the issue?
I have now gone through six "hibernate-thaw" cycles with kernel-3.3.0-7.1.fc16, one after another but interspersed with small bits of desktop activity, and so far nothing bad has happened. I will leave it to others to interpret this observation. :-)