Bug 701857 - hibernate cause kernel panic
Summary: hibernate cause kernel panic
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.1
Hardware: All
OS: Linux
medium
high
Target Milestone: rc
: ---
Assignee: Stanislaw Gruszka
QA Contact: Guangze Bai
URL:
Whiteboard:
Duplicates: 698061
Depends On:
Blocks: 702988 746169 748554
 
Reported: 2011-05-04 04:55 UTC by Caspar Zhang
Modified: 2015-02-08 21:36 UTC (History)

Fixed In Version: kernel-2.6.32-211.el6
Doc Type: Bug Fix
Doc Text:
Cause: Hibernating on certain laptops, including the Lenovo T400 and X200. Consequence: The kernel could occasionally panic.
Clone Of:
: 746169 (view as bug list)
Environment:
Last Closed: 2011-12-06 13:21:44 UTC
Target Upstream Version:


Attachments
checkmem.c (1.63 KB, text/plain)
2011-10-14 07:44 UTC, Stanislaw Gruszka
test_hib.sh (789 bytes, text/plain)
2011-10-14 07:49 UTC, Stanislaw Gruszka
0001-PM-Hibernate-Fix-memory-corruption-related-to-swap.patch (7.22 KB, text/plain)
2011-10-14 08:18 UTC, Stanislaw Gruszka


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2011:1530 normal SHIPPED_LIVE Moderate: Red Hat Enterprise Linux 6 kernel security, bug fix and enhancement update 2011-12-06 01:45:35 UTC

Description Caspar Zhang 2011-05-04 04:55:32 UTC
Description of problem:

While running v7's suspend/hibernate certification test, the system crashes on RHEL6. Investigation showed that pm-hibernate/pm-suspend may trigger the panic. The affected hardware is T400 and X200 laptops.

Version-Release number of selected component (if applicable):
kernel-2.6.32-71.el6, -131.0.10.el6, -131.0.13.el6

How reproducible:
very often

Steps to Reproduce:
1. 
2.
3.
  
Actual results:


Expected results:


Additional info:
Will provide more info soon.

Comment 2 RHEL Product and Program Management 2011-05-04 06:01:24 UTC
Since RHEL 6.1 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 4 Caspar Zhang 2011-05-04 09:31:52 UTC
Seems similar to bug 613493.

Comment 5 Caspar Zhang 2011-05-04 11:35:49 UTC
*** Bug 698061 has been marked as a duplicate of this bug. ***

Comment 6 Stanislaw Gruszka 2011-05-09 20:21:10 UTC
Can you check this upstream commit 2e725a065b0153f0c449318da1923a120477633d
"PM / Hibernate: Return error code when alloc_image_page() fails" ?
It could help with the first trace, where we try to free bits that were never allocated (in that out-of-memory case hibernation will simply fail instead of panicking). I have no idea about the second trace.
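
For illustration only, the general pattern the commit title describes (report an allocation failure to the caller instead of carrying on with a partly built image) might look like the user-space sketch below. This is not the upstream patch; `alloc_page_or_null()` and `alloc_pages_checked()` are hypothetical names invented here, and `malloc()` stands in for the kernel page allocator.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a page allocator that may fail on demand. */
static void *alloc_page_or_null(int fail)
{
    return fail ? NULL : malloc(4096);
}

/*
 * Allocate n pages into pages[]. If allocation number fail_at fails,
 * release only the pages that were really obtained and return -ENOMEM,
 * so the caller never tries to free something that was not allocated.
 * Pass fail_at = -1 to simulate no failure.
 */
static int alloc_pages_checked(void **pages, int n, int fail_at)
{
    assert(pages != NULL);
    for (int i = 0; i < n; i++) {
        pages[i] = alloc_page_or_null(i == fail_at);
        if (!pages[i]) {
            while (i-- > 0) {
                free(pages[i]);
                pages[i] = NULL;
            }
            return -ENOMEM;
        }
    }
    return 0;
}
```

With this shape the caller sees a plain error code and can abort hibernation cleanly, which matches the "just fail, instead of panic" behavior described above.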

Comment 8 Qian Cai 2011-05-17 07:41:43 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause
    Hibernating on certain laptops, including the Lenovo T400
    and X200.
Consequence
    The kernel could occasionally panic.

Comment 20 Stanislaw Gruszka 2011-09-08 08:58:42 UTC
That's what I get so far:
> =============================================================================
> BUG kmalloc-128 (Not tainted): Redzone overwritten
> -----------------------------------------------------------------------------
> 
> INFO: 0xffff880036dcd210-0xffff880036dcd217. First byte 0x0 instead of 0xcc
> INFO: Allocated in alloc_vmap_area+0x57/0x380 age=52867 cpu=1 pid=2472
> INFO: Freed in i915_gem_execbuffer2+0xe8/0x210 [i915] age=52869 cpu=1 pid=2472
> INFO: Slab 0xffffea0000c004d8 objects=20 used=17 fp=0xffff880036dcd960 flags=0x20000000000083
> INFO: Object 0xffff880036dcd190 @offset=400 fp=0x(null)
> 
> Bytes b4 0xffff880036dcd180:  75 f8 ff ff 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a u<F8><FF><FF>....ZZZZZZZZ
>   Object 0xffff880036dcd190:  00 70 91 05 00 c9 ff ff 00 c0 91 05 00 c9 ff ff .p...<C9><FF><FF>.<C0>...<C9><FF><FF>
>   Object 0xffff880036dcd1a0:  06 00 00 00 00 00 00 00 a9 d1 dc 36 00 88 ff ff ........<A9><D1><DC>6..<FF><FF>
>   Object 0xffff880036dcd1b0:  d0 db dc 36 00 88 ff ff 70 22 4a 57 00 88 ff ff <D0><DB><DC>6..<FF><FF>p"JW..<FF><FF>
>   Object 0xffff880036dcd1c0:  c8 d8 dc 36 00 88 ff ff 00 02 20 00 00 00 ad de <C8><D8><DC>6..<FF><FF>......<AD><DE>
>   Object 0xffff880036dcd1d0:  d8 d8 dc 36 00 88 ff ff a0 d9 dc 36 00 88 ff ff <D8><D8><DC>6..<FF><FF>.<D9><DC>6..<FF><FF>
>   Object 0xffff880036dcd1e0:  f8 d7 57 76 00 88 ff ff f0 d8 dc 36 00 88 ff ff <F8><D7>Wv..<FF><FF><F0><D8><DC>6..<FF><FF>
>   Object 0xffff880036dcd1f0:  b0 cd 15 81 ff ff ff ff 6b 6b 6b 6b 6b 6b 6b 6b <B0><CD>..<FF><FF><FF><FF>kkkkkkkk
>   Object 0xffff880036dcd200:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
>  Redzone 0xffff880036dcd210:  00 00 00 00 00 00 00 00                         ........        
>  Padding 0xffff880036dcd250:  5a 5a 5a 5a 5a 5a 5a 5a                         ZZZZZZZZ        
> Pid: 4, comm: ksoftirqd/0 Not tainted 2.6.32 #3
> Call Trace:
>  <IRQ>  [<ffffffff81174f52>] ? print_trailer+0x102/0x170
>  [<ffffffff811755ae>] ? check_bytes_and_report+0xfe/0x140
>  [<ffffffff8115cdc2>] ? rcu_free_va+0x12/0x20
>  [<ffffffff8117782a>] ? check_object+0x6a/0x250
>  [<ffffffff8115cdc2>] ? rcu_free_va+0x12/0x20
>  [<ffffffff81178013>] ? __slab_free+0x1f3/0x320
>  [<ffffffff8115cdc2>] ? rcu_free_va+0x12/0x20
>  [<ffffffff811782ae>] ? kfree+0x16e/0x1d0
>  [<ffffffff8115cdc2>] ? rcu_free_va+0x12/0x20
>  [<ffffffff810f0b3d>] ? __rcu_process_callbacks+0x12d/0x3e0
>  [<ffffffff810f0e1b>] ? rcu_process_callbacks+0x2b/0x50
>  [<ffffffff81075bfd>] ? __do_softirq+0xdd/0x200
>  [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
>  <EOI>  [<ffffffff8100dfdd>] ? do_softirq+0xad/0xe0
>  [<ffffffff81075530>] ? ksoftirqd+0x80/0x120
>  [<ffffffff810754b0>] ? ksoftirqd+0x0/0x120
>  [<ffffffff81095a16>] ? kthread+0x96/0xa0
>  [<ffffffff8100c20a>] ? child_rip+0xa/0x20
>  [<ffffffff8100bb50>] ? restore_args+0x0/0x30
>  [<ffffffff81095980>] ? kthread+0x0/0xa0
>  [<ffffffff8100c200>] ? child_rip+0x0/0x20
> FIX kmalloc-128: Restoring 0xffff880036dcd210-0xffff880036dcd217=0xcc

This seems to point at the i915 driver. Do the other T400 and X200 laptops also have Intel graphics hardware?
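
To make the report above easier to read: the object starts at 0xffff880036dcd190 and is 128 bytes long, so its redzone begins at 0xffff880036dcd210, and the allocator expects every redzone byte to be 0xcc while the object is in use. The following is a minimal user-space sketch of that kind of check, not the kernel's SLUB code; `arm_redzone()` and `check_redzone()` are hypothetical helpers, and the 8-byte redzone length is taken from the address range in the log.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RED_ACTIVE  0xcc  /* value the log expects in the redzone */
#define REDZONE_LEN 8     /* the log reports an 8-byte redzone */

/* Hypothetical layout: payload_len bytes of object, then the redzone. */
static void arm_redzone(unsigned char *obj, size_t payload_len)
{
    assert(obj != NULL);
    memset(obj + payload_len, RED_ACTIVE, REDZONE_LEN);
}

/*
 * Return the offset inside the redzone of the first byte that no longer
 * holds RED_ACTIVE, or -1 if the redzone is intact.
 */
static int check_redzone(const unsigned char *obj, size_t payload_len)
{
    assert(obj != NULL);
    for (int i = 0; i < REDZONE_LEN; i++)
        if (obj[payload_len + i] != RED_ACTIVE)
            return i;
    return -1;
}
```

A write that runs past the end of the 128-byte payload, or a stale page restored over the slab, would zero those bytes and produce a "First byte 0x0 instead of 0xcc" style report when the object is freed and checked.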

Comment 21 Caspar Zhang 2011-09-08 09:49:42 UTC
(In reply to comment #20)
> What seems to blame i915 driver. Does other laptops T400 and X200 have also
> intel graphics hardware?

Yes.

Comment 22 Stanislaw Gruszka 2011-10-12 11:36:50 UTC
There are at least two different issues here. One is a race between swapping and hibernation; the others are related to the graphics driver. I can fix the former, since we have upstream patches for that. For the graphics driver problems, I will open separate bug report(s).

Comment 23 Stanislaw Gruszka 2011-10-14 07:44:44 UTC
Created attachment 528161
checkmem.c

Simple program to check for memory corruption in user space.
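
The attachment itself is not reproduced in this report. As a rough idea of how such a checker can work, here is a minimal sketch written for this page, not the attached checkmem.c: fill a buffer with a known pattern before hibernating and scan it again after resume. The names `expected_word()`, `fill_pattern()` and `find_corruption()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Value expected at word index i. Mixing the index into the pattern
 * means a page restored to the wrong location is detected as well.
 */
static uint64_t expected_word(size_t i)
{
    return 0xa5a5a5a5a5a5a5a5ULL ^ (uint64_t)i;
}

/* Fill nwords 64-bit words with the index-dependent pattern. */
static void fill_pattern(uint64_t *buf, size_t nwords)
{
    assert(buf != NULL);
    for (size_t i = 0; i < nwords; i++)
        buf[i] = expected_word(i);
}

/* Return the index of the first corrupted word, or nwords if intact. */
static size_t find_corruption(const uint64_t *buf, size_t nwords)
{
    assert(buf != NULL);
    for (size_t i = 0; i < nwords; i++)
        if (buf[i] != expected_word(i))
            return i;
    return nwords;
}
```

A real test would presumably allocate a buffer large enough to push part of it out to swap, call `fill_pattern()` once, and then call `find_corruption()` in a loop across hibernate/resume cycles.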

Comment 24 Stanislaw Gruszka 2011-10-14 07:49:19 UTC
Created attachment 528163
test_hib.sh

Script that can be used to reproduce this bug. It hibernates, reboots and resumes in a loop and checks memory using the previously attached program. When corruption is encountered it will print an error and sleep forever, or the system will crash.

Comment 25 Stanislaw Gruszka 2011-10-14 08:18:00 UTC
Created attachment 528166
0001-PM-Hibernate-Fix-memory-corruption-related-to-swap.patch

Proposed fix.

Comment 28 Aristeu Rozanski 2011-10-19 15:29:22 UTC
Patch(es) available on kernel-2.6.32-211.el6

Comment 33 errata-xmlrpc 2011-12-06 13:21:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2011-1530.html

