Bug 649871 - Fedora 13 kernel crashes with echo 3 > /proc/sys/vm/drop_caches
Summary: Fedora 13 kernel crashes with echo 3 > /proc/sys/vm/drop_caches
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 13
Hardware: x86_64
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2010-11-04 17:36 UTC by Lucas Meneghel Rodrigues
Modified: 2015-10-18 22:41 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-06-28 10:51:25 UTC


Attachments
line-fixed patch from upstream (2.20 KB, text/plain)
2011-02-20 03:52 UTC, Veaceslav Falico

Description Lucas Meneghel Rodrigues 2010-11-04 17:36:37 UTC
Description of problem: Fedora 13 kernel crashes with echo 3 > /proc/sys/vm/drop_caches

Version-Release number of selected component (if applicable): 2.6.35.6-44.fc14.x86_64

How reproducible: Always


Steps to Reproduce:
1. echo 3 > /proc/sys/vm/drop_caches
  
Actual results: The Fedora machine freezes.


Expected results: Memory caches are dropped.

Comment 1 Jon Masters 2010-11-18 08:36:56 UTC
OK, so it shouldn't crash. But the only valid reason to drop caches is testing anyway. If you're not doing testing and somehow read online that this was a good idea, it isn't a good idea at all in general.

Comment 2 Lucas Meneghel Rodrigues 2010-11-18 10:49:43 UTC
Test automation is my full-time job, and I triggered this problem while working with autotest on my laptop. Autotest drops caches between every test execution/benchmark by default, so for people like me, who write tests every day, make fresh clones of autotest, and forget to disable the cache dropping once in a while, this is *really* annoying.

Comment 3 Jon Masters 2010-11-18 20:02:06 UTC
I see that, thanks for the clarification. I understand that testing is going to hit this regularly...and it does need looking at. Have you compared with F14, etc.?

Comment 4 Veaceslav Falico 2011-02-20 03:50:02 UTC
Hi,

I've seen the same situation here and was able to capture a vmcore and analyse it. I'm on a T500 with i915 graphics, so it may be the same case.

When I do echo 2 (or 3) > /proc/sys/vm/drop_caches, X freezes (however, the wireless LED keeps blinking).

kernel version 2.6.34.7-66.fc13.x86_64

The patch from http://www.spinics.net/lists/stable-commits/msg09016.html fixes it.

From the vmcore (in short): there are two processes in the UN (uninterruptible sleep) state:

PID: 173    TASK: ffff8800378d9770  CPU: 1   COMMAND: "i915"
 #0 [ffff880073417c60] schedule at ffffffff8144b068
 #1 [ffff880073417d18] __mutex_lock_common at ffffffff8144b9ec
 #2 [ffff880073417da8] __mutex_lock_slowpath at ffffffff8144ba6c
 #3 [ffff880073417db8] mutex_lock at ffffffff8144bb8c
 #4 [ffff880073417de8] intel_idle_update at ffffffffa008b9be
 #5 [ffff880073417e38] worker_thread at ffffffff810622e5
 #6 [ffff880073417ee8] kthread at ffffffff81065cb9
 #7 [ffff880073417f48] kernel_thread_helper at ffffffff8100aa64

and

PID: 1846   TASK: ffff880073e58000  CPU: 0   COMMAND: "Xorg"
 #0 [ffff880075cd5b98] schedule at ffffffff8144b068
 #1 [ffff880075cd5c50] __mutex_lock_common at ffffffff8144b9ec
 #2 [ffff880075cd5ce0] __mutex_lock_slowpath at ffffffff8144ba6c
 #3 [ffff880075cd5cf0] mutex_lock at ffffffff8144bb8c
 #4 [ffff880075cd5d20] i915_gem_throttle_ioctl at ffffffffa007e35e
 #5 [ffff880075cd5d60] drm_ioctl at ffffffffa002c5f6
 #6 [ffff880075cd5e70] vfs_ioctl at ffffffff8111aa2f
 #7 [ffff880075cd5ea0] do_vfs_ioctl at ffffffff8111afa2
 #8 [ffff880075cd5f30] sys_ioctl at ffffffff8111b03e
 #9 [ffff880075cd5f80] system_call_fastpath at ffffffff81009c72
    RIP: 0000003d21cd8ae7  RSP: 00007fff5dc09988  RFLAGS: 00013246
    RAX: 0000000000000010  RBX: ffffffff81009c72  RCX: 00007f43a1745020
    RDX: 0000000000000000  RSI: 0000000000006458  RDI: 0000000000000008
    RBP: 0000000000006458   R8: 0000000000000001   R9: 0000000000000001
    R10: 0000000000000000  R11: 0000000000003246  R12: 0000000001e74720
    R13: 00000000007e2b00  R14: 0000000000000008  R15: 0000000000000000
    ORIG_RAX: 0000000000000010  CS: 0033  SS: 002b

Both are sleeping while trying to acquire the same mutex (the disassembly confirms this):

The i915 worker:

4482 static void intel_idle_update(struct work_struct *work)
4483 {   
4484         drm_i915_private_t *dev_priv = container_of(work, drm_i915_private_t,
4485                                                     idle_work);
4486         struct drm_device *dev = dev_priv->dev;
4487         struct drm_crtc *crtc;
4488         struct intel_crtc *intel_crtc;
4489         
4490         if (!i915_powersave)
4491                 return;
4492 
4493         mutex_lock(&dev->struct_mutex);

and Xorg:

4367 int
4368 i915_gem_throttle_ioctl(struct drm_device *dev, void *data,
4369                         struct drm_file *file_priv)
4370 {
4371     return i915_gem_ring_throttle(dev, file_priv);
4372 }

3504 static int
3505 i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file_priv)
3506 {
3507         struct drm_i915_file_private *i915_file_priv = file_priv->driver_priv;
3508         int ret = 0;
3509         unsigned long recent_enough = jiffies - msecs_to_jiffies(20);
3510 
3511         mutex_lock(&dev->struct_mutex);

So that's why the picture on the screen is frozen. Checking further, I found that the bash process that wrote to /proc was also stuck there, in i915's GEM code:

PID: 2932   TASK: ffff88003ced9770  CPU: 1   COMMAND: "bash"
 #0 [ffff88007590dc38] schedule at ffffffff8144b068
 #1 [ffff88007590dcf0] i915_do_wait_request at ffffffffa007e1ba
 #2 [ffff88007590dd70] i915_gpu_idle at ffffffffa007e805
 #3 [ffff88007590dda0] i915_gem_shrink at ffffffffa007fc2b
 #4 [ffff88007590de00] shrink_slab at ffffffff810d5850
 #5 [ffff88007590de50] drop_caches_sysctl_handler at ffffffff8112a958
 #6 [ffff88007590de90] proc_sys_call_handler at ffffffff8115d38f
 #7 [ffff88007590def0] proc_sys_write at ffffffff8115d3cd
 #8 [ffff88007590df00] vfs_write at ffffffff8110e022
 #9 [ffff88007590df40] sys_write at ffffffff8110e13f
#10 [ffff88007590df80] system_call_fastpath at ffffffff81009c72
    RIP: 0000003d21cd35f0  RSP: 00007fff483ce540  RFLAGS: 00010202
    RAX: 0000000000000001  RBX: ffffffff81009c72  RCX: 00000000012011b0
    RDX: 0000000000000002  RSI: 00007f662e34f000  RDI: 0000000000000001
    RBP: 00007f662e34f000   R8: 000000000000000a   R9: 00007f662e339700
    R10: 00000000ffffffff  R11: 0000000000000246  R12: 0000003d21f787a0
    R13: 0000000000000002  R14: 0000000000000002  R15: 0000000000000002
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b

i915_gem_shrink takes the mutex and then frees unused GEM objects (as far as I understand; this code is new to me), which points to a classic deadlock, one possible scenario of which is described in the upstream patch.

I've applied the upstream patch to my kernel and it works as expected, i.e. no hangs, and the caches are dropped.

Hope that helps.

Comment 5 Veaceslav Falico 2011-02-20 03:52:02 UTC
Created attachment 479737 [details]
line-fixed patch from upstream

patch from my fedora test kernel, should apply cleanly to linux-2.6.34.7-66.fc13

Comment 6 Chuck Ebbert 2011-02-22 16:01:56 UTC
Comment on attachment 479737 [details]
line-fixed patch from upstream

Never ever check the "patch" box when submitting patches in bugzilla. It can't cope at all with patches that change more than one file, and randomly puts all the changes in one file when you view the patch.

Comment 7 Chuck Ebbert 2011-02-24 19:11:30 UTC
Patch applied.

Comment 8 Bug Zapper 2011-05-30 14:12:34 UTC
This message is a reminder that Fedora 13 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 13.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '13'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 13's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 13 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 9 Bug Zapper 2011-06-28 10:51:25 UTC
Fedora 13 changed to end-of-life (EOL) status on 2011-06-25. Fedora 13 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

