Description of problem: Fedora 13 kernel crashes with echo 3 > /proc/sys/vm/drop_caches
Version-Release number of selected component (if applicable): 220.127.116.11-44.fc14.x86_64
How reproducible: Always
Steps to Reproduce:
1. echo 3 > /proc/sys/vm/drop_caches
Actual results: The machine freezes
Expected results: Drop memory caches
OK, so it shouldn't crash. But the only valid reason to drop caches is testing anyway. If you're not doing some testing and somehow read online that this was a good idea: it isn't, in general, a good idea at all.
Test automation is my full-time job, and I triggered this problem while working with autotest on my laptop. Autotest drops caches between every test execution/benchmark by default, so you can imagine that for people like me, who write tests every day, do fresh clones of autotest, and forget to disable drop caches once in a while, this is *really* annoying.
I see that, thanks for the clarification. I understand that testing is going to hit this regularly, and it does need looking at. Have you compared with F14, etc.?
I've seen the same situation here and was able to produce a vmcore and analyse it. I'm on a T500 with i915, so it may be the same case.
When I do echo 3 > /proc/sys/vm/drop_caches, X freezes (however, the wireless LED is still blinking).
kernel version 2.6.34.7-66.fc13.x86_64
The patch from http://www.spinics.net/lists/stable-commits/msg09016.html fixes it.
from the vmcore (in short):
there are two processes in UN state:
PID: 173 TASK: ffff8800378d9770 CPU: 1 COMMAND: "i915"
#0 [ffff880073417c60] schedule at ffffffff8144b068
#1 [ffff880073417d18] __mutex_lock_common at ffffffff8144b9ec
#2 [ffff880073417da8] __mutex_lock_slowpath at ffffffff8144ba6c
#3 [ffff880073417db8] mutex_lock at ffffffff8144bb8c
#4 [ffff880073417de8] intel_idle_update at ffffffffa008b9be
#5 [ffff880073417e38] worker_thread at ffffffff810622e5
#6 [ffff880073417ee8] kthread at ffffffff81065cb9
#7 [ffff880073417f48] kernel_thread_helper at ffffffff8100aa64
PID: 1846 TASK: ffff880073e58000 CPU: 0 COMMAND: "Xorg"
#0 [ffff880075cd5b98] schedule at ffffffff8144b068
#1 [ffff880075cd5c50] __mutex_lock_common at ffffffff8144b9ec
#2 [ffff880075cd5ce0] __mutex_lock_slowpath at ffffffff8144ba6c
#3 [ffff880075cd5cf0] mutex_lock at ffffffff8144bb8c
#4 [ffff880075cd5d20] i915_gem_throttle_ioctl at ffffffffa007e35e
#5 [ffff880075cd5d60] drm_ioctl at ffffffffa002c5f6
#6 [ffff880075cd5e70] vfs_ioctl at ffffffff8111aa2f
#7 [ffff880075cd5ea0] do_vfs_ioctl at ffffffff8111afa2
#8 [ffff880075cd5f30] sys_ioctl at ffffffff8111b03e
#9 [ffff880075cd5f80] system_call_fastpath at ffffffff81009c72
RIP: 0000003d21cd8ae7 RSP: 00007fff5dc09988 RFLAGS: 00013246
RAX: 0000000000000010 RBX: ffffffff81009c72 RCX: 00007f43a1745020
RDX: 0000000000000000 RSI: 0000000000006458 RDI: 0000000000000008
RBP: 0000000000006458 R8: 0000000000000001 R9: 0000000000000001
R10: 0000000000000000 R11: 0000000000003246 R12: 0000000001e74720
R13: 00000000007e2b00 R14: 0000000000000008 R15: 0000000000000000
ORIG_RAX: 0000000000000010 CS: 0033 SS: 002b
Both are sleeping trying to acquire the same mutex (the disassembly confirms it):
4482 static void intel_idle_update(struct work_struct *work)
4484 drm_i915_private_t *dev_priv = container_of(work, drm_i915_private_t,
4486 struct drm_device *dev = dev_priv->dev;
4487 struct drm_crtc *crtc;
4488 struct intel_crtc *intel_crtc;
4490 if (!i915_powersave)
4368 i915_gem_throttle_ioctl(struct drm_device *dev, void *data,
4369 struct drm_file *file_priv)
4371 return i915_gem_ring_throttle(dev, file_priv);
3504 static int
3505 i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file_priv)
3507 struct drm_i915_file_private *i915_file_priv = file_priv->driver_priv;
3508 int ret = 0;
3509 unsigned long recent_enough = jiffies - msecs_to_jiffies(20);
So that's why we have a steady picture on the screen. Checking further, I found that the bash process that wrote to proc was also somewhere in there, in the GEM code of i915:
PID: 2932 TASK: ffff88003ced9770 CPU: 1 COMMAND: "bash"
#0 [ffff88007590dc38] schedule at ffffffff8144b068
#1 [ffff88007590dcf0] i915_do_wait_request at ffffffffa007e1ba
#2 [ffff88007590dd70] i915_gpu_idle at ffffffffa007e805
#3 [ffff88007590dda0] i915_gem_shrink at ffffffffa007fc2b
#4 [ffff88007590de00] shrink_slab at ffffffff810d5850
#5 [ffff88007590de50] drop_caches_sysctl_handler at ffffffff8112a958
#6 [ffff88007590de90] proc_sys_call_handler at ffffffff8115d38f
#7 [ffff88007590def0] proc_sys_write at ffffffff8115d3cd
#8 [ffff88007590df00] vfs_write at ffffffff8110e022
#9 [ffff88007590df40] sys_write at ffffffff8110e13f
#10 [ffff88007590df80] system_call_fastpath at ffffffff81009c72
RIP: 0000003d21cd35f0 RSP: 00007fff483ce540 RFLAGS: 00010202
RAX: 0000000000000001 RBX: ffffffff81009c72 RCX: 00000000012011b0
RDX: 0000000000000002 RSI: 00007f662e34f000 RDI: 0000000000000001
RBP: 00007f662e34f000 R8: 000000000000000a R9: 00007f662e339700
R10: 00000000ffffffff R11: 0000000000000246 R12: 0000003d21f787a0
R13: 0000000000000002 R14: 0000000000000002 R15: 0000000000000002
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
i915_gem_shrink takes the mutex and then does some magic freeing unused GEM objects (as far as I understand it; this code is quite new to me), which points to a classical deadlock, one possible scenario of which is described in the upstream patch.
I've applied the upstream patch to my kernel and it works as expected, i.e. no hangs, and the caches are dropped.
Hope that helps.
Created attachment 479737 [details]
line-fixed patch from upstream
patch from my fedora test kernel, should apply cleanly to linux-2.6.34.7-66.fc13
Comment on attachment 479737 [details]
line-fixed patch from upstream
Never ever check the "patch" box when submitting patches in bugzilla. It can't cope at all with patches that change more than one file, and randomly puts all the changes in one file when you view the patch.
This message is a reminder that Fedora 13 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 13. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora
'version' of '13'.
Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version prior to Fedora 13's end of life.
Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 13 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora please change the 'version' of this
bug to the applicable version. If you are unable to change the version,
please add a comment here and someone will do it for you.
Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.
The process we are following is described here:
Fedora 13 changed to end-of-life (EOL) status on 2011-06-25. Fedora 13 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.
If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version.
Thank you for reporting this bug and we are sorry it could not be fixed.