Description of problem: Fedora 13 kernel crashes with echo 3 > /proc/sys/vm/drop_caches
Version-Release number of selected component (if applicable): 220.127.116.11-44.fc14.x86_64
How reproducible: Always
Steps to Reproduce:
1. echo 3 > /proc/sys/vm/drop_caches
Actual results: The machine freezes
Expected results: Drop memory caches
OK, so it shouldn't crash. But the only valid reason to drop caches is testing anyway. If you're not doing some testing and somehow read online that this was a good idea: it isn't, in general, a good idea at all.
Test automation is my full-time job, and I triggered this problem while working with autotest on my laptop. Autotest drops caches between every test execution/benchmark by default, so you can imagine that for people like me, who write tests every day, do fresh clones of autotest, and forget to disable drop caches once in a while, this is *really* annoying.
I see that, thanks for the clarification. I understand that testing is going to hit this regularly, and it does need looking at. Have you compared with F14, etc.?
I've seen the same situation here and was able to produce a vmcore and analyse it. I'm on a T500 with i915, so it may be the same case.
When I do echo 3 > /proc/sys/vm/drop_caches, X freezes (however, the wireless LED is still blinking).
kernel version 2.6.34.7-66.fc13.x86_64
The patch from http://www.spinics.net/lists/stable-commits/msg09016.html fixes it.
from the vmcore (in short):
there are two processes in UN state:
PID: 173 TASK: ffff8800378d9770 CPU: 1 COMMAND: "i915"
#0 [ffff880073417c60] schedule at ffffffff8144b068
#1 [ffff880073417d18] __mutex_lock_common at ffffffff8144b9ec
#2 [ffff880073417da8] __mutex_lock_slowpath at ffffffff8144ba6c
#3 [ffff880073417db8] mutex_lock at ffffffff8144bb8c
#4 [ffff880073417de8] intel_idle_update at ffffffffa008b9be
#5 [ffff880073417e38] worker_thread at ffffffff810622e5
#6 [ffff880073417ee8] kthread at ffffffff81065cb9
#7 [ffff880073417f48] kernel_thread_helper at ffffffff8100aa64
PID: 1846 TASK: ffff880073e58000 CPU: 0 COMMAND: "Xorg"
#0 [ffff880075cd5b98] schedule at ffffffff8144b068
#1 [ffff880075cd5c50] __mutex_lock_common at ffffffff8144b9ec
#2 [ffff880075cd5ce0] __mutex_lock_slowpath at ffffffff8144ba6c
#3 [ffff880075cd5cf0] mutex_lock at ffffffff8144bb8c
#4 [ffff880075cd5d20] i915_gem_throttle_ioctl at ffffffffa007e35e
#5 [ffff880075cd5d60] drm_ioctl at ffffffffa002c5f6
#6 [ffff880075cd5e70] vfs_ioctl at ffffffff8111aa2f
#7 [ffff880075cd5ea0] do_vfs_ioctl at ffffffff8111afa2
#8 [ffff880075cd5f30] sys_ioctl at ffffffff8111b03e
#9 [ffff880075cd5f80] system_call_fastpath at ffffffff81009c72
RIP: 0000003d21cd8ae7 RSP: 00007fff5dc09988 RFLAGS: 00013246
RAX: 0000000000000010 RBX: ffffffff81009c72 RCX: 00007f43a1745020
RDX: 0000000000000000 RSI: 0000000000006458 RDI: 0000000000000008
RBP: 0000000000006458 R8: 0000000000000001 R9: 0000000000000001
R10: 0000000000000000 R11: 0000000000003246 R12: 0000000001e74720
R13: 00000000007e2b00 R14: 0000000000000008 R15: 0000000000000000
ORIG_RAX: 0000000000000010 CS: 0033 SS: 002b
Both are sleeping trying to acquire the same mutex (the disassembly confirms it):
4482 static void intel_idle_update(struct work_struct *work)
4484 drm_i915_private_t *dev_priv = container_of(work, drm_i915_private_t,
4486 struct drm_device *dev = dev_priv->dev;
4487 struct drm_crtc *crtc;
4488 struct intel_crtc *intel_crtc;
4490 if (!i915_powersave)
4368 i915_gem_throttle_ioctl(struct drm_device *dev, void *data,
4369 struct drm_file *file_priv)
4371 return i915_gem_ring_throttle(dev, file_priv);
3504 static int
3505 i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file_priv)
3507 struct drm_i915_file_private *i915_file_priv = file_priv->driver_priv;
3508 int ret = 0;
3509 unsigned long recent_enough = jiffies - msecs_to_jiffies(20);
So that's why we have a steady picture on the screen. Checking further, I found that the bash process that wrote to proc was also somewhere in there, in the GEM code of i915:
PID: 2932 TASK: ffff88003ced9770 CPU: 1 COMMAND: "bash"
#0 [ffff88007590dc38] schedule at ffffffff8144b068
#1 [ffff88007590dcf0] i915_do_wait_request at ffffffffa007e1ba
#2 [ffff88007590dd70] i915_gpu_idle at ffffffffa007e805
#3 [ffff88007590dda0] i915_gem_shrink at ffffffffa007fc2b
#4 [ffff88007590de00] shrink_slab at ffffffff810d5850
#5 [ffff88007590de50] drop_caches_sysctl_handler at ffffffff8112a958
#6 [ffff88007590de90] proc_sys_call_handler at ffffffff8115d38f
#7 [ffff88007590def0] proc_sys_write at ffffffff8115d3cd
#8 [ffff88007590df00] vfs_write at ffffffff8110e022
#9 [ffff88007590df40] sys_write at ffffffff8110e13f
#10 [ffff88007590df80] system_call_fastpath at ffffffff81009c72
RIP: 0000003d21cd35f0 RSP: 00007fff483ce540 RFLAGS: 00010202
RAX: 0000000000000001 RBX: ffffffff81009c72 RCX: 00000000012011b0
RDX: 0000000000000002 RSI: 00007f662e34f000 RDI: 0000000000000001
RBP: 00007f662e34f000 R8: 000000000000000a R9: 00007f662e339700
R10: 00000000ffffffff R11: 0000000000000246 R12: 0000003d21f787a0
R13: 0000000000000002 R14: 0000000000000002 R15: 0000000000000002
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
i915_gem_shrink takes the mutex and then does some magic freeing unused GEM objects (as far as I understand it; this code is quite new to me), which points to a classical deadlock, one possible scenario of which is described in the upstream patch.
I've applied the upstream patch to my kernel and it works as expected, i.e. no hangs, and the caches are dropped.
Hope that helps.
Created attachment 479737 [details]
line-fixed patch from upstream
patch from my fedora test kernel, should apply cleanly to linux-2.6.34.7-66.fc13
Comment on attachment 479737 [details]
line-fixed patch from upstream
Never ever check the "patch" box when submitting patches in bugzilla. It can't cope at all with patches that change more than one file, and randomly puts all the changes in one file when you view the patch.
This message is a reminder that Fedora 13 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 13. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora
'version' of '13'.
Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version prior to Fedora 13's end of life.
Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 13 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora please change the 'version' of this
bug to the applicable version. If you are unable to change the version,
please add a comment here and someone will do it for you.
Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.
The process we are following is described here:
Fedora 13 changed to end-of-life (EOL) status on 2011-06-25. Fedora 13 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.
If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version.
Thank you for reporting this bug and we are sorry it could not be fixed.