Bug 124260
Summary: | process hanged with access to procfs |
---|---|
Product: | Red Hat Enterprise Linux 3 |
Component: | kernel |
Version: | 3.0 |
Hardware: | All |
OS: | Linux |
Status: | CLOSED WONTFIX |
Severity: | high |
Priority: | medium |
Reporter: | Alex Lyashkov <shadow> |
Assignee: | Dave Anderson <anderson> |
CC: | bmg300, managed, petrides, say |
Doc Type: | Bug Fix |
Last Closed: | 2007-10-19 19:25:42 UTC |

Description
Alex Lyashkov
2004-05-25 09:06:06 UTC
Created attachment 100532: messages + tasks track for bug 124260

Can you reproduce this bug without vmware?

This is not inside VMware; it is on my real workstation. VMware runs on this workstation, but what locked up was the real host, not a VMware VM.

OK, can you reproduce this bug without the vmware (vmmon, vmnet, etc) kernel modules loaded? If you need to have vmware loaded, the vmware developers will debug this problem. If you can reproduce the problem on a system where the vmware modules aren't loaded (and haven't been loaded since booting) it is a problem for Red Hat to fix.

Hm, after removing the vmware modules I have had no system hangs. OK, I will inform the vmware developers about this bug.

Hm, after a month it hung again. The vmware modules were not loaded. At the time I was running cvs/cvsup to create my snapshot and running ps ax in a second KDE console.

Created attachment 101332: new backtrace for hang.

My Athlon XP 2000+ hung again.

Hello, Alex. The attachment in comment #7 seems to be encoded strangely, or at the least, it's not viewable or downloadable through Bugzilla. Could you please attach plain text? Thanks in advance. -ernie (assigning to Rik in the meantime -- feel free to reassign)

Created attachment 101350: hang backtrace

The log is too long (~140k) to attach as plain text. It was my mistake: I did not check the content type while uploading the log. Now I have selected the "auto-detect" type and am trying to upload all.log.bz2.

OK, looks like 2 processes stuck on the mm->mmap_sem, one taking a pagefault (or mmapping things?) and the other one accessing the /proc info for the first one. Also, the radeon drm driver appears to have a bug. We'll need to look into that too...

Jun 22 06:19:28 berloga kernel: [drm] Initialized radeon 1.7.0 20020828 on minor 0
Jun 22 06:19:28 berloga kernel: [drm:radeon_unlock] *ERROR* Process 7732 using kernel context 0
Jun 22 06:19:28 berloga kernel: [drm:radeon_ioremapfree:mappings] *ERROR* Attempt to free NULL pointer
Jun 22 06:19:28 berloga kernel: [drm:radeon_ioremapfree:mappings] *ERROR* Excess frees: 1 frees, 0 allocs

But only the processes that accessed procfs hung; all other processes worked correctly.

Yes, but access to procfs files in /proc/<pid>/ needs the exact same lock that the pagefault path takes ...

To me it looks like a race between kmem_cache_alloc (alloc_inode) and kmem_cache_free (__pte_chain_free), possibly involving access to high memory areas. One note: this bug was detected after I added more RAM (1G) to the box.

I'm not sure why you think that, since kmem_cache_alloc and kmem_cache_free never access highmem ...

Otherwise I have no explanation for why this bug did not show up when the box had only 512M RAM.

It is possible that the radeon drm driver's double free upset the VM. If it is easy to reproduce this bug, could you reproduce this bug without the radeon drm driver? ;) Let's narrow this thing down so we can fix it more easily.

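For anyone trying to narrow this down: the symptom described above (any reader of /proc/<pid>/ entries blocks on the lock held by the faulting process) can be probed without hanging the shell you are working in. The following is only a minimal sketch, assuming Python 3 and a Linux /proc; the function name `probe_proc_stat`, its return strings, and the 2-second default timeout are hypothetical choices, not part of any existing tool. Each read runs in a throwaway helper process, so if the read blocks in the kernel only the helper gets stuck.

```python
import os
import subprocess
import sys


def probe_proc_stat(pid, timeout=2.0):
    """Try to read /proc/<pid>/stat in a helper process.

    Returns "ok" if the read finished, "unreadable" if the helper failed
    (for example the PID does not exist), or "hung" if the helper did not
    finish within `timeout` seconds.
    """
    helper = subprocess.Popen(
        [sys.executable, "-c",
         "import sys; open(sys.argv[1], 'rb').read()",
         "/proc/%d/stat" % pid],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        rc = helper.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # A helper blocked in uninterruptible sleep will ignore this signal
        # until the kernel lets it go, so we do not wait for it afterwards.
        helper.kill()
        return "hung"
    return "ok" if rc == 0 else "unreadable"


if __name__ == "__main__":
    # Walk every numeric entry in /proc and report the ones that do not read cleanly.
    for entry in sorted(os.listdir("/proc")):
        if entry.isdigit():
            state = probe_proc_stat(int(entry))
            if state != "ok":
                print(entry, state)
```

A PID reported as "hung" is a candidate for the process holding (or waiting on) the lock; a slow machine can also produce a false "hung", so a longer timeout may be needed there.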
It is not easy, and I don't see how a bug in a driver could deadlock the slab subsystem. I think the slab subsystem has a bug that is triggered by the radeon driver and vmware (with vmware I get one or two hangs per day). But isn't Radeon a driver supported by RH?

Some time ago Brian <bmg300> posted a problem report with similar hangs to linux-kernel@.

----------------------------
Hello list,

While doing massive memory allocation (I'm using GRASS to project NASA's BlueMarble maps) the kernel apparently tries to kill grass but fails. When I try to access /proc/<grass_pid>/stat the process hangs. For example, an 'strace' of 'ps' ends like this:

open("/proc/1783/stat", O_RDONLY) = 6
read(6, <PS and strace hang here>

I am able to project a few files, but once the filesystem cache fills up, GRASS hangs or gives a panic in vm_stat:381. The strange thing is, very little swap space is in use, and the filesystem cache continues to use most of the RAM. Is this a kernel bug, or do I need to use kernel 2.6.x (I am using kernel 2.4.26) and /proc/sys/vm/overcommit_memory or similar hack?
-----------------

What are your comments on it?

This is Brian. The bug I posted to linux-kernel was caused by CPU overheating. Check your CPU temp, you might have the same problem.

I also see this bug on a highly used production server (maybe once every month or two): 2.4.21-27.0.2.ELsmp

Running strace ps -U apache:

...
stat64("/proc/454", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
open("/proc/454/stat", O_RDONLY) = 7
read(7,
(and it just hangs)

cd /proc/454; ls -1 works.

cd /proc/454; strace ls -l gives:

...
lstat64("fd", {st_mode=S_IFDIR|0500, st_size=0, ...}) = 0
getxattr("fd", "system.posix_acl_access", (nil), 0) = -1 EOPNOTSUPP (Operation not supported)
lstat64("environ", {st_mode=S_IFREG|0400, st_size=0, ...}) = 0
getxattr("environ", "system.posix_acl_access", (nil), 0) = -1 EOPNOTSUPP (Operation not supported)
lstat64("status", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
getxattr("status", "system.posix_acl_access", (nil), 0) = -1 EOPNOTSUPP (Operation not supported)
lstat64("cmdline", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
getxattr("cmdline", "system.posix_acl_access", (nil), 0) = -1 EOPNOTSUPP (Operation not supported)
lstat64("stat", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
getxattr("stat", "system.posix_acl_access", (nil), 0) = -1 EOPNOTSUPP (Operation not supported)
lstat64("statm", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
getxattr("statm", "system.posix_acl_access", (nil), 0) = -1 EOPNOTSUPP (Operation not supported)
lstat64("cpu", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
getxattr("cpu", "system.posix_acl_access", (nil), 0) = -1 EOPNOTSUPP (Operation not supported)
lstat64("maps", {st_mode=S_IFREG|0400, st_size=0, ...}) = 0
getxattr("maps", "system.posix_acl_access", (nil), 0) = -1 EOPNOTSUPP (Operation not supported)
lstat64("mem", {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
getxattr("mem", "system.posix_acl_access", (nil), 0) = -1 EOPNOTSUPP (Operation not supported)
lstat64("cwd", {st_mode=S_IFLNK|0777, st_size=0, ...}) = 0
readlink("cwd", "/", 128) = 1
lstat64("root", {st_mode=S_IFLNK|0777, st_size=0, ...}) = 0
readlink("root", "/", 128) = 1
lstat64("exe", {st_mode=S_IFLNK|0777, st_size=0, ...}) = 0
readlink("exe",
(and it hangs)

The server has no X11 server running, and no vmware. 4 GiB RAM. Dmesg shows no relevant messages.

This bug is filed against RHEL 3, which is in maintenance phase. During the maintenance phase, only security errata and select mission critical bug fixes will be released for enterprise products. Since this bug does not meet those criteria, it is now being closed. For more information on the RHEL errata support policy, please visit: http://www.redhat.com/security/updates/errata/ If you feel this bug is indeed mission critical, please contact your support representative. You may be asked to provide detailed information on how this bug is affecting you.