Bug 124260

Summary:

process hanged with access to procfs

Product:

Red Hat Enterprise Linux 3

Reporter:

Alex Lyashkov <shadow>

Component:

kernel

Assignee:

Dave Anderson <anderson>

Status:

CLOSED WONTFIX

QA Contact:

Severity:

high

Docs Contact:

Priority:

medium

Version:

3.0

CC:

bmg300, managed, petrides, say

Hardware:

All

OS:

Linux

Doc Type:

Bug Fix

Last Closed:

2007-10-19 19:25:42 UTC

Attachments:

Description	Flags
messages + tasks track for bug 124260	none
new backtrace for hang.	none
hang backtrace	none

Description Alex Lyashkov 2004-05-25 09:06:06 UTC

Description of problem:
A process hangs when it accesses procfs. This happens on an Athlon with
1G RAM while compiling a kernel and working with cvs and XFree86.

Version-Release number of selected component (if applicable):
kernel - 2.4.21-15EL
glibc-2.3.2-95.20

How reproducible:
rarely

Steps to Reproduce:
1. On one console, run make oldconfig dep bzImage
2. On a second console, run cvs tag / diff / tag -d against the kernel tree
3. On a third console, run ps ax

Actual results:
ps / top hang

Expected results:
ps and top complete normally.

Additional info:

Comment 1 Alex Lyashkov 2004-05-25 09:07:42 UTC

Created attachment 100532 [details]
messages + tasks track for bug 124260

messages + tasks track for bug 124260

Comment 2 Rik van Riel 2004-05-25 12:41:31 UTC

Can you reproduce this bug without vmware ?

Comment 3 Alex Lyashkov 2004-05-25 14:53:12 UTC

This is not inside VMware; it is on my real workstation.
VMware runs on this workstation, but what locked up was not a VMware VM,
it was the real host.

Comment 4 Rik van Riel 2004-05-25 14:57:45 UTC

OK, can you reproduce this bug without the vmware (vmmon, vmnet, etc)
kernel modules loaded?

If you need to have vmware loaded, the vmware developers will debug
this problem.

If you can reproduce the problem on a system where the vmware modules
aren't loaded (and haven't been loaded since booting) it is a problem
for Red Hat to fix.

Comment 5 Alex Lyashkov 2004-05-31 14:59:00 UTC

Hm, after removing the vmware modules I no longer have system hangs.
OK, I will inform the vmware developers about this bug.

Comment 6 Alex Lyashkov 2004-06-22 13:15:10 UTC

Hm, after a month I have a hang again.
The vmware modules were not loaded. At the time I was running cvs/cvsup to
create my snapshot and running ps ax in a second KDE console.

Comment 7 Alex Lyashkov 2004-06-22 13:19:09 UTC

Created attachment 101332 [details]
new backtrace for hang.

My Athlon XP 2000+ hung again.

Comment 8 Ernie Petrides 2004-06-23 05:21:45 UTC

Hello, Alex.  The attachment in comment #7 seems to be encoded
strangely, or at the least, it's not viewable or downloadable
through Bugzilla.  Could you please attach plain text?

Thanks in advance.  -ernie

(assigning to Rik in the meantime -- feel free to reassign)

Comment 9 Alex Lyashkov 2004-06-23 05:52:31 UTC

Created attachment 101350 [details]
hang backtrace

The log is too long (~140k) to attach as plain text. That was my mistake: I
did not check the content type while uploading the log. This time I selected
the "auto-detect" type and am uploading all.log.bz2.

Comment 10 Rik van Riel 2004-06-23 11:28:50 UTC

OK, looks like 2 processes stuck on the mm->mmap_sem, one taking a
pagefault (or mmapping things?) and the other one accessing the /proc
info for the first one.

Also, the radeon drm driver appears to have a bug.  We'll need to look
into that too...

Jun 22 06:19:28 berloga kernel: [drm] Initialized radeon 1.7.0 20020828 on minor 0
Jun 22 06:19:28 berloga kernel: [drm:radeon_unlock] *ERROR* Process 7732 using kernel context 0
Jun 22 06:19:28 berloga kernel: [drm:radeon_ioremapfree:mappings] *ERROR* Attempt to free NULL pointer
Jun 22 06:19:28 berloga kernel: [drm:radeon_ioremapfree:mappings] *ERROR* Excess frees: 1 frees, 0 allocs

Comment 11 Alex Lyashkov 2004-06-23 12:59:53 UTC

But every process that accessed procfs hung; all other processes worked
correctly.

Comment 12 Rik van Riel 2004-06-23 13:15:06 UTC

Yes, but access to procfs files in /proc/<pid>/ needs the exact same
lock that the pagefault path takes ...
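
A rough user-space analogy of the situation described above, not kernel code.
An ordinary Python lock stands in for mm->mmap_sem (which in the kernel is a
read-write semaphore, simplified here to a plain mutex), and the function
names faulting_task and read_proc_stat are hypothetical. It only illustrates
why a reader of /proc/<pid>/ files blocks for as long as another task holds
the same lock.

```python
import threading

# Analogy only: this lock stands in for the per-process mm->mmap_sem.
mmap_sem = threading.Lock()


def faulting_task(lock_taken, release):
    """Stand-in for the task that holds the lock in the page-fault path."""
    with mmap_sem:
        lock_taken.set()   # tell the demo that the lock is now held
        release.wait()     # keep holding it until the demo lets go


def read_proc_stat(timeout):
    """Stand-in for ps reading /proc/<pid>/stat, which needs the same lock.

    Returns True if the lock could be taken within `timeout` seconds.
    """
    got = mmap_sem.acquire(timeout=timeout)
    if got:
        mmap_sem.release()
    return got


def demo():
    lock_taken = threading.Event()
    release = threading.Event()
    holder = threading.Thread(target=faulting_task, args=(lock_taken, release))
    holder.start()
    lock_taken.wait()                          # holder now owns the lock
    blocked = not read_proc_stat(timeout=0.2)  # reader cannot get in
    release.set()                              # holder drops the lock
    holder.join()
    recovered = read_proc_stat(timeout=0.2)    # reader succeeds afterwards
    return blocked, recovered
```

In the real hang the holder never lets go, so the reader (ps, top) waits
forever instead of timing out as it does in this sketch.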

Comment 13 Alex Lyashkov 2004-06-23 16:10:57 UTC

To me it looks like a race between kmem_cache_alloc (alloc_inode) and
kmem_cache_free (__pte_chain_free), possibly involving access to high
memory areas.
One note: this bug was first seen after I added more RAM to the box (1G).

Comment 14 Rik van Riel 2004-06-23 16:25:39 UTC

I'm not sure why you think that, since kmem_cache_alloc and
kmem_cache_free never access highmem ...

Comment 15 Alex Lyashkov 2004-06-23 16:35:00 UTC

Otherwise I have no idea why this bug does not show up when the box has
only 512M RAM.

Comment 16 Rik van Riel 2004-06-23 17:03:13 UTC

It is possible that the radeon drm driver's double free upset the VM.
If it is easy to reproduce this bug, could you reproduce this bug
without the radeon drm driver ? ;)

Let's narrow this thing down so we can fix it more easily.

Comment 17 Alex Lyashkov 2004-06-23 17:10:18 UTC

It is not easy to reproduce. Also, I don't see how a bug in a driver can
deadlock the slab subsystem. I think the slab subsystem has a bug that is
triggered by the radeon driver and by vmware (with vmware I get one or two
hangs per day). And isn't radeon a driver supported by RH?

Comment 18 Alex Lyashkov 2004-06-28 05:39:52 UTC

Some time ago Brian <bmg300> posted a problem report to linux-kernel@
describing similar hangs.
----------------------------
Hello list,
While doing massive memory allocation (I'm using GRASS to project
NASA's BlueMarble maps) the kernel apparently tries to kill grass but
fails. When I try to access /proc/<grass_pid>/stat the process
hangs. For example, an 'strace' of 'ps' ends like this:
open("/proc/1783/stat", O_RDONLY) = 6
read(6, <PS and strace hang here>
I am able to project a few files, but once the filesystem cache fills
up, GRASS hangs or gives a panic in vm_stat:381.
The strange thing is, very little swap space is in use, and the
filesystem cache continues to use most of the RAM. Is this a kernel bug,
or do I need to use kernel 2.6.x (I am using kernel 2.4.26)
and /proc/sys/vm/overcommit_memory or similar hack?
-----------------
What are your comments on it?

Comment 20 Brian 2004-08-08 03:26:50 UTC

This is Brian.

The bug I posted to linux-kernel was caused by CPU overheating. Check
your CPU temp, you might have the same problem.

Comment 21 Anchor Systems Managed Hosting 2005-03-26 19:40:59 UTC

I also see this bug on a heavily used production server (maybe once every
month or two):
2.4.21-27.0.2.ELsmp

Running strace ps -U apache

...
stat64("/proc/454", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
open("/proc/454/stat", O_RDONLY)        = 7
read(7,

(and it just hangs)

cd /proc/454; ls -1
works

cd /proc/454;strace ls -l gives:

...
lstat64("fd", {st_mode=S_IFDIR|0500, st_size=0, ...}) = 0
getxattr("fd", "system.posix_acl_access", (nil), 0) = -1 EOPNOTSUPP (Operation not supported)
lstat64("environ", {st_mode=S_IFREG|0400, st_size=0, ...}) = 0
getxattr("environ", "system.posix_acl_access", (nil), 0) = -1 EOPNOTSUPP (Operation not supported)
lstat64("status", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
getxattr("status", "system.posix_acl_access", (nil), 0) = -1 EOPNOTSUPP (Operation not supported)
lstat64("cmdline", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
getxattr("cmdline", "system.posix_acl_access", (nil), 0) = -1 EOPNOTSUPP (Operation not supported)
lstat64("stat", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
getxattr("stat", "system.posix_acl_access", (nil), 0) = -1 EOPNOTSUPP (Operation not supported)
lstat64("statm", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
getxattr("statm", "system.posix_acl_access", (nil), 0) = -1 EOPNOTSUPP (Operation not supported)
lstat64("cpu", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
getxattr("cpu", "system.posix_acl_access", (nil), 0) = -1 EOPNOTSUPP (Operation not supported)
lstat64("maps", {st_mode=S_IFREG|0400, st_size=0, ...}) = 0
getxattr("maps", "system.posix_acl_access", (nil), 0) = -1 EOPNOTSUPP (Operation not supported)
lstat64("mem", {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
getxattr("mem", "system.posix_acl_access", (nil), 0) = -1 EOPNOTSUPP (Operation not supported)
lstat64("cwd", {st_mode=S_IFLNK|0777, st_size=0, ...}) = 0
readlink("cwd", "/", 128)               = 1
lstat64("root", {st_mode=S_IFLNK|0777, st_size=0, ...}) = 0
readlink("root", "/", 128)              = 1
lstat64("exe", {st_mode=S_IFLNK|0777, st_size=0, ...}) = 0
readlink("exe",

(and it hangs)

The server runs neither an X11 server nor vmware. It has 4 GiB RAM. dmesg
shows no relevant messages.
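
For anyone trying to find which PID triggers the hang without wedging their
own shell, here is a minimal sketch of a probe. It is an assumption-laden
helper, not a tool shipped with RHEL: the function names finishes_within,
stat_read_hangs and self_check are made up for illustration. It runs cat on
/proc/<pid>/stat in a child process and polls it with a deadline. It never
does a blocking wait, because a reader stuck in uninterruptible sleep may
ignore signals and would then hang the probe as well.

```python
import subprocess
import sys
import time


def finishes_within(cmd, timeout):
    """Run cmd and report whether it exits within `timeout` seconds."""
    child = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if child.poll() is not None:   # child has exited
            return True
        time.sleep(0.05)
    child.kill()   # best effort; may have no effect on a task stuck in the kernel
    return False   # deliberately no wait(): it could block forever


def stat_read_hangs(pid, timeout=2.0):
    """True if reading /proc/<pid>/stat does not complete within `timeout`."""
    return not finishes_within(["cat", "/proc/%d/stat" % pid], timeout)


def self_check():
    """Sanity check of the probe itself that does not touch /proc."""
    return finishes_within([sys.executable, "-c", "pass"], 5.0)
```

Calling stat_read_hangs(454) would probe the PID from the strace output
above. A PID that no longer exists makes cat fail quickly, so it is reported
as not hanging. The children left behind by a real hang stay around until
the underlying lock is released or the machine is rebooted.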

Comment 22 RHEL Program Management 2007-10-19 19:25:42 UTC

This bug is filed against RHEL 3, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products. Since
this bug does not meet those criteria, it is now being closed.

For more information on the RHEL errata support policy, please visit:
http://www.redhat.com/security/updates/errata/
 
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.