Bug 836095
| Summary: | kernel-2.6.18-308.8.2.el5: user-mode processes are stuck in waiting_uninterruptible state in kernel mode; the machine cannot be rebooted | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Mitz Amano <mitz.amano> |
| Component: | kernel | Assignee: | Red Hat Kernel Manager <kernel-mgr> |
| Status: | CLOSED DUPLICATE | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
| Severity: | high | Priority: | unspecified |
| Version: | 5.10 | CC: | agordeev, mitz.amano |
| Target Milestone: | rc | Hardware: | x86_64 |
| OS: | Linux | Type: | Bug |
| Doc Type: | Bug Fix | Last Closed: | 2013-04-24 04:27:39 UTC |
Description
Mitz Amano
2012-06-28 06:23:40 UTC
Created attachment 594942
After 1 hour, the tasks are still in waiting_uninterruptible state and cannot be killed with kill -9.
The reboot command also hangs waiting for them.
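To see which tasks are stuck this way, one can look for state D (uninterruptible sleep) in `/proc/<pid>/stat`. The following is a minimal sketch, assuming a Linux `/proc` filesystem; the helper names are hypothetical, and only the main thread of each process is checked (per-thread state lives under `/proc/<pid>/task/<tid>/stat`).

```python
import os
from typing import List, Tuple


def stat_state(stat_line: str) -> str:
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The comm field is wrapped in parentheses and may itself contain spaces
    # or ')', so split after the LAST ')' and take the next field.
    rest = stat_line[stat_line.rindex(")") + 2:]
    return rest.split()[0]


def uninterruptible_tasks(proc: str = "/proc") -> List[Tuple[int, str]]:
    """Return (pid, comm) for every process whose main thread is in state D."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                line = f.read()
        except OSError:
            continue  # the task exited while we were scanning
        if stat_state(line) == "D":
            comm = line[line.index("(") + 1:line.rindex(")")]
            found.append((int(entry), comm))
    return found


if __name__ == "__main__":
    for pid, comm in uninterruptible_tasks():
        print(pid, comm)
```

Tasks reported here will not react to kill -9 until they leave the uninterruptible wait, which matches the behavior described above.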
Created attachment 594948
It seems to be a deadlock between nfsd (PID 3486) and genload (PID 3517).
This is my analysis in progress, not the final result. I am reporting the current status and will continue the analysis.
The current status (possibly not correct) is:
1) genload calls nfs_commit_inode, which takes the NFS_INO_COMMIT lock:
A) nfs_commit_inode calls nfs_commit_list;
B) nfs_commit_list sends an RPC commit request to the server (an asynchronous task);
C) genload waits for the task to finish and only then releases the lock.
2) nfsd (serving the NFS mount on the local machine itself) receives the task:
A) it calls svc_process -> nfsd_dispatch ...
B) eventually it calls __alloc_pages, a common kernel function, which:
I) calls .... -> nfs_commit_inode -> nfs_commit_set_lock;
II) so nfsd now waits for NFS_INO_COMMIT.
3) Analysis of the dump data shows:
A) nfsd and genload are working on the same inode;
B) the RPC task really has not been released; it is still ACTIVE;
C) after kill -9, genload left the RPC module but is still waiting for the commit to finish.
4) Current conclusion:
A) it is a deadlock between nfsd and genload.
5) Next steps:
A) collect the complete data flow between genload and nfsd;
B) if all the data confirms the current conclusion, that proves it is the root cause.
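The suspected cycle in steps 1 to 4 can be written as a wait-for graph: genload waits on the RPC commit task, the task can only complete once nfsd serves it, nfsd waits on NFS_INO_COMMIT, and NFS_INO_COMMIT is held by genload. The sketch below is illustrative only: the graph encoding and node labels are assumptions, not data extracted from the crash dump. It finds such a cycle with a depth-first search that reports a back edge.

```python
from typing import Dict, List, Optional

# Hypothetical encoding of the analysis: an edge A -> B means "A cannot make
# progress until B does" (a resource points at the task that holds it).
SUSPECTED_WAITS: Dict[str, List[str]] = {
    "genload (PID 3517)": ["RPC commit task"],
    "RPC commit task": ["nfsd (PID 3486)"],
    "nfsd (PID 3486)": ["NFS_INO_COMMIT"],
    "NFS_INO_COMMIT": ["genload (PID 3517)"],
}


def find_cycle(waits_for: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle in the wait-for graph (first node repeated at the end), or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on the DFS stack / fully explored
    color = {node: WHITE for node in waits_for}
    stack: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        stack.append(node)
        for nxt in waits_for.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GREY:
                # Back edge: everything on the stack from nxt onward is a cycle.
                return stack[stack.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in list(waits_for):
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


if __name__ == "__main__":
    cycle = find_cycle(SUSPECTED_WAITS)
    print(" -> ".join(cycle) if cycle else "no deadlock cycle")
```

A cycle in this graph is exactly the condition step 4 concludes; breaking any single edge (for example, nfsd not entering NFS commit from memory reclaim) removes it.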
Regarding comment 2: the coredump was generated with a merged kernel (not the pure Red Hat version), but the result is the same, except that the processes in waiting_uninterruptible state are genload and nfsd instead of cp and genload. Please tell me where to find the *debuginfo* RPM for this kernel, so I can download it, reproduce the issue, and analyze a coredump from the pure Red Hat kernel.
Created attachment 594958
This is another coredump analysis for the merged kernel (also concerning the NFS commit feature).
Root cause (everything happens in the fs/nfs subsystem, in write.c):
a. A process (such as fsx-linux) calls nfs_sync_inode_wait.
b. At the same time, another process (such as genload) calls nfs_commit_inode.
c. A deadlock occurs:
i. fsx-linux waits for the request to finish (which genload would do next);
ii. genload (and likewise any other process that synchronously commits all requests) waits for fsx-linux to release the commit lock.
See the attachment for more details.
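The interleaving in steps a to c can be modeled in user space with two threads. This is a minimal sketch, not the kernel code: a `threading.Lock` stands in for the commit lock, a `threading.Event` for completion of the outstanding request, and the function names are hypothetical. The committer gives up after a timeout so the program reports the deadlock instead of hanging the way fsx-linux and genload did.

```python
import threading


def simulate_commit_deadlock(timeout: float = 0.2) -> dict:
    """Model steps a to c and return what each side managed to do."""
    commit_lock = threading.Lock()    # stands in for the NFS commit lock
    request_done = threading.Event()  # set once the pending request completes
    lock_taken = threading.Event()    # forces step a to happen before step b
    gave_up = threading.Event()       # lets the demo end instead of hanging
    result = {}

    def sync_waiter():
        # Plays fsx-linux in nfs_sync_inode_wait: it holds the commit lock
        # while waiting for the outstanding request to finish.
        with commit_lock:
            lock_taken.set()
            # In the kernel this wait never ends; here it stops once the
            # committer has given up, then records whether the request finished.
            gave_up.wait()
            result["request finished"] = request_done.is_set()

    def committer():
        # Plays genload in nfs_commit_inode: it must take the commit lock
        # before it can complete the request the waiter is sleeping on.
        lock_taken.wait()
        got_lock = commit_lock.acquire(timeout=timeout)
        if got_lock:
            request_done.set()
            commit_lock.release()
        result["commit lock acquired"] = got_lock
        gave_up.set()

    threads = [threading.Thread(target=sync_waiter), threading.Thread(target=committer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result


if __name__ == "__main__":
    print(simulate_commit_deadlock())
```

Both entries come back False: the waiter never sees its request finish because the only thread able to finish it is blocked on the lock the waiter holds.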
Created attachment 594959
This is the detailed data supporting comment #4.
This bug can be closed; it is the same as bug 848706.
*** This bug has been marked as a duplicate of bug 848706 ***