Bug 1277254

Summary: /usr/libexec/abrt-hook-ccpp infinite loop
Product: Red Hat Enterprise Linux 6 Reporter: Andy Grimm <agrimm>
Component: abrt Assignee: abrt <abrt-devel-list>
Status: CLOSED DUPLICATE QA Contact: BaseOS QE - Apps <qe-baseos-apps>
Severity: high Docs Contact:
Priority: unspecified    
Version: 6.8 CC: agrimm, jfilak, jgoulding
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-11-03 14:17:58 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Andy Grimm 2015-11-02 19:56:27 UTC
Description of problem:

We recently found an abrt-hook-ccpp process stuck on one of our nodes for more than a week.  We found the following processes on the system:

nscd       3453  1.7  0.0 2363160  972 ?        Dsl  Aug26 1739:29 /usr/sbin/nscd
root     317150  0.0  0.0  83936  1532 ?        S    Oct23   2:31 /usr/libexec/abrt-hook-ccpp 11 0 3453 28 28 1445604670 nscd

Tracing the abrt process showed a loop:

# strace -f -p 317150
Process 317150 attached
restart_syscall(<... resuming interrupted call ...>) = 0
symlinkat("317150", 5, ".lock")         = -1 EEXIST (File exists)
readlinkat(5, ".lock", "4017", 15)      = 4
access("/proc/4017", F_OK)              = 0
nanosleep({0, 500000000}, NULL)         = 0
symlinkat("317150", 5, ".lock")         = -1 EEXIST (File exists)
readlinkat(5, ".lock", "4017", 15)      = 4
access("/proc/4017", F_OK)              = 0
nanosleep({0, 500000000}, NULL)         = 0
symlinkat("317150", 5, ".lock")         = -1 EEXIST (File exists)
readlinkat(5, ".lock", "4017", 15)      = 4
access("/proc/4017", F_OK)              = 0
nanosleep({0, 500000000}, NULL)         = 0
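
The trace above suggests a PID-symlink lock: the hook tries to create .lock pointing at its own PID, and on EEXIST it reads the holder's PID, checks that /proc/<pid> exists, sleeps 0.5 s, and retries. The following is a minimal Python sketch of that pattern, not the abrt source. The function name try_lock, the proc_root parameter, the bounded retry count, and the stale-lock removal branch are all assumptions added for illustration; the trace only shows the "holder still exists" path, which has no bound.

```python
import os
import time


def try_lock(dir_path, my_pid, retries=3, delay=0.5, proc_root="/proc"):
    """Try to take a PID-symlink lock in dir_path (hypothetical helper).

    Returns True once the lock is held, False after `retries` failed attempts.
    """
    lock_path = os.path.join(dir_path, ".lock")
    for _ in range(retries):
        try:
            # Equivalent of symlinkat("<my_pid>", dirfd, ".lock").
            os.symlink(str(my_pid), lock_path)
            return True
        except FileExistsError:
            pass  # someone else holds (or appears to hold) the lock

        try:
            # Equivalent of readlinkat(dirfd, ".lock", ...).
            holder = os.readlink(lock_path)
        except FileNotFoundError:
            continue  # lock vanished between the two calls; retry at once

        # Equivalent of access("/proc/<holder>", F_OK).
        if not os.path.exists(os.path.join(proc_root, holder)):
            # Assumption: a lock whose holder is gone is treated as stale.
            try:
                os.unlink(lock_path)
            except FileNotFoundError:
                pass
            continue

        # Holder PID exists, so wait and retry, as in nanosleep({0, 500000000}).
        time.sleep(delay)
    return False
```

With an unbounded retry count, this loop never ends as long as some process with the recorded PID exists, which matches the behaviour seen in the trace.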

so I checked the process's open file descriptors to get the path behind fd 5:

# ls -l /proc/317150/fd
total 0
lr-x------. 1 root root 64 Nov  2 14:37 0 -> pipe:[1577878550]
lrwx------. 1 root root 64 Nov  2 14:37 1 -> /dev/null
lrwx------. 1 root root 64 Nov  2 14:37 2 -> /dev/null
lrwx------. 1 root root 64 Nov  2 14:37 3 -> socket:[1577881793]
lr-x------. 1 root root 64 Nov  2 14:37 4 -> /var/spool/abrt
lr-x------. 1 root root 64 Nov  2 14:37 5 -> /var/spool/abrt/ccpp-2015-10-05-15:04:14-262898
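
The same lookup can be done programmatically, since each entry under /proc/<pid>/fd is a symlink to the open file. This is a small sketch; the helper name fd_target and the proc_root parameter are hypothetical and exist only to make the example testable.

```python
import os


def fd_target(pid, fd, proc_root="/proc"):
    """Return the path (or pipe/socket label) that a process's fd points to."""
    return os.readlink(os.path.join(proc_root, str(pid), "fd", str(fd)))
```

For the process in this report, fd_target(317150, 5) would be expected to return the dump directory shown in the listing above. Reading another user's /proc/<pid>/fd normally requires root.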

then found that the .lock file in that directory was a broken symlink:

# ls -la /var/spool/abrt/ccpp-2015-10-05-15:04:14-262898
total 2024
drwxr-x---.  2 root abrt      41 Oct  5 15:06 .
drwxr-xr-x. 37 abrt abrt    4096 Oct 30 10:24 ..
lrwxrwxrwx.  1 root root       4 Oct  5 15:05 .lock -> 4017
-rw-------.  1 root root 2066788 Oct  5 15:06 sosreport.tar.xz
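
When a lock like this looks suspicious, it can help to see what the recorded PID currently is. The sketch below reads the .lock target and then the holder's /proc/<pid>/cmdline. The helper name lock_holder and the proc_root parameter are hypothetical, and the report itself does not say what PID 4017 was at the time.

```python
import os


def lock_holder(dump_dir, proc_root="/proc"):
    """Return (pid_string, argv_list) for the .lock in dump_dir.

    pid_string is None if there is no lock; argv_list is None if no
    process with that PID exists.
    """
    try:
        holder = os.readlink(os.path.join(dump_dir, ".lock"))
    except FileNotFoundError:
        return None, None
    try:
        with open(os.path.join(proc_root, holder, "cmdline"), "rb") as f:
            raw = f.read()
    except FileNotFoundError:
        return holder, None
    # cmdline is a NUL-separated argument vector.
    args = [a.decode(errors="replace") for a in raw.split(b"\0") if a]
    return holder, args
```

If the returned command line does not belong to an abrt process, the lock is probably left over from an earlier holder.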


I removed the .lock file, and the processes exited, writing event_log and machineid files to the directory:

# ls -la /var/spool/abrt/ccpp-2015-10-05-15:04:14-262898
total 2028
drwxr-x---.  2 root abrt      61 Nov  2 14:53 .
drwxr-xr-x. 17 abrt abrt    4096 Nov  2 14:51 ..
-rw-r--r--.  1 root root       0 Nov  2 14:49 event_log
-rw-r--r--.  1 root root      93 Nov  2 14:49 machineid
-rw-------.  1 root root 2066788 Oct  5 15:06 sosreport.tar.xz

Comment 2 Jakub Filak 2015-11-03 07:47:24 UTC
Thank you for the report. This issue looks like a duplicate of bug #1255762. Can you please provide us with the full system log?

Comment 3 Andy Grimm 2015-11-03 14:17:58 UTC
It is definitely a duplicate.  Closing this one.

*** This bug has been marked as a duplicate of bug 1255762 ***