Bug 154557 - crash netdump session fails with "task does not exist" error
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: crash
Version: 4.0
Hardware: All Linux
Priority: medium Severity: medium
Assigned To: Dave Anderson
Reported: 2005-04-12 13:26 EDT by Dave Anderson
Modified: 2007-11-30 17:07 EST
Doc Type: Bug Fix
Last Closed: 2005-10-06 10:41:17 EDT
Description Dave Anderson 2005-04-12 13:26:21 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030611

Description of problem:

When a dead process has called schedule() from do_exit(), 
schedule() should never return:

        ... 
        BUG_ON(!(current->flags & PF_DEAD));
        schedule();
        BUG();
        ...

If it does return, the BUG() forces an oops.

The crash utility fails during initialization when run against a
netdump vmcore in which this kernel anomaly has occurred, with an
error message of the sort:

crash: task does not exist: f6bef7b0





Version-Release number of selected component (if applicable):
crash-3.10-1

How reproducible:
Always

Steps to Reproduce:
1. Run crash on a netdump where the kernel has oops'd in the manner
   described above.

Actual Results:  
$ crash vmlinux vmcore

crash 3.10-1
Copyright (C) 2002, 2003, 2004  Red Hat, Inc.
Copyright (C) 2004  IBM Corp.
Copyright (C) 1998-2004  Hewlett-Packard Co
Copyright (C) 1999, 2002  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.

WARNING: Because this kernel was compiled with gcc version 3.4.3, certain
         commands or command options may fail unless crash is invoked with
         the  "--readnow" command line option.

GNU gdb 6.0
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...

The crash session aborts during initialization with a message of the sort:

  crash: task does not exist: f6bef7b0



Expected Results:  
The crash session should come up normally.

Additional info:
Comment 1 Dave Anderson 2005-04-12 13:37:26 EDT
A sample vmlinux/vmcore pair has been provided to me by Guy Streeter
(streeter@redhat.com).

The failure occurs because the crash utility uses the
kernel's PID hash chain to gather information for each process
in the system.  In this case, the process had removed itself from
the system's PID hash chain in expectation of having its task_struct
freed by schedule().  However, because of a kernel bug, the dead
process was not freed; instead, the task was re-scheduled and put
on the system's run queue.  When the crash utility then gathers
information about the active task on each cpu by looking at the
system's run queue, it does not expect to find a task_struct there
that is not on the system's PID hash list, and so it exited
prematurely.
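
The mismatch described above can be sketched in a few lines of C.  This is
only an illustration of the idea (tolerate an active task that is missing
from the PID-hash-derived task list, warn, and add it), not the actual
crash source.  The names task_table, table_contains and
add_missing_active_tasks, and the fixed-size table, are all hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_TASKS 16   /* hypothetical limit, for the sketch only */

/* Hypothetical, simplified task table; crash's real structures differ. */
struct task_table {
    unsigned long tasks[MAX_TASKS];  /* task_struct addresses from the PID hash */
    size_t count;
};

/* Return true if the task address is already in the table. */
bool table_contains(const struct task_table *tt, unsigned long task)
{
    for (size_t i = 0; i < tt->count; i++)
        if (tt->tasks[i] == task)
            return true;
    return false;
}

/*
 * Reconcile the per-cpu active tasks (taken from the run queues) with the
 * tasks gathered from the PID hash.  An active task that is missing from
 * the table is reported with a warning and added, instead of being treated
 * as a fatal error.  A zero entry means "no active task known for this cpu".
 * Returns the number of tasks added, or -1 if the table is full.
 */
int add_missing_active_tasks(struct task_table *tt,
                             const unsigned long *active, size_t ncpus)
{
    int added = 0;

    assert(tt != NULL && active != NULL);

    for (size_t cpu = 0; cpu < ncpus; cpu++) {
        if (active[cpu] == 0 || table_contains(tt, active[cpu]))
            continue;
        if (tt->count == MAX_TASKS)
            return -1;
        fprintf(stderr,
                "WARNING: active task %lx on cpu %zu not found in PID hash\n",
                active[cpu], cpu);
        tt->tasks[tt->count++] = active[cpu];
        added++;
    }
    return added;
}
```

Calling add_missing_active_tasks() a second time with the same arguments
adds nothing, since the task is then already in the table.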

I will update the "upstream" version of crash with a fix for this
kind of situation, and issue errata for RHEL3 and RHEL4.

Comment 2 Dave Anderson 2005-04-12 14:32:19 EDT
A fix for this bug can be found in version 3.10-13.3.
The source code is available here:

  http://people.redhat.com/anderson

Both src.rpm and tar.gz files are located there.

To build from the tar.gz file:

  # tar xzf crash-3.10-13.3.tar.gz
  # cd crash-3.10-13.3
  # make

To build from the src.rpm file:

  # rpm -ivh crash-3.10-13.3.src.rpm
  # cd /usr/src/redhat/SPECS   (or wherever your .rpmmacros points to)
  # rpmbuild -ba crash.spec

Then install the resultant binary rpm.

With this version, the dead task is "found", and made accessible:

$ crash vmlinux vmcore

crash 3.10-13.3
Copyright (C) 2002, 2003, 2004, 2005  Red Hat, Inc.
Copyright (C) 2004, 2005  IBM Corporation
Copyright (C) 1999-2005  Hewlett-Packard Co
Copyright (C) 1999, 2002  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.

WARNING: Because this kernel was compiled with gcc version 3.4.3, certain
         commands or command options may fail unless crash is invoked with
         the  "--readnow" command line option.

GNU gdb 6.0
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...

WARNING: active task f6bef7b0 on cpu 3 not found in PID hash

      KERNEL: /usr/dumps/2.6.9-6-nopanictask/vmlinux
    DUMPFILE: /usr/dumps/2.6.9-6-nopanictask/jlab-rackley-vmcore
        CPUS: 4
        DATE: Thu Apr  7 08:21:20 2005
      UPTIME: 17:38:08
LOAD AVERAGE: 29.42, 21.59, 15.99
       TASKS: 185
    NODENAME: sfs29.jlab.org
     RELEASE: 2.6.9-5.ELsmp
     VERSION: #1 SMP Wed Jan 5 19:30:39 EST 2005
     MACHINE: i686  (2667 Mhz)
      MEMORY: 2 GB
       PANIC: "kernel BUG at kernel/exit.c:840!"
         PID: 15071
     COMMAND: "pdflush"
        TASK: f6bef7b0  [THREAD_INFO: e3f86000]
         CPU: 3
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 15071  TASK: f6bef7b0  CPU: 3   COMMAND: "pdflush"
 #0 [e3f86e9c] netpoll_start_netdump at f88f0596
 #1 [e3f86ebc] die at c0105fb9
 #2 [e3f86ef0] do_invalid_op at c01063f0
 #3 [e3f86f9c] error_code (via invalid_op) at c02c6d7d
    EAX: 00000000  EBX: f6befcf0  ECX: f6bef700  EDX: c2034d60  EBP: 00000000
    DS:  007b      ESI: f7ffb680  ES:  007b      EDI: f6bef7b0
    CS:  0060      EIP: c0122c26  ERR: ffffffff  EFLAGS: 00010246
 #4 [e3f86fd8] do_exit at c0122c26
 #5 [e3f86fec] kernel_thread_helper at c01041f2
crash>

The fix will be queued for RHEL3 and RHEL4 errata updates.
Comment 3 Dave Anderson 2005-09-22 13:56:27 EDT
The fix for this issue is contained in the crash utility updates
in the respective RHEL3-U6 and RHEL4-U2 errata:

  RHEA-2005:599 RHEL3-U6  crash enhancement update (4.0-1)
  RHEA-2005:600 RHEL4-U2  crash enhancement update (4.0-2)


