Bug 498489 - blktrace stops working after a trace-file-directory replacement [NEEDINFO]
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.3
Hardware: All Linux
Priority: medium Severity: medium
Target Milestone: rc
Assigned To: Eric Sandeen
QA Contact: Red Hat Kernel QE team
Depends On:
Blocks: 533192
Reported: 2009-04-30 13:42 EDT by Milos Malik
Modified: 2014-01-27 06:02 EST
Doc Type: Bug Fix
Last Closed: 2010-03-30 03:31:52 EDT
yugzhang: needinfo? (esandeen)


Description Milos Malik 2009-04-30 13:42:18 EDT
Description of problem:
I wanted to know how blktrace copes with a situation where one of the trace files (which are usually overwritten on the next run) is in fact a directory. The utility terminated with an error message, which is correct behavior. But after I removed the directory, the utility stopped working and now always terminates with a different error message. Removing all trace files doesn't help. Remounting /sys/kernel/debug doesn't help.

Version-Release number of selected component (if applicable):
blktrace-1.0.0-3.el5

How reproducible:
always (i386, ppc, ia64, x86_64, s390x)

Steps to Reproduce:
# df -aT | grep -e debug -e md0
/dev/md0      ext3    17834416  14265104   2648736  85% /
debugfs    debugfs           0         0         0   -  /sys/kernel/debug
# blktrace -d /dev/md0 -w 5
Device: /dev/md0
  CPU  0:                    0 events,       19 KiB data
  CPU  1:                    0 events,        7 KiB data
  Total:                     0 events (dropped 0),       26 KiB data
# ls -l *.blktrace.*
-rw-r--r-- 1 root devqa7 19288 Apr 30 13:03 md0.blktrace.0
-rw-r--r-- 1 root devqa7  6648 Apr 30 13:04 md0.blktrace.1
# rm md0.blktrace.1
rm: remove regular file `md0.blktrace.1'? y
# mkdir md0.blktrace.1
# blktrace -d /dev/md0 -w 5
./md0.blktrace.1: Is a directory
Failed to start worker threads
# rmdir md0.blktrace.1
# ls -l *.blktrace.*
-rw-r--r-- 1 root devqa7 0 Apr 30  2009 md0.blktrace.0
# blktrace -d /dev/md0 -w 5
BLKTRACESETUP: No such file or directory
Failed to start trace on /dev/md0
# rm -f *.blktrace.*
# blktrace -d /dev/md0 -w 5
BLKTRACESETUP: No such file or directory
Failed to start trace on /dev/md0
# blktrace -d /dev/md0 -w 5
BLKTRACESETUP: No such file or directory
Failed to start trace on /dev/md0
  
Actual results:
blktrace is not working

Expected results:
blktrace is working again

Additional information:
On all machines where I reproduced this bug, the /sys/kernel/debug/ filesystem contains a directory called /sys/kernel/debug/block/<device-name> (e.g. /sys/kernel/debug/block/md0). This directory is not present on machines where the file-directory replacement wasn't done.
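
The observation above can be turned into a quick check. This is only a sketch: the function name has_stale_blktrace_dir and the DEBUGFS_ROOT variable are made up for illustration and are not part of blktrace. It assumes debugfs is mounted at /sys/kernel/debug, as in the df output above, and that no trace is currently running on the device (the directory is expected to exist during an active trace).

```shell
#!/bin/sh
# Report whether a per-device blktrace directory is left under debugfs.
# DEBUGFS_ROOT and the function name are illustrative assumptions; the
# default path is the debugfs mount point shown in the report.
has_stale_blktrace_dir() {
    dev_name=$1
    root=${DEBUGFS_ROOT:-/sys/kernel/debug}
    # Succeeds (exit status 0) only if the directory exists.
    [ -d "$root/block/$dev_name" ]
}

if has_stale_blktrace_dir md0; then
    echo "leftover: ${DEBUGFS_ROOT:-/sys/kernel/debug}/block/md0"
else
    echo "no leftover directory for md0"
fi
```

If the directory is reported while no blktrace process is running, the machine is probably in the state described in this report.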
Comment 1 Eric Sandeen 2009-04-30 14:09:07 EDT
Hm, seems to work for me on /dev/sda.  Did you always test with /dev/md0?

[root@bear-05 tmp]# blktrace -d /dev/sda -w 5
Device: /dev/sda
  CPU  0:                    0 events,        4 KiB data
  CPU  1:                    0 events,        1 KiB data
  CPU  2:                    0 events,        0 KiB data
  CPU  3:                    0 events,        0 KiB data
  Total:                     0 events (dropped 0),        4 KiB data
[root@bear-05 tmp]# ls -l *.blktrace.*
-rw-r--r-- 1 root root 3248 Apr 30 13:19 sda.blktrace.0
-rw-r--r-- 1 root root  464 Apr 30 13:19 sda.blktrace.1
-rw-r--r-- 1 root root    0 Apr 30 13:19 sda.blktrace.2
-rw-r--r-- 1 root root    0 Apr 30 13:19 sda.blktrace.3
[root@bear-05 tmp]# rm sda.blktrace.1
rm: remove regular file `sda.blktrace.1'? y
[root@bear-05 tmp]# mkdir sda.blktrace.1
[root@bear-05 tmp]# blktrace -d /dev/sda -w 5
./sda.blktrace.1: Is a directory
Failed to start worker threads
[root@bear-05 tmp]# rmdir sda.blktrace.1
[root@bear-05 tmp]# blktrace -d /dev/sda -w 5
Device: /dev/sda
  CPU  0:                    0 events,        0 KiB data
  CPU  1:                    0 events,        9 KiB data
  CPU  2:                    0 events,      162 KiB data
  CPU  3:                    0 events,        0 KiB data
  Total:                     0 events (dropped 0),      171 KiB data
[root@bear-05 tmp]# rm -f *blktrace*
[root@bear-05 tmp]# blktrace -d /dev/sda -w 5
Device: /dev/sda
  CPU  0:                    0 events,        0 KiB data
  CPU  1:                    0 events,        1 KiB data
  CPU  2:                    0 events,        4 KiB data
  CPU  3:                    0 events,        0 KiB data
  Total:                     0 events (dropped 0),        5 KiB data
Comment 3 Eric Sandeen 2009-05-05 17:03:56 EDT
Maybe I should ask which kernel you're testing?  I still can't reproduce this, although I think I see a couple upstream patches which help with the proper teardown in some circumstances....

Thanks,
-Eric
Comment 5 Eric Sandeen 2009-05-06 15:34:27 EDT
http://git.engineering.redhat.com/?p=linux-2.6.git;a=commitdiff_plain;h=35fc51e7a5056889421270c1fb63d8ec45fbccf4

seems to fix this; it's a kernel change.

From: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Date: Wed, 21 Nov 2007 11:25:41 +0000 (+0100)
Subject: blktrace: Make sure BLKTRACETEARDOWN does the full cleanup.
X-Git-Tag: v2.6.24-rc4~87^2~5
X-Git-Url: http://git.engineering.redhat.com/?p=linux-2.6.git;a=commitdiff_plain;h=35fc51e7a5056889421270c1fb63d8ec45fbccf4

blktrace: Make sure BLKTRACETEARDOWN does the full cleanup.

if blktrace program segfault it will not be able
to call BLKTRACETEARDOWN. Now if we run the blktrace
again that would result in a failure to create the
block/<device> debugfs directory.This will result
in blk_remove_root() to be called which will set
blk_tree_root to NULL. But the  debugfs block dir
still exist because it contain subdirectory.

Now if we try to fix it using BLKTRACETEARDOWN
it won't work because blk_tree_root is NULL.

Fix the same.

--------

I guess we'll need an exception to get this in ...

-Eric
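
One detail in the commit message, that the block directory survives because it still contains a subdirectory, can be illustrated in userspace. This is an analogy only: an ordinary temporary directory stands in for debugfs and rmdir stands in for the kernel's removal of the block root. It does not exercise blktrace, debugfs, or blk_tree_root, and the variable names are invented for the example.

```shell
#!/bin/sh
# Userspace analogy only: a temporary directory stands in for debugfs,
# and rmdir stands in for the kernel removing the "block" root.
work=$(mktemp -d)
mkdir -p "$work/block/md0"      # leftover per-device directory

# Removing a non-empty directory fails, so "block" stays behind.
if rmdir "$work/block" 2>/dev/null; then
    root_state="removed"
else
    root_state="still present"
fi
echo "block directory: $root_state"

# Once the leftover child is gone, the parent can be removed.
rmdir "$work/block/md0"
rmdir "$work/block" && after_cleanup="removed"
echo "after cleanup: $after_cleanup"
rmdir "$work"
```

In the kernel case described above, the extra problem is that the pointer to the root is dropped while the directory remains, so a later teardown has nothing to work from.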
Comment 6 Eric Sandeen 2009-05-06 15:36:49 EDT
Requesting an exception for this one; without it, blktrace gets into a state where tracing no longer works. The change is upstream and is confined to block/blktrace.c, which can't regress since we've never supported it before ...
Comment 8 RHEL Product and Program Management 2009-05-12 13:39:21 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 11 RHEL Product and Program Management 2009-09-25 13:39:04 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 12 Don Zickus 2009-12-11 14:26:54 EST
in kernel-2.6.18-179.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please update the appropriate value in the Verified field
(cf_verified) to indicate this fix has been successfully
verified. Include a comment with verification details.
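
Before re-running the reproducer for verification, it may help to confirm that the running kernel is at least the build named above. The following is a rough sketch: the helper names are hypothetical, it only understands release strings of the form 2.6.18-<build>.<rest>, and it assumes that any build number of 179 or higher carries the fix, which this report does not state for every later or z-stream kernel.

```shell
#!/bin/sh
# Print the build number from a release string such as 2.6.18-179.el5.
# Prints nothing if the string does not have that form.
el5_release_number() {
    printf '%s\n' "$1" | sed -n 's/^2\.6\.18-\([0-9][0-9]*\)\..*$/\1/p'
}

# Succeeds if the build number is 179 or higher (assumption: later
# builds keep the fix that first appeared in kernel-2.6.18-179.el5).
has_blktrace_teardown_fix() {
    n=$(el5_release_number "$1")
    [ -n "$n" ] && [ "$n" -ge 179 ]
}

if has_blktrace_teardown_fix "$(uname -r)"; then
    echo "running kernel is 2.6.18-179.el5 or later"
else
    echo "running kernel is not identified as 2.6.18-179.el5 or later"
fi
```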
Comment 16 errata-xmlrpc 2010-03-30 03:31:52 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html
