Bug 1321875

Summary: blktrace attempts to start threads on offline CPUs
Product: Red Hat Enterprise Linux 6 Reporter: Milos Malik <mmalik>
Component: blktrace Assignee: Eric Sandeen <esandeen>
Status: CLOSED WONTFIX QA Contact: BaseOS QE Security Team <qe-baseos-security>
Severity: medium Docs Contact:
Priority: medium    
Version: 6.8 CC: hannsj_uhl
Target Milestone: rc   
Target Release: ---   
Hardware: ppc64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-05-19 17:40:04 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Milos Malik 2016-03-29 10:00:09 UTC
Description of problem:
 * either blktrace guesses the number of CPUs incorrectly or it is not able to run on more than 32 CPUs

Version-Release number of selected component (if applicable):
RHEL-6.7
blktrace-1.0.1-7.el6.ppc64
kernel-2.6.32-573.el6.ppc64
kernel-bootwrapper-2.6.32-573.el6.ppc64
kernel-devel-2.6.32-573.el6.ppc64
kernel-firmware-2.6.32-573.el6.noarch
kernel-headers-2.6.32-573.el6.ppc64

How reproducible:
always

Steps to Reproduce:
# dmesg | grep -i cpus
Partition configured for 64 cpus.
Initializing cgroup subsys cpuset
Brought up 32 CPUs
Node 0 CPUs: 0-31
# mount -t debugfs debugfs /sys/kernel/debug/
# blktrace -w 1 -d /dev/sda

Actual results:
FAILED to start thread on CPU 32: 22/Invalid argument
FAILED to start thread on CPU 33: 22/Invalid argument
FAILED to start thread on CPU 34: 22/Invalid argument
FAILED to start thread on CPU 35: 22/Invalid argument
FAILED to start thread on CPU 36: 22/Invalid argument
FAILED to start thread on CPU 37: 22/Invalid argument
FAILED to start thread on CPU 38: 22/Invalid argument
FAILED to start thread on CPU 39: 22/Invalid argument
FAILED to start thread on CPU 40: 22/Invalid argument
FAILED to start thread on CPU 41: 22/Invalid argument
FAILED to start thread on CPU 42: 22/Invalid argument
FAILED to start thread on CPU 43: 22/Invalid argument
FAILED to start thread on CPU 44: 22/Invalid argument
FAILED to start thread on CPU 45: 22/Invalid argument
FAILED to start thread on CPU 46: 22/Invalid argument
FAILED to start thread on CPU 47: 22/Invalid argument
FAILED to start thread on CPU 48: 22/Invalid argument
FAILED to start thread on CPU 49: 22/Invalid argument
FAILED to start thread on CPU 50: 22/Invalid argument
FAILED to start thread on CPU 51: 22/Invalid argument
FAILED to start thread on CPU 52: 22/Invalid argument
FAILED to start thread on CPU 53: 22/Invalid argument
FAILED to start thread on CPU 54: 22/Invalid argument
FAILED to start thread on CPU 55: 22/Invalid argument
FAILED to start thread on CPU 56: 22/Invalid argument
FAILED to start thread on CPU 57: 22/Invalid argument
FAILED to start thread on CPU 58: 22/Invalid argument
FAILED to start thread on CPU 59: 22/Invalid argument
FAILED to start thread on CPU 60: 22/Invalid argument
FAILED to start thread on CPU 61: 22/Invalid argument
FAILED to start thread on CPU 62: 22/Invalid argument
FAILED to start thread on CPU 63: 22/Invalid argument

Expected results:
# cat /sys/devices/system/cpu/online
0-31
# cat /sys/devices/system/cpu/offline 
32-1023

* blktrace runs on all CPUs which are online and it doesn't complain about the others
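
For illustration only: the sysfs files shown above use the kernel's CPU-list format ("0-31", "32-1023"). A minimal sketch of counting the CPUs in such a string follows. The helper name count_cpus_in_list is hypothetical and is not part of blktrace; it assumes a comma-separated list of single CPU numbers and first-last ranges, optionally ending in a newline.

```c
#include <stdlib.h>

/*
 * Hypothetical helper (not blktrace code): count the CPUs named in a
 * kernel CPU-list string such as "0-31" or "0-3,8,10-11".
 * Returns the number of CPUs, or -1 if the string is malformed.
 */
static int count_cpus_in_list(const char *list)
{
    int count = 0;
    const char *p = list;

    while (*p && *p != '\n') {
        char *end;
        long first = strtol(p, &end, 10);
        long last = first;

        if (end == p || first < 0)
            return -1;              /* expected a non-negative number */
        p = end;

        if (*p == '-') {            /* range of the form first-last */
            last = strtol(p + 1, &end, 10);
            if (end == p + 1 || last < first)
                return -1;
            p = end;
        }

        count += (int)(last - first + 1);

        if (*p == ',')
            p++;                    /* move on to the next entry */
        else if (*p && *p != '\n')
            return -1;              /* unexpected trailing character */
    }
    return count;
}
```

With the values from this report, "0-31" gives 32 online CPUs and "32-1023" gives 992 offline CPU slots.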

Comment 3 Eric Sandeen 2016-03-29 16:06:39 UTC
So the root problem is that it is trying to start threads on offline CPUs?

Comment 4 Eric Sandeen 2016-03-29 16:23:09 UTC
commit 80c4041b2e7a7d5afb75df563bf51bb27773c095
Author: Abutalib Aghayev <agayev>
Date:   Tue Feb 9 08:17:50 2016 -0700

    blktrace: Use number of online CPUs
    
    Currently, blktrace uses _SC_NPROCESSORS_CONF to find out the number of
    CPUs.  This is a problem, because if you reduce the number of online
    CPUs by passing kernel parameter maxcpus, then blktrace fails to start
    with the error:
    
    FAILED to start thread on CPU 4: 22/Invalid argument
    FAILED to start thread on CPU 5: 22/Invalid argument
    ...
    
    The attached patch fixes it to use _SC_NPROCESSORS_ONLN.
    
    Signed-off-by: Jens Axboe <axboe>
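
As a rough illustration of the difference the commit message describes, the sketch below queries both sysconf() values. The function names are hypothetical and this is not the actual blktrace patch; it only shows the two standard sysconf() names side by side, as provided by glibc on Linux.

```c
#include <unistd.h>

/* CPUs configured on the system; this may include offline CPUs. */
static long configured_cpus(void)
{
    return sysconf(_SC_NPROCESSORS_CONF);
}

/* CPUs that are currently online and can run threads. */
static long online_cpus(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}
```

On the machine in this report the first call would be expected to report more CPUs than the second, since only CPUs 0-31 are online. A tool that starts one thread per CPU index below the configured count would therefore also try the offline CPUs.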

Comment 5 Karel Srot 2016-04-01 06:28:00 UTC
Postponing to rhel-6.9 as it is too late to address this bug in rhel-6.8.

Comment 6 Eric Sandeen 2016-05-19 17:40:04 UTC
Sadly I don't think we'll be able to ship this fix at this point in rhel6.