Bug 236430 - nfsd: BUG: using smp_processor_id() in preemptible code
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: realtime-kernel
Version: 1.0
Hardware: All
OS: Linux
Priority: medium
Severity: urgent
Assigned To: Clark Williams
Depends On:
Blocks:
Reported: 2007-04-13 16:52 EDT by IBM Bug Proxy
Modified: 2008-02-27 14:58 EST

Fixed In Version: 2.6.21-31.el5rt
Doc Type: Bug Fix
Last Closed: 2007-06-20 12:20:17 EDT


Attachments
breaknfs testcase (1.55 KB, text/plain)
2007-04-13 16:54 EDT, IBM Bug Proxy
fix_drain_array.patch (510 bytes, text/plain)
2007-04-17 01:55 EDT, IBM Bug Proxy
fix_drain_array.patch (651 bytes, text/plain)
2007-04-18 00:55 EDT, IBM Bug Proxy


External Trackers
IBM Linux Technology Center 33845
Description IBM Bug Proxy 2007-04-13 16:52:47 EDT
LTC Owner is: ankigarg@in.ibm.com
LTC Originator is: ankigarg@in.ibm.com


Problem description:
Found the following in dmesg when running the breaknfs testcase:

NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
NFSD: starting 90-second grace period
BUG: using smp_processor_id() in preemptible [00000000] code: nfsd/2852
caller is drain_array+0x25/0x132

Call Trace:
 [<ffffffff8026d828>] dump_trace+0xbd/0x3d8
 [<ffffffff8026db87>] show_trace+0x44/0x6d
 [<ffffffff8026ddc8>] dump_stack+0x13/0x15
 [<ffffffff80355e2e>] debug_smp_processor_id+0xe3/0xf1
 [<ffffffff802e01b5>] drain_array+0x25/0x132
 [<ffffffff802e07ac>] __cache_shrink+0xd5/0x1a6
 [<ffffffff802e0a47>] kmem_cache_destroy+0x6c/0xe3
 [<ffffffff8846c809>] :nfsd:nfsd4_free_slab+0x16/0x21
 [<ffffffff8846c824>] :nfsd:nfsd4_free_slabs+0x10/0x36
 [<ffffffff8846daf9>] :nfsd:nfs4_state_shutdown+0x1a2/0x1ae
 [<ffffffff884570f8>] :nfsd:nfsd_last_thread+0x47/0x76
 [<ffffffff8839463a>] :sunrpc:svc_destroy+0x8d/0xd1
 [<ffffffff88394738>] :sunrpc:svc_exit_thread+0xba/0xc6
 [<ffffffff8845795f>] :nfsd:nfsd+0x2a3/0x2b8
 [<ffffffff802600f8>] child_rip+0xa/0x12

---------------------------
| preempt count: 00000001 ]
| 1-level deep critical section nesting:
----------------------------------------
.. [<ffffffff80355ddb>] .... debug_smp_processor_id+0x90/0xf1
.....[<ffffffff802e01b5>] ..   ( <= drain_array+0x25/0x132)

BUG: nfsd:2852 task might have lost a preemption check!

Call Trace:
 [<ffffffff8026d828>] dump_trace+0xbd/0x3d8
 [<ffffffff8026db87>] show_trace+0x44/0x6d
 [<ffffffff8026ddc8>] dump_stack+0x13/0x15
 [<ffffffff8028be9b>] preempt_enable_no_resched+0x5c/0x5e
 [<ffffffff80355e33>] debug_smp_processor_id+0xe8/0xf1
 [<ffffffff802e01b5>] drain_array+0x25/0x132
 [<ffffffff802e07ac>] __cache_shrink+0xd5/0x1a6
 [<ffffffff802e0a47>] kmem_cache_destroy+0x6c/0xe3
 [<ffffffff8846c809>] :nfsd:nfsd4_free_slab+0x16/0x21
 [<ffffffff8846c824>] :nfsd:nfsd4_free_slabs+0x10/0x36
 [<ffffffff8846daf9>] :nfsd:nfs4_state_shutdown+0x1a2/0x1ae
 [<ffffffff884570f8>] :nfsd:nfsd_last_thread+0x47/0x76
 [<ffffffff8839463a>] :sunrpc:svc_destroy+0x8d/0xd1
 [<ffffffff88394738>] :sunrpc:svc_exit_thread+0xba/0xc6
 [<ffffffff8845795f>] :nfsd:nfsd+0x2a3/0x2b8
 [<ffffffff802600f8>] child_rip+0xa/0x12

---------------------------
| preempt count: 00000000 ]
| 0-level deep critical section nesting:
----------------------------------------

nfsd: last server has exited
nfsd: unexporting all filesystems


Provide output from "uname -a", if possible:
Linux llm38.in.ibm.com 2.6.20-0119.rt8 #1 SMP PREEMPT Thu Feb 15 15:53:15 CET
2007 x86_64 x86_64 x86_64 GNU/Linux

Is this reproducible? Yes, reproduced by running the breaknfs test.
The breaknfs testcase was pulled from lkml:
http://www.ussg.iu.edu/hypermail/linux/kernel/0603.0/0378.html 
 

Is the system (not just the application) hung? No.


A similar error was reported earlier on the linux-rt-users mailing list:

http://www.mail-archive.com/linux-rt-users@vger.kernel.org/msg00240.html



As is clear from comment #5, the problem is not in nfsd itself.
kmem_cache_destroy() invokes drain_array(), which calls smp_processor_id().
When CONFIG_DEBUG_PREEMPT is enabled, the call to smp_processor_id() becomes a
call to debug_smp_processor_id(), which prints this BUG report whenever it is
invoked with preempt_count == 0, i.e. from preemptible context, as indicated
by 'preemptible [00000000]'.
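
For reference, a simplified sketch of the check that debug_smp_processor_id()
performs when CONFIG_DEBUG_PREEMPT is enabled (illustrative only, condensed
from lib/smp_processor_id.c; the real function also handles the
single-CPU-affinity and early-boot cases and rate-limits the report):

#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/smp.h>
#include <linux/irqflags.h>

unsigned int debug_smp_processor_id(void)
{
	unsigned long preempt_count = preempt_count();
	int this_cpu = raw_smp_processor_id();

	/* Preemption disabled: we cannot be migrated, the result is safe. */
	if (likely(preempt_count))
		return this_cpu;

	/* Interrupts disabled also pins us to the current CPU. */
	if (irqs_disabled())
		return this_cpu;

	/*
	 * Otherwise the caller is preemptible and may be migrated at any
	 * time, so the CPU number it gets back may already be stale.
	 */
	printk(KERN_ERR "BUG: using smp_processor_id() in preemptible "
	       "[%08lx] code: %s/%d\n",
	       preempt_count, current->comm, current->pid);
	dump_stack();

	return this_cpu;
}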

One thing to note here is that the RHEL5 RT kernel config has the
DEBUG_PREEMPT option enabled by default. We will need to look at the areas of
code affected by this option to see how it might affect timings/latencies.

Trying a kernel with the call to smp_processor_id() in drain_array() changed
to raw_smp_processor_id(). This would not be a proper solution, though.
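
For illustration only (this is not the attached patch), the workaround being
tried amounts to a change of this shape inside drain_array() in mm/slab.c:

	/*
	 * Workaround sketch: raw_smp_processor_id() bypasses the
	 * CONFIG_DEBUG_PREEMPT check, so the BUG report goes away, but the
	 * CPU number can still go stale if the caller is migrated, which is
	 * why this is not considered a real fix.
	 */
	this_cpu = raw_smp_processor_id();	/* was: smp_processor_id() */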
Comment 1 IBM Bug Proxy 2007-04-13 16:54:10 EDT
Created attachment 152589 [details]
breaknfs testcase
Comment 2 IBM Bug Proxy 2007-04-14 11:15:28 EDT
----- Additional Comments From sripathi@in.ibm.com (prefers email at sripathik@in.ibm.com)  2007-04-14 11:11 EDT -------
I recreated this problem in another, easier way. I just started oprofile and
stopped it. I then saw the following message in dmesg. Though nfsd is not
involved here, it seems to be the same problem.

BUG: using smp_processor_id() in preemptible [00000000] code: oprofiled/29202
caller is drain_array+0x25/0x132

Call Trace:
 [<ffffffff8026d828>] dump_trace+0xbd/0x3d8
 [<ffffffff8026db87>] show_trace+0x44/0x6d
 [<ffffffff8026ddc8>] dump_stack+0x13/0x15
 [<ffffffff80355e2e>] debug_smp_processor_id+0xe3/0xf1
 [<ffffffff802e01b5>] drain_array+0x25/0x132
 [<ffffffff802e07ac>] __cache_shrink+0xd5/0x1a6
 [<ffffffff802e0a47>] kmem_cache_destroy+0x6c/0xe3
 [<ffffffff80314be9>] dcookie_unregister+0x109/0x121
 [<ffffffff8843af63>] :oprofile:event_buffer_release+0x1f/0x43
 [<ffffffff80212992>] __fput+0xc3/0x185
 [<ffffffff8022e537>] fput+0x14/0x16
 [<ffffffff802252cd>] filp_close+0x63/0x6e
 [<ffffffff80239cac>] put_files_struct+0x69/0xb8
 [<ffffffff802158a8>] do_exit+0x27d/0x8ad
 [<ffffffff8024af12>] cpuset_exit+0x0/0x70
 [<ffffffff8024ea66>] sys_exit_group+0x12/0x14
 [<ffffffff8025f42c>] tracesys+0x151/0x1be
 [<00000034ece948cf>]

---------------------------
| preempt count: 00000001 ]
| 1-level deep critical section nesting:
----------------------------------------
.. [<ffffffff80355ddb>] .... debug_smp_processor_id+0x90/0xf1
.....[<ffffffff802e01b5>] ..   ( <= drain_array+0x25/0x132) 
Comment 3 IBM Bug Proxy 2007-04-15 05:00:42 EDT
changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|RH236430 - nfsd: BUG: using |RH236430 - BUG: using
                   |smp_processor_id() in       |smp_processor_id() in
                   |preemptible code            |preemptible code




------- Additional Comments From ankigarg@in.ibm.com (prefers email at ankita@in.ibm.com)  2007-04-15 04:52 EDT -------
Changing summary to make it more general 
Comment 4 IBM Bug Proxy 2007-04-17 01:55:48 EDT
Created attachment 152770 [details]
fix_drain_array.patch
Comment 5 IBM Bug Proxy 2007-04-17 01:56:03 EDT
----- Additional Comments From ankigarg@in.ibm.com (prefers email at ankita@in.ibm.com)  2007-04-17 01:50 EDT -------
 
fix drain_array in mm/slab.c

With this patch, I have not seen the BUG report in 24 hours of the test run.
Working on investigating if this change would affect the kmem_cache_t control
path. 
Comment 6 IBM Bug Proxy 2007-04-17 07:40:34 EDT
----- Additional Comments From ankigarg@in.ibm.com (prefers email at ankita@in.ibm.com)  2007-04-17 07:38 EDT -------
This patch is just a workaround and does not fix the underlying problem. So,
investigating further. 
Comment 7 IBM Bug Proxy 2007-04-18 00:55:48 EDT
Created attachment 152870 [details]
fix_drain_array.patch
Comment 8 IBM Bug Proxy 2007-04-18 00:56:03 EDT
----- Additional Comments From ankigarg@in.ibm.com (prefers email at ankita@in.ibm.com)  2007-04-18 00:52 EDT -------
 
drain_array_fix.patch

This patch fixes the BUG in the drain_array() function. Found that there is no
need to call smp_processor_id() from drain_array(), since the call to
slab_spin_lock_irq() already fills in the value of this_cpu using
raw_smp_processor_id().
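
For context, a schematic of the kind of change described above (the attached
fix_drain_array.patch is the authoritative version; the function signature is
the mainline 2.6.20 one and the unlock helper name is assumed to mirror the
lock helper):

static void drain_array(struct kmem_cache *cachep, struct kmem_list3 *l3,
			struct array_cache *ac, int force, int node)
{
	int this_cpu;

	/*
	 * Old code: this_cpu = smp_processor_id();
	 * That lookup is what triggered the DEBUG_PREEMPT BUG report, and it
	 * is redundant: the -rt slab_spin_lock_irq() helper fills in this_cpu
	 * itself using raw_smp_processor_id() while taking the list lock, so
	 * the value is obtained with the lock held.
	 */
	slab_spin_lock_irq(&l3->list_lock, this_cpu);

	/* ... drain ac->entry[] as before, using this_cpu ... */

	slab_spin_unlock_irq(&l3->list_lock, this_cpu);
}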
Comment 9 IBM Bug Proxy 2007-04-18 09:35:30 EDT
----- Additional Comments From ankigarg@in.ibm.com (prefers email at ankita@in.ibm.com)  2007-04-18 09:30 EDT -------
Posted patch to lkml and Ingo 
Comment 10 Tim Burke 2007-04-18 22:03:53 EDT
Thanks for investigating and proposing a fix!

Assigning to Ingo.
We'll discuss w/ him on this Friday's internal call.
Comment 11 IBM Bug Proxy 2007-04-19 01:05:29 EDT
----- Additional Comments From ankigarg@in.ibm.com (prefers email at ankita@in.ibm.com)  2007-04-19 01:02 EDT -------
Tim, Ingo has accepted the fix! 
Comment 12 Tim Burke 2007-04-26 18:14:44 EDT
OK, reassigning this to Clark to throw it into the patch heap.
Comment 13 Clark Williams 2007-04-30 11:26:34 EDT
The patch should be in any kernel-rt above 2.6.20-25. I believe it's also in all
the 2.6.21-rtX series.

We'll let this ride for a while and if no one sees it in a week or so, we'll
declare victory and close it. 
