Bug 1627950

Summary: kernel 4.19.0-0.rc2.git3.1.fc30 - circular locking in cpufreq
Product: [Fedora] Fedora
Reporter: Michal Jaegermann <michal.jnn>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: NEW ---
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: rawhide
CC: airlied, bskeggs, ewk, hdegoede, ichavero, itamar, jarodwilson, jcline, jglisse, john.j5live, jonathan, josef, kernel-maint, linville, mchehab, mjg59, steved
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-01-09 23:52:19 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
WARNING fragment with dependency chain from journal (Flags: none)
circular locking trace from 4.20.0-0.rc6.git2.1.fc30.x86_64 (Flags: none)
circular locking trace from 5.0.0-0.rc4.git2.1.fc30.x86_64 (Flags: none)
circular locking trace from 5.5.0-0.rc2.git1.1.fc32.x86_64 (Flags: none)

Description Michal Jaegermann 2018-09-12 00:02:04 UTC
Created attachment 1482499 [details]
WARNING fragment with dependency chain from journal

Description of problem:

The following appears in the journal:


WARNING: possible circular locking dependency detected
 4.19.0-0.rc2.git3.1.fc30.x86_64 #1 Not tainted
 ------------------------------------------------------
 kworker/0:3/101 is trying to acquire lock:
 000000003b2babfd ((work_completion)(&wfc.work)){+.+.}, at: __flush_work+0x28c/0x320
 
 but task is already holding lock:
 0000000084d461de (&policy_dbs->update_mutex){+.+.}, at: dbs_work_handler+0x2b/0x70
 
 which lock already depends on the new lock.
.....

This is immediately followed by a stack backtrace (no idea if this is an independent event) and "tsc: Marking TSC unstable".

In any case, the relevant fragment of the journal is in the attachment.

Version-Release number of selected component (if applicable):
4.19.0-0.rc2.git3.1.fc30


Additional info:
The kernel issues warnings but still boots

Comment 1 Jeremy Cline 2018-09-14 14:52:51 UTC
Hi Michal,

Thanks for the report, I've sent an email to upstream and they're looking into it.

Comment 2 Michal Jaegermann 2018-10-08 20:43:03 UTC
Just for the record: so far 4.19.0-0.rc6.git4.1.fc30.x86_64 shows the same circular locking dependency.

Comment 3 Michal Jaegermann 2018-12-17 23:34:09 UTC
Created attachment 1515181 [details]
circular locking trace from 4.20.0-0.rc6.git2.1.fc30.x86_64

This problem still shows up (as of kernel-4.20.0-0.rc6.git2.1.fc30.x86_64). The journal shows:

Chain exists of:
   (work_completion)(&wfc.work) --> gov_dbs_data_mutex --> &policy_dbs->update_mutex
  Possible unsafe locking scenario:
        CPU0                    CPU1
        ----                    ----
   lock(&policy_dbs->update_mutex);
                                lock(gov_dbs_data_mutex);
                                lock(&policy_dbs->update_mutex);
   lock((work_completion)(&wfc.work));
 
  *** DEADLOCK ***
 3 locks held by kworker/0:4/161:
  #0: 00000000ebe4492c ((wq_completion)"events"){+.+.}, at: process_one_work+0x1f3/0x600
  #1: 00000000385d9a6d ((work_completion)(&policy_dbs->work)){+.+.}, at: process_one_work+0x1f3/0x600
  #2: 000000001cc467d2 (&policy_dbs->update_mutex){+.+.}, at: dbs_work_handler+0x2b/0x70

On my testbox there is only CPU0.  Should I count myself lucky?  More details attached.
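
On the single-CPU question, a toy sketch may help. It assumes the general lockdep approach of recording the order in which locks are taken and warning when a newly observed order would close a cycle, so a report of this kind can fire without any deadlock having occurred. The class `LockOrderGraph` and its methods are hypothetical names for illustration, not kernel code; only the three lock names are taken from the trace above.

```python
class LockOrderGraph:
    """Toy lock-order graph: an edge A -> B means "B was acquired while A was held"."""

    def __init__(self):
        self.edges = {}

    def _reachable(self, src, dst):
        # Iterative depth-first search from src, looking for dst.
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges.get(node, ()))
        return False

    def acquire(self, held, new):
        """Record "new acquired while held"; return True if that closes a cycle."""
        closes_cycle = self._reachable(new, held)
        self.edges.setdefault(held, set()).add(new)
        return closes_cycle


# Lock names copied from the "Chain exists of:" line in the trace.
WFC = "(work_completion)(&wfc.work)"
GOV = "gov_dbs_data_mutex"
UPD = "&policy_dbs->update_mutex"

graph = LockOrderGraph()
first = graph.acquire(WFC, GOV)   # existing dependency from the chain
second = graph.acquire(GOV, UPD)  # existing dependency from the chain
third = graph.acquire(UPD, WFC)   # kworker holds update_mutex, wants wfc.work
print(first, second, third)       # prints: False False True
```

The third call is the one that corresponds to the WARNING: the check looks only at recorded ordering, so the number of CPUs does not enter into it.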

Comment 4 Michal Jaegermann 2019-01-09 23:52:19 UTC
After booting 5.0.0-0.rc1.git0.1.fc30.x86_64 I do not see "possible circular locking dependency detected", so maybe this is gone? If I see it again I will reopen this bug.

Comment 5 Jeremy Cline 2019-01-10 18:11:24 UTC
Hey Michal,

We build the RC kernels without debugging configurations set so that's likely why you don't see it. Kernels with "rcN.git0" are the RC kernels. If you install the debug kernel or use an "rcN.gitN" kernel (such as https://koji.fedoraproject.org/koji/buildinfo?buildID=1178473) do you still see it? If so, I'll ping upstream again.
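
The naming rule above can be sketched as a small check. This is only an illustration of the "rcN.git0" versus "rcN.gitN" convention as described in this comment; the function name `is_debug_snapshot` is hypothetical, it does not inspect the actual kernel configuration, and it does not cover the separately packaged debug kernel.

```python
import re

# Matches the ".rcN.gitM." part of a release string such as
# "5.0.0-0.rc1.git2.1.fc30.x86_64".
_RC_GIT = re.compile(r"\.rc(\d+)\.git(\d+)\.")


def is_debug_snapshot(release):
    """Return True for "rcN.gitM" with M > 0, False for "rcN.git0",
    and None when the string carries no rcN.gitM tag at all."""
    match = _RC_GIT.search(release)
    if match is None:
        return None
    return int(match.group(2)) > 0


for release in (
    "5.0.0-0.rc1.git0.1.fc30.x86_64",
    "5.0.0-0.rc1.git2.1.fc30.x86_64",
    "4.19.0-0.rc2.git3.1.fc30",
):
    print(release, is_debug_snapshot(release))
```

By this rule, 5.0.0-0.rc1.git0.1.fc30.x86_64 is a plain RC build, so the absence of the lockdep warning there says nothing about whether the problem is fixed.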

Comment 6 Michal Jaegermann 2019-01-10 19:06:30 UTC
(In reply to Jeremy Cline from comment #5)
>
> If you install the debug kernel or use an "rcN.gitN" kernel (such as
> https://koji.fedoraproject.org/koji/buildinfo?buildID=1178473) do you still
> see it? If so, I'll ping upstream again.

Ah, ok.  My mistake with getting too enthusiastic on a non-debugging kernel.  Indeed, when booting 5.0.0-0.rc1.git2.1.fc30.x86_64 I see the same "WARNING: possible circular locking dependency detected" with traces basically the same as before with minor differences here and there like:

   worker_thread+0x3c/0x390
-  ? drain_workqueue+0x180/0x180
+  ? process_one_work+0x600/0x600
   kthread+0x120/0x140
-  ? kthread_park+0x80/0x80
+  ? kthread_create_on_node+0x60/0x60
   ret_from_fork+0x3a/0x50

Booting with 'tsc=unstable' does not affect that; only "clocksource" complaints are gone.

If trace information from 5.0.0-0.rc1.git2.1.fc30.x86_64 is useful/needed please let me know.

Comment 7 Michal Jaegermann 2019-02-02 23:02:55 UTC
Created attachment 1526319 [details]
circular locking trace from 5.0.0-0.rc4.git2.1.fc30.x86_64

With kernel-5.0.0-0.rc4.git2.1.fc30.x86_64 I see additional information which looks like a debugging output.  After

  ret_from_fork+0x3a/0x50

there is now

  ------------[ cut here ]------------
   downgrading a read lock
   WARNING: CPU: 0 PID: 639 at kernel/locking/lockdep.c:3553 lock_downgrade+0x13b/0x1c0

followed by register dumps and other traces.  The whole thing is attached.

Comment 8 Michal Jaegermann 2019-12-18 23:18:44 UTC
Created attachment 1646227 [details]
circular locking trace from 5.5.0-0.rc2.git1.1.fc32.x86_64

Just for the record, now with an fc32 kernel.