Bug 630464

Summary: WARNING: at drivers/char/tty_io.c:1325, only when booting with systemd
Product: [Fedora] Fedora Reporter: Michal Jaegermann <michal>
Component: kernelAssignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: low    
Version: rawhideCC: anton, bruno, dougsland, gansalmon, itamar, jonathan, kernel-maint, kmcmartin, lpoetter, madhu.chinakonda, orion, ozan.caglayan, zkabelac
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-11-29 14:22:32 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
dmesg for 2.6.36.1-8.rc1.fc15.x86_64 with tty_io.c:1325 warnings
none
dmesg for 2.6.37-0.rc3.git0.1.tty1.fc15.x86_64 with tty_io.c:1331 warnings
none
dmesg for 2.6.36.1-10.fc15.x86_64 with tty_io.c:1329 warnings none

Description Michal Jaegermann 2010-09-05 19:02:00 UTC
Description of problem:

The following shows up in dmesg output when booting with systemd:
------------[ cut here ]------------
WARNING: at drivers/char/tty_io.c:1325 tty_open+0x287/0x484()
Hardware name: To Be Filled By O.E.M.
Modules linked in: ext2 uinput snd_via82xx snd_ac97_codec ac97_bus snd_seq snd_pcm snd_timer snd_page_alloc snd_mpu401 snd_mpu401_uart snd_rawmidi e100 ppdev microcode ns558 snd_seq_device edac_core snd parport_pc edac_mce_amd k8temp i2c_viapro gameport soundcore skge parport mii shpchp ipv6 ata_generic pata_acpi firewire_ohci firewire_core sata_via sata_promise pata_via crc_itu_t radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
Pid: 115, comm: plymouthd Not tainted 2.6.36-0.16.rc3.git0.fc15.x86_64 #1
Call Trace:
 [<ffffffff81051090>] warn_slowpath_common+0x85/0x9d
 [<ffffffff810510c2>] warn_slowpath_null+0x1a/0x1c
 [<ffffffff812d2ba4>] tty_open+0x287/0x484
 [<ffffffff8112e46c>] chrdev_open+0x122/0x14b
 [<ffffffff8112e34a>] ? chrdev_open+0x0/0x14b
 [<ffffffff81128f1f>] __dentry_open+0x1c2/0x338
 [<ffffffff81213b08>] ? devcgroup_inode_permission+0xec/0x115
 [<ffffffff81129da0>] nameidata_to_filp+0x3f/0x50
 [<ffffffff81136200>] do_last+0x432/0x5af
 [<ffffffff81246a75>] ? __raw_spin_lock_init+0x31/0x50
 [<ffffffff811365ad>] do_filp_open+0x230/0x5e1
 [<ffffffff810fac10>] ? might_fault+0x5c/0xac
 [<ffffffff8114040b>] ? alloc_fd+0x3b/0x17c
 [<ffffffff8107fc2a>] ? lock_release+0x19a/0x1a6
 [<ffffffff8114053a>] ? alloc_fd+0x16a/0x17c
 [<ffffffff81129e11>] do_sys_open+0x60/0xfc
 [<ffffffff8149acb2>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff81129ecd>] sys_open+0x20/0x22
 [<ffffffff81009cb2>] system_call_fastpath+0x16/0x1b
---[ end trace aa2b223ac82cbea9 ]---

I did not notice anything of that sort when booting with 'init=/sbin/upstart'.

Version-Release number of selected component (if applicable):
kernel-2.6.36-0.16.rc3.git0.fc15.x86_64

How reproducible:
every time when systemd was used as init

Comment 1 Chuck Ebbert 2010-09-06 14:28:55 UTC
That's:

    WARN_ON(!test_bit(TTY_LDISC, &tty->flags));

And there are some traces of calls to cgroup-related functions in the stacktrace, so I wonder if that has something to do with this error. (Upstart doesn't use cgroups, but systemd does.)

Comment 2 Kyle McMartin 2010-11-22 19:49:06 UTC
http://koji.fedoraproject.org/koji/taskinfo?taskID=2617334

Could you try this build please, and let me know if the WARN_ON still triggers?

Comment 3 Kyle McMartin 2010-11-22 19:52:18 UTC
*** Bug 523640 has been marked as a duplicate of this bug. ***

Comment 4 Michal Jaegermann 2010-11-22 22:20:17 UTC
(In reply to comment #2)
> http://koji.fedoraproject.org/koji/taskinfo?taskID=2617334
> 
> Could you try this build please, and let me know if the WARN_ON still triggers?

Yes, it does.  dmesg from a boot with this kernel, 2.6.36.1-8.rc1.fc15.x86_64, is attached.

OTOH it does not trigger when booting with 'init=/sbin/upstart' or when booting 
2.6.36-5.fc15.x86_64 (with or without switching to upstart) curiously enough.

Comment 5 Michal Jaegermann 2010-11-22 22:22:17 UTC
Created attachment 462164 [details]
dmesg for 2.6.36.1-8.rc1.fc15.x86_64 with tty_io.c:1325 warnings

Comment 6 Kyle McMartin 2010-11-22 22:53:29 UTC
OK, thanks. I'm not seeing it with 2.6.37-rc3, and Bill reports he's not seeing it with 2.6.36-5, so I'm really confused as to what's going on here.

Comment 7 Michal Jaegermann 2010-11-22 23:01:01 UTC
(In reply to comment #2)
> Could you try this build please, and let me know if the WARN_ON still triggers?

Apparently there is another way to kill that warning.  Booting with
cgroup_disable=memory appears to be doing that as well; although I am not
completely sure if a pretty consistent warning was not converted into a race.

OTOH with 2.6.36-5.fc15.x86_64 this 'cgroup_disable=memory' does not seem to be
needed.

> Bill reports he's not seeing it with 2.6.36-5

Well, I reported that too. :-)

Comment 8 Kyle McMartin 2010-11-22 23:04:00 UTC
I don't understand... the only interesting thing in -5 was me reverting the drm rebase that came in at -3 (but we've been seeing this for a lot longer.)

:(

--Kyle

Comment 9 Ozan Çağlayan 2010-11-23 17:17:05 UTC
Just to give an idea, 

I'm seeing this on 2.6.36 constantly in a Non-Fedora x86_64 VBox installation. The tainted process is plymouthd just like this one, cgroups are enabled in kernel. But I'm not using systemd for booting.

I'll post my trace tomorrow.

Comment 10 Michal Jaegermann 2010-11-23 17:48:30 UTC
(In reply to comment #9)
> But I'm not using systemd for booting.

I do not try to imply that systemd is "at fault".  It simply does something that, in my case, triggers a kernel action resulting in that warning.  See comment #1 with a conclusion additionaly supported by that that 'cgroup_disable=memory' also makes that warning, for me, to disappear.

I would be really surprised if other triggers would not exist.

Comment 11 Kyle McMartin 2010-11-24 22:03:51 UTC
I've put a debug patch into the kernel to print out which tty device is being accessed when it fails, so hopefully we can chase this down.

I noticed a message on LKML today that reproduces it with a pretty basic X25 ldisc test program attached, hopefully it will get some traction. I can count something like 5 independent threads about this in the recent few weeks.

Comment 12 Kyle McMartin 2010-11-24 22:14:58 UTC
http://lkml.org/lkml/2010/11/24/511

Building a kernel now for you to test with that patch applied.

Comment 13 Kyle McMartin 2010-11-24 22:21:54 UTC
http://kyle.fedorapeople.org/kernel/2.6.37-0.rc3.git0.1.tty1.fc15/

Please poke that and let me know if it falls over.

Comment 14 Michal Jaegermann 2010-11-24 23:49:31 UTC
(In reply to comment #13)
> Please poke that and let me know if it falls over.

Hi Kyle!  It did not trigger that warning on the first boot after it was installed but on the second try it happened.  Used to be more consistent than that. :-)

I am not sure what your debugging patch was supposed to print but I could not detect any traces of it in an output.

Comment 15 Michal Jaegermann 2010-11-24 23:52:08 UTC
Created attachment 462779 [details]
dmesg for 2.6.37-0.rc3.git0.1.tty1.fc15.x86_64 with tty_io.c:1331 warnings

Comment 16 Michal Jaegermann 2010-11-25 00:28:30 UTC
(In reply to comment #13)
> http://kyle.fedorapeople.org/kernel/2.6.37-0.rc3.git0.1.tty1.fc15/

Well, there is something new with 2.6.37-0.rc3.git0.1.tty1.fc15.x86_64.  When I tried to power a machine off it did not.  I had to push a power switch. Other recent kernels, including 2.6.36.1-8.rc1.fc15.x86_64, with the current systemd do power down when asked.

Comment 17 Kyle McMartin 2010-11-25 02:54:28 UTC
Darn, back to the drawing board, that looked somewhat plausible though.

Comment 18 Michal Jaegermann 2010-11-25 03:13:37 UTC
(In reply to comment #17)
> Darn, back to the drawing board, that looked somewhat plausible though.

Just thinking loud.  You diagostic did not fire up (I assume that there is one). So maybe something gets overwritten and this only looks like an issue in tty_open()?

Comment 19 Kyle McMartin 2010-11-25 03:53:27 UTC
No, I copied the patch into the wrong tree and built an old kernel. It'll be in the next one.

Comment 20 Ozan Çağlayan 2010-11-25 12:46:27 UTC
I tried your patch with 2.6.36.1 and booted with plymouth enabled. The boot tty of plymouth is /dev/tty7 and here's the output:

tty_reopen: !test_bit(TTY_LDISC, &tty->flags) dev=tty7

Comment 21 Kyle McMartin 2010-11-25 13:24:21 UTC
I'll put money on it being DRM related and going away if you use 'nomodeset' :)

I'll poke airlied.

Comment 22 Ozan Çağlayan 2010-11-25 14:38:21 UTC
Eventhough it is VBox setup with no KMS drivers at all, i tried nomodeset but the trace still exists.

Comment 23 Kyle McMartin 2010-11-25 16:16:38 UTC
Bah. Who the hell knows. I'll revert the BKL removal and see if that makes it go away...

http://kyle.fedorapeople.org/kernel/2.6.36.1-9.tty2/

Comment 24 Kyle McMartin 2010-11-25 17:39:08 UTC
http://kyle.fedorapeople.org/kernel/2.6.36.1-10.fc15-tty/x86_64/

Actually, try this instead, please. I think it should fix the issue.

Comment 25 Ozan Çağlayan 2010-11-25 18:16:20 UTC
Can you tell me what to revert (upstream commit id(s) would be enough) or an SRPM of the kernel you've provided, because I'm not using fedora.

Thanks,

Comment 26 Kyle McMartin 2010-11-25 18:59:10 UTC
http://lkml.org/lkml/2010/11/25/284

Comment 27 Michal Jaegermann 2010-11-25 19:56:54 UTC
(In reply to comment #24)
> http://kyle.fedorapeople.org/kernel/2.6.36.1-10.fc15-tty/x86_64/
> 
> Actually, try this instead, please. I think it should fix the issue.

Sorry! I got TWO hits with this kernel on the first try.  The only difference is that now 'cgroup_disable=memory' does not help.  I got only one hit with that, though; the one after "EXT3-fs (sda11): using internal journal" line (see attached dmesg output).  The following line:

tty_reopen: !test_bit(TTY_LDISC, &tty->flags) dev=tty1 ldisc=n_tty

is an effect of diagnostic additions, I gather?

It does power off again.  That is good.

Comment 28 Michal Jaegermann 2010-11-25 19:58:29 UTC
Created attachment 462958 [details]
dmesg for 2.6.36.1-10.fc15.x86_64 with tty_io.c:1329 warnings

Comment 29 Kyle McMartin 2010-11-25 20:11:32 UTC
Damn, since that didn't fix it can you try the one in comment #23?

What a bizarro bug...

(Not surprised about the cgroup thing, it was a red herring from the start. This is some kind of uber subtle race...)

Comment 30 Michal Jaegermann 2010-11-25 21:33:45 UTC
(In reply to comment #29)
>  can you try the one in comment #23?

On four reboots this warning did not show up. BKL is a pretty big hammer, though.

Comment 31 Kyle McMartin 2010-11-26 00:23:40 UTC
Thanks, at least it means we're probably looking in the right place for the fix.

Comment 32 Kyle McMartin 2010-11-26 15:30:29 UTC
Just to make sure we're not chasing our tails with something else, can you try the 2.6.37-rc3-git2 kernel here (with that patch added)

http://kyle.fedorapeople.org/kernel/2.6.37-0.rc3.git2.1/x86_64/

Thanks, Kyle

Comment 33 Michal Jaegermann 2010-11-26 22:20:00 UTC
(In reply to comment #32)
> Just to make sure we're not chasing our tails with something else, can you try
> the 2.6.37-rc3-git2 kernel here

Hmm, that is what I see:
[    0.000000] Linux version 2.6.37-0.rc3.git2.1.fc15.x86_64 (kyle.bos.redhat.com) ...
.....
[   17.066468] ------------[ cut here ]------------
[   17.102088] WARNING: at drivers/tty/tty_io.c:1332 tty_open+0x29d/0x48d()
....
[   23.476429] ------------[ cut here ]------------
[   23.506674] WARNING: at drivers/tty/tty_io.c:1332 tty_open+0x29d/0x48d()
....

Do you want the whole dmesg or you have enough

> (with that patch added)

I am not sure what is "that patch".

(Apologies for a delay.  A bit crazy day around here).

Comment 34 Kyle McMartin 2010-11-27 00:08:31 UTC
Damn, I just saw it myself for the first time in recent memory (~25 boots according to messages...) on that. Sigh.

Comment 35 Kyle McMartin 2010-11-27 00:09:15 UTC
'That patch' the one Jiri posted to linux-kernel above. (I meant it was included in there, not that there was something to add on top.)

Comment 36 Kyle McMartin 2010-11-27 16:36:36 UTC
http://kyle.fedorapeople.org/kernel/2.6.37-0.rc3.git3.2/x86_64/

Jiri posted a few more patches which should hopefully close the race window.. Can you try this one? I haven't seen it in ~25 reboots.

Comment 37 Michal Jaegermann 2010-11-27 18:50:27 UTC
(In reply to comment #36)
> Can you try this one? I haven't seen it in ~25 reboots.

So far so good.  Of course this is a race but from the previous experience on my setup I would see results of such race already if it would be there.

Comment 38 Kyle McMartin 2010-11-27 18:57:12 UTC
Cool, thanks, let us know if you see it. :)

Comment 39 Kyle McMartin 2010-11-29 14:22:32 UTC
OK, fixes are in 2.6.36.1-10.