Bug 630464
Summary: | WARNING: at drivers/char/tty_io.c:1325, only when booting with systemd | | |
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Michal Jaegermann <michal> |
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
Status: | CLOSED RAWHIDE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | medium | Priority: | low |
Version: | rawhide | CC: | anton, bruno, dougsland, gansalmon, itamar, jonathan, kernel-maint, kmcmartin, lpoetter, madhu.chinakonda, orion, ozan.caglayan, zkabelac |
Hardware: | All | OS: | Linux |
Doc Type: | Bug Fix | Last Closed: | 2010-11-29 14:22:32 UTC |
Description
Michal Jaegermann
2010-09-05 19:02:00 UTC
That's:

WARN_ON(!test_bit(TTY_LDISC, &tty->flags));

And there are some traces of calls to cgroup-related functions in the stack trace, so I wonder if that has something to do with this error. (Upstart doesn't use cgroups, but systemd does.)

http://koji.fedoraproject.org/koji/taskinfo?taskID=2617334

Could you try this build please, and let me know if the WARN_ON still triggers?

*** Bug 523640 has been marked as a duplicate of this bug. ***

(In reply to comment #2)
> http://koji.fedoraproject.org/koji/taskinfo?taskID=2617334
>
> Could you try this build please, and let me know if the WARN_ON still triggers?

Yes, it does. dmesg from a boot with this kernel, 2.6.36.1-8.rc1.fc15.x86_64, is attached. OTOH it does not trigger when booting with 'init=/sbin/upstart' or, curiously enough, when booting 2.6.36-5.fc15.x86_64 (with or without switching to upstart).

Created attachment 462164 [details]
dmesg for 2.6.36.1-8.rc1.fc15.x86_64 with tty_io.c:1325 warnings
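
For readers who have not looked at the check quoted at the top of this thread, here is a minimal userspace sketch of what WARN_ON(!test_bit(TTY_LDISC, &tty->flags)) tests: the warning fires when the TTY_LDISC bit is clear in the tty's flags word. Everything prefixed with sketch_ is hypothetical, the bit position 9 is an assumption used only for illustration, and the real kernel helpers are atomic and considerably more involved.

```c
#include <stdbool.h>
#include <stdio.h>

/* Assumed bit position, for illustration only; the real value lives in
 * the kernel's tty headers. */
#define SKETCH_TTY_LDISC 9

/* Hypothetical stand-in for the kernel's tty structure. */
struct sketch_tty {
    unsigned long flags;
    const char *name;
};

/* Non-atomic userspace stand-in for the kernel's test_bit(). */
bool sketch_test_bit(int nr, const unsigned long *addr)
{
    return ((*addr >> nr) & 1UL) != 0;
}

/* Returns true when the condition that WARN_ON() complains about holds,
 * i.e. the line discipline bit is not set on this tty. */
bool sketch_reopen_would_warn(const struct sketch_tty *tty)
{
    bool ldisc_missing = !sketch_test_bit(SKETCH_TTY_LDISC, &tty->flags);

    if (ldisc_missing)
        fprintf(stderr, "sketch: TTY_LDISC not set, dev=%s\n", tty->name);
    return ldisc_missing;
}
```

Calling sketch_reopen_would_warn() on a structure whose flags word is 0 reports the condition; setting bit SKETCH_TTY_LDISC first makes it return false.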

OK, thanks. I'm not seeing it with 2.6.37-rc3, and Bill reports he's not seeing it with 2.6.36-5, so I'm really confused as to what's going on here.

(In reply to comment #2)
> Could you try this build please, and let me know if the WARN_ON still triggers?

Apparently there is another way to kill that warning. Booting with cgroup_disable=memory appears to do that as well, although I am not completely sure whether a pretty consistent warning was simply converted into a race. OTOH with 2.6.36-5.fc15.x86_64 this 'cgroup_disable=memory' does not seem to be needed.

> Bill reports he's not seeing it with 2.6.36-5

Well, I reported that too. :-)

I don't understand... the only interesting thing in -5 was me reverting the drm rebase that came in at -3 (but we've been seeing this for a lot longer.) :(

--Kyle

Just to give an idea, I'm seeing this on 2.6.36 constantly in a non-Fedora x86_64 VBox installation. The tainted process is plymouthd just like this one, and cgroups are enabled in the kernel. But I'm not using systemd for booting. I'll post my trace tomorrow.

(In reply to comment #9)
> But I'm not using systemd for booting.

I am not trying to imply that systemd is "at fault". It simply does something that, in my case, triggers a kernel action resulting in that warning. See comment #1, with a conclusion additionally supported by the fact that 'cgroup_disable=memory' also makes that warning disappear for me. I would be really surprised if other triggers did not exist.

I've put a debug patch into the kernel to print out which tty device is being accessed when it fails, so hopefully we can chase this down.

I noticed a message on LKML today that reproduces it with a pretty basic X25 ldisc test program attached; hopefully it will get some traction. I can count something like 5 independent threads about this in the recent few weeks.

http://lkml.org/lkml/2010/11/24/511

Building a kernel now for you to test with that patch applied.

http://kyle.fedorapeople.org/kernel/2.6.37-0.rc3.git0.1.tty1.fc15/

Please poke that and let me know if it falls over.

(In reply to comment #13)
> Please poke that and let me know if it falls over.

Hi Kyle! It did not trigger that warning on the first boot after it was installed, but on the second try it happened. It used to be more consistent than that. :-)

I am not sure what your debugging patch was supposed to print, but I could not detect any traces of it in the output.

Created attachment 462779 [details]
dmesg for 2.6.37-0.rc3.git0.1.tty1.fc15.x86_64 with tty_io.c:1331 warnings

(In reply to comment #13)
> http://kyle.fedorapeople.org/kernel/2.6.37-0.rc3.git0.1.tty1.fc15/

Well, there is something new with 2.6.37-0.rc3.git0.1.tty1.fc15.x86_64. When I tried to power the machine off, it did not. I had to push the power switch. Other recent kernels, including 2.6.36.1-8.rc1.fc15.x86_64, with the current systemd do power down when asked.

Darn, back to the drawing board, that looked somewhat plausible though.

(In reply to comment #17)
> Darn, back to the drawing board, that looked somewhat plausible though.

Just thinking out loud. Your diagnostic did not fire (I assume that there is one). So maybe something gets overwritten and this only looks like an issue in tty_open()?

No, I copied the patch into the wrong tree and built an old kernel. It'll be in the next one.

I tried your patch with 2.6.36.1 and booted with plymouth enabled. The boot tty of plymouth is /dev/tty7 and here's the output:

tty_reopen: !test_bit(TTY_LDISC, &tty->flags) dev=tty7

I'll put money on it being DRM related and going away if you use 'nomodeset' :) I'll poke airlied.

Even though it is a VBox setup with no KMS drivers at all, I tried nomodeset but the trace still exists.

Bah. Who the hell knows. I'll revert the BKL removal and see if that makes it go away...

http://kyle.fedorapeople.org/kernel/2.6.36.1-9.tty2/

http://kyle.fedorapeople.org/kernel/2.6.36.1-10.fc15-tty/x86_64/

Actually, try this instead, please. I think it should fix the issue.

Can you tell me what to revert (upstream commit id(s) would be enough), or point me to an SRPM of the kernel you've provided? I'm not using Fedora.

Thanks,

(In reply to comment #24)
> http://kyle.fedorapeople.org/kernel/2.6.36.1-10.fc15-tty/x86_64/
>
> Actually, try this instead, please. I think it should fix the issue.

Sorry! I got TWO hits with this kernel on the first try. The only difference is that now 'cgroup_disable=memory' does not help.
I got only one hit with that, though: the one after the "EXT3-fs (sda11): using internal journal" line (see attached dmesg output). The following line:

tty_reopen: !test_bit(TTY_LDISC, &tty->flags) dev=tty1 ldisc=n_tty

is an effect of the diagnostic additions, I gather?

It does power off again. That is good.

Created attachment 462958 [details]
dmesg for 2.6.36.1-10.fc15.x86_64 with tty_io.c:1329 warnings

Damn, since that didn't fix it can you try the one in comment #23? What a bizarro bug... (Not surprised about the cgroup thing, it was a red herring from the start. This is some kind of uber subtle race...)

(In reply to comment #29)
> can you try the one in comment #23?

On four reboots this warning did not show up. BKL is a pretty big hammer, though.

Thanks, at least it means we're probably looking in the right place for the fix. Just to make sure we're not chasing our tails with something else, can you try the 2.6.37-rc3-git2 kernel here (with that patch added)

http://kyle.fedorapeople.org/kernel/2.6.37-0.rc3.git2.1/x86_64/

Thanks, Kyle

(In reply to comment #32)
> Just to make sure we're not chasing our tails with something else, can you try
> the 2.6.37-rc3-git2 kernel here

Hmm, that is what I see:

[ 0.000000] Linux version 2.6.37-0.rc3.git2.1.fc15.x86_64 (kyle.bos.redhat.com) ...
.....
[ 17.066468] ------------[ cut here ]------------
[ 17.102088] WARNING: at drivers/tty/tty_io.c:1332 tty_open+0x29d/0x48d()
....
[ 23.476429] ------------[ cut here ]------------
[ 23.506674] WARNING: at drivers/tty/tty_io.c:1332 tty_open+0x29d/0x48d()
....

Do you want the whole dmesg, or do you have enough?

> (with that patch added)

I am not sure what "that patch" is. (Apologies for the delay. A bit of a crazy day around here.)

Damn, I just saw it myself for the first time in recent memory (~25 boots according to messages...) on that. Sigh.

'That patch' is the one Jiri posted to linux-kernel above. (I meant it was included in there, not that there was something to add on top.)

http://kyle.fedorapeople.org/kernel/2.6.37-0.rc3.git3.2/x86_64/

Jiri posted a few more patches which should hopefully close the race window. Can you try this one? I haven't seen it in ~25 reboots.

(In reply to comment #36)
> Can you try this one? I haven't seen it in ~25 reboots.

So far so good.
Of course this is a race, but from previous experience on my setup I would already have seen the results of such a race if it were there.

Cool, thanks, let us know if you see it. :)

OK, fixes are in 2.6.36.1-10.
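
The thread never spells out which code paths race, but the general shape of a check racing against a temporary clear of a flag can be sketched deterministically. The step names below (a hangup-like path that clears and later restores the flag, and a reopen-like path that checks it) are assumptions for illustration only, not a description of the patches Jiri posted.

```c
#include <stdbool.h>

/* Hypothetical steps of two paths touching the same "ldisc ready" flag. */
enum sketch_step {
    SKETCH_HANGUP_CLEAR,   /* path A temporarily clears the flag */
    SKETCH_HANGUP_RESTORE, /* path A sets the flag again */
    SKETCH_REOPEN_CHECK    /* path B checks the flag, warns if clear */
};

/* Replays the steps in the given order on a single flag and returns how
 * many times the check observed the flag cleared. */
int sketch_run_schedule(const enum sketch_step *order, int n)
{
    bool ldisc_ready = true;
    int warnings = 0;

    for (int i = 0; i < n; i++) {
        switch (order[i]) {
        case SKETCH_HANGUP_CLEAR:
            ldisc_ready = false;
            break;
        case SKETCH_HANGUP_RESTORE:
            ldisc_ready = true;
            break;
        case SKETCH_REOPEN_CHECK:
            if (!ldisc_ready)
                warnings++;
            break;
        }
    }
    return warnings;
}
```

Only the ordering clear, check, restore produces a warning in this model; the other orderings stay silent. That would be consistent with a warning that shows up on some boots and not on others, as reported above.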