Bug 76959 (keyrepeat)
Summary: | applications receive too many keyboard events from X | ||
---|---|---|---|
Product: | [Retired] Red Hat Linux | Reporter: | Gordon Messmer <gordon.messmer> |
Component: | XFree86 | Assignee: | Mike A. Harris <mharris> |
Status: | CLOSED ERRATA | QA Contact: | David Lawrence <dkl> |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | 8.0 | CC: | aoliva, bostjan, dab0816, jifl-bugzilla, jonkv, k.georgiou, mitr, tao |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | i386 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2004-02-13 08:46:51 UTC | Type: | --- |
Description
Gordon Messmer
2002-10-30 06:12:45 UTC
Red Hat does not support VMware as an installation target, nor do we support systems which are running VMware, as the VMware kernel modules effectively take over the kernel. You'll need to talk to VMware technical support for resolution of this issue.

I also pointed out that I see this problem on a laptop running only Red Hat Linux 8.0. There, I'm running GNOME 2 with Metacity, usually with Evolution, gnome-terminal, and GNU Emacs. The open bug report at gnome.org includes no mention of VMware, but does include many users who see this problem. I only mentioned VMware at all because it causes the system I run it on to slow down, and exhibit the problem more often. The laptop, which is slow to begin with, shows the problem much more often.

Well, it's either a kernel bug then, or some hardware issue, as _nothing_ has changed in XFree86 that would cause this behaviour to occur. Prove to me that it is an XFree86 bug, and I'll investigate fixing it. I'm not going to waste my time debugging something I consider to not be a bug in XFree86, however, and which is not reproducible on any of 5 machines here. Our kernel maintainer is currently CC'd for comment in case there is a known kernel issue. Either way, XFree86 bug or not (which I severely doubt), I can't do squat about it if I can neither reproduce it nor get a detailed technical analysis of the problem. Reassigning to kernel component for comment.

I've just done some more testing which may or may not be useful. On my laptop (Dell Inspiron 7000, 333 MHz Mobile Pentium II, ATI 3D Rage LT Pro video), which is rather slow, the problem is pretty easy to reproduce. GNOME 2 defaults to setting the keyboard repeat rate to a 500 millisecond delay, with a repeat rate of 30 keys per second. KDE defaults to a 660 millisecond delay, with a repeat rate of 25 keys per second. I observe the problem under GNOME's default settings if I busy the system and type into the GNOME terminal.
I do not observe the problem if I busy the system in the same way (run glforestfire fullscreen on a different desktop) when I type into the KDE terminal. Given that, I tried setting the keyboard repeat rate to "rate 500 30" under KDE, busied the system, and typed into the KDE terminal. Keys began repeating at random, just like they did in GNOME. I also tried setting the rate to "rate 660 25" under GNOME, and could no longer reproduce the key repeating problem. I think this eliminates the possibility that the problem is in the X client software, and points strongly toward a problem in XFree86 (or the kernel, I guess...). The problem should be very easy to reproduce if you have a slower machine. Simply run glforestfire in a full screen window and type into another window, such as a terminal or text editor.

This bug continues to affect my laptop (particularly), making it a PITA to use. I don't know if the problem is caused by a kernel change or not, but I'd like to speculate about the symptoms that I see. The problem occurs when I press a key, and the application with focus freezes momentarily before receiving it. So... What happens if the X server gets a keypress event, passes it on to the X clients, but blocks in doing so for a period of time greater than the keyboard repeat delay? When the write() sending the keypress event off returns, what is the state of the key? Could the X server believe that it's still down and send off another event before it gets the keyrelease event?

There are quite a few bug reports against various parts of X and GNOME which I believe are the same as this one. I had the same problem on my 2 laptops - keyboards on both were locking up after using shortcut keys for switching workspaces. After I prepared a metacity log file for Havoc, he saw that extra keyboard events were being sent. The bug reports are: 74760, 74635 and probably others.
Like others have reported, this problem only happens with Red Hat kernels - when using stock 2.4.19 or 2.4.20-rc1/2 the problem goes away. I've also tried compiling a custom RH 2.4.18-18 kernel, taking out a few things such as low latency, etc. - it didn't help. Any idea what I should try? I'd be willing to try patches, etc.

Thanks for the report. I had suspected that the bug was caused by the HZ change in Red Hat's kernel, but removing the patch entirely produced a source tree that didn't complete building for i686. Based on your report, I'll give reverting that change one more try and see if the problem goes away.

How about just setting the HZ value to what it was before? How about upgrading to the latest erratum kernel?

I have tried that (i.e. running 2.4.18-18.8.0) and I have the same problem.

I've been able to reproduce the problem using the latest erratum i686 kernel recompiled to use a HZ value of 100. I'll get to try an unpatched kernel in the next week, hopefully.

Same here - I tried with HZ=100 as well as disabling the low latency.

I've just done a clean install of RHL 8.0 on the slow laptop, and with a new user (all default settings), I've been able to reproduce the problem with the default kernel, kernel-2.4.18-14.i686.rpm, as well as kernel-2.4.18-18.8.0.i386.rpm (to test for arch-dependent problems), and kernel-2.4.18-18.7.x.i686.rpm and kernel-2.4.18-3.i686.rpm from 7.3. It doesn't look like this is a kernel bug. Can we move this back to XFree86 and ask mharris to look at it again?

I would still say this is a kernel bug - for me the problem goes away if I use a non-Red Hat kernel. I've tried quite a few: 2.4.19, 2.4.20-rc2/3, and a few -ac ones - I can't reproduce the problem with any of them.

I also see this problem, even when typing at a tcsh shell prompt, on a fully "up2date" Red Hat 8.0 machine with 512 MB and a 1 GHz Athlon processor when it gets loaded down. It is extremely pronounced when using MS Word inside of VMware on the machine.
It is basically unusable. Please look into this before I am forced to abandon 8.0.

I'm unable to reproduce this on any kernel, with any XFree86 release. There is nothing at all that has changed in XFree86 that could all of a sudden cause a problem like this; however, various people have noted this problem when changing kernels. Something is broken either in the kernel, or in the hardware. If someone believes this is an X problem, they'll have to convince me by debugging it and pointing out the flaw in the X source code directly; otherwise, as far as I'm concerned, it is a kernel problem. Ultimately someone who can actually reproduce this is going to have to debug it, whatever the problem is.

I agree this is not an XFree86 problem - to me it seems clear it's a kernel problem (if hardware was the problem I would imagine it would show on all the various kernel versions used). Any idea, though, how we could try to figure out which part of the kernel? The problem is quite hard to reproduce, but since stock 2.4.19 or 2.4.20 works I can only imagine one of the RH patches or its influence on another subsystem must be causing the problem.

Well, damnedest of all things. After reproducing this with kernels from both 7.3 and 8.0, including the latest erratum, I built and tested stock 2.4.18 and 2.4.20. Stock 2.4.18 has the problem, but despite my best efforts I can't duplicate the problem on 2.4.20. It doesn't look like this is the result of any patches that Red Hat has applied to the kernel, but I wonder what of relevance has changed between 2.4.18 and 2.4.20...

Sorry for my English. I use RH 8.0 and I have the same problem. I see that the Red Hat kernel sets the priority of X to -10. If I renice the X process to 0, I don't have this problem. I don't have this problem with the official Linux kernel (2.4.19, 2.4.20). When I switch to a console and come back to the X console, the kernel sets the priority of the X process to -10 again.
Can someone tell me which patch does this "fun" stuff? (I don't like this.)

I still have the same problem with Rawhide kernel 2.4.20-2 (but not with stock 2.4.20 from kernel.org).

After a longer testing run with both machines I had a problem with 2.4.18-??? I can now confirm that upgrading to 2.4.20-2.2/6/9 (all from rawhide) fixes the problem completely (what I saw in my previous post must have been something else). 2.4.20-2.2 did lock up my machine once so far, but I guess that's to be expected when running a rawhide kernel.

Can I just throw my hat into the ring here with something that I believe to be related? I'm using 2.4.18-19.7.x on a 7.3 system. I had previously been using 2.4.9-something, the previous errata kernel for 7.3, without problems. After updating to this latest errata kernel (2.4.18-19.7.x) I noticed oddities with my VMware client (Windows client, Linux host). In the Windows client I got odd keyboard behaviour - if I type, the keys repeat of their own accord, e.g. I type "this is a test" and it comes out "ttttttttttthis is a test". I also noticed that while VMware was running the CPU load seemed to be very, very high. When trying to do pretty much anything in the client (open a window, run a program), the CPU load monitor showed it staying at 100% until the operation completed, and the operation was completing *very* slowly. In fact the whole of the system, i.e. normal Linux, not just VMware, was at a deathly crawl. I finally noticed from 'top' that one of the VMware threads was running at priority -10 and devouring CPU. When I reniced that to 0, all of a sudden everything worked fine as it had with 2.4.9! I suspect a form of race condition between the threads meant that the -10 niced portion was starving the rest of the system of CPU. But why had it started doing this? It turns out it isn't VMware nice'ing itself to -10, but, I believe, a Red Hat kernel patch to arch/i386/kernel/ioport.c. Search for set_user_nice.
I understand this was done to improve X performance. Unfortunately it isn't just X that is affected, and the result is that applications that rely on being close to the hardware, like VMware, can also break. Assuming I'm right, you should find some other way to improve X performance. Since X runs as root, why don't you just adjust the XFree86 code to call nice(-10)? Hope this helps, Jifl

It isn't a change that was done for X, but a change that was done in order to give processes that do I/O more CPU to improve interactivity. I have never liked the idea of the patch, and would prefer that the user set the priority of X themselves, or that the X server (or some other process) do it internally. I believe that our latest kernels do not have this renicing patch any more (good riddance IMHO), and any perceived or real problems associated with it will be removed when the user upgrades to a newer kernel not containing the patch. Some users as well as developers nonetheless still believe that renicing the X server improves interactivity, performance, or whatever, and want the ability to easily renice the X server. I will be investigating adding an option to the X server itself to implement this in the future. Arjan, if the latest 8.0 erratum kernel has this patch removed, we can probably close this as ERRATA now.

I haven't used the laptop in a while, so it's been a while since I tested, but... Contrary to my previous report, I can reproduce the problem on stock 2.4.20. It occurs less frequently, but still occurs. I booted the same kernel that I previously tested and ran two instances of glforestfire instead of just one to add more load to the system, and ran gedit. While typing, some characters came in double. X definitely seems more responsive and less troublesome without the renicing patch, but the problem still exists.

This happens to me on an 800 MHz P3 running Red Hat 8.0.
It started happening around Christmas, when I upgraded from 7.3, and it's still happening using the latest kernel (kernel-2.4.18-24.8.0). It happens in various places, but an almost certain way of reproducing this is creating new tabs in Mozilla. I usually get two new tabs when I press Ctrl-T - presumably because creating a new tab is a CPU-intensive operation. If I press Ctrl-T when an empty tab is showing, I'm more likely to get only one new tab (less work required to remove the old contents?). And when I switch between tabs I often skip ahead two or three tabs at a time. This is not Mozilla's fault: it started happening when I upgraded from 7.3 to 8.0, without changing the version of Mozilla I was using.

Based on a comment from a user on the redhat-list, I tried disabling XKEYBOARD in the X server by uncommenting the Option "XkbDisable" in XF86Config. When I disable that extension and test by running two full screen glforestfire instances and typing in gedit, I can't reproduce the problem after typing the alphabet 10 times. When I turn the extension back on (comment out the XkbDisable option), and run the same test, I get repeated letters every single time I type the alphabet. This test was done on the 333 MHz laptop described earlier, running stock Red Hat Linux 8.0. I think this strongly suggests that the bug is in the XKEYBOARD extension in XFree86, and not the kernel.

I've found and described the bug (indeed in XFree86) at: http://lists.eazel.com/pipermail/sawfish/2003-March/004838.html

Appears to be an XFree86 bug in fact.

FWIW, I don't recall having experienced this problem with Fedora Core test3, maybe not even with test2.

I can still reproduce it on FC test 3 (same glforestfire test I always used). I had hoped that Mike Harris would have pushed the patch from XFree86 4.3.99.2 into Fedora's tree, but it looks like this bug hasn't been assigned back to him.
I've asked Michal to attach the patch to this bug report, since eazel.com seems to have disappeared not long after he posted that URL, and I can't locate the patch itself.

Here is my post to the XFree86 mailing list: http://www.mail-archive.com/xfree86@xfree86.org/msg04354.html (hopefully this one will last longer than the sawfish one). As a side note, I have another problem with XFree86 4.3.99.6 (I don't use RH): sometimes pressing Meta, Control, then releasing Control, and then Meta, will make X not notice the release of Meta. When I have time, I'd like to find out why.

The URL "http://lists.eazel.com/pipermail/sawfish/2003-March/004838.html" from comment 28 above is invalid. I read the email from comment #32, and don't see a patch there. However, since XFree86 4.3.0 is in a stable, non-developmental state right now, I'm quite reserved about applying patches that may or may not cause unknown breakage/regression for part of the userbase in the name of a quick fix for a problem reported by only a few users. Is there a proper fix available, and if so, has it been reported in XFree86 bugzilla upstream? If not, please report this upstream and attach whatever patch is deemed necessary to fix it, so that it gets fixed for 4.4.0 if it isn't already. I'd also like to see a fix for this problem get into the upstream 4.3.0 stable branch before considering it for our 4.3.0. Please comment.

As far as I can tell, it's been reported and fixed upstream. http://xfree86.linuxforum.net/cvs/changes.html: 60. Fix for spontaneous repeated keyboard events during sync grab (#A.1713, Michal Maruska).

Fantastic. Ok, I've backported the patch to 4.3.0 and it will be included in 4.3.0-48 and later builds in rawhide, and also in future 4.3.0 errata, assuming no regressions are observed. Thanks for tracking this issue down, as it would have been very difficult if not impossible to fix this without being able to reproduce it personally.
Since I couldn't reproduce it originally, I can't test the fix, so please test the new build from rawhide to make sure the problem is fixed now. Setting bug status to MODIFIED, pending confirmation that the fix works. Please close the bug as RAWHIDE once confirmed, or set it to ASSIGNED if the problem recurs. If this patch introduces new regressions, please file a new bug report. Also, if you are aware of any other bugs open in bugzilla that are caused by this problem, please point out the bug IDs here for me to investigate and/or dupe them. Thanks again.

An erratum has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2004-059.html
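A note for readers following the thread: the timing question raised earlier (could the server stall for longer than the repeat delay and then send a repeat before it sees the key release?) can be illustrated with a toy model. This is only a sketch under stated assumptions. It is not the XFree86 code and not the upstream patch, which per the changelog concerned sync grabs. The names `KeyTap` and `delivered_events`, the 80 ms tap, and the 600 ms stall are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class KeyTap:
    press_ms: int    # time the key physically goes down
    release_ms: int  # time the key physically comes up


def delivered_events(tap: KeyTap, repeat_delay_ms: int, stall_ms: int) -> List[str]:
    """Events a client sees for one key tap in a toy software-autorepeat model.

    Assumptions (hypothetical, not taken from the XFree86 source):
    - the server handles the press immediately, then is blocked for stall_ms;
    - the repeat timer cannot fire while the server is blocked;
    - on waking, the server checks the repeat timer before it reads the
      queued release event;
    - at most one repeat is reported, which is enough to show the effect.
    """
    events = ["press"]
    wake_ms = tap.press_ms + stall_ms
    # Earliest moment the repeat timer can actually fire.
    timer_fires_ms = max(tap.press_ms + repeat_delay_ms, wake_ms)
    # Earliest moment the server can learn that the key was released.
    release_seen_ms = max(tap.release_ms, wake_ms)
    if timer_fires_ms <= release_seen_ms:
        events.append("repeat")
    events.append("release")
    return events


if __name__ == "__main__":
    tap = KeyTap(press_ms=0, release_ms=80)  # a short, ordinary key tap
    for delay_ms in (500, 660):              # the GNOME and KDE defaults quoted above
        print(delay_ms, delivered_events(tap, delay_ms, stall_ms=600))
```

In this model, a 600 ms stall yields a spurious repeat with the 500 ms delay but not with the 660 ms delay, and no stall yields no repeat at all. That is at least consistent with the reports above that the problem appears under load with "rate 500 30" and disappears with "rate 660 25", though it proves nothing about where the real defect was.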