Bug 501026
Summary: | 'serial8250: too much work for irq4' message when viewing serial console on SMP full-virtualized xen domU | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Casey Dahlin <cdahlin> |
Component: | xen | Assignee: | Michal Novotny <minovotn> |
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 5.3 | CC: | areis, armbru, clalance, drjones, jzheng, llim, mfuruta, minovotn, mounesh.b, mrezanin, pbonzini, qcai, riaanvn, tao, unicell, vanhoof, xen-maint |
Target Milestone: | rc | ||
Target Release: | 5.6 | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | xen-3.0.3-118.el5 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2011-01-13 22:17:05 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 514499, 557597 | ||
Attachments: |
Description
Casey Dahlin
2009-05-15 14:55:48 UTC
This patch also addresses the same area: http://marc.info/?l=linux-serial&m=121863945506976&w=2

Chris Lalancette

It appears the patch isn't fixing the customer's issue. I'll see if I can get this to recreate on my test machine here so we can experiment with these patches locally, and/or to dig deeper if they're failing to plug the hole.

Was the patch in http://lkml.indiana.edu/hypermail/linux/kernel/0802.0/3367.html ever tested by the customer? It is not even in upstream QEMU. I can build a test package for it if needed.

This bug report has two patches attached. Apparently one of them was built and tested (with negative result), and the other one was not. The problem is, it is not clear which one was tested. I requested additional information from cdahlin about which patch was tested, but I got no response so far. I can prepare both a patched xen package with the QEMU patch, or a patched kernel, but it would be nice to know which one wasn't tried yet.

The patch posted by clalance in comment #2 is the one that was tested, but looking back it appears I only got the second of the two patches in the email. The patch I originally posted I don't think I could get to apply, which was why I escalated.

PING, time is looming for 5.5

Event posted on 2009-11-16 02:43:39 EST by tsato

Hi Paolo,

NEC tested the test packages (upgraded xen and xen-libs and rebooted) on RHEL 5.4 (x86_64), but the messages are still printed out.
# rpm -q xen xen-libs
xen-3.0.3-96.el5.gc6adf02
xen-libs-3.0.3-96.el5.gc6adf02
xen-libs-3.0.3-96.el5.gc6adf02

--- dmesg ------------------------------------------------------------------
mtrr: type mismatch for f0000000,100000 old: uncachable new: write-combining
mtrr: type mismatch for f0000000,400000 old: uncachable new: write-combining
serial8250: too much work for irq4
serial8250: too much work for irq4
serial8250: too much work for irq4
serial8250: too much work for irq4
serial8250: too much work for irq4
serial8250: too much work for irq4
serial8250: too much work for irq4
serial8250: too much work for irq4
----------------------------------------------------------------------------

Internal Status set to 'Waiting on Engineering'

This event sent from IssueTracker by cdahlin
issue 296313

Created attachment 373114 [details]
patch that didn't pass testing
Thanks, I attach the backported patch for future reference.
Created attachment 373116 [details]
patch that didn't pass testing
Created attachment 373117 [details]
patch that didn't pass testing
sorry for the repeated mistake
*** Bug 498033 has been marked as a duplicate of this bug. ***

Note: bug 498033 (just marked as a duplicate) has some useful background information.

I've seen this message on https://inventory.engineering.redhat.com/view/amd-dinar-03.lab.bos.redhat.com when installing a kernel rpm with 'rpm -ivh'. I was on a -164 HVM guest and installing from an nfs mount.

This also occurs on F13 FV guests.

I experimented with slowing the serial port rate, based on the patch attached by Paolo. Even when I slowed the rate too much (causing problems in the guest), I still saw the error message.

This is arguably a guest kernel bug, since there is actually no harm if "so much work" is given to irq4. I wonder if we shouldn't close this as WONTFIX.

I wouldn't want to call this a kernel bug. A real UART has pretty well-defined timing behavior: if you program it to a certain bit rate, you can rely on its FIFO not emptying faster than that. QEMU's UART emulation does not emulate proper timing at all. Continuous serial I/O can easily overwhelm the guest. Linux detects this "can't happen" condition, and takes proper action to protect itself. "Can't happen" conditions are usually a sign of a bug, so Linux reports it.

Yes, we can "fix" it in the Linux kernel. It's really a work-around for broken hardware, where the hardware happens to be virtual. We normally probe the hardware for flaws before we enable work-arounds. How to do that? What about older and non-Linux guests? They're prone to stumble over a misbehaving UART, too.

Testing shows that the qemu serial driver is not able to measure the rate of incoming data properly: the proposed patch's counter only ever takes the values 0 and 1, so no limiting is ever done. I also do not see how to determine the proper limit: the kernel is sending data too fast (the problem occurs when the guest writes to the serial port, not when it reads).

Well, the issue is not about reading or writing; it is caused by the missing implementation of proper limiting based on the configured baud rate.
I'm currently working on implementing that kind of limiting, to honor the specifications in this respect. The data go through as they are sent by the guest, and it is trying to process all the interrupts. From what I know, Windows guests have no issues with that and cope just fine, unlike the 8250 serial driver in the Linux kernel, which complains with the message mentioned in this bug's summary.

What we need is basically the implementation of:

(read_data_burst + write_data_burst) * 8 <= baudrate

which means that the data transfer rate stays at or below the baud rate. The multiplication by 8 is necessary since the baud rate is in bits per second (bps) rather than bytes per second, and the burst variables are in bytes, as measured (computed) from the data coming through the ioport_{read|write} functions.

The implementation is a little tricky, since we need a timer function to periodically check whether the transfer rate has exceeded the baud rate (which is basically the reason for the "serial8250: too much work for irq4" message: the data are being transmitted at a much higher rate than allowed).

Michal

Created attachment 446305 [details]
Implement rate limiting to qemu-dm's serial port implementation
Hi,
this is the patch I wrote for this bug; it implements the rate limiting. It has been tested on a RHEL 5.5 x86_64 dom0 using two test cases:
1) installing the RPM inside the Linux HVM guest
2) doing copy & paste of 18 kB text into the HVM guest's text editor (vim)
Both cases worked fine, with no "too much work for irq4" messages. Could you please ask the customer to retest?
Thanks,
Michal
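
For readers following along, the rate check discussed in this bug (count the bytes transferred in a timer window, convert to bits, compare with the baud rate) can be sketched roughly as below. This is only an illustration in Python, not the attached qemu-dm patch: the class name `SerialRateLimiter`, the `checks_per_second` parameter, and the policy of refusing a byte until the next timer tick are all assumptions. It uses the factor of 8 bits per byte from the comment above and ignores start/stop bits.

```python
from dataclasses import dataclass


@dataclass
class SerialRateLimiter:
    """Windowed byte counter checked against a baud rate (illustrative only)."""

    baudrate: int                # bits per second, e.g. 9600 or 115200
    checks_per_second: int = 10  # hypothetical timer frequency
    read_burst: int = 0          # bytes read by the guest in the current window
    write_burst: int = 0         # bytes written by the guest in the current window

    def budget_bits(self) -> float:
        # Number of bits allowed within one timer window.
        return self.baudrate / self.checks_per_second

    def _would_exceed(self, nbytes: int) -> bool:
        # Bursts are counted in bytes, the baud rate is in bits: multiply by 8.
        bits = (self.read_burst + self.write_burst + nbytes) * 8
        return bits > self.budget_bits()

    def on_write(self, nbytes: int = 1) -> bool:
        """Account for a guest write; return False if it must wait."""
        if self._would_exceed(nbytes):
            return False
        self.write_burst += nbytes
        return True

    def on_read(self, nbytes: int = 1) -> bool:
        """Account for a guest read; return False if it must wait."""
        if self._would_exceed(nbytes):
            return False
        self.read_burst += nbytes
        return True

    def on_timer(self) -> None:
        # Called once per window by the (hypothetical) periodic timer.
        self.read_burst = 0
        self.write_burst = 0
```

With these assumed numbers, a 9600 bps port checked 10 times per second has a budget of 960 bits, i.e. 120 bytes per window; further bytes are held back until `on_timer()` resets the counters.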
Hi Masaki,

I've built the package with those patches and they're on my PRC site at: http://people.redhat.com/minovotn/xen/

Could you please pass this URL to the customer for testing?

Thanks,
Michal

Created attachment 446884 [details]
Implement rate limiting to qemu-dm's serial port implementation using variable rate checks value

This is a slightly modified version of my serial port patch. It uses a variable number of rate checks per second, i.e. a different approach from the previous version. The code has been rewritten after investigating the kernel code and testing with various baud rates, since the previous version did not work correctly at a baud rate of e.g. 57600 bps (i.e. it worked only with some guest settings). The RPMs at http://people.redhat.com/minovotn/xen have been updated to use this version of the patch (and are now suffixed "serial").

Michal

Fix built into xen-3.0.3-118.el5

I can reproduce this bug on xen-3.0.3-117.el5 with these steps:

1. Append the console parameter to the HVM guest's kernel command line in grub.conf (this was done with guestfish):
   console=ttyS0,115200n8
2. Create the domain with more than one vcpu and attach to its console:
   $ xm create -c hvm1.cfg serial=pty vcpus=2
3. Once the guest has booted and you are logged in on its console, run the 'yes' command:
   $ yes

This produces a lot of "serial8250: too much work for irq4" messages mixed into the output of the 'yes' command. I also tried ifconfig in this step, as in comment 39, with the same result.

After upgrading to xen-3.0.3-118.el5, the reproducer above no longer causes this issue, so I'm putting this bug into VERIFIED. Thanks.

Additionally, I've tested with console speed = 9600. The results are the same as with 115200.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA.
For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0031.html

With advisory RHBA-2011:0031-1 (link in Comment 49), I can still reproduce this bug. And I just found out it can be fixed by backporting 16550A support from upstream QEMU or a higher xen version (3.3).

> And I just found out it can be fixed by backporting 16550A support from
> upstream QEMU or higher xen version (3.3)
That's too intrusive right now, unfortunately.
The bug is much less frequent (and more importantly, it is bearable) with RHBA-2011:0031-1.