Bug 103275

Summary: Slow serial I/O response
Product: Red Hat Enterprise Linux 3
Reporter: Scott Weathers <sweathers>
Component: kernel
Assignee: Arjan van de Ven <arjanv>
Status: CLOSED NOTABUG
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: high
Version: 3.0
CC: dwmw2, jlamb
Hardware: i686   
OS: Linux   
Last Closed: 2003-10-22 12:24:13 UTC
Bug Blocks: 103278    

Description Scott Weathers 2003-08-28 13:46:52 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 
1.0.3705)

Description of problem:
We have observed significant serial port slowness under Red Hat when 
communicating with serial devices (not modems).  

We are currently porting our application from QNX 4.25 to 
Red Hat.  Our application depends heavily on high-level protocol serial 
communications to control one or more serial devices on one or more serial 
ports.  This level of control can be as simple as reading/writing to a 
hardware register or polling a device to determine its state (i.e. state 
machine) so our software can perform an action on the device being polled or 
another device.  When we run our software under Red Hat we don't get the same 
type of serial throughput or device response that we have seen in the past 
from QNX.


Version-Release number of selected component (if applicable):
Kernel 2.4.21-1.1931.2.411.entsmp

How reproducible:
Always

Steps to Reproduce:
1. Send text data out a serial port
2. Observe the speed at which it displays on the device LCD

Actual Results:  Text displays slowly on the LCD screen

Expected Results:  There should be no observed delay of the text being 
displayed on the screen

Additional info:

Please feel free to contact me directly regarding this issue.
Scott Weathers
Project Lead/Senior Software Engineer
Toptech Systems, Inc
280 Hunt Park Cove
Longwood, FL 32750
Fax: (407) 332-1802
Phone: (407) 332-1774 x208
E-mail: sweathers
www.toptech.com

Comment 1 Arjan van de Ven 2003-08-28 13:50:03 UTC
Red Hat Enterprise Linux is not a hard realtime operating system like QNX is. In
several places latency has been traded off in favor of throughput. It also
greatly depends on the exact ways you drive the UART and what the userspace
application writing to the serial port does.


Comment 2 Scott Weathers 2003-08-28 15:31:54 UTC
It appears based on the above quick response that the bug has not been 
researched fully.  Are you suggesting that Red Hat Enterprise should not be 
used as the platform for a process control system?  Can you please explain 
your statement: "In several places latency has been traded off in favor of 
throughput"?  Our system relies heavily on this type of serial communication; 
in some cases the device we are communicating with requires us to use 
9600 baud, and the distance to the device could be 500 to 1000 feet.  Will this 
bug be researched further?  If not, I need to know ASAP, so I can begin to 
look at another platform for our system.  I find it hard to believe that 
Microsoft Windows does a better job with its serial I/O than Red Hat Linux.  

We have almost 1000 existing systems plus a similarly large number of systems 
set to roll out in Europe and Asia; this is a very serious issue to our 
company and I would hope that it could be resolved quickly so we can continue 
to use Red Hat in the future.


Comment 3 Arjan van de Ven 2003-08-28 15:38:12 UTC
I assume you have investigated all serial port settings in great detail (e.g.
made sure all FIFO settings are optimal, etc.).
Red Hat *Enterprise* Linux, which is different from Red Hat Linux, does not
currently have "low latency" as a requirement. Your description of your
application makes it sound like it basically wants soft real time behavior (or
a good approximation thereof), which mostly comes down to having good latency.
QNX is an operating system that has a pretty high focus on real time behavior,
while RHEL has a focus on server performance. Red Hat Linux has a focus on the
consumer market, for which latency is important again. Can you try a recent RHL
kernel to see if that performs better than RHEL in your application?





Comment 4 Jennifer E. Lamb 2003-08-28 15:44:36 UTC
Customer has tried RHL 9 and RHEL 3 Beta 1 latest kernels.  RHEL 3 Beta 1 has
worked the best so far.

Comment 5 David Woodhouse 2003-08-29 15:04:21 UTC
Please could you confirm whether you are having problems with throughput,
latency, or both?

Your 'steps to reproduce' imply only throughput -- which is odd, since in the
absence of flow control we should just spew characters out the serial port at
9600 baud unconditionally -- assuming 8N1 that'll be 960 characters per second,
or one character about every 1.04ms. 
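
The per-character timing depends on how many bits each frame puts on the wire. As a rough sketch (the function names here are illustrative, not from any serial library): 8N1 is a 10-bit frame (start bit, 8 data bits, stop bit), which at 9600 baud works out to 960 characters per second, while an 11-bit frame (for example 8 data bits plus parity, or two stop bits) gives roughly 873 characters per second, or about 1.15ms per character.

```python
def frame_bits(data_bits=8, parity=False, stop_bits=1):
    """Bits on the wire per character: start bit + data + optional parity + stop."""
    return 1 + data_bits + (1 if parity else 0) + stop_bits


def chars_per_second(baud, data_bits=8, parity=False, stop_bits=1):
    """Upper bound on character rate for a given baud rate and framing."""
    return baud / frame_bits(data_bits, parity, stop_bits)


if __name__ == "__main__":
    framings = [
        ("8N1", {}),
        ("8E1", {"parity": True}),
        ("8N2", {"stop_bits": 2}),
    ]
    for label, kwargs in framings:
        cps = chars_per_second(9600, **kwargs)
        print(f"{label}: {cps:.0f} chars/s, {1000 / cps:.2f} ms per char")
```

Any measured rate well below this bound points at gaps between characters rather than at the line speed itself.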

Can you double-check that your UART is detected correctly (presumably as a
16550A) and hence that we're using the FIFO? What is the output of the command
"grep ttyS /var/log/dmesg" ? What serial port are you using?
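
For anyone wanting to automate this check across many machines, here is a minimal sketch that parses the serial driver's boot messages. It assumes lines of the form "ttyS0 at 0x03f8 (irq = 4) is a 16550A", which is the format reported later in this thread; `parse_uarts` is a hypothetical helper, and other kernel versions may print these lines differently.

```python
import re

# Assumed line format: "ttyS0 at 0x03f8 (irq = 4) is a 16550A"
UART_LINE = re.compile(
    r"^(ttyS\d+) at (0x[0-9a-fA-F]+) \(irq = (\d+)\) is a (\S+)"
)


def parse_uarts(dmesg_text):
    """Map each detected ttyS port to its I/O address, IRQ and UART type."""
    ports = {}
    for line in dmesg_text.splitlines():
        match = UART_LINE.match(line.strip())
        if match:
            name, addr, irq, uart = match.groups()
            ports[name] = {
                "io": addr,
                "irq": int(irq),
                "uart": uart,
                # A 16550A is the type we expect when the FIFO is in use.
                "is_16550a": uart == "16550A",
            }
    return ports


if __name__ == "__main__":
    sample = (
        "ttyS0 at 0x03f8 (irq = 4) is a 16550A\n"
        "ttyS1 at 0x02f8 (irq = 3) is a 16550A\n"
    )
    for name, info in parse_uarts(sample).items():
        print(name, info)
```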


Comment 6 Scott Weathers 2003-08-29 15:49:10 UTC
throughput and latency are very closely related.

We are currently using ttyS0 for our testing.

Output from  grep ttyS /var/log/dmesg
ttyS0 at 0x03f8 (irq = 4) is a 16550A
ttyS1 at 0x02f8 (irq = 3) is a 16550A

FYI FROM PREVIOUS E-MAIL:
Please communicate to the kernel engineer that we don't only run at 9600 baud; 
most of our testing has been at 38400 baud, but there are cases where we will 
run as slow as 9600 baud.  It all depends on the distance from the device to 
the PC.  Most if not all of the devices we communicate with in our industry 
use RS-232 or RS-485/422 serial I/O with no flow control.

Is the Linux serial driver expecting to see hardware or software flow control 
signals on the serial port (i.e. CTS, DSR, DTR etc)?



Comment 7 Scott Weathers 2003-08-29 15:52:37 UTC
Sorry, the first line is missing from the previous comment; it should read...

I would consider the problem we are seeing a throughput issue; however, 
throughput and latency are very closely related.




Comment 8 David Woodhouse 2003-08-29 16:50:45 UTC
Throughput and latency are often related -- in the case where you're just
spewing out data with no flow control, there should be no latency involved
except the time it takes to go from an interrupt caused by the UART FIFO getting
low to the kernel refilling the FIFO. Since the kernel keeps its own internal
flip buffer in addition to the hardware FIFO, this shouldn't even require going
back to userspace each time -- it just shouldn't run out of data to send. 

How slow _is_ it going?
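
One way to put a number on it is to time a bulk write and wait for the kernel to drain its output queue. The sketch below is only an illustration under stated assumptions: `measure_tx_rate` is a hypothetical helper, the port is assumed to be already opened and configured, and the `write`, `drain` and `clock` parameters exist only so the function can be exercised without real hardware. Note that `tcdrain` returns once the kernel has handed the data to the UART, so the last few characters in the hardware FIFO are not counted.

```python
import os
import termios
import time


def measure_tx_rate(fd, payload, write=os.write, drain=termios.tcdrain,
                    clock=time.monotonic):
    """Send payload on fd and return the achieved rate in characters per second."""
    start = clock()
    sent = 0
    while sent < len(payload):
        # os.write may accept only part of the buffer; keep going until done.
        sent += write(fd, payload[sent:])
    drain(fd)  # block until the kernel's output queue is empty
    elapsed = clock() - start
    return sent / elapsed if elapsed > 0 else float("inf")
```

Comparing the result against the theoretical character rate for the configured baud rate and framing shows how much of the line's capacity is actually being used.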

If you've configured the software to not use flow control, the Linux serial
driver will ignore all flow control signals. I could understand latency on
unblocking output becoming a problem if the receiving side is repeatedly
throttling and then unthrottling -- but with no flow control I can't see how
that's happening. 
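
For reference, this is roughly what "configured to not use flow control" looks like from userspace. A minimal sketch using Python's standard `termios` module; the helper names are made up for illustration, and a real application would also set the baud rate, character size and raw mode.

```python
import termios

# Indexes into the list returned by termios.tcgetattr()
IFLAG, OFLAG, CFLAG, LFLAG = 0, 1, 2, 3


def without_flow_control(attrs):
    """Return a copy of termios attrs with hardware and software flow control off."""
    new = list(attrs)
    # Software flow control (XON/XOFF) lives in the input flags.
    new[IFLAG] &= ~(termios.IXON | termios.IXOFF | termios.IXANY)
    # Hardware flow control (RTS/CTS) lives in the control flags.
    new[CFLAG] &= ~termios.CRTSCTS
    # Ignore modem control lines and keep the receiver enabled.
    new[CFLAG] |= termios.CLOCAL | termios.CREAD
    return new


def disable_flow_control(fd):
    """Apply the no-flow-control settings to an open serial port immediately."""
    attrs = termios.tcgetattr(fd)
    termios.tcsetattr(fd, termios.TCSANOW, without_flow_control(attrs))
```

Typical use would be `disable_flow_control(fd)` on a descriptor obtained from something like `os.open("/dev/ttyS0", os.O_RDWR | os.O_NOCTTY)`.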

Comment 9 Scott Weathers 2003-08-29 17:39:53 UTC
I am sure you would agree that slow is a relative term based on the observer, 
so here is how we have determined we have an issue.  We have connected the 
same serial device to a term server, which allows us to communicate with the 
device over a network card; when connected this way we find the device 
displays data at the same rate as on the QNX platform.  But when the same 
device is connected directly to the serial port, the visual display of data on 
the device screen is noticeably slower (i.e. waiting for user prompts to 
display).

If you are not familiar with a term server, here is a quick definition: a term 
server is a network device that has multiple serial ports on it.  We send the 
same protocol message to the device on the term server as we do when the 
device is connected to the serial port; the only addition is that the message 
is wrapped in an IP packet.  If you were to put a loopback device on the term 
server port and telnet to the term server on that port (2001, for example), 
everything typed should be echoed on the screen just as it would with a serial 
port loopback. 


Comment 10 David Woodhouse 2003-08-29 21:56:50 UTC
I'd be very interested in getting more quantitative data, if possible. In the
case where you're sending bulk data at a fixed speed with no flow control,
'slow' is very much an objective measurement, not subjective.

I'd like to see if you're seeing bursts of, say, 16 or 256 characters at a time
at 'full' speed interspersed with idle periods, or if it's a different pattern
of sending. This would give clues as to what the problem is.
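
To look for that kind of pattern, per-character arrival times can be grouped into bursts. The sketch below assumes you already have a list of timestamps in seconds, one per received character (for example captured on a second port wired as a loopback, which is an assumption about the test setup); `split_bursts` is a hypothetical helper, not part of any tool mentioned here.

```python
def split_bursts(timestamps, gap_threshold):
    """Group per-character timestamps (seconds) into bursts.

    A new burst starts whenever the gap since the previous character
    exceeds gap_threshold. Returns a list of
    (start_time, char_count, idle_time_before_burst) tuples.
    """
    bursts = []
    prev = None
    for t in timestamps:
        if prev is None or t - prev > gap_threshold:
            idle = 0.0 if prev is None else t - prev
            bursts.append([t, 0, idle])
        bursts[-1][1] += 1
        prev = t
    return [tuple(b) for b in bursts]


if __name__ == "__main__":
    # Synthetic example: two runs of 16 characters with a long pause between.
    ts = [i * 0.001 for i in range(16)] + [0.065 + i * 0.001 for i in range(16)]
    for start, count, idle in split_bursts(ts, gap_threshold=0.005):
        print(f"burst at {start:.3f}s: {count} chars after {idle * 1000:.1f} ms idle")
```

A threshold of a few character times is a reasonable starting point; burst sizes that cluster around 16 would suggest the hardware FIFO is being filled and then left to run dry.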

Are you using IDE disk drives in PIO mode? Anything else which might disable
interrupts for long periods of time?

Comment 11 Scott Weathers 2003-09-02 15:16:58 UTC
I am in the process of modifying one of our serial test programs to log some 
times, so I can confirm whether we are seeing bursts of data. 

I am not familiar with PIO mode on a hard drive; based on my research it looks 
like we are in PIO mode.  Is there a way to disable this? CMOS or kernel?

Boot log:
Uniform Multi-Platform E-IDE driver Revision: 7.00beta4-2.4
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
ICH4: IDE controller at PCI slot 00:1f.1
PCI: Found IRQ 10 for device 00:1f.1
PCI: Sharing IRQ 10 with 00:1d.2
ICH4: chipset revision 1
ICH4: not 100% native mode: will probe irqs later
    ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:DMA, hdb:pio
    ide1: BM-DMA at 0xf008-0xf00f, BIOS settings: hdc:DMA, hdd:pio
hda: WDC WD200BB-75AUA1, ATA DISK drive
blk: queue c0415e80, I/O limit 4095Mb (mask 0xffffffff)
hdc: SAMSUNG CD-ROM SC-152L, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
hda: attached ide-disk driver.
hda: host protected area => 1
hda: 39102336 sectors (20020 MB) w/2048KiB Cache, CHS=2434/255/63, UDMA(100)


Comment 12 David Woodhouse 2003-09-02 15:24:23 UTC
Your hard drive is doing DMA. You could try running 'hdparm -u1 /dev/hda' to
enable interrupt unmasking -- but that is most effective when the kernel was
doing byte-at-a-time PIO transfers with interrupts disabled; I'm not sure how
much it helps with DMA.

Comment 13 Arjan van de Ven 2003-09-02 16:10:13 UTC
doesn't do a thing with DMA afaik.
hdparm -I /dev/hda is also a nice way to get drive (settings) info.