Description of problem: Enable I/O AT in RHEL5.
CONFIG_INTEL_IOATDMA is already turned on in the R5 configs. Check with Dell to see if this meets their needs; if so, close as CURRENTRELEASE.
Minimum required version for ioatdma is 1.8 per Intel. That is what's being tested at Dell by the networking group. I do not see this version upstream, as 2.6.19-rc3 still contains v1.7, which RH has already bundled in the last weekly refresh kernel for RHEL5 made available to Dell. Can we poll Intel for when they think they will make that version available upstream, and then to RH? I am doing the same from our end.
Escalating as a possible beta blocker while waiting for Intel to respond regarding the ioatdma v1.8 requirement.
Will discuss this with the Intel Partner Manager. At this time, I do not have the data to justify raising this as an exception for RHEL5.0 GA.
If it is going to go in then it has to be before Beta2 is released.
Will ask for a 3-way call between RH, Dell and Intel. Intel has only told RH that base 2.6.18 provides full I/OAT support. If this is a bug, we need a better description. Per Amit's comment #2, it appears the code version of ioatdma Intel is requesting with the "fix" is NOT upstream yet. We will need a specific patch on top of the version of ioatdma in the RHEL5 beta (ioatdma-1.7?). Our QE group will not accept a package upgrade to a new level at this point in our beta cycle. If ioatdma-1.8 is required, this will be a RHEL5.1 candidate.
On Mon, 13 Nov 2006 17:11:44, Lei_Wang2 wrote:
> Thomas is right. We have version 1.8 and will release 1.8 in next year
> Q1. We didn't receive 1.9 from Intel.
>
> -----Original Message-----
> From: Chenault, Thomas
> Sent: Friday, November 10, 2006 3:30 PM
> To: Bhutani, Amit; Hentosh, Robert; 'John Feeney'; 'Larry Troan'; Hull, John
> Cc: Wang, Lei
> Subject: RE: I/O AT patch info and history
>
> As far as I know, we have nothing newer than 1.8. I have no insights
> into the differences between 1.8 and 1.9. I have not had much
> involvement with IOAT, so Lei may be better positioned to answer your
> questions.
>
> Thomas Chenault

Based on the above, I'm closing this bug with status=NOTABUG. We can reopen the bug when we have a requirement to upgrade the ioatdma driver.
Agree.
Actually, instead of closing it, let's please mark this as a feature request against RHEL 5.1.0, so we can sync up at least at that point. v1.8 should be upstream by 2.6.20 per Intel.
Changing Summary to "ioatdma 1.8 required for better I/OAT performance". ioatdma version 1.8 or later is required by Dell. Target is RHEL5.1.
Larry- Please change version field to reflect 5.1.0
John@Intel- The upstream kernel (2.6.20) still has ioatdma module version 1.7. Intel was supposed to get 1.8 upstream a while ago (as early as 2.6.18). Can you please bring us up to speed on the challenges Intel is facing in making this happen?
The patch has been in the -mm tree for some time now. It is there because it wasn't in time for the .20 merge window. Now that .20 has shipped, it is being sent for addition to the .21 kernel. It was never (and I mean never!) intended for .18. The .18 kernel was where I/OAT was initially included. This change came about during .19 and we thought it could have made .20, but because of mailing list discussion it didn't make the window. Also, this really isn't the place to discuss this, since this topic has nothing to do with RH accepting this into 5.1. There is a feature request for it to be picked up for 5.1, but this really hasn't been discussed with RH. In the past RH has been very anti-patches for our drivers (even when the patches enable new HW). We expect the same will happen here. This should be discussed in the Dell/Intel team discussions which happen each week.
(In reply to comment #15)
> The patch has been in the -mm tree for some time now. It is there because it
> wasn't in time for the .20 add window. Now that .20 shipped, it is being sent
> for the addition to .21 kernel.
>
> It was never (and I mean never!) intended for .18. The .18 kernel was where
> I/OAT was initally included. This change came about during .19 and we thought
> it could have made .20 but because of mail list disussion it didn't make the
> window.

My bad. I stand corrected about my mistaken assumption regarding Intel's efforts to incorporate the I/OAT performance-enhancement patch (which generated an I/O write command after every 20th descriptor write) into the .18 upstream kernel.

> Also, this really isn't the place to discuss this since this topic has nothing
> to do with RH accepting this into 5.1. There is a feature request for it to be
> picked up for 5.1 but RH this really hasn't been discussed with RH. In the past
> RH has been very anti-patches for our drivers (even when the patches enable new
> HW). We also expect the same will happen here. This should be discussed in the
> Dell/Intel team discussions which happen each week.

Umm... I beg to differ with your statements. The Dell/Intel weekly team discussions are centered around the add-on (read: DKMS) driver releases that Dell would pre-load on top of the baseline Linux OS image. There is *no* focus whatsoever during those calls (as much as I would like to see that behavior change) on tracking patch inclusions, either upstream or in Linux distributions such as RHEL 5.0 or RHEL 5.1. Bugzilla is the place to discuss this.
After reading through the bug and the original issue tracker that created it, then opening a new issue tracker to track this specific issue, I'm raising this to URGENT so we can get a reading from Intel as to whether this version is required for proper I/OAT operation. Dell appears to be running under that assumption. Can we get a statement from Intel please? Setting to NEEDINFO of john_ronciak.
John, per your comment #15 above, it appears this version is also required to support new Intel hardware. Could you specify the PCI IDs of that hardware, please, so our partners understand whether they are impacted?
The I/OAT driver needs to be updated for performance reasons. There are no new PCI IDs being added. We have found some performance enhancements that can be made. We made these and sent them out to the -mm tree, where they have been tested. We, as well as Dell, would like to see this added. There is minimal (if any) risk involved in applying the patch for 5.1.
John, Could you provide more detail about the patch(es) you referenced in comment #21? I see one patch in -mm that deals with ioat performance, the one where the ioat_chan->pending variable check goes from 20 to 4, dated 2/3/2007. Is that the one you were referring to? Is that all? Just trying to make sure I am doing the right thing.
Yes, that is one of the two changes. Here they are in patch form as they went into the -mm tree. The patches will probably need to be tweaked to apply to the RHEL5 tree. Let me know if you need the patches as an attachment and I can do that for you.

--------------------------------------
commit bccee70c2fc53e41daa536ad6b5a115df999dffb
Author: Chris Leech <christopher.leech>
Date:   Wed Mar 7 11:49:17 2007 -0800

    I/OAT: Only offload copies for TCP when there will be a context switch

    The performance wins come with having the DMA copy engine doing the
    copies in parallel with the context switch. If there is enough data
    ready on the socket at recv time just use a regular copy.

    Signed-off-by: Chris Leech <christopher.leech>

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 74c4d10..5ccd5e1 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1110,6 +1110,8 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	long timeo;
 	struct task_struct *user_recv = NULL;
 	int copied_early = 0;
+	int available = 0;
+	struct sk_buff *skb;
 
 	lock_sock(sk);
 
@@ -1136,7 +1138,11 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 #ifdef CONFIG_NET_DMA
 	tp->ucopy.dma_chan = NULL;
 	preempt_disable();
-	if ((len > sysctl_tcp_dma_copybreak) && !(flags & MSG_PEEK) &&
+	skb = skb_peek_tail(&sk->sk_receive_queue);
+	if (skb)
+		available = TCP_SKB_CB(skb)->seq + skb->len - (*seq);
+	if ((available < target) &&
+	    (len > sysctl_tcp_dma_copybreak) && !(flags & MSG_PEEK) &&
 	    !sysctl_tcp_low_latency && __get_cpu_var(softnet_data).net_dma) {
 		preempt_enable_no_resched();
 		tp->ucopy.pinned_list = dma_pin_iovec_pages(msg->msg_iov, len);
@@ -1145,7 +1151,6 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 #endif
 
 	do {
-		struct sk_buff *skb;
 		u32 offset;
 
 		/* Are we at urgent data? Stop if we have read anything or have SIGURG pending. */
@@ -1433,7 +1438,6 @@ skip_copy:
 
 #ifdef CONFIG_NET_DMA
 	if (tp->ucopy.dma_chan) {
-		struct sk_buff *skb;
 		dma_cookie_t done, used;
 
 		dma_async_memcpy_issue_pending(tp->ucopy.dma_chan);

commit 5b91cd202e786dce6256598ea2c5117d03f1be40
Author: Chris Leech <christopher.leech>
Date:   Wed Mar 7 11:49:15 2007 -0800

    ioatdma: Push pending transactions to hardware more frequently

    Every 20 descriptors turns out to be to few append commands with
    newer/faster CPUs. Pushing every 4 still cuts down on MMIO writes to
    an acceptable level without letting the DMA engine run out of work.

    Signed-off-by: Chris Leech <christopher.leech>

diff --git a/drivers/dma/ioatdma.c b/drivers/dma/ioatdma.c
index 8e87261..0f77a9d 100644
--- a/drivers/dma/ioatdma.c
+++ b/drivers/dma/ioatdma.c
@@ -310,7 +310,7 @@ static dma_cookie_t do_ioat_dma_memcpy(struct ioat_dma_chan *ioat_chan,
 	list_splice_init(&new_chain, ioat_chan->used_desc.prev);
 	ioat_chan->pending += desc_count;
-	if (ioat_chan->pending >= 20) {
+	if (ioat_chan->pending >= 4) {
 		append = 1;
 		ioat_chan->pending = 0;
 	}
@@ -818,7 +818,7 @@ static void __devexit ioat_remove(struct pci_dev *pdev)
 }
 
 /* MODULE API */
-MODULE_VERSION("1.7");
+MODULE_VERSION("1.9");
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Intel Corporation");
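The second change amounts to ringing the hardware doorbell more often. A back-of-envelope sketch of the MMIO cost (an illustration only; it mirrors the "if (ioat_chan->pending >= N)" check from the patch, not the driver itself):

```shell
# Illustration only: estimate MMIO doorbell ("append") writes for a run of
# descriptor submissions at a given pending threshold, mirroring the
# `if (ioat_chan->pending >= N)` check in do_ioat_dma_memcpy().
appends_for() {
    descriptors=$1
    threshold=$2
    echo $(( descriptors / threshold ))
}

appends_for 1000 20   # old driver (threshold 20): 50 doorbell writes
appends_for 1000 4    # patched driver (threshold 4): 250 doorbell writes
```

Lowering the threshold from 20 to 4 multiplies doorbell writes fivefold, but per the commit message that is still an acceptable MMIO rate and keeps faster CPUs from letting the DMA engine run out of work between appends.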
Devel ack for posting upstream changes to R5.1.
Awaiting results from Dell on testing outcome. Setting Needinfo.
Still need test results. Note: The lack of testing may hinder the acceptance of the patch into RHEL5.1.
reply to comment #27

I applied the patch to RHEL 5.0 GA and tested it. No stats show up in /sys/class/dma/dma0chan[0123]/bytes_transferred, though /sys/class/dma/dma0chan[0123]/in_use shows 1 (indicating ioat is operational). On unpatched RHEL 5.0 GA the stats show up correctly; are we missing something here? The way I tested was to scp a 500 MB iso image over the network. I used a PowerEdge 1950 with BIOS version 1.3.7 and enabled ioatdma in the BIOS. The lspci output is attached. I didn't proceed to test further because I didn't see any point.
Created attachment 156370 [details] lspci output of the machine I tried this on.
Sandeep, Any chance of testing with the release that I created and sent to Charles via Robert Hentosh a couple of weeks ago? I'm sorry you had to go through the work of applying the patch yourself. John
I tried the test out with the kernel you mentioned but I get the same result. I used time <scp command> but I don't see any performance improvement compared to the same system without I/OAT. I don't see the stats in /sys/class/dma/dma0chan[0123]/bytes_transferred. /sys/class/dma/dma0chan[0123]/in_use is 1, indicating I/OAT is operational.
Flipping status back to ASSIGNED per comment #31.
What I know: from the lspci output provided, the system under test did have a "5000 Series Chipset DMA Engine" (PCI ID 8086:1a38), so ioatdma.ko should have loaded, and I assume it did.

What I think I know: I don't think /sys/class/dma/dma0chan[0123]/bytes_transferred and /sys/class/dma/dma0chan[0123]/in_use are good indicators of ioatdma.ko usage. Can Dell prove that these are valid for I/O AT? From what I see in the code, bytes_transferred is a per-cpu variable displayed by dmaengine.ko, which is a class device. This variable is incremented by any one of three inline macros, dma_async_memcpy_buf_to_buf(), dma_async_memcpy_buf_to_pg() and dma_async_memcpy_pg_to_pg() (see include/linux/dmaengine.h). Let's take dma_async_memcpy_buf_to_buf() for example. It is called by dma_memcpy_to_kernel_iovec() found in iovlock.c, which is called by dma_memcpy_to_iovec(), also in iovlock.c. dma_memcpy_to_iovec() is only called by dma_skb_copy_datagram_iovec() in net/core/user_dma(). The other functions act in a similar manner.

So what does ioatdma do? The calling sequence is below:

ioat_dma_memcpy_buf_to_buf()
do_ioat_dma_memcpy()
to_ioat_desc() or ioat_dma_alloc_descriptor()
to_ioat_desc()
ioatdma_chan_write8()
writeb()

I don't see how ioatdma.ko gets involved in bytes_transferred, but those more knowledgeable in this realm can certainly point out the error in my logic.

So how does one know if ioatdma is functional? I don't see any counters like dmaengine has. There is a printk displayed at startup ("Intel(R) I/OAT DMA Engine found, %d channels") before it goes into ioat_self_test(), which displays only errors. Can one assume that if no errors are reported, it is working properly?

Finally, it should be noted that this patch is not adding I/O AT, it is just enhancing it. This patch does not enable I/O AT; it is just supposed to improve it and make it more like upstream.
The patch in question actually cuts back on use of I/OAT in situations where performance will most likely be worse with offloaded copies, namely when there is enough data waiting on the socket so the process will not sleep and there is no parallel work to be done while waiting for the asynchronous copy to complete. The test case used here, a single scp, probably falls into this situation and so offloaded copies are not happening with the patched kernel. As to the counters, yes they are managed outside of the driver but track the number of copies and bytes copied passed into the driver. They work fine as an indication that asynchronous copies are being used.
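Chris's description can be condensed into a small decision sketch (illustrative values only; the real check is the patched condition in tcp_recvmsg(), comparing the bytes already queued on the socket against the wakeup target and sysctl_tcp_dma_copybreak):

```shell
# Sketch of the patched-kernel decision: offload only when the process
# would otherwise sleep (available < target) and the copy is large enough
# (len > copybreak). Values below are illustrative, not kernel defaults.
would_offload() {
    available=$1   # bytes already queued on the socket
    target=$2      # bytes the caller needs before recv returns
    len=$3         # size of the user buffer
    copybreak=$4   # stand-in for sysctl_tcp_dma_copybreak
    if [ "$available" -lt "$target" ] && [ "$len" -gt "$copybreak" ]; then
        echo offload
    else
        echo cpu-copy
    fi
}

would_offload 0      65536 524288 4096   # nothing queued yet: offload
would_offload 524288 65536 524288 4096   # data already waiting: cpu-copy
```

The second case models the single-scp test above: the sender keeps the socket full, the receiver never sleeps, so the patched kernel takes the regular copy path and the dmaengine counters stay at zero.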
Thanks, Chris. Sorry I had to drag you in on this. I stand corrected and appreciate the input. I would also like to apologize to Dell in general, and Sandeep in particular, for doubting the test procedure; I just didn't see the connection, which gave me doubt. This is my first go-round with I/OAT and I am just trying (desperately) to get this tested. So with regard to testing, would simultaneous scps be better, Chris? Your comment says that a "single scp" provides no "parallel work". So, maybe my naivety is showing (whoops), but what if the system with I/OAT is bombarded by scps from other systems, since part of the patch is in tcp_recvmsg()? Since it looks like there is some difficulty testing this patch, is there anything Intel can do to help test this? Test rpms can be provided, just ask. Or perhaps a script to run?
Okay, I understand. This is the first thing I will do Monday morning IST.
I used axel, a download accelerator. It splits a large file into pieces and downloads the parts simultaneously over http. This tool seems simpler to use than multiple instances of scp. I see some performance benefits in this case.
Okay, so since Sandeep sees some performance benefits, I will consider this a positive test result from Dell. Setting to post.
Since the upstream acceptance of this patch has been called into question, it was not included in the RHEL5.1 beta. Thus, I am moving this bugzilla out of RHEL5.1 to reflect this situation.
The I/O AT patches above can be found in Linus' git tree with some slight modification. Another patch was applied that changed the declaration of a variable to have a scope more local to the fix, rather than generic to the whole function, since the code being added is contained within an #ifdef/#endif. If the #ifdef is disabled, the formerly function-scope variable would be flagged as "unused". In the spirit of keeping RHEL as close to upstream as possible, I will apply both patches to tcp.c, as well as the unaltered patch for ioatdma.c, ask Dell to re-test, and re-post for the 5.2 release; thus I am setting devel_ack to plus.
The built rpms can be found at people.redhat.com/jfeeney/.bz209411/*. I would really appreciate it if they could be tested in a manner similar to how their brothers were tested for RHEL5.1, since the code was slightly changed during its migration upstream. Thanks.
*** This bug has been marked as a duplicate of 253303 ***
*** Bug 253303 has been marked as a duplicate of this bug. ***
=Comment: #0================================================= Stephanie A. Glass <sglass.com> - 2007-08-14 15:23 EDT

1. Feature Id: 200960
   Feature Name: Update Intel I/OAT driver to version 1.9+.
   Sponsor: xSeries
   Category: Device Drivers and IO
   Request Type: Driver - Update Version

2. Short Description: This requirement is to update the ioat module to version 1.9 or greater.
   Architecture: x86 x86_64
   Architecture Specific? Purely Common Code
   Affects Toolchain? no
   Affects Installer? no
   Affects Desktop? no
   Affects Core Kernel (not mod)? no
   Affects Kernel Module? yes

3. Describe the Business Case: Customers are starting to request I/OAT support; the latest driver fixes critical bugs and includes enhancements to previous versions.
   Performance Assistance Required?: no

4. Sponsor Priority: 2
   IBM Confidential: no
   Code Contribution?: 3rd party code
   Upstream Acceptance: In Progress
   Component Release Version Target: ioat v1.9 code is still in -mm. Suspect this will show up in 2.6.22. Need Red Hat to pick this code up directly from kernel.org.

5. Hardware to Red Hat?: N/A

6. PM Contact: Monte Knutson, mknutson.com, 503-894-1495

7. Technical Contact: Chris McDermot, lcm.com, 503-578-5726

*** Bug 209411 has been marked as a duplicate of this bug. ***
*** This bug has been marked as a duplicate of 209411 ***
With this bugzilla re-opened, I would again request that the rpms found at people.redhat.com/jfeeney/.bz209411/* be tested by Dell in a manner similar to how their brothers were tested for RHEL5.1, since the code was slightly changed during its migration upstream. Much appreciated. Thanks.
*** Bug 273441 has been marked as a duplicate of this bug. ***
I could not get the reliable results I got last time. It would help if I could get the latest test kernel patched with ioat-1.9. I will test further and get back to you.
Okay, the patch has been applied to more recent kernel code (2.6.18-55) for testing. See my people page (http://people.redhat.com/jfeeney/.bz209411). Thanks so much. (Crossing my fingers that this shows definitive results.)
One question was asked on the list: is the patch in linus' tree? Please let us know... thanks.
Yes, it finally made it to Linus' tree. It went in around the 2.6.22-git4 timeframe. But now that you ask, I notice that part of it (ioatdma.c) has been replaced as of 11/15/2007 along with a significant revamping of the driver. The ipv4/tcp.c tcp_recvmsg() piece of the patch is still intact.
QE ack for RHEL5.2. Will need Dell's assistance in testing.
This enhancement request was evaluated by the full Red Hat Enterprise Linux team for inclusion in a Red Hat Enterprise Linux minor release. As a result of this evaluation, Red Hat has tentatively approved inclusion of this feature in the next Red Hat Enterprise Linux Update minor release. While it is a goal to include this enhancement in the next minor release of Red Hat Enterprise Linux, the enhancement is not yet committed for inclusion in the next minor release pending the next phase of actual code integration and successful Red Hat and partner testing.
I ran an iperf server on the ioat-capable card and ran 10 threads of the client on another server connected to the same switch. But I still don't see any performance improvement when ioatdma is enabled. The results seem to consistently indicate this. The network is a 100Mbps full-duplex network. The TCP window sizes are set to the default on both client and server.
At 100Mbps you will not see any difference. It needs to be tested on a 1Gbps (or better yet a 10Gbps) network and then fully stressed.
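For reference, a stress run along the lines John describes might look like the following (the hostname is a placeholder and the commands are echoed rather than executed, since they need real I/OAT hardware on a gigabit-class network):

```shell
# Sketch of a high-stress iperf run; 10.0.0.1 is a placeholder for the
# I/OAT-capable server. Commands are echoed here, not run.
SERVER=10.0.0.1

# On the I/OAT system:
echo "iperf -s"

# On the load generator, many parallel streams (iperf -P) to keep the
# receive path busy enough for offloaded copies to matter:
echo "iperf -c $SERVER -t 120 -P 40"
```

The -P 40 figure follows the later observation in this bug that benefits only showed up with more than 40 parallel iperf threads under high stress.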
Any chance of trying John's testing requirement, Sandeep?
*** Bug 252949 has been marked as a duplicate of this bug. ***
I seem to get better performance when I have more threads running in parallel (>40 threads of iperf) under high stress.
In 2.6.18-72.el5. You can download this test kernel from http://people.redhat.com/dzickus/el5
------- Comment From drosser.com 2008-03-19 14:31 EDT------- I have verified the "ioatdma" module loads on one of our "ioat capable" systems. The only IBM systems I am sure are "ioat capable" are some of our blades. The physical cards I have in my possession are PCI-express and won't fit into blades. I am investigating whether any of our rack-chassis systems are "ioat capable".
------- Comment From drosser.com 2008-03-20 13:57 EDT------- This might be an important data point: # cat /sys/module/ioatdma/version 1.9
Greetings Red Hat Partner, A fix for this issue should be included in the latest packages contained in RHEL5.2-Snapshot1--available now on partners.redhat.com. Please test and confirm that your issue is fixed. After you (Red Hat Partner) have verified that this issue has been addressed, please perform the following: 1) Change the *status* of this bug to VERIFIED. 2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified) If this issue is not fixed, please add a comment describing the most recent symptoms of the problem you are having and change the status of the bug to ASSIGNED. If you are receiving this message in Issue Tracker, please reply with a message to Issue Tracker about your results and I will update bugzilla for you. If you need assistance accessing ftp://partners.redhat.com, please contact your Partner Manager. Thank you
Greetings Red Hat Partner, a fix for this issue should also be included in the latest packages contained in RHEL5.2-Snapshot3--available now on partners.redhat.com. Please test and confirm, following the same verification steps as for Snapshot1.
------- Comment From drosser.com 2008-04-04 18:54 EDT------- Did some final testing with snap2. Got to do some end-to-end testing (tests passed) and also noticed the driver version was bumped to 2.0. Confirming this should be closed.
Greetings Red Hat Partner, a fix for this issue should also be included in the latest packages contained in RHEL5.2-Snapshot4--available now on partners.redhat.com. Please test and confirm, following the same verification steps as for Snapshot1.
As pointed out in comment #66, the code has been updated, so this is now verified.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0314.html