Bug 842275
Summary: | Add support for using scatter-gather for large bulk transfers to usbdevfs | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Hans de Goede <hdegoede> | ||||||
Component: | kernel | Assignee: | Don Zickus <dzickus> | ||||||
Status: | CLOSED WONTFIX | QA Contact: | Red Hat Kernel QE team <kernel-qe> | ||||||
Severity: | unspecified | Docs Contact: | |||||||
Priority: | unspecified | ||||||||
Version: | 6.3 | CC: | dzickus | ||||||
Target Milestone: | rc | ||||||||
Target Release: | --- | ||||||||
Hardware: | Unspecified | ||||||||
OS: | Unspecified | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||||
Doc Text: | Story Points: | --- | |||||||
Clone Of: | Environment: | ||||||||
Last Closed: | 2012-09-10 17:37:13 UTC | Type: | Bug | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Attachments: |
|
Description
Hans de Goede
2012-07-23 10:21:33 UTC
This seems straightforward for 6.4. Shouldn't be to hard to backport assuming no complication arise during upstream soaking. Cheers, Don This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release. Hi, I have uploaded a kernel with a fix for you issue here: http://people.redhat.com/dzickus/rhel6/.c2965718c6bcc035e6d75aaac0cea12d47e82ffa/ If you have the time, please test and let me know if your issue is resolved. Thanks, Don (In reply to comment #3) > Hi, > > I have uploaded a kernel with a fix for you issue here: > > http://people.redhat.com/dzickus/rhel6/. > c2965718c6bcc035e6d75aaac0cea12d47e82ffa/ > > If you have the time, please test and let me know if your issue is resolved. > Hi, I'm afraid that this version causes a panic due to a NULL pointer dereference in free_async. If you cannot spot / find the error yourself, please attach the backported patch here and I'll see if I can find it. Thanks, Hans Hi Hans, Unfortunately, I can't spot the problem but I think I know what happened. RHEL-6 doesn't have commit 910f8d0 USB: Change the scatterlist type in struct urb This removes an extra sg pointer from the urb. So when I backported your upsream commit 3d97ff6 usbdevfs: Use scatter-gather lists for large bulk transfers It failed to compile. So I made some simple changes to add in the extra sg-> field. It seems like it didn't work. I attached my backport of that commit plus a separate piece that I modified to deal with the compile failures. Do you mind taking a look to see what I screwed up? Thanks, Don Created attachment 606684 [details]
backport of upstream patch
Created attachment 606689 [details]
compile fixes
Hi Don, (In reply to comment #5) > Hi Hans, > > Unfortunately, I can't spot the problem but I think I know what happened. > RHEL-6 doesn't have commit > I can, but I'm rather busy (read overwhelmed with work), so I hope with the problem pointed out, you can take a second shot at this ? > 910f8d0 USB: Change the scatterlist type in struct urb > Ah, yes, good catch! > This removes an extra sg pointer from the urb. So when I backported your > upsream commit > > 3d97ff6 usbdevfs: Use scatter-gather lists for large bulk transfers > > It failed to compile. So I made some simple changes to add in the extra > sg-> field. It seems like it didn't work. > > I attached my backport of that commit plus a separate piece that I modified > to deal with the compile failures. > > Do you mind taking a look to see what I screwed up? The basic problem is, that the extra struct usb_sg_request sg-> element which is now in there to get to the actual sg, needs to be alloced and free seperately. So as said the panic pointed to async_free, after your compile fix patch that has: kfree(as->urb->sg->sg); That should be: if (as->urb->sg) { kfree(as->urb->sg->sg); kfree(as->urb->sg); } Which should fix the panic I saw (which likely happened on a ctrl request), but then things will still go boom when we do an actual bulk transfer, as this bit: if (num_sgs) { as->urb->sg->sg = kmalloc(num_sgs * sizeof(struct scatterlist), GFP_KERNEL); if (!as->urb->sg->sg) { ret = -ENOMEM; goto error; } Will deref the NULL as->urb->sg pointer. So it should be changed to: if (num_sgs) { as->urb->sg = kmalloc(sizeof(struct usb_sg_request), GFP_KERNEL); if (!as->urb->sg) { ret = -ENOMEM; goto error; } as->urb->sg->sg = kmalloc(num_sgs * sizeof(struct scatterlist), GFP_KERNEL); if (!as->urb->sg->sg) { ret = -ENOMEM; goto error; } And the error exit code path needs to be fixed like async_free was fixed. Or alternatively we should just backport commit 910f8d0, that looks like it won't be too hard, and will make future backports of other USB stuff easier (and those are likely to happen, for example to extend usbfs with USB3 support). I hope this helps. Regards, Hans Hi Hans, > So as said the panic pointed to async_free, after your compile fix patch > that has: > > kfree(as->urb->sg->sg); > > That should be: > > if (as->urb->sg) { > kfree(as->urb->sg->sg); > kfree(as->urb->sg); > } > > Which should fix the panic I saw (which likely happened on a ctrl request), > but then things will still go boom when we do an actual bulk transfer, as > this bit: > > if (num_sgs) { > as->urb->sg->sg = kmalloc(num_sgs * sizeof(struct scatterlist), > GFP_KERNEL); > if (!as->urb->sg->sg) { > ret = -ENOMEM; > goto error; > } > > > Will deref the NULL as->urb->sg pointer. So it should be changed to: > > if (num_sgs) { > as->urb->sg = kmalloc(sizeof(struct usb_sg_request), GFP_KERNEL); > if (!as->urb->sg) { > ret = -ENOMEM; > goto error; > } > as->urb->sg->sg = kmalloc(num_sgs * sizeof(struct scatterlist), > GFP_KERNEL); > if (!as->urb->sg->sg) { > ret = -ENOMEM; > goto error; > } > > And the error exit code path needs to be fixed like async_free was fixed. Thanks for the fixes! It seems like the error exit code points to free_async, so I don't think I need to do anything for this error path. I am going to rebuild a kernel with this change. > > Or alternatively we should just backport commit 910f8d0, that looks like it > won't be too hard, and will make future backports of other USB stuff easier > (and those are likely to happen, for example to extend usbfs with USB3 > support). I thought about backporting this but then I was worried about kabi breakage. It does break kabi, but because it only redefines a pointer for struct urb, it could be wrapped with #ifdef __GENKSYMS__. However, I am not sure if external modules use this struct or not, so I was avoiding it. Do you have any experience with who is using struct urb outside the kernel? Cheers, Don (In reply to comment #9) > I thought about backporting this but then I was worried about kabi breakage. Ah, good point. > It does break kabi, but because it only redefines a pointer for struct urb, > it could be wrapped with #ifdef __GENKSYMS__. However, I am not sure if > external modules use this struct or not, so I was avoiding it. Do you have > any experience with who is using struct urb outside the kernel? I don't know of any (significant) out of tree USB drivers, since people who want to do USB drivers for proprietary stuff tend to do so from userspace using usbfs :) There are (were mostly) a lot of out of tree webcam drivers by the FOSS community, but most of those have been merged into the mainline by now (as part of gspca), and they don't use the sg field. In general very few drivers use the sg field (mostly mass-storage related drivers do). Regards, Hans Hi Hans, I uploaded a kernel with your suggested fixes to the same spot http://people.redhat.com/dzickus/rhel6/.c2965718c6bcc035e6d75aaac0cea12d47e82ffa/ Use the '64fixes.5' version. Cheers, Don (In reply to comment #11) > Hi Hans, > > I uploaded a kernel with your suggested fixes to the same spot > > http://people.redhat.com/dzickus/rhel6/. > c2965718c6bcc035e6d75aaac0cea12d47e82ffa/ Better (no more panic), but it still does not work. There are 2 issues: 1) For EHCI controllers it claims to not be scatter-gather capable, which is weird, I did not investigate this further. Unsurprisingly with EHCI controllers things still work as advertised, since they still use the old code paths. 2) With XHCI controllers things no longer go boom, but things don't work either (if I try to redirect a trouble some usb mass storage device, windows sees it but cannot access its contents) and the following shows up in dmesg: usb 6-2: check_trb_math - ep 0x81 - Miscalculated number of TRBs, -2 left usb 6-2: check_trb_math - ep 0x81 - Miscalculated tx length, queued 0x0 (0), asked for 0x5000 (20480) So it looks like this needs more work / investigation as to why these 2 things happen with the RHEL-6 kernel, likely by someone who knows the USB subsystem well (ie me). Unfortunately my workload for 6.4 already is too high. I believe, that since we've a workaround in place, and this is just to fix things the right way, that the best way forward now is to delay this to 6.5 (and to retract the devel-ack for now). Regards, Hans (In reply to comment #12) > I believe, that since we've a workaround in place, and this is just to fix > things the right way, that the best way forward now is to delay this to 6.5 > (and to retract the devel-ack for now). Thinking more about this, I think that given the divergence between upstream USB (esp the XHCI code) and 6.4 USB, and that we've a workaround, it is probably best to just stick with the workaround for RHEL-6. Regards, Hans Hi Hans, Are you suggesting we close this bz for now. And if the workaround causes more issues, reopen and re-investigate how to make this work? Thanks for your help. Cheers, Don (In reply to comment #14) > Hi Hans, > > Are you suggesting we close this bz for now. And if the workaround causes > more issues, reopen and re-investigate how to make this work? Yes, that is exactly what I've in mind. Thanks & Regards, Hans Hi Hans, Fair enough. Closing as WONTFIX because backporting the fix from upstream requires more effort than might be necessary considering we have a workaround in place. We can re-open and re-investigate this problem if the workaround fails to address the original problem. Cheers, Don |