Bug 1035863 - Bulk transfer timeouts cause data corruption
Summary: Bulk transfer timeouts cause data corruption
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: libusbx
Version: 20
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Hans de Goede
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 1026914
TreeView+ depends on / blocked
 
Reported: 2013-11-28 16:40 UTC by Tim Waugh
Modified: 2013-11-29 11:58 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-11-29 10:55:54 UTC
Type: Bug


Attachments (Terms of Use)
usb-libusb.c (52.57 KB, text/plain)
2013-11-28 16:43 UTC, Tim Waugh
no flags Details

Description Tim Waugh 2013-11-28 16:40:36 UTC
Description of problem:
In bug #1026914 I'm seeing a correct-looking loop around libusb_bulk_transfer(), which handles LIBUSB_ERROR_TIMEOUT. The timeout is set to 60 seconds, and whenever this timeout is triggered, I see data corruption on the transfer.

This is 100% repeatable using an HP PSC 2210... but I don't know if the device could be at fault, or if it might be a kernel bug, or a libusbx bug, or the loop around libusb_bulk_transfer() really isn't correct after all.

Version-Release number of selected component (if applicable):
kernel-3.11.9-300.fc20.x86_64
cups-1.7.0-5.fc20.x86_64
libusbx-1.0.16-3.fc20.x86_64

How reproducible:
100%

Steps to Reproduce:
1.See bug #1026914 comment #10.

Actual results:
See bug #1026914 comment #1.

Expected results:
See bug #1026914 comment #2.

Additional info:
I can work around this in CUPS by setting the timeout much higher, but I think it needs investigation by someone who knows more about how USB transfers work.

Comment 1 Tim Waugh 2013-11-28 16:43:06 UTC
Created attachment 830317 [details]
usb-libusb.c

The caller code. See print_device(), loop is at lines 364-569.

Comment 2 Hans de Goede 2013-11-28 21:50:15 UTC
Hi,

What usb-speed is the printer, and what speed port is it plugged into  ?

Can you do lsusb and lsusb -t on the machine in question, and paste the output of both here?

There is no such things as a timeout at the lower usb levels, libusb emulates timeouts by simply setting a timer and async. cancelling pending transfers when the timer expires.

In general all layers involved do their best to make sure that the amount of data send before an async cancelled is properly accounted, so that you should be able to re-start the transfer after a cancel as the sane usb backend is trying to do. But this is not guaranteed to work, and there are known cases where it won't work, such as ie an usb-1 device plugged into an usb-2 hub (or an usb-2 port on intel sandy bridge or later hardware, as that has an embedded usb-2 hub. In this case (for example) there are buffers on the hub which may contain data on the async cancel which is not accounted for.

Long story short, don't use timeouts the way they are being used in the sane usb backend. Timeouts in libusb are meant for when you expect a transfer to finish within a certain time. If it may take longer and you want to do other stuff in the mean time use threads and/or libusb's async API.

Regards,

Hans

Comment 3 Tim Waugh 2013-11-29 09:53:24 UTC
I'm not sure how I can find out about speeds.

$ lsusb
Bus 002 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 005: ID 04f2:b217 Chicony Electronics Co., Ltd Lenovo Integrated Camera (0.3MP)
Bus 001 Device 009: ID 045e:00db Microsoft Corp. Natural Ergonomic Keyboard 4000 V1.0
Bus 001 Device 011: ID 046d:c069 Logitech, Inc. M500 Laser Mouse
Bus 001 Device 010: ID 0424:2228 Standard Microsystems Corp. 9-in-2 Card Reader
Bus 001 Device 008: ID 0424:2602 Standard Microsystems Corp. USB 2.0 Hub
Bus 001 Device 007: ID 0424:2512 Standard Microsystems Corp. USB 2.0 Hub
Bus 001 Device 006: ID 03f0:3304 Hewlett-Packard DeskJet 990c
Bus 001 Device 012: ID 03f0:2911 Hewlett-Packard PSC 2200
Bus 001 Device 004: ID 17ef:100a Lenovo ThinkPad Mini Dock Plus Series 3
Bus 001 Device 003: ID 0a5c:217f Broadcom Corp. BCM2045B (BDC-2.1)
Bus 001 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

$ lsusb -t
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/3p, 480M
    |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/8p, 480M
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/3p, 480M
    |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/6p, 480M
        |__ Port 4: Dev 3, If 0, Class=Wireless, Driver=btusb, 12M
        |__ Port 4: Dev 3, If 1, Class=Wireless, Driver=btusb, 12M
        |__ Port 4: Dev 3, If 2, Class=Vendor Specific Class, Driver=, 12M
        |__ Port 4: Dev 3, If 3, Class=Application Specific Interface, Driver=, 12M
        |__ Port 5: Dev 4, If 0, Class=Hub, Driver=hub/6p, 480M
            |__ Port 1: Dev 12, If 0, Class=Vendor Specific Class, Driver=, 12M
            |__ Port 1: Dev 12, If 1, Class=Printer, Driver=usblp, 12M
            |__ Port 1: Dev 12, If 2, Class=Vendor Specific Class, Driver=, 12M
            |__ Port 1: Dev 12, If 3, Class=Mass Storage, Driver=usb-storage, 12M
            |__ Port 2: Dev 6, If 0, Class=Printer, Driver=usblp, 12M
            |__ Port 6: Dev 7, If 0, Class=Hub, Driver=hub/2p, 480M
                |__ Port 1: Dev 8, If 0, Class=Hub, Driver=hub/4p, 480M
                    |__ Port 1: Dev 10, If 0, Class=Mass Storage, Driver=usb-storage, 480M
                    |__ Port 2: Dev 11, If 0, Class=Human Interface Device, Driver=usbhid, 1.5M
                |__ Port 2: Dev 9, If 0, Class=Human Interface Device, Driver=usbhid, 1.5M
                |__ Port 2: Dev 9, If 1, Class=Human Interface Device, Driver=usbhid, 1.5M
        |__ Port 6: Dev 5, If 0, Class=Video, Driver=uvcvideo, 480M
        |__ Port 6: Dev 5, If 1, Class=Video, Driver=uvcvideo, 480M

Thanks for the information about possible quirks... that reassures me that what I thought was a work-around (increase the timeout a lot) is in fact the correct fix.

Comment 4 Hans de Goede 2013-11-29 10:55:54 UTC
Hi,

(In reply to Tim Waugh from comment #3)
> I'm not sure how I can find out about speeds.

They are included in the lsusb -t output, I had forgotten about that, so the lsusb output has all I need :)

> Bus 001 Device 012: ID 03f0:2911 Hewlett-Packard PSC 2200

This is the device we're talking about, right ?

That device is plugged into this hub:
>         |__ Port 5: Dev 4, If 0, Class=Hub, Driver=hub/6p, 480M
And hre is the actual device:
>             |__ Port 1: Dev 12, If 0, Class=Vendor Specific Class, Driver=,
> 12M
>             |__ Port 1: Dev 12, If 1, Class=Printer, Driver=usblp, 12M
>             |__ Port 1: Dev 12, If 2, Class=Vendor Specific Class, Driver=,
> 12M
>             |__ Port 1: Dev 12, If 3, Class=Mass Storage,
> Driver=usb-storage, 12M

Notice how the hub has a speed of 480M, so it us a usb-2 hub, and the device of 12M, so it is a usb-1 device, or iow the known case were cancelling in flight transfers will lead to inaccurate accounting of bytes send before the cancellation I already described (*).

So TL;DR: this a known and unfixable problem, and the solution is to simply not cancel transfers (by setting a timeout) and then decide you want to finish the transfer after all. Either decide you want to really cancel the transfer, or just wait for it to finish normally.

BTW as documented here:
http://libusbx.sourceforge.net/api-1.0/group__syncio.html
You can simply specify 0 as timeout to wait indefinitely. 

Regards,

Hans

*) To be really precise, you're hitting the code at lines 442 - 446 of:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/host/ehci-q.c

Comment 5 Tim Waugh 2013-11-29 11:58:40 UTC
Thanks for the detailed analysis!


Note You need to log in before you can comment on or make changes to this bug.