Bug 575875 - nut with USB TrippLite OMNIVS1000 causes system lockup after powerfail
Summary: nut with USB TrippLite OMNIVS1000 causes system lockup after powerfail
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: nut
Version: 12
Hardware: x86_64
OS: Linux
Priority: low
Severity: urgent
Target Milestone: ---
Assignee: Michal Hlavinka
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2010-03-22 15:51 UTC by Niels Mayer
Modified: 2010-06-04 14:49 UTC
CC List: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-06-04 07:42:56 UTC
Type: ---




Links
Red Hat Bugzilla 575334 | Private: 0 | Priority: low | Status: CLOSED | Summary: nut.x86_64 0:2.4.3-1.fc12 upgrade breaks cyberpower 800AVS setup | Last updated: 2021-02-22 00:41:40 UTC

Internal Links: 575334

Description Niels Mayer 2010-03-22 15:51:05 UTC
Description of problem:

Sometime after a power glitch causes the UPS to go to battery temporarily, nut's tripplite_usb driver goes completely haywire, leaving the system unresponsive at the console because the X server is taking 100% CPU. Something appears to be going wrong on the USB bus: once the error condition arises, it prevents any keyboard or mouse control of the X server. Even Ctrl-Alt-Backspace and Ctrl-Alt-Delete don't work.

Once it happens, you have to log in remotely, kill the X server and GDM, then reboot.

I've seen this bug occur in nut-2.4.1-8.fc12.x86_64, and the latest nut-2.4.3-1.fc12.x86_64 doesn't fix it.

Version-Release number of selected component (if applicable):

nut-2.4.1-8.fc12.x86_64
nut-2.4.3-1.fc12.x86_64

How reproducible:

Consistently reproducible, to the point that I'm afraid to test this again. Just pull the UPS's power plug for a minute, then plug it back in. The system will go belly-up within a few hours. It does not happen immediately after the power failure, but I've seen it happen enough times that I'm confident nut/tripplite_usb is to blame. To prevent it, just disconnect the USB cable to the UPS and don't run nut. The UPS then still kicks in appropriately, but the USB issue doesn't make the system unstable later.

Now, when the power fails or glitches, I tend to reboot afterwards so that this issue doesn't cause me to lose work.

Steps to Reproduce:

1. Use nut with a TRIPP LITE OMNIVS1000 plugged in via USB.
2. Pull power for a minute.
3. The system will become unresponsive sometime later, with logs indicating tripplite_usb is to blame.
  
Actual results:

Sometime after a temporary power failure, the system locks up and the X server cannot respond to any user input.

Expected results:

A system that worked just like it did before the temporary power failure.

Additional info:

Here's a /var/log/messages excerpt showing a typical sequence. Note that one USB device (the Behringer BCD3000) goes offline as soon as the power fails, since it's not on the UPS. I don't think this is the issue, as I've seen the same problem happen without the BCD3000 on the USB bus. This particular sequence was triggered by a power glitch caused by the spectacular explosion of a power transformer a few blocks away (which is why I have a UPS in the first place!).

Mar 21 07:02:49 gnulem kernel: usb 6-2: USB disconnect, address 3
Mar 21 07:02:53 gnulem upsmon[1707]: UPS tripplite@localhost on battery
Mar 21 07:02:53 gnulem wall[30102]: wall: user nut broadcasted 1 lines (36 chars)
Mar 21 07:03:08 gnulem kernel: usb 6-2: new full speed USB device using ohci_hcd and address 4
Mar 21 07:03:08 gnulem kernel: usb 6-2: New USB device found, idVendor=1397, idProduct=00bf
Mar 21 07:03:08 gnulem kernel: usb 6-2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
Mar 21 07:03:08 gnulem kernel: usb 6-2: Product: BCD3000
Mar 21 07:03:08 gnulem kernel: usb 6-2: Manufacturer: Behringer
Mar 21 07:03:08 gnulem kernel: usb 6-2: configuration #1 chosen from 1 choice
Mar 21 07:03:18 gnulem upsmon[1707]: UPS tripplite@localhost on line power
Mar 21 07:03:18 gnulem wall[30124]: wall: user nut broadcasted 1 lines (39 chars)
Mar 21 08:13:10 gnulem ntpd[1753]: synchronized to 169.229.70.183, stratum 2
Mar 21 08:15:25 gnulem kernel: usb 5-2: USB disconnect, address 21
...
Mar 21 09:19:05 gnulem kernel: usb 3-1: USB disconnect, address 2
Mar 21 09:19:05 gnulem tripplite_usb[1700]: libusb_get_interrupt() returned -19 instead of 8 while sending 3a 4c b3 0d 00 00 00 00 '.L......'
Mar 21 09:19:06 gnulem tripplite_usb[1700]: libusb_get_interrupt() returned -19 instead of 8 while sending 3a 4c b3 0d 00 00 00 00 '.L......'
Mar 21 09:19:06 gnulem tripplite_usb[1700]: libusb_get_interrupt() returned -19 instead of 8 while sending 3a 4c b3 0d 00 00 00 00 '.L......'
Mar 21 09:19:06 gnulem tripplite_usb[1700]: libusb_get_interrupt() returned -19 instead of 8 while sending 3a 4c b3 0d 00 00 00 00 '.L......'
Mar 21 09:19:06 gnulem tripplite_usb[1700]: libusb_get_interrupt() returned -19 instead of 8 while sending 3a 4c b3 0d 00 00 00 00 '.L......'
Mar 21 09:19:06 gnulem tripplite_usb[1700]: libusb_get_interrupt() returned -19 instead of 8 while sending 3a 4c b3 0d 00 00 00 00 '.L......'
Mar 21 09:19:06 gnulem tripplite_usb[1700]: libusb_get_interrupt() returned -19 instead of 8 while sending 3a 4c b3 0d 00 00 00 00 '.L......'
Mar 21 09:19:06 gnulem tripplite_usb[1700]: libusb_get_interrupt() returned -19 instead of 8 while sending 3a 4c b3 0d 00 00 00 00 '.L......'
Mar 21 09:19:06 gnulem tripplite_usb[1700]: libusb_get_interrupt() returned -19 instead of 8 while sending 3a 4c b3 0d 00 00 00 00 '.L......'
Mar 21 09:19:06 gnulem tripplite_usb[1700]: libusb_get_interrupt() returned -19 instead of 8 while sending 3a 4c b3 0d 00 00 00 00 '.L......'
Mar 21 09:19:06 gnulem tripplite_usb[1700]: libusb_set_report() returned -19 instead of 8
Mar 21 09:19:06 gnulem upsd[1703]: Data for UPS [tripplite] is stale - check driver
Mar 21 09:19:06 gnulem tripplite_usb[1700]: Error reading L value: Device detached? (error -19: error sending control message: No such device)
Mar 21 09:19:06 gnulem tripplite_usb[1700]: Reconnect attempt #1
Mar 21 09:19:06 gnulem upsmon[1707]: Poll UPS [tripplite@localhost] failed - Data stale
Mar 21 09:19:06 gnulem upsmon[1707]: Communications with UPS tripplite@localhost lost
Mar 21 09:19:06 gnulem wall[30296]: wall: user nut broadcasted 1 lines (50 chars)
Mar 21 09:19:11 gnulem upsmon[1707]: Poll UPS [tripplite@localhost] failed - Data stale
Mar 21 09:19:14 gnulem tripplite_usb[1700]: Reconnecting to UPS failed; will retry later...
Mar 21 09:19:14 gnulem tripplite_usb[1700]: libusb_set_report() returned 0 instead of 8
Mar 21 09:19:14 gnulem tripplite_usb[1700]: Error reading S value: Device detached? (error 0: error sending control message: Operation not permitted)
Mar 21 09:19:14 gnulem tripplite_usb[1700]: Reconnect attempt #2
Mar 21 09:19:16 gnulem upsmon[1707]: Poll UPS [tripplite@localhost] failed - Data stale
Mar 21 09:19:21 gnulem upsmon[1707]: Poll UPS [tripplite@localhost] failed - Data stale
Mar 21 09:19:22 gnulem tripplite_usb[1700]: Reconnecting to UPS failed; will retry later...
Mar 21 09:19:22 gnulem tripplite_usb[1700]: libusb_set_report() returned 0 instead of 8
Mar 21 09:19:22 gnulem tripplite_usb[1700]: Error reading S value: Device detached? (error 0: error sending control message: Operation not permitted)
Mar 21 09:19:22 gnulem tripplite_usb[1700]: Reconnect attempt #3
Mar 21 09:19:25 gnulem kernel: usb 3-1: new low speed USB device using ohci_hcd and address 5
Mar 21 09:19:25 gnulem kernel: usb 3-1: device descriptor read/64, error -62
Mar 21 09:19:26 gnulem kernel: usb 3-1: New USB device found, idVendor=09ae, idProduct=0001
Mar 21 09:19:26 gnulem kernel: usb 3-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
Mar 21 09:19:26 gnulem kernel: usb 3-1: Product: TRIPP LITE OMNIVS1000         
Mar 21 09:19:26 gnulem kernel: usb 3-1: Manufacturer: TRIPP LITE
Mar 21 09:19:26 gnulem kernel: usb 3-1: configuration #1 chosen from 1 choice
Mar 21 09:19:26 gnulem upsmon[1707]: Poll UPS [tripplite@localhost] failed - Data stale
Mar 21 09:19:26 gnulem kernel: generic-usb 0003:09AE:0001.0005: hiddev96,hidraw0: USB HID v1.00 Device [TRIPP LITE TRIPP LITE OMNIVS1000         ] on usb-0000:00:12.0-1/input0
Mar 21 09:19:30 gnulem tripplite_usb[1700]: Reconnecting to UPS failed; will retry later...
Mar 21 09:19:30 gnulem tripplite_usb[1700]: libusb_set_report() returned 0 instead of 8
Mar 21 09:19:30 gnulem tripplite_usb[1700]: Error reading S value: Device detached? (error 0: error sending control message: Operation not permitted)
Mar 21 09:19:30 gnulem tripplite_usb[1700]: Reconnect attempt #4
Mar 21 09:19:31 gnulem upsmon[1707]: Poll UPS [tripplite@localhost] failed - Data stale
Mar 21 09:19:33 gnulem kernel: usb 6-2: USB disconnect, address 4
Mar 21 09:19:36 gnulem upsmon[1707]: Poll UPS [tripplite@localhost] failed - Data stale
Mar 21 09:19:38 gnulem tripplite_usb[1700]: Successfully reconnected
Mar 21 09:19:39 gnulem tripplite_usb[1700]: Using OMNIVS protocol (1001)
Mar 21 09:19:41 gnulem upsmon[1707]: Poll UPS [tripplite@localhost] failed - Data stale
Mar 21 09:19:41 gnulem kernel: usb 6-1: new full speed USB device using ohci_hcd and address 5
Mar 21 09:19:42 gnulem kernel: usb 6-1: New USB device found, idVendor=1397, idProduct=00bf
Mar 21 09:19:42 gnulem kernel: usb 6-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
Mar 21 09:19:42 gnulem kernel: usb 6-1: Product: BCD3000
Mar 21 09:19:42 gnulem kernel: usb 6-1: Manufacturer: Behringer
Mar 21 09:19:42 gnulem kernel: usb 6-1: configuration #1 chosen from 1 choice
Mar 21 09:19:45 gnulem upsd[1703]: UPS [tripplite] data is no longer stale
Mar 21 09:19:46 gnulem upsmon[1707]: Communications with UPS tripplite@localhost established
Mar 21 09:19:46 gnulem wall[30335]: wall: user nut broadcasted 1 lines (57 chars)
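The repeated tripplite_usb errors above show the 8-byte command the driver was trying to send when the UPS dropped off the bus: 3a 4c b3 0d 00 00 00 00. The return value -19 is -ENODEV on Linux, which matches the "No such device" text in the later message. The C sketch below rebuilds that command frame, assuming a layout of ':' + command byte(s) + checksum + carriage return, zero-padded to 8 bytes. The checksum rule (255 minus the sum of the command bytes, modulo 256) is an assumption inferred from this single logged sample (255 - 0x4c = 0xb3), and build_frame and print_frame are hypothetical names, not the driver's actual functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define FRAME_LEN 8                 /* the logged commands are 8 bytes long */
#define MAX_MSG_LEN (FRAME_LEN - 3) /* room left after ':', checksum and CR */

/* Hypothetical helper: build a command frame in the layout seen in the log:
 * ':' + command bytes + checksum + '\r', zero-padded to FRAME_LEN.
 * ASSUMPTION: checksum = 255 - (sum of command bytes mod 256), inferred
 * from the single logged sample "3a 4c b3 0d" (255 - 0x4c == 0xb3). */
static void build_frame(const unsigned char *msg, size_t msg_len,
                        unsigned char out[FRAME_LEN])
{
    unsigned sum = 0;

    assert(msg_len <= MAX_MSG_LEN);
    memset(out, 0, FRAME_LEN);      /* zero padding after the CR */
    out[0] = ':';
    for (size_t i = 0; i < msg_len; i++) {
        out[i + 1] = msg[i];
        sum += msg[i];
    }
    out[msg_len + 1] = (unsigned char)(255 - (sum & 0xff));
    out[msg_len + 2] = '\r';
}

/* Print a frame in the same "xx xx ..." hex style as the driver's log. */
static void print_frame(const unsigned char frame[FRAME_LEN])
{
    for (size_t i = 0; i < FRAME_LEN; i++)
        printf("%02x%s", frame[i], i + 1 < FRAME_LEN ? " " : "\n");
}
```

For example, building the frame for the command "L" and passing it to print_frame prints 3a 4c b3 0d 00 00 00 00, the same bytes as in the log.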

Comment 1 Niels Mayer 2010-03-23 17:24:25 UTC
Testing fixes in http://admin.fedoraproject.org/updates/nut-2.4.3-2.fc12 to
see if fix "reduced size of buffer to maximum size supported by
low-speed USB devices, which fixes some recent usbhid-ups problems" -- also
relates to this bug. see also https://bugzilla.redhat.com/show_bug.cgi?id=575334
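The changelog entry quoted in Comment 1 describes the fix in a single line. As a rough illustration of the idea, and not the actual nut patch: the USB 2.0 specification limits low-speed devices to 8-byte packets on the default control endpoint and on interrupt endpoints, and the OMNIVS1000 enumerates as a low-speed device in the log above ("new low speed USB device"). The hypothetical helper report_buffer_len below caps the buffer used for a single-packet report transfer at the endpoint's wMaxPacketSize, and at 8 bytes for any low-speed device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* USB 2.0: low-speed devices carry at most 8 bytes per packet on the
 * default control endpoint and on interrupt endpoints. */
#define LOW_SPEED_MAX_PACKET 8u

/* Hypothetical helper, not nut's actual code: choose the buffer length for a
 * single-packet report transfer. ep_max_packet is the endpoint descriptor's
 * wMaxPacketSize; for a low-speed device the spec limit of 8 also applies,
 * even if the descriptor claims more. */
static size_t report_buffer_len(size_t requested, uint16_t ep_max_packet,
                                int is_low_speed)
{
    size_t limit = ep_max_packet;

    assert(ep_max_packet > 0); /* a zero limit would make every transfer empty */
    if (is_low_speed && limit > LOW_SPEED_MAX_PACKET)
        limit = LOW_SPEED_MAX_PACKET;
    return requested < limit ? requested : limit;
}
```

Capping on the host side means a misbehaving descriptor cannot make the driver ask a low-speed device for more data per packet than the bus allows.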

Comment 2 Michal Hlavinka 2010-03-24 07:35:46 UTC
Thanks. I guess the other bug is not related, but let me know the result of testing.

Comment 3 Michal Hlavinka 2010-04-08 15:06:35 UTC
What's the result of testing?

Comment 4 Niels Mayer 2010-04-10 00:07:46 UTC
After running for a while without incident (2.6.32.10-90.fc12.x86_64 with nut-2.4.3-2.fc12.x86_64 nut-client-2.4.3-2.fc12.x86_64), I just ran a successful "pull the plug" test with the UPS powering the system:
> Apr  9 16:19:03 gnulem upsmon[1734]: UPS tripplite@localhost on battery
> Apr  9 16:19:03 gnulem wall[2455]: wall: user nut broadcasted 1 lines (36 chars)
> ...
> Apr  9 16:27:23 gnulem upsmon[1734]: UPS tripplite@localhost on line power
> Apr  9 16:27:23 gnulem wall[2843]: wall: user nut broadcasted 1 lines (39 chars)

I have yet to experience a lockup as described in my initial report. Since I installed the update on 3/23, there have also been several small power glitches, none of which triggered the bug:
> messages-20100328:9370:Mar 26 08:30:56 gnulem upsmon[1729]: UPS tripplite@localhost on battery
> messages-20100328:9372:Mar 26 08:31:06 gnulem upsmon[1729]: UPS tripplite@localhost on line power
> messages-20100404:1525:Mar 29 23:21:13 gnulem upsmon[1729]: UPS tripplite@localhost on battery
> messages-20100404:1527:Mar 29 23:21:18 gnulem upsmon[1729]: UPS tripplite@localhost on line power

IMHO, the same issue that caused a problem in the previous version of nut (https://bugzilla.redhat.com/show_bug.cgi?id=575334) was also behind the problem I saw with the USB UPS device somehow causing the USB bus and X server to lock up after a power event, making keyboard/mouse input impossible.

The fix ("reduced size of buffer to maximum size supported by low-speed USB devices, which fixes some recent usbhid-ups problems") perhaps fixes the problem here too. (Then again, I also moved the USB device that went on/offline during the power failure onto the UPS, so there is less traffic on the USB bus during a power failure anyway...)

I'm leaving the status of this "needinfo" request as "NEW", as I don't want to accidentally close bugs for you again. For me, though, this issue is resolved with the 2.4.3-2.fc12 version of 'nut'. However, since F12 hasn't been updated with the new version of 'nut', the issue still exists in F12's latest: 2.4.3-1.fc12.

I hope my accidental closing of https://bugzilla.redhat.com/show_bug.cgi?id=575334 isn't the reason why F11's nut has the fix and F12's doesn't....

Comment 5 Michal Hlavinka 2010-04-12 13:56:46 UTC
Thanks. I'll leave this bug open for another week, and if there are no other reporters, I'll close it.

> I hope my accidental closing of
> bug #575334 isn't the reason why F11's nut has the fix
> and F12's doesn't....

Maybe, but it does not matter. The update system should handle it anyway, and if there is any problem with it, then the update system is buggy, which is definitely not your fault ;-)

I can see nut 2.4.3-3 in the updates repo, so it's OK now. If you don't see it, the mirror you are using is out of sync, which should "fix" itself within a few hours.

Comment 6 Michal Hlavinka 2010-06-04 07:42:56 UTC
> Thanks I'll leave this bug open for another week and when there are no other
> reporters, I'll close it.

closing

Comment 7 Niels Mayer 2010-06-04 14:49:18 UTC
I haven't seen this problem recur since the updates, so I agree this bug is closed.

