Bug 713829 - Apcupsd 3.14.8 reports wrongly very often "Communications with UPS lost"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora EPEL
Classification: Fedora
Component: apcupsd
Version: el5
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Michal Hlavinka
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-06-16 15:24 UTC by Robert Scheck
Modified: 2013-10-08 10:03 UTC (History)

Fixed In Version: apcupsd-3.14.10-1.el5
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-06-15 02:12:26 UTC
Type: ---
Embargoed:


Description Robert Scheck 2011-06-16 15:24:54 UTC
Description of problem:
Since apcupsd 3.14.8, we very often receive e-mails (up to a couple per hour)
from apcupsd containing the message "Communications with UPS lost", which is
not true. A few seconds up to a few minutes later, we receive "Communications
with UPS restored" again. In the end, the broken detection causes trouble and
an e-mail flood.
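
To put numbers on the flood, the lost/restored pairs can be pulled out of syslog. The following is only a minimal Python sketch: it assumes syslog lines shaped like `May 23 08:29:15 host apcupsd[2831]: Communications with UPS restored.` (with a matching `... lost.` line), and the function name `outage_durations` and the default year are illustrative assumptions, not part of apcupsd.

```python
import re
from datetime import datetime

# Assumed syslog layout:
#   "May 23 08:29:15 host apcupsd[2831]: Communications with UPS lost."
EVENT_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+apcupsd\[\d+\]:\s+"
    r"Communications with UPS (lost|restored)"
)

def outage_durations(lines, year=2012):
    """Return the seconds between each 'lost' and the next 'restored' message."""
    durations = []
    lost_at = None
    for line in lines:
        m = EVENT_RE.match(line)
        if not m:
            continue
        # Syslog timestamps carry no year, so one is assumed here.
        stamp = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        if m.group(2) == "lost":
            lost_at = stamp
        elif lost_at is not None:
            durations.append((stamp - lost_at).total_seconds())
            lost_at = None
    return durations
```

Feeding it the lines of the syslog file on the reporting machine gives one duration per false alarm, which makes "a few seconds up to a few minutes" measurable.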

Our situations are as follows:

 a) [Server B] - Network - [Server A] - Network - [APC UPS]
 b) [Server B] - Network - [Server A] - USB - [APC UPS]

Server A and B are both running either RHEL 5 or CentOS 5, including apcupsd
from EPEL. The UPS is always directly attached to Server A, via USB or network.
Server B always fetches the UPS status from Server A. Server B then reports
"Communications with UPS lost" via e-mail (e-mail notification is configured),
possibly under network load, possibly at random (this is a guess).
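
The status fetch from Server A can also be done by hand from Server B, to see whether the network path itself is flaky. The sketch below is an assumption-laden illustration, not apcupsd code: it assumes the apcupsd network information server (NIS) on Server A listens on the default port 3551 and frames every message with a 2-byte big-endian length, with a zero-length record ending the reply. The function names `parse_nis_records` and `fetch_status` are made up for this example.

```python
import socket
import struct

def parse_nis_records(payload):
    """Split a length-prefixed NIS reply into a dict of 'KEY : value' fields."""
    fields = {}
    offset = 0
    while offset + 2 <= len(payload):
        (length,) = struct.unpack_from(">H", payload, offset)
        offset += 2
        if length == 0:  # a zero-length record marks the end of the reply
            break
        record = payload[offset:offset + length].decode("ascii", "replace")
        offset += length
        # Split on the first colon only, so values such as dates stay intact.
        key, sep, value = record.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields

def fetch_status(host, port=3551, timeout=5.0):
    """Ask an apcupsd NIS server for its status and return the parsed fields."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # Request: 2-byte length (6) followed by the command "status".
        sock.sendall(struct.pack(">H", 6) + b"status")
        reply = b""
        while not reply.endswith(b"\x00\x00"):
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            reply += data
    return parse_nis_records(reply)
```

Calling `fetch_status` with the hostname of Server A in a loop and logging timeouts would show whether the lost reports coincide with real fetch failures.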

This issue did not exist with apcupsd-3.14.0-4.el5; downgrading from the newer
to the older version also makes the problem go away. The problem is also
independent of the type of the UPS.

Version-Release number of selected component (if applicable):
apcupsd-3.14.8-1.el5
apcupsd-3.14.0-4.el5

How reproducible:
Every time, see above.

Actual results:
Apcupsd 3.14.8 very often wrongly reports "Communications with UPS lost".

Expected results:
Non-broken behaviour/detection as in apcupsd 3.14.0.

Additional info:
- With the update from 3.14.0 to 3.14.8 POLLTIME was replaced by NETTIME
- Very similar issue, maybe even the same: http://sourceforge.net/mailarchive/forum.php?thread_name=4D0F5D4D.1060908%40cern.ch&forum_name=apcupsd-users

Comment 1 Robert Scheck 2011-06-16 15:33:43 UTC
Sorry, wrong ordering, I think: NETTIME was replaced by POLLTIME

Comment 2 Robert Scheck 2011-07-10 11:16:09 UTC
Ping?

Comment 3 Michal Hlavinka 2011-07-12 16:30:26 UTC
I know about this, but I'm working on different tasks right now. I hope I'll get to this soon.

> - With the update from 3.14.0 to 3.14.8 POLLTIME was replaced by NETTIME

yes, but afaik it *should* work with the old one too

> - Very similar issue, maybe even the same:
> http://sourceforge.net/mailarchive/forum.php?thread_name=4D0F5D4D.1060908%40cern.ch&forum_name=apcupsd-users

they say 3.14.7 works fine and 3.14.8 is the first broken version. Could you verify that? The 3.14.7 build for el5 is here:
http://kojipkgs.fedoraproject.org/scratch/mhlavink/task_3194147/

I don't see any response from upstream and I was not able to find anything in the CVS (though I was not looking very carefully). Could you check if there is any change with this build:

http://kojipkgs.fedoraproject.org/scratch/mhlavink/task_3194161/

Thanks

Comment 4 Robert Scheck 2011-11-16 15:50:02 UTC
Sorry, I forgot to give feedback here. The issue did not show up with 3.14.7,
but only with 3.14.8.

Comment 5 Robert Scheck 2011-11-16 15:54:59 UTC
We also see http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=586417 with 3.14.8
on EPEL 6 on another machine; we will now try an older version.

Comment 6 Robert Scheck 2012-02-01 12:54:50 UTC
Michal, from what we see on EL-5 and EL-6, any older version (< 3.14.8) works 
fine for us. Can we solve this issue somehow?

Comment 7 Michal Hlavinka 2012-04-03 15:53:35 UTC
Could you test if this http://kojipkgs.fedoraproject.org/scratch/mhlavink/task_3960447/ changes anything for you? If you still see the errors in logs on "Server B", are there any messages in logs on "Server A" ?

Comment 8 Robert Scheck 2012-04-16 14:45:52 UTC
Michal, can you please provide us with either the patch or the Source RPM at a
somewhat more persistent location than a scratch build? The files no longer
exist at the URL given above. Thank you :)

Comment 9 Michal Hlavinka 2012-05-04 14:49:14 UTC
I put those packages here: http://mhlavink.fedorapeople.org/bz713829/
Please test them. If you still see the errors in logs on "Server B", are there
any messages in logs on "Server A" ? Also does it happen on service start up / soon after start up or just randomly?

Comment 10 Robert Scheck 2012-05-04 15:05:04 UTC
Can you also please provide a Source RPM? We see this issue on RHEL 5 and
RHEL 6, and the best reproducer is a RHEL 5 machine at the moment.

Comment 11 Michal Hlavinka 2012-05-11 14:46:04 UTC
Packages added to that URL; they are nothing special.
Please test and provide any messages from the logs on "Server A".

Comment 12 Robert Scheck 2012-05-17 22:33:09 UTC
We have now been testing the apcupsd packages for RHEL 5 and 6 for about 7 days
at the affected customers, with no wrong reports so far (hope it stays like that).

Comment 13 Robert Scheck 2012-05-23 15:53:46 UTC
The issue that has been reported with the description in #c0 is resolved. So
can you please push that update to EL-5 and EL-6? :) Thank you very much!


Something which still remains, but hasn't really been part of this report is:

May 23 08:29:09 intranet kernel: usb 2-1.4: USB disconnect, address 5
May 23 08:29:09 intranet kernel: usb 2-1.4: new full speed USB device using ehci_hcd and address 7
May 23 08:29:09 intranet kernel: usb 2-1.4: New USB device found, idVendor=051d, idProduct=0003
May 23 08:29:09 intranet kernel: usb 2-1.4: New USB device strings: Mfr=1, Product=2, SerialNumber=3
May 23 08:29:09 intranet kernel: usb 2-1.4: Product: Smart-UPS 1000 FW:COM 02.1 / UPS 05.0
May 23 08:29:09 intranet kernel: usb 2-1.4: Manufacturer: American Power Conversion 
May 23 08:29:09 intranet kernel: usb 2-1.4: SerialNumber: AS1119112219 
May 23 08:29:09 intranet kernel: usb 2-1.4: configuration #1 chosen from 1 choice
May 23 08:29:09 intranet kernel: generic-usb 0003:051D:0003.0007: hiddev97,hidraw3: USB HID v1.00 Device [American Power Conversion  Smart-UPS 1000 FW:COM 02.1 / UPS 05.0] on usb-0000:00:1d.0-1.4/input0
May 23 08:29:15 intranet apcupsd[2831]: Communications with UPS restored.

Important: The USB cable was never removed and reattached. What you can see
above just happened by itself. This is also not a Server A and B situation,
and this issue has existed with every apcupsd version so far. The situation
above happens by itself more or less regularly, about every 2 weeks...
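
To check how regular these spontaneous re-enumerations are, the kernel's "USB disconnect" lines can be collected from syslog. This is a minimal Python sketch assuming the kernel log format shown above; the function name `usb_disconnects` is hypothetical.

```python
import re

# Assumed layout, taken from the log excerpt above:
#   "May 23 08:29:09 intranet kernel: usb 2-1.4: USB disconnect, address 5"
DISCONNECT_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(\S+)\s+kernel:\s+"
    r"usb\s+(\S+):\s+USB disconnect"
)

def usb_disconnects(lines):
    """Return (timestamp, host, usb_port) for every kernel 'USB disconnect' line."""
    events = []
    for line in lines:
        m = DISCONNECT_RE.match(line)
        if m:
            events.append(m.groups())
    return events
```

Listing the timestamps for the port the UPS is attached to (2-1.4 in the excerpt) shows whether the events really recur about every two weeks.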

Comment 14 Fedora Update System 2012-05-28 11:19:00 UTC
apcupsd-3.14.10-1.el6 has been submitted as an update for Fedora EPEL 6.
https://admin.fedoraproject.org/updates/apcupsd-3.14.10-1.el6

Comment 15 Fedora Update System 2012-05-28 11:19:14 UTC
apcupsd-3.14.10-1.el5 has been submitted as an update for Fedora EPEL 5.
https://admin.fedoraproject.org/updates/apcupsd-3.14.10-1.el5

Comment 16 Fedora Update System 2012-05-28 17:59:22 UTC
Package apcupsd-3.14.10-1.el5:
* should fix your issue,
* was pushed to the Fedora EPEL 5 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=epel-testing apcupsd-3.14.10-1.el5'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-EPEL-2012-5991/apcupsd-3.14.10-1.el5
then log in and leave karma (feedback).

Comment 17 Fedora Update System 2012-06-15 02:12:26 UTC
apcupsd-3.14.10-1.el5 has been pushed to the Fedora EPEL 5 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 18 Fedora Update System 2012-06-15 02:20:57 UTC
apcupsd-3.14.10-1.el6 has been pushed to the Fedora EPEL 6 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 19 Ferry Huberts 2013-04-22 21:27:58 UTC
I'm experiencing this on RHEL6.4/CentOS6.4 with apcupsd 3.14.10-1.el6

It reports on average once/twice a day, in the same setup.

Comment 20 Ferry Huberts 2013-04-22 21:28:46 UTC
please reopen on RHEL6

Comment 21 Robert Scheck 2013-04-23 13:20:04 UTC
Ferry, can you please first cross-check whether the problem might be your
cabling? Since 3.14.10-1, we only see this issue at one of our customers, where
the USB cables are much too close to the power cables. That would then be
electrical interference. Have a close look at dmesg(1) and search for the USB
disconnect messages on Google (just to have this pointed out).

Comment 22 Ferry Huberts 2013-04-23 13:34:46 UTC
that's a good tip.
It'll take me a while to check because this happens at a customer's site and I'll not be visiting it for at least 2 weeks

Comment 23 Ferry Huberts 2013-04-24 13:10:31 UTC
I'd like to add something I noticed:

I always get these errors from the VMs that connect over the network to the host apcupsd, never from the host itself.

Therefore I'm proposing that apcupsd on the host (and the cable) are OK and that the problem is in the network code of apcupsd

Comment 24 Paul van Noort 2013-10-08 10:03:34 UTC
I can confirm comment #c23.

I can also confirm that the bug still exists in the current release: apcupsd-3.14.10-1.el5


Setup:

Server A (VM) has USB connection. There is no evidence in the logs for USB reconnects. apcupsd returns no errors

Server B (VM) connects to Server A, reports errors in syslog.
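
The reasoning here (errors on Server B without any USB event on Server A) can be checked mechanically once timestamps from both machines are available. The following is a hedged Python sketch: the function name `unexplained_losses` and the 60-second window are assumptions chosen for illustration, and both clocks are assumed to be synchronised.

```python
from datetime import datetime, timedelta

def unexplained_losses(lost_on_b, usb_events_on_a, window=timedelta(seconds=60)):
    """Return 'lost' times on Server B with no USB event on Server A nearby."""
    return [
        lost for lost in lost_on_b
        if not any(abs(lost - usb) <= window for usb in usb_events_on_a)
    ]

# Example with made-up timestamps: one loss coincides with a USB event, one does not.
lost_b = [datetime(2013, 10, 8, 9, 0, 0), datetime(2013, 10, 8, 10, 0, 0)]
usb_a = [datetime(2013, 10, 8, 9, 0, 30)]
remaining = unexplained_losses(lost_b, usb_a)
```

If every loss on Server B ends up in the returned list, the USB link is unlikely to be the cause and the network path between the two apcupsd instances is the more plausible suspect.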

