Description of problem:
Since apcupsd 3.14.8, we very often get e-mails from apcupsd (up to a couple per hour) saying "Communications with UPS lost", which is not true. A few seconds to a few minutes later we receive "Communications with UPS restored" again. This wrong/broken detection causes trouble and floods us with e-mail.

Our setups are as follows:
a) [Server B] - Network - [Server A] - Network - [APC UPS]
b) [Server B] - Network - [Server A] - USB - [APC UPS]

Servers A and B both run either RHEL 5 or CentOS 5, with apcupsd from EPEL. The UPS is always attached directly to Server A via USB or network. Server B always fetches the UPS status from Server A. Server B then reports "Communications with UPS lost" via e-mail (e-mail notification was configured), apparently under network load or simply at random.

This issue did not exist with apcupsd-3.14.0-4.el5, and downgrading from the newer to the older version also makes the problem go away. The problem is also independent of the UPS type.

Version-Release number of selected component (if applicable):
apcupsd-3.14.8-1.el5
apcupsd-3.14.0-4.el5

How reproducible:
Every time, see above.

Actual results:
apcupsd 3.14.8 very often wrongly reports "Communications with UPS lost".

Expected results:
Correct detection, as in apcupsd 3.14.0.

Additional info:
- With the update from 3.14.0 to 3.14.8 POLLTIME was replaced by NETTIME
- Very similar issue, maybe even the same: http://sourceforge.net/mailarchive/forum.php?thread_name=4D0F5D4D.1060908%40cern.ch&forum_name=apcupsd-users
Sorry, I think I got the order wrong: NETTIME was replaced by POLLTIME.
Ping?
I know about this, but I'm working on other tasks right now. I hope I'll get to this soon.

> - With the update from 3.14.0 to 3.14.8 POLLTIME was replaced by NETTIME

Yes, but AFAIK the old one *should* still work too.

> - Very similar issue, maybe even the same:
> http://sourceforge.net/mailarchive/forum.php?thread_name=4D0F5D4D.1060908%40cern.ch&forum_name=apcupsd-users

They say 3.14.7 works fine and 3.14.8 is the first broken version. Could you verify that? A 3.14.7 build for el5 is here: http://kojipkgs.fedoraproject.org/scratch/mhlavink/task_3194147/

I don't see any response from upstream, and I was not able to find anything in CVS (though I was not looking very carefully). Could you check whether anything changes with this build: http://kojipkgs.fedoraproject.org/scratch/mhlavink/task_3194161/

Thanks
Sorry, I forgot to give feedback here. The issue did not show up with 3.14.7, only with 3.14.8.
We also see http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=586417 with 3.14.8 from EPEL 6 on another machine. We will now try an older version.
Michal, from what we see on EL-5 and EL-6, any older version (< 3.14.8) works fine for us. Can we solve this issue somehow?
Could you test whether this build http://kojipkgs.fedoraproject.org/scratch/mhlavink/task_3960447/ changes anything for you? If you still see the errors in the logs on "Server B", are there any messages in the logs on "Server A"?
Michal, can you please provide either the patch or the source RPM in a somewhat more persistent location than a scratch build? The files no longer exist at the URL given above. Thank you :)
I put those packages here: http://mhlavink.fedorapeople.org/bz713829/ Please test them. If you still see the errors in the logs on "Server B", are there any messages in the logs on "Server A"? Also, does it happen at service startup, soon after startup, or just randomly?
Can you also please provide a source RPM? We see this issue on both RHEL 5 and RHEL 6, and our best reproducer at the moment is a RHEL 5 machine.
Packages added to that URL; they are nothing special. Please test them and provide any messages from the logs on "Server A".
We have now been testing the apcupsd packages for RHEL 5 and 6 at the affected customers for about 7 days, with no false reports so far (hopefully it stays that way).
The issue reported in the description in #c0 is resolved. So can you please push that update to EL-5 and EL-6? :) Thank you very much!

Something that still remains, although it hasn't really been part of this report, is the following:

May 23 08:29:09 intranet kernel: usb 2-1.4: USB disconnect, address 5
May 23 08:29:09 intranet kernel: usb 2-1.4: new full speed USB device using ehci_hcd and address 7
May 23 08:29:09 intranet kernel: usb 2-1.4: New USB device found, idVendor=051d, idProduct=0003
May 23 08:29:09 intranet kernel: usb 2-1.4: New USB device strings: Mfr=1, Product=2, SerialNumber=3
May 23 08:29:09 intranet kernel: usb 2-1.4: Product: Smart-UPS 1000 FW:COM 02.1 / UPS 05.0
May 23 08:29:09 intranet kernel: usb 2-1.4: Manufacturer: American Power Conversion
May 23 08:29:09 intranet kernel: usb 2-1.4: SerialNumber: AS1119112219
May 23 08:29:09 intranet kernel: usb 2-1.4: configuration #1 chosen from 1 choice
May 23 08:29:09 intranet kernel: generic-usb 0003:051D:0003.0007: hiddev97,hidraw3: USB HID v1.00 Device [American Power Conversion Smart-UPS 1000 FW:COM 02.1 / UPS 05.0] on usb-0000:00:1d.0-1.4/input0
May 23 08:29:15 intranet apcupsd[2831]: Communications with UPS restored.

Important: the USB cable was never removed and reattached. What you see above just happened by itself. This case also does not involve the Server A / Server B setup, and this issue has existed with every apcupsd version so far. The situation above recurs by itself, more or less regularly, about every 2 weeks...
apcupsd-3.14.10-1.el6 has been submitted as an update for Fedora EPEL 6. https://admin.fedoraproject.org/updates/apcupsd-3.14.10-1.el6
apcupsd-3.14.10-1.el5 has been submitted as an update for Fedora EPEL 5. https://admin.fedoraproject.org/updates/apcupsd-3.14.10-1.el5
Package apcupsd-3.14.10-1.el5:
* should fix your issue,
* was pushed to the Fedora EPEL 5 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=epel-testing apcupsd-3.14.10-1.el5'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-EPEL-2012-5991/apcupsd-3.14.10-1.el5
then log in and leave karma (feedback).
apcupsd-3.14.10-1.el5 has been pushed to the Fedora EPEL 5 stable repository. If problems still persist, please make note of it in this bug report.
apcupsd-3.14.10-1.el6 has been pushed to the Fedora EPEL 6 stable repository. If problems still persist, please make note of it in this bug report.
I'm experiencing this on RHEL 6.4/CentOS 6.4 with apcupsd 3.14.10-1.el6. It reports the error once or twice a day on average, in the same setup.
Please reopen for RHEL 6.
Ferry, can you please first check whether the problem might be your cabling? Right now (since 3.14.10-1) we see this issue at only one of our customers, where the USB cables are much too close to the power cables, which would mean electrical interference. Take a close look at dmesg(1) and search Google for the USB disconnect messages (just to point this out).
That's a good tip. It'll take me a while to check, because this happens at a customer's site and I won't be visiting it for at least 2 weeks.
I'd like to add something I noticed: I always get these errors from the VMs that connect over the network to the host's apcupsd, never from the host itself. I therefore suspect that apcupsd itself (and the cable) are OK and that the problem lies in apcupsd's network code.
I can confirm comment #c23, and I can confirm that the bug still exists in the current release, apcupsd-3.14.10-1.el5.
Setup:
Server A (VM) has the USB connection. There is no evidence of USB reconnects in the logs, and apcupsd reports no errors.
Server B (VM) connects to Server A and reports errors in syslog.