Bug 1003919 - nut startup at boot fails to detect UPS
Summary: nut startup at boot fails to detect UPS
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: nut
Version: 19
Hardware: All
OS: Linux
unspecified
high
Target Milestone: ---
Assignee: Michal Hlavinka
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2013-09-03 14:08 UTC by Helge Deller
Modified: 2015-02-17 17:02 UTC (History)
1 user (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-02-17 17:02:42 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
SVG of "systemd-analyze plot" output (66.53 KB, image/svg+xml)
2013-09-06 21:54 UTC, Helge Deller
no flags Details
dmesg of successful boot (76.81 KB, text/plain)
2013-09-11 19:17 UTC, Helge Deller
no flags Details
svg of successful boot (50.19 KB, image/svg+xml)
2013-09-11 19:17 UTC, Helge Deller
no flags Details

Description Helge Deller 2013-09-03 14:08:20 UTC
Description of problem:
If the nut service is configured to startup at boot via systemd, it may happen, that nut is started and try to run the nut device drivers before the UPS device nodes (in /dev/...) are available (which are being made available through udev).
Because the devices are not available then, the nut service fails.

This problem (race) can be fixed it by adding in the systemd startup file, that the nut service is dependend on a finished run of udev.
Thus, adding "systemd-udev-settle.service" to the "After=" line in /usr/lib/systemd/system/nut-driver.service fixes the problem.

Change required to /usr/lib/systemd/system/nut-driver.service:
After=local-fs.target network.target systemd-udev-settle.service

With this trivial change my "Eaton 3S" USB USV finally works directly at startup.

It would be nice, if you could add this (or a similar) dependency to the nut systemd configuration file. I think it may help quite some people.

Thanks,
Helge

Version-Release number of selected component (if applicable):
Fedora 19

Comment 1 Michal Hlavinka 2013-09-04 15:08:22 UTC
could you try to reproduce the problem again without udev-settle in service file,
then run 
$ systemd-analyze plot >/tmp/boot.svg 
and attach that file
thanks

Comment 2 Helge Deller 2013-09-06 21:54:53 UTC
Created attachment 794979 [details]
SVG of "systemd-analyze plot" output

Comment 3 Helge Deller 2013-09-06 21:55:41 UTC
boot.svg is attached as requested.

Interestingly, if I disable Networkmanager and use instead the "LSB network" ("chkconfig network on" where I used DHCP for the NIC) the problem is gone.
nut-driver depends on the network.target, so the call for an IP address is enough time to settle the USB bus.

Comment 4 Michal Hlavinka 2013-09-09 16:27:37 UTC
- the udev settle is there, starts at 2.28, completes at 2.42
- NetworkManager starts at 2.96,completes at 3.2
- network.target is started at 3.2
- nut-driver is started at 3.2

nut-driver already has after=network.target, so udev settle should have no effect here

There must be some different problem.

What configuration you have? 
Run following command and attach it's output here (replace your passwords with some different text). 
grep -rv '^#' /etc/ups/ | grep -v ':$' | grep -v '.html:'

What error do you get in logs exactly?
Change /usr/lib/systemd/system/nut-driver.service
line
ExecStart=/sbin/upsdrvctl start
to
ExecStart=/sbin/<DRIVER> -DD

where <DRIVER> = driver from ups.conf, most probably usbhid-ups
reboot and check the logs.

Thanks

Comment 5 Helge Deller 2013-09-11 19:13:26 UTC
Hi Michal,
this machine is used quite productive (multiseat with 2 Dual-DVI-cards, 2 keyboards, 2 mice, 10 USB ports and various HUBS and 3 USB smartcards), so I can't easily change the settings and reboot because I can't nearly find downtime.

In the meantime, maybe the current settings/config can help to find the cause?
I'll attach the dmesg, ups config and systemd-analyze-plot svg file. I should mention, that in this case here nut-driver.service succeeds (since it is started much later than in the other case).

Interesting here in this log is, that you will find systemd to have executed udev-settle as well here early. network initialization takes long (9 seconds).
And, in the dmesg:
[    3.846158] usb 2-1.7: New USB device found, idVendor=0463, idProduct=ffff
[    3.846161] usb 2-1.7: New USB device strings: Mfr=1, Product=2, SerialNumber=4
[    3.846162] usb 2-1.7: Product: Eaton 3S
...
[    7.179346] hid-generic 0003:0463:FFFF.0001: hiddev0,hidraw0: USB HID v10.10 Device [EATON Eaton 3S] on usb-0000:00:1d.0-1.7/input0
(my assumption is, that even in my original config the detection of the "new USB device" and the generation of the hiddev0 device took equally long)

My feeling is still, that this time the start of usbhid-ups (nut-driver.service) suceeded because the device node (/dev/bus/usb/002/004) was now accessible when it was started.

> What error do you get in logs exactly?

I can't test right now, but last time I did an strace and IIRC, it was that usbhid-ups could not open the /dev-node file (ENODEV or EEXIST or similiar).
That's why I opened this ticket.

Comment 6 Helge Deller 2013-09-11 19:17:19 UTC
Created attachment 796497 [details]
dmesg of successful boot

contents of /etc/ups* files:
----------------------------
/etc/ups/upsd.users:[upsmon]
/etc/ups/upsd.users:            password  = somepass
/etc/ups/upsd.users:            upsmon master
/etc/ups/ups.conf:[eaton3s]
/etc/ups/ups.conf:      driver = "usbhid-ups"
/etc/ups/ups.conf:      port = "auto"
/etc/ups/ups.conf:      product = "Eaton 3S"
/etc/ups/nut.conf:MODE=standalone
/etc/ups/upsmon.conf:MONITOR eaton3s@localhost 1 monuser somepass master
/etc/ups/upsmon.conf:MINSUPPLIES 1
/etc/ups/upsmon.conf:SHUTDOWNCMD "/sbin/shutdown -h +0"
/etc/ups/upsmon.conf:POLLFREQ 10
/etc/ups/upsmon.conf:POLLFREQALERT 5
/etc/ups/upsmon.conf:HOSTSYNC 15
/etc/ups/upsmon.conf:DEADTIME 15
/etc/ups/upsmon.conf:POWERDOWNFLAG /etc/killpower
/etc/ups/upsmon.conf:NOTIFYFLAG ONLINE  SYSLOG+WALL+EXEC
/etc/ups/upsmon.conf:NOTIFYFLAG ONBATT  SYSLOG+WALL+EXEC
/etc/ups/upsmon.conf:NOTIFYFLAG LOWBATT SYSLOG+WALL
/etc/ups/upsmon.conf:RBWARNTIME 43200
/etc/ups/upsmon.conf:NOCOMMWARNTIME 300
/etc/ups/upsmon.conf:FINALDELAY 5

Comment 7 Helge Deller 2013-09-11 19:17:57 UTC
Created attachment 796498 [details]
svg of successful boot

Comment 8 Fedora End Of Life 2015-01-09 19:42:31 UTC
This message is a notice that Fedora 19 is now at end of life. Fedora 
has stopped maintaining and issuing updates for Fedora 19. It is 
Fedora's policy to close all bug reports from releases that are no 
longer maintained. Approximately 4 (four) weeks from now this bug will
be closed as EOL if it remains open with a Fedora 'version' of '19'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 19 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 9 Fedora End Of Life 2015-02-17 17:02:42 UTC
Fedora 19 changed to end-of-life (EOL) status on 2015-01-06. Fedora 19 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.


Note You need to log in before you can comment on or make changes to this bug.