Bug 1330694
Summary: | NM does not run pre-down scripts on suspend/sleep/hibernate | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Orion Poplawski <orion> |
Component: | NetworkManager | Assignee: | Beniamino Galvani <bgalvani> |
Status: | CLOSED ERRATA | QA Contact: | Desktop QE <desktop-qa-list> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 7.2 | CC: | aloughla, atragler, bgalvani, fgiudici, lrintel, rkhan, thaller, vbenes |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | NetworkManager-1.4.0-0.1.git20160606.b769b4df.el7 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2016-11-03 19:09:06 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Orion Poplawski
2016-04-26 17:41:25 UTC
Pushed branch bg/disconnect-on-suspend-rh1330694. >> device: add nm_device_queue_unmanaged_by_flags() I don't like the name too much - nm_device_set_unmanaged_flags() sets the flags (only) - nm_device_set_unmanaged_by_*() set the flag and change the device state as needed. - nm_device_queue_unmanaged_by_flags(): sets the flags and queues a device state (as needed). What they all do in the first place is setting the unmanaged flags. Can we name them all with a nm_device_set_unmanaged_* prefix? Maybe nm_device_set_unmanaged_by_flags_queue()? >> sleep-monitor: add functions for delaying the suspension You say: "If some action were not completed, callers must avoid using release_inhibitor() after the wake-up.", which is a very good advice, but it's unclear how the caller would know when he should avoid release_inhibitor() :) I think nm_sleep_monitor_keep_inhibitor() should return an opaque handle that must be released exactly once. An internal sleep_signal() might invalidate the current release handles, but when the caller does release_inhibitory() on a stale handle, it gets released without actually touching the fd or inhibitor state. How about the fixup commit? >> manager: disconnect devices before unmanaging them lgtm (In reply to Thomas Haller from comment #3) > >> device: add nm_device_queue_unmanaged_by_flags() > > Maybe nm_device_set_unmanaged_by_flags_queue()? Sounds good. > >> sleep-monitor: add functions for delaying the suspension > > You say: "If some action were not completed, callers must avoid using > release_inhibitor() after the wake-up.", which is a very good advice, but > it's unclear how the caller would know when he should avoid > release_inhibitor() :) The caller must subscribe to the SLEEPING signal since keep_inhibitor() can only be called in the handler for the signal. Any use of the function outside the handler is subject to races. So, since the caller is subscribed to the signal, it can certainly know when the wake-up happens :) Even if release_inhibitor() is called after the wake-up but before the SIGNAL (sleep=false) is received, that is harmless. What is important is that the function is not called after the signal, because it would decrease the counter after it has been reset. > I think nm_sleep_monitor_keep_inhibitor() should return an opaque handle > that must be released exactly once. An internal sleep_signal() might > invalidate the current release handles, but when the caller does > release_inhibitory() on a stale handle, it gets released without actually > touching the fd or inhibitor state. > > How about the fixup commit? What's the purpose of the stale_list? Can we have only active_list and ignore release() for non-active items? Also, is there any advantage of this over a simple counter? Because with the fixup callers have to store the handle and this would (only slightly) complicate the next patch. (In reply to Beniamino Galvani from comment #4) > (In reply to Thomas Haller from comment #3) > > >> device: add nm_device_queue_unmanaged_by_flags() > > > > Maybe nm_device_set_unmanaged_by_flags_queue()? > > Sounds good. > > > >> sleep-monitor: add functions for delaying the suspension > > > > You say: "If some action were not completed, callers must avoid using > > release_inhibitor() after the wake-up.", which is a very good advice, but > > it's unclear how the caller would know when he should avoid > > release_inhibitor() :) > > The caller must subscribe to the SLEEPING signal since > keep_inhibitor() can only be called in the handler for the signal. Any > use of the function outside the handler is subject to races. So, since > the caller is subscribed to the signal, it can certainly know when the > wake-up happens :) > > Even if release_inhibitor() is called after the wake-up but before the > SIGNAL (sleep=false) is received, that is harmless. What is important > is that the function is not called after the signal, because it would > decrease the counter after it has been reset. I don't know, I suspect not :) Note that not matching every keep() with a release() call will mess up the counting. For example, take_inhibitor() sets inhibit_count to zero, so, you must no longer call release() for the keep() calls or you hit the assertion g_return_if_fail (self->inhibit_count); In other words, after sleep-monitor calls take_inhibitor(), the caller must know not to return the handle anymore. - receive sleep_signal(is_about_to_suspend=TRUE) - emit SLEEPING - NMManager calls keep_inhibitor() shortly after: - receive sleep_signal(is_about_to_suspend=FALSE) - take_inhibitor() -- which resets inhibit_count=0 - emit SLEEPING - if (manager_sleeping (self)) (a) might register a second sleep-inihibitor(?) else (b) clear_sleep_devices(). Ok, in this case it's all good. After take_inhibitor() we forget about release_inhibit(). after (a), when device_sleep_cb() gets called and wants to call release_inhibitor(). Which is wrong and triggers an assertion. So, at least you always must clear_sleep_devices() after sleeping_cb(is_about_to_suspend=FALSE); -- It gets more complicated, because do_sleep_wake() is also triggered by `nmcli networking on|off`. So, how do they play together? > > I think nm_sleep_monitor_keep_inhibitor() should return an opaque handle > > that must be released exactly once. An internal sleep_signal() might > > invalidate the current release handles, but when the caller does > > release_inhibitory() on a stale handle, it gets released without actually > > touching the fd or inhibitor state. > > > > How about the fixup commit? > > What's the purpose of the stale_list? Can we have only active_list and > ignore release() for non-active items? > Also, is there any advantage of this over a simple counter? Because > with the fixup callers have to store the handle and this would (only > slightly) complicate the next patch. I think, the pattern should be that you must exactly return the handle once. Everything else, is hard to get right. And it's even harder to verify/review. The handles and the stale-list enforces this. The stale-list is only(!) there to assert that the caller didn't mess up the counting. It could be omitted, but it's really a trivial implementation detail of NMSleepMonitor. The handle is necessary, because an internal event can invalidate the handle -- that is, after take_inhibitor(), a release() must do nothing. You cannot express that with a counter. With a counter only, the caller must know when (or when not) to release. Ok, maybe you can also do... dunno. static void sleeping_cb (NMSleepMonitor *monitor, gboolean is_about_to_suspend, gpoi... { NMManager *self = user_data; _LOGD (LOGD_SUSPEND, "Received %s signal", is_about_to_suspend ?... clear_sleep_devices (self); _internal_sleep (self, is_about_to_suspend); } (In reply to Thomas Haller from comment #5) > - receive sleep_signal(is_about_to_suspend=TRUE) > - emit SLEEPING > - NMManager calls keep_inhibitor() > > shortly after: > > - receive sleep_signal(is_about_to_suspend=FALSE) > - take_inhibitor() -- which resets inhibit_count=0 > - emit SLEEPING > - if (manager_sleeping (self)) > (a) might register a second sleep-inihibitor(?) > else > (b) clear_sleep_devices(). Ok, in this case it's all good. > After take_inhibitor() we forget about release_inhibit(). > > after (a), when device_sleep_cb() gets called and wants to call > release_inhibitor(). Which is wrong and triggers an assertion. When the manager receives the SLEEPING signal after a wake up (a) should never be hit... or am I reading it wrong? > So, at least you always must clear_sleep_devices() after > sleeping_cb(is_about_to_suspend=FALSE); > > I think, the pattern should be that you must exactly return the handle once. > Everything else, is hard to get right. And it's even harder to verify/review. Yeah, this makes sense, and actually isn't much more complicated for callers. > -- It gets more complicated, because do_sleep_wake() is also triggered by > `nmcli networking on|off`. So, how do they play together? On networking off the manager should not take the inhibitor, apart from this I think both sleep and network-off should be handled in the same way (being careful not to deactivate an interface which is already deactivating). How about bg/disconnect-on-suspend-rh1330694-v2 ? (In reply to Beniamino Galvani from comment #7) > (In reply to Thomas Haller from comment #5) > How about bg/disconnect-on-suspend-rh1330694-v2 ? I like this. It may be more lines of code, but something that I can understand (simply make sure not to loose a handle without returning it). I think if (!g_hash_table_contains (priv->sleep_devices, device)) { is wrong. How about the fixup? (In reply to Thomas Haller from comment #8) > I think > if (!g_hash_table_contains (priv->sleep_devices, device)) { > is wrong. How about the fixup? Looks good, thanks! lgtm now lgtm Merged to master: https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=3fdffab9550bf7e55815d7a4453854057de55f0a pre-down script in /etc/NetworkManager/dispatcher.d/pre-down.d/97-disp was successfully executed during suspend/resume cycle Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2016-2581.html |