Bug 1330550
Summary: | Teaming service is lacking ordering dependencies for shutdown | |
---|---|---|---
Product: | Red Hat Enterprise Linux 7 | Reporter: | Daniele <dconsoli>
Component: | libteam | Assignee: | Marcelo Ricardo Leitner <mleitner>
Status: | CLOSED ERRATA | QA Contact: | Amit Supugade <asupugad>
Severity: | high | Docs Contact: | Mirek Jahoda <mjahoda>
Priority: | high | |
Version: | 7.1 | CC: | aperotti, asupugad, dconsoli, fadamo, kzhang, lnykryn, mjahoda, mleitner, network-qe, sukulkar, systemd-maint-list
Target Milestone: | rc | Keywords: | Reopened, ZStream
Target Release: | --- | |
Hardware: | All | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | libteam-1.25-3.el7 | Doc Type: | Bug Fix
Doc Text: | Prior to this update, when shutting down a system, the Team daemon (teamd) was stopped too early. As a consequence, the umount command for systems using NFS over a Team driver could wait too long, and this delayed the whole shutdown process. The libteam package has been fixed to better respect shutdown ordering dependencies, and teamd no longer delays system shutdowns. | Story Points: | ---
Clone Of: | | |
: | 1354382 1420814 | Environment: |
Last Closed: | 2016-11-04 01:01:38 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1354382, 1420814 | |
Attachments: | | |
Description
Daniele
2016-04-26 12:37:30 UTC
I think that there might be an ordering problem. teamd instances could be terminated at any time during shutdown. There might even be a race condition with the network initscripts, which call ifdown-Team*, which also kills the instances.

This seems to be related: https://github.com/jpirko/libteam/commit/2d240e58e07301f40f0b464d84be70e45ceb383d

Maybe we should also add Before=network.service, to make sure that the teaming will be killed by the network initscripts during shutdown.

Yup. Dupe of https://bugzilla.redhat.com/show_bug.cgi?id=1264175, right?

Partially. Our customer tried adding After=network.target, but it did not fix the issue. It looks like Before=network.service did the job, though. The upstream patch mentions network.service:

> there exists an issue: if another service depends network.servie, maybe teamd
> service will shutdown ealier than it. cause systemd close them concurrently. but
> if it is necessary for that service to ensure the iface up, that service will not be able to work.
>
> this issue also exits in nfs over team.

But the patch does not add any ordering dependency for it. Both of those services will be run in parallel.

Okay, that's pretty much what happened with that bz too. Before= was the final solution (comment #22 confirms it). The upstream patch you mentioned is being tracked by that bz, but it actually got applied to RHEL by a libteam rebase. The bz is still open, so if the customer needs a z-stream, it can be requested there.

But you're saying that instead of using Before=network.target it was better to use Before=network.service? I'm not sure which one is better now, please enlighten me :-)

Well, I meant using both :-D That daemon provides network services, so it must have Before=network.target. But because it provides the ifdown scripts and, from what I have understood, those are the preferred method of shutting the interfaces down (when network.service is used), then it should have Before=network.service as well.
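The "use both" suggestion above can be sketched as a systemd drop-in. This is only an illustration of the idea under discussion, not the fix that shipped in libteam: the helper name `write_teamd_ordering_dropin` and the file name `ordering.conf` are hypothetical, and the sketch assumes the installed systemd applies drop-ins placed in a template's `.d` directory to every instance. The base directory is a parameter so the result can be inspected in a scratch directory first.

```shell
#!/bin/sh
# Sketch: write a drop-in that orders every teamd@ instance before both
# network.target and network.service. systemd reverses start-up ordering at
# shutdown, so a unit with Before=X is stopped only after X has been stopped.
# The file name "ordering.conf" is an arbitrary choice for this example.
write_teamd_ordering_dropin() {
    base_dir=$1                                  # e.g. /etc/systemd/system
    dropin_dir="$base_dir/teamd@.service.d"
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/ordering.conf" <<'EOF'
[Unit]
Before=network.target network.service
EOF
}
```

On a real system this would presumably be called as `write_teamd_ordering_dropin /etc/systemd/system`, followed by `systemctl daemon-reload` so that systemd picks up the change.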
Xin, parking this one with you. You worked on that other fix, so you probably know the details better than me. Thanks

Sorry for the late reply. If using Before=network.service doesn't work, it must be because of NM. See: https://bugzilla.redhat.com/show_bug.cgi?id=1264175#c25

(In reply to Xin Long from comment #11)
> Sorry for the late reply. If using Before=network.service doesn't work, it
> must be because of NM. See:
> https://bugzilla.redhat.com/show_bug.cgi?id=1264175#c25

I am not sure I follow. The extra Before= dependency only adds ordering in the case where stop jobs for both the network script and the teamd daemons are in one transaction.

Also, in that case we might want to add --no-block to the systemctl stop call in the ifdown script, so we avoid deadlocks.

(In reply to Lukáš Nykrýn from comment #12)
> (In reply to Xin Long from comment #11)
> > Sorry for the late reply. If using Before=network.service doesn't work, it
> > must be because of NM. See:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1264175#c25
>
> I am not sure I follow. The extra Before= dependency only adds ordering in
> the case where stop jobs for both the network script and the teamd daemons
> are in one transaction.

Yes, as long as the teamd daemon is managed by systemd, and usually it is. *But* if we use NM to manage teamd, the teamd daemon is no longer a systemd service, and the "Before" parameter would not work.

(In reply to Lukáš Nykrýn from comment #13)
> Also, in that case we might want to add --no-block to the systemctl stop
> call in the ifdown script, so we avoid deadlocks.

You can try disabling NM to work around this issue; it did work before in my env. I think the better fix should be in NM, such as letting NM use systemctl to manage the teamd daemon, so that it would still be a systemd service.

But in this case the customer was not using NM; the problem was with initscripts.

Hi Lukáš, if NM is not in use, the issue must be caused by something else, because Before=network.service has made sure that teamd is killed after the network service. I will close this bug; if any team issue related to this is found, you can reopen it.

I am not sure whether you have confused network.service and network.target. In the 7.3 dist-git I was only able to find Before=network.target. But I still think that we are seeing a race condition between the teamd daemon and the network initscripts that call the ifdown-teamd scripts. Those two actions do not have any ordering against each other.

Created attachment 1184665
Hang's screenshot
I have the same problem with the "old" network service (NM is masked), teamd and NFS mounts.
I have to manually unmount the network filesystems before rebooting, otherwise the shutdown sequence hangs trying to umount NFS filesystems.
RHEL 7.2
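The manual workaround described above (unmounting network filesystems before rebooting) could look roughly like the following. It is a sketch, not something posted in this bug: the helper name `list_nfs_mounts` is hypothetical, and it only lists the NFS mount points found in a mounts table so they can be reviewed before anything is unmounted.

```shell
#!/bin/sh
# Sketch: print the mount points of all NFS filesystems listed in a mounts
# table (default: /proc/mounts). Fields in that table are:
# device, mount point, filesystem type, options, dump, pass.
list_nfs_mounts() {
    mounts_file=${1:-/proc/mounts}
    awk '$3 == "nfs" || $3 == "nfs4" { print $2 }' "$mounts_file"
}
```

After checking the list, running `umount -a -t nfs,nfs4` as root before the reboot would unmount all of them while the team device is still up.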
(In reply to yuk from comment #25)
> Created attachment 1184665
> Hang's screenshot
>
> I have the same problem with the "old" network service (NM is masked), teamd
> and NFS mounts.
> I have to manually unmount the network filesystems before rebooting,
> otherwise the shutdown sequence hangs trying to umount NFS filesystems.

Hi yuk,

Would you please try adding 'Before' and 'Wants' to teamd@.service, just like [1] did, and see if this issue still exists?

[1] https://github.com/jpirko/libteam/blob/master/teamd/redhat/systemd/teamd%40.service

Thanks
Hangbin

Created attachment 1186060
Hang after modify
Still hangs with:
Before=network-pre.target
Wants=network-pre.target
# cat /usr/lib/systemd/system/teamd@.service
[Unit]
Description=Team Daemon for device %I
Before=network-pre.target
Wants=network-pre.target
[Service]
BusName=org.libteam.teamd.%i
ExecStart=/usr/bin/teamd -U -D -o -t %i -f /run/teamd/%i.conf
Restart=on-failure
RestartPreventExitStatus=1
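The unit file above declares ordering only against network-pre.target. A quick way to confirm which Before= entries a unit file declares is sketched below; the helper name `unit_before_targets` is hypothetical, and it reads only the file it is given, so drop-ins and dependencies added elsewhere are not reflected.

```shell
#!/bin/sh
# Sketch: print every unit named on the Before= lines of a unit file, one per
# line, to see which ordering dependencies the file itself declares.
unit_before_targets() {
    unit_file=$1
    sed -n 's/^Before=//p' "$unit_file" | tr -s ' ' '\n'
}
```

For example, `unit_before_targets /usr/lib/systemd/system/teamd@.service` would print only `network-pre.target` for the file shown above. On a running system, `systemctl show -p Before teamd@team0.service` reports the effective value for an instance (`team0` is an assumed device name).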
(In reply to yuk from comment #27)
> Created attachment 1186060
> Hang after modify

Hi Yuk,

Sorry for the late response. Here is the complete upstream fix:

[1] https://github.com/jpirko/libteam/commit/2d240e58e07301f40f0b464d84be70e45ceb383d
[2] https://github.com/jpirko/libteam/commit/0641375d10d692e3dacaeec95e36f2525b95881d
[3] https://github.com/jpirko/libteam/commit/4a9e1fac5d69e6abae0451c579b02f16d960e694

Could you please add --ignore-dependencies in ifdown-Team like patch [3] and try again?

Thanks
Hangbin

Hi Hangbin Liu,
thanks for your update. The final patch seems to work! The server now reboots fine.
Bye
Fabio

Hi all,
I copied the patched files:
/usr/lib/systemd/system/teamd@.service
/etc/sysconfig/network-scripts/ifdown-Team
to another server and rebooted it. It still hangs on unmounting NFS filesystems (NFS server not responding)... The problem still seems to be present.
Bye
Fabio

Created attachment 1192965
Hang screenshot

Hi Fabio, do you know what changed between comment #29 and comment #31? And note that you should also need the fix from https://bugzilla.redhat.com/show_bug.cgi?id=1354382#c4

Hi Marcelo,
nothing has changed on the server on which there was the problem. I copied the modified scripts to another server, and this server hung during the shutdown. Now I have also integrated the last fix and a second reboot went fine. Maybe I missed "systemctl daemon-reload" after copying the modified files.
Bye
Fabio

Ah, phew, ok thanks :)

Hi,
Ran the test multiple times and the machines did not hang during reboot.
Verified on:
libteam-1.25-2.el7.x86_64
teamd-1.25-2.el7.x86_64

Hi Amit,
do you know when version 1.25-3.el7 will be available?
Thanks
Bye

Hi all,
do you know when version 1.25-3.el7 will be available?
Thanks
Bye

Hi yuk, with RHEL 7.3, so in a month or so. Note that this bug requires fixes that went into the systemd package too. Hope that helps!

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2219.html