Bug 1316130
Summary: shutdown clvmd fails and results in fencing
Product: Red Hat Enterprise Linux 7
Component: resource-agents
Version: 7.2
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Status: CLOSED ERRATA
Reporter: Christoph <c.handel>
Assignee: Oyvind Albrigtsen <oalbrigt>
QA Contact: cluster-qe <cluster-qe>
CC: abeekhof, agk, apanagio, c.handel, cluster-maint, fdinitto, heinzm, jbrassow, jpokorny, kgaillot, kwenning, mnovacek, msnitzer, oalbrigt, prajnoha, prockai, rhel-docs, sbradley, zkabelac
Target Milestone: rc
Target Release: ---
Keywords: Documentation
Fixed In Version: resource-agents-3.9.5-97.el7
Doc Type: If docs needed, set a value
Clones: 1446669 1449419 (view as bug list)
Bug Blocks: 1446669, 1449419
Last Closed: 2017-08-01 14:55:11 UTC
Type: Bug
Description
Christoph 2016-03-09 13:39:57 UTC
Christoph, Many thanks for reporting this issue and a workaround. After thinking about a general solution to this type of situation, I'm inclined to this approach:

1. Whenever possible, if a resource has a dependency, that dependency should also be managed by pacemaker. For example, disable the service in systemd, and add a systemd resource for it in the cluster, with appropriate constraints.

2. Because that will not always be feasible (the local system might require the dependency before pacemaker is started), resource agents should be allowed to register non-cluster-managed soft dependencies when doing a "start" action, and remove them when doing a "stop" action.

3. If neither of the above options is available, the administrator can do the workaround themselves, as in your example.

---

To accomplish #2, I'm reassigning this to the resource-agents component. The idea is that ocf-shellfuncs could supply two new functions, for example "ocf_register_os_dependency $PKG" and "ocf_remove_os_dependency $PKG", that would add/remove After=$PKG.service to/from /etc/systemd/system/pacemaker.service.d/$RESOURCE_AGENT.conf. The clvmd RA would be modified to use these for the two dependencies you mentioned.

I'm open to suggestions and alternatives. It only needs to handle systemd to start, but it could be broadened to other init systems in the future if desired. The main drawback I see is that different distros sometimes name packages differently, but I don't think that's a show-stopper since it should always be a soft dependency.

---

For the record, other solutions that were considered:

We could simply add After lines to the pacemaker unit file as they are found. However, this would clutter the file, potentially create unnecessary dependencies for the majority of clusters that don't use the particular resources, and be difficult to maintain (likely a permanent and growing list that never shrinks).
We could have RAs list such dependencies in their metadata, and pacemaker (rather than ocf-shellfuncs) could manage the override files. But ocf-shellfuncs seems a simpler and more appropriate place, since this is about the RA configuring its environment and not about how pacemaker executes the RA.

After further investigation, it's not a good idea to do this automatically in resource-agents. Adding/removing a drop-in unit file would require a systemctl daemon-reload, and it's not a good idea to do that every time a resource starts or stops. I think this type of knowledge is too dependent on the local situation to have a clean automated solution. We'll have to leave it to the system administrator to configure the dependencies appropriately. I'm reassigning this BZ again, to documentation, so we can document the issue and workarounds.

--

Documentation: We need to describe somewhere what to do if a cluster resource depends on a service that is not under cluster control. If nothing is done, systemd may choose to stop the depended-on service before stopping pacemaker at system shutdown or reboot, leading the cluster resource to fail to stop.

The preferred solution is to configure the dependency to be managed by the cluster. The system administrator can disable the service in systemd, and add a systemd resource for it in the cluster, with appropriate constraints.
If that is not feasible (for example, the local system requires the dependency to be active before pacemaker is started), the system administrator can create a drop-in systemd unit in /etc/systemd/system/pacemaker.service.d ordering pacemaker after the dependency, as described at:
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/System_Administrators_Guide/sect-Managing_Services_with_systemd-Unit_Files.html#sect-Managing_Services_with_systemd-Unit_File_Modify

As clvmd migrated from a system service (el6) to a pacemaker resource (el7), I would then vote to add multipathd and blk-availability to pacemaker's systemd unit.

Good point, that may be an exception to the general case.

(In reply to Christoph from comment #4)
> As clvmd migrated from a system service (el6) to a pacemaker resource (el7),
> I would then vote to add multipathd and blk-availability to pacemaker's
> systemd unit.

No. Pacemaker does not depend on multipath. If the clvmd agent does, that's its issue to manage.

After discussing further, I think there is a possibility of doing this in the agent. The agent doesn't really have to remove the dependency when the resource stops. Not doing so opens the potential of leaving the drop-in unit lying around if clvmd is ever removed from the cluster, but it's harmless other than a bit of clutter. And then, the start action has to add the file and do a daemon reload only if it doesn't already exist. The agent could support a needs_multipath=true option if doing this for every clvmd user isn't desired. Oyvind, what do you think?

The cleanest way is still to make multipathd a cluster resource, so the cluster can model the dependency using normal constraints. Christoph, is there any reason this idea wouldn't work for your specific case?

Even having the clvmd agent add a multipath dependency unconditionally (if adding a new option was not desirable) is still orders of magnitude more desirable than adding it for all installations.
We really don't want to establish a precedent of shipping these kinds of dependencies as part of pacemaker, and it's unfair to have the admin deal with it. Either agents handle hidden dependencies themselves (we can of course provide tools to simplify that task) or they need to be pacemaker resources with constraints.

The only other option I can think of is if clvmd itself shipped the systemd override. That would also achieve the desired result while limiting its effects to a more relevant subset of installations.

> The cleanest way is still to make multipathd a cluster resource, so the cluster
> can model the dependency using normal constraints. Christoph, is there any
> reason this idea wouldn't work for your specific case?

Multipath can be used without pacemaker. SAN block devices are often multipaths. There is even the option to have a mix between multipaths used by the system and others used by pacemaker. For example, logging filesystems are host-specific and not started by pacemaker, but the data filesystem is a gfs2 started by pacemaker.

> The only other option I can think of, is if clvmd itself shipped the systemd
> override. That would also achieve the desired result while limiting its
> effects to a more relevant subset of installations.

I think this is a very good solution. Pacemaker does not know if it will run a clvmd resource agent, so it should not bother with the dependency. Clvmd knows that it will require multipath and blk-availability. It is a pacemaker resource agent, so it knows it will be run from pacemaker, so it could add the dependency itself during installation. The thing I don't know is whether it is desirable to have overrides shipped with an rpm. I'm not that firm in systemd's config preferences.

I agree, Andrew. It shouldn't even have to be an override, as I think storage services should be among the last services to stop, since they may be used by other services.
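The drop-in override being discussed (ordering pacemaker after the storage services so systemd only stops them once pacemaker is down) can be sketched as follows. This is a minimal sketch, not the shipped fix: the drop-in file name is illustrative, and a local demo directory stands in for /etc/systemd/system/pacemaker.service.d so the sketch runs unprivileged.

```shell
# Sketch of the administrator workaround: a systemd drop-in that orders
# pacemaker.service after the storage services it implicitly depends on.
# DROPIN_DIR defaults to a local demo directory; on a real system it would
# be /etc/systemd/system/pacemaker.service.d (root required), followed by
# a `systemctl daemon-reload`.
DROPIN_DIR="${DROPIN_DIR:-./pacemaker.service.d}"
mkdir -p "$DROPIN_DIR"

# storage-deps.conf is an illustrative name, not from this bug.
cat > "$DROPIN_DIR/storage-deps.conf" <<'EOF'
[Unit]
After=multipathd.service blk-availability.service
EOF

# systemctl daemon-reload   # required on a real system for systemd to see it
cat "$DROPIN_DIR/storage-deps.conf"
```

With this in place, systemd orders pacemaker's stop before multipathd's and blk-availability's at shutdown, which is exactly the property the fencing failure in this bug depends on.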
Pacemaker's startup/shutdown sequence is necessarily a good bit longer than most other services'. So if systemd wants to start/stop them together, I could see timing issues only showing up at shutdown. But I agree that it's surprising that it wants to start/stop them together. Since it at least has that possibility, it does need an override, and the override needs to be for the pacemaker service (so pacemaker as a whole is ordered after multipathd/blk-availability). If RPMs are allowed to deploy systemd drop-ins (and I think they are), it does make sense to do this in the RHEL7 clvmd package. Otherwise, the RA's start action could add it.

Reassigning yet again, to clvmd, to get their opinion ... The idea is that when lvm2-cluster is installed, it would add a drop-in /etc/systemd/system/pacemaker.service.d/lvm2-cluster.conf with the content shown in this bz's Description.

Sorry if it is an utterly crazy idea, and I haven't considered nearly anything here, but if there's an issue with pacemaker being in the service dependency chain explicitly, what about arranging for a middleman like this:

1. add a dummy oneshot, RemainAfterExit=yes, not Exec{StartStop} specifying service, e.g., called ocf-runner, which is shipped with resource-agents (+ documented properly so that the actual "OCF runners" can accommodate that practice if desired, etc.)

2. pacemaker unit files are modified (or again, drop-in style is used) to specify "After=ocf-runner.service"

3. particular agents put the drop-ins for ocf-runner rather than directly for pacemaker

IOW, transitivity should solve the problem in a way generic enough that there's (almost) nothing specific to pacemaker.

Possible troubles are mostly integration ones. E.g., should 2. happen behind pacemaker's back silently when updated resource-agents are installed, at least to solve some up-/down-grade issues? Perhaps the RPM triggers feature would help here.
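The middleman layout from comment #16 can be sketched as three small unit files. This is a sketch of the idea only: a local demo directory stands in for /usr/lib/systemd/system and /etc/systemd/system so it runs unprivileged, and the ocf-runner name follows the comment (the eventually shipped mechanism in this bug uses a target named resource-agents-deps.target instead).

```shell
# Demo layout for the "middleman" idea: a dummy ordering unit (ocf-runner)
# shipped by resource-agents, a pacemaker drop-in ordering pacemaker after
# it, and per-agent drop-ins hanging real dependencies off the middleman.
ROOT="${ROOT:-./demo-systemd}"
mkdir -p "$ROOT/pacemaker.service.d" "$ROOT/ocf-runner.service.d"

# 1. the dummy middleman unit (no ExecStart/ExecStop of its own)
cat > "$ROOT/ocf-runner.service" <<'EOF'
[Unit]
Description=ordering point for OCF resource agent dependencies
[Service]
Type=oneshot
RemainAfterExit=yes
EOF

# 2. pacemaker is ordered after the middleman via a drop-in
cat > "$ROOT/pacemaker.service.d/ocf-runner.conf" <<'EOF'
[Unit]
After=ocf-runner.service
EOF

# 3. an individual agent (clvmd here) registers its dependency on the middleman
cat > "$ROOT/ocf-runner.service.d/99-clvmd.conf" <<'EOF'
[Unit]
After=blk-availability.service
EOF
```

Transitivity then gives pacemaker.service > ocf-runner.service > blk-availability.service as a stop order, without blk-availability ever appearing in pacemaker's own unit file.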
(In reply to Jan Pokorný from comment #16)
> Sorry if it is an utterly crazy idea, and I haven't considered nearly
> anything here, but if there's an issue with pacemaker being in the
> service dependency chain explicitly, what about arranging for
> a middleman like this:
>
> 1. add a dummy oneshot, RemainAfterExit=yes, not Exec{StartStop}
> specifying service, e.g., called ocf-runner, which is shipped
> with resource-agents (+ documented properly so that the actual
> "OCF runners" can accommodate that practice if desired, etc.)
>
> 2. pacemaker unit files are modified (or again, drop-in style is
> used) to specify "After=ocf-runner.service"
>
> 3. particular agents put the drop-ins for ocf-runner rather than
> directly for pacemaker
>
> IOW, transitivity should solve the problem in a way generic enough
> that there's (almost) nothing specific to pacemaker.
>
> Possible troubles are mostly integration ones. E.g., should 2.
> happen behind pacemaker's back silently when updated resource-agents
> are installed, at least to solve some up-/down-grade issues?
> Perhaps RPM triggers feature would help here.

This would definitely be the best automated solution. It doesn't have to be a dummy service; resource-agents could create a systemd target. The definition would be essentially empty; individual RAs would add "After" overrides for the target when they start. We can put After=whatever.target in pacemaker's unit file without worrying about mismatched package versions, because the worst that happens is that users are in the same situation as now.

However, any automated approach has serious downsides:

* We can't know all possible dependencies. For example, a Filesystem resource might depend on iscsi, or it might not.

* We can't assume that the dependency is or is not a pacemaker resource. If it is, we don't want systemd starting or stopping it. For example, some users will use iSCSILogicalUnit/iSCSITarget as resources, while others will let systemd manage iSCSI.
Some specialized services, such as blk-availability, could be unconditionally added as external dependencies, but I suspect most can't.

Handling it in the resource agent has additional drawbacks:

* We'd need a new, systemd-specific ocf_shellfuncs function to register an external dependency. It could be a no-op on non-systemd hosts, but it diverges behavior, with implications for testing etc.

* It requires a systemctl daemon-reload if the drop-in doesn't already exist, which is a bit of extra overhead, and possibly unexpected by sysadmins.

I guess the next step is to determine what dependencies might be unconditionally added in resource-agents, and based on that, decide whether it's worth creating a target. Either way, we'll need to document (man pages, metadata, online docs) how sysadmins can add dependencies based on their specific environment.

Reassigning back to resource-agents, because it looks like that will need changes whatever we go with.

(it could also be s/\.service/\.target/ though no experience here)

What speaks against having a meta-attribute for semi-automated creation of the target where you put in the dependencies manually? But pacemaker would still be able to modify the target, e.g. depending on the node or on whether the resource is enabled or not.

(In reply to Klaus Wenninger from comment #19)
> What speaks against having a meta-attribute for semi-automated creation
> of the target where you put in the dependencies manually?
> But pacemaker would still be able to modify the target e.g. depending on the
> node or if the resource is enabled or not.

Logically, it doesn't have any connection to pacemaker -- it's a dependency of the agent, which could (in theory) be called by something other than pacemaker. So it makes more sense to me to handle it in resource-agents, if practical.

Also, it would be overly complicated in pacemaker. We'd need to replicate systemd's Before/After/Wants/Requires in metadata, since the local dependencies could need any of those.
And that would be sort of a confusing half-step to making the dependency a cluster resource.

(In reply to Jan Pokorný from comment #18)
> (it could also be s/\.service/\.target/ though no experience here)

Yes, a .target is essentially identical to a .service, but with only [Unit] dependency information (Before/After/Wants/Requires), no [Service] section.

I'm leaning to this solution:

* resource-agents would deploy a systemd target for agent dependencies (basically just a name, no actual dependencies listed)

* pacemaker's systemd unit file would add After= and Wants= with the new target

* If a particular resource agent has a systemd unit dependency for something that cannot be managed by pacemaker as a resource, that agent could create a drop-in adding the dependency to the target when it is started. For example, clvmd and LVM require blk-availability, but blk-availability would never be a pacemaker resource. I would avoid automating any other dependencies, because we don't know whether pacemaker will manage them -- for example, LVM might depend on iSCSI or multipathd, but we wouldn't want drop-in dependencies for them if pacemaker is managing them.

* System administrators would be required to manually create drop-ins for the new target for any local dependencies. Resource agent man pages and meta-data, and any relevant online documentation, would be updated to mention how to do this. Resource agents could mention common dependencies (such as iSCSI and multipathd for Filesystem).

(In reply to Ken Gaillot from comment #20)
> (In reply to Klaus Wenninger from comment #19)
> > What speaks against having a meta-attribute for semi-automated creation
> > of the target where you put in the dependencies manually?
> > But pacemaker would still be able to modify the target e.g. depending on the
> > node or if the resource is enabled or not.
> Logically, it doesn't have any connection to pacemaker -- it's a dependency
> of the agent, which could (in theory) be called by something other than
> pacemaker. So it makes more sense to me to handle it in resource-agents, if
> practical.

Of course it could be called by anything, but the dependencies might differ (e.g., a filesystem sitting on top of a device that is configured) depending on how the agent is configured. So there is some connection between the set of attributes in the CIB and the dependencies. Hence, as the agent wouldn't do anything with the attribute directly, but pacemaker would handle it generically for all resources, the idea of making it a meta-attribute.

> Also, it would be overly complicated in pacemaker. We'd need to replicate
> systemd's Before/After/Wants/Requires in metadata, since the local
> dependencies could need any of those. And that would be sort of a confusing
> half-step to making the dependency a cluster resource.

True. But it would be a way to avoid local configuration that isn't visible in the CIB ...

Just noticed systemd.generator(7), which also looks appealing, but unfortunately such a new generator would be pacemaker-specific and work as a resources-in-CIB to drop-in-creation converter. Practically, it would have to be arranged that whenever the first instance of a given resource is newly configured in the CIB, pacemaker would trigger systemctl daemon-reload (or the equivalent via DBus), which itself would trigger pacemaker's own generator, which would be a simple program to:

1. detect which new resources are present that haven't been present before (set NEW) and which resources are not present anymore (set REMOVED)

2. for each from REMOVED, remove artificial, previously crafted drop-ins

3. for each from NEW, "deduce which systemd dependencies are required", and craft artificial drop-ins respectively if so

4. the daemon-reload process continues as usual, taking the new changes into account immediately

Two notes:

A.
once the cluster is well-established (no new resources being added, no resources removed), the rate of such inflicted systemd daemon reloads will be next to zero; the downside is that there may be a lot of false positives (resources inactive or running on other nodes), but it should be more bearable than "activate on install"

B. it is an open question how "deduce which systemd dependencies are required" should work; my suggestion would be to combine the agent's own logic with some to-be-established meta attributes (inspired by Klaus): if a particular agent implements a "systemd-deps" action, use the output from there; otherwise look at those meta attributes; and skip the agent completely from processing when neither is applicable.

Apparently, it might be made more universal; the only prerequisite is to make the actual "OCF runner" trigger "systemctl daemon-reload" whenever the first instance of a given resource is newly configured in (+ possibly also the last instance disappears from) the configuration repertoire of the runner (e.g. a configuration file), plus implementing a program akin to points 1.-4. above.

re [comment 16]: Btw. the overall coupling of cluster components is still pretty high despite the progress in the right direction (think the heartbeat monolith before and separated components like cluster-glue nowadays). I think there's room for improvement, and in the context of this bug, it might be worth considering splitting the proper resource agents themselves from the OCF meta layer (not sure if with or without the shell library, but perhaps the constants/paths should be authoritatively declared right here). It would be expected to provide integration help also to other OCF users, e.g. 3rd-party resource agent authors. In terms of RPM packaging, an ocf-standard package might own directories like /usr/lib/ocf (no need for pacemaker and resource-agents to co-own that anymore, just depend on ocf-standard).
This is also a project/package that might be responsible for the mentioned ocf-runner.(target|service) ([comment 16]) and perhaps even parts of the logic sketched above.

re [comment 23]:
> A. once the cluster is well-established (no new resources being added,
> no resources removed), rate of such inflicted systemd daemon reloads
> will be next to zero; downside is that there may be a lot of false
> positives (resources inactive or running on other nodes), but it
> should be more bearable than "activate on install"

...or for that matter, than "activate on resource start".

I still advocate the approach in Comment 21. We want systemd to see the dependencies as *pacemaker* dependencies -- started before pacemaker, and stopped after pacemaker. Any approach that relies on pacemaker doing something is too late. If someone wants the dependency configuration to be in pacemaker, it has to be a separate resource. The approach here would be for systemd services that are needed by pacemaker resources but might also be needed when pacemaker isn't (yet) running -- which means configuring them in pacemaker is not appropriate.

Created attachment 1275915 [details]
Planned pacemaker systemd unit file, for testing
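The agent-side registration described above (create the drop-in only if it doesn't already exist, and daemon-reload only on that first creation) might be sketched roughly as below. The helper name `register_deps_dropin` is hypothetical, not the actual resource-agents function; the target name resource-agents-deps.target is the one verified later in this bug, and a local demo directory stands in for /run/systemd/system so the sketch runs unprivileged.

```shell
# Hypothetical agent-side helper: register a systemd ordering dependency
# under resource-agents-deps.target, reloading systemd only when the
# drop-in is newly created (so a resource start is normally reload-free).
SYSTEMD_DIR="${SYSTEMD_DIR:-./run-systemd-system}"

register_deps_dropin() {
    # $1 = agent name; remaining args = units to order before the target
    agent="$1"; shift
    conf="$SYSTEMD_DIR/resource-agents-deps.target.d/99-$agent.conf"
    [ -f "$conf" ] && return 0          # already registered: nothing to do
    mkdir -p "${conf%/*}"
    {
        echo "[Unit]"
        for unit in "$@"; do echo "After=$unit"; done
    } > "$conf"
    # systemctl daemon-reload           # only on this first-creation path
}

register_deps_dropin clvmd blk-availability.service
register_deps_dropin clvmd blk-availability.service  # second call is a no-op
```

The guard on the existing file is the point of the sketch: repeated starts of the same resource never trigger a daemon-reload, addressing the overhead concern raised earlier in the thread.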
I have verified that with the new functionality the necessary services will be started before pacemaker in resource-agents-3.9.5-99 (pacemaker-1.1.16-9).

------

Before the patch (resource-agents-3.9.5-97, pacemaker-1.1.16-1)
===============================================================

>> There is no resource-agents-deps.target or any other target that we know
>> of that would always come before pacemaker.service [3]

After the patch (resource-agents-3.9.5-99, pacemaker-1.1.16-9)
==============================================================

>> There is a new resource-agents-deps.target created in the system
>> (/usr/lib/systemd/system/resource-agents-deps.target)

>> The pacemaker systemd unit has a dependency on that target (After=/Wants=):

[root@virt-196 ~]# systemctl cat pacemaker | grep resource-agents-deps.target
After=resource-agents-deps.target
Wants=resource-agents-deps.target

>> Resource agents that have dependencies that need to be started BEFORE
>> pacemaker starts will create drop-in files [1]. At this time it is LVM
>> and clvmd.

>> systemctl cat resource-agents-deps.target

# /usr/lib/systemd/system/resource-agents-deps.target
[Unit]
Description=resource-agents dependencies

# /run/systemd/system/resource-agents-deps.target.d/99-LVM.conf
[Unit]
After=blk-availability.service

# /run/systemd/system/resource-agents-deps.target.d/99-clvmd.conf
[Unit]
After=blk-availability.service

>> Order of services as they are run by systemd: resource-agents-deps will
>> always be before pacemaker.service (based on [2]):

...
1) multipathd.service
2) iscsi.service
3) blk-availability.service
4) resource-agents-deps.target
5) pacemaker.service

----

[1] systemd drop-ins
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/System_Administrators_Guide/sect-Managing_Services_with_systemd-Unit_Files.html#brid-Managing_Services_with_systemd-Extending_Unit_Config

[2] systemd-analyze (after the patch)

[root@virt-196 ~]# systemd-analyze dot resource-agents-deps.target iscsi.service multipathd.service --order
digraph systemd {
>   "blk-availability.service"->"iscsi.service" [color="green"];
    "lvm2-activation-net.service"->"iscsi.service" [color="green"];
    "lvm2-activation-early.service"->"multipathd.service" [color="green"];
    "multipathd.service"->"systemd-journald.socket" [color="green"];
    "multipathd.service"->"system.slice" [color="green"];
    "multipathd.service"->"syslog.target" [color="green"];
    "multipathd.service"->"systemd-udev-trigger.service" [color="green"];
    "iscsi.service"->"iscsid.service" [color="green"];
    "iscsi.service"->"network.target" [color="green"];
    "iscsi.service"->"systemd-journald.socket" [color="green"];
    "iscsi.service"->"systemd-remount-fs.service" [color="green"];
    "iscsi.service"->"iscsiuio.service" [color="green"];
>   "iscsi.service"->"multipathd.service" [color="green"];
    "iscsi.service"->"system.slice" [color="green"];
>   "pacemaker.service"->"resource-agents-deps.target" [color="green"];
>   "resource-agents-deps.target"->"blk-availability.service" [color="green"];
    "iscsid.service"->"multipathd.service" [color="green"];
    "remote-fs-pre.target"->"iscsi.service" [color="green"];
}

[3] systemd-analyze (before the patch)

Note that you need to have pacemaker < 1.1.16-9 or systemd-analyze will give you 'Access denied' because it does not find Wants=resource-agents-deps.target. Also there is no resource-agents-deps.target before resource-agents-3.9.5-99.
[root@virt-197 ~]# systemd-analyze dot pacemaker.service iscsi.service multipathd.service --order
digraph systemd {
    "pacemaker.service"->"corosync.service" [color="green"];
    "pacemaker.service"->"systemd-journald.socket" [color="green"];
    "pacemaker.service"->"syslog.service" [color="green"];
    "pacemaker.service"->"dbus.service" [color="green"];
    "pacemaker.service"->"system.slice" [color="green"];
    "pacemaker.service"->"network.target" [color="green"];
    "pacemaker.service"->"rsyslog.service" [color="green"];
    "pacemaker.service"->"time-sync.target" [color="green"];
    "pacemaker.service"->"basic.target" [color="green"];
    "iscsi.service"->"iscsid.service" [color="green"];
    "iscsi.service"->"iscsiuio.service" [color="green"];
    "iscsi.service"->"network.target" [color="green"];
    "iscsi.service"->"system.slice" [color="green"];
>   "iscsi.service"->"multipathd.service" [color="green"];
    "iscsi.service"->"systemd-journald.socket" [color="green"];
    "iscsi.service"->"systemd-remount-fs.service" [color="green"];
    "lvm2-activation-early.service"->"multipathd.service" [color="green"];
    "multipathd.service"->"syslog.target" [color="green"];
    "multipathd.service"->"systemd-udev-trigger.service" [color="green"];
    "multipathd.service"->"system.slice" [color="green"];
    "multipathd.service"->"systemd-journald.socket" [color="green"];
    "lvm2-activation-net.service"->"iscsi.service" [color="green"];
>   "blk-availability.service"->"iscsi.service" [color="green"];
    "shutdown.target"->"pacemaker.service" [color="green"];
    "remote-fs-pre.target"->"iscsi.service" [color="green"];
    "iscsid.service"->"multipathd.service" [color="green"];
}

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1844