In some scenarios it is desirable to minimize priority group switches as far as possible and not have multipath-tools switch priority groups automatically; this behaviour must be configurable. Switching back to the preferred priority group once paths in it become available again should also be configurable. In both cases, it might be desirable not only to have the option to disable switching entirely, but also to have a timer that delays the actual switch and allows the situation to stabilize first.
What we have today as a PG switch policy could be described as "opportunistic", i.e. if another PG is seen as better, switch to it even if the current one is in a working state. What you suggest adding would be a "last_resort" policy. I would need to add 2 new struct multipath fields:
- int switchover_policy (opportunistic or last_resort)
- int switchback_policy (true or false)
These properties would be definable in the devices{} and multipaths{} config blobs. That is definitely doable. Would it fit your needs?
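A minimal sketch of how the two proposed fields might gate a switch. Only the field names switchover_policy and switchback_policy come from the comment above; the enum values, the stand-in struct mp_sketch, the function should_switch_pg() and its is_switchback argument are hypothetical and do not exist in multipath-tools. It assumes a failed current PG always forces a switch, whatever the policy.

```c
#include <stdbool.h>

/* Hypothetical encoding of the two switchover policies. */
enum switchover_policy {
    SWITCHOVER_OPPORTUNISTIC, /* switch whenever another PG looks better */
    SWITCHOVER_LAST_RESORT    /* switch only when the current PG is unusable */
};

/* Stand-in for the relevant part of struct multipath, not the real one. */
struct mp_sketch {
    int switchover_policy; /* opportunistic or last_resort */
    int switchback_policy; /* true or false */
    int current_pg;        /* index of the PG currently in use */
};

/*
 * Decide whether to move to candidate_pg.
 * is_switchback: the candidate is the preferred PG whose paths came back.
 */
static bool should_switch_pg(const struct mp_sketch *mp, int candidate_pg,
                             bool current_pg_usable, bool is_switchback)
{
    if (candidate_pg == mp->current_pg)
        return false;                 /* nothing to do */
    if (!current_pg_usable)
        return true;                  /* assumption: a dead PG is always left */
    if (is_switchback)
        return mp->switchback_policy != 0;
    return mp->switchover_policy == SWITCHOVER_OPPORTUNISTIC;
}
```

With last_resort and switchback disabled, the map stays on a working PG even if a better one appears; it only moves when the current PG stops being usable.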
Yes. However, I'd propose the following scheme:
pg_select_policy (max_paths|highest_prio): max_paths would select the PG with the most paths (like right now); highest_prio would switch to the one with the highest priority and available paths.
pg_switching_policy (<timer>|controlled): <timer> would switch automatically, after <timer> seconds, when the pg_select_policy suggests a new PG (in response to some event); "0" would switch immediately. This is to give the system some time to stabilize, i.e. most of the time you'd not want to switch _immediately_, but give it 2-3 path checker iterations to make sure the path is there to stay. "controlled" would only switch to the selected PG (or in fact to any other PG) when the admin tells us to (or an error forces us). (Which relates to bug #155546 ;-)
(Automated and controlled failback/switch-over are well-established terms in the HA clustering world for resource migration behaviour, so it makes sense to reuse the concepts here.) Does that make sense?
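The timer behaviour of pg_switching_policy could be sketched roughly as below. All names here (struct pg_switch_state, pg_switch_tick(), PG_SWITCH_CONTROLLED) are hypothetical and not part of multipath-tools. The sketch assumes the function is called once per path checker iteration, that the timer restarts whenever the suggested PG changes or goes away, and, as a simplification, that an admin request always targets the currently suggested PG.

```c
#include <stdbool.h>

#define PG_SWITCH_CONTROLLED (-1) /* hypothetical encoding of "controlled" */

struct pg_switch_state {
    int  policy;        /* delay in seconds, 0 = immediately, or PG_SWITCH_CONTROLLED */
    int  current_pg;    /* PG currently in use */
    int  pending_pg;    /* PG waiting for the timer, -1 if none */
    long pending_since; /* time at which pending_pg was first suggested */
};

/*
 * Called on every checker iteration with the PG that pg_select_policy
 * suggests right now. Returns the PG to use after this iteration.
 */
static int pg_switch_tick(struct pg_switch_state *s, int suggested_pg, long now,
                          bool admin_request, bool current_failed)
{
    /* The admin or an error overrides any timer and the controlled mode. */
    if (admin_request || current_failed) {
        s->current_pg = suggested_pg;
        s->pending_pg = -1;
        return s->current_pg;
    }
    /* Suggestion went away again: drop the pending switch. */
    if (suggested_pg == s->current_pg) {
        s->pending_pg = -1;
        return s->current_pg;
    }
    if (s->policy == PG_SWITCH_CONTROLLED)
        return s->current_pg;
    /* New suggestion: (re)start the stabilization timer. */
    if (suggested_pg != s->pending_pg) {
        s->pending_pg = suggested_pg;
        s->pending_since = now;
    }
    /* Switch only once the suggestion has been stable for long enough. */
    if (now - s->pending_since >= s->policy) {
        s->current_pg = suggested_pg;
        s->pending_pg = -1;
    }
    return s->current_pg;
}
```

A path that flaps faster than the timer never triggers a switch in this sketch, which is the stabilization effect described above.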
Is this bug still relevant?
Ben, what do you think?
In relation to comment #2, group_by_prio now simply uses the paths with the highest priority, instead of multiplying the priority by the number of paths. It turns out that for all of our priority functions, it's pretty clear that you want to use the high priority paths, even if there are more low priority paths. This means the first item is no longer an issue. Userspace already has the necessary code for the second issue. However, the kernel code doesn't exist yet. I sent off a patch to add the pg_timeout code to dm-mpath.c, but it never made it in, and nobody has raised the issue again in the intervening years. I could definitely dig up that kernel patch, update it, and send it off to dm-devel, but I get the feeling that this isn't going to be a big feature.
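To illustrate the group_by_prio change, here is a rough comparison of the two selection rules. This is not the multipath-tools code: struct pg_sketch and both functions are hypothetical, and the sketch assumes non-negative priorities and that groups without any paths are skipped.

```c
#include <stddef.h>

/* Hypothetical summary of a priority group. */
struct pg_sketch {
    int priority; /* per-path priority of the group */
    int npaths;   /* number of available paths in the group */
};

/* Old rule: pick the group with the largest priority * path count. */
static size_t select_pg_old(const struct pg_sketch *pgs, size_t n)
{
    size_t best = n; /* n means "no usable group" */
    for (size_t i = 0; i < n; i++) {
        if (pgs[i].npaths <= 0)
            continue;
        if (best == n || pgs[i].priority * pgs[i].npaths >
                         pgs[best].priority * pgs[best].npaths)
            best = i;
    }
    return best;
}

/* New rule: pick the group with the highest priority, ignoring its size. */
static size_t select_pg_new(const struct pg_sketch *pgs, size_t n)
{
    size_t best = n;
    for (size_t i = 0; i < n; i++) {
        if (pgs[i].npaths <= 0)
            continue;
        if (best == n || pgs[i].priority > pgs[best].priority)
            best = i;
    }
    return best;
}
```

For one path at priority 50 and eight paths at priority 10, the old rule prefers the second group (80 > 50), while the new rule prefers the first.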