Description of problem:
A certain sequence of changing the features and no_path_retry settings while multipathd is running may lead to incorrect multipath maps being created. See below.

Version-Release number of selected component (if applicable):
device-mapper-multipath-0.4.5-27.el4_6.3

How reproducible:
Every time with the below sequence.

Steps to Reproduce:
1. multipath -F; /etc/init.d/multipathd stop
2. Place the following lines in a devices section of /etc/multipath.conf:
       features "0"
       no_path_retry fail
3. /etc/init.d/multipathd start
4. multipath -v0
5. dmsetup table, and verify maps are created without queue_if_no_path
6. multipath -F
7. dmsetup table, verify no maps exist
8. Change features and no_path_retry to this:
       features "1 queue_if_no_path"
       no_path_retry 5
9. multipath -v0
10. dmsetup table, and note that maps are still created without queue_if_no_path
11. /etc/init.d/multipathd stop
12. multipath -F
13. multipath -v0
14. dmsetup table, and note that maps are now created with queue_if_no_path

Actual results:
Inconsistency in map creation depending on whether multipathd is running.

Expected results:
In step #10, you should see maps created with queue_if_no_path setting.

Additional info:
I started debugging this and found that in dm_addmap(), 'params' did indeed contain a string with queue_if_no_path in it. However, for some reason the map created did not contain queue_if_no_path. At this point I am speculating something in multipathd is holding something open, and perhaps the kernel state is not fully clean after the paths are flushed? Next steps are to study the kernel and userspace code in more detail as it relates to no_path_retry and queue_if_no_path. I also need to update device-mapper-multipath to the latest rhel4.7 build.
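Steps 5, 10 and 14 each amount to checking whether the multipath lines in the `dmsetup table` output carry the queue_if_no_path feature. A minimal sketch of that check follows, assuming the usual `name: start length target args...` layout of `dmsetup table` output. The function name `maps_with_queueing` and the sample lines are hypothetical and were not captured from the system in this report.

```python
def maps_with_queueing(dmsetup_table_output):
    """Map each multipath device name to True if its table has queue_if_no_path."""
    result = {}
    for line in dmsetup_table_output.splitlines():
        name, sep, table = line.partition(":")
        fields = table.split()
        # Assumed layout after the colon: <start> <length> <target> <target args...>
        if not sep or len(fields) < 3 or fields[2] != "multipath":
            continue  # skips non-multipath targets and lines such as "No devices found"
        result[name.strip()] = "queue_if_no_path" in fields[3:]
    return result


# Illustrative lines only, not taken from the reporter's machine.
sample = (
    "mpath0: 0 20971520 multipath 1 queue_if_no_path 0 1 1 round-robin 0 1 1 8:16 1000\n"
    "mpath1: 0 20971520 multipath 0 0 1 1 round-robin 0 1 1 8:32 1000\n"
)
print(maps_with_queueing(sample))
```

To use it against a live system, pass in the text printed by `dmsetup table` after steps 5, 10 and 14 and compare the results.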
That is expected behavior, not a bug. At step #9 multipathd still holds the "no_path_retry fail" setting, so it overrides the queue_if_no_path setting made by the multipath command, using the fail_if_no_path message ioctl.
Thanks Kiyoshi. I believed you when you said it the first time; I just wanted to take a little time to better understand the internals. I see this now in the multipathd code (I was having debug issues for a while there).

Multipathd gets an event saying a map has been created and calls setup_multipath(), which then calls select_no_path_retry(), where it uses the value it read from the config file at startup (based on the hwentry in /etc/multipath.conf). It then changes the map in set_no_path_retry() based on this old value.

Although multipath is the one that creates the multipath kernel tables, I understand why multipathd needs to change this value and hence the tables: it is the one that switches the map back and forth in the case of no_path_retry > 0. One alternative would be to have multipathd re-read the config file during an add event, but that has its own downsides, so I can see why this is the current behavior.

(gdb) bt
#0  select_no_path_retry (mp=0x530390) at propsel.c:253
#1  0x0000000000403faf in set_no_path_retry (mpp=0x530390) at main.c:329
#2  0x0000000000404248 in setup_multipath (vecs=0x52df00, mpp=0x530390) at main.c:417
#3  0x0000000000404d16 in uev_add_map (devname=0x52e4e0 "dm-4", vecs=0x52df00) at main.c:736
#4  0x0000000000408d95 in cli_add_map (v=0x549c10, reply=0x4005a0f8, len=0x4005a10c, data=0x52df00) at cli_handlers.c:67
#5  0x0000000000408b4a in parse_cmd (cmd=0x531780 "add map dm-4", reply=0x4005a0f8, len=0x4005a10c, data=0x52df00) at cli.c:332
#6  0x0000000000405af7 in uxsock_trigger (str=0x531780 "add map dm-4", reply=0x4005a0f8, len=0x4005a10c, trigger_data=0x52df00) at main.c:1051
#7  0x0000000000408015 in uxsock_listen (
    uxsock_trigger=0x405a8b <uxsock_trigger>, trigger_data=0x52df00) at uxlsnr.c:146
#8  0x0000000000405e3e in uxlsnrloop (ap=0x52df00) at main.c:1168
#9  0x000000330c106137 in start_thread () from /lib64/tls/libpthread.so.0
#10 0x000000330bac7113 in clone () from /lib64/tls/libc.so.6
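The interaction described above can be illustrated with a toy model. This is only a sketch of the ordering of events, not the multipathd code: the names `KernelMap`, `Daemon`, `create_map`, `wants_queueing` and `run_scenario` are hypothetical, and treating any positive no_path_retry count as "queueing on" is a simplifying assumption.

```python
class KernelMap:
    """Stand-in for a kernel multipath table; tracks only the queueing flag."""
    def __init__(self, name, queue_if_no_path):
        self.name = name
        self.queue_if_no_path = queue_if_no_path


def wants_queueing(no_path_retry):
    # Simplifying assumption: "fail" disables queueing, "queue" or a
    # positive retry count enables it.
    if no_path_retry == "fail":
        return False
    if no_path_retry == "queue":
        return True
    return int(no_path_retry) > 0


class Daemon:
    """Models a daemon that reads its configuration once, at startup."""
    def __init__(self, config):
        self.cached = dict(config)  # snapshot; later config edits are not seen

    def on_add_map(self, kmap):
        # Applies the cached no_path_retry value to the newly created map.
        kmap.queue_if_no_path = wants_queueing(self.cached["no_path_retry"])


def create_map(name, config):
    # Models the multipath command, which reads the config file on every run.
    return KernelMap(name, "queue_if_no_path" in config["features"])


def run_scenario(daemon_running):
    config = {"features": "0", "no_path_retry": "fail"}
    daemon = Daemon(config) if daemon_running else None  # step 3
    config = {"features": "1 queue_if_no_path", "no_path_retry": "5"}  # step 8
    kmap = create_map("dm-4", config)  # step 9 (or step 13 with the daemon stopped)
    if daemon is not None:
        daemon.on_add_map(kmap)  # the "add map dm-4" event from the backtrace
    return kmap.queue_if_no_path


print(run_scenario(daemon_running=True))   # daemon's stale "fail" value wins
print(run_scenario(daemon_running=False))  # map keeps queue_if_no_path
```

In this model the map is created correctly in both cases, which matches the observation that 'params' in dm_addmap() contained queue_if_no_path; the difference comes only from the daemon's later update using its startup-time value.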