Bug 446758 - multipath sometimes creates incorrect maps when features / no_path_retry is changed
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: device-mapper-multipath
Version: 4.7
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Target Release: ---
Assignee: Dave Wysochanski
QA Contact: Corey Marthaler
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-05-15 21:48 UTC by Dave Wysochanski
Modified: 2010-01-12 02:32 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-05-16 20:17:37 UTC
Target Upstream Version:
Embargoed:


Description Dave Wysochanski 2008-05-15 21:48:51 UTC
Description of problem:

A certain sequence of changing the features and no_path_retry settings while
multipathd is running may lead to incorrect multipath maps being created.  See
below.

Version-Release number of selected component (if applicable):
device-mapper-multipath-0.4.5-27.el4_6.3


How reproducible:
Every time with the below sequence.

Steps to Reproduce:
1. multipath -F; /etc/init.d/multipathd stop
2. Place the following lines in a devices section of /etc/multipath.conf:
               features                "0"
               no_path_retry           fail
3. /etc/init.d/multipathd start
4. multipath -v0
5. dmsetup table, and verify maps are created without queue_if_no_path
6. multipath -F
7. dmsetup table, verify no maps exist
8. Change features and no_path_retry to the following:
                features                "1 queue_if_no_path"
                no_path_retry           5
9. multipath -v0
10. dmsetup table, and note that maps are still created without queue_if_no_path
11. /etc/init.d/multipathd stop
12. multipath -F
13. multipath -v0
14. dmsetup table, and note that maps are now created with queue_if_no_path  
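
The checks in steps 5, 10 and 14 can be scripted rather than read by eye. The following is a minimal sketch, not part of the original report: the function name maps_without_queueing is made up for illustration, and it assumes each dmsetup table line has the form "<name>: <start> <length> <target> <params...>".

```shell
#!/bin/sh
# Hypothetical helper: reads `dmsetup table` output on stdin and prints the
# name of every multipath map whose table line lacks queue_if_no_path.
maps_without_queueing() {
    # Field 1 is "<name>:", field 4 is the target type.
    awk '$4 == "multipath" && $0 !~ /queue_if_no_path/ {
             sub(/:$/, "", $1)   # drop the trailing colon from the map name
             print $1
         }'
}

# Usage (as root):
#   dmsetup table | maps_without_queueing
```

An empty result after step 9 would mean every multipath map was created with queue_if_no_path; in the sequence above, step 10 lists maps and step 14 lists none.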

Actual results:
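Maps are created inconsistently depending on whether multipathd is running:
without queue_if_no_path while it is running (step #10), and with
queue_if_no_path once it has been stopped (step #14).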

Expected results:
In step #10, you should see maps created with queue_if_no_path setting.

Additional info:
I started debugging this and found that in dm_addmap(), 'params' did indeed
contain a string with queue_if_no_path in it.  However, for some reason the map
created did not contain queue_if_no_path.  At this point I am speculating
something in multipathd is holding something open, and perhaps the kernel state
is not fully clean after the paths are flushed?  Next steps are to study the
kernel and userspace code in more detail as it relates to no_path_retry and
queue_if_no_path.  I also need to update device-mapper-multipath to the latest
rhel4.7 build.

Comment 1 Kiyoshi Ueda 2008-05-16 03:16:17 UTC
That is expected behavior, not a bug.

multipathd still has the "no_path_retry  fail" setting at step #9, so it
overwrites the queue_if_no_path setting made by the multipath command,
using the fail_if_no_path message ioctl.
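
The same target messages can be sent by hand with dmsetup message, which makes the override easy to observe. This is a sketch under assumptions: the helper name no_path_policy_cmd and the map name mpath0 are hypothetical, and the command is only printed, not executed.

```shell
#!/bin/sh
# Hypothetical helper: print the dmsetup command that switches one map
# between queueing and failing I/O when no paths are left.
#   $1 = map name (e.g. mpath0, an assumed example name)
#   $2 = "queue" or "fail"
no_path_policy_cmd() {
    case "$2" in
        queue) msg=queue_if_no_path ;;
        fail)  msg=fail_if_no_path ;;
        *)     echo "usage: no_path_policy_cmd MAP queue|fail" >&2
               return 1 ;;
    esac
    # dmsetup message takes: <device name> <sector> <message>
    printf 'dmsetup message %s 0 %s\n' "$1" "$msg"
}

# Usage: review the printed command, then run it as root, for example
#   no_path_policy_cmd mpath0 fail | sh
```

Running dmsetup table before and after sending fail_if_no_path should show queue_if_no_path disappearing from the map's feature list, which is the effect multipathd has in step #9.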


Comment 2 Dave Wysochanski 2008-05-16 20:17:37 UTC
Thanks Kiyoshi.  I believed you when you said it the first time, I just wanted
to take a little time to better understand the internals.

I see this now in the multipathd code (was having debug issues for a while
there).  Multipathd gets an event saying a map has been created and calls
setup_multipath(), which calls set_no_path_retry().  That in turn calls
select_no_path_retry(), which uses the value it read from the config file at
startup (based on the hwentry in /etc/multipath.conf), and set_no_path_retry()
then changes the map based on this old value.

Although multipath is the one that creates the multipath kernel tables, I
understand why multipathd needs to change this value and hence the tables.  It
is the one that switches the map back and forth in the case of
no_path_retry > 0.  One alternative would be to have multipathd re-read the
config file during an add event, but that has its own downsides, so I can see
why this is the current behavior.
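
Given that, a practical way to apply an edited /etc/multipath.conf is to restart multipathd before recreating the maps. The following is a sketch only, assuming the daemon reads the file solely at startup as described above; it has not been verified on this build, and the function name reload_sequence is made up. It prints the commands rather than running them.

```shell
#!/bin/sh
# Hypothetical helper: print the command sequence for applying an edited
# /etc/multipath.conf. Restarting multipathd makes the daemon re-read the
# file, so the value it applies on the add-map event should match what the
# multipath command puts in the table.
reload_sequence() {
    cat <<'EOF'
/etc/init.d/multipathd stop
multipath -F
/etc/init.d/multipathd start
multipath -v0
dmsetup table
EOF
}

# Usage: review the list, then run it as root, for example
#   reload_sequence | sh -x
```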


(gdb) bt
#0  select_no_path_retry (mp=0x530390) at propsel.c:253
#1  0x0000000000403faf in set_no_path_retry (mpp=0x530390) at main.c:329
#2  0x0000000000404248 in setup_multipath (vecs=0x52df00, mpp=0x530390)
    at main.c:417
#3  0x0000000000404d16 in uev_add_map (devname=0x52e4e0 "dm-4", vecs=0x52df00)
    at main.c:736
#4  0x0000000000408d95 in cli_add_map (v=0x549c10, reply=0x4005a0f8, 
    len=0x4005a10c, data=0x52df00) at cli_handlers.c:67
#5  0x0000000000408b4a in parse_cmd (cmd=0x531780 "add map dm-4", 
    reply=0x4005a0f8, len=0x4005a10c, data=0x52df00) at cli.c:332
#6  0x0000000000405af7 in uxsock_trigger (str=0x531780 "add map dm-4", 
    reply=0x4005a0f8, len=0x4005a10c, trigger_data=0x52df00) at main.c:1051
#7  0x0000000000408015 in uxsock_listen (
    uxsock_trigger=0x405a8b <uxsock_trigger>, trigger_data=0x52df00)
    at uxlsnr.c:146
#8  0x0000000000405e3e in uxlsnrloop (ap=0x52df00) at main.c:1168
#9  0x000000330c106137 in start_thread () from /lib64/tls/libpthread.so.0
#10 0x000000330bac7113 in clone () from /lib64/tls/libc.so.6


