Description of problem:
When a LUN that was configured with multipath is removed from Clariion storage, LV commands freeze for 10 minutes. Restoring the LUN does not end the freeze.

Version-Release number of selected component (if applicable):
RHEL5

How reproducible:
Very reproducible

Steps to Reproduce:
1. Configure multipath on a Clariion storage device.
2. Remove one LUN.
3. Run LV commands such as "lvs" or "lvdisplay".

Actual results:
LV commands freeze for 10 minutes, and Ctrl+C cannot interrupt them. After 10 minutes, the LV command shows the correct output.

Expected results:
The commands should not freeze for 10 minutes.

Additional info:
The same procedure on HP storage works without any problem. We found this bug while trying to reproduce bug 238421.
After 10 minutes, when the command resumes and shows its result, /var/log/messages also contains the following line:

"Jun 4 14:31:47 xeon3 multipathd: mpath2: Disable queueing"

So queueing seems to be what makes the display wait so long. I therefore changed multipath.conf from "no_path_retry 300" to "no_path_retry fail". This had no effect on the waiting time. Instead, I had to change line 176 of the source file /multipath-tools-0.4.7.rhel5.2/libmultipath/hwtable.c from 60 to NO_PATH_RETRY_UNDEF, so that I/O fails immediately when there are no other paths to retry. That works around the problem for now.

I think it is a bug that the no_path_retry entry in the configuration file does not take effect. It looks to me as if, in this case, the multipath.conf entries are overwritten by "hwtable" in config.c (function load_config). I guess loading "hwtable" first and then loading the user configuration would also solve the problem, but I am not sure whether that would have other side effects.
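To put the wait time in context, here is a minimal sketch of the queueing arithmetic. It assumes that, once the last path fails, multipathd retries once per polling interval and disables queue_if_no_path after no_path_retry retries (the point at which "Disable queueing" is logged). The constant values are assumed to mirror those in multipath-tools, and the helper queue_time_seconds() is hypothetical, not part of the multipath-tools source.

```c
#include <assert.h>

/* Assumed to mirror the special no_path_retry values used by multipath-tools. */
#define NO_PATH_RETRY_UNDEF  0   /* not set anywhere */
#define NO_PATH_RETRY_FAIL  -1   /* "no_path_retry fail" */
#define NO_PATH_RETRY_QUEUE -2   /* "no_path_retry queue" */

/*
 * Hypothetical helper: seconds that I/O stays queued after the last path fails.
 * Returns -1 for "queue until a path comes back".
 * Assumption: multipathd retries once per polling interval (checkint seconds)
 * and disables queue_if_no_path after no_path_retry retries.
 */
static long queue_time_seconds(int no_path_retry, int checkint)
{
    assert(checkint > 0);
    if (no_path_retry == NO_PATH_RETRY_QUEUE)
        return -1;                          /* queue indefinitely */
    if (no_path_retry <= 0)
        return 0;                           /* FAIL, or UNDEF (simplified): fail I/O at once */
    return (long)no_path_retry * checkint;  /* e.g. 60 retries from hwtable.c */
}
```

For example, queue_time_seconds(60, 10) returns 600 seconds. The polling interval on the reporter's system is not stated, so a 10-second interval is only an assumption here, but under it the compiled-in value of 60 would match the reported 10-minute freeze.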
I changed multipath.conf from

# no_path_retry 300

to

no_path_retry fail

and then ran /etc/init.d/multipathd restart. The change never affects the output of the following commands:

[root@xeon3 ~]# dmsetup table mpath11
0 2179072 multipath 1 queue_if_no_path 1 emc 1 1 round-robin 0 1 1 71:704 1000

[root@xeon3 ~]# multipath -v3 | grep no_path_retry
mpath77: no_path_retry = 60 (controller setting)
mpath79: no_path_retry = 60 (controller setting)
mpath524: no_path_retry = 60 (controller setting)
....
mpath89: no_path_retry = 60 (controller setting)
mpath529: no_path_retry = 60 (controller setting)
mpath91: no_path_retry = 60 (controller setting)
mpath530: no_path_retry = 60 (controller setting)

By analysing the multipath-tools code in libmultipath/propsel.c:

extern int
select_no_path_retry(struct multipath *mp)
{
    /* 1. per-multipath setting ("multipaths" section) */
    if (mp->mpe && mp->mpe->no_path_retry != NO_PATH_RETRY_UNDEF) {
        mp->no_path_retry = mp->mpe->no_path_retry;
        condlog(3, "%s: no_path_retry = %i (multipath setting)",
                mp->alias, mp->no_path_retry);
        return 0;
    }
    /* 2. controller setting (devices section or compiled-in hwtable.c) */
    if (mp->hwe && mp->hwe->no_path_retry != NO_PATH_RETRY_UNDEF) {
        mp->no_path_retry = mp->hwe->no_path_retry;
        condlog(3, "%s: no_path_retry = %i (controller setting)",
                mp->alias, mp->no_path_retry);
        return 0;
    }
    /* 3. "defaults" section of the config file */
    if (conf->no_path_retry != NO_PATH_RETRY_UNDEF) {
        mp->no_path_retry = conf->no_path_retry;
        condlog(3, "%s: no_path_retry = %i (config file default)",
                mp->alias, mp->no_path_retry);
        return 0;
    }
    /* 4. internal default */
    mp->no_path_retry = NO_PATH_RETRY_UNDEF;
    condlog(3, "%s: no_path_retry = NONE (internal default)", mp->alias);
    return 0;
}

we found that, after any per-multipath setting, multipath selects the controller setting (which includes the compiled-in defaults from hwtable.c) before the default value from the configuration file. Multipath does the same for all configurable parameters, such as get_prio/get_uid/pgpolicy. So my question is: is this loading order reasonable? Also, every controller's default value for no_path_retry is NO_PATH_RETRY_UNDEF except COMPAQ/HP HSV and EMC Clariion.
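The lookup order in select_no_path_retry() can be reproduced in isolation to confirm what the multipath -v3 output shows. This is a minimal sketch with simplified, hypothetical stand-in structs (mpentry, hwentry, config) and a hypothetical function pick_no_path_retry(); the constant values are assumed to mirror multipath-tools. It follows the same four steps as the code above but returns the chosen value instead of storing it and logging.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to mirror the multipath-tools constants. */
#define NO_PATH_RETRY_UNDEF  0
#define NO_PATH_RETRY_FAIL  -1

/* Simplified stand-ins for the multipath-tools structures (hypothetical). */
struct mpentry { int no_path_retry; };   /* "multipaths" section entry */
struct hwentry { int no_path_retry; };   /* devices section / hwtable.c entry */
struct config  { int no_path_retry; };   /* "defaults" section */

/* Same order as select_no_path_retry(): multipath, controller, defaults, internal. */
static int pick_no_path_retry(const struct mpentry *mpe,
                              const struct hwentry *hwe,
                              const struct config *conf)
{
    assert(conf != NULL);
    if (mpe && mpe->no_path_retry != NO_PATH_RETRY_UNDEF)
        return mpe->no_path_retry;      /* multipath setting */
    if (hwe && hwe->no_path_retry != NO_PATH_RETRY_UNDEF)
        return hwe->no_path_retry;      /* controller setting */
    if (conf->no_path_retry != NO_PATH_RETRY_UNDEF)
        return conf->no_path_retry;     /* config file default */
    return NO_PATH_RETRY_UNDEF;         /* internal default */
}
```

With a controller entry of 60 (the compiled-in Clariion value) and "no_path_retry fail" in the defaults section, the function returns 60, which matches the "(controller setting)" lines above. The defaults value only wins for controllers whose entry leaves no_path_retry undefined.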
So HP HSV might have the same problem, since its compiled-in no_path_retry default is also 60.
I have a few questions about this. Is there only one path to the device? The no_path_retry parameter should only affect operation when the last path is removed. If there are still active paths and I/O is being queued until the no_path_retry limit is reached, that is a problem by itself.

However, the configuration loading order is not a problem. Multipath first checks the multipath-specific parameters, then the controller-specific parameters, then the parameters in the defaults section of the config file. If none of these sets a value for the parameter, it uses a sensible compiled-in default. hwtable.c sets some compiled-in controller-specific defaults, and these are checked along with the user-defined controller-specific parameters. However, user-supplied parameters are given priority, so parameters in the devices section of the config file override the compiled-in ones for that controller. I am assuming that you set no_path_retry in the defaults section of the config file.
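The override rule described above can be sketched as a merge step: a user devices entry for the same controller replaces any parameter it explicitly sets in the compiled-in entry, so a value placed there would become the "controller setting" that select_no_path_retry() picks. This is a simplified sketch, not the actual config.c logic. The struct layout, the function merge_user_entry(), and the exact string matching on vendor/product are all assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

/* Assumed to mirror the multipath-tools constants. */
#define NO_PATH_RETRY_UNDEF  0
#define NO_PATH_RETRY_FAIL  -1

/* Hypothetical, simplified controller entry: identity plus one setting. */
struct hwentry {
    const char *vendor;
    const char *product;
    int no_path_retry;
};

/*
 * Sketch of the override rule: if the user entry describes the same
 * controller, every parameter it sets replaces the compiled-in value.
 * Exact string comparison is a simplification for this sketch.
 */
static void merge_user_entry(struct hwentry *builtin, const struct hwentry *user)
{
    assert(builtin != NULL && user != NULL);
    if (strcmp(builtin->vendor, user->vendor) != 0 ||
        strcmp(builtin->product, user->product) != 0)
        return;                                  /* different controller */
    if (user->no_path_retry != NO_PATH_RETRY_UNDEF)
        builtin->no_path_retry = user->no_path_retry;
}
```

In this model, putting "no_path_retry fail" in the matching devices entry changes the compiled-in 60 to NO_PATH_RETRY_FAIL, whereas the same line in the defaults section never reaches the controller entry at all.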
No reply for over 2 years, and no other reports like this. Closing.