Description of problem:
I have 10 luns, each with 4 FCP paths, mapped to a RHEL4 U4 host, giving a total of 40 paths on the host. When I configure dm-mp on these paths, it sometimes drops one lun from the list, i.e. only 9 mpath devices (36 paths) are seen in the "multipath -ll" output. It is not always the same lun that gets dropped by the dm-mp driver; it can be any one of them. This behaviour is not always reproducible, but it occurs frequently.

On checking the "multipath -v3" debug output, I noticed an incorrect entry for one of the mpath devices configured by the dm-mp driver. In this case, I had set the "user_friendly_names" option to "yes" in the multipath.conf file. The contents of the /var/lib/multipath/bindings file were as follows:

***************
# Multipath bindings, Version : 1.0
# NOTE: this file is automatically maintained by the multipath program.
# You should not need to edit this file in normal circumstances.
#
# Format:
# alias wwid
#
mpath0 360a980004334646e2f6f384f50524c4b
mpath1 360a980004334646f524a384f50622d4d
mpath2 360a980004334646e2f6f384f50516846
mpath3 360a980004334646e2f6f384f50527744
mpath4 360a980004334646e2f6f384f50526463
mpath0 360a980004334646e2f6f384f50523139
mpath5 360a980004334646f524a384f50625555
mpath6 360a980004334646f524a384f50617270
mpath7 360a980004334646f524a384f50615a2d
mpath8 360a980004334646f524a384f50614947
****************

As seen above, mpath0 was assigned twice, to two different WWIDs, which caused the corresponding lun to be dropped by the dm-mp driver. As expected, things worked fine when I reconfigured dm-mp after removing the /var/lib/multipath/bindings file, or after removing the "user_friendly_names" option from the multipath.conf file. So it does look like this option is at fault and can cause the dm-mp driver to miss a lun.

Version-Release number of selected component (if applicable):
device-mapper-multipath-0.4.5-16.1.RHEL4

How reproducible:
Not always, but it occurs frequently.
Steps to Reproduce:
1. Map 10 luns to a RHEL4 U4 host, each with 4 FCP paths, giving a total of 40 paths on the host.
2. Configure dm-mp on these paths by setting the "user_friendly_names" option to "yes" in the multipath.conf file.

Actual results:
The dm-mp driver occasionally configures mpath devices for only 9 of these luns (36 paths) instead of all 10.

Expected results:
The dm-mp driver should configure mpath devices for all 10 luns (40 paths).

Additional info:
This behavior has been seen on RHEL4 U3 as well.
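A quick way to check an existing bindings file for the symptom described above (the same alias bound more than once) can be sketched as follows. This is a standalone illustration and not part of multipath-tools: count_duplicate_aliases() and the size limits are hypothetical names chosen for the example, and it assumes the simple "alias wwid" line format shown in the report.

```c
#include <stdio.h>
#include <string.h>

#define MAX_BINDINGS 1024	/* hypothetical limit, for illustration only */
#define ALIAS_LEN    64		/* hypothetical limit, for illustration only */

/*
 * Count how many binding lines reuse an alias that already appeared
 * earlier in the stream. Comment lines (starting with '#') and blank
 * lines are skipped. Returns the number of repeated alias lines.
 */
int count_duplicate_aliases(FILE *f)
{
	static char seen[MAX_BINDINGS][ALIAS_LEN];
	char line[512], alias[ALIAS_LEN];
	int nseen = 0, dups = 0;

	while (fgets(line, sizeof(line), f)) {
		int i, found = 0;

		/* First whitespace-delimited token is the alias. */
		if (sscanf(line, " %63s", alias) != 1 || alias[0] == '#')
			continue;

		for (i = 0; i < nseen; i++) {
			if (strcmp(seen[i], alias) == 0) {
				found = 1;
				break;
			}
		}
		if (found)
			dups++;
		else if (nseen < MAX_BINDINGS)
			strcpy(seen[nseen++], alias);
	}
	return dups;
}
```

Opening /var/lib/multipath/bindings with fopen() and passing the stream to this function would report one duplicate for the file quoted in the description, since mpath0 is listed twice.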
The POSIX file byte-range locks used to provide atomicity when accessing the entries in the multipath bindings file are released whenever __any__ descriptor or FILE structure for that file is closed. This patch delays the fclose() for the FILE structures used within lookup_binding() and rlookup_binding() until the atomicity is no longer needed.

Without this patch I could fairly easily get two mpath0 entries in /var/lib/multipath/bindings by running 8 concurrent instances of multipath(8), while with the patch I cannot get this problem to occur.

diff --git a/libmultipath/alias.c b/libmultipath/alias.c
index 6d103d7..86cae9b 100644
--- a/libmultipath/alias.c
+++ b/libmultipath/alias.c
@@ -166,28 +166,14 @@ fail:
 
 
 static int
-lookup_binding(int fd, char *map_wwid, char **map_alias)
+lookup_binding(FILE *f, char *map_wwid, char **map_alias)
 {
 	char buf[LINE_MAX];
-	FILE *f;
 	unsigned int line_nr = 0;
-	int scan_fd;
 	int id = 0;
 
 	*map_alias = NULL;
-	scan_fd = dup(fd);
-	if (scan_fd < 0) {
-		condlog(0, "Cannot dup bindings file descriptor : %s",
-			strerror(errno));
-		return -1;
-	}
-	f = fdopen(scan_fd, "r");
-	if (!f) {
-		condlog(0, "cannot fdopen on bindings file descriptor : %s",
-			strerror(errno));
-		close(scan_fd);
-		return -1;
-	}
+
 	while (fgets(buf, LINE_MAX, f)) {
 		char *c, *alias, *wwid;
 		int curr_id;
@@ -215,38 +201,22 @@ lookup_binding(int fd, char *map_wwid, c
 			if (*map_alias == NULL)
 				condlog(0, "Cannot copy alias from bindings "
 					"file : %s", strerror(errno));
-			fclose(f);
 			return id;
 		}
 	}
 	condlog(3, "No matching wwid [%s] in bindings file.", map_wwid);
-	fclose(f);
 	return id;
 }
 
 static int
-rlookup_binding(int fd, char **map_wwid, char *map_alias)
+rlookup_binding(FILE *f, char **map_wwid, char *map_alias)
 {
 	char buf[LINE_MAX];
-	FILE *f;
 	unsigned int line_nr = 0;
-	int scan_fd;
 	int id = 0;
 
 	*map_wwid = NULL;
-	scan_fd = dup(fd);
-	if (scan_fd < 0) {
-		condlog(0, "Cannot dup bindings file descriptor : %s",
-			strerror(errno));
-		return -1;
-	}
-	f = fdopen(scan_fd, "r");
-	if (!f) {
-		condlog(0, "cannot fdopen on bindings file descriptor : %s",
-			strerror(errno));
-		close(scan_fd);
-		return -1;
-	}
+
 	while (fgets(buf, LINE_MAX, f)) {
 		char *c, *alias, *wwid;
 		int curr_id;
@@ -274,12 +244,10 @@ rlookup_binding(int fd, char **map_wwid,
 			if (*map_wwid == NULL)
 				condlog(0, "Cannot copy alias from bindings "
 					"file : %s", strerror(errno));
-			fclose(f);
 			return id;
 		}
 	}
 	condlog(3, "No matching alias [%s] in bindings file.", map_alias);
-	fclose(f);
 	return id;
 }
 
@@ -327,7 +295,8 @@ char *
 get_user_friendly_alias(char *wwid, char *file)
 {
 	char *alias;
-	int fd, id;
+	int fd, scan_fd, id;
+	FILE *f;
 
 	if (!wwid || *wwid == '\0') {
 		condlog(3, "Cannot find binding for empty WWID");
@@ -337,14 +306,37 @@ get_user_friendly_alias(char *wwid, char
 	fd = open_bindings_file(file);
 	if (fd < 0)
 		return NULL;
-	id = lookup_binding(fd, wwid, &alias);
+
+	scan_fd = dup(fd);
+	if (scan_fd < 0) {
+		condlog(0, "Cannot dup bindings file descriptor : %s",
+			strerror(errno));
+		close(fd);
+		return NULL;
+	}
+
+	f = fdopen(scan_fd, "r");
+	if (!f) {
+		condlog(0, "cannot fdopen on bindings file descriptor : %s",
+			strerror(errno));
+		close(scan_fd);
+		close(fd);
+		return NULL;
+	}
+
+	id = lookup_binding(f, wwid, &alias);
 	if (id < 0) {
+		fclose(f);
+		close(scan_fd);
 		close(fd);
 		return NULL;
 	}
+
 	if (!alias)
 		alias = allocate_binding(fd, wwid, id);
 
+	fclose(f);
+	close(scan_fd);
 	close(fd);
 	return alias;
 }
@@ -353,7 +345,8 @@ char *
 get_user_friendly_wwid(char *alias, char *file)
 {
 	char *wwid;
-	int fd, id;
+	int fd, scan_fd, id;
+	FILE *f;
 
 	if (!alias || *alias == '\0') {
 		condlog(3, "Cannot find binding for empty alias");
@@ -363,12 +356,34 @@ get_user_friendly_wwid(char *alias, char
 	fd = open_bindings_file(file);
 	if (fd < 0)
 		return NULL;
-	id = rlookup_binding(fd, &wwid, alias);
+
+	scan_fd = dup(fd);
+	if (scan_fd < 0) {
+		condlog(0, "Cannot dup bindings file descriptor : %s",
+			strerror(errno));
+		close(fd);
+		return NULL;
+	}
+
+	f = fdopen(scan_fd, "r");
+	if (!f) {
+		condlog(0, "cannot fdopen on bindings file descriptor : %s",
+			strerror(errno));
+		close(scan_fd);
+		close(fd);
+		return NULL;
+	}
+
+	id = rlookup_binding(f, &wwid, alias);
 	if (id < 0) {
+		fclose(f);
+		close(scan_fd);
 		close(fd);
 		return NULL;
 	}
 
+	fclose(f);
+	close(scan_fd);
 	close(fd);
 	return wwid;
 }
This patch has been applied.
Does RHEL5 have this fix?
Yes
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHEA-2007-0256.html