Bug 217014 - Setting "user_friendly_names" to "yes" causes dm-mp to occasionally miss a mpath device during configuration
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: device-mapper-multipath
Version: 4.4
Hardware: All
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Ben Marzinski
QA Contact: Corey Marthaler
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2006-11-23 07:18 UTC by NetApp filed bugzillas
Modified: 2018-11-28 20:29 UTC

Fixed In Version: RHEA-2007-0256
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-05-01 17:47:20 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHEA-2007:0256 | 0 | normal | SHIPPED_LIVE | device-mapper-multipath enhancement update | 2007-05-01 17:35:42 UTC

Description NetApp filed bugzillas 2006-11-23 07:18:33 UTC
Description of problem:
I have 10 LUNs, each with 4 FCP paths, mapped to a RHEL4 U4 host, giving a
total of 40 paths on the host. When I configure dm-mp on these paths, it
sometimes drops one LUN from this list, i.e. only 9 mpath devices (36 paths)
are seen in the "multipath -ll" command output. It is not always the same LUN
that gets dropped by the dm-mp driver; it can be any one of them. This
behaviour is not always reproducible, but occurs frequently.

On checking the "multipath -v3" debug output, I noticed an incorrect entry for
one of the mpath devices configured by the dm-mp driver. In this case, I had
set the "user_friendly_names" option to "yes" in the multipath.conf file. The
contents of the /var/lib/multipath/bindings file were as follows:

***************
# Multipath bindings, Version : 1.0
# NOTE: this file is automatically maintained by the multipath program.
# You should not need to edit this file in normal circumstances.
#
# Format:
# alias wwid
#
mpath0 360a980004334646e2f6f384f50524c4b
mpath1 360a980004334646f524a384f50622d4d
mpath2 360a980004334646e2f6f384f50516846
mpath3 360a980004334646e2f6f384f50527744
mpath4 360a980004334646e2f6f384f50526463
mpath0 360a980004334646e2f6f384f50523139
mpath5 360a980004334646f524a384f50625555
mpath6 360a980004334646f524a384f50617270
mpath7 360a980004334646f524a384f50615a2d
mpath8 360a980004334646f524a384f50614947
****************

As seen above, the alias mpath0 was assigned to two different WWIDs (the first
and sixth entries), which caused the corresponding LUN to be dropped by the
dm-mp driver.
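
A repeated alias such as the second mpath0 line in the listing can be detected
mechanically. The following is a minimal sketch in C and is not part of
multipath-tools: count_duplicate_aliases() and the fixed limits are
hypothetical, chosen only for illustration. It assumes the "alias wwid" line
format described in the bindings file header.

```c
#include <stdio.h>
#include <string.h>

#define MAX_BINDINGS 256	/* assumed upper bound, for illustration only */
#define ALIAS_LEN 64

/* Hypothetical helper: count the lines in bindings-file text whose alias
 * already appeared on an earlier line. Comment lines start with '#'. */
static int count_duplicate_aliases(const char *text)
{
	char seen[MAX_BINDINGS][ALIAS_LEN];
	int nseen = 0, dups = 0;
	const char *p = text;

	while (*p) {
		char line[256], alias[ALIAS_LEN], wwid[128];
		size_t len = strcspn(p, "\n");
		size_t n = len < sizeof(line) - 1 ? len : sizeof(line) - 1;
		int i, dup = 0;

		/* Copy one line (truncated if overlong) and advance past it. */
		memcpy(line, p, n);
		line[n] = '\0';
		p += len;
		if (*p == '\n')
			p++;

		/* Skip comments and anything that is not an "alias wwid" pair. */
		if (line[0] == '#' ||
		    sscanf(line, "%63s %127s", alias, wwid) != 2)
			continue;

		for (i = 0; i < nseen; i++) {
			if (strcmp(seen[i], alias) == 0) {
				dup = 1;
				break;
			}
		}
		if (dup)
			dups++;
		else if (nseen < MAX_BINDINGS)
			strcpy(seen[nseen++], alias);
	}
	return dups;
}
```

Passing the contents of /var/lib/multipath/bindings to this function and
getting a non-zero result would indicate the condition described in this
report.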

As expected, things worked fine when I reconfigured dm-mp after removing the
/var/lib/multipath/bindings file or removing the "user_friendly_names" option
from the multipath.conf file. So it does look like this option is at fault and
can cause the dm-mp driver to miss a LUN.

Version-Release number of selected component (if applicable):
device-mapper-multipath-0.4.5-16.1.RHEL4

How reproducible:
Not always. But occurs frequently.

Steps to Reproduce:
1. Map 10 LUNs to a RHEL4 U4 host, each with 4 FCP paths, giving a total of 40
   paths on the host.
2. Configure dm-mp on these paths by setting the "user_friendly_names" option to
   "yes" in the multipath.conf file.
  
Actual results:
The dm-mp driver occasionally configures mpath devices for only 9 of these luns
(36 paths) instead of all 10.

Expected results:
Ideally, the dm-mp driver should have configured mpath devices for all the 10
luns (40 paths).

Additional info:
This behavior has been seen on RHEL4 U3 as well.

Comment 1 Ed Goggin 2006-11-27 16:29:55 UTC
The POSIX file byte-range locks used to provide atomicity for accessing the
entries in the multipath bindings file are released whenever __any__
descriptor or FILE structure for that file is closed. This patch delays the
fclose() for the FILE structures used within lookup_binding() and
rlookup_binding() until atomicity is no longer needed.

Without this patch I could fairly easily get two mpath0 entries
in /var/lib/multipath/bindings by running 8 concurrent instances of
multipath(8); with the patch I cannot get this problem to occur.
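
The lock-release behaviour described above can be observed in isolation. The
sketch below is an illustration only and is not taken from multipath-tools:
the function names and the temporary file path are hypothetical. It takes a
whole-file write lock with fcntl(F_SETLK), closes a dup() of the locked
descriptor, and asks a child process via F_GETLK whether the lock is still
held.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that checks whether some other process (here: the parent)
 * holds a conflicting lock on path. Returns 1 if locked, 0 if not, -1 on
 * error. A child is needed because a process never conflicts with itself. */
static int locked_by_other_process(const char *path)
{
	int status, rc;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
		int fd = open(path, O_RDWR);

		if (fd < 0 || fcntl(fd, F_GETLK, &fl) < 0)
			_exit(2);
		_exit(fl.l_type == F_UNLCK ? 0 : 1);
	}
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	rc = WEXITSTATUS(status);
	return rc == 2 ? -1 : rc;
}

/* Returns 1 if the lock is still held after closing a duplicate of the
 * locked descriptor, 0 if it was released, -1 on error. */
static int lock_survives_dup_close(void)
{
	char path[] = "/tmp/bindings-lock-demo-XXXXXX";	/* hypothetical path */
	struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
	int before = -1, after = -1;
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	if (fcntl(fd, F_SETLK, &fl) == 0) {
		before = locked_by_other_process(path);
		int scan_fd = dup(fd);
		if (scan_fd >= 0) {
			close(scan_fd);	/* fd itself stays open */
			after = locked_by_other_process(path);
		}
	}
	close(fd);
	unlink(path);
	return before == 1 ? after : -1;
}
```

On a system with POSIX record-lock semantics, lock_survives_dup_close() is
expected to return 0: the lock disappears even though the original descriptor
is still open. This is the same effect that the early fclose() on the dup'ed
descriptor had in lookup_binding() and rlookup_binding().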

diff --git a/libmultipath/alias.c b/libmultipath/alias.c
index 6d103d7..86cae9b 100644
--- a/libmultipath/alias.c
+++ b/libmultipath/alias.c
@@ -166,28 +166,14 @@ fail:
 
 
 static int
-lookup_binding(int fd, char *map_wwid, char **map_alias)
+lookup_binding(FILE *f, char *map_wwid, char **map_alias)
 {
 	char buf[LINE_MAX];
-	FILE *f;
 	unsigned int line_nr = 0;
-	int scan_fd;
 	int id = 0;
 
 	*map_alias = NULL;
-	scan_fd = dup(fd);
-	if (scan_fd < 0) {
-		condlog(0, "Cannot dup bindings file descriptor : %s",
-			strerror(errno));
-		return -1;
-	}
-	f = fdopen(scan_fd, "r");
-	if (!f) {
-		condlog(0, "cannot fdopen on bindings file descriptor : %s",
-			strerror(errno));
-		close(scan_fd);
-		return -1;
-	}
+
 	while (fgets(buf, LINE_MAX, f)) {
 		char *c, *alias, *wwid;
 		int curr_id;
@@ -215,38 +201,22 @@ lookup_binding(int fd, char *map_wwid, c
 			if (*map_alias == NULL)
 				condlog(0, "Cannot copy alias from bindings "
 					"file : %s", strerror(errno));
-			fclose(f);
 			return id;
 		}
 	}
 	condlog(3, "No matching wwid [%s] in bindings file.", map_wwid);
-	fclose(f);
 	return id;
 }	
 
 static int
-rlookup_binding(int fd, char **map_wwid, char *map_alias)
+rlookup_binding(FILE *f, char **map_wwid, char *map_alias)
 {
 	char buf[LINE_MAX];
-	FILE *f;
 	unsigned int line_nr = 0;
-	int scan_fd;
 	int id = 0;
 
 	*map_wwid = NULL;
-	scan_fd = dup(fd);
-	if (scan_fd < 0) {
-		condlog(0, "Cannot dup bindings file descriptor : %s",
-			strerror(errno));
-		return -1;
-	}
-	f = fdopen(scan_fd, "r");
-	if (!f) {
-		condlog(0, "cannot fdopen on bindings file descriptor : %s",
-			strerror(errno));
-		close(scan_fd);
-		return -1;
-	}
+
 	while (fgets(buf, LINE_MAX, f)) {
 		char *c, *alias, *wwid;
 		int curr_id;
@@ -274,12 +244,10 @@ rlookup_binding(int fd, char **map_wwid,
 			if (*map_wwid == NULL)
 				condlog(0, "Cannot copy alias from bindings "
 					"file : %s", strerror(errno));
-			fclose(f);
 			return id;
 		}
 	}
 	condlog(3, "No matching alias [%s] in bindings file.", map_alias);
-	fclose(f);
 	return id;
 }	
 
@@ -327,7 +295,8 @@ char *
 get_user_friendly_alias(char *wwid, char *file)
 {
 	char *alias;
-	int fd, id;
+	int fd, scan_fd, id;
+	FILE *f;
 
 	if (!wwid || *wwid == '\0') {
 		condlog(3, "Cannot find binding for empty WWID");
@@ -337,14 +306,37 @@ get_user_friendly_alias(char *wwid, char
 	fd = open_bindings_file(file);
 	if (fd < 0)
 		return NULL;
-	id = lookup_binding(fd, wwid, &alias);
+
+	scan_fd = dup(fd);
+	if (scan_fd < 0) {
+		condlog(0, "Cannot dup bindings file descriptor : %s",
+			strerror(errno));
+		close(fd);
+		return NULL;
+	}
+
+	f = fdopen(scan_fd, "r");
+	if (!f) {
+		condlog(0, "cannot fdopen on bindings file descriptor : %s",
+			strerror(errno));
+		close(scan_fd);
+		close(fd);
+		return NULL;
+	}
+
+	id = lookup_binding(f, wwid, &alias);
 	if (id < 0) {
+		fclose(f);
+		close(scan_fd);
 		close(fd);
 		return NULL;
 	}
+
 	if (!alias)
 		alias = allocate_binding(fd, wwid, id);
 
+	fclose(f);
+	close(scan_fd);
 	close(fd);
 	return alias;
 }
@@ -353,7 +345,8 @@ char *
 get_user_friendly_wwid(char *alias, char *file)
 {
 	char *wwid;
-	int fd, id;
+	int fd, scan_fd, id;
+	FILE *f;
 
 	if (!alias || *alias == '\0') {
 		condlog(3, "Cannot find binding for empty alias");
@@ -363,12 +356,34 @@ get_user_friendly_wwid(char *alias, char
 	fd = open_bindings_file(file);
 	if (fd < 0)
 		return NULL;
-	id = rlookup_binding(fd, &wwid, alias);
+
+	scan_fd = dup(fd);
+	if (scan_fd < 0) {
+		condlog(0, "Cannot dup bindings file descriptor : %s",
+			strerror(errno));
+		close(fd);
+		return NULL;
+	}
+
+	f = fdopen(scan_fd, "r");
+	if (!f) {
+		condlog(0, "cannot fdopen on bindings file descriptor : %s",
+			strerror(errno));
+		close(scan_fd);
+		close(fd);
+		return NULL;
+	}
+
+	id = rlookup_binding(f, &wwid, alias);
 	if (id < 0) {
+		fclose(f);
+		close(scan_fd);
 		close(fd);
 		return NULL;
 	}
 
+	fclose(f);
+	close(scan_fd);
 	close(fd);
 	return wwid;
 }

Comment 2 Ben Marzinski 2006-12-01 20:13:55 UTC
This patch has been applied.

Comment 6 Rajashekhar M A 2007-03-07 07:17:08 UTC
Does RHEL5 have this fix?

Comment 7 Ben Marzinski 2007-03-07 17:16:53 UTC
Yes

Comment 11 Red Hat Bugzilla 2007-05-01 17:47:20 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2007-0256.html


