Red Hat Bugzilla – Attachment 302495 Details for Bug 442583: Incorrect priority values displayed with "multipath -ll"
Bug triage for 'prio=2' display
prio-bug-triage.txt (text/plain), 8.32 KB, created by Dave Wysochanski on 2008-04-15 17:23:00 UTC
- trying to get this working on RHEL4.7 (gfs1) machine; now running the latest
kernel build with the hardware handler in it; seems to group paths correctly
but priority is not displayed correctly; I did not notice this before but as
I look at the code it does not seem to be calling the hp priority callout,
despite the fact I have set the configuration in /etc/multipath.conf correctly
(or so I think).

Ok, the only place we call pathinfo with a DI_PRIO flag is from update_paths().
I traced this back to get_dm_mpvec(), and saw this:
update_paths (mpp=0x8a19540) at main.c:909
909         if (!mpp->pg)
(gdb) p mpp->pg
$17 = 0x8a19398
(gdb) list
904     {
905         int i, j;
906         struct pathgroup * pgp;
907         struct path * pp;
908
909         if (!mpp->pg)
910             return 0;
911
912         vector_foreach_slot (mpp->pg, pgp, i) {
913             if (!pgp->paths)
(gdb) n
912         vector_foreach_slot (mpp->pg, pgp, i) {
(gdb) n
913             if (!pgp->paths)
(gdb) n
916             vector_foreach_slot (pgp->paths, pp, j) {
(gdb) n
917                 if (!strlen(pp->dev)) {
(gdb) p *pp
$18 = {dev = "sdaa", '\0' <repeats 251 times>, dev_t = "65:160", '\0' <repeats 26 times>, scsi_id = {dev_id = 0,
  host_unique_id = 0, host_no = 0}, sg_id = {host_no = 1, channel = 0, scsi_id = 0, lun = 2, h_cmd_per_lun = 0,
  d_queue_depth = 0, unused1 = 0, unused2 = 0}, wwid = "3600805f30005e240e5d0467b9ede006d", '\0' <repeats 94 times>,
  vendor_id = "COMPAQ ", product_id = "MSA1000 VOLUME ", rev = "4.48", serial = '\0' <repeats 63 times>,
  tgt_node_name = "0x500805f30005e240", size = 67103505, checkint = 20, tick = 19, bus = 1, state = -1, dmstate = 2,
  failcount = 0, priority = 1, pgindex = 1, getuid = 0x0, getprio = 0x0, checkfn = 0x8060080 <hp_sw>, checker_context = 0x0,
  mpp = 0x0, fd = -1, hwe = 0x8a10b50}
(gdb) p pp->state
$19 = -1
(gdb) list
912         vector_foreach_slot (mpp->pg, pgp, i) {
913             if (!pgp->paths)
914                 continue;
915
916             vector_foreach_slot (pgp->paths, pp, j) {
917                 if (!strlen(pp->dev)) {
918                     if (devt2devname(pp->dev, pp->dev_t)) {
919                         /*
920                          * path is not in sysfs anymore
921                          */
(gdb) list
922                         pp->state = PATH_DOWN;
923                         continue;
924                     }
925                     pathinfo(pp, conf->hwtable, DI_ALL);
926                     continue;
927                 }
928                 if (pp->state == PATH_UNCHECKED)
929                     pathinfo(pp, conf->hwtable, DI_CHECKER);
930
931                 if (!pp->priority)
(gdb) frame 0
#0  update_paths (mpp=0x8a19540) at main.c:917
917                 if (!strlen(pp->dev)) {
(gdb) n
928                 if (pp->state == PATH_UNCHECKED)
(gdb) p pp->state
$20 = -1


So the problem is pp->state is never set to anything, but (initialized?) to
-1. As a result, we won't call the path priority callout. But another thing
I noticed is that the 'getprio' field of 'pp' is set to 0. Not sure if this
is correct or not. In any case, we are definitely not going to call pathinfo()
from that point, so we need to find out what sets pp->state properly...
(NOTE: looking more carefully, it looks like it is fine that pp->getprio is
NULL, since in select_getprio() we look at the hwe)

So who sets pp->state?
1. update_paths() sets it to PATH_DOWN
2. revoke_cache_info() sets it to PATH_UNCHECKED
3. pathinfo() sets it to the value returned from pp->checkfn()
** This one seems to be key
** It appears the checker is returning -1, and I'm not sure why. Seems
a bit hard to debug - maybe libcheckers are not compiled with debug symbols?
4. pathinfo() may set it to PATH_DOWN if there is a recoverable error

Indeed, that is the case (no debug symbols), just need to add debug symbols
to the libcheckers compilation...

Ok, peeling the onion, I'm getting "no usable fd" because apparently pp->fd
is -1, so the checker cannot send a cmd to the device.
pathinfo() is where we open the device apparently:
        /*
         * fetch info not available through sysfs
         */
        if (pp->fd < 0)
                pp->fd = opennode(pp->dev, O_RDONLY);


Seems like pathinfo() does succeed in opennode(), because I always get
pp->fd == 8 here. But then at some point it changes to -1, and strangely
I am not hitting "close()":

(gdb) n
770             pp->fd = opennode(pp->dev, O_RDONLY);
(gdb) n
772         if (pp->fd < 0)
(gdb) n
775         if (pp->bus == SYSFS_BUS_SCSI &&
(gdb) p *pp
$9 = {dev = "sdae", '\0' <repeats 251 times>, dev_t = "65:224", '\0' <repeats 26 times>, scsi_id = {dev_id = 0,
  host_unique_id = 0, host_no = 0}, sg_id = {host_no = 1, channel = 0, scsi_id = 0, lun = 24, h_cmd_per_lun = 0,
  d_queue_depth = 0, unused1 = 0, unused2 = 0}, wwid = "3600805f30005e240b0acd0c0e9f000d3", '\0' <repeats 94 times>,
  vendor_id = "COMPAQ ", product_id = "MSA1000 VOLUME ", rev = "4.48", serial = '\0' <repeats 63 times>,
  tgt_node_name = "0x500805f30005e240", size = 4192965, checkint = 20, tick = 18, bus = 1, state = 0, dmstate = 2,
  failcount = 0, priority = 1, pgindex = 1, getuid = 0x0, getprio = 0x0, checkfn = 0, checker_context = 0x0, mpp = 0x0,
  fd = 8, hwe = 0x9191868}
(gdb) b close
Breakpoint 3 at 0x5838c0
(gdb) c
Continuing.

Breakpoint 1, hp_sw (fd=-1, msg=0x0, context=0x0) at hp_sw.c:121
121         struct sw_checker_context * ctxt = NULL;
(gdb) frame 1
#1  0x0805365e in pathinfo (pp=0x919c090, hwtable=0x9191780, mask=5) at discovery.c:786
786             pp->state = pp->checkfn(pp->fd, NULL, NULL);
(gdb) p *pp
$10 = {dev = "sdae", '\0' <repeats 251 times>, dev_t = "65:224", '\0' <repeats 26 times>, scsi_id = {dev_id = 0,
  host_unique_id = 0, host_no = 0}, sg_id = {host_no = 1, channel = 0, scsi_id = 0, lun = 24, h_cmd_per_lun = 0,
  d_queue_depth = 0, unused1 = 0, unused2 = 0}, wwid = "3600805f30005e240b0acd0c0e9f000d3", '\0' <repeats 94 times>,
  vendor_id = "COMPAQ ", product_id = "MSA1000 VOLUME ", rev = "4.48", serial = '\0' <repeats 63 times>,
  tgt_node_name = "0x500805f30005e240", size = 4192965, checkint = 20, tick = 18, bus = 1, state = 0, dmstate = 2,
  failcount = 0, priority = 1, pgindex = 1, getuid = 0x0, getprio = 0x0, checkfn = 0x8061dfc <hp_sw>, checker_context = 0x0,
  mpp = 0x0, fd = -1, hwe = 0x9191868}
(gdb) p pp->hwe
$11 = (struct hwentry *) 0x9191868
(gdb) p *$11
$12 = {selector_args = 0, pgpolicy = 4, checker_index = 5, pgfailback = -1, rr_min_io = 0, rr_weight = 0,
  no_path_retry = 60, pg_timeout = 0, vendor = 0x9192858 "COMPAQ ", product = 0x91928a0 "MSA1000 VOLUME ",
  selector = 0x9192910 "round-robin 0", getuid = 0x0, getprio = 0x91929b8 "/sbin/mpath_prio_hp_sw /dev/%n",
  features = 0x9192a40 "2 pg_init_retries 7", hwhandler = 0x9192950 "1 hp-sw", blist = 0x0}
(gdb) info break
Num Type           Disp Enb Address    What
1   breakpoint     keep y   0x08061e05 in hp_sw at hp_sw.c:121
        breakpoint already hit 5 times
2   breakpoint     keep y   0x080535c0 in pathinfo at discovery.c:769
        breakpoint already hit 4 times
3   breakpoint     keep y   0x005838c0 <close>


So I play around a bit and search through the code, and find out we are
calling scsi_ioctl_pathinfo from the pathinfo() function, before we call the
path checker.

As an experiment, I deleted these lines in scsi_ioctl_pathinfo:
#ifndef DAEMON
        close(pp->fd);
        pp->fd = -1;
#endif

Now I'm getting the checker returning the correct value. But I'm still not
seeing the priority values updated. Tracked this to the fact that
pp->priority is set to '1' for some reason, which skips over the call
to pathinfo(.., DI_PRIO) in update_paths()


At this point, I'm inclined to back up a bit, perhaps to RHEL4.6 released
version to see if it's any better. Might be good to see what changed here.
Alas, CVS is hanging...

Indeed, if I back up to RHEL4U5 released version, the prio values are displayed
correctly. Looks like RHEL4U6 is where it starts to be incorrect. I think
internally it is setting the groups properly, but just displaying them wrong?
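
For anyone following the trace, the two findings above (the fd being dropped before the checker runs, and the nonzero priority skipping the DI_PRIO refresh) can be sketched as a tiny standalone model. This is only an illustration under stated assumptions: every model_* name, the MODEL_* state codes, and the cut-down struct are hypothetical stand-ins invented for this sketch, not the real multipath-tools definitions, and the fd value 8 is just the number seen in the gdb session.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state codes for this model only; the real values live in
 * the multipath-tools headers and are not reproduced here. */
enum model_state {
    MODEL_CHECK_FAILED = -1,   /* checker had no usable fd */
    MODEL_UNCHECKED    = 0,
    MODEL_UP           = 2
};

/* Cut-down stand-in for struct path: only the fields the notes discuss. */
struct model_path {
    int fd;        /* -1 means "not open" */
    int state;
    int priority;  /* 0 means "not computed yet" */
};

/* Models the checker: without a usable fd it can only report failure. */
static int model_checker(int fd)
{
    return (fd < 0) ? MODEL_CHECK_FAILED : MODEL_UP;
}

/* Models scsi_ioctl_pathinfo(): in a non-daemon build the fd is dropped
 * on the way out, mirroring the "#ifndef DAEMON" lines quoted in the notes. */
static void model_scsi_ioctl_pathinfo(struct model_path *pp, int daemon_build)
{
    if (!daemon_build)
        pp->fd = -1;   /* stands in for close(pp->fd); pp->fd = -1; */
}

/* Models the order of operations seen in pathinfo(): open the node,
 * fetch SCSI info, then run the checker on whatever fd is left. */
int model_pathinfo(struct model_path *pp, int daemon_build)
{
    assert(pp != NULL);
    if (pp->fd < 0)
        pp->fd = 8;    /* stands in for opennode(); 8 is the fd seen in gdb */
    model_scsi_ioctl_pathinfo(pp, daemon_build);
    pp->state = model_checker(pp->fd);
    return pp->state;
}

/* Models the guard in update_paths(): the priority is only refreshed
 * while it is still zero. */
int model_would_refresh_priority(const struct model_path *pp)
{
    assert(pp != NULL);
    return pp->priority == 0;
}
```

Calling model_pathinfo() with daemon_build = 0 leaves the path with fd == -1 and the failed-check state, which matches what hp_sw saw at breakpoint 1; with daemon_build = 1 (the equivalent of deleting the close lines) the checker gets a usable fd. A path that already carries priority = 1 makes model_would_refresh_priority() return 0, which is the skipped pathinfo(.., DI_PRIO) call.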