Description of problem: tcamuso noticed this while debugging another issue. /proc/interrupts shows 74: 30 0 0 0 6880 0 30 0 PCI-MSI-X eth7 82: 163 0 0 0 0 0 53102 304 PCI-MSI-X eth7 lspci -vv shows Interrupt: pin A routed to IRQ 169 Version-Release number of selected component (if applicable): 2.6.18-150.el5 How reproducible: 100% Steps to Reproduce: 1. cat /proc/interrupts 2. lspci -vv 3. compare output Actual results: lspci displays legacy IRQ value. /proc/interrupts displays actual IRQ value. Expected results: IRQ values should be the same Additional info: Good test system is hp-bl495cg5-02.rhts. Also resolving this will require TWO bugzillas, one for the kernel and one for lspci/pciutils.
For the purposes of this BZ please note that MSI = MSI || MSI-X. P.
jgarzik, gospo and I were discussing this while looking at output from a Fedora system. I think that this is something that should be fixed -- lspci should display the *correct* IRQ information. I do, however, have a question. If a device has MSI enabled is it necessary for lspci to report the legacy IRQ value for that device? I can't think of any reason knowing the legacy value of the interrupt would matter when MSI was enabled. I suppose if you did a -bv it might be important. So what we were thinking about was adding and exporting a new data field in the pci_dev struct (msi_interrupts?) which would be non-zero if MSI was enabled. The new field could be a list or it could simply be a bitmap. I haven't thought about the details yet ;) lspci could read the new field via sysfs. If it were zero, lspci would report the legacy interrupt value. If it were non-zero, lspci would report all values in the list as MSI interrupts. Thoughts? P.
From private email between myself and tcamuso: I was thinking of doing something like (pseudo-code warning): (BTW MSI = msi || msi-x enabled ;) ) if (msi) output msi IRQ values else output legacy values So, in your case if MSI was enabled lspci output would contain lines like Interrupt: MSI routed to IRQ 74 Interrupt: MSI routed to IRQ 82 If it wasn't enabled (ie, booted with pci=nomsi) Interrupt: pin D routed to IRQ 185 Which, IMO, isn't too bad. I'm, of course, open to other suggestions. I could certainly do Interrupt: MSI routed to IRQs 74, 82 to make it a bit more readable, but I prefer putting MSI in front of the output to avoid breaking userspace tools that might grep for the Interrupt message. P. (By no means am I saying the above is the correct thing to do -- just copying it here so I don't have to dig it up later)
From mail I just sent to Prarit: :) Prarit Bhargava wrote: > > (BTW MSI = msi || msi-x enabled ;) ) > > if (msi) > output msi IRQ values > else > output legacy values > > So, in your case if MSI was enabled lspci output would contain lines like > > Interrupt: MSI routed to IRQ 74 > Interrupt: MSI routed to IRQ 82 > Yes, I like this. You might want to consider mapping the Vector Table Entry to the IRQ. This could be important for somebody troubleshooting the driver. Something like: Interrupt: MSI Vector Table[0] routed to IRQ 74 Interrupt: MSI Vector Table[1] routed to IRQ 82 IIRC, the kernel does not allow for one vector to be assigned to more than one Vector Table Entry. Either all entries get a unique vector, or they all get the same one. In the case of legacy MSI, which does not have a Vector Table, there will be only one IRQ assigned, anyway, so you can display Interrupt: MSI roued to IRQ 74 That will add a little more conditional code, so I don't know if you think it would be worth it. > If it wasn't enabled (ie, booted with pci=nomsi) > > Interrupt: pin D routed to IRQ 185 This would be fine, of course. > > Which, IMO, isn't too bad. I'm, of course, open to other suggestions. > > I could certainly do > > Interrupt: MSI routed to IRQs 74, 82 > > to make it a bit more readable, but I prefer putting MSI in front of the output to avoid breaking userspace tools that might grep for the Interrupt message. > I think your first suggestion is more readable, since you don't have to go digging through a list. And your concern about grepping for the MSIs is a very valid one.
(In reply to comment #4) > > Interrupt: MSI Vector Table[0] routed to IRQ 74 > Interrupt: MSI Vector Table[1] routed to IRQ 82 > > IIRC, the kernel does not allow for one vector to be assigned to more than > one Vector Table Entry. Either all entries get a unique vector, or they all > get the same one. > There is a chance that multiple msi-x vectors could be assigned to single IRQs. We had this exact problem with ixgbe hardware on some systems and code had to be added to the msi-x handler to make sure it checked more than just the first vector it found as popped when an interrupt occurred. It would be fine with me if the output was just as you had it above and in the case of overlap it would be like this: Interrupt: MSI Vector Table[0] routed to IRQ 74 Interrupt: MSI Vector Table[1] routed to IRQ 82 Interrupt: MSI Vector Table[2] routed to IRQ 74 Interrupt: MSI Vector Table[3] routed to IRQ 84
Of course, there is nothing wrong with one vector being assigned to more than one VTE, I just thought I read somewhere that this wasn't supported. It is now. :) In any event, engineering triumphs and Andy's modification works just fine for me.
I would like to comment on a comment I made earlier.The discussion is somewhat arcane, but bear with me. :) > In the case of legacy MSI, which does not have a Vector Table, there will be > only one IRQ assigned, anyway, so you can display > > Interrupt: MSI roued to IRQ 74 Currently, the kernel does not assign more than one vector to a device that is capable only of legacy MSI. This is because legacy MSI interrupt vectors... * Must be contiguous numbers * Must be naturally aligned, such that Vectors Base Vector Number bit mask ------- --------------------------- 2 1 4 3 8 7 16 F 32 1F ... max number of MSI vectors is only 32, while max number of MSI-X vectors is 2048 ... This makes the interrupt event number map exactly to the corresponding low bits of the vector. I surmise the Linux community that was involved with this at the time did not think it was worth the extra work at the time. MSI-X came very soon after MSI and is the far better choice for an MSI implementation, so it was unlikely that anybody would be using legacy MSI with multiple interrupts. However, engineers don't often respond well to "no, you can't", so I would not be terribly surprised if an enterprising fellow went ahead and implemented multiple vectors for legacy MSI. All that being said, perhaps MSI as well as MSI-X should be checked for multiple vectors. Or maybe I just need to take more of my cold medicine.
See upstream discussion here: http://marc.info/?l=linux-pci&m=122516331313328&w=2 I'm discussing with Matthew Wilcox to see where he was on his open/release idea for sysfs. Due to the complexity of the code, I don't think I'll be able to easily backport this into RHEL5. I may do something out-of-stream for RHEL5. P.
Adding harald and mhlavink (pciutils maintainers). P.
(In reply to comment #8) > I may do something out-of-stream for RHEL5. > s/may/will/g FYI. P.
Created attachment 345777 [details] RHEL5 initial patch for this issue Comments on the kernel-side patch? P.
/proc/interrupts and lspci may show two different values because they are displaying different things. SOMETIMES they are the same, but not always. This is because the numbers obtained from the system irq vector should be considered a "cookie" value. In the case of legacy ISA interrupts, the cookie is the same as the vector. In the case of MSI/MSI-X, the assignment is a bit more arbitrary. As such, the irq vector slot is not the same thing as the value being programmed into a PCI device's register. Furthermore, the PCI device register may not reflect updates from quirks and fixups, so pdev->irq may differ from the PCI device register value in uncommon cases. So: 1) /proc/interrupts displays a platform-specific, CPU-specific enumeration of the system's irq vectors. 2) lspci displays PCI device register values. While sometimes the values will be the same, there is no guarantee.
> While sometimes the values will be the same, there is no guarantee. It would be nice in the case of MSI/MSI-X, however, if lspci could show, not just the contents of the device irq field in the config header, which is meaningless in the MSI/MSI-X context, but also the vector(s) assigned to MSI/MSI-X. Prarit's proposed patch fixes it for /proc/interrupts. Why not have a complementary fix in lspci -vv?
(In reply to comment #12) > /proc/interrupts and lspci may show two different values because they are > displaying different things. SOMETIMES they are the same, but not always. > > This is because the numbers obtained from the system irq vector should be > considered a "cookie" value. In the case of legacy ISA interrupts, the cookie > is the same as the vector. In the case of MSI/MSI-X, the assignment is a bit > more arbitrary. > > As such, the irq vector slot is not the same thing as the value being > programmed into a PCI device's register. Right, we understand all of that. > > Furthermore, the PCI device register may not reflect updates from quirks and > fixups, so pdev->irq may differ from the PCI device register value in uncommon > cases. > > So: > 1) /proc/interrupts displays a platform-specific, CPU-specific enumeration of > the system's irq vectors. > 2) lspci displays PCI device register values. > > While sometimes the values will be the same, there is no guarantee. Yes, understood. However, what I'm adding is not going to make dev->irq, or the interrupt reporting in lspci match that of MSI/-X. I'm adding the ability to lspci to report the MSI/-X vectors. P.
Ok :) I wasn't sure which part of the bug I was supposed to comment on :) The general comment about /proc/interrupts is that is gets parsed, and so we need to try to avoid breaking userland programs that depend on its format. The patch in comment #11 seems to be a sysfs addition rather than a /proc/interrupts update, so that seems fine from that perspective. With that out of the way, the only remaining concerns are ABI supportability moving forward. This is visible to userspace... will it be a RHEL-only extension? RHEL5-only? It sounds like upstream is unknown at the moment, but there is a /possibility/ of divergence. WRT lspci, I don't think there is any objection to getting lspci to output additional information.
(In reply to comment #15) > Ok :) I wasn't sure which part of the bug I was supposed to comment on :) > > The general comment about /proc/interrupts is that is gets parsed, and so we > need to try to avoid breaking userland programs that depend on its format. The > patch in comment #11 seems to be a sysfs addition rather than a > /proc/interrupts update, so that seems fine from that perspective. > > With that out of the way, the only remaining concerns are ABI supportability > moving forward. This is visible to userspace... will it be a RHEL-only > extension? RHEL5-only? It sounds like upstream is unknown at the moment, but > there is a /possibility/ of divergence. > Yeah -- I'm worried about that too. Especially with RHEL6 around the corner. The situation upstream is that the person who originally suggested the patch has moved on from the company he did it for. Add to that Matthew Wilcox's suggestion of expanding sysfs, and upstream looks like it will end up being a *huge* piece of code to backport to RHEL5. I will at least try to implement what was last suggested upstream in RHEL5. > WRT lspci, I don't think there is any objection to getting lspci to output > additional information. P.
Created attachment 346552 [details] RHEL5 patch for this issue Closely matches upstream -- waiting for upstream's decision on implementation.