Description of problem:
When a Virtual Machine runs on a Host that has either a bond of multiple 1Gb NICs or a 10Gb NIC, and the VM carries around 1Gbps of traffic, the Dashboard shows it as consuming 100% of Network, which is not correct in these scenarios. Previous emails on the ovirt-users list mentioned that this value was introduced as a reference in the past and doesn't affect VM throughput. It would still be good to fix it, as it may confuse other people using the Dashboard who are not aware of this.

In one of my cases there is a VM with 2 x VirtIO NICs that can effectively output 2Gbps of traffic, but at around 1Gbps the Dashboard shows 100%.

A suggestion to keep it simple: add an option to specify, per VM, the maximum network throughput it can reach, so that the percentage shown on the Dashboard can be calculated against that maximum. The default value could be gathered from the Host's physical NICs or bond. For example, if the Host NIC the VM is connected to is 10Gb, then that would be the default value for calculating the percentage. I am not sure how practical it is, but if the VM connects to a bond (e.g. 4 x 1Gb), that could also be the base value for the percentage calculation.

This option to enter the maximum throughput a VM can reach would also be useful where a QoS rule prevents the VM from going above a certain traffic threshold, so the Dashboard shows an accurate value in all of these scenarios.
Version-Release number of selected component (if applicable):
Engine version 4.1.1.4-1

How reproducible:
Always

Steps to Reproduce:
1) Make sure the Host can sustain more than 1Gbps of throughput (using either a 10Gb NIC or a bond of multiple 1Gb NICs)
2) Create a VM with one or more NICs
3) Run iperf against that VM so that the aggregate traffic is above 1Gbps

Actual results:
Dashboard shows 100% for 1Gbps of traffic

Expected results:
Dashboard shows the correct percentage, based on a default value or a user-entered value for the maximum amount of traffic the Virtual Machine can carry.

Additional info:
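As a rough illustration of the suggestion in this report, the percentage could be computed against a per-VM maximum that falls back to the speed of the Host NIC or bond. This is only a sketch under stated assumptions: the function and parameter names are hypothetical and are not oVirt code, and summing the members of a bond is an optimistic default, since real throughput depends on the bonding mode.

```python
from typing import Optional, Sequence

# The fixed 1Gb reference described in this report, in Mbps.
LEGACY_REFERENCE_MBPS = 1000.0


def default_reference_mbps(host_nic_speeds_mbps: Sequence[float]) -> float:
    """Hypothetical default: sum of the physical NIC speeds behind the VM's network.

    A single 10Gb NIC is passed as [10000]; a 4 x 1Gb bond as [1000] * 4.
    Summing bond members is an assumption, not a guaranteed throughput.
    """
    return float(sum(host_nic_speeds_mbps))


def usage_percent(traffic_mbps: float, reference_mbps: float) -> float:
    """Traffic as a percentage of the reference, capped at 100 like a gauge."""
    if reference_mbps <= 0:
        raise ValueError("reference_mbps must be positive")
    return min(100.0, 100.0 * traffic_mbps / reference_mbps)


def vm_usage_percent(
    traffic_mbps: float,
    host_nic_speeds_mbps: Sequence[float],
    user_max_mbps: Optional[float] = None,
) -> float:
    """Prefer the user-entered maximum; otherwise fall back to the Host default."""
    if user_max_mbps is not None:
        reference = user_max_mbps
    else:
        reference = default_reference_mbps(host_nic_speeds_mbps)
    return usage_percent(traffic_mbps, reference)


if __name__ == "__main__":
    traffic = 1000.0  # about 1Gbps of VM traffic
    print(usage_percent(traffic, LEGACY_REFERENCE_MBPS))            # fixed 1Gb reference
    print(vm_usage_percent(traffic, [1000.0] * 4))                  # 4 x 1Gb bond
    print(vm_usage_percent(traffic, [10000.0]))                     # 10Gb NIC
    print(vm_usage_percent(traffic, [10000.0], user_max_mbps=2000)) # user-entered cap
```

With the fixed reference, any traffic at or above 1Gbps saturates the gauge, which is the behaviour reported here. The user-entered value takes precedence so that a QoS limit can be reflected as well.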
The steps to reproduce are not clear. Please provide screenshots of the exact location where you see the 100% CPU consumption.
Sorry, I meant: please provide network screenshots here.
Hello Shirly. That's as easy as for the CPU status. The only place in the Dashboard where this is available is the Virtual Machines tab. Please find a screenshot attached. As you can see, it shows 37% Network usage while the VM has 2 vNICs, so it can do at least 2Gbps (3 in reality, as it sits on top of a 3 x 1Gb bond). As I mentioned, there should be some way to set the maximum allowed speed for each VM for the graphs shown in the Virtual Machines tab, with a default value based on something like the speed of the physical NIC or bond it connects to.
Created attachment 1266954 [details] Virtual Machines tab status
Just to amend Comment 3: the status shown on the Virtual Machines tab in the Dashboard (picture attached) is not totally wrong, given that the VM in my environment can in fact do 2Gbps aggregated, so the current traffic is indeed 37%. However, as I mentioned, it sits on top of a 3 x 1Gb bond and in theory could do more. The same would happen if it sat on top of a 10Gb physical NIC, so making this configurable may help adjust to each scenario.
Hello there. Just to clarify: based on the latest history entries, this should be available in version 4.2. Which option was chosen? Letting the system discover the physical NIC speeds for a default value, and/or letting the user enter a value for specific situations such as:
- Multiple vNICs capable of different speeds (due to different VLANs)
- Network speed capped for a given VLAN or VM
- Or simply a VM connected to a bridge that has either 1Gb or 10Gb physical NICs.
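The situations listed above could be modelled per vNIC, as in the sketch below. It is an illustration only: the `VNic` class and its fields are hypothetical, not part of oVirt, and adding up the vNICs assumes each one sits on a separate physical network, which would overcount if they share the same NIC or bond.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class VNic:
    """Hypothetical description of one virtual NIC of a VM."""
    name: str
    physical_speed_mbps: float        # speed of the NIC or bond behind the bridge
    cap_mbps: Optional[float] = None  # cap applied to the VLAN or VM, if any

    def effective_max_mbps(self) -> float:
        """The lower of the cap (when set) and the underlying physical speed."""
        if self.cap_mbps is None:
            return self.physical_speed_mbps
        return min(self.cap_mbps, self.physical_speed_mbps)


def vm_reference_mbps(vnics: Iterable[VNic]) -> float:
    """Reference for the whole VM: sum of the effective maximum of each vNIC."""
    return sum(v.effective_max_mbps() for v in vnics)


if __name__ == "__main__":
    vnics = [
        VNic("nic1", physical_speed_mbps=1000.0),                   # bridge on a 1Gb NIC
        VNic("nic2", physical_speed_mbps=10000.0, cap_mbps=500.0),  # 10Gb NIC, capped
    ]
    print(vm_reference_mbps(vnics))
```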
TBH, we don't know how to display the percentage of a virtual NIC well. Since it's virtual, it has no real physical limit. With guest <-> guest IO you may reach 7Gbps or more, depending on various settings and the host CPU. There's no really good measurement to show here as a percentage; at some point we may want to switch to just Mbps or Gbps.
Hello Yaniv. One option is to allow the user to set the maximum rate that a vNIC can pass, in Kbps, so the real-time percentage can be calculated against it. Optionally it could be gathered from the physical NIC settings, but that may add some extra complexity when using bonding, for example (hence the option to let the user set the value manually). Moving to Mbps or Gbps may be an easier option as well.
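For the alternative of showing an absolute rate instead of a percentage, a small formatting helper would be enough. The sketch below assumes the rate is available in Kbps and uses decimal (1000-based) units. The function name is hypothetical and this is not how the Dashboard is implemented.

```python
def format_rate(kbps: float) -> str:
    """Render a rate given in Kbps as Kbps, Mbps or Gbps with one decimal place."""
    if kbps < 0:
        raise ValueError("rate must not be negative")
    units = ["Kbps", "Mbps", "Gbps"]
    value = float(kbps)
    # Step up one unit at a time while the value has four or more integer digits.
    for unit in units[:-1]:
        if value < 1000:
            return f"{value:.1f} {unit}"
        value /= 1000
    return f"{value:.1f} {units[-1]}"


if __name__ == "__main__":
    for rate in (512, 740_000, 2_000_000):
        print(format_rate(rate))
```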
I am afraid that this will not make it into 4.2. I think it would be lovely to let the user control the bandwidth that corresponds to 100% of vNIC traffic. The current hard-coded value of 1Gbps is arbitrary and ancient: with bonding, 10Gb NICs, or inter-VM communication, we can easily surpass it, as Fernando reports.
We'll be changing the vNIC to report 10Gb in the next release. This will make it somewhat more realistic. However, I assume reporting the true number makes a bit more sense than a percentage anyway. I don't see us changing this in 4.3, though.
Hello Yaniv. I think 10Gb could be the default speed if the system cannot gather the correct speed by itself. Yes, reporting the true number is more appropriate. Basing it on the underlying physical NIC(s) the VM is connected to could be one way, as a significant number of systems still have multiple 1Gb interfaces. On top of that, an Edit option that lets the operator customize the value is advisable, as there are situations like bonding where the value could be a multiple of 1Gb or 10Gb interfaces. Yet another situation is when the VM has QoS rules applied to its network traffic.
Has this been postponed again? If so, what is the challenge involved? Or do I understand it wrong, and it has been implemented in 4.4 and therefore the ovirt-4.4 target was removed?
(In reply to Fernando from comment #12)
> Has this been postponed again? If so, what is the challenge involved? Or do
> I understand it wrong, and it has been implemented in 4.4 and therefore the
> ovirt-4.4 target was removed?

It was untargeted, since we don't know when this will be fixed, due to capacity. Patches for this are welcome if you can work on it.
Hello Yaniv, thanks for the update. I am not really a programmer, so I can't help with that. My contribution to this case and others has been gathering the necessary troubleshooting data where it applies and providing feedback to those best suited to the job. I just found it strange that something that looks relatively simple, and has been shown to be needed, keeps being postponed or left untargeted, and wondered whether that was because something was still missing or because it was simply forgotten. Regards.
This bug didn't get any attention for a while; we didn't have the capacity to make any progress. If you deeply care about it or want to work on it, please assign/target accordingly.
Assign it to whom? This is a pretty simple thing, nothing major. Perhaps it just doesn't cross the developers' minds, as they may be busy with other things. Most other enterprise products already have it because it is logical and helps avoid confusing people. Perhaps the people in charge of implementing such a small feature don't use the product day to day, and so don't notice its absence or how it affects day-to-day management. Well, perhaps at some point the same request will come from other sources and be taken into consideration. The hope will continue.
This request is not currently committed to 4.4.z, moving it to 4.5
(In reply to Fernando from comment #17)
> Assign it to whom?

Anyone who's willing to implement this.

> This is a pretty simple thing, nothing major. Perhaps it just doesn't cross
> the developers' minds, as they may be busy with other things.
> Most other enterprise products already have it because it is logical and
> helps avoid confusing people. Perhaps the people in charge of implementing
> such a small feature don't use the product day to day, and so don't notice
> its absence or how it affects day-to-day management.
> Well, perhaps at some point the same request will come from other sources
> and be taken into consideration. The hope will continue.

That's true. Sadly, no one got to it, so this still rots in the backlog.

Dominik, I see you removed the Capacity NACK half a year ago, yet there is no progress. Is this in the immediate plan? If not, I'm afraid we have to close this, as there is not enough interest in implementing it and there are no volunteers.
This has been going on for almost 3 years. It is strange that such a simple thing is still in this status. Is this being reviewed by UX people, or just by the people who write the code?
This bug has not been prioritized or updated for a long time and is therefore deemed stale. Closing for now. Please feel free to update and reopen, but kindly provide a justification or a development plan for how and when to address this bug.
Rather than closing it, why not take it to the people who should analyse it and move it forward internally? This seems to be another case where requirements get lost because of internal organization rather than because they are complex or costly to implement. It is fairly easy to implement and necessary, yet no justification has been given for why it keeps being postponed. Instead of just closing it, can it be taken to a product manager or similar person so that it finally gets implemented at the next opportunity? It took a fair amount of time to gather all the necessary points to create and detail this bug and to follow up on it all this time. People have already commented and didn't show any significant concern or objection, and closing it will only throw it in the bin unnecessarily, along with all the work already done. Penalizing it because of some internal organization issue is no good and disregards the merit of this feature for day-to-day operations. I believe this can be taken to the next meeting where new features are discussed, and moved forward.
You can see our "discussion" from last year, and nothing has changed since. The bug change is a mere reflection of that. It is still not prioritized, and there is still no one working on it or planning to. It was reviewed several times. Every bug report took someone's time, I get that, but there's no point in tracking it after 4.5 years. Apparently there is not enough interest in this, sorry. This is an oVirt community RFE; there's no product manager behind it other than the development team. Granted, that team primarily works on a product derived from oVirt, RHV, but that has separate tracking and separate prioritization. If you are opening this request on behalf of an RHV customer, or you are a customer or a potential one, then please contact Red Hat sales/support. (I'm closing it again, because even if I keep it open it's only going to get closed again during the next cleanup. Please do not reopen unless you have a plan for how to address it.)
Hi Michal, I stand by my request not to simply trash this work. I understand your points well, but the request is legitimate and not that costly in terms of complexity. One of my points is that it may not have been discussed or looked at because of some internal organization issue that could be improved. The first step is for the RFE to be properly looked into, which doesn't seem to have happened so far. In any case, thanks for taking the time to look into this, and let's hope it gets picked up by someone who can give proper attention to something as small as this. I didn't reopen the case. Regards
Why just trash this when it's something the project still needs? The fact that nobody has picked it up yet doesn't mean it is not needed.
The network team doesn't have capacity for this. If it's required by the community, feel free to post an upstream PR fixing this issue.
Is it that the network team doesn't have capacity, interest, or an understanding of what is being discussed in this issue? Those are different things. It is hard to understand how a Red Hat product doesn't have the capacity to implement such a relatively small feature.
@mperina please don't just close the issue without a reply. I am trying to understand the reasons for the refusal in more depth and to get a question answered. Lack of capacity sounds like a strange excuse. Would you please elaborate on what the real reason is?
(In reply to Fernando from comment #28)
> @mperina please don't just close the issue without a reply.
> I am trying to understand the reasons for the refusal in more depth and to
> get a question answered. Lack of capacity sounds like a strange excuse.
> Would you please elaborate on what the real reason is?

That's the official reason: we don't have the capacity to deliver that in oVirt 4.5. oVirt 4.5 Alpha has already been released, which means no more new features can be added to oVirt 4.5. But as mentioned above, if you really want this feature, feel free to implement it.