Description of problem:
top does not display correct idle time

Version-Release number of selected component (if applicable):
Fedora 16, kernel 3.2.7-1

How reproducible:
easy

Steps to Reproduce:
1. run top
2. notice idle time as compared to running processes

Actual results:

top

top - 10:42:15 up 21:09,  6 users,  load average: 0.24, 0.25, 0.52
Tasks: 306 total,   1 running, 305 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.0%sy,  0.0%ni, 47.5%id,  0.0%wa, 52.5%hi,  0.0%si,  0.0%st
Mem:   2052684k total,  1912736k used,   139948k free,     3780k buffers
Swap:  2326524k total,  1329092k used,   997432k free,   112848k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2900 jeff      20   0 1899m 899m  16m S  3.3 44.9   2:16.46 firefox
 2391 jeff      20   0 1226m 9948 3564 S  1.3  0.5   3:12.41 knotify4
 4126 jeff      20   0 1267m 8200 3828 S  1.0  0.4   0:34.20 knotify4
 2507 jeff      20   0 11364  812  608 S  0.3  0.0   0:01.33 ksysguardd
 2527 root      20   0  534m 2828 1476 S  0.3  0.1   0:00.14 /usr/sbin/httpd
 3931 jeff      20   0  650m  23m  12m S  0.3  1.2   0:04.51 konsole
 4125 jeff      20   0 1559m  58m  15m S  0.3  2.9   1:27.76 plasma-desktop
    1 root      20   0 39796 2804 1420 S  0.0  0.1   0:01.21 systemd

Expected results:
Idle time should be 93.2, not 47.5.

Additional info:
Hello Jeff.

I believe the value is correct:

47.5%id  52.5%hi

As you can see, the rest is "consumed" by hardware interrupts. This might be caused by a broken device on the bus or by your network card during some network issue (or a DoS attack).

Please check your hardware and let me know.

Thank you.

Regards,
Jaromir.
OK, so how would I tell which hardware is generating the interrupts? Network traffic at the time was nil. Thanks.

If you add up the %CPU column you get 6.8; subtracting that from 100 leaves you with 93.2 idle time.
Could you please attach the output of the following command?

( cat /proc/interrupts ; sleep 5 ; cat /proc/interrupts ) > interrupts-diff.txt

It will take 5 seconds to finish.

... but you're right, that discrepancy with 6.8% lost somewhere looks really suspicious ...

May I also know something about your hardware? Could you please describe your system briefly?

Thanks in advance.

Regards,
Jaromir.
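(For reference, a minimal sketch of how the per-IRQ deltas between the two snapshots in interrupts-diff.txt could be computed. The file name matches the command above; the single-CPU parsing and the rest of the script are assumptions for illustration, not part of the requested procedure.)

    # Sketch only: diff the two /proc/interrupts snapshots in interrupts-diff.txt
    # and print which interrupt sources grew during the ~5 second window.
    import re

    with open("interrupts-diff.txt") as f:
        text = f.read()

    # Each snapshot starts with a header line containing "CPU0" (single-CPU guest
    # assumed), so splitting on it yields the "before" and "after" snapshots.
    snapshots = [part for part in text.split("CPU0") if part.strip()]
    before_text, after_text = snapshots[0], snapshots[1]

    def parse(snapshot):
        # Lines look like "  0:   151   IO-APIC-edge   timer" or "LOC:  6254622 ..."
        counts = {}
        for line in snapshot.splitlines():
            m = re.match(r"\s*(\S+):\s+(\d+)", line)
            if m:
                counts[m.group(1)] = int(m.group(2))
        return counts

    before, after = parse(before_text), parse(after_text)
    for irq, old in sorted(before.items(), key=lambda kv: after.get(kv[0], 0) - kv[1], reverse=True):
        delta = after.get(irq, 0) - old
        if delta:
            print(f"{irq}: +{delta} interrupts in ~5 seconds")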
Jaromir,

Cool, I like learning something new, and that was so easy.

           CPU0
  0:        151   IO-APIC-edge      timer
  1:       1143   IO-APIC-edge      i8042
  8:          1   IO-APIC-edge      rtc0
  9:          0   IO-APIC-fasteoi   acpi
 12:        221   IO-APIC-edge      i8042
 14:      50479   IO-APIC-edge      ata_piix
 15:          0   IO-APIC-edge      ata_piix
 19:      28461   IO-APIC-fasteoi   virtio0
 20:       6156   IO-APIC-fasteoi   vboxguest
 23:        193   IO-APIC-fasteoi   ehci_hcd:usb1, ohci_hcd:usb2
 40:     272896   PCI-MSI-edge      ahci
 41:        205   PCI-MSI-edge      snd_hda_intel
NMI:          0   Non-maskable interrupts
LOC:    6254622   Local timer interrupts
SPU:          0   Spurious interrupts
PMI:          0   Performance monitoring interrupts
IWI:          0   IRQ work interrupts
RTR:          0   APIC ICR read retries
RES:          0   Rescheduling interrupts
CAL:          0   Function call interrupts
TLB:          0   TLB shootdowns
TRM:          0   Thermal event interrupts
THR:          0   Threshold APIC interrupts
MCE:          0   Machine check exceptions
MCP:        172   Machine check polls
ERR:          0
MIS:          0

           CPU0
  0:        151   IO-APIC-edge      timer
  1:       1143   IO-APIC-edge      i8042
  8:          1   IO-APIC-edge      rtc0
  9:          0   IO-APIC-fasteoi   acpi
 12:        221   IO-APIC-edge      i8042
 14:      50485   IO-APIC-edge      ata_piix
 15:          0   IO-APIC-edge      ata_piix
 19:      28467   IO-APIC-fasteoi   virtio0
 20:       6157   IO-APIC-fasteoi   vboxguest
 23:        193   IO-APIC-fasteoi   ehci_hcd:usb1, ohci_hcd:usb2
 40:     272960   PCI-MSI-edge      ahci
 41:        205   PCI-MSI-edge      snd_hda_intel
NMI:          0   Non-maskable interrupts
LOC:    6255551   Local timer interrupts
SPU:          0   Spurious interrupts
PMI:          0   Performance monitoring interrupts
IWI:          0   IRQ work interrupts
RTR:          0   APIC ICR read retries
RES:          0   Rescheduling interrupts
CAL:          0   Function call interrupts
TLB:          0   TLB shootdowns
TRM:          0   Thermal event interrupts
THR:          0   Threshold APIC interrupts
MCE:          0   Machine check exceptions
MCP:        172   Machine check polls
ERR:          0
MIS:          0

Running on a 4-core AMD host, host OS OpenIndiana, VirtualBox, Fedora 16 guest with kernel 3.3.

Thanks,
Jeff
Hello Jeff.

That's strange, I don't see any interrupt peaks. Was the issue present during the test?

If so ... could you please also do a similar thing with /proc/stat?

( cat /proc/stat ; sleep 1 ; cat /proc/stat ) > proc-stat-diff.txt

... or simply without redirection, if you want to attach it in the form of a comment:

cat /proc/stat ; sleep 1 ; cat /proc/stat

I've never noticed a similar issue in VBox, but we experienced a similar issue with C-states, when Nehalem CPUs went into a sleeping state to save power and in that state didn't even produce idle ticks. That caused the final sum to be less than 100%. Maybe a similar behaviour is achievable with some VBox build configuration. But let's wait for the /proc/stat results.

Thanks,
Jaromir.
Hey Jaromir,

Yes, iotop showed kswapd going nuts, no idea why, but that was a few kernels ago.

cat proc*txt

cpu 82446 3931 123793 12673720 138313 377677 12130 0 0 0
cpu0 82446 3931 123793 12673720 138313 377677 12130 0 0 0
intr 20378602 151 1143 0 0 0 0 0 0 1 0 0 0 221 0 136743 0 0 0 0 166245 15579 0 0 193 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 865980 205 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ctxt 38898088
btime 1332886151
processes 136803
procs_running 10
procs_blocked 0
softirq 13383560 0 8998095 2 225621 757810 0 827 0 73521 3327684

cpu 82447 3931 123796 12673789 138318 377692 12130 0 0 0
cpu0 82447 3931 123796 12673789 138318 377692 12130 0 0 0
intr 20378909 151 1143 0 0 0 0 0 0 1 0 0 0 221 0 136745 0 0 0 0 166250 15579 0 0 193 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 866031 205 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ctxt 38898697
btime 1332886151
processes 136805
procs_running 1
procs_blocked 0
softirq 13384163 0 8998241 2 225626 757841 0 827 0 73521 3328105

Will check how I have the sleep settings in VBox; I believe I turned that off for another issue a while back with the OpenIndiana host.
Hello Jeff.

The values you sent correspond to the following result:

1%us, 6%sy, 0%ni, 69%id, 5%wa, 15%hi, 0%si

Even if the interval was longer than 1 second, the final sum is only 96 (= lower than 100). The values usually represent 100 Hz ticks, so they may be understood as percentages. That means at least 4 ticks got lost, which might point to clock instability or to a completely different issue.

Anyway, the number of hardware interrupts is still too high. It's usually around 2% on my workstations and around 5% on my virtual machines.

I don't know if you checked the top output while collecting the results ... do my results correspond to what top displayed? If so, then the problem probably lies in one of the following:

1.) guest kernel
2.) VBox (maybe a platform-dependent issue)
3.) host OS

Please, let me know.

Regards,
Jaromir.
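(A sketch of the arithmetic described above, applied to the first pair of "cpu" lines from the /proc/stat output Jeff posted; the first eight fields are user, nice, system, idle, iowait, irq, softirq, steal. Over a roughly 1 second interval at 100 Hz the raw tick deltas read approximately as percentages; normalizing by the sum of the deltas is how percentages are usually derived. The helper is illustrative only, not top's actual code.)

    NAMES = ["us", "ni", "sy", "id", "wa", "hi", "si", "st"]

    # First eight fields of the two "cpu" lines posted above.
    before = [82446, 3931, 123793, 12673720, 138313, 377677, 12130, 0]
    after  = [82447, 3931, 123796, 12673789, 138318, 377692, 12130, 0]

    deltas = [a - b for b, a in zip(before, after)]
    total = sum(deltas)   # 93 here, i.e. a few ticks short of the expected ~100

    print("raw tick deltas :", dict(zip(NAMES, deltas)))
    print("normalized in % :", {n: round(100.0 * d / total, 1) for n, d in zip(NAMES, deltas)})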
Hey Jaromir,

I ran it while top was running and user briefly went to 0.

Before:

cat proc-stat-diff.txt

cpu 110523 4706 169950 15806004 153347 484663 15454 0 0 0
cpu0 110523 4706 169950 15806004 153347 484663 15454 0 0 0
intr 26614425 151 1490 0 0 0 0 0 0 1 0 0 0 2693 0 170847 0 0 0 0 232917 37695 0 0 193 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1060901 205 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ctxt 51965086
btime 1332886151
processes 185477
procs_running 4
procs_blocked 0
softirq 17125833 0 11672536 2 318708 921992 0 1162 0 113399 4098034

cpu 110526 4706 169954 15806077 153355 484669 15454 0 0 0
cpu0 110526 4706 169954 15806077 153355 484669 15454 0 0 0
intr 26614841 151 1491 0 0 0 0 0 0 1 0 0 0 2693 0 170847 0 0 0 0 232917 37695 0 0 193 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1060984 205 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ctxt 51965904
btime 1332886151
processes 185479
procs_running 3
procs_blocked 1
softirq 17126080 0 11672693 2 318708 922052 0 1163 0 113400 4098062

During:

cpu 110566 4706 170041 15806998 154107 484791 15469 0 0 0
cpu0 110566 4706 170041 15806998 154107 484791 15469 0 0 0
intr 26625068 151 1514 0 0 0 0 0 0 1 0 0 0 2693 0 170869 0 0 0 0 232941 37697 0 0 193 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1064439 205 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ctxt 51985427
btime 1332886151
processes 185521
procs_running 9
procs_blocked 0
softirq 17134332 0 11676026 2 318742 924666 0 1176 0 113424 4100296

cpu 110567 4706 170046 15807041 154143 484797 15470 0 0 0
cpu0 110567 4706 170046 15807041 154143 484797 15470 0 0 0
intr 26625737 151 1515 0 0 0 0 0 0 1 0 0 0 2693 0 170869 0 0 0 0 232946 37697 0 0 193 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1064788 205 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ctxt 51986301
btime 1332886151
processes 185525
procs_running 1
procs_blocked 2
softirq 17134792 0 11676189 2 318747 924890 0 1177 0 113426 4100361
Hello Jeff.

The evaluation is a dynamic process that can return completely different values with each cycle, and therefore it's very difficult to compare the values returned by top with the values sampled by the provided shell command unless they're stable over a longer time period.

Anyway, according to my evaluation, the values returned by the kernel correspond to the following:

before: 3%us, 0%ni, 4%sy, 73%id, 8%wa, 6%hi, 0%si
during: 1%us, 0%ni, 5%sy, 43%id, 36%wa, 6%hi, 1%si

As you can see, there was an iowait peak, and that affected the idle counter quite significantly. Please note that the virtual machine may distort the kernel counters quite considerably.

Moreover, the evaluation logic is quite simple/clear and has been tested and discussed many times, so it's very unlikely that there's something wrong with it. All counter-related issues I remember were caused by the kernel, because it does the hardest job in the case of stats.

At the moment I believe that the top command behaves correctly according to what the kernel exports into the /proc/stat file. If you believe that the values in /proc/stat are incorrect, then we can change the component to kernel (but I believe this has a lot to do with the virtual machine and its resource control).

Please, let me know.

Regards,
Jaromir.
Maybe I should also mention that the counters represent time units "consumed" by the processes, the idle state, etc., rather than work units, and that alone causes major distortions on hardware with frequency scaling. If the host OS / virtualization technology gives the guest OS a different amount of resources each second, the guest OS is not aware of it: the same amount of work units produces a different amount of time units in the stats, and a constant load then appears to fluctuate. We've already requested new statistical counters for work units in the kernel, but this hasn't been implemented yet.
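(A small illustration of that distortion, with invented numbers: the guest counts ticks, not work, so if the host grants it only half of a CPU, the same fixed workload shows up as twice as many busy ticks.)

    # Invented numbers, purely to illustrate the time-units vs. work-units problem:
    # a constant workload looks twice as "busy" when the host halves the effective
    # CPU speed, even though the amount of work done per second is unchanged.
    WORK_UNITS_PER_SECOND = 30          # hypothetical, constant workload

    for effective_speed in (1.0, 0.5):  # fraction of a full CPU granted by the host
        busy_ticks = WORK_UNITS_PER_SECOND / effective_speed
        idle_ticks = 100 - busy_ticks
        print(f"speed={effective_speed}: {busy_ticks:.0f}% busy, {idle_ticks:.0f}% idle")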
Jaromir,

Thanks for the info; I will keep my eye on it and see if I can catch anything else. Looking forward to being able to gather more detail when the new counters are implemented.
Hello Jeff.

These new counters are a long-term effort: the feature has been requested, but it usually takes a while to get such things into the upstream kernel. The question is whether you'd like me to switch the component to kernel now, or wait until you collect more information.

Regards,
Jaromir.
This message is a reminder that Fedora 16 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 16. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '16'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 16's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 16 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to click on "Clone This Bug" and open it against that version of Fedora.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
This message is a reminder that Fedora 17 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 17. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '17'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 17's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 17 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version prior to Fedora 17's end of life.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 17 changed to end-of-life (EOL) status on 2013-07-30. Fedora 17 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora, please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.