Bug 77652
| Summary: | ksoftirqd kernel load issue - cpu load saturated by ksoftirqd | | |
|---|---|---|---|
| Product: | [Retired] Red Hat Linux | Reporter: | jeffrey.buchsbaum |
| Component: | kernel | Assignee: | Arjan van de Ven <arjanv> |
| Status: | CLOSED DUPLICATE | QA Contact: | Brian Brock <bbrock> |
| Severity: | high | CC: | mharris |
| Priority: | high | Doc Type: | Bug Fix |
| Version: | 8.0 | Type: | --- |
| Hardware: | i686 | OS: | Linux |
| Target Milestone: | --- | Target Release: | --- |
| Last Closed: | 2006-02-21 18:50:07 UTC | | |
do you have the nvidia binary-only kernel modules loaded?

Yes, I have the nvidia drivers, but I built them from their source RPM.
jeff

ok, here's the problem: ksoftirqd uses CPU when you get a lot of interrupts; it seems (3D) screensavers trigger this for you, which is probably a bug in the binary-only nvidia driver. If you can see in /proc/interrupts that another device is causing interrupts AND you can reproduce this without the nvidia driver ever loaded, please reopen this bug.

*** This bug has been marked as a duplicate of 73733 ***

Here is a paste of my /proc/interrupts AFTER a reboot (this keyboard/mouse I/O was driving me nuts!).
Thanks. Perhaps NVIDIA will release their source to you someday????
jb
PASTE:
$ cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3
0: 151043 149473 153043 145432 IO-APIC-edge timer
1: 35 35 36 37 IO-APIC-edge keyboard
2: 0 0 0 0 XT-PIC cascade
3: 3 1 2 0 IO-APIC-edge serial
8: 1 0 0 0 IO-APIC-edge rtc
12: 161 107 154 166 IO-APIC-edge PS/2 Mouse
14: 7233 7322 7287 6891 IO-APIC-edge ide0
15: 9847 9487 9876 9510 IO-APIC-edge ide1
16: 29220 28370 29065 26273 IO-APIC-level nvidia
18: 12713 12637 12737 12474 IO-APIC-level SB Live
19: 114 84 118 106 IO-APIC-level aic7xxx, usb-uhci
20: 4 4 4 4 IO-APIC-level aic7xxx
23: 21556 20963 21382 21074 IO-APIC-level usb-uhci, eth0
NMI: 0 0 0 0
LOC: 598883 598486 598873 598881
ERR: 0
MIS: 0
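The developer's criterion above (is another device causing the interrupts?) comes down to reading which IRQ line accumulates the most counts. /proc/interrupts keeps one cumulative column per CPU, so summing across the columns gives a per-device total. A minimal shell/awk sketch, assuming the 2.4-style layout pasted in this report (a `CPU0 CPU1 ...` header, then `N: counts... type device[, device]` rows); the function name `irq_totals` is hypothetical:

```shell
# irq_totals (hypothetical helper): sum each numbered IRQ across all CPU
# columns of a /proc/interrupts snapshot and print the busiest lines first.
# Assumes the 2.4-era layout shown in this report; NMI/LOC/ERR/MIS rows are
# skipped because they are not device IRQs.
irq_totals() {
    awk '
        NR == 1 { ncpu = NF; next }      # header row: one field per CPU
        $1 ~ /^[0-9]+:$/ {               # numbered IRQ rows only
            total = 0
            for (i = 2; i <= ncpu + 1; i++) total += $i
            dev = ""                     # everything after the trigger type
            for (i = ncpu + 3; i <= NF; i++) dev = (dev == "" ? $i : dev " " $i)
            irq = $1
            sub(/:$/, "", irq)
            printf "%.0f\t%s\t%s\n", total, irq, dev
        }
    ' "${1:-/proc/interrupts}" | sort -rn
}
```

Run on the paste above, it ranks the timer first and nvidia (IRQ 16) second. Because the counters are cumulative since boot, a single snapshot only shows long-run totals, not what is happening at the moment the load is high.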
Very unlikely that they would do that.

A new version of the nvidia driver has the same problem, and other people report
this bug with ethernet drivers, etc.
So, I think the conclusion that it is the nvidia driver is not correct....
This is now happening on a daily basis... and is a big, big problem for me and
my work. It got MUCH worse after a recent updfstab run...... more USB devices were
brought into fstab (a flash card reader and a Zip 250, to be precise).
Pastes of /proc/interrupts:
A freshly rebooted system (9AM):
CPU0 CPU1 CPU2 CPU3
0: 218970 217810 215901 217999 IO-APIC-edge timer
1: 215 214 215 213 IO-APIC-edge keyboard
2: 0 0 0 0 XT-PIC cascade
8: 1 0 0 0 IO-APIC-edge rtc
12: 9975 10200 9906 10335 IO-APIC-edge PS/2 Mouse
14: 8070 8396 8386 8572 IO-APIC-edge ide0
15: 20151 19695 19402 19769 IO-APIC-edge ide1
16: 34475 33693 33058 33621 IO-APIC-level nvidia
18: 18857 18772 18662 18806 IO-APIC-level SB Live
19: 371 328 342 359 IO-APIC-level aic7xxx, usb-uhci
20: 4 4 4 4 IO-APIC-level aic7xxx
23: 32973 33308 32834 33029 IO-APIC-level usb-uhci, eth0
NMI: 0 0 0 0
LOC: 870392 870405 870399 870405
ERR: 0
MIS: 0
jb
The thing other people report is an orinoco_cs bug; we know about that. Please try this without the nvidia module loaded AT ALL. It appears your machine uses level-triggered interrupts for the IRQ nvidia uses... that normally requires a properly written driver.

Now, one hour later.... the load is 18.22+, and ksoftirqd_CPU(0-3) are at the top of "top".
New cat /proc/interrupts
~]$ cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3
0: 593823 603097 600104 593370 IO-APIC-edge timer
1: 395 385 390 385 IO-APIC-edge keyboard
2: 0 0 0 0 XT-PIC cascade
8: 1 0 0 0 IO-APIC-edge rtc
12: 17124 17104 16886 17210 IO-APIC-edge PS/2 Mouse
14: 105586 107111 106192 106488 IO-APIC-edge ide0
15: 43088 42864 42539 42595 IO-APIC-edge ide1
16: 97088 98183 98047 96355 IO-APIC-level nvidia
18: 53127 53994 53852 53255 IO-APIC-level SB Live
19: 371 328 342 359 IO-APIC-level aic7xxx, usb-uhci
20: 4 4 4 4 IO-APIC-level aic7xxx
23: 89931 92120 92063 90208 IO-APIC-level usb-uhci, eth0
NMI: 0 0 0 0
LOC: 2390236 2390249 2390243 2390248
ERR: 0
MIS: 0
[jbuchsba@coil ~]$
Please send me explicit email as to what and how I should proceed.....
Thanks.
jb
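The developer's remark about level-triggered interrupts can be checked from the same file: the trigger type is the column just before the device names, and a comma in the device list means several drivers share that line. A rough sketch (hypothetical helper `level_irqs`), assuming the 2.4-era `IO-APIC-level` naming shown in these pastes:

```shell
# level_irqs (hypothetical helper): list the level-triggered IRQs in a
# /proc/interrupts snapshot and mark the ones shared by more than one driver.
# Assumes the trigger type field ends in "-level" (e.g. IO-APIC-level).
level_irqs() {
    awk '
        $1 ~ /^[0-9]+:$/ {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /-level$/) {
                    dev = ""             # driver names follow the type field
                    for (j = i + 1; j <= NF; j++) dev = (dev == "" ? $j : dev " " $j)
                    irq = $1
                    sub(/:$/, "", irq)
                    printf "%s\t%s\t%s\n", irq, (dev ~ /,/ ? "shared" : "exclusive"), dev
                    break
                }
            }
        }
    ' "${1:-/proc/interrupts}"
}
```

On the 9AM snapshot it reports IRQs 16 (nvidia), 18 (SB Live) and 20 (aic7xxx) as exclusive and IRQs 19 and 23 as shared. A level-triggered line stays asserted until a handler services the device, which is why a driver that fails to acknowledge its interrupt can keep the CPU busy.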
OK, I logged in remotely, did "telinit 3" as root, did an "rmmod nvidia", and within 1 minute the load went from 17.00+ to 0.01. So, I stand corrected and the bug is with nvidia.... ugh.
Jeff
(*it looks like I'll be buying a new video card...)

Oddly, if I use WindowMaker instead of the KDE/GNOME that comes with Red Hat 8, all is well for two days now (by that point there is normally a 100% chance of a CPU load of 20). This problem did NOT exist in 7.3, only in 8.0. I have not tried Phoebe. OpenGL/nvidia is still loaded and is now fine.... SO, the problem seems to be in the lap of the RH8 GUI. Please comment.
BTW, if I used noapic, the load moved from ksoftirqd_CPUx to keventd/kjournald..... perhaps these (or KDE as shipped by RH) are broken.
TIA!
Jeff

Update on this after 2 days of uptime: NO HANG USING WINDOWMAKER... no other changes at all.... making me think that the bug is NOT with the nvidia drivers but with the Red Hat implementation of the GUI. I checked with friends, and Mandrake and SuSE (current versions) with KDE and GNOME do not hang on the same hardware..... so the issue is likely with the "Bluecurve" stuff....
Please advise on a time frame to examine this and to check if the other ksoftirqd issues on Bugzilla are due to the same issue.
Thanks.
Jeff

> Please advise on a time frame to examine this and to check if the other
> ksoftirqd issues on bugzilla are due to the same issue.

any reports with nvidia kernel modules are ignored. it's not worth my time to investigate interaction issues with this module we don't have code for.

*** This bug has been marked as a duplicate of 73733 ***

If this bug is JUST due to Nvidia, why does it ONLY occur with the Red Hat GNOME/KDE..... WindowMaker is just fine?
JB

WindowMaker might just use a subset of the driver's features. Really, please stop reopening this. Machines with the nvidia module loaded are not supported and we CAN'T fix the module.

*** This bug has been marked as a duplicate of 73733 ***

Ok... so you don't want me to reopen it if I use nvidia.... so I removed all the nvidia stuff and installed the XiG DX Platinum drivers (the best... way better than XFree86... sorry). Same problem. Freezes and problems in GNOME. NONE in WindowMaker. The problem is clearly with the code by Red Hat.... please re-open and look at the problem.
Jeff

ok so I need "cat /proc/interrupts" output taken about 2 seconds apart WHEN THE
PROBLEM IS HAPPENING.
In addition it'll be useful to enable kernel profiling ("nmi_watchdog=1
profile=1" on the kernel command line) and then run
readprofile -r
sleep 10
readprofile -m /boot/System.map | sort -n
to show a list of functions where the kernel spends its time.
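The procedure above can be wrapped in a small script so the same 10-second sample is taken every time. Later in this report a bare `readprofile` gives "command not found" even at a root prompt, while `/usr/sbin/readprofile` works, so this sketch calls it by full path. It assumes the kernel was booted with `profile=1` as requested and that it runs as root; `profile_kernel`, `READPROFILE`, `SYSTEM_MAP` and `SECONDS_TO_SAMPLE` are hypothetical names for illustration, not part of readprofile:

```shell
# profile_kernel (hypothetical helper): reset the kernel profile counters,
# wait, then dump per-function tick counts sorted so the hottest come last.
# Assumes a kernel booted with profile=1 and root privileges for the reset.
# READPROFILE, SYSTEM_MAP and SECONDS_TO_SAMPLE are illustrative overrides.
profile_kernel() {
    rp=${READPROFILE:-/usr/sbin/readprofile}   # /usr/sbin may not be on PATH
    map=${SYSTEM_MAP:-/boot/System.map}
    secs=${SECONDS_TO_SAMPLE:-10}
    if [ ! -x "$rp" ]; then
        echo "readprofile not found at $rp" >&2
        return 1
    fi
    "$rp" -r || return 1                       # zero the counters
    sleep "$secs"                              # let the problem accumulate ticks
    "$rp" -m "$map" | sort -n                  # ticks, function, normalized load
}
```

For example, run `profile_kernel > profile.txt` while the load is high. Sorting with `sort -n` puts the functions with the most ticks at the bottom, which matches the listings in this report.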
So, right away a crash in the "boxed" screensaver (just running the module
manually in demo mode does not trigger the crash... I tried for an hour).
With the flags set, the machine is now frozen solid (on the X11 console). I could
telnet in and found a poorly responsive, but alive, machine.
Running w gave:
11:56am up 4 days, 2:48, 2 users, load average: 13.48, 13.75, 13.87
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
jbuchsba :0 - 8:03am ? 0.00s ? -
jeffb pts/3 slab 11:06am 0.00s 0.07s 0.02s w
Running cat /proc/interrupts twice, about 2 seconds apart, gave:
[jeffb@coil jeffb]$ cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3
0: 45139083 45476391 43976219 45996759 IO-APIC-edge timer
1: 760 760 763 754 IO-APIC-edge keyboard
2: 0 0 0 0 XT-PIC cascade
8: 1 0 0 0 IO-APIC-edge rtc
12: 36206 36630 36252 36099 IO-APIC-edge PS/2 Mouse
14: 445461 278205 97800 513978 IO-APIC-edge ide0
15: 72197 246626 45307 249435 IO-APIC-edge ide1
18: 4130887 4161548 4024310 4209429 IO-APIC-level SB Live
19: 248791 315531 21520 417173 IO-APIC-level aic7xxx, usb-uhci
20: 4 4 4 4 IO-APIC-level aic7xxx
23: 6183082 6264429 5993566 6354479 IO-APIC-level usb-uhci, eth0
NMI: 180588150 180588150 180588150 180588150
LOC: 180603531 180603523 180603536 180603536
ERR: 0
MIS: 1
[jeffb@coil jeffb]$ cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3
0: 45140372 45476391 43976219 45997009 IO-APIC-edge timer
1: 760 760 763 754 IO-APIC-edge keyboard
2: 0 0 0 0 XT-PIC cascade
8: 1 0 0 0 IO-APIC-edge rtc
12: 36206 36630 36252 36099 IO-APIC-edge PS/2 Mouse
14: 445775 278205 97800 514276 IO-APIC-edge ide0
15: 72197 246626 45307 249435 IO-APIC-edge ide1
18: 4131007 4161548 4024310 4209450 IO-APIC-level SB Live
19: 249044 315531 21520 417220 IO-APIC-level aic7xxx, usb-uhci
20: 4 4 4 4 IO-APIC-level aic7xxx
23: 6183286 6264429 5993566 6354513 IO-APIC-level usb-uhci, eth0
NMI: 180589689 180589689 180589689 180589689
LOC: 180605070 180605063 180605076 180605075
ERR: 0
MIS: 1
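Because the counters are cumulative, the interesting number is the difference between the two snapshots just above. A minimal sketch (hypothetical `irq_delta`), assuming both files share the same CPU header and that the interval is the roughly 2 seconds described; the per-second column is only as accurate as that interval:

```shell
# irq_delta (hypothetical helper): per-row change between two
# /proc/interrupts snapshots, summed over all CPUs, largest first.
# Usage: irq_delta before.txt after.txt [seconds]   (seconds defaults to 2)
irq_delta() {
    awk -v secs="${3:-2}" '
        FNR == 1 { ncpu = NF; next }               # header row of each file
        {
            key = $1
            sub(/:$/, "", key)
            sum = 0
            for (i = 2; i <= ncpu + 1 && i <= NF; i++) sum += $i
            if (FNR == NR) { before[key] = sum; next }  # first file: remember
            if (key in before) {
                d = sum - before[key]
                desc = ""                          # trigger type and devices
                for (i = ncpu + 2; i <= NF; i++) desc = (desc == "" ? $i : desc " " $i)
                printf "%.0f\t%.1f/s\t%s\t%s\n", d, d / secs, key, desc
            }
        }
    ' "$1" "$2" | sort -rn
}
```

Applied to the two snapshots above, it ranks the LOC and NMI rows first (about 1540 per CPU each), then the timer (1539), ide0 (612), IRQ 19 (300), IRQ 23 (238) and SB Live (141).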
I am not sure what I need to do to get the above readprofile to work.... I could
not find readprofile via "which" as root or as a user... is that a boot command?
(no, I am not a GRUB guru... :-) ) Please let me know what else I can do to
help gather information... the machine is crashed right now (always < 2 hours
after loading GNOME).
jeff
Clarification: the system has no NVIDIA software on board now, just the XiG Platinum DX drivers... so XFree86 is all changed... same bug...... making it more likely to be a GNOME issue. The graphics card was swapped out as well... the Quadro4 900 XGL was changed to an ATI FireGL 8800. Of note, the bug again is absent in WindowMaker.
JB

Per the request.
Jeff
[root@coil /]# cd /usr/sbin
[root@coil sbin]# ./readprofile -r
[root@coil sbin]# sleep 10
[root@coil sbin]# readprofile -m /boot/System.map | sort -n
bash: readprofile: command not found
[root@coil sbin]# ./readprofile -m /boot/System.map | sort -n
1 add_blkdev_randomness 0.0104
1 add_timer 0.0104
1 __alloc_pages 0.0014
1 atomic_dec_and_lock 0.0145
1 __block_commit_write 0.0048
1 buffer_insert_inode_queue 0.0104
1 call_reschedule_interrupt 0.0909
1 clear_page_tables 0.0104
1 collect_signal 0.0039
1 __constant_memcpy 0.0037
1 copy_strings 0.0016
1 del_timer 0.0104
1 disk_round_stats 0.0156
1 do_gettimeofday 0.0078
1 do_mmap_pgoff 0.0006
1 do_select 0.0017
1 emit_log_char 0.0089
1 end_buffer_io_sync 0.0208
1 eth_type_trans 0.0052
1 exit_mmap 0.0027
1 fd_install 0.0125
1 fib_lookup 0.0031
1 filemap_fdatawait 0.0042
1 file_read_actor 0.0039
1 __find_lock_page_helper 0.0057
1 find_snap_client 0.0125
1 fn_hash_lookup 0.0037
1 __generic_copy_from_user 0.0089
1 generic_plug_device 0.0089
1 generic_unplug_device 0.0125
1 get_vm_area 0.0045
1 __global_save_flags 0.0104
1 handle_IRQ_event 0.0063
1 handle_mm_fault 0.0030
1 handle_stop_signal 0.0063
1 idle_cpu 0.0312
1 internal_add_timer 0.0057
1 interruptible_sleep_on 0.0078
1 ip_route_input_slow 0.0004
1 IRQ0x17_interrupt 0.0833
1 kmap_high 0.0104
1 kstat_read_proc 0.0009
1 locate_hd_struct 0.0069
1 lock_vma_mappings 0.0208
1 may_open 0.0031
1 new_inode 0.0089
1 page_cache_read 0.0039
1 proc_info_read 0.0033
1 proc_pid_lookup 0.0019
1 proc_pid_statm 0.0023
1 __read_lock_failed 0.0500
1 release_console_sem 0.0057
1 remove_wait_queue 0.0312
1 run_timer_list 0.0025
1 send_sig_info 0.0045
1 setup_frame 0.0019
1 setup_sigcontext 0.0033
1 smp_send_reschedule 0.0156
1 sockfd_lookup 0.0078
1 submit_bh 0.0078
1 supplemental_group_member 0.0156
1 sys_select 0.0008
1 sys_sigreturn 0.0035
1 task_dumpable 0.0208
1 tcp_transmit_skb 0.0009
1 .text.lock.acct 0.0085
1 .text.lock.ioctl 0.0256
1 .text.lock.printk 0.0044
1 .text.lock.readdir 0.0097
1 tty_read 0.0031
1 unix_dgram_recvmsg 0.0028
1 unix_write_space 0.0069
1 update_wall_time_one_tick 0.0057
1 vfs_permission 0.0031
1 write_profile 0.0063
2 account_io_end 0.0250
2 batch_entropy_store 0.0114
2 blkdev_release_request 0.0179
2 __block_prepare_write 0.0024
2 __constant_c_and_count_memset 0.0125
2 __constant_memcpy 0.0074
2 do_anonymous_page 0.0054
2 do_check_pgt_cache 0.0096
2 do_page_fault 0.0016
2 do_zap_page_range 0.0052
2 d_rehash 0.0179
2 dup_mmap 0.0038
2 end_level_ioapic_irq 0.0057
2 fget 0.0312
2 __find_get_page 0.0250
2 fput 0.0063
2 __free_pages_ok 0.0024
2 get_gendisk 0.0312
2 get_unused_fd 0.0048
2 ide_do_request 0.0042
2 ide_error 0.0043
2 ide_set_handler 0.0125
2 inode_has_buffers 0.0312
2 iput 0.0028
2 IRQ0x13_interrupt 0.1667
2 kfree 0.0104
2 kmem_cache_free 0.0139
2 kunmap_high 0.0156
2 load_balance 0.0021
2 __make_request 0.0012
2 page_add_rmap 0.0125
2 rmqueue 0.0027
2 __switch_to 0.0078
2 sys_fsync 0.0096
2 .text.lock.locks 0.0104
2 unlock_page 0.0179
2 update_one_process 0.0069
2 __wait_on_buffer 0.0125
2 zap_pte_range 0.0039
3 __brelse 0.0938
3 do_IRQ 0.0099
3 do_signal 0.0044
3 do_syslog 0.0032
3 __free_pages 0.0938
3 generic_file_write 0.0014
3 mark_page_accessed 0.0208
3 page_remove_rmap 0.0134
3 pool_find_page 0.0375
3 real_lookup 0.0094
3 refile_buffer 0.0625
3 strnlen_user 0.0441
3 switch_mm 0.0093
3 __tasklet_hi_schedule 0.0312
3 vsnprintf 0.0027
4 copy_page_range 0.0081
4 d_alloc 0.0100
4 dput 0.0096
4 ide_intr 0.0100
4 IRQ0x0e_interrupt 0.3333
4 pte_chain_free 0.0357
5 ide_end_request 0.0240
5 link_path_walk 0.0026
5 proc_lookup 0.0223
5 proc_pid_stat 0.0047
5 try_to_wake_up 0.0116
6 bh_action 0.0469
6 __kmem_cache_alloc 0.0197
6 pci_pool_alloc 0.0156
6 reschedule_interrupt 0.2857
6 .text.lock.inode 0.0123
7 d_lookup 0.0230
7 .text.lock.namei 0.0059
8 page_fault 0.6667
8 __wake_up 0.0625
9 pci_pool_free 0.0331
9 set_ioapic_affinity 0.0511
10 pte_chain_alloc 0.1042
11 smp_apic_timer_interrupt 0.0491
11 start_request 0.0181
12 get_hash_table 0.0833
12 unlock_buffer 0.1500
15 .text.lock.sched 0.0285
17 number 0.0197
17 scheduler_tick 0.0236
19 ide_dma_intr 0.0913
29 ide_wait_stat 0.0954
35 invalidate_bdev 0.0875
39 apic_timer_interrupt 1.6250
55 ide_dmaproc 0.0637
95 do_rw_disk 0.0565
133 statm_pte_range 0.4030
213 ret_from_sys_call 12.5294
350 ksoftirqd 1.2153
611 restore_all 40.7333
3090 deliver_to_old_ones 14.8558
3955 schedule 5.2593
4212 sys_sched_yield 13.1625
7421 do_softirq 33.1295
9534 system_call 170.2500
9981 tasklet_hi_action 62.3813
18811 default_idle 235.1375
20767 __rdtsc_delay 648.9688
79780 total 0.0549
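To read a listing like this one, it helps to see each function's share of the `total` line. A small sketch (hypothetical `profile_top`) that reads readprofile output on stdin; it adds up entries that share a name (such as the two `__constant_memcpy` lines above), which is a simplifying assumption:

```shell
# profile_top (hypothetical helper): top N functions by profiling ticks from
# readprofile output ("ticks function normalized_load"), with each one's
# percentage of the "total" line. Same-named entries are summed.
profile_top() {
    n=${1:-10}
    awk '
        $2 == "total" { total = $1; next }   # summary line
        NF >= 2 { ticks[$2] += $1 }
        END {
            for (f in ticks)
                printf "%d\t%.1f%%\t%s\n", ticks[f], (total ? 100 * ticks[f] / total : 0), f
        }
    ' | sort -rn | head -n "$n"
}
```

For example, `./readprofile -m /boot/System.map | profile_top 5`. On the listing above, __rdtsc_delay accounts for about 26.0% of the ticks, default_idle about 23.6% and tasklet_hi_action about 12.5%.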
Just for completeness, without the crash/high load... while logged into
WindowMaker... I get:
[root@coil sbin]# ./readprofile -r
[root@coil sbin]# sleep 10
[root@coil sbin]# ./readprofile -m /boot/System.map | sort -n
1 alloc_skb 0.0021
1 __constant_c_and_count_memset 0.0063
1 d_alloc 0.0025
1 d_instantiate 0.0104
1 do_wp_page 0.0012
1 follow_page 0.0078
1 fput 0.0031
1 get_empty_filp 0.0031
1 IRQ0x12_interrupt 0.0833
1 IRQ0x17_interrupt 0.0833
1 kstat_read_proc 0.0009
1 link_path_walk 0.0005
1 __mark_inode_dirty 0.0052
1 poll_freewait 0.0125
1 proc_lookup 0.0045
1 pty_chars_in_buffer 0.0125
1 pty_unthrottle 0.0089
1 reschedule_interrupt 0.0476
1 rmqueue 0.0014
1 set_ioapic_affinity 0.0057
1 sock_def_readable 0.0069
1 statm_pgd_range 0.0052
1 sys_close 0.0078
1 sys_select 0.0008
1 system_call 0.0179
1 udp_v4_mcast_deliver 0.0023
1 unix_ioctl 0.0048
1 vsnprintf 0.0009
1 __wake_up 0.0078
2 d_lookup 0.0066
2 load_balance 0.0021
2 netif_receive_skb 0.0037
2 proc_pid_make_inode 0.0104
2 write_profile 0.0125
3 atomic_dec_and_lock 0.0435
3 collect_sigign_sigcatch 0.0234
3 __kmem_cache_alloc 0.0099
3 proc_pid_statm 0.0069
4 number 0.0046
4 smp_apic_timer_interrupt 0.0179
6 fget 0.0938
7 proc_pid_stat 0.0066
9 scheduler_tick 0.0125
18 apic_timer_interrupt 0.7500
38 statm_pte_range 0.1152
60970 default_idle 762.1250
61107 total 0.0421
[root@coil sbin]#
[jbuchsba@coil sbin]$ w
5:10pm up 2:37, 2 users, load average: 0.06, 0.08, 0.03
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
jbuchsba :0 - 5:07pm ? 0.00s ? -
jbuchsba pts/1 - 5:07pm 3:39 1.37s 1.37s top
[jbuchsba@coil sbin]$
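The two dumps above (high load in GNOME versus idle WindowMaker) were both taken over a 10-second window, so their raw tick counts can be compared directly. A sketch (hypothetical `profile_compare`) that joins two saved readprofile dumps by function name and sorts by how many more ticks each function collected in the first one:

```shell
# profile_compare (hypothetical helper): compare two readprofile dumps taken
# over the same sampling window. Prints "difference  ticks_in_first
# ticks_in_second  function", largest increase first.
profile_compare() {
    awk '
        $2 == "total" { next }                   # skip the summary lines
        FNR == NR { a[$2] += $1; next }          # first dump
        { b[$2] += $1 }                          # second dump
        END {
            for (f in a) seen[f] = 1
            for (f in b) seen[f] = 1
            for (f in seen)                      # missing entries count as 0
                printf "%d\t%d\t%d\t%s\n", a[f] - b[f], a[f], b[f], f
        }
    ' "$1" "$2" | sort -rn
}
```

With the GNOME dump as the first file, __rdtsc_delay (+20767), tasklet_hi_action (+9981), system_call (+9533) and do_softirq (+7421) head the list, while default_idle sinks to the bottom.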
Update....
Upgraded to the current kernel for 8.0.
Glibc 2.3.2x was updated/installed.
Same thing...
[root@coil sbin]# ./bugzillascript
1 account_io_start 0.0104
1 batch_entropy_store 0.0057
1 __brelse 0.0312
1 call_do_IRQ 0.0769
1 __constant_c_and_count_memset 0.0063
1 __constant_memcpy 0.0037
1 copy_page_range 0.0020
1 d_lookup 0.0033
1 do_fcntl 0.0013
1 do_no_page 0.0015
1 do_select 0.0017
1 fget 0.0156
1 filp_close 0.0048
1 __generic_copy_to_user 0.0125
1 generic_file_write 0.0005
1 get_gendisk 0.0156
1 handle_mm_fault 0.0030
1 ide_destroy_dmatable 0.0208
1 inode_has_buffers 0.0156
1 ip_check_mc 0.0156
1 kfree 0.0052
1 __kmem_cache_alloc 0.0033
1 mark_page_accessed 0.0069
1 new_inode 0.0089
1 number 0.0012
1 page_fault 0.0833
1 proc_file_lseek 0.0048
1 proc_lookup 0.0045
1 proc_pid_cmdline 0.0037
1 prune_dcache 0.0019
1 reschedule_interrupt 0.0476
1 set_page_dirty 0.0078
1 skb_release_data 0.0069
1 sleep_on 0.0078
1 switch_mm 0.0031
1 sys_rt_sigprocmask 0.0022
1 __tasklet_hi_schedule 0.0104
1 .text.lock.namei 0.0008
1 .text.lock.socket 0.0044
2 do_IRQ 0.0066
2 get_hash_table 0.0139
2 kunmap_high 0.0156
2 load_balance 0.0021
2 pte_chain_free 0.0179
2 schedule 0.0027
2 set_ioapic_affinity 0.0114
2 statm_pgd_range 0.0104
2 system_call 0.0357
2 .text.lock.inode 0.0041
2 write_profile 0.0125
3 restore_all 0.2000
3 smp_apic_timer_interrupt 0.0134
3 __wake_up 0.0234
4 ide_wait_stat 0.0132
4 kmap_high 0.0417
4 pte_chain_alloc 0.0417
4 .text.lock.ioctl 0.1026
4 unlock_buffer 0.0500
5 bh_action 0.0391
5 start_request 0.0082
6 apic_timer_interrupt 0.2500
7 ide_dmaproc 0.0081
7 scheduler_tick 0.0097
19 do_rw_disk 0.0113
19 statm_pte_range 0.0576
23 ksoftirqd 0.0799
1098 do_softirq 4.9018
3050 tasklet_hi_action 19.0625
3602 .text.lock.dev 8.8938
3889 deliver_to_old_ones 18.6971
4180 __rdtsc_delay 130.6250
7861 default_idle 98.2625
23861 total
Please respond/post ideas about this... as I am losing a lot of work because of
this and might have to flee Red Hat altogether. My Mandrake box at home has none
of this... same setup, obviously different in software...
I really want to support RHL and just paid for additional support
yesterday... but I really think this is a big deal.....
jb
Addendum:
Logging out of gnome and logging in (to a "crashed" state) makes CPU load go to
near 0....here is the proc file:
[root@coil sbin]# ./bugzillascript
1 copy_page_range 0.0020
1 d_alloc 0.0025
1 __generic_copy_to_user 0.0125
1 get_user_pages 0.0020
1 ip_route_input 0.0020
1 iput 0.0014
1 IRQ0x10_interrupt 0.0833
1 kfree 0.0052
1 __kmem_cache_alloc 0.0033
1 kmem_cache_free 0.0069
1 link_path_walk 0.0005
1 netif_rx 0.0021
1 new_inode 0.0089
1 set_ioapic_affinity 0.0057
1 smp_apic_timer_interrupt 0.0045
1 sock_ioctl 0.0078
1 sys_write 0.0031
1 write_profile 0.0063
2 proc_pid_stat 0.0019
3 atomic_dec_and_lock 0.0435
4 scheduler_tick 0.0056
7 apic_timer_interrupt 0.2917
15 statm_pte_range 0.0455
20436 default_idle 255.4500
20485 total 0.0141
Hope this helps... it definitely is NOT video card/driver related... it is GNOME
related, in the Red Hat modification.... GNOME on other Linux distributions does not do
this (same hardware). Mandrake 9 is at home.....
Addendum 2:
After logging into WindowMaker, if I log out and log into Gnome....it half opens
(RH menu fails to work, two terms to work, no desktop icons come up....most of
the menu bar is missing....)
Here is the procfile:
[root@coil sbin]# ./bugzillascript
1 add_timer 0.0104
1 atomic_dec_and_lock 0.0145
1 bh_action 0.0078
1 blkdev_release_request 0.0089
1 __brelse 0.0312
1 __constant_c_and_count_memset 0.0063
1 do_munmap 0.0013
1 do_no_page 0.0015
1 do_page_fault 0.0008
1 do_readv_writev 0.0014
1 do_syslog 0.0011
1 do_zap_page_range 0.0026
1 fget 0.0156
1 __find_get_page 0.0125
1 __find_lock_page 0.0208
1 flush_signal_handlers 0.0125
1 free_one_pmd 0.0048
1 __free_pte 0.0089
1 generic_plug_device 0.0089
1 get_empty_filp 0.0031
1 get_unmapped_area 0.0033
1 get_unused_buffer_head 0.0057
1 ide_do_request 0.0021
1 IRQ0x0e_interrupt 0.0833
1 link_path_walk 0.0005
1 lru_cache_add 0.0057
1 __make_request 0.0006
1 move 0.0069
1 neigh_lookup 0.0045
1 page_add_rmap 0.0063
1 page_remove_rmap 0.0045
1 path_release 0.0156
1 pte_chain_free 0.0089
1 remove_wait_queue 0.0312
1 ret_from_sys_call 0.0588
1 skb_recv_datagram 0.0042
1 strncpy_from_user 0.0089
1 submit_bh 0.0078
1 sys_read 0.0031
1 sys_setsid 0.0078
1 tcp_v4_init_sock 0.0042
1 .text.lock.locks 0.0052
1 try_to_wake_up 0.0023
1 unix_write_space 0.0069
1 vsnprintf 0.0009
1 wake_up_forked_process 0.0033
1 write_profile 0.0063
1 zap_pte_range 0.0019
2 __constant_memcpy 0.0074
2 do_sigaction 0.0057
2 find_vma 0.0208
2 fsync_buffers_list 0.0036
2 __generic_copy_to_user 0.0250
2 generic_file_write 0.0009
2 kunmap_high 0.0156
2 netif_receive_skb 0.0037
2 pte_chain_alloc 0.0208
2 schedule 0.0027
2 update_one_process 0.0069
3 apic_timer_interrupt 0.1250
3 del_timer 0.0312
3 handle_IRQ_event 0.0187
4 d_lookup 0.0132
4 do_anonymous_page 0.0109
4 file_read_actor 0.0156
4 system_call 0.0714
4 unlock_buffer 0.0500
5 get_hash_table 0.0347
5 start_request 0.0082
6 ide_dmaproc 0.0069
7 __constant_c_and_count_memset 0.0437
7 ide_wait_stat 0.0230
8 page_fault 0.6667
16 do_rw_disk 0.0095
22 ksoftirqd 0.0764
914 do_softirq 4.0804
2200 tasklet_hi_action 13.7500
2790 .text.lock.dev 6.8889
3044 deliver_to_old_ones 14.6346
3127 __rdtsc_delay 97.7188
7913 default_idle 98.9125
20163 total 0.0139
That is "it" for me today....hope this data helps you guys figure this one out...
Jeff
Ok, I lied.... I did a net search on __rdtsc_delay and noticed it had to do with audio. I also noticed that an applet called CDPlayer 2.01 was on my menu in GNOME and not in WindowMaker.... it was installed in 7.3 and all was well then. Well, I removed this applet from GNOME 5 hours ago and no crash.... Please try this at the RH office... on an SMP Xeon machine if you have one..... Anyway, I will let it run over the weekend and see what happens.... this might be the culprit....
jb

The crash is back.... no luck.... it is a kernel thing, random, and bad. Odd that NO information is coming out of RH..... is this fixed in 9.0?
jb

could you try to rename the esd binary so that it doesn't auto-start? sometimes it seems esd is causing very bad behavior

Ok, rebooting after changing esd's name. Following the esd bug led me back to Bugzilla, and your kernels..... any chance they would fix this? i.e.: http://people.redhat.com/arjanv/testkernels/i686/*smp*
Thanks.
Jeff
PS: I have the latest kernel..... as of 3/24/03.
jb

Well, for other reasons, I decided to put KDE 3.1 on my box via apt-get (the RPM out there....). No problem getting KDE. Funny, the whole of my box stopped crashing, even in GNOME.... so, I have no idea what was wrong (esd was not at fault.....), but it seems to have been gone for 5 days now. Perhaps RH9 with KDE 3.1 (right?) will have my issue fixed by default... I would leave this bug as solved via the KDE upgrade, with no known direct cause.
Jeff

>System has no NVIDIA software on board now. Just XiG platinum dx drivers....so
>x86free is all changed...same bug......making in more likely to be a gnome
>issue.

We don't support _ANY_ 3rd party drivers. We support only the drivers which we ship with XFree86. As has been stated several times, this issue is not something we will support in any way, as we do not support 3rd party kernel modules or XFree86 drivers.
Closing (for the 4th or so time) as a duplicate of bug #73733.

*** This bug has been marked as a duplicate of 73733 ***

Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.
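The esd suggestion in this thread (rename the binary so it cannot auto-start) is easy to make reversible. A minimal sketch with hypothetical helper names; it assumes esd is installed as /usr/bin/esd, which should be checked with `command -v esd` first:

```shell
# disable_binary / enable_binary (hypothetical helpers): rename a program so
# session start-up cannot launch it, and restore it later.
# /usr/bin/esd is an assumed default path, not confirmed by this report.
disable_binary() {
    bin=${1:-/usr/bin/esd}
    [ -e "$bin" ] || { echo "$bin not found" >&2; return 1; }
    mv "$bin" "$bin.disabled"
}

enable_binary() {
    bin=${1:-/usr/bin/esd}
    [ -e "$bin.disabled" ] || { echo "$bin.disabled not found" >&2; return 1; }
    mv "$bin.disabled" "$bin"
}
```

Run as root, then log out and back in so the session starts without esd; `enable_binary` undoes the change.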
From Bugzilla Helper:
User-Agent: Mozilla/5.0 Galeon/1.2.6 (X11; Linux i686; U;) Gecko/20020830

Description of problem:
ksoftirqd takes over the whole machine, filling up each CPU with load.... It happens after being idle overnight, and can only be fixed with a reboot. It can skip nights. The machine is a dual Xeon 2.4, 4 GB RAM, dual 120 GB disks, NVIDIA Quadro4 900 XGL video system. Here is a paste from "top":

[jbuchsba@coil ~]$ top
9:00am up 4 days, 14:44, 1 user, load average: 17.51, 19.60, 19.82
207 processes: 192 sleeping, 2 running, 13 zombie, 0 stopped
CPU0 states: 0.3% user, 56.2% system, 0.0% nice, 43.0% idle
CPU1 states: 0.5% user, 54.1% system, 0.0% nice, 44.5% idle
CPU2 states: 0.2% user, 68.0% system, 0.0% nice, 31.3% idle
CPU3 states: 0.3% user, 71.2% system, 0.0% nice, 27.5% idle
Mem: 3356868K av, 2139256K used, 1217612K free, 0K shrd, 295188K buff
Swap: 2097136K av, 0K used, 2097136K free 1423700K cached
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
9 root 39 19 0 0 0 SWN 34.8 0.0 622:11 ksoftirqd_CPU2
10 root 39 19 0 0 0 SWN 34.6 0.0 625:48 ksoftirqd_CPU3
10158 jbuchsba 15 0 17104 15M 12496 S 1.3 0.4 75:25 gnomemeeting
24 root 15 0 0 0 0 SW 1.1 0.0 39:13 kjournald
2376 root 6 -10 305M 48M 10592 S< 1.1 1.4 64:17 X
28321 jbuchsba 16 0 9728 9724 7120 S 1.1 0.2 2:10 gnome-terminal
848 root 15 0 540 540 460 D 1.0 0.0 25:54 syslogd
31086 jbuchsba 15 0 1152 1152 784 R 1.0 0.0 0:45 top
6927 jbuchsba 15 0 5952 5948 5016 D 0.5 0.1 33:20 magicdev
6993 jbuchsba 15 0 7908 7904 6464 S 0.3 0.2 8:35 multiload-apple
1515 root 15 0 8920 8920 8780 S 0.1 0.2 0:09 httpd
6925 jbuchsba 15 0 12172 11M 8820 S 0.1 0.3 4:16 gnome-panel
1 root 15 0 476 476 424 S 0.0 0.0 0:07 init
2 root 0K 0 0 0 0 SW 0.0 0.0 0:00 migration_CPU0
3 root 0K 0 0 0 0 SW 0.0 0.0 0:00 migration_CPU1
4 root 0K 0 0 0 0 SW 0.0 0.0 0:00 migration_CPU2
5 root 0K 0 0 0 0 SW 0.0 0.0 0:00 migration_CPU3

Version-Release number of selected component (if applicable):
2.4.18-17.8.0smp #1 SMP Tue Oct 8 12:39:01 EDT 2002 i686 i686 i386 GNU/Linux

How reproducible:
Always

Steps to Reproduce:
1. Boot machine.
2. Leave overnight.
3. See screensavers pause on return the next day... like a hiccup.

Actual Results:
Slow machine with jerky I/O.

Expected Results:
Super fast machine.

Additional info:
This is a bug seen on kernel bug mailing lists. RH, you need to fix this and put a new kernel out! I also have a huge memory leak.... not sure if it is gnome-terminal or something else.....
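The `top` paste above shows two ksoftirqd threads at about 35% each. A quick way to total them from a script is to feed `ps -eo comm=,pcpu=` (command name and %CPU, no header) into a filter. A sketch with a hypothetical function name; note that procps `ps` reports %CPU averaged over each process's lifetime, unlike the instantaneous figure `top` shows, so the numbers will differ:

```shell
# ksoftirqd_load (hypothetical helper): count the ksoftirqd threads in
# "command %cpu" lines and total their %CPU. Matches both the 2.4 names
# (ksoftirqd_CPU0) and later kernels' names (ksoftirqd/0).
ksoftirqd_load() {
    awk '$1 ~ /^ksoftirqd/ { n++; sum += $NF }
         END { printf "%d threads, %.1f%% CPU\n", n, sum }'
}
```

Usage: `ps -eo comm=,pcpu= | ksoftirqd_load`. Fed the two ksoftirqd rows from the paste (as `ksoftirqd_CPU2 34.8` and `ksoftirqd_CPU3 34.6`), it prints `2 threads, 69.4% CPU`.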