From Bugzilla Helper:
User-Agent: Mozilla/5.0 Galeon/1.2.6 (X11; Linux i686; U;) Gecko/20020830

Description of problem:
ksoftirqd takes over the whole machine, filling up each CPU with load. It happens after the machine has been idle overnight, and can only be fixed with a reboot. It can skip nights. Machine is a dual Xeon 2.4, 4 GB RAM, dual 120 GB disks, NVIDIA Quadro4 900 XGL video system.

Here is a paste from "top":

[jbuchsba@coil ~]$ top
  9:00am  up 4 days, 14:44,  1 user,  load average: 17.51, 19.60, 19.82
207 processes: 192 sleeping, 2 running, 13 zombie, 0 stopped
CPU0 states:  0.3% user, 56.2% system,  0.0% nice, 43.0% idle
CPU1 states:  0.5% user, 54.1% system,  0.0% nice, 44.5% idle
CPU2 states:  0.2% user, 68.0% system,  0.0% nice, 31.3% idle
CPU3 states:  0.3% user, 71.2% system,  0.0% nice, 27.5% idle
Mem:  3356868K av, 2139256K used, 1217612K free,       0K shrd,  295188K buff
Swap: 2097136K av,       0K used, 2097136K free                 1423700K cached

  PID USER     PRI  NI  SIZE   RSS SHARE STAT %CPU %MEM   TIME COMMAND
    9 root      39  19     0     0     0 SWN  34.8  0.0 622:11 ksoftirqd_CPU2
   10 root      39  19     0     0     0 SWN  34.6  0.0 625:48 ksoftirqd_CPU3
10158 jbuchsba  15   0 17104   15M 12496 S     1.3  0.4  75:25 gnomemeeting
   24 root      15   0     0     0     0 SW    1.1  0.0  39:13 kjournald
 2376 root       6 -10  305M   48M 10592 S <   1.1  1.4  64:17 X
28321 jbuchsba  16   0  9728  9724  7120 S     1.1  0.2   2:10 gnome-terminal
  848 root      15   0   540   540   460 D     1.0  0.0  25:54 syslogd
31086 jbuchsba  15   0  1152  1152   784 R     1.0  0.0   0:45 top
 6927 jbuchsba  15   0  5952  5948  5016 D     0.5  0.1  33:20 magicdev
 6993 jbuchsba  15   0  7908  7904  6464 S     0.3  0.2   8:35 multiload-apple
 1515 root      15   0  8920  8920  8780 S     0.1  0.2   0:09 httpd
 6925 jbuchsba  15   0 12172   11M  8820 S     0.1  0.3   4:16 gnome-panel
    1 root      15   0   476   476   424 S     0.0  0.0   0:07 init
    2 root      0K   0     0     0     0 SW    0.0  0.0   0:00 migration_CPU0
    3 root      0K   0     0     0     0 SW    0.0  0.0   0:00 migration_CPU1
    4 root      0K   0     0     0     0 SW    0.0  0.0   0:00 migration_CPU2
    5 root      0K   0     0     0     0 SW    0.0  0.0   0:00 migration_CPU3

Version-Release number of selected component (if applicable):
2.4.18-17.8.0smp #1 SMP Tue Oct 8 12:39:01 EDT 2002 i686 i686 i386 GNU/Linux

How reproducible:
Always

Steps to Reproduce:
1. Boot machine.
2. Leave overnight.
3. See screensavers pause on return the next day...like a hiccup.

Actual Results: Slow machine with jerky I/O.

Expected Results: Super fast machine.

Additional info:
This is a bug seen on kernel bug mailing lists. RH, you need to fix this and put a new kernel out! I also have a huge memory leak....not sure if Gnome terminal or other device.....
do you have the nvidia binary only kernel modules loaded?
Yes, I have the nvidia drivers, but I built them from their rpm src file. jeff
OK, here's the problem: ksoftirqd uses CPU when you get a lot of interrupts. It seems the (3D) screensavers trigger this for you, which is probably a bug in the binary-only nvidia driver. If you can see in /proc/interrupts that another device is causing interrupts AND you can reproduce this without the nvidia driver ever loaded, please reopen this bug.

*** This bug has been marked as a duplicate of 73733 ***
Here is a paste of my /proc/interrupts AFTER a reboot (this keyboard/mouse I/O was driving me nuts!). Thanks. Perhaps NVIDIA will release their source to you someday???? jb

PASTE:

$ cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
  0:     151043     149473     153043     145432    IO-APIC-edge  timer
  1:         35         35         36         37    IO-APIC-edge  keyboard
  2:          0          0          0          0          XT-PIC  cascade
  3:          3          1          2          0    IO-APIC-edge  serial
  8:          1          0          0          0    IO-APIC-edge  rtc
 12:        161        107        154        166    IO-APIC-edge  PS/2 Mouse
 14:       7233       7322       7287       6891    IO-APIC-edge  ide0
 15:       9847       9487       9876       9510    IO-APIC-edge  ide1
 16:      29220      28370      29065      26273   IO-APIC-level  nvidia
 18:      12713      12637      12737      12474   IO-APIC-level  SB Live
 19:        114         84        118        106   IO-APIC-level  aic7xxx, usb-uhci
 20:          4          4          4          4   IO-APIC-level  aic7xxx
 23:      21556      20963      21382      21074   IO-APIC-level  usb-uhci, eth0
NMI:          0          0          0          0
LOC:     598883     598486     598873     598881
ERR:          0
MIS:          0
Very unlikely that they would do that.
A new version of the nvidia driver has the same problem, and other people report this bug with ethernet drivers, etc. So, I think the conclusion that it is the nvidia driver is not correct....

This is now happening on a daily basis...and is a big, big problem for me and my work. It got MUCH worse with a recent updfstab......more usb devices were brought into fstab (flash card reader and zip 250 to be precise).

Pastes of /proc/interrupts:

A fresh rebooted system (9AM):

           CPU0       CPU1       CPU2       CPU3
  0:     218970     217810     215901     217999    IO-APIC-edge  timer
  1:        215        214        215        213    IO-APIC-edge  keyboard
  2:          0          0          0          0          XT-PIC  cascade
  8:          1          0          0          0    IO-APIC-edge  rtc
 12:       9975      10200       9906      10335    IO-APIC-edge  PS/2 Mouse
 14:       8070       8396       8386       8572    IO-APIC-edge  ide0
 15:      20151      19695      19402      19769    IO-APIC-edge  ide1
 16:      34475      33693      33058      33621   IO-APIC-level  nvidia
 18:      18857      18772      18662      18806   IO-APIC-level  SB Live
 19:        371        328        342        359   IO-APIC-level  aic7xxx, usb-uhci
 20:          4          4          4          4   IO-APIC-level  aic7xxx
 23:      32973      33308      32834      33029   IO-APIC-level  usb-uhci, eth0
NMI:          0          0          0          0
LOC:     870392     870405     870399     870405
ERR:          0
MIS:          0

jb
The thing other people report is an orinoco_cs bug; we know about that. Please try this without the nvidia module loaded AT ALL. It appears your machine uses level interrupts for the irq nvidia uses... that normally requires a validly written driver.
Now, one hour later....load is 18.22+, and ksoftirqd_CPU(0-3) are at the top of "top". New cat /proc/interrupts:

[jbuchsba@coil ~]$ cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
  0:     593823     603097     600104     593370    IO-APIC-edge  timer
  1:        395        385        390        385    IO-APIC-edge  keyboard
  2:          0          0          0          0          XT-PIC  cascade
  8:          1          0          0          0    IO-APIC-edge  rtc
 12:      17124      17104      16886      17210    IO-APIC-edge  PS/2 Mouse
 14:     105586     107111     106192     106488    IO-APIC-edge  ide0
 15:      43088      42864      42539      42595    IO-APIC-edge  ide1
 16:      97088      98183      98047      96355   IO-APIC-level  nvidia
 18:      53127      53994      53852      53255   IO-APIC-level  SB Live
 19:        371        328        342        359   IO-APIC-level  aic7xxx, usb-uhci
 20:          4          4          4          4   IO-APIC-level  aic7xxx
 23:      89931      92120      92063      90208   IO-APIC-level  usb-uhci, eth0
NMI:          0          0          0          0
LOC:    2390236    2390249    2390243    2390248
ERR:          0
MIS:          0
[jbuchsba@coil ~]$

Please send me explicit email as to what I should do and how I should proceed..... Thanks. jb
OK, I logged in remotely, did telinit 3 as root, did an rmmod nvidia, and within 1 minute load was down to 0.01 from 17.00+. So, I stand corrected and the bug is with nvidia.... ug. Jeff (*it looks like I'll be buying a new video card...)
Oddly, if I use windowmaker instead of the kde/gnome that comes with RedHat 8, all is well...times two days (100% chance of cpu load at 20....by that point in time). This problem did NOT exist in 7.3..only 8.0. I have not tried phoebe. openGL/nvidia is still loaded and now is fine....SO, the problem seems to be in the lap of the RH8 gui. Please comment. BTW, if I used noapic I had the load moved from ksoftirqd_CPUx to keventd/kjournald.....perhaps these/KDE via RH/ is/are broken. TIA! Jeff
Update on this after 2 days of uptime. NO HANG USING WINDOWMAKER...no other changes at all....making me think that the bug is NOT with the nvidia drivers but with the redhat implementation of the gui. I checked with friends, and mandrake and suse (current versions) with kde and gnome do not hang on the same hardware.....so the issue is likely with the "bluewave" stuff.... Please advise on a time frame to examine this and to check if the other ksoftirqd issues on bugzilla are due to the same issue. Thanks. Jeff
> Please advise on a time frame to examine this and to check if the other
> ksoftirqd issues on bugzilla are due to the same issue.

Any reports with nvidia kernel modules are ignored. It's not worth my time to investigate interaction issues with this module we don't have code for.

*** This bug has been marked as a duplicate of 73733 ***
If this bug is JUST due to Nvidia, why does it ONLY occur with the redhat gnome/kde.....windowmaker is just fine? JB
Window Maker might just use a subset of the driver's features. Really, please stop reopening this. Machines with the nvidia module loaded are not supported and we CAN'T fix the module.

*** This bug has been marked as a duplicate of 73733 ***
Ok...so you don't want me to reopen it if I use nvidia.... so I removed all the nvidia stuff and installed the XiG DX Platinum drivers (the best...way better than XFree86..sorry). Same problem. Freezes and problems in Gnome. NONE in windowmaker. The problem is clearly with the code by RedHat....please re-open and look at the problem. Jeff
OK, so I need cat /proc/interrupts output taken about 2 seconds apart WHEN THE PROBLEM IS HAPPENING.

In addition it'll be useful to enable kernel profiling ("nmi_watchdog=1 profile=1" on the kernel command line) and then run

    readprofile -r
    sleep 10
    readprofile -m /boot/System.map | sort -n

to show a list of functions where the kernel spends its time.
So, right away I got a crash in the "boxed" screensaver (running the module manually in demo mode does not trigger the crash...I tried for an hour). With the flags, the machine is now frozen solid (on the x11 console). I could telnet in and find a poorly responsive, but alive, machine. w got:

 11:56am  up 4 days,  2:48,  2 users,  load average: 13.48, 13.75, 13.87
USER     TTY      FROM      LOGIN@   IDLE   JCPU   PCPU  WHAT
jbuchsba :0       -         8:03am   ?     0.00s   ?     -
jeffb    pts/3    slab     11:06am  0.00s  0.07s  0.02s  w

Doing cat /proc/interrupts with about 2 seconds between hitting return got:

[jeffb@coil jeffb]$ cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
  0:   45139083   45476391   43976219   45996759    IO-APIC-edge  timer
  1:        760        760        763        754    IO-APIC-edge  keyboard
  2:          0          0          0          0          XT-PIC  cascade
  8:          1          0          0          0    IO-APIC-edge  rtc
 12:      36206      36630      36252      36099    IO-APIC-edge  PS/2 Mouse
 14:     445461     278205      97800     513978    IO-APIC-edge  ide0
 15:      72197     246626      45307     249435    IO-APIC-edge  ide1
 18:    4130887    4161548    4024310    4209429   IO-APIC-level  SB Live
 19:     248791     315531      21520     417173   IO-APIC-level  aic7xxx, usb-uhci
 20:          4          4          4          4   IO-APIC-level  aic7xxx
 23:    6183082    6264429    5993566    6354479   IO-APIC-level  usb-uhci, eth0
NMI:  180588150  180588150  180588150  180588150
LOC:  180603531  180603523  180603536  180603536
ERR:          0
MIS:          1

[jeffb@coil jeffb]$ cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
  0:   45140372   45476391   43976219   45997009    IO-APIC-edge  timer
  1:        760        760        763        754    IO-APIC-edge  keyboard
  2:          0          0          0          0          XT-PIC  cascade
  8:          1          0          0          0    IO-APIC-edge  rtc
 12:      36206      36630      36252      36099    IO-APIC-edge  PS/2 Mouse
 14:     445775     278205      97800     514276    IO-APIC-edge  ide0
 15:      72197     246626      45307     249435    IO-APIC-edge  ide1
 18:    4131007    4161548    4024310    4209450   IO-APIC-level  SB Live
 19:     249044     315531      21520     417220   IO-APIC-level  aic7xxx, usb-uhci
 20:          4          4          4          4   IO-APIC-level  aic7xxx
 23:    6183286    6264429    5993566    6354513   IO-APIC-level  usb-uhci, eth0
NMI:  180589689  180589689  180589689  180589689
LOC:  180605070  180605063  180605076  180605075
ERR:          0
MIS:          1

I am not sure what I need to do to get the above readprofile to work....I could not find readprofile via which as root or as a user...is that a boot command? (no, I am not a grub guru... :-) ) Please let me know what else I can do to help get information up...the machine is crashed right now (always < 2 hours after loading gnome). jeff
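Editorial aside: the two snapshots above are easier to compare as per-IRQ differences. The following is a minimal sketch of that subtraction, not a tool anyone in this thread used. The helper names parse_interrupts and interrupt_deltas are hypothetical, and only a few rows from the paste are included as sample data.

```python
# Sketch: how many interrupts each IRQ gained between two /proc/interrupts
# snapshots. Sample rows are copied from the two pastes in the comment above.
# parse_interrupts and interrupt_deltas are hypothetical helper names.

SNAPSHOT_A = """
           CPU0       CPU1       CPU2       CPU3
  0:   45139083   45476391   43976219   45996759    IO-APIC-edge  timer
 14:     445461     278205      97800     513978    IO-APIC-edge  ide0
 18:    4130887    4161548    4024310    4209429   IO-APIC-level  SB Live
 23:    6183082    6264429    5993566    6354479   IO-APIC-level  usb-uhci, eth0
NMI:  180588150  180588150  180588150  180588150
"""

SNAPSHOT_B = """
           CPU0       CPU1       CPU2       CPU3
  0:   45140372   45476391   43976219   45997009    IO-APIC-edge  timer
 14:     445775     278205      97800     514276    IO-APIC-edge  ide0
 18:    4131007    4161548    4024310    4209450   IO-APIC-level  SB Live
 23:    6183286    6264429    5993566    6354513   IO-APIC-level  usb-uhci, eth0
NMI:  180589689  180589689  180589689  180589689
"""


def parse_interrupts(text):
    """Map each IRQ label (e.g. "14" or "NMI") to its list of per-CPU counts."""
    counts = {}
    for line in text.strip().splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # the "CPU0 CPU1 ..." header row has no colon
        values = []
        for field in rest.split():
            if not field.isdigit():
                break  # reached the controller type / device name columns
            values.append(int(field))
        counts[label.strip()] = values
    return counts


def interrupt_deltas(before, after):
    """Return {irq: interrupts added across all CPUs between the snapshots}."""
    first, second = parse_interrupts(before), parse_interrupts(after)
    return {
        irq: sum(second[irq]) - sum(first[irq])
        for irq in first
        if irq in second
    }


deltas = interrupt_deltas(SNAPSHOT_A, SNAPSHOT_B)
for irq, gained in sorted(deltas.items(), key=lambda item: item[1], reverse=True):
    print(f"{irq:>4}: +{gained}")
```

Dividing each difference by the real time between the two reads would give a rate, but that interval was only estimated as "about 2 seconds", so the sketch stops at raw differences.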
Clarification: System has no NVIDIA software on board now. Just XiG platinum dx drivers....so XFree86 is all changed...same bug......making it more likely to be a gnome issue. The graphics card was swapped out as well....the Quadro4 900 XGL was changed to an ATI Fire GL 8800. Of note, the bug again is absent in windowmaker. JB
Per the request. Jeff

[root@coil /]# cd /usr/sbin
[root@coil sbin]# ./readprofile -r
[root@coil sbin]# sleep 10
[root@coil sbin]# readprofile -m /boot/System.map | sort -n
bash: readprofile: command not found
[root@coil sbin]# ./readprofile -m /boot/System.map | sort -n
     1 add_blkdev_randomness 0.0104
     1 add_timer 0.0104
     1 __alloc_pages 0.0014
     1 atomic_dec_and_lock 0.0145
     1 __block_commit_write 0.0048
     1 buffer_insert_inode_queue 0.0104
     1 call_reschedule_interrupt 0.0909
     1 clear_page_tables 0.0104
     1 collect_signal 0.0039
     1 __constant_memcpy 0.0037
     1 copy_strings 0.0016
     1 del_timer 0.0104
     1 disk_round_stats 0.0156
     1 do_gettimeofday 0.0078
     1 do_mmap_pgoff 0.0006
     1 do_select 0.0017
     1 emit_log_char 0.0089
     1 end_buffer_io_sync 0.0208
     1 eth_type_trans 0.0052
     1 exit_mmap 0.0027
     1 fd_install 0.0125
     1 fib_lookup 0.0031
     1 filemap_fdatawait 0.0042
     1 file_read_actor 0.0039
     1 __find_lock_page_helper 0.0057
     1 find_snap_client 0.0125
     1 fn_hash_lookup 0.0037
     1 __generic_copy_from_user 0.0089
     1 generic_plug_device 0.0089
     1 generic_unplug_device 0.0125
     1 get_vm_area 0.0045
     1 __global_save_flags 0.0104
     1 handle_IRQ_event 0.0063
     1 handle_mm_fault 0.0030
     1 handle_stop_signal 0.0063
     1 idle_cpu 0.0312
     1 internal_add_timer 0.0057
     1 interruptible_sleep_on 0.0078
     1 ip_route_input_slow 0.0004
     1 IRQ0x17_interrupt 0.0833
     1 kmap_high 0.0104
     1 kstat_read_proc 0.0009
     1 locate_hd_struct 0.0069
     1 lock_vma_mappings 0.0208
     1 may_open 0.0031
     1 new_inode 0.0089
     1 page_cache_read 0.0039
     1 proc_info_read 0.0033
     1 proc_pid_lookup 0.0019
     1 proc_pid_statm 0.0023
     1 __read_lock_failed 0.0500
     1 release_console_sem 0.0057
     1 remove_wait_queue 0.0312
     1 run_timer_list 0.0025
     1 send_sig_info 0.0045
     1 setup_frame 0.0019
     1 setup_sigcontext 0.0033
     1 smp_send_reschedule 0.0156
     1 sockfd_lookup 0.0078
     1 submit_bh 0.0078
     1 supplemental_group_member 0.0156
     1 sys_select 0.0008
     1 sys_sigreturn 0.0035
     1 task_dumpable 0.0208
     1 tcp_transmit_skb 0.0009
     1 .text.lock.acct 0.0085
     1 .text.lock.ioctl 0.0256
     1 .text.lock.printk 0.0044
     1 .text.lock.readdir 0.0097
     1 tty_read 0.0031
     1 unix_dgram_recvmsg 0.0028
     1 unix_write_space 0.0069
     1 update_wall_time_one_tick 0.0057
     1 vfs_permission 0.0031
     1 write_profile 0.0063
     2 account_io_end 0.0250
     2 batch_entropy_store 0.0114
     2 blkdev_release_request 0.0179
     2 __block_prepare_write 0.0024
     2 __constant_c_and_count_memset 0.0125
     2 __constant_memcpy 0.0074
     2 do_anonymous_page 0.0054
     2 do_check_pgt_cache 0.0096
     2 do_page_fault 0.0016
     2 do_zap_page_range 0.0052
     2 d_rehash 0.0179
     2 dup_mmap 0.0038
     2 end_level_ioapic_irq 0.0057
     2 fget 0.0312
     2 __find_get_page 0.0250
     2 fput 0.0063
     2 __free_pages_ok 0.0024
     2 get_gendisk 0.0312
     2 get_unused_fd 0.0048
     2 ide_do_request 0.0042
     2 ide_error 0.0043
     2 ide_set_handler 0.0125
     2 inode_has_buffers 0.0312
     2 iput 0.0028
     2 IRQ0x13_interrupt 0.1667
     2 kfree 0.0104
     2 kmem_cache_free 0.0139
     2 kunmap_high 0.0156
     2 load_balance 0.0021
     2 __make_request 0.0012
     2 page_add_rmap 0.0125
     2 rmqueue 0.0027
     2 __switch_to 0.0078
     2 sys_fsync 0.0096
     2 .text.lock.locks 0.0104
     2 unlock_page 0.0179
     2 update_one_process 0.0069
     2 __wait_on_buffer 0.0125
     2 zap_pte_range 0.0039
     3 __brelse 0.0938
     3 do_IRQ 0.0099
     3 do_signal 0.0044
     3 do_syslog 0.0032
     3 __free_pages 0.0938
     3 generic_file_write 0.0014
     3 mark_page_accessed 0.0208
     3 page_remove_rmap 0.0134
     3 pool_find_page 0.0375
     3 real_lookup 0.0094
     3 refile_buffer 0.0625
     3 strnlen_user 0.0441
     3 switch_mm 0.0093
     3 __tasklet_hi_schedule 0.0312
     3 vsnprintf 0.0027
     4 copy_page_range 0.0081
     4 d_alloc 0.0100
     4 dput 0.0096
     4 ide_intr 0.0100
     4 IRQ0x0e_interrupt 0.3333
     4 pte_chain_free 0.0357
     5 ide_end_request 0.0240
     5 link_path_walk 0.0026
     5 proc_lookup 0.0223
     5 proc_pid_stat 0.0047
     5 try_to_wake_up 0.0116
     6 bh_action 0.0469
     6 __kmem_cache_alloc 0.0197
     6 pci_pool_alloc 0.0156
     6 reschedule_interrupt 0.2857
     6 .text.lock.inode 0.0123
     7 d_lookup 0.0230
     7 .text.lock.namei 0.0059
     8 page_fault 0.6667
     8 __wake_up 0.0625
     9 pci_pool_free 0.0331
     9 set_ioapic_affinity 0.0511
    10 pte_chain_alloc 0.1042
    11 smp_apic_timer_interrupt 0.0491
    11 start_request 0.0181
    12 get_hash_table 0.0833
    12 unlock_buffer 0.1500
    15 .text.lock.sched 0.0285
    17 number 0.0197
    17 scheduler_tick 0.0236
    19 ide_dma_intr 0.0913
    29 ide_wait_stat 0.0954
    35 invalidate_bdev 0.0875
    39 apic_timer_interrupt 1.6250
    55 ide_dmaproc 0.0637
    95 do_rw_disk 0.0565
   133 statm_pte_range 0.4030
   213 ret_from_sys_call 12.5294
   350 ksoftirqd 1.2153
   611 restore_all 40.7333
  3090 deliver_to_old_ones 14.8558
  3955 schedule 5.2593
  4212 sys_sched_yield 13.1625
  7421 do_softirq 33.1295
  9534 system_call 170.2500
  9981 tasklet_hi_action 62.3813
 18811 default_idle 235.1375
 20767 __rdtsc_delay 648.9688
 79780 total 0.0549
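Editorial aside: because the readprofile listing is sorted by tick count, the interesting entries are at the bottom. The sketch below ranks the tail of the listing above and expresses each entry as a share of the "total" row. It assumes every entry has all three fields (ticks, symbol, normalized load); parse_readprofile and top_functions are hypothetical helper names, and only the last rows of the paste are used as sample data.

```python
# Sketch: rank readprofile entries by tick count and show their share of the
# profile total. Rows are the tail of the listing pasted above.
# parse_readprofile and top_functions are hypothetical helper names.

PROFILE_TAIL = """
   350 ksoftirqd 1.2153
   611 restore_all 40.7333
  3090 deliver_to_old_ones 14.8558
  3955 schedule 5.2593
  4212 sys_sched_yield 13.1625
  7421 do_softirq 33.1295
  9534 system_call 170.2500
  9981 tasklet_hi_action 62.3813
 18811 default_idle 235.1375
 20767 __rdtsc_delay 648.9688
 79780 total 0.0549
"""


def parse_readprofile(text):
    """Return (symbol, ticks) pairs from whitespace-separated triples.

    Splitting on any whitespace means this works whether the listing has one
    entry per line or has been flattened onto a single line.
    """
    tokens = text.split()
    usable = len(tokens) - len(tokens) % 3  # ignore a trailing partial entry
    return [(tokens[i + 1], int(tokens[i])) for i in range(0, usable, 3)]


def top_functions(text, limit=3):
    """Return the busiest symbols as (symbol, ticks, percent of total ticks)."""
    entries = parse_readprofile(text)
    total = dict(entries)["total"]
    ranked = sorted(
        (entry for entry in entries if entry[0] != "total"),
        key=lambda entry: entry[1],
        reverse=True,
    )
    return [(name, ticks, 100.0 * ticks / total) for name, ticks in ranked[:limit]]


top = top_functions(PROFILE_TAIL)
for name, ticks, share in top:
    print(f"{name:<20} {ticks:>6} {share:5.1f}%")
```

The percentages are relative to the "total" row of the full profile (79780 ticks), not to the sum of the sample rows, so they stay comparable if more of the listing is pasted in.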
Just for completeness, without the crash/high load level...being logged into windowmaker...I get:

[root@coil sbin]# ./readprofile -r
[root@coil sbin]# sleep 10
[root@coil sbin]# ./readprofile -m /boot/System.map | sort -n
     1 alloc_skb 0.0021
     1 __constant_c_and_count_memset 0.0063
     1 d_alloc 0.0025
     1 d_instantiate 0.0104
     1 do_wp_page 0.0012
     1 follow_page 0.0078
     1 fput 0.0031
     1 get_empty_filp 0.0031
     1 IRQ0x12_interrupt 0.0833
     1 IRQ0x17_interrupt 0.0833
     1 kstat_read_proc 0.0009
     1 link_path_walk 0.0005
     1 __mark_inode_dirty 0.0052
     1 poll_freewait 0.0125
     1 proc_lookup 0.0045
     1 pty_chars_in_buffer 0.0125
     1 pty_unthrottle 0.0089
     1 reschedule_interrupt 0.0476
     1 rmqueue 0.0014
     1 set_ioapic_affinity 0.0057
     1 sock_def_readable 0.0069
     1 statm_pgd_range 0.0052
     1 sys_close 0.0078
     1 sys_select 0.0008
     1 system_call 0.0179
     1 udp_v4_mcast_deliver 0.0023
     1 unix_ioctl 0.0048
     1 vsnprintf 0.0009
     1 __wake_up 0.0078
     2 d_lookup 0.0066
     2 load_balance 0.0021
     2 netif_receive_skb 0.0037
     2 proc_pid_make_inode 0.0104
     2 write_profile 0.0125
     3 atomic_dec_and_lock 0.0435
     3 collect_sigign_sigcatch 0.0234
     3 __kmem_cache_alloc 0.0099
     3 proc_pid_statm 0.0069
     4 number 0.0046
     4 smp_apic_timer_interrupt 0.0179
     6 fget 0.0938
     7 proc_pid_stat 0.0066
     9 scheduler_tick 0.0125
    18 apic_timer_interrupt 0.7500
    38 statm_pte_range 0.1152
 60970 default_idle 762.1250
 61107 total 0.0421
[root@coil sbin]#

[jbuchsba@coil sbin]$ w
  5:10pm  up  2:37,  2 users,  load average: 0.06, 0.08, 0.03
USER     TTY      FROM      LOGIN@   IDLE   JCPU   PCPU  WHAT
jbuchsba :0       -         5:07pm   ?     0.00s   ?     -
jbuchsba pts/1    -         5:07pm  3:39   1.37s  1.37s  top
[jbuchsba@coil sbin]$
Update.... Up to the current kernel for 8.0. Glibc 2.3.2x was updated/installed. Same thing...

[root@coil sbin]# ./bugzillascript
     1 account_io_start 0.0104
     1 batch_entropy_store 0.0057
     1 __brelse 0.0312
     1 call_do_IRQ 0.0769
     1 __constant_c_and_count_memset 0.0063
     1 __constant_memcpy 0.0037
     1 copy_page_range 0.0020
     1 d_lookup 0.0033
     1 do_fcntl 0.0013
     1 do_no_page 0.0015
     1 do_select 0.0017
     1 fget 0.0156
     1 filp_close 0.0048
     1 __generic_copy_to_user 0.0125
     1 generic_file_write 0.0005
     1 get_gendisk 0.0156
     1 handle_mm_fault 0.0030
     1 ide_destroy_dmatable 0.0208
     1 inode_has_buffers 0.0156
     1 ip_check_mc 0.0156
     1 kfree 0.0052
     1 __kmem_cache_alloc 0.0033
     1 mark_page_accessed 0.0069
     1 new_inode 0.0089
     1 number 0.0012
     1 page_fault 0.0833
     1 proc_file_lseek 0.0048
     1 proc_lookup 0.0045
     1 proc_pid_cmdline 0.0037
     1 prune_dcache 0.0019
     1 reschedule_interrupt 0.0476
     1 set_page_dirty 0.0078
     1 skb_release_data 0.0069
     1 sleep_on 0.0078
     1 switch_mm 0.0031
     1 sys_rt_sigprocmask 0.0022
     1 __tasklet_hi_schedule 0.0104
     1 .text.lock.namei 0.0008
     1 .text.lock.socket 0.0044
     2 do_IRQ 0.0066
     2 get_hash_table 0.0139
     2 kunmap_high 0.0156
     2 load_balance 0.0021
     2 pte_chain_free 0.0179
     2 schedule 0.0027
     2 set_ioapic_affinity 0.0114
     2 statm_pgd_range 0.0104
     2 system_call 0.0357
     2 .text.lock.inode 0.0041
     2 write_profile 0.0125
     3 restore_all 0.2000
     3 smp_apic_timer_interrupt 0.0134
     3 __wake_up 0.0234
     4 ide_wait_stat 0.0132
     4 kmap_high 0.0417
     4 pte_chain_alloc 0.0417
     4 .text.lock.ioctl 0.1026
     4 unlock_buffer 0.0500
     5 bh_action 0.0391
     5 start_request 0.0082
     6 apic_timer_interrupt 0.2500
     7 ide_dmaproc 0.0081
     7 scheduler_tick 0.0097
    19 do_rw_disk 0.0113
    19 statm_pte_range 0.0576
    23 ksoftirqd 0.0799
  1098 do_softirq 4.9018
  3050 tasklet_hi_action 19.0625
  3602 .text.lock.dev 8.8938
  3889 deliver_to_old_ones 18.6971
  4180 __rdtsc_delay 130.6250
  7861 default_idle 98.2625
 23861 total

Please respond/post ideas about this...as I am losing a lot of work because of this and might have to flee RedHat altogether. My mandrake box at home has none of this ...same setup, different obviously in software... I really want to support RHL and just paid for additional support yesterday...but I really think this is a big deal..... jb
Addendum: Logging out of gnome and logging in (to a "crashed" state) makes CPU load go to near 0....here is the proc file:

[root@coil sbin]# ./bugzillascript
     1 copy_page_range 0.0020
     1 d_alloc 0.0025
     1 __generic_copy_to_user 0.0125
     1 get_user_pages 0.0020
     1 ip_route_input 0.0020
     1 iput 0.0014
     1 IRQ0x10_interrupt 0.0833
     1 kfree 0.0052
     1 __kmem_cache_alloc 0.0033
     1 kmem_cache_free 0.0069
     1 link_path_walk 0.0005
     1 netif_rx 0.0021
     1 new_inode 0.0089
     1 set_ioapic_affinity 0.0057
     1 smp_apic_timer_interrupt 0.0045
     1 sock_ioctl 0.0078
     1 sys_write 0.0031
     1 write_profile 0.0063
     2 proc_pid_stat 0.0019
     3 atomic_dec_and_lock 0.0435
     4 scheduler_tick 0.0056
     7 apic_timer_interrupt 0.2917
    15 statm_pte_range 0.0455
 20436 default_idle 255.4500
 20485 total 0.0141

Hope this helps...it definitely is NOT video card/driver related...it is GNOME related in the RedHat modification....Gnome on other linux brands does not do this (same hardware). Mandrake 9 is at home.....
Addendum 2: After logging into WindowMaker, if I log out and log into Gnome....it half opens (RH menu fails to work, two terms to work, no desktop icons come up....most of the menu bar is missing....) Here is the procfile:

[root@coil sbin]# ./bugzillascript
     1 add_timer 0.0104
     1 atomic_dec_and_lock 0.0145
     1 bh_action 0.0078
     1 blkdev_release_request 0.0089
     1 __brelse 0.0312
     1 __constant_c_and_count_memset 0.0063
     1 do_munmap 0.0013
     1 do_no_page 0.0015
     1 do_page_fault 0.0008
     1 do_readv_writev 0.0014
     1 do_syslog 0.0011
     1 do_zap_page_range 0.0026
     1 fget 0.0156
     1 __find_get_page 0.0125
     1 __find_lock_page 0.0208
     1 flush_signal_handlers 0.0125
     1 free_one_pmd 0.0048
     1 __free_pte 0.0089
     1 generic_plug_device 0.0089
     1 get_empty_filp 0.0031
     1 get_unmapped_area 0.0033
     1 get_unused_buffer_head 0.0057
     1 ide_do_request 0.0021
     1 IRQ0x0e_interrupt 0.0833
     1 link_path_walk 0.0005
     1 lru_cache_add 0.0057
     1 __make_request 0.0006
     1 move 0.0069
     1 neigh_lookup 0.0045
     1 page_add_rmap 0.0063
     1 page_remove_rmap 0.0045
     1 path_release 0.0156
     1 pte_chain_free 0.0089
     1 remove_wait_queue 0.0312
     1 ret_from_sys_call 0.0588
     1 skb_recv_datagram 0.0042
     1 strncpy_from_user 0.0089
     1 submit_bh 0.0078
     1 sys_read 0.0031
     1 sys_setsid 0.0078
     1 tcp_v4_init_sock 0.0042
     1 .text.lock.locks 0.0052
     1 try_to_wake_up 0.0023
     1 unix_write_space 0.0069
     1 vsnprintf 0.0009
     1 wake_up_forked_process 0.0033
     1 write_profile 0.0063
     1 zap_pte_range 0.0019
     2 __constant_memcpy 0.0074
     2 do_sigaction 0.0057
     2 find_vma 0.0208
     2 fsync_buffers_list 0.0036
     2 __generic_copy_to_user 0.0250
     2 generic_file_write 0.0009
     2 kunmap_high 0.0156
     2 netif_receive_skb 0.0037
     2 pte_chain_alloc 0.0208
     2 schedule 0.0027
     2 update_one_process 0.0069
     3 apic_timer_interrupt 0.1250
     3 del_timer 0.0312
     3 handle_IRQ_event 0.0187
     4 d_lookup 0.0132
     4 do_anonymous_page 0.0109
     4 file_read_actor 0.0156
     4 system_call 0.0714
     4 unlock_buffer 0.0500
     5 get_hash_table 0.0347
     5 start_request 0.0082
     6 ide_dmaproc 0.0069
     7 __constant_c_and_count_memset 0.0437
     7 ide_wait_stat 0.0230
     8 page_fault 0.6667
    16 do_rw_disk 0.0095
    22 ksoftirqd 0.0764
   914 do_softirq 4.0804
  2200 tasklet_hi_action 13.7500
  2790 .text.lock.dev 6.8889
  3044 deliver_to_old_ones 14.6346
  3127 __rdtsc_delay 97.7188
  7913 default_idle 98.9125
 20163 total 0.0139

That is "it" for me today....hope this data helps you guys figure this one out... Jeff
Ok, I lied.... I did a net search on __rdtsc_delay and noticed it had to do with audio. I also noted that on gnome, and not on windowmaker, an applet called CDPlayer 2.01 was on my menu....it was installed in 7.3 and all was well then. Well, I removed this applet from Gnome 5 hours ago and no crash.... Please try this in the RH official office...on an SMP xeon machine if you have one..... Anyway, I will let it run over the weekend and see what happens....this might be the culprit.... jb
The crash is back....no luck....it is a kernel thing, random, and bad. Odd that NO information is coming out of RH.....is this fixed in 9.0? jb
could you try to rename the esd binary so that it doesn't auto-start? sometimes it seems esd is causing very bad behavior
Ok, rebooting after changing esd's name. Following the esd bug led me back to bugzilla, and your kernels.....any chance they would fix this? i.e.: http://people.redhat.com/arjanv/testkernels/i686/*smp* Thanks. Jeff
PS: I have the latest kernel.....as of 3/24/03. jb
Well, for other reasons, I decided to put kde 3.1 on my box via apt-get (the rpm out there....). No problem getting KDE. Funny, the whole of my box stopped crashing even in gnome ....so, I have no idea what was wrong, esd was not at fault....., but it seems to be gone now for 5 days. Perhaps rh9 with kde 3.1 (right?) will have my issue fixed by default... I would leave this bug as solved via kde upgrade with no known direct cause. Jeff
> System has no NVIDIA software on board now. Just XiG platinum dx drivers....so
> XFree86 is all changed...same bug......making it more likely to be a gnome
> issue.

We don't support _ANY_ 3rd party drivers. We support only the drivers which we ship with XFree86. As has been stated several times, this issue is not something we will support in any way, as we do not support 3rd party kernel modules or XFree86 drivers.

Closing (for the 4th or so time) as a duplicate of bug #73733
*** This bug has been marked as a duplicate of 73733 ***
Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.