Bug 841539 - Kernel 3.4.x crash hard after 2-3 days
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: kernel
Version: 16
Hardware: i686 Linux
Priority: unspecified
Severity: high
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Keywords: Reopened
Depends On:
Blocks:
Reported: 2012-07-19 06:47 EDT by Knudch
Modified: 2013-02-21 10:42 EST (History)
CC List: 6 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-10-30 09:00:51 EDT
Type: Bug
Attachments: None
Description Knudch 2012-07-19 06:47:20 EDT
Hi

After an update from kernel-PAE.i686 3.3.8-1.fc16 to 3.4.2-1 or 3.4.4-4, the system hard locks after running for 2-3 days.
A power cycle or hard reset is needed.
No logs show anything interesting; at a certain time (the crash time) the entries simply stop.
Until this update it had been up for months.
Returning to kernel 3.3.8-1 makes it run as before.


How do I provide more info?

Knud


System:
Board = Intel DB65AL, B65 chipset
CPU = Celeron G530
8G memory
2 x 500G disks as RAID 1

Runs as a home server: FTP, VNC, music server, etc.
Comment 1 Josh Boyer 2012-07-19 09:22:21 EDT
When it locks up, does the machine respond to pings?  If so, can you ssh into it?

You might want to run kernel-debug for a while and see if some kind of oops or error messages are produced.
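
A minimal sketch of the second check above, assuming a second machine on the same network with Python 3: it tests whether the SSH port on the locked-up box still accepts TCP connections. The function name and the host name in the usage note are hypothetical, and the ping test itself is left to the ordinary ping command, because sending ICMP from a script usually needs extra privileges.

```python
import socket


def tcp_port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Any failure (refused, unreachable, timed out, name not resolved)
    is reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, tcp_port_open("homeserver.example") returning False while ping still answers would suggest the kernel is alive but userspace is not responding; both failing points toward a full hang.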
Comment 2 Knudch 2012-07-19 10:11:09 EDT
(In reply to comment #1)
> When it locks up, does the machine respond to pings?  If so, can you ssh
> into it?
> 
> You might want to run kernel-debug for a while and see if some kind of oops
> or error messages are produced.

No, it is totally dead: no ping, keyboard and video (black screen) dead, and the HD LED shows no activity.

To run kernel-debug, do I just use, for example, kernel-PAEdebug.i686 3.4.4-4.fc16?

Should I enable any special log functions?

I must admit that I am not very experienced with kernel debugging, at least on Linux.
Comment 3 Knudch 2012-07-20 02:28:59 EDT
(In reply to comment #1)
> When it locks up, does the machine respond to pings?  If so, can you ssh
> into it?
> 
> You might want to run kernel-debug for a while and see if some kind of oops
> or error messages are produced.

With kernel-PAEdebug.i686 3.4.4-4.fc16:
After 10.5 hours the machine stopped working (crashed/locked up).
messages shows its last entry after 7 hours (normal entries).
dnsmasq.log shows its last activity (normal entries every few minutes) after 10.5 hours.

That did not give more information, except that the crash happened after a shorter period of time. The other 5 times it crashed after 2-3 days.
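
The crash time was estimated here from the last entry in each log. A small sketch of that step, assuming both files use the traditional syslog timestamp prefix ("Mon DD HH:MM:SS", no year) and an English/C locale for month names; the function names are made up for illustration, and the year has to be supplied by the caller because the log lines do not carry it.

```python
from datetime import datetime


def parse_syslog_time(line, year):
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a syslog-style line.

    Returns a datetime, or None if the line does not start with a stamp.
    """
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def last_entry(lines, year):
    """Return the latest timestamp found in lines, or None if there is none."""
    stamps = [parse_syslog_time(line, year) for line in lines]
    stamps = [stamp for stamp in stamps if stamp is not None]
    return max(stamps) if stamps else None
```

Running last_entry over the lines of each log and taking the latest result gives a lower bound on when the machine stopped; a frequently written log such as dnsmasq.log narrows it down better than a quiet one.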
Comment 4 Knudch 2012-07-22 03:24:35 EDT
(In reply to comment #1)
> When it locks up, does the machine respond to pings?  If so, can you ssh
> into it?
> 
> You might want to run kernel-debug for a while and see if some kind of oops
> or error messages are produced.

With kernel-PAEdebug running and kdump installed, I monitored dmesg continuously for 23 hours. There was no freeze, but an oops occurred after 15 hours.

I have a log file from dmesg and a lot of files in /var/spool/abrt/oops.....

This snippet is from dmesg:

[  505.088532] TCP: lp registered
[54170.071031] md: data-check of RAID array md0
[54170.071036] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[54170.071039] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
[54170.071044] md: using 128k window, over a total of 487422840k.
[54584.565937] BUG: unable to handle kernel paging request at 6b6b6ba3
[54584.567785] IP: [<f7f111dc>] sync_request+0x73c/0xc10 [raid1]
[54584.568712] *pdpt = 0000000027c5d001 *pde = 0000000000000000 
[54584.569630] Oops: 0000 [#1] SMP 
[54584.570532] Modules linked in: tcp_lp fuse tpm_bios w83627ehf hwmon_vid snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm e1000e snd_timer snd soundcore iTCO_wdt iTCO_vendor_support snd_page_alloc coretemp microcode i2c_i801 serio_raw uinput nfsd lockd nfs_acl auth_rpcgss sunrpc raid1 crc32c_intel usb_storage i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
[54584.572768] 
[54584.573868] Pid: 5201, comm: md0_resync Not tainted 3.4.4-4.fc16.i686.PAEdebug #1                  /DB65AL
[54584.575029] EIP: 0060:[<f7f111dc>] EFLAGS: 00010202 CPU: 0
[54584.576187] EIP is at sync_request+0x73c/0xc10 [raid1]
[54584.577355] EAX: 6b6b6b6b EBX: 00000002 ECX: e68b5640 EDX: 00000002
[54584.578536] ESI: dfd8b880 EDI: 00000000 EBP: e074be40 ESP: e074bdd0
[54584.579724]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[54584.580916] CR0: 8005003b CR2: 6b6b6ba3 CR3: 27cae000 CR4: 000407f0
[54584.582132] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[54584.583364] DR6: ffff0ff0 DR7: 00000400
[54584.584584] Process md0_resync (pid: 5201, ti=e074a000 task=e68b5640 task.ti=e074a000)
[54584.585852] Stack:
[54584.587137]  00000000 00000000 00000000 f7f10b45 00000080 ffffffff 00000000 00000002
[54584.588513]  00000000 00000000 020ddaf8 00000000 00000080 00000000 ee51ce30 020ddb00
[54584.589851]  00000000 dfd8b880 f63c5cb0 00001000 05f22500 00000000 ffffffff ffffffff
[54584.591135] Call Trace:
[54584.592372]  [<f7f10b45>] ? sync_request+0xa5/0xc10 [raid1]
[54584.593603]  [<c0873177>] md_do_sync+0xb97/0x1250
[54584.594842]  [<f7f10aa0>] ? raid1_congested+0x40/0x40 [raid1]
[54584.596102]  [<c04b624b>] ? trace_hardirqs_on+0xb/0x10
[54584.597376]  [<c09f9787>] ? _raw_spin_unlock_irq+0x27/0x40
[54584.598664]  [<c086c35d>] md_thread+0xed/0x120
[54584.599966]  [<c047f54e>] ? complete+0x4e/0x60
[54584.601277]  [<c086c270>] ? md_unregister_thread+0x80/0x80
[54584.602609]  [<c04746cd>] kthread+0x7d/0x90
[54584.603923]  [<c0474650>] ? __init_kthread_worker+0x60/0x60
[54584.605233]  [<c0a02042>] kernel_thread_helper+0x6/0x10
[54584.606530] Code: 8b 44 24 44 8b 74 24 30 8b 7c 24 34 89 18 8b 5c 24 48 8b 53 08 8d 04 12 85 c0 0f 8e 9b fd ff ff 8b 74 24 44 31 db 90 8b 44 9e 34 <81> 78 38 e0 d2 f0 f7 0f 84 37 03 00 00 83 c3 01 8d 04 12 39 d8 
[54584.608108] EIP: [<f7f111dc>] sync_request+0x73c/0xc10 [raid1] SS:ESP 0068:e074bdd0
[54584.609592] CR2: 000000006b6b6ba3
[54584.624383] ---[ end trace 8eb8359bfd21f2ba ]---


Which should I attach?

Knud
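
One possible reading of the oops above, offered as a sketch rather than a diagnosis: EAX holds 6b6b6b6b, and 0x6b is the POISON_FREE byte that the kernel's slab debugging writes into freed objects, so the fault address in CR2 looks like a small offset from a pointer read out of already-freed memory. The Python below is illustrative only (the function names are made up for this example); it takes the register lines copied from the dmesg output and reports which general-purpose register carries the poison pattern and how far CR2 lies from it.

```python
import re

# Byte written into freed slab objects when slab poisoning is enabled
# (POISON_FREE in the kernel's include/linux/poison.h).
POISON_FREE = 0x6B

# Register lines copied from the oops in this report.
REGISTER_DUMP = """\
EAX: 6b6b6b6b EBX: 00000002 ECX: e68b5640 EDX: 00000002
ESI: dfd8b880 EDI: 00000000 EBP: e074be40 ESP: e074bdd0
CR0: 8005003b CR2: 6b6b6ba3 CR3: 27cae000 CR4: 000407f0
"""


def parse_registers(text):
    """Map three-character register names to their 32-bit values."""
    pairs = re.findall(r"\b([A-Z]{2}[A-Z0-9])\s*:\s*([0-9a-f]{8})\b", text)
    return {name: int(value, 16) for name, value in pairs}


def is_poisoned(value, width_bytes=4):
    """True if every byte of value equals POISON_FREE."""
    return all((value >> (8 * i)) & 0xFF == POISON_FREE for i in range(width_bytes))


def poisoned_bases(regs, max_offset=4096):
    """List (register, offset) pairs where CR2 == poisoned register + offset."""
    fault = regs["CR2"]
    hits = []
    for name, value in regs.items():
        if name.startswith(("CR", "DR")):
            continue  # control/debug registers are not pointers of interest
        if is_poisoned(value) and 0 <= fault - value < max_offset:
            hits.append((name, fault - value))
    return hits


if __name__ == "__main__":
    print(poisoned_bases(parse_registers(REGISTER_DUMP)))  # [('EAX', 56)]
```

The offset of 56 (0x38) matches the 38 displacement byte that follows <81> 78 on the Code: line, which is consistent with a use-after-free in the raid1 resync path, though the oops alone does not prove it.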
Comment 5 Knudch 2012-09-11 16:31:57 EDT
With kernel 3.4.9-1.fc16.i686.PAE the system has now been up for 16 days without any problems.

I believe the bug is gone, whatever was changed.

The bug report can be closed.

Knud
Comment 6 Knudch 2012-09-29 18:12:50 EDT
The bug does not seem to have disappeared.

Kernel 3.4.9-1.fc16.i686.PAE crashed for the first time after 26 days,
then after 4 days, then after 3 days.

The system was unmodified during the entire period.

Knud
Comment 7 Dave Jones 2012-10-23 11:31:40 EDT
# Mass update to all open bugs.

Kernel 3.6.2-1.fc16 has just been pushed to updates.
This update is a significant rebase from the previous version.

Please retest with this kernel, and let us know if your problem has been fixed.

In the event that you have upgraded to a newer release and the bug you reported
is still present, please change the version field to the newest release you have
encountered the issue with.  Before doing so, please ensure you are testing the
latest kernel update in that release and attach any new and relevant information
you may have gathered.

If you are not the original bug reporter and you still experience this bug,
please file a new report, as it is possible that you may be seeing a
different problem. 
(Please don't clone this bug, a fresh bug referencing this bug in the comment is sufficient).
Comment 8 Knudch 2012-10-29 12:01:13 EDT
Installed kernel 3.6.2-1.PAE (system otherwise unchanged since the bug was first reported).
There was a reboot after 19 hours; since then it has run for 5 days without problems.
Comment 9 Josh Boyer 2012-10-30 09:00:51 EDT
Thanks for letting us know.
Comment 10 Knudch 2012-11-01 02:22:54 EDT
After another 4.5 days, kernel 3.6.2-1.pae crashed with a hard lock, i.e. a hardware reset was needed.

I am returning to the old kernel 3.3.8-1, which seems stable, i.e. it can run for months. I never had an issue with that kernel.
Comment 11 Knudch 2013-02-21 10:34:21 EST
I have just tested kernel 3.6.11-4.

It crashed (i.e. hard locked) after 12 days.

Returning to 3.3.8-1, which still runs for months with no problems.

Knud


PS
My access to the Bugzilla system has been troublesome.
Somehow my user name/password was invalidated, but the forgotten-password function together with my email address worked. I have no clue why.
Comment 12 Dave Jones 2013-02-21 10:38:59 EST
As F16 reached end of life last week, there will be no further updates for this release.
Comment 13 Knudch 2013-02-21 10:42:37 EST
I am aware of that.

But it was given as information in case someone has experienced similar behavior.
Knud
