Bug 590403 - /etc/cron.weekly/99-raid-check and kernel 2.6.18-194.3.1.el5 (crash)
Status: CLOSED DUPLICATE of bug 573106
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.5
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Assignee: Doug Ledford
QA Contact: Red Hat Kernel QE team
 
Reported: 2010-05-09 11:06 UTC by GV
Modified: 2011-06-21 15:05 UTC
CC: 5 users

Doc Type: Bug Fix
Last Closed: 2011-06-21 15:05:42 UTC


Attachments
kernel.txt (20.43 KB, text/plain), 2010-05-09 11:08 UTC, GV
Kernel log (22.35 KB, application/octet-stream), 2010-05-28 08:43 UTC, Simon Matter
Kernel log (22.35 KB, text/plain), 2010-05-28 08:45 UTC, Simon Matter

Description GV 2010-05-09 11:06:54 UTC
Description of problem:
kernel crash

Version-Release number of selected component (if applicable):
kernel 2.6.18-194.3.1.el5

Additional info:
See attached kernel.txt.

Comment 1 GV 2010-05-09 11:08:46 UTC
Created attachment 412630 [details]
kernel.txt

Comment 2 Simon Matter 2010-05-28 08:42:22 UTC
While investigating an issue with a hung server in a remote office, I saw the same or a similar problem in my logs. Maybe the server hang is also related, but I couldn't investigate much.

I was surprised to see that most of our servers running software raid show the same behaviour.

There is also something I recall now: while a resync was running on one of our servers, it became almost unusable because disk access got very slow. I think that could be related as well.

Comment 3 Simon Matter 2010-05-28 08:43:26 UTC
Created attachment 417517 [details]
Kernel log

Comment 4 Simon Matter 2010-05-28 08:45:42 UTC
Created attachment 417518 [details]
Kernel log

Try to make it a text file

Comment 5 GV 2010-05-31 11:25:12 UTC
Ping! Is anyone from Red Hat working on this?

All my servers (seven with RAID 1, one with RAID 5) have the same problem (all running kernel 2.6.18-194.3.1.el5).

Comment 6 Simon Matter 2010-05-31 14:36:33 UTC
It seems that I'm seeing it only on servers having multiple disks where more than one resync runs in parallel.

Comment 7 Doug Ledford 2010-05-31 16:30:43 UTC
The resyncs complete, correct?  At least that's what the kernel log suggests.  There doesn't appear to be any crash at all, just a misleading kernel message about hung processes.  As it turns out, those processes are "hung" for a reason (waiting for other resync processes to complete), so they don't constitute a problem, just a false positive from the hung-task test.

Comment 8 Doug Ledford 2010-05-31 16:31:37 UTC
As such, these false positives from the hung-task test are a low-priority item; the fix would be mainly cosmetic.

Comment 9 GV 2010-05-31 16:57:12 UTC
(In reply to comment #6)
> It seems that I'm seeing it only on servers having multiple disks where more
> than one resync runs in parallel.    
I always have 2 resyncs running in parallel.

Comment 10 GV 2010-05-31 17:03:37 UTC
(In reply to comment #7)
> The resyncs complete, correct?
Yes, if I can trust /proc/mdstat. But how can I be sure of this without removing a disk (which I cannot do)?

>  There doesn't appear to be any crash at all, just a mistaken kernel log about
> hung processes, but as it turns out they are hung for a reason (waiting for
> other resync processes to complete) and as such don't constitute a problem,
> just a false positive on the hung process test.    

This looks like an info message:
May  9 04:22:08 xyz kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
May  9 04:22:08 xyz kernel: md: using 128k window, over a total of 219319232 blocks.
May  9 04:22:08 xyz kernel: RAID1 conf printout:
May  9 04:22:08 xyz kernel:  --- wd:2 rd:2
May  9 04:22:08 xyz kernel:  disk 0, wo:0, o:1, dev:sda1
May  9 04:22:08 xyz kernel:  disk 1, wo:0, o:1, dev:sdc1
May  9 04:24:54 xyz kernel: INFO: task md2_resync:13861 blocked for more than 120 seconds.
May  9 04:24:54 xyz kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

And this seems to me like a crash:

May  9 04:24:54 xyz kernel: md2_resync    D ffff81007f6407e0     0 13861     27         13870  9331 (L-TLB)
May  9 04:24:54 xyz kernel:  ffff8100439e1d70 0000000000000046 ffff81005bd0a3c0 ffff81007fbb460c
May  9 04:24:54 xyz kernel:  ffff81007fbb4c0c 000000000000000a ffff810004130820 ffff81007f6407e0
May  9 04:24:54 xyz kernel:  000087dfdbb9592b 0000000000000b81 ffff810004130a08 000000018008b4b1
May  9 04:24:54 xyz kernel: Call Trace:
May  9 04:24:54 xyz kernel:  [<ffffffff800a08a6>] keventd_create_kthread+0x0/0xc4
May  9 04:24:54 xyz kernel:  [<ffffffff8021ae50>] md_do_sync+0x1d8/0x833
May  9 04:24:54 xyz kernel:  [<ffffffff8008ca21>] enqueue_task+0x41/0x56
May  9 04:24:54 xyz kernel:  [<ffffffff8008ca8c>] __activate_task+0x56/0x6d
May  9 04:24:54 xyz kernel:  [<ffffffff8008c871>] dequeue_task+0x18/0x37
May  9 04:24:54 xyz kernel:  [<ffffffff80062ff8>] thread_return+0x62/0xfe
May  9 04:24:55 xyz kernel:  [<ffffffff800a0abe>] autoremove_wake_function+0x0/0x2e
May  9 04:24:55 xyz kernel:  [<ffffffff800a08a6>] keventd_create_kthread+0x0/0xc4
May  9 04:24:56 xyz kernel:  [<ffffffff8021b824>] md_thread+0xf8/0x10e
May  9 04:24:56 xyz kernel:  [<ffffffff800a08a6>] keventd_create_kthread+0x0/0xc4
May  9 04:24:56 xyz kernel:  [<ffffffff8021b72c>] md_thread+0x0/0x10e
May  9 04:24:56 xyz kernel:  [<ffffffff80032894>] kthread+0xfe/0x132
May  9 04:24:56 xyz kernel:  [<ffffffff8005dfb1>] child_rip+0xa/0x11
May  9 04:24:56 xyz kernel:  [<ffffffff800a08a6>] keventd_create_kthread+0x0/0xc4
May  9 04:24:56 xyz kernel:  [<ffffffff80032796>] kthread+0x0/0x132
May  9 04:24:56 xyz kernel:  [<ffffffff8005dfa7>] child_rip+0x0/0x11

Comment 11 Simon Matter 2010-06-01 04:51:51 UTC
@Doug I'm aware that the resync has finished on my systems and that the crash I saw on one of them may have nothing to do with it. The crash happened at 3:30 in the morning on a Friday, which is certainly not a time any resync was running. It was only because of this crash that I looked more closely at the kernel messages.

@Gabriel, did you see any problems after the above logs showed up? In my case I have never seen a real crash related to the resync. What I have seen is a system becoming _very_ slow and unresponsive while the resyncs were running. I know how syncing arrays usually feels, but in that case it was much worse and I just felt something was not quite right here.
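As an aside on the slowness described above (a sketch only, not something proposed in this thread): the md resync rate can be capped at runtime through the dev.raid sysctls. The 50000 KB/sec value below is just an example; the 200000 KB/sec default matches the "not more than 200000 KB/sec" line in the attached logs.

---------------
# Show the current resync throttle (defaults: 1000 min / 200000 max, in KB/sec):
cat /proc/sys/dev/raid/speed_limit_min /proc/sys/dev/raid/speed_limit_max

# Cap the resync at roughly 50 MB/s while it hurts interactive use
# (example value, pick one suited to the disks):
echo 50000 > /proc/sys/dev/raid/speed_limit_max
# or, equivalently, via sysctl:
sysctl -w dev.raid.speed_limit_max=50000
---------------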

Comment 12 Doug Ledford 2010-06-01 16:33:18 UTC
(In reply to comment #10)
> (In reply to comment #7)
> > The resyncs complete, correct?
> Yes, if I should trust /proc/mdstat. But how can I be sure of this without
> removing a disk (and I cannot do this)?

If /proc/mdstat says that the resync has completed, then it has.  The contents of /proc/mdstat are not stored anywhere; they are generated every time you read the file, from the in-memory copy of each raid array's superblock and from the state of the drives belonging to it.  /proc/mdstat can therefore never differ from the true state of the raid arrays, because it is read from those arrays on each and every access of the file.
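For the question in comment 10 about verifying an array without removing a disk, a read-only consistency check can be started by hand through the same sysfs interface the weekly 99-raid-check script uses. A minimal sketch follows; the array name md2 is only an example taken from the log.

---------------
# Start a read-only consistency check on one array (as root):
echo check > /sys/block/md2/md/sync_action

# Progress is reported in /proc/mdstat while the check runs:
cat /proc/mdstat

# Once sync_action reads "idle" again, mismatch_cnt tells how many sectors
# disagreed between the members; 0 means the copies are consistent:
cat /sys/block/md2/md/sync_action
cat /sys/block/md2/md/mismatch_cnt
---------------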

> >  There doesn't appear to be any crash at all, just a mistaken kernel log about
> > hung processes, but as it turns out they are hung for a reason (waiting for
> > other resync processes to complete) and as such don't constitute a problem,
> > just a false positive on the hung process test.    
> 
> This seems like a info message:
> May  9 04:22:08 xyz kernel: md: using maximum available idle IO bandwidth (but
> not more than 200000 KB/sec) for reconstruction.
> May  9 04:22:08 xyz kernel: md: using 128k window, over a total of 219319232
> blocks.
> May  9 04:22:08 xyz kernel: RAID1 conf printout:
> May  9 04:22:08 xyz kernel:  --- wd:2 rd:2
> May  9 04:22:08 xyz kernel:  disk 0, wo:0, o:1, dev:sda1
> May  9 04:22:08 xyz kernel:  disk 1, wo:0, o:1, dev:sdc1

The info about the raid array stops here.

> May  9 04:24:54 xyz kernel: INFO: task md2_resync:13861 blocked for more than
> 120 seconds.
> May  9 04:24:54 xyz kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.

This is unrelated to the raid array info above; it belongs with the backtrace below.  Echoing 0 to hung_task_timeout_secs, as the message suggests, suppresses everything from the INFO: line through the end of the following backtrace.
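For completeness, a sketch of the knob that message refers to; the sysctl name mirrors the /proc path quoted in the log, and the 600-second value is only an example.

---------------
# Suppress the hung-task warnings entirely (the reporter declines this below,
# since it also hides genuine hangs):
echo 0 > /proc/sys/kernel/hung_task_timeout_secs

# Or raise the threshold instead, e.g. to 10 minutes; note that a resync
# waiting a long time on another one can still exceed whatever value is set:
sysctl -w kernel.hung_task_timeout_secs=600
---------------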

> And this seems to me like a crash:
> 
> May  9 04:24:54 xyz kernel: md2_resync    D ffff81007f6407e0     0 13861     27
>         13870  9331 (L-TLB)
> May  9 04:24:54 xyz kernel:  ffff8100439e1d70 0000000000000046 ffff81005bd0a3c0
> ffff81007fbb460c
> May  9 04:24:54 xyz kernel:  ffff81007fbb4c0c 000000000000000a ffff810004130820
> ffff81007f6407e0
> May  9 04:24:54 xyz kernel:  000087dfdbb9592b 0000000000000b81 ffff810004130a08
> 000000018008b4b1
> May  9 04:24:54 xyz kernel: Call Trace:
> May  9 04:24:54 xyz kernel:  [<ffffffff800a08a6>]
> keventd_create_kthread+0x0/0xc4
> May  9 04:24:54 xyz kernel:  [<ffffffff8021ae50>] md_do_sync+0x1d8/0x833
> May  9 04:24:54 xyz kernel:  [<ffffffff8008ca21>] enqueue_task+0x41/0x56
> May  9 04:24:54 xyz kernel:  [<ffffffff8008ca8c>] __activate_task+0x56/0x6d
> May  9 04:24:54 xyz kernel:  [<ffffffff8008c871>] dequeue_task+0x18/0x37
> May  9 04:24:54 xyz kernel:  [<ffffffff80062ff8>] thread_return+0x62/0xfe
> May  9 04:24:55 xyz kernel:  [<ffffffff800a0abe>]
> autoremove_wake_function+0x0/0x2e
> May  9 04:24:55 xyz kernel:  [<ffffffff800a08a6>]
> keventd_create_kthread+0x0/0xc4
> May  9 04:24:56 xyz kernel:  [<ffffffff8021b824>] md_thread+0xf8/0x10e
> May  9 04:24:56 xyz kernel:  [<ffffffff800a08a6>]
> keventd_create_kthread+0x0/0xc4
> May  9 04:24:56 xyz kernel:  [<ffffffff8021b72c>] md_thread+0x0/0x10e
> May  9 04:24:56 xyz kernel:  [<ffffffff80032894>] kthread+0xfe/0x132
> May  9 04:24:56 xyz kernel:  [<ffffffff8005dfb1>] child_rip+0xa/0x11
> May  9 04:24:56 xyz kernel:  [<ffffffff800a08a6>]
> keventd_create_kthread+0x0/0xc4
> May  9 04:24:56 xyz kernel:  [<ffffffff80032796>] kthread+0x0/0x132
> May  9 04:24:56 xyz kernel:  [<ffffffff8005dfa7>] child_rip+0x0/0x11    

This is not a crash, it is a backtrace.  A backtrace is also printed whenever a crash happens, but it is always preceded by the reason for it; in a true crash that reason is some sort of fatal error, such as a segmentation violation (a segfault, fatal to the user-space process but harmless to the kernel itself) or a kernel oops (a general fatal kernel error, further qualified by its type: kernel segmentation violation, invalid instruction, etc.).  In this case the reason for the backtrace is the INFO: hung task message.  The backtrace itself is nothing more than a tracing tool used by various parts of the kernel so that the end user can figure out both why a certain kernel message was generated and in what process context, which is what lets the true cause be tracked down.  So here the kernel message itself told us the "why" (the process is supposedly hung), and the backtrace answered "in what context" (the md2_resync process).  Other messages from the md raid stack told us that the md2_resync process is being intentionally blocked so that it isn't running at the same time as other resync processes that need to access the same physical disk.  So the only real issue is that the method the md raid stack uses to block the other processes triggers a false positive on the hung-task test.  A different blocking method could avoid the false positive, but it would be a purely cosmetic change.  Hence the low priority.

Comment 13 GV 2010-06-01 18:52:22 UTC
I do not want to offend you, but please do not try to explain to me what /proc or a backtrace is. I have used Linux since 1994 (programming too), and over the years I have written some kernel modules (simple ones, and not too many) and patched many others for my own needs (OK, on 2.4). I do not want to pretend I'm an expert in kernel development, but still... Anyway, this does not matter.

The problem here is that, more than once over the years, /proc/mdstat told me everything was wonderful when it was not. I lost some data (again, a very long time ago and on 2.4; and yes, as far as I know there was no way to verify an array on 2.4), some people were angry with me, and so on.

I did not say anywhere "that is an oops". I said it is a crash. Maybe it was a misunderstanding, since my English is not that good.

I just wanted to be sure that my arrays are fine. If you say it is a backtrace and I should not worry about it, fine. I will not argue with you about this anymore.

But why did that process hang in the first place? Is it because the kernel runs 2 array resyncs in parallel all the time? How can I force synchronization of one array at a time?
No, I do not want to set hung_task_timeout_secs to 0, because it is not raid specific and if another process hangs I definitely want to see it.

And thanks for your time on this (not really a) bug.

Comment 14 Doug Ledford 2010-06-01 20:10:58 UTC
(In reply to comment #13)
> I do not want to offend you, but please do not try to explain to me what /proc
> or a backtrace is.

> I did not say anywhere "that is an oops". I say is a crash. Maybe it was an
> misunderstanding since my English is not that good.

It was a misunderstanding on two levels.  One, the word crash implies that something quit working or did not work.  Everything worked in this case; it just tripped a false positive on an informative test.  That is not a crash.  Two, despite the fact that you have used Linux for many years, you missed the fact that a backtrace is always preceded by the cause of the backtrace and by itself says nothing about what caused it to happen.  I was merely trying to explain the relationship between the preceding message and the backtrace: you *always* have to look at the message preceding a backtrace to know what it is about, and that preceding message in this case shows it was not a crash.

> I just wanted to be sure that my arrays are fine. If you say is a backtrace and
> I should not be worry about this - fine. I will not argue with you anymore
> about this.
> 
> But why that process hung in the first place? It's because the kernel runs 2
> resync arrays in parallel all the time? How can I force synchronization of one
> array at a time?

The process is hung because both resync processes are started at the same time; the raid stack figures out that they share physical disks, selects one process to continue, and blocks the other.  To force synchronization of the arrays one at a time, you would need to change the /etc/cron.weekly/99-raid-check script so that it doesn't start all of the resyncs simultaneously but instead runs them one at a time.

> No, I do not want to set hung_task_timeout_secs to 0. Because is not raid
> specific and if another process hung I definitely want to see that.
> 
> And thanks for your time for this (not really a) bug.

Comment 15 GV 2010-06-01 20:58:28 UTC
(In reply to comment #14)
> One, the word crash implies that
> something quit working or did not work.  Everything worked in this case, it
> just tripped a false positive on an informative test.  That is not a crash. 
OK, it's not a crash.

> Two, despite the fact that you have used Linux for many years, you missed the
> fact that a backtrace is always preceeded by the cause of the backtrace and by
> itself doesn't mean anything about what caused it to happen.
:-)
Again, I already know that. This is why I used the (wrong) word "crash" and not "oops". Anyway... forget it.

> The process is hung because both resync processes are started at the same time,
> the raid stack figures out that they share physical disks, and it selects one
> process to continue and blocks the other process.  To force syncronization of
> the arrays one at a time, you would need to change the
> /etc/cron.weekly/99-raid-check script so that it doesn't start all of the
> resyncs simultaneously but instead does them one at a time.
The file is not marked as %config(noreplace), so all my modifications will be lost if a new mdadm package is released.
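A quick way to confirm that on a given box, as a sketch (plain rpm queries; mdadm owning the script is implied above):

---------------
# Which package owns the cron script, and is it tracked as a config file?
rpm -qf /etc/cron.weekly/99-raid-check
rpm -qc mdadm | grep -F 99-raid-check \
    || echo "not marked %config: local edits are replaced on package update"
---------------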

Comment 16 Simon Matter 2010-06-02 06:40:37 UTC
Allow me to summarize this a bit.
- The kernel messages just _look_ bad, because a backtrace usually means something bad has happened. In this case it isn't.
- /proc/mdstat never lies, but don't forget that it only shows what the kernel knows at the current time. If a disk has, for example, bad blocks but the kernel hasn't touched them since they went bad, /proc/mdstat will still show that everything is fine.
- /etc/cron.weekly/99-raid-check runs to guard against exactly that problem. It mainly tries to detect disk errors so the user can act on them, which is exactly what 'hardware' raid controllers also do.
- The cosmetic errors are reported because /etc/cron.weekly/99-raid-check starts a resync on all raids at once. That's fine, since the kernel takes care of not running more than one sync per physical disk at a time, but it produces the cosmetic error. Unfortunately this simple approach is not optimal on servers with lots of disks: it simply starts too many syncs in parallel, and it would be nice if the number of parallel syncs could be limited somehow (a sketch of one possibility follows this comment). Maybe I just missed how to limit it; otherwise it has to be done in the 99-raid-check script, which makes it a bit complicated.

So really, there is no bug, but there is room for improvement.
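A minimal sketch of the "limit the number of parallel syncs" idea from the last point above; it is not the fix adopted anywhere in this bug, just an illustration. MAX_PARALLEL and the md0..md3 names are placeholders, while the sysfs paths are the ones 99-raid-check already uses.

---------------
#!/bin/sh
# Start at most MAX_PARALLEL checks at a time instead of kicking off every
# array at once (placeholder values below).
MAX_PARALLEL=2
check_list="md0 md1 md2 md3"

running=""
for dev in $check_list; do
    echo check > /sys/block/$dev/md/sync_action
    running="$running $dev"
    if [ `echo $running | wc -w` -ge $MAX_PARALLEL ]; then
        # Wait for the current batch to go idle before starting more.
        for r in $running; do
            while [ "`cat /sys/block/$r/md/sync_action`" != "idle" ]; do
                sleep 60
            done
        done
        running=""
    fi
done
# Wait for any checks still running from the final, smaller batch.
for r in $running; do
    while [ "`cat /sys/block/$r/md/sync_action`" != "idle" ]; do
        sleep 60
    done
done
---------------

Batching like this keeps at most MAX_PARALLEL checks active at once, at the cost of not starting the next array the instant one finishes.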

Comment 17 john.haxby@oracle.com 2010-07-20 09:25:37 UTC
See also bug 573106.

It might be cosmetic, but it's scaring a lot of people!

The fix to 99-raid-check is basically that instead of this:

----------------
devnum=0
for dev in $dev_list; do
    echo "${check[$devnum]}" > /sys/block/$dev/md/sync_action
    let devnum++
done
[ -z "$check_list" ] && exit 0

checking=1
while [ $checking -ne 0 ]
do
        sleep 60
        checking=0
        for dev in $check_list; do
                sync_action=`cat /sys/block/$dev/md/sync_action`
                if [ "$sync_action" != "idle" ]; then
                        checking=1
                fi
        done
done
---------------

you have this:

---------------
devnum=0
for dev in $dev_list; do
    echo "${check[$devnum]}" > /sys/block/$dev/md/sync_action
    while :; do
        sleep 60
        sync_action=`cat /sys/block/$dev/md/sync_action`
        if [ "$sync_action" == "idle" ]; then
            break
        fi
    done
    let devnum++
done
[ -z "$check_list" ] && exit 0
---------------

I don't have anything to test this on, I'm afraid, so I haven't been able to make sure that this is right.  However, I'm sure that someone would like to take this and make it into a proper patch!
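For anyone who does have a spare box, a hedged suggestion for exercising the change by hand; note it will genuinely start checks on the arrays.

---------------
sh -x /etc/cron.weekly/99-raid-check &
watch -n 10 cat /proc/mdstat
---------------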

I don't think it's worth modifying the script so that it only serialises those checks that need to be serialised (basically doing what the kernel does), as this just runs at about 4am once a week anyway.

Comment 18 john.haxby@oracle.com 2010-08-23 09:23:35 UTC
I've just run across another sysadmin who is seriously worried about this.  As I said, it may be a cosmetic false positive, but the kernel stack trace is causing severe difficulties for a lot of people.  I can't change the severity or priority of this bug, but I would urge someone to do so.

Comment 21 RHEL Program Management 2010-12-07 10:22:05 UTC
This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.6, and Red Hat does not plan to fix this issue in the currently developed update.

Contact your manager or support representative in case you need to escalate this bug.

Comment 22 RHEL Program Management 2011-06-20 21:49:33 UTC
This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.7, and Red Hat does not plan to fix this issue in the currently developed update.

Contact your manager or support representative in case you need to escalate this bug.

Comment 23 john.haxby@oracle.com 2011-06-21 08:07:58 UTC
This problem was fixed differently in bug 573106 (the processes in question no longer sleep in a way that triggers the hung task timeout).   So why did you not simply close this as a duplicate?

Comment 24 Jarod Wilson 2011-06-21 15:05:42 UTC
It's quite possible the connection between the two bugs was never made. I'll go ahead and close this one as a duplicate now.

*** This bug has been marked as a duplicate of bug 573106 ***

