Bug 1419796 - mdadm Floating point exception (core dumped)
Summary: mdadm Floating point exception (core dumped)
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: mdadm
Version: 25
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: XiaoNi
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-02-07 05:04 UTC by Dmitriy Degtyaryov
Modified: 2017-08-28 06:48 UTC
4 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2017-08-28 06:48:50 UTC
Type: Bug
Embargoed:



Description Dmitriy Degtyaryov 2017-02-07 05:04:39 UTC
Hello,

I created a RAID10 array:
# mdadm --create /dev/md127 --level=10  --raid-devices=4 /dev/sdc2 /dev/sdd2 /dev/sde2 /dev/sdf2

Some time later:
# mdadm --add /dev/md127 /dev/sda2 /dev/sdb2
# mdadm --grow --raid-devices=6 /dev/md127
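
For reference, a typical way to watch the reshape and wait for it to finish (these commands are illustrative and not taken from the original report):
# cat /proc/mdstat
# mdadm --wait /dev/md127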

After the reshape:
# mdadm --verbose --grow --size=max /dev/md127

As a result, the RAID10 array was successfully resized from 4 TB to 6 TB.
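
For reference, the new size could be confirmed with something like (illustrative, not from the original report):
# mdadm --detail /dev/md127 | grep 'Array Size'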

Some time later, in the logs:
# journalctl -xe
Feb 07 04:23:36 localhost systemd-coredump[2293]: Process 2290 (mdadm) of user 0 dumped core.
                                                  
                                                  Stack trace of thread 2290:
                                                  #0  0x00005623f3f09225 getinfo_super1 (mdadm)
                                                  #1  0x00005623f3ee1a20 guess_super_type (mdadm)
                                                  #2  0x00005623f3ee9b4e select_devices (mdadm)
                                                  #3  0x00005623f3eea88c Assemble (mdadm)
                                                  #4  0x00005623f3ed8185 main (mdadm)
                                                  #5  0x00007f2fd85ce401 __libc_start_main (libc.so.6)
                                                  #6  0x00005623f3ed946a _start (mdadm)
-- Subject: Process 2290 (mdadm) dumped core
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- Documentation: man:core(5)
-- 
-- Process 2290 (mdadm) crashed and dumped core.
-- 
-- This usually indicates a programming error in the crashing program and
-- should be reported to its vendor as a bug.

After a reboot the RAID10 array does not start. I tried:
# mdadm --assemble --scan
Floating point exception (core dumped)
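
For reference, the saved core dump can be examined via systemd-coredump with something like (illustrative, not part of the original report):
# coredumpctl info mdadm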

# mdadm --version
mdadm - v3.4 - 28th January 2016

# cat /etc/fedora-release 
Fedora release 25 (Twenty Five)

How can I recover the RAID10 array?

Comment 1 Dmitriy Degtyaryov 2017-02-07 09:24:53 UTC
I downloaded the latest Fedora 26 ISO and used mdadm version 4.0.

Tried to assemble:
# mdadm --assemble /dev/md127 /dev/sd[abcdef]2

From dmesg:
bitmap chunk size too small

I made sure all of the disks have the "clean" state:
#  mdadm --examine /dev/sd[abcdef]2 | grep 'State'
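
For reference, the bitmap superblock itself can also be inspected (illustrative command, not part of the original comment):
# mdadm --examine-bitmap /dev/sda2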

Forced the array to run without the bitmap:
# mdadm --assemble --update no-bitmap /dev/md127 /dev/sd[abcdef]2

Wrote a new bitmap:
# mdadm --grow --bitmap=internal /dev/md127

My mistake was that I did not remove the bitmap before the resize.
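
For reference, the usual sequence would be to drop the internal bitmap before the resize and re-add it afterwards (illustrative commands using the device name from this report):
# mdadm --grow --bitmap=none /dev/md127
# mdadm --verbose --grow --size=max /dev/md127
# mdadm --grow --bitmap=internal /dev/md127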

But the segmentation fault with mdadm version 3.4 is very bad.

Comment 2 XiaoNi 2017-02-07 09:28:18 UTC
Hi Dmitriy

Let me confirm one thing: your RAID is now running normally again with f26, right? I hope you didn't lose any data.

And yes, if it's a bug in f25 we can try to fix it.

Thanks
Xiao

Comment 3 Dmitriy Degtyaryov 2017-02-07 13:14:44 UTC
The data is not lost, because it is RAID10 and at the time of the reboot all the drives were synchronized. Only the bitmap was bad.

My RAID10 is now working on Fedora 25.

mdadm version 3.4 on Fedora 25 crashes with a segmentation fault when the bitmap is broken, and you cannot fix it.
mdadm version 4.0 on Fedora 26 works, reports the problem, and makes it possible to fix it.

I think the simplest and best solution is to upgrade mdadm to version 4.0 for Fedora 25.

Comment 4 XiaoNi 2017-08-03 03:21:39 UTC
Hi Dmitriy

Sorry for the late response. 

(In reply to Dmitriy Degtyaryov from comment #3)
> The data is not lost, because it is RAID10 and at the time of the reboot
> all the drives were synchronized. Only the bitmap was bad.

How do you know the bitmap is bad? Can I reproduce this problem in my environment? 

> 
> My RAID10 is now working on Fedora 25.
I'm glad to hear this. 
So you upgraded mdadm to 4.0 in fedora 25 and it worked, right?

> 
> mdadm version 3.4 on Fedora 25 crashes with a segmentation fault when the
> bitmap is broken, and you cannot fix it.
> mdadm version 4.0 on Fedora 26 works, reports the problem, and makes it
> possible to fix it.

It works in f26, but it reports this problem. What is the problem, the "Floating point exception"?
But in comment 0 the problem happened in f25 with mdadm 3.4.

> 
> I think the simplest and best solution is to upgrade mdadm to version 4.0
> for Fedora 25.

Hmm, I'll try this and do some tests first. 

Thanks
Xiao

Comment 5 XiaoNi 2017-08-28 06:48:50 UTC
The commands in comment 0 don't create an array with a bitmap.

I ran the following two test sequences in Fedora 25 and couldn't reproduce this.
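
For reference, the loop devices used in these tests could be prepared along these lines (file names and sizes are illustrative, not part of the original comment):

for i in 0 1 2 3 4 5; do truncate -s 1G /tmp/disk$i.img; losetup /dev/loop$i /tmp/disk$i.img; done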

mdadm -CR /dev/md0 -l10 -n4 /dev/loop[0-3]
mdadm --wait /dev/md0
mdadm /dev/md0 -a /dev/loop4 /dev/loop5
mdadm /dev/md0 --grow --raid-devices=4
mdadm /dev/md0 --grow --raid-devices=6
mdadm /dev/md0 --grow --size=max

mdadm -CR /dev/md0 -l10 -n4 /dev/loop[0-3] --bitmap=internal
mdadm --wait /dev/md0
mdadm /dev/md0 -a /dev/loop4 /dev/loop5
mdadm /dev/md0 --grow --raid-devices=6
mdadm /dev/md0 --grow --size=max

Closing this for now.

Thanks
Xiao

