Description of problem:

Trying to set up a new system using the F10 live CD. Two SATA drives are to be arranged as follows:

sda ----+---->sda1----->/boot
        +---->sda2----->software RAID mirror part 1 (md1)

sdb ----+---->sdb1----->/bootspare
        +---->sdb2----->software RAID mirror part 2 (md1)

The software RAID mirror will be given over to LVM, which will then provide swap, root, etc. This works; however, I had some trouble with other setups (see bug #478751), so I decided on the present one. It should be straightforward, but surprises await.

Anaconda sets up the partitions and starts formatting, but then breaks off with the message:

"An error occurred trying to format sdb1. This problem is serious, and the install cannot continue. Press <Enter> to exit the installer."

This is not very informative (could a perror() call help?), but looking at anaconda's log file, the last command seems to be:

21:53:23 INFO : formatting /bootspare as ext3
21:53:23 INFO : Format command: ['mke2fs', '/dev/sdb1', '-t', 'ext3']

Trying mke2fs /dev/sdb1 -t ext3 by hand:

[root@localhost ~]# mke2fs /dev/sdb1 -t ext3
mke2fs 1.41.3 (12-Oct-2008)
/dev/sdb1 is apparently in use by the system; will not make a filesystem here!

The first time I saw this, I thought there might be some RAID magic left over from an earlier installation, so I tried this:

dd if=/dev/zero of=/dev/sdb1 count=200

But it didn't help. The status of the software RAID md1 (there is no md0) is:

---------------------
[root@localhost ~]# mdadm --detail /dev/md1
/dev/md1:
        Version : 0.90
  Creation Time : Sun Jan  4 18:49:07 2009
     Raid Level : raid1
     Array Size : 156087424 (148.86 GiB 159.83 GB)
  Used Dev Size : 156087424 (148.86 GiB 159.83 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Sun Jan  4 21:56:58 2009
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : 52de844c:d5bac52d:7e2a17b6:9576b63d
         Events : 0.12

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       18        1      active sync   /dev/sdb2
---------------------

Version-Release number of selected component (if applicable):
anaconda-11.4.1.62-1.x86_64

How reproducible:
Encountered twice in a row, with a reboot in between.
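(For anyone hitting the same refusal: "apparently in use by the system" usually means some kernel subsystem has already claimed the device. Two checks that would have shortened the hunt here, offered as a sketch rather than as part of the original session:

cat /proc/mdstat
mdadm --examine /dev/sdb1

/proc/mdstat shows whether sdb1 is currently bound into any md array; mdadm --examine reads the member superblock straight off the partition, so it reveals leftover RAID metadata even when no array is assembled.)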
Created attachment 328153 [details] Anaconda's log
Created attachment 328154 [details] Anaconda's not-very-helpful message ("Computer says no")
Tried again, leaving /dev/sdb1 alone, with no reboot in between. The installation seems to have worked this time.
Installation completed, but the resulting RAID array is degraded from the very first boot, for unknown reasons:

Jan  4 23:48:56 gefjun kernel: md: raid6 personality registered for level 6
Jan  4 23:48:56 gefjun kernel: md: raid5 personality registered for level 5
Jan  4 23:48:56 gefjun kernel: md: raid4 personality registered for level 4
Jan  4 23:48:56 gefjun kernel: md: md1 stopped.
Jan  4 23:48:56 gefjun kernel: md: bind<sda2>
Jan  4 23:48:56 gefjun kernel: md: bind<sdb2>
Jan  4 23:48:56 gefjun kernel: md: kicking non-fresh sda2 from array!
Jan  4 23:48:56 gefjun kernel: md: unbind<sda2>
Jan  4 23:48:56 gefjun kernel: md: export_rdev(sda2)
Jan  4 23:48:56 gefjun kernel: raid1: raid set md1 active with 1 out of 2 mirrors

Ok, so that is ANOTHER problem. On the other hand, the system still holds on to /dev/sdb1 (how exactly, I don't know; it certainly isn't mounted):

[root@gefjun log]# mke2fs /dev/sdb1 -t ext3
mke2fs 1.41.3 (12-Oct-2008)
/dev/sdb1 is apparently in use by the system; will not make a filesystem here!

[root@gefjun log]# fdisk -l

Disk /dev/sda: 160.0 GB, 160041885696 bytes
255 heads, 63 sectors/track, 19457 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x000c622d

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          25      200781   83  Linux
/dev/sda2              26       19457   156087540   fd  Linux raid autodetect

Disk /dev/sdb: 160.0 GB, 160041885696 bytes
255 heads, 63 sectors/track, 19457 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x9cd19cd1

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1          25      200781   83  Linux
/dev/sdb2              26       19457   156087540   fd  Linux raid autodetect
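(An aside on "kicking non-fresh sda2": it means the two members' superblocks disagree on their event counters, so the kernel drops the member that missed updates. A quick way to confirm, as a sketch:

mdadm --examine /dev/sda2 | grep -i events
mdadm --examine /dev/sdb2 | grep -i events

The member with the lower event count was not written at the last clean array update and will be refused at assembly time.)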
Looking further, one half of the RAID seems to have somehow been shanghaied into a RAID array that no-one ordered. I wonder how that is possible:

[root@gefjun log]# mdadm --detail /dev/md_d1
mdadm: Unknown keyword mdadm:
/dev/md_d1:
        Version : 0.90
  Creation Time : Mon Jan  5 00:49:07 2009
     Raid Level : raid1
     Array Size : 156087424 (148.86 GiB 159.83 GB)
  Used Dev Size : 156087424 (148.86 GiB 159.83 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Mon Jan  5 03:36:04 2009
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : 52de844c:d5bac52d:7e2a17b6:9576b63d
         Events : 0.10

    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       0        0        1      removed

But enough of this!

[root@gefjun log]# mdadm --manage --stop /dev/md_d1
mdadm: Unknown keyword mdadm:
mdadm: stopped /dev/md_d1

[root@gefjun log]# mdadm --manage /dev/md1 --add /dev/sda2
mdadm: Unknown keyword mdadm:
mdadm: re-added /dev/sda2

[root@gefjun log]# mdadm --detail /dev/md1
mdadm: Unknown keyword mdadm:
/dev/md1:
        Version : 0.90
  Creation Time : Mon Jan  5 00:49:07 2009
     Raid Level : raid1
     Array Size : 156087424 (148.86 GiB 159.83 GB)
  Used Dev Size : 156087424 (148.86 GiB 159.83 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Sun Jan  4 23:39:08 2009
          State : active, degraded, recovering
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1

 Rebuild Status : 0% complete

           UUID : 52de844c:d5bac52d:7e2a17b6:9576b63d
         Events : 0.2633

    Number   Major   Minor   RaidDevice State
       2       8        2        0      spare rebuilding   /dev/sda2
       1       8       18        1      active sync   /dev/sdb2
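(Two side notes, not from the original session: the recurring "mdadm: Unknown keyword" line means /etc/mdadm.conf contains a line mdadm cannot parse; it is cosmetic here, but worth cleaning up. And the rebuild started above can be watched with:

cat /proc/mdstat
mdadm --detail /dev/md1 | grep -i 'rebuild status'

Both report the resync progress as a percentage until the mirror is whole again.)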
I wonder if this error message from the log file has something to do with it:

21:50:57 ERROR : raid set inconsistency for md1: all drives in this raid set do not agree on raid parameters.  Skipping raid device
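(One way to test that hypothesis, as a sketch: dump both member superblocks and compare them directly.

mdadm --examine /dev/sda2
mdadm --examine /dev/sdb2

If the two superblocks disagree on fields such as UUID, Raid Level, or Array Size, the set is inconsistent, which would match the "do not agree on raid parameters" message.)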
Update on this issue. Even though the machine is now happily running, /dev/sdb1 still cannot be formatted:

[root@gefjun ~]# mke2fs /dev/sdb1 -t ext3
mke2fs 1.41.3 (12-Oct-2008)
/dev/sdb1 is apparently in use by the system; will not make a filesystem here!

A hexdump shows zeros up to 0x19000, followed by leftover data from an earlier installation (it matches the old /boot partition):

[root@gefjun ~]# hexdump -C /dev/sdb1
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00019000  64 20 00 00 64 60 00 00  64 a0 00 00 64 e0 00 00  |d ..d`..d...d...|
00019010  64 20 01 00 00 00 00 00  00 00 00 00 00 00 00 00  |d ..............|
00019020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00019400  65 20 00 00 65 60 00 00  65 a0 00 00 65 e0 00 00  |e ..e`..e...e...|
00019410  65 20 01 00 00 00 00 00  00 00 00 00 00 00 00 00  |e ..............|
00019420  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00019800  66 20 00 00 66 60 00 00  66 a0 00 00 66 e0 00 00  |f ..f`..f...f...|
00019810  66 20 01 00 00 00 00 00  00 00 00 00 00 00 00 00  |f ..............|
00019820  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00019c00  67 20 00 00 67 60 00 00  67 a0 00 00 67 e0 00 00  |g ..g`..g...g...|
00019c10  67 20 01 00 00 00 00 00  00 00 00 00 00 00 00 00  |g ..............|
00019c20  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
0001a000  68 20 00 00 68 60 00 00  68 a0 00 00 68 e0 00 00  |h ..h`..h...h...|
0001a010  68 20 01 00 00 00 00 00  00 00 00 00 00 00 00 00  |h ..............|
0001a020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*

Looking at /var/log/messages:

Jan 11 14:16:35 gefjun kernel: sdb: sdb1 sdb2
Jan 11 14:16:35 gefjun kernel: sd 1:0:0:0: [sdb] Attached SCSI disk
Jan 11 14:16:35 gefjun kernel: sd 1:0:0:0: Attached scsi generic sg1 type 0
....
Jan 11 14:16:35 gefjun kernel: md: bind<sdb1>
...

So the md devices seem to like sdb1. Grepping just the messages containing "md":

Jan 11 14:16:35 gefjun kernel: md: raid1 personality registered for level 1
Jan 11 14:16:35 gefjun kernel: md: raid6 personality registered for level 6
Jan 11 14:16:35 gefjun kernel: md: raid5 personality registered for level 5
Jan 11 14:16:35 gefjun kernel: md: raid4 personality registered for level 4
Jan 11 14:16:35 gefjun kernel: md: md1 stopped.
Jan 11 14:16:35 gefjun kernel: md: bind<sdb2>
Jan 11 14:16:35 gefjun kernel: md: bind<sda2>
Jan 11 14:16:35 gefjun kernel: raid1: raid set md1 active with 2 out of 2 mirrors
Jan 11 14:16:35 gefjun kernel: scsi4 : pata_amd
Jan 11 14:16:35 gefjun kernel: scsi5 : pata_amd
Jan 11 14:16:35 gefjun kernel: ata5: PATA max UDMA/133 cmd 0x1f0 ctl 0x3f6 bmdma 0xffa0 irq 14
Jan 11 14:16:35 gefjun kernel: ata6: PATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0xffa8 irq 15
Jan 11 14:16:35 gefjun kernel: md: bind<sdb1>
Jan 11 14:16:35 gefjun kernel: md: md_d1 stopped.

I will try to zero the whole partition, not only the start, then reboot.
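(In hindsight, mdadm has a tool aimed at exactly this situation, which would be gentler than zeroing the whole partition. A sketch, assuming the stale metadata is an md superblock and that whatever array has claimed the partition is stopped first:

mdadm --stop /dev/md_d1
mdadm --zero-superblock /dev/sdb1

--zero-superblock erases only the md superblock, after which the kernel should no longer bind the partition into an array at boot.)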
[root@gefjun ~]# dd if=/dev/zero of=/dev/sdb1
dd: writing to `/dev/sdb1': No space left on device
401563+0 records in
401562+0 records out
205599744 bytes (206 MB) copied, 1.2918 s, 159 MB/s

[root@gefjun ~]# hexdump -C /dev/sdb1
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
0c413400
Ok -- that did it. "mke2fs /dev/sdb1 -t ext3" now works. Does the software RAID depend on what's at the _end_ of the partition?

The "md" messages from /var/log/messages:

Jan 11 15:43:24 gefjun kernel: md: raid1 personality registered for level 1
Jan 11 15:43:24 gefjun kernel: md: raid6 personality registered for level 6
Jan 11 15:43:24 gefjun kernel: md: raid5 personality registered for level 5
Jan 11 15:43:24 gefjun kernel: md: raid4 personality registered for level 4
Jan 11 15:43:24 gefjun kernel: md: md1 stopped.
Jan 11 15:43:24 gefjun kernel: md: bind<sdb2>
Jan 11 15:43:24 gefjun kernel: md: bind<sda2>
Jan 11 15:43:24 gefjun kernel: raid1: raid set md1 active with 2 out of 2 mirrors
Jan 11 15:43:24 gefjun kernel: scsi4 : pata_amd
Jan 11 15:43:24 gefjun kernel: scsi5 : pata_amd
Jan 11 15:43:24 gefjun kernel: ata5: PATA max UDMA/133 cmd 0x1f0 ctl 0x3f6 bmdma 0xffa0 irq 14
Jan 11 15:43:24 gefjun kernel: ata6: PATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0xffa8 irq 15
Jan 11 15:43:24 gefjun kernel: md: md_d1 stopped.

Note that "md: bind<sdb1>" no longer appears.
Apparently, RAID metadata is indeed stored towards the end of the partition.

To conclude: perhaps anaconda should offer a red-button "partition destruction" option that completely erases a partition, as a last resort.
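(Two editorial notes, from general md knowledge rather than from this report: version 0.90 md superblocks live in the last 64 KiB of the member device, at an offset rounded down to a 64 KiB boundary, which is exactly why zeroing only the first sectors changed nothing. A sketch for locating the superblock by hand, assuming the classic 0.90 layout:

SIZE=$(blockdev --getsize64 /dev/sdb1)        # partition size in bytes
OFFSET=$(( SIZE / 65536 * 65536 - 65536 ))    # 64 KiB back from the last 64 KiB boundary
# the md magic 0xa92b4efc should appear as "fc 4e 2b a9" on a little-endian machine
dd if=/dev/sdb1 bs=512 skip=$(( OFFSET / 512 )) count=1 2>/dev/null | hexdump -C | head -2

As for a "partition destruction" button: newer util-linux releases ship wipefs(8), which finds and erases filesystem and RAID signatures wherever they are stored, e.g. "wipefs -a /dev/sdb1"; I do not believe it was available in F10.)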
We have made extensive changes to the partitioning code for the F11 beta, to the point that it is very difficult to tell whether your bug is still relevant. Please test with either the latest Rawhide you have access to, or with F11, and let us know whether you are still seeing this problem. In particular, I know I have seen patches go in to address the metadata persistence problem, and there has been a whole lot of RAID work recently, so I suspect this problem should be fixed. Thanks for the bug report.