Bug 1292871 - RHEL-6.7 Server i386 fails to format 36 JBODS
Status: CLOSED INSUFFICIENT_DATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: e2fsprogs
Version: 6.7
Hardware: i386
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Eric Sandeen
QA Contact: Filesystem QE
Docs Contact:
Depends On:
Blocks:
Reported: 2015-12-18 10:27 EST by andrew
Modified: 2016-08-03 10:10 EDT (History)
4 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-07-05 15:43:08 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description andrew 2015-12-18 10:27:51 EST
Description of problem:

 RHEL-6.7 Server i386 fails to format 36 JBODS

Version-Release number of selected component (if applicable):

RHEL-6.7 Server i386

How reproducible: 100%


Steps to Reproduce:
1. Install rhel6.7 via kickstart with 36 JBODS

Actual results:
Creating device /dev/sdx1 
Creating physical volume (LVM) on /dev/sdx1 
In progress 
 
Creating device /dev/sdy1 
Creating physical volume (LVM) on /dev/sdy1 
In progress 
 
Creating device /dev/sdz1 
Creating physical volume (LVM) on /dev/sdz1 
In progress 
 
Creating device /dev/mapper/vg_gsper62001 
In progress 
[-- MARK -- Wed Dec  2 17:55:00 2015] 
 
Creating device /dev/mapper/vg_gsper62001-lv_root 
In progress 
 
Creating ext4 filesystem on /dev/mapper/vg_gsper62001-lv_root 
In progress 
 
Creating device /dev/mapper/vg_gsper62001-lv_home 
In progress 
 
Creating ext4 filesystem on /dev/mapper/vg_gsper62001-lv_home 
In progress 
[-- MARK -- Wed Dec  2 18:00:00 2015] 
 
Command line mode requires all choices to be specified in a kickstart configuration file. 
Formatting Failed 
An error was encountered while formatting device /dev/mapper/vg_gsper62001-lv_home. 
 
format failed: 1 
('custom', ['_File Bug', '_Exit installer']) 

Expected results:

The installer should not fail.

Additional info:


x86_64 does not have this problem
Comment 7 Brian Lane 2015-12-21 17:03:22 EST
Please attach program.log

17:39:15,793 WARNING kernel:scsi target7:0:33: predicted fault

in syslog, not sure if that is a problem or not.
Comment 9 andrew 2015-12-21 20:03:03 EST
(In reply to Brian Lane from comment #7)
> Please attach program.log
> 
> 17:39:15,793 WARNING kernel:scsi target7:0:33: predicted fault
> 
> in syslog, not sure if that is a problem or not.

I would also like to note that this system will install RHEL-6.7 Server x86_64 with the same install options and base kickstart without a problem. Only i386 is having this problem.
Comment 10 Brian Lane 2015-12-22 11:08:52 EST
12:57:18,795 INFO    : Running... ['mke2fs', '-t', 'ext4', '/dev/mapper/vg_gsper62001-lv_root']
12:57:18,806 INFO    : Filesystem label=
12:57:18,807 INFO    : OS type: Linux
12:57:18,808 INFO    : Block size=4096 (log=2)
12:57:18,808 INFO    : Fragment size=4096 (log=2)
12:57:18,809 INFO    : Stride=0 blocks, Stripe width=0 blocks
12:57:18,810 INFO    : 3276800 inodes, 13107200 blocks
12:57:18,811 INFO    : 655360 blocks (5.00%) reserved for the super user
12:57:18,811 INFO    : First data block=0
12:57:18,812 INFO    : Maximum filesystem blocks=0
12:57:18,813 INFO    : 400 block groups
12:57:18,814 INFO    : 32768 blocks per group, 32768 fragments per group
12:57:18,814 INFO    : 8192 inodes per group
12:57:18,815 INFO    : Superblock backups stored on blocks: 
12:57:18,816 INFO    : 	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
12:57:18,817 INFO    : 	4096000, 7962624, 11239424
12:57:18,817 INFO    : 
12:57:23,229 INFO    : Writing inode tables:  

...

done                            
13:02:11,967 INFO    : Creating journal (32768 blocks): 
13:02:11,967 ERROR   : mke2fs 1.41.12 (17-May-2010)
13:02:11,967 ERROR   : mke2fs: File exists 
13:02:11,967 ERROR   : 	while trying to create journal


So something went wrong with the journal creation.
Comment 11 Eric Sandeen 2016-01-04 19:51:05 EST
(In reply to Brian Lane from comment #10)
> 12:57:18,795 INFO    : Running... ['mke2fs', '-t', 'ext4',
> '/dev/mapper/vg_gsper62001-lv_root']

<too-big snip>

> ...
> 
> done                            
> 13:02:11,967 INFO    : Creating journal (32768 blocks): 
> 13:02:11,967 ERROR   : mke2fs 1.41.12 (17-May-2010)
> 13:02:11,967 ERROR   : mke2fs: File exists 
> 13:02:11,967 ERROR   : 	while trying to create journal
> 
> 
> So something went wrong with the journal creation.

But not on vg_gsper62001-lv_root; that succeeded:

> 12:57:24,402 INFO    : Creating journal (32768 blocks): done

the failure was from 

> 12:57:54,365 INFO    : Running... ['mke2fs', '-t', 'ext4', '/dev/mapper/vg_gsper62001-lv_home']

which failed:

> 13:02:11,967 ERROR   : mke2fs: File exists 

The early output from mkfs was:

> 12:57:55,232 INFO    : Filesystem label=
> 12:57:55,232 INFO    : OS type: Linux
> 12:57:55,233 INFO    : Block size=4096 (log=2)
> 12:57:55,233 INFO    : Fragment size=4096 (log=2)
> 12:57:55,234 INFO    : Stride=0 blocks, Stripe width=0 blocks
> 12:57:55,234 INFO    : 1073741824 inodes, 4294967295 blocks

so it's a maximally-sized ext4 filesystem, with 2^32-1 fs blocks, or just shy of 16T... but doing that here works fine.
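As a sanity check on those figures (editor's sketch; plain arithmetic over the mke2fs output quoted above and in comment #12):

```python
import math

# Figures from the mke2fs output for lv_home, quoted above.
blocks = 4294967295           # fs blocks reported by mke2fs
block_size = 4096             # bytes per block
inodes = 1073741824
blocks_per_group = 32768

size_bytes = blocks * block_size
print(size_bytes)                      # 17592186040320 -- exactly 4096 bytes shy of 2**44 (16 TiB)
print(blocks == 2**32 - 1)             # True: the maximal 32-bit block count

groups = math.ceil(blocks / blocks_per_group)
print(groups)                          # 131072 block groups, matching comment #12
print(inodes // groups)                # 8192 inodes per group, matching the output
```

So the filesystem really is at the 2^32-1 block ceiling, which is why a 32-bit truncation anywhere in the byte-offset arithmetic is a plausible suspect.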

Oh, but here is x86_64.  Sigh, a 32-bit problem I guess.
Comment 12 Eric Sandeen 2016-01-04 23:42:02 EST
Testing this against a 16T file on a 32-bit box yields no problems:

# file sbin/mke2fs
sbin/mke2fs: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped

# sbin/mke2fs -t ext4 /mnt/test2/testfile
mke2fs 1.41.12 (17-May-2010)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
1073741824 inodes, 4294967295 blocks
214748364 blocks (5.00%) reserved for the super user
First data block=0
131072 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks: 
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
	4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, 
	102400000, 214990848, 512000000, 550731776, 644972544, 1934917632, 
	2560000000, 3855122432

Writing inode tables: done                            
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 30 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
You have new mail in /var/spool/mail/root

so I wonder if it's possible that this is an LVM problem.  Please include information which shows us how /dev/mapper/vg_gsper62001-lv_home was assembled.

Thanks,
-Eric
Comment 13 Eric Sandeen 2016-01-06 12:15:31 EST
On this same box, I did a pvcreate of /dev/sd[b-z], made one big VG out of it, and a same-sized-as-kickstart (16T) LV mkfs'd without any problems.  So something different/strange/unexpected is going on in the kickstart environment.

#anaconda tells me it's possible to enable ssh to the installer environment; if you can do that and rerun the failing kickstart maybe I can look further at the state of things.
Comment 14 Eric Sandeen 2016-01-06 14:31:16 EST
Ok, was able to log into the install session post-failure.  The home lv dm table looks like:

-bash-4.1# dmsetup table
vg_gsper62001-lv_home: 0 3907018752 linear 65:161 2048
vg_gsper62001-lv_home: 3907018752 3907018752 linear 65:177 2048
vg_gsper62001-lv_home: 7814037504 3907018752 linear 65:193 2048
vg_gsper62001-lv_home: 11721056256 3907018752 linear 65:209 2048
vg_gsper62001-lv_home: 15628075008 3907018752 linear 65:225 2048
vg_gsper62001-lv_home: 19535093760 3907018752 linear 65:241 2048
vg_gsper62001-lv_home: 23442112512 3907018752 linear 66:1 2048
vg_gsper62001-lv_home: 27349131264 3907018752 linear 66:17 2048
vg_gsper62001-lv_home: 31256150016 3103588352 linear 66:33 2048
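Summing the segment lengths from that table (editor's sketch; dmsetup lengths are in 512-byte sectors) shows the LV is exactly 16 TiB, so the 2^32-1-block filesystem only just fits:

```python
# Segment lengths, in 512-byte sectors, from the dmsetup table above:
# eight full 3907018752-sector segments plus one 3103588352-sector tail.
segments = [3907018752] * 8 + [3103588352]

total_sectors = sum(segments)
total_bytes = total_sectors * 512
print(total_sectors)             # 34359738368 sectors
print(total_bytes == 2**44)      # True: exactly 16 TiB
```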

Will keep investigating.
Comment 15 Eric Sandeen 2016-01-06 18:28:10 EST
Hm, while testing I get:

scsi target7:0:33: predicted fault
scsi target7:0:33: predicted fault
EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: 
SELinux: initialized (dev dm-0, type ext4), uses xattr
scsi target7:0:33: predicted fault
sd 7:0:33:0: [sdag]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 7:0:33:0: [sdag]  Sense Key : Hardware Error [current] 
Info fld=0xe67dd15b
sd 7:0:33:0: [sdag]  Add. Sense: No defect spare location available
sd 7:0:33:0: [sdag] CDB: Read(10): 28 00 e6 7d d0 e8 00 00 80 00
end_request: critical target error, dev sdag, sector 3867005160
Buffer I/O error on device dm-1, logical block 3413639197
Buffer I/O error on device dm-1, logical block 3413639198
Buffer I/O error on device dm-1, logical block 3413639199
Buffer I/O error on device dm-1, logical block 3413639200
Buffer I/O error on device dm-1, logical block 3413639201
Buffer I/O error on device dm-1, logical block 3413639202
Buffer I/O error on device dm-1, logical block 3413639203
Buffer I/O error on device dm-1, logical block 3413639204
Buffer I/O error on device dm-1, logical block 3413639205
sd 7:0:33:0: [sdag]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 7:0:33:0: [sdag]  Sense Key : Hardware Error [current] 
Info fld=0xe67dd16a
sd 7:0:33:0: [sdag]  Add. Sense: No defect spare location available
sd 7:0:33:0: [sdag] CDB: Read(10): 28 00 e6 7d d1 68 00 01 00 00
end_request: critical target error, dev sdag, sector 3867005288
sd 7:0:33:0: [sdag]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 7:0:33:0: [sdag]  Sense Key : Hardware Error [current] 
Info fld=0xe6bdd176
sd 7:0:33:0: [sdag]  Add. Sense: No defect spare location available
sd 7:0:33:0: [sdag] CDB: Read(10): 28 00 e6 bd d1 68 00 01 00 00
end_request: critical target error, dev sdag, sector 3871199592
__ratelimit: 39 callbacks suppressed


dm-1 is the home lv.  however, sdag (66:0) doesn't show up in the dmsetup table so I'm a little confused.

Anyway, I wonder what happens if you exclude sdag from the install.  Probably not the issue, but storage errors always raise red flags.  I'll let you know when I'm done w/ current testing if we want to try this.

Another datapoint, if we enable lazy inode table initialization, the problem doesn't happen.  Not sure what to make of that.

-Eric
Comment 16 Eric Sandeen 2016-01-06 18:53:13 EST
Ok, punting this to devicemapper.

xfs_io can write, and show, patterns:

-bash-4.1# /root/xfs_io -f -c "pwrite 0 16m" testfile
wrote 16777216/16777216 bytes at offset 0
16 MiB, 4096 ops; 0.0000 sec (990.651 MiB/sec and 253606.5878 ops/sec)
-bash-4.1# /root/xfs_io -c "pread -v 0 16m" testfile | more                          
00000000:  cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd  ................
00000010:  cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd  ................
00000020:  cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd  ................
00000030:  cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd  ................
00000040:  cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd  ................
00000050:  cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd  ................
...

What if we point it at the device (which I previously formatted with xfs)?

-bash-4.1# /root/xfs_io -c "pwrite 0 16m" /dev/mapper/vg_gsper62001-lv_home 
wrote 16777216/16777216 bytes at offset 0
16 MiB, 4096 ops; 0.0000 sec (918.801 MiB/sec and 235213.0470 ops/sec)
-bash-4.1# sync
-bash-4.1# /root/xfs_io -c "pread -v 0 16m" /dev/mapper/vg_gsper62001-lv_home  | more
00000000:  58 46 53 42 00 00 10 00 00 00 00 00 ff ff ff f0  XFSB............
00000010:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000020:  f6 1c b9 8b 0a 09 4e 06 9a 31 f1 e3 af 7d 4b e0  ......N..1....K.
00000030:  00 00 00 00 80 00 00 05 00 00 00 00 00 00 00 60  ................
00000040:  00 00 00 00 00 00 00 61 00 00 00 00 00 00 00 62  .......a.......b
00000050:  00 00 00 01 0f ff ff ff 00 00 00 10 00 00 00 00  ................
00000060:  00 07 f6 00 b4 a5 02 00 02 00 00 08 00 00 00 00  ................

Um, why didn't it write any data?

OTOH, how did the xfs data get there?  (mkfs.xfs uses DIO...)

If I do the pwrite w/ DIO, it works:

-bash-4.1# /root/xfs_io -d -c "pwrite 0 16m" /dev/mapper/vg_gsper62001-lv_home       
...
-bash-4.1# /root/xfs_io -c "pread -v 0 16m" /dev/mapper/vg_gsper62001-lv_home  | more
00000000:  cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd  ................
00000010:  cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd  ................
00000020:  cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd  ................


what the heck?  Some kind of vm/pagecache bug...?
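The tell in that hex dump is the magic at offset 0: the read-back is the old XFS superblock, not the freshly written 0xcd pattern. A quick check (editor's sketch) that the first bytes of the dump really are the XFS magic:

```python
# First 16 bytes of the pread dump above, as printed by xfs_io.
first_line = "58 46 53 42 00 00 10 00 00 00 00 00 ff ff ff f0"
data = bytes.fromhex(first_line.replace(" ", ""))

print(data[:4])                  # b'XFSB' -- the XFS superblock magic
print(data[:4] == b"XFSB")       # True
```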
Comment 17 Eric Sandeen 2016-01-06 18:54:33 EST
(actually I did not punt it to devicemapper, dm probably does not care one bit about buffered vs. DIO... not an e2fsprogs bug but not sure where it goes, yet)
Comment 18 Eric Sandeen 2016-01-07 11:27:28 EST
Regarding comment #15, the failing disk is indeed part of the LV; Jeff reminded me that we partition disks for PVs, oops:

-bash-4.1# pvs | grep sdag
  /dev/sdag1 vg_gsper62001 lvm2 a--    1.82t      0 
-bash-4.1# ls -l /dev/sdag1
brw-rw----. 1 root disk 66, 1 Jan  6 14:17 /dev/sdag1
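Walking the first failing dm-1 block from comment #15 back through the dmsetup table in comment #14 confirms it lands on sdag1 (editor's sketch; the 2048-sector partition start for sdag1 is an assumption based on the usual 1 MiB alignment, but it makes the numbers line up exactly):

```python
# dm linear table for lv_home from comment #14:
# (lv_start_sector, length, major:minor, device_offset)
table = [
    (0,           3907018752, "65:161", 2048),
    (3907018752,  3907018752, "65:177", 2048),
    (7814037504,  3907018752, "65:193", 2048),
    (11721056256, 3907018752, "65:209", 2048),
    (15628075008, 3907018752, "65:225", 2048),
    (19535093760, 3907018752, "65:241", 2048),
    (23442112512, 3907018752, "66:1",   2048),   # 66:1 == /dev/sdag1 (see above)
    (27349131264, 3907018752, "66:17",  2048),
    (31256150016, 3103588352, "66:33",  2048),
]

# First failing 4 KiB logical block on dm-1, from comment #15.
logical_block = 3413639197
lv_sector = logical_block * 8            # 8 x 512-byte sectors per 4 KiB block

for start, length, dev, offset in table:
    if start <= lv_sector < start + length:
        part_sector = lv_sector - start + offset
        print(dev, part_sector)          # 66:1 -- i.e. sdag1
        break

# Assuming sdag1 begins at sector 2048 of sdag, the whole-disk sector is:
print(part_sector + 2048)                # 3867005160, matching "dev sdag, sector 3867005160"
```

So the buffer I/O errors on dm-1 and the critical target errors on sdag in comment #15 are the same event, seen at two layers.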
Comment 19 Mike Snitzer 2016-01-07 13:19:33 EST
Comment #15 clearly shows SCSI IO errors; those IO errors _should_ be felt by the application if it waits for all buffered IO before exiting.

Given the use of 'sync' after the xfs_io-initiated "pwrite 0 16m", am I to assume xfs_io isn't waiting for all buffered IO before exiting?

In any case, seems pretty clear that the SCSI device is unhappy.  Question becomes: why?
Comment 20 Eric Sandeen 2016-01-07 14:43:29 EST
Mike, the errors in comment #15 are on a PV which is much further into the LV than the IO I am issuing, right?

And direct IO completes without error, so unseen IO errors seem unlikely.

It's true that xfs_io doesn't wait for IO (it is buffered IO).

But I can do the same with opening O_SYNC, and get no errors.

And I can issue an fsync at the end, and get no error from that either:

...
pwrite64(3, "\253\253\253\253\253\253\253\253\253\253\253\253\253\253\253\253\253\253\253\253\253\253\253\253\253\253\253\253\253\253\253\253"..., 4096, 2093056) = 4096
gettimeofday({1452195719, 464362}, NULL) = 0
fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb77d1000
write(1, "wrote 2097152/2097152 bytes at o"..., 40wrote 2097152/2097152 bytes at offset 0
) = 40
write(1, "2 MiB, 512 ops; 0.0000 sec (62.0"..., 672 MiB, 512 ops; 0.0000 sec (62.094 MiB/sec and 15896.1781 ops/sec)
) = 67
fsync(3)                                = 0
Comment 21 Mike Snitzer 2016-01-08 16:30:10 EST
(In reply to Eric Sandeen from comment #20)
> Mike, the errors in comment #15 are on a PV which is much further into the
> LV than the IO I am issuing, right?
> 
> And direct IO completes without error, so unseen IO errors seem unlikely.

But direct IO isn't having problems, not sure what it completing without error has to do with the buffered case that doesn't work right...

> It's true that xfs_io doesn't wait for IO (it is buffered IO).
> 
> But I can do the same with opening O_SYNC, and get no errors.
> 
> And I can issue an fsync at the end, and get no error from that either

Fair enough.. those errors happened though (and raise concerns about the underlying storage).  I don't have the answers you're looking for but I also don't have _any_ other reports of this kind of failure on DM devices.

If you confine the volume size to less than 4T does that mask the failure?
Comment 22 andrew 2016-02-15 12:07:54 EST
(In reply to Mike Snitzer from comment #21)
> (In reply to Eric Sandeen from comment #20)
> > Mike, the errors in comment #15 are on a PV which is much further into the
> > LV than the IO I am issuing, right?
> > 
> > And direct IO completes without error, so unseen IO errors seem unlikely.
> 
> But direct IO isn't having problems, not sure what it completing without
> error has to do with the buffered case that doesn't work right...
> 
> > It's true that xfs_io doesn't wait for IO (it is buffered IO).
> > 
> > But I can do the same with opening O_SYNC, and get no errors.
> > 
> > And I can issue an fsync at the end, and get no error from that either
> 
> Fair enough.. those errors happened though (and raise concerns about the
> underlying storage).  I don't have the answers you're looking for but I also
> don't have _any_ other reports of this kind of failure on DM devices.
> 
> If you confine the volume size to less than 4T does that mask the failure?

Is this something you need me to do? It's just unclear who you are asking to do what.

thanks
Comment 23 Eric Sandeen 2016-06-16 19:19:27 EDT
At this point I'm inclined to think this was a storage problem... is there any other info to be gathered on this bug, or should we close it?
Comment 24 Eric Sandeen 2016-07-05 15:43:08 EDT
We've been in NEEDINFO for a few weeks, if any more information can be gathered feel free to add it and reopen.

But best guess is a storage/disk problem (given the scsi/disk errors shown in comment #15), not an e2fsprogs problem.
Comment 25 andrew 2016-08-03 09:48:11 EDT
(In reply to Eric Sandeen from comment #24)
> We've been in NEEDINFO for a few weeks, if any more information can be
> gathered feel free to add it and reopen.
> 
> But best guess is a storage/disk problem (given the scsi/disk errors shown
> in comment #15), not an e2fsprogs problem.

There was nothing wrong with the storage, as it works fine on x86_64. No one contacted me about really getting their hands on the setup, so there really is not anything for me to do. I'm just trying to do the right thing and get these kinds of things reported so they are not a problem later down the road.
Comment 26 Eric Sandeen 2016-08-03 10:10:05 EDT
Andrew, sorry, if you'd like to keep the bug open and keep looking at it, we can.

I did see what looked like scsi errors though, see comment #15.

If you exclude the sdag disk from comment #15, does the problem go away?

My other thought is - what if you restrict the fs size to under 16T, does it go away?
