Bug 1020981 - xfs: don't emit corruption noise on fs probes
Summary: xfs: don't emit corruption noise on fs probes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 19
Hardware: Unspecified
OS: Linux
unspecified
medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2013-10-18 16:10 UTC by Justin Clift
Modified: 2015-07-13 04:35 UTC (History)
16 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-03-10 14:42:24 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
Screenshot of the error message in systemd output (2.15 MB, image/jpeg)
2013-10-18 16:57 UTC, Justin Clift
no flags Details

Description Justin Clift 2013-10-18 16:10:46 UTC
Description of problem:

  The output from systemd's filesystem "mount" procedure is extremely
  sub-optimal for at least XFS if the mount fails. (can lead to data loss)

  If (for example), partition numbering on a machine has changed, such that
  systemd tries to mount a non-XFS filesystem as XFS, the message displayed
  to the user is:

    XFS (sda7): Corruption detected. Unmount and run xfs_repair

  Data recovery procedures to fix this misidentified "corruption" can then
  cause data loss.  (just happened)

  ***

  The problem appears to be that systemd is showing the last few lines
  of the system log (misleading), instead of the output from the mount
  command (useful, and accurate).

  Systemd should show the output from the failed mount command, instead of
  or in addition to the output from the system log.

  Current real world system log output
  ************************************

  Oct 18 16:46:30 f19laptop kernel: [ 1801.893919] XFS (sda7): bad magic number
  Oct 18 16:46:30 f19laptop kernel: [ 1801.893924] ffff8801e1bc7000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Oct 18 16:46:30 f19laptop kernel: [ 1801.893926] ffff8801e1bc7010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Oct 18 16:46:30 f19laptop kernel: [ 1801.893927] ffff8801e1bc7020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Oct 18 16:46:30 f19laptop kernel: [ 1801.893928] ffff8801e1bc7030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Oct 18 16:46:30 f19laptop kernel: [ 1801.893929] XFS (sda7): Internal error xfs_sb_read_verify at line 780 of file fs/xfs/xfs_mount.c.  Caller 0xffffffffa060ea25
  Oct 18 16:46:30 f19laptop kernel: [ 1801.893929] 
  Oct 18 16:46:30 f19laptop kernel: [ 1801.894037] XFS (sda7): Corruption detected. Unmount and run xfs_repair

  Output from corresponding mount command
  ***************************************

  $ sudo mount -t xfs /dev/sda7 /example
  mount: wrong fs type, bad option, bad superblock on /dev/sda7,
         missing codepage or helper program, or other error

         In some cases useful info is found in syslog - try
         dmesg | tail or so.
  

Version-Release number of selected component (if applicable):

  $ rpm -qa|grep -i 'systemd\|mount'|sort
  libmount-2.23.2-4.fc19.x86_64
  systemd-204-16.fc19.x86_64
  systemd-devel-204-16.fc19.x86_64
  systemd-libs-204-16.fc19.i686
  systemd-libs-204-16.fc19.x86_64
  systemd-python-204-16.fc19.x86_64
  systemd-sysv-204-16.fc19.x86_64


How reproducible:

  Every time

Comment 1 Michal Schmidt 2013-10-18 16:33:00 UTC
[Adding Eric Sandeen to CC to have a look at the reported XFS error. Thanks.]

(In reply to Justin Clift from comment #0)
>   The problem appears to be that systemd is showing the last few lines
>   of the system log (misleading), instead of the output from the mount
>   command (useful, and accurate).

Where exactly is it showing that? Is it in the systemd journal?

>   Systemd should show the output from the failed mount command, instead of
>   or in addition to the output from the system log.
> 
>   Current real world system log output
>   ************************************
> 
>   Oct 18 16:46:30 f19laptop kernel: [ 1801.893919] XFS (sda7): bad magic
> number

Looks like here xfs correctly saw that this is NOT an XFS filesystem. Why did it continue to mount it regardless?

>   Oct 18 16:46:30 f19laptop kernel: [ 1801.893924] ffff8801e1bc7000: 00 00
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>   Oct 18 16:46:30 f19laptop kernel: [ 1801.893926] ffff8801e1bc7010: 00 00
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>   Oct 18 16:46:30 f19laptop kernel: [ 1801.893927] ffff8801e1bc7020: 00 00
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>   Oct 18 16:46:30 f19laptop kernel: [ 1801.893928] ffff8801e1bc7030: 00 00
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>   Oct 18 16:46:30 f19laptop kernel: [ 1801.893929] XFS (sda7): Internal
> error xfs_sb_read_verify at line 780 of file fs/xfs/xfs_mount.c.  Caller
> 0xffffffffa060ea25

"Internal" errors should never happen. This is a bug in xfs kernel code.

>   Oct 18 16:46:30 f19laptop kernel: [ 1801.893929] 
>   Oct 18 16:46:30 f19laptop kernel: [ 1801.894037] XFS (sda7): Corruption
> detected. Unmount and run xfs_repair

Even if you saw the error output from the mount command, it still points you to check dmesg where you'd find this misleading advice. So I don't see how better off you'd be.

>   Output from corresponding mount command
>   ***************************************
> 
>   $ sudo mount -t xfs /dev/sda7 /example
>   mount: wrong fs type, bad option, bad superblock on /dev/sda7,
>          missing codepage or helper program, or other error
> 
>          In some cases useful info is found in syslog - try
>          dmesg | tail or so.

Exactly! I would think "Bad superblock!", check dmesg and get the bad advice.

Comment 2 Justin Clift 2013-10-18 16:57:03 UTC
Created attachment 813855 [details]
Screenshot of the error message in systemd output

Comment 3 Eric Sandeen 2013-10-18 17:02:14 UTC
(In reply to Justin Clift from comment #0)
> Description of problem:
> 
>   The output from systemd's filesystem "mount" procedure is extremely
>   sub-optimal for at least XFS if the mount fails. (can lead to data loss)
> 
>   If (for example), partition numbering on a machine has changed, such that
>   systemd tries to mount a non-XFS filesystem as XFS, the message displayed
>   to the user is:
> 
>     XFS (sda7): Corruption detected. Unmount and run xfs_repair

http://oss.sgi.com/cgi-bin/gitweb.cgi?p=xfs/xfs.git;a=commit;h=31625f28ad7be67701dc4cefcf52087addd88af4

"xfs: don't emit corruption noise on fs probes

If we get EWRONGFS due to probing of non-xfs filesystems,
there's no need to issue the scary corruption error and backtrace."

Courtesy of yours truly.  ;)

>   Data recovery procedures to fix this misidentified "corruption" can then
>   cause data loss.  (just happened)
> 
>   ***
> 
>   The problem appears to be that systemd is showing the last few lines
>   of the system log (misleading), instead of the output from the mount
>   command (useful, and accurate).

I disagree on that, mount output is almost always generic, dmesg shows the real problems.

But xfs was being too noisy, and it should eventually be fixed.  That could be a stable candidate if people want it.

-Eric

Comment 4 Justin Clift 2013-10-18 17:16:18 UTC
Thanks Michal.  Just attached screenshot showing where in the systemd output it shows up.  Hopefully that helps.

Also useful might be the full text of the system log output (more in depth than my summary version):

*****************************************************************

Oct 18 16:46:30 f19laptop kernel: [ 1801.893919] XFS (sda7): bad magic number
Oct 18 16:46:30 f19laptop kernel: [ 1801.893924] ffff8801e1bc7000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Oct 18 16:46:30 f19laptop kernel: [ 1801.893926] ffff8801e1bc7010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Oct 18 16:46:30 f19laptop kernel: [ 1801.893927] ffff8801e1bc7020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Oct 18 16:46:30 f19laptop kernel: [ 1801.893928] ffff8801e1bc7030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Oct 18 16:46:30 f19laptop kernel: [ 1801.893929] XFS (sda7): Internal error xfs_sb_read_verify at line 780 of file fs/xfs/xfs_mount.c.  Caller 0xffffffffa060ea25
Oct 18 16:46:30 f19laptop kernel: [ 1801.893929] 
Oct 18 16:46:30 f19laptop kernel: [ 1801.893932] CPU: 7 PID: 181 Comm: kworker/7:1H Not tainted 3.11.4-201.0.bz953211.fc19.x86_64 #1
Oct 18 16:46:30 f19laptop kernel: [ 1801.893933] Hardware name: Dell Inc.          Dell System Inspiron 7720/04M3YM, BIOS A15 05/17/2013
Oct 18 16:46:30 f19laptop kernel: [ 1801.893950] Workqueue: xfslogd xfs_buf_iodone_work [xfs]
Oct 18 16:46:30 f19laptop kernel: [ 1801.893952]  0000000000000001 ffff88023063fd68 ffffffff81638c2f ffff880231c27800
Oct 18 16:46:30 f19laptop kernel: [ 1801.893954]  ffff88023063fd80 ffffffffa06110db ffffffffa060ea25 ffff88023063fdb8
Oct 18 16:46:30 f19laptop kernel: [ 1801.893956]  ffffffffa0611135 0000030c00000000 ffff8801e8df7980 0000000000000016
Oct 18 16:46:30 f19laptop kernel: [ 1801.893958] Call Trace:
Oct 18 16:46:30 f19laptop kernel: [ 1801.893963]  [<ffffffff81638c2f>] dump_stack+0x45/0x56                                                                                                        
Oct 18 16:46:30 f19laptop kernel: [ 1801.893973]  [<ffffffffa06110db>] xfs_error_report+0x3b/0x40 [xfs]                                                                                            
Oct 18 16:46:30 f19laptop kernel: [ 1801.893981]  [<ffffffffa060ea25>] ? xfs_buf_iodone_work+0x85/0xf0 [xfs]                                                                                       
Oct 18 16:46:30 f19laptop kernel: [ 1801.893989]  [<ffffffffa0611135>] xfs_corruption_error+0x55/0x80 [xfs]                                                                                        
Oct 18 16:46:30 f19laptop kernel: [ 1801.894003]  [<ffffffffa06622b4>] xfs_sb_read_verify+0x114/0x130 [xfs]                                                                                        
Oct 18 16:46:30 f19laptop kernel: [ 1801.894011]  [<ffffffffa060ea25>] ? xfs_buf_iodone_work+0x85/0xf0 [xfs]                                                                                       
Oct 18 16:46:30 f19laptop kernel: [ 1801.894018]  [<ffffffffa060ea25>] xfs_buf_iodone_work+0x85/0xf0 [xfs]                                                                                         
Oct 18 16:46:30 f19laptop kernel: [ 1801.894023]  [<ffffffff810810d5>] process_one_work+0x175/0x430                                                                                                
Oct 18 16:46:30 f19laptop kernel: [ 1801.894025]  [<ffffffff81081cfb>] worker_thread+0x11b/0x3a0                                                                                                   
Oct 18 16:46:30 f19laptop kernel: [ 1801.894027]  [<ffffffff81081be0>] ? rescuer_thread+0x340/0x340                                                                                                
Oct 18 16:46:30 f19laptop kernel: [ 1801.894029]  [<ffffffff81088640>] kthread+0xc0/0xd0                                                                                                           
Oct 18 16:46:30 f19laptop kernel: [ 1801.894031]  [<ffffffff81088580>] ? insert_kthread_work+0x40/0x40
Oct 18 16:46:30 f19laptop kernel: [ 1801.894034]  [<ffffffff81647d2c>] ret_from_fork+0x7c/0xb0
Oct 18 16:46:30 f19laptop kernel: [ 1801.894036]  [<ffffffff81088580>] ? insert_kthread_work+0x40/0x40
Oct 18 16:46:30 f19laptop kernel: [ 1801.894037] XFS (sda7): Corruption detected. Unmount and run xfs_repair
Oct 18 16:46:30 f19laptop kernel: [ 1801.894043] XFS (sda7): SB validate failed with error 22

*****************************************************************


> Even if you saw the error output from the mount command, it still
> points you to check dmesg where you'd find this misleading advice.
> So I don't see how better off you'd be.
>
> >   Output from corresponding mount command
> >   ***************************************
> > 
> >   $ sudo mount -t xfs /dev/sda7 /example
> >   mount: wrong fs type, bad option, bad superblock on /dev/sda7,
> >          missing codepage or helper program, or other error
> > 
> >          In some cases useful info is found in syslog - try
> >          dmesg | tail or so.
> 
> Exactly! I would think "Bad superblock!", check dmesg and get the bad advice.

Good point.

* the system log output was directly misleading
* the mount command output was a bit better, but not great

I kind of wonder if there's an easy way to add a sanity check before attempting the mount.  Something similar to running fstyp (a Solaris command to return the filesystem type on a block device), would be useful here, but I don't know of a Linux equivalent off the top of my head. :(

Comment 5 Eric Sandeen 2013-10-18 17:25:17 UTC
(In reply to Justin Clift from comment #4)

> I kind of wonder if there's an easy way to add a sanity check before
> attempting the mount.  Something similar to running fstyp (a Solaris command
> to return the filesystem type on a block device), would be useful here, but
> I don't know of a Linux equivalent off the top of my head. :(

There should be no need for that... this was bad xfs behavior, fixed upstream.  No need to engineer more complex userspace workarounds...

Comment 6 Justin Clift 2013-10-18 17:28:59 UTC
No worries, thanks Eric. :)

Comment 7 Justin M. Forbes 2014-01-03 22:09:46 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.

Fedora 19 has now been rebased to 3.12.6-200.fc19.  Please test this kernel update (or newer) and let us know if you issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 20, and are still experiencing this issue, please change the version to Fedora 20.

If you experience different issues, please open a new bug report for those.

Comment 8 Justin M. Forbes 2014-03-10 14:42:24 UTC
*********** MASS BUG UPDATE **************

This bug has been in a needinfo state for more than 1 month and is being closed with insufficient data due to inactivity. If this is still an issue with Fedora 19, please feel free to reopen the bug and provide the additional information requested.

Comment 9 Eric Sandeen 2014-03-10 15:50:06 UTC
Closing seems fine, this was fixed upstream by:


http://oss.sgi.com/cgi-bin/gitweb.cgi?p=xfs/xfs.git;a=commit;h=31625f28ad7be67701dc4cefcf52087addd88af4

Comment 10 Josh Boyer 2014-03-10 17:21:07 UTC
Changing to ERRATA.  Thanks Eric.


Note You need to log in before you can comment on or make changes to this bug.