Bug 771285

Summary: mount fails with 2 XFS filesystems
Product: Fedora
Version: 17
Component: kmod
Reporter: Pete Zaitcev <zaitcev>
Assignee: kmod development team <kmod-maint>
QA Contact: Kay Sievers <kay>
Status: CLOSED RAWHIDE
Severity: unspecified
Priority: unspecified
Keywords: Reopened
Hardware: Unspecified
OS: Unspecified
Fixed In Version: kmod-7-1.fc17
Doc Type: Bug Fix
CC: awalkersg, circular, colin, gansalmon, itamar, johannbg, jonathan, kernel-maint, kzak, lemenkov, lpoetter, madhu.chinakonda, marcosfrm, metherid, mschmidt, msivak, notting, plautrba, systemd-maint, vmlinuz386
Last Closed: 2013-01-08 19:46:06 UTC
Attachments:
  console capture 1
  /etc/fstab
  dmesg

Description Pete Zaitcev 2012-01-03 05:48:08 UTC
Description of problem:

After a boot, the system drops to the repair prompt due to a failure to mount.

Version-Release number of selected component (if applicable):

systemd-36-3.fc16
xfsprogs-3.1.7-1.fc17

How reproducible:

Unknown... It seems 100% reproducible now, but it somehow worked before.

Steps to Reproduce:
1. configure 2 xfs filesystems
2. reboot
  
Actual results:

Stuck at "Give root password for maintenance"

Expected results:

Normal boot as usual

Additional info:

No idea what I broke. This definitely worked before the Christmas vacation.
I shut down the VMs, turned the box off, and turned it back on today.

Please see attached console capture.

In it, the "first" filesystem (vdb) fails to mount, but the second filesystem (vdc)
mounts just fine. When I log in through the maintenance prompt, it is already mounted.
Both entries use exactly the same parameters, and the filesystems are completely identical!

Comment 1 Pete Zaitcev 2012-01-03 05:49:22 UTC
Created attachment 550363 [details]
console capture 1

Comment 2 Pete Zaitcev 2012-01-03 05:57:11 UTC
The problem may have something to do with XFS and an unclean shutdown.
I ran xfs_check on both filesystems, and the VM now boots normally.
There were no messages about any filesystem errors, but presumably
xfs_check sets a superblock flag.
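
For reference, this is roughly what that check looked like (the device names are
assumptions based on the vdb/vdc disks in the description above; newer xfsprogs
replaces xfs_check with "xfs_repair -n"):

# read-only consistency check of both XFS filesystems (devices assumed)
xfs_check /dev/vdb
xfs_check /dev/vdc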

Comment 3 Michal Schmidt 2012-01-03 14:03:08 UTC
Could you attach your /etc/fstab?

systemd spawned "/bin/mount /src/node/vdb", but the mount failed with an error:
mount: unknown filesystem type 'xfs'

I don't see what systemd did wrong here. Reassigning to util-linux.

Comment 4 Pete Zaitcev 2012-01-03 14:35:50 UTC
Created attachment 550439 [details]
/etc/fstab

Comment 5 Karel Zak 2012-01-03 14:37:35 UTC
Please:

 * check dmesg output

 * try "strace -o ~/log mount /src/node/vdb" and send me the ~/log file

Comment 6 Pete Zaitcev 2012-01-03 14:52:24 UTC
You do realize that mount under strace is going to succeed, don't you?
I suppose I could create a wrapper that traces _all_ mount invocations.
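
A wrapper along these lines would do it (illustrative sketch only; the paths and
log location are arbitrary):

# replace /bin/mount with a wrapper that straces every invocation
mv /bin/mount /bin/mount.real
cat <<'EOF' >/bin/mount
#!/bin/sh
exec strace -f -o /var/log/mount-trace.$$ /bin/mount.real "$@"
EOF
chmod 755 /bin/mount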

Comment 7 Pete Zaitcev 2012-01-03 14:57:29 UTC
Created attachment 550445 [details]
dmesg

This dmesg is captured at the maintenance prompt after failure.

Comment 8 Karel Zak 2012-01-03 15:47:41 UTC
(In reply to comment #6)
> You do realize that mount under strace is going to succeed, don't you?
> I suppose I could create a wrapper that traces _all_ mount invocations.

I thought you would be able to call mount(8) manually from the command line. It seems that you can disable (comment out) the /src/node/* entries in your fstab to boot successfully.

Comment 9 Karel Zak 2012-01-03 15:48:21 UTC
(In reply to comment #8)
> you can disable (comment out) the /src/node/* entries

or add "noauto" there
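
For example (illustrative only; everything except "noauto" is a guess at the
original entries):

# /etc/fstab: keep the entries, but do not mount them automatically at boot
/dev/vdb  /src/node/vdb  xfs  defaults,noauto  0 0
/dev/vdc  /src/node/vdc  xfs  defaults,noauto  0 0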

Comment 10 Pete Zaitcev 2012-01-03 19:14:50 UTC
The bug only occurs when two mounts are run simultaneously by systemd.
If they run consecutively, or only one is run, they succeed. It's something
about the way mount detects the presence of the module before mounting.

Comment 11 Karel Zak 2012-01-03 20:07:35 UTC
(In reply to comment #10)
> The bug only occurs when two mounts are run simultaneously by systemd.
> If they run consecutively, or only one is run, they succeed. It's something
> about the way mount detects the presence of the module before mounting.

It sounds like a kernel problem; mount(8) does not care about modules, that's the kernel's job...

mount(8) prints the "unknown filesystem type" message only if mount(2) syscall returns ENODEV and the FS type is not found in /proc/filesystems.
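
That check is easy to reproduce by hand (simple illustration, not the actual util-linux code):

# once the xfs module is loaded, the kernel lists the type here;
# mount(8) only prints "unknown filesystem type" when mount(2) returned
# ENODEV and this lookup also comes up empty
grep -w xfs /proc/filesystems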

BTW, 
  udevd[293]: segfault at 24 ip 00007f13dbd01992 sp 00007fff6dc53fa0 error 6 in udevd[7f13dbcfd000+21000]

looks strange.

Comment 12 Michal Schmidt 2012-02-14 11:06:48 UTC
*** Bug 790238 has been marked as a duplicate of this bug. ***

Comment 13 Kay Sievers 2012-02-14 14:24:56 UTC
Usually the mount() syscall triggers the in-kernel modprobe loader to insert
the module for an unknown, not-yet-loaded filesystem. This call blocks
until the module is properly linked into the kernel.

One possible explanation could be that two competing mount() syscalls
for the same filesystem module race against each other and one of them does
not block for some reason.

The problem might be new; before systemd, we certainly did almost everything
fully serialized in userspace.

It can be that the modprobe binary returns too early, or that the kernel does
not call the second modprobe at all.

Could someone who can reproduce the problem add some printk() debugs to:
  get_fs_type()
in:
  fs/filesystems.c

to get a clue here? Thanks!
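
Something along these lines would do (a rough sketch against the fs/filesystems.c
of that era, not an exact patch; only the two printk() lines are the suggested
addition):

/* fs/filesystems.c (sketch): log entry into and result of get_fs_type() */
struct file_system_type *get_fs_type(const char *name)
{
	struct file_system_type *fs;

	printk(KERN_INFO "#####-----> get_fs_type() entered with name=%s\n", name);

	/* ... existing lookup and request_module() logic unchanged ... */

	printk(KERN_INFO "#####-----> get_fs_type() for name=%s returned with %p\n",
	       name, fs);
	return fs;
}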

Comment 14 Andrew Walker 2012-02-16 01:58:56 UTC
I added printk() into get_fs_type() as suggested and here's what I saw:

[   18.947397] #####-----> get_fs_type() entered with name=xfs
[   18.965933] #####-----> get_fs_type() entered with name=xfs
<snip>
[   19.214892] #####-----> get_fs_type() for name=xfs returned with   (null)
[   19.216575] SGI XFS with ACLs, security attributes, large block/inode numbers, no debug enabled
[   19.218279] systemd[1]: mnt-whatever.mount mount process exited, code=exited status=32
[   19.219521] mount[472]: mount: unknown filesystem type 'xfs'
[   19.222075] SGI XFS Quota Management subsystem
[   19.223593] #####-----> get_fs_type() for name=xfs returned with f7ff57e0
[   19.225218] XFS (sdb2): Mounting Filesystem
[   19.230243] systemd[1]: Job fedora-autorelabel-mark.service/start failed with result 'dependency'.
[   19.232221] systemd[1]: Job fedora-autorelabel.service/start failed with result 'dependency'.
[   19.233151] systemd[1]: Job local-fs.target/start failed with result 'dependency'.
[   19.233985] systemd[1]: Triggering OnFailure= dependencies of local-fs.target.
[   19.234828] systemd[1]: Unit mnt-whatever.mount entered failed state.
[   19.365065] XFS (sdb2): Ending clean mount

You can see from the above that one of the invocations of get_fs_type() returns with (null) while the other succeeds later.

Hope this helps!

Comment 15 Pete Zaitcev 2012-02-23 19:53:39 UTC
For now, I worked around this by pre-loading the module at boot:

cat <<EOF >/etc/rc.modules
#!/bin/sh
modprobe xfs
EOF
chmod 755 /etc/rc.modules
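
If the installed systemd provides systemd-modules-load (F17's does), an equivalent
way to express the same pre-load is:

echo xfs > /etc/modules-load.d/xfs.conf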

Comment 16 Kay Sievers 2012-02-24 00:17:05 UTC
A possible explanation is that two modprobe calls are issued by the kernel.
The first one links the module into the kernel, and the second one bails out
too early because it finds the module in /sys/module/ but it is not fully
initialized at that moment, so the second call does not block long enough
and fails.

Taking over the bug until we find out if that's the case. I'm trying to fix
modprobe now.
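
The window is visible from userspace: the /sys/module/xfs directory appears before
initialization finishes, and its initstate attribute only flips to "live" at the
end (illustrative check):

# while the first modprobe is still initializing the module,
# this prints "coming" rather than "live"
cat /sys/module/xfs/initstate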

Comment 17 Kay Sievers 2012-02-24 00:58:57 UTC
New kmod package on the way, which might block the second modprobe for a
longer time:
  http://koji.fedoraproject.org/koji/taskinfo?taskID=3814472

Comment 18 Fedora Update System 2012-02-24 10:01:15 UTC
kmod-5-8.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/kmod-5-8.fc17

Comment 19 Fedora Update System 2012-02-24 22:32:04 UTC
Package kmod-5-8.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kmod-5-8.fc17'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-2335/kmod-5-8.fc17
then log in and leave karma (feedback).

Comment 20 Andrew Walker 2012-02-25 01:01:44 UTC
Will this fix be back-ported to Fedora 16?

Comment 21 Fedora Update System 2012-03-04 15:47:09 UTC
kmod-6-1.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/kmod-6-1.fc17

Comment 22 Fedora Update System 2012-03-19 14:50:03 UTC
kmod-7-1.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/kmod-7-1.fc17

Comment 23 Fedora Update System 2012-04-12 03:21:25 UTC
kmod-7-1.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 24 Pete Zaitcev 2012-06-17 16:10:56 UTC
Still a problem on F17 with kmod-7-2.fc17 but whatever. The workaround
is still effective.

Comment 25 Jeremy Uchitel 2012-07-31 01:40:22 UTC
I think I am also seeing this problem on F16 with kernel-3.4.6-1 and module-init-tools-3.16-5.  Mounting the two XFS filesystems worked when I originally configured the system with F15, but stopped after my upgrade to F16.  I'm hoping to replace this with a new F17 install in the near future, but have copied Pete's pre-loading of the xfs module as a fix for now (it works).  Just a side observation, but there seem to be a few cases where the systemd init is more susceptible to race conditions than the old one.

Comment 26 Kay Sievers 2012-07-31 13:39:49 UTC
It seems we are still missing the loop in kmod that blocks the second modprobe
until the first modprobe returns and the module state has turned from
loading to ready.
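
A minimal sketch of such a loop, assuming it polls the module's sysfs initstate
attribute (illustrative only, not kmod's actual code):

# block until the module's init has completed and the state reads "live"
while [ "$(cat /sys/module/xfs/initstate 2>/dev/null)" != "live" ]; do
    sleep 0.1
done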

Comment 27 Gerardo Exequiel Pozzi 2012-09-11 16:21:50 UTC
https://bugs.freedesktop.org/show_bug.cgi?id=53665 [mount fails when fstab has more than one entry for unloaded fs module]

Comment 28 Josh Boyer 2012-09-14 12:39:17 UTC
Rusty has submitted a patch to the kernel module loading code to fix this issue:

http://thread.gmane.org/gmane.linux.kernel/1358707/focus=1358709

That should resolve things as soon as it gets into Fedora.

Comment 29 Pete Zaitcev 2013-01-08 19:46:06 UTC
Fixed in kernel-3.7.0-6.fc19 (turned out to require a kernel fix after all,
the workarounds in kmod were insufficient). Closing.