Bug 442315 - FATAL: Error inserting ecryptfs (/lib/modules/2.6.18-81.el5/kernel/fs/ecryptfs/ecryptfs.ko): Input/output error
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.2
Hardware: All Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Tom Coughlan
QA Contact: Red Hat Kernel QE team
Reported: 2008-04-14 05:52 EDT by Michal Nowak
Modified: 2013-03-07 21:04 EST
CC List: 8 users

Doc Type: Bug Fix
Last Closed: 2009-06-25 06:45:41 EDT

Attachments
kring from HP Sapphire (25.89 KB, text/plain)
2008-04-21 06:54 EDT, Michal Nowak

Description Michal Nowak 2008-04-14 05:52:43 EDT
Description of problem:

It is not possible to insert the ecryptfs kernel module:

FATAL: Error inserting ecryptfs
(/lib/modules/2.6.18-81.el5/kernel/fs/ecryptfs/ecryptfs.ko): Input/output error

Version-Release number of selected component (if applicable):
kernel-2.6.18-89.el5
kernel-2.6.18-81.el5


How reproducible:
always

Steps to Reproduce:
1. modprobe ecryptfs

Actual results:
not loaded

Expected results:
loaded

Additional info:

machine is: hp-sapphire-01.rhts.boston.redhat.com

Apr 14 05:41:01 hp-sapphire-01 kernel: ecryptfs_init_netlink: Failed to create
netlink socket

Apr 14 05:41:01 hp-sapphire-01 kernel: ecryptfs_init: Failure occured while
attempting to initialize the eCryptfs netlink socket
Comment 1 Michal Nowak 2008-04-14 06:26:11 EDT
But runs OK on intel-s6e5132-01.rhts.boston.redhat.com with -89 kernel. 
Comment 2 Doug Chapman 2008-04-15 10:55:52 EDT
So far runs OK on all the other systems I have tried this on.  I am reserving
hp-sapphire-01 now to see if the problem is unique to that box but I have a
suspicion that this is more related to network configuration (based on the
netlink error).  Had any networking tests or something that may have done
something "special" with the networking been done on this box prior to seeing the
error?
Comment 3 Michal Nowak 2008-04-16 04:32:41 EDT
> Had any networking tests or something that may have done
> something "special" with the networking been done on this box prior to seeing the
> error?

I don't think so. It happened right after a restart. Networking basically
worked; I was logged in via ssh.
Comment 4 Luming Yu 2008-04-20 22:46:48 EDT
Please post the dmesg output after you see "Error inserting ecryptfs..."
Comment 5 Michal Nowak 2008-04-21 05:46:22 EDT
Unfortunately, we have only one HP Sapphire machine in the lab, and it is booked.
I have queued a reservation job and will wait for it; I don't know how long it
might take.

If anyone gets onto the machine before I do, feel free to post the kring data
here.
Comment 7 Michal Nowak 2008-04-21 06:54:21 EDT
Created attachment 303133
kring from HP Sapphire
Comment 8 Eric Sandeen 2008-04-21 10:25:04 EDT
Does this happen every time?  I wonder why the netlink socket creation fails
(this is below ecryptfs, FWIW...)  Instrumenting to find where it failed would
probably be instructive...
Comment 9 Michal Nowak 2008-04-22 02:59:48 EDT
(In reply to comment #8)
> Does this happen every time?  I wonder why the netlink socket creation fails
> (this is below ecryptfs, FWIW...)  Instrumenting to find where it failed would
> probably be instructive...

Yes, every time.

I am sorry, but I probably cannot provide more information.

I just picked up the machine from RHTS, modprobed the ecryptfs kernel module, and
got the error message on stdout, in /var/log/messages, and in kring/dmesg.

I saw it happen only on this machine and only with this module. For more
information, please reserve the machine in the RHTS lab and take a look
yourself.
Comment 10 Luming Yu 2008-04-24 02:33:11 EDT
There are a few possibilities: 

* no networking support in the kernel.
* Low memory.
* Security policy.

I don't see any possibility that is IA64-specific so far, so the problem
should be reproducible on other platforms with similar configurations...
Comment 11 Luming Yu 2008-04-24 02:36:04 EDT
Please verify whether any of the possibilities above applies.
Comment 12 Michal Nowak 2008-04-24 05:09:31 EDT
(In reply to comment #10)
> There are a few possibilities: 
> 
> * no networking support in the kernel.

I am connected to the server via SSH; the machine has eth0-3, with eth0 active.

> * Low memory.

Don't think so.

[root@hp-sapphire-01 ~]# free
             total       used       free     shared    buffers     cached
Mem:     100125888    1033856   99092032          0      31120     230080
-/+ buffers/cache:     772656   99353232
Swap:      4194272          0    4194272

> * Security policy.

[root@hp-sapphire-01 ~]# ausearch -m avc -ts recent
<no matches>

(run after modprobing the module). Is there another "security policy" I should check?

> I don't see any possibility that is IA64-specific so far, so the problem
> should be reproducible on other platforms with similar configurations...

Nor do I. (Flipped to "All".)
Comment 13 Michal Nowak 2008-07-22 07:41:49 EDT
Any progress on this?
Comment 14 Doug Chapman 2008-07-23 14:49:15 EDT
Finally got my hands on hp-sapphire-01; I can't reproduce this on any other
system (no idea why).

Here is what I am seeing in brief; I need to dig deeper to understand the details.

Down in the netlink layer, netlink_insert is returning -EADDRINUSE. The call
stack, innermost first, is:

netlink_insert
netlink_kernel_create
ecryptfs_init_netlink
ecryptfs_init_messaging
ecryptfs_init


I have no idea why we are seeing this or why it affects only this system.
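
As an illustration of the failure mode in that call stack, here is a toy Python model, not kernel code. The class and function names only echo the stack above; their bodies are assumptions. In particular, it assumes that the socket-creation step reports failure by returning no socket (so the -EADDRINUSE detail is lost) and that the module init then reports a generic -EIO, which would be consistent with the "Input/output error" printed by modprobe. The owner name "some_other_module" is hypothetical.

```python
import errno


class NetlinkTable:
    """Toy stand-in for the kernel's table of bound netlink units."""

    def __init__(self):
        self._owners = {}  # unit number -> name of the subsystem that bound it

    def insert(self, unit, owner):
        """Echoes netlink_insert: refuse a unit that is already bound."""
        if unit in self._owners:
            return -errno.EADDRINUSE
        self._owners[unit] = owner
        return 0

    def kernel_create(self, unit, owner):
        """Echoes netlink_kernel_create: return a socket handle or None."""
        if self.insert(unit, owner) != 0:
            return None  # assumption: the specific errno is dropped here
        return (unit, owner)


def ecryptfs_init(table, unit=19):
    """Hypothetical module init: any socket failure surfaces as -EIO."""
    sock = table.kernel_create(unit, "ecryptfs")
    return 0 if sock is not None else -errno.EIO


table = NetlinkTable()
# Something else binds the unit first (hypothetical owner name).
assert table.kernel_create(19, "some_other_module") is not None
rc = ecryptfs_init(table)
print(rc, errno.errorcode[-rc])
```

In this model the second caller never sees -EADDRINUSE directly, which is why instrumenting the lower layer was needed to find the real cause.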


Comment 15 Doug Chapman 2008-07-23 16:01:44 EDT
Ah HA!!!!!

This is due to an UGLY hack in the lpfc driver.  Both ecryptfs and lpfc are
using 19 as the unit number for netlink.  ecryptfs is doing the right thing by
adding NETLINK_ECRYPTFS to linux/netlink.h; lpfc, however, has this hack.

In drivers/scsi/lpfc/Makefile:

EXTRA_CFLAGS += -DNETLINK_FCTRANSPORT=19


So you will run into this bug on any system that has an Emulex FC adapter.
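
A collision like this one can be caught mechanically. The following is a minimal sketch, assuming the unit numbers appear either as `#define NETLINK_<NAME> <n>` lines in a header or as `-DNETLINK_<NAME>=<n>` flags in a Makefile. The function name `find_unit_collisions` and both regular expressions are made up for this example, and the header line in the sample is written from the description above (NETLINK_ECRYPTFS using unit 19), not copied from the kernel tree.

```python
import re
from collections import defaultdict

# Matches header lines such as "#define NETLINK_ECRYPTFS 19".
HEADER_DEFINE = re.compile(r"^\s*#\s*define\s+(NETLINK_\w+)\s+(\d+)\b", re.M)
# Matches compiler flags such as "-DNETLINK_FCTRANSPORT=19".
MAKEFILE_DEFINE = re.compile(r"-D(NETLINK_\w+)=(\d+)\b")


def find_unit_collisions(header_text, makefile_text):
    """Return {unit: [names]} for every unit number claimed by more than one name."""
    users = defaultdict(set)
    for name, unit in HEADER_DEFINE.findall(header_text):
        users[int(unit)].add(name)
    for name, unit in MAKEFILE_DEFINE.findall(makefile_text):
        users[int(unit)].add(name)
    return {unit: sorted(names) for unit, names in users.items() if len(names) > 1}


header = "#define NETLINK_ECRYPTFS 19\n"
makefile = "EXTRA_CFLAGS += -DNETLINK_FCTRANSPORT=19\n"
print(find_unit_collisions(header, makefile))
```

With the Makefile flag changed to a unit number that no header claims, the function returns an empty dict.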

Comment 16 Doug Chapman 2008-07-23 16:03:37 EDT
Chip,

Not sure if you "own" lpfc but I am guessing you are the right owner for this?

- Doug
Comment 17 Eric Sandeen 2008-07-23 17:32:34 EDT
Hey, thanks for sorting that out :) 

FWIW, eCryptfs in RHEL5.3 and beyond should not be using netlink anymore, at
least by default...

Probably still worth fixing, though.
Comment 18 Michal Nowak 2008-07-29 03:49:22 EDT
Pretty good work, Doug! Thanks for the resolution.

Worth fixing.
Comment 19 Chip Coldwell 2008-08-01 14:29:22 EDT
Adding James Smart at Emulex to the CC: list.

James: it seems we have a collision in netlink unit numbers between the lpfc
driver and the ecryptfs filesystem.  Can we move lpfc to a different number or
will that break management apps?

Chip
Comment 20 James Smart 2008-08-01 17:29:57 EDT
Yep. This is a shortcoming of our driver.

The driver that will be submitted for 5.3 changes the netlink unit number from
19 to 25.  Is this acceptable?

We'd really like to use the just-pushed-upstream patch for driver-specific
netlink messages that use the scsi unit number, but that requires a change that
likely alters a binary interface, so I'm assuming it can't be done.
Comment 21 Michal Nowak 2008-08-19 05:36:49 EDT
coughlan: How can I help you?
Comment 23 Tom Coughlan 2008-11-10 15:02:17 EST
(In reply to comment #21)
> coughlan: How can I help you?

The fix is in snapshot 2, kernel 2.6.18-122.el5. Please test this and confirm the fix.
Comment 24 Michal Nowak 2008-11-18 06:00:09 EST
Looks like we don't have this box in RHTS anymore. Probably can't help.
Comment 25 Michal Nowak 2009-06-25 06:45:41 EDT
We have it actually. Just tested with -153.el5 - works out of the box. Closing.
