Bug 442315

Summary: FATAL: Error inserting ecryptfs (/lib/modules/2.6.18-81.el5/kernel/fs/ecryptfs/ecryptfs.ko): Input/output error
Product: Red Hat Enterprise Linux 5
Reporter: Michal Nowak <mnowak>
Component: kernel
Assignee: Tom Coughlan <coughlan>
Status: CLOSED CURRENTRELEASE
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium
Priority: medium
Version: 5.2
CC: coughlan, dchapman, duck, dzickus, esandeen, james.smart, luyu, ohudlick
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2009-06-25 10:45:41 UTC
Attachments:
kring from HP Sapphire (flags: none)

Description Michal Nowak 2008-04-14 09:52:43 UTC
Description of problem:

The ecryptfs kernel module cannot be inserted:

FATAL: Error inserting ecryptfs (/lib/modules/2.6.18-81.el5/kernel/fs/ecryptfs/ecryptfs.ko): Input/output error

Version-Release number of selected component (if applicable):
kernel-2.6.18-89.el5
kernel-2.6.18-81.el5


How reproducible:
always

Steps to Reproduce:
1. modprobe ecryptfs

Actual results:
not loaded

Expected results:
loaded

Additional info:

machine is: hp-sapphire-01.rhts.boston.redhat.com

Apr 14 05:41:01 hp-sapphire-01 kernel: ecryptfs_init_netlink: Failed to create netlink socket
Apr 14 05:41:01 hp-sapphire-01 kernel: ecryptfs_init: Failure occured while attempting to initialize the eCryptfs netlink socket

Comment 1 Michal Nowak 2008-04-14 10:26:11 UTC
But runs OK on intel-s6e5132-01.rhts.boston.redhat.com with -89 kernel. 

Comment 2 Doug Chapman 2008-04-15 14:55:52 UTC
So far this runs OK on all the other systems I have tried it on.  I am reserving
hp-sapphire-01 now to see if the problem is unique to that box, but I suspect
this is related to network configuration (based on the netlink error).  Had any
networking tests, or anything else that may have done something "special" with
the networking, been run on this box before you saw the error?


Comment 3 Michal Nowak 2008-04-16 08:32:41 UTC
> Had any networking tests, or anything else that may have done something
> "special" with the networking, been run on this box before you saw the error?

I don't think so. It happened right after a restart. Networking was basically
working; I was logged in via ssh.

Comment 4 Luming Yu 2008-04-21 02:46:48 UTC
please post dmesg after you see "Error inserting ecryptfs..."

Comment 5 Michal Nowak 2008-04-21 09:46:22 UTC
Unfortunately, we have only one HP Sapphire machine in the lab, and it is booked.
I have submitted my registration job to the queue and will wait for it; I don't
know how long it might take.

If anyone gets access to the machine first, feel free to post the kring data
here.

Comment 7 Michal Nowak 2008-04-21 10:54:21 UTC
Created attachment 303133 [details]
kring from HP Sapphire

Comment 8 Eric Sandeen 2008-04-21 14:25:04 UTC
Does this happen every time?  I wonder why the netlink socket creation fails
(this is below ecryptfs, FWIW...)  Instrumenting to find where it failed would
probably be instructive...
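
One low-cost check from user space, before adding printk calls, is to list which netlink protocol numbers already have sockets bound. The sketch below parses text in the layout of /proc/net/netlink. It assumes the header row contains `Eth` (protocol number) and `Pid` columns; the function name is made up for illustration, and whether a kernel-side socket shows up there with Pid 0 on this kernel is an assumption, not something confirmed in this report.

```python
from collections import defaultdict


def netlink_protocols_in_use(table_text):
    """Map protocol number -> sorted list of pids bound to it.

    Assumes the first non-empty line is a header that contains 'Eth'
    and 'Pid' columns, as /proc/net/netlink is expected to provide.
    """
    lines = [ln for ln in table_text.splitlines() if ln.strip()]
    if not lines:
        return {}
    header = lines[0].split()
    eth_col = header.index("Eth")
    pid_col = header.index("Pid")
    in_use = defaultdict(set)
    for line in lines[1:]:
        fields = line.split()
        if len(fields) <= max(eth_col, pid_col):
            continue  # skip truncated rows
        in_use[int(fields[eth_col])].add(int(fields[pid_col]))
    return {proto: sorted(pids) for proto, pids in in_use.items()}
```

On a live system, pass it the contents of /proc/net/netlink read just before the modprobe. If the protocol number the module wants already appears in the result, something else registered it first.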

Comment 9 Michal Nowak 2008-04-22 06:59:48 UTC
(In reply to comment #8)
> Does this happen every time?  I wonder why the netlink socket creation fails
> (this is below ecryptfs, FWIW...)  Instrumenting to find where it failed would
> probably be instructive...

Yes, every time.

I am sorry, but I am probably unable to provide more information.

I just picked up the machine from RHTS, modprobed the ecryptfs kernel module, and
got the error message on stdout, in /var/log/messages, and in kring/dmesg.

I saw it happen only on this machine and only with this module. For more
information, please register the machine in the RHTS lab and take a look
yourself.

Comment 10 Luming Yu 2008-04-24 06:33:11 UTC
There are a few possibilities: 

* no networking support in the kernel.
* Low memory.
* Security policy.

I don't see any possibility that is IA64 arch specific so far, so it should be
reproducible on other platforms with similar configurations.


Comment 11 Luming Yu 2008-04-24 06:36:04 UTC
Please verify if any possibility above applies.

Comment 12 Michal Nowak 2008-04-24 09:09:31 UTC
(In reply to comment #10)
> There are a few possibilities: 
> 
> * no networking support in the kernel.

I am connected to the server via SSH. Interfaces eth0 through eth3 exist, and
eth0 is active.

> * Low memory.

Don't think so.

[root@hp-sapphire-01 ~]# free
             total       used       free     shared    buffers     cached
Mem:     100125888    1033856   99092032          0      31120     230080
-/+ buffers/cache:     772656   99353232
Swap:      4194272          0    4194272

> * Security policy.

[root@hp-sapphire-01 ~]# ausearch -m avc -ts recent
<no matches>

That was run after modprobing the module. Is there another "security policy" I
should check?

> I don't see any possibility that is IA64 arch specific so far, so it should be
> reproducible on other platforms with similar configurations.

Nor do I. (Flipped to "All".)

Comment 13 Michal Nowak 2008-07-22 11:41:49 UTC
Any progress on this?

Comment 14 Doug Chapman 2008-07-23 18:49:15 UTC
Finally got my hands on hp-sapphire-01; I can't reproduce this on any other
system (no idea why).

Here is what I am seeing in brief; I need to dig deeper to understand the details.

Down in the netlink layer, netlink_insert is returning -EADDRINUSE.  The call
stack is:

netlink_insert
netlink_kernel_create
ecryptfs_init_netlink
ecryptfs_init_messaging
ecryptfs_init


I have no idea why we are seeing this and why it is only this system.
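
For readers unfamiliar with that error path, here is a toy user-space model of how a second registration can fail with -EADDRINUSE. It assumes, without confirmation from this report, that kernel-side sockets are keyed by (unit, pid 0), so two in-kernel users of the same unit collide. The class and method names are hypothetical and are not kernel APIs.

```python
import errno


class NetlinkTableModel:
    """Toy model of a per-unit socket table; not the kernel implementation."""

    def __init__(self):
        self._bound = {}  # (unit, pid) -> owner name

    def insert(self, unit, pid, owner):
        """Return 0 on success or -EADDRINUSE if (unit, pid) is taken."""
        key = (unit, pid)
        if key in self._bound:
            return -errno.EADDRINUSE
        self._bound[key] = owner
        return 0

    def kernel_create(self, unit, owner):
        """Model a kernel-side socket, assumed here to bind with pid 0."""
        return self.insert(unit, 0, owner)

    def owner_of(self, unit, pid=0):
        """Return the owner registered for (unit, pid), or None."""
        return self._bound.get((unit, pid))
```

In this model, the first `kernel_create(19, ...)` call returns 0 and any later call for unit 19 returns `-errno.EADDRINUSE`, regardless of who the caller is. That would match a failure that depends on what else is loaded on the box rather than on the module being inserted.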




Comment 15 Doug Chapman 2008-07-23 20:01:44 UTC
Ah HA!!!!!

This is due to an UGLY hack in the lpfc driver.  Both ecryptfs and lpfc are
using 19 as the unit number for netlink.  ecryptfs is doing the right thing by
adding NETLINK_ECRYPTFS to linux/netlink.h; lpfc, however, has this hack.

In drivers/scsi/lpfc/Makefile:

EXTRA_CFLAGS += -DNETLINK_FCTRANSPORT=19


So you will run into this bug on any system that has an Emulex FC adapter.
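
A collision like this can be caught mechanically by scanning a source tree for netlink unit numbers claimed under more than one name. The following is a rough sketch, not an existing tool: the function name and both regular expressions are assumptions, and it only looks at decimal `#define NETLINK_*` lines in headers and `-DNETLINK_*=` flags in Makefiles.

```python
import re
from collections import defaultdict

# Matches '#define NETLINK_FOO 19' in a C header (decimal values only).
HEADER_DEFINE = re.compile(r"^\s*#\s*define\s+(NETLINK_\w+)\s+(\d+)\b", re.M)
# Matches '-DNETLINK_FOO=19' in a Makefile or compiler command line.
CFLAG_DEFINE = re.compile(r"-D(NETLINK_\w+)=(\d+)\b")


def find_unit_collisions(sources):
    """Find unit numbers claimed under more than one name.

    sources: dict mapping a file path to that file's text.
    Returns {unit: [(name, path), ...]} for colliding units only.
    """
    claims = defaultdict(list)
    for path, text in sources.items():
        for pattern in (HEADER_DEFINE, CFLAG_DEFINE):
            for name, number in pattern.findall(text):
                claims[int(number)].append((name, path))
    return {
        unit: sorted(users)
        for unit, users in claims.items()
        if len({name for name, _ in users}) > 1
    }
```

Fed the Makefile line above plus a header line reconstructed from the description in this comment (`#define NETLINK_ECRYPTFS 19`), it reports unit 19 as claimed by both NETLINK_ECRYPTFS and NETLINK_FCTRANSPORT. Expect false positives on a real tree, because other `NETLINK_*` constants in the header, such as socket options, share small integer values with protocol numbers.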



Comment 16 Doug Chapman 2008-07-23 20:03:37 UTC
Chip,

Not sure if you "own" lpfc but I am guessing you are the right owner for this?

- Doug


Comment 17 Eric Sandeen 2008-07-23 21:32:34 UTC
Hey, thanks for sorting that out :) 

FWIW, eCryptfs in RHEL5.3 and beyond should not be using netlink anymore, at
least by default...

Probably still worth fixing, though.

Comment 18 Michal Nowak 2008-07-29 07:49:22 UTC
Pretty good work, Doug! Thanks for the resolution.

Worth fixing.

Comment 19 Chip Coldwell 2008-08-01 18:29:22 UTC
Adding James Smart at Emulex to the CC: list.

James: it seems we have a collision in netlink unit numbers between the lpfc
driver and the ecryptfs filesystem.  Can we move lpfc to a different number or
will that break management apps?

Chip


Comment 20 James Smart 2008-08-01 21:29:57 UTC
Yep. This is a shortcoming of our driver.

The driver that will be submitted for 5.3 changed the netlink unit number from
19 to 25.  Is this acceptable?

We'd really like to use the just-pushed-upstream patch for driver-specific
netlink messages that use the scsi unit number, but that requires a change that
likely makes a binary interface change - thus I'm assuming it can't be done.

Comment 21 Michal Nowak 2008-08-19 09:36:49 UTC
coughlan: How can I help you?

Comment 23 Tom Coughlan 2008-11-10 20:02:17 UTC
(In reply to comment #21)
> coughlan: How can I help you?

The fix is in snapshot 2, kernel 2.6.18-122.el5. Please test this and confirm the fix.

Comment 24 Michal Nowak 2008-11-18 11:00:09 UTC
Looks like we don't have this box in RHTS anymore. Probably can't help.

Comment 25 Michal Nowak 2009-06-25 10:45:41 UTC
We have it actually. Just tested with -153.el5 - works out of the box. Closing.