Bug 804345 - init_module sometimes fails with "Bad message" (EBADMSG)
init_module sometimes fails with "Bad message" (EBADMSG)
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: kernel (Show other bugs)
rawhide
x86_64 Linux
high Severity urgent
: ---
: ---
Assigned To: Kernel Maintainer List
Fedora Extras Quality Assurance
:
: 804365 (view as bug list)
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2012-03-17 20:13 EDT by Tom
Modified: 2012-03-28 09:33 EDT (History)
14 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-03-28 09:33:43 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
Proposed patch (1.23 KB, patch)
2012-03-21 12:12 EDT, Milan Broz
no flags Details | Diff

  None (edit)
Description Tom 2012-03-17 20:13:54 EDT
Description of problem:
device-mapper: table: 253: 0: crypt: unknown target type
device-mapper: ioctl: error adding target to table
device-mapper: reload ioctl on temporary-cryptsetup-276 failed: Invalid argument
Failed to setup dm-crypt key mapping for device /dev/sda3.
Check that kernel supports aes-xts-plain64 cipher (check syslog for more info).
Failed to read from key storage.

Version-Release number of selected component (if applicable):
device-mapper-1.02.74-3.fc18
kernel-3.3.0-0.rc7.git2.1.fc18

How reproducible:
Everytime with kernel-3.3.0-0.rc7.git2.1.fc18, but works with previous kernel-3.3.0-0.rc6.git2.2.fc18

Steps to Reproduce:
1.Have working Rawhide with kernel-3.3.0-0.rc6.git2.2.fc18 and device-mapper-1.02.74-2.fc18
2.Update to kernel-3.3.0-0.rc7.git2.1.fc18 and device-mapper-1.02.74-3.fc18
3.Can't unlock encrypted drive

Actual results:


Expected results:


Additional info:
Comment 1 Alasdair Kergon 2012-03-17 21:06:17 EDT
That error "crypt: unknown target type" means you do not have the dm-crypt kernel module available.
Comment 2 Alasdair Kergon 2012-03-17 21:08:09 EDT
If this is during boot, check your initrd, dracut etc.  If later, check where the module is and why it's not loading (lsmod, modprobe, /usr issue maybe etc.)
Comment 3 Tom 2012-03-18 03:02:00 EDT
I see that I have /initrd/lib/modules/3.3.0-0.rc7.git2.1.fc18.x86_64/kernel/drivers/md/dm-crypt.ko in both kernels, but it isn't loading for this one during boot. I tried recreating initramfs with dracut, but it didn't help.

I can unlock drive normally again with the next kernel-3.3.0-0.rc7.git2.3.fc17
Comment 4 Milan Broz 2012-03-18 06:32:11 EDT
*** Bug 804365 has been marked as a duplicate of this bug. ***
Comment 5 Milan Broz 2012-03-18 06:37:13 EDT
According to comment #3, it works with kernel-3.3.0-0.rc7.git2.3.fc17/fc18 again?

Seems like a glitch in the kernel build then.
Comment 6 Richard W.M. Jones 2012-03-18 06:40:18 EDT
I have kicked off another libguestfs build against the
git2.3 kernel, so we'll see if this is resolved.
Comment 7 Richard W.M. Jones 2012-03-18 06:46:49 EDT
(In reply to comment #6)
> I have kicked off another libguestfs build against the
> git2.3 kernel, so we'll see if this is resolved.

... or at least, I would do if the git2.3 kernel was available
in F18, which it isn't.  It seems like comment 3 "fixed" things
just because Tom tried a F17 kernel, not an F18 kernel.
Comment 8 Richard W.M. Jones 2012-03-18 11:04:20 EDT
I disabled LUKS tests in libguestfs and rebuilt, and
it failed again with a strange module loading bug:

  md: personality for level 5 is not loaded!
  mdadm: RUN_ARRAY failed: Invalid argument

Since this is the same git2.1 kernel, I wonder if there
is a generic module loading bug?

This was all working 2 days ago ...

Full log (huge) is here:

http://koji.fedoraproject.org/koji/getfile?taskID=3905826&name=build.log
Comment 9 Milan Broz 2012-03-18 19:45:11 EDT
Something is broken in this kernel build...

# modprobe dm-crypt
ERROR: could not insert 'dm_crypt': Bad message

strace insmod dm-crypt.ko 

...
stat(".", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat("/lib/modules/3.3.0-0.rc7.git2.1.fc18.x86_64/kernel/drivers/md", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat("/lib/modules/3.3.0-0.rc7.git2.1.fc18.x86_64/kernel/drivers/md/./dm-crypt.ko", {st_mode=S_IFREG|0744, st_size=34864, ...}) = 0
open("/lib/modules/3.3.0-0.rc7.git2.1.fc18.x86_64/kernel/drivers/md/./dm-crypt.ko", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1", 6)               = 6
lseek(3, 0, SEEK_SET)                   = 0
fstat(3, {st_mode=S_IFREG|0744, st_size=34864, ...}) = 0
mmap(NULL, 34864, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fdd4584a000
init_module(0x7fdd4584a000, 34864, "")  = -1 EBADMSG (Bad message)
munmap(0x7fdd4584a000, 34864)           = 0
close(3)                                = 0
write(2, "Error: could not insert module ."..., 58Error: could not insert module ./dm-crypt.ko: Bad message
) = 58
exit_group(1)                           = ?
Comment 10 Milan Broz 2012-03-19 07:48:24 EDT
And kernel-3.3.0-2.fc18.x86_64 works again...
Comment 11 Richard W.M. Jones 2012-03-19 09:11:26 EDT
Are you sure?  libguestfs builds fail, different module,
but same error message:

  [    0.000000] Linux version 3.3.0-2.fc18.x86_64 (mockbuild@x86-09.phx2.fedoraproject.org) (gcc version 4.7.0 20120308 (Red Hat 4.7.0-0.19) (GCC) ) #1 SMP Mon Mar 19 03:14:14 UTC 2012
  [...]
  mount -o  /dev/vda2 /sysroot/
  ERROR: could not insert 'fuse': Bad message
  ntfs-3g-mount: fuse device is missing, try 'modprobe fuse' as root
  guestfsd: error: /dev/vda2 on /: ERROR: could not insert 'fuse': Bad message
  ntfs-3g-mount: fuse device is missing, try 'modprobe fuse' as root

(Note that ntfs-3g worked a few days ago)

Full log is here:
http://koji.fedoraproject.org/koji/getfile?taskID=3909368&name=build.log
Comment 12 Josh Boyer 2012-03-19 10:27:46 EDT
Rawhide is using signed modules now.  EBADMSG is the error returned with there is a corrupted signature in the module.

I get the same results for fuse.ko on an actual machine so it seems occasionally either something is corrupting the signature on random modules, or stripping portions out, etc.

[jwboyer@vader fuse]$ uname -a
Linux vader 3.3.0-2.fc18.x86_64 #1 SMP Mon Mar 19 03:14:14 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
[jwboyer@vader fuse]$ sudo modprobe fuse
ERROR: could not insert 'fuse': Bad message
[jwboyer@vader fuse]$ 


David, any ideas on how to track this down?
Comment 13 Richard W.M. Jones 2012-03-19 10:35:15 EDT
Whenever someone mentions "something" "random" "corrupting"
and "stripping" in the same sentence, my mind turns to prelink.
Comment 14 Josh Boyer 2012-03-19 21:11:45 EDT
(In reply to comment #13)
> Whenever someone mentions "something" "random" "corrupting"
> and "stripping" in the same sentence, my mind turns to prelink.

prelink doesn't touch kernel modules AFAIK.  They don't link against anything.
Comment 15 Milan Broz 2012-03-20 04:16:59 EDT
prelink has nothing to do with it here, I can reproduce it on my local build (on different module though) and prelink is not even installed on this machine.

Is there anyone who know how to debug it except adding some printk in kernel? It is not good idea to switch on module verification and keep it broken for everyone.
Comment 16 David Howells 2012-03-20 06:40:33 EDT
You can ask for debugging information by doing:

    echo 1 >/sys/module/kernel/parameters/modsign_debug

It is possible to get more debugging information, but you have to alter and recompile your kernel.  In kernel/module-verify-sig.c there's a line:

    #undef MODSIGN_DEBUG

Switch that to #define and it'll produce checksums.
Comment 17 Milan Broz 2012-03-20 13:10:47 EDT
This is debug for some wrong module:

[ 3388.252317] looking for sig section '.note.module.sig'
[ 3388.252323] sig in section 27 (size 286)
[ 3388.252328] ffffff89011b0400010800
[ 3388.252357] Trying parser 'pgp'
[ 3388.252371] Parser recognised the format (ret -131936347604992)
[ 3388.252377] 0000003d 56 digested the .altinstr_replacement section, size 12
[ 3388.252386] 000000b1 88 digested the .altinstructions section, size 72
[ 3388.252391] 000000d1 97 digested the .bss section, size 4356
[ 3388.252396] 000000f2 d0 digested the .data section, size 0
[ 3388.252414] 00000387 60 digested the .gnu.linkonce.this_module section, size 608
[ 3388.252428] 00000581 55 digested the .init.text section, size 468
[ 3388.252435] 0000061f 2c digested the .modinfo section, size 122
[ 3388.252440] 00000671 ae digested the .note.gnu.build-id section, size 36
[ 3388.252450] 0000075a 76 digested the .parainstructions section, size 188
[ 3388.252463] 000008f7 5a digested the .rela.altinstructions section, nrels 9
[ 3388.252472] 00000985 8e digested the .rela.gnu.linkonce.this_module section, nrels 2
[ 3388.252496] 00000c84 05 digested the .rela.init.text section, nrels 16
[ 3388.252512] 00000e9a 11 digested the .rela.parainstructions section, nrels 12
[ 3388.252539] 00001286 06 digested the .rela.rodata section, nrels 24
[ 3388.252614] 00001dc5 81 digested the .rela.text section, nrels 60
[ 3388.252628] 00001f83 bc digested the .rela__ksymtab section, nrels 10
[ 3388.252638] 000020a5 b6 digested the .rela__ksymtab_gpl section, nrels 6
[ 3388.253917] 000125b8 75 digested the .rodata section, size 66800
[ 3388.253925] 0001262e 0b digested the .rodata.str1.1 section, size 76
[ 3388.253934] 00012730 d7 digested the .rodata.str1.8 section, size 216
[ 3388.254042] 00013ca9 f1 digested the .text section, size 5464
[ 3388.254049] 00013d1e 1c digested the __ksymtab section, size 80
[ 3388.254055] 00013d77 9e digested the __ksymtab_gpl section, size 48
[ 3388.254063] 00013e19 5d digested the __ksymtab_strings section, size 117
[ 3388.254066] Contributed 81433 bytes to the digest (csum 0xa0)
[ 3388.254077] verify-sig : -74

The interesting difference with working module is:
 sig in section 25 (size 287)
vs
 sig in section 27 (size 286)

sig_size = note->n_descsz;
_debug("sig in section %d (size %d)\n", mvdata->sig_index, sig_size);

So why is the signature length -1 here? Is it normal?
Comment 18 David Howells 2012-03-20 13:37:42 EDT
Ah...  this is the Fedora rawhide kernel, not RHEL?
Comment 19 Milan Broz 2012-03-20 13:44:21 EDT
Yes, this is rawhide. Someone had the idea to enable it there :-)
Comment 20 Milan Broz 2012-03-20 14:37:26 EDT
# ls -l dm-raid.ko.digest.sig 
-rw-r--r-- 1 root root 287 Mar 20 14:29 dm-raid.ko.digest.sig

# modprobe dm-raid
ERROR: could not insert 'dm_raid': Bad message

# dmesg -c
[ 8828.854588] looking for sig section '.note.module.sig'
[ 8828.854595] sig in section 27 (size 286)

And perhaps someone could patch kmod as well to print something more useful than "Bad message" :)
Comment 21 Milan Broz 2012-03-21 12:12:21 EDT
Created attachment 571759 [details]
Proposed patch

Signature contains stored using MPI (multiprecise integers).

According to RFC 4880 sec 3.2, length of MPI is computed starting from most significant bit.

So length can be shorter than checked size in current code.

RFC 3447 sec 8.2.2 RSA size check must be relaxed to conform with shorter signatures if we want to use signatures produced by GPG.

See attached patch where I tried to fix it.

An example of such signature (note data size != 2048)
# gpg --list-packets test.sig 
:signature packet: algo 1, keyid B63419B399B11B0E
        version 4, created 1332279490, md5len 0, sigclass 0x00
        digest algo 8, begin of digest fc 8f
        hashed subpkt 2 len 4 (sig created 2012-03-20)
        subpkt 16 len 8 (issuer key ID B63419B399B11B0E)
        data: [2037 bits]
Comment 22 Milan Broz 2012-03-22 17:16:16 EDT
David, is the patch ok?

If so, please can anyone rebuild rawhide kernel with the patch applied?
Comment 23 Dave Jones 2012-03-22 17:29:27 EDT
building kernel-3.4.0-0.rc0.git1.2.fc18 with that patch (munged to have file headers. What happened there?)
Comment 24 Richard W.M. Jones 2012-03-23 07:56:30 EDT
New kernel has fixed libguestfs builds.
Comment 25 David Howells 2012-03-23 09:21:43 EDT
The patch is okay.
Comment 26 Josh Boyer 2012-03-28 09:33:43 EDT
This should be fixed in rawhide now.

Note You need to log in before you can comment on or make changes to this bug.