libreport version: 2.0.10 cmdline: BOOT_IMAGE=/vmlinuz-3.4.2-1.fc16.x86_64 root=/dev/mapper/d1-root ro rd.md=0 rd.dm=0 KEYTABLE=us quiet SYSFONT=latarcyrheb-sun16 rhgb rd.luks=0 rd.lvm.lv=d1/root rd.lvm.lv=d1/swap LANG=en_US.UTF-8 backtrace: :BUG: unable to handle kernel NULL pointer dereference at (null) :IP: [< (null)>] (null) :PGD 0 :Oops: 0010 [#1] SMP :CPU 1 :Modules linked in: ppdev parport_pc lp parport ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack nfs fscache xt_CHECKSUM auth_rpcgss iptable_mangle nfs_acl bridge stp llc lockd be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi snd_hda_codec_realtek snd_usb_audio snd_hda_intel snd_hda_codec snd_seq snd_pcm snd_hwdep snd_usbmidi_lib snd_rawmidi snd_timer uvcvideo videobuf2_core videodev media videobuf2_vmalloc snd_seq_device videobuf2_memops e1000e vhost_net iTCO_wdt joydev coretemp snd soundcore macvtap macvlan i2c_i801 snd_page_alloc tun iTCO_vendor_support microcode kvm_intel kvm sunrpc hid_logitech_dj usb_storage i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan] :Pid: 1229, comm: rpc.idmapd Not tainted 3.4.2-1.fc16.x86_64 #1 Gateway DX4710-UB801A/G33M05G1 :RIP: 0010:[<0000000000000000>] [< (null)>] (null) :RSP: 0018:ffff8801a3645d40 EFLAGS: 00010246 :RAX: ffff880077707e30 RBX: ffff880077707f50 RCX: ffff8801a18ccd80 :RDX: 0000000000000006 RSI: ffff8801a3645e75 RDI: ffff880077707f50 :RBP: ffff8801a3645d88 R08: ffff8801a430f9c0 R09: ffff8801a3645db0 :R10: 000000000000000a R11: 0000000000000246 R12: ffff8801a18ccd80 :R13: ffff8801a3645e75 R14: ffff8801a430f9c0 R15: 0000000000000006 :FS: 00007fb6fb51a700(0000) GS:ffff8801afc80000(0000) knlGS:0000000000000000 :CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 :CR2: 0000000000000000 CR3: 00000001a49b0000 CR4: 00000000000027e0 :DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 :DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 :Process rpc.idmapd (pid: 1229, threadinfo ffff8801a3644000, task ffff8801a3bf9710) :Stack: : ffffffff81260878 ffff8801a3645db0 ffff8801a3645db0 ffff880077707a90 : ffff880077707f50 ffff8801a18ccd80 0000000000000006 ffff8801a3645e75 : ffff8801a430f9c0 ffff8801a3645dd8 ffffffff81260983 ffff8801a3645de8 :Call Trace: : [<ffffffff81260878>] ? __key_instantiate_and_link+0x58/0x100 : [<ffffffff81260983>] key_instantiate_and_link+0x63/0xa0 : [<ffffffffa057062b>] idmap_pipe_downcall+0x1cb/0x1e0 [nfs] : [<ffffffffa0107f57>] rpc_pipe_write+0x67/0x90 [sunrpc] : [<ffffffff8117f833>] vfs_write+0xb3/0x180 : [<ffffffff8117fb5a>] sys_write+0x4a/0x90 : [<ffffffff81600329>] system_call_fastpath+0x16/0x1b :Code: Bad RIP value. :RIP [< (null)>] (null) : RSP <ffff8801a3645d40> :CR2: 0000000000000000
Might be related to reloading two NFS servers. Other than that nothing unusual was going on. After this oops I was not able to log into any of the other virtual terminals. All access to NFS mounts caused lock ups. Attempted to do this report before rebooting, but it got stuck on running smolt. Package: kernel OS Release: Fedora release 16 (Verne)
I'm seeing these too. There is discussion upstream and apparently a patch: https://lkml.org/lkml/2012/6/18/435
FYI - it appears this was introduced in the 3.4.X kernel, so reverting to 3.3.X should help.
That patch went into Linus' tree as commit b1027439dff844675f6c0df97a1b1d190791a699. It's currently queued for the 3.4.5 stable tree, which should be released shortly. We'll pick it up then.
*** Bug 837998 has been marked as a duplicate of this bug. ***
An alternative to booting a 3.3.x kernel (yum had already removed all of my 3.3 kernels) is to set the nfsvers=3 option in /etc/fstab for each of the mount points that are nfs4.
3.4.5 is built now. Will be the next update.
kernel-3.4.5-2.fc17 has been submitted as an update for Fedora 17. https://admin.fedoraproject.org/updates/kernel-3.4.5-2.fc17
kernel-3.4.5-1.fc16 has been submitted as an update for Fedora 16. https://admin.fedoraproject.org/updates/kernel-3.4.5-1.fc16
kernel-3.4.5-1.fc16 crashes very quickly with: Jul 17 13:46:34 kolea kernel: [ 635.874330] BUG: unable to handle kernel paging request at 000007720000078a Jul 17 13:46:34 kolea kernel: [ 635.874474] IP: [<ffffffff81261589>] __key_instantiate_and_link+0x59/0x100 Jul 17 13:46:34 kolea kernel: [ 635.874606] PGD 0 Jul 17 13:46:34 kolea kernel: [ 635.874648] Oops: 0000 [#1] SMP Jul 17 13:46:34 kolea kernel: [ 635.874715] CPU 0 Jul 17 13:46:34 kolea kernel: [ 635.874752] Modules linked in: nfs fscache tpm_bios auth_rpcgss nfs_acl lockd snd_atiixp snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm ppdev parport_pc tg3 serio_raw parport snd_timer edac_core edac_mce_amd k8temp i2c_piix4 snd shpchp uinput sunrpc soundcore snd_page_alloc ata_generic pata_acpi pata_atiixp sata_sil radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan] Jul 17 13:46:34 kolea kernel: [ 635.875017] Jul 17 13:46:34 kolea kernel: [ 635.875017] Pid: 758, comm: rpc.idmapd Not tainted 3.4.5-2.fc17.x86_64 #1 Hewlett-Packard HP dx5150 MT(PZ590UA)/09AC Jul 17 13:46:34 kolea kernel: [ 635.875017] RIP: 0010:[<ffffffff81261589>] [<ffffffff81261589>] __key_instantiate_and_link+0x59/0x100 Jul 17 13:46:34 kolea kernel: [ 635.875017] RSP: 0018:ffff880135ef7d38 EFLAGS: 00010246 Jul 17 13:46:34 kolea kernel: [ 635.875017] RAX: 0000077200000772 RBX: ffff8801372b5a80 RCX: ffff880134d900c0 Jul 17 13:46:34 kolea kernel: [ 635.875017] RDX: 0000000000000005 RSI: ffff880135ef7de9 RDI: ffff8800a196d270 Jul 17 13:46:34 kolea kernel: [ 635.875017] RBP: ffff880135ef7d78 R08: ffff8801372b5a80 R09: ffff880135ef7da0 Jul 17 13:46:34 kolea kernel: [ 635.875017] R10: 0000000000000000 R11: 0000000000000246 R12: ffff880134d900c0 Jul 17 13:46:34 kolea kernel: [ 635.875017] R13: ffff8800a196d270 R14: ffff880135ef7de9 R15: 0000000000000005 Jul 17 13:46:34 kolea kernel: [ 635.875017] FS: 00007f8878fa2700(0000) GS:ffff88013fc00000(0000) knlGS:0000000000000000 Jul 17 13:46:34 kolea kernel: [ 635.875017] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jul 17 13:46:34 kolea kernel: [ 635.875017] CR2: 000007720000078a CR3: 0000000137db2000 CR4: 00000000000007f0 Jul 17 13:46:34 kolea kernel: [ 635.875017] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Jul 17 13:46:34 kolea kernel: [ 635.875017] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Jul 17 13:46:34 kolea kernel: [ 635.875017] Process rpc.idmapd (pid: 758, threadinfo ffff880135ef6000, task ffff8801380f8000) Jul 17 13:46:34 kolea kernel: [ 635.875017] Stack: Jul 17 13:46:34 kolea kernel: [ 635.875017] ffff880135ef7da0 ffff880135ef7da0 ffff880135419f00 ffff880134d900c0 Jul 17 13:46:34 kolea kernel: [ 635.875017] ffff8800a196d270 ffff880135ef7de9 0000000000000005 ffff8801372b5a80 Jul 17 13:46:34 kolea kernel: [ 635.875017] ffff880135ef7dc8 ffffffff81261693 ffff880135ef7dd8 ffff8801372b5a80 Jul 17 13:46:34 kolea kernel: [ 635.875017] Call Trace: Jul 17 13:46:34 kolea kernel: [ 635.875017] [<ffffffff81261693>] key_instantiate_and_link+0x63/0xa0 Jul 17 13:46:34 kolea kernel: [ 635.875017] [<ffffffffa0328cfc>] idmap_pipe_downcall+0x1dc/0x1f0 [nfs] Jul 17 13:46:34 kolea kernel: [ 635.875017] [<ffffffffa0192e79>] rpc_pipe_write+0x69/0x90 [sunrpc] Jul 17 13:46:34 kolea kernel: [ 635.875017] [<ffffffff8118127c>] vfs_write+0xac/0x180 Jul 17 13:46:34 kolea kernel: [ 635.875017] [<ffffffff811815aa>] sys_write+0x4a/0x90 Jul 17 13:46:34 kolea kernel: [ 635.875017] [<ffffffff815fcaa9>] system_call_fastpath+0x16/0x1b Jul 17 13:46:34 kolea kernel: [ 635.875017] Code: 49 89 d7 49 89 cc 4c 89 c3 4c 89 4d c8 e8 c0 0e 39 00 49 8b 45 70 a8 01 0f 85 7c 00 00 00 49 8b 45 20 4c 89 fa 4c 89 f6 4c 89 ef <ff> 50 18 85 c0 41 89 c7 75 6d 49 8b 55 48 3e ff 42 44 49 8d 45 Jul 17 13:46:34 kolea kernel: [ 635.875017] RIP [<ffffffff81261589>] __key_instantiate_and_link+0x59/0x100 Jul 17 13:46:34 kolea kernel: [ 635.875017] RSP <ffff880135ef7d38> Jul 17 13:46:34 kolea kernel: [ 635.875017] CR2: 000007720000078a Which is different, but still not good. Seems to be basically in the same code (idmap/__key_instantiate_and_link).
Package kernel-3.4.5-1.fc16: * should fix your issue, * was pushed to the Fedora 16 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing kernel-3.4.5-1.fc16' as soon as you are able to, then reboot. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2012-10764/kernel-3.4.5-1.fc16 then log in and leave karma (feedback).
kernel-3.4.5-2.fc17 has been pushed to the Fedora 17 stable repository. If problems still persist, please make note of it in this bug report.
kernel-3.4.6-1.fc16 has been submitted as an update for Fedora 16. https://admin.fedoraproject.org/updates/kernel-3.4.6-1.fc16
There were reports that 3.4.5 didn't work and 3.4.6 didn't explicitly add anything that I recall. I'm going to reopen this bug for now. If someone could report one way or another on 3.4.6, that would be appreciated.
I've been getting this almost daily on 3.4.x, and I just hit it again on 3.4.4-4.fc16.i686. Now running 3.4.6-1.fc16.i686, and will let you know if it recurs, thanks.
Still able to reproduce this --- twice yesterday, twice today, and just hit it with 3.4.6-1.fc16.i686. Definitely not fixed yet. Thanks!
I can reproduce this. A bit of debugging shows that the dodgy pointer appearing in: Jul 17 13:46:34 kolea kernel: [ 635.874330] BUG: unable to handle kernel paging request at 000007720000078a ought to be the in-kernel pointer to the authorisation key used to grant instantiation rights. This is got from somewhere by idmap_map_downcall(). Looking at the idmap.c code in the kernel, rpc.idmapd uses the legacy key type - but this is not registered with register_key_type() before it is used. That can certainly cause bugs if the module is unloaded. I'm not sure if it will cause problems otherwise.
Package kernel-3.4.6-1.fc16: * should fix your issue, * was pushed to the Fedora 16 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing kernel-3.4.6-1.fc16' as soon as you are able to, then reboot. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2012-11048/kernel-3.4.6-1.fc16 then log in and leave karma (feedback).
(In reply to comment #18) > Package kernel-3.4.6-1.fc16: > * should fix your issue, Ignore this. Just bodhi doing it's thing. I'll edit the update to remove this bug number in the morning
Any idea yet how to get this fixed though? Happens to us with a single NFS server and user homes mounted over NFS.
There is now a fix upstream: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=a427b9ec4eda8cd6e641ea24541d30b641fc3140
Here's a scratch build with that patch included: http://koji.fedoraproject.org/koji/taskinfo?taskID=4350655 Testing when it completes would be much appreciated.
It's not immediately crashing like 3.4.5-1.fc16. Will keep an eye on it.
Thanks for the quick deployment of the patched kernel! I have installed kernel-3.4.7-1.1.fc16 on two machines and will watch them. They haven't crashed until now. But I have another problem - I don't know if it is directly related to the problem in this bug report or if I should open a new bug report. Since yesterday nfs4 mapping doesn't work properly. On machines with kernel-3.4.6-1.fc16 nfs mounted partitions some owners and groups of the files and directories of that nfs partitions show the number 4294967294 instead of the username or groupname. But that is _mixed_ in the same directory - some uids and gids are mapped to username and groupname, others are mapped to 4294967294. The nfs server runs kernel 3.3.4-3.fc16, 3.4.6-1.fc16, 3.3.6-3.fc16 - the problem occurs on the nfs client with nfs mounted partitions from all these nfs servers. We use nis - ypbind and ypserv was restarted and are running, rpc.idmapd was restarted and is running. On the nfs server with the 3.4.x I set /sys/module/nfsd/parameters/nfs4_disable_idmapping to N. On the nfs client it doesn't matter if I set /sys/module/nfs/parameters/nfs4_disable_idmapping to N or Y. Do you think that this problem has the same reason like the rpc.idmapd crash, or should I open a new bug report?
I just see the following bug with "dmesg" from a new booted machine with the new kernel after trying to access an automounted nfs partition (with did not return so I have interrupted it): [ 53.395195] ------------[ cut here ]------------ [ 53.395236] kernel BUG at fs/nfs/idmap.c:668! [ 53.395267] invalid opcode: 0000 [#1] SMP [ 53.395302] CPU 0 [ 53.395316] Modules linked in: nfs fscache ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_CHECKSUM iptable_mangle tun bridge stp llc be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi input_polldev snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm e1000e iTCO_wdt snd_timer snd soundcore snd_page_alloc i2c_i801 coretemp iTCO_vendor_support serio_raw dcdbas microcode tpm_tis tpm tpm_bios nfsd lockd nfs_acl auth_rpcgss sunrpc uinput binfmt_misc firewire_ohci firewire_core crc_itu_t ata_generic pata_acpi radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan] [ 53.396024] [ 53.396039] Pid: 1914, comm: mount.nfs Not tainted 3.4.7-1.1.fc16.x86_64 #1 Dell Inc. OptiPlex 755 /0GM819 [ 53.396119] RIP: 0010:[<ffffffffa05740eb>] [<ffffffffa05740eb>] nfs_idmap_legacy_upcall+0x34b/0x350 [nfs] [ 53.396139] RSP: 0018:ffff88010bb71398 EFLAGS: 00010286 [ 53.396139] RAX: 0000000000000019 RBX: ffff88011b3e50c0 RCX: 0000000000000000 [ 53.396139] RDX: 0000000000000001 RSI: ffff88010516257d RDI: ffff88011b3e50db [ 53.396139] RBP: ffff88010bb713d8 R08: 000000000000003a R09: ffff88010bb71330 [ 53.396139] R10: 000000004249b72a R11: 00000000cd363a05 R12: ffff880120961340 [ 53.396139] R13: ffff88010532ff00 R14: ffff88011c674240 R15: 0000000000000019 [ 53.396139] FS: 00007f494e33e800(0000) GS:ffff880127c00000(0000) knlGS:0000000000000000 [ 53.396139] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 53.396139] CR2: 00000036660bdb50 CR3: 000000010bbe5000 CR4: 00000000000007f0 [ 53.396139] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 53.396139] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 53.396139] Process mount.nfs (pid: 1914, threadinfo ffff88010bb70000, task ffff88010bb82e40) [ 53.396139] Stack: [ 53.396139] ffff880105162564 ffff88010516257d ffff88010bb713d8 ffff88010532ff00 [ 53.396139] ffff88011b3e5c00 ffff88011b3e5300 0000000000000000 ffffffffa0587eb5 [ 53.396139] ffff88010bb71478 ffffffff8126a324 0000000000000000 ffff88011daafcc0 [ 53.396139] Call Trace: [ 53.396139] [<ffffffff8126a324>] request_key_and_link+0x2e4/0x410 [ 53.396139] [<ffffffff8126a4be>] request_key_with_auxdata+0x1e/0x70 [ 53.396139] [<ffffffffa05741c6>] nfs_idmap_request_key+0xd6/0x1b0 [nfs] [ 53.396139] [<ffffffffa057449b>] nfs_idmap_lookup_id+0xeb/0x110 [nfs] [ 53.396139] [<ffffffffa05746ce>] ? nfs_map_string_to_numeric+0x3e/0xb0 [nfs] [ 53.396139] [<ffffffffa0574e5c>] nfs_map_group_to_gid+0x5c/0x80 [nfs] [ 53.396139] [<ffffffffa056b895>] decode_getfattr_attrs+0xbf5/0xfc0 [nfs] [ 53.396139] [<ffffffffa056e8b0>] ? nfs4_xdr_dec_remove+0xb0/0xb0 [nfs] [ 53.396139] [<ffffffffa056c398>] decode_getfattr_generic.constprop.51+0x98/0x110 [nfs] [ 53.396139] [<ffffffffa056e8b0>] ? nfs4_xdr_dec_remove+0xb0/0xb0 [nfs] [ 53.396139] [<ffffffffa056c423>] decode_getfattr+0x13/0x20 [nfs] [ 53.396139] [<ffffffffa056e937>] nfs4_xdr_dec_lookup_root+0x87/0x90 [nfs] [ 53.396139] [<ffffffffa019e4ed>] rpcauth_unwrap_resp+0x9d/0xd0 [sunrpc] [ 53.396139] [<ffffffffa019b780>] ? rpc_queue_empty+0x40/0x40 [sunrpc] [ 53.396139] [<ffffffffa056e8b0>] ? nfs4_xdr_dec_remove+0xb0/0xb0 [nfs] [ 53.396139] [<ffffffffa0191d6f>] call_decode+0x1cf/0x870 [sunrpc] [ 53.396139] [<ffffffff8107a540>] ? autoremove_wake_function+0x50/0x50 [ 53.396139] [<ffffffffa019c860>] __rpc_execute+0x80/0x3c0 [sunrpc] [ 53.396139] [<ffffffffa0191ba0>] ? call_bc_transmit+0x140/0x140 [sunrpc] [ 53.396139] [<ffffffffa0191ba0>] ? call_bc_transmit+0x140/0x140 [sunrpc] [ 53.396139] [<ffffffffa019d1e8>] rpc_execute+0x68/0xd0 [sunrpc] [ 53.396139] [<ffffffffa0193de5>] rpc_run_task+0x75/0x90 [sunrpc] [ 53.396139] [<ffffffffa0193f03>] rpc_call_sync+0x43/0x70 [sunrpc] [ 53.396139] [<ffffffffa055c1d0>] _nfs4_call_sync+0x30/0x40 [nfs] [ 53.396139] [<ffffffffa055c6ec>] _nfs4_lookup_root+0xac/0xc0 [nfs] [ 53.396139] [<ffffffffa055f3ae>] nfs4_lookup_root+0x4e/0x90 [nfs] [ 53.396139] [<ffffffffa055f425>] nfs4_proc_get_root+0x35/0xb0 [nfs] [ 53.396139] [<ffffffffa054449c>] nfs4_get_rootfh+0x4c/0x140 [nfs] [ 53.396139] [<ffffffffa0545794>] ? nfs_alloc_fattr+0x24/0x80 [nfs] [ 53.396139] [<ffffffffa053dbe5>] nfs4_server_common_setup+0x95/0x1c0 [nfs] [ 53.396139] [<ffffffffa053ec78>] nfs4_create_server+0x198/0x370 [nfs] [ 53.396139] [<ffffffffa0549450>] nfs4_remote_mount+0x60/0x2e0 [nfs] [ 53.396139] [<ffffffff81187553>] mount_fs+0x43/0x1b0 [ 53.396139] [<ffffffff81144230>] ? __alloc_percpu+0x10/0x20 [ 53.396139] [<ffffffff811a079a>] vfs_kern_mount+0x6a/0xf0 [ 53.396139] [<ffffffffa05499a5>] nfs_do_root_mount+0x95/0xe0 [nfs] [ 53.396139] [<ffffffffa054a0a6>] nfs4_try_mount+0x46/0xc0 [nfs] [ 53.396139] [<ffffffffa0549dc3>] ? nfs4_validate_text_mount_data+0xc3/0xe0 [nfs] [ 53.396139] [<ffffffffa054b037>] nfs_fs_mount+0x677/0x990 [nfs] [ 53.396139] [<ffffffff81187553>] mount_fs+0x43/0x1b0 [ 53.396139] [<ffffffff81144230>] ? __alloc_percpu+0x10/0x20 [ 53.396139] [<ffffffff811a079a>] vfs_kern_mount+0x6a/0xf0 [ 53.396139] [<ffffffff811a0f74>] do_kern_mount+0x54/0x110 [ 53.396139] [<ffffffff811a26f4>] do_mount+0x1a4/0x830 [ 53.396139] [<ffffffff811a23ca>] ? copy_mount_options+0x3a/0x170 [ 53.396139] [<ffffffff811a2ec0>] sys_mount+0x90/0xe0 [ 53.396139] [<ffffffff816047e9>] system_call_fastpath+0x16/0x1b [ 53.396139] Code: ff 0f 1f 80 00 00 00 00 66 c7 07 00 00 83 ea 02 48 83 c7 02 e9 5b fd ff ff 0f 1f 80 00 00 00 00 41 bf f4 ff ff ff e9 0a fe ff ff <0f> 0b 0f 1f 00 55 48 89 e5 48 83 ec 60 48 89 5d d8 4c 89 65 e0 [ 53.396139] RIP [<ffffffffa05740eb>] nfs_idmap_legacy_upcall+0x34b/0x350 [nfs] [ 53.396139] RSP <ffff88010bb71398> [ 53.427429] ---[ end trace 0a95a8310084fe43 ]---
FWIW I'm running kernel-3.4.7-1.fc16.x86_64 as of yesterday lunchtime and 3 PC's have seen the crash originally mentioned; all were for users using NFSV4 mounted home directories. I've just switched them to NFSV3 to see if that helps. Aug 7 18:32:07 cc00 kernel: [21815.264882] BUG: unable to handle kernel paging request at 0000000300000471 Aug 7 18:32:07 cc00 kernel: [21815.264918] IP: [<0000000300000471>] 0x300000470 Aug 7 18:32:07 cc00 kernel: [21815.264941] PGD 0 Aug 7 18:32:07 cc00 kernel: [21815.264952] Oops: 0010 [#1] SMP Aug 7 18:32:07 cc00 kernel: [21815.264970] CPU 1 Aug 7 18:32:07 cc00 kernel: [21815.264979] Modules linked in: tcp_lp fuse nfs fscache auth_rpcgss nfs_acl lockd tpm_bios binfmt_misc snd_hda_codec_analog snd_hda_intel snd_hda_codec coretemp dcdbas snd_hwdep ppdev parport_pc parport snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc uinput microcode i2c_i801 iTCO_wdt iTCO_vendor_support sunrpc dell_wmi e1000e sparse_keymap serio_raw wmi ata_generic pata_acpi radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan] Aug 7 18:32:07 cc00 kernel: [21815.265222] Aug 7 18:32:07 cc00 kernel: [21815.265230] Pid: 830, comm: rpc.idmapd Not tainted 3.4.7-1.fc16.x86_64 #1 Dell Inc. OptiPlex 780 /03NVJ6 Aug 7 18:32:07 cc00 kernel: [21815.265276] RIP: 0010:[<0000000300000471>] [<0000000300000471>] 0x300000470 Aug 7 18:32:07 cc00 kernel: [21815.265306] RSP: 0018:ffff8800dad05d40 EFLAGS: 00010246 Aug 7 18:32:07 cc00 kernel: [21815.265327] RAX: ffff880043a716c0 RBX: ffff880043a717d0 RCX: ffff8801108b0c00 Aug 7 18:32:07 cc00 kernel: [21815.265355] RDX: 0000000000000004 RSI: ffff8800dad05e75 RDI: ffff880043a717d0 Aug 7 18:32:07 cc00 kernel: [21815.265383] RBP: ffff8800dad05d88 R08: 0000000000000000 R09: ffff8800dad05db0 Aug 7 18:32:07 cc00 kernel: [21815.265410] R10: 000000000000000a R11: 0000000000000246 R12: ffff8801108b0c00 Aug 7 18:32:07 cc00 kernel: [21815.265437] R13: ffff8800dad05e75 R14: 0000000000000000 R15: 0000000000000004 Aug 7 18:32:07 cc00 kernel: [21815.265465] FS: 00007fd971dc8700(0000) GS:ffff880117c40000(0000) knlGS:0000000000000000 Aug 7 18:32:07 cc00 kernel: [21815.265496] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b Aug 7 18:32:07 cc00 kernel: [21815.265519] CR2: 0000000300000471 CR3: 00000000db794000 CR4: 00000000000407e0 Aug 7 18:32:07 cc00 kernel: [21815.265546] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Aug 7 18:32:07 cc00 kernel: [21815.265573] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Aug 7 18:32:07 cc00 kernel: [21815.265601] Process rpc.idmapd (pid: 830, threadinfo ffff8800dad04000, task ffff88010c638000) Aug 7 18:32:07 cc00 kernel: [21815.265633] Stack: Aug 7 18:32:07 cc00 kernel: [21815.265642] ffffffff81264bf8 ffff8800dad05db0 ffff8800dad05db0 ffff880043a71100 Aug 7 18:32:07 cc00 kernel: [21815.265678] ffff880043a717d0 ffff8801108b0c00 0000000000000004 ffff8800dad05e75 Aug 7 18:32:07 cc00 kernel: [21815.265713] 0000000000000000 ffff8800dad05dd8 ffffffff81264d03 ffff8800dad05de8 Aug 7 18:32:07 cc00 kernel: [21815.265747] Call Trace: Aug 7 18:32:07 cc00 kernel: [21815.265762] [<ffffffff81264bf8>] ? __key_instantiate_and_link+0x58/0x100 Aug 7 18:32:07 cc00 kernel: [21815.265789] [<ffffffff81264d03>] key_instantiate_and_link+0x63/0xa0 Aug 7 18:32:07 cc00 kernel: [21815.265828] [<ffffffffa036867b>] idmap_pipe_downcall+0x1cb/0x1e0 [nfs] Aug 7 18:32:07 cc00 kernel: [21815.265860] [<ffffffffa01d9f67>] rpc_pipe_write+0x67/0x90 [sunrpc] Aug 7 18:32:07 cc00 kernel: [21815.265860] [<ffffffff81183b63>] vfs_write+0xb3/0x180 Aug 7 18:32:07 cc00 kernel: [21815.265860] [<ffffffff81183e8a>] sys_write+0x4a/0x90 Aug 7 18:32:07 cc00 kernel: [21815.265860] [<ffffffff816047e9>] system_call_fastpath+0x16/0x1b Aug 7 18:32:07 cc00 kernel: [21815.265860] Code: Bad RIP value. Aug 7 18:32:07 cc00 kernel: [21815.265860] RIP [<0000000300000471>] 0x300000470 Aug 7 18:32:07 cc00 kernel: [21815.265860] RSP <ffff8800dad05d40> Aug 7 18:32:07 cc00 kernel: [21815.265860] CR2: 0000000300000471 Aug 7 18:32:07 cc00 kernel: [21815.272141] ---[ end trace c0e334bf4319c46d ]---
(In reply to comment #26) > FWIW I'm running kernel-3.4.7-1.fc16.x86_64 as of yesterday lunchtime Yeah, that kernel doesn't have any fixes.
(In reply to comment #25) > I just see the following bug with "dmesg" from a new booted machine with the > new kernel after trying to access an automounted nfs partition (with did not > return so I have interrupted it): > > [ 53.395195] ------------[ cut here ]------------ > [ 53.395236] kernel BUG at fs/nfs/idmap.c:668! > [ 53.395267] invalid opcode: 0000 [#1] SMP > [ 53.395302] CPU 0 > [ 53.395316] Modules linked in: nfs fscache ip6table_filter ip6_tables > ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 > nf_defrag_ipv4 xt_state nf_conntrack xt_CHECKSUM iptable_mangle tun bridge > stp llc be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i > libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr > iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi input_polldev > snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_seq > snd_seq_device snd_pcm e1000e iTCO_wdt snd_timer snd soundcore > snd_page_alloc i2c_i801 coretemp iTCO_vendor_support serio_raw dcdbas > microcode tpm_tis tpm tpm_bios nfsd lockd nfs_acl auth_rpcgss sunrpc uinput > binfmt_misc firewire_ohci firewire_core crc_itu_t ata_generic pata_acpi > radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: > scsi_wait_scan] > [ 53.396024] > [ 53.396039] Pid: 1914, comm: mount.nfs Not tainted 3.4.7-1.1.fc16.x86_64 > #1 Dell Inc. OptiPlex 755 /0GM819 > [ 53.396119] RIP: 0010:[<ffffffffa05740eb>] [<ffffffffa05740eb>] > nfs_idmap_legacy_upcall+0x34b/0x350 [nfs] > [ 53.396139] RSP: 0018:ffff88010bb71398 EFLAGS: 00010286 > [ 53.396139] RAX: 0000000000000019 RBX: ffff88011b3e50c0 RCX: > 0000000000000000 > [ 53.396139] RDX: 0000000000000001 RSI: ffff88010516257d RDI: > ffff88011b3e50db > [ 53.396139] RBP: ffff88010bb713d8 R08: 000000000000003a R09: > ffff88010bb71330 > [ 53.396139] R10: 000000004249b72a R11: 00000000cd363a05 R12: > ffff880120961340 > [ 53.396139] R13: ffff88010532ff00 R14: ffff88011c674240 R15: > 0000000000000019 > [ 53.396139] FS: 00007f494e33e800(0000) GS:ffff880127c00000(0000) > knlGS:0000000000000000 > [ 53.396139] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > [ 53.396139] CR2: 00000036660bdb50 CR3: 000000010bbe5000 CR4: > 00000000000007f0 > [ 53.396139] DR0: 0000000000000000 DR1: 0000000000000000 DR2: > 0000000000000000 > [ 53.396139] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: > 0000000000000400 > [ 53.396139] Process mount.nfs (pid: 1914, threadinfo ffff88010bb70000, > task ffff88010bb82e40) > [ 53.396139] Stack: > [ 53.396139] ffff880105162564 ffff88010516257d ffff88010bb713d8 > ffff88010532ff00 > [ 53.396139] ffff88011b3e5c00 ffff88011b3e5300 0000000000000000 > ffffffffa0587eb5 > [ 53.396139] ffff88010bb71478 ffffffff8126a324 0000000000000000 > ffff88011daafcc0 > [ 53.396139] Call Trace: > [ 53.396139] [<ffffffff8126a324>] request_key_and_link+0x2e4/0x410 > [ 53.396139] [<ffffffff8126a4be>] request_key_with_auxdata+0x1e/0x70 > [ 53.396139] [<ffffffffa05741c6>] nfs_idmap_request_key+0xd6/0x1b0 [nfs] > [ 53.396139] [<ffffffffa057449b>] nfs_idmap_lookup_id+0xeb/0x110 [nfs] > [ 53.396139] [<ffffffffa05746ce>] ? nfs_map_string_to_numeric+0x3e/0xb0 > [nfs] > [ 53.396139] [<ffffffffa0574e5c>] nfs_map_group_to_gid+0x5c/0x80 [nfs] > [ 53.396139] [<ffffffffa056b895>] decode_getfattr_attrs+0xbf5/0xfc0 [nfs] > [ 53.396139] [<ffffffffa056e8b0>] ? nfs4_xdr_dec_remove+0xb0/0xb0 [nfs] > [ 53.396139] [<ffffffffa056c398>] > decode_getfattr_generic.constprop.51+0x98/0x110 [nfs] > [ 53.396139] [<ffffffffa056e8b0>] ? nfs4_xdr_dec_remove+0xb0/0xb0 [nfs] > [ 53.396139] [<ffffffffa056c423>] decode_getfattr+0x13/0x20 [nfs] > [ 53.396139] [<ffffffffa056e937>] nfs4_xdr_dec_lookup_root+0x87/0x90 [nfs] > [ 53.396139] [<ffffffffa019e4ed>] rpcauth_unwrap_resp+0x9d/0xd0 [sunrpc] > [ 53.396139] [<ffffffffa019b780>] ? rpc_queue_empty+0x40/0x40 [sunrpc] > [ 53.396139] [<ffffffffa056e8b0>] ? nfs4_xdr_dec_remove+0xb0/0xb0 [nfs] > [ 53.396139] [<ffffffffa0191d6f>] call_decode+0x1cf/0x870 [sunrpc] > [ 53.396139] [<ffffffff8107a540>] ? autoremove_wake_function+0x50/0x50 > [ 53.396139] [<ffffffffa019c860>] __rpc_execute+0x80/0x3c0 [sunrpc] > [ 53.396139] [<ffffffffa0191ba0>] ? call_bc_transmit+0x140/0x140 [sunrpc] > [ 53.396139] [<ffffffffa0191ba0>] ? call_bc_transmit+0x140/0x140 [sunrpc] > [ 53.396139] [<ffffffffa019d1e8>] rpc_execute+0x68/0xd0 [sunrpc] > [ 53.396139] [<ffffffffa0193de5>] rpc_run_task+0x75/0x90 [sunrpc] > [ 53.396139] [<ffffffffa0193f03>] rpc_call_sync+0x43/0x70 [sunrpc] > [ 53.396139] [<ffffffffa055c1d0>] _nfs4_call_sync+0x30/0x40 [nfs] > [ 53.396139] [<ffffffffa055c6ec>] _nfs4_lookup_root+0xac/0xc0 [nfs] > [ 53.396139] [<ffffffffa055f3ae>] nfs4_lookup_root+0x4e/0x90 [nfs] > [ 53.396139] [<ffffffffa055f425>] nfs4_proc_get_root+0x35/0xb0 [nfs] > [ 53.396139] [<ffffffffa054449c>] nfs4_get_rootfh+0x4c/0x140 [nfs] > [ 53.396139] [<ffffffffa0545794>] ? nfs_alloc_fattr+0x24/0x80 [nfs] > [ 53.396139] [<ffffffffa053dbe5>] nfs4_server_common_setup+0x95/0x1c0 > [nfs] > [ 53.396139] [<ffffffffa053ec78>] nfs4_create_server+0x198/0x370 [nfs] > [ 53.396139] [<ffffffffa0549450>] nfs4_remote_mount+0x60/0x2e0 [nfs] > [ 53.396139] [<ffffffff81187553>] mount_fs+0x43/0x1b0 > [ 53.396139] [<ffffffff81144230>] ? __alloc_percpu+0x10/0x20 > [ 53.396139] [<ffffffff811a079a>] vfs_kern_mount+0x6a/0xf0 > [ 53.396139] [<ffffffffa05499a5>] nfs_do_root_mount+0x95/0xe0 [nfs] > [ 53.396139] [<ffffffffa054a0a6>] nfs4_try_mount+0x46/0xc0 [nfs] > [ 53.396139] [<ffffffffa0549dc3>] ? > nfs4_validate_text_mount_data+0xc3/0xe0 [nfs] > [ 53.396139] [<ffffffffa054b037>] nfs_fs_mount+0x677/0x990 [nfs] > [ 53.396139] [<ffffffff81187553>] mount_fs+0x43/0x1b0 > [ 53.396139] [<ffffffff81144230>] ? __alloc_percpu+0x10/0x20 > [ 53.396139] [<ffffffff811a079a>] vfs_kern_mount+0x6a/0xf0 > [ 53.396139] [<ffffffff811a0f74>] do_kern_mount+0x54/0x110 > [ 53.396139] [<ffffffff811a26f4>] do_mount+0x1a4/0x830 > [ 53.396139] [<ffffffff811a23ca>] ? copy_mount_options+0x3a/0x170 > [ 53.396139] [<ffffffff811a2ec0>] sys_mount+0x90/0xe0 > [ 53.396139] [<ffffffff816047e9>] system_call_fastpath+0x16/0x1b > [ 53.396139] Code: ff 0f 1f 80 00 00 00 00 66 c7 07 00 00 83 ea 02 48 83 > c7 02 e9 5b fd ff ff 0f 1f 80 00 00 00 00 41 bf f4 ff ff ff e9 0a fe ff ff > <0f> 0b 0f 1f 00 55 48 89 e5 48 83 ec 60 48 89 5d d8 4c 89 65 e0 > [ 53.396139] RIP [<ffffffffa05740eb>] nfs_idmap_legacy_upcall+0x34b/0x350 > [nfs] > [ 53.396139] RSP <ffff88010bb71398> > [ 53.427429] ---[ end trace 0a95a8310084fe43 ]--- This backtrace seems to be reported here: http://thread.gmane.org/gmane.linux.nfs/51316 Hopefully there's a follow on fix. Sigh.
I've been running 3.4.7-1.1 since Monday without problems now. Previously it was crashing at least once a day. That's insufficient data to be completely sure it's fixed, but it definitely looks improved to me. I'll report if there are any other panics. Thanks!
We still have the problem. On one machine as frequently as every 15 minutes or so. The posted linux NFS post seems not directly related. We are using NFSv4 only, but that guy is having problems with NFSv3. Anyone still working on this? What has changed between 3.3 and 3.4 in the NFS code, this must have broken it. I just installed F-16 stock kernel 3.3.4 again to have a workaround.
Tim, you can go up as far as kernel-3.3.8-1.fc16 without running into this bug.
Yes, but that's in no repo anymore and I didn't care enough to get it from Koji... I just want to see this fixed in the regular updates to get going again.
Tim, you might also look at the comments in Bug 837998, referenced above as a duplicate of this bug. It's possible to make a change to /etc/request-key.conf that causes the idmap daemon to use the freestanding nfsidmap utility to do the key lookups, rather than going through the kernel routine that is abrt'ing. One of my (big! 20 core, 128gig RAM) F16 systems has been running nfs4 client on kernel 3.4.4-4.fc16.x86_64 with no abrts as a result of this. The downside is that doing an ls -l on the system shows a bunch of invalid user id's on the nfs4 mount. For most things this doesn't much matter, because all the permissions are being enforced properly on the NFS server, but it is disturbing to users and to software that tries to check the file owner directly rather than just looking for the ENOPERM.
s/with no abrts/with no abrts for 10 days/
The /etc/request-key.conf change works without flaw on the 3.3 kernels, incidentally. I'm not sure what happened in 3.4 to cause the idmap behavior to break, but it seems like it may be a distinct bug from the abrt one?
*** Bug 847430 has been marked as a duplicate of this bug. ***
I tried to boot the new available and by yumupdatesd installed kernel-3.4.9-1.fc16.x86_64. I got a crash in the first three minutes, and all nfs mount processes /sbin/mount.nfs (started by autofs) had been hanging and could be killed. Here is the output from dmesg. It seems to be still the same bug. After a second reboot this kernel bug has not appeared in the first ten minutes, but I don't know when and why it happened and if and when it will happen again. There is still the problem that the first few uids and gids are mapped to usernames and groupnames for nfs mounted directories, but if more nfs partitions are mounted (autofs, nfs4), "ls -l" on these directories show "4294967294" in place of username and groupname for uids and gids not known from the first nfs mounted partition (sometimes the first two or three nfs mounted partitions). It seems that only the first few looked up uids and gids can be mapped to names, them some data structure is "full" or damaged, so no new mapping can be done or could not remembered (saved, cached). I found upstream in the changelog of kernel 3.4.8 that comment: NFS: Fix a number of bugs in the idmapper commit a427b9ec4eda8cd6e641ea24541d30b641fc3140 upstream. but this does not fix this problem. (It may fix a problem we already had, too.) [ 84.985998] ------------[ cut here ]------------ [ 84.986039] kernel BUG at fs/nfs/idmap.c:668! [ 84.986068] invalid opcode: 0000 [#1] SMP [ 84.986101] CPU 0 [ 84.986115] Modules linked in: nfs fscache ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_CHECKSUM iptable_mangle tun bridge stp llc be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi input_polldev coretemp snd_hda_codec_analog microcode snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm tpm_tis tpm dcdbas snd_timer snd soundcore snd_page_alloc i2c_i801 e1000e iTCO_wdt tpm_bios serio_raw iTCO_vendor_support vboxnetadp(O) vboxnetflt(O) vboxdrv(O) nfsd lockd nfs_acl auth_rpcgss uinput sunrpc binfmt_misc firewire_ohci firewire_core crc_itu_t ata_generic pata_acpi radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan] [ 84.986808] [ 84.986820] Pid: 1923, comm: mount.nfs Tainted: G C O 3.4.9-1.fc16.x86_64 #1 Dell Inc. OptiPlex 755 /0GM819 [ 84.986899] RIP: 0010:[<ffffffffa07560fb>] [<ffffffffa07560fb>] nfs_idmap_legacy_upcall+0x34b/0x350 [nfs] [ 84.986949] RSP: 0018:ffff880106967398 EFLAGS: 00010282 [ 84.986949] RAX: 0000000000000019 RBX: ffff88011e25da80 RCX: 0000000000000000 [ 84.986949] RDX: 0000000000000001 RSI: ffff880101b2a95d RDI: ffff88011e25da9b [ 84.986949] RBP: ffff8801069673d8 R08: 000000000000003a R09: 0000000180150015 [ 84.986949] R10: 000000001e25da01 R11: 0000000025f93aa9 R12: ffff88010266a040 [ 84.986949] R13: ffff880102620be0 R14: ffff8801025fda80 R15: 0000000000000019 [ 84.986949] FS: 00007f12d126a800(0000) GS:ffff880127c00000(0000) knlGS:0000000000000000 [ 84.986949] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 84.986949] CR2: 00000036660bdb50 CR3: 0000000106928000 CR4: 00000000000007f0 [ 84.986949] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 84.986949] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 84.986949] Process mount.nfs (pid: 1923, threadinfo ffff880106966000, task ffff8801068f4560) [ 84.986949] Stack: [ 84.986949] ffff880101b2a944 ffff880101b2a95d ffff8801069673d8 ffff880102620be0 [ 84.986949] ffff88011d243f00 ffff88011d2430c0 0000000000000000 ffffffffa0769eb5 [ 84.986949] ffff880106967478 ffffffff8126a2f4 0000000000000000 ffff880102638600 [ 84.986949] Call Trace: [ 84.986949] [<ffffffff8126a2f4>] request_key_and_link+0x2e4/0x410 [ 84.986949] [<ffffffff8126a48e>] request_key_with_auxdata+0x1e/0x70 [ 84.986949] [<ffffffffa07561d6>] nfs_idmap_request_key+0xd6/0x1b0 [nfs] [ 84.986949] [<ffffffffa07564ab>] nfs_idmap_lookup_id+0xeb/0x110 [nfs] [ 84.986949] [<ffffffffa07566de>] ? nfs_map_string_to_numeric+0x3e/0xb0 [nfs] [ 84.986949] [<ffffffffa0756e6c>] nfs_map_group_to_gid+0x5c/0x80 [nfs] [ 84.986949] [<ffffffffa074d8a5>] decode_getfattr_attrs+0xbf5/0xfc0 [nfs] [ 84.986949] [<ffffffffa07508c0>] ? nfs4_xdr_dec_remove+0xb0/0xb0 [nfs] [ 84.986949] [<ffffffffa074e3a8>] decode_getfattr_generic.constprop.51+0x98/0x110 [nfs] [ 84.986949] [<ffffffffa07508c0>] ? nfs4_xdr_dec_remove+0xb0/0xb0 [nfs] [ 84.986949] [<ffffffffa074e433>] decode_getfattr+0x13/0x20 [nfs] [ 84.986949] [<ffffffffa0750947>] nfs4_xdr_dec_lookup_root+0x87/0x90 [nfs] [ 84.986949] [<ffffffffa01985ad>] rpcauth_unwrap_resp+0x9d/0xd0 [sunrpc] [ 84.986949] [<ffffffffa0195820>] ? rpc_queue_empty+0x40/0x40 [sunrpc] [ 84.986949] [<ffffffffa07508c0>] ? nfs4_xdr_dec_remove+0xb0/0xb0 [nfs] [ 84.986949] [<ffffffffa018bd6f>] call_decode+0x1cf/0x870 [sunrpc] [ 84.986949] [<ffffffff8107a540>] ? autoremove_wake_function+0x50/0x50 [ 84.986949] [<ffffffffa0196900>] __rpc_execute+0x80/0x3c0 [sunrpc] [ 84.986949] [<ffffffffa018bba0>] ? call_bc_transmit+0x140/0x140 [sunrpc] [ 84.986949] [<ffffffffa018bba0>] ? call_bc_transmit+0x140/0x140 [sunrpc] [ 84.986949] [<ffffffffa01972a8>] rpc_execute+0x68/0xd0 [sunrpc] [ 84.986949] [<ffffffffa018de15>] rpc_run_task+0x75/0x90 [sunrpc] [ 84.986949] [<ffffffffa018df33>] rpc_call_sync+0x43/0x70 [sunrpc] [ 84.986949] [<ffffffffa073e1e0>] _nfs4_call_sync+0x30/0x40 [nfs] [ 84.986949] [<ffffffffa073e6fc>] _nfs4_lookup_root+0xac/0xc0 [nfs] [ 84.986949] [<ffffffffa07413be>] nfs4_lookup_root+0x4e/0x90 [nfs] [ 84.986949] [<ffffffffa0741435>] nfs4_proc_get_root+0x35/0xb0 [nfs] [ 84.986949] [<ffffffffa07264ac>] nfs4_get_rootfh+0x4c/0x140 [nfs] [ 84.986949] [<ffffffffa07277a4>] ? nfs_alloc_fattr+0x24/0x80 [nfs] [ 84.986949] [<ffffffffa071fbe5>] nfs4_server_common_setup+0x95/0x1c0 [nfs] [ 84.986949] [<ffffffffa0720c78>] nfs4_create_server+0x198/0x370 [nfs] [ 84.986949] [<ffffffffa072b460>] nfs4_remote_mount+0x60/0x2e0 [nfs] [ 84.986949] [<ffffffff81187493>] mount_fs+0x43/0x1b0 [ 84.986949] [<ffffffff81144150>] ? __alloc_percpu+0x10/0x20 [ 84.986949] [<ffffffff811a06da>] vfs_kern_mount+0x6a/0xf0 [ 84.986949] [<ffffffffa072b9b5>] nfs_do_root_mount+0x95/0xe0 [nfs] [ 84.986949] [<ffffffffa072c0b6>] nfs4_try_mount+0x46/0xc0 [nfs] [ 84.986949] [<ffffffffa072bdd3>] ? nfs4_validate_text_mount_data+0xc3/0xe0 [nfs] [ 84.986949] [<ffffffffa072d047>] nfs_fs_mount+0x677/0x990 [nfs] [ 84.986949] [<ffffffff81187493>] mount_fs+0x43/0x1b0 [ 84.986949] [<ffffffff81144150>] ? __alloc_percpu+0x10/0x20 [ 84.986949] [<ffffffff811a06da>] vfs_kern_mount+0x6a/0xf0 [ 84.986949] [<ffffffff811a0eb4>] do_kern_mount+0x54/0x110 [ 84.986949] [<ffffffff811a2634>] do_mount+0x1a4/0x830 [ 84.986949] [<ffffffff811a230a>] ? copy_mount_options+0x3a/0x170 [ 84.986949] [<ffffffff811a2e00>] sys_mount+0x90/0xe0 [ 84.986949] [<ffffffff81605be9>] system_call_fastpath+0x16/0x1b [ 84.986949] Code: ff 0f 1f 80 00 00 00 00 66 c7 07 00 00 83 ea 02 48 83 c7 02 e9 5b fd ff ff 0f 1f 80 00 00 00 00 41 bf f4 ff ff ff e9 0a fe ff ff <0f> 0b 0f 1f 00 55 48 89 e5 48 83 ec 60 48 89 5d d8 4c 89 65 e0 [ 84.986949] RIP [<ffffffffa07560fb>] nfs_idmap_legacy_upcall+0x34b/0x350 [nfs] [ 84.986949] RSP <ffff880106967398> [ 85.016078] ---[ end trace 60f549389895d548 ]---
seen with kernel-3.5.2-3.fc17.x86_64 too
Enrico - Is this really happening on a F17 box? I thought this was purely a F16 issue with a mixing of old idmapper code and newer kernels, but that since F17 runs the new idmapper it wouldn't be affected.
I tried to use the old idmapper because of bug #829362 (see comment #15). There was seen two issues: [1066219.136739] BUG: unable to handle kernel paging request at 000003e1000003f9 [1066219.137773] IP: [<ffffffff8126a3e9>] __key_instantiate_and_link+0x59/0x100 [1066219.138827] PGD 0 [1066219.139871] Oops: 0000 [#1] SMP [1066219.140900] CPU 0 [1066219.140908] Modules linked in: bnep bluetooth rfkill vfat fat ftdi_sio cp210x nls_utf8 fuse cts des_generic rpcsec_gss_krb5 nfs fscache ip6table_filter ip6_tabl ysdev xt_pkttype xt_LOG xt_limit xt_state iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack sha256_generic dm_crypt snd_hda_codec_hdmi snd_hda_codec_c p snd_pcm snd_page_alloc snd_timer parport_serial snd microcode soundcore ppdev parport_pc parport mei lpc_ich mfd_core coretemp vhost_net tun macvtap macvlan kvm_in t binfmt_misc pata_acpi raid1 ghash_clmulni_intel crc32c_intel e1000e ata_generic usb_storage i915 video i2c_algo_bit drm_kms_helper drm i2c_core sunrpc be2iscsi bnx scsi_tcp mdio qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: scsi_wait_scan] [1066219.148829] [1066219.150006] Pid: 2198, comm: rpc.idmapd Not tainted 3.5.0-2.fc17.x86_64 #1 FUJITSU ESPRIMO P900/D3062-A1 [1066219.151259] RIP: 0010:[<ffffffff8126a3e9>] [<ffffffff8126a3e9>] __key_instantiate_and_link+0x59/0x100 [1066219.152522] RSP: 0018:ffff8803284d1d38 EFLAGS: 00010246 [1066219.153739] RAX: 000003e1000003e1 RBX: 0000000000000000 RCX: ffff880325aa8e40 [1066219.154968] RDX: 0000000000000004 RSI: ffff8803284d1de9 RDI: ffff8802c7d82450 [1066219.156195] RBP: ffff8803284d1d78 R08: 0000000000000000 R09: ffff8803284d1da0 [1066219.157415] R10: 0000000000000000 R11: 000000000000000f R12: ffff880325aa8e40 [1066219.158649] R13: ffff8802c7d82450 R14: ffff8803284d1de9 R15: 0000000000000004 [1066219.159886] FS: 00007f551cb26740(0000) GS:ffff88033e200000(0000) knlGS:0000000000000000 [1066219.161151] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [1066219.162402] CR2: 000003e1000003f9 CR3: 0000000329d1f000 CR4: 00000000000427e0 [1066219.163665] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [1066219.164924] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [1066219.166177] Process rpc.idmapd (pid: 2198, threadinfo ffff8803284d0000, task ffff880321764530) [1066219.167439] Stack: [1066219.168694] ffff8802c7d82820 ffff8803284d1da0 ffff88032b1cae00 ffff880325aa8e40 [1066219.169978] ffff8802c7d82450 ffff8803284d1de9 0000000000000004 0000000000000000 [1066219.171271] ffff8803284d1dc8 ffffffff8126a4f6 ffff8803284d1dd8 0000000000000000 [1066219.172579] Call Trace: [1066219.173889] [<ffffffff8126a4f6>] key_instantiate_and_link+0x66/0xa0 [1066219.175213] [<ffffffffa058566c>] idmap_pipe_downcall+0x1dc/0x1f0 [nfs] [1066219.176521] [<ffffffffa012be79>] rpc_pipe_write+0x69/0x90 [sunrpc] [1066219.177805] [<ffffffff8118509c>] vfs_write+0xac/0x180 [1066219.179090] [<ffffffff811853ca>] sys_write+0x4a/0x90 [1066219.180337] [<ffffffff8160fc29>] system_call_fastpath+0x16/0x1b [1066219.181567] Code: 49 89 d7 49 89 cc 4c 89 c3 4c 89 4d c8 e8 20 b1 39 00 49 8b 45 78 a8 01 0f 85 7c 00 00 00 49 8b 45 20 4c 89 fa 4c 89 f6 4c 89 ef <ff> 50 18 85 c0 41 89 c7 75 6d 49 8b 55 48 f0 ff 42 44 49 8d 45 [1066219.184255] RIP [<ffffffff8126a3e9>] __key_instantiate_and_link+0x59/0x100 [1066219.185579] RSP <ffff8803284d1d38> [1066219.186891] CR2: 000003e1000003f9 [1066219.294782] ---[ end trace 47fdbf62ff2b46ad ]--- -------- [ 31.238200] ------------[ cut here ]------------ [ 31.239533] kernel BUG at fs/nfs/idmap.c:684! [ 31.240844] invalid opcode: 0000 [#1] SMP [ 31.242143] CPU 2 [ 31.242150] Modules linked in: des_generic rpcsec_gss_krb5 nfs fscache ip6table_filter ip6_tables ebtable_nat ebtables bridge stp llc xt_physdev xt_pkttype xt_LOG ntrack_ipv4 nf_defrag_ipv4 nf_conntrack sha256_generic dm_crypt ftdi_sio cp210x snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_intel snd_hda_codec snd_hwdep snd_p port_serial soundcore ppdev microcode parport_pc parport lpc_ich mei mfd_core vhost_net tun macvtap macvlan kvm_intel kvm binfmt_misc uinput nfsd nfs_acl auth_rpcgss l e1000e ata_generic pata_acpi usb_storage i915 video i2c_algo_bit drm_kms_helper drm i2c_core sunrpc be2iscsi bnx2i cnic uio cxgb4i cxgb4 cxgb3 mdio libcxgbi libisc i_transport_iscsi [last unloaded: scsi_wait_scan] [ 31.246186] [ 31.246187] Pid: 2229, comm: mount.nfs4 Not tainted 3.5.2-3.fc17.x86_64 #1 FUJITSU ESPRIMO P900/D3062-A1 [ 31.246189] RIP: 0010:[<ffffffffa05c4e63>] [<ffffffffa05c4e63>] nfs_idmap_legacy_upcall+0x343/0x350 [nfs] [ 31.246202] RSP: 0018:ffff8803068c1308 EFLAGS: 00010282 [ 31.246202] RAX: 0000000000000016 RBX: ffff8803215aa980 RCX: 0000000000000000 [ 31.246203] RDX: 0000000000000006 RSI: ffff8803066442fa RDI: ffff8803299b2558 [ 31.246203] RBP: ffff8803068c1348 R08: 0000000000000000 R09: ffff8803066442e0 [ 31.246204] R10: 00000000bbca8cdd R11: 00000000bb779519 R12: ffff8803214adc80 [ 31.246204] R13: ffff8803222d56c0 R14: ffff8803299b2540 R15: 0000000000000016 [ 31.246205] FS: 00007fae93e07840(0000) GS:ffff88033e300000(0000) knlGS:0000000000000000 [ 31.246206] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 31.246206] CR2: 00007f575267f050 CR3: 0000000306de7000 CR4: 00000000000407e0 [ 31.246207] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 31.246207] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 31.246208] Process mount.nfs4 (pid: 2229, threadinfo ffff8803068c0000, task ffff880328165c40) [ 31.246209] Stack: [ 31.246209] ffff8803066442e4 ffff8803066442fa ffff8803068c1348 ffff8803299b29c0 [ 31.246211] ffff8803299b23c0 ffffffffa05d4e97 ffff8803214adc80 ffff8803281d2a80 [ 31.246212] ffff8803068c13e8 ffffffff812732c4 0000000000000000 ffffffff81c5fd20 [ 31.246213] Call Trace: [ 31.246215] [<ffffffff812732c4>] request_key_and_link+0x2e4/0x450 [ 31.246218] [<ffffffff8127349e>] request_key_with_auxdata+0x1e/0x60 [ 31.246219] [<ffffffffa05c4feb>] nfs_idmap_request_key+0xcb/0x1b0 [nfs] [ 31.246227] [<ffffffffa05c57ab>] nfs_idmap_lookup_id+0xeb/0x120 [nfs] [ 31.246232] [<ffffffffa05c4eb5>] ? nfs_map_string_to_numeric+0x45/0xb0 [nfs] [ 31.246238] [<ffffffffa05c5cac>] nfs_map_group_to_gid+0x5c/0x80 [nfs] [ 31.246243] [<ffffffffa05bc028>] decode_getfattr_attrs+0xf48/0x1230 [nfs] [ 31.246249] [<ffffffffa05be098>] decode_getfattr_generic.constprop.97+0x98/0x110 [nfs] [ 31.246254] [<ffffffffa05be510>] ? nfs4_xdr_dec_link+0xe0/0xe0 [nfs] [ 31.246259] [<ffffffffa05be1b3>] decode_getfattr+0x13/0x20 [nfs] [ 31.246264] [<ffffffffa05be58c>] nfs4_xdr_dec_lookup_root+0x7c/0x80 [nfs] [ 31.246269] [<ffffffffa0296a27>] gss_unwrap_resp+0xb7/0x2c0 [auth_rpcgss] [ 31.246272] [<ffffffffa05be510>] ? nfs4_xdr_dec_link+0xe0/0xe0 [nfs] [ 31.246277] [<ffffffffa011d42c>] rpcauth_unwrap_resp+0x5c/0xd0 [sunrpc] [ 31.246284] [<ffffffffa05be510>] ? nfs4_xdr_dec_link+0xe0/0xe0 [nfs] [ 31.246289] [<ffffffffa05be510>] ? nfs4_xdr_dec_link+0xe0/0xe0 [nfs] [ 31.246294] [<ffffffffa0110c5f>] call_decode+0x1bf/0x840 [sunrpc] [ 31.246299] [<ffffffffa0110aa0>] ? call_bc_transmit+0x140/0x140 [sunrpc] [ 31.246304] [<ffffffffa0110aa0>] ? call_bc_transmit+0x140/0x140 [sunrpc] [ 31.246308] [<ffffffffa011baec>] __rpc_execute+0x7c/0x390 [sunrpc] [ 31.246314] [<ffffffffa011c1a9>] rpc_execute+0x59/0xa0 [sunrpc] [ 31.246319] [<ffffffffa0112d50>] rpc_run_task+0x70/0x90 [sunrpc] [ 31.246324] [<ffffffffa0112e73>] rpc_call_sync+0x43/0x70 [sunrpc] [ 31.246328] [<ffffffffa05ab0a0>] _nfs4_call_sync+0x30/0x40 [nfs] [ 31.246334] [<ffffffffa05ac339>] _nfs4_lookup_root.isra.39+0xa9/0xc0 [nfs] [ 31.246339] [<ffffffffa05af5ef>] nfs4_lookup_root+0x4f/0x90 [nfs] [ 31.246344] [<ffffffffa0595524>] ? nfs_alloc_fattr+0x24/0x80 [nfs] [ 31.246349] [<ffffffffa05b4345>] nfs4_proc_get_rootfh+0x35/0xa0 [nfs] [ 31.246354] [<ffffffffa05945c0>] nfs4_get_rootfh+0x40/0x130 [nfs] [ 31.246358] [<ffffffffa0595524>] ? nfs_alloc_fattr+0x24/0x80 [nfs] [ 31.246363] [<ffffffffa0595524>] ? nfs_alloc_fattr+0x24/0x80 [nfs] [ 31.246367] [<ffffffffa058d511>] nfs4_server_common_setup+0x81/0x1a0 [nfs] [ 31.246371] [<ffffffffa058c215>] ? nfs_init_server_rpcclient+0x35/0x130 [nfs] [ 31.246374] [<ffffffffa058ec82>] nfs4_create_server+0x1a2/0x390 [nfs] [ 31.246378] [<ffffffffa05978b5>] nfs4_remote_mount+0x45/0x80 [nfs] [ 31.246383] [<ffffffff8118b813>] mount_fs+0x43/0x1b0 [ 31.246385] [<ffffffff81147070>] ? __alloc_percpu+0x10/0x20 [ 31.246387] [<ffffffff811a454f>] vfs_kern_mount+0x6f/0x100 [ 31.246388] [<ffffffffa0597a51>] nfs_do_root_mount+0x91/0xe0 [nfs] [ 31.246393] [<ffffffffa059a674>] nfs_fs_mount+0x4e4/0xe00 [nfs] [ 31.246397] [<ffffffffa059a070>] ? nfs_remount+0x3a0/0x3a0 [nfs] [ 31.246401] [<ffffffffa0597320>] ? nfs_get_option_ul+0x60/0x60 [nfs] [ 31.246405] [<ffffffff8118b813>] mount_fs+0x43/0x1b0 [ 31.246406] [<ffffffff81147070>] ? __alloc_percpu+0x10/0x20 [ 31.246408] [<ffffffff811a454f>] vfs_kern_mount+0x6f/0x100 [ 31.246409] [<ffffffff811a4f04>] do_kern_mount+0x54/0x110 [ 31.246410] [<ffffffff811a682a>] do_mount+0x26a/0x880 [ 31.246411] [<ffffffff811a642a>] ? copy_mount_options+0x3a/0x180 [ 31.246412] [<ffffffff811a6f8d>] sys_mount+0x8d/0xe0 [ 31.246414] [<ffffffff81614969>] system_call_fastpath+0x16/0x1b [ 31.246417] Code: 2f e9 ba fd ff ff 66 90 66 c7 07 00 00 83 ea 02 48 83 c7 02 e9 67 fd ff ff 0f 1f 80 00 00 00 00 41 bf f4 ff ff ff e9 19 fe ff ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec 40 [ 31.246432] RIP [<ffffffffa05c4e63>] nfs_idmap_legacy_upcall+0x343/0x350 [nfs] [ 31.246439] RSP <ffff8803068c1308> [ 31.330584] ---[ end trace 789538ce7904602d ]---
Is there any progress on this? If a new idmapper on F17 fixes this, why not backport it to F16? The current situation is terrible and ongoing for months and makes some users wonder about the "bad distro engineering" on this one, as it is so severe and affects everybody so fundamentally. Currently we can choose from kernels with security issues or with broken client NFS :-/
Agree that it's absurd how long it has taken to fix this, with no finger pointing intended, but if you really don't need the nfs4 functionality you can fix this for your clients by reconfiguring them to use nfsvers=3. Ignore if otherwise :)
Unfortunately, my patch wasn't sufficient. There have been three more patches to the idmapper upstream by NetApp: commit 12dfd080556124088ed61a292184947711b46cbe commit c5066945b7ea346a11424dbeb7830b7d7d00c206 commit fac1e8e4ef417e958060a6c3a061cc1a180bd8ae and it still may not be completely fixed.
# Mass update to all open bugs. Kernel 3.6.2-1.fc16 has just been pushed to updates. This update is a significant rebase from the previous version. Please retest with this kernel, and let us know if your problem has been fixed. In the event that you have upgraded to a newer release and the bug you reported is still present, please change the version field to the newest release you have encountered the issue with. Before doing so, please ensure you are testing the latest kernel update in that release and attach any new and relevant information you may have gathered. If you are not the original bug reporter and you still experience this bug, please file a new report, as it is possible that you may be seeing a different problem. (Please don't clone this bug, a fresh bug referencing this bug in the comment is sufficient).
With no response, we are closing this bug under the assumption that it is no longer an issue. If you still experience this bug, please feel free to reopen the bug report.
We are still seeing bugs with idmapping with nfs4 client with sec=sys with the most recent kernels on both Fedora 16 and 17, i.e., 3.6.5-2.fc16.x86_64 and 3.6.6-1.fc17.x86_64. On both, when we do a nfs4 sec=sys mount from an up-to-date Red Hat 6 system, we see sporadic failure of idmapping, with some seemingly random uids/gids showing up as 4294967294. Rebooting the clients and doing an 'ls' of a subset of the entries shows a slightly different set of users and groups mapped to 4294967294, but the overwhelming majority (490 out of 646 in our case) are showing the 4294967294.