Bug 747496 - cthon test hang on nfs

Summary:          cthon test hang on nfs
Product:          Red Hat Enterprise Linux 6
Component:        kernel
Version:          6.2
Status:           CLOSED WONTFIX
Reporter:         Jian Li <jiali>
Assignee:         J. Bruce Fields <bfields>
QA Contact:       Red Hat Kernel QE team <kernel-qe>
Severity:         unspecified
Priority:         medium
CC:               bfields, dhowells, dpal, jburke, jiali, nmurray, pbunyan, rwheeler, sprabhu, steved, swhiteho, torel, yanwang
Target Milestone: rc
Target Release:   ---
Keywords:         Reopened
Hardware:         Unspecified
OS:               Unspecified
Last Closed:      2017-12-06 12:38:47 UTC
Description
Jian Li
2011-10-20 03:37:48 UTC
Created attachment 529178 [details]
call trace
(In reply to comment #0)
> Description of problem:
> The test is designed to check that nfs exportfs options (such as sync,
> no_wdelay, sec=*) and mount.nfs options (a|sync, tcp|udp, r|wsize=*)
> cooperate well. cthon is run over every combination of these options.
>
> When the server uses these options (path=/mnt/exportdir):
>
>     $path *(rw,sync,no_wdelay,insecure,fsid=0x111111,no_root_squash)
>
> And the client runs this test (the original iterated "$iosize" where
> "$iosizes" was meant, and single-quoted the option string, which would
> prevent expansion; corrected here):
>
>     iosizes="1024 4096 65536 1048576"
>     synctypes="sync async"
>     for size in $iosizes ; do
>         for syn in $synctypes ; do
>             option="nfsvers=4,udp,rsize=$size,wsize=$size,$syn"

This is not valid: the spec mandates that v4 use TCP. Please change this to tcp and rerun the tests.

Since RHEL 6.2 External Beta has begun and this bug remains unresolved, it has been rejected, as it is not proposed as an exception or blocker. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.

OK, thanks for the tips. Now I hit the problem again when using sec=krb5p on s390x. Details:

    [root@ibm-z10-03 basic]# uname -a
    Linux ibm-z10-03.rhts.eng.bos.redhat.com 2.6.32-214.el6.s390x #1 SMP Tue Oct 25 20:00:08 EDT 2011 s390x s390x s390x GNU/Linux

SERVER:

    /mnt/exportdir *(sec=krb5p,rw,sync,no_wdelay,secure,fsid=0x111111,no_root_squash)

CLIENT:

    # cat /proc/mounts | grep nfs
    ibm-z10-04.rhts.eng.bos.redhat.com:/mnt/exportdir /mnt/ibm-z10-04.rhts.eng.bos.redhat.com nfs rw,sync,relatime,vers=2,rsize=1024,wsize=1024,namlen=255,hard,proto=udp,timeo=11,retrans=3,sec=krb5p,mountaddr=10.16.66.195,mountvers=1,mountport=38260,mountproto=udp,local_lock=none,addr=10.16.66.195 0 0

When running cthon's basic test (test5), I always get EIO.
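The reporter's loop is meant to generate an eight-entry option matrix (four I/O sizes times two sync types). A quick sanity check of that expansion, sketched in Python rather than shell (the protocol is fixed to tcp here, per the follow-up comment that NFSv4 mandates TCP; this only enumerates option strings, it does not mount anything):

```python
# Enumerate the mount-option combinations the reporter's loop iterates over.
iosizes = [1024, 4096, 65536, 1048576]
synctypes = ["sync", "async"]

options = [
    f"nfsvers=4,tcp,rsize={size},wsize={size},{syn}"
    for size in iosizes
    for syn in synctypes
]
for opt in options:
    print(opt)
```

Each string in `options` would be passed as the `-o` argument of a mount before a cthon run.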
strace result:

    [root@ibm-z10-03 basic]# export NFSTESTDIR="/mnt/ibm-z10-04.rhts.eng.bos.redhat.com/test"
    [root@ibm-z10-03 basic]# ./test5 -t
    ./test5: read and write /mnt/ibm-z10-04.rhts.eng.bos.redhat.com/test, bigfile
    ./test5: (/mnt/ibm-z10-04.rhts.eng.bos.redhat.com/test) 'bigfile' write failed : Input/output error
    [root@ibm-z10-03 basic]# strace ./test5 -t
    execve("./test5", ["./test5", "-t"], [/* 36 vars */]) = 11
    brk(0) = 0xac62a000
    ** snip **
    rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
    --- SIGCHLD (Child exited) @ 0 (0) ---
    mkdir("/mnt/ibm-z10-04.rhts.eng.bos.redhat.com/test", 0777) = 0
    chdir("/mnt/ibm-z10-04.rhts.eng.bos.redhat.com/test") = 0
    getcwd("/mnt/ibm-z10-04.rhts.eng.bos.redhat.com/test", 1024) = 45
    write(1, "/mnt/ibm-z10-04.rhts.eng.bos.red"..., 54) = 54
    open("bigfile", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
    stat("bigfile", {st_mode=S_IFREG|0666, st_size=0, ...}) = 0
    write(3, "\0\0\0\0\0\0\0\1\0\0\0\2\0\0\0\3\0\0\0\4\0\0\0\5\0\0\0\6\0\0\0\7"..., 8192) = 8192
    write(3, "\0\0\0\0\0\0\0\1\0\0\0\2\0\0\0\3\0\0\0\4\0\0\0\5\0\0\0\6\0\0\0\7"..., 8192) = 8192
    ** snip **
    write(3, "\0\0\0\0\0\0\0\1\0\0\0\2\0\0\0\3\0\0\0\4\0\0\0\5\0\0\0\6\0\0\0\7"..., 8192) = 8192
    write(3, "\0\0\0\0\0\0\0\1\0\0\0\2\0\0\0\3\0\0\0\4\0\0\0\5\0\0\0\6\0\0\0\7"..., 8192) = -1 EIO (Input/output error)
    getcwd("/mnt/ibm-z10-04.rhts.eng.bos.redhat.com/test", 4096) = 45
    write(2, "\t./test5: (/mnt/ibm-z10-04.rhts."..., 57) = 57
    write(2, "'bigfile' write failed", 22) = 22
    write(2, " : Input/output error\n", 22) = 22
    exit_group(1) = ?

And EIO appears with every combination of nfsv2/3/4 and tcp/udp.
There is EIO again when running cthon in this configuration (krb5i, udp, nfsv3, x86):

SERVER:

    /mnt/exportdir *(sec=krb5i,rw,sync,no_wdelay,secure,fsid=0x111111,no_root_squash)

CLIENT:

    # mount | grep nfs
    hp-z220-03.lab.bos.redhat.com:/mnt/exportdir on /mnt/hp-z220-03.lab.bos.redhat.com type nfs (rw,sync,sec=krb5i,resvport,nfsvers=3,udp,rsize=4096,wsize=4096,addr=10.16.42.225)

TEST: bigfile

    [root@hp-xw4550-01 hp-z220-03.lab.bos.redhat.com]# strace ./bigfile test
    execve("./bigfile", ["./bigfile", "test"], [/* 37 vars */]) = 0
    brk(0) = 0x7c5000
    ** snip **
    munmap(0x7fe6b7af3000, 45194) = 0
    open("test", O_RDWR|O_CREAT|O_TRUNC, 0666) = 3
    brk(0) = 0x7c5000
    brk(0x7e8000) = 0x7e8000
    write(3, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 8192) = 8192
    ** snip **
    write(3, "jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj"..., 8192) = 8192
    write(3, "kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk"..., 8192) = -1 EIO (Input/output error)
    write(2, "Error: ", 7) = 7
    write(2, "write to test failed: Input/outp"..., 41) = 41
    exit_group(1) = ?

When using tcp, there is no EIO.

If this is only reproducible over udp, then I think we should close the bug as WONTFIX. There are known bugs with rpcsec_gss over udp. Perhaps we should find some way to fail (or at least warn about) that combination at mount time.

(In reply to comment #8)
> If this is only reproducible over udp, then I think we should close the bug as
> WONTFIX. There are known bugs with rpcsec_gss over udp. Perhaps we should
> find some way to fail (or at least warn about) that combination at mount time.

I agree...
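The suggestion above of failing (or warning) at mount time when rpcsec_gss is combined with UDP amounts to a simple option check. A minimal illustration of the idea, not the actual mount.nfs or kernel code; the function name and return convention are invented for this sketch:

```python
def gss_over_udp(mount_options: str) -> bool:
    """Return True when a mount-option string combines a krb5 security
    flavor with UDP, the combination known to be unreliable for
    rpcsec_gss. (Illustrative only; real option parsing lives in
    mount.nfs and the kernel.)"""
    opts = {}
    for item in mount_options.split(","):
        key, _, value = item.partition("=")
        opts[key] = value
    uses_gss = opts.get("sec", "sys").startswith("krb5")
    # "udp" may appear as a bare flag or as proto=udp
    uses_udp = "udp" in opts or opts.get("proto") == "udp"
    return uses_gss and uses_udp
```

With this check, the mount paths seen in this bug (e.g. `nfsvers=3,udp,sec=krb5i`) would be flagged, while `proto=tcp` mounts would pass.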
Hi, the problem also remains when using tcp, but the error cannot be reproduced 100% of the time (maybe 90%).

    #strace ./bigfile testfile
    ** snip **
    read(3, "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"..., 8192) = 8192
    lseek(3, 31326208, SEEK_SET) = 31326208
    read(3, "cccccccccccccccccccccccccccccccc"..., 8192) = 8192
    lseek(3, 31334400, SEEK_SET) = 31334400
    read(3, "dddddddddddddddddddddddddddddddd"..., 8192) = 4096
    write(2, "short read (4096) to testfile\n", 30) = 30
    exit_group(1) = ?

    [root@ibm-z10-03 test]# uname -a
    Linux ibm-z10-03.rhts.eng.bos.redhat.com 2.6.32-214.el6.s390x #1 SMP Tue Oct 25 20:00:08 EDT 2011 s390x s390x s390x GNU/Linux
    [root@ibm-z10-03 test]# cat /proc/mounts | grep nfs
    sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0
    ibm-z10-04.rhts.eng.bos.redhat.com:/mnt/exportdir/ /mnt/ibm-z10-04.rhts.eng.bos.redhat.com nfs4 rw,relatime,vers=4,rsize=4096,wsize=4096,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=krb5p,clientaddr=10.16.66.194,minorversion=0,local_lock=none,addr=10.16.66.195 0 0

SERVER:

    [root@ibm-z10-04 ~]# cat /etc/exports
    /mnt/exportdir *(sec=krb5p,rw,sync,no_wdelay,secure,fsid=0x111111,no_root_squash)

And a confusing message appears on the server (10.16.66.194 is the client's IP):

    svc: 10.16.66.194, port=685: failed to decode args
    svc: 10.16.66.194, port=685: failed to decode args

While testing, this error was also hit:

    #strace ./bigfile testfile
    ** snip **
    lseek(3, 31440896, SEEK_SET) = 31440896
    read(3, "qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq"..., 8192) = 8192
    lseek(3, 31449088, SEEK_SET) = 31449088
    read(3, "rrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr"..., 8192) = 8192
    ftruncate(3, 0) = 0
    ftruncate(3, 31457280) = 0
    mmap(NULL, 31457280, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0x3fffb62f000
    msync(0x3fffb62f000, 31457280, MS_SYNC|MS_INVALIDATE) = -1 EIO (Input/output error)
    write(2, "Error: ", 7) = 7
    write(2, "can't msync testfile: Input/outp"..., 41) = 41
    exit_group(1) = ?
    #strace ./bigfile testfile
    ** snip **
    read(3, "pppppppppppppppppppppppppppppppp"..., 8192) = 8192
    lseek(3, 31440896, SEEK_SET) = 31440896
    read(3, "qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq"..., 8192) = 8192
    lseek(3, 31449088, SEEK_SET) = 31449088
    read(3, "rrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr"..., 8192) = 8192
    ftruncate(3, 0) = 0
    ftruncate(3, 31457280) = 0
    mmap(NULL, 31457280, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0x3fffb683000
    --- SIGBUS (Bus error) @ 0 (0) ---
    +++ killed by SIGBUS (core dumped) +++
    Bus error (core dumped)

(In reply to comment #9)
> (In reply to comment #8)
> > If this is only reproducible over udp, then I think we should close the bug as
> > WONTFIX. There are known bugs with rpcsec_gss over udp. Perhaps we should
> > find some way to fail (or at least warn about) that combination at mount time.
>
> I agree...

"problem also remains when using tcp"

OK, it's likely a real krb5p bug in that case. Next might be to see if we can locate more precisely where those errors are originating. Wireshark won't help, thanks to the encryption. If you turn on rpc debugging on client and server, can you still reproduce the problem, and what do the last lines of output before the failure look like?

Hi, I reproduced the case on one machine (acting as both SERVER and CLIENT), and fetched debug info with the rpcdebug command:

    rpcdebug -m rpc -s all
    for ((;;)) ; do
        dmesg -c > /dev/null
        strace ./bigfile testfile
        if [ $? -ne 0 ] ; then
            dmesg -c > ~/rpcdebug_rpc.txt
            break
        fi
    done
    rpcdebug -m rpc -c all

    rpcdebug -m nfs -s all
    for ((;;)) ; do **

(truncated in the original)

Created attachment 531920 [details]
rpcdebug -m rpc
Created attachment 531921 [details]
rpcdebug -m nfs
Created attachment 531922 [details]
rpcdebug -m nfsd
And please note these messages:
> svc: 10.16.66.194, port=685: failed to decode args (10.16.66.194: client's ip)
> svc: 10.16.66.194, port=685: failed to decode args
Regarding "pls notice such message": I wonder why that message doesn't show up in any of those dmesg's? Neither does "server saw garbage", which the client should be dprintk'ing in the rpc code if it gets a garbage_args error back from the server.

Created attachment 532625 [details]
print gss_unwrap errors
Here's an attempt to get a little more information about gss_unwrap failures; no need to turn on rpc debugging this time, just run until you get the "failed to decode args" error and see if there's also one of these new errors before it.
By the way, do you have a previous kernel version where you know this error did *not* happen?

> I wonder why that message doesn't show up in any of those dmesg's?

I have run the test many times; those messages are only printed sometimes.

(In reply to comment #19)
> By the way, do you have a previous kernel version where you know this error did
> *not* happen?

I ran the test on an old kernel (I forget the exact NVR); the bug exists there too.

Testing the patch.

Hi Bruce: I tested your patch three times.

1st test:

    [root@ibm-z10-03 test]# dmesg -c > /dev/null
    [root@ibm-z10-03 test]# ./bigfile testfile
    [root@ibm-z10-03 test]# dmesg
    gss_unwrap returned d0000
    svc: 10.16.66.194, port=779: failed to decode args
    gss_unwrap returned d0000
    svc: 10.16.66.194, port=779: failed to decode args
    gss_unwrap returned d0000
    ** repeat **
    gss_unwrap returned d0000

2nd test:

    [root@ibm-z10-03 test]# dmesg -c > /dev/null
    [root@ibm-z10-03 test]# ./bigfile testfile
    Bus error (core dumped)
    [root@ibm-z10-03 test]# dmesg
    gss_unwrap returned d0000
    svc: 10.16.66.194, port=779: failed to decode args
    gss_unwrap returned d0000
    svc: 10.16.66.194, port=779: failed to decode args
    gss_unwrap returned d0000
    ** repeat **
    gss_unwrap returned d0000

3rd test:

    [root@ibm-z10-03 test]# dmesg -c > /dev/null
    [root@ibm-z10-03 test]# ./bigfile testfile
    short read (4096) to testfile
    [root@ibm-z10-03 test]# dmesg -c
    gss_unwrap returned d0000
    svc: 10.16.66.194, port=779: failed to decode args
    gss_unwrap returned d0000
    svc: 10.16.66.194, port=779: failed to decode args
    ** repeat **
    gss_unwrap returned d0000
    gss_unwrap returned d0000

Looking at include/linux/sunrpc/gss_err.h, 0xd0000 is GSS_S_FAILURE. So either crypto_alloc_blkcipher() or make_checksum() (which also does some allocation) failed. So maybe this is an allocation failure. I can make another patch to confirm that (I might not get to it today).

We really shouldn't have allocations in that code at all; I thought we'd gotten rid of them... so I'll need to work on fixing that regardless.

Created attachment 533851 [details]
print more gss_unwrap errors
Sorry, those aren't the only two places where it could fail....
Here are some more printk's to try.
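The `d0000` value in these logs is a GSS-API major status word. Its layout is fixed by the GSS-API specification and mirrored in include/linux/sunrpc/gss_err.h: calling errors in bits 24-31, routine errors in bits 16-23, supplementary info in bits 0-15. The codes seen in this bug can therefore be decoded mechanically, as this sketch shows (only the routine errors relevant here are listed):

```python
# Routine-error codes from the GSS-API spec / include/linux/sunrpc/gss_err.h
# (subset relevant to this bug).
ROUTINE_ERRORS = {
    6: "GSS_S_BAD_SIG",       # checksum/MIC verification failed
    9: "GSS_S_DEFECTIVE_TOKEN",
    13: "GSS_S_FAILURE",      # generic failure, e.g. an allocation error
}

def decode_gss_major(status: int) -> str:
    """Split a GSS major-status word into its three fields and name
    the routine error if it is one we know about."""
    calling = (status >> 24) & 0xFF
    routine = (status >> 16) & 0xFF
    supplementary = status & 0xFFFF
    name = ROUTINE_ERRORS.get(routine, f"routine error {routine}")
    return f"{name} (calling={calling}, supplementary={supplementary:#x})"
```

Decoding `0xd0000` gives GSS_S_FAILURE, matching the analysis above; the decimal `393216` that appears later in the thread is `0x60000`, i.e. GSS_S_BAD_SIG.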
Ok, testing.

(In reply to comment #23)
> Created attachment 533851 [details]

    [root@ibm-z10-03 test]# ./bigfile testfile
    short read (4096) to testfile
    [root@ibm-z10-03 test]# dmesg -c
    gss_unwrap_kerberos_v2: error 393216 from decrypt_v2
    gss_unwrap returned d0000
    svc: 10.16.66.194, port=753: failed to decode args
    gss_unwrap_kerberos_v2: error 393216 from decrypt_v2
    gss_unwrap returned d0000
    svc: 10.16.66.194, port=753: failed to decode args
    gss_unwrap_kerberos_v2: error 393216 from decrypt_v2
    gss_unwrap returned d0000
    ** repeat **

    [root@ibm-z10-03 test]# ./bigfile testfile
    Error: read from testfile failed: Input/output error
    [root@ibm-z10-03 test]# dmesg -c
    gss_unwrap_kerberos_v2: error 393216 from decrypt_v2
    gss_unwrap returned d0000
    svc: 10.16.66.194, port=753: failed to decode args
    gss_unwrap_kerberos_v2: error 393216 from decrypt_v2
    gss_unwrap returned d0000
    svc: 10.16.66.194, port=753: failed to decode args
    gss_unwrap_kerberos_v2: error 393216 from decrypt_v2
    gss_unwrap returned d0000
    ** repeat **

(In reply to comment #23)
> Created attachment 533851 [details]
> print more gss_unwrap errors
>
> Sorry, those aren't the only two places where it could fail....
>
> Here are some more printk's to try.

OK, so I was wrong; allocation failures aren't the problem. It's gss_krb5_aes_decrypt returning GSS_S_BAD_SIG, suggesting the memcmp() at the end of gss_krb5_aes_decrypt is failing.
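GSS_S_BAD_SIG here means the decrypt itself ran, but the integrity check at the end did not match. Conceptually the failing step looks like this toy model (a simplified sketch using HMAC-SHA1; the kernel's krb5 code derives keys and checksums quite differently, so the names and layout here are illustrative only):

```python
import hmac
import hashlib

def wrap(key: bytes, payload: bytes) -> bytes:
    """Toy sender side: append a MAC over the payload."""
    return payload + hmac.new(key, payload, hashlib.sha1).digest()

def unwrap(key: bytes, token: bytes) -> bytes:
    """Toy model of gss_krb5_aes_decrypt's final step: split off the
    trailing MAC, recompute it over the payload, and fail like
    GSS_S_BAD_SIG when the two differ."""
    mac_len = hashlib.sha1().digest_size
    payload, pkt_hmac = token[:-mac_len], token[-mac_len:]
    our_hmac = hmac.new(key, payload, hashlib.sha1).digest()
    # the kernel uses memcmp(); compare_digest avoids timing leaks
    if not hmac.compare_digest(pkt_hmac, our_hmac):
        raise ValueError("GSS_S_BAD_SIG: packet MAC != computed MAC")
    return payload
```

In this model, any single flipped bit in either the payload or the trailing MAC makes `unwrap` fail, which is why Bruce's next suggestion is to dump both `pkt_hmac` and `our_hmac` and the decrypted data itself.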
That could be a bug in the encoding or decoding, or perhaps the network is corrupting data somehow. Aie.

What to try next? Well, it did attempt to decrypt this, so perhaps looking at the result would tell us something. So, we could add more printk()'s, to:

1. Dump the decrypted data. We can take a look at it and see if it seems like a sensible xdr-encoded rpc request, or like random data, or if perhaps it starts well and then turns into random data at the end.

2. Dump the two MACs that we're memcmp'ing (pkt_hmac and our_hmac). Probably they'll both look random and unrelated, but maybe it's worth looking at just in case.

I wonder whether it would also be worth running a tcpdump, both on the client and on the server, and then comparing the two data streams to see if we can catch the network flipping a bit or something. Seems unlikely.

The client also knows when we hit this failure, since it gets an error back. So on the client side we should be able to dump the data that it fed into the encryption routines, and on the server dump what we decrypted, and see if the results always differ in some predictable way.

Also, I know you said you'd tried one older kernel version, but it might be worth experimenting a little more with kernel versions, to see if there are any where this works (in which case we can bisect to find the regression, if it's an older kernel, or the bugfix, if it's a newer one). It would be worth trying the latest upstream kernel, at least.

*** Bug 749655 has been marked as a duplicate of this bug. ***

Pasting my test results on 2.6.32-274.el6 of RHEL 6.3: the cthon 'Special' test failed on krb5p/krb5i mounts with UDP. Please check bug 822189, comment #26 through comment #29.

cthon fails on rhel5.9 and panics the machine. Failed command:

    cthon04 nfs -s:special nfsvers=2_udp_krb5p intel-s3e37-01.rhts.eng.rdu.redhat.com /mnt/testarea -onfsvers=2,udp,sec=krb5p

dmesg:

    kernel BUG at arch/i386/mm/highmem.c:64!
    invalid opcode: 0000 [#1] SMP
    last sysfs file: /devices/pci0000:00/0000:00:19.0/irq
    Modules linked in: md5 testmgr_cipher testmgr aead crypto_blkcipher crypto_algapi des rpcsec_gss_krb5 auth_rpcgss nfs nfs_acl autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc cpufreq_ondemand acpi_cpufreq mperf be2iscsi ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api uio cxgb3i libcxgbi cxgb3 8021q libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi ac lp snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_hwdep snd sr_mod cdrom e1000e tpm_tis parport_serial tpm tpm_bios parport_pc parport sg pcspkr soundcore i2c_i801 i2c_core dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod ahci libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
    CPU: 3
    EIP: 0060:[<c041dc92>] Not tainted VLI
    EFLAGS: 00010287 (2.6.18-333.el5 #1)
    EIP is at kunmap_atomic+0x38/0x66
    eax: fff63000 ebx: fff62000 ecx: 00000058 edx: 0009c000
    esi: 00000003 edi: f648bb65 ebp: 00000000 esp: f648bb3c
    ds: 007b es: 007b ss: 0068
    Process rewind (pid: 21220, ti=f648b000 task=dffac000 task.ti=f648b000)
    Stack: f648bbcc c04db8d2 f648bbcc f648bbb4 f648bb60 f648bb94 c04dbcaf 00000000
           c06a2300 01000000 c045f100 00000044 00000000 00020220 00000000 c16d3200
           ffffffff f648bc34 f648bb68 f6990000 00000010 f648bbb4 00000010 c04dbe6e
    Call Trace:
     [<c04db8d2>] scatterwalk_copychunks+0x49/0x7c
     [<c04dbcaf>] crypt_slow+0x45/0x85
     [<c045f100>] __alloc_pages+0x10/0x2cf
     [<c04dbe6e>] crypt+0x17f/0x1f1
     [<c04dc06d>] crypt_iv_unaligned+0x9b/0xa4
     [<c04db9be>] cbc_process_encrypt+0x0/0x85
     [<c04dc0f1>] cbc_encrypt_iv+0x3b/0x42
     [<f90c4c36>] des_encrypt+0x0/0x20a [des]
     [<c04db9be>] cbc_process_encrypt+0x0/0x85
     [<f90bb980>] encryptor+0xfe/0x165 [auth_rpcgss]
     [<f90bb73f>] process_xdr_buf+0x133/0x144 [auth_rpcgss]
     [<f90bbb8a>] gss_encrypt_xdr_buf+0x87/0x92 [auth_rpcgss]
     [<f90bb882>] encryptor+0x0/0x165 [auth_rpcgss]
     [<f9046d88>] gss_wrap_kerberos+0x2df/0x30b [rpcsec_gss_krb5]
     [<f90b97f1>] gss_wrap+0xd/0x10 [auth_rpcgss]
     [<f90b8f94>] gss_wrap_req+0x29d/0x36d [auth_rpcgss]
     [<f90b8cf7>] gss_wrap_req+0x0/0x36d [auth_rpcgss]
     [<f90f9f1b>] nfs_xdr_writeargs+0x0/0x78 [nfs]
     [<f906f90e>] rpcauth_wrap_req+0x4b/0x60 [sunrpc]
     [<f90f9f1b>] nfs_xdr_writeargs+0x0/0x78 [nfs]
     [<f906a35b>] call_transmit+0x171/0x1ca [sunrpc]
     [<f906f05c>] __rpc_execute+0x80/0x26a [sunrpc]
     [<c042ee54>] sigprocmask+0xb0/0xce
     [<f90fdf44>] nfs_execute_write+0x35/0x49 [nfs]
     [<f90ff309>] nfs_flush_one+0xb8/0x10d [nfs]
     [<f90ff251>] nfs_flush_one+0x0/0x10d [nfs]
     [<f90fdbca>] nfs_flush_list+0x6f/0x132 [nfs]
     [<f90fdcd2>] nfs_flush_inode+0x45/0x53 [nfs]
     [<f90ff571>] nfs_writepages+0x4c/0x88 [nfs]
     [<c045fb01>] do_writepages+0x20/0x32
     [<c045b5eb>] __filemap_fdatawrite_range+0x66/0x72
     [<c045b7e5>] filemap_fdatawrite+0x12/0x16
     [<f90f5da1>] nfs_file_flush+0x66/0x78 [nfs]
     [<c0476860>] filp_close+0x2f/0x54
     [<c0477ad8>] sys_close+0x71/0xa0
     [<c0404f4b>] syscall_call+0x7/0xb

We expect krb5/udp to fail sometimes, but we don't expect it to trigger a BUG like this. Is this reproducible?

It looks like we passed something inconsistent to kunmap_atomic(). Possibly there's some bug in the crypto code, or perhaps we're setting something up incorrectly in gss_wrap_req_priv.

(In reply to comment #37)
> We expect krb5/udp to fail sometimes, but we don't expect it to trigger a
> BUG like this. Is this reproducible?

Two automated jobs both panicked the machine.

> It looks like we passed something inconsistent to kunmap_atomic().
>
> Possibly there's some bug in the crypto code, or perhaps we're setting up
> something incorrectly in gss_wrap_req_priv.

OK, I will test again and try to fetch a vmcore!
There's an odd mixture of bugs reported here; I'm not convinced they're all the same (e.g., comment 36 is on rhel5, and for a crash, not an EIO?).

For the reports on s390, it would be worth checking whether this is a dup of bug 1003528, in which case this should be fixed as of 2.6.32-491.el6 or so, according to that bug.

(In reply to J. Bruce Fields from comment #43)
> For the reports on s390, it would be worth checking whether this is a dup of
> bug 1003528, in which case this should be fixed as of 2.6.32-491.el6 or so
> according to that bug.

Jian Li, could you check this? I'm thinking, for example, about the problem you reported in comment 10.

Red Hat Enterprise Linux 6 is in the Production 3 Phase. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available. The official life cycle policy can be reviewed here:

http://redhat.com/rhel/lifecycle

This issue does not meet the inclusion criteria for the Production 3 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification. Note that a strong business justification will be required for re-evaluation. Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL:

https://access.redhat.com/

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.