Bug 156148 (IT_71382)

Summary: automount/mount hang in __rpc_execute
Product: Red Hat Enterprise Linux 3
Reporter: Wendy Cheng <nobody+wcheng>
Component: kernel
Assignee: Steve Dickson <steved>
Status: CLOSED NOTABUG
QA Contact: Brian Brock <bbrock>
Severity: high
Docs Contact:
Priority: high
Version: 3.0
CC: davem, dduffey, kanderso, martin.bowers, mtonn, nhorman, petrides, staubach, tao
Target Milestone: ---
Target Release: ---
Hardware: i386
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2005-07-08 12:42:52 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments (Description / Flags):
  memory barrier patch (none)
  tcpdump from the customer (none)
  log messages (none)
  out.tar.gz (none)
  This the last tcpdump attached that the customer wants to see it. (none)
  tcpdumps-3.tar.gz (none)
  tcpdump. (none)
  solaris binary trace (none)
  EMC packet dump, filtered to show the issue (none)

Description Wendy Cheng 2005-04-27 20:57:40 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.3) Gecko/20040924

Description of problem:
Automount daemon (in the middle of mount) is stuck in D state with the following thread trace:
                                                                                
PID: 8100   TASK: f46c6000  CPU: 1   COMMAND: "automount"
#0 [f46c7d2c] context_switch at c0126104
#1 [f46c7d44] schedule at c012413f
#2 [f46c7d88] __rpc_execute at f91073d2
#3 [f46c7dc8] rpc_call_sync at f9102598
#4 [f46c7e84] nfs3_rpc_wrapper at f913814f
#5 [f46c7eac] nfs3_proc_getattr at f913828e
#6 [f46c7ed4] __nfs_revalidate_inode at f91305e0
#7 [f46c7f6c] nfs_revalidate at f91302ed
#8 [f46c7f84] sys_lstat64 at c016ebf4
#9 [f46c7fc0] system_call at c03f4068
                                                                                
Summary of the issue:
1. This looks more like an NFS mount issue than an autofs issue (it can be recreated without autofs).
2. Sample mount options: "-hard,tcp,intr,bg,rsize=32768,wsize=32768,retry=100".
3. The NFS server can be a Solaris-based Sun head, a Celerra-based EMC server, or Linux RHEL 3.
4. When the problem occurs, it only affects the particular mount point the process is waiting on (in nfs_revalidate). Access to other mount points on the same server, with other directories, works fine.

vmcore and network sniff traces are available. 

                                                                                
More data will be uploaded.



Version-Release number of selected component (if applicable):
2.4.21-31.EL.test.bz124600smp

How reproducible:
Didn't try

Steps to Reproduce:
1. (customer has a test system that can trigger this issue within 1 ~ 2 hours.)
2.
3.
  

Additional info:

Comment 3 Wendy Cheng 2005-04-27 21:18:19 UTC
Created attachment 113741 [details]
network sniffing trace

Please ignore the "nfs bindresvport: Address already in use" error messages in
one of the zipped files. We have identified that they came from a workaround used
during the debugging sessions where autofs was removed; they are *not* a major
concern at this moment. The real issue is that the mount point hangs randomly
from time to time.

Comment 5 Wendy Cheng 2005-04-27 21:48:48 UTC
Also ignore the MVFS modules in the first vmcore. The problem shows up with and
without ClearCase.

Comment 6 Issue Tracker 2005-04-27 23:01:45 UTC
From User-Agent: XML-RPC

Latest update from customer:



Our mount test succeeded 3 times in a row with no failures, so I am now
setting up 9 systems in production to see if user jobs bring out the
failure.  Instead of failing during testing, mount would print:

mount: RPC: Timed out
mount: backgrounding "<mount_point>"

And then it would succeed in mounting the dir in the background.  If that
is where it would otherwise have gone into unint sleep, then this is
encouraging.  I will get the 9 systems online ASAP to see how it goes
there.

Not to get ahead of myself, but can you begin preparing a U5 kernel for us
so we have a solution until U5 officially comes out (or whenever you get
this patch into a release):

- This patch is installed
- The patch from bugid 124600 is installed
- It's not a debug kernel

We only need the x86 and x86_64 versions of U5 like this; the ppc version
doesn't apply to us.  If we do end up concluding that this fixes the issue,
I don't want that to be the first time we ask for this, so that our
downtime is decreased.

I will update again later tonight with my progress.



=====



Assuming this test kernel continues to work, we'll need a Hotfix before
U5. This is the formal request.


Internal Status set to 'Waiting on Customer'
Status set to: Waiting on Client

This event sent from IssueTracker by marty
 issue 71382

Comment 8 Wendy Cheng 2005-04-27 23:40:58 UTC
Created attachment 113749 [details]
memory barrier patch

The test patch the customer is currently running ...

Comment 9 Wendy Cheng 2005-04-28 02:17:15 UTC
Customer update by gordanz: 

Well, never mind on the kernel request, I spoke too soon.

I uploaded core-4-534000.tar.gz which is a dump where a csh went unint sleep on
your kernel -- untainted and without autofs.  The user typed "cd /proj/.ppc"
without hitting enter.  He then typed "Ctrl-D", which in csh will show you all
the different ways you could finish the line, like an auto-complete.  That's
where it hung.

I also had an automount go unint sleep on one of the 9 production systems I
deployed running this kernel.  That core is currently uploading, called
core-5-534000.tar.gz.  This is an example of our original failure.  This kernel
is also tainted by ClearCase.

Comment 10 Issue Tracker 2005-04-28 02:40:56 UTC
From User-Agent: XML-RPC

From core-2: essentially a twin of the first core - automount pid 1798 forked
pid 20442, which got stuck in __rpc_execute; pid 20442 subsequently blocked
a "tcsh" shell. And these two are the only two active threads in the
whole system!



PID: 20442  TASK: d5d4c000  CPU: 1   COMMAND: "automount"
 #0 [d5d4dd2c] context_switch at c0126104
 #1 [d5d4dd44] schedule at c012413f
 #2 [d5d4dd88] __rpc_execute at f91073d2
 #3 [d5d4ddc8] rpc_call_sync at f9102598
 #4 [d5d4de84] nfs3_rpc_wrapper at f913814f
 #5 [d5d4deac] nfs3_proc_getattr at f913828e
 #6 [d5d4ded4] __nfs_revalidate_inode at f91305e0
 #7 [d5d4df6c] nfs_revalidate at f91302ed
 #8 [d5d4df84] sys_lstat64 at c016ebf4
 #9 [d5d4dfc0] system_call at c03f4068

PID: 2964   TASK: f71be000  CPU: 1   COMMAND: "tcsh"
 #0 [f71bfbbc] context_switch at c0126104
 #1 [f71bfbd4] schedule at c012413f
 #2 [f71bfc18] __rpc_execute at f91073d2
 #3 [f71bfc58] rpc_call_sync at f9102598
 #4 [f71bfd14] nfs3_rpc_wrapper at f913814f
 #5 [f71bfd3c] nfs3_proc_access at f91385fb
 #6 [f71bfdf8] nfs_permission at f912ce4d
 #7 [f71bfe20] permission at c0172c4c
 #8 [f71bfe30] link_path_walk at c0173111
 #9 [f71bfe70] __vfs_follow_link at c0177022
#10 [f71bfe90] nfs_follow_link at f9135211
#11 [f71bfea4] link_path_walk at c017335e
#12 [f71bfee4] __vfs_follow_link at c0177022
#13 [f71bff04] nfs_follow_link at f9135211
#14 [f71bff18] link_path_walk at c017335e
#15 [f71bff58] path_lookup at c0173aa4
#16 [f71bff68] __user_walk at c0173df4
#17 [f71bff84] sys_chdir at c0162d20
#18 [f71bffc0] system_call at c03f4068




This event sent from IssueTracker by wcheng
 issue 71382

Comment 11 Wendy Cheng 2005-04-28 03:08:15 UTC
Note that when the above vmcore was forced (via sysrq-c), all other threads
(including automount daemons) were working. So we definitely missed the wake-up
- not sure how to dig the rpc_task structure out (it's two stack frames down).
It looks to me like it is something in this loop:

                while (RPC_IS_SLEEPING(task)) {
                        /* sync task: sleep here */
                        dprintk("RPC: %4d sync task going to sleep\n",
                                                        task->tk_pid);
                        if (current->pid == rpciod_pid)
                                printk(KERN_ERR "RPC: rpciod waiting on sync task!\n");

                        __wait_event(task->tk_wait, !RPC_IS_SLEEPING(task));
                        dprintk("RPC: %4d sync task resuming\n", task->tk_pid);

                        /*
                         * When a sync task receives a signal, it exits with
                         * -ERESTARTSYS. In order to catch any callbacks that
                         * clean up after sleeping on some queue, we don't
                         * break the loop here, but go around once more.
                         */
                        if (task->tk_client->cl_intr && signalled()) {
                                dprintk("RPC: %4d got signal\n", task->tk_pid);
                                task->tk_flags |= RPC_TASK_KILLED;
                                rpc_exit(task, -ERESTARTSYS);
                                rpc_wake_up_task(task);
                        }
                }
        }
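
Since the attached "memory barrier patch" (comment 8) is not reproduced here, the
sketch below only illustrates the generic missed-wakeup pattern such a barrier is
meant to close; the names (sleeping, wq) are generic placeholders, not the actual
RPC fields, and this is not the attached patch:

        /* Illustrative sketch only -- not the attached patch.
         *
         * Waker side: the store clearing the "sleeping" flag must be
         * ordered before the wakeup; otherwise a sleeper can re-read the
         * stale flag, decide to sleep again, and never be woken.
         */
        sleeping = 0;
        smp_mb();               /* make the store visible before waking */
        wake_up(&wq);

        /* Sleeper side: the canonical open-coded wait.  set_current_state()
         * contains a barrier, so the task-state change is ordered before
         * the re-test of the condition.
         */
        DECLARE_WAITQUEUE(wait, current);

        add_wait_queue(&wq, &wait);
        for (;;) {
                set_current_state(TASK_UNINTERRUPTIBLE);
                if (!sleeping)
                        break;
                schedule();
        }
        set_current_state(TASK_RUNNING);
        remove_wait_queue(&wq, &wait);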


Comment 12 Wendy Cheng 2005-04-28 03:55:50 UTC
So this problem seems to be triggered by three events:

* The NFS server is slow to respond.
* automount is in the middle of mounting (nfs_revalidate).
* A user accesses that particular mount point and the process gets signalled.


Comment 13 Geronimo A. Ordanza II 2005-04-28 11:55:37 UTC
Customer update
---------------

Here is an strace of a shell (similar to the one above) doing the Ctrl-D that
exposed the D state (with commentary for your entertainment):

## Prior stuff removed to save space. We are somewhere around the shell prompt.

brk(0x8529800)                          = 0x8529800
close(0)                                = 0
munmap(0xb73d7000, 4096)                = 0
alarm(0)                                = 0
ioctl(16, FIONREAD, [0])                = 0
ioctl(16, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) =
0
ioctl(16, SNDCTL_TMR_STOP or TCSETSW, {B38400 opost isig -icanon -echo ...}) = 0

## Shell prompt delivered to user below:

write(17, "cde-tx32-lc093:1% ", 18cde-tx32-lc093:1% )     = 18

## Unsuspecting user starts typing: ls /proj/.ppc

read(16, "l", 1)                        = 1
write(17, "l", 1l)                       = 1
read(16, "s", 1)                        = 1
write(17, "s", 1s)                       = 1
read(16, " ", 1)                        = 1
write(17, " ", 1 )                       = 1
read(16, "/", 1)                        = 1
write(17, "/", 1/)                       = 1
read(16, "p", 1)                        = 1
write(17, "p", 1p)                       = 1
read(16, "r", 1)                        = 1
write(17, "r", 1r)                       = 1
read(16, "o", 1)                        = 1
write(17, "o", 1o)                       = 1
read(16, "j", 1)                        = 1
write(17, "j", 1j)                       = 1
read(16, "/", 1)                        = 1
write(17, "/", 1/)                       = 1
read(16, ".", 1)                        = 1
write(17, ".", 1.)                       = 1
read(16, "p", 1)                        = 1
write(17, "p", 1p)                       = 1
read(16, "p", 1)                        = 1
write(17, "p", 1p)                       = 1
read(16, "c", 1)                        = 1
write(17, "c", 1c)                       = 1

## User gets tired and hits Ctrl-D not knowing the tragic events that are about
to unfold:

read(16, "\4", 1)                       = 1
write(17, "\r\n", 2
)                    = 2

## Ctrl-D is captured and the beginning of the end starts:

rt_sigprocmask(SIG_BLOCK, [INT], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], [INT], 8) = 0
open("/proj/", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 0
fstat64(0, {st_mode=S_IFDIR|0755, st_size=8192, ...}) = 0
fcntl64(0, F_SETFD, FD_CLOEXEC)         = 0
lseek(0, 0, SEEK_SET)                   = 0
getdents64(0, /* 122 entries */, 4096)  = 4096
lstat64("/proj/.ppc_apollo8x_ver3", {st_mode=S_IFDIR|0755, st_size=1024, ...}) =
 0
rt_sigprocmask(SIG_BLOCK, [INT], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], [INT], 8) = 0
rt_sigprocmask(SIG_BLOCK, [INT], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], [INT], 8) = 0

## The lstat64 domino effect starts:

lstat64("/proj/.ppc_25", {st_mode=S_IFDIR|S_ISGID|0770, st_size=1024, ...}) = 0
lstat64("/proj/.ppc_18", {st_mode=S_IFDIR|0750, st_size=1024, ...}) = 0
lstat64("/proj/.ppc_53", {st_mode=S_IFDIR|0755, st_size=1024, ...}) = 0
lstat64("/proj/.ppc_44", {st_mode=S_IFDIR|0755, st_size=1024, ...}) = 0
lstat64("/proj/.ppc_35", {st_mode=S_IFDIR|S_ISUID|S_ISGID|0770, st_size=1024, ..
.}) = 0
lstat64("/proj/.ppc_05", {st_mode=S_IFDIR|S_ISUID|S_ISGID|0750, st_size=1024, ..
.}) = 0
lstat64("/proj/.ppc_04", {st_mode=S_IFDIR|0775, st_size=1024, ...}) = 0
lstat64("/proj/.ppc_67", {st_mode=S_IFDIR|0755, st_size=1024, ...}) = 0
lstat64("/proj/.ppc_14", {st_mode=S_IFDIR|0770, st_size=2048, ...}) = 0
lstat64("/proj/.ppc_13", {st_mode=S_IFDIR|0770, st_size=1024, ...}) = 0
lstat64("/proj/.ppc_09", {st_mode=S_IFDIR|S_ISGID|0770, st_size=1024, ...}) = 0
lstat64("/proj/.ppc_draco8_integ_06", {st_mode=S_IFDIR|S_ISGID|0775, st_size=102
4, ...}) = 0
rt_sigprocmask(SIG_BLOCK, [INT], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], [INT], 8) = 0
lstat64("/proj/.ppc_scr1", {st_mode=S_IFDIR|S_ISGID|0775, st_size=512, ...}) = 0
lstat64("/proj/.ppc_02", {st_mode=S_IFDIR|S_ISGID|0770, st_size=1024, ...}) = 0
lstat64("/proj/.ppc_altair_unitver",

## This is where it hangs

Comment 14 Geronimo A. Ordanza II 2005-04-28 12:00:58 UTC
Created attachment 113767 [details]
tcpdump from the customer

Comment 15 Geronimo A. Ordanza II 2005-04-28 12:03:31 UTC
Customer update:
----------------

I was able to recreate the problem and had logging turned up as requested.  The
messages file is attached.

Unfortunately the sysrq-c seems to have failed.  It created a log, but the
vmcore is incomplete (it's zero length, doesn't look like it started).  I
attached the log.

Please let me know if I need to try again or if what you have is sufficient.

Thanks,
Marty

--------

I'm not sure if it has any relevance, but I want to be sure the backend
engineers saw my observation at the beginning of this ticket as it might provide
a lead.

When the processes go uninterruptible according to ps, vmstat doesn't show any
blocked processes as it should (and has in the past when I have dealt with
uninterruptible processes).  Also, /proc/stat doesn't show any blocked processes.

Like I said, it may not be relevant, but that behavior is not the way Linux has
behaved for me in the past.  Just wanted to throw that out there.

-Marty

Comment 16 Geronimo A. Ordanza II 2005-04-28 12:17:54 UTC
Created attachment 113768 [details]
log messages

Comment 18 Steve Dickson 2005-04-28 15:29:14 UTC
Looking at these (from the log in Comment #16):
RPC:      tcp_data_ready...
RPC:      tcp_state_change client f703b000...
RPC:      state 8 conn 1 dead 0 zapped 0
RPC:      disconnected transport f703b000
RPC:      tcp_data_ready...

It appears the TCP connection is dying... An ethereal trace
of what (if anything) is going over the wire at this point
would be good.
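
For reference, assuming the "state" value printed by tcp_state_change is the
socket's TCP state (sk->state), state 8 corresponds to TCP_CLOSE_WAIT in the
standard Linux TCP state numbering, i.e. the peer has closed its end of the
connection. A sketch of those values:

        /* Standard Linux TCP state numbering (assumed from the usual
         * kernel definition; state 8 in the log above would be CLOSE_WAIT). */
        enum {
                TCP_ESTABLISHED = 1,
                TCP_SYN_SENT,        /*  2 */
                TCP_SYN_RECV,        /*  3 */
                TCP_FIN_WAIT1,       /*  4 */
                TCP_FIN_WAIT2,       /*  5 */
                TCP_TIME_WAIT,       /*  6 */
                TCP_CLOSE,           /*  7 */
                TCP_CLOSE_WAIT,      /*  8  <- state in the log above */
                TCP_LAST_ACK,        /*  9 */
                TCP_LISTEN,          /* 10 */
                TCP_CLOSING          /* 11 */
        };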

Comment 20 Steve Dickson 2005-04-28 19:21:04 UTC
Looking at the vmcore that was given to Wendy, it appears the automount
process and a process named "sbatchd" are both stuck reconnecting
their sockets to the NFS server.  The state of both sockets is
SS_CONNECTING, which means "in process of connecting".
This could mean one of a few things:

Even though the sockets are in "connecting mode", the connection
request (i.e. the TCP SYN) may not be being sent to the server.  With an
ethereal -w trace of the client, we should be able to verify this one
way or the other.

Assuming that the client is sending out the connect requests (which is
probably a pretty safe assumption), the next step is to figure out why
we are having such a hard time connecting to the server.  This is
where a tcpdump from the server's point of view would be handy.  From
that, we could tell whether the client's connect requests are or are not
making it to the server.

If the requests are not getting to the server, then the network (i.e. a
network appliance on the network) is dropping packets.  We would need
to know the network topology to pursue this further.

If the connection requests are making it to the server and the server
is either dropping them (i.e. not sending any response, which is probably
the case) or resetting them (i.e. sending a TCP RST), then we should work
with the EMC people to see what the NFS client can do to resolve the problem.

Either way, we should get in touch with the EMC people so we can work this
problem from both ends... imho.
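
For reference, SS_CONNECTING is one of the generic socket states; a sketch of
the enum as defined in the kernel's include/linux/net.h, reproduced here for
orientation:

        typedef enum {
                SS_FREE = 0,            /* not allocated                */
                SS_UNCONNECTED,         /* unconnected to any socket    */
                SS_CONNECTING,          /* in process of connecting     */
                SS_CONNECTED,           /* connected to socket          */
                SS_DISCONNECTING        /* in process of disconnecting  */
        } socket_state;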






Comment 23 Steve Dickson 2005-04-28 21:26:15 UTC
Here are the steps I took to look at the state of the RPC
socket that is being used by the hung automounter
process:

1) Start crash. If everything works you should be sitting
   at a crash prompt like "crash> " (after a bunch of system info is
   displayed).

2) Find the process that is hung by doing a
   'foreach bt' (which gives a backtrace of
    all the running processes). Look for a stack
    that looks like:

crash> foreach bt
PID: 20442  TASK: d5d4c000  CPU: 1   COMMAND: "automount"
 #0 [d5d4dd2c] context_switch at c0126104
 #1 [d5d4dd44] schedule at c012413f
 #2 [d5d4dd88] __rpc_execute at f91073d2
 #3 [d5d4ddc8] rpc_call_sync at f9102598
 #4 [d5d4de84] nfs3_rpc_wrapper at f913814f
 #5 [d5d4deac] nfs3_proc_getattr at f913828e
 #6 [d5d4ded4] __nfs_revalidate_inode at f91305e0
 #7 [d5d4df6c] nfs_revalidate at f91302ed
 #8 [d5d4df84] sys_lstat64 at c016ebf4
 #9 [d5d4dfc0] system_call at c03f4068
    EAX: 000000c4  EBX: bfff7c40  ECX: bfff6a9c  EDX: 00ac8898
    DS:  002b      ESI: bfff6a9c  ES:  002b      EDI: 00000003
    SS:  002b      ESP: bfff6a90  EBP: bfff6b08
    CS:  0023      EIP: 00a62313  ERR: 000000c4  EFLAGS: 00000202

3) Now dump the stack of the hung process with the "bt -f pid" command.
    In my case it's "bt -f 20442", which shows:

crash> bt -f 20442
PID: 20442  TASK: d5d4c000  CPU: 1   COMMAND: "automount"
 #0 [d5d4dd2c] context_switch at c0126104
    [RA: c0124144  SP: d5d4dd2c  FP: d5d4dd44  SIZE: 28]
    d5d4dd2c: d5d4dd40  f9103922  c043d180  00000001
    d5d4dd3c: dbec0000  d5d4dd84  c0124144
 #1 [d5d4dd44] schedule at c012413f
    [RA: f91073d7  SP: d5d4dd48  FP: d5d4dd88  SIZE: 68]
    d5d4dd48: c043dc80  d5d4c000  dbec0000  f9104d90
    d5d4dd58: f9110262  c043e120  d5d4c000  c043dc80
    d5d4dd68: c043e11c  c043e58c  00000000  00000001
    d5d4dd78: d5d4c000  d5d4dde4  d5d4de58  d5d4dd98
    d5d4dd88: f91073d7
 #2 [d5d4dd88] __rpc_execute at f91073d2
    [RA: f910259d  SP: d5d4dd8c  FP: d5d4ddc8  SIZE: 64]
    d5d4dd8c: d5d4dde4  c03a9100  d5d4dde4  00000000
    d5d4dd9c: d5d4c000  d5d4de5c  d5d4de5c  00000000
    d5d4ddac: 00000000  00000000  d5d4dde4  00000000
    d5d4ddbc: f661a580  d5d4dde4  d5d4dddc  f910259d
 #3 [d5d4ddc8] rpc_call_sync at f9102598
    [RA: f9138154  SP: d5d4ddcc  FP: d5d4de84  SIZE: 188]
    d5d4ddcc: d5d4dde4  d5d4debc  00000000  00000000
    d5d4dddc: fffffeff  ffffffff  f0b49c74  f4d8905c
    d5d4ddec: 00f00baa  f66e088c  f0b49c80  f661a580
    d5d4ddfc: f4d89128  00000000  f4d8905c  00000001
    d5d4de0c: eb73b2bc  d5d4def0  ec1d5e80  f7e86800
    d5d4de1c: 00010202  f9104d90  00000000  f9102e60
    d5d4de2c: 00000000  00000000  00000000  eca021a0
    d5d4de3c: c051e560  021d9fbb  d5d4dde4  f91066e0
    d5d4de4c: 00000000  00000000  c0386280  00000001
    d5d4de5c: d5d4dda0  d5d4dda0  0000ea60  00010020
    d5d4de6c: 00000001  000006b2  eb73b180  d5d4c000
    d5d4de7c: f661a580  00000000  f9138154
 #4 [d5d4de84] nfs3_rpc_wrapper at f913814f
    [RA: f9138293  SP: d5d4de88  FP: d5d4deac  SIZE: 40]
    d5d4de88: *f661a580*  d5d4debc  00000000  00000000
    d5d4de98: 00000000  eb73b180  d5d4def0  d5d4def0
    d5d4dea8: bfff7c40  f9138293
[.......]
crash>

    Using the source, we know that the first argument to the rpc_call_sync()
    call is the RPC client handle structure, which is also the last thing pushed
    on the stack. So looking at the routine that called rpc_call_sync(), in this
    case nfs3_rpc_wrapper(), we see that f661a580 is the last address
    that was pushed on the stack before the call to rpc_call_sync().
    So f661a580 should be the RPC client handle structure.

4) Now dump out the RPC client handle structure (i.e. struct rpc_clnt) by
     typing "rpc_clnt addr". In my case it's "rpc_clnt f661a580", which shows:
crash> rpc_clnt f661a580
struct rpc_clnt {
  cl_users = {
    counter = 3
  },
  cl_xprt = 0xf4d89000,
  cl_procinfo = 0xf9143520,
  cl_maxproc = 22,
  cl_server = 0xe2d9f480 "tx32-fs01", 
  cl_protname = 0xf913f851 "nfs",
[.....]
crash>

     Note the address of the 'cl_xprt' field.

5) Next dump out the RPC transport structure (i.e. struct rpc_xprt) by
    typing "rpc_xprt addr". In my case it's "rpc_xprt 0xf4d89000", which shows:

crash> rpc_xprt 0xf4d89000
struct rpc_xprt {
  sock = 0xeda059b4,
  inet = 0xf2bd9680,
  timeout = {
    to_current = 700,
    to_initval = 700,
    to_maxval = 60000,
    to_increment = 3478626096,
    to_resrvval = 3222405346,
    to_retries = 3,
    to_exponential = 1 '\001'
  },
[...]
crash>

   Note the address of the 'sock' field.

6) Finally dump out the socket structure (i.e. struct socket) by typing
    "socket addr". In my case it's "socket 0xeda059b4", which shows:

crash> socket 0xeda059b4
struct socket {
  state = SS_CONNECTING,
  flags = 0,
  ops = 0xc03ed920,
  inode = 0xeda05880,
  fasync_list = 0x0,
crash>

Note the "state" field: it's SS_CONNECTING, which means the socket is
trying to connect with the server.

I hope this helps...
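
To summarize the chain being walked above, here is a hedged kernel-C sketch of
the same pointer path. The field names are taken from the rpc_clnt/rpc_xprt/socket
dumps above, the addresses in the comments are the ones from this vmcore, and the
helper name show_xprt_state() is made up purely for illustration:

        /* Orientation sketch of the pointer chain the crash session follows. */
        static void show_xprt_state(struct rpc_clnt *clnt)   /* f661a580 */
        {
                struct rpc_xprt *xprt = clnt->cl_xprt;        /* 0xf4d89000 */
                struct socket   *sock = xprt->sock;           /* 0xeda059b4 */

                if (sock->state == SS_CONNECTING)
                        printk("RPC: transport to %s is still connecting\n",
                               clnt->cl_server);
        }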



Comment 30 Issue Tracker 2005-04-29 16:23:23 UTC
From User-Agent: XML-RPC

Summary of call:

1. Steve wanted to make sure that the Linux client is attempting to make the
connection. If the client is sending the request out, where is the connection
being dropped (network or server)? Trace an already-hung process: an
"ethereal -w" on the client, and "rpc -t", to see what goes out.

2. Freescale will be sending a new vmcore without checkpoint turned on.

3. EMC will look into whether there is a way, server side, to tcpdump and
filter for an already-hung client. EMC will also send a TCP dump.

4. Requested that EMC look at the vmcore file.

Client notes:

The Linux clients have on average about 50 connections; some have up to
400 connections.



Joe Kachuck


This event sent from IssueTracker by jkachuck
 issue 71382

Comment 32 Wendy Cheng 2005-04-29 19:55:18 UTC
Two data points:

* When a NAS box is performing any form of backup (checkpoint, snapshot, etc.),
its capacity drops. Some implementations can even temporarily suspend access.
* The Solaris NFS client will automatically switch its protocol to NFS-UDP when
it cannot get through on an NFS-TCP connection, except for NFSv4 mounts or when
the protocol is explicitly specified on the command line.


Comment 33 Wendy Cheng 2005-04-29 20:00:55 UTC
s/Solaris NFS client/Solaris NFS mount request/ in comment #32.

Comment 34 Narsi Subramanian 2005-04-29 20:18:20 UTC
Created attachment 113873 [details]
This the last tcpdump attached that the customer wants to see it.

Comment 36 Michael Napolis 2005-04-30 01:22:24 UTC
The file tcpdumps-3.tar.gz is now uploaded.  It has a dump from 3 different
points, the linux client (client.3), the EMC server (server.3), and the Cisco
SPAN port (line.3).  The client had a single unint automount process, the same
as in the last dump. 
 
There is also a capture of Solaris traffic from a single host to the same EMC
server we have been working with, called server_sun.1. 
 
Please let us know asap if you have any questions or need us to do anything else. 
 
Thx!

Comment 37 Michael Napolis 2005-04-30 01:24:28 UTC
Created attachment 113879 [details]
tcpdumps-3.tar.gz

Comment 38 Narsi Subramanian 2005-04-30 01:24:34 UTC
Created attachment 113880 [details]
tcpdump.

Comment 39 Narsi Subramanian 2005-04-30 01:28:56 UTC
Ignore my attachment as it is identical to what Michael posted.

-narsi.

Comment 40 Michael Napolis 2005-04-30 01:35:10 UTC
One thing to note: the server_sun.1 capture was not taken at the same time,
hence the .1 extension.

crm update time from customer: 29-APR-2005 18:51:29

Comment 41 Michael Napolis 2005-04-30 04:29:13 UTC
Created attachment 113881 [details]
solaris binary trace

Comment 42 Steve Dickson 2005-04-30 10:41:39 UTC
Comparing the Solaris trace with either the server.3 or client.3 trace, one
interesting thing is the different types of file handles that the server seems
to hand out. This could just be an ethereal anomaly, but the majority of the
time ethereal says the file handle type is a "NetApp file handle", while now
and then ethereal reports the type as "unknown". See packets 35 and 38 in the
Solaris trace and packets 3 and 24 in the client.3 trace.

Now the notable thing is that whenever the Linux client uses one of these
unknown file handle types, the EMC server drops the NFS request on the floor,
but when Solaris uses an unknown file handle type, the server sends the
appropriate response.




Comment 44 Wendy Cheng 2005-04-30 15:57:22 UTC
The previously discussed workaround (shortening the RPC-TCP timeout to force a
retransmit) won't work for this problem, since the (theoretically impossible)
RPC-TCP retransmit (600 seconds) has already kicked in and the server keeps
dropping that particular GETATTR packet.



Comment 48 Wendy Cheng 2005-04-30 20:24:07 UTC
This issue boils down to why the client's GETATTR request never gets a reply
from the server when this hang occurs.

1. The GETATTR request was re-transmitted twice, and its arrivals on the
server side were captured in server.3 (from tcpdumps-3.tar.gz) as frames 3,
31, and 49. The server never replied.
2. The Linux client does not interpret the contents of the file handle.
However, comparing the successful GETATTR calls with the failed one, the
difference lies in the lower 16 bytes of the file handle: the first 16 bytes
and the lower 16 bytes were identical in the bad call but different in the
good calls. So, on the surface, the FH looks bad. Why didn't the Linux client
get a stale-filehandle error reply? Was the request dropped in the server's
RPC layer?



Comment 49 Narsi Subramanian 2005-05-01 12:45:37 UTC
EMC acknowledged soft SCSI errors and issues with the CLARiiON back-end storage.
They confirmed two bad disks and requested immediate replacement. They will
be rebooting the datamover and/or doing a failover so that memory may be more
contiguous. Why would the Solaris client work with the same EMC server? What is
the effect of the big fluctuation in TCP window sizes on the EMC server? -narsi


Comment 55 David Duffey 2005-05-02 18:47:56 UTC
Created attachment 113932 [details]
EMC packet dump, filtered to show the issue

This is a section of a 3 GB packet dump on the EMC that has been filtered to
demonstrate the problem.