Bug 1494834

Summary: NFS gets hung after upgrade to 7.4 (CentOS)
Product: Red Hat Enterprise Linux 7
Reporter: Nikolaos Milas <nmilas>
Component: nfs-utils
Assignee: Steve Dickson <steved>
Status: CLOSED NOTABUG
QA Contact: Filesystem QE <fs-qe>
Severity: high
Priority: unspecified
Version: 7.4
CC: nmilas, yoyang
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Last Closed: 2017-10-25 13:00:39 UTC
Type: Bug
Attachments:
  Related excerpt from /var/log/messages (flags: none)
  Full messages file after reboot, including nfs mounts in /etc/fstab (flags: none)
  TCP Dump between the box and the NFS server - Test on 2017-10-06 (flags: none)
  /var/log/messages file for the period that the test on 2017-10-06 was performed (flags: none)

Description Nikolaos Milas 2017-09-23 12:35:25 UTC
Created attachment 1329921
Related excerpt from /var/log/messages

Description of problem:

After an upgrade to 7.4 (which, among several hundred updates, includes rpcbind-0.2.0-42.el7.x86_64), we started having NFS issues: NFS communication hangs. In /var/log/messages:

-----------------------------------------------------------------------------------------
...
Sep 22 11:03:21 hesperia1 kernel: RPC: Registered named UNIX socket transport module.
Sep 22 11:03:21 hesperia1 kernel: RPC: Registered udp transport module.
Sep 22 11:03:21 hesperia1 kernel: RPC: Registered tcp transport module.
Sep 22 11:03:21 hesperia1 kernel: RPC: Registered tcp NFSv4.1 backchannel transport module.
Sep 22 11:03:21 hesperia1 systemd-udevd: starting version 219
Sep 22 11:03:21 hesperia1 systemd: Started Configure read-only root support.
Sep 22 11:03:21 hesperia1 kernel: Installing knfsd (copyright (C) 1996 okir.de).
Sep 22 11:03:21 hesperia1 systemd: Mounted NFSD configuration filesystem.
...
Sep 22 11:03:27 hesperia1 systemd: Mounting /mnt/dd2500-1...
Sep 22 11:03:27 hesperia1 systemd: Starting Notify NFS peers of a restart...
Sep 22 11:03:27 hesperia1 sm-notify[948]: Version 1.3.0 starting
Sep 22 11:03:27 hesperia1 systemd: Started Notify NFS peers of a restart.
Sep 22 11:03:27 hesperia1 systemd: Started OpenSSH server daemon.
Sep 22 11:03:27 hesperia1 kernel: FS-Cache: Loaded
Sep 22 11:03:27 hesperia1 kernel: FS-Cache: Netfs 'nfs' registered for caching
Sep 22 11:03:27 hesperia1 systemd: Mounted /mnt/dd2500-1.
Sep 22 11:03:27 hesperia1 systemd: Reached target Remote File Systems.
Sep 22 11:03:27 hesperia1 systemd: Starting Remote File Systems.
...
Sep 22 11:11:16 hesperia1 kernel: nfs: server 10.201.40.34 not responding, still trying
...
Sep 22 11:20:44 hesperia1 kernel: nfs: server 10.201.40.34 not responding, still trying
...
-----------------------------------------------------------------------------------------
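To see how often the client lost contact with the server over a longer log than this excerpt, a small script can tally the kernel's NFS status lines. The following is a minimal Python sketch, not part of the original report: `count_nfs_status` is a hypothetical helper, and it assumes the `nfs: server <addr> not responding` format shown above plus the matching `nfs: server <addr> OK` line the kernel logs when the server answers again.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   "... kernel: nfs: server 10.201.40.34 not responding, still trying"
#   "... kernel: nfs: server 10.201.40.34 OK"
NFS_STATUS = re.compile(r"kernel: nfs: server (\S+) (not responding|OK)")


def count_nfs_status(lines):
    """Return a Counter keyed by (server, status) for NFS client status messages."""
    counts = Counter()
    for line in lines:
        match = NFS_STATUS.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    excerpt = [
        "Sep 22 11:03:27 hesperia1 systemd: Mounted /mnt/dd2500-1.",
        "Sep 22 11:11:16 hesperia1 kernel: nfs: server 10.201.40.34 not responding, still trying",
        "Sep 22 11:20:44 hesperia1 kernel: nfs: server 10.201.40.34 not responding, still trying",
    ]
    for (server, status), n in sorted(count_nfs_status(excerpt).items()):
        print(f"{server}: {status} x{n}")
```

Passing an open file instead of the list (for example `open("/var/log/messages")`) gives per-server counts for the full attachment.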

We tried downgrading to rpcbind-0.2.0-38.el7.x86_64, but this time it did not help. (Downgrading had solved an earlier problem caused by https://bugzilla.redhat.com/show_bug.cgi?id=1454876.)

We have confirmed the above behavior multiple times.

We can mount either directly:

  mount -vv -o auto,noatime,nolock,bg,nfsvers=3,intr,tcp,actimeo=1800 -t nfs 10.201.40.34:/data/col1/hesperia-mount /hesperiamount2

or through /etc/fstab:

  10.201.40.34:/data/col1/hesperia-mount /hesperiamount2 nfs auto,noatime,nolock,bg,nfsvers=3,intr,tcp,actimeo=1800 0 0
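An /etc/fstab entry has six whitespace-separated fields: source, mount point, filesystem type, options, dump flag and fsck pass number. Since the problem seems tied to boot-time mounting, it can help to confirm that the fstab entry and the manual `mount -o` command use exactly the same options. A minimal Python sketch, assuming a plain entry with no comments or escaped spaces; `parse_fstab_line` is a hypothetical helper:

```python
def parse_fstab_line(line):
    """Split one /etc/fstab entry into its six fields and expand the option list.

    Assumes a plain entry: no comments, no escaped spaces, all six fields present.
    """
    spec, mountpoint, fstype, options, dump, passno = line.split()
    opts = {}
    for item in options.split(","):
        key, sep, value = item.partition("=")
        # Flags such as "nolock" become True; "key=value" pairs keep their value.
        opts[key] = value if sep else True
    return {
        "spec": spec,
        "mountpoint": mountpoint,
        "fstype": fstype,
        "options": opts,
        "dump": int(dump),
        "passno": int(passno),
    }


if __name__ == "__main__":
    entry = ("10.201.40.34:/data/col1/hesperia-mount /hesperiamount2 nfs "
             "auto,noatime,nolock,bg,nfsvers=3,intr,tcp,actimeo=1800 0 0")
    cli_opts = "auto,noatime,nolock,bg,nfsvers=3,intr,tcp,actimeo=1800"
    parsed = parse_fstab_line(entry)
    # Does the fstab entry carry the same option set as the manual "mount -o ..."?
    print(set(cli_opts.split(",")) == set(entry.split()[3].split(",")))
    print(parsed["options"]["nfsvers"], parsed["options"]["nolock"])
```

For the two forms quoted above the option sets are identical, so any difference in behavior would have to come from when the mount happens rather than how it is specified.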

The box may even hang during reboot, which never happened in the past; it then needs a hard reboot (via the VM admin console) to boot again.


Version-Release number of selected component (if applicable):

CentOS Linux release 7.4.1708 (Core)

# uname -r
3.10.0-693.2.2.el7.x86_64
# rpm -qa | grep rpcbind
rpcbind-0.2.0-42.el7.x86_64
# rpm -qa | grep nfs
libnfsidmap-0.25-17.el7.x86_64
nfs-utils-1.3.0-0.48.el7.x86_64

How reproducible:

Always

Steps to Reproduce:

1. Mount the NFS share directly or via /etc/fstab and try various operations (I used rsync and a simple directory listing with ls).

Actual results:

The NFS-mounted path becomes inaccessible, and the SSH session hangs as well.

Expected results:

The mounted share should be fully accessible for directory and file use.

Additional info:

The remote system (NFS Server) publishing the share is an EMC DD2500 (supporting NFS v3).

I tried to debug RPC using rpcdebug. I set:

# rpcdebug -m rpc -s all
# rpcdebug -m nfs -s all

and then mounted the NFS share again. The mount succeeded (as always):

[root@hesperia1 ~]# mount -vv -o auto,noatime,nolock,bg,nfsvers=3,intr,tcp,actimeo=1800 -t nfs 10.201.40.34:/data/col1/hesperia-mount /hesperiamount2
mount.nfs: trying text-based options 'nolock,bg,nfsvers=3,intr,tcp,actimeo=1800,addr=10.201.40.34'
mount.nfs: prog 100003, trying vers=3, prot=6
mount.nfs: trying 10.201.40.34 prog 100003 vers 3 prot TCP port 2049
mount.nfs: prog 100005, trying vers=3, prot=6
mount.nfs: trying 10.201.40.34 prog 100005 vers 3 prot TCP port 2052
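The `prog` and `prot` numbers in this `mount.nfs` trace are standard ONC RPC program numbers and IP protocol numbers: 100003 is NFS, 100005 is the MOUNT daemon, and protocol 6 is TCP (17 would be UDP). A minimal Python sketch that decodes such lines follows; `decode_mount_trace` is a hypothetical helper, and the name table is only a small subset of what `/etc/rpc` lists on a typical Linux system.

```python
import re

# A few well-known ONC RPC program numbers (subset of /etc/rpc).
RPC_PROGRAMS = {
    100000: "portmapper",
    100003: "nfs",
    100005: "mountd",
    100021: "nlockmgr",
    100024: "status",
}
# IP protocol numbers as printed in the "prot=" field.
IP_PROTOCOLS = {6: "TCP", 17: "UDP"}

TRY_LINE = re.compile(r"prog (\d+), trying vers=(\d+), prot=(\d+)")


def decode_mount_trace(lines):
    """Translate 'prog N, trying vers=V, prot=P' lines from mount -vv output."""
    decoded = []
    for line in lines:
        match = TRY_LINE.search(line)
        if match:
            prog, vers, prot = (int(g) for g in match.groups())
            decoded.append((
                RPC_PROGRAMS.get(prog, f"program {prog}"),
                vers,
                IP_PROTOCOLS.get(prot, f"protocol {prot}"),
            ))
    return decoded


if __name__ == "__main__":
    trace = [
        "mount.nfs: prog 100003, trying vers=3, prot=6",
        "mount.nfs: trying 10.201.40.34 prog 100003 vers 3 prot TCP port 2049",
        "mount.nfs: prog 100005, trying vers=3, prot=6",
        "mount.nfs: trying 10.201.40.34 prog 100005 vers 3 prot TCP port 2052",
    ]
    for name, vers, proto in decode_mount_trace(trace):
        print(f"{name} v{vers} over {proto}")
```

For the trace above this shows NFS v3 and MOUNT v3, both over TCP, matching the `nfsvers=3,tcp` options.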

Then I tried to list the mounted directory, and the command hung (this is the actual problem):

[root@hesperia1 ~]# ls -la /hesperiamount <I had not finished typing when it hung.>
Disconnecting: Timeout, server not responding.

In the logged output (in /var/log/messages, see below), I found many timeouts, mostly minor ones and a couple of major ones, which are probably causing all the issues I am facing. (I did not have any such problems when running CentOS 7.3 with rpcbind-0.2.0-38.el7.x86_64.)

Then, on a new SSH session:

[root@hesperia1 ~]# umount /hesperiamount2
umount.nfs: /hesperiamount2: device is busy
[root@hesperia1 ~]#
[root@hesperia1 ~]# umount /hesperiamount2
umount.nfs: /hesperiamount2: device is busy
[root@hesperia1 ~]#
...
[root@hesperia1 ~]# umount /hesperiamount2

I am attaching the log of the whole session as recorded in /var/log/messages (with irrelevant messages removed).

Note: network reliability is quite high, as shown, e.g., by nping:

--------------------------------------------------------------------------------------------------------------------------
[root@hesperia1 ~]# nping --tcp -c 200 -p 2049 10.201.40.34

Starting Nping 0.6.40 ( http://nmap.org/nping ) at 2017-09-23 09:18 UTC
SENT (0.0160s) TCP 195.251.204.197:52359 > 10.201.40.34:2049 S ttl=64 id=50655 iplen=40 seq=1643036899 win=1480
RCVD (0.0185s) TCP 10.201.40.34:2049 > 195.251.204.197:52359 SA ttl=64 id=0 iplen=44 seq=4221839098 win=14600 <mss 1380>
...
SENT (199.3056s) TCP 195.251.204.197:52359 > 10.201.40.34:2049 S ttl=64 id=50655 iplen=40 seq=1643036899 win=1480
RCVD (199.3079s) TCP 10.201.40.34:2049 > 195.251.204.197:52359 SA ttl=64 id=0 iplen=44 seq=3607435522 win=14600 <mss 1380>
 
Max rtt: 10.897ms | Min rtt: 2.085ms | Avg rtt: 2.372ms
Raw packets sent: 200 (8.000KB) | Rcvd: 200 (8.800KB) | Lost: 0 (0.00%)
Nping done: 1 IP address pinged in 199.32 seconds
--------------------------------------------------------------------------------------------------------------------------

Hence, the timeouts do not appear to be caused by network performance, congestion, or other network issues.
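The nping summary can be checked by hand: every probe shown is a 40-byte SYN (`iplen=40`) and every reply a 44-byte SYN-ACK (`iplen=44`, the extra 4 bytes being the MSS option), so 200 × 40 = 8,000 bytes were sent, 200 × 44 = 8,800 bytes were received, and 0 of 200 replies were lost. A minimal Python sketch of that arithmetic follows; it assumes all 200 probes had the same lengths as the lines shown, since the middle of the output is elided. Note that nping here exercises only TCP connection setup on port 2049, not NFS RPC traffic.

```python
def nping_summary(sent, received, sent_iplen, recv_iplen):
    """Recompute nping's byte totals and loss rate from per-packet IP lengths.

    Assumes every sent packet and every reply has the same IP length.
    """
    lost = sent - received
    return {
        "sent_bytes": sent * sent_iplen,
        "recv_bytes": received * recv_iplen,
        "lost": lost,
        "loss_pct": 100.0 * lost / sent,
    }


if __name__ == "__main__":
    s = nping_summary(200, 200, 40, 44)
    # nping reports KB as 1000 bytes, so divide by 1000 to compare.
    print(f"sent {s['sent_bytes'] / 1000:.3f}KB, rcvd {s['recv_bytes'] / 1000:.3f}KB, "
          f"lost {s['lost']} ({s['loss_pct']:.2f}%)")
```

This prints `sent 8.000KB, rcvd 8.800KB, lost 0 (0.00%)`, matching nping's own summary line.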

Please let me know how to proceed.

This report is also at: https://bugs.centos.org/view.php?id=13891

Comment 2 Nikolaos Milas 2017-09-29 21:13:08 UTC
After further tests, it seems the problem occurs mainly when mounting at boot time (through /etc/fstab). 

I have worked successfully multiple times after mounting manually, but it is important for us to be able to mount the NFS share at boot time through /etc/fstab. Mounting through /etc/fstab fails every time.

Please advise.

Comment 3 Nikolaos Milas 2017-09-30 15:18:04 UTC
Created attachment 1332697
Full messages file after reboot, including nfs mounts in /etc/fstab

Full output from /var/log/messages after reboot, with the NFS mounts present in /etc/fstab.

The following actions were performed during this period (notice how NFS hangs):

[Parallel Session 1 right after boot (see parallel session 2 below)]

[root@hesperia1 ~]# df -h
Filesystem                              Size  Used Avail Use% Mounted on
/dev/mapper/centos-root                  46G  7.6G   39G  17% /
devtmpfs                                1.9G     0  1.9G   0% /dev
tmpfs                                   1.9G     0  1.9G   0% /dev/shm
tmpfs                                   1.9G  8.6M  1.9G   1% /run
tmpfs                                   1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/mapper/vg2-lv1                     100G   94G  6.2G  94% /hesperiamount
/dev/vda1                               497M  216M  282M  44% /boot
10.201.40.34:/data/col1/noc-bkups-1      11T  1.1T  9.4T  10% /mnt/dd2500-1
10.201.40.34:/data/col1/hesperia-mount   11T  1.1T  9.4T  10% /hesperiamount2
tmpfs                                   380M     0  380M   0% /run/user/998
tmpfs                                   380M     0  380M   0% /run/user/1001
tmpfs                                   380M     0  380M   0% /run/user/0

[root@hesperia1 ~]# ls -la /hesperiamount2/isnet1/
total 1315
drwxr-xr-x 16 isnet1 isnet   3604 Apr 26 15:26 .
drwxrwxrwx  6 root   root     274 Mar  2  2017 ..
drwxr-xr-x  5 isnet1 isnet    857 Jul 17 09:47 AGGELIS_DEBUG
-rwxr-xr-x  1 isnet1 isnet  15929 Mar 24  2017 alert_processes
-rw-r--r--  1 isnet1 isnet    240 Mar 24  2017 ALERT_PROCESSES.ini
-rw-------  1 isnet1 isnet 819474 Apr 26 15:47 .bash_history
-rw-r--r--  1 isnet1 isnet    220 Apr  9  2016 .bash_logout
-rw-r--r--  1 isnet1 isnet    193 Aug  2  2016 .bash_profile
-rw-r--r--  1 isnet1 isnet   3625 Apr 25  2016 .bashrc
-rw-r--r--  1 isnet1 isnet   3515 Apr  9  2016 .bashrc_old
-rwxr-xr-x  1 isnet1 isnet     42 Feb 17  2017 ChangeTonano
-rwxr-xr-x  1 isnet1 isnet  12010 Mar 24  2017 check_processes_work_well
-rw-r--r--  1 isnet1 isnet    138 Jan 15  2017 CHECK_PROCESSES_WORK_WELL.ini
-rwxr-xr-x  1 isnet1 isnet  10287 Feb  5  2017 check_processes_work_well.save
drwx------  4 isnet1 isnet    207 Feb 23  2017 .config
-rwxr-xr-x  1 isnet1 isnet    606 Mar 24  2017 cronreleaserealtime
-rwxr-xr-x  1 isnet1 isnet   1308 Mar 24  2017 cronumasep500
drwxr-xr-x  2 isnet1 isnet    412 Mar 26  2017 Database
-rwxr-xr-x  1 isnet1 isnet    266 Mar 24  2017 GetReleaseToLocalhost
-rwxr-xr-x  1 isnet1 isnet    346 Feb 17  2017 GetUmasepLastFile
-rwxr-xr-x  1 isnet1 isnet    211 Feb 13  2017 GetUmasepToLocalhost
-rwxr-xr-x  1 isnet1 isnet    117 Apr 26 15:05 GetUmasepToLocalhostHTTP
drwxr-xr-x  3 isnet1 isnet    164 Feb 19  2017 hesperiamount
-rwxr-xr-x  1 isnet1 isnet  12203 Mar 24  2017 kernel_email
-rw-r--r--  1 isnet1 isnet     76 Mar 24  2017 KERNEL_EMAIL.ini
-rw-r--r--  1 isnet1 isnet    172 Nov  3  2015 .kshrc
-rw-r--r--  1 isnet1 isnet 135581 Jun  1 16:58 Latest_SEP_500_estimations_2017_06_01.txt
-rw-------  1 isnet1 isnet     43 Jul 17 08:09 .lesshst
drwxr-xr-x  3 isnet1 isnet    155 Feb 23  2017 .local
drwxr-x---  2 isnet1 isnet  11237 Sep 29 04:02 log
drwx------  2 isnet1 isnet    101 Apr 27  2016 Mail
-rw-------  1 isnet1 isnet   7941 Apr 26 15:26 .mysql_history
-rw-------  1 isnet1 isnet     17 Feb 11  2017 .nano_history
-rw-------  1 isnet1 isnet 200281 Apr  8 14:57 nohup.out
-rw-r--r--  1 isnet1 isnet    675 Feb 23  2017 .profile
lrwxrwxrwx  1 root   root      23 Feb 20  2017 release -> /hesperiamount/release1
drwxr-xr-x  2 isnet1 isnet    353 Mar 24  2017 RELEASE_ALERT_IMAGES
-rwxr-xr-x  1 isnet1 isnet  18967 Mar 24  2017 release_epam_realtime
-rw-r--r--  1 isnet1 isnet    382 Feb 19  2017 RELEASE_EPAM_REALTIME.ini
-rwxr-xr-x  1 isnet1 isnet  18970 Mar 24  2017 release_ephin_realtime
-rw-r--r--  1 isnet1 isnet    382 Jan 16  2017 RELEASE_EPHIN_REALTIME.ini
drwxr-xr-x  6 isnet1 isnet    413 Jan 14  2017 release_local
drwxr-xr-x  2 isnet1 isnet    174 Feb 13  2017 RELEASE_realtime
-rwxr-xr-x  1 isnet1 isnet    237 Jan 19  2017 ReleaseToComp1
-rwxr-xr-x  1 isnet1 isnet     95 Jul  4  2016 sarlmove
-rw-r--r--  1 isnet1 isnet     66 Apr 25  2016 .selected_editor
-rwxr-xr-x  1 isnet1 isnet   1610 May 20 20:18 send_email
-rwxr-xr-x  1 isnet1 isnet   1433 Mar 24  2017 send_email.py
-rw-r--r--  1 isnet1 isnet   1222 Mar 24  2017 send_email.pyc
drwx------  2 isnet1 isnet    275 Apr 26  2016 .ssh
-rwxr-xr-x  1 isnet1 isnet  21019 Apr  8 14:34 umasep500_1_minute
-rw-r--r--  1 isnet1 isnet    253 Apr 26 14:53 UMASEP_500.ini
drwxr-xr-x  4 isnet1 isnet    207 Jan 10  2017 UMASEP_500MEV_IMAGES
drwxr-xr-x  2 isnet1 isnet  19063 Sep 17 00:02 UMASEP_realtime
-rwxr-xr-x  1 isnet1 isnet    107 Jun 22  2016 UmasepToComp1
drwxr-xr-x  2 isnet1 isnet    403 Apr  9 18:46 webform
-rw-------  1 isnet1 isnet    171 Feb 14  2017 .Xauthority

[root@hesperia1 ~]# rpcdebug -v -m rpc -s all
rpc        xprt call debug nfs auth bind sched trans svcsock svcdsp misc cache

Module     Valid flags
rpc        xprt call debug nfs auth bind sched trans svcsock svcdsp misc cache
[root@hesperia1 ~]# 
[root@hesperia1 ~]# rpcdebug -v -m nfs -s all
nfs        vfs dircache lookupcache pagecache proc xdr file root callback client mount fscache pnfs pnfs_ld state

Module     Valid flags
nfs        vfs dircache lookupcache pagecache proc xdr file root callback client mount fscache pnfs pnfs_ld state
[root@hesperia1 ~]# 
[root@hesperia1 ~]# 
[root@hesperia1 ~]# 
[root@hesperia1 ~]# Disconnecting: Timeout, server not responding.
<session hung>
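The `df -h` listing at the start of session 1 shows both exports from 10.201.40.34 listed as mounted right after boot. To check this quickly after each reboot, a minimal Python sketch like the one below could pull the NFS rows out of `df` output. `nfs_mounts` is a hypothetical helper; it assumes one row per filesystem (plain `df` may wrap a long source name onto its own line, which `df -P` avoids) and no spaces in paths.

```python
def nfs_mounts(df_output):
    """Return (source, mountpoint) pairs for df rows whose source looks like host:/path.

    Assumes one row per filesystem and the six standard columns:
    source, size, used, avail, use%, mountpoint.
    """
    mounts = []
    for line in df_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) == 6 and ":/" in fields[0]:
            mounts.append((fields[0], fields[5]))
    return mounts


if __name__ == "__main__":
    sample = """Filesystem                              Size  Used Avail Use% Mounted on
/dev/mapper/centos-root                  46G  7.6G   39G  17% /
10.201.40.34:/data/col1/noc-bkups-1      11T  1.1T  9.4T  10% /mnt/dd2500-1
10.201.40.34:/data/col1/hesperia-mount   11T  1.1T  9.4T  10% /hesperiamount2
"""
    for source, target in nfs_mounts(sample):
        print(f"{source} -> {target}")
```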

[Parallel Session 2 (right after boot) on another terminal]

[root@hesperia1 ~]# rsync -azv --del --stats --progress /hesperiamount/isnet1/ /hesperiamount2/isnet1
sending incremental file list
RELEASE_ALERT_IMAGES/release_alert_merged_plots.png
      310091 100%    8.82MB/s    0:00:00 (xfer#1, to-check=1062/1153)
Disconnecting: Timeout, server not responding.



[New Terminal Session Follows, started after the above sessions hung but before they displayed "Timeout, server not responding"]

[root@hesperia1 ~]# ps axjf
 PPID   PID  PGID   SID TTY      TPGID STAT   UID   TIME COMMAND
    0     2     0     0 ?           -1 S        0   0:00 [kthreadd]
    2     3     0     0 ?           -1 S        0   0:00  \_ [ksoftirqd/0]
    2     4     0     0 ?           -1 S        0   0:00  \_ [kworker/0:0]
    2     5     0     0 ?           -1 S<       0   0:00  \_ [kworker/0:0H]
    2     6     0     0 ?           -1 S        0   0:00  \_ [kworker/u8:0]
    2     7     0     0 ?           -1 S        0   0:00  \_ [migration/0]
    2     8     0     0 ?           -1 S        0   0:00  \_ [rcu_bh]
    2     9     0     0 ?           -1 S        0   0:00  \_ [rcu_sched]
    2    10     0     0 ?           -1 S        0   0:00  \_ [watchdog/0]
    2    11     0     0 ?           -1 S        0   0:00  \_ [watchdog/1]
    2    12     0     0 ?           -1 S        0   0:00  \_ [migration/1]
    2    13     0     0 ?           -1 S        0   0:00  \_ [ksoftirqd/1]
    2    14     0     0 ?           -1 S        0   0:00  \_ [kworker/1:0]
    2    15     0     0 ?           -1 S<       0   0:00  \_ [kworker/1:0H]
    2    16     0     0 ?           -1 S        0   0:00  \_ [watchdog/2]
    2    17     0     0 ?           -1 S        0   0:00  \_ [migration/2]
    2    18     0     0 ?           -1 S        0   0:00  \_ [ksoftirqd/2]
    2    19     0     0 ?           -1 S        0   0:00  \_ [kworker/2:0]
    2    20     0     0 ?           -1 S<       0   0:00  \_ [kworker/2:0H]
    2    21     0     0 ?           -1 S        0   0:00  \_ [watchdog/3]
    2    22     0     0 ?           -1 S        0   0:00  \_ [migration/3]
    2    23     0     0 ?           -1 S        0   0:00  \_ [ksoftirqd/3]
    2    24     0     0 ?           -1 S        0   0:00  \_ [kworker/3:0]
    2    25     0     0 ?           -1 S<       0   0:00  \_ [kworker/3:0H]
    2    27     0     0 ?           -1 S        0   0:00  \_ [kdevtmpfs]
    2    28     0     0 ?           -1 S<       0   0:00  \_ [netns]
    2    29     0     0 ?           -1 S        0   0:00  \_ [khungtaskd]
    2    30     0     0 ?           -1 S<       0   0:00  \_ [writeback]
    2    31     0     0 ?           -1 S<       0   0:00  \_ [kintegrityd]
    2    32     0     0 ?           -1 S<       0   0:00  \_ [bioset]
    2    33     0     0 ?           -1 S<       0   0:00  \_ [kblockd]
    2    34     0     0 ?           -1 S<       0   0:00  \_ [md]
    2    35     0     0 ?           -1 S        0   0:00  \_ [kworker/0:1]
    2    36     0     0 ?           -1 S        0   0:00  \_ [kworker/1:1]
    2    37     0     0 ?           -1 S        0   0:00  \_ [kworker/2:1]
    2    38     0     0 ?           -1 S        0   0:00  \_ [kworker/3:1]
    2    40     0     0 ?           -1 S        0   0:00  \_ [kswapd0]
    2    41     0     0 ?           -1 SN       0   0:00  \_ [ksmd]
    2    42     0     0 ?           -1 SN       0   0:00  \_ [khugepaged]
    2    43     0     0 ?           -1 S<       0   0:00  \_ [crypto]
    2    51     0     0 ?           -1 S<       0   0:00  \_ [kthrotld]
    2    52     0     0 ?           -1 S        0   0:00  \_ [kworker/u8:1]
    2    53     0     0 ?           -1 S<       0   0:00  \_ [kmpath_rdacd]
    2    54     0     0 ?           -1 S<       0   0:00  \_ [kpsmoused]
    2    55     0     0 ?           -1 S<       0   0:00  \_ [ipv6_addrconf]
    2    74     0     0 ?           -1 S<       0   0:00  \_ [deferwq]
    2   106     0     0 ?           -1 S        0   0:00  \_ [kauditd]
    2   286     0     0 ?           -1 S<       0   0:00  \_ [ata_sff]
    2   300     0     0 ?           -1 S        0   0:00  \_ [scsi_eh_0]
    2   301     0     0 ?           -1 S<       0   0:00  \_ [scsi_tmf_0]
    2   302     0     0 ?           -1 S        0   0:00  \_ [scsi_eh_1]
    2   303     0     0 ?           -1 S<       0   0:00  \_ [scsi_tmf_1]
    2   304     0     0 ?           -1 S<       0   0:00  \_ [ttm_swap]
    2   356     0     0 ?           -1 S        0   0:00  \_ [kworker/3:2]
    2   357     0     0 ?           -1 S<       0   0:00  \_ [kworker/2:1H]
    2   359     0     0 ?           -1 S        0   0:00  \_ [kworker/2:2]
    2   399     0     0 ?           -1 S<       0   0:00  \_ [kdmflush]
    2   400     0     0 ?           -1 S<       0   0:00  \_ [bioset]
    2   411     0     0 ?           -1 S<       0   0:00  \_ [kdmflush]
    2   412     0     0 ?           -1 S<       0   0:00  \_ [bioset]
    2   425     0     0 ?           -1 S<       0   0:00  \_ [bioset]
    2   426     0     0 ?           -1 S<       0   0:00  \_ [xfsalloc]
    2   427     0     0 ?           -1 S<       0   0:00  \_ [xfs_mru_cache]
    2   428     0     0 ?           -1 S<       0   0:00  \_ [xfs-buf/dm-0]
    2   429     0     0 ?           -1 S<       0   0:00  \_ [xfs-data/dm-0]
    2   430     0     0 ?           -1 S<       0   0:00  \_ [xfs-conv/dm-0]
    2   431     0     0 ?           -1 S<       0   0:00  \_ [xfs-cil/dm-0]
    2   432     0     0 ?           -1 S<       0   0:00  \_ [xfs-reclaim/dm-]
    2   433     0     0 ?           -1 S<       0   0:00  \_ [xfs-log/dm-0]
    2   434     0     0 ?           -1 S<       0   0:00  \_ [xfs-eofblocks/d]
    2   435     0     0 ?           -1 S        0   0:00  \_ [xfsaild/dm-0]
    2   542     0     0 ?           -1 S<       0   0:00  \_ [rpciod]
    2   543     0     0 ?           -1 S<       0   0:00  \_ [xprtiod]
    2   587     0     0 ?           -1 S        0   0:00  \_ [kworker/1:2]
    2   592     0     0 ?           -1 S<       0   0:00  \_ [kworker/0:1H]
    2   602     0     0 ?           -1 S<       0   0:00  \_ [xfs-buf/vda1]
    2   603     0     0 ?           -1 S<       0   0:00  \_ [xfs-data/vda1]
    2   604     0     0 ?           -1 S<       0   0:00  \_ [xfs-conv/vda1]
    2   605     0     0 ?           -1 S<       0   0:00  \_ [xfs-cil/vda1]
    2   606     0     0 ?           -1 S<       0   0:00  \_ [xfs-reclaim/vda]
    2   607     0     0 ?           -1 S<       0   0:00  \_ [xfs-log/vda1]
    2   608     0     0 ?           -1 S<       0   0:00  \_ [xfs-eofblocks/v]
    2   609     0     0 ?           -1 S        0   0:00  \_ [xfsaild/vda1]
    2   611     0     0 ?           -1 S<       0   0:00  \_ [kworker/3:1H]
    2   614     0     0 ?           -1 S<       0   0:00  \_ [kdmflush]
    2   615     0     0 ?           -1 S<       0   0:00  \_ [bioset]
    2   622     0     0 ?           -1 S<       0   0:00  \_ [xfs-buf/dm-2]
    2   623     0     0 ?           -1 S<       0   0:00  \_ [xfs-data/dm-2]
    2   624     0     0 ?           -1 S<       0   0:00  \_ [xfs-conv/dm-2]
    2   625     0     0 ?           -1 S<       0   0:00  \_ [xfs-cil/dm-2]
    2   626     0     0 ?           -1 S<       0   0:00  \_ [xfs-reclaim/dm-]
    2   627     0     0 ?           -1 S<       0   0:00  \_ [xfs-log/dm-2]
    2   628     0     0 ?           -1 S<       0   0:00  \_ [xfs-eofblocks/d]
    2   629     0     0 ?           -1 S        0   0:00  \_ [xfsaild/dm-2]
    2   782     0     0 ?           -1 S<       0   0:00  \_ [kworker/1:1H]
    2   976     0     0 ?           -1 S<       0   0:00  \_ [nfsiod]
    2 11912     0     0 ?           -1 S        0   0:00  \_ [kworker/1:3]
    2 12394     0     0 ?           -1 S        0   0:00  \_ [kworker/2:3]
    2 12528     0     0 ?           -1 S        0   0:00  \_ [kworker/0:2]
    0     1     1     1 ?           -1 Ss       0   0:01 /usr/lib/systemd/systemd --switched-root --system --deserialize 21
    1   505   505   505 ?           -1 Ss       0   0:12 /usr/lib/systemd/systemd-journald
    1   533   533   533 ?           -1 Ss       0   0:00 /usr/sbin/lvmetad -f
    1   541   541   541 ?           -1 Ss       0   0:00 /usr/lib/systemd/systemd-udevd
    1   654   654   654 ?           -1 S<sl     0   0:00 /sbin/auditd
    1   682   682   682 ?           -1 Ss       0   0:00 /usr/sbin/irqbalance --foreground
    1   683   683   683 ?           -1 Ss       0   0:00 /usr/lib/systemd/systemd-logind
    1   684   684   684 ?           -1 Ssl      0   0:16 /usr/sbin/rsyslogd -n
    1   685   685   685 ?           -1 Ssl    999   0:00 /usr/lib/polkit-1/polkitd --no-debug
    1   687   687   687 ?           -1 Ss      81   0:00 /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
    1   703   703   703 ?           -1 Ssl      0   0:00 /usr/sbin/gssproxy -D
    1   715   715   715 ?           -1 Ssl      0   0:00 /usr/sbin/NetworkManager --no-daemon
    1   953   953   953 ?           -1 Ssl      0   0:00 /usr/bin/python -Es /usr/sbin/tuned -l -P
    1   961   961   961 ?           -1 Ss       0   0:00 /usr/sbin/sshd -D
  961 11768 11768 11768 ?           -1 Ss       0   0:00  \_ sshd: root@pts/0
11768 11770 11770 11770 pts/0    11770 Ss+      0   0:00  |   \_ -bash
  961 12423 12423 12423 ?           -1 Ss       0   0:00  \_ sshd: root@pts/1
12423 12425 12425 12425 pts/1    12445 Ss       0   0:00  |   \_ -bash
12425 12445 12445 12425 pts/1    12445 S+       0   0:00  |       \_ rsync -azv --del --stats --progress /hesperiamount/isnet1/ /hesperiamount2/isnet1
12445 12447 12445 12425 pts/1    12445 D+       0   0:01  |           \_ rsync -azv --del --stats --progress /hesperiamount/isnet1/ /hesperiamount2/isnet1
12447 12448 12445 12425 pts/1    12445 S+       0   0:00  |               \_ rsync -azv --del --stats --progress /hesperiamount/isnet1/ /hesperiamount2/isnet1
  961 12568 12568 12568 ?           -1 Ss       0   0:00  \_ sshd: root@pts/2
12568 12571 12571 12571 pts/2    12594 Ss       0   0:00      \_ -bash
12571 12594 12594 12571 pts/2    12594 R+       0   0:00          \_ ps axjf
    1   986   986   986 ?           -1 Ss      27   0:00 /bin/sh /usr/bin/mysqld_safe --basedir=/usr
  986  1293   986   986 ?           -1 Sl      27   0:00  \_ /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --log
    1  1021  1021  1021 ?           -1 Ss       0   0:00 /usr/sbin/httpd -DFOREGROUND
 1021  2826  1021  1021 ?           -1 S       48   0:00  \_ /usr/sbin/httpd -DFOREGROUND
 1021  2828  1021  1021 ?           -1 S       48   0:00  \_ /usr/sbin/httpd -DFOREGROUND
 1021  2829  1021  1021 ?           -1 S       48   0:00  \_ /usr/sbin/httpd -DFOREGROUND
 1021  2832  1021  1021 ?           -1 S       48   0:00  \_ /usr/sbin/httpd -DFOREGROUND
 1021  2835  1021  1021 ?           -1 S       48   0:00  \_ /usr/sbin/httpd -DFOREGROUND
 1021  4799  1021  1021 ?           -1 S       48   0:00  \_ /usr/sbin/httpd -DFOREGROUND
 1021  8606  1021  1021 ?           -1 S       48   0:00  \_ /usr/sbin/httpd -DFOREGROUND
    1  1040  1040  1040 ?           -1 Ss       0   0:00 /usr/sbin/crond -n
 1040 12330  1040  1040 ?           -1 S        0   0:00  \_ /usr/sbin/CROND -n
12330 12350 12350 12350 ?           -1 Ss    1001   0:00  |   \_ /bin/sh -c /hesperiamount/isnet1/check_processes_work_well > /dev/null 2>&1
12350 12360 12350 12350 ?           -1 S     1001   0:00  |       \_ /usr/bin/python /hesperiamount/isnet1/check_processes_work_well
 1040 12332  1040  1040 ?           -1 S        0   0:00  \_ /usr/sbin/CROND -n
12332 12345 12345 12345 ?           -1 Ss    1001   0:00  |   \_ /bin/sh -c /hesperiamount/isnet1/GetUmasepLastFile >> /hesperiamount/isnet1/log/UMASEP_ftp_get_
12345 12358 12345 12345 ?           -1 S     1001   0:00  |       \_ /bin/sh /hesperiamount/isnet1/GetUmasepLastFile
12358 12367 12345 12345 ?           -1 S     1001   0:00  |           \_ ftp -p -n -v spaceweather.uma.es
 1040 12481  1040  1040 ?           -1 S        0   0:00  \_ /usr/sbin/CROND -n
12481 12504 12504 12504 ?           -1 Ss    1001   0:00  |   \_ /bin/sh -c /hesperiamount/isnet1/check_processes_work_well > /dev/null 2>&1
12504 12512 12504 12504 ?           -1 S     1001   0:00  |       \_ /usr/bin/python /hesperiamount/isnet1/check_processes_work_well
 1040 12483  1040  1040 ?           -1 S        0   0:00  \_ /usr/sbin/CROND -n
12483 12498 12498 12498 ?           -1 Ss    1001   0:00      \_ /bin/sh -c /hesperiamount/isnet1/GetUmasepLastFile >> /hesperiamount/isnet1/log/UMASEP_ftp_get_
12498 12508 12498 12498 ?           -1 S     1001   0:00          \_ /bin/sh /hesperiamount/isnet1/GetUmasepLastFile
12508 12515 12498 12498 ?           -1 S     1001   0:00              \_ ftp -p -n -v spaceweather.uma.es
    1  1121  1121  1121 tty1      1121 Ss+      0   0:00 /sbin/agetty --noclear tty1 linux
    1  1365  1365  1365 ?           -1 Ss     998   0:00 /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
    1  1620  1620  1620 ?           -1 Ss       0   0:00 /usr/libexec/postfix/master -w
 1620  1687  1620  1620 ?           -1 S       89   0:00  \_ pickup -l -t unix -u
 1620  1688  1620  1620 ?           -1 S       89   0:00  \_ qmgr -l -t unix -u
[root@hesperia1 ~]# 
[root@hesperia1 ~]# 
[root@hesperia1 ~]# 
[root@hesperia1 ~]# 
[root@hesperia1 ~]# less /var/log/messages
[root@hesperia1 ~]# 
[root@hesperia1 ~]# 
[root@hesperia1 ~]# 
[root@hesperia1 ~]# 
[root@hesperia1 ~]# 
[root@hesperia1 ~]# ls -la /hesperiamount2/is
<session hung>
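In the `ps axjf` listing above, the child rsync process (PID 12447) is in state `D+`; `D` means uninterruptible sleep, usually waiting on I/O, which is what a process blocked on an unresponsive NFS mount typically shows. A minimal Python sketch that lists such processes from saved `ps axjf` output follows; `blocked_processes` is a hypothetical helper, and it assumes COMMAND is the last column, as in the output above.

```python
def blocked_processes(ps_output):
    """Return (pid, command) for processes whose STAT column starts with 'D'.

    Expects "ps axjf"-style output: a header row containing PID, STAT and
    COMMAND, with COMMAND last (it may contain spaces and tree decoration).
    """
    lines = ps_output.strip("\n").splitlines()
    header = lines[0].split()
    pid_idx, stat_idx, cmd_idx = (header.index(c) for c in ("PID", "STAT", "COMMAND"))
    blocked = []
    for line in lines[1:]:
        fields = line.split(None, cmd_idx)  # keep COMMAND as a single field
        if len(fields) <= cmd_idx:
            continue
        if fields[stat_idx].startswith("D"):
            # Drop the "|   \_ " process-tree decoration added by the 'f' option.
            command = fields[cmd_idx].lstrip(" |\\_")
            blocked.append((int(fields[pid_idx]), command))
    return blocked


if __name__ == "__main__":
    sample = (
        " PPID   PID  PGID   SID TTY      TPGID STAT   UID   TIME COMMAND\n"
        r"12425 12445 12445 12425 pts/1    12445 S+       0   0:00  |       \_ rsync -azv /a/ /b" "\n"
        r"12445 12447 12445 12425 pts/1    12445 D+       0   0:01  |           \_ rsync -azv /a/ /b" "\n"
    )
    for pid, command in blocked_processes(sample):
        print(pid, command)
```

Saving `ps axjf` output to a file from a fresh session and feeding it to this helper gives a quick list of what is stuck, without touching the hung mount.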

[New Terminal Session Follows]

[root@hesperia1 ~]# top

top - 14:10:09 up 11 min,  3 users,  load average: 3.44, 2.27, 1.11
Tasks: 157 total,   3 running, 153 sleeping,   0 stopped,   1 zombie
%Cpu(s): 48.6 us,  0.9 sy,  0.0 ni, 50.3 id,  0.0 wa,  0.0 hi,  0.1 si,  0.1 st
KiB Mem :  3881424 total,  2761644 free,   439432 used,   680348 buff/cache
KiB Swap:  4063228 total,  4063228 free,        0 used.  2879808 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                     
13508 release1  20   0  382188  94528   7760 R  99.7  2.4   0:06.70 python                                                                                      
13517 release1  20   0  376016  88396   7760 R  99.7  2.3   0:06.95 python                                                                                      
    1 root      20   0  125408   3840   2420 S   0.3  0.1   0:01.78 systemd                                                                                     
    9 root      20   0       0      0      0 S   0.3  0.0   0:00.51 rcu_sched                                                                                   
  715 root      20   0  469628   8676   6428 S   0.3  0.2   0:00.24 NetworkManager                                                                              
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kthreadd                                                                                    
    3 root      20   0       0      0      0 S   0.0  0.0   0:00.00 ksoftirqd/0                                                                                 
    5 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:0H                                                                                
    6 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kworker/u8:0                                                                                
    7 root      rt   0       0      0      0 S   0.0  0.0   0:00.02 migration/0                                                                                 
    8 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcu_bh                                                                                      
   10 root      rt   0       0      0      0 S   0.0  0.0   0:00.00 watchdog/0                                                                                  
   11 root      rt   0       0      0      0 S   0.0  0.0   0:00.00 watchdog/1                                                                                  
   12 root      rt   0       0      0      0 S   0.0  0.0   0:00.02 migration/1                                                                                 
   13 root      20   0       0      0      0 S   0.0  0.0   0:00.01 ksoftirqd/1                                                                                 
   15 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/1:0H                                                                                
   16 root      rt   0       0      0      0 S   0.0  0.0   0:00.00 watchdog/2                                                                                  
   17 root      rt   0       0      0      0 S   0.0  0.0   0:00.02 migration/2                                                                                 
   18 root      20   0       0      0      0 S   0.0  0.0   0:00.00 ksoftirqd/2                                                                                 
   20 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/2:0H                                                                                
   21 root      rt   0       0      0      0 S   0.0  0.0   0:00.00 watchdog/3                                                                                  
   22 root      rt   0       0      0      0 S   0.0  0.0   0:00.05 migration/3                                                                                 
   23 root      20   0       0      0      0 S   0.0  0.0   0:00.00 ksoftirqd/3                                                                                 
   25 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/3:0H                                                                                
   27 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kdevtmpfs                                                                                   
   28 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 netns                                                                                       
   29 root      20   0       0      0      0 S   0.0  0.0   0:00.00 khungtaskd                                                                                  
   30 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 writeback                                                                                   
   31 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kintegrityd                                                                                 
   32 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 bioset                                                                                      
   33 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kblockd                                                                                     
   34 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 md                                                                                          
   35 root      20   0       0      0      0 S   0.0  0.0   0:00.15 kworker/0:1                                                                                 
   36 root      20   0       0      0      0 S   0.0  0.0   0:00.10 kworker/1:1                                                                                 
   37 root      20   0       0      0      0 S   0.0  0.0   0:00.03 kworker/2:1                                                                                 
[root@hesperia1 ~]# 
[root@hesperia1 ~]# 
[root@hesperia1 ~]# 
[root@hesperia1 ~]# 
[root@hesperia1 ~]# ps axjf
 PPID   PID  PGID   SID TTY      TPGID STAT   UID   TIME COMMAND
    0     2     0     0 ?           -1 S        0   0:00 [kthreadd]
    2     3     0     0 ?           -1 S        0   0:00  \_ [ksoftirqd/0]
    2     5     0     0 ?           -1 S<       0   0:00  \_ [kworker/0:0H]
    2     6     0     0 ?           -1 S        0   0:00  \_ [kworker/u8:0]
    2     7     0     0 ?           -1 S        0   0:00  \_ [migration/0]
    2     8     0     0 ?           -1 S        0   0:00  \_ [rcu_bh]
    2     9     0     0 ?           -1 S        0   0:00  \_ [rcu_sched]
    2    10     0     0 ?           -1 S        0   0:00  \_ [watchdog/0]
    2    11     0     0 ?           -1 S        0   0:00  \_ [watchdog/1]
    2    12     0     0 ?           -1 S        0   0:00  \_ [migration/1]
    2    13     0     0 ?           -1 S        0   0:00  \_ [ksoftirqd/1]
    2    15     0     0 ?           -1 S<       0   0:00  \_ [kworker/1:0H]
    2    16     0     0 ?           -1 S        0   0:00  \_ [watchdog/2]
    2    17     0     0 ?           -1 S        0   0:00  \_ [migration/2]
    2    18     0     0 ?           -1 S        0   0:00  \_ [ksoftirqd/2]
    2    20     0     0 ?           -1 S<       0   0:00  \_ [kworker/2:0H]
    2    21     0     0 ?           -1 S        0   0:00  \_ [watchdog/3]
    2    22     0     0 ?           -1 S        0   0:00  \_ [migration/3]
    2    23     0     0 ?           -1 S        0   0:00  \_ [ksoftirqd/3]
    2    25     0     0 ?           -1 S<       0   0:00  \_ [kworker/3:0H]
    2    27     0     0 ?           -1 S        0   0:00  \_ [kdevtmpfs]
    2    28     0     0 ?           -1 S<       0   0:00  \_ [netns]
    2    29     0     0 ?           -1 S        0   0:00  \_ [khungtaskd]
    2    30     0     0 ?           -1 S<       0   0:00  \_ [writeback]
    2    31     0     0 ?           -1 S<       0   0:00  \_ [kintegrityd]
    2    32     0     0 ?           -1 S<       0   0:00  \_ [bioset]
    2    33     0     0 ?           -1 S<       0   0:00  \_ [kblockd]
    2    34     0     0 ?           -1 S<       0   0:00  \_ [md]
    2    35     0     0 ?           -1 S        0   0:00  \_ [kworker/0:1]
    2    36     0     0 ?           -1 S        0   0:00  \_ [kworker/1:1]
    2    37     0     0 ?           -1 S        0   0:00  \_ [kworker/2:1]
    2    38     0     0 ?           -1 S        0   0:00  \_ [kworker/3:1]
    2    40     0     0 ?           -1 S        0   0:00  \_ [kswapd0]
    2    41     0     0 ?           -1 SN       0   0:00  \_ [ksmd]
    2    42     0     0 ?           -1 SN       0   0:00  \_ [khugepaged]
    2    43     0     0 ?           -1 S<       0   0:00  \_ [crypto]
    2    51     0     0 ?           -1 S<       0   0:00  \_ [kthrotld]
    2    52     0     0 ?           -1 S        0   0:00  \_ [kworker/u8:1]
    2    53     0     0 ?           -1 S<       0   0:00  \_ [kmpath_rdacd]
    2    54     0     0 ?           -1 S<       0   0:00  \_ [kpsmoused]
    2    55     0     0 ?           -1 S<       0   0:00  \_ [ipv6_addrconf]
    2    74     0     0 ?           -1 S<       0   0:00  \_ [deferwq]
    2   106     0     0 ?           -1 S        0   0:00  \_ [kauditd]
    2   286     0     0 ?           -1 S<       0   0:00  \_ [ata_sff]
    2   300     0     0 ?           -1 S        0   0:00  \_ [scsi_eh_0]
    2   301     0     0 ?           -1 S<       0   0:00  \_ [scsi_tmf_0]
    2   302     0     0 ?           -1 S        0   0:00  \_ [scsi_eh_1]
    2   303     0     0 ?           -1 S<       0   0:00  \_ [scsi_tmf_1]
    2   304     0     0 ?           -1 S<       0   0:00  \_ [ttm_swap]
    2   356     0     0 ?           -1 S        0   0:00  \_ [kworker/3:2]
    2   357     0     0 ?           -1 S<       0   0:00  \_ [kworker/2:1H]
    2   399     0     0 ?           -1 S<       0   0:00  \_ [kdmflush]
    2   400     0     0 ?           -1 S<       0   0:00  \_ [bioset]
    2   411     0     0 ?           -1 S<       0   0:00  \_ [kdmflush]
    2   412     0     0 ?           -1 S<       0   0:00  \_ [bioset]
    2   425     0     0 ?           -1 S<       0   0:00  \_ [bioset]
    2   426     0     0 ?           -1 S<       0   0:00  \_ [xfsalloc]
    2   427     0     0 ?           -1 S<       0   0:00  \_ [xfs_mru_cache]
    2   428     0     0 ?           -1 S<       0   0:00  \_ [xfs-buf/dm-0]
    2   429     0     0 ?           -1 S<       0   0:00  \_ [xfs-data/dm-0]
    2   430     0     0 ?           -1 S<       0   0:00  \_ [xfs-conv/dm-0]
    2   431     0     0 ?           -1 S<       0   0:00  \_ [xfs-cil/dm-0]
    2   432     0     0 ?           -1 S<       0   0:00  \_ [xfs-reclaim/dm-]
    2   433     0     0 ?           -1 S<       0   0:00  \_ [xfs-log/dm-0]
    2   434     0     0 ?           -1 S<       0   0:00  \_ [xfs-eofblocks/d]
    2   435     0     0 ?           -1 S        0   0:00  \_ [xfsaild/dm-0]
    2   542     0     0 ?           -1 S<       0   0:00  \_ [rpciod]
    2   543     0     0 ?           -1 S<       0   0:00  \_ [xprtiod]
    2   592     0     0 ?           -1 S<       0   0:00  \_ [kworker/0:1H]
    2   602     0     0 ?           -1 S<       0   0:00  \_ [xfs-buf/vda1]
    2   603     0     0 ?           -1 S<       0   0:00  \_ [xfs-data/vda1]
    2   604     0     0 ?           -1 S<       0   0:00  \_ [xfs-conv/vda1]
    2   605     0     0 ?           -1 S<       0   0:00  \_ [xfs-cil/vda1]
    2   606     0     0 ?           -1 S<       0   0:00  \_ [xfs-reclaim/vda]
    2   607     0     0 ?           -1 S<       0   0:00  \_ [xfs-log/vda1]
    2   608     0     0 ?           -1 S<       0   0:00  \_ [xfs-eofblocks/v]
    2   609     0     0 ?           -1 S        0   0:00  \_ [xfsaild/vda1]
    2   611     0     0 ?           -1 S<       0   0:00  \_ [kworker/3:1H]
    2   614     0     0 ?           -1 S<       0   0:00  \_ [kdmflush]
    2   615     0     0 ?           -1 S<       0   0:00  \_ [bioset]
    2   622     0     0 ?           -1 S<       0   0:00  \_ [xfs-buf/dm-2]
    2   623     0     0 ?           -1 S<       0   0:00  \_ [xfs-data/dm-2]
    2   624     0     0 ?           -1 S<       0   0:00  \_ [xfs-conv/dm-2]
    2   625     0     0 ?           -1 S<       0   0:00  \_ [xfs-cil/dm-2]
    2   626     0     0 ?           -1 S<       0   0:00  \_ [xfs-reclaim/dm-]
    2   627     0     0 ?           -1 S<       0   0:00  \_ [xfs-log/dm-2]
    2   628     0     0 ?           -1 S<       0   0:00  \_ [xfs-eofblocks/d]
    2   629     0     0 ?           -1 S        0   0:00  \_ [xfsaild/dm-2]
    2   782     0     0 ?           -1 S<       0   0:00  \_ [kworker/1:1H]
    2   976     0     0 ?           -1 S<       0   0:00  \_ [nfsiod]
    2 12394     0     0 ?           -1 S        0   0:00  \_ [kworker/2:3]
    2 12528     0     0 ?           -1 S        0   0:00  \_ [kworker/0:2]
    2 12770     0     0 ?           -1 S        0   0:00  \_ [kworker/1:0]
    2 12925     0     0 ?           -1 S        0   0:00  \_ [kworker/1:3]
    2 12940     0     0 ?           -1 S        0   0:00  \_ [kworker/3:0]
    2 13265     0     0 ?           -1 S        0   0:00  \_ [kworker/2:0]
    2 13745     0     0 ?           -1 S        0   0:00  \_ [kworker/3:3]
    2 13902     0     0 ?           -1 S        0   0:00  \_ [kworker/0:0]
    0     1     1     1 ?           -1 Ss       0   0:01 /usr/lib/systemd/systemd --switched-root --system --deserialize 21
    1   505   505   505 ?           -1 Ss       0   0:12 /usr/lib/systemd/systemd-journald
    1   533   533   533 ?           -1 Ss       0   0:00 /usr/sbin/lvmetad -f
    1   541   541   541 ?           -1 Ss       0   0:00 /usr/lib/systemd/systemd-udevd
    1   654   654   654 ?           -1 S<sl     0   0:00 /sbin/auditd
    1   682   682   682 ?           -1 Ss       0   0:00 /usr/sbin/irqbalance --foreground
    1   683   683   683 ?           -1 Ss       0   0:00 /usr/lib/systemd/systemd-logind
    1   684   684   684 ?           -1 Ssl      0   0:16 /usr/sbin/rsyslogd -n
    1   685   685   685 ?           -1 Ssl    999   0:00 /usr/lib/polkit-1/polkitd --no-debug
    1   687   687   687 ?           -1 Ss      81   0:00 /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
    1   703   703   703 ?           -1 Ssl      0   0:00 /usr/sbin/gssproxy -D
    1   715   715   715 ?           -1 Ssl      0   0:00 /usr/sbin/NetworkManager --no-daemon
    1   953   953   953 ?           -1 Ssl      0   0:00 /usr/bin/python -Es /usr/sbin/tuned -l -P
    1   961   961   961 ?           -1 Ss       0   0:00 /usr/sbin/sshd -D
  961 12568 12568 12568 ?           -1 Ss       0   0:00  \_ sshd: root@pts/2
12568 12571 12571 12571 pts/2    12571 Ds+      0   0:00  |   \_ -bash
  961 12817 12817 12817 ?           -1 Ss       0   0:00  \_ sshd: root@pts/3
12817 12819 12819 12819 pts/3    12945 Ss       0   0:00  |   \_ -bash
12819 12945 12945 12819 pts/3    12945 D+       0   0:00  |       \_ ls --color=auto -la /hesperiamount2/
  961 13072 13072 13072 ?           -1 Ss       0   0:00  \_ sshd: root@pts/4
13072 13101 13101 13101 pts/4    13946 Ss       0   0:00      \_ -bash
13101 13946 13946 13101 pts/4    13946 R+       0   0:00          \_ ps axjf
    1   986   986   986 ?           -1 Ss      27   0:00 /bin/sh /usr/bin/mysqld_safe --basedir=/usr
  986  1293   986   986 ?           -1 Sl      27   0:01  \_ /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --log
    1  1021  1021  1021 ?           -1 Ss       0   0:00 /usr/sbin/httpd -DFOREGROUND
 1021  2826  1021  1021 ?           -1 S       48   0:00  \_ /usr/sbin/httpd -DFOREGROUND
 1021  2828  1021  1021 ?           -1 S       48   0:00  \_ /usr/sbin/httpd -DFOREGROUND
 1021  2829  1021  1021 ?           -1 S       48   0:00  \_ /usr/sbin/httpd -DFOREGROUND
 1021  2832  1021  1021 ?           -1 S       48   0:00  \_ /usr/sbin/httpd -DFOREGROUND
 1021  2835  1021  1021 ?           -1 S       48   0:00  \_ /usr/sbin/httpd -DFOREGROUND
 1021  4799  1021  1021 ?           -1 S       48   0:00  \_ /usr/sbin/httpd -DFOREGROUND
 1021  8606  1021  1021 ?           -1 S       48   0:00  \_ /usr/sbin/httpd -DFOREGROUND
    1  1040  1040  1040 ?           -1 Ss       0   0:00 /usr/sbin/crond -n
 1040 13678  1040  1040 ?           -1 S        0   0:00  \_ /usr/sbin/CROND -n
13678 13697 13697 13697 ?           -1 Ss    1001   0:00  |   \_ /bin/sh -c /hesperiamount/isnet1/check_processes_work_well > /dev/null 2>&1
13697 13710 13697 13697 ?           -1 S     1001   0:00  |       \_ /usr/bin/python /hesperiamount/isnet1/check_processes_work_well
 1040 13680  1040  1040 ?           -1 S        0   0:00  \_ /usr/sbin/CROND -n
13680 13699 13699 13699 ?           -1 Ss    1001   0:00  |   \_ /bin/sh -c /hesperiamount/isnet1/GetUmasepLastFile >> /hesperiamount/isnet1/log/UMASEP_ftp_get_
13699 13702 13699 13699 ?           -1 S     1001   0:00  |       \_ /bin/sh /hesperiamount/isnet1/GetUmasepLastFile
13702 13703 13699 13699 ?           -1 S     1001   0:00  |           \_ ftp -p -n -v spaceweather.uma.es
 1040 13853  1040  1040 ?           -1 S        0   0:00  \_ /usr/sbin/CROND -n
13853 13876 13876 13876 ?           -1 Ss    1001   0:00  |   \_ /bin/sh -c /hesperiamount/isnet1/check_processes_work_well > /dev/null 2>&1
13876 13886 13876 13876 ?           -1 S     1001   0:00  |       \_ /usr/bin/python /hesperiamount/isnet1/check_processes_work_well
 1040 13855  1040  1040 ?           -1 S        0   0:00  \_ /usr/sbin/CROND -n
13855 13864 13864 13864 ?           -1 Ss    1001   0:00      \_ /bin/sh -c /hesperiamount/isnet1/GetUmasepLastFile >> /hesperiamount/isnet1/log/UMASEP_ftp_get_
13864 13885 13864 13864 ?           -1 S     1001   0:00          \_ /bin/sh /hesperiamount/isnet1/GetUmasepLastFile
13885 13888 13864 13864 ?           -1 S     1001   0:00              \_ ftp -p -n -v spaceweather.uma.es
    1  1121  1121  1121 tty1      1121 Ss+      0   0:00 /sbin/agetty --noclear tty1 linux
    1  1365  1365  1365 ?           -1 Ss     998   0:00 /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
    1  1620  1620  1620 ?           -1 Ss       0   0:00 /usr/libexec/postfix/master -w
 1620  1687  1620  1620 ?           -1 S       89   0:00  \_ pickup -l -t unix -u
 1620  1688  1620  1620 ?           -1 S       89   0:00  \_ qmgr -l -t unix -u
    1 12447 12445 12425 ?           -1 D        0   0:01 rsync -azv --del --stats --progress /hesperiamount/isnet1/ /hesperiamount2/isnet1
12447 12448 12445 12425 ?           -1 Z        0   0:00  \_ [rsync] <defunct>
    1 13736 13696 13696 ?           -1 S     1001   0:00 python umasep500_1_minute
[root@hesperia1 ~]# 
[root@hesperia1 ~]# ps -l 12447
F S   UID   PID  PPID  C PRI  NI ADDR SZ WCHAN  TTY        TIME CMD
1 D     0 12447     1  0  80   0 - 29685 rpc_wa ?          0:01 rsync -azv --del --stats --progress /hesperiamount/isnet1/ /hesperiamount2/isnet1
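
The "ps -l" output above shows rsync in state D (uninterruptible sleep), with a wait channel that ps truncates to "rpc_wa". The following minimal sketch assumes a Linux /proc layout like the one on this CentOS 7 box. It lists every task currently in state D together with the untruncated wait-channel name from /proc/<pid>/wchan, which can help show which processes are blocked on the NFS mount at a given moment. The function names are hypothetical and not part of any existing tool.

```python
import os

def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so we split on the *last* ')' instead of on whitespace.
    """
    lpar = text.index("(")
    rpar = text.rindex(")")
    pid = int(text[:lpar].strip())
    comm = text[lpar + 1:rpar]
    state = text[rpar + 2]  # the state field follows ") "
    return pid, comm, state

def read_text(path):
    try:
        with open(path) as f:
            return f.read()
    except OSError:  # the process exited or access was denied
        return None

def uninterruptible_tasks(proc="/proc"):
    """Yield (pid, comm, wchan) for every task currently in state D."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        stat = read_text(os.path.join(proc, entry, "stat"))
        if stat is None:
            continue
        pid, comm, state = parse_stat(stat)
        if state == "D":
            wchan = read_text(os.path.join(proc, entry, "wchan")) or "?"
            yield pid, comm, wchan.strip()

if __name__ == "__main__":
    for pid, comm, wchan in uninterruptible_tasks():
        print(f"{pid:>7} {comm:<16} {wchan}")
```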

[root@hesperia1 log]# cat /proc/self/mounts
rootfs / rootfs rw 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
devtmpfs /dev devtmpfs rw,nosuid,size=1929660k,nr_inodes=482415,mode=755 0 0
securityfs /sys/kernel/security securityfs rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /run tmpfs rw,nosuid,nodev,mode=755 0 0
tmpfs /sys/fs/cgroup tmpfs ro,nosuid,nodev,noexec,mode=755 0 0
cgroup /sys/fs/cgroup/systemd cgroup rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd 0 0
pstore /sys/fs/pstore pstore rw,nosuid,nodev,noexec,relatime 0 0
cgroup /sys/fs/cgroup/freezer cgroup rw,nosuid,nodev,noexec,relatime,freezer 0 0
cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpuacct,cpu 0 0
cgroup /sys/fs/cgroup/net_cls,net_prio cgroup rw,nosuid,nodev,noexec,relatime,net_prio,net_cls 0 0
cgroup /sys/fs/cgroup/perf_event cgroup rw,nosuid,nodev,noexec,relatime,perf_event 0 0
cgroup /sys/fs/cgroup/cpuset cgroup rw,nosuid,nodev,noexec,relatime,cpuset 0 0
cgroup /sys/fs/cgroup/devices cgroup rw,nosuid,nodev,noexec,relatime,devices 0 0
cgroup /sys/fs/cgroup/blkio cgroup rw,nosuid,nodev,noexec,relatime,blkio 0 0
cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0
cgroup /sys/fs/cgroup/hugetlb cgroup rw,nosuid,nodev,noexec,relatime,hugetlb 0 0
cgroup /sys/fs/cgroup/pids cgroup rw,nosuid,nodev,noexec,relatime,pids 0 0
configfs /sys/kernel/config configfs rw,relatime 0 0
/dev/mapper/centos-root / xfs rw,relatime,attr2,inode64,noquota 0 0
systemd-1 /proc/sys/fs/binfmt_misc autofs rw,relatime,fd=26,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=12052 0 0
hugetlbfs /dev/hugepages hugetlbfs rw,relatime 0 0
debugfs /sys/kernel/debug debugfs rw,relatime 0 0
mqueue /dev/mqueue mqueue rw,relatime 0 0
binfmt_misc /proc/sys/fs/binfmt_misc binfmt_misc rw,relatime 0 0
nfsd /proc/fs/nfsd nfsd rw,relatime 0 0
/dev/mapper/vg2-lv1 /hesperiamount xfs rw,relatime,attr2,inode64,noquota 0 0
/dev/vda1 /boot xfs rw,relatime,attr2,inode64,noquota 0 0
sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0
10.201.40.34:/data/col1/noc-bkups-1 /mnt/dd2500-1 nfs rw,noatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,acregmin=1800,acregmax=1800,acdirmin=1800,acdirmax=1800,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.201.40.34,mountvers=3,mountport=2052,mountproto=tcp,local_lock=all,addr=10.201.40.34 0 0
10.201.40.34:/data/col1/hesperia-mount /hesperiamount2 nfs rw,noatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,acregmin=1800,acregmax=1800,acdirmin=1800,acdirmax=1800,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.201.40.34,mountvers=3,mountport=2052,mountproto=tcp,local_lock=all,addr=10.201.40.34 0 0
tmpfs /run/user/998 tmpfs rw,nosuid,nodev,relatime,size=388144k,mode=700,uid=998,gid=997 0 0
tmpfs /run/user/1001 tmpfs rw,nosuid,nodev,relatime,size=388144k,mode=700,uid=1001,gid=1002 0 0
tmpfs /run/user/0 tmpfs rw,nosuid,nodev,relatime,size=388144k,mode=700 0 0
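
A small parser can make the NFS options in /proc/self/mounts easier to compare between test runs. This sketch assumes that only "nfs"/"nfs4" entries matter. Per nfs(5), "timeo" is expressed in tenths of a second, so "timeo=600" means 60 seconds. A "hard" mount, which is also the default, retries requests indefinitely instead of returning an error. The function names are illustrative.

```python
def nfs_mounts(mounts_text):
    """Map mount point -> dict of options for every nfs/nfs4 line.

    Each line of /proc/self/mounts is: device mountpoint fstype options dump pass.
    Spaces inside paths are escaped as \\040, so a plain split() is safe.
    Options without '=' (e.g. 'hard', 'nolock') map to True.
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        opts = {}
        for opt in fields[3].split(","):
            key, sep, value = opt.partition("=")
            opts[key] = value if sep else True
        result[fields[1]] = opts
    return result

def describe(opts):
    """Summarize the retry-related options of one NFS mount."""
    # hard is the default when neither hard nor soft is given (nfs(5))
    mode = "soft" if opts.get("soft") else "hard"
    # timeo is expressed in tenths of a second
    timeo_s = int(opts.get("timeo", 0)) / 10
    return f"{mode}, timeo={timeo_s:g}s, retrans={opts.get('retrans')}, proto={opts.get('proto')}"

if __name__ == "__main__":
    with open("/proc/self/mounts") as f:
        for mnt, opts in nfs_mounts(f.read()).items():
            print(mnt, "->", describe(opts))
```

For the /hesperiamount2 entry above, this would print "/hesperiamount2 -> hard, timeo=60s, retrans=2, proto=tcp".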

[root@hesperia1 log]# showmount --all
clnt_create: RPC: Program not registered

[root@hesperia1 log]# mount -l -t nfs
10.201.40.34:/data/col1/noc-bkups-1 on /mnt/dd2500-1 type nfs (rw,noatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,acregmin=1800,acregmax=1800,acdirmin=1800,acdirmax=1800,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.201.40.34,mountvers=3,mountport=2052,mountproto=tcp,local_lock=all,addr=10.201.40.34)
10.201.40.34:/data/col1/hesperia-mount on /hesperiamount2 type nfs (rw,noatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,acregmin=1800,acregmax=1800,acdirmin=1800,acdirmax=1800,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.201.40.34,mountvers=3,mountport=2052,mountproto=tcp,local_lock=all,addr=10.201.40.34)

[root@hesperia1 log]# showmount -e 10.201.40.34
Export list for 10.201.40.34:
/data/col1/hesperia-mount 195.251.204.197
/data/col1/noc-bkups-1    195.251.204.192/28

Comment 4 Nikolaos Milas 2017-10-04 12:07:59 UTC
Problem solved by changing the NFS export options (of the NFS shared directory, on the data storage system) from secure to insecure. That is, I changed from:

    rw,no_root_squash,no_all_squash,secure,nolog

to:

    rw,no_root_squash,no_all_squash,insecure,nolog

I don't know whether the behavior I described can be explained by (or is expected with) the "secure" option, but since I changed to "insecure", everything works fine with the latest packages on CentOS 7.4 (kernel 3.10.0-693.2.2.el7.x86_64 and rpcbind-0.2.0-42.el7.x86_64).

I can't tell whether this issue needs further examination and/or source code changes/improvements.
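
For context on the "secure"/"insecure" change: exports(5) says "secure" requires requests to originate on an Internet port below IPPORT_RESERVED (1024), and "insecure" turns that check off. The sketch below reads /proc/net/tcp on the client and reports which local ports its established TCP connections to the server's NFS port are using. It assumes IPv4 and the standard NFS port 2049 (the mounts above do not set "port="), and it only inspects connections without changing anything. The storage system here may not be a Linux knfsd server, so it could handle "secure" differently.

```python
import socket
import struct

NFS_PORT = 2049         # assumption: standard NFS port (the mounts do not set port=)
IPPORT_RESERVED = 1024  # 'secure' exports require a source port below this (exports(5))
TCP_ESTABLISHED = "01"  # state code for ESTABLISHED in /proc/net/tcp

def decode_addr(hex_addr):
    """Decode an 'AABBCCDD:PPPP' field from /proc/net/tcp into (ip, port).

    The kernel prints the IPv4 address as a 32-bit integer in host byte order,
    so packing it back with native order ('=I') recovers the original bytes.
    The port is printed as a plain hexadecimal number.
    """
    ip_hex, port_hex = hex_addr.split(":")
    ip = socket.inet_ntoa(struct.pack("=I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)

def nfs_connections(tcp_table, nfs_port=NFS_PORT):
    """Yield (local_port, remote_ip, privileged) for established connections to nfs_port."""
    for line in tcp_table.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        _, local_port = decode_addr(fields[1])
        remote_ip, remote_port = decode_addr(fields[2])
        if remote_port == nfs_port and fields[3] == TCP_ESTABLISHED:
            yield local_port, remote_ip, local_port < IPPORT_RESERVED

if __name__ == "__main__":
    with open("/proc/net/tcp") as f:
        for local_port, server_ip, privileged in nfs_connections(f.read()):
            kind = "privileged" if privileged else "unprivileged, would not satisfy 'secure'"
            print(f"local port {local_port} -> {server_ip}:{NFS_PORT} ({kind})")
```

According to nfs(5), the Linux client uses a privileged source port by default unless "noresvport" is given. Neither mount above sets that option, so on this client the check would be expected to report privileged ports.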

Comment 5 Nikolaos Milas 2017-10-06 09:58:38 UTC
After a couple of days the problem started occurring again, so the setting above evidently did not resolve the issue after all.

Here is a test performed today (2017-10-06), for which I am attaching a tcpdump capture of the traffic between the box under investigation and the storage server (which exports the directories).

I booted kernel 3.10.0-693.2.2.el7.x86_64 with debugging enabled.

The capture for this session (recorded with the command shown in Terminal Window 1 below) is attached as hesperia-nfs-003.zip.

I also attach the messages log for the session (hesperia-messages-20171006-01.txt).

The NFS mounts in /etc/fstab are as follows:

----------------------------------------------------------
/etc/fstab:
-----------

[root@hesperia1 ~]# cat /etc/fstab

#
# /etc/fstab
# Created by anaconda on Mon Jul  6 14:29:42 2015
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
/dev/mapper/centos-root /                       xfs     defaults        0 0
UUID=7a3ae70a-8ef3-463b-8f5b-be4e2e7be894 /boot                   xfs     defaults        0 0
/dev/mapper/centos-swap swap                    swap    defaults        0 0
/dev/mapper/vg2-lv1     /hesperiamount          xfs     defaults        0 0
#
10.201.40.34:/data/col1/noc-bkups-1   /mnt/dd2500-1   nfs hard,intr,nolock,nfsvers=3,tcp,rsize=1048600,wsize=1048600,bg 0 0
10.201.40.34:/data/col1/hesperia-mount   /hesperiamount2   nfs hard,intr,nolock,nfsvers=3,tcp,rsize=1048600,wsize=1048600,bg 0 0
#
# 10.201.40.34:/data/col1/noc-bkups-1   /mnt/dd2500-1   nfs auto,noatime,nolock,bg,nfsvers=3,intr,tcp,actimeo=1800 0 0
# 10.201.40.34:/data/col1/hesperia-mount   /hesperiamount2   nfs auto,noatime,nolock,bg,nfsvers=3,intr,tcp,actimeo=1800 0 0
----------------------------------------------------------
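
The fstab entries request "rsize=1048600,wsize=1048600", while the mount output earlier in this report shows "rsize=1048576,wsize=1048576". nfs(5) accounts for this difference. The largest read/write payload the Linux NFS client supports is 1,048,576 bytes, and larger values are replaced with 1048576. Values below 1024 are replaced with 4096, and in-range values that are not multiples of 1024 are rounded down. The sketch below encodes those rules. It does not model negotiation with the server, which can lower the size further, and its function names are hypothetical. nfs(5) also notes that "intr" is ignored on kernels after 2.6.25.

```python
NFS_MAX_IO = 1048576  # largest rsize/wsize the Linux NFS client supports (nfs(5))

def effective_io_size(requested):
    """Apply the nfs(5) clamping/rounding rules to a requested rsize or wsize.

    Negotiation with the server may lower the result further; that step is
    not modelled here.
    """
    if requested < 1024:
        return 4096
    if requested > NFS_MAX_IO:
        return NFS_MAX_IO
    return requested // 1024 * 1024  # round down to a multiple of 1024

def requested_io_sizes(fstab_line):
    """Return (rsize, wsize) requested by one NFS line of /etc/fstab (None if absent)."""
    options = fstab_line.split()[3].split(",")
    values = dict(opt.partition("=")[::2] for opt in options)
    rsize = values.get("rsize")
    wsize = values.get("wsize")
    return (int(rsize) if rsize else None, int(wsize) if wsize else None)

if __name__ == "__main__":
    line = ("10.201.40.34:/data/col1/hesperia-mount   /hesperiamount2   nfs "
            "hard,intr,nolock,nfsvers=3,tcp,rsize=1048600,wsize=1048600,bg 0 0")
    rsize, wsize = requested_io_sizes(line)
    print(effective_io_size(rsize), effective_io_size(wsize))  # 1048576 1048576
```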

As shown below, I ran the rsync command, and shortly afterwards all sessions hung.

----------------------------------------------------------
Terminal Window 1
-----------------

[root@hesperia1 ~]# rpcdebug -v -m rpc -s all
rpc        xprt call debug nfs auth bind sched trans svcsock svcdsp misc cache

Module     Valid flags
rpc        xprt call debug nfs auth bind sched trans svcsock svcdsp misc cache
[root@hesperia1 ~]# rpcdebug -v -m nfs -s all
nfs        vfs dircache lookupcache pagecache proc xdr file root callback client mount fscache pnfs pnfs_ld state

Module     Valid flags
nfs        vfs dircache lookupcache pagecache proc xdr file root callback client mount fscache pnfs pnfs_ld state
[root@hesperia1 ~]# 
[root@hesperia1 ~]# 
[root@hesperia1 ~]# tcpdump -w dumps/hesperia-nfs-003 -i eth0 -s 0 host 10.201.40.34 &
[1] 1608
[root@hesperia1 ~]# tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
<Later...>
[root@hesperia1 ~]# Disconnecting: Timeout, server not responding.
----------------------------------------------------------
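
To see from the capture when the server stopped answering, one approach is to count the TCP segments sent to and from the NFS port and note the last timestamp in each direction. The sketch below does that using only the Python standard library. It makes several assumptions: the file written by "tcpdump -w" is a classic libpcap file with microsecond timestamps, the frames are untagged Ethernet/IPv4 (tcpdump reported link type EN10MB above), and NFS uses port 2049. It does not handle pcapng, VLAN tags, or IPv6.

```python
import struct
import sys

NFS_PORT = 2049        # assumption: standard NFS port
LINKTYPE_ETHERNET = 1  # shown as "EN10MB" by tcpdump

def iter_packets(data):
    """Yield (timestamp, frame) pairs from a classic libpcap capture held in memory."""
    if data[:4] == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # written on a little-endian host
    elif data[:4] == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # written on a big-endian host
    else:
        raise ValueError("not a classic microsecond pcap file (pcapng is not handled)")
    linktype = struct.unpack_from(endian + "I", data, 20)[0]
    if linktype != LINKTYPE_ETHERNET:
        raise ValueError(f"unsupported link type {linktype}")
    offset = 24  # size of the global header
    while offset + 16 <= len(data):
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack_from(endian + "IIII", data, offset)
        offset += 16
        yield ts_sec + ts_usec / 1e6, data[offset:offset + incl_len]
        offset += incl_len

def tcp_ports(frame):
    """Return (src_port, dst_port) for an untagged Ethernet/IPv4/TCP frame, else None."""
    if len(frame) < 14 + 20 or frame[12:14] != b"\x08\x00":  # EtherType IPv4
        return None
    ip = frame[14:]
    ihl = (ip[0] & 0x0F) * 4  # IPv4 header length in bytes
    if ip[9] != 6 or len(ip) < ihl + 4:  # protocol 6 = TCP
        return None
    return struct.unpack_from("!HH", ip, ihl)

def summarize(data, nfs_port=NFS_PORT):
    """Count NFS segments per direction and record the last timestamp seen in each."""
    stats = {"to_server": [0, None], "from_server": [0, None]}
    for ts, frame in iter_packets(data):
        ports = tcp_ports(frame)
        if ports is None:
            continue
        src, dst = ports
        key = "to_server" if dst == nfs_port else "from_server" if src == nfs_port else None
        if key:
            stats[key][0] += 1
            stats[key][1] = ts
    return stats

if __name__ == "__main__":
    with open(sys.argv[1], "rb") as f:  # reads the whole capture into memory
        stats = summarize(f.read())
    for direction, (count, last_ts) in stats.items():
        print(f"{direction}: {count} segments, last at {last_ts}")
```

It could be run as "python3 nfs_pcap_summary.py dumps/hesperia-nfs-003" (the script name is hypothetical). A large gap between the last "to_server" and the last "from_server" timestamps would be consistent with requests going unanswered. The capture should still be checked in a protocol dissector before drawing conclusions.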

----------------------------------------------------------
Terminal Window 2
-----------------

[root@hesperia1 ~]# rsync -azv --del --stats --progress /hesperiamount/isnet1/ /hesperiamount2/isnet1
sending incremental file list
RELEASE_ALERT_IMAGES/release_alert_merged_plots.png
      315851 100%    8.44MB/s    0:00:00 (xfer#1, to-check=1062/1153)

<Later...>
Disconnecting: Timeout, server not responding.
---------------------------------------------------------

----------------------------------------------------------
Terminal Window 3
-----------------

[root@hesperia1 ~]# ps axjf
 PPID   PID  PGID   SID TTY      TPGID STAT   UID   TIME COMMAND
    0     2     0     0 ?           -1 S        0   0:00 [kthreadd]
    2     3     0     0 ?           -1 S        0   0:00  \_ [ksoftirqd/0]
    2     4     0     0 ?           -1 S        0   0:00  \_ [kworker/0:0]
    2     5     0     0 ?           -1 S<       0   0:00  \_ [kworker/0:0H]
    2     6     0     0 ?           -1 S        0   0:00  \_ [kworker/u8:0]
    2     7     0     0 ?           -1 S        0   0:00  \_ [migration/0]
    2     8     0     0 ?           -1 S        0   0:00  \_ [rcu_bh]
    2     9     0     0 ?           -1 S        0   0:00  \_ [rcu_sched]
    2    10     0     0 ?           -1 S        0   0:00  \_ [watchdog/0]
    2    11     0     0 ?           -1 S        0   0:00  \_ [watchdog/1]
    2    12     0     0 ?           -1 S        0   0:00  \_ [migration/1]
    2    13     0     0 ?           -1 S        0   0:00  \_ [ksoftirqd/1]
    2    14     0     0 ?           -1 S        0   0:00  \_ [kworker/1:0]
    2    15     0     0 ?           -1 S<       0   0:00  \_ [kworker/1:0H]
    2    16     0     0 ?           -1 S        0   0:00  \_ [watchdog/2]
    2    17     0     0 ?           -1 S        0   0:00  \_ [migration/2]
    2    18     0     0 ?           -1 S        0   0:00  \_ [ksoftirqd/2]
    2    19     0     0 ?           -1 S        0   0:00  \_ [kworker/2:0]
    2    20     0     0 ?           -1 S<       0   0:00  \_ [kworker/2:0H]
    2    21     0     0 ?           -1 S        0   0:00  \_ [watchdog/3]
    2    22     0     0 ?           -1 S        0   0:00  \_ [migration/3]
    2    23     0     0 ?           -1 S        0   0:00  \_ [ksoftirqd/3]
    2    24     0     0 ?           -1 S        0   0:00  \_ [kworker/3:0]
    2    25     0     0 ?           -1 S<       0   0:00  \_ [kworker/3:0H]
    2    27     0     0 ?           -1 S        0   0:00  \_ [kdevtmpfs]
    2    28     0     0 ?           -1 S<       0   0:00  \_ [netns]
    2    29     0     0 ?           -1 S        0   0:00  \_ [khungtaskd]
    2    30     0     0 ?           -1 S<       0   0:00  \_ [writeback]
    2    31     0     0 ?           -1 S<       0   0:00  \_ [kintegrityd]
    2    32     0     0 ?           -1 S<       0   0:00  \_ [bioset]
    2    33     0     0 ?           -1 S<       0   0:00  \_ [kblockd]
    2    34     0     0 ?           -1 S<       0   0:00  \_ [md]
    2    35     0     0 ?           -1 S        0   0:00  \_ [kworker/0:1]
    2    36     0     0 ?           -1 S        0   0:00  \_ [kworker/1:1]
    2    37     0     0 ?           -1 S        0   0:00  \_ [kworker/2:1]
    2    38     0     0 ?           -1 S        0   0:00  \_ [kworker/3:1]
    2    40     0     0 ?           -1 S        0   0:00  \_ [kswapd0]
    2    41     0     0 ?           -1 SN       0   0:00  \_ [ksmd]
    2    42     0     0 ?           -1 SN       0   0:00  \_ [khugepaged]
    2    43     0     0 ?           -1 S<       0   0:00  \_ [crypto]
    2    51     0     0 ?           -1 S<       0   0:00  \_ [kthrotld]
    2    52     0     0 ?           -1 S        0   0:00  \_ [kworker/u8:1]
    2    53     0     0 ?           -1 S<       0   0:00  \_ [kmpath_rdacd]
    2    54     0     0 ?           -1 S<       0   0:00  \_ [kpsmoused]
    2    55     0     0 ?           -1 S<       0   0:00  \_ [ipv6_addrconf]
    2    74     0     0 ?           -1 S<       0   0:00  \_ [deferwq]
    2   106     0     0 ?           -1 S        0   0:00  \_ [kworker/3:2]
    2   107     0     0 ?           -1 S        0   0:00  \_ [kauditd]
    2   226     0     0 ?           -1 S        0   0:00  \_ [kworker/0:2]
    2   288     0     0 ?           -1 S<       0   0:00  \_ [ata_sff]
    2   296     0     0 ?           -1 S        0   0:00  \_ [scsi_eh_0]
    2   299     0     0 ?           -1 S<       0   0:00  \_ [scsi_tmf_0]
    2   300     0     0 ?           -1 S        0   0:00  \_ [scsi_eh_1]
    2   301     0     0 ?           -1 S<       0   0:00  \_ [scsi_tmf_1]
    2   303     0     0 ?           -1 S        0   0:00  \_ [kworker/u8:2]
    2   304     0     0 ?           -1 S        0   0:00  \_ [kworker/u8:3]
    2   305     0     0 ?           -1 S<       0   0:00  \_ [ttm_swap]
    2   316     0     0 ?           -1 S        0   0:00  \_ [kworker/1:2]
    2   320     0     0 ?           -1 S<       0   0:00  \_ [kworker/2:1H]
    2   331     0     0 ?           -1 S        0   0:00  \_ [kworker/2:2]
    2   400     0     0 ?           -1 S<       0   0:00  \_ [kdmflush]
    2   401     0     0 ?           -1 S<       0   0:00  \_ [bioset]
    2   412     0     0 ?           -1 S<       0   0:00  \_ [kdmflush]
    2   413     0     0 ?           -1 S<       0   0:00  \_ [bioset]
    2   426     0     0 ?           -1 S<       0   0:00  \_ [bioset]
    2   427     0     0 ?           -1 S<       0   0:00  \_ [xfsalloc]
    2   428     0     0 ?           -1 S<       0   0:00  \_ [xfs_mru_cache]
    2   429     0     0 ?           -1 S<       0   0:00  \_ [xfs-buf/dm-0]
    2   430     0     0 ?           -1 S<       0   0:00  \_ [xfs-data/dm-0]
    2   431     0     0 ?           -1 S<       0   0:00  \_ [xfs-conv/dm-0]
    2   432     0     0 ?           -1 S<       0   0:00  \_ [xfs-cil/dm-0]
    2   433     0     0 ?           -1 S<       0   0:00  \_ [xfs-reclaim/dm-]
    2   434     0     0 ?           -1 S<       0   0:00  \_ [xfs-log/dm-0]
    2   435     0     0 ?           -1 S<       0   0:00  \_ [xfs-eofblocks/d]
    2   436     0     0 ?           -1 S        0   0:00  \_ [xfsaild/dm-0]
    2   538     0     0 ?           -1 S<       0   0:00  \_ [rpciod]
    2   539     0     0 ?           -1 S<       0   0:00  \_ [xprtiod]
    2   596     0     0 ?           -1 S<       0   0:00  \_ [kworker/0:1H]
    2   597     0     0 ?           -1 S<       0   0:00  \_ [xfs-buf/vda1]
    2   598     0     0 ?           -1 S<       0   0:00  \_ [xfs-data/vda1]
    2   599     0     0 ?           -1 S<       0   0:00  \_ [xfs-conv/vda1]
    2   600     0     0 ?           -1 S<       0   0:00  \_ [xfs-cil/vda1]
    2   601     0     0 ?           -1 S<       0   0:00  \_ [xfs-reclaim/vda]
    2   602     0     0 ?           -1 S<       0   0:00  \_ [xfs-log/vda1]
    2   603     0     0 ?           -1 S<       0   0:00  \_ [xfs-eofblocks/v]
    2   604     0     0 ?           -1 S        0   0:00  \_ [xfsaild/vda1]
    2   607     0     0 ?           -1 S<       0   0:00  \_ [kworker/3:1H]
    2   608     0     0 ?           -1 S<       0   0:00  \_ [kworker/1:1H]
    2   612     0     0 ?           -1 S<       0   0:00  \_ [kdmflush]
    2   613     0     0 ?           -1 S<       0   0:00  \_ [bioset]
    2   620     0     0 ?           -1 S<       0   0:00  \_ [xfs-buf/dm-2]
    2   621     0     0 ?           -1 S<       0   0:00  \_ [xfs-data/dm-2]
    2   622     0     0 ?           -1 S<       0   0:00  \_ [xfs-conv/dm-2]
    2   623     0     0 ?           -1 S<       0   0:00  \_ [xfs-cil/dm-2]
    2   624     0     0 ?           -1 S<       0   0:00  \_ [xfs-reclaim/dm-]
    2   625     0     0 ?           -1 S<       0   0:00  \_ [xfs-log/dm-2]
    2   626     0     0 ?           -1 S<       0   0:00  \_ [xfs-eofblocks/d]
    2   627     0     0 ?           -1 S        0   0:00  \_ [xfsaild/dm-2]
    2   963     0     0 ?           -1 S<       0   0:00  \_ [nfsiod]
    2  1785     0     0 ?           -1 S        0   0:00  \_ [kworker/3:3]
    0     1     1     1 ?           -1 Ss       0   0:01 /usr/lib/systemd/systemd --switched-root --system --deserialize 20
    1   506   506   506 ?           -1 Rs       0   0:56 /usr/lib/systemd/systemd-journald
    1   541   541   541 ?           -1 Ss       0   0:00 /usr/lib/systemd/systemd-udevd
    1   549   549   549 ?           -1 Ss       0   0:00 /usr/sbin/lvmetad -f
    1   652   652   652 ?           -1 S<sl     0   0:00 /sbin/auditd
    1   675   675   675 ?           -1 Ssl    999   0:00 /usr/lib/polkit-1/polkitd --no-debug
    1   677   677   677 ?           -1 Ss       0   0:00 /usr/lib/systemd/systemd-logind
    1   679   679   679 ?           -1 Ssl      0   1:03 /usr/sbin/rsyslogd -n
    1   684   684   684 ?           -1 Ss       0   0:00 /usr/sbin/irqbalance --foreground
    1   686   686   686 ?           -1 Ss      81   0:00 /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --sys
    1   705   705   705 ?           -1 Ssl      0   0:00 /usr/sbin/gssproxy -D
    1   713   713   713 ?           -1 Ssl      0   0:00 /usr/sbin/NetworkManager --no-daemon
    1   943   943   943 ?           -1 Ssl      0   0:00 /usr/bin/python -Es /usr/sbin/tuned -l -P
    1   957   957   957 ?           -1 Ss       0   0:00 /usr/sbin/sshd -D
  957  1507  1507  1507 ?           -1 Ss       0   0:00  \_ sshd: root@pts/0
 1507  1510  1510  1510 pts/0     1510 Ss+      0   0:00  |   \_ -bash
 1510  1608  1608  1510 pts/0     1510 S       72   0:00  |       \_ tcpdump -w dumps/hesperia-nfs-003 -i eth0 -s 0 host 10.201.
  957  1655  1655  1655 ?           -1 Ss       0   0:00  \_ sshd: root@pts/1
 1655  1658  1658  1658 pts/1     1688 Ss       0   0:00  |   \_ -bash
 1658  1688  1688  1658 pts/1     1688 D+       0   0:01  |       \_ rsync -azv --del --stats --progress /hesperiamount/isnet1/ 
 1688  1689  1688  1658 pts/1     1688 S+       0   0:07  |           \_ rsync -azv --del --stats --progress /hesperiamount/isne
 1689  1690  1688  1658 pts/1     1688 S+       0   0:00  |               \_ rsync -azv --del --stats --progress /hesperiamount/
  957  1803  1803  1803 ?           -1 Ss       0   0:00  \_ sshd: root@pts/2
 1803  1806  1806  1806 pts/2     1833 Ss       0   0:00      \_ -bash
 1806  1833  1833  1806 pts/2     1833 R+       0   0:00          \_ ps axjf
    1   974   974   974 ?           -1 Ss       0   0:00 /usr/sbin/httpd -DFOREGROUND
  974  1448   974   974 ?           -1 S       48   0:00  \_ /usr/sbin/httpd -DFOREGROUND
  974  1449   974   974 ?           -1 S       48   0:00  \_ /usr/sbin/httpd -DFOREGROUND
  974  1450   974   974 ?           -1 S       48   0:00  \_ /usr/sbin/httpd -DFOREGROUND
  974  1451   974   974 ?           -1 S       48   0:00  \_ /usr/sbin/httpd -DFOREGROUND
  974  1452   974   974 ?           -1 S       48   0:00  \_ /usr/sbin/httpd -DFOREGROUND
  974  1485   974   974 ?           -1 S       48   0:00  \_ /usr/sbin/httpd -DFOREGROUND
  974  1541   974   974 ?           -1 S       48   0:00  \_ /usr/sbin/httpd -DFOREGROUND
    1   978   978   978 ?           -1 Ss       0   0:00 /usr/sbin/crond -n
  978  1559   978   978 ?           -1 S        0   0:00  \_ /usr/sbin/CROND -n
 1559  1565  1565  1565 ?           -1 Ss    1001   0:00  |   \_ /bin/sh -c /hesperiamount/isnet1/check_processes_work_well > /d
 1565  1582  1565  1565 ?           -1 S     1001   0:00  |       \_ /usr/bin/python /hesperiamount/isnet1/check_processes_work_
  978  1561   978   978 ?           -1 S        0   0:00  \_ /usr/sbin/CROND -n
 1561  1569  1569  1569 ?           -1 Ss    1001   0:00  |   \_ /bin/sh -c /hesperiamount/isnet1/GetUmasepLastFile >> /hesperia
 1569  1579  1569  1569 ?           -1 S     1001   0:00  |       \_ /bin/sh /hesperiamount/isnet1/GetUmasepLastFile
 1579  1594  1569  1569 ?           -1 S     1001   0:00  |           \_ ftp -p -n -v spaceweather.uma.es
  978  1724   978   978 ?           -1 S        0   0:00  \_ /usr/sbin/CROND -n
 1724  1740  1740  1740 ?           -1 Ss    1001   0:00  |   \_ /bin/sh -c /hesperiamount/isnet1/check_processes_work_well > /d
 1740  1750  1740  1740 ?           -1 S     1001   0:00  |       \_ /usr/bin/python /hesperiamount/isnet1/check_processes_work_
  978  1726   978   978 ?           -1 S        0   0:00  \_ /usr/sbin/CROND -n
 1726  1742  1742  1742 ?           -1 Ss    1001   0:00      \_ /bin/sh -c /hesperiamount/isnet1/GetUmasepLastFile >> /hesperia
 1742  1744  1742  1742 ?           -1 S     1001   0:00          \_ /bin/sh /hesperiamount/isnet1/GetUmasepLastFile
 1744  1751  1742  1742 ?           -1 S     1001   0:00              \_ ftp -p -n -v spaceweather.uma.es
    1  1011  1011  1011 tty1      1011 Ss+      0   0:00 /sbin/agetty --noclear tty1 linux
    1  1040  1040  1040 ?           -1 Ss      27   0:00 /bin/sh /usr/bin/mysqld_safe --basedir=/usr
 1040  1312  1040  1040 ?           -1 Sl      27   0:00  \_ /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-d
    1  1229  1229  1229 ?           -1 Ss     998   0:00 /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
    1  1358  1358  1358 ?           -1 Ss       0   0:00 /usr/libexec/postfix/master -w
 1358  1360  1358  1358 ?           -1 S       89   0:00  \_ pickup -l -t unix -u
 1358  1361  1358  1358 ?           -1 S       89   0:00  \_ qmgr -l -t unix -u
    1  1607  1576  1576 ?           -1 S     1001   0:00 python umasep500_1_minute
[root@hesperia1 ~]# 
[root@hesperia1 ~]# 
[root@hesperia1 ~]# 
[root@hesperia1 ~]# ls -la /hesperiamount2/
total 12
drwxrwxrwx   6 root     root   274 Mar  2  2017 .
dr-xr-xr-x. 22 root     root  4096 Sep 22 06:50 ..
drwxr-xr-x  16 isnet1   isnet 3604 Apr 26 15:26 isnet1
drwxr-xr-x   3 root     root   153 Mar  2  2017 ocloud_store
drwxrwxr-x  19 release1 isnet 1660 Feb 28  2017 release1
drwxrwxrwx   6 root     root   457 Oct  6 08:31 .snapshot
[root@hesperia1 ~]# 

<Session hangs - Much later...>

[root@hesperia1 ~]# Disconnecting: Timeout, server not responding.
----------------------------------------------------------

In any new terminal window (SSH session) that I open, any attempt to list the mounted directory hangs the session:

----------------------------------------------------------
Terminal Window 4
-----------------

[root@hesperia1 ~]# ls -la /hesperiamount2

<hangs forever>
----------------------------------------------------------

What is going wrong?

Comment 6 Nikolaos Milas 2017-10-06 10:05:46 UTC
Created attachment 1335178 [details]
TCP Dump between the box and the NFS server - Test on 2017-10-06

The TCP dump captures the packets exchanged during the test performed on 2017-10-06; please see the associated report below.

Comment 7 Nikolaos Milas 2017-10-06 10:08:02 UTC
Created attachment 1335179 [details]
/var/log/messages file for the period that the test on 2017-10-06 was performed

Comment 8 Nikolaos Milas 2017-10-20 08:33:40 UTC
The problem was finally traced to a Cisco ASA bug (this firewall device sits between the connected networks); bug CSCuq80704 was resolved by an ASA software update.

The ASA was incorrectly dropping NFS packets with:

Drop-reason: (tcp-paws-fail) TCP packet failed PAWS test

...which caused NFS traffic to stall. Since the ASA software upgrade, the problem has not recurred.
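For readers unfamiliar with the drop reason: PAWS (Protection Against Wrapped Sequences, RFC 7323) rejects a TCP segment whose timestamp value (TSval) is older than the most recent timestamp recorded for the connection (TS.Recent), comparing the two 32-bit values with modular arithmetic. The snippet below is a minimal, simplified sketch of that rule only. It is not the ASA's implementation, and it says nothing about why CSCuq80704 made the firewall misjudge these particular NFS segments. The function names are hypothetical.

```python
def ts_before(a: int, b: int) -> bool:
    """Return True if 32-bit timestamp a is earlier than b.

    Uses modular (serial-number) comparison so that values which have
    wrapped past 2**32 still compare correctly.
    """
    diff = (a - b) & 0xFFFFFFFF
    # A difference in the upper half of the 32-bit range is a negative
    # signed value, i.e. a lies "behind" b.
    return diff >= 0x80000000


def paws_accepts(seg_tsval: int, ts_recent: int, ts_recent_valid: bool = True) -> bool:
    """Simplified PAWS check (hypothetical helper, after RFC 7323).

    A segment is acceptable unless TS.Recent is valid and the segment's
    TSval is older than TS.Recent.
    """
    if not ts_recent_valid:
        return True
    return not ts_before(seg_tsval, ts_recent)


if __name__ == "__main__":
    print(paws_accepts(1000, 999))        # newer timestamp: accepted
    print(paws_accepts(998, 999))         # older timestamp: fails PAWS
    print(paws_accepts(5, 0xFFFFFFF0))    # newer after wrap-around: accepted
```

A middlebox that applies this test against a stale or wrong TS.Recent will drop legitimate segments. Over a long-lived TCP connection such as an NFS mount, this would look like the stall described above.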

I can't tell why this did not happen for many months and only started recently.

I think this case can be closed.