Bug 1494834 - NFS gets hung after upgrade to 7.4 (CentOS)
Summary: NFS gets hung after upgrade to 7.4 (CentOS)
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: nfs-utils
Version: 7.4
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Steve Dickson
QA Contact: Filesystem QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-09-23 12:35 UTC by Nikolaos Milas
Modified: 2017-10-25 13:00 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-10-25 13:00:39 UTC
Target Upstream Version:
Embargoed:


Attachments
Related excerpt from /var/log/messages (60.88 KB, text/plain)
2017-09-23 12:35 UTC, Nikolaos Milas
Full messages file after reboot, including nfs mounts in /etc/fstab (3.11 MB, text/plain)
2017-09-30 15:18 UTC, Nikolaos Milas
TCP Dump between the box and the NFS server - Test on 2017-10-06 (13.45 MB, application/zip)
2017-10-06 10:05 UTC, Nikolaos Milas
/var/log/messages file for the period that the test on 2017-10-06 was performed (262.42 KB, text/plain)
2017-10-06 10:08 UTC, Nikolaos Milas


Links
System ID Private Priority Status Summary Last Updated
CentOS 13891 0 None None None 2017-09-23 12:35:24 UTC

Description Nikolaos Milas 2017-09-23 12:35:25 UTC
Created attachment 1329921
Related excerpt from /var/log/messages

Description of problem:

After an upgrade to 7.4 (which, among several hundred updates, includes rpcbind-0.2.0-42.el7.x86_64), we started having NFS issues: NFS communication hangs. In /var/log/messages:

-----------------------------------------------------------------------------------------
...
Sep 22 11:03:21 hesperia1 kernel: RPC: Registered named UNIX socket transport module.
Sep 22 11:03:21 hesperia1 kernel: RPC: Registered udp transport module.
Sep 22 11:03:21 hesperia1 kernel: RPC: Registered tcp transport module.
Sep 22 11:03:21 hesperia1 kernel: RPC: Registered tcp NFSv4.1 backchannel transport module.
Sep 22 11:03:21 hesperia1 systemd-udevd: starting version 219
Sep 22 11:03:21 hesperia1 systemd: Started Configure read-only root support.
Sep 22 11:03:21 hesperia1 kernel: Installing knfsd (copyright (C) 1996 okir.de).
Sep 22 11:03:21 hesperia1 systemd: Mounted NFSD configuration filesystem.
...
Sep 22 11:03:27 hesperia1 systemd: Mounting /mnt/dd2500-1...
Sep 22 11:03:27 hesperia1 systemd: Starting Notify NFS peers of a restart...
Sep 22 11:03:27 hesperia1 sm-notify[948]: Version 1.3.0 starting
Sep 22 11:03:27 hesperia1 systemd: Started Notify NFS peers of a restart.
Sep 22 11:03:27 hesperia1 systemd: Started OpenSSH server daemon.
Sep 22 11:03:27 hesperia1 kernel: FS-Cache: Loaded
Sep 22 11:03:27 hesperia1 kernel: FS-Cache: Netfs 'nfs' registered for caching
Sep 22 11:03:27 hesperia1 systemd: Mounted /mnt/dd2500-1.
Sep 22 11:03:27 hesperia1 systemd: Reached target Remote File Systems.
Sep 22 11:03:27 hesperia1 systemd: Starting Remote File Systems.
...
Sep 22 11:11:16 hesperia1 kernel: nfs: server 10.201.40.34 not responding, still trying
...
Sep 22 11:20:44 hesperia1 kernel: nfs: server 10.201.40.34 not responding, still trying
...
-----------------------------------------------------------------------------------------
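
To gauge how often the client reported the server as unresponsive, the kernel messages can be tallied per server address. This is only a minimal sketch, assuming the syslog line format shown in the excerpt above; the function name count_nfs_timeouts is hypothetical and not part of any package.

```shell
#!/bin/sh
# Tally "nfs: server <addr> not responding" kernel messages per server.
# Reads a syslog-style file (default: /var/log/messages), or stdin when
# the argument is "-". The function name is illustrative only.
count_nfs_timeouts() {
    logfile="${1:-/var/log/messages}"
    # Keep only the server address from matching lines, then count
    # occurrences and print "<address> <count>".
    cat "$logfile" |
        sed -n 's/.*nfs: server \([^ ]*\) not responding.*/\1/p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}
```

Calling `count_nfs_timeouts /var/log/messages` prints one line per server address followed by the number of matching messages.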

We tried downgrading to rpcbind-0.2.0-38.el7.x86_64, but this time it didn't help. (The same downgrade had earlier solved a problem caused by https://bugzilla.redhat.com/show_bug.cgi?id=1454876.)

We have confirmed the above behavior multiple times.

We can mount either directly:

  mount -vv -o auto,noatime,nolock,bg,nfsvers=3,intr,tcp,actimeo=1800 -t nfs 10.201.40.34:/data/col1/hesperia-mount /hesperiamount2

or through /etc/fstab:

  10.201.40.34:/data/col1/hesperia-mount /hesperiamount2 nfs auto,noatime,nolock,bg,nfsvers=3,intr,tcp,actimeo=1800 0 0

The box may even hang during reboot, which never happened in the past; it then needs a hard reboot (via the VM admin console) to boot again.



Version-Release number of selected component (if applicable):

CentOS Linux release 7.4.1708 (Core)

# uname -r
3.10.0-693.2.2.el7.x86_64
# rpm -qa | grep rpcbind
rpcbind-0.2.0-42.el7.x86_64
# rpm -qa | grep nfs
libnfsidmap-0.25-17.el7.x86_64
nfs-utils-1.3.0-0.48.el7.x86_64

How reproducible:

Always

Steps to Reproduce:

1. Mount NFS share directly or via /etc/fstab and try various operations. I worked with rsync and simple directory listing (ls).

Actual results:

The NFS-mounted path becomes inaccessible and causes the SSH connection to hang too.

Expected results:

The mounted share should be fully accessible for directory and file use.

Additional info:

The remote system (NFS Server) publishing the share is an EMC DD2500 (supporting NFS v3).

I tried to debug rpc using rpcdebug. I set:

# rpcdebug -m rpc -s all
# rpcdebug -m nfs -s all

and then mounted the nfs share again. The mount worked fine (as always):

[root@hesperia1 ~]# mount -vv -o auto,noatime,nolock,bg,nfsvers=3,intr,tcp,actimeo=1800 -t nfs 10.201.40.34:/data/col1/hesperia-mount /hesperiamount2
mount.nfs: trying text-based options 'nolock,bg,nfsvers=3,intr,tcp,actimeo=1800,addr=10.201.40.34'
mount.nfs: prog 100003, trying vers=3, prot=6
mount.nfs: trying 10.201.40.34 prog 100003 vers 3 prot TCP port 2049
mount.nfs: prog 100005, trying vers=3, prot=6
mount.nfs: trying 10.201.40.34 prog 100005 vers 3 prot TCP port 2052

and then I tried to list the mounted directory, but in the process it failed (this is the actual problem):

[root@hesperia1 ~]# ls -la /hesperiamount <I did not finish typing and it hung.>
Disconnecting: Timeout, server not responding.

Among the logged output (in /var/log/messages, see below), I found many timeouts, mostly minor and a couple of major, probably causing all the issues I am facing. (I didn't have any such problems when running with CentOS 7.3 and rpcbind-0.2.0-38.el7.x86_64.)

Then, on a new SSH session:

[root@hesperia1 ~]# umount /hesperiamount2
umount.nfs: /hesperiamount2: device is busy
[root@hesperia1 ~]#
[root@hesperia1 ~]# umount /hesperiamount2
umount.nfs: /hesperiamount2: device is busy
[root@hesperia1 ~]#
...
[root@hesperia1 ~]# umount /hesperiamount2

I am attaching the entire logged session (with irrelevant messages removed) as recorded in /var/log/messages.
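
Because `rpcdebug -s all` leaves verbose kernel logging switched on, the flags set earlier can be cleared once the capture is complete. The following is a minimal sketch; the wrapper function names are hypothetical, and rpcdebug has to be run as root.

```shell
#!/bin/sh
# Toggle the same RPC and NFS debug flags used in this report.
# Function names are illustrative; rpcdebug must be run as root.
nfs_debug_on() {
    rpcdebug -m rpc -s all
    rpcdebug -m nfs -s all
}

nfs_debug_off() {
    # -c clears the flags that -s set, returning logging to normal.
    rpcdebug -m rpc -c all
    rpcdebug -m nfs -c all
}
```

A typical sequence would be `nfs_debug_on`, reproduce the hang, save /var/log/messages, then `nfs_debug_off`.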

Note: Network reliability is quite high, as indicated, for example, by an nping run:

--------------------------------------------------------------------------------------------------------------------------
[root@hesperia1 ~]# nping --tcp -c 200 -p 2049 10.201.40.34

Starting Nping 0.6.40 ( http://nmap.org/nping ) at 2017-09-23 09:18 UTC
SENT (0.0160s) TCP 195.251.204.197:52359 > 10.201.40.34:2049 S ttl=64 id=50655 iplen=40 seq=1643036899 win=1480
RCVD (0.0185s) TCP 10.201.40.34:2049 > 195.251.204.197:52359 SA ttl=64 id=0 iplen=44 seq=4221839098 win=14600 <mss 1380>
...
SENT (199.3056s) TCP 195.251.204.197:52359 > 10.201.40.34:2049 S ttl=64 id=50655 iplen=40 seq=1643036899 win=1480
RCVD (199.3079s) TCP 10.201.40.34:2049 > 195.251.204.197:52359 SA ttl=64 id=0 iplen=44 seq=3607435522 win=14600 <mss 1380>
 
Max rtt: 10.897ms | Min rtt: 2.085ms | Avg rtt: 2.372ms
Raw packets sent: 200 (8.000KB) | Rcvd: 200 (8.800KB) | Lost: 0 (0.00%)
Nping done: 1 IP address pinged in 199.32 seconds
--------------------------------------------------------------------------------------------------------------------------

Hence, the timeouts do not appear to be caused by network latency, congestion, or packet loss.

Please let me know how to proceed with this.

This report is also at: https://bugs.centos.org/view.php?id=13891

Comment 2 Nikolaos Milas 2017-09-29 21:13:08 UTC
After further tests, it seems the problem occurs mainly when mounting at boot time (through /etc/fstab). 

I have managed to work successfully multiple times when mounting manually, but it is important for us to be able to mount the NFS share at boot time, through /etc/fstab. Mounting through /etc/fstab fails every time.

Please advise.
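
One way to narrow down the difference between a manual mount and a boot-time mount might be to compare the options the kernel actually applied in each case. This is a sketch only, assuming the standard /proc/mounts column layout; the function name nfs_mount_opts is hypothetical.

```shell
#!/bin/sh
# Print mount point, filesystem type and effective options for every
# NFS mount. Reads /proc/mounts by default; a saved copy in the same
# format can be passed as an argument for offline comparison.
# The function name is illustrative only.
nfs_mount_opts() {
    # Column 3 is the filesystem type; skip the nfsd control filesystem.
    awk '$3 ~ /^nfs/ && $3 != "nfsd" { print $2, $3, $4 }' "${1:-/proc/mounts}"
}
```

Running it once after a manual mount and once after a reboot with the /etc/fstab entry in place, then diffing the two outputs, would show whether the effective options differ.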

Comment 3 Nikolaos Milas 2017-09-30 15:18:04 UTC
Created attachment 1332697
Full messages file after reboot, including nfs mounts in /etc/fstab

Full output from /var/log/messages after reboot, when nfs mounts exist in /etc/fstab. 

The following actions were performed during this period (notice how NFS hangs):

[Parallel Session 1 right after boot (see parallel session 2 below)]

[root@hesperia1 ~]# df -h
Filesystem                              Size  Used Avail Use% Mounted on
/dev/mapper/centos-root                  46G  7.6G   39G  17% /
devtmpfs                                1.9G     0  1.9G   0% /dev
tmpfs                                   1.9G     0  1.9G   0% /dev/shm
tmpfs                                   1.9G  8.6M  1.9G   1% /run
tmpfs                                   1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/mapper/vg2-lv1                     100G   94G  6.2G  94% /hesperiamount
/dev/vda1                               497M  216M  282M  44% /boot
10.201.40.34:/data/col1/noc-bkups-1      11T  1.1T  9.4T  10% /mnt/dd2500-1
10.201.40.34:/data/col1/hesperia-mount   11T  1.1T  9.4T  10% /hesperiamount2
tmpfs                                   380M     0  380M   0% /run/user/998
tmpfs                                   380M     0  380M   0% /run/user/1001
tmpfs                                   380M     0  380M   0% /run/user/0

[root@hesperia1 ~]# ls -la /hesperiamount2/isnet1/
total 1315
drwxr-xr-x 16 isnet1 isnet   3604 Apr 26 15:26 .
drwxrwxrwx  6 root   root     274 Mar  2  2017 ..
drwxr-xr-x  5 isnet1 isnet    857 Jul 17 09:47 AGGELIS_DEBUG
-rwxr-xr-x  1 isnet1 isnet  15929 Mar 24  2017 alert_processes
-rw-r--r--  1 isnet1 isnet    240 Mar 24  2017 ALERT_PROCESSES.ini
-rw-------  1 isnet1 isnet 819474 Apr 26 15:47 .bash_history
-rw-r--r--  1 isnet1 isnet    220 Apr  9  2016 .bash_logout
-rw-r--r--  1 isnet1 isnet    193 Aug  2  2016 .bash_profile
-rw-r--r--  1 isnet1 isnet   3625 Apr 25  2016 .bashrc
-rw-r--r--  1 isnet1 isnet   3515 Apr  9  2016 .bashrc_old
-rwxr-xr-x  1 isnet1 isnet     42 Feb 17  2017 ChangeTonano
-rwxr-xr-x  1 isnet1 isnet  12010 Mar 24  2017 check_processes_work_well
-rw-r--r--  1 isnet1 isnet    138 Jan 15  2017 CHECK_PROCESSES_WORK_WELL.ini
-rwxr-xr-x  1 isnet1 isnet  10287 Feb  5  2017 check_processes_work_well.save
drwx------  4 isnet1 isnet    207 Feb 23  2017 .config
-rwxr-xr-x  1 isnet1 isnet    606 Mar 24  2017 cronreleaserealtime
-rwxr-xr-x  1 isnet1 isnet   1308 Mar 24  2017 cronumasep500
drwxr-xr-x  2 isnet1 isnet    412 Mar 26  2017 Database
-rwxr-xr-x  1 isnet1 isnet    266 Mar 24  2017 GetReleaseToLocalhost
-rwxr-xr-x  1 isnet1 isnet    346 Feb 17  2017 GetUmasepLastFile
-rwxr-xr-x  1 isnet1 isnet    211 Feb 13  2017 GetUmasepToLocalhost
-rwxr-xr-x  1 isnet1 isnet    117 Apr 26 15:05 GetUmasepToLocalhostHTTP
drwxr-xr-x  3 isnet1 isnet    164 Feb 19  2017 hesperiamount
-rwxr-xr-x  1 isnet1 isnet  12203 Mar 24  2017 kernel_email
-rw-r--r--  1 isnet1 isnet     76 Mar 24  2017 KERNEL_EMAIL.ini
-rw-r--r--  1 isnet1 isnet    172 Nov  3  2015 .kshrc
-rw-r--r--  1 isnet1 isnet 135581 Jun  1 16:58 Latest_SEP_500_estimations_2017_06_01.txt
-rw-------  1 isnet1 isnet     43 Jul 17 08:09 .lesshst
drwxr-xr-x  3 isnet1 isnet    155 Feb 23  2017 .local
drwxr-x---  2 isnet1 isnet  11237 Sep 29 04:02 log
drwx------  2 isnet1 isnet    101 Apr 27  2016 Mail
-rw-------  1 isnet1 isnet   7941 Apr 26 15:26 .mysql_history
-rw-------  1 isnet1 isnet     17 Feb 11  2017 .nano_history
-rw-------  1 isnet1 isnet 200281 Apr  8 14:57 nohup.out
-rw-r--r--  1 isnet1 isnet    675 Feb 23  2017 .profile
lrwxrwxrwx  1 root   root      23 Feb 20  2017 release -> /hesperiamount/release1
drwxr-xr-x  2 isnet1 isnet    353 Mar 24  2017 RELEASE_ALERT_IMAGES
-rwxr-xr-x  1 isnet1 isnet  18967 Mar 24  2017 release_epam_realtime
-rw-r--r--  1 isnet1 isnet    382 Feb 19  2017 RELEASE_EPAM_REALTIME.ini
-rwxr-xr-x  1 isnet1 isnet  18970 Mar 24  2017 release_ephin_realtime
-rw-r--r--  1 isnet1 isnet    382 Jan 16  2017 RELEASE_EPHIN_REALTIME.ini
drwxr-xr-x  6 isnet1 isnet    413 Jan 14  2017 release_local
drwxr-xr-x  2 isnet1 isnet    174 Feb 13  2017 RELEASE_realtime
-rwxr-xr-x  1 isnet1 isnet    237 Jan 19  2017 ReleaseToComp1
-rwxr-xr-x  1 isnet1 isnet     95 Jul  4  2016 sarlmove
-rw-r--r--  1 isnet1 isnet     66 Apr 25  2016 .selected_editor
-rwxr-xr-x  1 isnet1 isnet   1610 May 20 20:18 send_email
-rwxr-xr-x  1 isnet1 isnet   1433 Mar 24  2017 send_email.py
-rw-r--r--  1 isnet1 isnet   1222 Mar 24  2017 send_email.pyc
drwx------  2 isnet1 isnet    275 Apr 26  2016 .ssh
-rwxr-xr-x  1 isnet1 isnet  21019 Apr  8 14:34 umasep500_1_minute
-rw-r--r--  1 isnet1 isnet    253 Apr 26 14:53 UMASEP_500.ini
drwxr-xr-x  4 isnet1 isnet    207 Jan 10  2017 UMASEP_500MEV_IMAGES
drwxr-xr-x  2 isnet1 isnet  19063 Sep 17 00:02 UMASEP_realtime
-rwxr-xr-x  1 isnet1 isnet    107 Jun 22  2016 UmasepToComp1
drwxr-xr-x  2 isnet1 isnet    403 Apr  9 18:46 webform
-rw-------  1 isnet1 isnet    171 Feb 14  2017 .Xauthority

[root@hesperia1 ~]# rpcdebug -v -m rpc -s all
rpc        xprt call debug nfs auth bind sched trans svcsock svcdsp misc cache

Module     Valid flags
rpc        xprt call debug nfs auth bind sched trans svcsock svcdsp misc cache
[root@hesperia1 ~]# 
[root@hesperia1 ~]# rpcdebug -v -m nfs -s all
nfs        vfs dircache lookupcache pagecache proc xdr file root callback client mount fscache pnfs pnfs_ld state

Module     Valid flags
nfs        vfs dircache lookupcache pagecache proc xdr file root callback client mount fscache pnfs pnfs_ld state
[root@hesperia1 ~]# 
[root@hesperia1 ~]# 
[root@hesperia1 ~]# 
[root@hesperia1 ~]# Disconnecting: Timeout, server not responding.
<session hung>

[Parallel Session 2 (right after boot) on another terminal]

[root@hesperia1 ~]# rsync -azv --del --stats --progress /hesperiamount/isnet1/ /hesperiamount2/isnet1
sending incremental file list
RELEASE_ALERT_IMAGES/release_alert_merged_plots.png
      310091 100%    8.82MB/s    0:00:00 (xfer#1, to-check=1062/1153)
Disconnecting: Timeout, server not responding.



[New terminal session follows, opened after the sessions above hung but before they displayed "Timeout, server not responding"]

[root@hesperia1 ~]# ps axjf
 PPID   PID  PGID   SID TTY      TPGID STAT   UID   TIME COMMAND
    0     2     0     0 ?           -1 S        0   0:00 [kthreadd]
    2     3     0     0 ?           -1 S        0   0:00  \_ [ksoftirqd/0]
    2     4     0     0 ?           -1 S        0   0:00  \_ [kworker/0:0]
    2     5     0     0 ?           -1 S<       0   0:00  \_ [kworker/0:0H]
    2     6     0     0 ?           -1 S        0   0:00  \_ [kworker/u8:0]
    2     7     0     0 ?           -1 S        0   0:00  \_ [migration/0]
    2     8     0     0 ?           -1 S        0   0:00  \_ [rcu_bh]
    2     9     0     0 ?           -1 S        0   0:00  \_ [rcu_sched]
    2    10     0     0 ?           -1 S        0   0:00  \_ [watchdog/0]
    2    11     0     0 ?           -1 S        0   0:00  \_ [watchdog/1]
    2    12     0     0 ?           -1 S        0   0:00  \_ [migration/1]
    2    13     0     0 ?           -1 S        0   0:00  \_ [ksoftirqd/1]
    2    14     0     0 ?           -1 S        0   0:00  \_ [kworker/1:0]
    2    15     0     0 ?           -1 S<       0   0:00  \_ [kworker/1:0H]
    2    16     0     0 ?           -1 S        0   0:00  \_ [watchdog/2]
    2    17     0     0 ?           -1 S        0   0:00  \_ [migration/2]
    2    18     0     0 ?           -1 S        0   0:00  \_ [ksoftirqd/2]
    2    19     0     0 ?           -1 S        0   0:00  \_ [kworker/2:0]
    2    20     0     0 ?           -1 S<       0   0:00  \_ [kworker/2:0H]
    2    21     0     0 ?           -1 S        0   0:00  \_ [watchdog/3]
    2    22     0     0 ?           -1 S        0   0:00  \_ [migration/3]
    2    23     0     0 ?           -1 S        0   0:00  \_ [ksoftirqd/3]
    2    24     0     0 ?           -1 S        0   0:00  \_ [kworker/3:0]
    2    25     0     0 ?           -1 S<       0   0:00  \_ [kworker/3:0H]
    2    27     0     0 ?           -1 S        0   0:00  \_ [kdevtmpfs]
    2    28     0     0 ?           -1 S<       0   0:00  \_ [netns]
    2    29     0     0 ?           -1 S        0   0:00  \_ [khungtaskd]
    2    30     0     0 ?           -1 S<       0   0:00  \_ [writeback]
    2    31     0     0 ?           -1 S<       0   0:00  \_ [kintegrityd]
    2    32     0     0 ?           -1 S<       0   0:00  \_ [bioset]
    2    33     0     0 ?           -1 S<       0   0:00  \_ [kblockd]
    2    34     0     0 ?           -1 S<       0   0:00  \_ [md]
    2    35     0     0 ?           -1 S        0   0:00  \_ [kworker/0:1]
    2    36     0     0 ?           -1 S        0   0:00  \_ [kworker/1:1]
    2    37     0     0 ?           -1 S        0   0:00  \_ [kworker/2:1]
    2    38     0     0 ?           -1 S        0   0:00  \_ [kworker/3:1]
    2    40     0     0 ?           -1 S        0   0:00  \_ [kswapd0]
    2    41     0     0 ?           -1 SN       0   0:00  \_ [ksmd]
    2    42     0     0 ?           -1 SN       0   0:00  \_ [khugepaged]
    2    43     0     0 ?           -1 S<       0   0:00  \_ [crypto]
    2    51     0     0 ?           -1 S<       0   0:00  \_ [kthrotld]
    2    52     0     0 ?           -1 S        0   0:00  \_ [kworker/u8:1]
    2    53     0     0 ?           -1 S<       0   0:00  \_ [kmpath_rdacd]
    2    54     0     0 ?           -1 S<       0   0:00  \_ [kpsmoused]
    2    55     0     0 ?           -1 S<       0   0:00  \_ [ipv6_addrconf]
    2    74     0     0 ?           -1 S<       0   0:00  \_ [deferwq]
    2   106     0     0 ?           -1 S        0   0:00  \_ [kauditd]
    2   286     0     0 ?           -1 S<       0   0:00  \_ [ata_sff]
    2   300     0     0 ?           -1 S        0   0:00  \_ [scsi_eh_0]
    2   301     0     0 ?           -1 S<       0   0:00  \_ [scsi_tmf_0]
    2   302     0     0 ?           -1 S        0   0:00  \_ [scsi_eh_1]
    2   303     0     0 ?           -1 S<       0   0:00  \_ [scsi_tmf_1]
    2   304     0     0 ?           -1 S<       0   0:00  \_ [ttm_swap]
    2   356     0     0 ?           -1 S        0   0:00  \_ [kworker/3:2]
    2   357     0     0 ?           -1 S<       0   0:00  \_ [kworker/2:1H]
    2   359     0     0 ?           -1 S        0   0:00  \_ [kworker/2:2]
    2   399     0     0 ?           -1 S<       0   0:00  \_ [kdmflush]
    2   400     0     0 ?           -1 S<       0   0:00  \_ [bioset]
    2   411     0     0 ?           -1 S<       0   0:00  \_ [kdmflush]
    2   412     0     0 ?           -1 S<       0   0:00  \_ [bioset]
    2   425     0     0 ?           -1 S<       0   0:00  \_ [bioset]
    2   426     0     0 ?           -1 S<       0   0:00  \_ [xfsalloc]
    2   427     0     0 ?           -1 S<       0   0:00  \_ [xfs_mru_cache]
    2   428     0     0 ?           -1 S<       0   0:00  \_ [xfs-buf/dm-0]
    2   429     0     0 ?           -1 S<       0   0:00  \_ [xfs-data/dm-0]
    2   430     0     0 ?           -1 S<       0   0:00  \_ [xfs-conv/dm-0]
    2   431     0     0 ?           -1 S<       0   0:00  \_ [xfs-cil/dm-0]
    2   432     0     0 ?           -1 S<       0   0:00  \_ [xfs-reclaim/dm-]
    2   433     0     0 ?           -1 S<       0   0:00  \_ [xfs-log/dm-0]
    2   434     0     0 ?           -1 S<       0   0:00  \_ [xfs-eofblocks/d]
    2   435     0     0 ?           -1 S        0   0:00  \_ [xfsaild/dm-0]
    2   542     0     0 ?           -1 S<       0   0:00  \_ [rpciod]
    2   543     0     0 ?           -1 S<       0   0:00  \_ [xprtiod]
    2   587     0     0 ?           -1 S        0   0:00  \_ [kworker/1:2]
    2   592     0     0 ?           -1 S<       0   0:00  \_ [kworker/0:1H]
    2   602     0     0 ?           -1 S<       0   0:00  \_ [xfs-buf/vda1]
    2   603     0     0 ?           -1 S<       0   0:00  \_ [xfs-data/vda1]
    2   604     0     0 ?           -1 S<       0   0:00  \_ [xfs-conv/vda1]
    2   605     0     0 ?           -1 S<       0   0:00  \_ [xfs-cil/vda1]
    2   606     0     0 ?           -1 S<       0   0:00  \_ [xfs-reclaim/vda]
    2   607     0     0 ?           -1 S<       0   0:00  \_ [xfs-log/vda1]
    2   608     0     0 ?           -1 S<       0   0:00  \_ [xfs-eofblocks/v]
    2   609     0     0 ?           -1 S        0   0:00  \_ [xfsaild/vda1]
    2   611     0     0 ?           -1 S<       0   0:00  \_ [kworker/3:1H]
    2   614     0     0 ?           -1 S<       0   0:00  \_ [kdmflush]
    2   615     0     0 ?           -1 S<       0   0:00  \_ [bioset]
    2   622     0     0 ?           -1 S<       0   0:00  \_ [xfs-buf/dm-2]
    2   623     0     0 ?           -1 S<       0   0:00  \_ [xfs-data/dm-2]
    2   624     0     0 ?           -1 S<       0   0:00  \_ [xfs-conv/dm-2]
    2   625     0     0 ?           -1 S<       0   0:00  \_ [xfs-cil/dm-2]
    2   626     0     0 ?           -1 S<       0   0:00  \_ [xfs-reclaim/dm-]
    2   627     0     0 ?           -1 S<       0   0:00  \_ [xfs-log/dm-2]
    2   628     0     0 ?           -1 S<       0   0:00  \_ [xfs-eofblocks/d]
    2   629     0     0 ?           -1 S        0   0:00  \_ [xfsaild/dm-2]
    2   782     0     0 ?           -1 S<       0   0:00  \_ [kworker/1:1H]
    2   976     0     0 ?           -1 S<       0   0:00  \_ [nfsiod]
    2 11912     0     0 ?           -1 S        0   0:00  \_ [kworker/1:3]
    2 12394     0     0 ?           -1 S        0   0:00  \_ [kworker/2:3]
    2 12528     0     0 ?           -1 S        0   0:00  \_ [kworker/0:2]
    0     1     1     1 ?           -1 Ss       0   0:01 /usr/lib/systemd/systemd --switched-root --system --deserialize 21
    1   505   505   505 ?           -1 Ss       0   0:12 /usr/lib/systemd/systemd-journald
    1   533   533   533 ?           -1 Ss       0   0:00 /usr/sbin/lvmetad -f
    1   541   541   541 ?           -1 Ss       0   0:00 /usr/lib/systemd/systemd-udevd
    1   654   654   654 ?           -1 S<sl     0   0:00 /sbin/auditd
    1   682   682   682 ?           -1 Ss       0   0:00 /usr/sbin/irqbalance --foreground
    1   683   683   683 ?           -1 Ss       0   0:00 /usr/lib/systemd/systemd-logind
    1   684   684   684 ?           -1 Ssl      0   0:16 /usr/sbin/rsyslogd -n
    1   685   685   685 ?           -1 Ssl    999   0:00 /usr/lib/polkit-1/polkitd --no-debug
    1   687   687   687 ?           -1 Ss      81   0:00 /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
    1   703   703   703 ?           -1 Ssl      0   0:00 /usr/sbin/gssproxy -D
    1   715   715   715 ?           -1 Ssl      0   0:00 /usr/sbin/NetworkManager --no-daemon
    1   953   953   953 ?           -1 Ssl      0   0:00 /usr/bin/python -Es /usr/sbin/tuned -l -P
    1   961   961   961 ?           -1 Ss       0   0:00 /usr/sbin/sshd -D
  961 11768 11768 11768 ?           -1 Ss       0   0:00  \_ sshd: root@pts/0
11768 11770 11770 11770 pts/0    11770 Ss+      0   0:00  |   \_ -bash
  961 12423 12423 12423 ?           -1 Ss       0   0:00  \_ sshd: root@pts/1
12423 12425 12425 12425 pts/1    12445 Ss       0   0:00  |   \_ -bash
12425 12445 12445 12425 pts/1    12445 S+       0   0:00  |       \_ rsync -azv --del --stats --progress /hesperiamount/isnet1/ /hesperiamount2/isnet1
12445 12447 12445 12425 pts/1    12445 D+       0   0:01  |           \_ rsync -azv --del --stats --progress /hesperiamount/isnet1/ /hesperiamount2/isnet1
12447 12448 12445 12425 pts/1    12445 S+       0   0:00  |               \_ rsync -azv --del --stats --progress /hesperiamount/isnet1/ /hesperiamount2/isnet1
  961 12568 12568 12568 ?           -1 Ss       0   0:00  \_ sshd: root@pts/2
12568 12571 12571 12571 pts/2    12594 Ss       0   0:00      \_ -bash
12571 12594 12594 12571 pts/2    12594 R+       0   0:00          \_ ps axjf
    1   986   986   986 ?           -1 Ss      27   0:00 /bin/sh /usr/bin/mysqld_safe --basedir=/usr
  986  1293   986   986 ?           -1 Sl      27   0:00  \_ /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --log
    1  1021  1021  1021 ?           -1 Ss       0   0:00 /usr/sbin/httpd -DFOREGROUND
 1021  2826  1021  1021 ?           -1 S       48   0:00  \_ /usr/sbin/httpd -DFOREGROUND
 1021  2828  1021  1021 ?           -1 S       48   0:00  \_ /usr/sbin/httpd -DFOREGROUND
 1021  2829  1021  1021 ?           -1 S       48   0:00  \_ /usr/sbin/httpd -DFOREGROUND
 1021  2832  1021  1021 ?           -1 S       48   0:00  \_ /usr/sbin/httpd -DFOREGROUND
 1021  2835  1021  1021 ?           -1 S       48   0:00  \_ /usr/sbin/httpd -DFOREGROUND
 1021  4799  1021  1021 ?           -1 S       48   0:00  \_ /usr/sbin/httpd -DFOREGROUND
 1021  8606  1021  1021 ?           -1 S       48   0:00  \_ /usr/sbin/httpd -DFOREGROUND
    1  1040  1040  1040 ?           -1 Ss       0   0:00 /usr/sbin/crond -n
 1040 12330  1040  1040 ?           -1 S        0   0:00  \_ /usr/sbin/CROND -n
12330 12350 12350 12350 ?           -1 Ss    1001   0:00  |   \_ /bin/sh -c /hesperiamount/isnet1/check_processes_work_well > /dev/null 2>&1
12350 12360 12350 12350 ?           -1 S     1001   0:00  |       \_ /usr/bin/python /hesperiamount/isnet1/check_processes_work_well
 1040 12332  1040  1040 ?           -1 S        0   0:00  \_ /usr/sbin/CROND -n
12332 12345 12345 12345 ?           -1 Ss    1001   0:00  |   \_ /bin/sh -c /hesperiamount/isnet1/GetUmasepLastFile >> /hesperiamount/isnet1/log/UMASEP_ftp_get_
12345 12358 12345 12345 ?           -1 S     1001   0:00  |       \_ /bin/sh /hesperiamount/isnet1/GetUmasepLastFile
12358 12367 12345 12345 ?           -1 S     1001   0:00  |           \_ ftp -p -n -v spaceweather.uma.es
 1040 12481  1040  1040 ?           -1 S        0   0:00  \_ /usr/sbin/CROND -n
12481 12504 12504 12504 ?           -1 Ss    1001   0:00  |   \_ /bin/sh -c /hesperiamount/isnet1/check_processes_work_well > /dev/null 2>&1
12504 12512 12504 12504 ?           -1 S     1001   0:00  |       \_ /usr/bin/python /hesperiamount/isnet1/check_processes_work_well
 1040 12483  1040  1040 ?           -1 S        0   0:00  \_ /usr/sbin/CROND -n
12483 12498 12498 12498 ?           -1 Ss    1001   0:00      \_ /bin/sh -c /hesperiamount/isnet1/GetUmasepLastFile >> /hesperiamount/isnet1/log/UMASEP_ftp_get_
12498 12508 12498 12498 ?           -1 S     1001   0:00          \_ /bin/sh /hesperiamount/isnet1/GetUmasepLastFile
12508 12515 12498 12498 ?           -1 S     1001   0:00              \_ ftp -p -n -v spaceweather.uma.es
    1  1121  1121  1121 tty1      1121 Ss+      0   0:00 /sbin/agetty --noclear tty1 linux
    1  1365  1365  1365 ?           -1 Ss     998   0:00 /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
    1  1620  1620  1620 ?           -1 Ss       0   0:00 /usr/libexec/postfix/master -w
 1620  1687  1620  1620 ?           -1 S       89   0:00  \_ pickup -l -t unix -u
 1620  1688  1620  1620 ?           -1 S       89   0:00  \_ qmgr -l -t unix -u
[root@hesperia1 ~]# 
[root@hesperia1 ~]# 
[root@hesperia1 ~]# 
[root@hesperia1 ~]# 
[root@hesperia1 ~]# less /var/log/messages
[root@hesperia1 ~]# 
[root@hesperia1 ~]# 
[root@hesperia1 ~]# 
[root@hesperia1 ~]# 
[root@hesperia1 ~]# 
[root@hesperia1 ~]# ls -la /hesperiamount2/is
<session hung>
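
In the process listing above, one of the rsync processes (PID 12447) is in state D+, that is, uninterruptible sleep. A long listing can be reduced to just such tasks with a small filter. This is a minimal sketch; the function name list_blocked_tasks is hypothetical, and the column selection is an assumption chosen for readability.

```shell
#!/bin/sh
# List processes in uninterruptible sleep (state D), which is where
# tasks blocked on an unresponsive NFS server typically sit.
# With no argument, live ps output is used; a saved listing whose
# second column is STAT can be passed as a file instead.
# The function name is illustrative only.
list_blocked_tasks() {
    if [ -n "$1" ]; then
        cat "$1"
    else
        # wchan shows the kernel function the task is sleeping in.
        ps -eo pid,stat,wchan:32,args
    fi | awk 'NR == 1 || $2 ~ /^D/'
}
```

The header line is always kept, so an output consisting of the header alone means no task was blocked at that moment.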

[New Terminal Session Follows]

[root@hesperia1 ~]# top

top - 14:10:09 up 11 min,  3 users,  load average: 3.44, 2.27, 1.11
Tasks: 157 total,   3 running, 153 sleeping,   0 stopped,   1 zombie
%Cpu(s): 48.6 us,  0.9 sy,  0.0 ni, 50.3 id,  0.0 wa,  0.0 hi,  0.1 si,  0.1 st
KiB Mem :  3881424 total,  2761644 free,   439432 used,   680348 buff/cache
KiB Swap:  4063228 total,  4063228 free,        0 used.  2879808 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                     
13508 release1  20   0  382188  94528   7760 R  99.7  2.4   0:06.70 python                                                                                      
13517 release1  20   0  376016  88396   7760 R  99.7  2.3   0:06.95 python                                                                                      
    1 root      20   0  125408   3840   2420 S   0.3  0.1   0:01.78 systemd                                                                                     
    9 root      20   0       0      0      0 S   0.3  0.0   0:00.51 rcu_sched                                                                                   
  715 root      20   0  469628   8676   6428 S   0.3  0.2   0:00.24 NetworkManager                                                                              
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kthreadd                                                                                    
    3 root      20   0       0      0      0 S   0.0  0.0   0:00.00 ksoftirqd/0                                                                                 
    5 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:0H                                                                                
    6 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kworker/u8:0                                                                                
    7 root      rt   0       0      0      0 S   0.0  0.0   0:00.02 migration/0                                                                                 
    8 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcu_bh                                                                                      
   10 root      rt   0       0      0      0 S   0.0  0.0   0:00.00 watchdog/0                                                                                  
   11 root      rt   0       0      0      0 S   0.0  0.0   0:00.00 watchdog/1                                                                                  
   12 root      rt   0       0      0      0 S   0.0  0.0   0:00.02 migration/1                                                                                 
   13 root      20   0       0      0      0 S   0.0  0.0   0:00.01 ksoftirqd/1                                                                                 
   15 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/1:0H                                                                                
   16 root      rt   0       0      0      0 S   0.0  0.0   0:00.00 watchdog/2                                                                                  
   17 root      rt   0       0      0      0 S   0.0  0.0   0:00.02 migration/2                                                                                 
   18 root      20   0       0      0      0 S   0.0  0.0   0:00.00 ksoftirqd/2                                                                                 
   20 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/2:0H                                                                                
   21 root      rt   0       0      0      0 S   0.0  0.0   0:00.00 watchdog/3                                                                                  
   22 root      rt   0       0      0      0 S   0.0  0.0   0:00.05 migration/3                                                                                 
   23 root      20   0       0      0      0 S   0.0  0.0   0:00.00 ksoftirqd/3                                                                                 
   25 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/3:0H                                                                                
   27 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kdevtmpfs                                                                                   
   28 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 netns                                                                                       
   29 root      20   0       0      0      0 S   0.0  0.0   0:00.00 khungtaskd                                                                                  
   30 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 writeback                                                                                   
   31 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kintegrityd                                                                                 
   32 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 bioset                                                                                      
   33 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kblockd                                                                                     
   34 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 md                                                                                          
   35 root      20   0       0      0      0 S   0.0  0.0   0:00.15 kworker/0:1                                                                                 
   36 root      20   0       0      0      0 S   0.0  0.0   0:00.10 kworker/1:1                                                                                 
   37 root      20   0       0      0      0 S   0.0  0.0   0:00.03 kworker/2:1                                                                                 
[root@hesperia1 ~]# 
[root@hesperia1 ~]# 
[root@hesperia1 ~]# 
[root@hesperia1 ~]# 
[root@hesperia1 ~]# ps axjf
 PPID   PID  PGID   SID TTY      TPGID STAT   UID   TIME COMMAND
    0     2     0     0 ?           -1 S        0   0:00 [kthreadd]
    2     3     0     0 ?           -1 S        0   0:00  \_ [ksoftirqd/0]
    2     5     0     0 ?           -1 S<       0   0:00  \_ [kworker/0:0H]
    2     6     0     0 ?           -1 S        0   0:00  \_ [kworker/u8:0]
    2     7     0     0 ?           -1 S        0   0:00  \_ [migration/0]
    2     8     0     0 ?           -1 S        0   0:00  \_ [rcu_bh]
    2     9     0     0 ?           -1 S        0   0:00  \_ [rcu_sched]
    2    10     0     0 ?           -1 S        0   0:00  \_ [watchdog/0]
    2    11     0     0 ?           -1 S        0   0:00  \_ [watchdog/1]
    2    12     0     0 ?           -1 S        0   0:00  \_ [migration/1]
    2    13     0     0 ?           -1 S        0   0:00  \_ [ksoftirqd/1]
    2    15     0     0 ?           -1 S<       0   0:00  \_ [kworker/1:0H]
    2    16     0     0 ?           -1 S        0   0:00  \_ [watchdog/2]
    2    17     0     0 ?           -1 S        0   0:00  \_ [migration/2]
    2    18     0     0 ?           -1 S        0   0:00  \_ [ksoftirqd/2]
    2    20     0     0 ?           -1 S<       0   0:00  \_ [kworker/2:0H]
    2    21     0     0 ?           -1 S        0   0:00  \_ [watchdog/3]
    2    22     0     0 ?           -1 S        0   0:00  \_ [migration/3]
    2    23     0     0 ?           -1 S        0   0:00  \_ [ksoftirqd/3]
    2    25     0     0 ?           -1 S<       0   0:00  \_ [kworker/3:0H]
    2    27     0     0 ?           -1 S        0   0:00  \_ [kdevtmpfs]
    2    28     0     0 ?           -1 S<       0   0:00  \_ [netns]
    2    29     0     0 ?           -1 S        0   0:00  \_ [khungtaskd]
    2    30     0     0 ?           -1 S<       0   0:00  \_ [writeback]
    2    31     0     0 ?           -1 S<       0   0:00  \_ [kintegrityd]
    2    32     0     0 ?           -1 S<       0   0:00  \_ [bioset]
    2    33     0     0 ?           -1 S<       0   0:00  \_ [kblockd]
    2    34     0     0 ?           -1 S<       0   0:00  \_ [md]
    2    35     0     0 ?           -1 S        0   0:00  \_ [kworker/0:1]
    2    36     0     0 ?           -1 S        0   0:00  \_ [kworker/1:1]
    2    37     0     0 ?           -1 S        0   0:00  \_ [kworker/2:1]
    2    38     0     0 ?           -1 S        0   0:00  \_ [kworker/3:1]
    2    40     0     0 ?           -1 S        0   0:00  \_ [kswapd0]
    2    41     0     0 ?           -1 SN       0   0:00  \_ [ksmd]
    2    42     0     0 ?           -1 SN       0   0:00  \_ [khugepaged]
    2    43     0     0 ?           -1 S<       0   0:00  \_ [crypto]
    2    51     0     0 ?           -1 S<       0   0:00  \_ [kthrotld]
    2    52     0     0 ?           -1 S        0   0:00  \_ [kworker/u8:1]
    2    53     0     0 ?           -1 S<       0   0:00  \_ [kmpath_rdacd]
    2    54     0     0 ?           -1 S<       0   0:00  \_ [kpsmoused]
    2    55     0     0 ?           -1 S<       0   0:00  \_ [ipv6_addrconf]
    2    74     0     0 ?           -1 S<       0   0:00  \_ [deferwq]
    2   106     0     0 ?           -1 S        0   0:00  \_ [kauditd]
    2   286     0     0 ?           -1 S<       0   0:00  \_ [ata_sff]
    2   300     0     0 ?           -1 S        0   0:00  \_ [scsi_eh_0]
    2   301     0     0 ?           -1 S<       0   0:00  \_ [scsi_tmf_0]
    2   302     0     0 ?           -1 S        0   0:00  \_ [scsi_eh_1]
    2   303     0     0 ?           -1 S<       0   0:00  \_ [scsi_tmf_1]
    2   304     0     0 ?           -1 S<       0   0:00  \_ [ttm_swap]
    2   356     0     0 ?           -1 S        0   0:00  \_ [kworker/3:2]
    2   357     0     0 ?           -1 S<       0   0:00  \_ [kworker/2:1H]
    2   399     0     0 ?           -1 S<       0   0:00  \_ [kdmflush]
    2   400     0     0 ?           -1 S<       0   0:00  \_ [bioset]
    2   411     0     0 ?           -1 S<       0   0:00  \_ [kdmflush]
    2   412     0     0 ?           -1 S<       0   0:00  \_ [bioset]
    2   425     0     0 ?           -1 S<       0   0:00  \_ [bioset]
    2   426     0     0 ?           -1 S<       0   0:00  \_ [xfsalloc]
    2   427     0     0 ?           -1 S<       0   0:00  \_ [xfs_mru_cache]
    2   428     0     0 ?           -1 S<       0   0:00  \_ [xfs-buf/dm-0]
    2   429     0     0 ?           -1 S<       0   0:00  \_ [xfs-data/dm-0]
    2   430     0     0 ?           -1 S<       0   0:00  \_ [xfs-conv/dm-0]
    2   431     0     0 ?           -1 S<       0   0:00  \_ [xfs-cil/dm-0]
    2   432     0     0 ?           -1 S<       0   0:00  \_ [xfs-reclaim/dm-]
    2   433     0     0 ?           -1 S<       0   0:00  \_ [xfs-log/dm-0]
    2   434     0     0 ?           -1 S<       0   0:00  \_ [xfs-eofblocks/d]
    2   435     0     0 ?           -1 S        0   0:00  \_ [xfsaild/dm-0]
    2   542     0     0 ?           -1 S<       0   0:00  \_ [rpciod]
    2   543     0     0 ?           -1 S<       0   0:00  \_ [xprtiod]
    2   592     0     0 ?           -1 S<       0   0:00  \_ [kworker/0:1H]
    2   602     0     0 ?           -1 S<       0   0:00  \_ [xfs-buf/vda1]
    2   603     0     0 ?           -1 S<       0   0:00  \_ [xfs-data/vda1]
    2   604     0     0 ?           -1 S<       0   0:00  \_ [xfs-conv/vda1]
    2   605     0     0 ?           -1 S<       0   0:00  \_ [xfs-cil/vda1]
    2   606     0     0 ?           -1 S<       0   0:00  \_ [xfs-reclaim/vda]
    2   607     0     0 ?           -1 S<       0   0:00  \_ [xfs-log/vda1]
    2   608     0     0 ?           -1 S<       0   0:00  \_ [xfs-eofblocks/v]
    2   609     0     0 ?           -1 S        0   0:00  \_ [xfsaild/vda1]
    2   611     0     0 ?           -1 S<       0   0:00  \_ [kworker/3:1H]
    2   614     0     0 ?           -1 S<       0   0:00  \_ [kdmflush]
    2   615     0     0 ?           -1 S<       0   0:00  \_ [bioset]
    2   622     0     0 ?           -1 S<       0   0:00  \_ [xfs-buf/dm-2]
    2   623     0     0 ?           -1 S<       0   0:00  \_ [xfs-data/dm-2]
    2   624     0     0 ?           -1 S<       0   0:00  \_ [xfs-conv/dm-2]
    2   625     0     0 ?           -1 S<       0   0:00  \_ [xfs-cil/dm-2]
    2   626     0     0 ?           -1 S<       0   0:00  \_ [xfs-reclaim/dm-]
    2   627     0     0 ?           -1 S<       0   0:00  \_ [xfs-log/dm-2]
    2   628     0     0 ?           -1 S<       0   0:00  \_ [xfs-eofblocks/d]
    2   629     0     0 ?           -1 S        0   0:00  \_ [xfsaild/dm-2]
    2   782     0     0 ?           -1 S<       0   0:00  \_ [kworker/1:1H]
    2   976     0     0 ?           -1 S<       0   0:00  \_ [nfsiod]
    2 12394     0     0 ?           -1 S        0   0:00  \_ [kworker/2:3]
    2 12528     0     0 ?           -1 S        0   0:00  \_ [kworker/0:2]
    2 12770     0     0 ?           -1 S        0   0:00  \_ [kworker/1:0]
    2 12925     0     0 ?           -1 S        0   0:00  \_ [kworker/1:3]
    2 12940     0     0 ?           -1 S        0   0:00  \_ [kworker/3:0]
    2 13265     0     0 ?           -1 S        0   0:00  \_ [kworker/2:0]
    2 13745     0     0 ?           -1 S        0   0:00  \_ [kworker/3:3]
    2 13902     0     0 ?           -1 S        0   0:00  \_ [kworker/0:0]
    0     1     1     1 ?           -1 Ss       0   0:01 /usr/lib/systemd/systemd --switched-root --system --deserialize 21
    1   505   505   505 ?           -1 Ss       0   0:12 /usr/lib/systemd/systemd-journald
    1   533   533   533 ?           -1 Ss       0   0:00 /usr/sbin/lvmetad -f
    1   541   541   541 ?           -1 Ss       0   0:00 /usr/lib/systemd/systemd-udevd
    1   654   654   654 ?           -1 S<sl     0   0:00 /sbin/auditd
    1   682   682   682 ?           -1 Ss       0   0:00 /usr/sbin/irqbalance --foreground
    1   683   683   683 ?           -1 Ss       0   0:00 /usr/lib/systemd/systemd-logind
    1   684   684   684 ?           -1 Ssl      0   0:16 /usr/sbin/rsyslogd -n
    1   685   685   685 ?           -1 Ssl    999   0:00 /usr/lib/polkit-1/polkitd --no-debug
    1   687   687   687 ?           -1 Ss      81   0:00 /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
    1   703   703   703 ?           -1 Ssl      0   0:00 /usr/sbin/gssproxy -D
    1   715   715   715 ?           -1 Ssl      0   0:00 /usr/sbin/NetworkManager --no-daemon
    1   953   953   953 ?           -1 Ssl      0   0:00 /usr/bin/python -Es /usr/sbin/tuned -l -P
    1   961   961   961 ?           -1 Ss       0   0:00 /usr/sbin/sshd -D
  961 12568 12568 12568 ?           -1 Ss       0   0:00  \_ sshd: root@pts/2
12568 12571 12571 12571 pts/2    12571 Ds+      0   0:00  |   \_ -bash
  961 12817 12817 12817 ?           -1 Ss       0   0:00  \_ sshd: root@pts/3
12817 12819 12819 12819 pts/3    12945 Ss       0   0:00  |   \_ -bash
12819 12945 12945 12819 pts/3    12945 D+       0   0:00  |       \_ ls --color=auto -la /hesperiamount2/
  961 13072 13072 13072 ?           -1 Ss       0   0:00  \_ sshd: root@pts/4
13072 13101 13101 13101 pts/4    13946 Ss       0   0:00      \_ -bash
13101 13946 13946 13101 pts/4    13946 R+       0   0:00          \_ ps axjf
    1   986   986   986 ?           -1 Ss      27   0:00 /bin/sh /usr/bin/mysqld_safe --basedir=/usr
  986  1293   986   986 ?           -1 Sl      27   0:01  \_ /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --log
    1  1021  1021  1021 ?           -1 Ss       0   0:00 /usr/sbin/httpd -DFOREGROUND
 1021  2826  1021  1021 ?           -1 S       48   0:00  \_ /usr/sbin/httpd -DFOREGROUND
 1021  2828  1021  1021 ?           -1 S       48   0:00  \_ /usr/sbin/httpd -DFOREGROUND
 1021  2829  1021  1021 ?           -1 S       48   0:00  \_ /usr/sbin/httpd -DFOREGROUND
 1021  2832  1021  1021 ?           -1 S       48   0:00  \_ /usr/sbin/httpd -DFOREGROUND
 1021  2835  1021  1021 ?           -1 S       48   0:00  \_ /usr/sbin/httpd -DFOREGROUND
 1021  4799  1021  1021 ?           -1 S       48   0:00  \_ /usr/sbin/httpd -DFOREGROUND
 1021  8606  1021  1021 ?           -1 S       48   0:00  \_ /usr/sbin/httpd -DFOREGROUND
    1  1040  1040  1040 ?           -1 Ss       0   0:00 /usr/sbin/crond -n
 1040 13678  1040  1040 ?           -1 S        0   0:00  \_ /usr/sbin/CROND -n
13678 13697 13697 13697 ?           -1 Ss    1001   0:00  |   \_ /bin/sh -c /hesperiamount/isnet1/check_processes_work_well > /dev/null 2>&1
13697 13710 13697 13697 ?           -1 S     1001   0:00  |       \_ /usr/bin/python /hesperiamount/isnet1/check_processes_work_well
 1040 13680  1040  1040 ?           -1 S        0   0:00  \_ /usr/sbin/CROND -n
13680 13699 13699 13699 ?           -1 Ss    1001   0:00  |   \_ /bin/sh -c /hesperiamount/isnet1/GetUmasepLastFile >> /hesperiamount/isnet1/log/UMASEP_ftp_get_
13699 13702 13699 13699 ?           -1 S     1001   0:00  |       \_ /bin/sh /hesperiamount/isnet1/GetUmasepLastFile
13702 13703 13699 13699 ?           -1 S     1001   0:00  |           \_ ftp -p -n -v spaceweather.uma.es
 1040 13853  1040  1040 ?           -1 S        0   0:00  \_ /usr/sbin/CROND -n
13853 13876 13876 13876 ?           -1 Ss    1001   0:00  |   \_ /bin/sh -c /hesperiamount/isnet1/check_processes_work_well > /dev/null 2>&1
13876 13886 13876 13876 ?           -1 S     1001   0:00  |       \_ /usr/bin/python /hesperiamount/isnet1/check_processes_work_well
 1040 13855  1040  1040 ?           -1 S        0   0:00  \_ /usr/sbin/CROND -n
13855 13864 13864 13864 ?           -1 Ss    1001   0:00      \_ /bin/sh -c /hesperiamount/isnet1/GetUmasepLastFile >> /hesperiamount/isnet1/log/UMASEP_ftp_get_
13864 13885 13864 13864 ?           -1 S     1001   0:00          \_ /bin/sh /hesperiamount/isnet1/GetUmasepLastFile
13885 13888 13864 13864 ?           -1 S     1001   0:00              \_ ftp -p -n -v spaceweather.uma.es
    1  1121  1121  1121 tty1      1121 Ss+      0   0:00 /sbin/agetty --noclear tty1 linux
    1  1365  1365  1365 ?           -1 Ss     998   0:00 /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
    1  1620  1620  1620 ?           -1 Ss       0   0:00 /usr/libexec/postfix/master -w
 1620  1687  1620  1620 ?           -1 S       89   0:00  \_ pickup -l -t unix -u
 1620  1688  1620  1620 ?           -1 S       89   0:00  \_ qmgr -l -t unix -u
    1 12447 12445 12425 ?           -1 D        0   0:01 rsync -azv --del --stats --progress /hesperiamount/isnet1/ /hesperiamount2/isnet1
12447 12448 12445 12425 ?           -1 Z        0   0:00  \_ [rsync] <defunct>
    1 13736 13696 13696 ?           -1 S     1001   0:00 python umasep500_1_minute
[root@hesperia1 ~]# 
[root@hesperia1 ~]# ps -l 12447
F S   UID   PID  PPID  C PRI  NI ADDR SZ WCHAN  TTY        TIME CMD
1 D     0 12447     1  0  80   0 - 29685 rpc_wa ?          0:01 rsync -azv --del --stats --progress /hesperiamount/isnet1/ /hesperiamount2/isnet1
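
The `ps -l` line above shows rsync in state D (uninterruptible sleep) with a wait channel beginning `rpc_wa`, which suggests it is blocked waiting on an RPC reply. As a minimal sketch for spotting every such task at once, the filter below keeps only the D-state rows of `ps` output. The function name `blocked_tasks` is an assumption, and the sample rows are hypothetical apart from the rsync row, whose wait channel is left truncated as it appears in the report.

```shell
#!/usr/bin/env bash
# Keep only processes in uninterruptible sleep (STAT starting with "D").
# Input: lines in the format of `ps -eo pid,stat,wchan,args`.
# Output: PID and wait channel of each blocked task.
blocked_tasks() {
    awk 'NR > 1 && $2 ~ /^D/ { print $1, $3 }'
}

# Hypothetical sample input; only the rsync row mirrors the report.
sample='  PID STAT WCHAN  COMMAND
12447 D    rpc_wa rsync -azv --del --stats --progress /hesperiamount/isnet1/ /hesperiamount2/isnet1
13101 Ss   do_wai -bash
  961 Ss   poll_s /usr/sbin/sshd -D'

printf '%s\n' "$sample" | blocked_tasks
```

On a live system the input would come from something like `ps -eo pid,stat,wchan:32,args | blocked_tasks`, where the wider `wchan:32` column avoids the truncation seen above.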

[root@hesperia1 log]# cat /proc/self/mounts
rootfs / rootfs rw 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
devtmpfs /dev devtmpfs rw,nosuid,size=1929660k,nr_inodes=482415,mode=755 0 0
securityfs /sys/kernel/security securityfs rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /run tmpfs rw,nosuid,nodev,mode=755 0 0
tmpfs /sys/fs/cgroup tmpfs ro,nosuid,nodev,noexec,mode=755 0 0
cgroup /sys/fs/cgroup/systemd cgroup rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd 0 0
pstore /sys/fs/pstore pstore rw,nosuid,nodev,noexec,relatime 0 0
cgroup /sys/fs/cgroup/freezer cgroup rw,nosuid,nodev,noexec,relatime,freezer 0 0
cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpuacct,cpu 0 0
cgroup /sys/fs/cgroup/net_cls,net_prio cgroup rw,nosuid,nodev,noexec,relatime,net_prio,net_cls 0 0
cgroup /sys/fs/cgroup/perf_event cgroup rw,nosuid,nodev,noexec,relatime,perf_event 0 0
cgroup /sys/fs/cgroup/cpuset cgroup rw,nosuid,nodev,noexec,relatime,cpuset 0 0
cgroup /sys/fs/cgroup/devices cgroup rw,nosuid,nodev,noexec,relatime,devices 0 0
cgroup /sys/fs/cgroup/blkio cgroup rw,nosuid,nodev,noexec,relatime,blkio 0 0
cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0
cgroup /sys/fs/cgroup/hugetlb cgroup rw,nosuid,nodev,noexec,relatime,hugetlb 0 0
cgroup /sys/fs/cgroup/pids cgroup rw,nosuid,nodev,noexec,relatime,pids 0 0
configfs /sys/kernel/config configfs rw,relatime 0 0
/dev/mapper/centos-root / xfs rw,relatime,attr2,inode64,noquota 0 0
systemd-1 /proc/sys/fs/binfmt_misc autofs rw,relatime,fd=26,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=12052 0 0
hugetlbfs /dev/hugepages hugetlbfs rw,relatime 0 0
debugfs /sys/kernel/debug debugfs rw,relatime 0 0
mqueue /dev/mqueue mqueue rw,relatime 0 0
binfmt_misc /proc/sys/fs/binfmt_misc binfmt_misc rw,relatime 0 0
nfsd /proc/fs/nfsd nfsd rw,relatime 0 0
/dev/mapper/vg2-lv1 /hesperiamount xfs rw,relatime,attr2,inode64,noquota 0 0
/dev/vda1 /boot xfs rw,relatime,attr2,inode64,noquota 0 0
sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0
10.201.40.34:/data/col1/noc-bkups-1 /mnt/dd2500-1 nfs rw,noatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,acregmin=1800,acregmax=1800,acdirmin=1800,acdirmax=1800,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.201.40.34,mountvers=3,mountport=2052,mountproto=tcp,local_lock=all,addr=10.201.40.34 0 0
10.201.40.34:/data/col1/hesperia-mount /hesperiamount2 nfs rw,noatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,acregmin=1800,acregmax=1800,acdirmin=1800,acdirmax=1800,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.201.40.34,mountvers=3,mountport=2052,mountproto=tcp,local_lock=all,addr=10.201.40.34 0 0
tmpfs /run/user/998 tmpfs rw,nosuid,nodev,relatime,size=388144k,mode=700,uid=998,gid=997 0 0
tmpfs /run/user/1001 tmpfs rw,nosuid,nodev,relatime,size=388144k,mode=700,uid=1001,gid=1002 0 0
tmpfs /run/user/0 tmpfs rw,nosuid,nodev,relatime,size=388144k,mode=700 0 0

[root@hesperia1 log]# showmount --all
clnt_create: RPC: Program not registered

[root@hesperia1 log]# mount -l -t nfs
10.201.40.34:/data/col1/noc-bkups-1 on /mnt/dd2500-1 type nfs (rw,noatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,acregmin=1800,acregmax=1800,acdirmin=1800,acdirmax=1800,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.201.40.34,mountvers=3,mountport=2052,mountproto=tcp,local_lock=all,addr=10.201.40.34)
10.201.40.34:/data/col1/hesperia-mount on /hesperiamount2 type nfs (rw,noatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,acregmin=1800,acregmax=1800,acdirmin=1800,acdirmax=1800,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.201.40.34,mountvers=3,mountport=2052,mountproto=tcp,local_lock=all,addr=10.201.40.34)

[root@hesperia1 log]# showmount -e 10.201.40.34
Export list for 10.201.40.34:
/data/col1/hesperia-mount 195.251.204.197
/data/col1/noc-bkups-1    195.251.204.192/28

Comment 4 Nikolaos Milas 2017-10-04 12:07:59 UTC
The problem was solved by changing the NFS export options of the shared directory (on the data storage system) from "secure" to "insecure". That is, I changed them from:

    rw,no_root_squash,no_all_squash,secure,nolog

to:

    rw,no_root_squash,no_all_squash,insecure,nolog

I don't know whether the "secure" option explains the behavior I described, but since I changed to "insecure" everything works fine with the latest packages on CentOS 7.4 (kernel 3.10.0-693.2.2.el7.x86_64 and rpcbind-0.2.0-42.el7.x86_64).
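
For context: on Linux NFS servers, exports(5) documents "secure" as requiring that requests originate from a reserved port (below 1024). Whether the storage system here interprets the option the same way is an assumption. Under that assumption, a rough way to test the theory is to look at the source ports the client uses towards the server. The sketch below filters `ss -tn`-style output; the function name `nonreserved_nfs_ports` and the sample connections are hypothetical, not taken from this report.

```shell
#!/usr/bin/env bash
# Print the local (source) port of every established TCP connection to the
# given server whose source port is NOT reserved (i.e. is >= 1024).
# Input: lines in the format of `ss -tn` output.
# Assumption: IPv4 addresses, so "addr:port" splits cleanly on ":".
nonreserved_nfs_ports() {
    local server="$1"
    awk -v srv="$server" '
        # Column 4 is local addr:port, column 5 is peer addr:port.
        $1 == "ESTAB" {
            split($5, peer, ":")
            if (peer[1] != srv) next
            m = split($4, loc, ":")
            if (loc[m] + 0 >= 1024) print loc[m]
        }'
}

# Hypothetical sample input (ports invented for illustration):
sample='State  Recv-Q Send-Q Local Address:Port    Peer Address:Port
ESTAB  0      0      195.251.204.197:855   10.201.40.34:2049
ESTAB  0      0      195.251.204.197:40312 10.201.40.34:2049
ESTAB  0      0      195.251.204.197:22    192.0.2.10:51000'

printf '%s\n' "$sample" | nonreserved_nfs_ports 10.201.40.34
```

On the affected client this would be fed from `ss -tn`. An empty result would mean all connections to the server already use reserved ports, which would make the "secure" option an unlikely cause.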

I can't tell whether this issue needs further examination and/or source code changes/improvements.

Comment 5 Nikolaos Milas 2017-10-06 09:58:38 UTC
The problem started occurring again after a couple of days, so the above setting evidently did not resolve the issue.

Here is a test performed today (2017-10-06), for which I am attaching a tcpdump capture of the traffic between the box under investigation and the storage server (which exports the directories).

I booted kernel 3.10.0-693.2.2.el7.x86_64 with debugging enabled.

The capture for this session (recorded with the command shown in Terminal Window 1 below) is attached as hesperia-nfs-003.zip.

I am also attaching the messages log for the session (hesperia-messages-20171006-01.txt).

The NFS mounts in /etc/fstab are as follows:

----------------------------------------------------------
/etc/fstab:
-----------

[root@hesperia1 ~]# cat /etc/fstab

#
# /etc/fstab
# Created by anaconda on Mon Jul  6 14:29:42 2015
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
/dev/mapper/centos-root /                       xfs     defaults        0 0
UUID=7a3ae70a-8ef3-463b-8f5b-be4e2e7be894 /boot                   xfs     defaults        0 0
/dev/mapper/centos-swap swap                    swap    defaults        0 0
/dev/mapper/vg2-lv1     /hesperiamount          xfs     defaults        0 0
#
10.201.40.34:/data/col1/noc-bkups-1   /mnt/dd2500-1   nfs hard,intr,nolock,nfsvers=3,tcp,rsize=1048600,wsize=1048600,bg 0 0
10.201.40.34:/data/col1/hesperia-mount   /hesperiamount2   nfs hard,intr,nolock,nfsvers=3,tcp,rsize=1048600,wsize=1048600,bg 0 0
#
# 10.201.40.34:/data/col1/noc-bkups-1   /mnt/dd2500-1   nfs auto,noatime,nolock,bg,nfsvers=3,intr,tcp,actimeo=1800 0 0
# 10.201.40.34:/data/col1/hesperia-mount   /hesperiamount2   nfs auto,noatime,nolock,bg,nfsvers=3,intr,tcp,actimeo=1800 0 0
----------------------------------------------------------
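
Note that the active fstab entries request rsize/wsize of 1048600. Per nfs(5), values above 1048576 are replaced with 1048576, so the mounted value should differ from the requested one. A minimal sketch for comparing the two is shown below. The helper name `mount_opt` is an assumption, and the "live" option string is abbreviated from the earlier /proc/self/mounts listing, which was captured with the now commented-out fstab entries in effect.

```shell
#!/usr/bin/env bash
# Print the value of one key=value option from a comma-separated
# mount option string; prints nothing if the key is absent.
mount_opt() {
    local opts="$1" key="$2"
    printf '%s\n' "$opts" | tr ',' '\n' |
        awk -F= -v k="$key" '$1 == k { print $2; exit }'
}

# Options as requested in /etc/fstab (copied from the listing above).
fstab_opts='hard,intr,nolock,nfsvers=3,tcp,rsize=1048600,wsize=1048600,bg'
# Options as reported by the kernel (abbreviated from /proc/self/mounts).
live_opts='rw,noatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,nolock,proto=tcp'

echo "requested rsize: $(mount_opt "$fstab_opts" rsize)"
echo "effective rsize: $(mount_opt "$live_opts" rsize)"
```

Checking the effective options in /proc/self/mounts rather than /etc/fstab avoids drawing conclusions from settings the client never actually applied.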

As you can see below, I ran the rsync command, and a bit later all sessions hung.

----------------------------------------------------------
Terminal Window 1
-----------------

[root@hesperia1 ~]# rpcdebug -v -m rpc -s all
rpc        xprt call debug nfs auth bind sched trans svcsock svcdsp misc cache

Module     Valid flags
rpc        xprt call debug nfs auth bind sched trans svcsock svcdsp misc cache
[root@hesperia1 ~]# rpcdebug -v -m nfs -s all
nfs        vfs dircache lookupcache pagecache proc xdr file root callback client mount fscache pnfs pnfs_ld state

Module     Valid flags
nfs        vfs dircache lookupcache pagecache proc xdr file root callback client mount fscache pnfs pnfs_ld state
[root@hesperia1 ~]# 
[root@hesperia1 ~]# 
[root@hesperia1 ~]# tcpdump -w dumps/hesperia-nfs-003 -i eth0 -s 0 host 10.201.40.34 &
[1] 1608
[root@hesperia1 ~]# tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
<Later...>
[root@hesperia1 ~]# Disconnecting: Timeout, server not responding.
----------------------------------------------------------
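
The `rpcdebug -s all` calls above leave verbose kernel RPC/NFS logging switched on until the flags are cleared again. A minimal sketch for turning it off once the capture is done, using `-c` (clear), the counterpart of `-s` (set). The wrapper name `clear_rpc_debug` and the `RUN` switch are assumptions, added so the commands can be reviewed before they are run as root.

```shell
#!/usr/bin/env bash
# Print the commands that clear the rpc and nfs debug flags.
# With RUN=1 in the environment, execute them instead (requires root).
clear_rpc_debug() {
    local module
    for module in rpc nfs; do
        if [ "${RUN:-0}" = "1" ]; then
            rpcdebug -m "$module" -c all
        else
            echo "rpcdebug -m $module -c all"
        fi
    done
}

# Dry run: only shows what would be executed.
clear_rpc_debug
```

Running it as `RUN=1 clear_rpc_debug` applies the change; the default dry run only lists the two commands.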

----------------------------------------------------------
Terminal Window 2
-----------------

[root@hesperia1 ~]# rsync -azv --del --stats --progress /hesperiamount/isnet1/ /hesperiamount2/isnet1
sending incremental file list
RELEASE_ALERT_IMAGES/release_alert_merged_plots.png
      315851 100%    8.44MB/s    0:00:00 (xfer#1, to-check=1062/1153)

<Later...>
Disconnecting: Timeout, server not responding.
----------------------------------------------------------

----------------------------------------------------------
Terminal Window 3
-----------------

[root@hesperia1 ~]# ps axjf
 PPID   PID  PGID   SID TTY      TPGID STAT   UID   TIME COMMAND
    0     2     0     0 ?           -1 S        0   0:00 [kthreadd]
    2     3     0     0 ?           -1 S        0   0:00  \_ [ksoftirqd/0]
    2     4     0     0 ?           -1 S        0   0:00  \_ [kworker/0:0]
    2     5     0     0 ?           -1 S<       0   0:00  \_ [kworker/0:0H]
    2     6     0     0 ?           -1 S        0   0:00  \_ [kworker/u8:0]
    2     7     0     0 ?           -1 S        0   0:00  \_ [migration/0]
    2     8     0     0 ?           -1 S        0   0:00  \_ [rcu_bh]
    2     9     0     0 ?           -1 S        0   0:00  \_ [rcu_sched]
    2    10     0     0 ?           -1 S        0   0:00  \_ [watchdog/0]
    2    11     0     0 ?           -1 S        0   0:00  \_ [watchdog/1]
    2    12     0     0 ?           -1 S        0   0:00  \_ [migration/1]
    2    13     0     0 ?           -1 S        0   0:00  \_ [ksoftirqd/1]
    2    14     0     0 ?           -1 S        0   0:00  \_ [kworker/1:0]
    2    15     0     0 ?           -1 S<       0   0:00  \_ [kworker/1:0H]
    2    16     0     0 ?           -1 S        0   0:00  \_ [watchdog/2]
    2    17     0     0 ?           -1 S        0   0:00  \_ [migration/2]
    2    18     0     0 ?           -1 S        0   0:00  \_ [ksoftirqd/2]
    2    19     0     0 ?           -1 S        0   0:00  \_ [kworker/2:0]
    2    20     0     0 ?           -1 S<       0   0:00  \_ [kworker/2:0H]
    2    21     0     0 ?           -1 S        0   0:00  \_ [watchdog/3]
    2    22     0     0 ?           -1 S        0   0:00  \_ [migration/3]
    2    23     0     0 ?           -1 S        0   0:00  \_ [ksoftirqd/3]
    2    24     0     0 ?           -1 S        0   0:00  \_ [kworker/3:0]
    2    25     0     0 ?           -1 S<       0   0:00  \_ [kworker/3:0H]
    2    27     0     0 ?           -1 S        0   0:00  \_ [kdevtmpfs]
    2    28     0     0 ?           -1 S<       0   0:00  \_ [netns]
    2    29     0     0 ?           -1 S        0   0:00  \_ [khungtaskd]
    2    30     0     0 ?           -1 S<       0   0:00  \_ [writeback]
    2    31     0     0 ?           -1 S<       0   0:00  \_ [kintegrityd]
    2    32     0     0 ?           -1 S<       0   0:00  \_ [bioset]
    2    33     0     0 ?           -1 S<       0   0:00  \_ [kblockd]
    2    34     0     0 ?           -1 S<       0   0:00  \_ [md]
    2    35     0     0 ?           -1 S        0   0:00  \_ [kworker/0:1]
    2    36     0     0 ?           -1 S        0   0:00  \_ [kworker/1:1]
    2    37     0     0 ?           -1 S        0   0:00  \_ [kworker/2:1]
    2    38     0     0 ?           -1 S        0   0:00  \_ [kworker/3:1]
    2    40     0     0 ?           -1 S        0   0:00  \_ [kswapd0]
    2    41     0     0 ?           -1 SN       0   0:00  \_ [ksmd]
    2    42     0     0 ?           -1 SN       0   0:00  \_ [khugepaged]
    2    43     0     0 ?           -1 S<       0   0:00  \_ [crypto]
    2    51     0     0 ?           -1 S<       0   0:00  \_ [kthrotld]
    2    52     0     0 ?           -1 S        0   0:00  \_ [kworker/u8:1]
    2    53     0     0 ?           -1 S<       0   0:00  \_ [kmpath_rdacd]
    2    54     0     0 ?           -1 S<       0   0:00  \_ [kpsmoused]
    2    55     0     0 ?           -1 S<       0   0:00  \_ [ipv6_addrconf]
    2    74     0     0 ?           -1 S<       0   0:00  \_ [deferwq]
    2   106     0     0 ?           -1 S        0   0:00  \_ [kworker/3:2]
    2   107     0     0 ?           -1 S        0   0:00  \_ [kauditd]
    2   226     0     0 ?           -1 S        0   0:00  \_ [kworker/0:2]
    2   288     0     0 ?           -1 S<       0   0:00  \_ [ata_sff]
    2   296     0     0 ?           -1 S        0   0:00  \_ [scsi_eh_0]
    2   299     0     0 ?           -1 S<       0   0:00  \_ [scsi_tmf_0]
    2   300     0     0 ?           -1 S        0   0:00  \_ [scsi_eh_1]
    2   301     0     0 ?           -1 S<       0   0:00  \_ [scsi_tmf_1]
    2   303     0     0 ?           -1 S        0   0:00  \_ [kworker/u8:2]
    2   304     0     0 ?           -1 S        0   0:00  \_ [kworker/u8:3]
    2   305     0     0 ?           -1 S<       0   0:00  \_ [ttm_swap]
    2   316     0     0 ?           -1 S        0   0:00  \_ [kworker/1:2]
    2   320     0     0 ?           -1 S<       0   0:00  \_ [kworker/2:1H]
    2   331     0     0 ?           -1 S        0   0:00  \_ [kworker/2:2]
    2   400     0     0 ?           -1 S<       0   0:00  \_ [kdmflush]
    2   401     0     0 ?           -1 S<       0   0:00  \_ [bioset]
    2   412     0     0 ?           -1 S<       0   0:00  \_ [kdmflush]
    2   413     0     0 ?           -1 S<       0   0:00  \_ [bioset]
    2   426     0     0 ?           -1 S<       0   0:00  \_ [bioset]
    2   427     0     0 ?           -1 S<       0   0:00  \_ [xfsalloc]
    2   428     0     0 ?           -1 S<       0   0:00  \_ [xfs_mru_cache]
    2   429     0     0 ?           -1 S<       0   0:00  \_ [xfs-buf/dm-0]
    2   430     0     0 ?           -1 S<       0   0:00  \_ [xfs-data/dm-0]
    2   431     0     0 ?           -1 S<       0   0:00  \_ [xfs-conv/dm-0]
    2   432     0     0 ?           -1 S<       0   0:00  \_ [xfs-cil/dm-0]
    2   433     0     0 ?           -1 S<       0   0:00  \_ [xfs-reclaim/dm-]
    2   434     0     0 ?           -1 S<       0   0:00  \_ [xfs-log/dm-0]
    2   435     0     0 ?           -1 S<       0   0:00  \_ [xfs-eofblocks/d]
    2   436     0     0 ?           -1 S        0   0:00  \_ [xfsaild/dm-0]
    2   538     0     0 ?           -1 S<       0   0:00  \_ [rpciod]
    2   539     0     0 ?           -1 S<       0   0:00  \_ [xprtiod]
    2   596     0     0 ?           -1 S<       0   0:00  \_ [kworker/0:1H]
    2   597     0     0 ?           -1 S<       0   0:00  \_ [xfs-buf/vda1]
    2   598     0     0 ?           -1 S<       0   0:00  \_ [xfs-data/vda1]
    2   599     0     0 ?           -1 S<       0   0:00  \_ [xfs-conv/vda1]
    2   600     0     0 ?           -1 S<       0   0:00  \_ [xfs-cil/vda1]
    2   601     0     0 ?           -1 S<       0   0:00  \_ [xfs-reclaim/vda]
    2   602     0     0 ?           -1 S<       0   0:00  \_ [xfs-log/vda1]
    2   603     0     0 ?           -1 S<       0   0:00  \_ [xfs-eofblocks/v]
    2   604     0     0 ?           -1 S        0   0:00  \_ [xfsaild/vda1]
    2   607     0     0 ?           -1 S<       0   0:00  \_ [kworker/3:1H]
    2   608     0     0 ?           -1 S<       0   0:00  \_ [kworker/1:1H]
    2   612     0     0 ?           -1 S<       0   0:00  \_ [kdmflush]
    2   613     0     0 ?           -1 S<       0   0:00  \_ [bioset]
    2   620     0     0 ?           -1 S<       0   0:00  \_ [xfs-buf/dm-2]
    2   621     0     0 ?           -1 S<       0   0:00  \_ [xfs-data/dm-2]
    2   622     0     0 ?           -1 S<       0   0:00  \_ [xfs-conv/dm-2]
    2   623     0     0 ?           -1 S<       0   0:00  \_ [xfs-cil/dm-2]
    2   624     0     0 ?           -1 S<       0   0:00  \_ [xfs-reclaim/dm-]
    2   625     0     0 ?           -1 S<       0   0:00  \_ [xfs-log/dm-2]
    2   626     0     0 ?           -1 S<       0   0:00  \_ [xfs-eofblocks/d]
    2   627     0     0 ?           -1 S        0   0:00  \_ [xfsaild/dm-2]
    2   963     0     0 ?           -1 S<       0   0:00  \_ [nfsiod]
    2  1785     0     0 ?           -1 S        0   0:00  \_ [kworker/3:3]
    0     1     1     1 ?           -1 Ss       0   0:01 /usr/lib/systemd/systemd --switched-root --system --deserialize 20
    1   506   506   506 ?           -1 Rs       0   0:56 /usr/lib/systemd/systemd-journald
    1   541   541   541 ?           -1 Ss       0   0:00 /usr/lib/systemd/systemd-udevd
    1   549   549   549 ?           -1 Ss       0   0:00 /usr/sbin/lvmetad -f
    1   652   652   652 ?           -1 S<sl     0   0:00 /sbin/auditd
    1   675   675   675 ?           -1 Ssl    999   0:00 /usr/lib/polkit-1/polkitd --no-debug
    1   677   677   677 ?           -1 Ss       0   0:00 /usr/lib/systemd/systemd-logind
    1   679   679   679 ?           -1 Ssl      0   1:03 /usr/sbin/rsyslogd -n
    1   684   684   684 ?           -1 Ss       0   0:00 /usr/sbin/irqbalance --foreground
    1   686   686   686 ?           -1 Ss      81   0:00 /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --sys
    1   705   705   705 ?           -1 Ssl      0   0:00 /usr/sbin/gssproxy -D
    1   713   713   713 ?           -1 Ssl      0   0:00 /usr/sbin/NetworkManager --no-daemon
    1   943   943   943 ?           -1 Ssl      0   0:00 /usr/bin/python -Es /usr/sbin/tuned -l -P
    1   957   957   957 ?           -1 Ss       0   0:00 /usr/sbin/sshd -D
  957  1507  1507  1507 ?           -1 Ss       0   0:00  \_ sshd: root@pts/0
 1507  1510  1510  1510 pts/0     1510 Ss+      0   0:00  |   \_ -bash
 1510  1608  1608  1510 pts/0     1510 S       72   0:00  |       \_ tcpdump -w dumps/hesperia-nfs-003 -i eth0 -s 0 host 10.201.
  957  1655  1655  1655 ?           -1 Ss       0   0:00  \_ sshd: root@pts/1
 1655  1658  1658  1658 pts/1     1688 Ss       0   0:00  |   \_ -bash
 1658  1688  1688  1658 pts/1     1688 D+       0   0:01  |       \_ rsync -azv --del --stats --progress /hesperiamount/isnet1/ 
 1688  1689  1688  1658 pts/1     1688 S+       0   0:07  |           \_ rsync -azv --del --stats --progress /hesperiamount/isne
 1689  1690  1688  1658 pts/1     1688 S+       0   0:00  |               \_ rsync -azv --del --stats --progress /hesperiamount/
  957  1803  1803  1803 ?           -1 Ss       0   0:00  \_ sshd: root@pts/2
 1803  1806  1806  1806 pts/2     1833 Ss       0   0:00      \_ -bash
 1806  1833  1833  1806 pts/2     1833 R+       0   0:00          \_ ps axjf
    1   974   974   974 ?           -1 Ss       0   0:00 /usr/sbin/httpd -DFOREGROUND
  974  1448   974   974 ?           -1 S       48   0:00  \_ /usr/sbin/httpd -DFOREGROUND
  974  1449   974   974 ?           -1 S       48   0:00  \_ /usr/sbin/httpd -DFOREGROUND
  974  1450   974   974 ?           -1 S       48   0:00  \_ /usr/sbin/httpd -DFOREGROUND
  974  1451   974   974 ?           -1 S       48   0:00  \_ /usr/sbin/httpd -DFOREGROUND
  974  1452   974   974 ?           -1 S       48   0:00  \_ /usr/sbin/httpd -DFOREGROUND
  974  1485   974   974 ?           -1 S       48   0:00  \_ /usr/sbin/httpd -DFOREGROUND
  974  1541   974   974 ?           -1 S       48   0:00  \_ /usr/sbin/httpd -DFOREGROUND
    1   978   978   978 ?           -1 Ss       0   0:00 /usr/sbin/crond -n
  978  1559   978   978 ?           -1 S        0   0:00  \_ /usr/sbin/CROND -n
 1559  1565  1565  1565 ?           -1 Ss    1001   0:00  |   \_ /bin/sh -c /hesperiamount/isnet1/check_processes_work_well > /d
 1565  1582  1565  1565 ?           -1 S     1001   0:00  |       \_ /usr/bin/python /hesperiamount/isnet1/check_processes_work_
  978  1561   978   978 ?           -1 S        0   0:00  \_ /usr/sbin/CROND -n
 1561  1569  1569  1569 ?           -1 Ss    1001   0:00  |   \_ /bin/sh -c /hesperiamount/isnet1/GetUmasepLastFile >> /hesperia
 1569  1579  1569  1569 ?           -1 S     1001   0:00  |       \_ /bin/sh /hesperiamount/isnet1/GetUmasepLastFile
 1579  1594  1569  1569 ?           -1 S     1001   0:00  |           \_ ftp -p -n -v spaceweather.uma.es
  978  1724   978   978 ?           -1 S        0   0:00  \_ /usr/sbin/CROND -n
 1724  1740  1740  1740 ?           -1 Ss    1001   0:00  |   \_ /bin/sh -c /hesperiamount/isnet1/check_processes_work_well > /d
 1740  1750  1740  1740 ?           -1 S     1001   0:00  |       \_ /usr/bin/python /hesperiamount/isnet1/check_processes_work_
  978  1726   978   978 ?           -1 S        0   0:00  \_ /usr/sbin/CROND -n
 1726  1742  1742  1742 ?           -1 Ss    1001   0:00      \_ /bin/sh -c /hesperiamount/isnet1/GetUmasepLastFile >> /hesperia
 1742  1744  1742  1742 ?           -1 S     1001   0:00          \_ /bin/sh /hesperiamount/isnet1/GetUmasepLastFile
 1744  1751  1742  1742 ?           -1 S     1001   0:00              \_ ftp -p -n -v spaceweather.uma.es
    1  1011  1011  1011 tty1      1011 Ss+      0   0:00 /sbin/agetty --noclear tty1 linux
    1  1040  1040  1040 ?           -1 Ss      27   0:00 /bin/sh /usr/bin/mysqld_safe --basedir=/usr
 1040  1312  1040  1040 ?           -1 Sl      27   0:00  \_ /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-d
    1  1229  1229  1229 ?           -1 Ss     998   0:00 /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
    1  1358  1358  1358 ?           -1 Ss       0   0:00 /usr/libexec/postfix/master -w
 1358  1360  1358  1358 ?           -1 S       89   0:00  \_ pickup -l -t unix -u
 1358  1361  1358  1358 ?           -1 S       89   0:00  \_ qmgr -l -t unix -u
    1  1607  1576  1576 ?           -1 S     1001   0:00 python umasep500_1_minute
[root@hesperia1 ~]# 
[root@hesperia1 ~]# 
[root@hesperia1 ~]# 
[root@hesperia1 ~]# ls -la /hesperiamount2/
total 12
drwxrwxrwx   6 root     root   274 Mar  2  2017 .
dr-xr-xr-x. 22 root     root  4096 Sep 22 06:50 ..
drwxr-xr-x  16 isnet1   isnet 3604 Apr 26 15:26 isnet1
drwxr-xr-x   3 root     root   153 Mar  2  2017 ocloud_store
drwxrwxr-x  19 release1 isnet 1660 Feb 28  2017 release1
drwxrwxrwx   6 root     root   457 Oct  6 08:31 .snapshot
[root@hesperia1 ~]# 

<Session hangs - Much later...>

[root@hesperia1 ~]# Disconnecting: Timeout, server not responding.
----------------------------------------------------------

In any new terminal window (SSH Session) that I open, if I attempt to list the mounted directory, the session hangs:

----------------------------------------------------------
Terminal Window 4
-----------------

[root@hesperia1 ~]# ls -la /hesperiamount2

<hangs forever>
----------------------------------------------------------

What is going wrong?

Comment 6 Nikolaos Milas 2017-10-06 10:05:46 UTC
Created attachment 1335178 [details]
TCP Dump between the box and the NFS server - Test on 2017-10-06

The TCP dump records packets during the test performed on 2017-10-06; please see the associated report below

Comment 7 Nikolaos Milas 2017-10-06 10:08:02 UTC
Created attachment 1335179 [details]
/var/log/messages file for the period that the test on 2017-10-06 was performed

Comment 8 Nikolaos Milas 2017-10-20 08:33:40 UTC
The problem was finally traced to a Cisco ASA bug (this firewall device sits between the connected networks); bug CSCuq80704 was resolved by an ASA software update.

NFS packets were incorrectly being dropped by the ASA:

Drop-reason: (tcp-paws-fail) TCP packet failed PAWS test

...and were causing NFS traffic to stall. After the ASA software upgrade the problem has not occurred again.
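
For readers unfamiliar with the drop reason above: PAWS (Protection Against Wrapped Sequences, RFC 7323) rejects a TCP segment whose timestamp value is older than the most recent timestamp accepted on that connection, using 32-bit wraparound arithmetic. The following is only a minimal sketch of that comparison. It is not the ASA's implementation, the function names are hypothetical, and it leaves out the RFC's exceptions (for example RST segments and long-idle connections).

```python
# Minimal sketch of the PAWS timestamp comparison described in RFC 7323.
# Illustration only: function names are hypothetical and this is not how
# any particular firewall implements the check.

MOD = 1 << 32    # TCP timestamps are 32-bit unsigned values
HALF = 1 << 31   # half the timestamp space, used for wraparound ordering


def ts_before(a: int, b: int) -> bool:
    """Return True if timestamp a is strictly older than b, modulo 2**32."""
    return 0 < ((b - a) % MOD) < HALF


def paws_rejects(seg_tsval: int, ts_recent: int) -> bool:
    """Return True if a segment with TSval seg_tsval fails the PAWS test
    against ts_recent, the last timestamp accepted on the connection."""
    return ts_before(seg_tsval, ts_recent)
```

A device that tracks `ts_recent` incorrectly would see legitimate segments fail this comparison and drop them, which from the client side would look like the stalled NFS traffic reported here.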

I can't tell why this did not happen for many months and only started recently.

I think this case may be closed.

