1370241 – NFS mounts don't work on diskless clients when ipv6 is disabled



Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	doc-System_Administrators_Guide
Sub Component:
Version:	7.3
Hardware:	All
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	7.4
Assignee:	Maxim Svistunov
QA Contact:	ecs-bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1293430 1381646

Reported:	2016-08-25 17:05 UTC by Ben Woodard
Modified:	2020-07-16 08:52 UTC
CC List:	22 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-12-11 07:49:34 UTC
Target Upstream Version:
Embargoed:

Attachments
sosreport (10.70 MB, application/x-xz) 2016-08-31 21:07 UTC, Trent D'Hooge

Links
System	ID	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	1293430	unspecified	CLOSED	Add localhost:111 to rpcbind socket activation	2021-02-22 00:41:40 UTC
Red Hat Knowledge Base (Solution)	8709	None	None	None	2016-11-07 11:30:49 UTC
Red Hat Knowledge Base (Solution)	2798411	None	None	None	2016-12-08 20:37:22 UTC

Internal Links: 1293430

Description Ben Woodard 2016-08-25 17:05:53 UTC

Description of problem:
NFS mounts do not work on 7.3 beta
This is found in the logs:

systemctl --state=failed|grep rpc 
* rpc-statd.service              loaded failed failed NFS status monitor for NFSv2/3 locking. 
* rpcbind.socket                 loaded failed failed RPCbind Server Activation Socket 

Aug 24 14:16:53 quartz410 systemd[1]: rpcbind.socket failed to listen on sockets: Address family not supported by protocol 
Aug 24 14:16:53 quartz410 systemd[1]: Failed to listen on RPCbind Server Activation Socket. 
Aug 24 14:16:53 quartz410 systemd[1]: Unit rpcbind.socket entered failed state. 

Removing

ListenStream=[::]:111 
ListenStream=0.0.0.0:111 
BindIPv6Only=ipv6-only 

fixes the problem, which suggests that something expects IPv6 to be working early in the boot process when it isn't.

It appears like the fix applied for BZ#1293430 has introduced a regression. 

Version-Release number of selected component (if applicable):
rpcbind-0.2.0-38.el7.x86_64

Comment 3 Travis Gummels 2016-08-26 12:48:46 UTC

The reported defect is reproduced using the stock kernel as provided with 7.3 beta.

Comment 5 Steve Dickson 2016-08-27 18:07:35 UTC

(In reply to Ben Woodard from comment #0)
> Aug 24 14:16:53 quartz410 systemd[1]: rpcbind.socket failed to listen on
> sockets: Address family not supported by protocol 
> Aug 24 14:16:53 quartz410 systemd[1]: Failed to listen on RPCbind Server
> Activation Socket. 
> Aug 24 14:16:53 quartz410 systemd[1]: Unit rpcbind.socket entered failed
> state.
These are systemd errors... they are not coming from rpcbind. 
> 
> Removing
> 
> ListenStream=[::]:111 
> ListenStream=0.0.0.0:111 
> BindIPv6Only=ipv6-only 
> 
> fixes the problem. Which suggests that something expects ipv6 to be working
> early in the boot process that isn't?
Removing just avoids the problem... it does not fix it... 

> 
> It appears like the fix applied for BZ#1293430 has introduced a regression. 
Which is needed for rpcbind to listen for IPv6 connections... w/out this
change there would be a regression...

I'm going to reassign this to the systemd people so they can help
debug what is really going on there... The bottom line is it's not
rpcbind failing to come up... It's systemd failing to bring up an
IPv6 socket for rpcbind to listen on.

Comment 6 Michal Sekletar 2016-08-28 08:51:20 UTC

Can you please reproduce the issue, gather sos_report and attach it to bugzilla?

Comment 7 Trent D'Hooge 2016-08-31 21:07:54 UTC

Created attachment 1196515 [details]
sosreport

Comment 8 Trent D'Hooge 2016-08-31 21:08:26 UTC

sosreport has been attached.

Comment 9 Michal Sekletar 2016-09-02 12:37:40 UTC

Thanks for the sosreport. From the logs in the sosreport, it looks like the socket unit ended up in the failed state because systemd could not start listening on the IPv6 socket. This is because IPv6 is disabled on that box.

grep -E 'net\.ipv6.conf\.[[:alnum:]]+\.disable_ipv6.*' sosreport-tdhooge.1370241-20160831135122/sos_commands/kernel/sysctl_-a

net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.eno1.disable_ipv6 = 1
net.ipv6.conf.enp3s0f3.disable_ipv6 = 1
net.ipv6.conf.hsi0.disable_ipv6 = 1
net.ipv6.conf.hsi1.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

The socket unit file lists several sockets, and if systemd cannot bind all of them, the socket unit fails to start. This is expected behavior. Moving back to rpcbind.

I am not sure what the best fix would be. Probably two distinct socket units that both activate the same service (rpcbind).
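
A minimal shell sketch of the check described above, for anyone who wants to confirm which entries in a socket unit depend on IPv6. The function names ipv6_listen_streams and ipv6_disabled are hypothetical helpers, not part of systemd or rpcbind. Note that ipv6_disabled only reads the disable_ipv6 sysctl, so it assumes IPv6 was turned off that way (as on the reporter's systems) rather than with a kernel boot parameter.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative only.

# Print the ListenStream= entries of a socket unit file that use an IPv6
# address. IPv6 listen addresses are written in brackets, e.g. [::]:111.
ipv6_listen_streams() {
    grep -E '^[[:space:]]*ListenStream=\[' "$1"
}

# Succeed if the given disable_ipv6 sysctl file contains 1.
# Defaults to the "all" setting; a missing file counts as "not disabled".
ipv6_disabled() {
    [ "$(cat "${1:-/proc/sys/net/ipv6/conf/all/disable_ipv6}" 2>/dev/null)" = "1" ]
}
```

For example, `ipv6_disabled && ipv6_listen_streams /usr/lib/systemd/system/rpcbind.socket` prints the entries that would be expected to fail to bind on a system like the one in the sosreport.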

Comment 10 Travis Gummels 2016-09-02 15:03:20 UTC

Trent can correct me, but I believe LLNL does not run IPv6 in their environment for security reasons. They are working around the defect by removing the ListenStream line from the config.

Comment 11 Trent D'Hooge 2016-09-02 15:24:44 UTC

We don't use IPv6 at this time, so we have

net.ipv6.conf.all.disable_ipv6 = 1

set in our sysctl.conf

Comment 12 Yongcheng Yang 2016-09-05 05:40:56 UTC

I have reproduced this issue by disabling IPv6.

Steps to Reproduce:
1. echo "net.ipv6.conf.all.disable_ipv6 = 1" > /etc/sysctl.conf
2. reboot the system

Actual results:
[root@hp-dl380pg8-09 ~]# sysctl net.ipv6.conf.all.disable_ipv6
net.ipv6.conf.all.disable_ipv6 = 1
[root@hp-dl380pg8-09 ~]# cat /etc/sysctl.conf
net.ipv6.conf.all.disable_ipv6 = 1
[root@hp-dl380pg8-09 ~]# 
[root@hp-dl380pg8-09 ~]# systemctl --state=failed
  UNIT           LOAD   ACTIVE SUB    DESCRIPTION
● rpcbind.socket loaded failed failed RPCbind Server Activation Socket

[snip]
[root@hp-dl380pg8-09 ~]# systemctl status rpcbind.socket -l
● rpcbind.socket - RPCbind Server Activation Socket
   Loaded: loaded (/usr/lib/systemd/system/rpcbind.socket; enabled; vendor preset: enabled)
   Active: failed (Result: resources)
   Listen: /var/run/rpcbind.sock (Stream)
           [::]:111 (Stream)
           0.0.0.0:111 (Stream)

Sep 04 23:47:54 localhost.localdomain systemd[1]: rpcbind.socket failed to listen on sockets: Address family not supported by protocol
Sep 04 23:47:54 localhost.localdomain systemd[1]: Failed to listen on RPCbind Server Activation Socket.
Sep 04 23:47:54 localhost.localdomain systemd[1]: Unit rpcbind.socket entered failed state.
Sep 04 23:47:54 localhost.localdomain systemd[1]: Starting RPCbind Server Activation Socket.
[root@hp-dl380pg8-09 ~]# 
[root@hp-dl380pg8-09 ~]# systemctl restart nfs
Job for nfs-server.service failed because the control process exited with error code. See "systemctl status nfs-server.service" and "journalctl -xe" for details.
[root@hp-dl380pg8-09 ~]# systemctl status nfs
● nfs-server.service - NFS server and services
   Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2016-09-05 00:55:49 EDT; 23min ago
  Process: 2845 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS (code=exited, status=1/FAILURE)
  Process: 2841 ExecStartPre=/usr/sbin/exportfs -r (code=exited, status=0/SUCCESS)
 Main PID: 2845 (code=exited, status=1/FAILURE)

Sep 05 00:02:59 hp-dl380pg8-09.rhts.eng.pek2.redhat.com systemd[1]: Starting NFS server and services...
Sep 05 00:34:25 hp-dl380pg8-09.rhts.eng.pek2.redhat.com rpc.nfsd[2845]: rpc.nfsd: writing fd to kernel failed: errno 110 (Connection timed out)
Sep 05 00:55:49 hp-dl380pg8-09.rhts.eng.pek2.redhat.com rpc.nfsd[2845]: rpc.nfsd: unable to set any sockets for nfsd
Sep 05 00:55:49 hp-dl380pg8-09.rhts.eng.pek2.redhat.com systemd[1]: nfs-server.service: main process exited, code=exited, status=1/FAILURE
Sep 05 00:55:49 hp-dl380pg8-09.rhts.eng.pek2.redhat.com systemd[1]: Failed to start NFS server and services.
Sep 05 00:55:49 hp-dl380pg8-09.rhts.eng.pek2.redhat.com systemd[1]: Unit nfs-server.service entered failed state.
Sep 05 00:55:49 hp-dl380pg8-09.rhts.eng.pek2.redhat.com systemd[1]: nfs-server.service failed.
[root@hp-dl380pg8-09 ~]#

Comment 37 Maxim Svistunov 2016-12-11 07:49:34 UTC

The new text has been published to the Customer Portal:

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/System_Administrators_Guide/sec-Verifying_the_Initial_RAM_Disk_Image.html

Comment 38 Robert Story 2016-12-26 23:11:17 UTC

I see this closed and marked currentrelease. AFAIK 7.3 is the current release and it still has this problem..

Comment 40 Travis Gummels 2016-12-27 21:19:40 UTC

(In reply to Robert Story from comment #38)
> I see this closed and marked currentrelease. AFAIK 7.3 is the current
> release and it still has this problem..

There was no defect to resolve. Root cause analysis resulted in the documentation update noted in comment 37. End users who change sysctl.conf or other sysctl files that are used early in the boot process need to be aware that they likely need to run dracut -f to rebuild the initramfs so that the changes are propagated into it. The documentation was updated for the current release, which is why this bug is closed as CURRENTRELEASE.
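
The rebuild-and-verify step described above can be sketched in shell. This is a sketch under assumptions: sysctl_entries is a hypothetical filter, not a dracut tool, and it assumes that lsinitrd prints one line per file in the image with the file's path on that line.

```shell
#!/bin/sh
# Hypothetical filter: read an initramfs file listing on stdin and keep
# only the lines that name a sysctl.conf file or a file under a sysctl.d
# directory. Other paths, such as the sysctl binary, are dropped.
sysctl_entries() {
    grep -E '(^|[[:space:]/])sysctl\.(conf|d/)'
}
```

After changing a sysctl file, `dracut -f` rebuilds the initramfs for the running kernel, and `lsinitrd | sysctl_entries` then shows whether any sysctl files ended up in the image. An empty result suggests the change has not been propagated.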

Comment 41 Vide 2017-01-20 11:43:26 UTC

Hello

I've the same problem with EL7.3 (specifically with ipv6 disabled in sysctl.conf) but I can't get Dracut to put sysctl.conf in the initrd image.

# sysctl net.ipv6.conf.all.disable_ipv6=1
net.ipv6.conf.all.disable_ipv6 = 1
# sysctl net.ipv6.conf.default.disable_ipv6=1
net.ipv6.conf.default.disable_ipv6 = 1
# grep ipv6 /etc/sysctl.conf 
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.all.disable_ipv6 = 1
# dracut -f
# uname -a
Linux centos73.billydomain.com 3.10.0-514.6.1.el7.x86_64 #1 SMP Wed Jan 18 13:06:36 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
# lsinitrd --kver 3.10.0-514.6.1.el7.x86_64 -f /etc/sysctl.conf
#
