Bug 1422090 - Rpcbind does not work in the container
Summary: Rpcbind does not work in the container
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: rhgs-server-container
Version: cns-3.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: CNS 3.6
Assignee: Mohamed Ashiq
QA Contact: Prasanth
URL:
Whiteboard:
Duplicates: 1457126 1457617
Depends On: 1427806
Blocks: 1433735 1445447 1445448
 
Reported: 2017-02-14 12:50 UTC by Mohamed Ashiq
Modified: 2017-10-11 06:58 UTC
CC List: 17 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, all services in the Red Hat Gluster Storage container connected to the rpcbind service inside the container. With this update, every service now connects to the rpcbind service on the host node.
Clone Of:
Environment:
Last Closed: 2017-10-11 06:58:29 UTC
Embargoed:




Links
System ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla 1457126 | 1 | None | None | None | 2021-01-20 06:05:38 UTC
Red Hat Bugzilla 1457617 | 0 | unspecified | CLOSED | glusterd fails to start due to dependency issues | 2021-02-22 00:41:40 UTC
Red Hat Product Errata RHEA-2017:2877 | 0 | normal | SHIPPED_LIVE | rhgs-server-container bug fix and enhancement update | 2017-10-11 11:11:39 UTC

Internal Links: 1457126 1457617

Description Mohamed Ashiq 2017-02-14 12:50:44 UTC
Description of problem:
glusterd fails to start because its dependency on rpcbind fails.
rpcbind fails because rpcbind.socket in release rpcbind-0.2.0-38.el7.x86_64 explicitly binds to port 111 on the host at startup, whereas in previous releases it bound only on request. Since we run the container with --net=host, the rpcbind inside the container and the one on the host now compete for the same port. Had we been using the host's (OpenShift node's) rpcbind for the NFS server, we would have hit this earlier.
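To see the collision directly, check from the host whether rpcbind.socket already holds port 111 (illustrative commands, assuming systemd and the ss utility on the host; not part of the original report):

# systemctl status rpcbind.socket
# ss -tlnp | grep ':111'

With --net=host there is only one network namespace, so once the host's rpcbind.socket holds port 111, the rpcbind inside the container cannot bind it and fails, taking glusterd's dependency down with it.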

Workarounds (sketched as commands below):
1) The image works if rpcbind.socket and rpcbind.service are stopped on the host.
2) Disable rpcbind in the container, since we are not using NFS in the gluster containers yet (this is the approach followed in the cns-3.4 release on image rhgs3/rhgs-server-rhel7:3.1.3; see BZ#1397255 for more information).
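As a rough sketch, the two workarounds map to the following commands (unit names as used elsewhere in this bug; illustrative, not from the original report):

On the host (workaround 1):

# systemctl stop rpcbind.socket rpcbind.service

Inside the container (workaround 2):

# systemctl stop rpcbind.socket rpcbind.service
# systemctl disable rpcbind.socket rpcbind.service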

Expectation:
*) Make rpcbind work in containers.

Comment 2 Ju Lim 2017-03-10 15:32:12 UTC
Relates to RHEL7 BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1427806

Comment 4 Humble Chirammal 2017-06-22 11:24:29 UTC
*** Bug 1457617 has been marked as a duplicate of this bug. ***

Comment 5 Humble Chirammal 2017-06-22 11:24:32 UTC
*** Bug 1457126 has been marked as a duplicate of this bug. ***

Comment 7 Michael Adam 2017-07-18 12:29:48 UTC
According to findings in the RHEL BZ #1427806, this is solved as follows:

1) The rpcbind service has to run on the host
   (nodes that are supposed to run gluster containers).

2) We need a small change to the gluster container Dockerfile.
   This change is already done and will be shipped with the next
   gluster container image build. (A hypothetical sketch of the
   container-side change follows.)
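The Dockerfile change itself is not shown in this bug; purely as a hypothetical sketch, a build step could mask the rpcbind units so that systemd inside the container can never start them:

# systemctl mask rpcbind.service rpcbind.socket

Masking points the unit files at /dev/null, which is stronger than disabling: even a dependency pulled in by another service cannot start a masked unit.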

Comment 8 Michael Adam 2017-07-25 13:50:54 UTC
OK. With a little more analysis, the situation seems to be this:

1) A change on the host (OCP/RHEL) has led to rpcbind being started
   by default, while it was not started by default before.

2) Originally the gluster containers started rpcbind, but in the
   CNS builds this dependency was removed for the RHGS 3.2.0
   release in February, since gluster containers don't need
   rpcbind.

   ==> I.e. CNS 3.5 images should not have a problem!

   ==> I don't know how OCP QE could have run into this issue.

3) Now, in preparation for CNS 3.6, the new gluster-blockd component
   in the RHGS containers does require rpcbind. Hence, when testing
   with the new CNS 3.6 containers, we hit the problem due to the
   changed host behavior.

   ==> The solution is to *not* start rpcbind in the container, ever,
       and to always rely on rpcbind running on the host (a quick
       check is sketched below).

   ==> Changed CNS 3.6 gluster builds are expected tomorrow (July 26).
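Because the containers run with --net=host, one way to confirm that a container is using the host's rpcbind is to query the portmapper on localhost from inside the container (illustrative; assumes the rpcinfo utility is available there):

# rpcinfo -p 127.0.0.1

If the host's rpcbind is running, this lists the registered RPC programs (portmapper itself, plus gluster-blockd's registrations once it is up) even though no rpcbind process exists inside the container.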

Summary questions for Brenton:

* Is it true that OCP 3.6 has the changed behavior of always
  starting rpcbind on the host?

* How has OCP QE possibly hit this issue for OCP 3.6?
  Have they possibly been using upstream images instead of RHGS/CNS images?

Comment 9 Wenkai Shi 2017-07-28 05:47:43 UTC
(In reply to Michael Adam from comment #8)
> Summary questions for Brenton:
> 
> * Is it true that OCP 3.6 has the changed behavior of always
>   starting rpcbind on the host?
According to the code, rpcbind is only started by the openshift_storage_nfs_lvm role, which means that in the CNS case it won't be started.
 
$ grep -nir "rpcbind"
roles/openshift_storage_nfs_lvm/tasks/nfs.yml:6:- name: Start rpcbind
roles/openshift_storage_nfs_lvm/tasks/nfs.yml:8:    name: rpcbind

> 
> * How has OCP QE possibly hit this issue for OCP 3.6?
>   Have they possibly been using upstream images instead of RHGS/CNS images?

Our QE is using brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhgs3/rhgs-server-rhel7; the tag is "latest" by default. In BZ #1457126, the latest was 3.3.0-7, which had the problem.
Now the latest is 3.3.0-9, which works well. Thank you :)

Comment 10 krishnaram Karthick 2017-09-14 06:24:45 UTC
With the latest CNS 3.6 builds, rpcbind is now run on the host instead of in the containers.

Verified in build cns-deploy-5.0.0-34.el7rhgs.x86_64.

Comment 11 Prasanth 2017-09-14 10:59:45 UTC
The following steps seem to be a prerequisite now, and the same has been documented in our CNS 3.6 guide [1] as well:

#########
 Execute the following commands to enable and run rpcbind on all the nodes hosting the gluster pod:

# systemctl add-wants multi-user.target rpcbind.service
# systemctl enable rpcbind.service
# systemctl start rpcbind.service
#########
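To verify the prerequisite on each node, the unit state can be checked afterwards (illustrative commands, not from the guide):

# systemctl is-enabled rpcbind.service
# systemctl is-active rpcbind.service

Both should report enabled/active before gluster pods are scheduled onto the node.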


[1] https://access.qa.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html-single/container-native_storage_for_openshift_container_platform/#chap-Documentation-Red_Hat_Gluster_Storage_Container_Native_with_OpenShift_Platform-Setting_the_environment-Preparing_RHOE

Comment 13 Raghavendra Talur 2017-10-04 15:31:37 UTC
Doc text looks good to me.

Comment 15 errata-xmlrpc 2017-10-11 06:58:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:2877

