Description of problem:
The ypserv YP server in F16 no longer works with Mac OS X's ypbind client. The same setup worked fine on F14.

Version-Release number of selected component (if applicable):
ypserv-2.26-9.fc16.x86_64
rpcbind-0.2.0-15.fc16.x86_64

How reproducible:
Worked fine in F14, doesn't work anymore in F16. The iptables firewall was disabled just in case.

Steps to Reproduce:
1. Install + configure ypserv on F16
2. Configure YP client on Mac OS X
3. Try ypwhich/ypcat on OS X, notice nothing works as expected

Actual results:
osx $ ypwhich
...

Expected results:
osx $ ypwhich
my.yp.lan

Additional info:
I started several daemons in debug mode:

osx $ rpcinfo -p f16srv
   program vers proto   port
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100004    2   udp    800  ypserv
    100004    1   udp    800  ypserv
    100004    2   tcp    803  ypserv
    100004    1   tcp    803  ypserv

Looks good so far...

f16srv # rpcbind -d
local: 0 lookup routines :
rpcbind : my address is (null)
FUNCTION rbllist_addAdd the prog 100000 vers 3 to the rpcbind list
FUNCTION rbllist_addAdd the prog 100000 vers 4 to the rpcbind list
check binding for local
udp: 0 lookup routines :
rpcbind : my address is 0.0.0.0.0.111
FUNCTION rbllist_addAdd the prog 100000 vers 2 to the rpcbind list
FUNCTION rbllist_addAdd the prog 100000 vers 3 to the rpcbind list
FUNCTION rbllist_addAdd the prog 100000 vers 4 to the rpcbind list
check binding for udp
rmtcall fd for udp is 7
tcp: 0 lookup routines :
rpcbind : my address is 0.0.0.0.0.111
FUNCTION rbllist_addAdd the prog 100000 vers 2 to the rpcbind list
FUNCTION rbllist_addAdd the prog 100000 vers 3 to the rpcbind list
FUNCTION rbllist_addAdd the prog 100000 vers 4 to the rpcbind list
check binding for tcp
udp6: 0 lookup routines :
rpcbind : my address is ::.0.111
FUNCTION rbllist_addAdd the prog 100000 vers 3 to the rpcbind list
FUNCTION rbllist_addAdd the prog 100000 vers 4 to the rpcbind list
check binding for udp6
rmtcall fd for udp6 is 10
tcp6: 0 lookup routines :
rpcbind : my address is ::.0.111
FUNCTION rbllist_addAdd the prog 100000 vers 3 to the rpcbind list
FUNCTION rbllist_addAdd the prog 100000 vers 4 to the rpcbind list
check binding for tcp6
polling for read on fd < 5 6 7 8 9 10 11 >
poll returned read fds < 6 >
PMAP_UNSET request for (100004, 2) : Checking caller's adress (port = 804) succeeded
svc_maxfd now 11
polling for read on fd < 5 6 7 8 9 10 11 >
poll returned read fds < 6 >
PMAP_UNSET request for (100004, 1) : Checking caller's adress (port = 805) succeeded
svc_maxfd now 11
polling for read on fd < 5 6 7 8 9 10 11 >
poll returned read fds < 6 >
pmap_rmtcall callit req for (100004, 2, 2, udp) from 192.168.0.112.221.46 : not found
svc_maxfd now 11
polling for read on fd < 5 6 7 8 9 10 11 >
poll returned read fds < 6 >
PMAP_UNSET request for (100004, 2) : Checking caller's adress (port = 798) succeeded
svc_maxfd now 11
polling for read on fd < 5 6 7 8 9 10 11 >
poll returned read fds < 6 >
PMAP_UNSET request for (100004, 1) : Checking caller's adress (port = 799) succeeded
svc_maxfd now 11
polling for read on fd < 5 6 7 8 9 10 11 >
poll returned read fds < 6 >
PMAP_SET request for (100004, 2) : Checking caller's adress (port = 801) succeeded
svc_maxfd now 11
polling for read on fd < 5 6 7 8 9 10 11 >
poll returned read fds < 6 >
PMAP_SET request for (100004, 1) : Checking caller's adress (port = 802) succeeded
svc_maxfd now 11
polling for read on fd < 5 6 7 8 9 10 11 >
poll returned read fds < 6 >
PMAP_SET request for (100004, 2) : Checking caller's adress (port = 804) succeeded
svc_maxfd now 11
polling for read on fd < 5 6 7 8 9 10 11 >
poll returned read fds < 6 >
PMAP_SET request for (100004, 1) : Checking caller's adress (port = 805) succeeded
svc_maxfd now 11
polling for read on fd < 5 6 7 8 9 10 11 >
poll returned read fds < 6 >
pmap_rmtcall callit req for (100004, 2, 2, udp) from 192.168.0.112.221.46 : found at uaddr 0.0.0.0.3.32

... hm?

f16srv # ypserv -d
Find securenet: 255.255.255.0 192.168.0.0
Find securenet: 255.255.255.255 127.0.0.1
ypserv.conf: files: 30
ypserv.conf: xfr_check_port: 1
ypserv.conf: 0.0.0.0/0.0.0.0:*:shadow.byname:2
ypserv.conf: 0.0.0.0/0.0.0.0:*:passwd.adjunct.byname:2
ypproc_domain_nonack("my.yp.lan") [From: 127.0.0.1:797]
connect from 127.0.0.1
-> OK.
ypproc_domain_nonack("my.yp.lan") [From: 127.0.0.1:797]
connect from 127.0.0.1
-> OK.
ypproc_domain_nonack("my.yp.lan") [From: 127.0.0.1:797]
connect from 127.0.0.1
-> OK.
ypproc_domain_nonack("my.yp.lan") [From: 127.0.0.1:797]
connect from 127.0.0.1
-> OK.
ypproc_domain_nonack("my.yp.lan") [From: 127.0.0.1:797]
connect from 127.0.0.1
-> OK.
...

Why is ypserv seeing requests from localhost??! There's no ypbind running on f16srv.

osx # ypbind -d
ypbind: ypbindproc_domain_2 my.yp.lan
ypbind: dead domain my.yp.lan

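A note on the "found at uaddr 0.0.0.0.3.32" line in the rpcbind output above: rpcbind prints addresses in the "universal address" form, where for IPv4 the last two dot-separated fields are the high and low bytes of the port. The sketch below decodes such a string; the helper name parse_uaddr is made up for illustration and is not part of rpcbind, and it only handles the IPv4 form (not the "::.0.111" IPv6 entries).

```python
def parse_uaddr(uaddr):
    """Split an IPv4 universal address 'h1.h2.h3.h4.p1.p2' into (host, port)."""
    fields = uaddr.split(".")
    if len(fields) != 6:
        raise ValueError("expected 6 dot-separated fields, got %d" % len(fields))
    # The first four fields are the IPv4 address, the last two encode the port.
    port_hi, port_lo = int(fields[4]), int(fields[5])
    if not (0 <= port_hi <= 255 and 0 <= port_lo <= 255):
        raise ValueError("port bytes must be in the range 0..255")
    return ".".join(fields[:4]), port_hi * 256 + port_lo


print(parse_uaddr("0.0.0.0.3.32"))          # ('0.0.0.0', 800)
print(parse_uaddr("192.168.0.112.221.46"))  # ('192.168.0.112', 56622)
```

Decoded this way, 0.0.0.0.3.32 is simply port 800 on any local address, which matches the ypserv UDP port in the rpcinfo listing, so the lookup result itself looks plausible.
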
FreeBSD and Fedora16 clients seem to be affected as well. Can somebody confirm that ypserv on F16 actually works?
(In reply to comment #1)
> FreeBSD and Fedora16 clients seem to be affected as well. Can somebody confirm
> that ypserv on F16 actually works?

I'm only able to test Fedora/RHEL packages, but I see no issues. If you can elaborate a bit more on what might be wrong, I'll take a closer look. Do you see anything suspicious in syslog (server or client side) when using a Fedora client?

I'm sorry, you're right, F16 NIS clients work fine indeed. But I still have problems with OSX and *BSD clients and I don't know why. :(

As soon as I start ypbind on OSX or *BSD I see this when running ypserv -d on F16:

ypproc_domain_nonack("my.yp.lan") [From: 127.0.0.1:1010]
connect from 127.0.0.1
-> OK.
ypproc_domain_nonack("my.yp.lan") [From: 127.0.0.1:1010]
connect from 127.0.0.1
-> OK.
ypproc_domain_nonack("my.yp.lan") [From: 127.0.0.1:1010]
connect from 127.0.0.1
-> OK.
ypproc_domain_nonack("my.yp.lan") [From: 127.0.0.1:1010]
connect from 127.0.0.1
-> OK.
ypproc_domain_nonack("my.yp.lan") [From: 127.0.0.1:1010]
connect from 127.0.0.1
-> OK.
ypproc_domain_nonack("my.yp.lan") [From: 127.0.0.1:1010]
connect from 127.0.0.1
-> OK.
ypproc_domain_nonack("my.yp.lan") [From: 127.0.0.1:1010]
connect from 127.0.0.1
-> OK.
...

With an F16 NIS client I see connections from the 192.168.0.0/24 network (i.e. the F16 NIS client IP). I also tried downgrading ypserv and rpcbind using F14 packages, which didn't help. So I'm kinda lost here, as ypbind on *BSD and OSX provides no debug mechanisms. Will try installing Solaris tomorrow and report back.

So, a Solaris 10 NIS client of course works fine with the F16 NIS server, but! A Solaris 10 NIS server also works with *BSD clients...

To sum it up:
F14 YP server works fine
F16 YP server works with F16 and Solaris clients, but not with *BSD and Mac OS X clients
Solaris YP server works fine

Maybe someone could d/l any *BSD, install it in VirtualBox and confirm that F16 ypserv doesn't work with *BSD ypbind?

(In reply to comment #4)
> Maybe someone could d/l any *BSD, install it in VirtualBox and confirm that F16
> ypserv doesn't work with *BSD ypbind?

It works for me in a VM with a Fedora 16 server and a FreeBSD 9.0 client.

(In reply to comment #3)
> ypproc_domain_nonack("my.yp.lan") [From: 127.0.0.1:1010]
> connect from 127.0.0.1
> -> OK.

Sounds to me like some kind of hostname misconfiguration. Do your hostnames conflict somehow?

Thanks for checking it out, it's really weird that this is working for you out of the box. Anyway, I tried to nail it down again and found this:

freebsd9# ypbind
freebsd9# ypwhich
ypwhich: can't yp_bind: reason: Domain not bound
freebsd9# killall ypbind

freebsd9# ypbind -S my.yp.lan,f16srv -m
freebsd9# ypwhich
f16srv
freebsd9# killall ypbind

freebsd9# ypbind -ypset; sleep 1; ypset f16srv
freebsd9# ypwhich
f16srv

ypbind's -m is:

-m  Cause ypbind to use a 'many-cast' rather than a broadcast for choosing a server from the restricted mode server list. In many-cast mode, ypbind will transmit directly to the YPPROC_DOMAIN_NONACK procedure of the servers specified in the restricted list and bind to the server that responds the fastest. This mode of operation is useful for NIS clients on remote subnets where no local NIS servers are available. The -m flag can only be used in conjunction with the -S flag above (if used without the -S flag, it has no effect).

ypset does this:

The ypset utility tells the ypbind(8) process on the current machine which YP server process to communicate with.

I also logged the server side:

f16srv # ypserv -d

when running 'ypbind' on freebsd9:
ypproc_domain_nonack("my.yp.lan") [From: 127.0.0.1:1010]
connect from 127.0.0.1
-> OK.

when running 'ypbind -S my.yp.lan,f16srv -m' on freebsd9:
ypproc_domain_nonack("my.yp.lan") [From: 192.168.0.3:795]
connect from 192.168.0.3
-> OK.

when running 'ypbind -ypset; sleep 1; ypset f16srv' on freebsd9:
ypproc_domain_nonack("my.yp.lan") [From: 127.0.0.1:1010]
connect from 127.0.0.1
-> OK.
ypproc_domain("my.yp.lan") [From: 192.168.0.3:905]
connect from 192.168.0.3
-> Ok.

So the question is, why does it *not* work when FreeBSD's ypbind is in 'broadcast' mode?

(In reply to comment #0)
> ypproc_domain_nonack("my.yp.lan") [From: 127.0.0.1:797]
> connect from 127.0.0.1
> -> OK.
> ...
> Why is ypserv seeing requests from localhost??! There's no ypbind running on
> f16srv.

It's because ypbind sends a broadcast message to all local machines, where rpcbind tests whether ypserv is running; that test is the call shown above. That's fine.

(In reply to comment #6)
> So the question is, why does it *not* work when FreeBSD's ypbind is in
> 'broadcast' mode?

I've looked into it a bit, and it turned out that even the current Fedora NIS client doesn't work in broadcast mode. So I tried downgrading rpcbind together with libtirpc, and this is the result:

ypbind in broadcast mode *works* with the following older builds on the NIS server:
$ rpm -q libtirpc rpcbind
libtirpc-0.2.2-0.fc16.x86_64
rpcbind-0.2.0-11.fc16.x86_64

But ypbind in broadcast mode *doesn't work* with the current builds on the NIS server:
$ rpm -q libtirpc rpcbind
libtirpc-0.2.2-1.1.fc16.x86_64
rpcbind-0.2.0-15.fc16.x86_64

Since rpcbind was only converted from SysV init to systemd, I suspect libtirpc to be the problem. There is a large patch porting changes from libtirpc-0.2.3-rc1. However, the current rpcbind-0.2.0-15.fc16.x86_64 doesn't work with the older libtirpc-0.2.2-0.fc16.x86_64 (rpcbind segfaults, at least in debug mode).

How to reproduce:
1. install, configure and run ypserv on the NIS server
2. install ypbind on the NIS client
3. turn off the firewall on client and server
4. set the domainname on the NIS client to match the server (domainname "mydomain")
5. run ypbind -d -broadcast

Actual results:
# ypbind -d -broadcast
...
6718: add_server() domain: mydomain, broadcast
6718: do_broadcast() for domain 'mydomain' is called
6718: broadcast: RPC: Timed out.
6718: leave do_broadcast() for domain 'mydomain'
...
[nis-client] $ ypwhich
ypwhich: Can't communicate with ypbind

Expected results:
...
6723: add_server() domain: mydomain, broadcast
6723: do_broadcast() for domain 'mydomain' is called
6723: Answer for domain 'mydomain' from server 'f16-x64-nis-server'
6723: leave do_broadcast() for domain 'mydomain'
...
[nis-client] $ ypwhich
f16-x64-nis-server

Additional info:
According to tcpdump output, there is no UDP response from the server's rpcbind to the client's broadcast request when using the current rpcbind-0.2.0-15.fc16.x86_64 and libtirpc-0.2.2-1.1.fc16.x86_64.

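For anyone who wants to poke at the server without a ypbind client: the broadcast that rpcbind logs as "callit req for (100004, 2, 2, udp)" is a portmap v2 CALLIT call wrapping YPPROC_DOMAIN_NONACK. The following is only a rough sketch of such a request, assuming AUTH_NONE credentials and the standard ONC RPC / XDR encoding; it is not byte-identical to what ypbind sends, and the function names are invented for this example.

```python
import socket
import struct

# Program, version and procedure numbers from the portmap v2 and YP v2 protocols.
PMAP_PROG, PMAP_VERS, PMAPPROC_CALLIT = 100000, 2, 5
YP_PROG, YP_VERS, YPPROC_DOMAIN_NONACK = 100004, 2, 2


def xdr_opaque(data: bytes) -> bytes:
    """XDR variable-length opaque/string: 4-byte length, data, zero padding to 4 bytes."""
    pad = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * pad


def build_callit(domain: str, xid: int) -> bytes:
    """Build an RPC CALL to PMAPPROC_CALLIT wrapping YPPROC_DOMAIN_NONACK(domain)."""
    # RPC call header: xid, msg_type=CALL(0), rpcvers=2, prog, vers, proc.
    header = struct.pack(">6I", xid, 0, 2, PMAP_PROG, PMAP_VERS, PMAPPROC_CALLIT)
    # AUTH_NONE credential and verifier: flavor 0, zero-length body (an assumption).
    auth_none = struct.pack(">2I", 0, 0)
    # Arguments of the wrapped call: the domain name as an XDR string.
    inner_args = xdr_opaque(domain.encode("ascii"))
    call_args = struct.pack(">3I", YP_PROG, YP_VERS, YPPROC_DOMAIN_NONACK)
    call_args += xdr_opaque(inner_args)
    return header + auth_none + auth_none + call_args


def broadcast_callit(packet: bytes, broadcast_addr: str, timeout: float = 3.0):
    """Send the packet to UDP port 111; return the first replier's address or None."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.settimeout(timeout)
        sock.sendto(packet, (broadcast_addr, 111))
        try:
            _reply, sender = sock.recvfrom(4096)
        except socket.timeout:
            return None
        return sender
```

Hypothetical usage, with the domain name and broadcast address as placeholders for your own network: broadcast_callit(build_callit("mydomain", 0x12345678), "192.168.122.255"). If the behaviour matches this report, an affected server would be expected to produce no reply before the timeout, while a working one should answer.
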
Any updates on this one?
I've just run into the same problem on CentOS 6.2 (RHEL 6.2). Broadcast connections from a CentOS 5 client do not work, but when I specify the server name in /etc/yp.conf, ypbind binds to the NIS domain.

rpcbind-0.2.0.8.el6
libtirpc-0.2.1.5.el6
ypserv-2.19.22.el6

I don't see any bugs reported for this against RHEL 6 or CentOS 6.

Hi all, we have just migrated our NIS server to CentOS 6.2 (from 5.7), and now none of our Mac OS X clients that did NIS logins can do so any longer. I'm in the process of installing BSD on a real (not virtual) machine to see if there is the same behaviour.
OK, it would seem FreeBSD is the same. We have two domains: one is on CentOS 5.7, the other on 6.2. Configuring NIS for domain1 works, configuring it for domain2 doesn't. All other clients apart from Mac OS X and FreeBSD work.
On 04/23/2012 01:28 PM, Steve Dickson wrote:
> I have a feeling broadcast mode has been broken a long time... I seem
> to remember when I maintained the code, broadcasts didn't work...

Last working builds I found are:
libtirpc-0.2.2-1.1.fc16.x86_64
rpcbind-0.2.0-15.fc16.x86_64

...so it seems it last worked almost a year back.

> Does turning on debug for rpcbind (-d) show the broadcast reaching rpcbind?
> I just took a quick look at the code and it's not clear if rpcbind is
> listening for broadcasts or not...

I think it reaches rpcbind, but it doesn't send a response. This is the debug output of rpcbind related to the ypbind broadcast request:

poll returned read fds < 6 >
pmap_rmtcall callit req for (100004, 2, 2, udp) from 192.168.122.42.133.220 : found at uaddr 0.0.0.0.3.121
addrmerge(caller, 0.0.0.0.3.121, NULL, udp
addrmerge: hint 127.0.0.1.0.111
addrmerge: returning 127.0.0.1.3.121
addrmerge(caller, 0.0.0.0.3.121, NULL, udp
addrmerge: hint 192.168.122.42.133.220
addrmerge: returning 192.168.122.223.3.121
merged uaddr 192.168.122.223.3.121
rpcbproc_callit_com: original XID 62c39783, new XID e55bc6c0
svc_maxfd now 11
polling for read on fd < 5 6 7 8 9 10 11 >
poll returned read fds < 7 >
my_svc_run: polled on forwarding fd 7, netid udp - calling handle_reply
handle_reply: reply xid: -446970176 fi addr: 0x7f6977fdae00
polling for read on fd < 5 6 7 8 9 10 11 >

(In reply to comment #12)
> On 04/23/2012 01:28 PM, Steve Dickson wrote:
> > I have a feeling broadcast mode has been broken a long time... I seem
> > to remember when I maintained the code, broadcasts didn't work...
>
> Last working builds I found are:
> libtirpc-0.2.2-1.1.fc16.x86_64
> rpcbind-0.2.0-15.fc16.x86_64

These are the builds that are currently in f16, at least on my updated box...

$ rpm -q libtirpc rpcbind
libtirpc-0.2.2-1.1.fc16.x86_64
rpcbind-0.2.0-15.fc16.x86_64

But there has been some churn in both packages

libtirpc-0.2.2-0.fc16 to libtirpc-0.2.2-1.1.fc16
(http://koji.fedoraproject.org/koji/buildinfo?buildID=254334)

rpcbind-0.2.0-11.fc16 - rpcbind-0.2.0-15.fc16
(http://koji.fedoraproject.org/koji/buildinfo?buildID=263222)

So something definitely could have broken... Would you mind looking back to see which version things did work in?

> ...so it seems it last worked almost a year back.
>
> > Does turning on debug for rpcbind (-d) show the broadcast reaching rpcbind?
> > I just took a quick look at the code and it's not clear if rpcbind is
> > listening for broadcasts or not...
>
> I think it reaches rpcbind, but it doesn't send a response. This is the debug
> output of rpcbind related to the ypbind broadcast request:
>
> poll returned read fds < 6 >
> pmap_rmtcall callit req for (100004, 2, 2, udp) from 192.168.122.42.133.220 :

This means the call got there...

> found at uaddr 0.0.0.0.3.121

This means "something" was found.

> addrmerge(caller, 0.0.0.0.3.121, NULL, udp

"addrmerge finds a server address that can be used by `caller' to contact the local service specified by `serv_uaddr'" (0.0.0.0.3.121)

> addrmerge: hint 127.0.0.1.0.111
> addrmerge: returning 127.0.0.1.3.121
> addrmerge(caller, 0.0.0.0.3.121, NULL, udp
> addrmerge: hint 192.168.122.42.133.220
> addrmerge: returning 192.168.122.223.3.121
> merged uaddr 192.168.122.223.3.121

This means something was found. Is the 192.168.122.223 IP meaningful?

> rpcbproc_callit_com: original XID 62c39783, new XID e55bc6c0

This means a call to 192.168.122.223 is being set up.

> svc_maxfd now 11

The lack of errors at this point means the call was probably successful.

> polling for read on fd < 5 6 7 8 9 10 11 >
> poll returned read fds < 7 >

This means another call came in...

> my_svc_run: polled on forwarding fd 7, netid udp - calling handle_reply

This means it's a reply to a previous call...

> handle_reply: reply xid: -446970176 fi addr: 0x7f6977fdae00

This means the reply was found and the tirpc routine svc_sendreply() was called... unfortunately the return value of svc_sendreply() is not checked...

> polling for read on fd < 5 6 7 8 9 10 11 >

This means rpcbind is waiting for another message...

At least from the rpcbind standpoint, the message was received and sent....

Just curious whether a network trace from 'tshark host 192.168.122.223' shows any traffic (assuming 192.168.122.223 is meaningful)...

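A small cross-check on the log walk-through above: rpcbind prints the forwarded XID in hex ("new XID e55bc6c0") but the reply XID as a signed decimal ("reply xid: -446970176"). Reinterpreting the 32-bit value shows they are the same number, so the reply picked up on fd 7 does belong to the forwarded call. A minimal sketch, with an illustrative helper name:

```python
import struct


def as_signed32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    return struct.unpack(">i", struct.pack(">I", value & 0xFFFFFFFF))[0]


print(as_signed32(0xE55BC6C0))  # -446970176
```

This only confirms that rpcbind matched the reply to the pending call; it says nothing about whether svc_sendreply() then managed to put a packet on the wire.
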
(In reply to comment #13)
> (In reply to comment #12)
> > Last working builds I found are:
> > libtirpc-0.2.2-1.1.fc16.x86_64
> > rpcbind-0.2.0-15.fc16.x86_64
> These are the builds that are currently in f16, at least on my updated box...
> $ rpm -q libtirpc rpcbind
> libtirpc-0.2.2-1.1.fc16.x86_64
> rpcbind-0.2.0-15.fc16.x86_64
>
> But there has been some churn in both packages
> libtirpc-0.2.2-0.fc16 to libtirpc-0.2.2-1.1.fc16
> (http://koji.fedoraproject.org/koji/buildinfo?buildID=254334)
>
> rpcbind-0.2.0-11.fc16 - rpcbind-0.2.0-15.fc16
> (http://koji.fedoraproject.org/koji/buildinfo?buildID=263222)
>
> So something definitely could have broken... Would you mind looking
> back to see which version things did work in?

Sorry, I made a mistake. The last working builds I found were:

$ rpm -q libtirpc rpcbind
libtirpc-0.2.2-0.fc16.x86_64
rpcbind-0.2.0-11.fc16.x86_64

...so generally the builds before the churn you mentioned.

> > addrmerge: hint 127.0.0.1.0.111
> > addrmerge: returning 127.0.0.1.3.121
> > addrmerge(caller, 0.0.0.0.3.121, NULL, udp
> > addrmerge: hint 192.168.122.42.133.220
> > addrmerge: returning 192.168.122.223.3.121
> > merged uaddr 192.168.122.223.3.121
> This means something was found. Is the 192.168.122.223 IP meaningful?

192.168.122.223 is the server where rpcbind + ypserv are running; 192.168.122.42 is a client where ypbind -broadcast is running. So that seems correct to me.

> Just curious whether a network trace from 'tshark host 192.168.122.223'
> shows any traffic (assuming 192.168.122.223 is meaningful)...

Using the current builds, tshark shows only the following incoming traffic:

0.000000 192.168.122.42 -> 192.168.122.255 Portmap 162 V2 CALLIT Call

After I downgraded to libtirpc-0.2.2-0.fc16.x86_64 and rpcbind-0.2.0-11.fc16.x86_64, tshark shows:

6.297871 192.168.122.42 -> 192.168.122.255 Portmap 162 V2 CALLIT Call
6.298537 192.168.122.223 -> 192.168.122.42 UDP 78 Source port: multiling-http Destination port: 54231
6.304615 192.168.122.42 -> 192.168.122.255 Portmap 162 V2 CALLIT Call
6.305191 192.168.122.223 -> 192.168.122.42 UDP 78 Source port: multiling-http Destination port: 49289

Created attachment 579877
working workaround

I've played a bit with rpcbind and libtirpc and found some interesting things:

First, the current rpcbind-0.2.0-15.fc16 from koji segfaults with the older libtirpc-0.2.2-0.fc16 or libtirpc-0.2.1-6.fc15, but it works fine if I rebuild the same rpcbind against the same older libtirpc myself. There is no change in soname, but I'd say there should be, since the segfault suggests an ABI incompatibility to me. Please consider a minor soname bump.

Second, I tried to debug the communication and found that the problem really was in svc_sendreply, which in turn calls svc_dg_reply. Some authentication-related changes were made in svc_dg_reply, so I reverted some of them; the attached patch is the minimal change that works. I won't investigate further, since I don't understand the authentication code at all. Please take a look at the changes made to svc_dg_reply since libtirpc-0.2.1-6.

Thank you for your excellent debugging! I'm travelling today but I will get back to this asap...
I confirm that the patch in attachment 579877 fixes all problems with (rpcbind-mediated) RPC broadcasts.

I have made some fixed packages available here:
http://rpm.fifi.org/f17-fifi/i386/libtirpc-0.2.2-2.1.0.0.1.fif17.i686.rpm
http://rpm.fifi.org/f17-fifi/i386/libtirpc-devel-0.2.2-2.1.0.0.1.fif17.i686.rpm
http://rpm.fifi.org/f17-fifi/x86_64/libtirpc-0.2.2-2.1.0.0.1.fif17.x86_64.rpm
http://rpm.fifi.org/f17-fifi/x86_64/libtirpc-devel-0.2.2-2.1.0.0.1.fif17.x86_64.rpm

Phil.

*** Bug 732327 has been marked as a duplicate of this bug. ***
The same patch could be applied to libtirpc-0.2.1-5.el6 and it worked for me on RHEL 6.3 x86_64.
Encountered this issue just now migrating our CentOS 5 NIS servers to CentOS 6. All our OS X clients were unable to bind to NIS. With the patch in comment 17 applied to libtirpc-0.2.1-5.el6, everything appears to be resolved.
Any hope of getting this patch pushed to RHEL 6 and Fedora 16, 17 and 18? Our OS X clients are hopelessly broken, and we are hoping it will fix rup and rusers as described in bug 732327.
It has been a month since Elliott asked, so I'll nudge again: Any hope of getting this patch pushed to RHEL 6?
(In reply to comment #22)
> It has been a month since Elliott asked, so I'll nudge again: Any hope of
> getting this patch pushed to RHEL 6?

Fortunately yes, this issue should be fixed by bug #864056, which is going to be fixed in RHEL-6.4.

And there are also updates for Fedora:
https://admin.fedoraproject.org/updates/FEDORA-2012-16150/rpcbind-0.2.0-19.fc17

This should be fixed by bug #869365 for Fedora, so I'm closing this as a duplicate. Feel free to re-open if I'm mistaken.

*** This bug has been marked as a duplicate of bug 869365 ***