Red Hat Bugzilla – Bug 1361339
[RFE] improve domain detection method with a more robust one in the bootstrap.py
Last modified: 2017-05-01 09:53:35 EDT
Description of problem:
I installed a fresh Sat6.2 to experiment, since it was released today/yesterday. I tried to register the first system with the new bootstrap.py program, but it failed when it queried the API for a wrong domain.

Version-Release number of selected component (if applicable):
6.2.0

How reproducible:
Always (but requires a typo)

Steps to Reproduce:
[root@rhevm1 ~]# ./bootstrap.py -l admin -s testsat3.example.com -o 'FakeCorp' -L 'Central' -a RHEL6-RHEV-M -g RHEL6-RHEV-M
admin's password:
Foreman Bootstrap Script
This script is designed to register new systems or to migrate an existing system to a Foreman server with Katello
[NOTIFICATION], [2016-07-28 14:55:27], [This system is not registered to RHN. Attempting to register via subscription-manager]
[NOTIFICATION], [2016-07-28 14:55:27], [Retrieving Client CA Certificate RPMs]
[RUNNING], [2016-07-28 14:55:27], [rpm -Uvh http://testsat3.example.com/pub/katello-ca-consumer-latest.noarch.rpm]
Recuperando http://testsat3.example.com/pub/katello-ca-consumer-latest.noarch.rpm
Preparando... ##################################################
katello-ca-consumer-testsat3##################################################
[SUCCESS], [2016-07-28 14:55:30], [rpm -Uvh http://testsat3.example.com/pub/katello-ca-consumer-latest.noarch.rpm], completed successfully.
[ERROR], [2016-07-28 14:55:31], EXITING: [0 element in array for search key 'name="exmaple.com"' in API '/api/v2/domains'. Please note that all searches are case-sensitive. Fatal error.] failed to execute properly.

Wow, I got confused by this: name="exmaple.com"

I checked and rechecked the Sat 6.2 instance, thinking I had a typo there somewhere. But no, all was OK in the Sat6.2. So I modified bootstrap.py, adding some print() statements as a debug measure, and traced the value to "FQDN = socket.getfqdn()" in the bootstrap.py program.
And then read: https://github.com/ansible/ansible/issues/9972

The bootstrap error message is confusing, because:
* It already logged in to the Sat6.2 instance, and can get the domains list via API.
* I specified a Host Group in the bootstrap, and that Host Group has both a correct DOMAIN and SUBNET specified. So I did not expect the bootstrap to go prospecting for the domain outside of the Sat6.2 instance (and get a wrong value).
* I was not aware of BZ#1343585, the domain must exist. Now I am. (But in this case the domain exists in Sat6.2 anyway.)
* It does not request it explicitly. (I am not claiming that it should.)

So I went to the system that I am trying to register:

[root@rhevm1 ~]# hostname
rhevm1.example.com
[root@rhevm1 ~]# hostname -f
rhevm1.example.com
[root@rhevm1 etc]# hostname -d
example.com

The hostname is OK, so let's test DNS resolution:

[root@rhevm1 ~]# dig testsat3.example.com
; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.37.rc1.el6_7.5 <<>> testsat3.example.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 36896
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;testsat3.example.com. IN A
;; ANSWER SECTION:
testsat3.example.com. 0 IN A 192.168.207.93
;; Query time: 0 msec
;; SERVER: 192.168.207.25#53(192.168.207.25)
;; WHEN: Thu Jul 28 16:12:36 2016
;; MSG SIZE rcvd: 54

# dig rhevm1.example.com
; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.37.rc1.el6_7.5 <<>> rhevm1.example.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 36379
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;rhevm1.example.com. IN A
;; ANSWER SECTION:
rhevm1.example.com. 0 IN A 192.168.207.47
;; Query time: 0 msec
;; SERVER: 192.168.207.25#53(192.168.207.25)
;; WHEN: Thu Jul 28 16:15:26 2016
;; MSG SIZE rcvd: 52

Well, the DNS resolution does work. Even uname reports the correct nodename.
[root@rhevm1 ~]# uname -a
Linux rhevm1.example.com 2.6.32-573.12.1.el6.x86_64 #1 SMP Mon Nov 23 12:55:32 EST 2015 x86_64 x86_64 x86_64 GNU/Linux
[root@rhevm1 ~]# uname -n
rhevm1.example.com

Even the config files are OK:

[root@rhevm1 ~]# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=rhevm1.example.com

In the end I found the typo:

[root@rhevm1 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.207.47 rhevm1.exmaple.com rhevm1
192.168.207.46 hyper1.example.com

Well, I never noticed it because everything else (RHEV-M, CFME, other Sat6.1) was using DNS requests and getting the correct FQDN.

The interesting thing is that "rhevm1" is neither the hostname nor the FQDN of the system (it is the output of 'hostname -s', though). So I am puzzled by the heuristic/logic that "FQDN = socket.getfqdn()" uses to bypass the hostname and FQDN and hit an alias.

"rhevm1.example.com" is the output of both 'hostname' and 'hostname -f', and also of 'uname -a' and 'uname -n', and it is in the config files.
"rhevm1.exmaple.com" is NOT the output of either 'hostname' or 'hostname -f', nor of 'uname -a' or 'uname -n', and it is not in the config files.

Did bootstrap ever check whatever it thought the FQDN was with a DNS request? ('hostname -d' returns the correct domain.)

Granted, the hosts file is wrong (so I opened this as an RFE and not a bug), but can the reported error message be improved somehow? I think that a proper DNS sanity check should be performed, or maybe something other than "socket.getfqdn()" should be used. Are there more robust alternatives to "socket.getfqdn()"? (I mean, ones that are able to catch this or something worse.)

Actual results:
The error message from the bootstrap script is not very clear; it reads more like a problem on the Sat6.2 side than on the client.
Expected results:
A robust method to get the FQDN of the system to be registered.

Additional info:
I tried a quick lookup in the documentation ("Administration Guide" and "Installation Guide") but I found no references to bootstrap.py. In the "Administration Guide" the old method is still listed. I opened both PDFs and searched for the string 'bootstrap' and found no results. Is it possible that the usage of the bootstrap.py script is not currently documented, or did I just miss it? If it is already documented, a note emphasizing to check the hosts file for typos should probably be added. (Just testing proper DNS resolution would not be enough.)

Cheers.
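A possible explanation for the behavior described above, offered as a sketch rather than a confirmed diagnosis: Python's socket.getfqdn() hands the hostname to socket.gethostbyaddr() and returns the first name containing a dot among the canonical name and aliases that come back. If the resolver consults /etc/hosts before DNS, the line "192.168.207.47 rhevm1.exmaple.com rhevm1" would supply the misspelled name even though 'hostname -f' looks correct. The helper names below (pick_fqdn, fqdn_warnings, live_warnings) are hypothetical and are not part of bootstrap.py.

```python
import socket


def pick_fqdn(canonical, aliases):
    """Return the first dotted name among canonical + aliases, else canonical.

    This mirrors the selection rule used by socket.getfqdn().
    """
    for name in [canonical] + list(aliases):
        if "." in name:
            return name
    return canonical


def fqdn_warnings(kernel_name, resolver_name):
    """Compare the kernel hostname with the resolver's answer.

    Hypothetical sanity check: returns a list of warnings, empty if all is well.
    """
    warnings = []
    if "." not in resolver_name:
        warnings.append("resolver returned a short name: %r" % resolver_name)
    if "." in kernel_name and kernel_name.lower() != resolver_name.lower():
        warnings.append(
            "hostname is %r but the resolver reports %r; check /etc/hosts"
            % (kernel_name, resolver_name)
        )
    return warnings


def live_warnings():
    """Run the check on the current machine; the result depends on local config."""
    return fqdn_warnings(socket.gethostname(), socket.getfqdn())
```

With the values from this report, fqdn_warnings("rhevm1.example.com", "rhevm1.exmaple.com") returns a single warning pointing at /etc/hosts, which is the kind of client-side message that would have made the typo easier to find.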
Fixed in this upstream commit - https://github.com/Katello/katello-client-bootstrap/commit/49ec1ae44a7463f5f615ad6d59fccd21d9da6bda
Please add verification steps for this bug to help QE verify.
Verification steps for this bug are the same as those here - https://bugzilla.redhat.com/show_bug.cgi?id=1425606#c4
We may want to rebase to katello-client-bootstrap-1.3.0 (https://github.com/Katello/katello-client-bootstrap/releases/tag/1.3.0) to address this.
Verified in Satellite 6.2.9 Snap 2

The script now catches the short hostname, and a short name fqdn.

-bash-4.1# docker run -it -h shawty ch-d:bootstrap /bin/bash
[root@shawty ~]# ./bootstrap.py -s mgmt5.rhq.lab.eng.bos.redhat.com -o 'Default Organization' -g basic -a basickey -L 'Default Location'
Foreman Bootstrap Script
This script is designed to register new systems or to migrate an existing system to a Foreman server with Katello
We could not determine the domain of this machine, most probably `hostname -f` does not return the FQDN.
This can lead to Puppet missbehaviour and thus the script will terminate now.
You can override this by passing one of the following
    --force - to disable all checking
    --skip-puppet - to omit installing the puppet agent

[root@shawty ~]# hostname
shawty
[root@shawty ~]# ./bootstrap.py -s mgmt5.rhq.lab.eng.bos.redhat.com -o 'Default Organization' -g basic -a basickey -L 'Default Location' --fqdn $(hostname)
Foreman Bootstrap Script
This script is designed to register new systems or to migrate an existing system to a Foreman server with Katello
We could not determine the domain of this machine, most probably `hostname -f` does not return the FQDN.
This can lead to Puppet missbehaviour and thus the script will terminate now.
You can override this by passing one of the following
    --force - to disable all checking
    --skip-puppet - to omit installing the puppet agent
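The guard exercised in the verification above can be sketched roughly as follows. This is an illustration of the observed behavior only, not the upstream implementation: the option names --fqdn, --force and --skip-puppet come from the output above, while parse_args and domain_check are hypothetical helpers.

```python
import argparse


def parse_args(argv):
    """Hypothetical stand-in for the relevant bootstrap.py options."""
    parser = argparse.ArgumentParser(description="domain check sketch")
    parser.add_argument("--fqdn", default=None)
    parser.add_argument("--force", action="store_true")
    parser.add_argument("--skip-puppet", action="store_true")
    return parser.parse_args(argv)


def domain_check(detected_fqdn, options):
    """Return (ok, domain) for the detected or user-supplied FQDN.

    The domain is everything after the first dot. A short name yields an
    empty domain, which is only accepted with --force or --skip-puppet.
    """
    fqdn = options.fqdn or detected_fqdn
    domain = fqdn.partition(".")[2]
    ok = bool(domain) or options.force or options.skip_puppet
    return ok, domain
```

Under these assumptions, a host named "shawty" fails the check whether the name is detected or passed via --fqdn, matching both runs shown above, while adding --force lets the script continue.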
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:1191