Description of problem: While installing undercloud with FreeIPA, deployment has been failed with the below error: TASK [Request kerberos keytab] ***************************************************************************************************************************************************** Friday 19 March 2021 19:23:42 -0400 (0:00:00.286) 0:01:58.254 ********** fatal: [osp16]: FAILED! => {"changed": true, "cmd": "/usr/bin/kinit -kt /etc/krb5.keytab && ipa-getkeytab -s $(grep xmlrpc_uri /etc/ipa/default.conf | cut -d/ -f3) -p nova/osp16.5 c5s.local -k /etc/novajoin/krb5.keytab", "delta": "0:00:00.356700", "end": "2021-03-19 19:23:43.534849", "msg": "non-zero return code", "rc": 11, "start": "2021-03-19 19:23:43.1781 49", "stderr": "Failed to add key to the keytab", "stderr_lines": ["Failed to add key to the keytab"], "stdout": "", "stdout_lines": []} After debugging with redhat support, the root case is, /etc/novajoin directory node created in Undercloud. After creating novajoin directory manually. I did the below changed to complete the undercloud install [stack@osp16 etc]$ sudo mkdir novajoin [stack@osp16 etc]$ sudo cp -r krb5.keytab novajoin/ [stack@osp16 etc]$ chmod 777 novajoin chmod: changing permissions of 'novajoin': Operation not permitted [stack@osp16 etc]$ sudo chmod 777 novajoin [stack@osp16 etc]$ Version-Release number of selected component (if applicable): How reproducible: Create Undercloud with FreeIPA Please refer to support case: https://access.redhat.com/support/cases/#/case/02897947 https://access.redhat.com/support/cases/#/case/02791164 Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Additional info:
for reference, those steps are performed here: https://github.com/openstack/tripleo-heat-templates/blob/stable/train/deployment/tls/undercloud-tls.yaml#L81
Actually, the steps in https://github.com/openstack/tripleo-heat-templates/blob/stable/train/deployment/tls/undercloud-tls.yaml#L81 are those of a service to simply enroll the undercloud in IPA. For novajoin deployments, we actually end up using - https://github.com/openstack/tripleo-heat-templates/blob/stable/train/deployment/nova/novajoin-container-puppet.yaml#L260 which does not have a command to create the directory. This is very strange though. We've never needed this before, but that might be because we always installed the novajoin rpm which does provide this file. Maybe with containers though ..
I do see the undercloud installation completes after creating the novajoin directory manually as mentioned in description. But the novajoin containers are continuously restarting. === [stack@osp16 ~]$ sudo podman ps | grep novajo [stack@osp16 ~]$ sudo podman ps | grep novajo 30cf0ddd698c osp16.ctlplane.5c5s.local:8787/5c5s45-satellite-rhosp-16-rhel-8-product-containers-rhosp-rhel8_openstack-novajoin-notifier:16.1.3 kolla_start 3 days ago Up 1 second ago novajoin_notifier 0f85925a9d46 osp16.ctlplane.5c5s.local:8787/5c5s45-satellite-rhosp-16-rhel-8-product-containers-rhosp-rhel8_openstack-novajoin-server:16.1.3 kolla_start 3 days ago Up 1 second ago novajoin_server [stack@osp16 ~]$ [stack@osp16 ~]$ [stack@osp16 ~]$ sudo podman ps | grep novajo 30cf0ddd698c osp16.ctlplane.5c5s.local:8787/5c5s45-satellite-rhosp-16-rhel-8-product-containers-rhosp-rhel8_openstack-novajoin-notifier:16.1.3 kolla_start 4 days ago Up Less than a second ago novajoin_notifier 0f85925a9d46 osp16.ctlplane.5c5s.local:8787/5c5s45-satellite-rhosp-16-rhel-8-product-containers-rhosp-rhel8_openstack-novajoin-server:16.1.3 kolla_start 4 days ago Up Less than a second ago novajoin_server [stack@osp16 ~]$ ==== he undercloud /etc/hosts have the freeipa ip hostname map already present. # BEGIN ANSIBLE MANAGED BLOCK 192.168.24.1 osp16.5c5s.local osp16 192.168.24.1 osp16.external.5c5s.local osp16.external 192.168.24.1 osp16.ctlplane.5c5s.local osp16.ctlplane # END ANSIBLE MANAGED BLOCK 192.168.24.252 freeipa.5c5s.local 10.87.86.51 5c5s45-satellite.englab.juniper.net 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6 ==== While looking for podman logs -f from novajoin_server container. I can see this error: ++ . /usr/local/bin/kolla_novajoin_extend_start + echo 'Running command: '\''novajoin-server --config-file /etc/novajoin/join.conf'\''' Running command: 'novajoin-server --config-file /etc/novajoin/join.conf' + exec novajoin-server --config-file /etc/novajoin/join.conf + sudo -E kolla_set_configs INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json INFO:__main__:Validating config file INFO:__main__:Kolla config strategy set to: COPY_ALWAYS INFO:__main__:Copying service configuration files INFO:__main__:Deleting /etc/novajoin/join.conf INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/novajoin/join.conf to /etc/novajoin/join.conf INFO:__main__:Writing out command to execute ++ cat /run_command + CMD='novajoin-server --config-file /etc/novajoin/join.conf' + ARGS= + sudo kolla_copy_cacerts + [[ ! -n '' ]] + . kolla_extend_start ++ [[ ! -d /var/log/kolla/novajoin ]] +++ stat -c %a /var/log/kolla/novajoin ++ [[ 2755 != \7\5\5 ]] ++ chmod 755 /var/log/kolla/novajoin Running command: 'novajoin-server --config-file /etc/novajoin/join.conf' ++ . /usr/local/bin/kolla_novajoin_extend_start + echo 'Running command: '\''novajoin-server --config-file /etc/novajoin/join.conf'\''' + exec novajoin-server --config-file /etc/novajoin/join.conf + sudo -E kolla_set_configs INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json INFO:__main__:Validating config file INFO:__main__:Kolla config strategy set to: COPY_ALWAYS INFO:__main__:Copying service configuration files INFO:__main__:Deleting /etc/novajoin/join.conf INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/novajoin/join.conf to /etc/novajoin/join.conf INFO:__main__:Writing out command to execute ====== This is the logs from novajoin_notifier container: ========================= INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json INFO:__main__:Validating config file INFO:__main__:Kolla config strategy set to: COPY_ALWAYS INFO:__main__:Copying service configuration files INFO:__main__:Deleting /etc/novajoin/join.conf INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/novajoin/join.conf to /etc/novajoin/join.conf INFO:__main__:Writing out command to execute ++ cat /run_command + CMD='novajoin-notify --config-file /etc/novajoin/join.conf' + ARGS= + sudo kolla_copy_cacerts + [[ ! -n '' ]] + . kolla_extend_start ++ [[ ! -d /var/log/kolla/novajoin ]] +++ stat -c %a /var/log/kolla/novajoin ++ [[ 2755 != \7\5\5 ]] ++ chmod 755 /var/log/kolla/novajoin ++ . /usr/local/bin/kolla_novajoin_extend_start + echo 'Running command: '\''novajoin-notify --config-file /etc/novajoin/join.conf'\''' Running command: 'novajoin-notify --config-file /etc/novajoin/join.conf' + exec novajoin-notify --config-file /etc/novajoin/join.conf Traceback (most recent call last): File "/usr/bin/novajoin-notify", line 6, in <module> from novajoin.notifications import main File "/usr/lib/python3.6/site-packages/novajoin/notifications.py", line 33, in <module> from novajoin import join File "/usr/lib/python3.6/site-packages/novajoin/join.py", line 78, in <module> class JoinController(Controller): File "/usr/lib/python3.6/site-packages/novajoin/join.py", line 80, in JoinController def __init__(self, ipaclient=IPAClient()): File "/usr/lib/python3.6/site-packages/novajoin/ipa.py", line 79, in __init__ CONF.keytab, self.ccache) File "/usr/lib/python3.6/site-packages/ipalib/install/kinit.py", line 47, in kinit_keytab cred = gssapi.Credentials(name=name, store=store, usage='initiate') File "/usr/lib64/python3.6/site-packages/gssapi/creds.py", line 64, in __new__ store=store) File "/usr/lib64/python3.6/site-packages/gssapi/creds.py", line 148, in acquire usage) File "gssapi/raw/ext_cred_store.pyx", line 182, in gssapi.raw.ext_cred_store.acquire_cred_from gssapi.raw.misc.GSSError: Major (851968): Unspecified GSS failure. Minor code may provide more information, Minor (2529639107): No credentials cache found + sudo -E kolla_set_configs INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json INFO:__main__:Validating config file Thanks, Shaju
I am working on a support case: https://access.redhat.com/support/cases/#/case/02897947 for a solution for this issue. Appreciate any help to solve this problem. Thanks, Shaju
Shaju, Is the undercloud DNS pointing to the IPA server?
Yes, it is pointed to freeIPA IPA. [stack@osp16 ~]$ cat /etc/resolv.conf # Generated by NetworkManager search 5c5s.local nameserver 192.168.24.252 == [stack@osp16 ~]$ nslookup 192.168.24.252 252.24.168.192.in-addr.arpa name = freeipa.5c5s.local. ===== [stack@osp16 ~]$ kinit admin Password for admin: [stack@osp16 ~]$ ipa host-find ipa: ERROR: cannot connect to 'https://freeipa.5c5s.local/ipa/json': [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897) [stack@osp16 ~]$ ===== From FreeIPA: ============== [root@freeipa ~]# kinit admin Password for admin: [root@freeipa ~]# ipa host-find --------------- 2 hosts matched --------------- Host name: freeipa.5c5s.local Principal name: host/freeipa.5c5s.local Principal alias: host/freeipa.5c5s.local SSH public key fingerprint: SHA256:QF3LoLSeldC0EWmFw/veIyRPG01Ach2phs9DG6wRYy4 (ssh-ed25519), SHA256:u5/2zF2/Grr2E0s9Bs2KxGgSWVyXyhK2k+t6XQo9zJ8 (ecdsa- sha2-nistp256), SHA256:yRSMAwJb7Mu0GguWRq7EqM4Y1l42GmHxv3HRVCq1bmE (ssh-rsa) Host name: osp16.5c5s.local Description: Undercloud host ---------------------------- Number of entries returned 2 ---------------------------- [root@freeipa ~]# Thanks, Shaju
While looking for podman logs -f from novajoin_server container. I can see this error: ++ . /usr/local/bin/kolla_novajoin_extend_start + echo 'Running command: '\''novajoin-server --config-file /etc/novajoin/join.conf'\''' Running command: 'novajoin-server --config-file /etc/novajoin/join.conf' + exec novajoin-server --config-file /etc/novajoin/join.conf + sudo -E kolla_set_configs INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json INFO:__main__:Validating config file INFO:__main__:Kolla config strategy set to: COPY_ALWAYS INFO:__main__:Copying service configuration files INFO:__main__:Deleting /etc/novajoin/join.conf INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/novajoin/join.conf to /etc/novajoin/join.conf INFO:__main__:Writing out command to execute ++ cat /run_command + CMD='novajoin-server --config-file /etc/novajoin/join.conf' + ARGS= + sudo kolla_copy_cacerts + [[ ! -n '' ]] + . kolla_extend_start ++ [[ ! -d /var/log/kolla/novajoin ]] +++ stat -c %a /var/log/kolla/novajoin ++ [[ 2755 != \7\5\5 ]] ++ chmod 755 /var/log/kolla/novajoin Running command: 'novajoin-server --config-file /etc/novajoin/join.conf' ++ . /usr/local/bin/kolla_novajoin_extend_start + echo 'Running command: '\''novajoin-server --config-file /etc/novajoin/join.conf'\''' + exec novajoin-server --config-file /etc/novajoin/join.conf + sudo -E kolla_set_configs INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json INFO:__main__:Validating config file INFO:__main__:Kolla config strategy set to: COPY_ALWAYS INFO:__main__:Copying service configuration files INFO:__main__:Deleting /etc/novajoin/join.conf INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/novajoin/join.conf to /etc/novajoin/join.conf INFO:__main__:Writing out command to execute ====== This is the logs from novajoin_notifier container: ========================= INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json INFO:__main__:Validating config file INFO:__main__:Kolla config strategy set to: COPY_ALWAYS INFO:__main__:Copying service configuration files INFO:__main__:Deleting /etc/novajoin/join.conf INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/novajoin/join.conf to /etc/novajoin/join.conf INFO:__main__:Writing out command to execute ++ cat /run_command + CMD='novajoin-notify --config-file /etc/novajoin/join.conf' + ARGS= + sudo kolla_copy_cacerts + [[ ! -n '' ]] + . kolla_extend_start ++ [[ ! -d /var/log/kolla/novajoin ]] +++ stat -c %a /var/log/kolla/novajoin ++ [[ 2755 != \7\5\5 ]] ++ chmod 755 /var/log/kolla/novajoin ++ . /usr/local/bin/kolla_novajoin_extend_start + echo 'Running command: '\''novajoin-notify --config-file /etc/novajoin/join.conf'\''' Running command: 'novajoin-notify --config-file /etc/novajoin/join.conf' + exec novajoin-notify --config-file /etc/novajoin/join.conf Traceback (most recent call last): File "/usr/bin/novajoin-notify", line 6, in <module> from novajoin.notifications import main File "/usr/lib/python3.6/site-packages/novajoin/notifications.py", line 33, in <module> from novajoin import join File "/usr/lib/python3.6/site-packages/novajoin/join.py", line 78, in <module> class JoinController(Controller): File "/usr/lib/python3.6/site-packages/novajoin/join.py", line 80, in JoinController def __init__(self, ipaclient=IPAClient()): File "/usr/lib/python3.6/site-packages/novajoin/ipa.py", line 79, in __init__ CONF.keytab, self.ccache) File "/usr/lib/python3.6/site-packages/ipalib/install/kinit.py", line 47, in kinit_keytab cred = gssapi.Credentials(name=name, store=store, usage='initiate') File "/usr/lib64/python3.6/site-packages/gssapi/creds.py", line 64, in __new__ store=store) File "/usr/lib64/python3.6/site-packages/gssapi/creds.py", line 148, in acquire usage) File "gssapi/raw/ext_cred_store.pyx", line 182, in gssapi.raw.ext_cred_store.acquire_cred_from gssapi.raw.misc.GSSError: Major (851968): Unspecified GSS failure. Minor code may provide more information, Minor (2529639107): No credentials cache found + sudo -E kolla_set_configs INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json INFO:__main__:Validating config file
Hi Team, Juniper release has been scheduled to deliver in two days. Our joint customers cannot do an upgrade to the latest juniper version due to this bug. An immediate workaround or solution really helps us to deliver the release on time. Thanks, Shaju
The error message above .. [stack@osp16 ~]$ kinit admin Password for admin: [stack@osp16 ~]$ ipa host-find ipa: ERROR: cannot connect to 'https://freeipa.5c5s.local/ipa/json': [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897) [stack@osp16 ~]$ seems to indicate that the CA cert for the IPA server is not trusted. This should have been set up correctly as part of the ipa-client-install process, which makes me wonder what else was not set up correctly. Looking at the ipa-client-install log in /var/log might tell us more. To get more information on whats going on with kerberos, try running the following both on the undercloud, and in the containers: KRB5_TRACE=/dev/stdout kinit admin
Hi Ade Lee, Thanks for checking this case. Please find the requested details: [stack@osp16 ~]$ KRB5_TRACE=/dev/stdout kinit admin [835087] 1619713996.412366: Getting initial credentials for admin [835087] 1619713996.412368: Sending unauthenticated request [835087] 1619713996.412369: Sending request (181 bytes) to 5C5S.LOCAL [835087] 1619713996.412370: Sending DNS URI query for _kerberos.5C5S.LOCAL. [835087] 1619713996.412371: No URI records found [835087] 1619713996.412372: Sending DNS SRV query for _kerberos._udp.5C5S.LOCAL. [835087] 1619713996.412373: SRV answer: 0 100 88 "freeipa.5c5s.local." [835087] 1619713996.412374: Sending DNS SRV query for _kerberos._tcp.5C5S.LOCAL. [835087] 1619713996.412375: SRV answer: 0 100 88 "freeipa.5c5s.local." [835087] 1619713996.412376: Resolving hostname freeipa.5c5s.local. [835087] 1619713996.412377: Resolving hostname freeipa.5c5s.local. [835087] 1619713996.412378: Initiating TCP connection to stream 192.168.24.252:88 [835087] 1619713996.412379: Sending TCP request to stream 192.168.24.252:88 [835087] 1619713996.412380: Received answer (499 bytes) from stream 192.168.24.252:88 [835087] 1619713996.412381: Terminating TCP connection to stream 192.168.24.252:88 [835087] 1619713996.412382: Sending DNS URI query for _kerberos.5C5S.LOCAL. [835087] 1619713996.412383: No URI records found [835087] 1619713996.412384: Sending DNS SRV query for _kerberos-master._tcp.5C5S.LOCAL. [835087] 1619713996.412385: SRV answer: 0 100 88 "freeipa.5c5s.local." [835087] 1619713996.412386: Response was from master KDC [835087] 1619713996.412387: Received error from KDC: -1765328359/Additional pre-authentication required [835087] 1619713996.412390: Preauthenticating using KDC method data [835087] 1619713996.412391: Processing preauth types: PA-PK-AS-REQ (16), PA-FX-FAST (136), PA-ETYPE-INFO2 (19), PA-PKINIT-KX (147), PA-SPAKE (151), PA-ENC-TIMESTAMP (2), PA_AS_FRESHNESS (150), PA-FX-COOKIE (133) [835087] 1619713996.412392: Selected etype info: etype aes256-cts, salt "m^(2D9+F]HS=MNj$", params "" [835087] 1619713996.412393: Received cookie: MIT1\x00\x00\x00\x01Xf\xb7+\xc8&\x0e.7\xab\xa6\x85k7\x1b\x0e@F\x10;\xac8\x9eq5/K(\x0eiO\x06\xbf\x16\xc9\xfbpk\xc3\xe1sN\x90E\xa8<G<\x98\xc9\x8c\x88d\x17W\x14\x031\xae\xed:\x1e\x98ui8\xa4\xc0L\xb7(c\xd3\xc5\xa4h\xad\x1d\x02%\xa9\xab\xda\xd2Br\xc2T\xf5\xb9\x8bE\xe6Xu\xf1\x9a\xe7\xf6\xf8&\xad\x1b\xd9\xd9\xd5\x10\xd9\x83\xf0\xb8\x9e\xe1\xbd5\x9e\x15\x06I\xde\x87+\xf5\xfc\x07Q\xaa\xd5\x18M [835087] 1619713996.412394: SPAKE challenge received with group 1, pubkey 05118BCEE6532A4E90664749D4C615AE546F54572D58452160550AF9F822F5E4 Password for admin: [835087] 1619714003.560984: SPAKE key generated with pubkey 83E37DE1417D963F31963EB3AB8A677ACFABFC7B034F2061756E6A2616CFA9B3 [835087] 1619714003.560985: SPAKE algorithm result: EA8A8BDE28E3C1D3BDB807B1863632A42642CB2506557AAD064A6D807F72F7A9 [835087] 1619714003.560986: SPAKE final transcript hash: 413708ABA23EE8DFFF0DB136C86E794DA2830D073C4D87BC8869DCC99690DD75 [835087] 1619714003.560987: Sending SPAKE response [835087] 1619714003.560988: Preauth module spake (151) (real) returned: 0/Success [835087] 1619714003.560989: Produced preauth for next request: PA-FX-COOKIE (133), PA-SPAKE (151) [835087] 1619714003.560990: Sending request (440 bytes) to 5C5S.LOCAL [835087] 1619714003.560991: Sending DNS URI query for _kerberos.5C5S.LOCAL. [835087] 1619714003.560992: No URI records found [835087] 1619714003.560993: Sending DNS SRV query for _kerberos._udp.5C5S.LOCAL. [835087] 1619714003.560994: SRV answer: 0 100 88 "freeipa.5c5s.local." [835087] 1619714003.560995: Sending DNS SRV query for _kerberos._tcp.5C5S.LOCAL. [835087] 1619714003.560996: SRV answer: 0 100 88 "freeipa.5c5s.local." [835087] 1619714003.560997: Resolving hostname freeipa.5c5s.local. [835087] 1619714003.560998: Resolving hostname freeipa.5c5s.local. [835087] 1619714003.560999: Initiating TCP connection to stream 192.168.24.252:88 [835087] 1619714003.561000: Sending TCP request to stream 192.168.24.252:88 [835087] 1619714003.561001: Received answer (798 bytes) from stream 192.168.24.252:88 [835087] 1619714003.561002: Terminating TCP connection to stream 192.168.24.252:88 [835087] 1619714003.561003: Sending DNS URI query for _kerberos.5C5S.LOCAL. [835087] 1619714003.561004: No URI records found [835087] 1619714003.561005: Sending DNS SRV query for _kerberos-master._tcp.5C5S.LOCAL. [835087] 1619714003.561006: SRV answer: 0 100 88 "freeipa.5c5s.local." [835087] 1619714003.561007: Response was from master KDC [835087] 1619714003.561008: AS key determined by preauth: aes256-cts/4E72 [835087] 1619714003.561009: Decrypted AS reply; session key is: aes256-cts/F8A6 [835087] 1619714003.561010: FAST negotiation: available [835087] 1619714003.561011: Initializing KCM:1000 with default princ admin [835087] 1619714003.561012: Storing admin -> krbtgt/5C5S.LOCAL in KCM:1000 [835087] 1619714003.561013: Storing config in KCM:1000 for krbtgt/5C5S.LOCAL: fast_avail: yes [835087] 1619714003.561014: Storing admin -> krb5_ccache_conf_data/fast_avail/krbtgt\/5C5S.LOCAL\@5C5S.LOCAL@X-CACHECONF: in KCM:1000 [835087] 1619714003.561015: Storing config in KCM:1000 for krbtgt/5C5S.LOCAL: pa_type: 151 [835087] 1619714003.561016: Storing admin -> krb5_ccache_conf_data/pa_type/krbtgt\/5C5S.LOCAL\@5C5S.LOCAL@X-CACHECONF: in KCM:1000 [stack@osp16 ~]$
Great. This shows me what is happening on the undercloud. We also need to see the same in the novajoin container .. This may be tricky , given that the containers are going up and down - but the idea would be to run it sort of like: podman exec -it novajoin_server KRB5_TRACE=/dev/stdout kinit admin so we can see what is happening there.
[stack@osp16 ~]$ sudo podman exec -it novajoin_server KRB5_TRACE=/dev/stdout kinit admin Error: cannot exec into container that is not running: container state improper [stack@osp16 ~]$ sudo podman exec -it novajoin_server KRB5_TRACE=/dev/stdout kinit admin Error: exec failed: container_linux.go:370: starting container process caused: exec: "KRB5_TRACE=/dev/stdout": stat KRB5_TRACE=/dev/stdout: no such file or directory: OCI runtime command not found error [stack@osp16 ~]$ sudo podman exec -it novajoin_server KRB5_TRACE=/dev/stdout kinit admin Error: exec failed: container_linux.go:370: starting container process caused: exec: "KRB5_TRACE=/dev/stdout": stat KRB5_TRACE=/dev/stdout: no such file or directory: OCI runtime command not found error [stack@osp16 ~]$ sudo podman exec -it novajoin_server KRB5_TRACE=/dev/stdout kinit admin Error: exec failed: container_linux.go:370: starting container process caused: exec: "KRB5_TRACE=/dev/stdout": stat KRB5_TRACE=/dev/stdout: no such file or directory: OCI runtime command not found error [stack@osp16 ~]$ [stack@osp16 ~]$ sudo podman exec -it novajoin_server KRB5_TRACE=/dev/stdout kinit admin Error: non zero exit code: 137: OCI runtime error [stack@osp16 ~]$ sudo podman exec -it novajoin_server KRB5_TRACE=/dev/stdout kinit admin Error: exec failed: container_linux.go:370: starting container process caused: exec: "KRB5_TRACE=/dev/stdout": stat KRB5_TRACE=/dev/stdout: no such file or directory: OCI runtime command not found error [stack@osp16 ~]$ [stack@osp16 ~]$
sudo podman exec -e KRB5_TRACE=/dev/stdout -it novajoin_server kinit admin
Hi Ade Lee, Before I enter the creds, containers are restarted [stack@osp16 ~]$ sudo podman exec -e KRB5_TRACE=/dev/stdout -it novajoin_server kinit admin [19] 1619728928.180664: TXT record _kerberos.osp16.5c5s.local. not found [19] 1619728928.180665: TXT record _kerberos.5c5s.local. found: 5C5S.LOCAL [19] 1619728928.180666: Getting initial credentials for admin [19] 1619728928.180668: Sending unauthenticated request [19] 1619728928.180669: Sending request (181 bytes) to 5C5S.LOCAL [19] 1619728928.180670: Sending DNS URI query for _kerberos.5C5S.LOCAL. [19] 1619728928.180671: No URI records found [19] 1619728928.180672: Sending DNS SRV query for _kerberos._udp.5C5S.LOCAL. [19] 1619728928.180673: SRV answer: 0 100 88 "freeipa.5c5s.local." [19] 1619728928.180674: Sending DNS SRV query for _kerberos._tcp.5C5S.LOCAL. [19] 1619728928.180675: SRV answer: 0 100 88 "freeipa.5c5s.local." [19] 1619728928.180676: Resolving hostname freeipa.5c5s.local. [19] 1619728928.180677: Sending initial UDP request to dgram 192.168.24.252:88 [19] 1619728928.180678: Received answer (499 bytes) from dgram 192.168.24.252:88 [19] 1619728928.180679: Sending DNS URI query for _kerberos.5C5S.LOCAL. [19] 1619728928.180680: No URI records found [19] 1619728928.180681: Sending DNS SRV query for _kerberos-master._udp.5C5S.LOCAL. [19] 1619728928.180682: SRV answer: 0 100 88 "freeipa.5c5s.local." [19] 1619728928.180683: Response was from master KDC [19] 1619728928.180684: Received error from KDC: -1765328359/Additional pre-authentication required [19] 1619728928.180687: Preauthenticating using KDC method data [19] 1619728928.180688: Processing preauth types: PA-PK-AS-REQ (16), PA-FX-FAST (136), PA-ETYPE-INFO2 (19), PA-PKINIT-KX (147), PA-SPAKE (151), PA-ENC-TIMESTAMP (2), PA_AS_FRESHNESS (150), PA-FX-COOKIE (133) [19] 1619728928.180689: Selected etype info: etype aes256-cts, salt "m^(2D9+F]HS=MNj$", params "" [19] 1619728928.180690: Received cookie: MIT1\x00\x00\x00\x01\x0c\x09\xd3>\xa4\x0f\xda\xa3\xf3\xe6u\xefu\x83\xfa\x13\xc0\x02\xe9\xd9&.n\x95x{e\xf9\x01\xa1\x0b\xa7\xcbi\xcd\xd6r\x81\xfc\xeb\x1fj\xb1\xe6\xbcu\xc46\x91\xe1k\xea\xb3\xa4\x8d\xddF\x0ci\xf3S\xe1_\xa8\x98\xc8\xac\x9d\x8d\x09@\x95U\x83JL`^\x06L\xc1T\x0d\x0e\xde\x8ao\xd2\x0e\xd8xL\xd9\x8dl\xda\xb1\xc3\xd0\x9c\x143\x91\xec\x8a\xfb\xee\x05/\xaa\xc7\xec\xbdq\xc4\xc2\x19N\xf7\xd9\x02t\x88\x0c\xe9d\x07\xcbU\xa8 [19] 1619728928.180691: SPAKE challenge received with group 1, pubkey 335A89BFC25031E91D1D19CB848557B9D8FAA663302E94049EB259A0698CB3A5 Password for admin: Error: non zero exit code: 137: OCI runtime error
Looks like i got the output: =========== [stack@osp16 ~]$ sudo podman exec -e KRB5_TRACE=/dev/stdout -it novajoin_server kinit admin [10] 1619729066.19522: TXT record _kerberos.osp16.5c5s.local. not found [10] 1619729066.19523: TXT record _kerberos.5c5s.local. found: 5C5S.LOCAL [10] 1619729066.19524: Getting initial credentials for admin [10] 1619729066.19526: Sending unauthenticated request [10] 1619729066.19527: Sending request (181 bytes) to 5C5S.LOCAL [10] 1619729066.19528: Sending DNS URI query for _kerberos.5C5S.LOCAL. [10] 1619729066.19529: No URI records found [10] 1619729066.19530: Sending DNS SRV query for _kerberos._udp.5C5S.LOCAL. [10] 1619729066.19531: SRV answer: 0 100 88 "freeipa.5c5s.local." [10] 1619729066.19532: Sending DNS SRV query for _kerberos._tcp.5C5S.LOCAL. [10] 1619729066.19533: SRV answer: 0 100 88 "freeipa.5c5s.local." [10] 1619729066.19534: Resolving hostname freeipa.5c5s.local. [10] 1619729066.19535: Sending initial UDP request to dgram 192.168.24.252:88 [10] 1619729066.19536: Received answer (498 bytes) from dgram 192.168.24.252:88 [10] 1619729066.19537: Sending DNS URI query for _kerberos.5C5S.LOCAL. [10] 1619729066.19538: No URI records found [10] 1619729066.19539: Sending DNS SRV query for _kerberos-master._udp.5C5S.LOCAL. [10] 1619729066.19540: SRV answer: 0 100 88 "freeipa.5c5s.local." [10] 1619729066.19541: Response was from master KDC [10] 1619729066.19542: Received error from KDC: -1765328359/Additional pre-authentication required [10] 1619729066.19545: Preauthenticating using KDC method data [10] 1619729066.19546: Processing preauth types: PA-PK-AS-REQ (16), PA-FX-FAST (136), PA-ETYPE-INFO2 (19), PA-PKINIT-KX (147), PA-SPAKE (151), PA-ENC-TIMESTAMP (2), PA_AS_FRESHNESS (150), PA-FX-COOKIE (133) [10] 1619729066.19547: Selected etype info: etype aes256-cts, salt "m^(2D9+F]HS=MNj$", params "" [10] 1619729066.19548: Received cookie: MIT1\x00\x00\x00\x01\x8f7l\xb5>^\xfd\xfa!c\x0f\xc2j4\xac\xdf\x04$\x91\xe7\x96)\xdf\xdbH\x9c\x99\x9b\x8c\x9d\xc0\xa0&$Z[Y\xd6\x85\xfb\xfd\x06h\xb8\xda\xf9\xc2\xc1\x89\xe34\xb8U\xba\x0c\x8f<\xe6_8\xde\x90\xe4\x9b\xeb`\xbe\xbe\xb2\x10\xbf\xd3\x8f\xcf\x9bX\x81\x9b\x09\xdb\xd6\x92w\xee(5%_{Y\x94\x9b\xe4;\xba1\xbb\xce_\x92\xd9\x09X\xc5\x84\xd24\xc8Y\xf9\xa2\xc3\xe1\xa6\xb9\xa1\xa0\x88+\x9an\xb4\x13\x87\x08\xbb\xa3(\xb6\x0a [10] 1619729066.19549: SPAKE challenge received with group 1, pubkey 0ED51B94003BD2699E7398F53E9842AF64A6C6392C7DDA8A22051626FA361109 Password for admin: [10] 1619729066.19550: SPAKE key generated with pubkey 57A1138F16FA4B47FACD1712F0C08B73B1B03CD3DD2CA35CAC45C821F01A96D7 [10] 1619729066.19551: SPAKE algorithm result: 21BAFA536B0BDE7CE155CE310F5A107DD46E23D8B398290054637DF35736CAE7 [10] 1619729066.19552: SPAKE final transcript hash: 9330C092BC0A743982B90A168725F92B88DA785AA056C13724C63EB015116D2B [10] 1619729066.19553: Sending SPAKE response [10] 1619729066.19554: Preauth module spake (151) (real) returned: 0/Success [10] 1619729066.19555: Produced preauth for next request: PA-FX-COOKIE (133), PA-SPAKE (151) [10] 1619729066.19556: Sending request (440 bytes) to 5C5S.LOCAL [10] 1619729066.19557: Sending DNS URI query for _kerberos.5C5S.LOCAL. [10] 1619729066.19558: No URI records found [10] 1619729066.19559: Sending DNS SRV query for _kerberos._udp.5C5S.LOCAL. [10] 1619729066.19560: SRV answer: 0 100 88 "freeipa.5c5s.local." [10] 1619729066.19561: Sending DNS SRV query for _kerberos._tcp.5C5S.LOCAL. [10] 1619729066.19562: SRV answer: 0 100 88 "freeipa.5c5s.local." [10] 1619729066.19563: Resolving hostname freeipa.5c5s.local. [10] 1619729066.19564: Sending initial UDP request to dgram 192.168.24.252:88 [10] 1619729066.19565: Received answer (798 bytes) from dgram 192.168.24.252:88 [10] 1619729066.19566: Sending DNS URI query for _kerberos.5C5S.LOCAL. [10] 1619729066.19567: No URI records found [10] 1619729066.19568: Sending DNS SRV query for _kerberos-master._udp.5C5S.LOCAL. [10] 1619729066.19569: SRV answer: 0 100 88 "freeipa.5c5s.local." [10] 1619729066.19570: Response was from master KDC [10] 1619729066.19571: AS key determined by preauth: aes256-cts/30AC [10] 1619729066.19572: Decrypted AS reply; session key is: aes256-cts/0FBF [10] 1619729066.19573: FAST negotiation: available [10] 1619729066.19574: Initializing FILE:/tmp/krb5cc_0 with default princ admin [10] 1619729066.19575: Storing admin -> krbtgt/5C5S.LOCAL in FILE:/tmp/krb5cc_0 [10] 1619729066.19576: Storing config in FILE:/tmp/krb5cc_0 for krbtgt/5C5S.LOCAL: fast_avail: yes [10] 1619729066.19577: Storing admin -> krb5_ccache_conf_data/fast_avail/krbtgt\/5C5S.LOCAL\@5C5S.LOCAL@X-CACHECONF: in FILE:/tmp/krb5cc_0 [10] 1619729066.19578: Storing config in FILE:/tmp/krb5cc_0 for krbtgt/5C5S.LOCAL: pa_type: 151 [10] 1619729066.19579: Storing admin -> krb5_ccache_conf_data/pa_type/krbtgt\/5C5S.LOCAL\@5C5S.LOCAL@X-CACHECONF: in FILE:/tmp/krb5cc_0 [stack@osp16 ~]$
The error above seems to indicate that kerberos was able to get a ticket but was unable to store it. gssapi.raw.misc.GSSError: Major (851968): Unspecified GSS failure. Minor code may provide more information, Minor (2529639107): No credentials cache found which, given that novajoin is using a memory cache is unexpected. Lets try a couple things. First, we want to set the log level to debug. That is likely to a join.conf file somewhere under /var/lib/config-data/puppet-generated/ There set debug to true. Hopefully, this will provide some logs in the novajoin logs, which are somewhere under /var/log/containers/ Also, what does the following show? sudo podman exec -it novajoin_server cat /etc/ipa/default.conf sudo podman inspect novajoin_server
No, this is the first time we have seen this issue. novajoin deployment is regularly tested in QE, and has not seen this issue either. As I mentioned above, given that the ipa ca cert appears not to be trusted, I would suspect that there may have been issues with the ipa-client-install. This could manifest itself in strange errors like this.
Created attachment 1777513 [details] novajoin-logs-directory
Created attachment 1777514 [details] podman inspect output
Created attachment 1777515 [details] join.conf
Thanks Ade. I've changed the log level (debug = true) in /var/lib/config-data/puppet-generated/novajoin/etc/novajoin/join.conf . Attached : 1. /var/log/containers/novajoin directory 2. inspect of novajoin_server 3. join.conf [stack@osp16 ~]$ sudo podman exec -it novajoin_server cat /etc/ipa/default.conf #File modified by ipa-client-install [global] basedn = dc=5c5s,dc=local realm = 5C5S.LOCAL domain = 5c5s.local server = freeipa.5c5s.local host = osp16.5c5s.local xmlrpc_uri = https://freeipa.5c5s.local/ipa/xml enable_ra = True
Thanks Ade. Please let me know if you need any help. I have created a new premium support case to get updated of this issue: https://access.redhat.com/support/cases/#/case/02930143 Thanks, Shaju
Shaju, i updated old case with proper account details and set premium support. we can continue old case. I closed the new case. please let me know if we can have remote session, we can involve ppl from IPA team as well if needed.
Hi , I am available at 9.30 PM PST time, 1.30 hrs from now. Please share the link to join live debugging. Thanks, Shaju
What might be going on here is that some problem with the keytab is being masked by a credentials cache error as in this BZ. https://bugzilla.redhat.com/show_bug.cgi?id=1395799#c4 Lets see what the following returns: klist -kt /etc/novajoin/krb5.keytab ls -lZ /etc/novajoin KRB5_TRACE=/dev/stdout kinit nova/osp16.5c5s.local -k -t /etc/novajoin/krb5.keytab podman exec -e KRB5_TRACE=/dev/stdout -ti novajoin_server kinit nova/osp16.5c5s.local -kt /etc/novajoin/krb5.keytab podman exec -it novajoin_server ls -lZ /etc/novajoin
[stack@osp16 ~]$ sudo klist -kt /etc/novajoin/krb5.keytab Keytab name: FILE:/etc/novajoin/krb5.keytab KVNO Timestamp Principal---- ------------------- ------------------------------------------------------ 1 03/23/2021 16:21:00 host/osp16.5c5s.local 1 03/23/2021 16:21:01 host/osp16.5c5s.local [stack@osp16 ~]$ [stack@osp16 ~]$[stack@osp16 ~]$ sudo ls -lZ /etc/novajoin total 28 -rw-r--r--. 1 root novajoin system_u:object_r:etc_t:s0 3302 Mar 10 2020 cloud-config-novajoin.json -rw-r--r--. 1 root novajoin system_u:object_r:etc_t:s0 499 Mar 10 2020 join-api-paste.ini -rw-r-----. 1 root novajoin system_u:object_r:etc_t:s0 11221 Mar 10 2020 join.conf -rwxrwxrwx. 1 root root unconfined_u:object_r:krb5_keytab_t:s0 168 Mar 24 11:19 krb5.keytab -rw-------. 1 root root unconfined_u:object_r:etc_t:s0 168 Apr 30 08:12 krb5.keytab.bckp [stack@osp16 ~]$ [stack@osp16 ~]$ sudo KRB5_TRACE=/dev/stdout kinit nova/osp16.5c5s.local -k -t /etc/novajoin/krb5.keytab [538577] 1619793208.551186: Resolving unique ccache of type KCM [538577] 1619793208.551187: Getting initial credentials for nova/osp16.5c5s.local [538577] 1619793208.551188: Looked up etypes in keytab: (empty) [538577] 1619793208.551189: Getting initial credentials for nova/osp16.5c5s.local [538577] 1619793208.551190: Looked up etypes in keytab: (empty) kinit: Keytab contains no suitable keys for nova/osp16.5c5s.local while getting initial credentials [stack@osp16 ~]$ couldn't get the podman output since the containers are not up [stack@osp16 ~]$ sudo podman exec -e KRB5_TRACE=/dev/stdout -ti novajoin_server kinit nova/osp16.5c5s.local -kt /etc/novajoin/krb5.keytab Error: cannot exec into container that is not running: container state improper [stack@osp16 ~]$ [stack@osp16 ~]$ [stack@osp16 ~]$ podman exec -it novajoin_server ls -lZ /etc/novajoin Error: no container with name or ID novajoin_server found: no such container [stack@osp16 ~]$
[root@osp16 ~]# podman ps | grep novajo 30cf0ddd698c osp16.ctlplane.5c5s.local:8787/5c5s45-satellite-rhosp-16-rhel-8-product-containers-rhosp-rhel8_openstack-novajoin-notifier:16.1.3 kolla_start 7 days ago Up About a minute ago novajoin_notifier 0f85925a9d46 osp16.ctlplane.5c5s.local:8787/5c5s45-satellite-rhosp-16-rhel-8-product-containers-rhosp-rhel8_openstack-novajoin-server:16.1.3 kolla_start 7 days ago Up About a minute ago novajoin_server [root@osp16 ~]# ipa-getkeytab -p nova/osp16.5c5s.local -k /etc/novajoin/krb5.keytab [root@osp16 ~]# [root@osp16 ~]# [root@osp16 ~]# rpm -qa | grep novaj python3-novajoin-1.3.0-0.20200310222422.265146e.el8ost.noarch [root@osp16 ~]# [root@osp16 ~]# cat /etc/redhat-release Red Hat Enterprise Linux release 8.2 (Ootpa) [root@osp16 ~]#
Just a recap of an interactive session on the customer: 1.The ipa-client-install operation which enrolled the node as an ipa-client failed to trust the IPA CA cert. Interestingly, this did not result in a failure being reported. The ipaclient-install.log indicated success. The result of this was that ipa CLI calls failed because of SSL failures to verify host certs when talking to the IPA server. We fixed this by unenrolling and re-enrolling the ipa client software -- ipa-client-install --uninstall -U ipa-client-install 2. For some reason, it appears that the /etc/novajoin directory did not exist. This may be a bug and as I pointed out in https://bugzilla.redhat.com/show_bug.cgi?id=1941799#c3 , this particular code path may be missing the directive to create this. It used to be that the directory was provided by the python-novajoin rpm, but in the version on the customr machine (python3-novajoin-1.3.0-0.20200310222422.265146e.el8ost.noarch) this was not the case. I'm going to use this BZ to investigate/fix this. The customer was instructed to add this directory manually - and this should be done with 755 permissions 3. The containers were restarting because the keytab being used was incorrect, in that it did not have keys for the right credential. The exact error was masked by a CC_cache error, which was spurious (https://bugzilla.redhat.com/show_bug.cgi?id=1395799#c4) The keytab was incorrectly copied over from the /etc/krb5.keytab The correct command to get the keytab is: kinit admin ipa-getkeytab -p nova/<hostname>@<REALM> -k /etc/novajoin/krb5.keytab
(In reply to Ade Lee from comment #38) > > 2. For some reason, it appears that the /etc/novajoin directory did not > exist. This may be a bug and as I pointed out in > https://bugzilla.redhat.com/show_bug.cgi?id=1941799#c3 , this particular > code path may be missing the directive to create this. > It used to be that the directory was provided by the python-novajoin rpm, > but in the version on the customr machine > (python3-novajoin-1.3.0-0.20200310222422.265146e.el8ost.noarch) > this was not the case. I'm going to use this BZ to investigate/fix this. > > The customer was instructed to add this directory manually - and this > should be done with 755 permissions > I modified the subject to make this more clear.
Changing priority to low, because the escalated issue was fixed, but we still need to document the procedure.