Bug 1941799

Summary: OSP16.1+ rhel-8-2:FreeIPA: Udercloud installation failed. [Failed to add key to the keytab - the /etc/novajoin directory did not exist]
Product: Red Hat OpenStack Reporter: shaju <shajuvk>
Component: python-novajoinAssignee: Ade Lee <alee>
Status: CLOSED WONTFIX QA Contact: Jeremy Agee <jagee>
Severity: high Docs Contact:
Priority: high    
Version: 16.1 (Train)CC: alee, apevec, bshephar, dasmith, dwilde, eglynn, ggrasza, gkadam, hrybacki, jhakimra, kchamart, ldenny, mburns, mschuppe, pmannidi, pveiga, rcritten, sbauza, sgordon, vromanso
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-08-01 19:52:54 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
novajoin-logs-directory
none
podman inspect output
none
join.conf none

Description shaju 2021-03-22 19:25:47 UTC
Description of problem:

While installing undercloud with FreeIPA, deployment has been failed with the below error:

TASK [Request kerberos keytab] *****************************************************************************************************************************************************
Friday 19 March 2021  19:23:42 -0400 (0:00:00.286)       0:01:58.254 **********
fatal: [osp16]: FAILED! => {"changed": true, "cmd": "/usr/bin/kinit -kt /etc/krb5.keytab && ipa-getkeytab -s $(grep xmlrpc_uri /etc/ipa/default.conf  | cut -d/ -f3) -p nova/osp16.5
c5s.local -k /etc/novajoin/krb5.keytab", "delta": "0:00:00.356700", "end": "2021-03-19 19:23:43.534849", "msg": "non-zero return code", "rc": 11, "start": "2021-03-19 19:23:43.1781
49", "stderr": "Failed to add key to the keytab", "stderr_lines": ["Failed to add key to the keytab"], "stdout": "", "stdout_lines": []}


After debugging with redhat support, the root case is, /etc/novajoin directory node created in Undercloud. 

After creating novajoin directory manually. I did the below changed to complete the undercloud install

[stack@osp16 etc]$ sudo mkdir novajoin

[stack@osp16 etc]$ sudo cp -r krb5.keytab novajoin/
[stack@osp16 etc]$ chmod 777 novajoin
chmod: changing permissions of 'novajoin': Operation not permitted
[stack@osp16 etc]$ sudo chmod 777 novajoin
[stack@osp16 etc]$ 


Version-Release number of selected component (if applicable):


How reproducible:

Create Undercloud with FreeIPA

Please refer to support case:

https://access.redhat.com/support/cases/#/case/02897947
https://access.redhat.com/support/cases/#/case/02791164

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Martin Schuppert 2021-03-23 06:38:18 UTC
for reference, those steps are performed here:

https://github.com/openstack/tripleo-heat-templates/blob/stable/train/deployment/tls/undercloud-tls.yaml#L81

Comment 3 Ade Lee 2021-04-26 18:47:23 UTC
Actually, the steps in https://github.com/openstack/tripleo-heat-templates/blob/stable/train/deployment/tls/undercloud-tls.yaml#L81 are those of a service to
simply enroll the undercloud in IPA.

For novajoin deployments, we actually end up using - 
https://github.com/openstack/tripleo-heat-templates/blob/stable/train/deployment/nova/novajoin-container-puppet.yaml#L260

which does not have a command to create the directory.  

This is very strange though.  We've never needed this before, but that might be because we always installed the novajoin rpm which does provide this
file.  Maybe with containers though ..

Comment 4 shaju 2021-04-26 18:51:56 UTC
I do see the undercloud installation completes after creating the novajoin directory manually as mentioned in description. But the novajoin containers are continuously restarting. 

===
 [stack@osp16 ~]$ sudo podman ps | grep novajo
 [stack@osp16 ~]$ sudo podman ps | grep novajo
30cf0ddd698c  osp16.ctlplane.5c5s.local:8787/5c5s45-satellite-rhosp-16-rhel-8-product-containers-rhosp-rhel8_openstack-novajoin-notifier:16.1.3          kolla_start           3 days ago   Up 1 second ago         novajoin_notifier
0f85925a9d46  osp16.ctlplane.5c5s.local:8787/5c5s45-satellite-rhosp-16-rhel-8-product-containers-rhosp-rhel8_openstack-novajoin-server:16.1.3            kolla_start           3 days ago   Up 1 second ago         novajoin_server
 [stack@osp16 ~]$
 [stack@osp16 ~]$ 
 [stack@osp16 ~]$ sudo podman ps | grep novajo
30cf0ddd698c  osp16.ctlplane.5c5s.local:8787/5c5s45-satellite-rhosp-16-rhel-8-product-containers-rhosp-rhel8_openstack-novajoin-notifier:16.1.3          kolla_start           4 days ago   Up Less than a second ago         novajoin_notifier
0f85925a9d46  osp16.ctlplane.5c5s.local:8787/5c5s45-satellite-rhosp-16-rhel-8-product-containers-rhosp-rhel8_openstack-novajoin-server:16.1.3            kolla_start           4 days ago   Up Less than a second ago         novajoin_server
 [stack@osp16 ~]$ 
====

he undercloud /etc/hosts have the freeipa ip hostname map already present.

# BEGIN ANSIBLE MANAGED BLOCK
192.168.24.1 osp16.5c5s.local osp16
192.168.24.1 osp16.external.5c5s.local osp16.external
192.168.24.1 osp16.ctlplane.5c5s.local osp16.ctlplane

# END ANSIBLE MANAGED BLOCK
192.168.24.252 freeipa.5c5s.local
10.87.86.51  5c5s45-satellite.englab.juniper.net
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

====
While looking for podman logs -f from novajoin_server container. I can see this error:

++ . /usr/local/bin/kolla_novajoin_extend_start
+ echo 'Running command: '\''novajoin-server --config-file /etc/novajoin/join.conf'\'''
Running command: 'novajoin-server --config-file /etc/novajoin/join.conf'
+ exec novajoin-server --config-file /etc/novajoin/join.conf
+ sudo -E kolla_set_configs
INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
INFO:__main__:Validating config file
INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
INFO:__main__:Copying service configuration files
INFO:__main__:Deleting /etc/novajoin/join.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/novajoin/join.conf to /etc/novajoin/join.conf
INFO:__main__:Writing out command to execute
++ cat /run_command
+ CMD='novajoin-server --config-file /etc/novajoin/join.conf'
+ ARGS=
+ sudo kolla_copy_cacerts
+ [[ ! -n '' ]]   
+ . kolla_extend_start
++ [[ ! -d /var/log/kolla/novajoin ]]
+++ stat -c %a /var/log/kolla/novajoin
++ [[ 2755 != \7\5\5 ]]
++ chmod 755 /var/log/kolla/novajoin
Running command: 'novajoin-server --config-file /etc/novajoin/join.conf'
++ . /usr/local/bin/kolla_novajoin_extend_start
+ echo 'Running command: '\''novajoin-server --config-file /etc/novajoin/join.conf'\'''
+ exec novajoin-server --config-file /etc/novajoin/join.conf
+ sudo -E kolla_set_configs
INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
INFO:__main__:Validating config file
INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
INFO:__main__:Copying service configuration files
INFO:__main__:Deleting /etc/novajoin/join.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/novajoin/join.conf to /etc/novajoin/join.conf
INFO:__main__:Writing out command to execute

======
This is the logs from novajoin_notifier container:
=========================

INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
INFO:__main__:Validating config file
INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
INFO:__main__:Copying service configuration files
INFO:__main__:Deleting /etc/novajoin/join.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/novajoin/join.conf to /etc/novajoin/join.conf
INFO:__main__:Writing out command to execute
++ cat /run_command
+ CMD='novajoin-notify --config-file /etc/novajoin/join.conf'
+ ARGS=
+ sudo kolla_copy_cacerts
+ [[ ! -n '' ]]   
+ . kolla_extend_start
++ [[ ! -d /var/log/kolla/novajoin ]]
+++ stat -c %a /var/log/kolla/novajoin
++ [[ 2755 != \7\5\5 ]]
++ chmod 755 /var/log/kolla/novajoin
++ . /usr/local/bin/kolla_novajoin_extend_start
+ echo 'Running command: '\''novajoin-notify --config-file /etc/novajoin/join.conf'\'''
Running command: 'novajoin-notify --config-file /etc/novajoin/join.conf'
+ exec novajoin-notify --config-file /etc/novajoin/join.conf
Traceback (most recent call last):
  File "/usr/bin/novajoin-notify", line 6, in <module>
    from novajoin.notifications import main
  File "/usr/lib/python3.6/site-packages/novajoin/notifications.py", line 33, in <module>
    from novajoin import join
  File "/usr/lib/python3.6/site-packages/novajoin/join.py", line 78, in <module>
    class JoinController(Controller):
  File "/usr/lib/python3.6/site-packages/novajoin/join.py", line 80, in JoinController
    def __init__(self, ipaclient=IPAClient()):
  File "/usr/lib/python3.6/site-packages/novajoin/ipa.py", line 79, in __init__
    CONF.keytab, self.ccache)
  File "/usr/lib/python3.6/site-packages/ipalib/install/kinit.py", line 47, in kinit_keytab
    cred = gssapi.Credentials(name=name, store=store, usage='initiate')
  File "/usr/lib64/python3.6/site-packages/gssapi/creds.py", line 64, in __new__
    store=store)  
  File "/usr/lib64/python3.6/site-packages/gssapi/creds.py", line 148, in acquire
    usage)
  File "gssapi/raw/ext_cred_store.pyx", line 182, in gssapi.raw.ext_cred_store.acquire_cred_from
gssapi.raw.misc.GSSError: Major (851968): Unspecified GSS failure.  Minor code may provide more information, Minor (2529639107): No credentials cache found
+ sudo -E kolla_set_configs
INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
INFO:__main__:Validating config file


Thanks,
Shaju

Comment 5 shaju 2021-04-26 18:53:41 UTC
I am working on a support case: https://access.redhat.com/support/cases/#/case/02897947 for a solution for this issue. 
Appreciate any help to solve this problem.

Thanks,
Shaju

Comment 6 shaju 2021-04-26 18:54:00 UTC
I am working on a support case: https://access.redhat.com/support/cases/#/case/02897947 for a solution for this issue. 
Appreciate any help to solve this problem.

Thanks,
Shaju

Comment 7 Ade Lee 2021-04-26 18:57:35 UTC
Shaju,

Is the undercloud DNS pointing to the IPA server?

Comment 8 shaju 2021-04-26 19:00:37 UTC
Yes, it is pointed to freeIPA IPA.

 [stack@osp16 ~]$ cat /etc/resolv.conf
# Generated by NetworkManager
search 5c5s.local
nameserver 192.168.24.252

==
 [stack@osp16 ~]$ nslookup 192.168.24.252
252.24.168.192.in-addr.arpa     name = freeipa.5c5s.local.

=====

 [stack@osp16 ~]$ kinit admin
Password for admin:
 [stack@osp16 ~]$ ipa host-find
ipa: ERROR: cannot connect to 'https://freeipa.5c5s.local/ipa/json': [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)
 [stack@osp16 ~]$
=====

From FreeIPA:
==============
[root@freeipa ~]# kinit admin
Password for admin:
[root@freeipa ~]# ipa host-find
---------------
2 hosts matched
---------------
  Host name: freeipa.5c5s.local
  Principal name: host/freeipa.5c5s.local
  Principal alias: host/freeipa.5c5s.local
  SSH public key fingerprint: SHA256:QF3LoLSeldC0EWmFw/veIyRPG01Ach2phs9DG6wRYy4 (ssh-ed25519), SHA256:u5/2zF2/Grr2E0s9Bs2KxGgSWVyXyhK2k+t6XQo9zJ8 (ecdsa-
                              sha2-nistp256), SHA256:yRSMAwJb7Mu0GguWRq7EqM4Y1l42GmHxv3HRVCq1bmE (ssh-rsa)

  Host name: osp16.5c5s.local
  Description: Undercloud host
----------------------------
Number of entries returned 2
----------------------------
[root@freeipa ~]#

Thanks,
Shaju

Comment 10 Priscila 2021-04-27 14:44:51 UTC
While looking for podman logs -f from novajoin_server container. I can see this error:

++ . /usr/local/bin/kolla_novajoin_extend_start
+ echo 'Running command: '\''novajoin-server --config-file /etc/novajoin/join.conf'\'''
Running command: 'novajoin-server --config-file /etc/novajoin/join.conf'
+ exec novajoin-server --config-file /etc/novajoin/join.conf
+ sudo -E kolla_set_configs
INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
INFO:__main__:Validating config file
INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
INFO:__main__:Copying service configuration files
INFO:__main__:Deleting /etc/novajoin/join.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/novajoin/join.conf to /etc/novajoin/join.conf
INFO:__main__:Writing out command to execute
++ cat /run_command
+ CMD='novajoin-server --config-file /etc/novajoin/join.conf'
+ ARGS=
+ sudo kolla_copy_cacerts
+ [[ ! -n '' ]]   
+ . kolla_extend_start
++ [[ ! -d /var/log/kolla/novajoin ]]
+++ stat -c %a /var/log/kolla/novajoin
++ [[ 2755 != \7\5\5 ]]
++ chmod 755 /var/log/kolla/novajoin
Running command: 'novajoin-server --config-file /etc/novajoin/join.conf'
++ . /usr/local/bin/kolla_novajoin_extend_start
+ echo 'Running command: '\''novajoin-server --config-file /etc/novajoin/join.conf'\'''
+ exec novajoin-server --config-file /etc/novajoin/join.conf
+ sudo -E kolla_set_configs
INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
INFO:__main__:Validating config file
INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
INFO:__main__:Copying service configuration files
INFO:__main__:Deleting /etc/novajoin/join.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/novajoin/join.conf to /etc/novajoin/join.conf
INFO:__main__:Writing out command to execute

======
This is the logs from novajoin_notifier container:
=========================

INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
INFO:__main__:Validating config file
INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
INFO:__main__:Copying service configuration files
INFO:__main__:Deleting /etc/novajoin/join.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/novajoin/join.conf to /etc/novajoin/join.conf
INFO:__main__:Writing out command to execute
++ cat /run_command
+ CMD='novajoin-notify --config-file /etc/novajoin/join.conf'
+ ARGS=
+ sudo kolla_copy_cacerts
+ [[ ! -n '' ]]   
+ . kolla_extend_start
++ [[ ! -d /var/log/kolla/novajoin ]]
+++ stat -c %a /var/log/kolla/novajoin
++ [[ 2755 != \7\5\5 ]]
++ chmod 755 /var/log/kolla/novajoin
++ . /usr/local/bin/kolla_novajoin_extend_start
+ echo 'Running command: '\''novajoin-notify --config-file /etc/novajoin/join.conf'\'''
Running command: 'novajoin-notify --config-file /etc/novajoin/join.conf'
+ exec novajoin-notify --config-file /etc/novajoin/join.conf
Traceback (most recent call last):
  File "/usr/bin/novajoin-notify", line 6, in <module>
    from novajoin.notifications import main
  File "/usr/lib/python3.6/site-packages/novajoin/notifications.py", line 33, in <module>
    from novajoin import join
  File "/usr/lib/python3.6/site-packages/novajoin/join.py", line 78, in <module>
    class JoinController(Controller):
  File "/usr/lib/python3.6/site-packages/novajoin/join.py", line 80, in JoinController
    def __init__(self, ipaclient=IPAClient()):
  File "/usr/lib/python3.6/site-packages/novajoin/ipa.py", line 79, in __init__
    CONF.keytab, self.ccache)
  File "/usr/lib/python3.6/site-packages/ipalib/install/kinit.py", line 47, in kinit_keytab
    cred = gssapi.Credentials(name=name, store=store, usage='initiate')
  File "/usr/lib64/python3.6/site-packages/gssapi/creds.py", line 64, in __new__
    store=store)  
  File "/usr/lib64/python3.6/site-packages/gssapi/creds.py", line 148, in acquire
    usage)
  File "gssapi/raw/ext_cred_store.pyx", line 182, in gssapi.raw.ext_cred_store.acquire_cred_from
gssapi.raw.misc.GSSError: Major (851968): Unspecified GSS failure.  Minor code may provide more information, Minor (2529639107): No credentials cache found
+ sudo -E kolla_set_configs
INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
INFO:__main__:Validating config file

Comment 11 shaju 2021-04-29 04:47:58 UTC
Hi Team,

Juniper release has been scheduled to deliver in two days. Our joint customers cannot do an upgrade to the latest juniper version due to this bug. An immediate workaround or solution really helps us to deliver the release on time.

Thanks,
Shaju

Comment 12 Ade Lee 2021-04-29 14:03:06 UTC
The error message above ..

 [stack@osp16 ~]$ kinit admin
Password for admin:
 [stack@osp16 ~]$ ipa host-find
ipa: ERROR: cannot connect to 'https://freeipa.5c5s.local/ipa/json': [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)
 [stack@osp16 ~]$

seems to indicate that the CA cert for the IPA server is not trusted.  This should have been set up correctly as part of the ipa-client-install
process, which makes me wonder what else was not set up correctly.  Looking at the ipa-client-install log in /var/log might tell us more.

To get more information on whats going on with kerberos, try running the following both on the undercloud, and in the containers:

KRB5_TRACE=/dev/stdout kinit admin

Comment 13 shaju 2021-04-29 16:35:41 UTC
Hi Ade Lee,
Thanks for checking this case. Please find the requested details: 

 [stack@osp16 ~]$ KRB5_TRACE=/dev/stdout kinit admin
[835087] 1619713996.412366: Getting initial credentials for admin
[835087] 1619713996.412368: Sending unauthenticated request
[835087] 1619713996.412369: Sending request (181 bytes) to 5C5S.LOCAL
[835087] 1619713996.412370: Sending DNS URI query for _kerberos.5C5S.LOCAL.
[835087] 1619713996.412371: No URI records found
[835087] 1619713996.412372: Sending DNS SRV query for _kerberos._udp.5C5S.LOCAL.
[835087] 1619713996.412373: SRV answer: 0 100 88 "freeipa.5c5s.local."
[835087] 1619713996.412374: Sending DNS SRV query for _kerberos._tcp.5C5S.LOCAL.
[835087] 1619713996.412375: SRV answer: 0 100 88 "freeipa.5c5s.local."
[835087] 1619713996.412376: Resolving hostname freeipa.5c5s.local.
[835087] 1619713996.412377: Resolving hostname freeipa.5c5s.local.
[835087] 1619713996.412378: Initiating TCP connection to stream 192.168.24.252:88
[835087] 1619713996.412379: Sending TCP request to stream 192.168.24.252:88
[835087] 1619713996.412380: Received answer (499 bytes) from stream 192.168.24.252:88
[835087] 1619713996.412381: Terminating TCP connection to stream 192.168.24.252:88
[835087] 1619713996.412382: Sending DNS URI query for _kerberos.5C5S.LOCAL.
[835087] 1619713996.412383: No URI records found
[835087] 1619713996.412384: Sending DNS SRV query for _kerberos-master._tcp.5C5S.LOCAL.
[835087] 1619713996.412385: SRV answer: 0 100 88 "freeipa.5c5s.local."
[835087] 1619713996.412386: Response was from master KDC
[835087] 1619713996.412387: Received error from KDC: -1765328359/Additional pre-authentication required
[835087] 1619713996.412390: Preauthenticating using KDC method data
[835087] 1619713996.412391: Processing preauth types: PA-PK-AS-REQ (16), PA-FX-FAST (136), PA-ETYPE-INFO2 (19), PA-PKINIT-KX (147), PA-SPAKE (151), PA-ENC-TIMESTAMP (2), PA_AS_FRESHNESS (150), PA-FX-COOKIE (133)
[835087] 1619713996.412392: Selected etype info: etype aes256-cts, salt "m^(2D9+F]HS=MNj$", params ""
[835087] 1619713996.412393: Received cookie: MIT1\x00\x00\x00\x01Xf\xb7+\xc8&\x0e.7\xab\xa6\x85k7\x1b\x0e@F\x10;\xac8\x9eq5/K(\x0eiO\x06\xbf\x16\xc9\xfbpk\xc3\xe1sN\x90E\xa8<G<\x98\xc9\x8c\x88d\x17W\x14\x031\xae\xed:\x1e\x98ui8\xa4\xc0L\xb7(c\xd3\xc5\xa4h\xad\x1d\x02%\xa9\xab\xda\xd2Br\xc2T\xf5\xb9\x8bE\xe6Xu\xf1\x9a\xe7\xf6\xf8&\xad\x1b\xd9\xd9\xd5\x10\xd9\x83\xf0\xb8\x9e\xe1\xbd5\x9e\x15\x06I\xde\x87+\xf5\xfc\x07Q\xaa\xd5\x18M
[835087] 1619713996.412394: SPAKE challenge received with group 1, pubkey 05118BCEE6532A4E90664749D4C615AE546F54572D58452160550AF9F822F5E4
Password for admin:
[835087] 1619714003.560984: SPAKE key generated with pubkey 83E37DE1417D963F31963EB3AB8A677ACFABFC7B034F2061756E6A2616CFA9B3
[835087] 1619714003.560985: SPAKE algorithm result: EA8A8BDE28E3C1D3BDB807B1863632A42642CB2506557AAD064A6D807F72F7A9
[835087] 1619714003.560986: SPAKE final transcript hash: 413708ABA23EE8DFFF0DB136C86E794DA2830D073C4D87BC8869DCC99690DD75
[835087] 1619714003.560987: Sending SPAKE response
[835087] 1619714003.560988: Preauth module spake (151) (real) returned: 0/Success
[835087] 1619714003.560989: Produced preauth for next request: PA-FX-COOKIE (133), PA-SPAKE (151)
[835087] 1619714003.560990: Sending request (440 bytes) to 5C5S.LOCAL
[835087] 1619714003.560991: Sending DNS URI query for _kerberos.5C5S.LOCAL.
[835087] 1619714003.560992: No URI records found
[835087] 1619714003.560993: Sending DNS SRV query for _kerberos._udp.5C5S.LOCAL.
[835087] 1619714003.560994: SRV answer: 0 100 88 "freeipa.5c5s.local."
[835087] 1619714003.560995: Sending DNS SRV query for _kerberos._tcp.5C5S.LOCAL.
[835087] 1619714003.560996: SRV answer: 0 100 88 "freeipa.5c5s.local."
[835087] 1619714003.560997: Resolving hostname freeipa.5c5s.local.
[835087] 1619714003.560998: Resolving hostname freeipa.5c5s.local.
[835087] 1619714003.560999: Initiating TCP connection to stream 192.168.24.252:88
[835087] 1619714003.561000: Sending TCP request to stream 192.168.24.252:88
[835087] 1619714003.561001: Received answer (798 bytes) from stream 192.168.24.252:88
[835087] 1619714003.561002: Terminating TCP connection to stream 192.168.24.252:88
[835087] 1619714003.561003: Sending DNS URI query for _kerberos.5C5S.LOCAL.
[835087] 1619714003.561004: No URI records found
[835087] 1619714003.561005: Sending DNS SRV query for _kerberos-master._tcp.5C5S.LOCAL.
[835087] 1619714003.561006: SRV answer: 0 100 88 "freeipa.5c5s.local."
[835087] 1619714003.561007: Response was from master KDC
[835087] 1619714003.561008: AS key determined by preauth: aes256-cts/4E72
[835087] 1619714003.561009: Decrypted AS reply; session key is: aes256-cts/F8A6
[835087] 1619714003.561010: FAST negotiation: available
[835087] 1619714003.561011: Initializing KCM:1000 with default princ admin
[835087] 1619714003.561012: Storing admin -> krbtgt/5C5S.LOCAL in KCM:1000
[835087] 1619714003.561013: Storing config in KCM:1000 for krbtgt/5C5S.LOCAL: fast_avail: yes
[835087] 1619714003.561014: Storing admin -> krb5_ccache_conf_data/fast_avail/krbtgt\/5C5S.LOCAL\@5C5S.LOCAL@X-CACHECONF: in KCM:1000
[835087] 1619714003.561015: Storing config in KCM:1000 for krbtgt/5C5S.LOCAL: pa_type: 151
[835087] 1619714003.561016: Storing admin -> krb5_ccache_conf_data/pa_type/krbtgt\/5C5S.LOCAL\@5C5S.LOCAL@X-CACHECONF: in KCM:1000
 [stack@osp16 ~]$

Comment 14 Ade Lee 2021-04-29 17:09:49 UTC
Great.  This shows me what is happening on the undercloud.

We also need to see the same in the novajoin container ..
This may be tricky , given that the containers are going up and down - but the idea would be to run it sort of like:

podman exec -it novajoin_server KRB5_TRACE=/dev/stdout kinit admin

so we can see what is happening there.

Comment 15 shaju 2021-04-29 17:51:16 UTC
 
[stack@osp16 ~]$ sudo podman exec -it novajoin_server KRB5_TRACE=/dev/stdout kinit admin
Error: cannot exec into container that is not running: container state improper
 [stack@osp16 ~]$ sudo podman exec -it novajoin_server KRB5_TRACE=/dev/stdout kinit admin
Error: exec failed: container_linux.go:370: starting container process caused: exec: "KRB5_TRACE=/dev/stdout": stat KRB5_TRACE=/dev/stdout: no such file or directory: OCI runtime command not found error
 [stack@osp16 ~]$ sudo podman exec -it novajoin_server KRB5_TRACE=/dev/stdout kinit admin
Error: exec failed: container_linux.go:370: starting container process caused: exec: "KRB5_TRACE=/dev/stdout": stat KRB5_TRACE=/dev/stdout: no such file or directory: OCI runtime command not found error
 [stack@osp16 ~]$ sudo podman exec -it novajoin_server KRB5_TRACE=/dev/stdout kinit admin
Error: exec failed: container_linux.go:370: starting container process caused: exec: "KRB5_TRACE=/dev/stdout": stat KRB5_TRACE=/dev/stdout: no such file or directory: OCI runtime command not found error
 [stack@osp16 ~]$ 
 [stack@osp16 ~]$ sudo podman exec -it novajoin_server KRB5_TRACE=/dev/stdout kinit admin
Error: non zero exit code: 137: OCI runtime error
 [stack@osp16 ~]$ sudo podman exec -it novajoin_server KRB5_TRACE=/dev/stdout kinit admin
Error: exec failed: container_linux.go:370: starting container process caused: exec: "KRB5_TRACE=/dev/stdout": stat KRB5_TRACE=/dev/stdout: no such file or directory: OCI runtime command not found error
 [stack@osp16 ~]$ 
 [stack@osp16 ~]$

Comment 16 Ade Lee 2021-04-29 20:22:44 UTC
sudo podman exec -e KRB5_TRACE=/dev/stdout -it novajoin_server kinit admin

Comment 17 shaju 2021-04-29 20:43:29 UTC
Hi Ade Lee,

Before I enter the creds, containers are restarted

 [stack@osp16 ~]$ sudo podman exec -e KRB5_TRACE=/dev/stdout -it novajoin_server kinit admin
[19] 1619728928.180664: TXT record _kerberos.osp16.5c5s.local. not found
[19] 1619728928.180665: TXT record _kerberos.5c5s.local. found: 5C5S.LOCAL
[19] 1619728928.180666: Getting initial credentials for admin
[19] 1619728928.180668: Sending unauthenticated request
[19] 1619728928.180669: Sending request (181 bytes) to 5C5S.LOCAL
[19] 1619728928.180670: Sending DNS URI query for _kerberos.5C5S.LOCAL.
[19] 1619728928.180671: No URI records found
[19] 1619728928.180672: Sending DNS SRV query for _kerberos._udp.5C5S.LOCAL.
[19] 1619728928.180673: SRV answer: 0 100 88 "freeipa.5c5s.local."
[19] 1619728928.180674: Sending DNS SRV query for _kerberos._tcp.5C5S.LOCAL.
[19] 1619728928.180675: SRV answer: 0 100 88 "freeipa.5c5s.local."
[19] 1619728928.180676: Resolving hostname freeipa.5c5s.local.
[19] 1619728928.180677: Sending initial UDP request to dgram 192.168.24.252:88
[19] 1619728928.180678: Received answer (499 bytes) from dgram 192.168.24.252:88
[19] 1619728928.180679: Sending DNS URI query for _kerberos.5C5S.LOCAL.
[19] 1619728928.180680: No URI records found
[19] 1619728928.180681: Sending DNS SRV query for _kerberos-master._udp.5C5S.LOCAL.
[19] 1619728928.180682: SRV answer: 0 100 88 "freeipa.5c5s.local."
[19] 1619728928.180683: Response was from master KDC
[19] 1619728928.180684: Received error from KDC: -1765328359/Additional pre-authentication required
[19] 1619728928.180687: Preauthenticating using KDC method data
[19] 1619728928.180688: Processing preauth types: PA-PK-AS-REQ (16), PA-FX-FAST (136), PA-ETYPE-INFO2 (19), PA-PKINIT-KX (147), PA-SPAKE (151), PA-ENC-TIMESTAMP (2), PA_AS_FRESHNESS (150), PA-FX-COOKIE (133)
[19] 1619728928.180689: Selected etype info: etype aes256-cts, salt "m^(2D9+F]HS=MNj$", params ""
[19] 1619728928.180690: Received cookie: MIT1\x00\x00\x00\x01\x0c\x09\xd3>\xa4\x0f\xda\xa3\xf3\xe6u\xefu\x83\xfa\x13\xc0\x02\xe9\xd9&.n\x95x{e\xf9\x01\xa1\x0b\xa7\xcbi\xcd\xd6r\x81\xfc\xeb\x1fj\xb1\xe6\xbcu\xc46\x91\xe1k\xea\xb3\xa4\x8d\xddF\x0ci\xf3S\xe1_\xa8\x98\xc8\xac\x9d\x8d\x09@\x95U\x83JL`^\x06L\xc1T\x0d\x0e\xde\x8ao\xd2\x0e\xd8xL\xd9\x8dl\xda\xb1\xc3\xd0\x9c\x143\x91\xec\x8a\xfb\xee\x05/\xaa\xc7\xec\xbdq\xc4\xc2\x19N\xf7\xd9\x02t\x88\x0c\xe9d\x07\xcbU\xa8
[19] 1619728928.180691: SPAKE challenge received with group 1, pubkey 335A89BFC25031E91D1D19CB848557B9D8FAA663302E94049EB259A0698CB3A5
Password for admin: Error: non zero exit code: 137: OCI runtime error

Comment 18 shaju 2021-04-29 20:45:19 UTC
Looks like i got the output:
===========

 [stack@osp16 ~]$ sudo podman exec -e KRB5_TRACE=/dev/stdout -it novajoin_server kinit admin
[10] 1619729066.19522: TXT record _kerberos.osp16.5c5s.local. not found
[10] 1619729066.19523: TXT record _kerberos.5c5s.local. found: 5C5S.LOCAL
[10] 1619729066.19524: Getting initial credentials for admin
[10] 1619729066.19526: Sending unauthenticated request
[10] 1619729066.19527: Sending request (181 bytes) to 5C5S.LOCAL
[10] 1619729066.19528: Sending DNS URI query for _kerberos.5C5S.LOCAL.
[10] 1619729066.19529: No URI records found
[10] 1619729066.19530: Sending DNS SRV query for _kerberos._udp.5C5S.LOCAL.
[10] 1619729066.19531: SRV answer: 0 100 88 "freeipa.5c5s.local."
[10] 1619729066.19532: Sending DNS SRV query for _kerberos._tcp.5C5S.LOCAL.
[10] 1619729066.19533: SRV answer: 0 100 88 "freeipa.5c5s.local."
[10] 1619729066.19534: Resolving hostname freeipa.5c5s.local.
[10] 1619729066.19535: Sending initial UDP request to dgram 192.168.24.252:88
[10] 1619729066.19536: Received answer (498 bytes) from dgram 192.168.24.252:88
[10] 1619729066.19537: Sending DNS URI query for _kerberos.5C5S.LOCAL.
[10] 1619729066.19538: No URI records found
[10] 1619729066.19539: Sending DNS SRV query for _kerberos-master._udp.5C5S.LOCAL.
[10] 1619729066.19540: SRV answer: 0 100 88 "freeipa.5c5s.local."
[10] 1619729066.19541: Response was from master KDC
[10] 1619729066.19542: Received error from KDC: -1765328359/Additional pre-authentication required
[10] 1619729066.19545: Preauthenticating using KDC method data
[10] 1619729066.19546: Processing preauth types: PA-PK-AS-REQ (16), PA-FX-FAST (136), PA-ETYPE-INFO2 (19), PA-PKINIT-KX (147), PA-SPAKE (151), PA-ENC-TIMESTAMP (2), PA_AS_FRESHNESS (150), PA-FX-COOKIE (133)
[10] 1619729066.19547: Selected etype info: etype aes256-cts, salt "m^(2D9+F]HS=MNj$", params ""
[10] 1619729066.19548: Received cookie: MIT1\x00\x00\x00\x01\x8f7l\xb5>^\xfd\xfa!c\x0f\xc2j4\xac\xdf\x04$\x91\xe7\x96)\xdf\xdbH\x9c\x99\x9b\x8c\x9d\xc0\xa0&$Z[Y\xd6\x85\xfb\xfd\x06h\xb8\xda\xf9\xc2\xc1\x89\xe34\xb8U\xba\x0c\x8f<\xe6_8\xde\x90\xe4\x9b\xeb`\xbe\xbe\xb2\x10\xbf\xd3\x8f\xcf\x9bX\x81\x9b\x09\xdb\xd6\x92w\xee(5%_{Y\x94\x9b\xe4;\xba1\xbb\xce_\x92\xd9\x09X\xc5\x84\xd24\xc8Y\xf9\xa2\xc3\xe1\xa6\xb9\xa1\xa0\x88+\x9an\xb4\x13\x87\x08\xbb\xa3(\xb6\x0a
[10] 1619729066.19549: SPAKE challenge received with group 1, pubkey 0ED51B94003BD2699E7398F53E9842AF64A6C6392C7DDA8A22051626FA361109
Password for admin: 
[10] 1619729066.19550: SPAKE key generated with pubkey 57A1138F16FA4B47FACD1712F0C08B73B1B03CD3DD2CA35CAC45C821F01A96D7
[10] 1619729066.19551: SPAKE algorithm result: 21BAFA536B0BDE7CE155CE310F5A107DD46E23D8B398290054637DF35736CAE7
[10] 1619729066.19552: SPAKE final transcript hash: 9330C092BC0A743982B90A168725F92B88DA785AA056C13724C63EB015116D2B
[10] 1619729066.19553: Sending SPAKE response
[10] 1619729066.19554: Preauth module spake (151) (real) returned: 0/Success
[10] 1619729066.19555: Produced preauth for next request: PA-FX-COOKIE (133), PA-SPAKE (151)
[10] 1619729066.19556: Sending request (440 bytes) to 5C5S.LOCAL
[10] 1619729066.19557: Sending DNS URI query for _kerberos.5C5S.LOCAL.
[10] 1619729066.19558: No URI records found
[10] 1619729066.19559: Sending DNS SRV query for _kerberos._udp.5C5S.LOCAL.
[10] 1619729066.19560: SRV answer: 0 100 88 "freeipa.5c5s.local."
[10] 1619729066.19561: Sending DNS SRV query for _kerberos._tcp.5C5S.LOCAL.
[10] 1619729066.19562: SRV answer: 0 100 88 "freeipa.5c5s.local."
[10] 1619729066.19563: Resolving hostname freeipa.5c5s.local.
[10] 1619729066.19564: Sending initial UDP request to dgram 192.168.24.252:88
[10] 1619729066.19565: Received answer (798 bytes) from dgram 192.168.24.252:88
[10] 1619729066.19566: Sending DNS URI query for _kerberos.5C5S.LOCAL.
[10] 1619729066.19567: No URI records found
[10] 1619729066.19568: Sending DNS SRV query for _kerberos-master._udp.5C5S.LOCAL.
[10] 1619729066.19569: SRV answer: 0 100 88 "freeipa.5c5s.local."
[10] 1619729066.19570: Response was from master KDC
[10] 1619729066.19571: AS key determined by preauth: aes256-cts/30AC
[10] 1619729066.19572: Decrypted AS reply; session key is: aes256-cts/0FBF
[10] 1619729066.19573: FAST negotiation: available
[10] 1619729066.19574: Initializing FILE:/tmp/krb5cc_0 with default princ admin
[10] 1619729066.19575: Storing admin -> krbtgt/5C5S.LOCAL in FILE:/tmp/krb5cc_0
[10] 1619729066.19576: Storing config in FILE:/tmp/krb5cc_0 for krbtgt/5C5S.LOCAL: fast_avail: yes
[10] 1619729066.19577: Storing admin -> krb5_ccache_conf_data/fast_avail/krbtgt\/5C5S.LOCAL\@5C5S.LOCAL@X-CACHECONF: in FILE:/tmp/krb5cc_0
[10] 1619729066.19578: Storing config in FILE:/tmp/krb5cc_0 for krbtgt/5C5S.LOCAL: pa_type: 151
[10] 1619729066.19579: Storing admin -> krb5_ccache_conf_data/pa_type/krbtgt\/5C5S.LOCAL\@5C5S.LOCAL@X-CACHECONF: in FILE:/tmp/krb5cc_0
 [stack@osp16 ~]$

Comment 21 Ade Lee 2021-04-29 22:03:42 UTC
The error above seems to indicate that kerberos was able to get a ticket but was unable to store it.

gssapi.raw.misc.GSSError: Major (851968): Unspecified GSS failure.  Minor code may provide more information, Minor (2529639107): No credentials cache found

which, given that novajoin is using a memory cache is unexpected.

Lets try a couple things.  First, we want to set the log level to debug.  That is likely to a join.conf file somewhere under /var/lib/config-data/puppet-generated/
There set debug to true.  Hopefully, this will provide some logs in the novajoin logs, which are somewhere under /var/log/containers/

Also, what does the following show?
sudo podman exec -it novajoin_server cat /etc/ipa/default.conf

sudo podman inspect novajoin_server

Comment 22 Ade Lee 2021-04-29 22:09:14 UTC
No, this is the first time we have seen this issue.  novajoin deployment is regularly tested in QE, and has not seen this issue either.

As I mentioned above, given that the ipa ca cert appears not to be trusted, I would suspect that there may have been issues with the
ipa-client-install.  This could manifest itself in strange errors like this.

Comment 24 shaju 2021-04-29 22:21:57 UTC
Created attachment 1777513 [details]
novajoin-logs-directory

Comment 25 shaju 2021-04-29 22:22:56 UTC
Created attachment 1777514 [details]
podman inspect output

Comment 26 shaju 2021-04-29 22:23:28 UTC
Created attachment 1777515 [details]
join.conf

Comment 27 shaju 2021-04-29 22:23:51 UTC
Thanks Ade.  I've changed the log level (debug = true) in /var/lib/config-data/puppet-generated/novajoin/etc/novajoin/join.conf .

Attached :

1. /var/log/containers/novajoin directory
2. inspect of novajoin_server
3. join.conf

 [stack@osp16 ~]$ sudo podman exec -it novajoin_server cat /etc/ipa/default.conf
#File modified by ipa-client-install

[global]
basedn = dc=5c5s,dc=local
realm = 5C5S.LOCAL
domain = 5c5s.local
server = freeipa.5c5s.local
host = osp16.5c5s.local
xmlrpc_uri = https://freeipa.5c5s.local/ipa/xml
enable_ra = True

Comment 29 shaju 2021-04-29 23:44:07 UTC
Thanks Ade. Please let me know if you need any help. 

I have created a new premium support case to get updated of this issue: 

https://access.redhat.com/support/cases/#/case/02930143

Thanks,
Shaju

Comment 30 PURANDHAR SAIRAM MANNIDI 2021-04-30 00:40:48 UTC
Shaju, i updated old case with proper account details and set premium support. we can continue old case. I closed the new case.

please let me know if we can have remote session, we can involve ppl from IPA team as well if needed.

Comment 32 shaju 2021-04-30 03:04:45 UTC
Hi ,

I am available at 9.30 PM PST time, 1.30 hrs from now. Please share the link to join live debugging. 

Thanks,
Shaju

Comment 34 Ade Lee 2021-04-30 13:45:32 UTC
What might be going on here is that some problem with the keytab is being masked by a credentials cache error
as in this BZ. https://bugzilla.redhat.com/show_bug.cgi?id=1395799#c4

Lets see what the following returns:

klist -kt /etc/novajoin/krb5.keytab
ls -lZ /etc/novajoin
KRB5_TRACE=/dev/stdout kinit nova/osp16.5c5s.local -k -t /etc/novajoin/krb5.keytab

podman exec -e KRB5_TRACE=/dev/stdout -ti novajoin_server kinit nova/osp16.5c5s.local -kt /etc/novajoin/krb5.keytab
podman exec -it novajoin_server ls -lZ /etc/novajoin

Comment 36 shaju 2021-04-30 15:26:06 UTC
[stack@osp16 ~]$ sudo klist -kt /etc/novajoin/krb5.keytab
Keytab name: FILE:/etc/novajoin/krb5.keytab

KVNO Timestamp           Principal---- ------------------- ------------------------------------------------------   
1 03/23/2021 16:21:00 host/osp16.5c5s.local   
1 03/23/2021 16:21:01 host/osp16.5c5s.local
[stack@osp16 ~]$

[stack@osp16 ~]$[stack@osp16 ~]$ sudo ls -lZ /etc/novajoin
total 28
-rw-r--r--. 1 root novajoin system_u:object_r:etc_t:s0              3302 Mar 10  2020 cloud-config-novajoin.json
-rw-r--r--. 1 root novajoin system_u:object_r:etc_t:s0               499 Mar 10  2020 join-api-paste.ini
-rw-r-----. 1 root novajoin system_u:object_r:etc_t:s0             11221 Mar 10  2020 join.conf
-rwxrwxrwx. 1 root root     unconfined_u:object_r:krb5_keytab_t:s0   168 Mar 24 11:19 krb5.keytab
-rw-------. 1 root root     unconfined_u:object_r:etc_t:s0           168 Apr 30 08:12 krb5.keytab.bckp
[stack@osp16 ~]$

[stack@osp16 ~]$ sudo KRB5_TRACE=/dev/stdout kinit nova/osp16.5c5s.local -k -t /etc/novajoin/krb5.keytab

[538577] 1619793208.551186: Resolving unique ccache of type KCM
[538577] 1619793208.551187: Getting initial credentials for nova/osp16.5c5s.local
[538577] 1619793208.551188: Looked up etypes in keytab: (empty)
[538577] 1619793208.551189: Getting initial credentials for nova/osp16.5c5s.local
[538577] 1619793208.551190: Looked up etypes in keytab: (empty)
kinit: Keytab contains no suitable keys for nova/osp16.5c5s.local while getting initial credentials
[stack@osp16 ~]$

couldn't get the podman output since the containers are not up

[stack@osp16 ~]$ sudo podman exec -e KRB5_TRACE=/dev/stdout -ti novajoin_server kinit nova/osp16.5c5s.local -kt /etc/novajoin/krb5.keytab
Error: cannot exec into container that is not running: container state improper
[stack@osp16 ~]$ 
[stack@osp16 ~]$ 
[stack@osp16 ~]$ podman exec -it novajoin_server ls -lZ /etc/novajoin
Error: no container with name or ID novajoin_server found: no such container
[stack@osp16 ~]$

Comment 37 shaju 2021-04-30 16:51:52 UTC
 [root@osp16 ~]# podman ps | grep novajo

30cf0ddd698c  osp16.ctlplane.5c5s.local:8787/5c5s45-satellite-rhosp-16-rhel-8-product-containers-rhosp-rhel8_openstack-novajoin-notifier:16.1.3          kolla_start           7 days ago    Up About a minute ago         novajoin_notifier
0f85925a9d46  osp16.ctlplane.5c5s.local:8787/5c5s45-satellite-rhosp-16-rhel-8-product-containers-rhosp-rhel8_openstack-novajoin-server:16.1.3            kolla_start           7 days ago    Up About a minute ago         novajoin_server

 [root@osp16 ~]# ipa-getkeytab -p nova/osp16.5c5s.local -k /etc/novajoin/krb5.keytab
 [root@osp16 ~]# 
 [root@osp16 ~]# 
 [root@osp16 ~]# rpm -qa | grep novaj
python3-novajoin-1.3.0-0.20200310222422.265146e.el8ost.noarch
 [root@osp16 ~]# 
 [root@osp16 ~]# cat /etc/redhat-release 
Red Hat Enterprise Linux release 8.2 (Ootpa)
 [root@osp16 ~]#

Comment 38 Ade Lee 2021-04-30 18:55:35 UTC
Just a recap of an interactive session on the customer:

1.The ipa-client-install operation which enrolled the node as an ipa-client failed to trust the IPA CA cert.  Interestingly,
  this did not result in a failure being reported.  The ipaclient-install.log indicated success.  The result of this was that
  ipa CLI calls failed because of SSL failures to verify host certs when talking to the IPA server.

   We fixed this by unenrolling and re-enrolling the ipa client software
   -- ipa-client-install --uninstall -U
      ipa-client-install

2. For some reason, it appears that the /etc/novajoin directory did not exist.  This may be a bug and as I pointed out in
   https://bugzilla.redhat.com/show_bug.cgi?id=1941799#c3 , this particular code path may be missing the directive to create this.
   It used to be that the directory was provided by the python-novajoin rpm, but in the version on the customr machine (python3-novajoin-1.3.0-0.20200310222422.265146e.el8ost.noarch)
   this was not the case.  I'm going to use this BZ to investigate/fix this.

   The customer was instructed to add this directory manually - and this should be done with 755 permissions

3. The containers were restarting because the keytab being used was incorrect, in that it did not have keys for the right credential.
   The exact error was masked by a CC_cache error, which was spurious (https://bugzilla.redhat.com/show_bug.cgi?id=1395799#c4)

   The keytab was incorrectly copied over from the /etc/krb5.keytab

   The correct command to get the keytab is:

   kinit admin
   ipa-getkeytab -p nova/<hostname>@<REALM> -k /etc/novajoin/krb5.keytab

Comment 46 Grzegorz Grasza 2021-07-26 14:37:48 UTC
(In reply to Ade Lee from comment #38)
> 
> 2. For some reason, it appears that the /etc/novajoin directory did not
> exist.  This may be a bug and as I pointed out in
>    https://bugzilla.redhat.com/show_bug.cgi?id=1941799#c3 , this particular
> code path may be missing the directive to create this.
>    It used to be that the directory was provided by the python-novajoin rpm,
> but in the version on the customr machine
> (python3-novajoin-1.3.0-0.20200310222422.265146e.el8ost.noarch)
>    this was not the case.  I'm going to use this BZ to investigate/fix this.
> 
>    The customer was instructed to add this directory manually - and this
> should be done with 755 permissions
> 

I modified the subject to make this more clear.

Comment 48 Grzegorz Grasza 2021-12-06 15:18:51 UTC
Changing priority to low, because the escalated issue was fixed, but we still need to document the procedure.