Created attachment 1640616 [details]
log files

Description of problem:
Hosted engine deployment fails at the task "Obtain SSO token using username/**FILTERED** credentials".
1). The hosted engine is up in the cockpit hosted engine page.
2). The engine VM is up in the cockpit hosted engine page.
3). The message "Unable to log in because the user account has expired. Contact the system administrator." is displayed on the landing page of the engine VM.

Log: ovirt-hosted-engine-setup-ansible-create_target_vm-20191029151223-eh9zs7.log

2019-11-29 18:34:31,410+0800 DEBUG var changed: host "localhost" var "ovirt_sso_auth" type "<type 'dict'>" value: "{
  "attempts": 50,
  "changed": false,
  "exception": "Traceback (most recent call last):\n File \"/tmp/ansible_ovirt_auth_payload_xHejoy/ansible_ovirt_auth_payload.zip/ansible/modules/cloud/ovirt/ovirt_auth.py\", line 276, in main\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/__init__.py\", line 382, in authenticate\n self._sso_token = self._get_access_token()\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/__init__.py\", line 628, in _get_access_token\n sso_error[1]\nAuthError: Error during SSO authentication access_denied : Cannot authenticate user 'admin@internal': Unable to log in because the user account has expired. Contact the system administrator..\n",
  "failed": true,
  "msg": "Error during SSO authentication access_denied : Cannot authenticate user 'admin@internal': Unable to log in because the user account has expired. Contact the system administrator.."
}"

2019-11-29 18:34:31,410+0800 ERROR ansible failed {'status': 'FAILED', 'ansible_type': 'task', 'ansible_task': u'Obtain SSO token using username/**FILTERED** credentials', 'ansible_result': u'type: <type \'dict\'>\nstr: {u\'exception\': u\'Traceback (most recent call last):\\n File "/tmp/ansible_ovirt_auth_payload_xHejoy/ansible_ovirt_auth_payload.zip/ansible/modules/cloud/ovirt/ovirt_auth.py", line 276, in main\\n File "/usr/lib64/python2.7/site-packages/ovirtsdk4/__init__.py", line 382, in authenticate\\n self._ss', 'task_duration': 727, 'ansible_host': u'localhost', 'ansible_playbook': u'/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml'}

Version-Release number of selected component (if applicable):
RHVH-4.3-20191128.0-RHVH-x86_64-dvd1.iso
cockpit-dashboard-195-1.el7.x86_64
cockpit-system-195-1.el7.noarch
cockpit-195-1.el7.x86_64
cockpit-bridge-195-1.el7.x86_64
cockpit-ws-195-1.el7.x86_64
cockpit-storaged-195-1.el7.noarch
cockpit-ovirt-dashboard-0.13.8-1.el7ev.noarch
cockpit-machines-ovirt-195-1.el7.noarch
rhvm-appliance-4.3-20191127.0.el7.x86_64
ovirt-hosted-engine-ha-2.3.6-1.el7ev.noarch
ovirt-hosted-engine-setup-2.3.12-1.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
Deploy hosted engine via the cockpit UI

Actual results:
Hosted engine deployment fails at the task "Obtain SSO token using username/**FILTERED** credentials".

Expected results:
Hosted engine deployment succeeds.

Additional info:
Created attachment 1640617 [details] log files
Is one of your provided passwords (engine OS/Admin) part of the engine/host FQDN?
Can you also check if the chronyd service is running on the host:

# systemctl status chronyd
● chronyd.service - NTP client/server
   Loaded: loaded (/usr/lib/systemd/system/chronyd.service; enabled; vendor preset: enabled)
   Active: active (running) since Sun 2019-12-01 13:44:31 IST; 5h 43min ago
     Docs: man:chronyd(8)
           man:chrony.conf(5)
 Main PID: 16572 (chronyd)
    Tasks: 1
   CGroup: /system.slice/chronyd.service
           └─16572 /usr/sbin/chronyd

and is synced with sources:

# chronyc sources
210 Number of sources = 4
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^+ clock01.util.phx2.redhat>     1  10   377    28  +4512us[+4512us] +/-   99ms
^* clock1.rdu2.redhat.com        1  10   377    86  -1116us[-1117us] +/-   69ms
^+ clock.bos.redhat.com          1  10   377   662  -6362us[-6363us] +/-   84ms
^+ clock02.util.phx2.redhat>     1  10   377    86  +4531us[+4531us] +/-   99ms
(In reply to Evgeny Slutsky from comment #2) > is one of your provided passwords (engine OS/Admin) part of the engine/host > FQDN? Yes, totally correct.
(In reply to Evgeny Slutsky from comment #3)
> can you also check if the chronyd service is running on host:
>
> # systemctl status chronyd
> ● chronyd.service - NTP client/server
>    Loaded: loaded (/usr/lib/systemd/system/chronyd.service; enabled; vendor preset: enabled)
>    Active: active (running) since Sun 2019-12-01 13:44:31 IST; 5h 43min ago
>      Docs: man:chronyd(8)
>            man:chrony.conf(5)
>  Main PID: 16572 (chronyd)
>     Tasks: 1
>    CGroup: /system.slice/chronyd.service
>            └─16572 /usr/sbin/chronyd
>
> and is synced with sources:
>
> # chronyc sources
> 210 Number of sources = 4
> MS Name/IP address         Stratum Poll Reach LastRx Last sample
> ===============================================================================
> ^+ clock01.util.phx2.redhat>     1  10   377    28  +4512us[+4512us] +/-   99ms
> ^* clock1.rdu2.redhat.com        1  10   377    86  -1116us[-1117us] +/-   69ms
> ^+ clock.bos.redhat.com          1  10   377   662  -6362us[-6363us] +/-   84ms
> ^+ clock02.util.phx2.redhat>     1  10   377    86  +4531us[+4531us] +/-   99ms

# systemctl status chronyd
● chronyd.service - NTP client/server
   Loaded: loaded (/usr/lib/systemd/system/chronyd.service; disabled; vendor preset: disabled)
   Active: inactive (dead) since Mon 2019-12-02 01:25:32 CST; 6h ago
     Docs: man:chronyd(8)
           man:chrony.conf(5)
  Process: 5913 ExecStartPost=/usr/libexec/chrony-helper update-daemon (code=exited, status=0/SUCCESS)
  Process: 5908 ExecStart=/usr/sbin/chronyd $OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 5911 (code=exited, status=0/SUCCESS)

Dec 02 04:23:28 hp-dl388g9-04.lab.eng.pek2.redhat.com systemd[1]: Starting NTP client/server...
Dec 02 04:23:28 hp-dl388g9-04.lab.eng.pek2.redhat.com chronyd[5911]: chronyd version 3.4 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER +SIGND +ASYNCDNS +SECHASH +IPV6 +DEBUG)
Dec 02 04:23:28 hp-dl388g9-04.lab.eng.pek2.redhat.com systemd[1]: Started NTP client/server.
Dec 02 04:23:46 hp-dl388g9-04.lab.eng.pek2.redhat.com chronyd[5911]: Selected source 10.5.26.10
Dec 02 04:23:46 hp-dl388g9-04.lab.eng.pek2.redhat.com chronyd[5911]: System clock wrong by -10797.272753 seconds, adjustment started
Dec 02 01:23:49 hp-dl388g9-04.lab.eng.pek2.redhat.com chronyd[5911]: System clock was stepped by -10797.272753 seconds
Dec 02 01:25:32 hp-dl388g9-04.lab.eng.pek2.redhat.com chronyd[5911]: chronyd exiting
Dec 02 01:25:32 hp-dl388g9-04.lab.eng.pek2.redhat.com systemd[1]: Stopping NTP client/server...
Dec 02 01:25:32 hp-dl388g9-04.lab.eng.pek2.redhat.com systemd[1]: Stopped NTP client/server.

# chronyc sources
506 Cannot talk to daemon

So weird. Last time I tested 6~7 times with different machines, both automatic and manual deployments, and it always failed. But today I retested the HE deployment and it was successful. Closing it.
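Worth noting in the chronyd log above: the host clock was off by -10797.272753 seconds (about 3 hours) and was then stepped back. A skew of that size at deployment time is a plausible cause of SSO treating the freshly created admin@internal account as already expired. A minimal pre-deployment sanity check could look like the following (a sketch only; the MAX_SKEW threshold is arbitrary and the embedded `chronyc tracking`-style line is a sample standing in for a live `chronyc tracking` call):

```shell
#!/bin/sh
# Hypothetical pre-deployment clock check: warn if the host clock is off
# by more than MAX_SKEW seconds, since a large skew can make freshly
# issued SSO sessions appear expired. MAX_SKEW is an illustrative value.
MAX_SKEW=5

# Sample "System time" line as printed by `chronyc tracking`; in practice
# you would capture this with: tracking_output=$(chronyc tracking)
tracking_output='System time     : 10797.272753 seconds slow of NTP time'

# Field 4 of the "System time" line is the offset magnitude in seconds
# (the direction is given in words, so the number itself is nonnegative).
skew=$(printf '%s\n' "$tracking_output" | awk '/System time/ {print $4}')

# Compare against the threshold; awk handles the floating-point math.
ok=$(awk -v s="$skew" -v m="$MAX_SKEW" 'BEGIN {print (s < m) ? "ok" : "skewed"}')
echo "clock check: $ok (offset ${skew}s)"
```

Running this against the sample line reports "skewed", which is exactly the condition this host was in before chronyd stepped the clock.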
Reopening and re-targeting to 4.4.2. Even though it did not reproduce, we would like to investigate this.
Test Version:
RHVH-4.3-20200128.0-RHVH-x86_64-dvd1.iso
rhvm-appliance-4.3-20200128.0.el7.x86_64

Test Steps:
1. Install RHVH-4.3-20200128.0-RHVH-x86_64-dvd1.iso
2. Deploy hosted engine via cockpit

Result:
The bug is reproduced with the new 4.3.8 build.

[ ERROR ] AuthError: Error during SSO authentication access_denied : Cannot authenticate user 'admin@internal': Unable to log in because the user account has expired. Contact the system administrator..
[ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 50, "changed": false, "msg": "Error during SSO authentication access_denied : Cannot authenticate user 'admin@internal': Unable to log in because the user account has expired. Contact the system administrator.."}
Created attachment 1656637 [details] picture_20200131
Created attachment 1656638 [details] log_20200131
This now blocks HE testing of the build.
Can you please check the chronyd service on the host: `systemctl status chronyd && chronyc sources`
Examining /var/log/messages in the attached logs, the chronyd service had not started on the host since boot.
The test environment has been destroyed; let me redo it.
(In reply to Wei Wang from comment #7)
> Test Version:
> RHVH-4.3-20200128.0-RHVH-x86_64-dvd1.iso
> rhvm-appliance-4.3-20200128.0.el7.x86_64
>
> Test Steps:
> 1. Install RHVH-4.3-20200128.0-RHVH-x86_64-dvd1.iso
> 2. Deploy hosted engine via cockpit
>
> Result:
> Bug is reproduced with 4.3.8 new build.
> ERROR ] AuthError: Error during SSO authentication access_denied : Cannot
> authenticate user 'admin@internal': Unable to log in because the user
> account has expired. Contact the system administrator..
> [ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 50, "changed": false,
> "msg": "Error during SSO authentication access_denied : Cannot authenticate
> user 'admin@internal': Unable to log in because the user account has
> expired. Contact the system administrator.."}

Hello,

Just an update on an RHHI-V deployment: we used 'Hyperconverged Deployment' from cockpit, which creates and configures the gluster volumes and then proceeds to the HE deployment. We used a DHCP-based HE deployment, and it was successful with this build:
- RHVH-4.3-20200128.0-RHVH-x86_64-dvd1.iso
- rhvm-appliance-4.3-20200128.0.el7.x86_64
Retested with RHVH-4.3-20200128.0-RHVH-x86_64-dvd1.iso. The bug cannot be detected anymore; I don't know why, since it failed more than once when I tested that day.
This bug has been detected twice (RHVH-4.3-20191128.0-RHVH-x86_64-dvd1.iso and RHVH-4.3-20200128.0-RHVH-x86_64-dvd1.iso) with different builds. Each time it occurred consistently on the first day, but the next day the issue disappeared without any additional steps. I cannot find reliable reproduction steps; it may be an intermittent issue.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days