Created attachment 1640616 [details]
log files

Description of problem:
Hosted engine deployment fails at the task "Obtain SSO token using username/**FILTERED** credentials".
1). The hosted engine is up in the cockpit hosted engine page.
2). The engine VM is up in the cockpit hosted engine page.
3). The message "Unable to log in because the user account has expired. Contact the system administrator." is displayed on the landing page of the engine VM.

Log: ovirt-hosted-engine-setup-ansible-create_target_vm-20191029151223-eh9zs7.log

2019-11-29 18:34:31,410+0800 DEBUG var changed: host "localhost" var "ovirt_sso_auth" type "<type 'dict'>" value: "{
  "attempts": 50,
  "changed": false,
  "exception": "Traceback (most recent call last):\n File \"/tmp/ansible_ovirt_auth_payload_xHejoy/ansible_ovirt_auth_payload.zip/ansible/modules/cloud/ovirt/ovirt_auth.py\", line 276, in main\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/__init__.py\", line 382, in authenticate\n self._sso_token = self._get_access_token()\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/__init__.py\", line 628, in _get_access_token\n sso_error[1]\nAuthError: Error during SSO authentication access_denied : Cannot authenticate user 'admin@internal': Unable to log in because the user account has expired. Contact the system administrator..\n",
  "failed": true,
  "msg": "Error during SSO authentication access_denied : Cannot authenticate user 'admin@internal': Unable to log in because the user account has expired. Contact the system administrator.."
}"

2019-11-29 18:34:31,410+0800 ERROR ansible failed {'status': 'FAILED', 'ansible_type': 'task', 'ansible_task': u'Obtain SSO token using username/**FILTERED** credentials', 'ansible_result': u'type: <type \'dict\'>\nstr: {u\'exception\': u\'Traceback (most recent call last):\\n File "/tmp/ansible_ovirt_auth_payload_xHejoy/ansible_ovirt_auth_payload.zip/ansible/modules/cloud/ovirt/ovirt_auth.py", line 276, in main\\n File "/usr/lib64/python2.7/site-packages/ovirtsdk4/__init__.py", line 382, in authenticate\\n self._ss', 'task_duration': 727, 'ansible_host': u'localhost', 'ansible_playbook': u'/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml'}

Version-Release number of selected component (if applicable):
RHVH-4.3-20191128.0-RHVH-x86_64-dvd1.iso
cockpit-dashboard-195-1.el7.x86_64
cockpit-system-195-1.el7.noarch
cockpit-195-1.el7.x86_64
cockpit-bridge-195-1.el7.x86_64
cockpit-ws-195-1.el7.x86_64
cockpit-storaged-195-1.el7.noarch
cockpit-ovirt-dashboard-0.13.8-1.el7ev.noarch
cockpit-machines-ovirt-195-1.el7.noarch
rhvm-appliance-4.3-20191127.0.el7.x86_64
ovirt-hosted-engine-ha-2.3.6-1.el7ev.noarch
ovirt-hosted-engine-setup-2.3.12-1.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
Deploy hosted engine via the cockpit UI

Actual results:
Hosted engine deployment fails at the task "Obtain SSO token using username/**FILTERED** credentials".

Expected results:
Hosted engine deployment succeeds.

Additional info:
Created attachment 1640617 [details] log files
Is one of your provided passwords (engine OS/Admin) part of the engine/host FQDN?
Can you also check if the chronyd service is running on the host:

# systemctl status chronyd
● chronyd.service - NTP client/server
   Loaded: loaded (/usr/lib/systemd/system/chronyd.service; enabled; vendor preset: enabled)
   Active: active (running) since Sun 2019-12-01 13:44:31 IST; 5h 43min ago
     Docs: man:chronyd(8)
           man:chrony.conf(5)
 Main PID: 16572 (chronyd)
    Tasks: 1
   CGroup: /system.slice/chronyd.service
           └─16572 /usr/sbin/chronyd

and is synced with sources:

# chronyc sources
210 Number of sources = 4
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^+ clock01.util.phx2.redhat>     1  10   377    28  +4512us[+4512us] +/-   99ms
^* clock1.rdu2.redhat.com        1  10   377    86  -1116us[-1117us] +/-   69ms
^+ clock.bos.redhat.com          1  10   377   662  -6362us[-6363us] +/-   84ms
^+ clock02.util.phx2.redhat>     1  10   377    86  +4531us[+4531us] +/-   99ms
(In reply to Evgeny Slutsky from comment #2) > is one of your provided passwords (engine OS/Admin) part of the engine/host > FQDN? Yes, totally correct.
(In reply to Evgeny Slutsky from comment #3)
> can you also check if the chronyd service is running on host:
>
> # systemctl status chronyd
> ● chronyd.service - NTP client/server
>    Loaded: loaded (/usr/lib/systemd/system/chronyd.service; enabled; vendor preset: enabled)
>    Active: active (running) since Sun 2019-12-01 13:44:31 IST; 5h 43min ago
>      Docs: man:chronyd(8)
>            man:chrony.conf(5)
>  Main PID: 16572 (chronyd)
>     Tasks: 1
>    CGroup: /system.slice/chronyd.service
>            └─16572 /usr/sbin/chronyd
>
> and is synced with sources:
>
> # chronyc sources
> 210 Number of sources = 4
> MS Name/IP address         Stratum Poll Reach LastRx Last sample
> ===============================================================================
> ^+ clock01.util.phx2.redhat>     1  10   377    28  +4512us[+4512us] +/-   99ms
> ^* clock1.rdu2.redhat.com        1  10   377    86  -1116us[-1117us] +/-   69ms
> ^+ clock.bos.redhat.com          1  10   377   662  -6362us[-6363us] +/-   84ms
> ^+ clock02.util.phx2.redhat>     1  10   377    86  +4531us[+4531us] +/-   99ms

# systemctl status chronyd
● chronyd.service - NTP client/server
   Loaded: loaded (/usr/lib/systemd/system/chronyd.service; disabled; vendor preset: disabled)
   Active: inactive (dead) since Mon 2019-12-02 01:25:32 CST; 6h ago
     Docs: man:chronyd(8)
           man:chrony.conf(5)
  Process: 5913 ExecStartPost=/usr/libexec/chrony-helper update-daemon (code=exited, status=0/SUCCESS)
  Process: 5908 ExecStart=/usr/sbin/chronyd $OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 5911 (code=exited, status=0/SUCCESS)

Dec 02 04:23:28 hp-dl388g9-04.lab.eng.pek2.redhat.com systemd[1]: Starting NTP client/server...
Dec 02 04:23:28 hp-dl388g9-04.lab.eng.pek2.redhat.com chronyd[5911]: chronyd version 3.4 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER +SIGND +ASYNCDNS +SECHASH +IPV6 +DEBUG)
Dec 02 04:23:28 hp-dl388g9-04.lab.eng.pek2.redhat.com systemd[1]: Started NTP client/server.
Dec 02 04:23:46 hp-dl388g9-04.lab.eng.pek2.redhat.com chronyd[5911]: Selected source 10.5.26.10
Dec 02 04:23:46 hp-dl388g9-04.lab.eng.pek2.redhat.com chronyd[5911]: System clock wrong by -10797.272753 seconds, adjustment started
Dec 02 01:23:49 hp-dl388g9-04.lab.eng.pek2.redhat.com chronyd[5911]: System clock was stepped by -10797.272753 seconds
Dec 02 01:25:32 hp-dl388g9-04.lab.eng.pek2.redhat.com chronyd[5911]: chronyd exiting
Dec 02 01:25:32 hp-dl388g9-04.lab.eng.pek2.redhat.com systemd[1]: Stopping NTP client/server...
Dec 02 01:25:32 hp-dl388g9-04.lab.eng.pek2.redhat.com systemd[1]: Stopped NTP client/server.

# chronyc sources
506 Cannot talk to daemon

So weird. Last time I tested 6~7 times with different machines, both automatic and manual deployments, and it always failed. But today I retested the HE deployment and it was successful. Closing it.
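Worth noting in the chronyd log above: the host clock was off by -10797.272753 seconds (about 3 hours) and was then stepped back. A skew of that size at deployment time is a plausible cause of SSO treating the freshly created admin@internal account as already expired. A minimal pre-deployment sanity check could look like the following (a sketch only; the MAX_SKEW threshold is arbitrary and the embedded `chronyc tracking`-style line is a sample standing in for a live `chronyc tracking` call):

```shell
#!/bin/sh
# Hypothetical pre-deployment clock check: warn if the host clock is off
# by more than MAX_SKEW seconds, since a large skew can make freshly
# issued SSO sessions appear expired. MAX_SKEW is an illustrative value.
MAX_SKEW=5

# Sample "System time" line as printed by `chronyc tracking`; in practice
# you would capture this with: tracking_output=$(chronyc tracking)
tracking_output='System time     : 10797.272753 seconds slow of NTP time'

# Field 4 of the "System time" line is the offset magnitude in seconds
# (the direction is given in words, so the number itself is nonnegative).
skew=$(printf '%s\n' "$tracking_output" | awk '/System time/ {print $4}')

# Compare against the threshold; awk handles the floating-point math.
ok=$(awk -v s="$skew" -v m="$MAX_SKEW" 'BEGIN {print (s < m) ? "ok" : "skewed"}')
echo "clock check: $ok (offset ${skew}s)"
```

Running this against the sample line reports "skewed", which is exactly the condition this host was in before chronyd stepped the clock.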
Reopening and re-targeting to 4.4.2. Even though it did not reproduce, we would like to investigate this.
Test Version:
RHVH-4.3-20200128.0-RHVH-x86_64-dvd1.iso
rhvm-appliance-4.3-20200128.0.el7.x86_64

Test Steps:
1. Install RHVH-4.3-20200128.0-RHVH-x86_64-dvd1.iso
2. Deploy hosted engine via cockpit

Result:
The bug is reproduced with the new 4.3.8 build.

[ ERROR ] AuthError: Error during SSO authentication access_denied : Cannot authenticate user 'admin@internal': Unable to log in because the user account has expired. Contact the system administrator..
[ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 50, "changed": false, "msg": "Error during SSO authentication access_denied : Cannot authenticate user 'admin@internal': Unable to log in because the user account has expired. Contact the system administrator.."}
Created attachment 1656637 [details] picture_20200131
Created attachment 1656638 [details] log_20200131
This now blocks HE testing of the build.
Can you please check the chronyd service on the host: `systemctl status chronyd && chronyc sources`
Examining /var/log/messages in the attached logs, the chronyd service had not started on the host since boot.
The test environment has been destroyed; let me redo it.
(In reply to Wei Wang from comment #7)
> Test Version:
> RHVH-4.3-20200128.0-RHVH-x86_64-dvd1.iso
> rhvm-appliance-4.3-20200128.0.el7.x86_64
>
> Test Steps:
> 1. Install RHVH-4.3-20200128.0-RHVH-x86_64-dvd1.iso
> 2. Deploy hosted engine via cockpit
>
> Result:
> Bug is reproduced with 4.3.8 new build.
> ERROR ] AuthError: Error during SSO authentication access_denied : Cannot
> authenticate user 'admin@internal': Unable to log in because the user
> account has expired. Contact the system administrator..
> [ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 50, "changed": false,
> "msg": "Error during SSO authentication access_denied : Cannot authenticate
> user 'admin@internal': Unable to log in because the user account has
> expired. Contact the system administrator.."}

Hello,

Just an update on an RHHI-V deployment: we used 'Hyperconverged Deployment' from cockpit, which creates and configures the gluster volumes and then proceeds to the HE deployment. We used a DHCP-based HE deployment, and it was successful with this build:
- RHVH-4.3-20200128.0-RHVH-x86_64-dvd1.iso
- rhvm-appliance-4.3-20200128.0.el7.x86_64
Retested with RHVH-4.3-20200128.0-RHVH-x86_64-dvd1.iso. The bug cannot be detected anymore; I don't know why, since it failed more than once when I tested that day.
This bug has been detected twice (RHVH-4.3-20191128.0-RHVH-x86_64-dvd1.iso and RHVH-4.3-20200128.0-RHVH-x86_64-dvd1.iso) with different builds. Each time it occurred consistently on the first day, but the next day the issue disappeared without any additional steps. I cannot find reliable reproduction steps; it may be an intermittent issue.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days