Description of problem:
Attempting to add a second host to a new RHV environment, ansible-runner fails with NOAUTH "SSH auth error - passwordless ssh not configured for '<IP address>'" and the host fails to add. We've tested passwordless ssh from the RHV-M and it works.

Version-Release number of selected component (if applicable):
RHV-M
rhvm-4.4.5.11-0.1
Hypervisor
vdsm-4.40.22-1
ansible-2.9.11-1

How reproducible:
Unknown

Steps to Reproduce:
1. Build a new hosted engine environment.
2. Attempt to add a second host.

Actual results:
Failure to add host. Log entry in /var/log/ovirt-engine/ansible-runner-service.log:
~~~
runner_service.services.hosts - ERROR - SSH - NOAUTH:SSH auth error - passwordless ssh not configured for '<Host IP Address>'
~~~

Expected results:
Successful addition of hypervisor to RHV-M.

Additional info:
(In reply to Allie DeVolder from comment #0)
> Description of problem:
> Attempting to add a second host to a new RHV environment, ansible-runner
> fails with NOAUTH "SSH auth error - passwordless ssh not configured for '<IP
> address>'" and the host fails to add. We've tested passwordless ssh from the
> RHV-M and it works.
>
> Version-Release number of selected component (if applicable):
> RHV-M
> rhvm-4.4.5.11-0.1
> Hypervisor
> vdsm-4.40.22-1
> ansible-2.9.11-1

4.4.5 ships ansible-2.9.17-1. Why do you have 2.9.11?

Anyway, what do you mean by passwordless? How exactly is the host being added?
Are you sure that you are adding the host using the SSH public key option? Are you using webadmin or the REST API?

Also, according to the customer case, a number of hacks were made to ansible-runner-service internals around ssh_key.pub. I strongly suggest undoing all of them. There is no need for them: the engine SSH public key is passed to ansible-runner-service when a host is added with public key authentication selected.
The workaround we tried was manually copying ssh_key.pub to /usr/share/ovirt-engine/ansible-runner-service-project/env/ssh_key.pub because the error before that was:

~~~
2021-05-14 15:40:51,766 - runner_service.services.hosts - ERROR - SSH - NOAUTH:SSH auth error - passwordless ssh not configured for '10.104.136.149'
2021-05-14 15:40:51,767 - flask.app - ERROR - Exception on /api/v1/hosts/10.104.136.149/groups/ovirt [POST]
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/flask/app.py", line 1813, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/lib/python3.6/site-packages/flask/app.py", line 1799, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/lib/python3.6/site-packages/flask_restful/__init__.py", line 480, in wrapper
    resp = resource(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/flask/views.py", line 88, in view
    return self.dispatch_request(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/flask_restful/__init__.py", line 595, in dispatch_request
    resp = meth(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/runner_service/controllers/utils.py", line 29, in wrapper
    return f(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/runner_service/controllers/hosts.py", line 212, in post
    response = add_host(host_name, group, ssh_port)
  File "/usr/lib/python3.6/site-packages/runner_service/services/hosts.py", line 49, in add_host
    r.data = {"pub_key": fread(pub_key_file)}
  File "/usr/lib/python3.6/site-packages/runner_service/utils.py", line 34, in fread
    with open(file_path, 'r') as file_fd:
FileNotFoundError: [Errno 2] No such file or directory: '/usr/share/ovirt-engine/ansible-runner-service-project/env/ssh_key.pub'
~~~

This was an attempt to see whether the ansible script was failing to pull that file from that location. The only difference was that, with the file in place, the request ended without the FileNotFoundError.
So, if I need to file a bugzilla against THAT instead of this one, then let me know and I'll do that. But the only difference in behavior is that missing file traceback.
(In reply to Allie DeVolder from comment #4)
> The workaround we tried was manually copying ssh_key.pub to
> /usr/share/ovirt-engine/ansible-runner-service-project/env/ssh_key.pub
> because the error before that was:
>
> ~~~
> 2021-05-14 15:40:51,766 - runner_service.services.hosts - ERROR - SSH - NOAUTH:SSH auth error - passwordless ssh not configured for '10.104.136.149'
> 2021-05-14 15:40:51,767 - flask.app - ERROR - Exception on /api/v1/hosts/10.104.136.149/groups/ovirt [POST]
> Traceback (most recent call last):
>   File "/usr/lib/python3.6/site-packages/flask/app.py", line 1813, in full_dispatch_request
>     rv = self.dispatch_request()
>   File "/usr/lib/python3.6/site-packages/flask/app.py", line 1799, in dispatch_request
>     return self.view_functions[rule.endpoint](**req.view_args)
>   File "/usr/lib/python3.6/site-packages/flask_restful/__init__.py", line 480, in wrapper
>     resp = resource(*args, **kwargs)
>   File "/usr/lib/python3.6/site-packages/flask/views.py", line 88, in view
>     return self.dispatch_request(*args, **kwargs)
>   File "/usr/lib/python3.6/site-packages/flask_restful/__init__.py", line 595, in dispatch_request
>     resp = meth(*args, **kwargs)
>   File "/usr/lib/python3.6/site-packages/runner_service/controllers/utils.py", line 29, in wrapper
>     return f(*args, **kwargs)
>   File "/usr/lib/python3.6/site-packages/runner_service/controllers/hosts.py", line 212, in post
>     response = add_host(host_name, group, ssh_port)
>   File "/usr/lib/python3.6/site-packages/runner_service/services/hosts.py", line 49, in add_host
>     r.data = {"pub_key": fread(pub_key_file)}
>   File "/usr/lib/python3.6/site-packages/runner_service/utils.py", line 34, in fread
>     with open(file_path, 'r') as file_fd:
> FileNotFoundError: [Errno 2] No such file or directory: '/usr/share/ovirt-engine/ansible-runner-service-project/env/ssh_key.pub'
> ~~~
>
> This was an attempt to see if the ansible script was failing to pull that
> file from the location, and the only difference was that the file then
> existed and ended without the FileNotFoundError.
>
> So, if I need to file a bugzilla against THAT instead of this one, then let
> me know and I'll do that. But the only difference in behavior is that
> missing file traceback.

The above error is irrelevant: this file is not used at all by ansible-runner-service when the service is utilized by RHV Manager.

I've just performed a simple test, and adding a new host using the engine public key works as expected:

1. Go to the host and create the SSH authorized_keys file:

   ssh root@<YOUR HOST>
   mkdir ~/.ssh
   chmod 700 ~/.ssh
   touch ~/.ssh/authorized_keys
   chmod 600 ~/.ssh/authorized_keys
   curl --insecure 'https://<RHV MANAGER FQDN>/ovirt-engine/services/pki-resource?resource=engine-certificate&format=OPENSSH-PUBKEY' -o ~/.ssh/authorized_keys

2. Verify public key access to the host on RHV Manager:

   ssh -i /etc/pki/ovirt-engine/keys/engine_id_rsa root@<YOUR HOST>

3. Add a host using the public key in webadmin:
   a. Go to Compute/Hosts and click on New Host
   b. Fill in Name and Hostname with correct values
   c. Select SSH Public Key in the Authentication part
   d. Fill in other values if needed
   e. Click OK

The above steps are enough to successfully add a host using the engine SSH public key. I've also checked, and /usr/share/ovirt-engine/ansible-runner-service-project/env/ssh_key.pub doesn't exist on my setup.

So could you please remove that file (and any other manual modifications that have been made) and try the above steps on the setup?
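Step 1 above can also be sketched in Python. This is only an illustration, not part of RHV or ansible-runner-service: the function names `engine_pubkey_url` and `install_authorized_key` are hypothetical, the URL path is the one given in the curl command, and the key is assumed to have been downloaded already. One deliberate difference from `curl -o` is that the sketch appends the key instead of overwriting authorized_keys.

```python
import os
from urllib.parse import urlencode


def engine_pubkey_url(manager_fqdn: str) -> str:
    """Build the pki-resource URL used by the curl command in step 1."""
    query = urlencode({"resource": "engine-certificate", "format": "OPENSSH-PUBKEY"})
    return f"https://{manager_fqdn}/ovirt-engine/services/pki-resource?{query}"


def install_authorized_key(ssh_dir: str, pubkey: str) -> str:
    """Add pubkey to <ssh_dir>/authorized_keys with modes 700 and 600.

    Existing keys are preserved and a key is never added twice
    (an assumption of this sketch; `curl -o` would overwrite the file).
    """
    os.makedirs(ssh_dir, exist_ok=True)
    os.chmod(ssh_dir, 0o700)
    path = os.path.join(ssh_dir, "authorized_keys")
    key = pubkey.strip()
    existing = []
    if os.path.exists(path):
        with open(path) as fd:
            existing = [line.strip() for line in fd if line.strip()]
    if key not in existing:
        existing.append(key)
    with open(path, "w") as fd:
        fd.write("\n".join(existing) + "\n")
    os.chmod(path, 0o600)
    return path
```

On the host this would be called with the expanded `~/.ssh` directory and the key text fetched from `engine_pubkey_url("<RHV MANAGER FQDN>")`. Step 2 (the `ssh -i` check from RHV Manager) is still the real test that the key works.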
The customer was using the previous version of RHV-H rather than the 4.4.5 version released last week. They're currently wiping the hosts and reinstalling from the newest ISO. I'll report back if the issue persists.
Tested with rhvh-4.4.5.4-0.20210330.0.

I used the two latest rhvm-appliance builds to deploy the hosted engine (HE). After HE setup, I checked the rhvm version:

a) rhvm-appliance-4.4-20210310.0.el8ev.x86_64
[root@rhevh-hostedengine-vm-05 ~]# rpm -qa|grep rhvm
rhvm-branding-rhv-4.4.7-1.el8ev.noarch
rhvm-dependencies-4.4.1-1.el8ev.noarch
rhvm-4.4.4.7-0.2.el8ev.noarch
rhvm-setup-plugins-4.4.2-1.el8ev.noarch

b) rhvm-appliance-4.4-20210402.1.el8ev.x86_64
[root@rhevh-hostedengine-vm-05 ~]# rpm -qa|grep rhvm
rhvm-dependencies-4.4.1-1.el8ev.noarch
rhvm-4.4.6.1-0.11.el8ev.noarch
rhvm-setup-plugins-4.4.2-1.el8ev.noarch
rhvm-branding-rhv-4.4.7-1.el8ev.noarch

First, I cannot get rhvm-4.4.5 from the rhvm-appliance builds that we usually use for testing.
Second, I used rhvm-appliance-4.4-20210402.1.el8ev.x86_64 to retest this issue, using two methods (password authentication and public key authentication) to add a second host to the HE environment. Both methods succeeded.

My guess is that the problem is related to your rhvm-4.4.5.11-0.1.
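When comparing builds like this, the rhvm version can be pulled out of `rpm -qa` output with a small helper. The sketch below is hypothetical (the names `package_version` and `version_tuple` are not from any RHV tool) and it compares dotted versions numerically, which is a simplification of rpm's own version comparison.

```python
import re


def package_version(rpm_qa_output, name):
    """Return 'version-release' for the package called exactly `name`, or None.

    The name must be followed by a digit, so 'rhvm' does not match
    'rhvm-branding-rhv' or 'rhvm-dependencies'.
    """
    pattern = re.compile(rf"^{re.escape(name)}-(\d[^-]*-[^-]+)\.[^.]+$")
    for line in rpm_qa_output.splitlines():
        match = pattern.match(line.strip())
        if match:
            return match.group(1)
    return None


def version_tuple(version_release):
    """Turn '4.4.6.1-0.11.el8ev' into (4, 4, 6, 1) for a naive comparison."""
    version = version_release.split("-", 1)[0]
    return tuple(int(part) for part in version.split("."))


# Output copied from the second appliance build above.
sample = """\
rhvm-dependencies-4.4.1-1.el8ev.noarch
rhvm-4.4.6.1-0.11.el8ev.noarch
rhvm-setup-plugins-4.4.2-1.el8ev.noarch
rhvm-branding-rhv-4.4.7-1.el8ev.noarch
"""
found = package_version(sample, "rhvm")
print(found, version_tuple(found) > (4, 4, 5, 11))
```

With the sample above, the helper picks the `rhvm-4.4.6.1-0.11.el8ev.noarch` line and reports that it is newer than the customer's 4.4.5.11.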
QE doesn't have the customer's environment to reproduce this issue, could you please help to verify it? Thanks!
(In reply to Wei Wang from comment #26)
> QE doesn't have the customer's environment to reproduce this issue, could
> you please help to verify it?
>
> Thanks!

Verifying the change we implemented is easy: there should be one fewer SSH connection observed when adding a new host. Please take a look at comment 14:

1. There should be a new SSH connection observed on the host that correlates with ssh-copy-id (message "Executing ssh-copy-id command on host" in engine.log).
2. There should be a new SSH connection observed on the host that relates to the execution of the Ansible playbook performing the host deploy process.

Without the fix there is one more new SSH connection from the engine to the host. With the fix there shouldn't be any connection other than the two described above.
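One way to count those connections is to look for "Accepted ..." entries in the host's sshd log. The following is a minimal sketch under assumptions: the function name `count_accepted` is hypothetical, the sample log lines and addresses are invented for illustration (they are not from this bug), and the regular expression assumes the usual OpenSSH message "Accepted <method> for <user> from <ip> port <n>".

```python
import re
from collections import Counter

# Matches OpenSSH's standard "Accepted <method> for <user> from <ip> port <n>" message.
ACCEPTED = re.compile(r"sshd\[\d+\]: Accepted (\S+) for (\S+) from (\S+) port \d+")


def count_accepted(log_lines, engine_ip):
    """Count accepted SSH logins coming from engine_ip, grouped by auth method."""
    counts = Counter()
    for line in log_lines:
        match = ACCEPTED.search(line)
        if match and match.group(3) == engine_ip:
            counts[match.group(1)] += 1
    return counts


# Invented sample lines; 192.0.2.10 stands in for the engine address.
sample_lines = [
    "Jun 28 11:39:10 hostb sshd[2001]: Accepted publickey for root from 192.0.2.10 port 40001 ssh2: RSA SHA256:xxxx",
    "Jun 28 11:39:12 hostb sshd[2002]: Accepted publickey for root from 192.0.2.10 port 40002 ssh2: RSA SHA256:xxxx",
    "Jun 28 11:39:15 hostb sshd[2003]: Accepted publickey for root from 192.0.2.99 port 40003 ssh2: RSA SHA256:yyyy",
]
print(sum(count_accepted(sample_lines, "192.0.2.10").values()))
```

If the total for the engine address during a host add is higher than the two connections listed above, the extra connection described for the unfixed build is probably still present.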
The latest rhvm-appliance-4.4-20210527.0.el8ev.x86_64 includes ovirt-engine-4.3.11.3-0.1.el7.noarch. QE will verify this bug once a build that includes ovirt-engine-4.4.7.2 is available.
Test Version:
RHVH-4.4-20210624.0-RHVH-x86_64-dvd1.iso
rhvm-appliance-4.4-20210625.0.el8ev.x86_64
ovirt-engine-4.4.7.5-0.9.el8ev.noarch

Test Steps:
1. Clean install RHVH on hosts A and B
2. Deploy hosted engine on host A
3. Add additional host B to the hosted engine environment with passwordless ssh authentication, according to comment 8 and comment 5
4. Check the SSH connection in engine.log

Test Result:
2021-06-28 11:39:12,302+08 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-1) [7ca1a78a] EVENT_ID: VDS_ANSIBLE_INSTALL_STARTED(560), Ansible host-deploy playbook execution has started on host hp-dl388g9-04.lab.eng.pek2.redhat.com.

The error "runner_service.services.hosts - ERROR - SSH - NOAUTH:SSH auth error - passwordless ssh not configured" does not appear in ansible-runner-service.log, and there is no error in the sshd log.

The bug is fixed; moving it to "VERIFIED".
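The check for the NOAUTH error in ansible-runner-service.log can be scripted. This is a small sketch, not an official tool: the function name `noauth_hosts` is hypothetical, and the pattern is taken from the log message quoted earlier in this bug.

```python
import re

# Pattern copied from the error message reported in this bug.
NOAUTH = re.compile(
    r"runner_service\.services\.hosts - ERROR - SSH - NOAUTH:SSH auth error - "
    r"passwordless ssh not configured for '([^']+)'"
)


def noauth_hosts(log_text):
    """Return the sorted, de-duplicated hosts that hit the NOAUTH error."""
    return sorted(set(NOAUTH.findall(log_text)))


# Log line from comment 4 of this bug, followed by an unrelated line.
sample_log = (
    "2021-05-14 15:40:51,766 - runner_service.services.hosts - ERROR - SSH - "
    "NOAUTH:SSH auth error - passwordless ssh not configured for '10.104.136.149'\n"
    "2021-05-14 15:40:51,767 - flask.app - ERROR - Exception on "
    "/api/v1/hosts/10.104.136.149/groups/ovirt [POST]\n"
)
print(noauth_hosts(sample_log))
```

An empty list for the log contents covering the host add corresponds to the "no error" result reported above.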
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: RHV Manager (ovirt-engine) security update [ovirt-4.4.7]), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2865