Bug 1480507

Summary: tests: Pre-requisite setup to run geo-rep test case on regression machines.
Product: [Community] GlusterFS
Reporter: Kotresh HR <khiremat>
Component: project-infrastructure
Assignee: bugs <bugs>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: high
Docs Contact:
Priority: high
Version: mainline
CC: avishwan, bugs, gluster-infra, mscherer, nigelb
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-01-19 00:28:14 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Kotresh HR 2017-08-11 09:39:55 UTC
Description of problem:

Geo-replication test cases were disabled in master. I have sent a patch [1]
to re-enable them, but it will fail because a few prerequisite steps are required.

Pre-requisite:
1. Set up passwordless SSH for root on all regression test machines. Please add it to the script that spawns new regression machines, so that it does not have to be redone each time failures show up.

[1]: https://review.gluster.org/#/c/18024/1

I remember we made a few other changes because the install path for the gluster binaries is different on regression machines. But let's give it a run once passwordless SSH is in place. We can get through this step by step.

Let me know if you have any questions.

Version-Release number of selected component (if applicable):
mainline

Comment 1 M. Scherer 2017-08-11 12:00:20 UTC
We already have authentication set up:

https://github.com/gluster/gluster.org_ansible_configuration/blob/master/roles/jenkins_builder/tasks/authroot_georep.yml

Can you detail what kind of failure you are seeing now?

Comment 2 Kotresh HR 2017-08-11 12:43:26 UTC
(In reply to M. Scherer from comment #1)
> We already do have authentication setup:
> 
> https://github.com/gluster/gluster.org_ansible_configuration/blob/master/
> roles/jenkins_builder/tasks/authroot_georep.yml
> 
> Can you detail what kind of failure you are seeing now ?

Cool, I will trigger a run and try it out. I thought that with the new regression machines it was no longer there.

Thanks,
Kotresh HR

Comment 3 Kotresh HR 2017-08-12 08:30:16 UTC
(In reply to Kotresh HR from comment #2)
> (In reply to M. Scherer from comment #1)
> > We already do have authentication setup:
> > 
> > https://github.com/gluster/gluster.org_ansible_configuration/blob/master/
> > roles/jenkins_builder/tasks/authroot_georep.yml
> > 
> > Can you detail what kind of failure you are seeing now ?
> 
> Cool, I will trigger run and try it out. I thought, with new regression
> machines, it is no longer there.
> 
> Thanks,
> Kotresh HR

But the regression run [1] is still failing because passwordless SSH is not set up for root, as shown below.

13:10:11 [13:10:11] Running tests in file ./tests/geo-rep/georep-basic-dr-rsync.t
13:10:24 Passwordless ssh login has not been setup with slave32.cloud.gluster.org for user root.
13:11:25 Geo-replication session between master and slave32.cloud.gluster.org::slave does not exist.
13:11:26 Geo-replication session between master and slave32.cloud.gluster.org::slave does not exist.

[1] https://build.gluster.org/job/centos6-regression/5975/consoleFull

Maybe it's not copied to authorized_keys properly?

Comment 4 Aravinda VK 2017-08-16 06:18:16 UTC
Kotresh,

I suggest skipping the gsec_create and push-pem steps in the geo-rep tests.

- Run the gsec_create command as part of machine setup and store all the files outside the build setup (maybe in /root/data/georep_keys/).
- Add the common_secret.pem.pub content to the same node's authorized_keys file.

Every geo-rep test would then copy the secret.*.pem files to $BUILD/var/lib/glusterd/geo-replication/ and create the session without `push-pem`.
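
As a rough illustration of the copy step in this suggestion, a test could stage pre-generated key files along the lines of the sketch below. The function name `stage_georep_keys` and its two arguments are hypothetical, the `*.pem` / `*.pem.pub` globs are an assumption about what the key store contains, and the sketch neither runs gsec_create nor creates the session.

```shell
#!/bin/sh
# Hypothetical helper (not part of the GlusterFS test suite): copy
# pre-generated geo-rep key files from a key store into a build tree.
# Usage: stage_georep_keys <key-store-dir> <build-prefix>
stage_georep_keys() {
    src=$1
    dst=$2/var/lib/glusterd/geo-replication
    mkdir -p "$dst" || return 1
    copied=0
    # Assumption: the key store holds *.pem private keys and *.pem.pub files.
    for f in "$src"/*.pem "$src"/*.pem.pub; do
        [ -f "$f" ] || continue        # skip glob patterns that matched nothing
        cp -p "$f" "$dst"/ || return 1 # -p keeps the restrictive key permissions
        copied=$((copied + 1))
    done
    # Return failure if nothing was copied, so the caller does not go on
    # to create a session without keys.
    [ "$copied" -gt 0 ]
}
```

A test might call it as `stage_georep_keys /root/data/georep_keys "$BUILD"`, using the directory floated in the comment above.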

Comment 5 Kotresh HR 2017-08-16 09:42:27 UTC
(In reply to Aravinda VK from comment #4)
> Kotresh,
> 
> I suggest skip gsec_create and push-pem steps from the geo-rep tests.
> 
> - Run gsec_create command as part of setup and store all the files outside
> the build setup(May be in /root/data/georep_keys/)
> - Add the common_secret.pem.pub content to same node authorized_keys file.
> 
> Every Geo-rep test will copy secret.*.pem files to
> $BUILD/var/lib/glusterd/geo-replication/ and create session without
> `push-pem`.

Yeah, seems to be better approach. 

@mscherer, can we get that done ?

Comment 6 M. Scherer 2017-08-16 11:59:32 UTC
No, I would rather figure out what went wrong and why it was working before.
And if it never worked, why that wasn't detected sooner.

So far, I have found a quoting issue (and pushed a fix), but it had likely been present for more than a year.

Were the tests disabled that whole time?

Also, the key to connect with is /root/.ssh/id_georep; you can use it for the test. I made sure it is properly limited to localhost, to avoid various security issues.

The reason it is done this way is that the tests were not cleaning up the old root key when they broke (or when they blocked the system), which in turn caused trouble connecting as root after a while, IIRC (because there is a limit, if only a computational one, on the number of keys you can place in authorized_keys).

Last time we managed to avoid breakage because we were using Salt, but now that we use Ansible, any sshd breakage would be much more annoying to fix.

Comment 7 Nigel Babu 2017-08-17 03:58:02 UTC
(In reply to M. Scherer from comment #6)
> No, I rather try to figure what was wrong, and why it was working before.
> And if that was never working, why it wasn't detected sooner.

Agreed. Kotresh and Aravinda, please do not fix this from the tests, but let us fix this from the infra end.

Comment 8 Kotresh HR 2017-08-17 13:56:00 UTC
Hi M. Scherer/Nigel,

I see that it creates a separate ssh key pair for geo-rep (/root/.ssh/id_georep). This will not work, as geo-rep does not invoke ssh with the -i option.
It requires passwordless login for root with the default ssh key pair (/root/.ssh/id_rsa).

Testcase:
ssh root@<local-hostname>

(without the -i flag, this should log in without a password on all regression machines)
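
The test case above could be made non-interactive along the lines of the sketch below, so that it fails instead of waiting at a password prompt. The function name `check_passwordless_ssh` and the `SSH_BIN` override are hypothetical; `BatchMode` and `ConnectTimeout` are standard OpenSSH client options.

```shell
#!/bin/sh
# Hypothetical check: succeed only if root can log in to the given host
# without being prompted. SSH_BIN is an illustrative override that lets the
# check be exercised with a stub instead of the real ssh client.
check_passwordless_ssh() {
    host=$1
    # BatchMode=yes makes ssh fail rather than ask for a password or
    # passphrase. No -i option is passed, so only the default identities
    # (for example /root/.ssh/id_rsa) are tried.
    "${SSH_BIN:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 "root@$host" true
}
```

On a regression machine this would be run as `check_passwordless_ssh "$(hostname)"`; a non-zero exit status means the login did not succeed without a prompt.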

I think this was previously set up on only a few machines: it worked on those and failed on the ones where it was not set up.

Thanks,
Kotresh HR

Comment 9 Kotresh HR 2017-11-14 07:23:37 UTC
Hi M. Scherer/Nigel,

Have we made any progress on this? It's been a really long time.

Comment 10 Nigel Babu 2017-11-14 08:51:50 UTC
So you need to be able to ssh to root@slave21.cloud.gluster.org from inside slave21.cloud.gluster.org without a password and have it work?

Misc, is this okay as far as we're concerned? Doing it isn't actually a big deal. Probably a tweak in our scripts.

Comment 11 M. Scherer 2017-11-14 09:55:23 UTC
IIRC, this was fixed; there is a symlink from id_rsa to the right key.

Comment 12 Kotresh HR 2017-11-14 10:30:17 UTC
You mean to say, the test case mentioned in comment 8 or comment 11 works in all the regression machines?

Comment 13 Kotresh HR 2017-11-17 10:10:43 UTC
Sorry, the latest run also failed for the same reason. Please fix it soon.

https://build.gluster.org/job/centos6-regression/7456/consoleFull

Comment 14 Kotresh HR 2017-11-23 07:17:17 UTC
Why is this taking so long to fix?

Comment 15 Nigel Babu 2017-11-23 07:32:02 UTC
Michael has been taking some time off due to PTOs lapsing. I'll make sure we look into solving this today.

Comment 16 M. Scherer 2017-11-23 08:31:16 UTC
Ok so this was weird.

We have the ssh key in place, in /root/.ssh/id_georep, and it works:
[root@slave23 .ssh]# ssh -i id_georep   root@127.0.0.1 id
uid=0(root) gid=0(root) groups=0(root) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

And we have a symlink to the key, so we should use it right away:
[root@slave23 .ssh]# ls -l id_rsa
lrwxrwxrwx. 1 root root 20 Nov  3 17:56 id_rsa -> /root/.ssh/id_georep

But it didn't work.

It turns out that ssh verifies the .pub file and fails to load the key if it doesn't match. I am going to fix this cluster-wide.
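
A mismatch of the kind described here can be spotted by comparing the public key derived from the private key with the contents of the .pub file next to it. The sketch below is only an illustration: the helper name `same_pubkey` is hypothetical, and it compares just the key type and base64 fields while ignoring the trailing comment.

```shell
#!/bin/sh
# Hypothetical helper: report whether two OpenSSH public key lines describe
# the same key. Only the first two fields (key type and base64 blob) are
# compared; the optional trailing comment is ignored.
same_pubkey() {
    a=$(printf '%s\n' "$1" | awk 'NF >= 2 {print $1, $2}')
    b=$(printf '%s\n' "$2" | awk 'NF >= 2 {print $1, $2}')
    # An empty or malformed line never counts as a match.
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For the symlinked key it could be used as `same_pubkey "$(ssh-keygen -y -f /root/.ssh/id_rsa)" "$(cat /root/.ssh/id_rsa.pub)"`, where `ssh-keygen -y` prints the public key that belongs to the given private key file.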

Comment 17 Kotresh HR 2017-11-24 06:42:44 UTC
But note that geo-rep does not use a specific ssh key.

The expectation is that "ssh root@<hostname>" works, not just "ssh -i <ssh-key> root@<hostname>".

Comment 18 M. Scherer 2017-11-24 09:45:14 UTC
It does work:

[root@slave25 ~]# ssh root@127.0.0.1 id
uid=0(root) gid=0(root) groups=0(root) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

Comment 19 Kotresh HR 2017-11-27 06:21:18 UTC
Then why is the following run on slave27.cloud.gluster.org still failing the passwordless SSH check?

https://build.gluster.org/job/centos6-regression/7672/console

---
06:16:25 [06:16:30] Running tests in file ./tests/geo-rep/georep-basic-dr-rsync.t
06:16:35 Passwordless ssh login has not been setup with slave27.cloud.gluster.org for user root.
06:17:36 Geo-replication session between master and slave27.cloud.gluster.org::slave does not exist.
06:17:36 Geo-replication session between master and slave27.cloud.gluster.org::slave does not exist.
06:19:37 stat: cannot stat `/mnt/glusterfs/1/hybrid_f1': No such file or directory
06:19:38 stat: cannot stat `/mnt/glusterfs/1/hybrid_f1': No such file or directory
---

Comment 20 Nigel Babu 2017-11-27 06:38:21 UTC
(In reply to M. Scherer from comment #18)
> It do work:
> 
> [root@slave25 ~]# ssh root@127.0.0.1 id
> uid=0(root) gid=0(root) groups=0(root)
> context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

[root@slave27 ~]# ssh root@slave27.cloud.gluster.org id
root@slave27.cloud.gluster.org's password: 

It seems to only work for 127.0.0.1.

Comment 21 Kotresh HR 2017-11-27 06:45:43 UTC
OK. Is it easy to fix? Let us know if you have any concerns. We will change geo-rep so that it can use any given ssh key for geo-rep setup (https://github.com/gluster/glusterfs/issues/362).

Comment 22 Nigel Babu 2017-11-27 06:50:10 UTC
Ah, there's a `from="127.0.0.1"` which should probably be `from="$(hostname)"`. But this needs an ansible fix + misc to look at the security implications. I'll ping him when he's online.
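
In an authorized_keys entry, the `from="pattern-list"` option limits which client addresses or host names may use that key, which fits the behaviour seen above: the login works against 127.0.0.1 but falls back to a password prompt against the machine's own host name. The sketch below shows one way to inspect that restriction; the helper name `authorized_from` is hypothetical and the matching is deliberately simple.

```shell
#!/bin/sh
# Hypothetical helper: print the from="..." pattern list of a single
# authorized_keys line, or print nothing if the line has no from= option.
# The option is only recognised at the start of the line or directly after
# a comma, which is where key options appear.
authorized_from() {
    printf '%s\n' "$1" | sed -n 's/^\(.*,\)*from="\([^"]*\)".*/\2/p'
}
```

For example, `authorized_from 'from="127.0.0.1" ssh-rsa AAAA... root@localhost'` prints `127.0.0.1`, while a line without the option prints nothing.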

Comment 23 Kotresh HR 2017-11-27 07:19:34 UTC
In the meantime, I will re-trigger the run with the slave host changed to "127.0.0.1". If it works, we can live with it for now.

Comment 24 Nigel Babu 2017-11-27 08:02:25 UTC
If that is a possibility for the tests, that would be the best way to handle this. Fewer security concerns for us.

Comment 25 Kotresh HR 2017-12-21 09:24:09 UTC
I have changed the geo-rep scripts to use the exported ssh identity file for geo-rep session creation. With the patch [1], it uses "/root/.ssh/id_georep" to push
ssh keys. I am assuming passwordless SSH is set up on all regression machines with the identity key "/root/.ssh/id_georep".

Once [1] gets merged, we can close this bug.

[1]: https://review.gluster.org/#/c/18024/

Comment 26 Kotresh HR 2017-12-21 09:51:23 UTC
Hi Nigel,

The slave25 regression machine has failed at [1] even with "/root/.ssh/id_georep". Could you confirm that passwordless SSH is configured on slave25 with that identity key?

[1] https://build.gluster.org/job/centos6-regression/8099/console

Thanks,
Kotresh HR

Comment 27 Nigel Babu 2017-12-21 10:13:40 UTC
No, no. SSH will work as root@127.0.0.1 on all machines without needing a special key. You have to use 127.0.0.1 as the hostname though.

Comment 28 Kotresh HR 2017-12-21 10:25:57 UTC
But that is failing for some reason on the regression machines while passing locally.
Could you lend me a regression machine to debug?

Comment 29 Nigel Babu 2017-12-22 02:34:23 UTC
Can you add your SSH key to the bug, please? Ping me on Freenode once you've added it to the bug and I can give you a machine instantly.

Comment 30 Kotresh HR 2018-01-03 06:14:08 UTC
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDQujcXnq9qOCmTWs44GwKOw48BxZGBmo9AeFk3KPRx33Cz5hS7UHPYvJy03KdoMySgxJtuxsOSvjJSPxpXesTtArO5giW8RLyuVsu9q5j/GhPKyABttuGQyexiRokrHQFLVifzgqwsUARwHarWH16Oa1n6fZPFNqSH56c872zS4Pwqgkzx99NKRWh/B+fIk8VmDzzP1qvQAnUXDeTeTOzapAL+8fSNTp3QnhmZbvCHYwUxfSqJyzq1wBL+517WrvzvX7yCY9B0wzm72sIO4daV6UDyAL1B72QpB04vEZZ/skwS/0jmokfBn43HHCMPy/Mxywfi54alFJzIu4pti30V kravishankar@localhost.localdomain

Comment 31 Nigel Babu 2018-01-03 07:42:51 UTC
You should be able to SSH in as jenkins@slave32.cloud.gluster.org

Comment 32 Nigel Babu 2018-01-19 00:28:14 UTC
Machine is back in the pool.