.Cephadm operations may fail when interactive shell aliases are present
In Red Hat Ceph Storage 7, cephadm uses the shell `mv` command on remote hosts. If the cephadm SSH user has interactive aliases such as `mv='mv -i'` (and similar for `rm` or `cp`), these aliases trigger prompts and block cephadm operations. As a result, commands like `ceph orch upgrade`, `cephadm bootstrap`, or adding hosts may hang or fail because `mv` waits for user confirmation instead of running non-interactively.
To work around this issue, remove or disable interactive aliases for `mv`, `cp`, and `rm` for the cephadm SSH user. For example, comment them out in `.bashrc` or define them only for interactive shells, then rerun the cephadm operation.
Description aruffin@redhat.com
2025-04-15 22:24:01 UTC
Description of problem:
Partner ran into trouble upgrading their Ceph managers from Ceph 5 to Ceph 7.
Partner used the following command to upgrade:
ceph orch upgrade start --image <image> --daemon_types mgr --hosts <hostname>
Networking and SSH connectivity between their three mgr servers worked.
Two of the three mgrs completed, but the third mgr stalled and then failed after some time:
# date; ceph orch upgrade status
Tue Mar 18 07:11:10 AM PDT 2025
{
    "target_image": "madrid:5000/rhceph/rhceph-7-rhel9@sha256:da42dd4fe433419e859ab68a4c1cb350568304a0bd3140a86129c35987b26b34",
    "in_progress": true,
    "which": "Upgrading daemons of type(s) mgr on host(s) madrid-aio2",
    "services_complete": [],
    "progress": "0/1 daemons upgraded",
    "message": "Error: UPGRADE_OFFLINE_HOST: Upgrade: Failed to connect to host madrid-aio2 at addr ",
    "is_paused": true
}
After a bit of collaboration with Red Hat, partner realized there were aliases set within their .bashrc that were causing the upgrade to pause indefinitely waiting for interaction.
alias rm='rm -i'
alias cp='cp -i'
alias mv='mv -i'
While they have since rectified this situation, they noted that these aliases have been in place since April 2016 and have not been an issue until upgrading to Ceph 7.
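One way to spot this condition before an upgrade is to scan the SSH user's shell startup file for interactive aliases. The following is a minimal sketch only, not part of cephadm: the function name `find_interactive_aliases` and the regular expression are hypothetical, and it only recognizes simple one-line `alias name=value` definitions for `rm`, `cp`, and `mv`.

```python
import re

# Matches simple definitions such as:  alias mv='mv -i'   or   alias rm="rm -iv"
# Commented-out lines do not match because the line must start with "alias".
ALIAS_RE = re.compile(r"""^\s*alias\s+(rm|cp|mv)=(['"]?)(.*?)\2\s*(?:#.*)?$""")


def find_interactive_aliases(rc_text):
    """Return (line_number, alias_name, alias_value) for each rm/cp/mv alias
    whose value includes -i (alone or in a short-option group) or --interactive."""
    hits = []
    for lineno, line in enumerate(rc_text.splitlines(), start=1):
        match = ALIAS_RE.match(line)
        if match is None:
            continue
        name, _quote, value = match.groups()
        options = value.split()[1:]  # skip the command word itself
        interactive = any(
            opt == "--interactive"
            or (opt.startswith("-") and not opt.startswith("--") and "i" in opt[1:])
            for opt in options
        )
        if interactive:
            hits.append((lineno, name, value))
    return hits


if __name__ == "__main__":
    import os

    # Assumption: the aliases live in ~/.bashrc of the cephadm SSH user.
    rc_path = os.path.expanduser("~/.bashrc")
    if os.path.exists(rc_path):
        with open(rc_path, encoding="utf-8", errors="replace") as rc_file:
            for lineno, name, value in find_interactive_aliases(rc_file.read()):
                print(f"{rc_path}:{lineno}: alias {name}='{value}' may prompt")
```

Running this as the cephadm SSH user on each host lists the alias lines to review. It does not follow files sourced from `.bashrc`, so a clean result is not proof that no such alias is defined.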
Further investigation shows that the difference between Ceph 5 and 7 is that Ceph 7 runs the "mv" command through the remote shell to update the ceph.conf file.
From the partner:
"I found ceph 7 uses the mv command in the write_files code path to update the ceph.conf file as seen here on line 276 in /usr/share/ceph/mgr/cephadm/ssh.py:
245 async def _write_remote_file(self,
:
276 await self._check_execute_command(host, ['mv', tmp_path, path], addr=addr)
That ceph 7 mv command was picking up our 'mv -i' alias, causing it to wait at this prompt: mv: overwrite '/etc/ceph/ceph.conf'?
p.s. I believe we did not see the 'mv -i' issue on ceph 5 because ceph 5 /usr/share/ceph/mgr/cephadm/remotes.py uses python os.rename() instead of mv command."
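The difference the partner describes can be illustrated with a short sketch. This is not the cephadm code from either release; it is a hypothetical helper, `write_file_atomically`, showing why a rename done through Python's `os` module cannot be affected by shell aliases: no shell is started, so nothing can prompt. It uses `os.replace()` rather than the `os.rename()` mentioned above; on POSIX both overwrite an existing destination without asking.

```python
import os
import tempfile


def write_file_atomically(path, content):
    """Write bytes to a temporary file next to `path`, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the rename stays
    # on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            tmp_file.write(content)
            tmp_file.flush()
            os.fsync(tmp_file.fileno())
        # os.replace() asks the operating system to rename the file directly.
        # No shell is involved, so an alias such as mv='mv -i' is never
        # consulted and no overwrite prompt can appear.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

By contrast, a command such as `mv` that is sent as a string over SSH is interpreted by the remote user's shell, which is where the alias from `.bashrc` can take effect.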
As it is fairly common for customers to set aliases in their shells, this code change could cause other unforeseen issues that would be difficult to track down for those that upgrade.
Is it possible to account for this by unaliasing the shell environment ceph uses (unalias -a)? Or could we add a warning to the release notes on aliases?
How reproducible:
very
Steps to Reproduce:
1. set up the following aliases in .bashrc:
alias rm='rm -i'
alias cp='cp -i'
alias mv='mv -i'
2. cephadm bootstrap
3. Add new host from "cephadm shell"
Actual results:
/usr/bin/ssh-copy-id: ERROR: failed to create required temporary directory under ~/.ssh
Expected results:
ceph shell allows access
Additional info: