Remote: Automatically manage known hosts keys#451
Conversation
|
Warning Rate limit exceeded@jan-janssen has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 12 minutes and 45 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (1)
WalkthroughMake known_hosts handling optional in RemoteQueueAdapter: _ssh_known_hosts is set to an absolute path if provided in config or an empty string "" otherwise. _open_ssh_connection loads host keys only when _ssh_known_hosts is non-empty; otherwise it sets AutoAddPolicy before connecting. Changes
Sequence Diagram(s)sequenceDiagram
participant Caller as Test / User
participant RQA as RemoteQueueAdapter
participant SSH as Paramiko SSHClient
Caller->>RQA: __init__(config)
alt config contains known_hosts
RQA->>RQA: _ssh_known_hosts = abs(path)
else
RQA->>RQA: _ssh_known_hosts = ""
end
Caller->>RQA: _open_ssh_connection()
alt _ssh_known_hosts != ""
RQA->>SSH: load_host_keys(_ssh_known_hosts)
else
RQA->>SSH: set_missing_host_key_policy(AutoAddPolicy)
end
RQA->>SSH: connect(...)
SSH-->>RQA: connection/session
RQA-->>Caller: client/session
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~15 minutes Possibly related PRs
Poem
✨ Finishing Touches🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #451 +/- ##
==========================================
+ Coverage 92.36% 92.39% +0.03%
==========================================
Files 17 17
Lines 1008 1012 +4
==========================================
+ Hits 931 935 +4
Misses 77 77 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 2
🔭 Outside diff range comments (1)
pysqa/base/remote.py (1)
448-451: Inconsistent and insecure host-key policy for proxy connectionFor the proxy (bastion) connection, AutoAddPolicy is hardcoded, bypassing the known_hosts behavior and opt-in policy you established earlier. This is a security hole and inconsistent with the rest of the connection setup.
Align the proxy client policy with the main client:
- client_new = paramiko.SSHClient() - client_new.set_missing_host_key_policy(paramiko.AutoAddPolicy()) + client_new = paramiko.SSHClient() + client_new.load_system_host_keys() + if self._ssh_known_hosts is not None: + try: + client_new.load_host_keys(self._ssh_known_hosts) + except FileNotFoundError: + warnings.warn( + f"known_hosts file not found at '{self._ssh_known_hosts}'. Using system host keys only.", + stacklevel=2, + ) + if getattr(self, "_ssh_auto_add_unknown_hosts", False): + client_new.set_missing_host_key_policy(paramiko.AutoAddPolicy()) + else: + client_new.set_missing_host_key_policy(paramiko.RejectPolicy())
🧹 Nitpick comments (2)
pysqa/base/remote.py (2)
28-35: Update attributes docstring to reflect Optional known_hosts and new policy flagdoc drift: _ssh_known_hosts can now be None; document it. Also document the new ssh_auto_add_unknown_hosts flag if you adopt it.
Apply this doc update:
- _ssh_known_hosts (str): The path to the known hosts file. + _ssh_known_hosts (Optional[str]): Path to the known hosts file. None uses system host keys. + _ssh_auto_add_unknown_hosts (bool): If True, auto-accept and add unknown host keys (unsafe by default).
351-469: Optional: Persist auto-added host keys to disk when a known_hosts path is configuredRight now, AutoAddPolicy only amends the in-memory host keys for the client session. If you intend to "automatically manage known hosts keys," consider persisting newly seen keys to the configured known_hosts file so future sessions verify them.
You can add a small helper and call it after each successful ssh.connect(...):
def _persist_server_key(self, ssh: paramiko.SSHClient) -> None: if not getattr(self, "_ssh_auto_add_unknown_hosts", False): return if not self._ssh_known_hosts: return transport = get_transport(ssh) server_key = transport.get_remote_server_key() hostkeys = paramiko.HostKeys() if os.path.exists(self._ssh_known_hosts): hostkeys.load(self._ssh_known_hosts) hostkeys.add(self._ssh_host, server_key.get_name(), server_key) os.makedirs(os.path.dirname(self._ssh_known_hosts), exist_ok=True) hostkeys.save(self._ssh_known_hosts)Then, right after each ssh.connect(...) call in _open_ssh_connection, invoke:
self._persist_server_key(ssh)And similarly after client_new.connect(...) for the proxy.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
pysqa/base/remote.py(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: unittest_matrix (windows-latest, 3.13)
- GitHub Check: unittest_slurm
pysqa/base/remote.py
Outdated
| if "known_hosts" in config: | ||
| self._ssh_known_hosts = os.path.abspath( | ||
| os.path.expanduser(config["known_hosts"]) | ||
| ) | ||
| else: | ||
| self._ssh_known_hosts = None |
There was a problem hiding this comment.
Harden known_hosts config handling; avoid empty-string bug and add explicit opt-in for auto-accepting unknown hosts
- Using "key in config" treats an empty string as a valid path and resolves it to the current working directory, causing paramiko to try to read a non-existent file.
- Defaulting to AutoAddPolicy downstream without explicit opt-in is a security regression; prefer system host keys by default and make auto-accept a config flag.
Proposed fix:
- Treat empty or missing known_hosts as None.
- Validate file existence and warn if missing.
- Introduce ssh_auto_add_unknown_hosts (default False).
Apply this diff:
- if "known_hosts" in config:
- self._ssh_known_hosts = os.path.abspath(
- os.path.expanduser(config["known_hosts"])
- )
- else:
- self._ssh_known_hosts = None
+ # Known hosts: allow None, but prefer system host keys when unspecified.
+ kh = config.get("known_hosts")
+ if kh:
+ path = os.path.abspath(os.path.expanduser(kh))
+ if os.path.exists(path):
+ self._ssh_known_hosts = path
+ else:
+ warnings.warn(
+ f"known_hosts file not found at '{path}'. Falling back to system host keys.",
+ stacklevel=2,
+ )
+ self._ssh_known_hosts = None
+ else:
+ self._ssh_known_hosts = None
+ # Opt-in flag to auto-accept unknown host keys (unsafe by default).
+ self._ssh_auto_add_unknown_hosts = bool(
+ config.get("ssh_auto_add_unknown_hosts", False)
+ )📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if "known_hosts" in config: | |
| self._ssh_known_hosts = os.path.abspath( | |
| os.path.expanduser(config["known_hosts"]) | |
| ) | |
| else: | |
| self._ssh_known_hosts = None | |
| # Known hosts: allow None, but prefer system host keys when unspecified. | |
| kh = config.get("known_hosts") | |
| if kh: | |
| path = os.path.abspath(os.path.expanduser(kh)) | |
| if os.path.exists(path): | |
| self._ssh_known_hosts = path | |
| else: | |
| warnings.warn( | |
| f"known_hosts file not found at '{path}'. Falling back to system host keys.", | |
| stacklevel=2, | |
| ) | |
| self._ssh_known_hosts = None | |
| else: | |
| self._ssh_known_hosts = None | |
| # Opt-in flag to auto-accept unknown host keys (unsafe by default). | |
| self._ssh_auto_add_unknown_hosts = bool( | |
| config.get("ssh_auto_add_unknown_hosts", False) | |
| ) |
🤖 Prompt for AI Agents
In pysqa/base/remote.py around lines 96 to 101, the current known_hosts handling
treats an empty string as a valid path and implicitly enables AutoAddPolicy
downstream; change it to treat missing or empty known_hosts as None, add a new
boolean config key ssh_auto_add_unknown_hosts defaulting to False, and only
enable auto-accept when that flag is true. Specifically: read
config.get("known_hosts") and if value is falsy (None or empty string) set
self._ssh_known_hosts = None; if a non-empty path is provided expand and abspath
it and then check os.path.exists(path) — if it does not exist set
self._ssh_known_hosts = None and emit a warning via the module logger; add
self._ssh_auto_add_unknown_hosts = bool(config.get("ssh_auto_add_unknown_hosts",
False)) so callers can decide whether to set AutoAddPolicy; ensure default
behavior prefers loading system host keys when known_hosts is None and only use
AutoAddPolicy when the new flag is true.
pysqa/base/remote.py
Outdated
| if self._ssh_known_hosts is not None: | ||
| ssh.load_host_keys(self._ssh_known_hosts) | ||
| else: | ||
| ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy()) |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Avoid silently trusting unknown hosts; load system host keys and gate AutoAddPolicy behind explicit opt-in
When _ssh_known_hosts is None, switching to AutoAddPolicy unconditionally trusts unknown hosts, enabling MITM risk. Prefer:
- Always load system host keys.
- Optionally load a user-provided known_hosts file.
- Use RejectPolicy by default; only AutoAdd when explicitly configured.
Apply this diff:
- if self._ssh_known_hosts is not None:
- ssh.load_host_keys(self._ssh_known_hosts)
- else:
- ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
+ # Load system-level keys first; optionally supplement with a custom file.
+ ssh.load_system_host_keys()
+ if self._ssh_known_hosts is not None:
+ try:
+ ssh.load_host_keys(self._ssh_known_hosts)
+ except FileNotFoundError:
+ warnings.warn(
+ f"known_hosts file not found at '{self._ssh_known_hosts}'. Using system host keys only.",
+ stacklevel=2,
+ )
+ # Decide how to handle unknown hosts
+ if getattr(self, "_ssh_auto_add_unknown_hosts", False):
+ ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
+ else:
+ ssh.set_missing_host_key_policy(paramiko.RejectPolicy())Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In pysqa/base/remote.py around lines 352-355, the code currently sets
AutoAddPolicy when self._ssh_known_hosts is None which silently trusts unknown
hosts; change it to always call ssh.load_system_host_keys(), then if
self._ssh_known_hosts is set load that file too, and set the missing host key
policy to paramiko.RejectPolicy() by default; introduce or use an explicit
boolean config flag (e.g. self._allow_auto_add_hosts defaulting to False) so
AutoAddPolicy() is only applied when that flag is True (otherwise use
RejectPolicy()).
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
tests/test_remote_auth.py (1)
22-22: Switching to host-based config looks correct; consider adding coverage for the "no known_hosts" path.The updated QueueAdapter(directory=.../config/remote_rebex_hosts) is consistent with making known_hosts optional. To fully exercise the new behavior, add a test that sets _ssh_known_hosts to "" and asserts AutoAddPolicy is applied (without making a real network call).
Add this test method to TestRemoteQueueAdapterAuth:
def test_known_hosts_optional_auto_add_policy(self): # Ensures that when _ssh_known_hosts is empty, AutoAddPolicy is set from unittest.mock import patch class FakeSSHClient: def __init__(self, *args, **kwargs): self.auto_add_policy_set = False self.host_keys_loaded = False def load_host_keys(self, *args, **kwargs): self.host_keys_loaded = True def set_missing_host_key_policy(self, policy): # We don't depend on instance type here; just record that it was set self.auto_add_policy_set = True def connect(self, *args, **kwargs): # No-op to avoid real network calls pass path = os.path.dirname(os.path.abspath(__file__)) remote = QueueAdapter(directory=os.path.join(path, "config/remote_rebex_hosts")) # Force the "no known_hosts" branch and a password-based auth path remote._adapter._ssh_known_hosts = "" remote._adapter._ssh_key = None remote._adapter._ssh_ask_for_password = False remote._adapter._ssh_password = remote._adapter._ssh_password or "dummy" with patch("pysqa.base.remote.paramiko.SSHClient", FakeSSHClient): ssh = remote._adapter._open_ssh_connection() # Validate behavior self.assertTrue(ssh.auto_add_policy_set) self.assertFalse(ssh.host_keys_loaded)Also applies to: 29-29, 36-36
tests/test_remote.py (2)
162-170: Nit: fix typo in test name and ensure the persistent connection is closed.
- Typo: “continous” → “continuous” in the test name.
- Since you explicitly open a persistent connection, close it in a finally block to avoid leaking connections in CI.
Apply this diff:
-def test_remote_command_continous_connection_hosts(self): +def test_remote_command_continuous_connection_hosts(self): path = os.path.dirname(os.path.abspath(__file__)) remote = QueueAdapter(directory=os.path.join(path, "config/remote_rebex_hosts")) remote._adapter._ssh_remote_path = path remote._adapter._ssh_continous_connection = True - remote._adapter._open_ssh_connection() - output = remote._adapter._execute_remote_command(command="pwd") - self.assertEqual(output, "/\n") + ssh = remote._adapter._open_ssh_connection() + try: + output = remote._adapter._execute_remote_command(command="pwd") + self.assertEqual(output, "/\n") + finally: + if ssh is not None: + ssh.close() + remote._adapter._ssh_connection = NoneNote: The attribute name _ssh_continous_connection is misspelled in production code; keep it as-is here to avoid breaking behavior. If you plan to fix the spelling in code later, deprecate with a compatibility shim.
173-173: LGTM on switching remaining tests to host-based config; consider DRYing repeated path construction.All these tests now initialize via config/remote_rebex_hosts, which aligns with the new known_hosts handling. To reduce duplication and improve maintainability, compute the hosts config path once in setUpClass or setUp of TestRemoteQueueAdapterRebex and reuse it.
Example refactor (outside this hunk):
@classmethod def setUpClass(cls): cls.path = os.path.dirname(os.path.abspath(__file__)) cls.hosts_dir = os.path.join(cls.path, "config/remote_rebex_hosts") def _make_remote(self): return QueueAdapter(directory=self.hosts_dir) # Usage in tests: remote = self._make_remote()Alternatively, parameterize over both config/remote_rebex and config/remote_rebex_hosts using subTest to collapse duplicated tests.
Also applies to: 180-180, 186-186, 193-193, 200-200
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
tests/config/remote_rebex_hosts/queue.yaml(1 hunks)tests/test_remote.py(1 hunks)tests/test_remote_auth.py(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- tests/config/remote_rebex_hosts/queue.yaml
🧰 Additional context used
🧬 Code Graph Analysis (2)
tests/test_remote_auth.py (2)
pysqa/queueadapter.py (1)
QueueAdapter(13-384)pysqa/base/remote.py (1)
_open_ssh_connection(344-468)
tests/test_remote.py (2)
pysqa/queueadapter.py (2)
QueueAdapter(13-384)submit_job(188-231)pysqa/base/remote.py (5)
_open_ssh_connection(344-468)_execute_remote_command(559-582)submit_job(147-183)transfer_file(266-291)get_transport(680-684)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: unittest_matrix (windows-latest, 3.13)
- GitHub Check: unittest_slurm
- GitHub Check: unittest_matrix (macos-latest, 3.13)
| def test_remote_command_individual_connections_hosts(self): | ||
| path = os.path.dirname(os.path.abspath(__file__)) | ||
| remote = QueueAdapter(directory=os.path.join(path, "config/remote_rebex_hosts")) | ||
| remote._adapter._ssh_remote_path = path | ||
| remote._adapter._open_ssh_connection() | ||
| output = remote._adapter._execute_remote_command(command="pwd") | ||
| self.assertEqual(output, "/\n") | ||
|
|
There was a problem hiding this comment.
🛠️ Refactor suggestion
Avoid opening a redundant SSH connection in the individual-connections host test.
When _ssh_continous_connection is False, _execute_remote_command opens and closes its own connection. Calling _open_ssh_connection beforehand (Line 158) creates an unused connection and may leak resources.
Apply this diff:
def test_remote_command_individual_connections_hosts(self):
path = os.path.dirname(os.path.abspath(__file__))
remote = QueueAdapter(directory=os.path.join(path, "config/remote_rebex_hosts"))
remote._adapter._ssh_remote_path = path
- remote._adapter._open_ssh_connection()
output = remote._adapter._execute_remote_command(command="pwd")
self.assertEqual(output, "/\n")Optionally, make the same change in test_remote_command_individual_connections above (Lines 141–143) for consistency.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def test_remote_command_individual_connections_hosts(self): | |
| path = os.path.dirname(os.path.abspath(__file__)) | |
| remote = QueueAdapter(directory=os.path.join(path, "config/remote_rebex_hosts")) | |
| remote._adapter._ssh_remote_path = path | |
| remote._adapter._open_ssh_connection() | |
| output = remote._adapter._execute_remote_command(command="pwd") | |
| self.assertEqual(output, "/\n") | |
| def test_remote_command_individual_connections_hosts(self): | |
| path = os.path.dirname(os.path.abspath(__file__)) | |
| remote = QueueAdapter(directory=os.path.join(path, "config/remote_rebex_hosts")) | |
| remote._adapter._ssh_remote_path = path | |
| output = remote._adapter._execute_remote_command(command="pwd") | |
| self.assertEqual(output, "/\n") |
🤖 Prompt for AI Agents
In tests/test_remote.py around lines 154 to 161, the test calls
remote._adapter._open_ssh_connection() before executing the command which is
redundant when _ssh_continous_connection is False and causes a leaked unused
connection; remove the explicit call to _open_ssh_connection() (line 158) so
_execute_remote_command manages its own connect/disconnect, and also apply the
same removal to the similar test at lines ~141–143 for consistency.
Summary by CodeRabbit
New Features
Tests