Conversation
```python
for tsg in tsgs:
    if path.endswith('.log'):
        xyz = parse_geometry(path)
        time = parse_orca_runtime(path)
```
So, I see you made a new function to parse the ORCA log file for the time it took. I know in the submit.sh template files we generate an initial_time and final_time file to calculate the execution/run time of a job IIRC. Should we then be going forward creating run time parsers for gaussian etc. and use this type of logic?
Also, I don't know if it's feasible and it may require a lot of extra work and redoing functions/logic, but would it be better to build the runtime parser in the specific ESS adapter itself? That way, if someone writes a new adapter or updates one, they just need to change it in the adapter itself. But that's just a small thought I had. Then again, I see the parsing of log files etc. is done outside of the adapter anyway, so it's probably not feasible at all.
Regarding timings, I don't know if there is another way from this entry point. Maybe @alongd has more info about this.
As for adapter specific timings, I think this is technically possible per adapter, the question is how to bubble the result up to the scheduler easily? There is probably some way to do that, @alongd can probably say something about this too 😄
Right, in the submit files we have `touch initial_time` and `touch final_time`, the Job does `set_initial_and_final_times` and finally determines a `.run_time` which you could use.
So in Scheduler you do `JobObject.run_time`. At the Species level, you could go to `SpeciesObject.ts_guesses[i].execution_time`.
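To make that flow concrete, here is a minimal sketch. The classes below are simplified stand-ins, not ARC's actual implementation; only the attribute and method names (`set_initial_and_final_times`, `run_time`, `execution_time`) are taken from the comment above:

```python
from datetime import datetime, timedelta

class Job:
    """Simplified stand-in for an ARC job object (not the real JobAdapter)."""

    def __init__(self):
        self.initial_time = None
        self.final_time = None
        self.run_time = None

    def set_initial_and_final_times(self, initial: datetime, final: datetime) -> None:
        # In ARC these timestamps come from the initial_time / final_time
        # files touched by the generated submit script.
        self.initial_time = initial
        self.final_time = final
        self.run_time = final - initial  # a timedelta

job = Job()
job.set_initial_and_final_times(datetime(2024, 1, 1, 12, 0, 0),
                                datetime(2024, 1, 1, 12, 45, 30))
# The Scheduler could then read job.run_time and store it on the TS guess,
# e.g.: species.ts_guesses[i].execution_time = job.run_time
print(job.run_time)
```

This would avoid a separate runtime parser per ESS, since the timestamp files are written by the submit scripts regardless of which ESS runs the job.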
Pull request overview
This pull request adds support for Orca's Nudged Elastic Band (NEB) method as a transition state guess adapter. The implementation enables both local (incore) and queue-based execution of NEB jobs on HPC clusters. The PR introduces a new OrcaNEBAdapter that inherits from OrcaAdapter to reuse Orca-specific logic, extends the scheduler to handle .log file outputs from TS guess jobs, adds a runtime parser for Orca output files, and includes comprehensive test coverage.
Changes:
- Introduces `OrcaNEBAdapter` for executing Orca NEB calculations with support for both incore and queue execution
- Extends queue-based TS guess job handling to support `.log` file outputs in addition to `.yml` files
- Adds the `parse_orca_runtime` parser function to extract job execution times from Orca log files
Reviewed changes
Copilot reviewed 12 out of 13 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| arc/job/adapters/ts/orca_neb.py | New adapter implementing Orca NEB job execution, input file generation, and post-processing |
| arc/job/adapters/ts/__init__.py | Imports the new orca_neb module |
| arc/settings/settings.py | Adds orca_neb to global ESS settings, filenames mappings, and default NEB parameters |
| arc/scheduler.py | Extends schedule_jobs to handle .log outputs from TSG jobs; adds unused check_tsg_jobs method |
| arc/species/species.py | Modifies process_completed_tsg_queue_jobs to handle .log files from orca_neb jobs |
| arc/parser/parser.py | Adds parse_orca_runtime function to extract execution time from Orca output files |
| arc/parser/parser_test.py | Adds test for parse_orca_runtime function |
| arc/job/adapters/ts/orca_neb_test.py | Comprehensive unit tests for OrcaNEBAdapter |
| arc/job/adapters/common.py | Adds orca_neb to supported TS adapters for various RMG reaction families |
| arc/job/adapter.py | Adds orca_neb to JobEnum |
| arc/common.py | Adds orca_neb to ESS validation list |
| arc/main_test.py | Updates test expectations to include orca_neb server mapping |
```python
def check_tsg_jobs(self,
                   label: str,
                   job: 'JobAdapter',
                   ):
    """
    Check that a transition state guess job converged successfully.

    Args:
        label (str): The species label.
        job (JobAdapter): The single point job object.
    """
    print("Checking TS guess job...")

    if job.job_status[1]['status'] == 'done':
        self.post_sp_actions(label,
                             sp_path=os.path.join(job.local_path_to_output_file),
                             level=job.level,
                             )
        # Update restart dictionary and save the yaml restart file:
        self.save_restart_dict()
        if self.species_dict[label].number_of_atoms == 1:
            # save the geometry from the sp job for monoatomic species for which no opt/freq jobs will be spawned
            self.output[label]['paths']['geo'] = job.local_path_to_output_file
    else:
        self.troubleshoot_ess(label=label,
                              job=job,
                              level_of_theory=job.level,
                              )
```
The check_tsg_jobs method is defined but never called anywhere in the codebase. This appears to be dead code that should either be removed or integrated into the job checking workflow. Additionally, the method contains a print statement instead of using the logger, and duplicates logic from check_sp_job.
```python
xyz = parse_geometry(path)
time = parse_orca_runtime(path)
tsg = TSGuess(method='orca_neb',
              execution_time=time,
              success=True,
              family=None,
              xyz=xyz,
              )
```
There is no error handling if parse_geometry returns None. If parsing fails, xyz will be None, and the TSGuess will be created with xyz=None. While there's a check for tsg.initial_xyz at line 1588, it would be clearer to check if xyz is None before creating the TSGuess or to handle the failure case more explicitly.
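A hedged sketch of the more explicit handling suggested here: guard on the parsed geometry before constructing the guess. The dict below stands in for the real `TSGuess` object, and the parser callables are injected purely for illustration:

```python
def build_ts_guess(path, parse_geometry, parse_runtime):
    """Create a TS guess record only if a geometry could be parsed from the log."""
    xyz = parse_geometry(path)
    if xyz is None:
        # Fail explicitly instead of silently creating a guess with xyz=None
        return {'method': 'orca_neb', 'success': False, 'xyz': None}
    return {'method': 'orca_neb',
            'success': True,
            'xyz': xyz,
            'execution_time': parse_runtime(path)}
```

The point is that the success flag then reflects the actual parsing outcome, rather than relying on a downstream `initial_xyz` check.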
```python
input_dict['cpus'] = self.cpu_cores
input_dict['charge'] = self.charge
input_dict['multiplicity'] = self.multiplicity
input_dict['abs_path'] = self.local_path
```
The abs_path is set to self.local_path, which will be incorrect for remote (queue) execution. When the job runs on a remote server, the local path from the client machine won't exist on the server. The NEB input file should use either relative paths (just 'reactant.xyz' and 'product.xyz') or self.remote_path for remote execution. Since files are uploaded to the execution directory, relative paths would be the safest and most portable solution.
```diff
-input_dict['abs_path'] = self.local_path
+input_dict['abs_path'] = '.'
```
```python
try:
    f.seek(-2000, os.SEEK_END)
except OSError:
    pass  # File is smaller than 2kb
```
The comment says "File is smaller than 2kb" but the code seeks -2000 bytes, which is approximately 2000 characters (bytes in ASCII), not 2KB. While this is a minor distinction (2KB = 2048 bytes), the comment could be more precise (e.g., "File is smaller than 2000 bytes").
```diff
-    pass  # File is smaller than 2kb
+    pass  # File is smaller than 2000 bytes
```
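As an aside, the tail-reading pattern discussed here can be sketched in isolation. The footer matched below (`TOTAL RUN TIME: ... days ... hours ...`) is the typical ORCA ending line, but treat both the footer format and the helper name as illustrative, not ARC's actual `parse_orca_runtime`:

```python
import os
import re
from datetime import timedelta

def parse_runtime_tail(path: str):
    """Read only the last ~2000 bytes of a log file and extract the run time.

    Assumes an ORCA-style footer line such as:
    'TOTAL RUN TIME: 0 days 0 hours 1 minutes 30 seconds 250 msec'
    Returns a timedelta, or None if no timing line is found.
    """
    with open(path, 'rb') as f:
        try:
            f.seek(-2000, os.SEEK_END)  # jump near the end of the file
        except OSError:
            pass  # file is smaller than 2000 bytes; read it from the start
        tail = f.read().decode('utf-8', errors='ignore')
    match = re.search(r'TOTAL RUN TIME:\s*(\d+) days (\d+) hours (\d+) minutes (\d+) seconds', tail)
    if match is None:
        return None
    d, h, m, s = (int(g) for g in match.groups())
    return timedelta(days=d, hours=h, minutes=m, seconds=s)
```

Seeking from the end keeps the parser O(1) in file size, which matters for multi-megabyte ORCA logs.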
```python
level = level or orca_neb_settings.get('level', '')
if not level:
    raise ValueError('A level of theory must be specified for Orca NEB jobs, either in the job arguments or in the settings file.')
```
The level retrieved from settings is a string (e.g., 'wb97xd/def2tzvp'), but it needs to be converted to a Level object before passing to the parent class. If level is None and orca_neb_settings.get('level', '') returns a string, this string should be wrapped in Level(repr=...) to ensure type consistency.
| level = level or orca_neb_settings.get('level', '') | |
| if not level: | |
| raise ValueError('A level of theory must be specified for Orca NEB jobs, either in the job arguments or in the settings file.') | |
| raw_level = level or orca_neb_settings.get('level', '') | |
| if not raw_level: | |
| raise ValueError('A level of theory must be specified for Orca NEB jobs, either in the job arguments or in the settings file.') | |
| # Ensure level is a Level object, even if provided as a string in settings | |
| if isinstance(raw_level, str): | |
| level = Level(repr=raw_level) | |
| else: | |
| level = raw_level |
```python
                 raise_msg=f'Please install {self.job_adapter}, see {self.url} for more information.',
                 )
self._log_job_execution()
execute_command([f'cd {self.local_path}'] + incore_commands[self.job_adapter], executable='/bin/bash')
```
The execute_incore method references incore_commands[self.job_adapter] where self.job_adapter is 'orca_neb', but there is no 'orca_neb' entry in the incore_commands dictionary defined in arc/settings/submit.py. This will cause a KeyError at runtime when attempting to execute orca_neb jobs incore. An entry for 'orca_neb' should be added to incore_commands, likely using the same command as regular orca jobs (e.g., 'orca input.in > input.log').
```diff
-execute_command([f'cd {self.local_path}'] + incore_commands[self.job_adapter], executable='/bin/bash')
+commands = incore_commands.get(self.job_adapter)
+if commands is None:
+    # Fall back to the standard ORCA incore command if a specific orca_neb command is not defined
+    commands = incore_commands.get('orca', [])
+execute_command([f'cd {self.local_path}'] + commands, executable='/bin/bash')
```
```python
    return None
tsgs = [TSGuess(ts_dict=tsg_dict) for tsg_dict in tsg_list]
for tsg in tsgs:
    from arc.parser.parser import parse_orca_runtime
```
The import statement for parse_orca_runtime should be moved to the top of the file with other parser imports (around line 32-37) for consistency. Currently parse_geometry is imported at the module level while parse_orca_runtime is imported inside the function.
```python
input_template = """
!${restricted}HF ${method} ${basis} NEB-TS
```
The input template hardcodes 'HF' in the line '!${restricted}HF ${method} ${basis} NEB-TS', but this is incorrect for DFT methods. The template should use ${method_class} like the regular OrcaAdapter template does, which can be either 'HF' for wavefunction methods or 'KS' for DFT methods. The write_input_file method needs to determine and set input_dict['method_class'] similar to how OrcaAdapter does it.
```diff
-!${restricted}HF ${method} ${basis} NEB-TS
+!${restricted}${method_class} ${method} ${basis} NEB-TS
```
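For illustration, here is a minimal sketch of selecting the method class at render time. It uses the stdlib `string.Template` (ARC's own templates are rendered differently), and the set of DFT functionals is a made-up placeholder, not ARC's actual classification logic:

```python
from string import Template

# Placeholder set of DFT functionals; ARC determines this differently.
DFT_METHODS = {'b3lyp', 'pbe0', 'wb97x-d3'}

def method_class(method: str) -> str:
    """Return 'KS' for DFT functionals and 'HF' for wavefunction methods."""
    return 'KS' if method.lower() in DFT_METHODS else 'HF'

template = Template('!${restricted}${method_class} ${method} ${basis} NEB-TS')
line = template.substitute(restricted='R',
                           method_class=method_class('wB97X-D3'),
                           method='wB97X-D3',
                           basis='def2-TZVP')
print(line)  # → !RKS wB97X-D3 def2-TZVP NEB-TS
```

The key point is that the class keyword is a template variable filled in by `write_input_file`, not a literal baked into the template.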
```python
self.execution_type = execution_type or 'queue'
super().__init__(project=project,
                 project_directory=project_directory,
                 job_type=job_type,
                 args=args,
                 bath_gas=bath_gas,
                 checkfile=checkfile,
                 conformer=conformer,
                 constraints=constraints,
                 cpu_cores=cpu_cores,
                 dihedral_increment=dihedral_increment,
                 dihedrals=dihedrals,
                 directed_scan_type=directed_scan_type,
                 ess_settings=ess_settings,
                 ess_trsh_methods=ess_trsh_methods,
                 execution_type=execution_type,
```
Setting self.execution_type at line 172 before calling super().__init__() is problematic because the parent class initialization will override this value based on the execution_type parameter passed at line 187. If you want to default to 'queue', you should either modify the execution_type variable before passing it to super() (e.g., `execution_type = execution_type or 'queue'`), or set self.execution_type after the super() call. The current approach may not have the intended effect.
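The override-order issue can be demonstrated with a stripped-down sketch (toy classes, not ARC's):

```python
class Parent:
    def __init__(self, execution_type=None):
        # The parent stores whatever it receives, overriding prior assignments.
        self.execution_type = execution_type

class DefaultBeforeSuper(Parent):
    """Mirrors the current code: the attribute is set, then clobbered by super()."""
    def __init__(self, execution_type=None):
        self.execution_type = execution_type or 'queue'
        super().__init__(execution_type=execution_type)  # forwards the original None

class DefaultTheParameter(Parent):
    """The suggested fix: default the variable before forwarding it."""
    def __init__(self, execution_type=None):
        execution_type = execution_type or 'queue'
        super().__init__(execution_type=execution_type)

print(DefaultBeforeSuper().execution_type)   # the 'queue' default is lost
print(DefaultTheParameter().execution_type)  # the 'queue' default survives
```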
```python
if reactions is None:
    raise ValueError('Cannot execute Orca NEB without an ARCReaction object.')

# For NEB calculations, the 'species' argument to the OrcaAdapter should be the TS species from the reaction.
```
this comment might not be relevant, we're not looking at the species arg anymore here
```python
is_ts=True,
charge=reactions[0].charge,
multiplicity=reactions[0].multiplicity,
xyz=reactions[0].r_species[0].get_xyz())
```
why give the reactant xyz to initialize the TS object?
```python
    charge=reactions[0].charge,
    multiplicity=reactions[0].multiplicity,
    xyz=reactions[0].r_species[0].get_xyz())
else:  # Fallback if no reactant species either, though this shouldn't happen for a valid reaction
```
can delete this comment (correct, but trivial - the err mssg is self-explanatory)
```python
if not level:
    raise ValueError('A level of theory must be specified for Orca NEB jobs, either in the job arguments or in the settings file.')
species_for_super = [reactions[0].ts_species]
self.execution_type = execution_type or 'queue'
```
should this be done after super().__init__, so it's not overwritten?
```python
# Render and write the NEB input file
with open(os.path.join(self.local_path, input_filenames[self.job_adapter]), 'w') as f:
    f.write(Template(input_template).render(**input_dict))
```
```python
        level_of_theory=job.level,
        )

def check_tsg_jobs(self,
```
where is this being called from?
```diff
 self.ts_guesses = cluster_tsgs

-def process_completed_tsg_queue_jobs(self, yml_path: str):
+def process_completed_tsg_queue_jobs(self, path: str):
```
are there any other usages of this function where we need to change the arg name?
```python
    return None
tsgs = [TSGuess(ts_dict=tsg_dict) for tsg_dict in tsg_list]
for tsg in tsgs:
    from arc.parser.parser import parse_orca_runtime
```
can you move the import to the top of the module?
```diff
@@ -0,0 +1,18797 @@
```
do you think you could safely prune sections of this file? Maybe show an LLM the file and the parsing function, and see what to keep?
PR Overview
This PR introduces Orca NEB support, improves queue-based TS search handling, enhances parsing efficiency, adds comprehensive test coverage, and opens the path to additional queue-based TS guess jobs.
1. Orca NEB Integration

New Adapter: `OrcaNEBAdapter`
- Implemented `OrcaNEBAdapter`, supporting both incore and queue execution
- Added `orca_neb` to the supported TS adapters of the relevant RMG families (`arc/job/adapters/common.py`)
- Included default settings and keyword configurations for Orca 6.x NEB-TS calculations

`OrcaNEBAdapter` Inheritance Strategy
`OrcaNEBAdapter` inherits directly from `OrcaAdapter` rather than the base `JobAdapter`. This design leverages polymorphism to ensure that all Orca-specific logic is reused consistently, including method formatting (`_format_orca_method`) and basis formatting (`_format_orca_basis`).

Why inherit from `OrcaAdapter`? By inheriting from `OrcaAdapter`, we:
- Reuse existing logic: automatically inherit the complex submission and execution behavior of standard Orca jobs.
- Apply targeted overrides only where necessary: override only `write_input_file` (for NEB-specific input formatting) and `process_run` (to extract the TS guess from the NEB log).
- Maintain global consistency: any future updates to Orca settings automatically propagate to NEB jobs.

This approach minimizes duplication and improves long-term maintainability.
2. Queue-Based TSG Job Support

Extended Scheduler Support
Modified:
- `Scheduler.schedule_jobs`
- `ARCSpecies.process_completed_tsg_queue_jobs`

The scheduler now supports `.log` outputs for TSG jobs. Previously, the queue-based TSG pipeline primarily handled `.yml` outputs.

Impact
This enables any ESS-based TS search method (e.g., Orca NEB) to run as a queue job on HPC clusters.
3. Parser Improvements

Added `parse_orca_runtime` to `arc/parser/parser.py`. A corresponding parser test was added to verify correct runtime extraction.
4. Robustness Improvements

`OrcaNEBAdapter` now automatically:
- Creates dummy TS species
- Generates the required coordinate files: `reactant.xyz` and `product.xyz`

This ensures compatibility with Orca's NEB input requirements and reduces user-side setup complexity.
Testing

Unit Tests
Added tests in:
- `arc/job/adapters/ts/orca_neb_test.py`
- `arc/parser/parser_test.py`

Why Is This Needed?
Orca NEB is a powerful method for generating high-quality TS guesses for complex reactions. Previously, TS guess generation via NEB was limited to local (incore) execution. This PR enables queue-based NEB execution on HPC clusters, which significantly improves ARC's ability to handle computationally demanding TS searches in production environments.