
Added support for orca NEB. #833

Open

kfir4444 wants to merge 13 commits into main from orca_neb
Conversation

@kfir4444 (Collaborator)


PR Overview

This PR introduces Orca NEB support, improves queue-based TS search handling, enhances parsing efficiency, and adds comprehensive test coverage, opening the path to additional queue-based TS guess jobs.


1. Orca NEB Integration

New Adapter: OrcaNEBAdapter

  • Implemented OrcaNEBAdapter supporting both:

    • Incore execution
    • Queue-based execution
  • Added orca_neb to supported TS adapters across relevant RMG families (arc/job/adapters/common.py)

  • Included default settings and keyword configurations for Orca 6.x NEB-TS calculations
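As a hedged illustration of how the settings-driven defaults interact with the level-of-theory fallback discussed in the review below (the dictionary keys and values here are assumptions, not the PR's exact `arc/settings/settings.py` contents):

```python
# Hypothetical sketch of orca_neb defaults; key names and values are
# assumptions, not ARC's actual settings.
orca_neb_settings = {
    'level': 'wb97xd/def2tzvp',  # default level of theory (string cited in the review)
    'nimages': 8,                # number of NEB images (assumption)
}


def resolve_neb_level(level=None):
    """Fall back to the settings default when no level of theory is given."""
    level = level or orca_neb_settings.get('level', '')
    if not level:
        raise ValueError('A level of theory must be specified for Orca NEB jobs, '
                         'either in the job arguments or in the settings file.')
    return level
```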


OrcaNEBAdapter Inheritance Strategy

OrcaNEBAdapter inherits directly from OrcaAdapter rather than the base JobAdapter.

This design leverages polymorphism to ensure that all Orca-specific logic is reused consistently, including:

  • Memory calculation
  • CPU core allocation
  • Method formatting (_format_orca_method)
  • Basis set formatting (_format_orca_basis)
  • Submission script generation
  • Server-specific command formatting

Why inherit from OrcaAdapter?

By inheriting from OrcaAdapter, we:

  • Reuse existing logic
    Automatically inherit the complex submission and execution behavior of standard Orca jobs.

  • Apply targeted overrides only where necessary
    Override only:

    • write_input_file (for NEB-specific input formatting)
    • process_run (to extract the TS guess from the NEB log)
  • Maintain global consistency
    Any future updates to Orca settings automatically propagate to NEB jobs.

This approach minimizes duplication and improves long-term maintainability.
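A minimal sketch of this inheritance pattern follows; the class and method names track the PR's description, but the method bodies are illustrative placeholders, not ARC's actual implementations:

```python
class OrcaAdapter:
    """Stand-in for ARC's Orca adapter; the real bodies handle memory,
    CPU cores, method/basis formatting, and submission scripts."""

    def write_input_file(self):
        return 'standard Orca input'

    def process_run(self):
        return 'standard Orca post-processing'

    def write_submit_script(self):
        # Shared submission logic, reused unchanged by subclasses.
        return 'submit script'


class OrcaNEBAdapter(OrcaAdapter):
    """Overrides only the two NEB-specific hooks; everything else is inherited."""

    def write_input_file(self):
        return 'NEB-TS input'

    def process_run(self):
        return 'TS guess extracted from NEB log'
```

Because `write_submit_script` is untouched, any future change to the base Orca submission logic propagates to NEB jobs automatically.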


2. Queue-Based TSG Job Support

Extended Scheduler Support

Modified:

  • Scheduler.schedule_jobs
  • ARCSpecies.process_completed_tsg_queue_jobs

The scheduler now supports .log outputs for TSG jobs. Previously, the queue-based TSG pipeline primarily handled .yml outputs.

Impact

This enables any ESS-based TS search method (e.g., Orca NEB) to:

  • Run on HPC clusters
  • Be submitted through queue systems
  • Be parsed automatically upon completion
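The scheduler change amounts to dispatching on the output file type; a hedged sketch under that assumption (the function and return labels are illustrative, not ARC's API):

```python
import os


def classify_tsg_output(path):
    """Decide how a completed queue-based TSG job's output should be processed."""
    ext = os.path.splitext(path)[1]
    if ext == '.log':
        return 'parse_ess_log'   # ESS output, e.g. an Orca NEB log
    if ext == '.yml':
        return 'load_yaml_dump'  # the pre-existing TSG pipeline
    raise ValueError(f'Unsupported TSG output file: {path}')
```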

3. Parser Improvements

parse_orca_runtime

  • Added parse_orca_runtime to arc/parser/parser.py
  • Optimized to read only the end of log files
  • Avoids memory bottlenecks when parsing large Orca outputs

A corresponding parser test was added to verify correct runtime extraction.
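An illustrative sketch of the tail-reading approach: the seek offset mirrors the PR's 2000-byte window (visible in the review below), while the exact regular expression is an assumption based on Orca's "TOTAL RUN TIME" footer:

```python
import os
import re


def parse_runtime_tail(path, tail_bytes=2000):
    """Read only the last tail_bytes of a log file and return the runtime in seconds."""
    with open(path, 'rb') as f:
        try:
            f.seek(-tail_bytes, os.SEEK_END)
        except OSError:
            pass  # file is smaller than tail_bytes; read it whole
        tail = f.read().decode(errors='ignore')
    # Orca prints a footer like "TOTAL RUN TIME: 0 days 0 hours 5 minutes 30 seconds ..."
    match = re.search(r'TOTAL RUN TIME:\s*(\d+) days (\d+) hours (\d+) minutes (\d+) seconds', tail)
    if match is None:
        return None
    days, hours, minutes, seconds = (int(g) for g in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds
```

Seeking from the end keeps memory usage constant regardless of log size, which matters for the multi-megabyte outputs NEB runs can produce.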


4. Robustness Improvements

OrcaNEBAdapter now automatically:

  • Creates dummy TS species

  • Generates required coordinate files:

    • reactant.xyz
    • product.xyz

This ensures compatibility with Orca’s NEB input requirements and reduces user-side setup complexity.
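A self-contained sketch of generating the endpoint files: the XYZ layout (atom count, comment line, element rows) is the standard format Orca reads, but these helper names are hypothetical, not ARC's actual functions:

```python
import os


def write_xyz(path, symbols, coords, comment=''):
    """Write a standard XYZ file: atom count, comment line, then element x y z rows."""
    lines = [str(len(symbols)), comment]
    for symbol, (x, y, z) in zip(symbols, coords):
        lines.append(f'{symbol} {x:.8f} {y:.8f} {z:.8f}')
    with open(path, 'w') as f:
        f.write('\n'.join(lines) + '\n')


def write_neb_endpoints(folder, reactant, product):
    """Create the reactant.xyz and product.xyz files Orca NEB expects,
    given (symbols, coords) tuples for each endpoint."""
    write_xyz(os.path.join(folder, 'reactant.xyz'), *reactant)
    write_xyz(os.path.join(folder, 'product.xyz'), *product)
```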


Testing

Unit Tests

Added tests in:

  • arc/job/adapters/ts/orca_neb_test.py

    • Input file generation
    • NEB post-processing
  • arc/parser/parser_test.py

    • Orca runtime extraction

Why Is This Needed?

Orca NEB is a powerful method for generating high-quality TS guesses for complex reactions.

Previously, TS guess generation via NEB was limited to local (incore) execution.

This PR enables:

  • Submission of NEB jobs to HPC queues
  • Automated parsing of results
  • Scalable TS search workflows

This significantly improves ARC’s ability to handle computationally demanding TS searches in production environments.


for tsg in tsgs:
    if path.endswith('.log'):
        xyz = parse_geometry(path)
        time = parse_orca_runtime(path)
Member

So, I see you made a new function to parse the ORCA log file for the time it took. I know the submit.sh template files generate initial_time and final_time files to calculate the execution/run time of a job, IIRC. Should we then, going forward, create runtime parsers for Gaussian etc. and use this type of logic?

Also, I don't know if it's feasible, and it may require a lot of extra work and redoing functions/logic, but would it be better to build the runtime parser in the specific ESS adapter itself? That way, if someone writes a new adapter or updates one, they just need to change it in the adapter itself. But that's just a small thought I had. I see the parsing of log files etc. is done outside of the adapter anyway, so it's probably not feasible at all.

@kfir4444 (Collaborator, Author), Feb 24, 2026

Regarding timings, I don't know if there is another way from this entry point. Maybe @alongd has more info about this.

As for adapter specific timings, I think this is technically possible per adapter, the question is how to bubble the result up to the scheduler easily? There is probably some way to do that, @alongd can probably say something about this too 😄

Member

Right, in the submit files we have touch initial_time and touch final_time; the Job does set_initial_and_final_times and finally determines a .run_time, which you could use.

so in Scheduler you do JobObject.run_time. At the Species level, you could go to SpeciesObject.ts_guesses[i].execution_time

Copilot AI left a comment

Pull request overview

This pull request adds support for Orca's Nudged Elastic Band (NEB) method as a transition state guess adapter. The implementation enables both local (incore) and queue-based execution of NEB jobs on HPC clusters. The PR introduces a new OrcaNEBAdapter that inherits from OrcaAdapter to reuse Orca-specific logic, extends the scheduler to handle .log file outputs from TS guess jobs, adds a runtime parser for Orca output files, and includes comprehensive test coverage.

Changes:

  • Introduces OrcaNEBAdapter for executing Orca NEB calculations with support for both incore and queue execution
  • Extends queue-based TS guess job handling to support .log file outputs in addition to .yml files
  • Adds parse_orca_runtime parser function to extract job execution times from Orca log files

Reviewed changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 9 comments.

Show a summary per file:

  • arc/job/adapters/ts/orca_neb.py: New adapter implementing Orca NEB job execution, input file generation, and post-processing
  • arc/job/adapters/ts/__init__.py: Imports the new orca_neb module
  • arc/settings/settings.py: Adds orca_neb to global ESS settings, filenames mappings, and default NEB parameters
  • arc/scheduler.py: Extends schedule_jobs to handle .log outputs from TSG jobs; adds unused check_tsg_jobs method
  • arc/species/species.py: Modifies process_completed_tsg_queue_jobs to handle .log files from orca_neb jobs
  • arc/parser/parser.py: Adds parse_orca_runtime function to extract execution time from Orca output files
  • arc/parser/parser_test.py: Adds test for parse_orca_runtime function
  • arc/job/adapters/ts/orca_neb_test.py: Comprehensive unit tests for OrcaNEBAdapter
  • arc/job/adapters/common.py: Adds orca_neb to supported TS adapters for various RMG reaction families
  • arc/job/adapter.py: Adds orca_neb to JobEnum
  • arc/common.py: Adds orca_neb to ESS validation list
  • arc/main_test.py: Updates test expectations to include orca_neb server mapping


Comment on lines +2666 to +2694
def check_tsg_jobs(self,
                   label: str,
                   job: 'JobAdapter',
                   ):
    """
    Check that a transition state guess job converged successfully.

    Args:
        label (str): The species label.
        job (JobAdapter): The single point job object.
    """
    print("Checking TS guess job...")

    if job.job_status[1]['status'] == 'done':
        self.post_sp_actions(label,
                             sp_path=os.path.join(job.local_path_to_output_file),
                             level=job.level,
                             )
        # Update restart dictionary and save the yaml restart file:
        self.save_restart_dict()
        if self.species_dict[label].number_of_atoms == 1:
            # save the geometry from the sp job for monoatomic species for which no opt/freq jobs will be spawned
            self.output[label]['paths']['geo'] = job.local_path_to_output_file
    else:
        self.troubleshoot_ess(label=label,
                              job=job,
                              level_of_theory=job.level,
                              )

Copilot AI, Feb 25, 2026

The check_tsg_jobs method is defined but never called anywhere in the codebase. This appears to be dead code that should either be removed or integrated into the job checking workflow. Additionally, the method contains a print statement instead of using the logger, and duplicates logic from check_sp_job.

Suggested change: delete the check_tsg_jobs method (the suggestion removes the lines quoted above).

Comment on lines +1580 to +1587
xyz = parse_geometry(path)
time = parse_orca_runtime(path)
tsg = TSGuess(method='orca_neb',
              execution_time=time,
              success=True,
              family=None,
              xyz=xyz,
              )
Copilot AI, Feb 25, 2026

There is no error handling if parse_geometry returns None. If parsing fails, xyz will be None, and the TSGuess will be created with xyz=None. While there's a check for tsg.initial_xyz at line 1588, it would be clearer to check if xyz is None before creating the TSGuess or to handle the failure case more explicitly.

input_dict['cpus'] = self.cpu_cores
input_dict['charge'] = self.charge
input_dict['multiplicity'] = self.multiplicity
input_dict['abs_path'] = self.local_path
Copilot AI, Feb 25, 2026

The abs_path is set to self.local_path, which will be incorrect for remote (queue) execution. When the job runs on a remote server, the local path from the client machine won't exist on the server. The NEB input file should use either relative paths (just 'reactant.xyz' and 'product.xyz') or self.remote_path for remote execution. Since files are uploaded to the execution directory, relative paths would be the safest and most portable solution.

Suggested change
- input_dict['abs_path'] = self.local_path
+ input_dict['abs_path'] = '.'

try:
    f.seek(-2000, os.SEEK_END)
except OSError:
    pass  # File is smaller than 2kb
Copilot AI, Feb 25, 2026

The comment says "File is smaller than 2kb" but the code seeks -2000 bytes, which is approximately 2000 characters (bytes in ASCII), not 2KB. While this is a minor distinction (2KB = 2048 bytes), the comment could be more precise (e.g., "File is smaller than 2000 bytes").

Suggested change
- pass  # File is smaller than 2kb
+ pass  # File is smaller than 2000 bytes

Comment on lines +168 to +170
level = level or orca_neb_settings.get('level', '')
if not level:
    raise ValueError('A level of theory must be specified for Orca NEB jobs, either in the job arguments or in the settings file.')
Copilot AI, Feb 25, 2026

The level retrieved from settings is a string (e.g., 'wb97xd/def2tzvp'), but it needs to be converted to a Level object before passing to the parent class. If level is None and orca_neb_settings.get('level', '') returns a string, this string should be wrapped in Level(repr=...) to ensure type consistency.

Suggested change
- level = level or orca_neb_settings.get('level', '')
- if not level:
-     raise ValueError('A level of theory must be specified for Orca NEB jobs, either in the job arguments or in the settings file.')
+ raw_level = level or orca_neb_settings.get('level', '')
+ if not raw_level:
+     raise ValueError('A level of theory must be specified for Orca NEB jobs, either in the job arguments or in the settings file.')
+ # Ensure level is a Level object, even if provided as a string in settings
+ if isinstance(raw_level, str):
+     level = Level(repr=raw_level)
+ else:
+     level = raw_level

raise_msg=f'Please install {self.job_adapter}, see {self.url} for more information.',
)
self._log_job_execution()
execute_command([f'cd {self.local_path}'] + incore_commands[self.job_adapter], executable='/bin/bash')
Copilot AI, Feb 25, 2026

The execute_incore method references incore_commands[self.job_adapter] where self.job_adapter is 'orca_neb', but there is no 'orca_neb' entry in the incore_commands dictionary defined in arc/settings/submit.py. This will cause a KeyError at runtime when attempting to execute orca_neb jobs incore. An entry for 'orca_neb' should be added to incore_commands, likely using the same command as regular orca jobs (e.g., 'orca input.in > input.log').

Suggested change
- execute_command([f'cd {self.local_path}'] + incore_commands[self.job_adapter], executable='/bin/bash')
+ commands = incore_commands.get(self.job_adapter)
+ if commands is None:
+     # Fall back to the standard ORCA incore command if a specific orca_neb command is not defined
+     commands = incore_commands.get('orca', [])
+ execute_command([f'cd {self.local_path}'] + commands, executable='/bin/bash')

return None
tsgs = [TSGuess(ts_dict=tsg_dict) for tsg_dict in tsg_list]
for tsg in tsgs:
    from arc.parser.parser import parse_orca_runtime
Copilot AI, Feb 25, 2026

The import statement for parse_orca_runtime should be moved to the top of the file with other parser imports (around line 32-37) for consistency. Currently parse_geometry is imported at the module level while parse_orca_runtime is imported inside the function.



input_template = """
!${restricted}HF ${method} ${basis} NEB-TS
Copilot AI, Feb 25, 2026

The input template hardcodes 'HF' in the line '!${restricted}HF ${method} ${basis} NEB-TS', but this is incorrect for DFT methods. The template should use ${method_class} like the regular OrcaAdapter template does, which can be either 'HF' for wavefunction methods or 'KS' for DFT methods. The write_input_file method needs to determine and set input_dict['method_class'] similar to how OrcaAdapter does it.

Suggested change
- !${restricted}HF ${method} ${basis} NEB-TS
+ !${restricted}${method_class} ${method} ${basis} NEB-TS

Comment on lines +172 to +187
self.execution_type = execution_type or 'queue'
super().__init__(project=project,
                 project_directory=project_directory,
                 job_type=job_type,
                 args=args,
                 bath_gas=bath_gas,
                 checkfile=checkfile,
                 conformer=conformer,
                 constraints=constraints,
                 cpu_cores=cpu_cores,
                 dihedral_increment=dihedral_increment,
                 dihedrals=dihedrals,
                 directed_scan_type=directed_scan_type,
                 ess_settings=ess_settings,
                 ess_trsh_methods=ess_trsh_methods,
                 execution_type=execution_type,
Copilot AI, Feb 25, 2026

Setting self.execution_type at line 172 before calling super().__init__() is problematic because the parent class initialization will override this value based on the execution_type parameter passed at line 187. If you want to default to 'queue', you should either modify the execution_type variable before passing it to super() (e.g., execution_type = execution_type or 'queue'), or set self.execution_type after the super() call. The current approach may not have the intended effect.

@alongd (Member) left a comment

Thanks @kfir4444 for this awesome addition to ARC!
Please see a few comments

if reactions is None:
    raise ValueError('Cannot execute Orca NEB without an ARCReaction object.')

# For NEB calculations, the 'species' argument to the OrcaAdapter should be the TS species from the reaction.
Member

this comment might not be relevant; we're not looking at the species arg anymore here

is_ts=True,
charge=reactions[0].charge,
multiplicity=reactions[0].multiplicity,
xyz=reactions[0].r_species[0].get_xyz())
Member

why give the reactant xyz to initialize the TS object?

charge=reactions[0].charge,
multiplicity=reactions[0].multiplicity,
xyz=reactions[0].r_species[0].get_xyz())
else: # Fallback if no reactant species either, though this shouldn't happen for a valid reaction
Member

can delete this comment (correct, but trivial - the err mssg is self-explanatory)

if not level:
    raise ValueError('A level of theory must be specified for Orca NEB jobs, either in the job arguments or in the settings file.')
species_for_super = [reactions[0].ts_species]
self.execution_type = execution_type or 'queue'
Member

should this be done after super().__init__, so it's not overwritten?

# Render and write the NEB input file
with open(os.path.join(self.local_path, input_filenames[self.job_adapter]), 'w') as f:
    f.write(Template(input_template).render(**input_dict))

Member

remove 1 extra line break

level_of_theory=job.level,
)

def check_tsg_jobs(self,
Member

where is this being called from?

    self.ts_guesses = cluster_tsgs

- def process_completed_tsg_queue_jobs(self, yml_path: str):
+ def process_completed_tsg_queue_jobs(self, path: str):
Member

are there any other usages of this function where we need to change the arg name?

return None
tsgs = [TSGuess(ts_dict=tsg_dict) for tsg_dict in tsg_list]
for tsg in tsgs:
    from arc.parser.parser import parse_orca_runtime
Member

can you move the import to the top of the module?

for tsg in tsgs:
    if path.endswith('.log'):
        xyz = parse_geometry(path)
        time = parse_orca_runtime(path)
Member

Right, in the submit files we have touch initial_time and touch final_time; the Job does set_initial_and_final_times and finally determines a .run_time, which you could use.

so in Scheduler you do JobObject.run_time. At the Species level, you could go to SpeciesObject.ts_guesses[i].execution_time

@@ -0,0 +1,18797 @@

Member

do you think you could safely prune sections of this file? Maybe show an LLM the file and the parsing function, and see what to keep?


4 participants