An intelligent system for automatically interpreting X-ray diffraction (XRD) patterns in materials science. The framework evaluates multiple phase interpretations by combining statistical analysis, chemical composition matching, and large language model (LLM) evaluation to identify the most plausible interpretation of experimental XRD data.
The Automated Interpretation Framework addresses a critical challenge in materials science: determining which phase interpretation is most likely correct when multiple valid interpretations exist for a single XRD pattern. The framework uses a Bayesian approach, combining multiple evidence sources to calculate posterior probabilities for each interpretation.
Each interpretation is scored by combining four components:
- Refinement metrics (`refinement_metrics.py`):
  - Performs XRD phase search and Rietveld refinement
  - Computes RWP (weighted profile R-factor) and peak-matching score
  - Matches calculated vs. observed peaks
  - Analyzes background and signal quality
  - Generates multiple interpretations by iteratively excluding phases
- Composition balance (`composition_balance.py`):
  - Compares the target composition with each interpretation's composition
  - Calculates composition balance scores (how well phases match the target stoichiometry)
  - Normalizes compositions for fair comparison
- LLM evaluation (`LLM_evaluation.py`):
  - Uses GPT-4o to evaluate phase stability under synthesis conditions
  - Considers temperature, atmosphere, precursors, and dwell time
  - Evaluates polymorph stability and thermodynamic plausibility
  - Provides explanations for likelihood scores
- Probability calculation (`probability_of_interpretation.py`):
  - Combines all factors into prior and posterior probabilities
  - Normalizes scores across interpretations
  - Flags untrustworthy interpretations
  - Generates visualization plots
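A minimal sketch of how these components could combine into a posterior probability, assuming a weighted prior (LLM likelihood plus composition balance) multiplied by a fit-quality likelihood (normalized RWP plus peak score). The field names, weights, and exact formula are illustrative; the real calculation lives in `probability_of_interpretation.py`.

```python
# Illustrative prior/posterior combination. Dictionary keys and the formula are
# assumptions; I_2's numbers are made up for contrast, I_1's come from the
# example output shown later in this README.
def posterior_probabilities(interpretations, w_llm=0.5, w_bscore=0.7,
                            w_rwp=1.0, w_score=0.5):
    unnormalized = {}
    for name, s in interpretations.items():
        # Prior: weighted mix of LLM likelihood and composition balance
        prior = (w_llm * s["llm_likelihood"]
                 + w_bscore * s["composition_balance"]) / (w_llm + w_bscore)
        # Fit-quality "likelihood": weighted mix of normalized RWP and peak score
        fit = (w_rwp * s["normalized_rwp"]
               + w_score * s["normalized_score"]) / (w_rwp + w_score)
        unnormalized[name] = prior * fit  # posterior is proportional to prior x likelihood
    total = sum(unnormalized.values())
    return {name: p / total for name, p in unnormalized.items()}

example = {
    "I_1": {"llm_likelihood": 0.93, "composition_balance": 0.90,
            "normalized_rwp": 0.75, "normalized_score": 0.82},
    "I_2": {"llm_likelihood": 0.40, "composition_balance": 0.65,
            "normalized_rwp": 0.60, "normalized_score": 0.55},
}
print(posterior_probabilities(example))
```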
Project structure:

```text
AIF_copy/
├── src/
│   ├── refinement_metrics.py             # RWP, score, refinement and fit metrics
│   ├── composition_balance.py            # Composition balance scores
│   ├── LLM_evaluation.py                 # LLM-based phase evaluation
│   ├── probability_of_interpretation.py  # Main orchestration script
│   ├── utils.py                          # Utility functions and plotting
│   ├── prompts/
│   │   └── llm_prompt_template.txt       # LLM prompt template
│   └── tests/                            # Test scripts and comparisons
├── data/
│   ├── alab_synthesis_data/              # Synthesis metadata (CSV files)
│   └── xrd_data/                         # XRD patterns and interpretations
├── notebooks/
│   ├── AIF_example_workflow.ipynb        # Display-only example workflow
│   └── AIF_run_workflow.ipynb            # Full pipeline re-run notebook
└── README.md
```
Python dependencies:

- `pymatgen`: Materials analysis and composition handling
- `openai`: LLM API access (via the CBORG gateway)
- `pandas`: Data manipulation
- `numpy`: Numerical computations
- `matplotlib` / `plotly`: Visualization
The framework depends on `dara`, a library used for:

- XRD phase search (`dara.search_phases`, `dara.structure_db.ICSDDatabase`)
- Rietveld refinement (`dara.refine.do_refinement_no_saving`)
- Peak matching and detection (`dara.search.peak_matcher.PeakMatcher`, `dara.peak_detection.detect_peaks`)
- Refinement results (`dara.result.RefinementResult` in `utils.py`)

Without `dara` installed and available on your `PYTHONPATH`, the main entry point (`probability_of_interpretation.py`) and the refinement metrics module (`refinement_metrics.py`) will not run. Make sure `dara` is installed in the same environment used to run this code.
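As an optional pre-flight check (not part of the framework itself), you can verify that the `dara` modules referenced above are importable; the module list below is taken from the imports named in this section.

```python
# Optional pre-flight check: confirm the dara modules used by the framework
# can be imported in the current environment / PYTHONPATH.
import importlib

required = [
    "dara",
    "dara.structure_db",
    "dara.refine",
    "dara.search.peak_matcher",
    "dara.peak_detection",
    "dara.result",
]
missing = []
for module in required:
    try:
        importlib.import_module(module)
    except ImportError as exc:
        missing.append(f"{module} ({exc})")

if missing:
    raise SystemExit("dara is not fully available:\n  " + "\n  ".join(missing))
print("dara modules found.")
```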
- CBORG Gateway: API access to GPT-4o (https://api.cborg.lbl.gov)
- ICSD Database: Crystal structure database access (via `dara`)
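A hypothetical connectivity sketch for the gateway using the `openai` client, assuming the CBORG endpoint is OpenAI-compatible, `API_KEY` is set as described below, and the model identifier is `gpt-4o`. The actual client setup and prompt handling live in `LLM_evaluation.py` and may differ.

```python
# Hypothetical sketch only: base URL from this README, API_KEY env var, assumed
# model name. The real prompt template is src/prompts/llm_prompt_template.txt.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["API_KEY"],          # set via `export API_KEY=...`
    base_url="https://api.cborg.lbl.gov",   # CBORG gateway
)

response = client.chat.completions.create(
    model="gpt-4o",  # assumed identifier; check what the gateway exposes
    messages=[{"role": "user", "content": "Reply with OK if you can read this."}],
)
print(response.choices[0].message.content)
```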
- Python environment: Use a Python environment where you can install the dependencies (Python 3.8+ recommended).
- Install dependencies: Install `pymatgen`, `openai`, `pandas`, `numpy`, `matplotlib`, and `plotly` (e.g. with `pip` or your project's environment manager).
- Install and expose `dara`: The `dara` package must be available when running the code. Install it from your usual source (internal repo, PyPI, etc.) and ensure it is on `PYTHONPATH` or installed into the same environment.
- Run from the project root: Execute the main script from the repository root so that imports like `from utils import ...` and data paths resolve correctly:

  ```bash
  cd /path/to/AIF
  python src/probability_of_interpretation.py
  ```

- Optional – run unit tests: Install pytest and run the tests from the project root (with `src` on `PYTHONPATH`):

  ```bash
  pip install -r requirements-dev.txt
  PYTHONPATH=src python -m pytest tests/ -v
  ```

  Unit tests live in the top-level `tests/` directory and cover pure functions in `utils`, `composition_balance`, `refinement_metrics`, `LLM_evaluation`, and `probability_of_interpretation`.

- API key for LLM: Set your API key for the CBORG gateway (see Environment Setup below).
Before running, ensure you have completed Installation and environment (including the dara package and API key).
- Prepare Data (see the data-preparation sketch after this list):
  - XRD pattern files (`.xy` or `.xrdml` format)
  - Synthesis metadata CSV with columns: `Name`, `Target`, `Precursors`, `Temperature (C)`, `Dwell Duration (h)`, `Furnace`
  - JSON file mapping patterns to chemical systems
- Run Evaluation:

  ```bash
  python src/probability_of_interpretation.py
  ```

  The script will prompt you to:
  - Choose a dataset group (TRI, MINES, ARR, GENOME, etc.)
  - Select all patterns or specific ones
  - Process each pattern and generate interpretations
- View Results:
  - Interpretation probabilities are saved to JSON files
  - Plots are generated showing:
    - Interpretation probabilities
    - Phase probabilities
    - Contribution decomposition
    - Fit quality metrics
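A hypothetical data-preparation sketch. Only the CSV column names and the idea of a pattern-to-chemical-system JSON mapping come from the list above; the file paths, sample values, and JSON schema are illustrative (check the dataset configuration below for the paths actually expected).

```python
# Hypothetical example of preparing the two metadata inputs. Paths, sample
# values, and the JSON schema are illustrative; only the CSV column names
# are taken from the documentation above.
import json
import pandas as pd

synthesis = pd.DataFrame([{
    "Name": "sample_001",
    "Target": "ZrTiO4",
    "Precursors": "ZrO2, TiO2",
    "Temperature (C)": 1200,
    "Dwell Duration (h)": 6,
    "Furnace": "box_furnace",
}])
synthesis.to_csv("data/xrd_data/synthesis_data.csv", index=False)

# Map each XRD pattern file to its chemical system (illustrative schema).
with open("data/xrd_data/difractogram_paths.json", "w") as fh:
    json.dump({"sample_001.xy": "Zr-Ti-O"}, fh, indent=2)
```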
The framework supports multiple datasets configured in `probability_of_interpretation.py`:

```python
DATASETS = {
    "TRI": {
        "csv": "../data/xrd_data/synthesis_data.csv",
        "combos": "../data/xrd_data/difractogram_paths.json",
        "interpretations": "../data/xrd_data/interpretations/interpretations.json"
    }
}
```

Set the API key for LLM access:

```bash
export API_KEY="your-api-key-here"
```

Output metrics:

- RWP: Weighted profile R-factor (lower is better)
- Peak Score: Matches calculated vs observed peaks
- Missing/Extra Peaks: Count of unmatched peaks
- Background Quality: Analysis of background fitting
- Composition Balance Score: How well interpretation matches target (0-1)
- Gas Loss Adjustment: Accounts for volatile species
- Phase Likelihood: Individual phase plausibility (0-1)
- Interpretation Likelihood: Overall interpretation plausibility (0-1)
- Explanations: Detailed reasoning for scores
- Posterior Interpretation Probability
- Phase Probabilities
- Trust Score: Flag for potentially unreliable interpretations
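To make the Composition Balance Score concrete, here is a minimal sketch of one way such a score could be computed with pymatgen, using plain formulas rather than the phase labels used elsewhere in this README. The actual implementation in `composition_balance.py` (including the gas-loss adjustment and phase weighting) differs in detail.

```python
# Minimal sketch of a composition-balance style score in [0, 1], assuming a
# simple L1 distance between atomic fractions and equal phase weighting.
from functools import reduce
from pymatgen.core import Composition

def simple_balance_score(target_formula, phase_formulas):
    """1.0 means the combined phases reproduce the target's atomic fractions."""
    target = Composition(target_formula).fractional_composition
    combined = reduce(lambda a, b: a + b,
                      (Composition(f) for f in phase_formulas)).fractional_composition
    elements = set(target.elements) | set(combined.elements)
    l1 = sum(abs(target.get_atomic_fraction(el) - combined.get_atomic_fraction(el))
             for el in elements)
    return max(0.0, 1.0 - 0.5 * l1)

# Example: a ZrTiO4 target interpreted as ZrTiO4 plus leftover ZrO2
print(simple_balance_score("ZrTiO4", ["ZrTiO4", "ZrO2"]))
```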
This repository contains two main Jupyter notebooks:
- `AIF_example_workflow.ipynb`: An example / display notebook intended for inspecting results and visualizations. It does not re-run the full workflow or make LLM/API calls.
- `AIF_run_workflow.ipynb`: The full workflow notebook used to re-run the analysis pipeline from scratch. It requires proper environment configuration (e.g. setting `API_KEY`) and may take significant time to run.
For each interpretation, the framework provides:
```text
Interpretation: I_1
Phases: ZrTiO4_60, ZrO2_14
LLM Interpretation Likelihood: 93.0%
Composition Balance: 90.0%
Fit quality: 0.85
Normalized_rwp: 75.0%
Normalized_score: 82.0%
Prior Probability: 88.5%
Posterior Probability: 92.3%
```
The framework systematically explores alternative interpretations by:
- Starting with initial phase search results
- Removing individual phases and re-refining
- Testing combinations of excluded phases
- Tracking unique phase combinations to avoid duplicates
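As an illustration of this enumeration only, the sketch below generates unique phase subsets by excluding phases from an initial set. Function and variable names are hypothetical, and `TiO2_1` is a made-up third phase; the framework's actual generation in `refinement_metrics.py` re-runs Rietveld refinement (via `dara`) for each candidate.

```python
# Illustrative enumeration of alternative interpretations by excluding phases.
from itertools import combinations

def candidate_phase_sets(initial_phases, max_excluded=2):
    """Yield unique, non-empty phase subsets with 1..max_excluded phases removed."""
    seen = set()
    for k in range(1, max_excluded + 1):
        for excluded in combinations(initial_phases, k):
            remaining = frozenset(initial_phases) - set(excluded)
            if remaining and remaining not in seen:
                seen.add(remaining)
                yield sorted(remaining)

for phases in candidate_phase_sets(["ZrTiO4_60", "ZrO2_14", "TiO2_1"]):
    print(phases)
```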
Adjust the importance of different factors:

```python
interpretations = calculate_prior_probability(
    interpretations,
    w_llm=0.5,     # LLM weight
    w_bscore=0.7   # Balance score weight
)

interpretations = calculate_fit_quality(
    interpretations,
    w_rwp=1,       # RWP weight
    w_score=0.5    # Peak score weight
)
```

- Requires the `dara` package for XRD refinement and phase search; the code will not run without it.
- Requires API access to the LLM service (CBORG gateway)
If you use this framework in your research, please cite appropriately.
[Specify license if applicable]