## 🌳 Treescope Integration for Interactive Visualizations (#283)
Inseq now integrates `treescope`, Google DeepMind's library for interactive model and tensor visualization. Two new methods leverage this integration:

- `FeatureAttributionOutput.show_granular`: interactive visualization of multidimensional attribution tensors
- `FeatureAttributionSequenceOutput.show_tokens`: interactive token highlights
- The `visualize_attribute_context` method for the `inseq attribute-context` CLI command now produces an interactive output.
## 🔢 Enhanced Aggregation Capabilities (#282, #290)
### `SliceAggregator`
A new `SliceAggregator` (`"slices"`) allows slicing source (for encoder-decoder models) or target (for decoder-only models) tokens from attribution outputs. The `__getitem__` method provides a convenient `[start:stop]` syntax:
```python
import inseq
from inseq.data.aggregator import SliceAggregator

attrib_model = inseq.load_model("gpt2", "attention")

input_prompt = """Instruction: Summarize this article.
Input_text: In a quiet village nestled between rolling hills, an ancient tree whispered secrets to those who listened. One night, a curious child named Elara leaned close and heard tales of hidden treasures beneath the roots. As dawn broke, she unearthed a shimmering box, unlocking a forgotten world of wonder and magic.
Summary:"""
full_output_prompt = input_prompt + " Elara discovers a shimmering box under an ancient tree, unlocking a world of magic."

out = attrib_model.attribute(input_prompt, full_output_prompt)[0]

# These are all equivalent ways to slice only the input text contents
out_sliced = out.aggregate(SliceAggregator, target_spans=(13, 73))
out_sliced = out.aggregate("slices", target_spans=(13, 73))
out_sliced = out[13:73]
```
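Conceptually, the `[start:stop]` syntax just forwards the slice bounds to the aggregation call. A minimal standalone sketch of that pattern (toy class for illustration only, not Inseq's actual internals):

```python
class SliceableOutput:
    """Toy sketch: forwards [start:stop] to an aggregation-style call."""

    def __init__(self, tokens: list) -> None:
        self.tokens = tokens

    def aggregate_slices(self, target_spans: tuple) -> "SliceableOutput":
        # Keep only the tokens inside the requested span
        start, stop = target_spans
        return SliceableOutput(self.tokens[start:stop])

    def __getitem__(self, key: slice) -> "SliceableOutput":
        # out[13:73] becomes aggregate_slices(target_spans=(13, 73))
        return self.aggregate_slices(target_spans=(key.start, key.stop))


out_toy = SliceableOutput(list("abcdefgh"))
print(out_toy[2:5].tokens)  # → ['c', 'd', 'e']
```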
### `StringSplitAggregator`

A new `StringSplitAggregator` (`"split"`) supports complex aggregation procedures based on regex pattern matching:
```python
# Split on newlines. The default split_mode is "single".
out.aggregate("split", split_pattern="\n").aggregate("sum").show(do_aggregation=False)

# Split on whitespace-separated words of length 5
out.aggregate("split", split_pattern=r"\s(\w{5})(?=\s)", split_mode="end")
```
### `PairAggregator` Shortcut

The `__sub__` method now serves as a shortcut for `PairAggregator`, enabling intuitive comparison of attribution outputs:
```python
import inseq

attrib_model = inseq.load_model("gpt2", "saliency")

out_male = attrib_model.attribute(
    "The director went home because",
    "The director went home because he was tired",
    step_scores=["probability"],
)[0]
out_female = attrib_model.attribute(
    "The director went home because",
    "The director went home because she was tired",
    step_scores=["probability"],
)[0]

(out_male - out_female).show()
```
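The shortcut amounts to implementing `__sub__` as a pairwise comparison of scores. A toy sketch of the idea (simplified stand-in class, not Inseq's implementation, which delegates to `PairAggregator`):

```python
from dataclasses import dataclass


@dataclass
class Attribution:
    """Toy stand-in holding one score per token."""

    scores: list

    def __sub__(self, other: "Attribution") -> "Attribution":
        # Pairwise score difference, mirroring the PairAggregator shortcut
        return Attribution([a - b for a, b in zip(self.scores, other.scores)])


diff = Attribution([0.5, 0.2, 0.3]) - Attribution([0.4, 0.4, 0.2])
print([round(s, 2) for s in diff.scores])  # → [0.1, -0.2, 0.1]
```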
## 💾 Memory-Efficient Attribution Saving (#273)

A new `scores_precision` parameter in `FeatureAttributionOutput.save` enables memory-efficient saving in `float16` and `float8` formats:
```python
import inseq

attrib_model = inseq.load_model("gpt2", "attention")
out = attrib_model.attribute("Hello world", generation_kwargs={"max_new_tokens": 100})

# Previous usage, memory-inefficient
out.save("output.json")

# Memory-efficient saving
out.save("output_fp16.json", scores_precision="float16")  # or "float8"

# Scores are automatically converted back to float32 on load
out_loaded = inseq.FeatureAttributionOutput.load("output_fp16.json")
```
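The saving comes from narrower floats (2 bytes per `float16` score vs. 4 per `float32`), at the cost of some precision on the round trip. A standalone illustration using the stdlib `struct` module, which supports half precision via the `"e"` format (there is no stdlib equivalent for `float8`):

```python
import struct

scores = [0.123456789, 0.987654321, 0.5]

fp32 = struct.pack(f"{len(scores)}f", *scores)  # 4 bytes per score
fp16 = struct.pack(f"{len(scores)}e", *scores)  # 2 bytes per score
print(len(fp32), len(fp16))  # → 12 6

# Round-tripping through float16 loses a little precision, analogous to
# saving with scores_precision="float16" and upcasting to float32 on load
restored = struct.unpack(f"{len(scores)}e", fp16)
print(abs(restored[0] - scores[0]) < 1e-3)  # → True
```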
## 🐍 Python 3.13 Support

Added support for Python 3.13. Supported versions are now Python >= 3.10, <= 3.13.
## 🤖 New Model Support
Added configurations for new models:
- `DbrxForCausalLM`
- `OlmoForCausalLM`
- `Phi3ForCausalLM`
- `Qwen2MoeForCausalLM`
- `Gemma2ForCausalLM`
- `OlmoeForCausalLM`
- `GraniteForCausalLM`
- `GraniteMoeForCausalLM`
All new models based on the modular Hugging Face architecture should work out of the box if they follow standard architectures such as `LlamaForCausalLM`.
## 💥 Breaking Changes
- Dropped support for Python 3.9. Current support is Python >= 3.10, <= 3.13 (#283).
## All Merged PRs
### 🚀 Features
- Added `treescope` for interactive model and tensor visualization (#283) @gsarti
- New `treescope`-powered methods `FeatureAttributionOutput.show_granular` and `FeatureAttributionSequenceOutput.show_tokens` for interactive visualization (#283) @gsarti
- Added support for Python 3.13 @gsarti
- Added new models `DbrxForCausalLM`, `OlmoForCausalLM`, `Phi3ForCausalLM`, `Qwen2MoeForCausalLM`, `Gemma2ForCausalLM`, `OlmoeForCausalLM`, `GraniteForCausalLM`, `GraniteMoeForCausalLM` to model config @gsarti
- Add `rescale_attributions` to Inseq CLI commands for `rescale=True` (#280) @gsarti
- Rows and columns in the visualization now have indices alongside tokens (#282) @gsarti
- New parameter `clean_special_chars` in `model.attribute` to automatically clean special characters from output tokens (#289) @gsarti
- Added `scores_precision` to `FeatureAttributionOutput.save` for efficient `float16` and `float8` saving (#273) @gsarti
- New `SliceAggregator` for slicing tokens with `[start:stop]` syntax (#282) @gsarti
- New `StringSplitAggregator` for regex-based aggregation (#290) @gsarti
- `__sub__` method shortcut for `PairAggregator` (#282) @gsarti
### 🔧 Fixes & Refactoring
- Fix an issue in the attention implementation where non-terminal positions were set to `nan` if they were 0s (#269) @gsarti
- Fix the pad token for models where it is not specified by default (e.g. Qwen models) (#269) @gsarti
- Fix `value_zeroing` for SDPA attention, enabling its use on models like `GemmaForCausalLM` without workarounds (#267) @gsarti
- Fix multi-device support and duplicate BOS for chat template models (#280) @gsarti
- Clarified visualization directions using arrows instead of x/y (#282) @gsarti
- Fix support for multi-EOS tokens (e.g. LLaMA 3.2) (#287) @gsarti
- Fix copying of configuration parameters to aggregated `FeatureAttributionSequenceOutput` objects (#292) @gsarti
### 📝 Documentation and Tutorials
- Updated tutorial with `treescope` usage examples @gsarti
- New tutorial on advanced analyses of RAG and reasoning models, available in `reasoning_rag_attribution.ipynb`