
    Repositories list

    • SemEval-2026-Task13

      Public
      Jupyter Notebook
      163300 · Updated Feb 19, 2026
    • ImageCLEF-MultimodalReasoning

      Public
      multimodal reasoning shared task
      HTML
      1300 · Updated Feb 13, 2026
    • PAN CLEF 2026 Shared Task: Reasoning Trajectory Detection
      0000 · Updated Feb 12, 2026
    • 0000 · Updated Feb 9, 2026
    • 1600 · Updated Feb 4, 2026
    • FAID

      Public
      Fine-grained AI-generated Text Detection using Multi-task Auxiliary and Multi-level Contrastive Learning.
      Python
      2000 · Updated Jan 18, 2026
    • CASA

      Public
      Clinical Annotations for Stuttering Assessment
      Python
      1200 · Updated Jan 15, 2026
    • finchain

      Public
      A symbolic benchmark for verifiable chain-of-thought financial reasoning. Includes executable templates, 58 topics across 12 domains, and ChainEval metrics.
      Python
      42521 · Updated Dec 26, 2025
    • SAHM

      Public
      Python
      0100 · Updated Dec 1, 2025
    • Jupyter Notebook
      0400 · Updated Nov 1, 2025
    • Official Repository for paper "When Personalization Tricks Detectors: The Feature-Inversion Trap in Machine-Generated Text Detection"
      Python
      0500 · Updated Oct 15, 2025
    • This repository contains the code, dataset, and resources for our ACL 2025 paper: "Profiling News Media for Factuality and Bias Using LLMs and the Fact-Checking…
      Jupyter Notebook
      0900 · Updated Oct 13, 2025
    • JavaScript
      0000 · Updated Oct 7, 2025
    • Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models
      Python
      33010 · Updated Oct 6, 2025
    • Python
      1100 · Updated Oct 5, 2025
    • A benchmark and evaluation framework for assessing the safety of language models in Kazakh and Russian.
      Jupyter Notebook
      1100 · Updated Sep 28, 2025
    • qraft

      Public
      Python
      0100 · Updated Sep 17, 2025
    • SPECS

      Public
      SPECS: Specificity-Enhanced CLIP-Score for Long Image Caption Evaluation (Accepted by EMNLP 2025 Main)
      Python
      1811 · Updated Sep 2, 2025
    • Repo for ARWI_generate_data (Arabic Read, Write and Improve)
      Python
      0000 · Updated Aug 18, 2025
    • An Open-source Factuality Evaluation Demo for LLMs
      Python
      32310 · Updated Aug 10, 2025
    • Python
      0400 · Updated Jul 30, 2025
    • ArTST

      Public
      Python
      86510 · Updated Jul 10, 2025
    • Python
      0000 · Updated Jun 12, 2025
    • A benchmark and evaluation framework for Arabic LLM safeguards
      Jupyter Notebook
      2500 · Updated Jun 11, 2025
    • fire

      Public
      A lightweight, agent-style framework for fact-checking atomic claims using iterative retrieval and verification. Reduces LLM and search cost while maintaining s…
      Python
      31400 · Updated Jun 4, 2025
    • An Agentic Fact-Checking Framework for Urdu with Evidence Boosting and Benchmarking
      Jupyter Notebook
      0200 · Updated May 30, 2025
    • Python
      1400 · Updated May 29, 2025
    • JavaScript
      0000 · Updated May 28, 2025
    • Multilingual Statement Tuning
      Jupyter Notebook
      0200 · Updated May 28, 2025
    • 1400 · Updated May 26, 2025