Nature's Insight: A Novel Framework and Comprehensive Analysis of Agentic Reasoning Through the Lens of Neuroscience
- [2025.5.7]: 🔥🔥 We released our survey Nature's Insight: A Novel Framework and Comprehensive Analysis of Agentic Reasoning Through the Lens of Neuroscience! This is the first survey to systematically explore agentic reasoning from a neuroscience perspective, introducing a comprehensive framework that spans from sensory input to motor execution. It bridges AI reasoning and brain mechanisms, offering a foundation for future agent studies.
The proposed neuroscience-inspired framework for agentic reasoning. The left panel illustrates the human brain’s reasoning process, where sensory inputs are processed through modality-specific cortices and integrated in higher association areas such as the parietal and prefrontal cortices. This enables abstract reasoning and decision-making, supported by predictive coding mechanisms and memory retrieval from the hippocampus. Inspired by this cognitive flow, the right panel presents a corresponding architecture for AI agents, consisting of sensory input, multi-level information processing, foundational understanding (via foundation models), factual memory storage (knowledge base), and a centralized reasoning module for adaptive and context-aware decision-making. White arrows denote top-down predictive signals based on predictive coding; black arrows represent the forward reasoning process; and dashed lines indicate the conceptual mapping between human brain functions and agent modules.
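To make the brain-to-agent mapping concrete, the sketch below shows one hypothetical way the agent-side modules described above (modality-specific sensory encoding, factual memory retrieval from a knowledge base, a centralized reasoning module, and a top-down predictive signal) could be wired together in Python. All class and method names are illustrative assumptions, not code from the survey.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Factual memory storage, analogous to hippocampal memory retrieval."""
    facts: dict = field(default_factory=dict)

    def retrieve(self, query: str) -> str:
        """Return a stored fact for the query, or a default placeholder."""
        return self.facts.get(query, "no stored fact")


class Agent:
    """Sensory input -> multi-level processing -> reasoning -> decision."""

    def __init__(self, knowledge: KnowledgeBase):
        self.knowledge = knowledge
        self.prediction = None  # top-down predictive signal (predictive-coding analogue)

    def encode(self, observation: dict) -> str:
        # Modality-specific processing: each sensory channel is encoded separately,
        # then integrated into one representation (association-cortex analogue).
        return " | ".join(f"{modality}: {signal}" for modality, signal in observation.items())

    def reason(self, observation: dict, goal: str) -> str:
        percept = self.encode(observation)       # sensory input + information processing
        fact = self.knowledge.retrieve(goal)     # factual memory retrieval
        # Centralized reasoning module: combine percept, retrieved memory, and goal.
        decision = f"act({goal}) given [{percept}] and [{fact}]"
        self.prediction = percept                # feed the expectation back (top-down arrow)
        return decision


if __name__ == "__main__":
    kb = KnowledgeBase(facts={"find_exit": "exits are near lit corridors"})
    agent = Agent(kb)
    print(agent.reason({"vision": "lit corridor ahead", "audio": "quiet"}, "find_exit"))
```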
The overview of the reasoning process and classification of reasoning behavior from a neuro-perspective. This diagram presents a comprehensive framework of reasoning inspired by human cognitive and neural mechanisms. At the center, a hierarchical reasoning pipeline, spanning sensory input, information processing, higher-order cognition, and conclusion generation, mirrors the flow of information in biological systems. Surrounding this core are five major categories of reasoning behaviors: perceptual reasoning, driven by multisensory integration; dimensional reasoning, encompassing spatial and temporal inference; relation reasoning, involving analogical thinking and relational matching; logical reasoning, covering inductive, deductive, and abductive logic; and interactive reasoning, focusing on agent-agent and agent-human collaboration within dynamic environments. Together, these components establish a neuro-cognitively grounded taxonomy that bridges biological inspiration and computational implementation in artificial intelligence systems.
- Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey (arXiv 2025)
- Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models (arXiv 2025)
- From System 1 to System 2: A Survey of Reasoning Large Language Models (arXiv 2025)
- Logical Reasoning in Large Language Models: A Survey (arXiv 2025)
- Towards reasoning era: A survey of long chain-of-thought for reasoning large language models (arXiv 2025)
- Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models (arXiv 2025)
- A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond (arXiv 2025)
- A Survey of Reasoning with Foundation Models (arXiv 2023)
- Awesome-Neuroscience-Agent-Reasoning
- 📢 News
- Nature's Insight: A Novel Framework and Comprehensive Analysis of Agentic Reasoning Through the Lens of Neuroscience
- Latest Reasoning Surveys
- Agent Reasoning Framework
  - Perception-based Reasoning
    - Part 1: Visual Reasoning
    - Part 2: Lingual Reasoning
    - Part 3: Auditory Reasoning
    - Part 4: Tactile Reasoning
  - Dimension-based Reasoning
    - Part 5: Spatial Reasoning
    - Part 6: Temporal Reasoning
  - Logic-based Reasoning
    - Part 7: Inductive Reasoning
    - Part 8: Deductive Reasoning
    - Part 9: Abductive Reasoning
  - Interaction-based Reasoning
    - Part 10: Reasoning based on Agent-Agent Interaction
    - Part 11: Reasoning based on Agent-Human Interaction
- Benchmark
Taxonomy of Agentic Reasoning Techniques Inspired by Neuroscience. This hierarchical structure organizes reasoning methods in artificial agents according to neuroscience-inspired cognitive mechanisms, including dimensional, perceptual, logical, and interactive reasoning, and emphasizes the integration of biologically plausible mechanisms into artificial intelligence systems. The taxonomy illustrates how agents can emulate human-like reasoning across diverse tasks and environments.
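As a rough illustration of this taxonomy, the sketch below encodes the four top-level categories and their sub-branches, as listed in the table of contents above, as a nested Python dictionary. The structure and names are an assumed encoding for illustration only, not a data structure from the survey.

```python
# Illustrative encoding of the taxonomy's top-level categories and sub-branches.
AGENTIC_REASONING_TAXONOMY = {
    "perception_based": ["visual", "lingual", "auditory", "tactile"],
    "dimension_based": ["spatial", "temporal"],
    "logic_based": ["inductive", "deductive", "abductive"],
    "interaction_based": ["agent-agent", "agent-human"],
}


def subcategories(category: str) -> list:
    """Return the sub-branches of one reasoning category, or an empty list."""
    return AGENTIC_REASONING_TAXONOMY.get(category, [])


if __name__ == "__main__":
    for category, branches in AGENTIC_REASONING_TAXONOMY.items():
        print(f"{category}: {', '.join(branches)}")
```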
- GeReA: Question-Aware Prompt Captions for Knowledge-based Visual Question Answering (arXiv 2024)
- Lisa: Reasoning segmentation via large language model (CVPR 2024)
- KN-VLM: KNowledge-guided Vision-and-Language Model for visual abductive reasoning (Research Square 2025)
- Q&A Prompts: Discovering Rich Visual Clues through Mining Question-Answer Prompts for VQA requiring Diverse World Knowledge (ECCV 2024)
- Large language models are visual reasoning coordinators (NeurIPS 2023)
- Enhancing LLM Reasoning via Vision-Augmented Prompting (NeurIPS 2024)
- Improving zero-shot visual question answering via large language models with reasoning question prompts (ACM 2023)
- Visual Chain-of-Thought Prompting for Knowledge-Based Visual Reasoning (AAAI 2024)
- Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning (NeurIPS 2024)
- Visual chain of thought: bridging logical gaps with multimodal infillings (arXiv 2023)
- End-to-End Chart Summarization via Visual Chain-of-Thought in Vision-Language Models (arXiv 2025)
- Llava-o1: Let vision language models reason step-by-step (arXiv 2024)
- ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning (COLM 2024)
- Visual programming: Compositional visual reasoning without training (CVPR 2023)
- Vipergpt: Visual inference via python execution for reasoning (CVPR 2023)
- HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning (ECCV 2024)
- Vision-r1: Incentivizing reasoning capability in multimodal large language models (arXiv 2025)
- Visual-rft: Visual reinforcement fine-tuning (arXiv 2025)
- Medvlm-r1: Incentivizing medical reasoning capability of vision-language models (vlms) via reinforcement learning (arXiv 2025)
- VLM-RL: A Unified Vision Language Models and Reinforcement Learning Framework for Safe Autonomous Driving (arXiv 2024)
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (NeurIPS 2022)
- Self-Consistency Improves Chain of Thought Reasoning in Language Models (ICLR 2023)
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models (NeurIPS 2023)
- Graph of Thoughts: Solving Elaborate Problems with Large Language Models (AAAI 2024)
- Automatic Prompt Augmentation and Selection with Chain-of-Thought from Labeled Data (EMNLP 2023)
- Active Prompting with Chain-of-Thought for Large Language Models (ACL 2023)
- Large Language Models Are Reasoning Teachers (ACL 2023)
- Chain of Code: Reasoning with a Language Model-Augmented Code Emulator (ICML 2024)
- Abstraction-of-Thought Makes Language Models Better Reasoners (EMNLP 2024)
- Enhancing Zero-Shot Chain-of-Thought Reasoning in Large Language Models through Logic (COLING 2024)
- Path-of-Thoughts: Extracting and Following Paths for Robust Relational Reasoning with Large Language Models (arXiv 2024)
- Stepwise Self-Consistent Mathematical Reasoning with Large Language Models (arXiv 2024)
- Chain-of-Thought Reasoning Without Prompting (arXiv 2024)
- Interleaved-Modal Chain-of-Thought (CVPR 2025)
- CoAT: Chain-of-Associated-Thoughts Framework for Enhancing Large Language Models Reasoning (arXiv 2025)
- Chain of Draft: Thinking Faster by Writing Less (arXiv 2025)
- Making Large Language Models Better Reasoners with Step-Aware Verifier (arXiv 2023)
- Large Language Models Cannot Self-Correct Reasoning Yet (ICLR 2024)
- Free Process Rewards without Process Labels (arXiv 2024)
- Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations (arXiv 2024)
- Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents (arXiv 2024)
- Self-playing Adversarial Language Game Enhances LLM Reasoning (NeurIPS 2024)
- Does RLHF Scale? Exploring the Impacts From Data, Model, and Method (arXiv 2024)
- OVM, Outcome-supervised Value Models for Planning in Mathematical Reasoning (NAACL 2024)
- Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs (arXiv 2024)
- AutoPSV: Automated Process-Supervised Verifier (arXiv 2024)
- ReST-MCTS: LLM Self-Training via Process Reward Guided Tree Search (arXiv 2024)
- Improve Mathematical Reasoning in Language Models by Automated Process Supervision (arXiv 2024)
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (arXiv 2025)
- Reasoning with Reinforced Functional Token Tuning (arXiv 2025)
- Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning (arXiv 2025)
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling (arXiv 2025)
- Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search (arXiv 2025)
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning (arXiv 2025)
- QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search (arXiv 2025)
- DHP: Discrete Hierarchical Planning for Hierarchical Reinforcement Learning Agents (arXiv 2025)
- DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails (arXiv 2025)
- On the Emergence of Thinking in LLMs I: Searching for the Right Intuition (arXiv 2025)
- Kimi k1.5: Scaling Reinforcement Learning with LLMs (arXiv 2025)
- Joint audio and speech understanding (IEEE ASRU 2023)
- Listen, think, and understand (ICLR 2024)
- Toward Explainable Physical Audiovisual Commonsense Reasoning (ACMMM 2024)
- BAT: Learning to Reason about Spatial Sounds with Large Language Models (ICML 2024)
- GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities (arXiv 2024)
- What do MLLMs hear? Examining reasoning with text and sound components in Multimodal Large Language Models (arXiv 2024)
- Disentangled counterfactual learning for physical audiovisual commonsense reasoning (NeurIPS 2024)
- Learning Audio Concepts from Counterfactual Natural Language (ICASSP 2024)
- Beyond Sight: Finetuning Generalist Robot Policies with Heterogeneous Sensors via Language Grounding (arXiv 2025)
- Octopi: Object Property Reasoning with Large Tactile-Language Models (arXiv 2024)
- TALON: Improving Large Language Model Cognition with Tactility-Vision Fusion (ICIEA 2024)
- Vision-language model-based physical reasoning for robot liquid perception (IROS 2024)
- Visual Spatial Reasoning (TACL 2023)
- SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities (CVPR 2024)
- Large Language Models are Visual Reasoning Coordinators (NeurIPS 2023)
- Is a Picture Worth a Thousand Words? Delving into Spatial Reasoning for Vision-Language Models (NeurIPS 2024)
- Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs (CVPR 2024)
- Advancing Spatial Reasoning in Large Language Models: An In-Depth Evaluation and Enhancement Using the StepGame Benchmark (AAAI 2024)
- SpatialPIN: Enhancing Spatial Reasoning Capabilities of Vision-Language Models through Prompting and Interacting 3D Priors (NeurIPS 2024)
- SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models (NeurIPS 2024)
- Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segmentation (AAAI 2025)
- Metric Reasoning in Large Language Models (ACM GIS 2024)
- Weakly-supervised 3D Spatial Reasoning for Text-based Visual Question Answering (IEEE TIP 2023)
- Towards Grounded Visual Spatial Reasoning in Multi-Modal Vision Language Models (DMLR @ICLR 2024)
- StarCraftImage: A Dataset for Prototyping Spatial Reasoning Methods for Multi-Agent Environments (CVPR 2023)
- A Spatial Hierarchical Reasoning Network for Remote Sensing Visual Question Answering (IEEE 2023)
- Structured Spatial Reasoning with Open Vocabulary Object Detectors (arXiv 2024)
- A Pilot Evaluation of ChatGPT and DALL-E 2 on Decision Making and Spatial Reasoning (arXiv 2023)
- SpatialCoT: Advancing Spatial Reasoning through Coordinate Alignment and Chain-of-Thought for Embodied Task Planning (arXiv 2025)
- Dialectical Language Model Evaluation: An Initial Appraisal of the Commonsense Spatial Reasoning Abilities of LLMs (arXiv 2023)
- Reframing Spatial Reasoning Evaluation in Language Models: A Real-World Simulation Benchmark for Qualitative Reasoning (IJCAI 2024)
- What's "Up" with Vision-Language Models? Investigating Their Struggle with Spatial Reasoning (EMNLP 2023)
- Reasoning Paths with Reference Objects Elicit Quantitative Spatial Reasoning in Large Vision-Language Models (arXiv 2024)
- Chain-of-Symbol Prompting For Spatial Reasoning in Large Language Models (COLM 2024)
- GRASP: A Grid-Based Benchmark for Evaluating Commonsense Spatial Reasoning (arXiv 2024)
- Graph-Based Spatial Reasoning for Tracking Landmarks in Dynamic Laparoscopic Environments (IEEE RA-L)
- TopV-Nav: Unlocking the Top-View Spatial Reasoning Potential of MLLM for Zero-shot Object Navigation (arXiv 2024)
- End-to-End Navigation with Vision Language Models: Transforming Spatial Reasoning into Question-Answering (arXiv 2024)
- I Know About "Up"! Enhancing Spatial Reasoning in Visual Language Models Through 3D Reconstruction (arXiv 2024)
- Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning (arXiv 2024)
- Text-to-ECG: 12-Lead Electrocardiogram Synthesis Conditioned on Clinical Text Reports (ICASSP 2023)
- Can Brain Signals Reveal Inner Alignment with Human Languages (EMNLP 2023 Findings)
- TempoGPT: Enhancing Temporal Reasoning via Quantizing Embedding (arXiv 2025)
- PromptCast: A New Prompt-based Learning Paradigm for Time Series Forecasting (IEEE TKDE 2023)
- Large Language Models Can Learn Temporal Reasoning (ACL 2024)
- Back to the future: Towards explainable temporal reasoning with large language models (WWW 2024)
- Enhancing Temporal Sensitivity and Reasoning for Time-Sensitive Question Answering (EMNLP 2024 Findings)
- Temporal Reasoning Transfer from Text to Video (ICLR 2025)
- Timo: Towards Better Temporal Reasoning for Language Models (COLM 2024)
- Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning (ICML 2024)
- Getting Sick After Seeing a Doctor? Diagnosing and Mitigating Knowledge Conflicts in Event Temporal Reasoning (NAACL 2024 Findings)
- Temporal reasoning for timeline summarisation in social media (arXiv 2024)
- Video LLMs for Temporal Reasoning in Long Videos (arXiv 2024)
- Enhancing temporal knowledge graph forecasting with large language models via chain-of-history reasoning (ACL 2024 Findings)
- Know-Evolve: Deep Temporal Reasoning for Dynamic Knowledge Graphs (ICML 2017)
- Event Graph Guided Compositional Spatial-Temporal Reasoning for Video Question Answering (IEEE TIP 2024)
- Temporal knowledge graph reasoning with historical contrastive learning (AAAI 2023)
- Temporal inductive path neural network for temporal knowledge graph reasoning (Artificial Intelligence 2024)
- Large language models-guided dynamic adaptation for temporal knowledge graph reasoning (NeurIPS 2024)
- An improving reasoning network for complex question answering over temporal knowledge graphs (Applied Intelligence 2023)
- Once Upon a Time in Graph: Relative-Time Pretraining for Complex Temporal Reasoning (EMNLP 2023)
- Timegraphs: Graph-based temporal reasoning (arXiv 2024)
- Search from History and Reason for Future: Two-stage Reasoning on Temporal Knowledge Graphs (ACL 2021)
- Temporal knowledge graph reasoning based on evolutional representation learning (SIGIR 2021)
- TempoQR: Temporal Question Reasoning over Knowledge Graphs (AAAI 2022)
- Learning to Sample and Aggregate: Few-shot Reasoning over Temporal Knowledge Graphs (NeurIPS 2022)
- THCN: A Hawkes Process Based Temporal Causal Convolutional Network for Extrapolation Reasoning in Temporal Knowledge Graphs (TKDE 2024)
- Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution (CVPR 2021)
- Teilp: Time prediction over knowledge graphs via logical reasoning (AAAI 2024)
- Self-Supervised Logic Induction for Explainable Fuzzy Temporal Commonsense Reasoning (AAAI 2023)
- The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision (ICLR 2019)
- Deeplogic: Joint learning of neural perception and logical reasoning (TPAMI 2022)
- A survey on neural-symbolic learning systems (Neural Networks)
- Logic-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning (EMNLP 2023 Findings)
- LogicAsker: Evaluating and Improving the Logical Reasoning Ability of Large Language Models (EMNLP 2024)
- Faithful Logical Reasoning via Symbolic Chain-of-Thought (ACL 2024)
- Generalization on the Unseen, Logic Reasoning and Degree Curriculum (JMLR 2024)
- LINC: A Neurosymbolic Approach for Logical Reasoning by Combining Language Models with First-Order Logic Provers (EMNLP 2023)
- Complex Logical Reasoning over Knowledge Graphs using Large Language Models (arXiv 2023)
- Improved Logical Reasoning of Language Models via Differentiable Symbolic Programming (ACL 2023 Findings)
- GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models (ICLR 2025)
- Premise Order Matters in Reasoning with Large Language Models (ICML 2024)
- Inductive reasoning in humans and large language models (Cognitive Systems Research 2024)
- Hypothesis Search: Inductive Reasoning with Language Models (ICLR 2024)
- Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement (ICLR 2024)
- Inductive or Deductive? Rethinking the Fundamental Reasoning Abilities of LLMs (arXiv 2024)
- Incorporating Context Graph with Logical Reasoning for Inductive Relation Prediction (SIGIR 2022)
- Audio Entailment: Assessing Deductive Reasoning for Audio Understanding (AAAI 2025)
- Deductive Verification of Chain-of-Thought Reasoning (NeurIPS 2023)
- Testing the General Deductive Reasoning Capacity of Large Language Models Using OOD Examples (NeurIPS 2023)
- Certified Deductive Reasoning with Language Models (TMLR 2024)
- How Far Are We from Intelligent Visual Deductive Reasoning? (COLM 2024)
- Learning deductive reasoning from synthetic corpus based on formal logic (ICML 2023)
- Strategic deductive reasoning in large language models: A dual-agent approach (ICPICS 2024)
- Multi-Step Deductive Reasoning Over Natural Language: An Empirical Study on Out-of-Distribution Generalisation (IJCLR-NeSy 2022)
- Multi-modal action chain abductive reasoning (ACL 2023)
- Visual Abductive Reasoning (CVPR 2022)
- Language models can improve event prediction by few-shot abductive reasoning (NeurIPS 2023)
- Abductive Reasoning in Logical Credal Networks (NeurIPS 2024)
- Towards Learning Abductive Reasoning Using VSA Distributed Representations (NeSy 2024)
- Dera: enhancing large language model completions with dialog-enabled resolving agents (arXiv 2024)
- Roco: Dialectic multi-robot collaboration with large language models (ICRA 2024)
- Chateval: Towards better llm-based evaluators through multi-agent debate (arXiv 2023)
- Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate (EMNLP 2024)
- CaPo: Cooperative Plan Optimization for Efficient Embodied Multi-Agent Cooperation (ICLR 2025)
- Building cooperative embodied agents modularly with large language models (ICLR 2024)
- A virtual conversational agent for teens with autism spectrum disorder: Experimental results and design lessons (ACM 2020)
- Peer: A collaborative language model (arXiv 2022)
- SAPIEN: Affective Virtual Agents Powered by Large Language Models (ACIIW 2023)
- Human-level play in the game of Diplomacy by combining language models with strategic reasoning (Science 2022)
- Language grounded multi-agent reinforcement learning with human-interpretable communication (NeurIPS 2024)
- Vqa: Visual question answering (ICCV 2015)
- Making the v in vqa matter: Elevating the role of image understanding in visual question answering (CVPR 2017)
- Gqa: A new dataset for real-world visual reasoning and compositional question answering (CVPR 2019)
- Roses Are Red, Violets Are Blue... but Should VQA Expect Them To? (CVPR 2021)
- A corpus for reasoning about natural language grounded in photographs (arXiv 2018)
- Super-clevr: A virtual benchmark to diagnose domain robustness in visual reasoning (CVPR 2023)
- Ok-vqa: A visual question answering benchmark requiring external knowledge (CVPR 2019)
- A-okvqa: A benchmark for visual question answering using world knowledge (ECCV 2022)
- Clevr: A diagnostic dataset for compositional language and elementary visual reasoning (CVPR 2017)
- Mr-ben: A meta-reasoning benchmark for evaluating system-2 thinking in llms (arXiv 2024)
- RM-bench: Benchmarking reward models of language models with subtlety and style (ICLR 2025)
- LR2Bench: Evaluating Long-chain Reflective Reasoning Capabilities of Large Language Models via Constraint Satisfaction Problems (arXiv 2025)
- Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models (arXiv 2025)
- LongReason: A Synthetic Long-Context Reasoning Benchmark via Context Expansion (arXiv 2025)
- Big-bench extra hard (arXiv 2025)
- Researchbench: Benchmarking llms in scientific discovery via inspiration-based task decomposition (arXiv 2025)
- MastermindEval: A Simple But Scalable Reasoning Benchmark (arXiv 2025)
- Z1: Efficient Test-time Scaling with Code (arXiv 2025)
- Audiocaps: Generating captions for audios in the wild (NAACL 2019)
- Clotho: An audio captioning dataset (ICASSP 2020)
- Transferable tactile transformers for representation learning across diverse sensors and tasks (arXiv 2024)
- Touch100k: A large-scale touch-language-vision dataset for touch-centric multimodal representation (arXiv 2024)
- Anytouch: Learning unified static-dynamic representation across multiple visuo-tactile sensors (arXiv 2025)
- Beyond sight: Finetuning generalist robot policies with heterogeneous sensors via language grounding (arXiv 2025)
- Raven: A dataset for relational and analogical visual reasoning (CVPR 2019)
- Grit: General robust image task benchmark (NeurIPS 2022)
- CoDraw: Collaborative drawing as a testbed for grounded goal-driven communication (ACL 2019)
- Touchdown: Natural language navigation and spatial reasoning in visual street environments (CVPR 2019)
- Vision-and-language navigation: Interpreting visually-grounded navigation instructions in real environments (CVPR 2018)
- Spatialsense: An adversarially crowdsourced benchmark for spatial relation recognition (ICCV 2019)
- AGQA: A Benchmark for Compositional Spatio-Temporal Reasoning (CVPR 2021)
- Timebench: A comprehensive evaluation of temporal reasoning abilities in large language models (ACL 2024)
- TRAM: Benchmarking Temporal Reasoning for Large Language Models (ACL 2024 Findings)
- Towards benchmarking and improving the temporal reasoning capability of large language models (ACL 2023)
- MenatQA: A New Dataset for Testing the Temporal Comprehension and Reasoning Abilities of Large Language Models (EMNLP 2023 Findings)
- Vinoground: Scrutinizing LMMs over Dense Temporal Reasoning with Short Videos (arXiv 2024)
- Generic Temporal Reasoning with Differential Analysis and Explanation (ACL 2023)
- V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning (arXiv 2025)
- MusTQ: A Temporal Knowledge Graph Question Answering Dataset for Multi-Step Temporal Reasoning (ACL 2024 Findings)
- CLUTRR: A Diagnostic Benchmark for Inductive Reasoning from Text (EMNLP 2019)
- ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning (ICLR 2020)
- Evaluating the Logical Reasoning Ability of ChatGPT and GPT-4 (arXiv 2023)
- ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning (ACL 2022 Findings)
- Logiqa 2.0—an improved dataset for logical reasoning in natural language understanding (TASLP 2023)
- The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning (ECCV 2022)
- True Detective: A Deep Abductive Reasoning Benchmark Undoable for GPT-3 and Challenging for GPT-4 (arXiv 2022)
- From LSAT: The Progress and Challenges of Complex Reasoning (TASLP 2021)
- Training Verifiers to Solve Math Word Problems (arXiv 2021)
- LINGOLY: A Benchmark of Olympiad-Level Linguistic Reasoning Puzzles in Low-Resource and Extinct Languages (NeurIPS 2024)
- LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models (ACL 2024)
- FOLIO: Natural Language Reasoning with First-Order Logic (EMNLP 2024)
- Diagnosing the First-Order Logical Reasoning Ability Through LogicNLI (EMNLP 2021)
- Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical Reasoning Capabilities of Language Models (arXiv 2023)