
ricardoyang00/NPEX


NPEX

Python 3.8+ · TypeScript 5.0+ · Next.js 14+ · Apache Solr 9 · MongoDB

NASA Project Exploration & eXtraction (NPEX) lets you search and explore thousands of NASA's groundbreaking scientific and technological projects. It is powered by hybrid search, which combines traditional keyword matching with AI-powered semantic understanding.


Search Engine

The core feature is a web interface for searching and exploring NASA projects. It provides:

  • Smart Search: Hybrid retrieval combining keyword matching and semantic similarity
  • Embedding Options: Choose between OpenAI (3072-dimensional) and MiniLM (384-dimensional) embeddings
  • Multiple Search Modes: Keyword-only, enhanced keyword, semantic, or hybrid ranking
  • Rich Metadata: Projects include taxonomy classification, facilities, partners, and technology readiness levels
  • Responsive Design: Built with Next.js and React for fast, intuitive browsing
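As a rough illustration of how the keyword and semantic modes might map onto Solr requests, the sketch below builds request parameters for a plain edismax keyword query and for a Solr 9 `{!knn}` dense-vector query, checking the vector length against the embedding dimensions listed above. The field names (`title`, `description`, `openai_vector`, `minilm_vector`), the boosts, and the model keys are hypothetical; NPEX's actual schemas and its "enhanced keyword" mode are not described here.

```python
from typing import Dict, List

# Dimensions as stated in the README; the dict keys are hypothetical model names.
EMBEDDING_DIMS = {"openai": 3072, "minilm": 384}


def keyword_params(text: str, rows: int = 10) -> Dict[str, str]:
    """Keyword-only query using Solr's edismax parser (fields/boosts are assumptions)."""
    return {
        "q": text,
        "defType": "edismax",
        "qf": "title^2 description",  # hypothetical fields: titles weighted 2x
        "rows": str(rows),
    }


def semantic_params(vector: List[float], model: str, top_k: int = 10) -> Dict[str, str]:
    """Dense-vector query using Solr 9's knn query parser."""
    expected = EMBEDDING_DIMS[model]
    if len(vector) != expected:
        raise ValueError(f"{model} vectors must have {expected} dims, got {len(vector)}")
    field = f"{model}_vector"  # hypothetical DenseVectorField name, one per model
    vec = "[" + ", ".join(str(x) for x in vector) + "]"
    # Produces e.g. {!knn f=minilm_vector topK=10}[0.1, 0.2, ...]
    return {"q": f"{{!knn f={field} topK={top_k}}}{vec}", "rows": str(top_k)}
```

The resulting dict could be sent as query parameters to a collection's `/select` handler. The query vector has to come from the same model that encoded the indexed documents, which is why each model gets its own vector field and dimension check.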

Getting Started

Start all services with Docker Compose:

docker-compose up -d

Access the application at http://localhost:3000
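Containers can take a while to become reachable after `docker-compose up -d` returns. A minimal readiness check, assuming only that the frontend answers plain HTTP on port 3000, could poll the URL with Python's standard library; the function name and the timeout values are illustrative choices, not part of the project.

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(url: str = "http://localhost:3000",
                     timeout: float = 60.0,
                     interval: float = 2.0) -> bool:
    """Poll `url` until it returns a 2xx/3xx status, or give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if 200 <= resp.status < 400:
                    return True
        except (urllib.error.URLError, OSError):
            # Connection refused, socket timeout, or an HTTP error status:
            # the service is not ready yet, so keep polling.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_until_ready()` right after starting the stack returns `True` once the frontend responds successfully, or `False` if it is still unreachable after 60 seconds.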

For detailed configuration, development workflows, and troubleshooting, see docker-info.md.

Data Pipeline

Behind the scenes, a complete ETL pipeline processes NASA's research data:

  • Extract: Scrapes project data from NASA TechPort into JSON
  • Transform: Enriches data by merging with facilities and taxonomies
  • Load: Generates search-ready documents indexed in Apache Solr
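The Transform step can be pictured as a join of each scraped project against facility and taxonomy lookups. The sketch below assumes hypothetical field names (`id`, `facility_ids`, `taxonomy_codes`) and in-memory lists; the real pipeline's data model and how it uses MongoDB are not shown in this README.

```python
from typing import Any, Dict, List


def enrich_projects(projects: List[Dict[str, Any]],
                    facilities: List[Dict[str, Any]],
                    taxonomy: Dict[str, str]) -> List[Dict[str, Any]]:
    """Attach facility records and taxonomy labels to each project (hypothetical schema)."""
    fac_by_id = {f["id"]: f for f in facilities}  # index facilities for O(1) lookup
    enriched = []
    for project in projects:
        doc = dict(project)  # shallow copy so the raw extracted record stays unchanged
        doc["facilities"] = [
            fac_by_id[fid] for fid in project.get("facility_ids", []) if fid in fac_by_id
        ]
        # Fall back to the raw code when a taxonomy entry is missing.
        doc["taxonomy_labels"] = [
            taxonomy.get(code, code) for code in project.get("taxonomy_codes", [])
        ]
        enriched.append(doc)
    return enriched
```

Unknown taxonomy codes are kept as raw codes rather than dropped, so a gap in the lookup table does not silently remove information from the indexed document.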

Search Technology

  • Apache Solr: Full-text and vector similarity search engine
  • Hybrid Ranking: Reciprocal Rank Fusion (RRF) combines keyword and semantic results
  • Multiple Configurations: Different schemas support various embedding models
  • MongoDB: Stores enriched project metadata and organizational context
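Reciprocal Rank Fusion scores each document by summing 1 / (k + rank) over every result list it appears in, so items ranked highly by both keyword and semantic search rise to the top without having to normalize the two engines' raw scores. Below is a minimal Python sketch; k = 60 is the constant commonly used in the RRF literature and is assumed here, since the README does not state the value NPEX uses.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs with RRF (ranks start at 1)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

For example, fusing a keyword ranking `["a", "b", "c"]` with a semantic ranking `["b", "d", "a"]` yields `["b", "a", "d", "c"]`: `a` and `b` appear in both lists, and `b`'s ranks (2 and 1) beat `a`'s (1 and 3).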

Evaluation Framework

Includes tools for assessing search quality through:

  • LLM-based automatic evaluation
  • Multi-metric assessment (MAP, P@k, nDCG, AUC)
  • Human verification interface for result validation
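The README does not specify how each metric is computed, so the following sketch uses the standard textbook definitions: P@k, average precision (averaged over queries to give MAP), and nDCG@k with graded gains and a log2 discount. Relevance judgments are assumed to be sets of document IDs (or a dict of gains for nDCG), for example as produced by the LLM-based or human assessment; AUC is omitted because it depends on which curve is being summarized.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision values at each rank where a relevant document appears."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for i, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant)


def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """MAP: average precision averaged over (ranking, relevant set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def ndcg_at_k(ranked: Sequence[str], gains: Dict[str, float], k: int) -> float:
    """DCG of the top-k results divided by the DCG of an ideal ordering."""
    def dcg(values: List[float]) -> float:
        return sum(g / math.log2(i + 1) for i, g in enumerate(values, start=1))

    actual = dcg([gains.get(d, 0.0) for d in ranked[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0
```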

Documentation

Reports and presentation slides are available for each milestone:

  • Milestone 1: Data Processing - Data pipeline, enrichment, and preprocessing
  • Milestone 2: Evaluation - Search quality assessment and metrics
  • Milestone 3: User Interface - Web application design and features

About

This project was developed as part of the Information Processing and Retrieval course (PRI) at FEUP (Faculty of Engineering, University of Porto) for educational purposes.


@PRI-GROUP22-2025-26
