NASA Project Exploration & eXtraction lets you search and explore thousands of NASA's groundbreaking scientific and technological projects. It is powered by hybrid search, which combines traditional keyword matching with AI-powered semantic understanding.
The core feature is a web interface for searching and exploring NASA projects. It provides:
- Smart Search: Hybrid retrieval combining keyword matching and semantic similarity
- Embedding Options: Choose between OpenAI (3072-dim) or MiniLM (384-dim) embeddings
- Multiple Search Modes: Keyword-only, enhanced keyword, semantic, or hybrid ranking
- Rich Metadata: Projects include taxonomy classification, facilities, partners, and technology readiness levels
- Responsive Design: Built with Next.js and React for fast, intuitive browsing
Start all services with Docker Compose:
docker-compose up -d

Access the application at http://localhost:3000
For detailed configuration, development workflows, and troubleshooting, see docker-info.md.
Behind the scenes, a complete ETL pipeline processes NASA's research data:
- Extract: Scrapes NASA TechPort data to JSON format
- Transform: Enriches data by merging with facilities and taxonomies
- Load: Generates search-ready documents indexed in Apache Solr
- Apache Solr: Full-text and vector similarity search engine
- Hybrid Ranking: Reciprocal Rank Fusion (RRF) combines keyword and semantic results
- Multiple Configurations: Different schemas support various embedding models
- MongoDB: Stores enriched project metadata and organizational context
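
To illustrate the hybrid ranking step, here is a minimal sketch of Reciprocal Rank Fusion in Python. It is not taken from this repository: the function name, the sample project IDs, and the constant `k=60` (a commonly used default) are assumptions for illustration, and the project's actual fusion may be configured differently.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1. k=60 is an assumed default, not a project setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the keyword and semantic retrievers.
keyword_hits = ["p1", "p2", "p3"]
semantic_hits = ["p3", "p1", "p4"]

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(fused)  # ['p1', 'p3', 'p2', 'p4']
```

Because RRF uses only rank positions, it does not need keyword scores and vector similarities to be on the same scale, which makes it a convenient way to combine the two result lists.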
The project includes tools for assessing search quality through:
- LLM-based automatic evaluation
- Multi-metric assessment (MAP, P@k, nDCG, AUC)
- Human verification interface for result validation
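
As a rough illustration of some of the metrics listed above, the sketch below computes P@k, average precision (the per-query value that MAP averages), and nDCG from a list of relevance judgments. The function names and sample judgments are hypothetical and not taken from this repository. For simplicity, it assumes that average precision is normalized by the number of relevant documents retrieved and that the ideal nDCG ranking is built from the retrieved list only.

```python
import math


def precision_at_k(relevance, k):
    """Fraction of the top-k results that are relevant (relevance > 0)."""
    return sum(1 for r in relevance[:k] if r > 0) / k


def average_precision(relevance):
    """Mean of the precision values at each rank where a relevant result appears."""
    hits = 0
    total = 0.0
    for rank, r in enumerate(relevance, start=1):
        if r > 0:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def ndcg_at_k(relevance, k):
    """DCG of the top-k results divided by the DCG of the ideal ordering."""
    def dcg(gains):
        return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

    ideal = dcg(sorted(relevance, reverse=True)[:k])
    return dcg(relevance[:k]) / ideal if ideal > 0 else 0.0


# Hypothetical judgments for one query, in ranked order (1 = relevant).
judged = [1, 0, 1, 1, 0]

print(round(precision_at_k(judged, 3), 3))
print(round(average_precision(judged), 3))
print(round(ndcg_at_k(judged, 5), 3))
```

MAP is then the mean of `average_precision` over all evaluated queries.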
Reports and presentation slides are available for each milestone:
- Milestone 1: Data Processing - Data pipeline, enrichment, and preprocessing
- Milestone 2: Evaluation - Search quality assessment and metrics
- Milestone 3: User Interface - Web application design and features
This project was developed as part of the Information Processing and Retrieval course (PRI) at FEUP (Faculty of Engineering, University of Porto) for educational purposes.
@PRI-GROUP22-2025-26