Fiscal Policy Sentiment Analysis for Peru's Fiscal Council
FiscalTone is a research project that constructs a "Fiscal Tone Index" by analyzing official communications from Peru's Fiscal Council (Consejo Fiscal). The pipeline scrapes, processes, and classifies PDF documents using LLM-based sentiment analysis.
# Clone the repository
git clone https://github.com/JasonCruz18/FiscalTone.git
cd FiscalTone
# Create and activate environment
conda env create -f environment.yml
conda activate fiscal_tone
# Copy example config
cp config/config.example.yaml config/config.yaml
# List the available pipeline stages
python scripts/run_pipeline.py --list

# Clone the repository
git clone https://github.com/JasonCruz18/FiscalTone.git
cd FiscalTone
# Create and activate virtual environment
python -m venv venv
source venv/bin/activate # Linux/macOS
# or: venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
# Copy example config
cp config/config.example.yaml config/config.yaml
# List the available pipeline stages
python scripts/run_pipeline.py --list

Note: For scanned PDF processing, install Tesseract OCR separately.
FiscalTone/
├── fiscal_tone/ # Main package
│ ├── collectors/ # Web scraping and PDF download
│ ├── processors/ # PDF classification, text extraction, cleaning
│ ├── analyzers/ # LLM-based classification
│ └── orchestration/ # Pipeline coordination
├── scripts/
│ └── run_pipeline.py # CLI entry point
├── config/
│ ├── config.example.yaml # Configuration template
│ └── config.yaml # Your local config (gitignored)
├── data/
│ ├── raw/ # Downloaded PDFs (editable/, scanned/)
│ ├── input/ # Preprocessed data
│ └── output/ # Final results
├── metadata/ # JSON metadata files
├── docs/ # Documentation
├── tests/ # Test suite
├── notebooks/ # Jupyter notebooks
└── dashboard/ # Visualization dashboard
| Stage | Command | Description |
|---|---|---|
| collect | --stage collect | Scrape and download PDFs from cf.gob.pe |
| classify | --stage classify | Classify PDFs as editable/scanned + enrich metadata |
| extract | --stage extract | Extract text from PDFs (font-based + OCR) |
| clean | --stage clean | Clean and normalize extracted text |
| analyze | --stage analyze | Classify fiscal tone using GPT-4o |
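The stage flags above can be illustrated with a small `argparse` sketch. This is not the code in `scripts/run_pipeline.py`: the names `STAGES`, `build_parser`, and `select_stages` are hypothetical, and both the mutual exclusivity of the flags and the reordering of requested stages into pipeline order are assumptions made for illustration.

```python
import argparse

# Stage names taken from the table above, listed in pipeline order.
STAGES = ["collect", "classify", "extract", "clean", "analyze"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the four documented flags (hypothetical layout)."""
    parser = argparse.ArgumentParser(description="Sketch of a stage-selecting CLI")
    # Assumption: exactly one of the four flags is given per invocation.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--list", action="store_true", help="list available stages")
    group.add_argument("--stage", choices=STAGES, help="run a single stage")
    group.add_argument("--stages", nargs="+", choices=STAGES, help="run several stages")
    group.add_argument("--all", action="store_true", help="run every stage")
    return parser


def select_stages(argv: list[str]) -> list[str]:
    """Return the stages to run, or an empty list when only listing."""
    args = build_parser().parse_args(argv)
    if args.list:
        return []
    if args.all:
        return list(STAGES)
    requested = [args.stage] if args.stage else args.stages
    # Assumption: stages run in pipeline order, whatever order was typed.
    return [stage for stage in STAGES if stage in requested]
```

Sorting into pipeline order reflects the fact that each stage consumes the previous stage's output; whether the real CLI enforces this is not stated here.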
# List available stages
python scripts/run_pipeline.py --list
# Run single stage
python scripts/run_pipeline.py --stage collect
# Run multiple stages
python scripts/run_pipeline.py --stages collect classify extract
# Run complete pipeline
python scripts/run_pipeline.py --all

Paragraphs are classified on a 1-5 scale measuring fiscal concern:
| Score | Level | Description |
|---|---|---|
| 1 | No concern | Fiscal consolidation, compliance, transparency |
| 2 | Slight concern | Potential risks, extraordinary revenue dependency |
| 3 | Neutral | Technical description, no value judgment |
| 4 | High concern | Non-compliance, fiscal loosening, uncertainty |
| 5 | Alarm | Severe criticism, debt sustainability risk |
Fiscal Tone Index = (3 - avg_risk_score) / 2 → ranges from -1 (alarm) to +1 (positive)
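As a minimal sketch of the formula above, the index can be computed from a list of paragraph scores. The function name `fiscal_tone_index` and the input validation are assumptions for illustration, not part of the package's documented API.

```python
from statistics import mean


def fiscal_tone_index(scores: list[int]) -> float:
    """Map 1-5 concern scores to the index (3 - avg_risk_score) / 2."""
    if not scores:
        raise ValueError("at least one score is required")
    if any(not 1 <= s <= 5 for s in scores):
        raise ValueError("scores must lie on the 1-5 scale")
    return (3 - mean(scores)) / 2


# All paragraphs at "No concern" (1) give +1; all at "Alarm" (5) give -1.
print(fiscal_tone_index([1, 1, 1]))  # 1.0
print(fiscal_tone_index([5, 5]))     # -1.0
print(fiscal_tone_index([2, 4]))     # 0.0
```

A neutral average of 3 maps to 0, so positive values indicate less concern than neutral and negative values indicate more.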
| File | Description |
|---|---|
| llm_output_paragraphs.json | Paragraph-level scores with metadata |
| llm_output_documents.json | Document-level aggregated scores |
| cf_metadata.json | PDF metadata (URLs, dates, types) |
| cf_cleaned_text.json | Cleaned text ready for analysis |
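The relationship between the paragraph-level and document-level files can be sketched as a group-by over paragraph records. The JSON schema is not described here, so the field names `document_id` and `score`, the output keys, and the function name `aggregate_by_document` are all hypothetical; see DATA_DICTIONARY.md for the actual fields.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_document(paragraphs: list[dict]) -> dict:
    """Group paragraph scores by document and average them.

    Assumes each record has hypothetical "document_id" and "score" keys.
    """
    grouped = defaultdict(list)
    for record in paragraphs:
        grouped[record["document_id"]].append(record["score"])
    return {
        doc_id: {"n_paragraphs": len(scores), "avg_score": mean(scores)}
        for doc_id, scores in grouped.items()
    }


# Made-up records, for illustration only.
rows = [
    {"document_id": "doc-a", "score": 4},
    {"document_id": "doc-a", "score": 5},
    {"document_id": "doc-b", "score": 1},
]
print(aggregate_by_document(rows))
```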
Copy the example configuration and customize:
cp config/config.example.yaml config/config.yaml

Key settings:
- openai.api_key: Your OpenAI API key (or use the OPENAI_API_KEY env var)
- openai.model: Model to use (default: gpt-4o)
- openai.rate_limit: Requests per minute (default: 50)
- paths.*: Data directory locations
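The key settings can be illustrated with a small resolver that works on an already-parsed config dictionary. This is a sketch, not the project's loader: the function name `resolve_openai_settings` is hypothetical, and the precedence (config value first, then the OPENAI_API_KEY environment variable) is an assumption. The defaults `gpt-4o` and 50 requests per minute are the ones listed above.

```python
import os
from typing import Mapping, Optional


def resolve_openai_settings(
    config: Mapping, env: Optional[Mapping[str, str]] = None
) -> dict:
    """Resolve OpenAI settings from a parsed config, falling back to env/defaults."""
    env = os.environ if env is None else env
    section = config.get("openai") or {}
    # Assumption: a key in config.yaml wins over the environment variable.
    api_key = section.get("api_key") or env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("set openai.api_key or the OPENAI_API_KEY env var")
    rate_limit = int(section.get("rate_limit", 50))
    return {
        "api_key": api_key,
        "model": section.get("model", "gpt-4o"),
        "rate_limit": rate_limit,
        # Minimum spacing between requests implied by the per-minute limit.
        "min_interval_s": 60 / rate_limit,
    }
```

With the default limit of 50 requests per minute, the implied spacing is 1.2 seconds between requests.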
- Python 3.10 or higher
- 4GB RAM minimum (8GB recommended)
- Internet connection for PDF downloads and LLM API
- Tesseract OCR (for scanned PDFs):
  - Windows: UB Mannheim installer
  - Linux: sudo apt-get install tesseract-ocr tesseract-ocr-spa
  - macOS: brew install tesseract tesseract-lang
- OpenAI API Key (for LLM classification):
  export OPENAI_API_KEY="your-key-here"
| Document | Description |
|---|---|
| INSTALLATION.md | Detailed setup instructions |
| USAGE.md | Pipeline usage guide |
| ARCHITECTURE.md | System design documentation |
| METHODOLOGY.md | Research methodology |
| DATA_DICTIONARY.md | Data field definitions |
| CONTRIBUTING.md | Contribution guidelines |
# Install development dependencies
pip install -r requirements-dev.txt
# Install package in editable mode
pip install -e .
# Run tests
pytest
# Format code
black fiscal_tone scripts
isort fiscal_tone scripts
# Check code quality
flake8 fiscal_tone scripts
mypy fiscal_tone

- Formatter: Black (100 char line length)
- Import sorting: isort (black profile)
- Linting: flake8
- Type checking: mypy
- Docstrings: Google style
- Source: Peru's Fiscal Council (cf.gob.pe)
- Documents: Informes and Comunicados (2016-present)
- Coverage: 75+ official fiscal policy communications
The dataset generated by this project is openly available on Zenodo.
Note: DOI will be assigned upon dataset publication. See DATA_AVAILABILITY.md for details.
The dataset includes:
- Paragraph-level fiscal tone scores (1-5 scale)
- Document-level aggregated metrics
- Fiscal Tone Index time series
- Complete metadata for all Fiscal Council documents
This project supports research on fiscal policy communication and sentiment analysis in Peru. The Fiscal Tone Index measures the degree of concern expressed in Fiscal Council communications regarding fiscal discipline, sustainability, and governance.
If you use this dataset or software in your research, please cite:
Data Paper (Data in Brief):
@article{cruz2025fiscaltone_data,
author = {Cruz, Jason},
title = {Fiscal Tone Dataset: Sentiment Analysis of Peru's Fiscal Council Communications},
journal = {Data in Brief},
year = {2025},
note = {Forthcoming}
}

Software:
@software{cruz2025fiscaltone,
author = {Cruz, Jason},
title = {FiscalTone: Fiscal Policy Sentiment Analysis Pipeline},
year = {2025},
url = {https://github.com/JasonCruz18/FiscalTone},
license = {MIT}
}

Dataset (Zenodo):
@dataset{cruz2025fiscaltone_zenodo,
author = {Cruz, Jason},
title = {Fiscal Tone Dataset for Peru (2016-2025)},
year = {2025},
publisher = {Zenodo},
doi = {10.5281/zenodo.XXXXXXX}
}

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Peru's Fiscal Council (Consejo Fiscal) for publishing fiscal policy reports
- OpenAI for GPT-4o API
- Centro de Investigación de la Universidad del Pacífico (CIUP)
- Author: Jason Cruz
- Email: jj.cruza@up.edu.pe
- Issues: GitHub Issues