FiscalTone

Fiscal Policy Sentiment Analysis for Peru's Fiscal Council

FiscalTone is a research project that constructs a "Fiscal Tone Index" by analyzing official communications from Peru's Fiscal Council (Consejo Fiscal). The pipeline scrapes the Council's PDF documents from cf.gob.pe, extracts and cleans their text, and scores each paragraph with an LLM (GPT-4o by default).

Quick Start

Option A: Conda (Recommended)

# Clone the repository
git clone https://github.com/JasonCruz18/FiscalTone.git
cd FiscalTone

# Create and activate environment
conda env create -f environment.yml
conda activate fiscal_tone

# Copy example config
cp config/config.example.yaml config/config.yaml

# List the available pipeline stages (use --all to run the complete pipeline)
python scripts/run_pipeline.py --list

Option B: Pip + Virtual Environment

# Clone the repository
git clone https://github.com/JasonCruz18/FiscalTone.git
cd FiscalTone

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate  # Linux/macOS
# or: venv\Scripts\activate  # Windows

# Install dependencies
pip install -r requirements.txt

# Copy example config
cp config/config.example.yaml config/config.yaml

# List the available pipeline stages (use --all to run the complete pipeline)
python scripts/run_pipeline.py --list

Note: For scanned PDF processing, install Tesseract OCR separately.

Project Structure

FiscalTone/
├── fiscal_tone/              # Main package
│   ├── collectors/           # Web scraping and PDF download
│   ├── processors/           # PDF classification, text extraction, cleaning
│   ├── analyzers/            # LLM-based classification
│   └── orchestration/        # Pipeline coordination
├── scripts/
│   └── run_pipeline.py       # CLI entry point
├── config/
│   ├── config.example.yaml   # Configuration template
│   └── config.yaml           # Your local config (gitignored)
├── data/
│   ├── raw/                  # Downloaded PDFs (editable/, scanned/)
│   ├── input/                # Preprocessed data
│   └── output/               # Final results
├── metadata/                 # JSON metadata files
├── docs/                     # Documentation
├── tests/                    # Test suite
├── notebooks/                # Jupyter notebooks
└── dashboard/                # Visualization dashboard

Pipeline Stages

Stage	Command	Description
collect	`--stage collect`	Scrape and download PDFs from cf.gob.pe
classify	`--stage classify`	Classify PDFs as editable/scanned + enrich metadata
extract	`--stage extract`	Extract text from PDFs (font-based + OCR)
clean	`--stage clean`	Clean and normalize extracted text
analyze	`--stage analyze`	Classify fiscal tone using GPT-4o

Usage Examples

# List available stages
python scripts/run_pipeline.py --list

# Run single stage
python scripts/run_pipeline.py --stage collect

# Run multiple stages
python scripts/run_pipeline.py --stages collect classify extract

# Run complete pipeline
python scripts/run_pipeline.py --all
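
The invocations above imply a small set of flags. As a rough sketch only (this is not the project's actual `scripts/run_pipeline.py`), the following `argparse` parser accepts the same four flags. The helper names `build_parser` and `select_stages`, the mutual exclusivity of the flags, and the choice to run requested stages in pipeline order are assumptions made for illustration.

```python
import argparse

# Stage names taken from the Pipeline Stages table, in pipeline order.
STAGES = ["collect", "classify", "extract", "clean", "analyze"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for --list, --stage, --stages and --all (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="run_pipeline.py")
    # Assumption: exactly one of the four flags is given per invocation.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--list", action="store_true", help="list available stages")
    group.add_argument("--stage", choices=STAGES, help="run a single stage")
    group.add_argument("--stages", nargs="+", choices=STAGES, help="run several stages")
    group.add_argument("--all", action="store_true", help="run every stage")
    return parser


def select_stages(argv: list[str]) -> list[str]:
    """Return the stages to run, sorted into pipeline order."""
    args = build_parser().parse_args(argv)
    if args.list:
        return []  # nothing to run; a caller would simply print STAGES
    if args.all:
        return list(STAGES)
    requested = [args.stage] if args.stage else args.stages
    # Assumption: stages run in pipeline order, whatever order they were typed in.
    return [stage for stage in STAGES if stage in requested]
```

Sorting the requested stages against `STAGES` means a later stage never runs before the stage that produces its input, which is one plausible reading of how a staged pipeline like this would behave.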

Fiscal Tone Scoring

Paragraphs are classified on a 1-5 scale measuring fiscal concern:

Score	Level	Description
1	No concern	Fiscal consolidation, compliance, transparency
2	Slight concern	Potential risks, extraordinary revenue dependency
3	Neutral	Technical description, no value judgment
4	High concern	Non-compliance, fiscal loosening, uncertainty
5	Alarm	Severe criticism, debt sustainability risk

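Fiscal Tone Index = (3 - avg_risk_score) / 2, where avg_risk_score is the average of the 1-5 paragraph scores. The index ranges from -1 (alarm) to +1 (no concern), with 0 at the neutral score of 3.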
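
To make the arithmetic concrete, here is a minimal sketch of the formula in Python. The function name `fiscal_tone_index` and the input validation are illustrative assumptions, not code taken from the `fiscal_tone` package.

```python
from statistics import mean


def fiscal_tone_index(scores: list[int]) -> float:
    """Map 1-5 concern scores to the index (3 - average score) / 2."""
    if not scores:
        raise ValueError("at least one score is required")
    if any(score < 1 or score > 5 for score in scores):
        raise ValueError("scores must lie between 1 and 5")
    return (3 - mean(scores)) / 2


print(fiscal_tone_index([1, 2, 3]))  # average 2 -> 0.5
print(fiscal_tone_index([5, 5]))     # average 5 -> -1.0
print(fiscal_tone_index([3]))        # average 3 -> 0.0
```

Because the mapping is linear, each one-point rise in the average concern score lowers the index by 0.5.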

Output Datasets

File	Description
`llm_output_paragraphs.json`	Paragraph-level scores with metadata
`llm_output_documents.json`	Document-level aggregated scores
`cf_metadata.json`	PDF metadata (URLs, dates, types)
`cf_cleaned_text.json`	Cleaned text ready for analysis
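
The relationship between the paragraph-level and document-level files can be pictured as a simple group-and-average step. The sketch below is an assumption about that step, not the project's implementation: the record keys `doc_id` and `score` are hypothetical, so check DATA_DICTIONARY.md for the real field names.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_document(paragraphs: list[dict]) -> dict[str, dict]:
    """Group paragraph scores by document and average them."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for row in paragraphs:
        # "doc_id" and "score" are hypothetical keys used for illustration.
        grouped[row["doc_id"]].append(row["score"])
    return {
        doc_id: {"n_paragraphs": len(scores), "avg_score": mean(scores)}
        for doc_id, scores in grouped.items()
    }


rows = [
    {"doc_id": "informe-a", "score": 2},
    {"doc_id": "informe-a", "score": 4},
    {"doc_id": "comunicado-b", "score": 5},
]
summary = aggregate_by_document(rows)
```

Here `summary["informe-a"]` holds two paragraphs with an average score of 3, and `summary["comunicado-b"]` holds one paragraph with an average score of 5.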

Configuration

Copy the example configuration and customize:

cp config/config.example.yaml config/config.yaml

Key settings:

- `openai.api_key`: Your OpenAI API key (or use the `OPENAI_API_KEY` env var)
- `openai.model`: Model to use (default: `gpt-4o`)
- `openai.rate_limit`: Requests per minute (default: 50)
- `paths.*`: Data directory locations
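
As an illustration of how these settings might be resolved at runtime, the sketch below takes an already-parsed config dictionary and fills in the documented defaults. The function name `resolve_openai_settings`, the precedence of the config file over the environment variable, and the derived `min_interval_s` value are assumptions for illustration; the package may handle this differently.

```python
import os

# Defaults quoted from the key settings listed above.
DEFAULTS = {"model": "gpt-4o", "rate_limit": 50}


def resolve_openai_settings(config: dict, env=None) -> dict:
    """Merge the openai.* section of a parsed config with defaults and the environment."""
    env = os.environ if env is None else env
    section = config.get("openai") or {}
    # Assumption: a key in the config file wins over OPENAI_API_KEY.
    api_key = section.get("api_key") or env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("set openai.api_key or the OPENAI_API_KEY env var")
    rate_limit = int(section.get("rate_limit", DEFAULTS["rate_limit"]))
    if rate_limit <= 0:
        raise ValueError("openai.rate_limit must be positive")
    return {
        "api_key": api_key,
        "model": section.get("model", DEFAULTS["model"]),
        "rate_limit": rate_limit,
        # Seconds to wait between requests to stay under the per-minute limit.
        "min_interval_s": 60.0 / rate_limit,
    }
```

With the default of 50 requests per minute, `min_interval_s` comes out at 1.2 seconds between consecutive API calls.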

Requirements

System Requirements

- Python 3.10 or higher
- 4GB RAM minimum (8GB recommended)
- Internet connection for PDF downloads and LLM API

External Dependencies

Tesseract OCR (for scanned PDFs):
- Windows: UB Mannheim installer
- Linux: sudo apt-get install tesseract-ocr tesseract-ocr-spa
- macOS: brew install tesseract tesseract-lang
OpenAI API Key (for LLM classification):
```
export OPENAI_API_KEY="your-key-here"
```
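
Before a long run it can help to confirm that both external dependencies are in place. The following is a minimal sketch using only the standard library; the function name `missing_dependencies` is hypothetical, and the check assumes the API key is supplied through the environment rather than through `config/config.yaml`.

```python
import os
import shutil


def missing_dependencies(env=None, which=shutil.which) -> list[str]:
    """Return the names of external dependencies that cannot be found."""
    env = os.environ if env is None else env
    missing = []
    # shutil.which returns None when the executable is not on PATH.
    if which("tesseract") is None:
        missing.append("tesseract")
    # Assumption: the key is provided via the environment, not the config file.
    if not env.get("OPENAI_API_KEY"):
        missing.append("OPENAI_API_KEY")
    return missing


if __name__ == "__main__":
    problems = missing_dependencies()
    print("Missing:", ", ".join(problems) if problems else "nothing")
```

The `env` and `which` parameters exist only so the check can be exercised without touching the real environment.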

Documentation

Document	Description
INSTALLATION.md	Detailed setup instructions
USAGE.md	Pipeline usage guide
ARCHITECTURE.md	System design documentation
METHODOLOGY.md	Research methodology
DATA_DICTIONARY.md	Data field definitions
CONTRIBUTING.md	Contribution guidelines

Development

Setup Development Environment

# Install development dependencies
pip install -r requirements-dev.txt

# Install package in editable mode
pip install -e .

# Run tests
pytest

# Format code
black fiscal_tone scripts
isort fiscal_tone scripts

# Check code quality
flake8 fiscal_tone scripts
mypy fiscal_tone

Code Standards

- Formatter: Black (100 char line length)
- Import sorting: isort (black profile)
- Linting: flake8
- Type checking: mypy
- Docstrings: Google style

Data Sources

- Source: Peru's Fiscal Council (cf.gob.pe)
- Documents: Informes and Comunicados (2016-present)
- Coverage: 75+ official fiscal policy communications

Data Availability

The dataset generated by this project will be openly available on Zenodo.

Note: DOI will be assigned upon dataset publication. See DATA_AVAILABILITY.md for details.

The dataset includes:

- Paragraph-level fiscal tone scores (1-5 scale)
- Document-level aggregated metrics
- Fiscal Tone Index time series
- Complete metadata for all Fiscal Council documents

Research Context

This project supports research on fiscal policy communication and sentiment analysis in Peru. The Fiscal Tone Index measures the degree of concern expressed in Fiscal Council communications regarding fiscal discipline, sustainability, and governance.

Citation

If you use this dataset or software in your research, please cite:

Data Paper (Data in Brief):

@article{cruz2025fiscaltone_data,
  author = {Cruz, Jason},
  title = {Fiscal Tone Dataset: Sentiment Analysis of Peru's Fiscal Council Communications},
  journal = {Data in Brief},
  year = {2025},
  note = {Forthcoming}
}

Software:

@software{cruz2025fiscaltone,
  author = {Cruz, Jason},
  title = {FiscalTone: Fiscal Policy Sentiment Analysis Pipeline},
  year = {2025},
  url = {https://github.com/JasonCruz18/FiscalTone},
  license = {MIT}
}

Dataset (Zenodo):

@dataset{cruz2025fiscaltone_zenodo,
  author = {Cruz, Jason},
  title = {Fiscal Tone Dataset for Peru (2016-2025)},
  year = {2025},
  publisher = {Zenodo},
  doi = {10.5281/zenodo.XXXXXXX}
}

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

- Peru's Fiscal Council (Consejo Fiscal) for publishing fiscal policy reports
- OpenAI for GPT-4o API
- Centro de Investigación de la Universidad del Pacífico (CIUP)

Contact

- Author: Jason Cruz
- Email: jj.cruza@up.edu.pe
- Issues: GitHub Issues
