Machine Learning system combining LSTM neural networks and BERT-based sentiment analysis for stock market prediction. Achieved 0.002 MAE with LSTM and 90% accuracy with NLP model, improving trading returns by 10%.

nuglifeleoji/Stock-Prediction-with-NLP

📈 Stock Closing Price Prediction

Python TensorFlow PyTorch License

Machine learning system combining LSTM neural networks and BERT-based sentiment analysis for stock market prediction.

🎯 Project Overview

This project implements two complementary approaches for stock market prediction:

  1. LSTM Model: Predicts stock returns from lag-based features, normalized with MinMaxScaler, using an LSTM architecture
  2. NLP Sentiment Analysis: Extracts stock-related news data, builds BERT-based embeddings, and trains a sentiment-driven impact model

🏆 Key Results

  • LSTM Model: Achieved test MAE of 0.002 through lag-based features and MinMaxScaler normalization
  • NLP Model: Achieved 90% validation accuracy and increased simulated trading return by 10%
  • Trading Strategy: Outperformed buy-and-hold benchmark through sentiment-driven predictions

🚀 Technologies Used

  • Deep Learning: TensorFlow/Keras LSTM architecture with dropout regularization
  • NLP: BERT-based embeddings (FinBERT) for financial text analysis
  • Data Processing: MinMaxScaler normalization, lag-based feature engineering
  • Trading Simulation: Realistic backtesting with transaction costs

📊 Model Details

🧠 LSTM Model

  • Architecture: 3-layer LSTM (50 units each) with 20% dropout regularization
  • Input Features: 2-day lag sequences of AAPL and S&P 500 closing prices
  • Normalization: MinMaxScaler rescales each input feature to a fixed range (by default [0, 1]) before training
  • Performance: Test MAE of 0.002
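
The input preparation described in the bullets above can be sketched in plain Python. This is a minimal illustration, not the repository's code: the project itself uses scikit-learn's MinMaxScaler, while the helpers `fit_min_max`, `min_max_scale`, and `make_lag_windows` and the price values here are hypothetical. The sketch assumes the scaling range is fitted on the training portion only and that the first feature (AAPL) is the prediction target.

```python
def fit_min_max(train_values):
    """Learn the (min, max) range from the training portion only."""
    return min(train_values), max(train_values)


def min_max_scale(values, lo, hi):
    """Map values to [0, 1] relative to a previously fitted range."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def make_lag_windows(series_by_feature, lag):
    """Build (window, target) pairs from equally long feature series.

    Each window holds the previous `lag` time steps of every feature
    (shape: lag x n_features); the target is the next value of the
    first feature.
    """
    n = len(series_by_feature[0])
    windows, targets = [], []
    for t in range(lag, n):
        window = [[s[i] for s in series_by_feature] for i in range(t - lag, t)]
        windows.append(window)
        targets.append(series_by_feature[0][t])
    return windows, targets


# Illustrative closing prices (made-up numbers, not project data).
aapl = [150.0, 152.0, 151.0, 155.0, 154.0]
sp500 = [4000.0, 4020.0, 4010.0, 4050.0, 4040.0]

n_train = 4  # assumption: the first four days form the training portion
aapl_scaled = min_max_scale(aapl, *fit_min_max(aapl[:n_train]))
sp500_scaled = min_max_scale(sp500, *fit_min_max(sp500[:n_train]))

# 2-day lag windows over both series, as in the "Input Features" bullet.
X, y = make_lag_windows([aapl_scaled, sp500_scaled], lag=2)
```

Each element of `X` is a 2 x 2 window (time steps x features), which matches the (samples, time steps, features) layout that Keras LSTM layers expect once the list is converted to an array.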

🤖 NLP Sentiment Model

  • Text Processing: BERT-based embeddings using FinBERT (financial domain-specific)
  • Sentiment Analysis: Multi-dimensional sentiment scoring (polarity, subjectivity)
  • Classification: Ensemble of Random Forest and Logistic Regression
  • Performance: 90% validation accuracy for next-day stock movement prediction
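
The README does not say how the Random Forest and Logistic Regression outputs are combined. One common option is soft voting, which averages the predicted class probabilities. The sketch below assumes that approach; `soft_vote`, `to_direction`, and the probability values are hypothetical.

```python
def soft_vote(prob_up_by_model, weights=None):
    """Combine per-model P(up) lists into one ensemble P(up) per sample."""
    n_models = len(prob_up_by_model)
    if weights is None:
        weights = [1.0 / n_models] * n_models  # equal weighting by default
    n_samples = len(prob_up_by_model[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_up_by_model))
        for i in range(n_samples)
    ]


def to_direction(prob_up, threshold=0.5):
    """Turn probabilities into next-day labels: 1 = up, 0 = down."""
    return [1 if p >= threshold else 0 for p in prob_up]


# Made-up probabilities from the two base classifiers for three news days.
rf_prob_up = [0.70, 0.40, 0.55]
lr_prob_up = [0.60, 0.20, 0.35]

ensemble_prob_up = soft_vote([rf_prob_up, lr_prob_up])
directions = to_direction(ensemble_prob_up)
```

In scikit-learn, the same idea is available as `VotingClassifier(voting="soft")`; whether the project uses it is not stated.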

💰 Trading Strategy

  • Signal Generation: Sentiment-driven stock direction predictions
  • Position Sizing: Binary and probability-weighted strategies
  • Performance: 10% improvement in simulated trading returns vs buy-and-hold
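
To make the two position-sizing variants concrete, here is a small long-only backtest sketch. The sizing rules, the cost model (a fee proportional to the traded fraction of capital), and all names are assumptions for illustration; the defaults for `initial_capital` and `transaction_cost` mirror the CLI example in this README.

```python
def binary_position(prob_up, threshold=0.5):
    """Fully invested when the model predicts 'up', otherwise flat."""
    return 1.0 if prob_up >= threshold else 0.0


def weighted_position(prob_up):
    """Scale exposure linearly from 0 at P(up) <= 0.5 to 1 at P(up) = 1."""
    return max(0.0, 2.0 * prob_up - 1.0)


def simulate(prob_up, daily_returns, sizing,
             initial_capital=100_000.0, transaction_cost=0.005):
    """Return final capital after trading on each day's predicted P(up)."""
    capital = initial_capital
    position = 0.0  # fraction of capital currently invested
    for p, r in zip(prob_up, daily_returns):
        target = sizing(p)
        # Pay a cost proportional to the fraction of capital being traded.
        capital -= capital * abs(target - position) * transaction_cost
        position = target
        # Apply the day's return to the invested fraction only.
        capital *= 1.0 + position * r
    return capital


# Made-up predictions and realized next-day returns.
prob_up = [0.9, 0.3, 0.8]
daily_returns = [0.01, -0.02, 0.015]

final_binary = simulate(prob_up, daily_returns, binary_position)
final_weighted = simulate(prob_up, daily_returns, weighted_position)
# Buy-and-hold benchmark: always fully invested, costs ignored.
final_hold = simulate([1.0] * 3, daily_returns, binary_position,
                      transaction_cost=0.0)
```

Comparing `final_binary` and `final_weighted` against `final_hold` is the kind of benchmark comparison the performance bullet refers to.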

🚀 Quick Start

# Install dependencies
pip install -r requirements.txt

# Note: Large data files are excluded from Git due to size limits
# See data/README.md for download instructions

# Run complete pipeline
python run_all_models.py

# Or run individual models
python run_lstm_model.py          # LSTM model
python run_nlp_model.py           # NLP sentiment model

📁 Project Structure

Stock_Closing_Price_Prediction/
├── Stock_Closing_Price_Prediction.ipynb  # LSTM model implementation
├── NLP_Sentiment_Analysis.ipynb          # BERT-based sentiment analysis
├── run_lstm_model.py                     # LSTM execution script
├── run_nlp_model.py                      # NLP execution script
├── run_all_models.py                     # Master execution script
├── data/                                 # Dataset files
│   ├── apple_news_data.csv              # Apple news (2024-2025)
│   ├── apple_prices.csv                 # Apple stock prices
│   ├── stock_prices.csv                 # Historical stock data (1990-2017)
│   └── SP500.csv                        # S&P 500 index data
├── utils/                               # Utility modules
│   ├── data_preprocessing.py            # Data processing utilities
│   ├── model_evaluation.py              # Performance evaluation
│   ├── trading_simulation.py            # Trading strategy simulation
│   └── visualization.py                 # Visualization utilities
└── requirements.txt                     # Python dependencies

📈 Results

Model Performance

  • LSTM Model: Test MAE of 0.002 (mean absolute difference between predicted and actual values on the test set)
  • NLP Model: 90% validation accuracy for stock direction prediction
  • Trading Strategy: 10% improvement over buy-and-hold benchmark
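
For reference, the two headline metrics can be computed as in the sketch below. The function names and sample values are illustrative. Note that MAE is scale-dependent: the README does not state whether 0.002 is measured on MinMax-scaled values, raw returns, or prices, so the number should be read with that in mind.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def direction_accuracy(actual_returns, predicted_up):
    """Share of days where the predicted direction matched the realized one."""
    hits = sum((r > 0) == bool(u) for r, u in zip(actual_returns, predicted_up))
    return hits / len(actual_returns)


# Made-up values for illustration only.
mae = mean_absolute_error([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])
acc = direction_accuracy([0.01, -0.02, 0.03, -0.01], [1, 0, 0, 0])
```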

Key Findings

  • Sentiment features contribute significantly to prediction accuracy
  • LSTM architecture effectively captures temporal patterns in stock prices
  • Combined approach outperforms technical or fundamental analysis used on its own

🔬 Research Methodology

📚 Literature Review

  • Technical Analysis: LSTM networks for time series forecasting
  • Fundamental Analysis: NLP sentiment analysis in finance
  • Behavioral Finance: News impact on market movements
  • Quantitative Trading: Risk-adjusted performance metrics

🧪 Experimental Design

  1. Data Collection: Multi-source financial and news data
  2. Preprocessing: Advanced cleaning and normalization
  3. Feature Engineering: Technical + fundamental feature fusion
  4. Model Development: Deep learning + machine learning ensemble
  5. Validation: Time-series aware cross-validation
  6. Backtesting: Realistic trading simulation
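
Step 5 matters because a random shuffle would let the model train on data from after the test period. A walk-forward split with an expanding training window is one way to do time-series-aware validation. The sketch below assumes that scheme; `walk_forward_splits` is a hypothetical helper, not a function from this repository.

```python
def walk_forward_splits(n_samples, n_splits, test_size):
    """Yield (train_indices, test_indices) pairs in chronological order.

    Every test block lies strictly after its training block, and the
    training window grows with each split.
    """
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested splits")
    for k in range(n_splits):
        test_start = first_test_start + k * test_size
        train_idx = list(range(0, test_start))
        test_idx = list(range(test_start, test_start + test_size))
        yield train_idx, test_idx


splits = list(walk_forward_splits(n_samples=10, n_splits=3, test_size=2))
for train_idx, test_idx in splits:
    print(f"train 0..{train_idx[-1]}  test {test_idx[0]}..{test_idx[-1]}")
```

scikit-learn ships a comparable splitter, `TimeSeriesSplit`, which could be used instead of a hand-written generator.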

📊 Statistical Validation

  • Significance Testing: Bootstrap confidence intervals
  • Robustness Checks: Out-of-sample validation
  • Risk Analysis: VaR, CVaR, maximum drawdown
  • Benchmarking: Comparison with market indices
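
The metrics listed above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's `utils/model_evaluation.py`: VaR and CVaR use the historical (empirical) method and are reported as positive loss fractions, and the confidence interval is a percentile bootstrap for the mean return. All function names are hypothetical.

```python
import random


def max_drawdown(equity_curve):
    """Largest peak-to-trough decline, as a positive fraction of the peak."""
    peak = equity_curve[0]
    worst = 0.0
    for value in equity_curve:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def historical_var_cvar(returns, alpha=0.95):
    """Historical VaR and CVaR at confidence level alpha, as positive losses."""
    losses = sorted((-r for r in returns), reverse=True)  # largest loss first
    n_tail = max(1, int(round(len(losses) * (1.0 - alpha))))
    tail = losses[:n_tail]
    var = tail[-1]                 # smallest loss inside the tail
    cvar = sum(tail) / len(tail)   # average loss inside the tail
    return var, cvar


def bootstrap_mean_ci(values, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of values."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(
        sum(rng.choices(values, k=n)) / n for _ in range(n_resamples)
    )
    lo_idx = int((1.0 - level) / 2.0 * n_resamples)
    hi_idx = int((1.0 + level) / 2.0 * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Made-up daily strategy returns.
returns = [-0.05, -0.03, 0.01, 0.02, 0.00, 0.01, -0.01, 0.03, 0.02, 0.01]
var_80, cvar_80 = historical_var_cvar(returns, alpha=0.8)
ci_low, ci_high = bootstrap_mean_ci(returns)
mdd = max_drawdown([100.0, 120.0, 90.0, 110.0])
```

With only ten observations the tail estimates are very coarse; real backtests would use far longer return series.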

🛠️ Advanced Configuration

🔧 LSTM Model Parameters

python run_lstm_model.py \
    --ticker AAPL \
    --epochs 50 \
    --batch_size 64 \
    --sequence_length 5 \
    --test_split 0.15 \
    --save_model
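
How `run_lstm_model.py` parses these flags is not shown in this README. The following is a hypothetical sketch of an `argparse` setup that would accept the invocation above, with defaults copied from that example rather than from the actual script. `chronological_split` is likewise an assumed interpretation of `--test_split`: the most recent fraction of samples is held out for testing.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags in the example invocation."""
    parser = argparse.ArgumentParser(description="LSTM runner (sketch)")
    parser.add_argument("--ticker", default="AAPL")
    parser.add_argument("--epochs", type=int, default=50)
    parser.add_argument("--batch_size", type=int, default=64)
    parser.add_argument("--sequence_length", type=int, default=5)
    parser.add_argument("--test_split", type=float, default=0.15)
    parser.add_argument("--save_model", action="store_true")
    return parser


def chronological_split(n_samples, test_split):
    """Return (n_train, n_test), keeping the latest samples for testing."""
    n_test = int(round(n_samples * test_split))
    return n_samples - n_test, n_test


# Parse an explicit argument list so the sketch runs without a shell.
args = build_parser().parse_args(
    ["--ticker", "AAPL", "--test_split", "0.15", "--save_model"]
)
n_train, n_test = chronological_split(1000, args.test_split)
```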

🤖 NLP Model Parameters

python run_nlp_model.py \
    --model_name "ProsusAI/finbert" \
    --test_split 0.25 \
    --confidence_threshold 0.8 \
    --initial_capital 100000 \
    --transaction_cost 0.005
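
The README does not define `--confidence_threshold`. A plausible reading is that the strategy only trades when the classifier's probability for one direction reaches the threshold and stays out of the market otherwise. The sketch below assumes that interpretation; `confident_signals` and the probabilities are hypothetical.

```python
def confident_signals(prob_up, confidence_threshold=0.8):
    """Map P(up) values to +1 (up), -1 (down), or 0 (no trade)."""
    signals = []
    for p in prob_up:
        if p >= confidence_threshold:
            signals.append(1)
        elif (1.0 - p) >= confidence_threshold:
            signals.append(-1)
        else:
            signals.append(0)  # not confident enough in either direction
    return signals


# Made-up model outputs for four days.
signals = confident_signals([0.91, 0.55, 0.12, 0.80])
# Fraction of days on which the strategy actually takes a position.
coverage = sum(s != 0 for s in signals) / len(signals)
```

A higher threshold trades less often, which lowers transaction costs but also leaves fewer trades from which to judge the strategy.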

🚀 Master Pipeline Options

python run_all_models.py \
    --output_dir institutional_results \
    --quick \
    --skip_lstm  # or --skip_nlp
