Machine learning system combining LSTM neural networks and BERT-based sentiment analysis for stock market prediction.
This project implements two complementary approaches for stock market prediction:
- LSTM Model: Predicts stock returns from lag-based features, normalized with MinMaxScaler, using an LSTM architecture
- NLP Sentiment Analysis: Extracts stock-related news data, builds BERT-based embeddings, and trains a sentiment-driven impact model
- LSTM Model: Achieved test MAE of 0.002 through lag-based features and MinMaxScaler normalization
- NLP Model: Achieved 90% validation accuracy and increased simulated trading return by 10%
- Trading Strategy: Outperformed buy-and-hold benchmark through sentiment-driven predictions
- Deep Learning: TensorFlow/Keras LSTM architecture with dropout regularization
- NLP: BERT-based embeddings (FinBERT) for financial text analysis
- Data Processing: MinMaxScaler normalization, lag-based feature engineering
- Trading Simulation: Realistic backtesting with transaction costs
- Architecture: 3-layer LSTM (50 units each) with 20% dropout regularization
- Input Features: 2-day lag sequences of AAPL and S&P 500 closing prices
- Normalization: MinMaxScaler, which rescales each input feature to the [0, 1] range by default
- Performance: Test MAE of 0.002
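
The lag-based inputs and min-max scaling described above can be sketched in plain Python. This is a minimal illustration rather than the notebook's code: `min_max_fit`, `min_max_scale`, and `make_lag_windows` are hypothetical helpers, the prices are made-up numbers, and fitting the scaler on the training portion only is an assumption made here to avoid leaking test-set statistics.

```python
def min_max_fit(values):
    """Return the (min, max) bounds of the training values."""
    return min(values), max(values)


def min_max_scale(values, lo, hi):
    """Scale values with (v - lo) / (hi - lo), the formula MinMaxScaler uses
    for its default [0, 1] range."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def make_lag_windows(series, n_lags=2):
    """Build (X, y) pairs: X holds the previous n_lags values, y the next one."""
    X, y = [], []
    for i in range(n_lags, len(series)):
        X.append(series[i - n_lags:i])
        y.append(series[i])
    return X, y


# Illustrative closing prices, not real market data.
prices = [100.0, 102.0, 101.0, 105.0, 110.0, 108.0]
train = prices[:4]  # assumed chronological train/test boundary

lo, hi = min_max_fit(train)             # bounds come from training data only
scaled = min_max_scale(prices, lo, hi)  # later values may fall outside [0, 1]
X, y = make_lag_windows(scaled, n_lags=2)
```

With two input series (for example AAPL and S&P 500 closes), the same windowing would be applied to each series and the windows stacked per time step.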
- Text Processing: BERT-based embeddings using FinBERT (financial domain-specific)
- Sentiment Analysis: Multi-dimensional sentiment scoring (polarity, subjectivity)
- Classification: Ensemble of Random Forest and Logistic Regression
- Performance: 90% validation accuracy for next-day stock movement prediction
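
How the Random Forest and Logistic Regression outputs are combined is not stated here, so the sketch below assumes soft voting: each model produces a probability that the stock moves up, and the ensemble averages them. `soft_vote` is a hypothetical helper and the probabilities are made-up values.

```python
def soft_vote(prob_up_rf, prob_up_lr, weights=(0.5, 0.5), threshold=0.5):
    """Combine two models' P(up) by weighted averaging.

    Returns (averaged probabilities, 0/1 direction labels).
    """
    w_rf, w_lr = weights
    total = w_rf + w_lr
    probs = [
        (w_rf * p_rf + w_lr * p_lr) / total
        for p_rf, p_lr in zip(prob_up_rf, prob_up_lr)
    ]
    labels = [1 if p >= threshold else 0 for p in probs]
    return probs, labels


# Illustrative per-day probabilities from each classifier.
rf_probs = [0.9, 0.4, 0.2]
lr_probs = [0.7, 0.7, 0.3]

probs, labels = soft_vote(rf_probs, lr_probs)
```

In scikit-learn, `VotingClassifier` with `voting="soft"` performs the same kind of probability averaging over fitted estimators.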
- Signal Generation: Sentiment-driven stock direction predictions
- Position Sizing: Binary and probability-weighted strategies
- Performance: 10% improvement in simulated trading returns vs buy-and-hold
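
The two position-sizing styles and the effect of transaction costs can be sketched as follows. This is a simplified, long-only illustration under stated assumptions: `binary_position`, `weighted_position`, and `backtest` are hypothetical names, the linear mapping from probability to position size is one possible choice, and the returns and probabilities are made-up numbers.

```python
def binary_position(prob_up, threshold=0.5):
    """Fully invested when P(up) clears the threshold, otherwise flat."""
    return 1.0 if prob_up >= threshold else 0.0


def weighted_position(prob_up):
    """Scale exposure linearly from 0 (P(up) <= 0.5) to 1 (P(up) = 1)."""
    return max(0.0, 2.0 * prob_up - 1.0)


def backtest(returns, positions, cost=0.001):
    """Compound equity from 1.0, charging `cost` on the fraction of
    capital traded whenever the position changes."""
    equity = 1.0
    prev_pos = 0.0
    for r, pos in zip(returns, positions):
        equity *= 1.0 - cost * abs(pos - prev_pos)  # pay for the rebalance
        equity *= 1.0 + pos * r                      # earn the period return
        prev_pos = pos
    return equity


# Illustrative daily returns and predicted P(up), not real results.
daily_returns = [0.01, -0.02, 0.03]
prob_up = [0.8, 0.3, 0.9]

binary_equity = backtest(daily_returns, [binary_position(p) for p in prob_up])
weighted_equity = backtest(daily_returns, [weighted_position(p) for p in prob_up])
buy_and_hold = backtest(daily_returns, [1.0] * len(daily_returns), cost=0.0)
```

Comparing `binary_equity` and `weighted_equity` against `buy_and_hold` on the same return series mirrors the benchmark comparison described above.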
# Install dependencies
pip install -r requirements.txt
# Note: Large data files are excluded from Git due to size limits
# See data/README.md for download instructions
# Run complete pipeline
python run_all_models.py
# Or run individual models
python run_lstm_model.py # LSTM model
python run_nlp_model.py # NLP sentiment model

Stock_Closing_Price_Prediction/
├── Stock_Closing_Price_Prediction.ipynb # LSTM model implementation
├── NLP_Sentiment_Analysis.ipynb # BERT-based sentiment analysis
├── run_lstm_model.py # LSTM execution script
├── run_nlp_model.py # NLP execution script
├── run_all_models.py # Master execution script
├── data/ # Dataset files
│ ├── apple_news_data.csv # Apple news (2024-2025)
│ ├── apple_prices.csv # Apple stock prices
│ ├── stock_prices.csv # Historical stock data (1990-2017)
│ └── SP500.csv # S&P 500 index data
├── utils/ # Utility modules
│ ├── data_preprocessing.py # Data processing utilities
│ ├── model_evaluation.py # Performance evaluation
│ ├── trading_simulation.py # Trading strategy simulation
│ └── visualization.py # Visualization utilities
└── requirements.txt # Python dependencies
- LSTM Model: Test MAE of 0.002 (extremely low prediction error)
- NLP Model: 90% validation accuracy for stock direction prediction
- Trading Strategy: 10% improvement over buy-and-hold benchmark
- Sentiment features contribute significantly to prediction accuracy
- LSTM architecture effectively captures temporal patterns in stock prices
- Combined approach outperforms either technical or fundamental analysis used alone
- Technical Analysis: LSTM networks for time series forecasting
- Fundamental Analysis: NLP sentiment analysis in finance
- Behavioral Finance: News impact on market movements
- Quantitative Trading: Risk-adjusted performance metrics
- Data Collection: Multi-source financial and news data
- Preprocessing: Advanced cleaning and normalization
- Feature Engineering: Technical + fundamental feature fusion
- Model Development: Deep learning + machine learning ensemble
- Validation: Time-series aware cross-validation
- Backtesting: Realistic trading simulation
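
Time-series aware cross-validation means every fold trains only on observations that precede its test window. The sketch below shows one common scheme, an expanding training window; `expanding_window_splits` is a hypothetical helper, and the scheme itself is an assumption since the exact splitter used in this project is not stated here.

```python
def expanding_window_splits(n_samples, n_splits=3, test_size=None):
    """Return (train_indices, test_indices) pairs in chronological order.

    Each fold tests on the next block of observations and trains on
    everything before it, so no future data leaks into training.
    """
    if test_size is None:
        test_size = n_samples // (n_splits + 1)
    if test_size < 1:
        raise ValueError("not enough samples for the requested splits")

    splits = []
    for k in range(n_splits):
        test_start = n_samples - (n_splits - k) * test_size
        if test_start < 1:
            raise ValueError("training window would be empty")
        train_idx = list(range(0, test_start))
        test_idx = list(range(test_start, test_start + test_size))
        splits.append((train_idx, test_idx))
    return splits


for train_idx, test_idx in expanding_window_splits(10, n_splits=3):
    print("train:", train_idx, "test:", test_idx)
```

scikit-learn's `TimeSeriesSplit` provides the same style of forward-chaining splits for use with its estimators.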
- Significance Testing: Bootstrap confidence intervals
- Robustness Checks: Out-of-sample validation
- Risk Analysis: VaR, CVaR, maximum drawdown
- Benchmarking: Comparison with market indices
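
The risk and significance measures listed above can be illustrated with a short, standard-library sketch. The function names are hypothetical, the quantile rule is a simple empirical one, the bootstrap resamples returns independently (which ignores autocorrelation), and the sample returns are made-up numbers rather than project results.

```python
import random


def max_drawdown(returns):
    """Largest peak-to-trough decline of the compounded equity curve."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst


def historical_var(returns, alpha=0.95):
    """Empirical Value at Risk: the alpha-quantile of losses, as a positive number."""
    losses = sorted(-r for r in returns)
    idx = min(len(losses) - 1, int(alpha * len(losses)))
    return losses[idx]


def historical_cvar(returns, alpha=0.95):
    """Conditional VaR: mean of the losses at or beyond the VaR level."""
    var = historical_var(returns, alpha)
    tail = [-r for r in returns if -r >= var]
    return sum(tail) / len(tail)


def bootstrap_mean_ci(returns, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean return."""
    rng = random.Random(seed)
    n = len(returns)
    means = sorted(sum(rng.choices(returns, k=n)) / n for _ in range(n_boot))
    lo = means[int((1.0 - level) / 2.0 * n_boot)]
    hi = means[min(n_boot - 1, int((1.0 + level) / 2.0 * n_boot))]
    return lo, hi


# Illustrative daily returns.
sample = [0.02, -0.01, 0.03, -0.05, 0.01, -0.02, 0.04, -0.03, 0.00, 0.01]

mdd = max_drawdown(sample)
var_95 = historical_var(sample, alpha=0.95)
cvar_95 = historical_cvar(sample, alpha=0.95)
ci_low, ci_high = bootstrap_mean_ci(sample)
```

For daily return series with serial dependence, a block bootstrap is often preferred over the independent resampling shown here.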
python run_lstm_model.py \
--ticker AAPL \
--epochs 50 \
--batch_size 64 \
--sequence_length 5 \
--test_split 0.15 \
--save_model

python run_nlp_model.py \
--model_name "ProsusAI/finbert" \
--test_split 0.25 \
--confidence_threshold 0.8 \
--initial_capital 100000 \
--transaction_cost 0.005

python run_all_models.py \
--output_dir institutional_results \
--quick \
--skip_lstm # or --skip_nlp