🌾 Optimizing Crop Selection Using Machine Learning for Sustainable Agriculture

📘 Project Overview

This project leverages machine learning to recommend the most suitable crop for a given set of soil and environmental conditions. The system promotes sustainable agriculture by enhancing crop yields, reducing input waste, and improving decision-making for farmers. It is particularly useful in regions where access to agronomic expertise and resources is limited.

🌍 Societal and Industrial Impact

Provides a data-driven solution for farmers to choose the right crop based on soil and climate.
Helps in resource optimization: water, fertilizers, and land can be used more efficiently.
Aids in climate-resilient agriculture, important for regions vulnerable to environmental changes.
Can be used by agricultural departments or agri-tech companies to build advisory systems.

🎯 Problem Statement

Farmers often make crop selection decisions based on tradition or intuition, which can lead to poor yields. There is a need for a scientific, automated tool to determine the most appropriate crop based on real-time soil and environmental data.

🔍 Research Questions

What is the most suitable crop for a given combination of soil and climatic parameters?
Why is it important to guide crop selection using data?
How can machine learning models improve crop selection accuracy and sustainability?

🧠 Contributions

Built an interactive EDA tool to summarize and compare crop requirements.
Used K-Means clustering to identify natural groupings in crop types.
Trained a Logistic Regression classifier to predict the optimal crop for a given set of conditions.
Visualized crop classification performance using a confusion matrix.
Extracted insights about seasonal crops and specific nutrient requirements.

📦 Dataset

Type: Secondary
Dataset: agricultural_production_optimization.csv
The dataset includes the following features:
- N (Nitrogen content in the soil)
- P (Phosphorus content)
- K (Potassium content)
- Temperature (In Celsius)
- Humidity (In percentage)
- pH (pH value of the soil)
- Rainfall (In mm)
- Label (Crop name - target variable)
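A minimal sketch of loading the file and checking its schema, assuming the CSV header uses the lowercase column names shown below (the real header may differ in spelling or capitalisation). The helper `load_crop_data` and the two sample rows are hypothetical and exist only so the example runs without the real file.

```python
import io

import pandas as pd

# Assumed column names, based on the feature list above.
EXPECTED_COLUMNS = ["N", "P", "K", "temperature", "humidity", "ph", "rainfall", "label"]


def load_crop_data(source):
    """Read the CSV and fail early if an expected column is missing."""
    df = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df


# Two made-up rows stand in for agricultural_production_optimization.csv.
sample_csv = io.StringIO(
    "N,P,K,temperature,humidity,ph,rainfall,label\n"
    "90,40,40,21.0,80.0,6.5,200.0,crop_a\n"
    "20,60,20,19.0,20.0,5.5,100.0,crop_b\n"
)
df = load_crop_data(sample_csv)

# Count of missing values per column, the check described under data cleaning.
null_counts = df.isnull().sum()
print(null_counts)
```

With the real file, the call would be `load_crop_data("agricultural_production_optimization.csv")`.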

🛠 Methodology

Data Loading & Cleaning:
- Checked for null values and ensured data integrity.
Exploratory Data Analysis (EDA):
- Used interactive widgets to explore crop-specific requirements for nutrients, temperature, humidity, etc.
- Compared average requirements across crops.
Clustering with K-Means:
- Grouped crops into clusters based on environmental and soil needs.
Supervised Learning:
- Applied Logistic Regression for crop prediction.
Model Evaluation:
- Calculated accuracy, precision, recall, F1-score.
- Visualized performance using a confusion matrix.
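The "compared average requirements across crops" step can be approximated with a pandas group-by. This is a sketch, not the project's notebook code: the column names are assumed and the four rows are invented placeholders.

```python
import pandas as pd

# Made-up measurements for two placeholder crops; not taken from the dataset.
df = pd.DataFrame({
    "N": [80, 90, 20, 30],
    "rainfall": [200.0, 220.0, 100.0, 110.0],
    "label": ["crop_a", "crop_a", "crop_b", "crop_b"],
})
features = ["N", "rainfall"]

# Average requirement per crop: one row per crop, one column per feature.
crop_means = df.groupby("label")[features].mean()

# How far each crop's average sits from the average over all samples.
relative_to_overall = crop_means - df[features].mean()

print(crop_means)
print(relative_to_overall)
```

Positive values in `relative_to_overall` mark crops that need more of a resource than the average sample, which is the kind of comparison the interactive widgets expose.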

🤖 ML Models Used

K-Means Clustering: for unsupervised pattern discovery among crops.
Logistic Regression: for supervised crop prediction.

❓ Why These Models?

K-Means provides useful insights into natural groupings of crop types, beneficial for segmentation.
Logistic Regression offers simplicity, interpretability, and reasonable performance for multiclass classification problems.
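To illustrate the interpretability argument, the sketch below fits a multiclass Logistic Regression and inspects its coefficient matrix. The data is synthetic (three placeholder crops, seven features), and the `StandardScaler` step and `max_iter=1000` are assumptions of this example; the project itself reports using default settings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the dataset: 3 placeholder crops, 60 samples each,
# 7 features playing the role of N, P, K, temperature, humidity, pH, rainfall.
centers = rng.uniform(0, 100, size=(3, 7))
X = np.vstack([center + rng.normal(0, 5, size=(60, 7)) for center in centers])
y = np.repeat(["crop_a", "crop_b", "crop_c"], 60)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Scaling is an addition of this sketch to help the solver converge.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# One row of coefficients per crop and one column per feature, so each
# weight can be read as "how this feature pushes toward this crop".
coefficients = model.named_steps["logisticregression"].coef_
print(coefficients.shape)
```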

📏 Evaluation Metrics

For model evaluation, the following metrics were used:

Accuracy: Overall prediction correctness.
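Precision (Weighted): Share of predictions for each crop that are correct, averaged across crops and weighted by the number of true samples per crop.
Recall (Weighted): Share of each crop's true samples that the model identifies, averaged with the same weights.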
F1 Score (Weighted): Harmonic mean of precision and recall.
Confusion Matrix: Visual tool to identify misclassified crops.
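The metrics above can be computed with scikit-learn as sketched here. The six labels are toy values chosen so the arithmetic is easy to check by hand; they are not project results.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_recall_fscore_support,
)

# Toy ground truth and predictions for three placeholder crops.
y_true = ["crop_a", "crop_a", "crop_a", "crop_b", "crop_b", "crop_c"]
y_pred = ["crop_a", "crop_a", "crop_b", "crop_b", "crop_b", "crop_c"]

accuracy = accuracy_score(y_true, y_pred)

# average="weighted" weights each crop's score by its number of true samples.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0
)

# Rows are true crops, columns are predicted crops, in the order given.
matrix = confusion_matrix(y_true, y_pred, labels=["crop_a", "crop_b", "crop_c"])

print(accuracy, precision, recall, f1)
print(matrix)
```

Weighted recall is always equal to accuracy, because weighting each crop's recall by its support simply counts all correct predictions over all samples.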

Results for This Model

Accuracy: 0.9681818181818181
Precision: 0.9699452867394045
Recall: 0.9681818181818181
F1 Score: 0.9681168080082031

⚙️ Hyperparameter Tuning

Basic parameters for KMeans:
- n_clusters = 4
- init = 'k-means++'
- max_iter = 300
- random_state = 0
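A sketch of how these KMeans parameters could be passed to scikit-learn. The feature matrix is random placeholder data, and `n_init=10` is an assumption added here to keep behaviour stable across scikit-learn versions; it is not one of the listed settings.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Placeholder feature matrix: 200 samples, 7 soil and climate features.
X = rng.uniform(0, 100, size=(200, 7))

kmeans = KMeans(
    n_clusters=4,
    init="k-means++",
    max_iter=300,
    n_init=10,  # assumption of this sketch, not stated in the project
    random_state=0,
)

# One cluster id (0 to 3) per sample.
cluster_ids = kmeans.fit_predict(X)
print(kmeans.cluster_centers_.shape)
```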

Logistic Regression was used with default settings for initial evaluation. Further improvements could include:

GridSearchCV for hyperparameter tuning
Trying more advanced classifiers (e.g., Random Forest, XGBoost)
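One possible shape for the GridSearchCV idea is sketched below. The grid over `C`, the five-fold split, the `f1_weighted` scoring choice, and the synthetic data are all assumptions of this example rather than settings taken from the project.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data: 3 placeholder crops, 40 samples each, 7 features.
centers = rng.uniform(0, 100, size=(3, 7))
X = np.vstack([center + rng.normal(0, 5, size=(40, 7)) for center in centers])
y = np.repeat(["crop_a", "crop_b", "crop_c"], 40)

pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hypothetical grid over the inverse regularisation strength C.
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(pipeline, param_grid, scoring="f1_weighted", cv=5)
search.fit(X, y)

best_c = search.best_params_["clf__C"]
print(best_c, search.best_score_)
```

Keeping the scaler inside the pipeline means it is refit on each training fold, so the cross-validated score is not inflated by information from the held-out fold.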

🔍 Findings from Clustering

Crops were grouped into four natural clusters based on their resource requirements. Each cluster revealed a different environmental preference, helping identify patterns for:

Low-Nutrient Crops
High-Rainfall Crops
High-Temperature Crops
Balanced Crops

🧠 Reflection and Argument

Strengths:

Clear methodology with interactive analysis
Practical real-world application for farmers
Easy-to-understand model (Logistic Regression)

Limitations:

Logistic Regression may underperform with complex, non-linear data
Dataset may not account for regional crop constraints or pests

Future Work:

Integrate satellite and real-time IoT sensor data
Incorporate regional constraints and climate anomalies
Extend to multi-crop recommendation and yield prediction

📄 Report Submission

Optimizing Crop Selection Using Machine Learning for Sustainable Agriculture

👩‍🔬 Authors - Vanny Sothea
📆 Date of Submission - June 3, 2025

📘 Abstract

This project aims to support sustainable agricultural practices by leveraging machine learning to recommend the most suitable crop for specific environmental and soil conditions. By analyzing features such as nitrogen (N), phosphorus (P), potassium (K), temperature, humidity, pH, and rainfall, a classification model can predict optimal crop types. The resulting tool helps farmers improve crop yield, minimize resource waste, and contribute to global food security.

🧠 Keywords

Sustainable agriculture, crop recommendation, machine learning, logistic regression, clustering, agricultural optimization

📌 Objectives

To analyze the relationship between soil and climate factors and crop suitability
To develop a machine learning-based model for accurate crop prediction
To help farmers make data-driven decisions for crop selection

🧪 Methodology Summary

Dataset Source: Public CSV dataset of environmental and soil metrics
Tools & Libraries: Python, Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn, ipywidgets
Models Used: K-Means Clustering for unsupervised crop grouping, Logistic Regression for classification
Evaluation Metrics: Accuracy, Precision, Recall, F1 Score, Confusion Matrix

🧾 Results Summary

Logistic Regression achieved high performance with:
- Accuracy: ~0.97
- Precision: ~0.97
- Recall: ~0.97
- F1 Score: ~0.97
Clustering revealed four distinct crop groups based on shared environmental requirements
Visualization tools and interactive widgets provide valuable insights for agronomists and farmers

🧠 Key Insights

Different crops have significantly different environmental needs.
Crops like rice and cotton require high nitrogen, while crops like apple and orange thrive at low temperatures.
Logistic regression can be effectively used in multiclass classification for crop recommendation.

💡 Future Work

Integration of real-time weather and soil sensor data
Expansion to region-specific crop datasets
Implementation of a mobile/web application for farmer access
Experimentation with ensemble models (Random Forest, XGBoost, etc.)

🔗 Code Access

Google Colab: Optimizing_Crop_Selection.ipynb
