The Telegram Snowball Sampling Tool is a Python-based utility designed for conducting comprehensive network analysis of Telegram channels through three main methods:
- Forwarded Messages - Automatically discovers channels through message forwards
- Channel Recommendations - Collects Telegram's built-in channel recommendations
- URL Extraction - Maps external connections by extracting URLs from messages
The tool creates detailed edge lists for network visualization and provides extensive analysis capabilities.
This tool implements multiple discovery methods to map the complex network structure of Telegram channels:
Snowball sampling discovers channels through forwarded messages, starting with a seed channel and expanding outward. This method identifies both the origin and dissemination paths of information, creating a directed network structure.
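The forward-based snowball procedure described above can be sketched as a breadth-first expansion. This is a minimal illustration, not the tool's actual implementation: the `snowball` function, the in-memory `forwards` mapping (a stand-in for live Telegram data), and the choice to point edges from the origin channel to the forwarding channel are all assumptions made for the example.

```python
from collections import Counter


def snowball(seeds, forwards, iterations=3, min_mentions=1):
    """Breadth-first snowball sampling over a forward map.

    `forwards` is a hypothetical dict mapping a channel to the list of
    channels it forwarded from (one entry per forwarded message).
    Returns the discovered channels and weighted directed edges.
    """
    discovered = set(seeds)
    frontier = list(seeds)
    edges = Counter()  # (origin, forwarder) -> number of forwards

    for _ in range(iterations):
        next_frontier = []
        for channel in frontier:
            mentions = Counter(forwards.get(channel, []))
            for origin, count in mentions.items():
                if count < min_mentions:
                    continue  # below the mention threshold, ignore
                edges[(origin, channel)] += count
                if origin not in discovered:
                    discovered.add(origin)
                    next_frontier.append(origin)
        frontier = next_frontier
        if not frontier:
            break  # nothing new to expand
    return discovered, edges


# Toy data: "seed" forwarded twice from "a" and once from "b".
forwards = {"seed": ["a", "a", "b"], "a": ["c"], "b": [], "c": ["seed"]}
channels, edge_weights = snowball(["seed"], forwards, iterations=2)
```

Raising `min_mentions` to 2 in this toy example keeps only the `a` to `seed` edge, which shows how the threshold prunes weakly connected channels.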
The tool leverages Telegram's built-in recommendation algorithm to discover topically related channels. This provides additional network insights beyond just forward relationships.
By capturing external URLs shared in messages, the tool maps connections between Telegram channels and external websites, providing a more comprehensive view of the information ecosystem.
A run of the Telegram Snowball Sampling Tool can take several days because the number of channels can grow exponentially with each iteration. For example, if each channel leads to 3 new ones, 3 channels in the first iteration become 9 in the second and 27 in the third.
- Limit Iterations: Keep to 3 iterations or fewer to balance depth and runtime
- Filter Forwards: Focus on channels with multiple mentions to target relevant content
- Limit Posts Per Channel: Set a reasonable maximum for posts to check per channel
- Adjust Feature Settings: Selectively enable/disable recommendations and URL extraction based on your needs
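To get a feel for how quickly a run can grow, the arithmetic above can be reproduced with a few lines of Python. This is only a rough sketch: `estimate_channels` is a hypothetical helper, and it assumes a constant branching factor, which real channels will not follow.

```python
def estimate_channels(seeds, branching, iterations):
    """Return the number of new channels per iteration,
    assuming every channel leads to `branching` new ones."""
    counts = []
    current = seeds
    for _ in range(iterations):
        current *= branching
        counts.append(current)
    return counts


# One seed, three new channels per channel, three iterations.
per_iteration = estimate_channels(seeds=1, branching=3, iterations=3)
print(per_iteration)       # [3, 9, 27]
print(sum(per_iteration))  # 39 channels discovered beyond the seed
```

Multiplying the total by the time needed to scan one channel gives a crude lower bound on runtime, which is why limiting iterations and posts per channel matters.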
- Automated discovery of Telegram channels through three methods:
  - Forwarded message tracking
  - Channel recommendations retrieval
  - URL extraction from messages
- Customizable parameters for depth, frequency thresholds, and scope
- Comprehensive edge list creation for network analysis
- Network visualization ready output for tools like Gephi
- Network metrics calculation and analysis
- Environment-based configuration system
- Detailed logging for monitoring progress
telegram-snowball-sampling/
├── src/
│   └── telegram_snowball_sampling/
│       ├── __init__.py # Package exports
│       ├── check_api_credentials.py # API credential checks
│       ├── config.py # Configuration manager
│       ├── csv_loader.py # CSV seed loader
│       ├── database.py # SQLite persistence
│       ├── edge_list.py # Handles edge list creation
│       ├── merge_csv_data.py # CSV merging utility
│       ├── recommendations.py # Channel recommendations module
│       └── utils.py # Utility functions
├── example.env # Template environment variables
├── .env # Your environment variables (created from example.env)
├── main.py # Main application script
├── scripts/ # CLI wrappers
│   ├── check_api_credentials.py
│   └── network_analysis.py
├── README.md # Project documentation
├── requirements.txt # Python dependencies
├── EdgeList/ # Created during execution - edge list files
├── merged/ # Created during execution - merged results
├── network_analysis/ # Created during analysis - network metrics
└── results/ # Created during execution - individual run results
- Python 3.10 or higher
- Telethon library
- NetworkX and Matplotlib libraries for analysis and visualization
- A registered Telegram application (for API credentials)
- All dependencies listed in requirements.txt
- Clone the repository:
git clone https://github.com/yourusername/telegram-snowball-sampling.git
cd telegram-snowball-sampling
- Create a virtual environment (optional but recommended):
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install the required dependencies:
pip install -r requirements.txt
The tool automatically creates a .env file from the template and prompts you for your Telegram API credentials when first run. You can also manually configure the following options in the .env file:
| Variable | Description | Default |
|---|---|---|
| TELEGRAM_API_ID | Your Telegram API ID | (required) |
| TELEGRAM_API_HASH | Your Telegram API Hash | (required) |
| TELEGRAM_SESSION_NAME | Name for the Telegram session | session_name |
| DEFAULT_MIN_MENTIONS | Minimum mentions threshold | 1 |
| DEFAULT_ITERATIONS | Number of iterations | 3 |
| DEFAULT_MAX_POSTS | Maximum posts to check per channel (set to none to scrape full channel) | none |
| DEFAULT_INCLUDE_FORWARDS | Whether to include forwarded message scraping | True |
| CHECKPOINT_INTERVAL | Iterations between resume checkpoints | 10 |
| DEFAULT_INCLUDE_RECOMMENDATIONS | Whether to include channel recommendations | True |
| DEFAULT_RECOMMENDATIONS_DEPTH | Maximum depth for recommendations | 2 |
| DEFAULT_INCLUDE_URLS | Whether to extract URLs from messages | True |
| RESULTS_FOLDER | Directory for storing results | results |
| MERGED_FOLDER | Directory for merged results | merged |
| EDGE_LIST_FOLDER | Directory for edge list files | EdgeList |
| EDGE_LIST_FILENAME | Name of the edge list file | Edge_List.csv |
| MERGED_FILENAME | Name of the merged file | merged_channels.csv |
| API_DETAILS_FILE | Backup file for API details | api_values.txt |
| DATA_FOLDER | Directory for SQLite data | data |
| DATABASE_FILENAME | SQLite database filename | telegram_snowball.sqlite |
| DEBUG | Enable debug logging | False |
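As an illustration of how settings like these might be read, here is a minimal sketch using only the standard library. It is not the tool's own config.py: the helper names (`parse_bool`, `parse_max_posts`, `load_settings`) are hypothetical, and it assumes the variables are already present in the process environment.

```python
import os


def parse_bool(value):
    """Interpret common truthy strings such as 'True' or '1'."""
    return str(value).strip().lower() in {"1", "true", "yes", "on"}


def parse_max_posts(value):
    """Return None for 'none' (scrape the full channel), else an int."""
    text = str(value).strip().lower()
    return None if text in {"", "none"} else int(text)


def load_settings(env=None):
    """Build a settings dict, falling back to the defaults in the table."""
    env = os.environ if env is None else env
    return {
        "iterations": int(env.get("DEFAULT_ITERATIONS", "3")),
        "min_mentions": int(env.get("DEFAULT_MIN_MENTIONS", "1")),
        "max_posts": parse_max_posts(env.get("DEFAULT_MAX_POSTS", "none")),
        "include_forwards": parse_bool(env.get("DEFAULT_INCLUDE_FORWARDS", "True")),
        "include_recommendations": parse_bool(
            env.get("DEFAULT_INCLUDE_RECOMMENDATIONS", "True")
        ),
        "include_urls": parse_bool(env.get("DEFAULT_INCLUDE_URLS", "True")),
        "debug": parse_bool(env.get("DEBUG", "False")),
    }


# Passing a plain dict makes the parsing easy to try without touching .env.
settings = load_settings({"DEFAULT_MAX_POSTS": "100", "DEFAULT_INCLUDE_URLS": "false"})
```

Treating `none` as "no limit" rather than as zero is the important detail here, since a limit of zero posts would skip every channel.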
Run the main script:
python main.py
The script will:
- Prompt for Telegram API credentials if not configured
- Ask for seed channels (comma-separated)
- Request parameters for iterations, minimum mentions, etc.
- Begin the data collection process using all enabled methods
- Save results to CSV and edge list files
- Offer to run network analysis on the collected data
If you are new to Telegram scraping, follow this checklist to get a first run working quickly.
- Create Telegram API credentials: Get them from https://my.telegram.org/auth and keep the API ID and API Hash ready.
- Create your .env: Copy example.env to .env and paste in your API ID and API Hash, or just run python main.py once and follow the prompt.
- Start the script:
python main.py
- Choose your seed input. Pick one of:
  - CSV file (recommended for multiple seeds)
  - Text file (.txt)
  - Comma-separated list
- Choose crawl mode:
  - Recommendations only (fast, no message scraping)
  - Recommendations + forwards (slower, more comprehensive)
- Set basic limits. If you chose forwards, you will be asked:
  - Minimum mentions (use 1 for broad discovery)
  - Max posts per channel (use none to scrape full channel, or a number like 100 to limit runtime)
- Optional settings: If recommendations are enabled, choose depth (start with 1 or 2). If forwards are enabled, choose whether to extract URLs.
- Find your results. Look in:
  - results/ for run CSVs and URL lists
  - EdgeList/ for Edge_List.csv
  - merged/ for consolidated channels
  - network_analysis/ for metrics and visualizations (if run)
If a run is interrupted, the tool can resume from the last checkpoint the next time you start it.
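The idea of resuming from a checkpoint can be sketched as follows. This is an illustration only: the project lists SQLite persistence, whereas this example uses a JSON file for brevity, and the function names and state fields are hypothetical.

```python
import json
import os


def save_checkpoint(path, iteration, frontier, discovered):
    """Write crawl state to disk, replacing any previous checkpoint."""
    state = {
        "iteration": iteration,
        "frontier": sorted(frontier),
        "discovered": sorted(discovered),
    }
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as handle:
        json.dump(state, handle)
    # Rename only after the write succeeds, so an interrupted
    # write never leaves a half-written checkpoint behind.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or None when starting fresh."""
    if not os.path.exists(path):
        return None
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

On startup, a crawler built this way would call `load_checkpoint` first and continue from the saved `frontier` instead of the original seeds when a state is found.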
- Forwarded Messages: Analyzes messages in each channel to find forwards from other channels, revealing information flow between channels.
- Channel Recommendations: Retrieves Telegram's own channel recommendations for each discovered channel. These recommendations come from Telegram's algorithm, which considers content similarity and user overlap.
- URL Extraction: Extracts all URLs shared in messages across channels, creating connections between Telegram channels and external websites.
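URL extraction of this kind can be approximated with a regular expression over message text. The sketch below is an assumption about how such a step might look, not the tool's code: `URL_PATTERN` and `extract_url_edges` are hypothetical names, and a simple pattern like this will miss some edge cases such as links without a scheme.

```python
import re
from urllib.parse import urlparse

# Match http(s) links up to the next whitespace, quote, or angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_url_edges(channel, messages):
    """Return (channel, domain, url) tuples for every link found."""
    edges = []
    for text in messages:
        for url in URL_PATTERN.findall(text or ""):
            url = url.rstrip(".,;:!?")  # drop trailing sentence punctuation
            domain = urlparse(url).netloc.lower()
            if domain:
                edges.append((channel, domain, url))
    return edges


sample = ["see https://example.com/a, and http://Test.org.", None, "no links"]
url_edges = extract_url_edges("chan", sample)
```

Grouping by domain rather than by full URL keeps the resulting network readable, since many distinct links usually point to the same few sites.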
The tool generates several outputs:
- Individual Run Results (in the results folder):
  - CSV files containing channel IDs, names, and usernames
  - URL lists from message content
- Edge List (in the EdgeList folder):
  - CSV file with network connections, including:
    - Forward relationships
    - Recommendation relationships
    - URL connections
  - Connection types and weights for advanced analysis
- Merged Results (in the merged folder):
  - Consolidated CSV with all unique channels found across multiple runs
- Network Analysis (in the network_analysis folder, when analysis is run):
  - Network metrics in Excel format
  - Gephi-compatible GEXF file for visualization
  - Basic network visualization image
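To make the edge list output more concrete, here is a small sketch of writing a weighted edge list to CSV. The column names (`source`, `target`, `type`, `weight`) and the `write_edge_list` helper are assumptions for illustration; the actual layout of Edge_List.csv may differ.

```python
import csv
from collections import Counter


def write_edge_list(path, edges):
    """Aggregate (source, target, connection_type) tuples into weighted rows.

    Repeated tuples are merged and their count becomes the edge weight.
    Returns the number of distinct edges written.
    """
    weights = Counter(edges)
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["source", "target", "type", "weight"])
        for (source, target, kind), weight in sorted(weights.items()):
            writer.writerow([source, target, kind, weight])
    return len(weights)
```

A file shaped like this, with one row per connection and a numeric weight column, can be imported into Gephi as an edge table.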
The included network analysis script (scripts/network_analysis.py) provides:
- Basic Network Metrics:
  - Node and edge counts
  - Network density
  - Connected components
  - Average path length
- Key Influencer Identification:
  - Top source channels (with most outgoing connections)
  - Top receiver channels (with most incoming connections)
- Connection Type Analysis:
  - Distribution of connection types (forwards vs. recommendations vs. URLs)
  - Weight distribution analysis
- Visualization:
  - Gephi-compatible GEXF file
  - Basic visualization image
  - Network metrics in Excel format
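A few of the metrics listed above can be computed by hand, which helps when interpreting the analysis output. The sketch below uses only the standard library and is not the project's analysis script; `basic_metrics` is a hypothetical helper, and it covers only node and edge counts, directed density, and the top sources and receivers.

```python
from collections import Counter


def basic_metrics(edges, top_n=3):
    """Compute simple metrics for a directed graph given (source, target) pairs."""
    # Ignore duplicate edges and self-loops for the structural metrics.
    unique_edges = {(s, t) for s, t in edges if s != t}
    nodes = {node for edge in unique_edges for node in edge}
    n, m = len(nodes), len(unique_edges)
    # Directed density: existing edges over all possible ordered pairs.
    density = m / (n * (n - 1)) if n > 1 else 0.0
    out_degree = Counter(s for s, _ in unique_edges)
    in_degree = Counter(t for _, t in unique_edges)
    return {
        "nodes": n,
        "edges": m,
        "density": density,
        "top_sources": out_degree.most_common(top_n),
        "top_receivers": in_degree.most_common(top_n),
    }


metrics = basic_metrics([("a", "b"), ("a", "c"), ("b", "c"), ("a", "b")])
```

In this toy graph, `a` is the top source with two outgoing connections and `c` is the top receiver with two incoming ones. Connected components and average path length need graph traversal and are better left to a library such as NetworkX.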
Run network analysis separately:
python scripts/network_analysis.py --edge-list EdgeList/Edge_List.csv --output-dir network_analysis
For advanced network visualization:
- Download and install Gephi
- Import the GEXF file from the network_analysis folder
- Apply layouts like ForceAtlas2 to organize the network
- Style nodes based on metrics like degree or betweenness
- Run community detection algorithms to identify clusters
A detailed guide is created in the results folder after each run.
This tool is for educational and research purposes only. Please ensure that you comply with Telegram's terms of service and respect privacy and ethical guidelines when using this tool.
Contributions are welcome! Please feel free to submit a Pull Request.
- Add language detection for message content filtering
- Implement community detection algorithms
- Add multi-API parallel processing for improved performance
- Create live network visualization capabilities
