Commit 2e7dafd

Merge pull request #66 from duyet/claude/ultrathink-vision-014YTsREnUYKNswgDtwbsgQc
feat: establish project vision and craftsmanship principles
2 parents 233b5e5 + 0a2f788 commit 2e7dafd

11 files changed, +2580 -61 lines changed

.github/workflows/blank.yml

Lines changed: 0 additions & 17 deletions
This file was deleted.

.github/workflows/validate.yml

Lines changed: 148 additions & 0 deletions
@@ -0,0 +1,148 @@
+name: Quality Assurance
+
+on:
+  push:
+    branches: [ main, master, 'claude/**' ]
+  pull_request:
+    branches: [ main, master ]
+
+jobs:
+  validate:
+    name: Validate Wordlists
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.11'
+
+      - name: Run validation suite
+        run: |
+          echo "🔍 Running wordlist validation..."
+          python3 scripts/validate.py
+
+      - name: Check for encoding issues
+        run: |
+          echo "📝 Checking file encodings..."
+          if file *.txt *.lst 2>/dev/null | grep -v "UTF-8\|ASCII"; then
+            echo "⚠️ Warning: Non-UTF-8 files detected"
+          else
+            echo "✓ All files are UTF-8 or ASCII"
+          fi
+
+      - name: Generate statistics
+        run: |
+          echo "📊 Generating statistics..."
+          echo "Total wordlist files: $(find . -type f \( -name "*.txt" -o -name "*.lst" \) ! -path "./.git/*" | wc -l)"
+          echo "Total size: $(du -sh . | cut -f1)"
+          echo "Total lines: $(find . -type f \( -name "*.txt" -o -name "*.lst" \) ! -path "./.git/*" -exec wc -l {} + | tail -1 | awk '{print $1}')"
+
+      - name: Upload manifest
+        uses: actions/upload-artifact@v4
+        with:
+          name: wordlist-manifest
+          path: manifest.json
+          retention-days: 30
+
+      - name: Validate manifest
+        run: |
+          if [ -f manifest.json ]; then
+            echo "✓ Manifest generated successfully"
+            cat manifest.json | python3 -m json.tool > /dev/null
+            echo "✓ Manifest is valid JSON"
+          else
+            echo "✗ Manifest generation failed"
+            exit 1
+          fi
+
+  security:
+    name: Security Checks
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      - name: Check for sensitive data
+        run: |
+          echo "🔒 Scanning for potential sensitive data patterns..."
+
+          # Check for API keys, tokens, etc.
+          if grep -r -i -E "(api[_-]?key|secret[_-]?key|password|token|bearer)" *.txt *.lst 2>/dev/null | grep -v "password" | head -5; then
+            echo "⚠️ Warning: Potential sensitive data patterns found"
+            echo "⚠️ Please review carefully"
+          else
+            echo "✓ No obvious sensitive data patterns detected"
+          fi
+
+      - name: Verify file sizes
+        run: |
+          echo "📏 Checking for unexpectedly large files..."
+          find . -type f \( -name "*.txt" -o -name "*.lst" \) ! -path "./.git/*" -size +100M -exec ls -lh {} \; | while read line; do
+            echo "⚠️ Large file detected: $line"
+          done || echo "✓ All files within reasonable size limits"
+
+  integrity:
+    name: Integrity Verification
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.11'
+
+      - name: Verify file integrity
+        run: |
+          echo "🔐 Verifying file integrity..."
+          python3 << 'PYTHON_SCRIPT'
+          import os
+          import sys
+          from pathlib import Path
+
+          corrupted = []
+          checked = 0
+
+          for ext in ['*.txt', '*.lst']:
+              for filepath in Path('.').rglob(ext):
+                  if '.git' in filepath.parts:
+                      continue
+
+                  checked += 1
+                  try:
+                      with open(filepath, 'rb') as f:
+                          content = f.read()
+                          # Check for null bytes (binary corruption)
+                          if b'\x00' in content:
+                              corrupted.append(str(filepath))
+                  except Exception as e:
+                      print(f"⚠️ Error reading {filepath}: {e}")
+                      corrupted.append(str(filepath))
+
+          print(f"Checked {checked} files")
+
+          if corrupted:
+              print(f"✗ Corrupted files detected ({len(corrupted)}):")
+              for f in corrupted[:10]:
+                  print(f" - {f}")
+              sys.exit(1)
+          else:
+              print("✓ No corrupted files detected")
+          PYTHON_SCRIPT
+
+      - name: Check line endings
+        run: |
+          echo "📄 Checking line endings consistency..."
+          if find . -type f \( -name "*.txt" -o -name "*.lst" \) ! -path "./.git/*" -exec file {} \; | grep -i "CRLF" | head -5; then
+            echo "⚠️ Warning: Windows line endings (CRLF) detected"
+            echo "⚠️ Consider normalizing to Unix (LF) for consistency"
+          else
+            echo "✓ Line endings are consistent"
+          fi
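
A note on the shell conditionals in this workflow: in `Check for sensitive data` and `Check line endings`, the `if` tests a pipeline that ends in `head -5`. Unless the step's shell has `pipefail` enabled, a pipeline's exit status is that of its last command, and `head` exits 0 even on empty input, so the warning branch would run whether or not anything matched. Likewise, in `Verify file sizes` the `|| echo` fallback fires only if the `while` loop fails, which it does not when `find` prints nothing. One way to sidestep both is to collect the matches first and branch on whether the list is empty. The following is a minimal Python sketch of the line-ending and file-size checks under that approach; the function names, the `root` parameter, and the `limit` default are illustrative assumptions, not code from this commit.

```python
from pathlib import Path

WORDLIST_SUFFIXES = {".txt", ".lst"}   # the extensions the workflow scans
DEFAULT_LIMIT = 100 * 1024 * 1024      # approximates `find -size +100M`


def wordlist_files(root="."):
    """Return wordlist files under root, skipping anything inside .git."""
    root = Path(root)
    return [
        path
        for path in sorted(root.rglob("*"))
        if path.is_file()
        and path.suffix in WORDLIST_SUFFIXES
        and ".git" not in path.relative_to(root).parts
    ]


def crlf_files(root="."):
    """Return files that contain at least one CRLF sequence."""
    return [p for p in wordlist_files(root) if b"\r\n" in p.read_bytes()]


def large_files(root=".", limit=DEFAULT_LIMIT):
    """Return files whose size in bytes exceeds limit."""
    return [p for p in wordlist_files(root) if p.stat().st_size > limit]


def report(root="."):
    """Print exactly one warning or one success line per check."""
    crlf = crlf_files(root)
    if crlf:
        print(f"Warning: CRLF line endings in {len(crlf)} file(s)")
    else:
        print("Line endings are consistent")

    large = large_files(root)
    if large:
        print(f"Warning: {len(large)} file(s) larger than the limit")
    else:
        print("All files within reasonable size limits")
```

Calling `report()` from the repository root prints one line per check. Because each branch depends on an explicit list rather than on a pipeline's exit status, the success message appears only when nothing was found.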

.gitignore

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+
+# Virtual environments
+venv/
+ENV/
+env/
+
+# IDE
+.vscode/
+.idea/
+*.swp
+*.swo
+*~
+
+# OS
+.DS_Store
+Thumbs.db
+
+# Temporary files
+*.tmp
+*.bak
+*.log

CHANGELOG.md

Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
+# Changelog
+
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [Unreleased]
+
+### Added
+- **Automation & Quality Control**
+  - Python validation tool (`scripts/validate.py`) for comprehensive wordlist validation
+  - Deduplication tool (`scripts/deduplicate.py`) for removing duplicates
+  - Real CI/CD pipeline with quality assurance checks
+  - Manifest generation system for metadata tracking
+  - Security scanning for sensitive data patterns
+  - Integrity verification for file corruption detection
+
+- **Documentation**
+  - CLAUDE.md - Project philosophy and guiding principles
+  - CONTRIBUTING.md - Comprehensive contribution guidelines
+  - CHANGELOG.md - This file, tracking all changes
+  - Enhanced README with usage examples and decision matrices
+
+- **Infrastructure**
+  - GitHub Actions workflows for automated validation
+  - Metadata framework for tracking wordlist provenance
+  - Statistics generation on every commit
+
+### Changed
+- **GitHub Actions**: Replaced placeholder "Hello, world!" workflow with meaningful validation suite
+- **Quality Standards**: Established encoding, format, and validation requirements
+- **Project Organization**: Defined clear directory structure and naming conventions
+
+### Improved
+- **Documentation**: Transformed from basic catalog to comprehensive guide
+- **Validation**: Automated checks for encoding, duplicates, and integrity
+- **Community**: Clear guidelines for ethical use and contribution
+
+### Philosophy
+This update represents a transformation from a **static archive** to a **living toolkit**. We're not just storing wordlists—we're curating them with intelligence, validating them automatically, and documenting them thoroughly.
+
+---
+
+## [1.0.0] - 2017-10-15
+
+### Added
+- Forced-browsing wordlists by @danivijay
+  - Comprehensive directory/file discovery lists
+  - Categorized by type (Conf, Database, Language, Project)
+  - Contextual paths (admin, test, debug, error)
+- Cain.txt password list (306,706 entries)
+
+### Summary
+Last major content update before entering maintenance mode. Established the core collection that has served the security community for years.
+
+---
+
+## [Historical] - 2015-2017
+
+### Initial Collection (2015-2016)
+- 2.1M password list from dazzlepod.com
+- Facebook first names dataset (4.3M entries)
+- Bitcoin brainwallet dictionary (394,748 words)
+- US cities and usernames collections
+- SecLists password compilation (1M entries)
+- SKTorrent username and password lists
+- Filtered password sets (7+ and 8+ character requirements)
+- Indonesian cities list
+- 10,000 common subdomains
+
+### Contributors
+Special thanks to all contributors who built this collection:
+- Van-Duyet Le (@duyet) - Project creator and primary maintainer
+- Taufiq Sumadi (@taufiqsumadi)
+- San Sayidul Akdam Augusta (@sanAkdam)
+- Dani Vijay (@danivijay) - Forced-browsing wordlists
+
+---
+
+## Future Roadmap
+
+### Planned Improvements
+- [ ] Reorganize directory structure for better navigation
+- [ ] Add compressed versions (.gz) for large files
+- [ ] Implement wordlist effectiveness metrics
+- [ ] Create specialized subsets (top 100, top 1000, etc.)
+- [ ] Add modern password patterns (passphrases, emoji passwords)
+- [ ] Integrate with breach databases for automatic updates
+- [ ] Build web interface for searching and filtering
+- [ ] Create comparison matrices for choosing the right wordlist
+- [ ] Add localized wordlists for non-English passwords
+
+### Community Requests
+Have a suggestion? [Open an issue](https://github.com/duyet/bruteforce-database/issues) or start a discussion!
+
+---
+
+## Versioning Strategy
+
+We use semantic versioning:
+- **MAJOR**: Significant reorganization or breaking changes
+- **MINOR**: New wordlists or major improvements
+- **PATCH**: Updates to existing wordlists or documentation
+
+Current version reflects the **quality transformation**, not just content updates.
+
+---
+
+*"The only way to do great work is to love what you do." - Steve Jobs*
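
The changelog above lists a deduplication tool, `scripts/deduplicate.py`, but its source is not shown in this excerpt. As a rough illustration of what deduplicating a wordlist can involve, here is a minimal order-preserving sketch. The function name, the choice to drop empty lines, and the exact-match comparison (case-sensitive, no normalization) are assumptions made for illustration, not the behavior of the actual script.

```python
def deduplicate(lines):
    """Return entries in first-seen order with exact duplicates removed."""
    seen = set()
    unique = []
    for line in lines:
        entry = line.rstrip("\r\n")  # treat LF- and CRLF-terminated copies alike
        if entry and entry not in seen:  # assumption: empty lines are dropped
            seen.add(entry)
            unique.append(entry)
    return unique


sample = ["admin\n", "root\n", "admin\r\n", "Admin\n", "\n"]
print(deduplicate(sample))  # ['admin', 'root', 'Admin']
```

Keeping first-seen order can matter for lists ranked by frequency, where an entry's position carries information that a plain `sorted(set(...))` would discard.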
