This repository was archived by the owner on Aug 29, 2025. It is now read-only.

Commit ceba744

Version 3.1.3: Fix ChatterBox character switching crashes with short text segments
- Fixed ChatterBox sequential generation bug causing CUDA tensor indexing crashes
- Added dynamic space padding for short text segments in character switching mode
- Space padding preserves speech quality while providing sufficient tokens
- Improved version bump script to prevent downgrade attempts
1 parent 963556d commit ceba744

10 files changed: +174 -11 lines changed

CHANGELOG.md

Lines changed: 7 additions & 0 deletions
@@ -5,6 +5,13 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [3.1.3] - 2025-07-18
+
+### Fixed
+
+- ChatterBox character switching crashes with short text segments by implementing dynamic space padding
+- Sequential generation CUDA tensor indexing errors in character switching mode
+- Version bump script now prevents downgrade attempts
 ## [3.1.2] - 2025-07-17
 
 ### Added

README.md

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@
 [![Forks][forks-shield]][forks-url]
 [![Dynamic TOML Badge][version-shield]][version-url]
 
-# ComfyUI ChatterBox SRT Voice (diogod) v3.1.2
+# ComfyUI ChatterBox SRT Voice (diogod) v3.1.3
 
 *This is a refactored node, originally created by [ShmuelRonen](https://github.com/ShmuelRonen/ComfyUI_ChatterBox_Voice).*
 
chatterbox_srt/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@
 """
 
 # Version info
-__version__ = "3.1.2"
+__version__ = "3.1.3"
 __author__ = "Diogod"
 
 # Import the new SRT modules

core/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@
 """
 
 # Version info
-__version__ = "3.1.2"
+__version__ = "3.1.3"
 __author__ = "Diogod"
 
 # Make imports available at package level

docs/test_cases.txt

Lines changed: 126 additions & 0 deletions
@@ -0,0 +1,126 @@
+CHATTERBOX CHARACTER SWITCHING BUG TEST CASES
+============================================
+
+Test 1 fails, log:
+
+📦 Loading local ChatterBox models from: J:\stablediffusion1111s2\Data\Packages\ComfyUIPy129\ComfyUI\models\chatterbox
+input frame rate=25
+loaded PerthNet (Implicit) at step 250,000
+✅ Successfully loaded all local ChatterBox models
+🎭 ChatterBox: Character switching mode - found characters: narrator, female_01, male_01
+🔄 Using main voice for character 'narrator' (not found in voice folders)
+🎭 Using character voice for 'female_01'
+🎭 Using character voice for 'male_01'
+🎤 Generating ChatterBox segment 1/6 chunk 1/1 for 'narrator'...
+Sampling: 0%| | 0/1000 [00:00<?, ?it/s]We detected that you are passing `past_key_values` as a tuple of tuples. This is deprecated and will be removed in v4.47. Please convert your cache or use an appropriate `Cache` class (https://huggingface.co/docs/transformers/kv_cache#legacy-cache-format)
+Sampling: 5%|███▋ | 52/1000 [00:01<00:32, 28.77it/s]
+🎤 Generating ChatterBox segment 2/6 chunk 1/1 for 'female_01'...
+Sampling: 2%|█▌ | 23/1000 [00:00<00:33, 28.86it/s]
+🎤 Generating ChatterBox segment 3/6 chunk 1/1 for 'male_01'...
+Reference mel length is not equal to 2 * reference token length.
+
+Sampling: 3%|█▊ | 26/1000 [00:00<00:33, 28.68it/s]
+🎤 Generating ChatterBox segment 4/6 chunk 1/1 for 'narrator'...
+Sampling: 2%|█▌ | 22/1000 [00:00<00:33, 29.33it/s]
+🎤 Generating ChatterBox segment 5/6 chunk 1/1 for 'female_01'...
+Sampling: 2%|█▎ | 18/1000 [00:00<00:33, 28.99it/s]
+🎤 Generating ChatterBox segment 6/6 chunk 1/1 for 'narrator'...
+Sampling: 4%|███ | 43/1000 [00:01<00:32, 29.13it/s]
+C:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1553: block: [40,0,0], thread: [0,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
+C:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src\ATen\native\cuda\Indexing.cu:1553: block: [40,0,0], thread: [1,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
+
+
+
+Test Case 2: Question Mark Focus (Target: punctuation)
+------------------------------------------------------
+This is a test.
+[Alice] Really?
+[Bob] Why not?
+What do you think?
+[Alice] Maybe?
+Final words.
+
+Test Case 3: Very Short Segments (Target: minimal text)
+-------------------------------------------------------
+Start.
+[Alice] Ok.
+[Bob] No.
+Yes?
+[Alice] Go.
+End.
+
+Test Case 4: Mixed Long/Short (Target: length variation)
+-------------------------------------------------------
+This is a longer introduction that should work fine without issues.
+[Alice] Short.
+[Bob] This is a much longer segment that might work better than short ones.
+Brief?
+[Alice] Another very long segment that contains multiple sentences and should be processed without the same issues.
+Done.
+
+Test Case 5: Exact Position Test (Target: 5th segment)
+------------------------------------------------------
+Segment one here.
+[Alice] Segment two here.
+[Bob] Segment three here.
+[Alice] Segment four here.
+This is segment five.
+[Bob] Segment six here.
+Final segment.
+
+Test Case 6: Character Switching Pattern (Target: same pattern as bug)
+----------------------------------------------------------------------
+Opening statement.
+[crestfallen_original] Character line.
+[Girl] Another character.
+[crestfallen_original] Second time.
+Back to narrator.
+[Bob] Different character.
+Closing statement.
+
+Test Case 7: Special Characters & Punctuation
+---------------------------------------------
+Hello there!
+[Alice] What's this?
+[Bob] It's... complicated.
+Really?!
+[Alice] Yes—exactly that.
+The end.
+
+Test Case 8: Empty/Whitespace Lines
+-----------------------------------
+First line.
+[Alice] Second line.
+
+[Bob] After empty line.
+Another gap coming.
+
+Final line.
+
+Test Case 9: Single Words (Target: minimal content)
+---------------------------------------------------
+Beginning.
+[Alice] Word.
+[Bob] Another.
+Question?
+[Alice] Answer.
+Conclusion.
+
+Test Case 10: Exact Recreation (Target: original crash)
+-------------------------------------------------------
+Hello! This is the first subtitle. I'll make it long on purpose.
+[crestfallen_original] This is Long?!
+
+[Girl]This is the second [crestfallen_original] subtitle with precise timing.
+Back to me?
+
+[Bob] The audio will match these exact timings.
+
+Back to me again? This looks like a meeees...
+
+INSTRUCTIONS:
+- Test each case separately
+- Note which segment number crashes (if any)
+- Record any "Reference mel length" warnings
+- Try with same characters: crestfallen_original, Girl (maps to female_01), Bob (maps to male_01)
+- Look for patterns in crashes (position, text length, punctuation, etc.)
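
When looking for text-length patterns across these cases, it can help to list which segments fall under the 35-character threshold used by the fix. The sketch below is only an illustration: `split_segments` and `short_segments` are hypothetical helpers, and the tag-parsing rule (a `[Name]` tag switches the speaker, each untagged line belongs to the narrator) is an assumption, not the project's actual parser.

```python
import re
from typing import List, Tuple

# Hypothetical tag pattern: a character name in square brackets, e.g. "[Alice]".
TAG = re.compile(r"\[([^\]]+)\]")


def split_segments(script: str, default: str = "narrator") -> List[Tuple[str, str]]:
    """Split a test script into (character, text) segments (assumed rules)."""
    segments = []
    for line in script.splitlines():
        if not line.strip():
            continue  # skip empty lines
        # re.split with one capture group yields [text, tag, text, tag, text, ...]
        parts = TAG.split(line)
        if parts[0].strip():
            segments.append((default, parts[0].strip()))
        for tag, text in zip(parts[1::2], parts[2::2]):
            if text.strip():
                segments.append((tag, text.strip()))
    return segments


def short_segments(script: str, min_length: int = 35) -> List[Tuple[int, str, str]]:
    """Return (position, character, text) for segments shorter than min_length."""
    return [
        (position, character, text)
        for position, (character, text) in enumerate(split_segments(script), start=1)
        if len(text) < min_length
    ]


case_3 = "Start.\n[Alice] Ok.\n[Bob] No.\nYes?\n[Alice] Go.\nEnd."
for position, character, text in short_segments(case_3):
    print(f"segment {position} ({character}): {len(text)} chars")
```

Running it on each test case gives a quick list of the segments that would be padded, which can then be compared against the segment number where a crash was observed.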

nodes.py

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 # Version and constants
-VERSION = "3.1.2"
+VERSION = "3.1.3"
 IS_DEV = False # Set to False for release builds
 VERSION_DISPLAY = f"v{VERSION}" + (" (dev)" if IS_DEV else "")
 SEPARATOR = "=" * 70

nodes/tts_node.py

Lines changed: 31 additions & 1 deletion
@@ -99,6 +99,32 @@ def INPUT_TYPES(cls):
     def __init__(self):
         super().__init__()
         self.chunker = ImprovedChatterBoxChunker()
+
+    def _pad_short_text_for_chatterbox(self, text: str, min_length: int = 35) -> str:
+        """
+        Pad short text with spaces to prevent ChatterBox sequential generation crashes.
+
+        ChatterBox has a bug where short text segments cause CUDA tensor indexing errors
+        in sequential generation scenarios. Adding spaces provides sufficient tokens
+        without affecting the actual speech content.
+
+        Based on testing:
+        - "word" (4 chars) crashes in sequential generation
+        - "word" + 26+ spaces works reliably
+        - Safe threshold appears to be 35+ characters
+
+        Args:
+            text: Input text to check and pad if needed
+            min_length: Minimum text length threshold (default: 35 characters)
+
+        Returns:
+            Original text or text padded with spaces if too short
+        """
+        stripped_text = text.strip()
+        if len(stripped_text) < min_length:
+            padding_needed = min_length - len(stripped_text)
+            return stripped_text + " " * padding_needed
+        return text
 
     def validate_inputs(self, **inputs) -> Dict[str, Any]:
         """Validate and normalize inputs."""
@@ -226,8 +252,12 @@ def _process():
                 for chunk_i, chunk_text in enumerate(segment_chunks):
                     print(f"🎤 Generating ChatterBox segment {i+1}/{len(character_segments)} chunk {chunk_i+1}/{len(segment_chunks)} for '{character}'...")
 
+                    # BUGFIX: Pad short text with spaces to prevent ChatterBox sequential generation crashes
+                    # Only for ChatterBox (not F5TTS) and only when text is very short
+                    processed_chunk_text = self._pad_short_text_for_chatterbox(chunk_text)
+
                     chunk_audio = self.generate_tts_audio(
-                        chunk_text, char_audio_prompt, inputs["exaggeration"],
+                        processed_chunk_text, char_audio_prompt, inputs["exaggeration"],
                         inputs["temperature"], inputs["cfg_weight"]
                     )
                     audio_segments.append(chunk_audio)
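
The padding behavior added in this hunk can be tried in isolation. The following is a minimal standalone sketch, not the node's code: `pad_short_text` is a hypothetical free function that mirrors the logic of `_pad_short_text_for_chatterbox`, using `str.ljust` in place of the explicit space arithmetic.

```python
def pad_short_text(text: str, min_length: int = 35) -> str:
    """Right-pad short text with spaces so it is at least min_length characters."""
    stripped = text.strip()
    if len(stripped) < min_length:
        # str.ljust pads on the right with spaces up to the requested width.
        return stripped.ljust(min_length)
    # Long enough already: returned unchanged, including any outer whitespace.
    return text


samples = [
    "word",
    "Ok.",
    "This is a longer introduction that should work fine without issues.",
]
for sample in samples:
    padded = pad_short_text(sample)
    print(f"{len(sample):>3} -> {len(padded):>3} chars: {padded!r}")
```

Note that the short branch returns the stripped text plus padding, while the long branch returns the input untouched, so surrounding whitespace is only normalized for short segments.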

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 [project]
 name = "chatterbox_srt_voice"
 description = "ChatterBox SRT Voice TTS Node is a fork of 'ChatteBox Voice' with additional devolpments and full F5-TTS implementation as well. I introduced a SRT node designed to help you synchronize your generated TTS audio with `.srt` subtitle files. Audio wave analyzer will help you find speech segments for f5 speech edit and much more!"
-version = "3.1.2"
+version = "3.1.3"
 license = {file = "LICENSE"}
 dependencies = ["s3tokenizer>=0.1.7", "resemble-perth", "librosa", "scipy", "omegaconf", "accelerate", "transformers==4.46.3", "# Additional dependencies for SRT support and audio processing", "conformer>=0.3.2", "torch", "torchaudio", "numpy", "einops", "phonemizer", "g2p-en", "unidecode", "# Audio processing and timing dependencies", "soundfile", "resampy", "webrtcvad", "# Optional but recommended for better performance", "numba"]
 

scripts/bump_version_enhanced.py

Lines changed: 5 additions & 5 deletions
@@ -109,13 +109,13 @@ def main():
         new_parts = list(map(int, args.version.split('.')))
 
         if tuple(new_parts) <= tuple(current_parts):
-            print(f"Warning: New version {args.version} is not newer than current {current_version}")
-            response = input("Continue anyway? (y/N): ")
-            if response.lower() != 'y':
-                print("Version bump cancelled")
-                sys.exit(0)
+            print(f"Error: New version {args.version} is not newer than current {current_version}")
+            print("Cannot bump to an older or same version number.")
+            print("Use a higher version number for the next release.")
+            sys.exit(1)
     except Exception as e:
         print(f"Warning: Could not compare versions: {e}")
+        print("Proceeding with caution...")
 
     # Create backup
     print("\nCreating backup of current files...")
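
The downgrade check in this hunk relies on comparing version strings as tuples of integers. A minimal sketch of that comparison is shown below; `parse_version` and `is_newer` are hypothetical names for illustration and do not appear in the script.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as "3.1.3" into a tuple of ints: (3, 1, 3)."""
    return tuple(int(part) for part in version.split("."))


def is_newer(current: str, new: str) -> bool:
    """Return True only when `new` is strictly greater than `current`."""
    # Tuples compare element by element, so 3.1.10 correctly sorts after 3.1.9,
    # which a plain string comparison would get wrong.
    return parse_version(new) > parse_version(current)


for candidate in ["3.1.3", "3.1.2", "3.1.1", "3.1.10"]:
    verdict = "ok" if is_newer("3.1.2", candidate) else "rejected"
    print(f"3.1.2 -> {candidate}: {verdict}")
```

As in the script, a non-numeric component (for example a pre-release suffix) makes `int()` raise `ValueError`; the script catches that case and proceeds with a warning rather than blocking the bump.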
3.51 MB
Binary file not shown.
