Bad Punctuation Formatting Bugs in SRT Document Translation

It seems that in every language I've tried (Spanish, Portuguese, Italian, Indonesian, French), the output SRT file has the correct timing, but there are weird punctuation formatting bugs spread extensively throughout it.

I'm working with the API and the DeepL Python SDK, but I'm assuming it's not isolated to that SDK.

### Examples

Commas and/or periods being placed at the beginning of a block instead of attached to the previous word:
```
117
00:10:23,280 --> 00:10:28,160
, de nuevo, si ves algo como un descuento 
del 10 % o más, seguro que es falso. 
```

Periods and commas getting spaces added in front of them (see first line):
```
266
00:23:21,840 --> 00:23:26,000
variante interesante . Podemos hablar de 
ello en los comentarios. Y, por supuesto, 
si te ha gustado el vídeo, 
```
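For anyone trying to reproduce or count these two problems, here is a minimal sketch of a checker. It is not part of the DeepL SDK; the function name `find_punctuation_issues` and both regexes are my own assumptions, and it only looks at commas and periods. It will also flag lines that legitimately start with an ellipsis.

```python
import re

# A subtitle text line that starts with a comma or period (hypothetical rule).
LEADING_PUNCT = re.compile(r"^\s*[,.]")
# Whitespace inserted between a word and a comma/period that ends a token.
SPACE_BEFORE_PUNCT = re.compile(r"\S\s+[,.](?=\s|$)")


def find_punctuation_issues(text_lines):
    """Return (line_index, issue_name) pairs for the given subtitle text lines.

    Only pass the text lines of a block, not the index or timing lines.
    """
    issues = []
    for i, line in enumerate(text_lines):
        if LEADING_PUNCT.search(line):
            issues.append((i, "leading punctuation"))
        if SPACE_BEFORE_PUNCT.search(line):
            issues.append((i, "space before punctuation"))
    return issues


if __name__ == "__main__":
    sample = [
        ", de nuevo, si ves algo como un descuento ",
        "del 10 % o más, seguro que es falso. ",
        "variante interesante . Podemos hablar de ",
    ]
    for line_index, issue in find_punctuation_issues(sample):
        print(line_index, issue)
```

Running this over the text lines of the translated and original files should make it easy to confirm that the problems only appear in the translated output.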

Lots of text gets shoved into one block, leaving adjacent ones nearly empty. Notice that block 3 below shows a single word for more than 4 seconds. Also notice the comma with a space inserted before it in block 4, line 3.

This is similar to another issue (https://github.com/DeepLcom/deepl-api-issues/issues/56): I've also seen instances where it creates entirely blank blocks, not just single-word ones like this. In those cases too, the adjacent blocks often take on the missing text and end up extra long.

```
3
00:00:10,240 --> 00:00:14,880
son 

4
00:00:14,880 --> 00:00:19,840
estafas que deben conocer y de las que 
deben cuidarse en 2026. Y, en realidad, 
para la mayoría de ellas , la mejor forma 
de defenderse es simplemente conocerlas. 
Así que, al final, deberían estar 
preparados. Soy ThioJoe, 

---------- ORIGINAL ----------
3
00:00:10,240 --> 00:00:14,880
want to be aware of and watch out for in 2026. And really for most of these, the best way to

4
00:00:14,880 --> 00:00:19,840
defend against them is to simply know about it. So by the end, you should be good. I'm ThioJoe,
```
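A rough way to find these nearly empty or blank blocks in a translated file is sketched below. The helper names (`parse_srt_blocks`, `sparse_blocks`) and the `min_words` threshold are hypothetical, and the parser assumes a simple, well-formed SRT file with blocks separated by blank lines.

```python
import re


def parse_srt_blocks(srt_text):
    """Split SRT text into (index, timing, text_lines) tuples."""
    blocks = []
    for chunk in re.split(r"\n\s*\n", srt_text.strip()):
        lines = chunk.splitlines()
        if len(lines) < 2:
            continue  # not a complete block (needs index and timing lines)
        text_lines = [line.strip() for line in lines[2:]]
        blocks.append((lines[0].strip(), lines[1].strip(), text_lines))
    return blocks


def sparse_blocks(srt_text, min_words=2):
    """Return (block_index, word_count) for blocks with fewer than min_words words."""
    flagged = []
    for index, _timing, text_lines in parse_srt_blocks(srt_text):
        word_count = sum(len(line.split()) for line in text_lines)
        if word_count < min_words:
            flagged.append((index, word_count))
    return flagged


if __name__ == "__main__":
    sample = (
        "3\n00:00:10,240 --> 00:00:14,880\nson \n\n"
        "4\n00:00:14,880 --> 00:00:19,840\n"
        "estafas que deben conocer y de las que \n"
        "deben cuidarse en 2026.\n"
    )
    print(sparse_blocks(sample))
```

A block with no text lines at all is reported with a word count of 0, so this covers the entirely blank blocks as well.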

None of these issues are present in the original SRT file. 

In another instance, I noticed it split a word right across two blocks: the word `estafas` became `estaf` at the end of block 2 and `as` at the start of block 3. It also seems to preserve the original SRT file's line breaks unnecessarily when a block has multiple lines:
```
2
00:00:04,560 --> 00:00:10,240
un montón de estafas nuevas o incluso de 
variaciones nuevas de estafas antiguas. 
Todas ellas son estaf 

3
00:00:10,240 --> 00:00:14,880
as que deben conocer y de las que deben 
cuidarse en 2026. 
Y, en realidad, para la mayoría de ellas, 
la mejor manera 

---------- ORIGINAL ----------
2
00:00:04,560 --> 00:00:10,240
a bunch of brand new scams or even ones that are
new variations on older scams. All of which you'll

3
00:00:10,240 --> 00:00:14,880
want to be aware of and watch out for in 2026.
And really for most of these, the best way to
```

After that one, I realized the issue with the newlines, and also realized that the SRT file I was using, which was downloaded from YouTube, includes non-breaking spaces at the end of the first line of each block. So I removed those and made each block a single line. That seemed to get rid of the mid-word break, but the punctuation issues remain. I'm also not sure whether the mid-word break being "fixed" was merely a result of the new translation being slightly different, so that the word landed in a different position.
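The preprocessing I described can be sketched roughly like this. It is only an illustration of the workaround, not my exact script: the function name `flatten_srt` is hypothetical, and it assumes blocks are separated by blank lines.

```python
import re


def flatten_srt(srt_text):
    """Replace non-breaking spaces and join each block's text onto one line."""
    cleaned = srt_text.replace("\u00a0", " ")
    out_blocks = []
    for chunk in re.split(r"\n\s*\n", cleaned.strip()):
        lines = [line.strip() for line in chunk.splitlines()]
        if len(lines) < 2:
            continue  # skip anything without an index and a timing line
        index, timing = lines[0], lines[1]
        text = " ".join(line for line in lines[2:] if line)
        parts = [index, timing] + ([text] if text else [])
        out_blocks.append("\n".join(parts))
    return "\n\n".join(out_blocks) + "\n"


if __name__ == "__main__":
    src = (
        "2\n00:00:04,560 --> 00:00:10,240\n"
        "a bunch of brand new scams or even ones that are\u00a0\n"
        "new variations on older scams. All of which you'll\n"
    )
    print(flatten_srt(src))
```

The flattened text can then be written to a new `.srt` file and used as the input document for the translation call.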


------

Example Call:

```python
import deepl

# Assumes `deepl_api` is an already-created DeepL client object (e.g. a
# deepl.Translator), and `input_path` / `output_path` are defined elsewhere.

# We pass in the actual out_file object; the SDK writes the result to it automatically.
with open(input_path, "rb") as in_file, open(output_path, "wb") as out_file:
    result: deepl.DocumentStatus = deepl_api.translate_document(
        input_document=in_file,
        output_document=out_file,
        target_lang="es",
        formality="prefer_less",
    )
```

Here `in_file` is an `.srt` file. I also tried explicitly setting `output_format="srt"`, but it didn't make a difference.

Bad Punctuation Formatting Bugs in SRT Document Translation #58
