🐣 add tools for reflowing the transcript into one paragraph per sentence / speaker #510

Open
anuejn wants to merge 2 commits into main from anujen/text_reflow_tools

@anuejn anuejn commented Jan 7, 2026

No description provided.

into one paragraph per sentence / speaker
@anuejn anuejn force-pushed the anujen/text_reflow_tools branch from 134df4c to c7a9403 Compare January 7, 2026 23:51
@anuejn anuejn requested review from pajowu, phlmn and rroohhh January 7, 2026 23:51
@anuejn anuejn force-pushed the anujen/text_reflow_tools branch from 79e9045 to 6fad5ae Compare January 8, 2026 01:21
@@ -0,0 +1,182 @@
import { TbHammer } from 'react-icons/tb';

@rroohhh rroohhh Jan 9, 2026


Can we have tests for the transformations in this file? :)

);
}

export function TextTools({ editor }: { editor: EditorWithWebsocket }) {

Should we add a warning to these if they are applied to a document in a non-Latin-script language?

.filter((token) => token.text.includes(','))
.map((token) => token.pause);
silences.sort();
const thresholdIndex = Math.floor(paragraph.children.length / 100); // aim for paragraphs of max ~50 tokens

The comment says ~50 tokens, but the code divides by 100. This seems contradictory, or am I missing something?

Also, the magic paragraph length could probably be a constant that is used both here and for the `<= 100` further up.
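
One way both points could be addressed, as a rough sketch: pull the target length into a named constant and derive the number of breaks from it. `MAX_PARAGRAPH_TOKENS`, `pauseThreshold`, and the `Token` shape are hypothetical names, not taken from the PR. The value 100 is used only because it matches the `<= 100` check; whether 50 or 100 is intended is for the author to confirm. Picking the k-th longest comma pause is my reading of the intent, not something the diff confirms. The numeric comparator is an addition, since `Array.prototype.sort()` without a comparator compares elements as strings.

```typescript
// Hypothetical constant: the target maximum paragraph length in tokens.
const MAX_PARAGRAPH_TOKENS = 100;

// Assumed token shape; the real type in the PR may differ.
type Token = { text: string; pause: number };

// Returns the pause length at which to break, or null if the paragraph is
// already short enough or has no comma pauses to break at.
function pauseThreshold(tokens: Token[]): number | null {
  if (tokens.length <= MAX_PARAGRAPH_TOKENS) return null;

  const pauses = tokens
    .filter((token) => token.text.includes(','))
    .map((token) => token.pause);
  if (pauses.length === 0) return null;

  // Numeric comparator, longest pause first.
  pauses.sort((a, b) => b - a);

  // Roughly one break per MAX_PARAGRAPH_TOKENS tokens, capped by the
  // number of comma pauses that are actually available.
  const breaksWanted = Math.floor(tokens.length / MAX_PARAGRAPH_TOKENS);
  const index = Math.min(breaksWanted, pauses.length) - 1;
  return pauses[index];
}
```

With a single constant, the early-exit check and the break count cannot drift apart, and the code comment can state the target length once.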

}
};
doc.children.forEach((paragraph) => {
let minPauseBetweenSentences = initial; // this gets reduced with every additional token

Why does it get reduced with every additional token?

children: [] as { text: string }[],
};
paragraph.children.forEach((token, i) => {
currentParagraph.children.push(JSON.parse(JSON.stringify(token)));

Why the JSON dance?
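
For context, a small sketch of what the round trip achieves: `JSON.parse(JSON.stringify(x))` yields a deep copy made of plain objects that shares no references with the original. The `Token` shape and `cloneViaJson` are hypothetical. `structuredClone` may be an alternative for plain objects, but it can throw on proxy objects, so whether it would work on tokens read from this document is an assumption that would need checking.

```typescript
// Assumed token shape, for illustration only.
type Token = { text: string; conf?: number; meta?: { speaker: string } };

// Deep-copies a JSON-serializable value into fresh plain objects.
function cloneViaJson<T>(value: T): T {
  return JSON.parse(JSON.stringify(value)) as T;
}

const original: Token = { text: 'Hello,', conf: 0.9, meta: { speaker: 'A' } };
const copy = cloneViaJson(original);

// Mutating the copy does not touch the original, including nested objects.
copy.meta!.speaker = 'B';
// original.meta.speaker is still 'A'
```

If the copy is needed for this reason, a short comment or a named helper at the call site would make the intent clear.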

addNewChild(currentParagraph);
}
});
doc.children = newChildren;

I think doing it this way totally fucks up collaborative editing...
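
A sketch of the alternative this concern points toward, assuming the document is backed by a CRDT in which replacing a whole list discards concurrent edits made to the old one: split paragraphs in place with `splice` instead of assigning a fresh `children` array. `splitParagraphAt` and the `Paragraph` shape are hypothetical, and the code below runs on plain arrays; how the real document type reacts to these operations is not shown in this PR.

```typescript
// Assumed shapes, for illustration only.
type Token = { text: string };
type Paragraph = { speaker: string | null; children: Token[] };

// Splits children[paragraphIndex] so that the tokens from tokenIndex onward
// move into a new paragraph inserted directly after it. All other paragraphs
// are left untouched.
function splitParagraphAt(
  children: Paragraph[],
  paragraphIndex: number,
  tokenIndex: number,
): void {
  const paragraph = children[paragraphIndex];
  if (tokenIndex <= 0 || tokenIndex >= paragraph.children.length) return;

  // splice removes the tail from the existing paragraph and returns it.
  const tail = paragraph.children.splice(tokenIndex);

  // Copy the tokens into fresh plain objects. Assumption: some CRDT
  // libraries reject re-inserting objects that already belong to the document.
  const freshTail = tail.map((token) => ({ ...token }));

  children.splice(paragraphIndex + 1, 0, {
    speaker: paragraph.speaker,
    children: freshTail,
  });
}
```

The point of the design is that each split becomes a small, local list operation, which a merge algorithm can usually reconcile with concurrent edits elsewhere in the document.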

)}
</TopBarPart>
<TopBarPart>
{editor && <TextTools editor={editor} />}

This should be gated on data?.can_write, no?


2 participants