This 4CAT update mostly comprises bug fixes for processors and data sources, as well as a couple of new processors for statistical analysis of datasets and an implementation of PromptCompass as a 4CAT processor. The update also adds support for calling the Deepseek and Gemini 3 APIs from LLM-based processors.
You can upgrade 4CAT via the 'Restart or upgrade' button in the Control Panel. This release of 4CAT incorporates the following fixes and improvements:
New and expanded processors and data sources
- New ‘Regression evaluation’ processor to calculate regression metrics between two numerical columns in a dataset (84a56dd)
- New ‘Descriptive statistics’ processor to calculate various descriptive statistics (mean, median, std dev, etc.) for numerical columns (01ed21c)
- New ‘PromptCompass: Test task-specific prompts’ processor that allows choosing from a pre-defined list of prompts from other LLM-based work to annotate the datasets. Implementation of the standalone tool PromptCompass by Erik Borra (#562)
- Update various network processors to allow disabling the automated community detection; this is now always disabled if the network contains 50,000 or more edges (b3864d9)
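To give a rough idea of the kind of numbers the two new statistics processors report, here is a minimal sketch using only Python's standard library. It is not 4CAT's implementation: the function names and the choice of metrics (MAE, RMSE, R²) are assumptions made for illustration, and the actual processors may calculate a different set.

```python
import math
import statistics


def descriptive_statistics(values):
    """Return a few common descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std_dev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def regression_metrics(actual, predicted):
    """Compare two equally long numerical columns (hypothetical metric set)."""
    if len(actual) != len(predicted):
        raise ValueError("columns must have the same length")
    errors = [a - p for a, p in zip(actual, predicted)]
    mean_actual = statistics.mean(actual)
    ss_res = sum(e ** 2 for e in errors)  # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    return {
        "mae": sum(abs(e) for e in errors) / len(errors),
        "rmse": math.sqrt(ss_res / len(errors)),
        # R² is undefined when the 'actual' column is constant
        "r2": 1 - ss_res / ss_tot if ss_tot else float("nan"),
    }
```

Both functions take plain lists of numbers, so values read from a dataset column would first need to be converted to `float`, and non-numerical values skipped or rejected.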
Other new features
- New ‘Statistics’ processor category containing processors exclusively focused on calculating statistics from existing columns (25c518a)
- Update the background worker that deletes expired datasets to be more efficient (91f79fd)
- Update the ‘Top Images’ processor to optionally save the top images as annotations (3ff0d8f)
- Update the ‘Confusion matrix’ processor to halt processing when more than 500 categories are found in the parent dataset (916394c)
- 4CAT will now periodically log information about its running workers and threads, including a call stack and process ID, when run with --log-level=DEBUG (74968b5)
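For readers curious what such a debug report can look like, the following is a minimal sketch of collecting the process ID and the call stack of every live thread in Python. It is an illustration under assumptions, not 4CAT's code: the function name `log_thread_stacks` and the report layout are hypothetical.

```python
import logging
import os
import sys
import threading
import traceback


def log_thread_stacks(logger):
    """Log the PID and the current call stack of every live thread at DEBUG level."""
    frames = sys._current_frames()  # maps thread identifier -> current stack frame
    lines = [f"Process ID: {os.getpid()}"]
    for thread in threading.enumerate():
        frame = frames.get(thread.ident)
        if frame is None:
            # thread finished between the two calls above
            continue
        lines.append(f"Thread {thread.name} ({thread.ident}):")
        lines.append("".join(traceback.format_stack(frame)).rstrip())
    report = "\n".join(lines)
    logger.debug(report)
    return report
```

Calling this from a timer or a dedicated thread at a fixed interval would produce periodic snapshots; the message only reaches the log when the logger's level is set to `DEBUG`.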
LLM-related features and fixes
- Update the ‘LLM Prompter’ processor to allow image analysis with LLM APIs, by sending image URLs as prompts (a3ee966)
- Update the local LLM API cache and add Deepseek and Gemini 3 as options for processors that can talk to external LLM APIs (f52e180)
- Add initial support for vLLM as a local LLM provider (a3ee966)
Fixes to processors
- Fix an issue with the ‘Import 4CAT dataset’ data source where it would crash if certain metadata was missing from the uploaded dataset (7cabbf5)
- Fix an issue with the BlueSky data source where it could crash if no query was provided (4ee74b6)
- Fix an issue with the Instagram data source where items would not be parsed if their ‘owner’ was not the same as their ‘author’ (c563911)
- Fix an issue with the RedNote/Xiaohongshu data source where items could incorrectly be reported to be missing a timestamp (f9e455b, #557)
- Fix an issue with the ‘View media metadata’ processor where it would crash if certain metadata was missing (29041e2)
- Fix an issue with the ‘Toxicity scores’ processor where it would keep processing the data even if the API returned an error (de2184f)
- Fix an issue with the ‘Classification evaluation’ processor where it could crash if a label was not a string (17900b9)
- Fix an issue with the ‘Audio to text’ processor where it could crash if the API returned an unexpected response (6fe2a2d)
- Fix an issue with the ‘Audio to text’ processor where it would not process data if the dataset contained only a single file (dba049a)
- Fix an issue with the ‘URL co-occurence network’ processor where it could crash if the source dataset did not contain a ‘thread_id’ column (ddb38b8)
- Fix an issue with the ‘Hash images’ processor where it could crash if the dataset contained non-image files (9a2fb82)
Other fixes
- Fix an issue with the Explorer where it would not display the correct post texts for Telegram datasets (4ca46b6)
- Fix an issue with datasets containing annotations where a crash could occur when annotated item IDs were not strings (d8d5108)
- Fix an issue with 4CAT’s proxy manager where requests could get ‘stuck’ in limbo when the processor that made them crashed or was interrupted (b7378f6)
- Fix an issue with processors fetching URLs via 4CAT’s proxy manager where it could crash if a request did not complete successfully (6374eb8)
- Fix an issue where memcached connections would not get cleaned up properly when using memcached and keeping 4CAT running for long periods of time (#546, #547)
- Fix an issue where annotations of items that were filtered out would be copied too when copying filtered datasets with annotations (#545)
- Fix an issue where interrupting processors calling external commands (such as video processors calling ffmpeg) would not terminate the called commands properly (#559)
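The last item concerns a general pattern: when a worker is interrupted, any external command it started should be stopped as well. A minimal sketch of that pattern in Python follows. It does not reflect how 4CAT handles this internally; the function name `run_interruptible`, the use of a `threading.Event` as the interruption signal, and the timing parameters are all assumptions made for the example.

```python
import subprocess
import sys
import threading


def run_interruptible(command, interrupted, poll_interval=0.1, grace_period=5):
    """Run a command, stopping it early if the `interrupted` event is set."""
    process = subprocess.Popen(command)
    try:
        while process.poll() is None:
            # wait() returns True as soon as the event is set
            if interrupted.wait(poll_interval):
                process.terminate()  # ask the command to stop first
                try:
                    process.wait(timeout=grace_period)
                except subprocess.TimeoutExpired:
                    process.kill()  # force it if the request is ignored
                    process.wait()
                break
    finally:
        # never leave the command running if we exit through an exception
        if process.poll() is None:
            process.kill()
            process.wait()
    return process.returncode
```

A caller would set the event from another thread when the surrounding job is cancelled, for example `run_interruptible(["ffmpeg", ...], interrupted)` with the actual arguments filled in.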
Docker-related changes
- The first time a 4CAT Docker container is run, the logic for notifying the user about 4CAT’s URL and other useful information is now more robust (b8f9b14)
- The 4CAT front-end no longer uses ‘4cat.local:5000’ as a default domain name, but uses ‘localhost’ instead (4c187cb)
Full Changelog: v1.51...v1.52