v1.52 Winter Wonderland

@stijn-uva stijn-uva released this 19 Dec 11:14

This 4CAT update mostly comprises bug fixes for processors and data sources, as well as a couple of new processors for statistical analysis of datasets and an implementation of PromptCompass as a 4CAT processor. The update also adds support for calling the Deepseek and Gemini 3 APIs from LLM-based processors.

⚠️ Docker users are advised to rebuild their containers to benefit from the speed improvements implemented in previous updates and to use the latest available versions of 4CAT's dependencies (without which some processors, particularly those processing videos, will fail more readily). Rebuilding will become mandatory in a future release. For now it is optional and 4CAT will otherwise function normally, though sometimes more slowly and less effectively than it could.

⚠️ Please also follow these instructions for upgrading if you have trouble upgrading to the latest version of 4CAT.

Otherwise, you can upgrade 4CAT via the 'Restart or upgrade' button in the Control Panel. This release of 4CAT incorporates the following fixes and improvements:

New and expanded processors and data sources

  • New ‘Regression evaluation’ processor to calculate regression metrics between two numerical columns in a dataset (84a56dd)
  • New ‘Descriptive statistics’ processor to calculate various descriptive statistics (mean, median, std dev, etc.) for numerical columns (01ed21c)
  • New ‘PromptCompass: Test task-specific prompts’ processor that allows choosing from a pre-defined list of prompts from other LLM-based work to annotate the datasets. Implementation of the standalone tool PromptCompass by Erik Borra (#562)
  • Update various network processors to allow disabling the automated community detection; this is now always disabled if the network contains 50,000 or more edges (b3864d9)
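To give a sense of the kind of numbers the two new statistics processors report, here is a minimal sketch in plain Python. It is not 4CAT's implementation: the function names `descriptive_stats` and `regression_metrics` are hypothetical, and the choice of metrics (MAE, RMSE, R²) is an assumption, since the notes above only mention "regression metrics" and "mean, median, std dev, etc.".

```python
import math
import statistics


def descriptive_stats(values):
    """Return basic descriptive statistics for a list of numbers.

    Hypothetical helper; the metric selection is an assumption.
    """
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Sample standard deviation needs at least two values
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE and R^2 between two equally long numeric columns.

    Hypothetical helper; the metric selection is an assumption.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("columns must be non-empty and of equal length")

    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / len(y_true)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)

    return {
        "mae": sum(abs(e) for e in errors) / len(errors),
        "rmse": math.sqrt(ss_res / len(errors)),
        # R^2 is undefined when the reference column is constant
        "r2": 1 - ss_res / ss_tot if ss_tot else float("nan"),
    }
```

In this sketch, the first column is treated as the reference values and the second as the predictions; swapping them changes R² but not MAE or RMSE.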

Other new features

  • New ‘Statistics’ processor category containing processors exclusively focused on calculating statistics from existing columns (25c518a)
  • Update the background worker that deletes expired datasets to be more efficient (91f79fd)
  • Update the ‘Top Images’ processor to optionally save the top images as annotations (3ff0d8f)
  • Update the ‘Confusion matrix’ processor to halt processing when more than 500 categories are found in the parent dataset (916394c)
  • 4CAT will now periodically log information about its running workers and threads, including a call stack and process ID, when run with --log-level=DEBUG (74968b5)
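The debug logging of workers and threads mentioned above can be approximated with the Python standard library. The following is a rough sketch of the general technique, not the code 4CAT uses; the function name `describe_threads` and the report layout are assumptions made for illustration.

```python
import os
import sys
import threading
import traceback


def describe_threads():
    """Build a text report with the process ID and a call stack per thread.

    Hypothetical helper; the layout of the report is an assumption.
    """
    # Maps thread identifiers to the frame each thread is currently executing
    frames = sys._current_frames()
    lines = [f"PID {os.getpid()}, {threading.active_count()} thread(s)"]

    for thread in threading.enumerate():
        frame = frames.get(thread.ident)
        if frame is not None:
            stack = "".join(traceback.format_stack(frame))
        else:
            stack = "(no frame available)\n"
        lines.append(f"Thread {thread.name} ({thread.ident}):\n{stack}")

    return "\n".join(lines)
```

A periodic logger could pass the returned string to `logging.debug()` on a timer, so the report only shows up when the log level is set to DEBUG.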

LLM-related features and fixes

  • Update the ‘LLM Prompter’ processor to allow image analysis with LLM APIs, by sending image URLs as prompts (a3ee966)
  • Update the local LLM API cache and add Deepseek and Gemini 3 as options for processors that can talk to external LLM APIs (f52e180)
  • Add initial support for vLLM as a local LLM provider (a3ee966)

Fixes to processors

  • Fix an issue with the ‘Import 4CAT dataset’ data source where it would crash if certain metadata was missing from the uploaded dataset (7cabbf5)
  • Fix an issue with the BlueSky data source where it could crash if no query was provided (4ee74b6)
  • Fix an issue with the Instagram data source where items would not be parsed if their ‘owner’ was not the same as their ‘author’ (c563911)
  • Fix an issue with the RedNote/Xiaohongshu data source where items could incorrectly be reported to be missing a timestamp (f9e455b, #557)
  • Fix an issue with the ‘View media metadata’ processor where it would crash if certain metadata was missing (29041e2)
  • Fix an issue with the ‘Toxicity scores’ processor where it would keep processing the data even if the API returned an error (de2184f)
  • Fix an issue with the ‘Classification evaluation’ processor where it could crash if a label was not a string (17900b9)
  • Fix an issue with the ‘Audio to text’ processor where it could crash if the API returned an unexpected response (6fe2a2d)
  • Fix an issue with the ‘Audio to text’ processor where it would not process data if the dataset contained only a single file (dba049a)
  • Fix an issue with the ‘URL co-occurence network’ where it could crash if the source dataset did not contain a ‘thread_id’ column (ddb38b8)
  • Fix an issue with the ‘Hash images’ processor where it could crash if the dataset contained non-image files (9a2fb82)

Other fixes

  • Fix an issue with the Explorer where it would not display the correct post texts for Telegram datasets (4ca46b6)
  • Fix an issue with datasets containing annotations where a crash could occur when annotated item IDs were not strings (d8d5108)
  • Fix an issue with 4CAT’s proxy manager where requests could get ‘stuck’ in limbo when the processor that made them crashed or was interrupted (b7378f6)
  • Fix an issue with processors fetching URLs via 4CAT’s proxy manager where it could crash if a request did not complete successfully (6374eb8)
  • Fix an issue where memcached connections would not get cleaned up properly when 4CAT was kept running for long periods of time (#546, #547)
  • Fix an issue where annotations of items that were filtered out would be copied too when copying filtered datasets with annotations (#545)
  • Fix an issue where interrupting processors calling external commands (such as video processors calling ffmpeg) would not terminate the called commands properly (#559)
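The last fix in this list concerns a general pattern: when a processor is interrupted, any external command it started should be stopped as well. A minimal sketch of that pattern with `subprocess` follows. It is an illustration under stated assumptions, not 4CAT's actual code: the function name `run_interruptible`, the use of a `threading.Event` as the interruption signal, and the timeout values are all hypothetical.

```python
import subprocess
import sys
import threading


def run_interruptible(command, interrupted, poll_interval=0.1, grace=5.0):
    """Run an external command, stopping it when `interrupted` is set.

    Hypothetical helper: `interrupted` is a threading.Event standing in
    for whatever interruption signal the calling code uses.
    """
    process = subprocess.Popen(command)
    try:
        while process.poll() is None:
            # Wait briefly; returns True as soon as the event is set
            if interrupted.wait(poll_interval):
                process.terminate()  # ask the command to stop
                try:
                    process.wait(timeout=grace)
                except subprocess.TimeoutExpired:
                    process.kill()  # force it if it does not comply
                    process.wait()
                break
    finally:
        # Never leave the child running, even if an exception was raised
        if process.poll() is None:
            process.kill()
            process.wait()

    return process.returncode
```

A command that finishes by itself returns its normal exit code; an interrupted one returns a non-zero code (a negative signal number on POSIX systems). For example, `run_interruptible([sys.executable, "-c", "pass"], threading.Event())` runs a trivial stand-in command to completion.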

Docker-related changes

  • The first time a 4CAT Docker container is run, the logic for notifying the user about 4CAT’s URL and other useful information is now more robust (b8f9b14)
  • The 4CAT front-end no longer uses ‘4cat.local:5000’ as its default domain name and uses ‘localhost’ instead (4c187cb)

Full Changelog: v1.51...v1.52