Open
Labels: bug (Something isn't working)
Description
Preprocessing seems to fail on text data.
This happens only in Perform mode, not in Explain mode. It affects Xgboost in particular, but not only Xgboost.
Console output:
The task is binary_classification with evaluation metric average_precision
AutoML will use algorithms: ['Xgboost']
AutoML will ensemble available models
AutoML steps: ['simple_algorithms', 'default_algorithms', 'not_so_random', 'golden_features', 'insert_random_feature', 'features_selection', 'hill_climbing_1', 'hill_climbing_2', 'ensemble']
Skip simple_algorithms because no parameters were generated.
* Step default_algorithms will try to check up to 1 model
There was an error during 1_Default_Xgboost training.
Please check automl_results/signup_1hour-profile_changes_v2-bad_actors_v3-20250621-20250721/Perform/errors.md for details.
Error file contents:
## Error for 1_Default_Xgboost
Found array with 0 sample(s) (shape=(0, 100)) while a minimum of 1 is required by TfidfTransformer.
Traceback (most recent call last):
File ".venv/lib/python3.10/site-packages/supervised/base_automl.py", line 1183, in _fit
trained = self.train_model(params)
File ".venv/lib/python3.10/site-packages/supervised/base_automl.py", line 391, in train_model
self.keep_model(mf, model_subpath)
File ".venv/lib/python3.10/site-packages/supervised/base_automl.py", line 294, in keep_model
self._base_predict(self._one_sample, model)
File ".venv/lib/python3.10/site-packages/supervised/base_automl.py", line 1474, in _base_predict
predictions = model.predict(X)
File ".venv/lib/python3.10/site-packages/supervised/model_framework.py", line 447, in predict
X_data, _, _ = self.preprocessings[ind].transform(X.copy(), None)
File ".venv/lib/python3.10/site-packages/supervised/preprocessing/preprocessing.py", line 360, in transform
X_validation = tt.transform(X_validation)
File ".venv/lib/python3.10/site-packages/supervised/preprocessing/text_transformer.py", line 36, in transform
vect = self._vectorizer.transform(x)
File ".venv/lib/python3.10/site-packages/sklearn/feature_extraction/text.py", line 2129, in transform
return self._tfidf.transform(X, copy=False)
File ".venv/lib/python3.10/site-packages/sklearn/feature_extraction/text.py", line 1700, in transform
X = validate_data(
File ".venv/lib/python3.10/site-packages/sklearn/utils/validation.py", line 2954, in validate_data
out = check_array(X, input_name="X", **check_params)
File ".venv/lib/python3.10/site-packages/sklearn/utils/validation.py", line 1128, in check_array
raise ValueError(
ValueError: Found array with 0 sample(s) (shape=(0, 100)) while a minimum of 1 is required by TfidfTransformer.
Please set a GitHub issue with above error message at: https://github.com/mljar/mljar-supervised/issues/new
In Explain mode, the same setup trains without errors:
AutoML directory: automl_results/signup_1hour-profile_changes_v2-bad_actors_v3-20250621-20250721/Explain
The task is binary_classification with evaluation metric average_precision
AutoML will use algorithms: ['Xgboost']
AutoML will ensemble available models
AutoML steps: ['simple_algorithms', 'default_algorithms', 'ensemble']
Skip simple_algorithms because no parameters were generated.
* Step default_algorithms will try to check up to 1 model
1_Default_Xgboost average_precision 0.802009 trained in 31.98 seconds
* Step ensemble will try to check up to 1 model
AutoML fit time: 35.37 seconds
AutoML best model: 1_Default_Xgboost
2025-07-24 09:14:25,482 automl_trainer.train INFO Evaluating model on test set...
2025-07-24 09:14:27,847 automl_trainer.train INFO Test Accuracy: 0.9549
2025-07-24 09:14:29,984 automl_trainer.train INFO Test PR AUC (Average Precision): 0.8071
2025-07-24 09:14:29,989 automl_trainer.train INFO Test ROC AUC: 0.9621
2025-07-24 09:14:29,989 automl_trainer.train INFO Training optimized for: average_precision
Versions:
$ pip list | grep -E "(mljar-supervised|pandas|scikit-learn|numpy|click)"
click 8.2.1
mljar-supervised 1.1.18
numpy 1.26.4
pandas 2.3.1
scikit-learn 1.7.1