
This project contains transformers for missing value imputation


pharo-ai/data-imputers

Data Imputers


This is a Pharo library for transforming data to manage missing values.

How to install it?

To install the project, open the Playground (Ctrl+OW) in your Pharo image and execute the following Metacello script (select it and press the Do-it button or Ctrl+D):

Metacello new
  baseline: 'AIDataImputers';
  repository: 'github://pharo-ai/data-imputers/src';
  load.

How to depend on it?

To add a dependency on this project to your own project, include the following lines in your baseline method:

spec
  baseline: 'AIDataImputers'
  with: [ spec repository: 'github://pharo-ai/data-imputers/src' ].

If you are new to baselines and Metacello, check out the Baselines tutorial on the Pharo Wiki.
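For context, the lines above usually sit inside a complete baseline method. The following is a minimal sketch, assuming a hypothetical baseline class `BaselineOfMyProject` and a hypothetical package `MyProject`; only the `AIDataImputers` baseline name and repository URL come from this README.

```smalltalk
"Hypothetical baseline for a project named MyProject."
BaselineOfMyProject >> baseline: spec
	<baseline>
	spec for: #common do: [
		"Declare the external dependency (name and URL taken from this README)."
		spec
			baseline: 'AIDataImputers'
			with: [ spec repository: 'github://pharo-ai/data-imputers/src' ].
		"MyProject is an assumed package name; it requires the dependency declared above."
		spec package: 'MyProject' with: [ spec requires: #( 'AIDataImputers' ) ] ]
```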

Quick Start

I can be used to fill the missing values of a collection like this:

| collection |
collection := #( #( 7 2 5 6 ) #( 7 nil 5 9 ) #( 10 2 nil 6 ) ).
	
AISimpleImputer new
	useMostFrequent;
	fit: collection;
	transform: collection "#( #( 7 2 5 6 ) #( 7 2 5 9 ) #( 10 2 5 6 ) )"

I can also be used to fill missing values of a DataFrame:

AISimpleImputer mostFrequent fitAndTransform: (DataFrame withRows: #( #( 7 2 5 6 ) #( 7 nil 5 9 ) #( 10 2 nil 6 ) )) 

Simple Imputer

I am a simple imputer whose goal is to fill missing values in 2D collections.

Using me takes three steps. First, choose the strategy used to compute the replacement values:

  • #useAverage (default)
  • #useMedian
  • #useMostFrequent
  • #useConstant:

Then use #fit: so that I can compute the replacement value for each column. Once that is done, #statistics returns those values.

Finally you can use #transform: to fill the missing values of a 2D collection.

Alternatively, use #fitAndTransform: to compute the replacement values from a collection and fill its missing values in one step.
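To make the fit and transform steps concrete, here is a conceptual sketch of the "most frequent" strategy written with plain Pharo collections. It is not the library's implementation, and the variable names `rows`, `replacements` and `filled` are only illustrative; it just shows one way the per-column replacement values and the filled collection can be derived.

```smalltalk
| rows columnCount replacements filled |
rows := #( #( 7 2 5 6 ) #( 7 nil 5 9 ) #( 10 2 nil 6 ) ).
columnCount := rows first size.

"Fit: for each column, keep the most frequent non-nil value."
replacements := (1 to: columnCount) collect: [ :index |
	| present |
	present := (rows collect: [ :row | row at: index ])
		reject: [ :each | each isNil ].
	"sortedCounts answers count -> element pairs, highest count first."
	present asBag sortedCounts first value ].

"Transform: copy each row, replacing nil with that column's value."
filled := rows collect: [ :row |
	row withIndexCollect: [ :each :index |
		each ifNil: [ replacements at: index ] ] ].

filled "#( #( 7 2 5 6 ) #( 7 2 5 9 ) #( 10 2 5 6 ) )"
```

Here `replacements` plays the role of what #statistics returns, and `filled` the role of what #transform: returns.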

Example:

| collection |
collection := #( #( 7 2 5 6 ) #( 7 nil 5 9 ) #( 10 2 nil 6 ) ).
	
AISimpleImputer new
	useMostFrequent;
	fit: collection;
	statistics; "Returns the replacement values once the imputer is fitted. In this case => #( 7 2 5 6 )"
	transform: collection "#( #( 7 2 5 6 ) #( 7 2 5 9 ) #( 10 2 5 6 ) )"

or

AISimpleImputer new
	useMostFrequent;
	fitAndTransform: #( #( 7 2 5 6 ) #( 7 nil 5 9 ) #( 10 2 nil 6 ) ) "#( #( 7 2 5 6 ) #( 7 2 5 9 ) #( 10 2 5 6 ) )"

I can also be used with a DataFrame:

AISimpleImputer new
	useMostFrequent;
	fitAndTransform: (DataFrame withRows: #( #( 7 2 5 6 ) #( 7 nil 5 9 ) #( 10 2 nil 6 ) )) 

You can also change which value is treated as missing, in case you want to replace something other than nil:

AISimpleImputer new
	useMostFrequent;
	missingValue: false;
	fitAndTransform: #( #( 7 2 5 6 ) #( 7 false 5 9 ) #( 10 2 false 6 ) ) "#( #( 7 2 5 6 ) #( 7 2 5 9 ) #( 10 2 5 6 ) )"
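Because #fit: and #transform: are separate steps, one plausible use is to fit on one collection and fill another. The sketch below assumes that #transform: accepts a collection other than the fitted one, as long as it has the same number of columns; this README does not state that explicitly. The names `training` and `incoming` are hypothetical.

```smalltalk
| training incoming imputer |
training := #( #( 7 2 5 6 ) #( 7 nil 5 9 ) #( 10 2 nil 6 ) ).
"Hypothetical new data with the same four columns."
incoming := #( #( nil 4 5 nil ) #( 3 nil 8 1 ) ).

imputer := AISimpleImputer new.
imputer useMostFrequent.

"Compute the replacement values from the training collection only."
imputer fit: training.
imputer statistics. "Documented above as #( 7 2 5 6 ) for this collection"

"Assumption: the fitted values are reused to fill the new collection."
imputer transform: incoming
```

Keeping the fitted imputer around this way means new data is filled with values derived from the original collection rather than from the new data itself.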
