Dataset Description using Orange tool

Jay Patel
2 min read · Nov 16, 2021


Preprocessing data is a crucial step in any machine learning project. For an introduction to preprocessing, you can read my earlier article.

Data Preprocessing includes techniques like:

  1. Feature Scaling
  2. Standardization
  3. Encoding
  4. Discretization
  5. Randomization
  6. Handling missing values

To perform preprocessing, we'll use the Preprocess widget.

Create a workflow for it.

Feature Scaling

Feature scaling shifts and rescales the values of each feature so that they end up in the range 0 to 1, or so that the maximum absolute value of each feature is scaled to unit size.
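To make the two variants concrete, here is a minimal sketch in plain Python. It is not how Orange implements scaling internally; the function names `min_max_scale` and `max_abs_scale` and the sample values are made up for illustration.

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest becomes 0 and the largest 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has no spread; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def max_abs_scale(values):
    """Divide by the largest absolute value so results fall within [-1, 1]."""
    peak = max(abs(v) for v in values)
    if peak == 0:
        return [0.0 for _ in values]
    return [v / peak for v in values]


ages = [20, 30, 40, 60]
print(min_max_scale(ages))        # [0.0, 0.25, 0.5, 1.0]
print(max_abs_scale([-4, 2, 8]))  # [-0.5, 0.25, 1.0]
```

Min-max scaling is a good fit when you want every feature in the same 0 to 1 range, while scaling by the maximum absolute value keeps the sign of the original values.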

Standardization

Standardization rescales a feature so that its distribution has a mean of 0 and a standard deviation of 1.

Let’s select that option in the Preprocess widget.
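As a rough illustration of what standardization computes, the sketch below subtracts the mean and divides by the standard deviation using only Python's standard library. The choice of the population standard deviation (`pstdev`) is an assumption made for this example, and the `standardize` helper is hypothetical, not part of Orange.

```python
from statistics import mean, pstdev


def standardize(values):
    """Shift values to mean 0 and scale them to standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        # A constant feature cannot be scaled; map everything to 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, population std 2
z = standardize(scores)
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

After the transformation, each value tells you how many standard deviations the original value was from the mean.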

Encoding

To perform encoding, you can use the Continuize Discrete Variables option.
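One common way to turn a discrete variable into numeric columns is one-hot encoding, where each category gets its own 0/1 column. The sketch below shows the idea in plain Python. It is only an illustration under that assumption; `one_hot_encode` is a hypothetical helper, and the widget offers its own set of encoding choices.

```python
def one_hot_encode(values):
    """Return (categories, rows); each row has a 1 in its category's column."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


colors = ["red", "green", "blue", "green"]
categories, rows = one_hot_encode(colors)

print(categories)  # ['blue', 'green', 'red']
for row in rows:
    print(row)
# [0, 0, 1]
# [0, 1, 0]
# [1, 0, 0]
# [0, 1, 0]
```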

Discretization

Discretization methods convert a continuous variable into a discrete one by splitting its range into a set of intervals (bins) and replacing each value with the bin it falls into.

In the Preprocess widget, there is an option called Discretize Continuous Variables.
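In a data preprocessing setting, discretization usually means binning. The sketch below shows equal-width binning, one of several possible strategies, in plain Python. The `equal_width_bins` function and the sample temperatures are hypothetical and only meant to show the idea.

```python
def equal_width_bins(values, n_bins):
    """Assign each value the index of the equal-width interval it falls into."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        # All values are identical, so they share a single bin.
        return [0 for _ in values]
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


temperatures = [0, 5, 10, 15, 20, 25, 30]
print(equal_width_bins(temperatures, 3))  # [0, 0, 1, 1, 2, 2, 2]
```

Equal-width bins are easy to interpret, but they can end up very unevenly filled when the data is skewed. Binning by equal frequency is a common alternative in that case.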

Randomization

You can achieve randomization using the same Preprocess widget. Just select the Randomize option and use its built-in settings as needed.
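To give a feel for what randomization can mean, the sketch below shuffles the values of one column across rows while leaving the other columns untouched. This is an assumption about the kind of randomization you might want, not a description of the widget's internals; `shuffle_column`, the sample rows, and the seed value are all made up for illustration.

```python
import random


def shuffle_column(rows, column, seed=0):
    """Return new rows with the values of one column shuffled across rows."""
    rng = random.Random(seed)  # a fixed seed makes the shuffle repeatable
    shuffled = [row[column] for row in rows]
    rng.shuffle(shuffled)
    # Build new dicts so the original rows are left unmodified.
    return [{**row, column: value} for row, value in zip(rows, shuffled)]


data = [
    {"petal_length": 1.4, "species": "setosa"},
    {"petal_length": 4.7, "species": "versicolor"},
    {"petal_length": 6.0, "species": "virginica"},
    {"petal_length": 1.3, "species": "setosa"},
]

randomized = shuffle_column(data, "species", seed=42)
for row in randomized:
    print(row)
```

Shuffling the class column like this breaks the link between features and labels, which is useful for checking whether a model's score is better than what it would achieve on meaningless labels.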

Handling missing values:

Conclusion:

In this blog, I covered several data preprocessing techniques available in the Orange tool.
