Data Science: Visual Programming with Orange Tool

Jay Patel
3 min read · Sep 23, 2021

This post shows how to split data into training and testing sets using the Orange tool. Along the way, we will take a closer look at the Test & Score widget and explore cross-validation.

What is Train Test Split?

The train-test split is a technique for evaluating the performance of a machine learning algorithm. It works for classification and regression problems alike, and with any supervised learning algorithm. The procedure takes a dataset and divides it into two subsets: a training set used to fit the model and a test set held back to evaluate it.

For the train-test split, I used the workflow below.

Train Test Split

Here, we load the heart_disease.tab data set using the Browse documentation datasets option in the File widget. The data set covers 303 patients, each diagnosed either with blood vessel narrowing (1) or as healthy (0).

File and Data Sampler

  1. Drag the Data Sampler widget to the canvas.
  2. On the right side of the File widget there is a semi-circular shape. Click and hold it, then drag it to the Data Sampler widget.
  3. Notice that there is a link between both widgets with the word data on top.

Now we will split the data into two parts: 85% for training and 15% for testing. In the Data Sampler widget, choose a fixed proportion of data and set it to 85%. This 85% sample is what we send onward to build a model.
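For readers who like to see the idea in code, here is a minimal Python sketch of the same 85/15 split. It is not what Orange runs internally: the function name `train_test_split`, the fixed seed, and the use of row numbers in place of real patient records are all assumptions made for illustration, and Orange may round the sample size differently.

```python
import random

def train_test_split(rows, train_fraction=0.85, seed=0):
    """Shuffle rows and split them into (train, test) lists.

    Hypothetical helper for illustration; not part of Orange.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(rows)                      # copy so the input is left untouched
    random.Random(seed).shuffle(shuffled)      # seeded shuffle for repeatable splits
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# Stand-in for the 303 patients: row numbers instead of real records.
patients = list(range(303))
train, test = train_test_split(patients, train_fraction=0.85)
print(len(train), len(test))  # 258 45
```

Every row ends up in exactly one of the two parts, which is the property that matters: the model never sees the test rows while it is being built.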

Sampling and Cross-Validation

Now send the data sample from Data Sampler to the Test & Score widget.

Next, we add three learners: Naive Bayes, Logistic Regression, and Tree, and connect each of them to the Test & Score widget. With cross-validation selected, Logistic Regression scores the highest AUC.
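To make cross-validation less of a black box, here is a rough Python sketch of how k-fold cross-validation divides the rows. The helper `k_fold_indices`, the choice of 5 folds, and the row count of 258 (roughly 85% of 303) are assumptions for illustration; the article does not state how many folds were used, and Orange handles all of this inside the widget.

```python
import random

def k_fold_indices(n_rows, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Hypothetical helper for illustration; not part of Orange.
    """
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and the number of rows")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled rows into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, held_out in enumerate(folds):
        # Train on every fold except the one currently held out.
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, held_out

n_rows = 258  # assumed size of the training sample
for fold_no, (train_idx, test_idx) in enumerate(k_fold_indices(n_rows, k=5), start=1):
    print(f"fold {fold_no}: train on {len(train_idx)} rows, validate on {len(test_idx)} rows")
```

Each row is used for validation exactly once, so the averaged score draws on the whole sample rather than on a single lucky or unlucky split.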

Split data into training and testing

Now it is time to bring in our test data (the remaining 15%). Connect Data Sampler to Test & Score once again and set the connection to Remaining Data → Test Data.

Now compare the scores of the three algorithms on the held-out data. Double-click the Test & Score widget and choose the Test on test data option to get the scores for all three algorithms.
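AUC is the score used earlier to compare the models, so it helps to know what it measures: the chance that a randomly picked positive case gets a higher predicted score than a randomly picked negative one. Below is a small Python sketch of that definition. The function `auc` and the labels and scores are made up for illustration; they are not results from the heart disease data, and this is not Orange's own implementation.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts how often a positive example (label 1) is scored above a
    negative example (label 0); ties count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Made-up test labels and predicted probabilities of class 1.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.5, 0.1]
print(auc(labels, scores))
```

A value of 1.0 means every positive case outranks every negative one, while 0.5 is no better than guessing.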

Conclusion

Here we have learned how to split our data into training and testing sets in the Orange tool, and how to score models with cross-validation and on held-out test data.

Check out more features of the Orange tool here.
