Basic Decision Tree Implementation in Ruby
Learn how to implement a decision tree in Ruby: how to supply the algorithm with training data, then have it classify test data using the ID3 decision tree algorithm.

This is a simple introduction to setting up a decision tree in Ruby. We'll start off with a simple setup: for this guide we're going to build a medical analysis tool. In order to follow along, make sure that you've installed the decisiontree RubyGem. If you don't have it on your system, you can install it by running:

gem install decisiontree

To start building a decision tree let’s create a basic application. Begin by pulling in the required gem libraries:

require 'rubygems'
require 'decisiontree'

Addition of Attributes

The next step is to define a single attribute called Temp. This attribute will hold each individual's temperature, which is the value the decision tree will use to decide whether a person is sick or healthy.

attributes = ['Temp']

Addition of Training Data & Its Values

Now we are going to introduce the training data: the examples the machine learning algorithm will learn from. Start with an empty array:

training = [
]

Next we are going to load some basic data into our training array in order to provide it with values to analyze. We're going to place values in the array that fit our requirements. For example, 98.7 is healthy, 99.1 is still healthy, 99.5 and 100.5 are sick, 102.5 is crazy sick, and 107.5 results in a dead patient. Enter the values into the training array in this format:

training = [
  [98.7, 'healthy'],
  [99.1, 'healthy'],
  [99.5, 'sick'],
  [100.5, 'sick'],
  [102.5, 'crazy sick'],
  [107.5, 'dead'],
]

Calling the ID3 Method

Now with the training array loaded with data, our next step is to create the decision tree itself. We do this by instantiating the class that the decisiontree library provides and telling it which algorithm to use. We're going to leverage ID3 for this example, a classic decision tree learning algorithm that builds the tree by repeatedly choosing the most informative attribute to split on.
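ID3 picks its splits by measuring entropy, i.e. how mixed the class labels are, and choosing the split that reduces it most. The gem handles this internally, but as a conceptual illustration (this is plain Ruby, not the gem's API), the entropy calculation looks like this:

```ruby
# Entropy of a set of labels: -sum(p * log2(p)) over each class's
# proportion p. Zero means the set is pure (one label); higher values
# mean the labels are more mixed. Conceptual sketch only - the
# decisiontree gem computes this internally.
def entropy(labels)
  n = labels.length.to_f
  labels.tally.values.sum do |count|
    p = count / n
    -p * Math.log2(p)
  end
end

# Our full training set mixes four labels, so entropy is high:
puts entropy(['healthy', 'healthy', 'sick', 'sick', 'crazy sick', 'dead'])

# A pure subset (e.g. everyone below some temperature is healthy) has
# entropy 0, which is what ID3 tries to drive toward with each split:
puts entropy(['healthy', 'healthy'])
```

A split on temperature is "good" when the subsets on each side of the threshold each have lower entropy than the whole set did.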

To instantiate the tree, we pass four arguments: the attributes array, the training data, a default classification ('sick' here, used when the tree can't reach a decision), and :continuous, which tells the library that our attribute holds continuous numeric values (as opposed to :discrete categories) — it does not mean the tree runs continuously. Your decision tree instantiation should look like this:

dec_tree = DecisionTree::ID3Tree.new(attributes, training, 'sick', :continuous)

Now we will introduce the train method, which builds the decision tree's knowledge engine. It takes the training values and their respective results and constructs the tree from them.

dec_tree.train

Adding a Second Attribute

Let's now add another attribute, Name (which is silly, since someone's name won't determine whether they're sick, but it's useful for the sake of the example). With a second attribute, every training row needs a corresponding name value as well — any placeholder names will do:

attributes = ['Temp', 'Name']
training = [
  [98.7, 'jordan', 'healthy'],
  [99.1, 'tiffany', 'healthy'],
  [99.5, 'amy', 'sick'],
  [100.5, 'keith', 'sick'],
  [102.5, 'kristine', 'crazy sick'],
  [107.5, 'jon', 'dead'],
]

Addition of the Test Method

Now that we have the training data in order, we can add the data we actually want to test. Let's add a line of code that looks like this:

test = [98.7, 'healthy']

The 'healthy' part is included just to test the algorithm. In a real-world application we wouldn't know whether the patient is sick or healthy — that's what the program will determine.

Setting up Decision Variables

Next we're going to utilize the predict method and pass in the test data. We call predict, a built-in method the decisiontree gem provides, on our decision tree, which has been trained with our historical data. It takes the test data as an argument, compares it against the patterns learned during training, and returns a prediction. The code will look like this:
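Conceptually, predict just walks the learned tree from the root, comparing the test value against the thresholds ID3 chose until it reaches a leaf label. Here's a hand-rolled sketch of that traversal — the node structure and threshold values (99.3, 101.5, 105.0) are illustrative assumptions, not the gem's actual internals:

```ruby
# A toy decision node: a threshold on Temp, with children that are either
# further nodes or final label strings. Illustrative only - the
# decisiontree gem builds and stores its tree internally.
Node = Struct.new(:threshold, :below, :above)

# A hand-built tree whose thresholds roughly match our training data.
tree = Node.new(99.3,
                'healthy',
                Node.new(101.5,
                         'sick',
                         Node.new(105.0, 'crazy sick', 'dead')))

def predict(node, temp)
  return node if node.is_a?(String)                  # reached a leaf label
  child = temp < node.threshold ? node.below : node.above
  predict(child, temp)                               # descend one level
end

puts predict(tree, 98.7)   # "healthy"
puts predict(tree, 107.5)  # "dead"
```

Each prediction only touches one path from root to leaf, which is why lookups stay fast even when the tree was trained on a large dataset.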

decision = dec_tree.predict(test)

Print the Results Out

Now we are going to add two puts statements. The first prints the prediction stored in decision; the second prints the known label from the test data, so we can confirm that our algorithm is working properly.

puts "Prediction: #{decision}"
puts "Reality: #{test.last}"

Testing the program

Let's run the program and see if it works. With the test data we provided, it prints out healthy, which means that our base case is working properly. You can try another example by setting the test array's temperature to 107.5, and the result will be: dead.

Advantage of Decision Trees over if/else Statements

You may be wondering why we chose a decision tree instead of if/else statements. If/else statements are great for a small amount of data. But when you have to build a system around real-world datasets (like hospital records), there can be millions of data points, and hand-written if/else statements become impractical to write and maintain. You need something more robust, such as a decision tree, to analyze a large amount of data.
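To make the contrast concrete, here is the hand-written if/elsif equivalent of our single-attribute classifier. The threshold values are illustrative guesses placed between our training temperatures — the point is that every new attribute, class, or data point means manually editing these branches, whereas the decision tree re-derives its thresholds automatically from the training data:

```ruby
# Hand-written equivalent of the learned tree. Workable for one attribute
# and four classes, but it does not scale: each new rule is another branch
# a human has to write, order correctly, and keep consistent.
def diagnose(temp)
  if temp < 99.3
    'healthy'
  elsif temp < 101.5
    'sick'
  elsif temp < 105.0
    'crazy sick'
  else
    'dead'
  end
end

puts diagnose(98.7)   # "healthy"
puts diagnose(107.5)  # "dead"
```

With the decision tree approach, adding new training rows and calling train again is all it takes to update the model; with if/else, every boundary change is a code change.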