This is a simple introduction to setting up a decision tree in Ruby. For this guide we're going to build a medical analysis tool. To follow along, make sure that you've installed the decisiontree RubyGem. If you don't have it on your system, you can install it by running:
gem install decisiontree
To start building a decision tree let’s create a basic application. Begin by pulling in the required gem libraries:
require 'rubygems'
require 'decisiontree'
Addition of Attributes
The next step is to define a single attribute called Temp. This attribute holds each person's temperature. In the training data, every temperature will be paired with a label saying whether that person is sick or healthy, which is what lets the tree decide who is sick and who is not.
attributes = ['Temp']
Addition of Training Data & Its Values
Now we are going to introduce the training data, which is the data that the machine learning algorithm will learn from. It holds the known examples that our decision tree program will analyze. We start with an empty array:
training = [ ]
Next we are going to load some basic data into our training array so that it has values to analyze. Each entry pairs a temperature with a result. For example, 98.7 is healthy, 99.1 is still healthy, 99.5 starts getting a little bit sick, 100.5 is sick, 102.5 is crazy sick and 107.5 results in a dead patient. Enter the values in the training array in this format:
training = [
  [98.7, 'healthy'],
  [99.1, 'healthy'],
  [99.5, 'sick'],
  [100.5, 'sick'],
  [102.5, 'crazy sick'],
  [107.5, 'dead'],
]
Calling the ID3 Method
Now that the training array is loaded with data, our next step is to create the decision tree itself. The decisiontree library provides its tree classes under the DecisionTree module, and we pick the one that we want to use. We're going to leverage the ID3 algorithm for this example. ID3 is a popular algorithm for building decision trees from labeled data.
To use it, we instantiate a new ID3Tree and pass in four arguments. The first is attributes, the second is the training data, and the third is the default value, which we'll set to sick. The last argument is :continuous, which indicates that the attribute values are continuous numbers (such as temperatures) rather than discrete categories. Your decision tree instantiation should look like this:
dec_tree = DecisionTree::ID3Tree.new(attributes, training, 'sick', :continuous)
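To get a feel for what "continuous" means here, the following is a minimal sketch in plain Ruby of how an ID3-style algorithm could score candidate temperature thresholds using entropy and information gain. It is an illustration only, not the gem's actual implementation. The helper names entropy and gain are made up for this example.

```ruby
# Illustrative sketch only: plain Ruby, not the decisiontree gem's internals.
training = [
  [98.7, 'healthy'],
  [99.1, 'healthy'],
  [99.5, 'sick'],
  [100.5, 'sick'],
  [102.5, 'crazy sick'],
  [107.5, 'dead']
]

# Shannon entropy (in bits) of the labels in a set of rows.
# (Hypothetical helper, named for this example.)
def entropy(rows)
  return 0.0 if rows.empty?

  counts = rows.group_by(&:last).values.map(&:size)
  counts.sum do |count|
    prob = count.to_f / rows.size
    -prob * Math.log2(prob)
  end
end

# Information gain from splitting the rows at a temperature threshold.
# (Hypothetical helper, named for this example.)
def gain(rows, threshold)
  below, above = rows.partition { |temp, _label| temp < threshold }
  weighted = [below, above].sum do |part|
    part.size.to_f / rows.size * entropy(part)
  end
  entropy(rows) - weighted
end

# Candidate thresholds: midpoints between neighbouring temperatures.
temps = training.map(&:first).sort
candidates = temps.each_cons(2).map { |a, b| (a + b) / 2.0 }

candidates.each do |threshold|
  puts format('Temp < %.2f  gain: %.3f', threshold, gain(training, threshold))
end

best = candidates.max_by { |threshold| gain(training, threshold) }
puts "Best first split: Temp < #{best.round(2)}"
```

A split with a higher gain separates the labels more cleanly. On a dataset this small, more than one candidate can score equally well, so the "best" split reported here should not be read as the one the gem would necessarily choose.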
Now we will call the train method, which trains the decision tree on the data we passed in. It takes in all the values and their respective results and learns from them.
dec_tree.train
Function of the train Method
Let's now add another attribute, Name (which is silly, since someone's name won't determine if they're sick or not, but I'm going to use it for the sake of the example). Next add names to the training data. Every row needs one value per attribute, followed by the result:
attributes = ['Temp', 'Name']
training = [
  [98.7, 'jordan', 'healthy'],
  [99.1, 'tiffany', 'healthy'],
  # The names below are placeholders added so that every row has a name.
  [99.5, 'patient3', 'sick'],
  [100.5, 'patient4', 'sick'],
  [102.5, 'patient5', 'crazy sick'],
  [107.5, 'patient6', 'dead'],
]
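Since each training row is expected to hold one value per attribute followed by the result, a quick sanity check can catch rows that are missing a value before you train. This is a small sketch in plain Ruby. The variable names expected_size and malformed are just for illustration, and the third row is deliberately left incomplete to show what gets flagged.

```ruby
attributes = ['Temp', 'Name']
training = [
  [98.7, 'jordan', 'healthy'],
  [99.1, 'tiffany', 'healthy'],
  [99.5, 'sick'] # deliberately missing a name, so this row is malformed
]

# One value per attribute, plus the result label at the end.
expected_size = attributes.size + 1

malformed = training.reject { |row| row.size == expected_size }
malformed.each { |row| puts "Malformed row: #{row.inspect}" }
```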
Addition of the Test Method
Now that we have the training data in order, we can add the data we actually want to test. Let's add a line of code that looks like this:
test = [98.7, 'healthy']
The healthy part is included just to test the algorithm. In a real world application we wouldn't know if the patient is sick or healthy; that's what the program will determine.
Setting up Decision Variables
Next we're going to utilize the predict method and pass in the test data. We call predict, a built-in method that the decision tree gem provides, on our decision tree, which has been trained with our historical data. The method takes the test data as its argument, compares it with the data the tree was trained on, and gives us a prediction. The code will look like this:
decision = dec_tree.predict(test)
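Conceptually, predicting with a trained tree means walking from the root to a leaf, comparing the input against a threshold at each step. Here's a rough sketch of that idea in plain Ruby. The nested hash layout, the walk helper and the threshold values are all assumptions chosen by hand for illustration; they are not the gem's internal representation or the values it would learn.

```ruby
# Hand-built tree for illustration only. Thresholds are assumed midpoints
# between neighbouring training temperatures with different results.
tree = {
  threshold: 99.3,
  below: 'healthy',
  above: {
    threshold: 101.5,
    below: 'sick',
    above: {
      threshold: 105.0,
      below: 'crazy sick',
      above: 'dead'
    }
  }
}

# Follow the branches until we reach a leaf, which is a plain label string.
# (Hypothetical helper, named for this example.)
def walk(node, temp)
  return node unless node.is_a?(Hash)

  branch = temp < node[:threshold] ? node[:below] : node[:above]
  walk(branch, temp)
end

puts walk(tree, 98.7)  # prints "healthy"
puts walk(tree, 107.5) # prints "dead"
```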
Print the Results Out
Now we are going to create two puts statements that print the results out. Prediction shows the value of decision, and Reality shows the known result from the test data, so we can check that our algorithm is working properly.
puts "Prediction: #{decision}"
puts "Reality: #{test.last}"
Testing the program
Let's run the program and see if it works or not. With the test data we provided, it will print out healthy, which means that our base case is working properly. You can test another example by setting the test array to a temperature of 107.5, and the result will be dead.
Advantage of Decision Trees over if/else Statements
You may be wondering why we chose a decision tree instead of if and else statements. If/else statements are great for a small amount of data. But in real life scenarios (like with hospital data) there can be millions of data points, and hand-written if/else statements would not be practical because every rule has to be written and updated by hand. You need something more robust, such as a decision tree, which learns its rules from the data, in order to analyze a large amount of data.
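To make the contrast concrete, here's a small sketch in plain Ruby that puts a hand-written if/else chain next to a simplified data-driven approach. The thresholds in classify_by_hand are assumptions picked by eye, and learn_boundaries and classify are hypothetical helpers that only handle a single numeric attribute. This is not how the gem works internally. The point is only that the learned rules change when the data changes, while the if/else chain has to be edited by hand.

```ruby
# Hand-written rules: every threshold is hard-coded (assumed values).
def classify_by_hand(temp)
  if temp < 99.3
    'healthy'
  elsif temp < 101.5
    'sick'
  elsif temp < 105.0
    'crazy sick'
  else
    'dead'
  end
end

# Data-driven alternative: put a boundary halfway between neighbouring
# rows whose results differ. Returns [threshold, label_above] pairs.
def learn_boundaries(rows)
  rows.sort_by(&:first).each_cons(2).map do |(t1, l1), (t2, l2)|
    [(t1 + t2) / 2.0, l2] if l1 != l2
  end.compact
end

# Start from the label of the lowest range and move up past each boundary.
def classify(boundaries, lowest_label, temp)
  boundaries.reduce(lowest_label) do |label, (threshold, upper)|
    temp >= threshold ? upper : label
  end
end

training = [
  [98.7, 'healthy'], [99.1, 'healthy'], [99.5, 'sick'],
  [100.5, 'sick'], [102.5, 'crazy sick'], [107.5, 'dead']
]

before = classify(learn_boundaries(training), 'healthy', 101.2)

# A new observation arrives: no code changes, just relearn from the data.
updated = training + [[101.0, 'crazy sick']]
after = classify(learn_boundaries(updated), 'healthy', 101.2)

puts "By hand: #{classify_by_hand(101.2)}"
puts "Learned before new data: #{before}"
puts "Learned after new data:  #{after}"
```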