[python] Visualizing decision tree in scikit-learn

I am trying to design a simple Decision Tree using scikit-learn in Python (I am using Anaconda's Ipython Notebook with Python 2.7.3 on Windows OS) and visualize it as follows:

from pandas import read_csv, DataFrame
from sklearn import tree
from os import system

data = read_csv('D:/training.csv')
Y = data.Y
X = data.ix[:,"X0":"X33"]

dtree = tree.DecisionTreeClassifier(criterion = "entropy")
dtree = dtree.fit(X, Y)

dotfile = open("D:/dtree2.dot", 'w')
dotfile = tree.export_graphviz(dtree, out_file = dotfile, feature_names = X.columns)
dotfile.close()
system("dot -Tpng D:.dot -o D:/dtree2.png")

However, I get the following error:

AttributeError: 'NoneType' object has no attribute 'close'

I use the following blog post as reference: Blogpost link

The following stackoverflow question doesn't seem to work for me as well: Question

Could someone help me with how to visualize the decision tree in scikit-learn?

This question is related to python scikit-learn visualization decision-tree

The answer is


The following also works fine:

from sklearn.datasets import load_iris
iris = load_iris()

# Model (can also use single decision tree)
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(n_estimators=10)

# Train
model.fit(iris.data, iris.target)
# Extract single tree
estimator = model.estimators_[5]

from sklearn.tree import export_graphviz
# Export as dot file
export_graphviz(estimator, out_file='tree.dot', 
                feature_names = iris.feature_names,
                class_names = iris.target_names,
                rounded = True, proportion = False, 
                precision = 2, filled = True)

# Convert to png using system command (requires Graphviz)
from subprocess import call
call(['dot', '-Tpng', 'tree.dot', '-o', 'tree.png', '-Gdpi=600'])

# Display in jupyter notebook
from IPython.display import Image
Image(filename = 'tree.png')

enter image description here

You can find the source here


Alternatively, you could try using pydot for producing the png file from dot:

...
tree.export_graphviz(dtreg, out_file='tree.dot') #produces dot file

import pydot
dotfile = StringIO()
tree.export_graphviz(dtreg, out_file=dotfile)
pydot.graph_from_dot_data(dotfile.getvalue()).write_png("dtree2.png")
...

You can copy the contents of the export_graphviz file and you can paste the same in the webgraphviz.com site.

You can check out the article on How to visualize the decision tree in Python with graphviz for more information.


Scikit learn recently introduced the plot_tree method to make this very easy (new in version 0.21 (May 2019)). Documentation here.

Here's the minimum code you need:

from sklearn import tree
plt.figure(figsize=(40,20))  # customize according to the size of your tree
_ = tree.plot_tree(your_model_name, feature_names = X.columns)
plt.show()

plot_tree supports some arguments to beautify the tree. For example:

from sklearn import tree
plt.figure(figsize=(40,20))  
_ = tree.plot_tree(your_model_name, feature_names = X.columns, 
             filled=True, fontsize=6, rounded = True)
plt.show()

If you want to save the picture to a file, add the following line before plt.show():

plt.savefig('filename.png')

If you want to view the rules in text format, there's an answer here. It's more intuitive to read.


Here is one liner for those who are using jupyter and sklearn(18.2+) You don't even need matplotlib for that. Only requirement is graphviz

pip install graphviz

than run (according to code in question X is a pandas DataFrame)

from graphviz import Source
from sklearn import tree
Source( tree.export_graphviz(dtreg, out_file=None, feature_names=X.columns))

This will display it in SVG format. Code above produces Graphviz's Source object (source_code - not scary) That would be rendered directly in jupyter.

Some things you are likely to do with it

Display it in jupter:

from IPython.display import SVG
graph = Source( tree.export_graphviz(dtreg, out_file=None, feature_names=X.columns))
SVG(graph.pipe(format='svg'))

Save as png:

graph = Source( tree.export_graphviz(dtreg, out_file=None, feature_names=X.columns))
graph.format = 'png'
graph.render('dtree_render',view=True)

Get the png image, save it and view it:

graph = Source( tree.export_graphviz(dtreg, out_file=None, feature_names=X.columns))
png_bytes = graph.pipe(format='png')
with open('dtree_pipe.png','wb') as f:
    f.write(png_bytes)

from IPython.display import Image
Image(png_bytes)

If you are going to play with that lib here are the links to examples and userguide


Here is the minimal code to have a nice looking graph with just 3 lines of code :

from sklearn import tree
import pydotplus

dot_data=tree.export_graphviz(dt,filled=True,rounded=True)
graph=pydotplus.graph_from_dot_data(dot_data)
graph.write_png('tree.png')    
plt.imshow(plt.imread('tree.png'))

I have added the plt.imgshow to view the graph it Jupyter Notebook. You can ignore it if you are only interested in saving the png file.

I installed the following dependencies:

pip3 install graphviz
pip3 install pydotplus

For MacOs the pip version of Graphviz did not work. Following Graphviz's official documentation I installed it with brew and everything worked fine.

brew install graphviz

enter image description here


If, like me, you have a problem installing graphviz, you can visualize the tree by

  1. exporting it with export_graphviz as shown in previous answers
  2. Open the .dot file in a text editor
  3. Copy the piece of code and paste it @ webgraphviz.com

If you run into issues with grabbing the source .dot directly you can also use Source.from_file like this:

from graphviz import Source
from sklearn import tree
tree.export_graphviz(dtreg, out_file='tree.dot', feature_names=X.columns)
Source.from_file('tree.dot')

I copy and change a part of your code as the below:

from pandas import read_csv, DataFrame
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier
from os import system

data = read_csv('D:/training.csv')
Y = data.Y
X = data.ix[:,"X0":"X33"]

dtree = tree.DecisionTreeClassifier(criterion = "entropy")
dtree = dtree.fit(X, Y)

After making sure you have dtree, which means that the above code runs well, you add the below code to visualize decision tree:

Remember to install graphviz first: pip install graphviz

import graphviz 
from graphviz import Source
dot_data = tree.export_graphviz(dtree, out_file=None, feature_names=X.columns)
graph = graphviz.Source(dot_data) 
graph.render("name of file",view = True)

I tried with my data, visualization worked well and I got a pdf file viewed immediately.


Simple way founded here with pydotplus (graphviz must be installed):

from IPython.display import Image  
from sklearn import tree
import pydotplus # installing pyparsing maybe needed

...

dot_data = tree.export_graphviz(best_model, out_file=None, feature_names = X.columns)
graph = pydotplus.graph_from_dot_data(dot_data)
Image(graph.create_png())

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to scikit-learn

LabelEncoder: TypeError: '>' not supported between instances of 'float' and 'str' UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples scikit-learn random state in splitting dataset LogisticRegression: Unknown label type: 'continuous' using sklearn in python Can anyone explain me StandardScaler? ImportError: No module named model_selection How to split data into 3 sets (train, validation and test)? How to convert a Scikit-learn dataset to a Pandas dataset? Accuracy Score ValueError: Can't Handle mix of binary and continuous target How can I plot a confusion matrix?

Examples related to visualization

Why do many examples use `fig, ax = plt.subplots()` in Matplotlib/pyplot/python How to plot a histogram using Matplotlib in Python with a list of data? Visualizing decision tree in scikit-learn plot different color for different categorical levels using matplotlib How to combine 2 plots (ggplot) into one plot? 3D Plotting from X, Y, Z Data, Excel or other Tools How can I create a correlation matrix in R? Side-by-side plots with ggplot2 Good tool to visualise database schema? How do you change the size of figures drawn with matplotlib?

Examples related to decision-tree

Visualizing decision tree in scikit-learn How to extract the decision rules from scikit-learn decision-tree?