[python] Save classifier to disk in scikit-learn

You can also use joblib.dump and joblib.load which is much more efficient at handling numerical arrays than the default python pickler.

Joblib is included in scikit-learn:

>>> import joblib
>>> from sklearn.datasets import load_digits
>>> from sklearn.linear_model import SGDClassifier

>>> digits = load_digits()
>>> clf = SGDClassifier().fit(digits.data, digits.target)
>>> clf.score(digits.data, digits.target)  # evaluate training error
0.9526989426822482

>>> filename = '/tmp/digits_classifier.joblib.pkl'
>>> _ = joblib.dump(clf, filename, compress=9)

>>> clf2 = joblib.load(filename)
>>> clf2
SGDClassifier(alpha=0.0001, class_weight=None, epsilon=0.1, eta0=0.0,
       fit_intercept=True, learning_rate='optimal', loss='hinge', n_iter=5,
       n_jobs=1, penalty='l2', power_t=0.5, rho=0.85, seed=0,
       shuffle=False, verbose=0, warm_start=False)
>>> clf2.score(digits.data, digits.target)
0.9526989426822482

Edit: in Python 3.8+ it's now possible to use pickle for efficient pickling of object with large numerical arrays as attributes if you use pickle protocol 5 (which is not the default).

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to machine-learning

Error in Python script "Expected 2D array, got 1D array instead:"? How to predict input image using trained model in Keras? What is the role of "Flatten" in Keras? How to concatenate two layers in keras? How to save final model using keras? scikit-learn random state in splitting dataset Why binary_crossentropy and categorical_crossentropy give different performances for the same problem? What is the meaning of the word logits in TensorFlow? Can anyone explain me StandardScaler? Can Keras with Tensorflow backend be forced to use CPU or GPU at will?

Examples related to scikit-learn

LabelEncoder: TypeError: '>' not supported between instances of 'float' and 'str' UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples scikit-learn random state in splitting dataset LogisticRegression: Unknown label type: 'continuous' using sklearn in python Can anyone explain me StandardScaler? ImportError: No module named model_selection How to split data into 3 sets (train, validation and test)? How to convert a Scikit-learn dataset to a Pandas dataset? Accuracy Score ValueError: Can't Handle mix of binary and continuous target How can I plot a confusion matrix?

Examples related to classification

FailedPreconditionError: Attempting to use uninitialized in Tensorflow Scikit-learn train_test_split with indices Scikit-learn: How to obtain True Positive, True Negative, False Positive and False Negative What are advantages of Artificial Neural Networks over Support Vector Machines? Save classifier to disk in scikit-learn A simple explanation of Naive Bayes Classification Difference between classification and clustering in data mining?