ANN (Artificial Neural Networks) and SVM (Support Vector Machines) are two popular strategies for supervised machine learning and classification. It's not often clear which method is better for a particular project, and I'm certain the answer is always "it depends." Often, a combination of both along with Bayesian classification is used.

These questions on Stack Overflow have already been asked regarding ANN vs. SVM:

- what the difference among ANN, SVM and KNN in my classification question
- Support Vector Machine or Artificial Neural Network for text processing?

In this question, I'd like to know *specifically* which aspects of an ANN (specifically, a multilayer perceptron) might make it desirable to use over an SVM. The reason I ask is that it's easy to answer the *opposite* question: Support Vector Machines are often superior to ANNs because they avoid two major weaknesses of ANNs:

(1) ANNs often converge on *local minima* rather than the global minimum, meaning that they sometimes "miss the big picture" (or miss the forest for the trees)

(2) ANNs often *overfit* if training goes on too long, meaning that for any given pattern, an ANN might start to consider the noise as part of the pattern.
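As a practical aside, the overfitting risk above is commonly mitigated by early stopping, i.e. halting training once performance on a held-out validation set stops improving. A minimal sketch, assuming scikit-learn's `MLPClassifier` (the class and parameters are my illustration, not part of the question):

```python
# Sketch (assumption: scikit-learn is available): early stopping halts MLP
# training when the held-out validation score stops improving, limiting the
# overfitting described above.
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=400, n_features=20, random_state=0)

mlp = MLPClassifier(hidden_layer_sizes=(50,),
                    early_stopping=True,      # hold out part of X for validation
                    validation_fraction=0.1,  # 10% of the training data
                    n_iter_no_change=10,      # stop after 10 stagnant epochs
                    max_iter=1000,
                    random_state=0).fit(X, y)

print(mlp.n_iter_)  # epochs actually run, typically far fewer than max_iter
```

This does not remove the problem, it only detects it: training simply stops before the network starts memorizing noise.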

SVMs don't suffer from either of these two problems. However, it's not readily apparent that SVMs are meant to be a total replacement for ANNs. So what *specific* advantage(s) does an ANN have over an SVM that might make it applicable for certain situations? I've listed *specific* advantages of an SVM over an ANN, now I'd like to see a list of ANN advantages (if any).

This question is tagged `machine-learning`, `neural-network`, `classification`, and `svm`.

Judging from the examples you provide, I'm assuming that by ANNs, you mean multilayer feed-forward networks (FF nets for short), such as multilayer perceptrons, because those are in direct competition with SVMs.

One specific benefit that these models have over SVMs is that their size is fixed: they are *parametric* models, while SVMs are non-parametric. That is, in an ANN you have a set of hidden layers with sizes *h*_{1} through *h*_{n} that are fixed in advance, plus the associated weights and bias parameters, and together those make up your model. By contrast, an SVM (at least a kernelized one) consists of a set of support vectors, selected from the training set, with a weight for each. In the worst case, the number of support vectors is exactly the number of training samples (though that mainly occurs with small training sets or in degenerate cases), and in general the model size scales linearly with the size of the training set. In natural language processing, SVM classifiers with tens of thousands of support vectors, each having hundreds of thousands of features, are not unheard of.
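To make the parametric/non-parametric contrast concrete, here is a sketch (assuming scikit-learn; the dataset and layer sizes are arbitrary) that compares an MLP's fixed parameter count with a kernelized SVM's data-dependent support-vector count:

```python
# Sketch (assumption: scikit-learn is available): the MLP's size is set by
# its architecture alone; the kernelized SVM's size depends on the data.
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Kernelized SVM: model size = number of support vectors (data-dependent).
svm = SVC(kernel="rbf").fit(X, y)
n_sv = svm.support_vectors_.shape[0]

# MLP: parameter count fixed by the architecture, regardless of n_samples.
mlp = MLPClassifier(hidden_layer_sizes=(10,), max_iter=500,
                    random_state=0).fit(X, y)
n_params = (sum(w.size for w in mlp.coefs_)
            + sum(b.size for b in mlp.intercepts_))

print("SVM support vectors:", n_sv)    # bounded above by n_samples
print("MLP parameters:", n_params)     # 20*10 + 10*1 weights + 10 + 1 biases = 221
```

Doubling the training set can double the SVM's support-vector count, while the MLP's 221 parameters stay 221.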

Also, online training of FF nets is very simple compared to online SVM fitting, and predicting can be quite a bit faster.

**EDIT**: all of the above pertains to the general case of kernelized SVMs. Linear SVMs are a special case in that they *are* parametric and allow online learning with simple algorithms such as stochastic gradient descent.
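A minimal sketch of that special case, assuming scikit-learn's `SGDClassifier` (my choice of API; the hinge loss makes it a linear SVM), trained online in mini-batches:

```python
# Sketch (assumption: scikit-learn is available): a linear SVM fit online
# with stochastic gradient descent, consuming the data 100 samples at a time.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

clf = SGDClassifier(loss="hinge", random_state=0)  # hinge loss == linear SVM
classes = np.unique(y)                             # required on first partial_fit
for start in range(0, len(X), 100):
    clf.partial_fit(X[start:start + 100], y[start:start + 100], classes=classes)

print(clf.score(X, y))  # training accuracy after a single streaming pass
```

Each `partial_fit` call updates the fixed weight vector in place, so the model never grows, which is exactly the parametric behavior the edit describes.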
