How to compile Tensorflow with SSE4.2 and AVX instructions?

Question

This is the message received from running a script to check if Tensorflow is working:

    I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcublas.so.8.0 locally
    I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcudnn.so.5 locally
    I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcufft.so.8.0 locally
    I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcuda.so.1 locally
    I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcurand.so.8.0 locally
    W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
    W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
    I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero

I noticed that it has mentioned SSE4.2 and AVX.

1. What are SSE4.2 and AVX?
2. How do these SSE4.2 and AVX improve CPU computations for Tensorflow tasks?
3. How to make Tensorflow compile using the two libraries?
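As a small aid for reading such startup logs, the following sketch pulls the instruction-set names out of the cpu_feature_guard warnings. The function name and the sample text are illustrative assumptions; the pattern only matches the warning wording quoted in this question.

```python
import re

# Matches the warning wording quoted in the question (assumed to be stable).
_MISSING = re.compile(r"wasn't compiled to use (\S+) instructions")


def missing_instruction_sets(log_text):
    """Return the instruction-set names named in cpu_feature_guard warnings."""
    return _MISSING.findall(log_text)


# Two warning lines copied from the log in the question.
sample_log = (
    "W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow library "
    "wasn't compiled to use SSE4.2 instructions, but these are available on your "
    "machine and could speed up CPU computations.\n"
    "W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow library "
    "wasn't compiled to use AVX instructions, but these are available on your "
    "machine and could speed up CPU computations.\n"
)

print(missing_instruction_sets(sample_log))
```

The returned names tell you which compiler flags are worth looking at when rebuilding.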

User · Answer

This is the simplest method. Only one step. It has a significant impact on speed. In my case, the time taken for a training step almost halved. Refer to custom builds of tensorflow.

User · Answer

I just ran into this same problem. It seems like Yaroslav Bulatov's suggestion doesn't cover SSE4.2 support; adding --copt=-msse4.2 would suffice. In the end, I successfully built with

    bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=cuda -k //tensorflow/tools/pip_package:build_pip_package

without getting any warning or errors.

Probably the best choice for any system is:

    bazel build -c opt --copt=-march=native --copt=-mfpmath=both --config=cuda -k //tensorflow/tools/pip_package:build_pip_package

(Update: the build scripts may be eating -march=native, possibly because it contains an =.)

-mfpmath=both only works with gcc, not clang. -mfpmath=sse is probably just as good, if not better, and is the default for x86-64. 32-bit builds default to -mfpmath=387, so changing that will help for 32-bit. (But if you want high-performance for number crunching, you should build 64-bit binaries.)

I'm not sure what TensorFlow's default for -O2 or -O3 is. gcc -O3 enables full optimization including auto-vectorization, but that sometimes can make code slower.

What this does: --copt for bazel build passes an option directly to gcc for compiling C and C++ files (but not linking, so you need a different option for cross-file link-time-optimization).

x86-64 gcc defaults to using only SSE2 or older SIMD instructions, so you can run the binaries on any x86-64 system. (See https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html). That's not what you want. You want to make a binary that takes advantage of all the instructions your CPU can run, because you're only running this binary on the system where you built it.

-march=native enables all the options your CPU supports, so it makes -mavx512f -mavx2 -mavx -mfma -msse4.2 redundant. (Also, -mavx2 already enables -mavx and -msse4.2, so Yaroslav's command should have been fine). Also, if you're using a CPU that doesn't support one of these options (like FMA), using -mfma would make a binary that faults with illegal instructions.

TensorFlow's ./configure defaults to enabling -march=native, so using that should avoid needing to specify compiler options manually.

-march=native enables -mtune=native, so it optimizes for your CPU for things like which sequence of AVX instructions is best for unaligned loads.

This all applies to gcc, clang, or ICC. (For ICC, you can use -xHOST instead of -march=native.)
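The redundancy argument in this answer can be sketched in a few lines of Python. This is only an illustration: the implication table contains just the one relationship stated above (-mavx2 enables -mavx and -msse4.2), and the helper names are hypothetical, not part of bazel or gcc.

```python
# ISA flags that the answer says -march=native makes redundant.
ISA_FLAGS = ["-mavx512f", "-mavx2", "-mavx", "-mfma", "-msse4.2"]

# Only the implication stated in the answer; other gcc implications are left out.
IMPLIED = {"-mavx2": {"-mavx", "-msse4.2"}}


def prune_redundant(flags):
    """Drop flags that another flag in the list already enables."""
    if "-march=native" in flags:
        # -march=native turns on everything the build CPU supports.
        return [f for f in flags if f not in ISA_FLAGS]
    covered = set()
    for f in flags:
        covered |= IMPLIED.get(f, set())
    return [f for f in flags if f not in covered]


def to_copts(flags):
    """Wrap each compiler flag in bazel's --copt= option."""
    return ["--copt=" + f for f in flags]


explicit = ["-mavx", "-mavx2", "-mfma", "-mfpmath=both", "-msse4.2"]
print(to_copts(prune_redundant(explicit)))
```

Pruning changes nothing about the generated binary; it only shortens the command line and makes it easier to see which flags actually matter.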

User · Answer

To compile TensorFlow with SSE4.2 and AVX, you can use directly:

    bazel build --config=mkl \
                --config="opt" \
                --copt="-march=broadwell" \
                --copt="-O3" \
                //tensorflow/tools/pip_package:build_pip_package

Source: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/docker/Dockerfile.devel-cpu-mkl

User · Answer

When building TensorFlow from source, you'll run the configure script. One of the questions that the configure script asks is as follows:

    Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]

The configure script will attach the flag(s) you specify to the bazel command that builds the TensorFlow pip package. Broadly speaking, you can respond to this prompt in one of two ways:

- If you are building TensorFlow on the same type of CPU as the one on which you'll run TensorFlow, then you should accept the default (-march=native). This option will optimize the generated code for your machine's CPU type.
- If you are building TensorFlow on one CPU type but will run TensorFlow on a different CPU type, then consider supplying a more specific optimization flag as described in the gcc documentation.

After configuring TensorFlow as described in the preceding bulleted list, you should be able to build TensorFlow fully optimized for the target CPU just by adding the --config=opt flag to any bazel command you are running.
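The two ways of answering the prompt can be written down as a tiny decision helper. This is a sketch under assumptions: the function is hypothetical, and its inputs are expected to be gcc -march names (for example "haswell" or "skylake") that you look up yourself in the gcc documentation.

```python
def optimization_flags(build_cpu, run_cpu):
    """Suggest an answer to the configure prompt.

    build_cpu and run_cpu are gcc -march names chosen by the caller
    (hypothetical inputs, not something configure reports).
    """
    if build_cpu == run_cpu:
        # Same CPU type for building and running: accept the default.
        return "-march=native"
    # Different CPU type: target the machine that will run TensorFlow.
    return "-march=" + run_cpu


print(optimization_flags("haswell", "haswell"))
print(optimization_flags("skylake", "haswell"))
```

Whatever string you settle on is what you would type at the prompt; it is then picked up whenever --config=opt is passed to bazel.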

User · Answer

Let's start with the explanation of why you see these warnings in the first place.

Most probably you have not installed TF from source and instead used something like pip install tensorflow. That means that you installed binaries pre-built (by someone else) which were not optimized for your architecture. And these warnings tell you exactly this: something is available on your architecture, but it will not be used because the binary was not compiled with it. Here is the part from the documentation:

    "TensorFlow checks on startup whether it has been compiled with the optimizations available on the CPU. If the optimizations are not included, TensorFlow will emit warnings, e.g. AVX, AVX2, and FMA instructions not included."

The good thing is that most probably you just want to learn/experiment with TF, so everything will work properly and you should not worry about it.

What are SSE4.2 and AVX?

Wikipedia has a good explanation about SSE4.2 and AVX. This knowledge is not required to be good at machine learning. You may think about them as a set of additional instructions that let a computer apply a single instruction to multiple data points, for operations which may be naturally parallelized (for example adding two arrays).

Both SSE and AVX are implementations of the abstract idea of SIMD (Single instruction, multiple data), which is

    "a class of parallel computers in Flynn's taxonomy. It describes computers with multiple processing elements that perform the same operation on multiple data points simultaneously. Thus, such machines exploit data level parallelism, but not concurrency: there are simultaneous (parallel) computations, but only a single process (instruction) at a given moment."

This is enough to answer your next question.

How do these SSE4.2 and AVX improve CPU computations for TF tasks?

They allow a more efficient computation of various vector (matrix/tensor) operations. You can read more in these slides.

How to make Tensorflow compile using the two libraries?

You need to have a binary which was compiled to take advantage of these instructions. The easiest way is to compile it yourself. As Mike and Yaroslav suggested, you can use the following bazel command:

    bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=cuda -k //tensorflow/tools/pip_package:build_pip_package
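To make the "single instruction, multiple data" idea from this answer concrete, here is a purely conceptual Python simulation of adding two arrays lane by lane. It does not use real SSE or AVX instructions; each loop iteration merely stands in for one vector instruction, and the lane counts reflect that a 128-bit SSE register holds four 32-bit floats while a 256-bit AVX register holds eight.

```python
def simd_add(a, b, lanes=4):
    """Add two equal-length lists `lanes` elements at a time.

    Returns the sums and the number of simulated vector instructions.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    out = []
    instructions = 0
    for i in range(0, len(a), lanes):
        # One simulated vector add handles up to `lanes` element pairs.
        out.extend(x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        instructions += 1
    return out, instructions


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10, 20, 30, 40, 50, 60, 70, 80]
print(simd_add(a, b, lanes=1))  # scalar: one add per element
print(simd_add(a, b, lanes=4))  # SSE-like: four 32-bit lanes
print(simd_add(a, b, lanes=8))  # AVX-like: eight 32-bit lanes
```

The results are identical in all three cases; only the count of simulated instructions drops, which is the intuition behind the speedup for vector and matrix operations.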

User · Answer

Let me answer your 3rd question first:

If you want to run a self-compiled version within a conda-env, you can. These are the general instructions I run to get tensorflow to install on my system with additional instructions. Note: This build was for an AMD A10-7850 build (check your CPU for what instructions are supported, it may differ) running Ubuntu 16.04 LTS. I use Python 3.5 within my conda-env. Credit goes to the tensorflow source install page and the answers provided above.

    git clone https://github.com/tensorflow/tensorflow

    # Install Bazel
    # https://bazel.build/versions/master/docs/install.html
    sudo apt-get install python3-numpy python3-dev python3-pip python3-wheel

    # Create your virtual env with conda.
    source activate YOUR_ENV
    pip install six numpy wheel packaging appdir

    # Follow the configure instructions at:
    # https://www.tensorflow.org/install/install_sources

    # Build your build like below. Note: Check what instructions your CPU
    # supports. Also, if resources are limited, consider adding the following
    # flag: --local_resources 2048,.5,1.0 . This will limit how much RAM and
    # other local resources are used, but will increase the time to compile.
    bazel build -c opt --copt=-mavx --copt=-msse4.1 --copt=-msse4.2 -k //tensorflow/tools/pip_package:build_pip_package

    # Create the wheel like so:
    bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

    # Inside your conda env:
    pip install /tmp/tensorflow_pkg/NAME_OF_WHEEL.whl

    # Then install the rest of your stack
    pip install keras jupyter etc. etc.

As to your 2nd question:

A self-compiled version with optimizations is well worth the effort in my opinion. On my particular setup, calculations that used to take 560-600 seconds now only take about 300 seconds. Although the exact numbers will vary, I think you can expect about a 35-50% speed increase in general on your particular setup.

Lastly your 1st question:

A lot of the answers have been provided above already. To summarize: AVX, SSE4.1, SSE4.2 and FMA are different kinds of extended instruction sets on x86 CPUs. Many contain optimized instructions for processing matrix or vector operations.

I will highlight my own misconception to hopefully save you some time: it's not that SSE4.2 is a newer version of instructions superseding SSE4.1. SSE4 = SSE4.1 (a set of 47 instructions) + SSE4.2 (a set of 7 instructions).

In the context of tensorflow compilation, if your computer supports AVX2 and AVX, and SSE4.1 and SSE4.2, you should put in the optimizing flags for all of them. Don't do like I did and just go with SSE4.2 thinking that it's newer and should supersede SSE4.1. That's clearly WRONG! I had to recompile because of that, which cost me a good 40 minutes.
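One way to avoid guessing which flags to pass is to read them from the CPU itself. The sketch below assumes Linux, where /proc/cpuinfo has a "flags" line with names such as sse4_1, sse4_2, avx, avx2 and fma; the mapping table and function name are illustrative, and only the flags used in the answers here are covered.

```python
# Linux /proc/cpuinfo flag name -> gcc option used in the answers above.
CPUINFO_TO_GCC = {
    "sse4_1": "-msse4.1",
    "sse4_2": "-msse4.2",
    "avx": "-mavx",
    "avx2": "-mavx2",
    "fma": "-mfma",
}


def copts_from_cpuinfo(cpuinfo_text):
    """Return bazel --copt options for the supported instruction sets."""
    supported = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # The line looks like "flags : fpu vme ... sse4_1 sse4_2 avx ..."
            supported.update(line.split(":", 1)[1].split())
            break  # every core reports the same flags; the first is enough
    return ["--copt=" + gcc for name, gcc in CPUINFO_TO_GCC.items()
            if name in supported]


# Made-up sample text; on a real machine, pass the contents of /proc/cpuinfo.
sample = "processor\t: 0\nflags\t\t: fpu sse sse2 sse4_1 sse4_2 avx fma\n"
print(" ".join(copts_from_cpuinfo(sample)))
```

On a real system you would call it as copts_from_cpuinfo(open("/proc/cpuinfo").read()) and paste the result into the bazel build command. Note that SSE4.1 and SSE4.2 are listed separately, matching the point made in this answer.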

User · Answer

2.0 COMPATIBLE SOLUTION:

Execute the below commands in Terminal (Linux/MacOS) or in Command Prompt (Windows) to install Tensorflow 2.0 using Bazel:

    git clone https://github.com/tensorflow/tensorflow.git
    cd tensorflow

    # The repo defaults to the master development branch. You can also checkout a release branch to build:
    git checkout r2.0

    # Configure the Build => Use the below line for Windows Machine
    python ./configure.py

    # Configure the Build => Use the below line for Linux/MacOS Machine
    ./configure
    # This script prompts you for the location of TensorFlow dependencies and asks for additional build configuration options.

    # Build Tensorflow package

    # CPU support
    bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package

    # GPU support
    bazel build --config=opt --config=cuda --define=no_tensorflow_py_deps=true //tensorflow/tools/pip_package:build_pip_package
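If you script the build, the CPU and GPU variants from this answer differ only in two extra options. A minimal sketch, assuming you assemble the argument list yourself (the function name is hypothetical; the options are the ones quoted in this answer):

```python
# Build target used throughout this thread.
TARGET = "//tensorflow/tools/pip_package:build_pip_package"


def bazel_build_args(gpu=False):
    """Return the bazel argument list for the CPU or GPU build."""
    args = ["bazel", "build", "--config=opt"]
    if gpu:
        # Extra options that the answer lists for GPU support.
        args += ["--config=cuda", "--define=no_tensorflow_py_deps=true"]
    args.append(TARGET)
    return args


print(" ".join(bazel_build_args()))
print(" ".join(bazel_build_args(gpu=True)))
```

The list can be handed to subprocess.run(args, check=True) from inside the cloned tensorflow directory after ./configure has been run.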

User · Answer

I have recently installed it from source, and below are all the steps needed to install it from source with the mentioned instructions available.

Other answers already describe why those messages are shown. My answer gives a step-by-step on how to install, which may help people struggling with the actual installation as I did.

Install Bazel

1. Download it from one of their available releases, for example 0.5.2.
2. Extract it, go into the directory and configure it:

       bash ./compile.sh

3. Copy the executable to /usr/local/bin:

       sudo cp ./output/bazel /usr/local/bin

Install Tensorflow

1. Clone tensorflow:

       git clone https://github.com/tensorflow/tensorflow.git

2. Go to the cloned directory to configure it:

       ./configure

It will prompt you with several questions. Below I have suggested the response to each of the questions; you can, of course, choose your own responses as you prefer:

    Using python library path: /usr/local/lib/python2.7/dist-packages
    Do you wish to build TensorFlow with MKL support? [y/N] y
    MKL support will be enabled for TensorFlow
    Do you wish to download MKL LIB from the web? [Y/n] Y
    Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
    Do you wish to use jemalloc as the malloc implementation? [Y/n] n
    jemalloc disabled
    Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] N
    No Google Cloud Platform support will be enabled for TensorFlow
    Do you wish to build TensorFlow with Hadoop File System support? [y/N] N
    No Hadoop File System support will be enabled for TensorFlow
    Do you wish to build TensorFlow with the XLA just-in-time compiler (experimental)? [y/N] N
    No XLA JIT support will be enabled for TensorFlow
    Do you wish to build TensorFlow with VERBS support? [y/N] N
    No VERBS support will be enabled for TensorFlow
    Do you wish to build TensorFlow with OpenCL support? [y/N] N
    No OpenCL support will be enabled for TensorFlow
    Do you wish to build TensorFlow with CUDA support? [y/N] N
    No CUDA support will be enabled for TensorFlow

The pip package

To build it you have to describe which instructions you want (you know, those Tensorflow informed you are missing).

1. Build pip script:

       bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.1 --copt=-msse4.2 -k //tensorflow/tools/pip_package:build_pip_package

2. Build pip package:

       bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

3. Install the Tensorflow pip package you just built:

       sudo pip install /tmp/tensorflow_pkg/tensorflow-1.2.1-cp27-cp27mu-linux_x86_64.whl

Now the next time you start up Tensorflow it will not complain anymore about missing instructions.

User · Answer

Thanks to all these replies plus some trial and error, I managed to install it on a Mac with clang. So just sharing my solution in case it is useful to someone.

1. Follow the instructions on Documentation - Installing TensorFlow from Sources.
2. When prompted for

       Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]

   then copy-paste this string:

       -mavx -mavx2 -mfma -msse4.2

   (The default option caused errors, and so did some of the other flags. I got no errors with the above flags. BTW I replied n to all the other questions.)

After installing, I verified a ~2x to 2.5x speedup when training deep models compared with another installation based on the default wheels (Installing TensorFlow on macOS).

Hope it helps.

User · Answer

I compiled a small Bash script for Mac (it can easily be ported to Linux) to retrieve all CPU features and apply some of them to build TF. I'm on TF master and use it fairly often (a couple of times a month).

https://gist.github.com/venik/9ba962c8b301b0e21f99884cbd35082f

User · Answer

To hide those warnings, you could do this before your actual code:

    import os
    os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
    import tensorflow as tf
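For context on the value used in this answer: TF_CPP_MIN_LOG_LEVEL accepts 0 to 3, and level 2 hides both INFO and WARNING lines. The wrapper below is only a sketch with a hypothetical helper name. It must run before TensorFlow is imported, and it only silences the messages; it does not make the binary any faster.

```python
import os

# Meaning of each TF_CPP_MIN_LOG_LEVEL value.
LEVELS = {
    0: "show all messages",
    1: "hide INFO",
    2: "hide INFO and WARNING",
    3: "hide INFO, WARNING and ERROR",
}


def set_tf_log_level(level):
    """Set the C++ log threshold; call this before importing tensorflow."""
    if level not in LEVELS:
        raise ValueError("level must be one of %s" % sorted(LEVELS))
    os.environ["TF_CPP_MIN_LOG_LEVEL"] = str(level)
    return LEVELS[level]


print(set_tf_log_level(2))
# import tensorflow as tf  # import only after the variable has been set
```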

User · Answer

These are SIMD vector processing instruction sets.

Using vector instructions is faster for many tasks; machine learning is such a task.

Quoting the tensorflow installation docs:

    "To be compatible with as wide a range of machines as possible, TensorFlow defaults to only using SSE4.1 SIMD instructions on x86 machines. Most modern PCs and Macs support more advanced instructions, so if you're building a binary that you'll only be running on your own machine, you can enable these by using --copt=-march=native in your bazel build command."
