MXNet is a machine learning / deep learning framework, though it is not as popular as peers such as TensorFlow.
Indeed, my last search on Safari Books Online only yielded publications that mention the mxnet name, with no concrete examples of how to build or use it.
In this post I aim to show some of the common errors I encountered while building it manually. In future posts, I will demonstrate how I build it from scratch using multi-stage builds.
Below is a list of such errors I encountered.
Please note that it's not meant to be a comprehensive / exhaustive list and, as usual, a different system setup and requirements may result in different errors than mine.
In my earlier attempts at compilation, the process would fail suddenly with errors such as:
    ...
    c++: internal compiler error: Killed (program cc1plus)
    Please submit a full bug report,
    with preprocessed source if appropriate.
    See <file:///usr/share/doc/gcc-7/README.Bugs> for instructions.
    make: *** [CMakeFiles/mxnet_static.dir/src/operator/tensor/indexing_op.cc.o] Error 4
    make: *** Waiting for unfinished jobs....
    ...
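The Killed message typically means the compiler process was terminated after running out of memory; in my case this was related to a gcc 7.5 memory leak issue.
Assuming you have gcc 8 installed, we can switch to it by exporting the following environment variables before compilation:
    export CC="gcc-8" && \
    export CXX="g++-8"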
Examples show compilation through the ninja build tool. However, my attempts at using it have been unsuccessful. I find that, at least on my local machine, ninja tends to consume more CPU and memory than it should, whereas with the cmake build tool I have more control over the number of processes, and it also shows the logs in the console.
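Since the number of parallel compiler processes is the main lever on memory use, here is a minimal sketch of how a job count could be chosen. The helper name `safe_jobs` and the budget of 2 GB per compiler process are my own assumptions for illustration, not figures from the MXNet documentation; adjust them to your machine.

```shell
# safe_jobs AVAILABLE_KB CORES
# Hypothetical helper: prints a parallel job count of one job per 2 GB of
# available memory (an assumed budget, not a measured one), capped at the
# number of cores, with a minimum of 1.
safe_jobs() {
    mem_kb=$1
    cores=$2
    per_job_kb=$((2 * 1024 * 1024))   # assumed 2 GB per compiler process
    n=$((mem_kb / per_job_kb))
    if [ "$n" -gt "$cores" ]; then n=$cores; fi
    if [ "$n" -lt 1 ]; then n=1; fi
    echo "$n"
}

# Example use on Linux (not run here):
#   mem_kb=$(awk '/MemAvailable/ {print $2}' /proc/meminfo)
#   make -j"$(safe_jobs "$mem_kb" "$(nproc)")"
```

Capping by memory as well as by core count is the point of the sketch: a machine with many cores but little RAM is exactly where the compiler tends to get killed.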
If compilation fails with namespace not found with the onnx option set, then set the appropriate environment variable:
    export ONNX_NAMESPACE=onnx
After compilation but while loading the framework, if it fails with File already exists: ... onnx-ml.proto, this is because the protobuf library was built as a shared library; since onnx-ml.proto is also symlinked by other libs, loading it raises an error.
The only solution I found is to remove all previous installs of protobuf and build it again manually. The below works for me:
    # Build protobuf as a static library and install it under /usr/local
    git clone --recursive -b 22.214.171.124 https://github.com/google/protobuf.git && \
    cd protobuf && \
    ./autogen.sh && \
    ./configure --disable-shared CXXFLAGS=-fPIC --prefix=/protobufbuild && \
    make -j4 && \
    make install && \
    ldconfig && \
    cd / && \
    cp -r /protobufbuild/. /usr/local/ && \
    rm -rf protobuf
Note the --disable-shared option, which builds it as a static library.
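To check whether a shared protobuf build is still lying around (before or after the rebuild), a small search helper can be useful. This is only a sketch: the function name `find_shared_protobuf` is hypothetical, and the directories in the usage comment are assumptions about where your distribution installs libraries.

```shell
# find_shared_protobuf DIR...
# Hypothetical helper: lists shared protobuf libraries (libprotobuf*.so*)
# under the given directories. Static archives (*.a) are not matched.
find_shared_protobuf() {
    find "$@" -name 'libprotobuf*.so*' 2>/dev/null
}

# Example use (not run here; the directories are assumptions):
#   find_shared_protobuf /usr/lib /usr/local/lib
```

If the helper prints nothing, no shared protobuf library was found in the directories you searched.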
For running onnx in python, we need to install the onnx pypi package. A compatible version I found to work across all mxnet versions from v1.6.x - v1.8.x is
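MKLDNN is Intel's Math Kernel Library for Deep Neural Networks (since renamed oneDNN), a library of optimized deep learning primitives for running ML operations on Intel-compatible CPUs. It's a reasonable alternative when NVIDIA gpus are not available. More information on Intel MKLDNN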
The errors below only applied when I was trying to compile a python wheel of mxnet. They may not be applicable to your use case.
While compiling the python wheel, I had to enable TVM, and it threw the following error:
    Traceback (most recent call last):
      File "/mxnet/contrib/tvmop/compile.py", line 20, in <module>
        import tvm
      File "/mxnet/3rdparty/tvm/python/tvm/__init__.py", line 36, in <module>
        from . import target
      File "/mxnet/3rdparty/tvm/python/tvm/target.py", line 70, in <module>
        raise err_msg
      File "/mxnet/3rdparty/tvm/python/tvm/target.py", line 66, in <module>
        from decorator import decorate
    ModuleNotFoundError: No module named 'decorator'
Ensure that the decorator==4.4.2 pypi package is present before building the 3rd party plugins.
Another issue I encountered was not being able to locate / load the TVM config file.
After building mxnet, you need to run the following to copy the generated tvmop.conf file from the build folder into the target directory:
    mkdir -p /usr/local/lib/python3.6/lib && \
    cp tvmop.conf /usr/local/lib/python3.6/lib/
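If you build against more than one Python version, the copy step can be wrapped so it fails loudly when the config file was not generated. The following is a sketch only; `install_tvmop_conf` is a hypothetical helper name, and the paths in the usage comment are simply the ones from the command above.

```shell
# install_tvmop_conf SRC_FILE DEST_DIR
# Hypothetical helper: copies the generated config into DEST_DIR, creating
# the directory if needed. Returns 1 with a message if SRC_FILE is missing.
install_tvmop_conf() {
    src=$1
    dest_dir=$2
    if [ ! -f "$src" ]; then
        echo "tvmop config not found: $src" >&2
        return 1
    fi
    mkdir -p "$dest_dir" && cp "$src" "$dest_dir/"
}

# Example use (not run here), with the paths from the command above:
#   install_tvmop_conf tvmop.conf /usr/local/lib/python3.6/lib
```

Checking for the source file first means a failed TVM build shows up at this step rather than later, when mxnet tries to load the config.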
Another point to note is that v1.8.x and above use cuda version 10.2 and above. If you are building multiple versions of MXNet in Docker, your Dockerfile needs to take into account the different cuda versions required, otherwise compilation will not work.
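One way to handle this is to pick the CUDA version in the script that drives the Docker builds. The sketch below is illustrative only: `cuda_for_mxnet`, the `CUDA_VERSION` build argument and the `OLDER_CUDA` variable are hypothetical names of mine, and it treats 10.2 as the minimum for v1.8.x and later. The CUDA version for older releases is not stated here, so the caller has to supply it.

```shell
# cuda_for_mxnet MXNET_VERSION OLDER_CUDA
# Hypothetical helper: prints the CUDA version to build against.
# 1.8.x and later map to 10.2 (treated as the minimum, per the note above);
# for anything older, the caller-supplied OLDER_CUDA is echoed back unchanged.
cuda_for_mxnet() {
    case "$1" in
        1.8.*|1.9.*|2.*) echo "10.2" ;;
        *)               echo "$2" ;;
    esac
}

# Example use (not run here; CUDA_VERSION is an assumed ARG in your Dockerfile):
#   docker build --build-arg CUDA_VERSION="$(cuda_for_mxnet 1.8.0 "$OLDER_CUDA")" .
```

Keeping the mapping in one function means a new MXNet release only needs one extra pattern rather than another copy of the Dockerfile.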
In conclusion, it takes some effort to get MXNet to compile from source but you will learn a lot about the framework just by doing it - I certainly did.