Building tensorflow-gpu from source

There are two ways to install TensorFlow. The first is to install it directly with pip:

pip3 install tensorflow-gpu

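However, the prebuilt TensorFlow package installed this way is missing quite a few features, such as Kafka and Hadoop support. The second way, building from the source code on GitHub, is therefore the better choice.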

Installing Bazel

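Many online tutorials install Bazel by adding an apt source, but in practice that did not work for me: the Bazel apt repository is hosted by Google, so apt could not download from it. Instead, go to Bazel's GitHub page and download the installer script named bazel-<version>-installer-linux-x86_64.sh. Before running it, install Bazel's dependencies with apt: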

sudo apt-get install pkg-config zip g++ zlib1g-dev unzip python
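
As a rough sketch of the download step described above, the installer name and its URL can be derived from the version number. The version 0.11.0 is the one ./configure reports later in this post. The URL layout under github.com/bazelbuild/bazel/releases and the helper names bazel_installer_name and bazel_installer_url are assumptions for illustration, so check the releases page before relying on them.

```shell
# Bazel version to fetch; 0.11.0 is what ./configure reports in this post.
BAZEL_VERSION="0.11.0"

# Hypothetical helper: build the installer file name for a given version.
bazel_installer_name() {
    echo "bazel-$1-installer-linux-x86_64.sh"
}

# Hypothetical helper: build the download URL (assumed GitHub release layout).
bazel_installer_url() {
    echo "https://github.com/bazelbuild/bazel/releases/download/$1/$(bazel_installer_name "$1")"
}

# Only print the command so it can be reviewed before anything is downloaded.
echo "Download with: wget $(bazel_installer_url "$BAZEL_VERSION")"
```

Printing the command instead of running it keeps the sketch safe on a machine without network access.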

Then run the installer:

chmod +x bazel-<version>-installer-linux-x86_64.sh
./bazel-<version>-installer-linux-x86_64.sh --user

The --user flag installs Bazel into $HOME/bin. Once the installation finishes, set the environment variable:

echo 'export PATH="$HOME/bin:$PATH"' >> ~/.bashrc
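
If the PATH step gets run more than once, the same line piles up in ~/.bashrc. A slightly more defensive variant, as a minimal sketch, appends the line only when it is not already there. The function name add_home_bin_to_path is made up for this example.

```shell
# Hypothetical helper: append the PATH line to an rc file only once.
add_home_bin_to_path() {
    rc_file="${1:-$HOME/.bashrc}"
    # Single quotes keep $HOME and $PATH unexpanded until the rc file is sourced.
    path_line='export PATH="$HOME/bin:$PATH"'
    # -F: literal match, -x: whole line must match, -q: no output.
    if ! grep -qxF "$path_line" "$rc_file" 2>/dev/null; then
        printf '%s\n' "$path_line" >> "$rc_file"
    fi
}

# Usage (not run here, because it edits your real rc file):
#   add_home_bin_to_path
#   . ~/.bashrc
```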

Compiling

Downloading TensorFlow

Clone the TensorFlow repository:

git clone git@github.com:tensorflow/tensorflow.git

Switch to version 1.6, the latest release at the time of writing:

git checkout v1.6.0
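
The clone and checkout can also be combined into one step. The sketch below is an alternative to the commands above, not what this post actually ran: it uses the HTTPS URL so no SSH key is needed, and --depth 1 skips the full history. The helper name tf_clone_cmd is hypothetical, and it only prints the command.

```shell
# Assumption: HTTPS URL of the same repository, used instead of the SSH URL.
TF_REPO_URL="https://github.com/tensorflow/tensorflow.git"
# Tag used in this post.
TF_TAG="v1.6.0"

# Hypothetical helper: print a one-step shallow clone of the tagged release.
# --branch accepts a tag name; --depth 1 fetches only that single commit.
tf_clone_cmd() {
    echo "git clone --branch $TF_TAG --depth 1 $TF_REPO_URL"
}

tf_clone_cmd
```

A shallow clone is enough for a one-off build; use a full clone if you plan to switch between versions later.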

Configuring the build

Run configure to set the build options:

./configure

You have bazel 0.11.0 installed.
Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python3


Found possible Python library paths:
/usr/local/lib/python3.5/dist-packages
/usr/lib/python3/dist-packages
Please input the desired Python library path to use. Default is [/usr/local/lib/python3.5/dist-packages]

Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]: y
jemalloc as malloc support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: n
No Google Cloud Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: n
No Hadoop File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: n
No Amazon S3 File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Apache Kafka Platform support? [y/N]: n
No Apache Kafka Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with XLA JIT support? [y/N]: n
No XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with GDR support? [y/N]: n
No GDR support will be enabled for TensorFlow.

Do you wish to build TensorFlow with VERBS support? [y/N]: n
No VERBS support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]:


Please specify the location where CUDA 9.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:


Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 7.1


Please specify the location where cuDNN 7.1 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:


Invalid path to cuDNN 7.1 toolkit. None of the following files can be found:
/usr/local/cuda-9.0/lib64/libcudnn.so.7.1
/usr/local/cuda-9.0/libcudnn.so.7.1
None.7.1
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]:


Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:


Do you wish to build TensorFlow with TensorRT support? [y/N]:
No TensorRT support will be enabled for TensorFlow.

Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 6.1]


Do you want to use clang as CUDA compiler? [y/N]: n
nvcc will be used as CUDA compiler.

Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:


Do you wish to build TensorFlow with MPI support? [y/N]: n
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:


Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See tools/bazel.rc for more details.
--config=mkl # Build with MKL support.
--config=monolithic # Config for mostly static monolithic build.
--config=tensorrt # Build with TensorRT support.
Configuration finished

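A few things need attention here. For example, I wanted a Python 3 build of TF, so the default value had to be changed: the default is /usr/bin/python, and when configure prompts for it you simply enter /usr/bin/python3. Another place where I got tripped up was this prompt: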

Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 7.1

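The first time around I did not notice this prompt and did not enter 7.1.1, so configure assumed my cuDNN was version 7.0, which caused a whole pile of problems later in the build.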
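
To avoid guessing at this prompt, the installed cuDNN version can be read from the cuDNN header before running configure. The sketch below assumes the header is at /usr/local/cuda/include/cudnn.h, matching the default CUDA location shown in the configure output, and that it defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL as cuDNN 7 does. Newer cuDNN releases keep these defines in cudnn_version.h instead. The function name cudnn_version is made up for this example.

```shell
# Hypothetical helper: print the cuDNN version as MAJOR.MINOR.PATCH
# by reading the #define lines in the cuDNN header.
cudnn_version() {
    header="${1:-/usr/local/cuda/include/cudnn.h}"   # assumed default location
    if [ ! -r "$header" ]; then
        echo "cannot read $header" >&2
        return 1
    fi
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END {
            if (major == "") exit 1      # no version defines found
            print major "." minor "." patch
        }
    ' "$header"
}

# Print the version only if the header exists at the assumed location.
if [ -r /usr/local/cuda/include/cudnn.h ]; then
    cudnn_version
fi
```

Comparing this output with the libcudnn.so.* files in the CUDA lib64 directory shows which version string configure will be able to find.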

Building and installing

Build the build_pip_package target:

bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

Package the result into a Python wheel:

bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

While packaging, you may run into file-not-found errors like these:

cp: cannot stat ‘bazel-bin/tensorflow/tools/pip_package/build_pip_package.runfiles/tensorflow’: No such file or directory
cp: cannot stat ‘bazel-bin/tensorflow/tools/pip_package/build_pip_package.runfiles/external’: No such file or directory

Running the following command fixes it (the missing files are actually in the build_pip_package.runfiles/__main__/ directory):

cp -r bazel-bin/tensorflow/tools/pip_package/build_pip_package.runfiles/__main__/* bazel-bin/tensorflow/tools/pip_package/build_pip_package.runfiles/
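
The same workaround can be wrapped in a guard so it only copies when it is needed. This is a minimal sketch; the function name fix_runfiles is hypothetical, and the only layout it assumes is the one in the error messages above.

```shell
# Hypothetical helper: copy the contents of __main__ up one level,
# but only if __main__ exists and the tensorflow entry is missing.
fix_runfiles() {
    runfiles="$1"
    if [ -d "$runfiles/__main__" ] && [ ! -e "$runfiles/tensorflow" ]; then
        cp -r "$runfiles"/__main__/* "$runfiles"/
    fi
}

# Usage, from the TensorFlow source root (not run here):
#   fix_runfiles bazel-bin/tensorflow/tools/pip_package/build_pip_package.runfiles
```

After this, run the build_pip_package command again.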

Install the wheel:

sudo pip3 install /tmp/tensorflow_pkg/tensorflow-1.6.0-py3-none-any.whl
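
The wheel file name encodes the Python version and platform, so the name produced on your machine may differ from the one above. As a sketch, the wheel can be looked up in the output directory instead of typing its name. The helper name find_tf_wheel is hypothetical, and /tmp/tensorflow_pkg is the output directory used earlier in this post.

```shell
# Hypothetical helper: print the path of the first tensorflow wheel found
# in the given directory (default: the output directory used in this post).
find_tf_wheel() {
    pkg_dir="${1:-/tmp/tensorflow_pkg}"
    for whl in "$pkg_dir"/tensorflow-*.whl; do
        # If nothing matches, the pattern stays literal and -e fails.
        if [ -e "$whl" ]; then
            echo "$whl"
            return 0
        fi
    done
    echo "no tensorflow wheel found in $pkg_dir" >&2
    return 1
}

# Usage (not run here):
#   sudo pip3 install "$(find_tf_wheel)"
```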