The biggest question is: how does this perform? Yes, the stack is open source, it uses a driver already integrated into the kernel, and it can run TensorFlow, PyTorch, and Caffe, but how well does it do that?
The 1080 Ti numbers below come from Lambda Labs, for comparison with my Vega 56 results (throughput in images/sec; higher is better).
| Model | Vega 56 | 1080 Ti |
|---|---|---|
| ResNet-50 | 145.19 | 203.99 |
| Inception v3 | 67.08 | 130.2 |
| VGG16 | 80.57 | 133.16 |
The Vega GPU is about half as fast as the 1080 Ti on the worst-performing model (Inception v3). ResNet-50 is where the gap is closest, and that result was actually achieved by turning on ROCm Fusion. The fusion pass appears to rewrite the computation graph to combine multiple operations into a single convolution where possible.
To enable this, run `export TF_ROCM_FUSION_ENABLE=1` inside the Docker container before starting a TensorFlow workload. Perhaps the other models would have been closer in performance to the 1080 Ti with this setting. Unfortunately, I was not able to perform very rigorous testing, as I was building this machine for someone else. I would like to try out ROCm Fusion as well as [undervolting and overclocking the card](https://github.com/RadeonOpenCompute/ROCm/issues/463). Undervolting should reduce heat and fan noise, allowing the card to maintain higher boost frequencies.
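For reference, here is roughly how that looks end to end. This is a minimal sketch: the `rocm/tensorflow` image and the `tf_cnn_benchmarks` script are the usual ROCm TensorFlow benchmarking route, but your image tag and paths may differ.

```bash
# Launch the ROCm TensorFlow container (standard ROCm device flags;
# the image tag is an example, use whichever rocm/tensorflow tag you have).
docker run -it --device=/dev/kfd --device=/dev/dri --group-add video \
    rocm/tensorflow:latest

# Inside the container: enable fusion, then start the workload.
# tf_cnn_benchmarks.py comes from the tensorflow/benchmarks repo;
# the path depends on where you cloned it.
export TF_ROCM_FUSION_ENABLE=1
python3 benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py \
    --model=resnet50 --batch_size=64
```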
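As for undervolting, the approach discussed in that issue goes through the amdgpu sysfs interface. Below is a rough sketch of what it might look like on a Vega 56; the clock and voltage numbers are placeholders, every card is different, and getting them wrong can hang the GPU, so treat this as a starting point rather than a recipe.

```bash
# OverDrive must be enabled first, e.g. via the amdgpu.ppfeaturemask
# kernel parameter. card0 is an assumption; check which
# /sys/class/drm/cardN is the Vega.
echo manual | sudo tee /sys/class/drm/card0/device/power_dpm_force_performance_level

# On Vega 10, pp_od_clk_voltage takes "s <state> <MHz> <mV>" to adjust
# an sclk state. Lower the voltage on the top state (placeholder values!).
echo "s 7 1590 1000" | sudo tee /sys/class/drm/card0/device/pp_od_clk_voltage

# Commit the new table.
echo "c" | sudo tee /sys/class/drm/card0/device/pp_od_clk_voltage
```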
Conclusion
While this build wasn't for me, I would certainly put one together myself if I had an extra thousand dollars to spend; after tax and everything, the entire build was $996.17. The value for performance with the Vega GPU is actually pretty decent. I got the Vega 56 for $320 after tax, and the cursory benchmarks above showed the Vega getting anywhere from 50-75% of the performance of a 1080 Ti at well under half the price (most new 1080 Tis I see are around $850 at the moment). On the ResNet-50 numbers, that works out to nearly twice the throughput per dollar.
In the future, it would be better to compare the cost/performance of the Vega to a lower-tier Nvidia GPU like the 1070.
Comments
April 28, 2019 06:46
Very interesting, but wait... I missed the most important part for me: are both TensorFlow and PyTorch compatible with OpenCL? Really? I mean, when you don't want to spend hours configuring TensorFlow and PyTorch, but you just want to use them to run your deep learning code, is it transparent to the user whether the backend relies on CUDA or OpenCL?
For instance, when looking at the PyTorch install page, there's no option for OpenCL. So... which version of PyTorch do I have to install?
Thanks!