IBM has unveiled new software intended to sharply reduce the time needed to train distributed deep learning (DDL) systems by putting a large amount of high-end hardware to work on the task.
Specifically, the new technique tackles a major challenge in deploying deep learning: large neural networks and large datasets help deep learning thrive, but they also lead to longer training times. Training a large-scale, deep learning-based AI model can take days or even weeks.
As the number of GPUs is scaled up, they have to communicate with one another, and that communication takes a long time. In fact, the problem can get worse as GPUs get faster: faster GPUs do learn faster, but with conventional software the communication between them cannot keep up.
According to IBM’s Hillery Hunter, smarter and faster GPUs need a better means of communication; otherwise the learners (the GPUs) fall out of sync and spend most of their time waiting for each other’s results. As a result, users who add more, faster learners may not get any speedup, and may even see performance degrade.
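The effect Hunter describes can be illustrated with a toy model. This is only a sketch under simple assumptions: each synchronous training step is treated as compute time plus a fixed communication time, and every number below is hypothetical rather than a measurement from IBM.

```python
def waiting_fraction(compute_s: float, comm_s: float) -> float:
    """Fraction of one synchronous training step spent on communication.

    Toy model: step time = compute time + communication time, with no overlap.
    """
    return comm_s / (compute_s + comm_s)


# Hypothetical numbers: communication cost stays fixed while GPUs get faster.
BASE_COMPUTE_S = 0.20  # assumed compute time per step on the slowest GPU
COMM_S = 0.05          # assumed communication time per step

for gpu_speedup in (1, 2, 4):
    compute_s = BASE_COMPUTE_S / gpu_speedup
    share = waiting_fraction(compute_s, COMM_S)
    print(f"GPU {gpu_speedup}x faster: {share:.0%} of each step is communication")
```

In this model the communication share of each step grows as the GPUs speed up, which is why faster learners alone do not guarantee faster training.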
The technology IBM disclosed today is meant to address that problem. Better still, the new DDL software can run popular open-source frameworks such as Caffe, TensorFlow, Chainer, and Torch across large neural networks and massive datasets with high accuracy and performance.
IBM Research demonstrated record-low communication overhead and 95% scaling efficiency with the Caffe deep learning framework across 256 GPUs in 64 IBM Power systems. The previous scaling record, set by Facebook AI Research, was close to 90% efficiency for a Caffe2-based training run, with higher communication overhead.
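To put those percentages in perspective, scaling efficiency is commonly defined as the measured speedup divided by the number of GPUs. The sketch below uses that common definition, which is an assumption here since the article does not spell out IBM's exact formula; the function names are made up for illustration, and 90% stands in for the "close to 90%" figure.

```python
def scaling_efficiency(t_single_gpu: float, t_n_gpus: float, n_gpus: int) -> float:
    """Measured speedup over one GPU, divided by the ideal (linear) speedup."""
    speedup = t_single_gpu / t_n_gpus
    return speedup / n_gpus


def effective_speedup(n_gpus: int, efficiency: float) -> float:
    """Speedup over a single GPU implied by a given scaling efficiency."""
    return n_gpus * efficiency


N_GPUS = 256  # GPU count reported in the article

for label, eff in (("95% efficiency", 0.95), ("90% efficiency", 0.90)):
    print(f"{label}: about {effective_speedup(N_GPUS, eff):.1f}x one GPU")
```

Under this definition, 95% efficiency on 256 GPUs corresponds to roughly a 243x speedup over a single GPU, compared with roughly 230x at 90%.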
In addition, IBM Research used the new DDL software to reach a new image recognition accuracy of 33.8% for a neural network trained on a large-scale dataset of 7.5 million images, and it did so in just 7 hours.
How was that achieved? Hunter explained that most deep learning frameworks scale to multiple GPUs within a single server, but not to multiple servers that each have GPUs. The IBM Research team developed software and algorithms that automate and optimize this large, complex computing task across large numbers of GPU accelerators attached to many servers. Harnessing the combined power of those GPU-equipped servers is what allowed IBM Research to reach the milestone.
The new DDL software promises a substantial improvement in deep learning performance. IBM hopes users will see the benefits across a range of AI applications and is releasing DDL as a technical preview, available now in PowerAI 4, its deep learning software distribution package.