Why did NVIDIA build the DGX-1 supercomputer?

The author:(作者)左
published in(发表于) 2016/11/3 8:22:55

Why did NVIDIA build the DGX-1 supercomputer? - IT News

At this year's GTC in Silicon Valley, NVIDIA released the DGX-1 deep learning supercomputer. Jen-Hsun Huang called it "a data center packed into a chassis."

The DGX-1 contains 8 Tesla P100 accelerators based on the Pascal architecture and 4 1.92TB solid-state drives. It uses NVLink, which is 5 to 12 times faster than conventional PCIe, for data transfer between the CPU and the GPUs and among the GPUs themselves. In deep learning training it is 75 times faster than an ordinary dual-CPU Xeon E5 2697 v3 server, and its overall performance is equivalent to 250 ordinary x86 servers. A single DGX-1 sells for $129,000.
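To put the quoted specifications in perspective, here is a minimal Python sketch that derives two figures from them: the total SSD capacity and the list price divided by the number of x86 servers the system is said to replace. The variable names are illustrative, and the "price per equivalent server" is simple arithmetic on the article's numbers, not a figure published by NVIDIA.

```python
# Figures quoted in the article for a single DGX-1.
SSD_COUNT = 4
SSD_CAPACITY_TB = 1.92
PRICE_USD = 129_000
EQUIVALENT_X86_SERVERS = 250  # NVIDIA's stated overall-performance equivalence

# Total solid-state storage in the chassis.
total_ssd_tb = SSD_COUNT * SSD_CAPACITY_TB

# List price spread over the number of x86 servers it is said to replace.
# This ignores power, space, and networking costs (an assumption of this sketch).
price_per_equivalent_server = PRICE_USD / EQUIVALENT_X86_SERVERS

print(f"Total SSD capacity: {total_ssd_tb:.2f} TB")
print(f"Price per equivalent x86 server: ${price_per_equivalent_server:,.0f}")
```

By this rough measure, the DGX-1 costs about $516 for each x86 server of claimed equivalent performance.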

After GTC, Jen-Hsun Huang personally delivered the first DGX-1 to OpenAI, Elon Musk's artificial intelligence project. NVIDIA will also give priority for the first batch of DGX-1 units to research institutions that have made outstanding contributions to artificial intelligence in recent years; the list includes Stanford, UC Berkeley, CMU, MIT, the Chinese University of Hong Kong, and others. In mainland China, publicly available information shows that in July Hikvision signed the country's first DGX-1 order through Sugon (曙光). NVIDIA told us that the DGX-1 now has about ten customers in China.

In October, at the just-concluded HPC China 2016, we talked with NVIDIA executives about their views on high-performance computing and why they built this supercomputer.

Most next-generation programs will be written by machines

Marc Hamilton, NVIDIA's Vice President of Solutions Architecture and Engineering, put forward this view at HPC China 2016: AI will give rise to a new computing model in which most programs are written not by people but by deep learning networks.

He gave an example. Programs written in the past, such as address books or payroll, dealt with very regular numerical data. Today there is a large amount of more complex data, such as images, sound, and video. Even if all 1.3 billion people in China became programmers, they could not write enough software to handle the volume of data generated in a single day. So most programs will be written by deep neural networks, and NVIDIA believes most deep neural networks will run on GPUs.

NVIDIA described 2 cases. In Shanghai, it has a partner in the biomedical industry that uses deep learning to analyze MRI and CT images for cancer screening and review. Another field moving quickly in China is security, for example matching suspects' photos against video or searching for specific objects. A typical partner here is Hikvision, whose DGX-1 purchase is also used for deep learning research in video surveillance.

The DGX-1 is a foolproof design

The design of the DGX-1 can be traced back to GTC 2015, when NVIDIA announced its latest-generation Pascal architecture, which would speed up some key deep learning applications by more than 10 times. But the new architecture also brought a new problem: developers and researchers might spend weeks or even months configuring these GPUs. So a few months later, Jen-Hsun Huang made an internal request: before the following year's GTC, NVIDIA's engineering department should build a server based on the Pascal architecture, so that research institutions and companies could put 8 GPUs to work on deep learning just by pressing the button on the chassis.

The DGX-1 we see today is not as simple as 8 GPUs squeezed together.

Marc Hamilton told us that the DGX-1 also integrates 3 categories of software and services.

The first is support for all deep learning frameworks, such as Caffe, TensorFlow, and CNTK. The DGX-1 is optimized for the deep learning frameworks that are popular today.

The second is the underlying library, called cuDNN, which can be understood as CUDA combined with Deep Neural Network.

The third is the DGX cloud service, which amounts to creating a mirror image of the DGX server in the cloud. A company may not know how to manage deep learning system software, but it does know how to manage a DGX-1 server in the cloud.

Right now, NVIDIA's biggest challenge is how to spread deep learning quickly. Shen Wei (沈威), general manager of its enterprise business unit in China, said that deep learning is a unique market and that NVIDIA building the DGX-1 itself is a new attempt against that background. Marc Hamilton told us that to reach 150 petaflops of floating-point performance, a multi-GPU approach requires 3,400 servers, while a traditional x86 solution requires 100,000 servers. For programmers who would have to maintain server fleets at these two scales, the choice is obvious.
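The comparison quoted from Marc Hamilton can be checked with a little arithmetic. The Python sketch below takes the three figures from the article (150 petaflops, 3,400 GPU servers, 100,000 x86 servers) and derives the average throughput each server would have to deliver, along with the ratio between the two fleet sizes. The function and variable names are illustrative, and the per-server numbers are averages implied by the quoted totals, not measured benchmarks.

```python
# Figures quoted by Marc Hamilton in the article.
TARGET_PFLOPS = 150
GPU_SERVERS = 3_400
X86_SERVERS = 100_000


def tflops_per_server(total_pflops: float, servers: int) -> float:
    """Average TFLOPS each server must deliver to reach the total."""
    return total_pflops * 1_000 / servers  # 1 petaflop = 1,000 teraflops


gpu_tflops = tflops_per_server(TARGET_PFLOPS, GPU_SERVERS)
x86_tflops = tflops_per_server(TARGET_PFLOPS, X86_SERVERS)

# How many x86 servers stand in for one GPU server under these figures.
server_ratio = X86_SERVERS / GPU_SERVERS

print(f"GPU server: {gpu_tflops:.1f} TFLOPS each")
print(f"x86 server: {x86_tflops:.1f} TFLOPS each")
print(f"x86 servers per GPU server: {server_ratio:.1f}")
```

Under these figures each GPU server averages roughly 44 TFLOPS against 1.5 TFLOPS for an x86 server, so the x86 fleet is about 29 times larger.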

英伟达为什么要造超级计算机DGX-1？ - IT资讯

在今年硅谷的GTC上，NVIDIA发布了深度学习超级计算机DGX-1。黄仁勋称它是“装进机箱里的数据中心”。

DGX-1内置了8块基于Pascal架构的Tesla P100加速器和4块1.92TB的固态硬盘，使用比传统PCIe快5 -12倍的NVLink技术连接CPU与GPU以及GPU之间的数据传输。在深度学习训练上，它比一台普通的Xeon E5 2697 v3的双CPU服务器快75倍，整体性能相当于250台普通x86服务器。单台DGX-1的售价是12.9万美金。

在GTC之后，黄仁勋亲自将第一台DGX-1送给了Elon Musk的人工智能项目OpenAI。NVIDIA还会将首批DGX-1优先发给近年对人工智能有突出贡献的研究机构，这个名单里包含了Stanford、UC Berkeley、CMU、MIT、香港中文大学等等。而在中国大陆，已经公开的信息是7月份海康威视通过曙光签下了国内第一单DGX-1；NVIDIA方面则告诉我们目前DGX-1在国内已有十来家客户。

在10月份刚刚结束的HPC China 2016上，我们和NVIDIA的高层聊了聊他们在高性能计算上的看法以及他们为什么要造这台超级计算机。

下一代程序大部分会由机器编写

NVIDIA负责解决方案与工程架构的副总裁Marc Hamilton在HPC China 2016上表达了这么一个观点，AI会催生一种新的计算模型，未来大部分程序不会是由人来编写，而是通过深度学习网络来编写。

他举了一个例子，过去编写的程序比如通讯录或者工资的发放，它们是非常规整的数字。而今天有大量更复杂的数据，比如图像、声音、视频。哪怕把13亿中国人都变成码农，也不可能编出足够多的软件来处理一天所产生的大量数据。所以大部分程序会由深度神经网络来编写，而NVIDIA相信大部分深度神经网络会运行在GPU上。

NVIDIA方面讲了2个案例：在上海，他们有一家生物医疗行业的合作伙伴，在通过深度学习对核磁共振、CT影像做分析进行癌症的审查和复核。而另一个在国内走得比较快的领域是安防，比如在视频中去比对疑犯照片或者是寻找特定的物体。这方面典型的合作伙伴如海康威视，后者所采购的DGX-1也是用于视频监控方面的深度学习研究。

DGX-1是一种傻瓜式的设计

DGX-1的设计可以回溯到2015年的GTC，当时NVIDIA公布了最新一代的Pascal架构，这一新架构会把一些关键的深度学习应用提升10倍以上的速率。但这一新架构也带来了新的问题：开发/研究人员可能要花数周甚至数月的时间配置这些GPU。所以在几个月后，黄仁勋在内部提出了一个要求：希望在第二年的GTC之前，由NVIDIA的工程部门打造一台基于Pascal架构的服务器，这样研究机构和公司们只要按下机箱按钮就能把8块GPU用在深度学习上。

今天我们看到的DGX-1并不是8块GPU捏在一起那么简单。

Marc Hamilton告诉我们，DGX-1还囊括了3类软件和服务的整合。

第一是对所有深度学习框架的支持。比如Caffe、TensorFlow、CNTK...DGX-1对现在流行的深度学习框架都进行了优化。

第二类是底层的库，称为cuDNN，可以理解成是CUDA融合了Deep Neural Network。

第三类是DGX的云服务，等于从云上给DGX服务器做一个镜像。任何一家公司，他们未必知道如何去管理深度学习的系统软件，但知道怎样在云端管理一台DGX-1服务器。

当下，对NVIDIA来说，最大的挑战是如何快速普及深度学习，其中国区企业事业部总经理沈威说，深度学习是一个独特的市场，NVIDIA自己造DGX-1则是这个背景下的新尝试。Marc Hamilton告诉我们，要实现150个petaflop浮点计算的性能，如果基于多个GPU的话，需要3400个服务器，而如果使用传统x86的解决方案，则需要10万个服务器。对于程序员来说，维护这两个数量级的服务器，其中的选择是显而易见的。