NVIDIA CEO Jensen Huang delivered the keynote at the recent NVIDIA GTC21 conference, held November 8-11, highlighting the importance of accelerated computing and much more. The announcements were numerous, and not all are pertinent to the AEC industry, so I will share those of most interest to our audience. One of the most profound came at the end of the talk, when Huang announced that NVIDIA is building a digital twin of the Earth.
“Accelerated computing starts with NVIDIA CUDA general-purpose programmable GPUs,” Huang said. “The magic of accelerated computing comes from the combination of CUDA, the acceleration libraries of algorithms that speed up applications, and the distributed computing systems and software that scale processing across an entire data center. We have been advancing CUDA and the ecosystem for 15 years and counting.”
In other words, NVIDIA optimizes across the full stack, continuously iterating across GPUs, acceleration libraries, systems, and applications, while expanding the reach of its platform by adding new application domains to accelerate.
“With our approach, end users experience speed-ups through the life of the product. It is not unusual for us to increase application performance by many x-factors on the same chip over several years. As we accelerate more applications, our network of partners sees growing demand for NVIDIA platforms. Starting from computer graphics, our architecture has reached deep into the world’s largest industries. We start with amazing chips, but for each field of science, industry, and application, we create a full stack. We have over 150 SDKs that serve industries, from gaming and design, to life and earth sciences, quantum computing, AI, cybersecurity, 5G and robotics. We introduced 65 new and updated SDKs at GTC this year.”
The keynote had a breathless feel, moving swiftly from one topic to the next and accelerating beyond what had gone before, with Omniverse and space-movie themes as a backdrop that conveyed speed and capability. One of the major industries now accelerating with NVIDIA is design automation: Ansys, Synopsys, Cadence, and Dassault accelerate thermal, mechanical, and 3D electromagnetic simulation for RFI and signal-integrity analysis. NVIDIA is also partnering with Ansys to accelerate Ansys Fluent, said to be the world’s leading industrial fluid-simulation package.
Early results with the Ansys multi-GPU solver show one DGX will replace 30 high-end dual-CPU servers, leading to big savings in system cost and power.
With the same total budget, customers can scale to much larger simulations.
The number of developers using NVIDIA platforms has grown sixfold over the past five years, to nearly 3 million.
CUDA has been downloaded 30 million times over the past 15 years and 7 million last year alone.
“Our expertise in full-stack acceleration and data-center-scale architectures lets us help researchers and developers solve problems at the largest scales,” said Huang. “Our approach to computing is highly energy-efficient. The versatility of our architecture lets us contribute to fields ranging from AI, to quantum physics, to digital biology, to climate science.”
New acceleration libraries available today include:
ReOpt – an accelerated solver for operations research optimization problems such as delivery-vehicle routing and warehouse picking and packing. There are 87 billion ways to deliver 14 pizzas, which is why it is not easy for Domino’s to deliver pizza in under 30 minutes.
ReOpt is over 100x faster, scales to thousands of items, and provides world-class solutions. It can do in minutes what takes overnight today, and it can re-optimize in real time as conditions change: new orders, traffic conditions, broken-down vans, sick drivers, and all the dynamics of the real world.
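Assuming the “87 billion” figure counts the possible orderings of 14 delivery stops, it is 14! = 87,178,291,200. The short Python sketch below checks that arithmetic and shows why exhaustive search does not scale: a brute-force search over six hypothetical stops tries only 720 orderings, but the count grows factorially with every stop added. The coordinates and the helper names `route_length` and `brute_force_route` are invented for illustration; this is not how ReOpt works internally, and practical routing solvers avoid full enumeration.

```python
import itertools
import math

def route_length(depot, stops, order):
    """Total distance from the depot through the stops in `order` and back."""
    path = [depot] + [stops[i] for i in order] + [depot]
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))

def brute_force_route(depot, stops):
    """Try every ordering of the stops; only feasible for very small inputs."""
    best = min(itertools.permutations(range(len(stops))),
               key=lambda order: route_length(depot, stops, order))
    return best, route_length(depot, stops, best)

# Orderings of 14 stops: the likely source of the "87 billion ways" figure.
print(f"{math.factorial(14):,}")  # 87,178,291,200

# Hypothetical depot and 6 delivery stops on a grid (6! = 720 orderings).
depot = (0.0, 0.0)
stops = [(2, 1), (5, 3), (1, 4), (6, 0), (3, 6), (0, 2)]
order, length = brute_force_route(depot, stops)
print(order, round(length, 2))
```

Adding an eighth stop multiplies the work by eight, a ninth by nine, and so on, which is why real-time re-optimization needs a different approach than enumeration.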
Quantum computing, which relies on the quantum phenomena of superposition and entanglement, has the potential to solve problems whose complexity grows combinatorially.
Nearly 100 teams in universities, science labs, enterprises, and startups around the world are researching quantum processors, systems, simulators, and algorithms. A useful quantum computer may be built in another decade or two. In the meantime, the industry needs a very fast quantum simulator to validate its research.
NVIDIA created the cuQuantum DGX appliance, with an acceleration library for quantum computing workflows that speeds up quantum circuit simulations using state-vector and tensor-network methods.
The first quantum simulator to be accelerated will be Google’s Cirq.
The speed-up is terrific: Huang showed results for the quantum Fourier transform, Shor’s algorithm (used to break public-key cryptography), and Google’s Sycamore circuit. A simulation that takes months can now be done in days. NVIDIA Research also achieved a major milestone in quantum algorithm simulation, using 1,688 qubits to solve a MaxCut problem with 3,375 vertices. This is the largest-ever exact quantum circuit simulation, with 8 times more qubits than ever simulated before. With cuQuantum on DGX, quantum-computer and algorithm researchers can invent the computer of tomorrow with the fastest computer of today. The cuQuantum DGX appliance will be available in Q1.
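To make the “state-vector method” concrete: a state-vector simulator stores all 2^n complex amplitudes of an n-qubit register and applies each gate as a small matrix acting on those amplitudes. The NumPy sketch below is a generic textbook illustration, not the cuQuantum API; `apply_gate` is a hypothetical helper. It prepares a two-qubit Bell state with a Hadamard and a CNOT, then prints how memory grows with qubit count, assuming 8-byte complex64 amplitudes. That exponential growth is why a circuit at the 1,688-qubit scale cannot be held as a full state vector and presumably relied on the tensor-network method instead.

```python
import numpy as np

def apply_gate(state, gate, targets, n):
    """Apply a k-qubit gate (2^k x 2^k matrix) to the target qubits of an n-qubit state.

    Qubit 0 is the most significant index of the flattened state vector.
    """
    k = len(targets)
    front = list(range(k))
    psi = state.reshape([2] * n)
    # Move the target axes to the front, flatten them, apply the gate, restore.
    psi = np.moveaxis(psi, targets, front)
    shape = psi.shape
    psi = (gate @ psi.reshape(2 ** k, -1)).reshape(shape)
    psi = np.moveaxis(psi, front, targets)
    return psi.reshape(-1)

H = np.array([[1, 1], [1, -1]], dtype=np.complex64) / np.sqrt(2)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=np.complex64)

n = 2
state = np.zeros(2 ** n, dtype=np.complex64)
state[0] = 1.0                               # start in |00>
state = apply_gate(state, H, [0], n)         # (|00> + |10>) / sqrt(2)
state = apply_gate(state, CNOT, [0, 1], n)   # (|00> + |11>) / sqrt(2)
print(np.round(np.abs(state) ** 2, 3))       # probability 0.5 on |00> and |11>

# Memory for a full state vector: 30 qubits -> 8 GiB, and each extra qubit doubles it.
for qubits in (30, 40, 50):
    gib = (2 ** qubits) * 8 / 2 ** 30
    print(f"{qubits} qubits: {gib:,.0f} GiB")
```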
Python is the programming language of scientists and ML and AI researchers. Python has a rich ecosystem of libraries: Pandas for data analytics on data frames; NumPy for analytics of n-dimensional arrays and matrices; Scikit-learn for machine learning; SciPy for scientific computing; PyTorch for deep learning; and NetworkX for studying graphs and networks. There are nearly 20 million users of Python.
At the conference, NVIDIA announced cuNumeric, a drop-in accelerator for NumPy that the company says requires zero code changes.
cuNumeric accelerates NumPy, scaling from one GPU to multiple GPUs, multi-node clusters, and the largest supercomputers in the world. Parallelism is handled implicitly and automatically. NVIDIA offers similar counterparts for other popular libraries: cuDF is Pandas-like, cuML is Scikit-learn-like, and cuGraph is NetworkX-like.
These three are part of NVIDIA’s RAPIDS open-source Python data science suite.
RAPIDS has been downloaded half a million times this year – over 4 times more than last year.
cuNumeric is built on Legion, which schedules tasks across the CPUs, GPUs, and DPUs of a data center, much as a modern CPU schedules instructions across its ALUs and load/store units.
Like modern out-of-order CPUs that automatically extract instruction-level parallelism and dynamically reorder execution, Legion extracts task-level parallelism and dynamically reorders and dispatches these tasks, often out of order, across the entire data center.
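The idea of task-level out-of-order execution can be illustrated with a toy dependence analysis: each task declares which data it reads and writes, a task must wait only for earlier tasks it conflicts with, and everything else may run ahead. The Python sketch below is a hypothetical illustration of that principle; `Task`, `conflicts`, and `schedule_waves` are invented names, not Legion’s API or algorithm, and Legion’s real runtime performs this analysis on its own data model at far larger scale.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)

def conflicts(earlier: Task, later: Task) -> bool:
    """True if `later` must wait for `earlier` (RAW, WAR or WAW hazard)."""
    return bool(
        earlier.writes & later.reads      # read-after-write
        or earlier.reads & later.writes   # write-after-read
        or earlier.writes & later.writes  # write-after-write
    )

def schedule_waves(tasks):
    """Group tasks, given in program order, into waves that may run in parallel.

    Each task lands one wave after the latest earlier task it conflicts with,
    so independent tasks issued late can run ahead of earlier ones.
    """
    wave_of = {}
    for i, task in enumerate(tasks):
        deps = [wave_of[tasks[j].name] for j in range(i) if conflicts(tasks[j], task)]
        wave_of[task.name] = max(deps, default=-1) + 1
    waves = [[] for _ in range(max(wave_of.values(), default=-1) + 1)]
    for task in tasks:
        waves[wave_of[task.name]].append(task.name)
    return waves

program = [
    Task("init_a", writes={"A"}),
    Task("init_b", writes={"B"}),
    Task("scale_a", reads={"A"}, writes={"A"}),
    Task("sum_ab", reads={"A", "B"}, writes={"C"}),
    Task("init_d", writes={"D"}),  # issued last, but independent of everything
]
for level, names in enumerate(schedule_waves(program)):
    print(level, names)
```

Here `init_d` is issued last yet lands in the first wave alongside `init_a` and `init_b`, which is the task-level analogue of an out-of-order CPU executing a later, independent instruction early.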
Legion is a data-center-scale compute engine and cuNumeric is a data-center-scale math library. NumPy was downloaded 122 million times in the last 5 years.
NumPy is used by nearly 800,000 projects on GitHub. Developers are going to be thrilled with cuNumeric. The scalability of cuNumeric is said to be excellent.
On the famous CFDPython teaching code, cuNumeric scales to a thousand GPUs with only a 20% loss from perfect scaling efficiency.
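CFDPython is Lorena Barba’s “12 steps to Navier-Stokes” course, and its first step is 1D linear convection. The sketch below writes that step with vectorized NumPy calls and tries to import cuNumeric first, falling back to plain NumPy, to show the drop-in pattern. It assumes cuNumeric is installed under the module name `cunumeric` and supports the handful of NumPy calls used here; the function name and parameter values are illustrative, and this small sketch says nothing about the 1,000-GPU scaling result quoted above.

```python
try:
    import cunumeric as np  # GPU-accelerated drop-in, if installed
except ImportError:
    import numpy as np      # plain NumPy fallback: same code, CPU only

def linear_convection(nx=41, nt=25, dt=0.025, c=1.0):
    """Solve u_t + c u_x = 0 on [0, 2] with a forward-time, backward-space scheme."""
    dx = 2.0 / (nx - 1)
    u = np.ones(nx)
    u[int(0.5 / dx):int(1.0 / dx + 1)] = 2.0  # square-wave initial condition
    for _ in range(nt):
        un = u.copy()
        # Vectorized upwind update of all points except u[0], which stays fixed.
        u[1:] = un[1:] - c * dt / dx * (un[1:] - un[:-1])
    return u

u = linear_convection()
print(float(u.max()), float(u.min()))
```

Because the update is written as whole-array operations rather than a Python loop over grid points, a library that partitions arrays across devices can parallelize it without changes to the numerical code.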
“I’ll update you on big initiatives we’re working on and introduce NEW ones that will shape our industries,” said Huang.
A constant theme throughout: how Omniverse is used to simulate digital twins of warehouses, plants, and factories; of physical and biological systems; and of the 5G edge, robotics, self-driving cars, and even avatars.
Leading-edge computer graphics, physics simulation, and AI came together to make Omniverse possible.
The computing platforms and acceleration libraries NVIDIA has built lay the foundation for making Omniverse a reality.
The keynote covered data-center-scale computing, Million-X science, Omniverse, AI, avatars, robotics, and self-driving cars.
The network, which connects thousands of GPUs into a giant supercomputer, determines its scalability and ultimate performance.
NVIDIA announced Quantum 2, the most advanced end-to-end networking platform ever built.
Quantum 2 is a 400 Gbps InfiniBand platform consisting of the Quantum 2 switch, the ConnectX-7 NIC, the BlueField-3 DPU, and a full suite of software for the new architecture.
Quantum 2 is the first networking platform to offer the performance of a supercomputer AND the share-ability of cloud computing.
Until Quantum 2, you could get either bare-metal high performance or secure multi-tenancy, but never both. With Quantum 2, a valuable supercomputer can be cloud-native and far better utilized.
- Performance Isolation keeps the activity of one tenant from disturbing others.
- A telemetry-based congestion-control system keeps high data-rate senders from overwhelming the network and jamming the traffic for all.
- Third-generation SHARP offers 32 times more in-switch processing to speed up AI training.
- A nanosecond precision timing system can synchronize distributed applications, like database processing, lowering the overhead of waiting and handshaking needed to avoid race conditions.
- Nanosecond timing will also allow cloud datacenters to become part of the telecommunications network and host software-defined 5G radio services.
If NVIDIA’s Selene DGX supercomputer were equipped with Quantum 2 today, the total bandwidth would be 224,000 GBytes per second, or roughly one and a half times the total traffic over the internet.
Quantum 2 starts with an amazing new InfiniBand switch chip: 57 billion transistors on TSMC’s 7nm process, as big as the A100 GPU. It has 64 ports at 400 Gbps or 128 ports at 200 Gbps. A Quantum 2 system can connect up to 2,048 ports, versus 800 ports in Quantum 1, which is over 5 times the switching capacity. Quantum 2 can scale to 1 million endpoints within a 3-hop Dragonfly topology, 6.5 times more than the current generation. This networking speed, switching capacity, and scalability are arriving just in time for the giant HPC systems the world needs to build. The Quantum 2 switch is sampling now.
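Going from 800 to 2,048 ports is only about a 2.6x increase, so the “over 5 times” figure presumably counts the doubled per-port speed as well. Assuming Quantum 1 ports run at 200 Gbps (HDR) and Quantum 2 ports at the 400 Gbps stated above, a quick check of aggregate switching capacity:

```python
# Assumption: Quantum 1 ports run at 200 Gbps (HDR); Quantum 2 at 400 Gbps.
QUANTUM1_PORTS, QUANTUM1_GBPS = 800, 200
QUANTUM2_PORTS, QUANTUM2_GBPS = 2048, 400

q1_tbps = QUANTUM1_PORTS * QUANTUM1_GBPS / 1000  # aggregate capacity, Tb/s
q2_tbps = QUANTUM2_PORTS * QUANTUM2_GBPS / 1000
port_ratio = QUANTUM2_PORTS / QUANTUM1_PORTS
capacity_ratio = q2_tbps / q1_tbps

print(f"{q1_tbps:.1f} Tb/s -> {q2_tbps:.1f} Tb/s: "
      f"{port_ratio:.2f}x ports, {capacity_ratio:.2f}x capacity")
# 160.0 Tb/s -> 819.2 Tb/s: 2.56x ports, 5.12x capacity
```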
Quantum 2 offers two networking endpoint options: CX-7 and BlueField-3. CX-7 is the fastest NIC ever built.
It has 8 billion transistors on TSMC’s 7nm process.
CX-7 doubles the data rate of the world’s current fastest networking chip, CX-6.
It also doubles the performance of Mellanox’s well-known capabilities such as RDMA, GPUDirect Storage, GPUDirect RDMA, and in-network computing.
Quantum 2 will be available from the top computer makers and in supercomputing centers all over the world.
The ease of scale-out and orchestration comes at a cost: east-west network traffic has increased enormously with machine-to-machine message passing, and these disaggregated applications open many ports inside the data center that must be secured against cyberattack.
A new type of processor is needed to offload the CPU burden of processing the networking, storage, and security software.
NVIDIA’s BlueField DPU, an infrastructure computing platform, is designed to do exactly that.
BlueField offloads and accelerates infrastructure software, which already consumes some 30% of CPU capacity, a share that is still growing.
“Today, we are announcing BlueField DOCA 1.2, a suite of new cybersecurity capabilities that make BlueField the ideal platform for the industry to build their Zero Trust security systems,” said Huang. “Protection at the perimeter and workgroup segmentation are no longer sufficient. Every touch point of applications, data, users, and devices is a potential attack surface.
Since BlueField is the networking end point, we can secure a data center at virtually every touch point.
We are delighted to announce that leading cybersecurity companies are working with us to provision their next-generation firewall services on BlueField: Check Point, F5, Fortinet, Juniper, Guardicore, Palo Alto Networks, Trend Micro, and VMware.”
“The cloud data center movement affects every computing company. There are now 1,400 developers working with BlueField, and cybersecurity companies on BlueField can now provide zero-trust security as a service. Until every attack surface is secure, we should assume security will be, or already has been, breached. State-of-the-art cybersecurity platforms monitor and study the torrent of user-to-machine and machine-to-machine transaction logs, yet they parse only a fraction of that data looking for anomalies.
We created Morpheus, a deep learning cybersecurity platform that can monitor and analyze subtle data center characteristics generated by every user, machine, and service.
Morpheus is built on NVIDIA RAPIDS and NVIDIA AI. Workflows in Morpheus create AI models, or digital fingerprints, for every combination of app and user to learn their usual patterns and look for abnormal transactions.
These abnormal transactions, which may be only a handful out of millions of events, trigger a security event and alert an analyst to respond.
Morpheus brings the power of GPU computing to cybersecurity, helping protect networks in a way never before possible by creating customized models tailored to each environment.
Observing every detail of your network activity, Morpheus employs unsupervised learning to understand typical behavior patterns across multiple dimensions without relying on predetermined labels of good or bad.
As the AI learns, it creates not a single model, but potentially millions of models – each one with a specific digital fingerprint that is constantly scanned and analyzed.
And because the models run on NVIDIA GPUs, they can be scaled out and parallelized to support massive networks, enabling cybersecurity practitioners to detect anomalies quickly and reliably.
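The “one fingerprint per app and user” idea can be sketched without any labels: keep a separate behavioral baseline for every (user, app) pair and flag events that fall far outside that pair’s own history. Morpheus itself trains deep-learning models; the Python sketch below swaps in a simple per-pair running mean and standard deviation (Welford’s algorithm) on one numeric feature to show the shape of the approach. The `Fingerprints` class, the threshold, the feature, and the sample values are all hypothetical and not part of the Morpheus API.

```python
import math
from collections import defaultdict

class Fingerprints:
    """One tiny behavioral baseline per (user, app) pair, learned without labels.

    Each baseline tracks the running mean and variance of a single numeric
    feature (e.g. bytes transferred per session) using Welford's algorithm.
    """

    def __init__(self, threshold=4.0, min_events=10):
        self.threshold = threshold    # standard deviations beyond which to flag
        self.min_events = min_events  # don't judge a pair until it has history
        self.stats = defaultdict(lambda: [0, 0.0, 0.0])  # count, mean, M2

    def observe(self, user, app, value):
        """Score an event against its pair's baseline, then update the baseline."""
        n, mean, m2 = self.stats[(user, app)]
        anomalous = False
        if n >= self.min_events:
            std = max(math.sqrt(m2 / (n - 1)), 1e-9)
            anomalous = abs(value - mean) > self.threshold * std
        # Welford's online update (this toy version also learns from flagged events).
        n += 1
        delta = value - mean
        mean += delta / n
        m2 += delta * (value - mean)
        self.stats[(user, app)] = [n, mean, m2]
        return anomalous

fp = Fingerprints()
for i in range(50):
    fp.observe("alice", "crm", 1000 + (i % 5) * 10)  # alice's usual sessions
print(fp.observe("alice", "crm", 1020))    # False: typical volume
print(fp.observe("alice", "crm", 250000))  # True: sudden bulk transfer
print(fp.observe("bob", "crm", 250000))    # False: no history for bob yet
```

Because every pair is scored independently, the baselines can be sharded across workers, which mirrors the article’s point that millions of small models parallelize well.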
The software revolution of deep learning is coming to science. This is extremely exciting and will make a big impact. Three connected dynamics will give us a Million-X leap in computational sciences.
“First, accelerated computing, re-inventing the full computing stack – from the chip and system, the acceleration libraries, to the applications – gave us a 50x boost,” explained Huang. “Second, that boost launched deep learning, triggered the modern AI revolution, and fundamentally changed software. The software that deep learning writes is highly parallel, making it even more conducive to GPU acceleration and scalable to multi-GPU and multi-node. Scaling to large systems like DGX SuperPOD gave us another 5,000x speed-up.
Third, the AI software written with deep learning can predict results 1,000 to 10,000 times faster than software written by hand, busting open completely the way we solve problems and the problems that are even solvable.
50x times 5,000x times 1,000x gets us 250,000,000x. Of course, the mileage will vary and much depends on the scale you invest. But when a solution to a worthwhile problem is within grasp, the investments will come – look at the investment that’s going into AI, robotics, self-driving cars.
The signs are clear: accelerated computing doing AI at data center scale will give a giant boost in simulation performance.”
The question then becomes: how do we apply deep learning to science?
“Science obeys the laws of physics – Newton, Maxwell, the laws of thermodynamics, Ohm’s law, Bernoulli’s principle, the law of conservation of energy, to name a few,” said Huang. “Researchers are creating AI models that learn physics and make predictions that obey the laws of physics.
The application of machine learning to improve physics simulation has grown enormously.
Karniadakis and his team at Brown described the physics-informed neural network, or PINN.
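The core PINN idea is that the training loss penalizes violations of the governing equation at sample (“collocation”) points, so the model can be trained without any solution data. The sketch below swaps the neural network for a polynomial so that training reduces to a single linear least-squares solve; the test problem, u' + u = 0 on [0, 1] with u(0) = 1 (exact solution exp(-x)), and the function name `physics_informed_fit` are illustrative assumptions, not anything shown in the keynote.

```python
import numpy as np

def physics_informed_fit(degree=8, n_collocation=50, bc_weight=10.0):
    """Fit u(x) = sum_k c_k x^k so that u'(x) + u(x) = 0 on [0, 1] and u(0) = 1.

    No solution data is used: the loss is the ODE residual at collocation
    points plus a weighted boundary-condition term, as in a PINN.
    """
    x = np.linspace(0.0, 1.0, n_collocation)
    k = np.arange(degree + 1)
    basis = x[:, None] ** k                  # u(x_j) = basis @ c
    dbasis = np.zeros_like(basis)            # u'(x_j) = dbasis @ c
    dbasis[:, 1:] = k[1:] * x[:, None] ** (k[1:] - 1)
    residual_rows = dbasis + basis           # (u' + u)(x_j), target 0
    bc_row = np.zeros((1, degree + 1))
    bc_row[0, 0] = bc_weight                 # u(0) = c_0, target 1
    A = np.vstack([residual_rows, bc_row])
    b = np.zeros(n_collocation + 1)
    b[-1] = bc_weight * 1.0
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coeffs

coeffs = physics_informed_fit()
x_test = np.linspace(0.0, 1.0, 11)
u = np.polynomial.polynomial.polyval(x_test, coeffs)
print(np.max(np.abs(u - np.exp(-x_test))))  # small: the fit recovers exp(-x)
```

A real PINN replaces the polynomial with a neural network, computes u' by automatic differentiation, and minimizes the same kind of loss by gradient descent, which is what lets it handle nonlinear PDEs in several dimensions.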
Li, Anandkumar, and a team at Caltech and NVIDIA described the Fourier Neural Operator, or FNO, which can learn to approximate the solutions of partial differential equations.
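The distinctive piece of an FNO is its spectral convolution: transform the input to Fourier space, keep only the lowest modes, multiply each by a learned weight, and transform back. Because the weights live on Fourier modes rather than grid points, the same weights apply at any grid resolution. The NumPy sketch below shows a single-channel version; a full FNO layer also mixes channels, adds a pointwise linear path and a nonlinearity, and stacks several layers. The random weights stand in for trained ones, and `spectral_conv1d` and `sample` are hypothetical names.

```python
import numpy as np

def spectral_conv1d(u, weights, modes):
    """Spectral convolution: FFT, keep the lowest `modes`, scale, inverse FFT.

    u: real signal of shape (n,) on a periodic grid; weights: shape (modes,).
    """
    u_hat = np.fft.rfft(u)
    out_hat = np.zeros_like(u_hat)
    out_hat[:modes] = weights * u_hat[:modes]  # learnable per-mode multipliers
    return np.fft.irfft(out_hat, n=u.shape[0])

def sample(n):
    """A test signal with Fourier modes 1, 3 and 20 on a periodic grid of n points."""
    x = np.arange(n) / n
    return x, (np.sin(2 * np.pi * x) + 0.5 * np.sin(6 * np.pi * x)
               + 0.2 * np.sin(40 * np.pi * x))

rng = np.random.default_rng(0)
weights = rng.normal(size=8) + 1j * rng.normal(size=8)  # stand-in for trained weights
_, u_coarse = sample(64)
_, u_fine = sample(256)
out_coarse = spectral_conv1d(u_coarse, weights, modes=8)
out_fine = spectral_conv1d(u_fine, weights, modes=8)
# The same weights act on both resolutions; on shared grid points the outputs agree.
print(np.allclose(out_coarse, out_fine[::4]))  # True
```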
The same team recently combined the benefits of PINN and FNO into PINO, a universal function learner that obeys the laws of physics.
PINO can learn from a principled-physics simulator or observed data.
Once trained, it can emulate the principled physics models at extremely high speeds.
And equally importantly, this model is highly parallelizable, and so can scale to very large systems to get a combined Million-X factor.
For climate science, we may finally have a way to simulate the earth’s climate 10, 20, or 30 years from now, predict the regional impact of climate change, and take action to mitigate and adapt before it’s too late. Severe droughts are happening around the world.
These are caused not by a lack of rain but by higher evaporation from rising temperatures. The dryness is causing more wildfires. Predicting climate change in order to develop strategies to mitigate and adapt is arguably one of the greatest challenges facing society today.
We don’t currently have the ability to accurately predict the climate decades out. Although much is known about the physics, the scale of simulation is daunting.
Climate simulation is much harder than weather simulation, which largely models atmospheric physics and whose accuracy can be validated every few days.
Long term climate prediction must model the physics of earth’s atmosphere, oceans and waters, ice, the lands, and human activities, and their interplay.
Further, simulation resolutions of 1 to 10 meters are needed to incorporate effects like low-atmosphere clouds that reflect the Sun’s radiation back to space.
Ignoring these contributions accumulates significant error in long-term predictions. This is 10 to 100 thousand times higher resolution than any weather simulation today. No computer we can build is big enough. We need a computer science breakthrough.”
In response to this need, NVIDIA announced NVIDIA Modulus – a framework for developing physics-ML models.
- Train physics-ML models using governing physics and data from principled models and observations. Modulus has been optimized to train on multi-GPU and multi-node systems.
- The resulting model can emulate physics 1000 to 100,000 times faster than simulation.
With Modulus, scientists will be able to create digital twins to better understand large systems like never before.
“One important problem we can apply Modulus to solve is climate science,” noted Huang. “Climate change is reshaping the world. The combination of accelerated computing, physics-ML, and giant computer systems can give us a Million-X leap and give us a shot.
We will use principled-physics models and observed data to teach AI to predict climate – in super-real-time.
We can create a digital twin of the earth that runs continuously to predict the future, calibrating and improving its predictions with observed data, and predict again.”
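A twin that predicts, calibrates against observations, and predicts again follows the classic forecast-and-correct cycle of data assimilation. A scalar Kalman filter is the simplest textbook version of that loop, so the sketch below uses it purely to illustrate the cycle. The one-variable “model”, the drifting “truth”, the noise levels, and the name `kalman_cycle` are all made up, and nothing here describes how Earth Two will actually assimilate data.

```python
import numpy as np

def kalman_cycle(x_est, p_est, observation, a=1.0, q=0.01, r=0.25):
    """One predict-then-correct step of a scalar Kalman filter.

    x_est, p_est: current state estimate and its variance.
    a: model dynamics (x_next = a * x); q: model-error variance;
    r: observation-noise variance. All parameter values are illustrative.
    """
    # Predict: run the model forward and grow the uncertainty.
    x_pred = a * x_est
    p_pred = a * a * p_est + q
    # Correct: blend in the observation, weighted by relative confidence.
    gain = p_pred / (p_pred + r)
    x_new = x_pred + gain * (observation - x_pred)
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new

rng = np.random.default_rng(42)
x, p = 0.0, 100.0  # poor initial guess with large uncertainty
for t in range(200):
    truth = 15.0 + 0.02 * t               # hypothetical slowly drifting quantity
    obs = truth + rng.normal(scale=0.5)   # noisy measurement (variance 0.25)
    x, p = kalman_cycle(x, p, obs)
print(round(x, 2), round(truth, 2))
```

Each pass runs the model forward, then pulls the estimate toward the latest observation in proportion to how uncertain the forecast is, so the estimate keeps tracking the drifting quantity even though the model itself assumes persistence.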
Researchers trained a physics-ML model using atmospheric data from ECMWF’s ERA5 dataset. The model took 4 hours to train on 128 A100 GPUs. The trained model can predict hurricane severity and path at 30 km spatial resolution.
Hopefully in a couple of years, data will stream into a digital twin of Earth running in Omniverse and an ensemble of physics-ML models will predict the climate.
Back to Omniverse. The internet is essentially a digital overlay on the world. The overlay is largely 2D information – text, voice, images, video. According to Huang, that’s about to change.
“We now have the technology to create new 3D worlds or model our physical world. These virtual worlds will obey the laws of physics, or not. AI or friends can be there with you.
We will jump from one world to another like we do on the web with hypertext. This new world will be much larger than the physical world. We will buy and own 3D things like we buy 2D songs and books today. We will buy, own, and sell homes, furniture, cars, luxury goods, and art in this world.
Creators will make more things in virtual worlds than they do in the physical world. We built Omniverse for builders of these virtual worlds. Some worlds will be built for gathering and games.
But a great many will be built by scientists, creators, and companies. Virtual worlds will crop up like websites today.
Omniverse is very different from a game engine.
Omniverse is designed to be data center scale – and hopefully someday planetary scale.
The portal of Omniverse is USD, Universal Scene Description – essentially a digital wormhole that connects people and computers to Omniverse, and for one Omniverse world to connect to another.
USD is to Omniverse what HTML is to websites. Omniverse is futuristic.
Omniverse can connect design worlds – things created in the Adobe world can be connected to those in the Autodesk world through Omniverse – enabling designers to collaborate in a shared space.”
Companies can build virtual factories and operate them with virtual robots in Omniverse. The virtual factories and robots are the digital twins of their physical counterparts.
In fact, the physical version is the replica, since it is produced from the digital original.
Omniverse digital twins are where we will design, train, and continuously monitor robotic buildings, factories, warehouses, and cars of the future.
The latest update to Omniverse includes the following:
- Showroom: an Omniverse app of demos and samples that showcases core Omniverse technologies, including graphics, physics, materials, and AI.
- Farm: a systems layer that orchestrates the processing of batch jobs across multiple systems, including workstations and servers, bare-metal or virtualized. Farm can be used for batch rendering, synthetic data generation for AI, or distributed computing.
- Omniverse AR: streams graphics to phones or AR glasses.
- Omniverse VR: the world’s first full-frame interactive ray-traced VR.
Robots, AV fleets, warehouses, factories, industrial plants, and whole cities will be created, trained, and operated in Omniverse digital twins.
“We will build a digital twin to simulate and predict climate change. The last supercomputer we built was called Cambridge 1, or C-1. This new supercomputer will be E-2,” said Huang. “Earth Two – the digital twin of Earth, running Modulus-created AI physics, at Million-X speeds, in Omniverse.
All the technologies we’ve invented up to this moment are needed to make Earth Two possible. I can’t imagine a greater and more important use.”
The post NVIDIA GTC 21 Positions Itself for Earth Two – the Digital Twin of the Earth appeared first on AECCafe Voice.