Q&A: UW researcher discusses just how much energy ChatGPT uses
ChatGPT and other large language models learn to mimic humans by analyzing huge amounts of data. Behind any chatbot’s text box is a large network of computer processing units that support training and running these models.
How much energy do networks running large language models consume? A lot, according to Sajjad Moazeni, a University of Washington assistant professor of electrical and computer engineering, who studies networking for AI and machine learning supercomputing. Just training a chatbot can use as much electricity as a neighborhood consumes in a year.
UW News sat down with Moazeni to learn more.
How do large language models, such as ChatGPT, compare to cloud computing energy-wise?
Sajjad Moazeni: These models have become so large that you need thousands of processors both to train them and to support the hundreds of millions of daily queries from users. All this computing can only take place in a data center.
In comparison, conventional cloud computing workloads, such as online services, databases and video streaming, are far less computationally intensive, and require orders of magnitude less memory usage.
Can you describe these data centers?
SM: In today’s data centers, there are hundreds of thousands of processing units that can talk to each other over a large number of optical fibers and network switches. These processors, along with memory and storage devices, are housed in server racks. There is also internal infrastructure for cooling the servers (with water and air) and units to generate and distribute power.
There are hundreds of such data centers across the world and they are mainly managed by big tech companies like Amazon, Microsoft and Google.
How much energy do these large data centers use to run these large language models?
SM: In terms of training a large language model, each processing unit can consume over 400 watts of power while operating, and typically a similar amount of power goes to cooling and power management. Overall, training a single large language model like GPT-3 can consume up to 10 gigawatt-hours (GWh) of energy. That is roughly equivalent to the yearly electricity consumption of over 1,000 U.S. households.
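To see how those training numbers hang together, here is a minimal back-of-envelope sketch in Python. The processor count, run length and household figure are illustrative assumptions (the household average of roughly 10,500 kWh per year is a U.S. Energy Information Administration estimate), not values from the interview:

```python
# Back-of-envelope check on the training figures above.
# Assumptions (illustrative, not from the interview):
#   ~10,000 processing units at ~400 W each, doubled for cooling
#   and power management, running for roughly 52 days;
#   average U.S. household use ~10,500 kWh/year (EIA estimate).

num_processors = 10_000
watts_each = 400
overhead = 2.0          # cooling + power delivery roughly double the draw
days = 52

energy_kwh = num_processors * watts_each * overhead / 1000 * days * 24
print(f"Training energy: {energy_kwh / 1e6:.1f} GWh")              # ~10.0 GWh

household_kwh_per_year = 10_500
print(f"Yearly use of ~{energy_kwh / household_kwh_per_year:,.0f} U.S. households")
```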
Today there are hundreds of millions of daily queries on ChatGPT, though that number may be declining. Serving that many queries can cost around 1 GWh each day, which is equivalent to the daily energy consumption of about 33,000 U.S. households.
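A similar sketch works for the inference side, again with illustrative assumptions (about 300 million queries per day sharing that 1 GWh, and the same EIA household average):

```python
# Back-of-envelope check on the inference figures above.
# Assumptions (illustrative): ~300 million queries/day sharing ~1 GWh/day;
# average U.S. household uses ~10,500 kWh/year, i.e. ~29 kWh/day.

daily_energy_gwh = 1.0
daily_queries = 300_000_000

wh_per_query = daily_energy_gwh * 1e9 / daily_queries
print(f"~{wh_per_query:.1f} Wh per query")                      # ~3.3 Wh

household_kwh_per_day = 10_500 / 365
households = daily_energy_gwh * 1e6 / household_kwh_per_day
print(f"Daily use of ~{households:,.0f} U.S. households")       # ~35,000
```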
While these numbers might seem manageable for now, this is only the beginning of widespread development and adoption of these models. We expect that many different services will soon be using this technology daily.
Also, as models become more sophisticated, they get larger and larger, which means the data center energy needed to train and use them could become unsustainable. Every big technology company is now trying to develop its own model, and that could put a huge training load on data centers.
What are some potential solutions to this issue?
SM: Researchers have been trying to optimize the data center hardware and processors to become more energy efficient for these types of computation.
My group specifically focuses on the networking aspect. In today’s data centers, processors exchange the data for computing as electrical signals, but these signals get distorted in transit. To send a lot of data quickly, you need to use a lot of power to make sure the signals can be received correctly.

We are building the next generation of optical interconnect solutions, which convert these electrical signals into optical signals. Optical signals have significantly lower loss, which reduces the energy needed to move data.
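For a sense of scale, the sketch below compares the aggregate link power of electrical and optical interconnects using ballpark energy-per-bit figures from the interconnect literature (roughly 5 pJ/bit for long-reach electrical links, and around 1 pJ/bit as a common target for silicon-photonic links). These are illustrative values, not results from Moazeni’s group:

```python
# Rough comparison of electrical vs. optical interconnect power at
# data-center scale. Energy-per-bit values are ballpark literature
# figures, not measurements from this research group.

PJ = 1e-12  # one picojoule, in joules

def link_power_watts(energy_pj_per_bit: float, gbps: float) -> float:
    """Power (W) needed to sustain `gbps` gigabits/s at a given energy per bit."""
    return energy_pj_per_bit * PJ * gbps * 1e9

# 100,000 links at 400 Gb/s each: a plausible scale for a large AI cluster.
for name, e_bit in [("electrical (~5 pJ/bit)", 5.0), ("optical (~1 pJ/bit)", 1.0)]:
    total_kw = 100_000 * link_power_watts(e_bit, 400) / 1e3
    print(f"{name}: {total_kw:,.0f} kW across 100,000 x 400 Gb/s links")
```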
Because we are just in the beginning phases for this new technology, it’s really important for people to be transparent about their results and to create open-source models. This will also help us reach advanced and sustainable solutions.
###
Related story: Moazeni recently won a Google Faculty Award in networking for this research.