
How Containers, LLMs, and GPUs Fit with Data Apps


Jun 30th, 2023 1:29pm

Containers, large language models (LLMs), and GPUs provide a foundation for developers to build services for what Nvidia CEO Jensen Huang describes as an “AI Factory.”

Huang made the statement at the launch of the Snowflake Summit in Las Vegas this week, where Nvidia and Snowflake demonstrated containers as an emerging foundation for distributing generative AI through application architectures that integrate with enterprise data stores.

Announced at the conference and now in private beta, Snowflake’s Snowpark Container Services (SCS) offers the ability to containerize LLMs; through Snowpark, Snowflake will offer Nvidia’s GPUs and NeMo, Nvidia’s “end-to-end, cloud native enterprise framework to build, customize, and deploy generative AI models.”

Snowflake’s investment in container services demonstrates how the company has evolved from its roots as a data warehouse provider into a data analytics company, and now into a platform for container services and native application development.

Snowpark serves as the home for SCS. It offers a platform to deploy and run Python, Java, and Scala code in Snowflake through a set of libraries and runtimes, including user-defined functions (UDFs) and stored procedures.
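For instance, a minimal sketch of registering a Python function as a UDF through the Snowpark Python API might look like the following (the connection parameters, table, and column names here are placeholders, not from the article):

```python
# Minimal sketch: a Python UDF registered and run inside Snowflake
# via Snowpark. Connection parameters and object names are placeholders.
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, udf
from snowflake.snowpark.types import IntegerType

session = Session.builder.configs({
    "account": "<account>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "<warehouse>",
    "database": "<database>",
    "schema": "<schema>",
}).create()

# An ordinary Python function becomes a server-side UDF.
@udf(name="double_it", input_types=[IntegerType()],
     return_type=IntegerType(), replace=True)
def double_it(x: int) -> int:
    return x * 2

# The UDF executes where the data lives; nothing leaves Snowflake.
session.table("orders").select(double_it(col("quantity"))).show()
```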

SCS will offer developers an additional Snowpark runtime option with the capability to “manage and scale containerized workloads (jobs, services, service functions) using secure Snowflake-managed infrastructure with configurable hardware options, such as GPUs.”
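SCS was in private beta at publication, so its exact interface was not public. As a rough, hypothetical sketch of the shape such a workflow could take (reusing the Snowpark session from the sketch above; the statements, instance family, and specification format are all assumptions, not confirmed syntax):

```python
# Hypothetical sketch only: SCS was in private beta, so these statements,
# the instance family name, and the spec format are assumptions.
create_pool = """
CREATE COMPUTE POOL IF NOT EXISTS llm_pool
  MIN_NODES = 1
  MAX_NODES = 2
  INSTANCE_FAMILY = GPU_NV_S
"""

create_service = """
CREATE SERVICE llm_service
  IN COMPUTE POOL llm_pool
  FROM SPECIFICATION $$
spec:
  containers:
  - name: llm-inference
    image: /my_db/my_schema/my_repo/llm-inference:latest
$$
"""

session.sql(create_pool).collect()
session.sql(create_service).collect()
```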

Hex is an early user of the service. Its CEO and co-founder, Barry McCardel, describes Hex as a Figma for data or a Google Docs for data: a service that allows data analysts and data scientists to work on one collaborative platform.

“We’re using it (SCS) to deploy our software, and it’s interesting because Hex today deploys on Amazon Web Services,” he said. “We’re going to be launching on GCP later this year. But the environment I’m actually most excited about is Snowflake, because I think that being able to run Hex workloads where the user’s customer data is without having to go through that additional process is very, very compelling.”

The business case for SCS comes down to compliance and governance, which people like McCardel cite as a Snowflake core value. With SCS, a company like Hex can offer customers a way to build applications on their data in Snowflake.

“If we were to try to transact with them outside of Snowflake Container Services, we could have months, years of security reviews and InfoSec stuff, because a lot of those bigger customers are just very cautious, on you know, whether they’re going to let some third party application connect to their data,” McCardel said. “So the way it’ll be for our customers is, it’ll actually just be super seamless. They’ll be able to use Hex on top of their existing data store with minimal effort, minimal overhead.”

With SCS, Snowflake may relieve much of the operational burden that slowed teams trying to keep data in sync. Before SCS, governed data managed in Snowflake had to move off the platform to be processed by containerized workloads, creating an administrative burden.

SCS, built on Kubernetes, works in the background for users. It’s opinionated for the Snowflake environment. Customers may import their containers into SCS with little of the operations overhead that they had before.

“Containers were often a reason why customers were taking their data out of Snowflake and putting it somewhere else, oftentimes also introducing redundant copies of the data in, let’s say, cloud storage somewhere,” said Torsten Grabs, a senior product manager with Snowflake. “And then they had containerized compute run over that data to do some processing. And then sometimes the results came back into Snowflake. But as soon as you create these redundant copies of your data, then it’s very hard to maintain them over time to keep them in sync. What is your version of truth? Is data governance managed in a consistent way in all these places? So it is much simpler if you can bring the work that these containers are doing to where you have the data and then apply the processing to the data where it sits. That was one of the key motivating factors for us to bring containers over into our platform.”

And here’s what’s key: customers would often need to orchestrate containers elsewhere, where GPUs fulfilled the computational needs of workloads such as AI and machine learning. So if a user required GPU-backed computation, they had to export the data from Snowflake and move it to wherever they could perform the computing.

Snowflake will build on an approach similar to the one it took with its original data warehouse service: abstracting the underlying infrastructure behind simple logical choices.

“When you’re going to create one of these services, you get to specify the instance that you want to run,” said Christian Kleinerman, senior vice president of product at Snowflake, in a conversation at the Snowflake Summit. “It’s our own mapping of a reduced selection of logical instances. So you will be able to say high memory, low memory, GPU, non-GPU, and then we’ll map it to the right instance on each of the three cloud providers.”
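Conceptually, that mapping is a small lookup from a logical label to provider-specific hardware, something like the illustration below (the labels and instance types are hypothetical, not Snowflake’s actual mapping):

```python
# Illustration only: a logical instance label resolved to each cloud
# provider's hardware, as Kleinerman describes. Names are hypothetical.
LOGICAL_INSTANCE_MAP = {
    "GPU":      {"aws": "g4dn.xlarge", "azure": "NC4as_T4_v3", "gcp": "n1-standard-4+T4"},
    "HIGH_MEM": {"aws": "r5.2xlarge",  "azure": "E8s_v3",      "gcp": "n2-highmem-8"},
}

def resolve(logical: str, cloud: str) -> str:
    """Resolve a logical instance choice to a concrete instance type."""
    return LOGICAL_INSTANCE_MAP[logical][cloud]

print(resolve("GPU", "aws"))  # g4dn.xlarge
```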

SCS opens opportunities for the use of foundation models in Snowflake.

“One is you bring your own model, or you take in various models, and you run a container,” Kleinerman said. “You do the heavy lifting. Or the other one is using some third party. They can do a configuration and publish that as a native app. And then the Snowflake customer that wants to just be a consumer could see just a function. And now I know nothing about AI or ML or containers, and I am using foundation models.”
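From the consumer’s side, that could be as simple as calling a function the native app publishes; a hedged sketch, with the app, function, and table names all hypothetical:

```python
# Hypothetical sketch: the provider's containerized model surfaces to the
# consuming account as a plain function. All names here are made up.
rows = session.sql("""
    SELECT my_llm_app.summarize(review_text) AS summary
    FROM product_reviews
    LIMIT 10
""").collect()

for row in rows:
    print(row["SUMMARY"])
```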

The Nvidia Connection

NeMo has two main components, Kleinerman said. It comes with certain models trained by Nvidia, and it comes as an entire framework, including APIs and a user interface, that helps train a model from scratch or fine-tune one with data fed into it. The NeMo framework will be hosted inside SCS; because NeMo itself ships as a container, models can be ported into SCS.

Models may be imported and then built on top of NeMo. For example, Snowflake announced Reka, a newly launched maker of generative models, as a partner. AI21 Labs is also a foundation model partner.

Kari Anne Briski, Nvidia’s vice president of AI software, said Nvidia is ahead of almost everyone in model development. Snowflake used the Snowflake Summit to announce it will use Nvidia’s GPUs and training models for developers to build generative AI applications, and Briski said Snowflake customers may use large foundation models built on Nvidia’s offerings.

Briski traces her work at Nvidia as a timeline of AI development, illustrating how Snowflake will benefit from Nvidia’s research. Seven years ago, Nvidia accelerated computer vision on a single GPU. Today, Nvidia uses thousands of GPUs to train its foundation models.

Briski said it still takes weeks to months to train a foundation model. By offering pre-trained models, she said, customers will use far less compute.

A team may customize the model at runtime using “zero-shot” or “few-shot” learning, which provides ways for the model to produce answers from little or no example data, she said.

“So you can send in a couple of examples and prompts at runtime to help it customize, and it’s like, ‘Oh, I know what you’re talking about. Now I’m going to follow your lead.’”
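In concrete terms, few-shot prompting is a handful of worked examples prepended to the query at runtime; no model weights change. A minimal illustration:

```python
# Few-shot prompting: worked examples ride along with the query at
# runtime. The examples and task here are illustrative.
examples = [
    ("The delivery was two weeks late.", "negative"),
    ("Setup took five minutes and it just worked.", "positive"),
]
query = "The battery died after one day."

prompt = "Classify the sentiment of each review.\n\n"
for text, label in examples:
    prompt += f"Review: {text}\nSentiment: {label}\n\n"
prompt += f"Review: {query}\nSentiment:"

# `prompt` would then be sent to the model's completion endpoint.
print(prompt)
```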

The option of prompt tuning or parameter-efficient fine-tuning (PEFT) allows people to use dozens or hundreds of examples.

“We train a smaller model that the large language model uses so you have this customization model,” Briski said. “We can have hundreds or thousands of customization models.”

According to a Hugging Face blog post, with PEFT the user “only fine-tunes a small number of (extra) model parameters while freezing most parameters of the pre-trained LLMs, thereby greatly decreasing the computational and storage costs.”
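As a hedged sketch of what PEFT looks like in practice, here is LoRA-style fine-tuning with the open source Hugging Face peft library (Nvidia’s own tooling may differ; GPT-2 stands in for a large model):

```python
# PEFT sketch using Hugging Face's `peft` library with LoRA adapters.
# GPT-2 stands in for a large model; Nvidia's tooling may differ.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")

config = LoraConfig(
    r=8,                        # rank of the low-rank adapter matrices
    lora_alpha=16,              # scaling applied to the adapter output
    target_modules=["c_attn"],  # GPT-2's attention projection layer
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)
# Only the small adapters train; the base model's weights stay frozen.
model.print_trainable_parameters()
```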

Alternatively, all the weights across the network may be updated in full fine-tuning, but that comes with more intensive computing requirements.

LLMs may become vulnerable to hallucinations if used in isolation, which makes a case for vector databases.

“Again, you don’t want to just think of the LLM by itself,” Briski said. “You might think of it as an entire system. You also do fine-tuning for these. So there’s a retriever model, kind of like you look up in a database.”
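That retriever pattern is often called retrieval-augmented generation, and it can be sketched in a few lines. This toy version uses a bag-of-words stand-in where a real system would use a learned embedding model and a vector database:

```python
# Toy retrieval-augmented generation: find the document nearest the
# query and ground the prompt in it. A real system would use learned
# embeddings and a vector database instead of this word-count stand-in.
import math
from collections import Counter

DOCS = [
    "Snowpark Container Services runs containerized workloads on GPUs.",
    "NeMo is Nvidia's framework for building and customizing LLMs.",
]

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

query = "What framework does Nvidia offer for LLMs?"
best = max(DOCS, key=lambda d: cosine(embed(query), embed(d)))

# Grounding the answer in retrieved context helps curb hallucination.
prompt = f"Context: {best}\n\nQuestion: {query}\nAnswer:"
print(prompt)
```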

But overall, the combination of containers, LLMs, and GPUs means faster capabilities, more robust offerings, and the realization that we can now talk to our data, which signals a new age, Huang said in a fireside chat with Snowflake CEO Frank Slootman.

“We’re all going to be intelligence manufacturers in the future,” Huang said. “We will hire employees, of course, and then we will create a whole bunch of agents. And these agents could be created with LangChain or something like that, which connects models, knowledge bases, and other AIs that you deploy in the cloud and connect to all the Snowflake data. And you’ll operate these AIs at scale. And you’ll continuously refine these AIs. And so every one of us is going to be manufacturing AI, so we’re going to be running the AI factories.”

Disclosure: Snowflake paid for the reporter’s airfare and hotel to attend Snowflake Summit.
