Kakao Brain: Accelerating large-scale natural language processing and AI development with Cloud TPU

About Kakao Brain

Established in 2017, Kakao Brain is a research and development company that develops AI-based technologies in various fields, such as natural language processing. The company recently released a Korean natural language processing model, KoGPT, to help further expand the use and value of AI.

Industries: Media & Entertainment

Location: Korea

Products: Google Cloud, Cloud TPU

Cloud TPU enables Kakao Brain to handle large-scale data processing quickly, building a concrete foundation for AI/ML experts to focus on advancing deep-learning models.

Google Cloud results

Massively reduces workload of processing large-scale data
Shortens task completion time from seven days to one day
Enables seamless large-scale system scalability

Trains a six-billion-parameter model on 200 billion tokens of data with TPU

In November 2021, Kakao Brain, an artificial intelligence R&D subsidiary of South Korean tech giant Kakao Corp., unveiled KoGPT. A large-scale deep learning-based natural language processing model, KoGPT was developed by adapting Generative Pre-trained Transformer 3 (GPT-3), one of the most widely used natural language processing models, to the Korean language.

In English, GPT-3 is already expanding its scope of application beyond simply converting words into text: it reads a user’s intentions accurately and writes letters and even software code. This was not available for the Korean language because creating a natural language generation (NLG) machine learning model is labor intensive and requires rapid training on large-scale data.

However, KoGPT was trained with six billion model parameters on 200 billion tokens, producing an artificial intelligence model that can understand Korean.

"Most of our AI-related research and projects are being conducted in a custom-built GPU-based cloud. However, with a growing number of tasks that require a larger AI model and more data to learn, we needed a system that could handle it seamlessly."

—Woonhyuk Baek, Large-Scale AI Research Scientist, Kakao Brain

Deploying a dedicated machine learning processor optimized for learning large-scale data

According to Woonhyuk Baek, Large-Scale AI Research Scientist at Kakao Brain, Google Cloud TPU plays an important part in accelerating the training process of KoGPT and its massive workloads.

Baek goes on to explain that understanding the characteristics of the GPU (Graphics Processing Unit) and the TPU (Tensor Processing Unit) is the most important starting point for using them properly. Although TPU has strong AI data processing capabilities, simply replacing all AI systems with TPU at once will not yield the desired results.

TPU and GPU also complement each other in clear ways. A GPU makes it possible to start a project quickly and adapts easily to general-purpose environments, but it is not easy to scale. A TPU, meanwhile, is easy to manage because resources are allocated in units of pods, and communication between nodes is fast, which is essential for large-scale data processing.

"Ever since we implemented TPU, we experienced zero network downtime, which we often faced when using GPU servers."

—Woonhyuk Baek, Large-Scale AI Research Scientist, Kakao Brain

Seamless massive-scale data processing without downtime

Unlike GPUs, which run both on premises and in the cloud, Cloud TPU was built to accelerate machine learning workloads within the Google Cloud ecosystem. Baek says on-demand TPU devices and pod slices made workload management easier, adding that fast networking between TPU nodes made data processing seamless.

"Ever since we implemented Cloud TPU, we experienced zero network downtime, which we often faced when using GPU servers," says Baek.

According to Baek, performance scales in proportion to the number of TPU cores. For example, if a data processing task takes four weeks on a 32-core TPU pod slice, the same task takes only one week on a 128-core pod slice. Similarly, using v3 pod slices with larger memory shortens data processing time.
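The proportional relationship Baek describes can be sketched as a short calculation. The Python below is a minimal sketch, assuming ideal linear scaling (run time inversely proportional to slice size), which real workloads only approximate; the function name `estimated_duration` and its parameters are illustrative and not part of any Cloud TPU API.

```python
def estimated_duration(baseline_duration: float,
                       baseline_size: int,
                       target_size: int) -> float:
    """Estimate run time on a different slice size.

    Assumes ideal linear scaling: doubling the slice size halves the
    run time. The result is in the same unit as baseline_duration.
    """
    if baseline_size <= 0 or target_size <= 0:
        raise ValueError("slice sizes must be positive")
    return baseline_duration * baseline_size / target_size


if __name__ == "__main__":
    # Figures from the example in the text: four weeks at size 32,
    # estimated for size 128.
    print(estimated_duration(4.0, 32, 128))  # prints 1.0 (weeks)
```

Under that assumption, a task that takes four weeks at size 32 comes out at one week at size 128, matching the example above.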

"When I trained KoGPT with v3-32, it took a week. But as soon as I upgraded the pods to v3-512, it only took one day. The v3-512's pricing is nearly sixteen times more costly, but it sped up the task completion time by almost 14 times. As a result, there is almost no difference between the costs of using either v3-32 or v3-512," says Baek.

"If TPU resources are properly allocated, it will greatly maximize performance levels and train ML models quickly at a reasonable cost. You can then forecast how long it will take for specific tasks and move up project deadlines accordingly. AI project managers are always racing against time. Agility in project management frees up time to focus on making KoGPT smarter."

Driving AI research and development further

Kakao Brain appreciates the flexibility and reliability that Google Cloud TPU provides. "Cloud TPU is a Google Cloud-specific hardware accelerator, so we needed time to understand how data is processed inside the circuits and which part of the processing pipeline saw dips in performance efficiency," says Baek. "But the latest product updates, like TPU Virtual Machine and the performance-maximizing TPU v4 chips, improve the environment for massive workloads and data profiling."

"Thanks to the continuous development built around Google Cloud products, we were able to focus exclusively on AI research and paper publications within a couple of months. Without Cloud TPU, it would have taken several years to complete all these tasks."

Baek shares that the current infrastructure allows Kakao Brain to draw a clearer product development roadmap for their next goal, which is to build a multi-modal learning model that perceives image and audio data.

"We would like to challenge ourselves to make KoGPT smart enough to be self-learning. We intend to work on more challenging projects, like saving trained models so that they can memorize learned datasets and draw generalizations. We will ultimately work on training the models to transfer knowledge across different tasks," says Baek.

"Many professionals in our field still find it extremely challenging to apply AI to real world problems. But we believe that with Google Cloud, Kakao Brain will be able to accelerate the holistic development process of our deep learning models. We look forward to bringing our ideas to life soon," concludes Baek.

"Thanks to the continuous development built around Google Cloud products, we were able to focus exclusively on AI research and paper publications within a couple of months. Without TPU, it would have taken several years to complete all these tasks."

—Woonhyuk Baek, Large-Scale AI Research Scientist, Kakao Brain
