How inference at the edge unlocks new AI use cases for retailers
Mike Ensor
Tech Lead, Google Distributed Cloud, Google
For retailers, making intelligent, data-driven decisions in real time isn't an advantage, it's a necessity. Staying ahead of the curve means embracing AI, but many retailers hesitate to adopt because overhauling their technology is costly. While traditional AI implementations may require significant upfront investments, retailers can leverage existing assets to harness the power of AI.
These assets, ranging from security cameras to point-of-sale systems, can unlock store analytics, faster transactions, staff enablement, loss prevention, and personalization — all without straining the budget. In this post, we’ll explore how inference at the edge, a technique that runs AI-optimized applications on local devices without relying on distant cloud servers, can transform retail assets into powerful tools.
How retailers can build an AI foundation
Retailers can find assets to fuel their AI in all corners of the business. You can unlock employee productivity by transforming your vast repository of handbooks, training materials, and operational procedures into working assets for AI.
Digitized manuals for store equipment, human resources, loss prevention, and domain-specific information can also be combined with agent-based AI assistants to provide contextually aware "next action assistants." By extending AI-optimized applications from the cloud to the edge, retail associates can now ask their AI assistant, "What do I do next?" and receive a fast, detailed response tailored to their question.
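The retrieval step behind such an assistant can be sketched in a few lines. This is a minimal, illustrative example: the manual snippets, the word-overlap `score()` function, and `build_prompt()` are stand-ins I've invented for this sketch; a production system would use an embedding model for retrieval and an edge-hosted LLM to generate the answer.

```python
import re

# Toy corpus standing in for digitized store manuals and procedures.
MANUALS = {
    "pos-jam": "POS receipt printer jam: open the cover, remove torn paper, reseat the roll.",
    "spill": "Aisle spill: place a wet-floor sign, cordon the area, mop, then log the incident.",
    "return": "Customer return: verify the receipt, inspect the item, refund to original tender.",
}

def score(query: str, passage: str) -> int:
    """Count shared lowercase words between query and passage (toy retrieval)."""
    q = set(re.findall(r"[a-z]+", query.lower()))
    p = set(re.findall(r"[a-z]+", passage.lower()))
    return len(q & p)

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k best-matching manual passages for the associate's question."""
    ranked = sorted(MANUALS.values(), key=lambda p: score(query, p), reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    """Assemble the context-plus-question prompt an edge-hosted model would receive."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nAssociate asks: {query}\nNext action:"

prompt = build_prompt("What do I do next about this spill in the aisle?")
```

Grounding the model's answer in retrieved manual passages is what makes the assistant "contextually aware": the response reflects the retailer's own procedures rather than generic advice.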
Edge processing power decision point: CPU vs GPU
Next, we’ll explore the critical decision of choosing the right hardware to power your applications. The two primary options are CPUs (Central Processing Units) and GPUs (Graphics Processing Units), each with its own strengths and weaknesses. Making an informed choice requires understanding your specific use cases and balancing performance requirements, bandwidth, and model processing against cost considerations. Consider the following criteria to guide your decision-making process, especially when choosing between deploying at a regional DC or at the edge.
Key decision criteria for retail decision makers:

- Complexity of AI models: Retail-focused AI models, like basic object detection, can often run efficiently on CPUs. More complex models, such as those used for real-time video analytics or personalized recommendations with large datasets, typically require the parallel processing power of GPUs.
- Data volume and velocity: If you're processing large amounts of data at high speed, a GPU may be necessary to keep up with the demand. For smaller datasets and lower throughput, a CPU may suffice.
- Latency requirements: For use cases requiring ultra-low latency, such as real-time fraud detection, GPUs can provide faster processing, especially when located at the edge, closer to the data source. However, network latency between the edge and a regional DC might negate this benefit if the GPU is located regionally.
- Budget: GPUs usually have a higher price tag than CPUs. Carefully consider your budget and the potential ROI of investing in GPU-powered solutions before making a decision. Start with CPU-based solutions where possible and upgrade to GPUs only when absolutely necessary.
- Power consumption: GPUs generally consume more power than CPUs. This is an important factor to consider for edge deployments, especially in locations with limited power availability. It is less of a concern when deploying at a regional DC, where power and cooling are centralized.
- Deployment location: The proximity of the processing power to the data source has major implications for latency. Deploying at the edge (in-store) minimizes latency for real-time use cases. Regional DCs introduce network latency, making them less suitable for applications requiring immediate action. However, certain tasks requiring heavy compute but not low latency (e.g., nightly inventory analysis) might be better suited for a regional DC where resources can be pooled and managed centrally.
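The criteria above can be tallied into a rough first-pass recommendation. The weights and the `recommend_processor()` helper below are illustrative assumptions for this sketch, not a Google-provided sizing tool; any real decision should be validated by benchmarking your actual models and workloads.

```python
# Toy scoring of the CPU-vs-GPU criteria listed above. Positive signals
# (complex models, high data velocity, tight latency) push toward GPU;
# budget and power constraints push back toward CPU.
def recommend_processor(model_complexity: str, data_velocity: str,
                        latency_critical: bool, budget_limited: bool,
                        power_limited: bool) -> str:
    """Tally simple signals and return 'CPU' or 'GPU' (illustrative weights)."""
    gpu_score = 0
    gpu_score += 2 if model_complexity == "high" else 0
    gpu_score += 1 if data_velocity == "high" else 0
    gpu_score += 1 if latency_critical else 0
    gpu_score -= 1 if budget_limited else 0
    gpu_score -= 1 if power_limited else 0
    return "GPU" if gpu_score >= 2 else "CPU"

# Basic shelf monitoring at modest frame rates stays on CPU:
basic = recommend_processor("low", "low", False, True, True)
# Real-time video analytics with complex models points to GPU:
realtime = recommend_processor("high", "high", True, False, False)
```

The "start with CPU, upgrade only when necessary" guidance from the budget criterion is reflected in the threshold: the GPU recommendation requires multiple strong signals, not just one.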
Remember, not all AI and ML require new investments in emerging technology. Many AI/ML-based use cases can produce the desired outcome without using a GPU. For example, consider the visual inspection for store analytics and fast checkout referenced in the Google Distributed Cloud Price-a-Tray interactive game. Inference is performed at 5 FPS, while the video stream continues to run at 25 FPS; the bounding boxes are then drawn on top of the returned detections rather than having one system handle the video stream, detection, and bounding boxes together. This enables more efficient use of the CPU, since many of the actions in this example can be split across cores and threads.
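The frame-subsampling pattern described above can be sketched as follows. The `detect()` and `draw_boxes()` functions are stand-ins for a real detector and renderer; the point is the control flow: inference runs on every fifth frame, and the intervening frames reuse the most recent boxes.

```python
# CPU-friendly pattern: infer at ~5 FPS against a 25 FPS stream, and
# redraw the last-known bounding boxes on the frames in between.

def detect(frame_id: int) -> list[tuple[int, int, int, int]]:
    """Pretend object detector; returns one box per call (stand-in)."""
    return [(10, 10, 50 + frame_id, 50 + frame_id)]

def draw_boxes(frame_id: int, boxes):
    """Pretend renderer: pairs each frame with the boxes overlaid on it."""
    return (frame_id, boxes)

STREAM_FPS, INFER_FPS = 25, 5
STRIDE = STREAM_FPS // INFER_FPS   # run the detector on every 5th frame

last_boxes: list = []
rendered = []
for frame_id in range(STREAM_FPS):     # one simulated second of video
    if frame_id % STRIDE == 0:         # inference only on sampled frames
        last_boxes = detect(frame_id)
    rendered.append(draw_boxes(frame_id, last_boxes))

inference_calls = sum(1 for f in range(STREAM_FPS) if f % STRIDE == 0)
```

Because decoding, inference, and rendering are decoupled, each stage can be placed on its own core or thread, which is what makes this pattern efficient on CPU-only hardware.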
But there are cases when GPUs do make sense. When very high precision is required, GPUs are often needed, as the fidelity lost when quantizing a model may reduce quality beyond acceptable thresholds. In the item-tracking example, if millimeter movement accuracy is required, 5 FPS would not be sufficient for a reasonably fast-moving item, and a GPU would likely be required.
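The fidelity trade-off mentioned above can be made concrete with a toy quantization example. The weight values and the symmetric int8 scheme here are illustrative assumptions: quantization shrinks the model and speeds CPU inference, but introduces rounding error that may matter for high-precision use cases.

```python
# Symmetric int8 quantization of a handful of float weights.
weights = [0.82, -1.57, 0.03, 2.49, -0.91]

# Scale so the largest-magnitude weight maps to the int8 extreme, 127.
scale = max(abs(w) for w in weights) / 127
quantized = [round(w / scale) for w in weights]      # int8 codes
dequantized = [q * scale for q in quantized]         # reconstructed floats

# The reconstruction error is the fidelity lost to quantization; the
# worst case is bounded by half a quantization step (scale / 2).
max_error = max(abs(w - d) for w, d in zip(weights, dequantized))
```

Whether that error is acceptable depends on the use case: coarse object detection for store analytics tolerates it easily, while millimeter-accuracy tracking may not, which is exactly when full-precision inference on a GPU becomes worth the cost.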
There is a middle ground between GPUs and CPUs: the world of specialty accelerators. Accelerators come in the form of peripherals attached to a system or as special instruction sets on a CPU. CPUs are now being manufactured with advanced matrix-multiplication math that assists tensor manipulation on-chip, greatly improving the performance of ML and AI models. One concrete example is running models compiled for OpenVINO. In addition, Google Distributed Cloud (GDC) Server and Rack editions utilize Intel Core processors, an architecture designed to be more flexible, supporting matrix math that improves the performance of ML models on CPUs over traditional model serving.
Bring AI to your business
By tapping into the power of existing infrastructure and deploying AI at the edge, retailers can deliver modern customer experiences, streamline operations, and unlock employee productivity.
Learn more about how to transform your retail brand with Google Distributed Cloud.