pim, caching, api, performance, architecture

Caching product data from a PIM without serving stale content

May 8, 2026 · 9 min read

Anyone displaying product data from a PIM like Akeneo or Pimcore on a website eventually runs into the same dilemma. Query the PIM live, and your site is exactly as slow as your PIM's API at the busiest moment of the day. Cache the data, and you risk a visitor seeing a product that was discontinued yesterday. In this article we cover the patterns we use in practice to solve both problems at once.

Not all data is equal

The first step is recognizing that "product data" is not a homogeneous category. A typical product page combines data with very different shelf lives:

  • Name, description and specifications change rarely: cache long and invalidate on event.
  • Images and assets change rarely: CDN with a long TTL and versioned URLs.
  • Categorization and relations change sometimes: cache long and invalidate on event.
  • List prices change periodically: cache short or invalidate on event.
  • Stock changes continuously: cache for seconds at most, through a separate endpoint.
  • Customer-specific prices differ per request: never cache shared.

The biggest mistake we encounter is a uniform TTL across this entire spectrum. That TTL is then either too short for the static data (you query the PIM far more often than needed) or too long for stock (you sell products that aren't there).
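One way to keep these shelf lives explicit is a small policy table in code. The sketch below is only an illustration: the class names, the `policyFor` helper and every TTL value are assumptions that you would tune to your own catalogue.

```typescript
// Hypothetical data classes, mirroring the list above.
type DataClass =
  | "content"
  | "assets"
  | "taxonomy"
  | "listPrice"
  | "stock"
  | "customerPrice";

interface CachePolicy {
  ttlSeconds: number;         // upper bound; an event may invalidate earlier
  invalidateOnEvent: boolean; // true when a PIM event should drop the entry
  shared: boolean;            // false: never store in a cache shared between visitors
}

const DAY = 24 * 60 * 60;

// Illustrative numbers only, not recommendations.
const policies: Record<DataClass, CachePolicy> = {
  content:       { ttlSeconds: 7 * DAY,   invalidateOnEvent: true,  shared: true },
  assets:        { ttlSeconds: 365 * DAY, invalidateOnEvent: false, shared: true },
  taxonomy:      { ttlSeconds: 7 * DAY,   invalidateOnEvent: true,  shared: true },
  listPrice:     { ttlSeconds: 15 * 60,   invalidateOnEvent: true,  shared: true },
  stock:         { ttlSeconds: 5,         invalidateOnEvent: false, shared: true },
  customerPrice: { ttlSeconds: 0,         invalidateOnEvent: false, shared: false },
};

function policyFor(dataClass: DataClass): CachePolicy {
  return policies[dataClass];
}
```

Because the record is typed on `DataClass`, adding a new kind of data without choosing a policy for it becomes a compile error rather than a silent default.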

TTL versus event-based invalidation

A TTL is a guess: you estimate how long data stays valid. Event-based invalidation removes the guessing. Virtually every modern PIM can push changes, through webhooks or a message queue. As soon as a product changes, you invalidate only the cache entries that product touches:

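use Symfony\Contracts\Cache\TagAwareCacheInterface;

// Handles the "product updated" message pushed by the PIM (webhook or queue).
final class ProductUpdatedHandler
{
    public function __construct(
        private TagAwareCacheInterface $cache,
    ) {}

    public function __invoke(ProductUpdated $event): void
    {
        // Drop every entry tagged with this product or its category;
        // all other cache entries stay untouched.
        $this->cache->invalidateTags([
            'product-' . $event->sku,
            'category-' . $event->categoryId,
        ]);
    }
}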

Cache tags are essential here. A product doesn't just live on its own detail page, but also in category overviews, search results and "related products" blocks. By tagging every cache entry with the SKUs it contains, you can invalidate precisely instead of flushing the entire cache.
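To make the mechanics concrete, here is a minimal in-memory sketch of a tag index. It is not how any particular cache library implements tags; the `TaggedCache` class and the key names are made up for illustration.

```typescript
// Minimal tag-aware cache: a value store plus a reverse index from tag to keys.
class TaggedCache<V> {
  private entries = new Map<string, V>();
  private keysByTag = new Map<string, Set<string>>();

  set(key: string, value: V, tags: string[]): void {
    this.entries.set(key, value);
    for (const tag of tags) {
      let keys = this.keysByTag.get(tag);
      if (!keys) {
        keys = new Set<string>();
        this.keysByTag.set(tag, keys);
      }
      keys.add(key);
    }
  }

  get(key: string): V | undefined {
    return this.entries.get(key);
  }

  // Removes every entry that carries at least one of the given tags
  // and returns how many entries were dropped.
  invalidateTags(tags: string[]): number {
    let removed = 0;
    for (const tag of tags) {
      this.keysByTag.get(tag)?.forEach((key) => {
        if (this.entries.delete(key)) removed++;
      });
      this.keysByTag.delete(tag);
    }
    return removed;
  }
}

const pageCache = new TaggedCache<string>();
pageCache.set("page:product:A-100", "<detail page A-100>", ["product-A-100"]);
pageCache.set("page:product:B-200", "<detail page B-200>", ["product-B-200"]);
pageCache.set("page:category:12", "<overview>", [
  "category-12",
  "product-A-100",
  "product-B-200",
]);

// A change to A-100 drops its detail page and the overview that lists it.
const removedCount = pageCache.invalidateTags(["product-A-100"]);
```

After the call, the detail page for B-200 is still cached: only the two entries that actually contain A-100 were removed.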

The result: static product data can live in the cache for days, because as soon as something changes in the PIM, the affected entries are invalidated within seconds and rebuilt on the next request. You get the speed of aggressive caching and the freshness of live querying.

Stale-while-revalidate: never wait for the source

What happens when a cache entry has expired or been invalidated and a visitor requests the page? In a naive setup, that visitor waits until the PIM has responded. With a slow or unreachable source, your website becomes as unreliable as its weakest link.

The stale-while-revalidate pattern solves this: serve the expired version immediately and refresh in the background.

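async function getProduct(sku: string): Promise<Product> {
  const cached = await cache.get(`product-${sku}`);

  // Nothing usable in the cache: this request has to wait for the PIM.
  if (!cached) {
    return await fetchAndCache(sku);
  }

  if (cached.isStale) {
    // Refresh without blocking the response; a failed refresh is logged
    // and the stale entry keeps being served.
    void refreshInBackground(sku).catch((error) => {
      console.error(`Background refresh failed for ${sku}`, error);
    });
  }

  // Fresh or stale, the visitor gets the cached value immediately.
  return cached.value;
}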

An added benefit: if the PIM has an outage, your website keeps running on the last known data. The source can go down, the storefront stays open.
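Where a flag like `isStale` comes from depends on the cache you use. One possible approach, sketched here with hypothetical names (`Entry`, `classify`, `freshForMs`, `staleForMs`), is to store the write time with each entry and use two windows: one in which the entry counts as fresh, and a longer one in which it may still be served while a refresh runs.

```typescript
interface Entry<V> {
  value: V;
  storedAt: number; // epoch milliseconds at which the value was cached
}

type Freshness = "fresh" | "stale" | "expired";

// fresh:   serve as-is
// stale:   serve, and trigger a background refresh
// expired: too old to show, the request has to wait for the source
function classify<V>(
  entry: Entry<V>,
  nowMs: number,
  freshForMs: number,
  staleForMs: number,
): Freshness {
  const age = nowMs - entry.storedAt;
  if (age < freshForMs) return "fresh";
  if (age < freshForMs + staleForMs) return "stale";
  return "expired";
}

const HOUR = 60 * 60 * 1000;
const lamp: Entry<string> = { value: "Desk lamp", storedAt: 0 };

const afterHalfAnHour = classify(lamp, 0.5 * HOUR, HOUR, 24 * HOUR); // "fresh"
const afterFiveHours = classify(lamp, 5 * HOUR, HOUR, 24 * HOUR);    // "stale"
const afterTwoDays = classify(lamp, 48 * HOUR, HOUR, 24 * HOUR);     // "expired"
```

The length of the stale window is what determines how long the storefront can ride out a PIM outage, so it is worth choosing deliberately rather than leaving it at a default.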

Preventing cache stampedes

A classic pitfall on busy pages: a popular cache entry expires and a hundred concurrent requests all decide to query the source themselves. The PIM receives a hundred identical requests in one second and collapses anyway, precisely when traffic is at its peak.

The solution is a lock or "single flight": the first request refreshes, the rest wait for that result or receive the stale version.

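$product = $this->cache->get('product-' . $sku, function (ItemInterface $item) use ($sku) {
    // Early expiration needs an expiry to work against; one day is an illustrative value.
    $item->expiresAfter(86400);
    $item->tag(['product-' . $sku]);

    return $this->pimClient->fetchProduct($sku);
}, beta: 1.0);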

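Symfony's cache component has stampede protection built in. Within a single host, a lock ensures that only one process recomputes a given key while the others wait for its result. On top of that, probabilistic early expiration (the beta parameter) lets an occasional request treat an entry as a miss shortly before its actual expiry, so the entry is usually fresh again before the crowd would find it expired. Early expiration only applies to entries that have an expiry set.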
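Outside Symfony, the single-flight idea can be sketched in a few lines. The version below is an assumption-laden illustration: `singleFlight` and `fetchProductFromPim` are hypothetical names, and the map of pending promises only deduplicates within one process. Multiple servers would need a distributed lock on top.

```typescript
// Pending loads, keyed by cache key. An entry exists only while a load is running.
const inFlight = new Map<string, Promise<unknown>>();

function singleFlight<T>(key: string, load: () => Promise<T>): Promise<T> {
  const existing = inFlight.get(key);
  if (existing) {
    // Someone is already loading this key: share their promise.
    return existing as Promise<T>;
  }

  // First caller: start the load and clean up when it settles either way.
  const pending = load().then(
    (value) => {
      inFlight.delete(key);
      return value;
    },
    (error) => {
      inFlight.delete(key);
      throw error;
    },
  );
  inFlight.set(key, pending);
  return pending;
}

// Stand-in for the real PIM client; it only counts how often it is called.
let originCalls = 0;
async function fetchProductFromPim(sku: string): Promise<{ sku: string }> {
  originCalls++;
  return { sku };
}

// Two concurrent requests for the same product share one origin call.
const firstRequest = singleFlight("product-A-100", () => fetchProductFromPim("A-100"));
const secondRequest = singleFlight("product-A-100", () => fetchProductFromPim("A-100"));
```

Both callers receive the same promise, so a failed load rejects for everyone who was waiting on it. Combined with stale-while-revalidate, those callers can fall back to the stale entry instead.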

Stock and price: caching at the right layer

Stock and customer-specific prices don't belong in the same cache as product content. Our approach: render the page entirely from the product cache and fetch volatile data separately through a lightweight endpoint, which itself microcaches for a few seconds at most. The page is then served instantly from cache, while stock indication and price arrive asynchronously from an endpoint built for exactly that.
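A microcache for such an endpoint can be very small. The sketch below assumes a hypothetical `MicroCache` class with an injectable clock (so it can be exercised without waiting) and a synchronous loader for brevity. In a real endpoint the loader would typically return a promise, which can be stored as the cached value in the same way.

```typescript
class MicroCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached value while it is younger than the TTL, otherwise reloads it.
  getOrLoad(key: string, load: () => V): V {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > this.now()) {
      return hit.value;
    }
    const value = load();
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }
}

// Demo with a fake clock and a counter standing in for the stock system.
let clockMs = 0;
let stockSystemCalls = 0;
const stockCache = new MicroCache<number>(3000, () => clockMs);

function readStock(sku: string): number {
  return stockCache.getOrLoad(`stock-${sku}`, () => {
    stockSystemCalls++;
    return 7; // pretend the stock system reports 7 units
  });
}

readStock("A-100"); // loads from the stock system
readStock("A-100"); // served from the microcache
clockMs = 3001;
readStock("A-100"); // TTL has passed: loads again
```

Even a three-second window caps the load on the stock system at roughly one request per product per three seconds, regardless of how many visitors are looking at that product.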

This also prevents the most persistent bug in these architectures: a "cached" customer-specific price shown to the wrong customer. Whatever differs per customer must simply never end up in a shared cache.
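At the HTTP layer, one way to enforce this is to derive the `Cache-Control` header from who the response is meant for. The helper below is a hypothetical sketch: the `Audience` type and the function name are assumptions, while the directives themselves are standard HTTP.

```typescript
type Audience = "everyone" | "single-customer";

// Builds a Cache-Control value based on who may see the response.
function cacheControlFor(audience: Audience, maxAgeSeconds: number): string {
  if (audience === "single-customer") {
    // Keeps the response out of CDNs, proxies and any other cache.
    return "private, no-store";
  }
  return `public, max-age=${maxAgeSeconds}`;
}

const stockHeader = cacheControlFor("everyone", 5);
const customerPriceHeader = cacheControlFor("single-customer", 5);
```

Making the audience a required argument means nobody can add a new endpoint without deciding which of the two cases it falls into.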

Make it measurable

A caching strategy without measurements is an opinion. The three numbers we put on a dashboard by default:

  • Hit ratio per cache layer: below 90% for product content usually means the invalidation is too coarse.
  • Staleness: the time between a change in the PIM and its visibility on the site. With event-based invalidation this should stay under a minute.
  • Origin load: the number of requests that actually reach the PIM. This number should stay flat when traffic doubles.
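The first two numbers are simple to derive once the raw counters and timestamps are collected. A minimal sketch, with hypothetical names (`LayerCounters`, `hitRatio`, `stalenessSeconds`) and made-up sample values:

```typescript
interface LayerCounters {
  hits: number;
  misses: number; // each miss is also a request that travels on towards the origin
}

// Share of lookups answered by this cache layer, between 0 and 1.
function hitRatio(counters: LayerCounters): number {
  const total = counters.hits + counters.misses;
  return total === 0 ? 0 : counters.hits / total;
}

// Seconds between a change in the PIM and the moment the site first showed it.
function stalenessSeconds(changedInPimAt: Date, visibleOnSiteAt: Date): number {
  return (visibleOnSiteAt.getTime() - changedInPimAt.getTime()) / 1000;
}

const productContentRatio = hitRatio({ hits: 940, misses: 60 }); // 0.94
const observedStaleness = stalenessSeconds(
  new Date("2026-05-08T10:00:00Z"),
  new Date("2026-05-08T10:00:42Z"),
); // 42
```

Measuring staleness this way requires recording the PIM's change timestamp with each event and comparing it with the time the refreshed entry was first served.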

Conclusion

Caching PIM data well is not a matter of picking a TTL, but of classifying data by shelf life, invalidating on events instead of the clock, daring to serve stale data while refreshing in the background, and moving volatile data to its own layer. The result is a website that stays fast under peak load, stays current when things change and stays up when the source falters.
