Why a relational engine and not MongoDB / DynamoDB for this case?

EPC catalogs have strong relations: VIN ↔ vehicle, vehicle ↔ parts, part ↔ supersession, part ↔ supplier. These relations cross in almost every query. A relational engine with partitioning and composite indexes gives better P95 latency than MongoDB for joins, and is 3-5x cheaper to operate than DynamoDB at this volume. MongoDB makes sense if your access is 100% lookup by unique key, which isn't the case.

Is serverless (Lambda, Cloud Run) viable for this?

For the API layer (gateway + logic), yes — Cloud Run with min instances and Lambda with provisioned concurrency work well. For the relational database and Redis we DO NOT recommend serverless: short-lived connections destroy latency (cold start adds 80-200 ms) and per-query costs become unpredictable. Traditional hosting (RDS, Aurora, or managed SQL) is 4-7x cheaper at this volume.

What does this architecture cost to run on AWS or GCP?

To sustain 10k req/min with P95 <400 ms: instancia gestionada SQL r6g.2xlarge ($550/mo) + ElastiCache Redis cache.r6g.large ($240/mo) + 3 API instances on ECS Fargate ($360/mo) + ALB + CloudWatch + egress ≈ $1,500-$2,000 USD/month. Self-hosted on bare metal or Hetzner: $300-$600 USD/month for the same throughput.

How do you handle OEMs changing their EPC API without notice?

Nightly integration test suite that validates response structure of each EPC against a versioned schema. When an OEM changes something (field renamed, type changed), the test fails and a Slack alert fires before production breaks. We also maintain a per-OEM adapter layer to isolate changes — updating the adapter doesn't require touching the rest of the system.

How hard is it to migrate from an in-house solution to this API?

Pretty reasonable. Standard integration is 1-2 weeks: 3-4 days to authenticate against sandbox and map the response to your internal schema, 1 week to replace the in-house code behind a feature flag with rollback. We maintain compatibility with standard ACES/PIES schemas, so if your ERP already consumes ACES catalog, the swap is almost transparent.

Do you support self-hosted deployment on my own Kubernetes?

Yes — for customers with data sovereignty requirements (Italy, Germany, Brazil, defense) we offer self-hosted deployment on your cluster with an official Helm chart. The EPC layer still runs on our infra (we hold the legal agreements with OEMs), but all cache, metrics and end-customer data live in your network.

Ultra-low-latency architecture for EPC catalogs with millions of rows

The problem: latency on a 60M+ row catalog

A realistic EPC catalog has between 40 and 80 million rows: every combination of make × model × year × engine × plant × category × part_number is a row. Multiplied by 8 OEMs and historical SSPL supersessions, you easily hit 60-80M rows in the main table before talking joins.

A naïve query like `SELECT * FROM parts WHERE vin_prefix = ? AND category = ?` over that table, without proper indexes and partitioning, takes between 1.2 and 3.5 seconds on reasonable hardware (8 vCPU, 32 GB RAM, NVMe SSD). That's 6-15x slower than the <250 ms total target for an HTTP request that includes decode + match + SSPL + JSON serialization.

The realistic production target we chase is P50 <150 ms and P95 <400 ms including round-trip to the official OEM EPC when needed. How you get there is the difference between an API that serves 100 B2B customers and one that collapses on the first end-of-month spike.

Data schema: partition by OEM or die trying

The first classic mistake is having a single `parts` table with an `oem` discriminator column. This doesn't scale because (1) the relational engine's optimizer doesn't use implicit partitioning and (2) any query hits the full index even if it only cares about one OEM.

The fix is particionamiento declarativo nativo del motor by LIST on the `oem` column. One partition per brand (toyota, nissan, honda, suzuki, yamaha, bajaj, jlr, lexus) plus a default partition for new OEMs. Each partition lands at ~5-10M rows, fully manageable size.

CREATE TABLE parts (...) PARTITION BY LIST (oem);
CREATE TABLE parts_toyota PARTITION OF parts FOR VALUES IN ('toyota', 'lexus');
Each partition has its own indexes, its own VACUUM and can be moved to separate tablespaces if it grows differently.
The optimizer does automatic partition pruning: a query with WHERE oem = 'toyota' only hits parts_toyota.
This shrinks effective index size ~8x and drops base SELECT-by-VIN latency from ~800 ms to ~120 ms.

Minimum partitioned relational database setup for EPC catalog

Steps to bootstrap the partitioned schema on a clean un motor relacional moderno instance.

1
Create partitioned root table
CREATE TABLE parts (id bigserial, oem text NOT NULL, wmi char(3), vds char(6), model_year smallint, category text, oem_part_number text, supersedes text[], data jsonb) PARTITION BY LIST (oem);
2
Create per-OEM partitions
CREATE TABLE parts_toyota PARTITION OF parts FOR VALUES IN ('toyota','lexus'); CREATE TABLE parts_nissan PARTITION OF parts FOR VALUES IN ('nissan'); CREATE TABLE parts_default PARTITION OF parts DEFAULT;
3
Composite indexes per partition
CREATE INDEX ON parts_toyota (wmi, model_year, category) INCLUDE (oem_part_number, supersedes); Repeat for each partition.
4
Configure a connection pooler in transaction mode
pool_mode = transaction; max_client_conn = 1000; default_pool_size = 25 (per DB). This sustains 10k+ req/min on 25 real conexiones a la base.
5
Enable pg_stat_statements + auto_explain
shared_preload_libraries = 'pg_stat_statements,auto_explain'; auto_explain.log_min_duration = '200ms'. Slow queries get logged with plan for analysis.

VIN-aware composite indexes (not the obvious ones)

After partitioning, the second bottleneck is indexes. The natural reflex is to index by full `vin`, but that's counterproductive: a VIN is unique per vehicle, not per part number, so a B-tree on `vin` doesn't help the real join you need.

The indexes that work in production are composite on VIN prefixes. Three indexes cover 95% of queries: (wmi, model_year, category), (wmi, vds_prefix6, category), (oem, model_family, model_year_range, category). The first captures matches by manufacturer + year + category; the second refines with the 6 VDS chars for specific variants; the third supports queries from users who know the model but don't have a VIN.

CREATE INDEX idx_parts_wmi_year_cat ON parts_toyota (wmi, model_year, category) INCLUDE (oem_part_number, supersedes);
CREATE INDEX idx_parts_wmi_vds6_cat ON parts_toyota (wmi, substr(vds, 1, 6), category);
CREATE INDEX idx_parts_family_year ON parts_toyota (model_family, model_year_range, category);
Use INCLUDE for covering indexes: SELECT answers without touching the table, ~40% latency reduction.
Avoid indexes on full vin: 3-5x more space without speeding real queries.

Fan-out to multiple OEM EPCs without killing P95

When a VIN isn't cached or the category requires live data (prices, OEM stock), you have to hit the manufacturer's official EPC. Each OEM has its own latency: Toyota TIS responds in 400-700 ms, Nissan FAST in 600-1100 ms, Suzuki in 800-1500 ms. Doing these calls serially destroys P95.

The right pattern is controlled fan-out with `asyncio.gather` (Python) or `Promise.all` (Node), with a strict shared timeout — typically 800 ms — and graceful degradation: if the OEM doesn't respond in time, we return what's in cache or a partial response with a `partial: true` flag and an async follow-up.

Hard 800 ms timeout per OEM, not cumulative: client waits 800 ms max even when querying 3 OEMs.
Persistent HTTP connection pool per OEM (keep-alive) — cuts TLS overhead ~60 ms per request.
Circuit breaker per OEM: if Nissan FAST throws >30% errors in 60s, stop querying it and respond from cache for 5 min.
Background refresh: queries that returned stale cache (>1h) trigger an async worker to refresh without blocking response.
Key metric: timeout rate per OEM. If it tops 5%, escalate to OEM support before SLA suffers.

L2 cache: the difference between 200 ms and 50 ms

For an API receiving 10k req/min, cache hit rate >80% is the difference between needing 4 réplicas de lectura and needing 1. L2 cache (Redis or equivalent) is mandatory, not optional.

The cache key has to be semantically correct: many teams cache by full `vin` and are surprised when hit rate is 12%. The key that works is composite: `parts:{vin}:{category}:{sspl_v}`. The `sspl_v` suffix is the current SSPL table version — when the OEM publishes a new supersession, it increments and invalidates all relevant entries without a FLUSHALL.

Layered TTL: hot cache 5 min (identical queries in short window), warm cache 7 days (VIN decodings), cold cache 30 days (historical catalog).
LZ4 compression inside Redis: 8-15 KB JSON payloads drop to 1.5-3 KB, ~5x capacity multiplier without extra RAM.
Read-through with singleflight: if 200 requests ask for the same uncached VIN simultaneously, only 1 hits the backend and the other 199 wait for the same result.
Mandatory metric: hit rate per category. Categories with <60% hit rate usually have cache key bugs.
Avoid in-process memory cache (local LRU): doesn't scale horizontally and creates inconsistency across replicas.

Latency by cache strategy (measured over 10k req)

Strategy	P50	P95	P99	Hit rate
No cache	320 ms	780 ms	1.4 s	0%
In-process LRU	180 ms	650 ms	1.2 s	35%
Redis without singleflight	95 ms	420 ms	850 ms	68%
Redis + singleflight + TTL jitter	48 ms	180 ms	380 ms	83%
Redis + L3 (edge CDN)	22 ms	95 ms	240 ms	91%

SSPL pipeline in background, not in request path

Applying SSPL supersessions in the request path is the most common mistake and the one that kills latency the most. Each OEM publishes SSPL deltas at different frequencies (Toyota monthly, Nissan quarterly, Honda near-live). Resolving a supersession chain can involve 2-4 recursive lookups.

The fix is a background worker (Celery, Sidekiq, Temporal, BullMQ) that consumes OEM SSPL feeds, computes the resolved chain (`old_part → ... → current_part`) and materializes the result in a `sspl_resolved` table with the current version. API queries read that materialized table in a single O(1) lookup, no live chain walking.

Observability: measure percentiles, not averages

Average latency is the most misleading metric. An API with 95% requests at 80 ms and 5% at 4 seconds shows a 280 ms average — "everything's fine" — but those 5% are making your B2B customers trigger the SLA and demand compensation.

Metrics that matter: P50, P95 and P99 per endpoint, per OEM and per category. And the most underrated metric: P95 queue time (how many ms a request waits before processing starts). If queue P95 tops 50 ms, your worker pool is undersized.

Mandatory per-endpoint metrics: P50/P95/P99 latency, RPS, error rate, cache hit rate.
Per-OEM metrics: average EPC latency, timeout rate, circuit breaker state, payload size.
Distributed tracing (OpenTelemetry, Jaeger): lets you see where time goes on a specific slow request.
Alert on P95, not average: your B2B customer feels P95, not average.
Separate dashboards per top customer: SLAs are contractual and each VIP customer deserves its view.

Anti-patterns that destroy latency in production

Mistakes we see repeated in architecture reviews — all fixable, all costly once in production:

ORM without prefetch: each part_number triggers an extra SELECT for relations. 50 parts = 51 queries. Fix: select_related / explicit JOIN.
Conexiones sin pool: opening TLS + auth on every query adds 30-80 ms. Mandatory fix: a connection pooler in transaction mode.
Synchronous logging inside the request handler: each `logger.info(...)` can add 5-15 ms if destination is slow. Fix: async log queue.
JSON serialization with the default lib: orjson or ujson are 3-5x faster on large payloads.
Schema validation inside the endpoint: do it at the gateway/edge (Cloudflare, AWS API GW) — frees the backend from queries with clearly invalid VINs.
Cache without staggered TTL: if everything expires at the same time, the next second every worker hits the DB. Fix: TTL with ±10% jitter.

Ultra-low-latency architecture for EPC catalogs with millions of rows

Technical TL;DR

On this page

The problem: latency on a 60M+ row catalog

Data schema: partition by OEM or die trying

Minimum partitioned relational database setup for EPC catalog

Create partitioned root table

Create per-OEM partitions

Composite indexes per partition

Configure a connection pooler in transaction mode

Enable pg_stat_statements + auto_explain

VIN-aware composite indexes (not the obvious ones)

Fan-out to multiple OEM EPCs without killing P95

L2 cache: the difference between 200 ms and 50 ms

Latency by cache strategy (measured over 10k req)

SSPL pipeline in background, not in request path

Observability: measure percentiles, not averages

Anti-patterns that destroy latency in production

Want to see this architecture applied to your real volume?

Frequently Asked Questions

Related reading

VIN → OEM Parts API

VIN Decoder API

Buy vs Build: parts catalog

Decode VIN to parts catalog