Technical deep-dive · EPC architecture

    Ultra-low-latency architecture for EPC catalogs with millions of rows

    Technical guide for CTOs, architects and tech leads who need to answer a VIN→OEM query in <200 ms against a 60M+ row catalog, with fan-out to 8 different OEMs and live SSPL. Schemas, indexes, caching and patterns that work in production.

    14-minute read · June 2026 · By the Efficiency IT Services technical team

    Technical TL;DR

    • Main catalog: el motor relacional partitioned by OEM with composite indexes (wmi, vds_prefix, model_year_range).
    • Latency target: P50 <150 ms, P95 <400 ms for decode + parts in a single request.
    • Fan-out to OEM EPCs with strict 800 ms timeout and graceful degradation to cache.
    • Redis L2 cache with composite key (vin, category, sspl_applied) and 7-day TTL — typical hit rate 78-85%.
    • Python worker pool with asyncio + connection pool to the database (pooler en transaction mode) to sustain 10k req/min without saturating.

    The problem: latency on a 60M+ row catalog

    A realistic EPC catalog has between 40 and 80 million rows: every combination of make × model × year × engine × plant × category × part_number is a row. Multiplied by 8 OEMs and historical SSPL supersessions, you easily hit 60-80M rows in the main table before talking joins.

    A naïve query like `SELECT * FROM parts WHERE vin_prefix = ? AND category = ?` over that table, without proper indexes and partitioning, takes between 1.2 and 3.5 seconds on reasonable hardware (8 vCPU, 32 GB RAM, NVMe SSD). That's 6-15x slower than the <250 ms total target for an HTTP request that includes decode + match + SSPL + JSON serialization.

    The realistic production target we chase is P50 <150 ms and P95 <400 ms including round-trip to the official OEM EPC when needed. How you get there is the difference between an API that serves 100 B2B customers and one that collapses on the first end-of-month spike.

    Data schema: partition by OEM or die trying

    The first classic mistake is having a single `parts` table with an `oem` discriminator column. This doesn't scale because (1) the relational engine's optimizer doesn't use implicit partitioning and (2) any query hits the full index even if it only cares about one OEM.

    The fix is particionamiento declarativo nativo del motor by LIST on the `oem` column. One partition per brand (toyota, nissan, honda, suzuki, yamaha, bajaj, jlr, lexus) plus a default partition for new OEMs. Each partition lands at ~5-10M rows, fully manageable size.

    • CREATE TABLE parts (...) PARTITION BY LIST (oem);
    • CREATE TABLE parts_toyota PARTITION OF parts FOR VALUES IN ('toyota', 'lexus');
    • Each partition has its own indexes, its own VACUUM and can be moved to separate tablespaces if it grows differently.
    • The optimizer does automatic partition pruning: a query with WHERE oem = 'toyota' only hits parts_toyota.
    • This shrinks effective index size ~8x and drops base SELECT-by-VIN latency from ~800 ms to ~120 ms.

    Minimum partitioned relational database setup for EPC catalog

    Steps to bootstrap the partitioned schema on a clean un motor relacional moderno instance.

    1. 1

      Create partitioned root table

      CREATE TABLE parts (id bigserial, oem text NOT NULL, wmi char(3), vds char(6), model_year smallint, category text, oem_part_number text, supersedes text[], data jsonb) PARTITION BY LIST (oem);

    2. 2

      Create per-OEM partitions

      CREATE TABLE parts_toyota PARTITION OF parts FOR VALUES IN ('toyota','lexus'); CREATE TABLE parts_nissan PARTITION OF parts FOR VALUES IN ('nissan'); CREATE TABLE parts_default PARTITION OF parts DEFAULT;

    3. 3

      Composite indexes per partition

      CREATE INDEX ON parts_toyota (wmi, model_year, category) INCLUDE (oem_part_number, supersedes); Repeat for each partition.

    4. 4

      Configure a connection pooler in transaction mode

      pool_mode = transaction; max_client_conn = 1000; default_pool_size = 25 (per DB). This sustains 10k+ req/min on 25 real conexiones a la base.

    5. 5

      Enable pg_stat_statements + auto_explain

      shared_preload_libraries = 'pg_stat_statements,auto_explain'; auto_explain.log_min_duration = '200ms'. Slow queries get logged with plan for analysis.

    VIN-aware composite indexes (not the obvious ones)

    After partitioning, the second bottleneck is indexes. The natural reflex is to index by full `vin`, but that's counterproductive: a VIN is unique per vehicle, not per part number, so a B-tree on `vin` doesn't help the real join you need.

    The indexes that work in production are composite on VIN prefixes. Three indexes cover 95% of queries: (wmi, model_year, category), (wmi, vds_prefix6, category), (oem, model_family, model_year_range, category). The first captures matches by manufacturer + year + category; the second refines with the 6 VDS chars for specific variants; the third supports queries from users who know the model but don't have a VIN.

    • CREATE INDEX idx_parts_wmi_year_cat ON parts_toyota (wmi, model_year, category) INCLUDE (oem_part_number, supersedes);
    • CREATE INDEX idx_parts_wmi_vds6_cat ON parts_toyota (wmi, substr(vds, 1, 6), category);
    • CREATE INDEX idx_parts_family_year ON parts_toyota (model_family, model_year_range, category);
    • Use INCLUDE for covering indexes: SELECT answers without touching the table, ~40% latency reduction.
    • Avoid indexes on full vin: 3-5x more space without speeding real queries.

    Fan-out to multiple OEM EPCs without killing P95

    When a VIN isn't cached or the category requires live data (prices, OEM stock), you have to hit the manufacturer's official EPC. Each OEM has its own latency: Toyota TIS responds in 400-700 ms, Nissan FAST in 600-1100 ms, Suzuki in 800-1500 ms. Doing these calls serially destroys P95.

    The right pattern is controlled fan-out with `asyncio.gather` (Python) or `Promise.all` (Node), with a strict shared timeout — typically 800 ms — and graceful degradation: if the OEM doesn't respond in time, we return what's in cache or a partial response with a `partial: true` flag and an async follow-up.

    • Hard 800 ms timeout per OEM, not cumulative: client waits 800 ms max even when querying 3 OEMs.
    • Persistent HTTP connection pool per OEM (keep-alive) — cuts TLS overhead ~60 ms per request.
    • Circuit breaker per OEM: if Nissan FAST throws >30% errors in 60s, stop querying it and respond from cache for 5 min.
    • Background refresh: queries that returned stale cache (>1h) trigger an async worker to refresh without blocking response.
    • Key metric: timeout rate per OEM. If it tops 5%, escalate to OEM support before SLA suffers.

    L2 cache: the difference between 200 ms and 50 ms

    For an API receiving 10k req/min, cache hit rate >80% is the difference between needing 4 réplicas de lectura and needing 1. L2 cache (Redis or equivalent) is mandatory, not optional.

    The cache key has to be semantically correct: many teams cache by full `vin` and are surprised when hit rate is 12%. The key that works is composite: `parts:{vin}:{category}:{sspl_v}`. The `sspl_v` suffix is the current SSPL table version — when the OEM publishes a new supersession, it increments and invalidates all relevant entries without a FLUSHALL.

    • Layered TTL: hot cache 5 min (identical queries in short window), warm cache 7 days (VIN decodings), cold cache 30 days (historical catalog).
    • LZ4 compression inside Redis: 8-15 KB JSON payloads drop to 1.5-3 KB, ~5x capacity multiplier without extra RAM.
    • Read-through with singleflight: if 200 requests ask for the same uncached VIN simultaneously, only 1 hits the backend and the other 199 wait for the same result.
    • Mandatory metric: hit rate per category. Categories with <60% hit rate usually have cache key bugs.
    • Avoid in-process memory cache (local LRU): doesn't scale horizontally and creates inconsistency across replicas.

    Latency by cache strategy (measured over 10k req)

    StrategyP50P95P99Hit rate
    No cache320 ms780 ms1.4 s0%
    In-process LRU180 ms650 ms1.2 s35%
    Redis without singleflight95 ms420 ms850 ms68%
    Redis + singleflight + TTL jitter48 ms180 ms380 ms83%
    Redis + L3 (edge CDN)22 ms95 ms240 ms91%

    SSPL pipeline in background, not in request path

    Applying SSPL supersessions in the request path is the most common mistake and the one that kills latency the most. Each OEM publishes SSPL deltas at different frequencies (Toyota monthly, Nissan quarterly, Honda near-live). Resolving a supersession chain can involve 2-4 recursive lookups.

    The fix is a background worker (Celery, Sidekiq, Temporal, BullMQ) that consumes OEM SSPL feeds, computes the resolved chain (`old_part → ... → current_part`) and materializes the result in a `sspl_resolved` table with the current version. API queries read that materialized table in a single O(1) lookup, no live chain walking.

    Observability: measure percentiles, not averages

    Average latency is the most misleading metric. An API with 95% requests at 80 ms and 5% at 4 seconds shows a 280 ms average — "everything's fine" — but those 5% are making your B2B customers trigger the SLA and demand compensation.

    Metrics that matter: P50, P95 and P99 per endpoint, per OEM and per category. And the most underrated metric: P95 queue time (how many ms a request waits before processing starts). If queue P95 tops 50 ms, your worker pool is undersized.

    • Mandatory per-endpoint metrics: P50/P95/P99 latency, RPS, error rate, cache hit rate.
    • Per-OEM metrics: average EPC latency, timeout rate, circuit breaker state, payload size.
    • Distributed tracing (OpenTelemetry, Jaeger): lets you see where time goes on a specific slow request.
    • Alert on P95, not average: your B2B customer feels P95, not average.
    • Separate dashboards per top customer: SLAs are contractual and each VIP customer deserves its view.

    Anti-patterns that destroy latency in production

    Mistakes we see repeated in architecture reviews — all fixable, all costly once in production:

    • ORM without prefetch: each part_number triggers an extra SELECT for relations. 50 parts = 51 queries. Fix: select_related / explicit JOIN.
    • Conexiones sin pool: opening TLS + auth on every query adds 30-80 ms. Mandatory fix: a connection pooler in transaction mode.
    • Synchronous logging inside the request handler: each `logger.info(...)` can add 5-15 ms if destination is slow. Fix: async log queue.
    • JSON serialization with the default lib: orjson or ujson are 3-5x faster on large payloads.
    • Schema validation inside the endpoint: do it at the gateway/edge (Cloudflare, AWS API GW) — frees the backend from queries with clearly invalid VINs.
    • Cache without staggered TTL: if everything expires at the same time, the next second every worker hits the DB. Fix: TTL with ±10% jitter.

    Want to see this architecture applied to your real volume?

    Book a 60-min technical review with our team. We go through your case, estimated volume and infra constraints. No hard sell.

    Frequently Asked Questions

    Everything you need to know before getting started.