Integration Platform: Low-Level Design
An integration platform connects disparate systems by providing pluggable connectors, a declarative data mapping engine, a transformation pipeline, and robust error handling including dead-letter queuing. It enables non-engineers to wire together SaaS products and internal services without writing custom glue code for each pair of systems.
Requirements
Functional
- Connect source and destination systems via pluggable connectors (REST, GraphQL, database, file, message queue)
- Define data mappings declaratively: field rename, type cast, computed fields, conditional logic
- Execute transformation pipelines with ordered steps: filter, enrich, split, aggregate
- Retry failed records with exponential backoff; route permanently failed records to a dead-letter store
- Provide a UI and API for managing integrations, viewing run history, and replaying dead-letter records
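The retry requirement above can be sketched as follows. This is a minimal illustration, not the platform's actual implementation: `write` and `dead_letter` are hypothetical callables standing in for the destination connector and the dead-letter store, and the backoff uses full jitter (a common variant; the spec only says "exponential backoff").

```python
import random

def next_backoff_ms(attempt, base_ms=500, cap_ms=60_000):
    """Exponential backoff with full jitter: delay grows as 2^attempt, capped."""
    return random.uniform(0, min(cap_ms, base_ms * (2 ** attempt)))

def process_with_retry(record, write, dead_letter, max_attempts=5):
    """Retry a write up to max_attempts, then route the record to the dead-letter store."""
    for attempt in range(max_attempts):
        try:
            return write(record)
        except Exception as err:
            last_err = err
            _delay = next_backoff_ms(attempt)  # a real worker would sleep/await here
    dead_letter(record, str(last_err), max_attempts)
```

The jittered delay spreads retries out in time so that a burst of failures against a struggling destination does not retry in lockstep.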
Non-Functional
- Throughput: 50,000 records per second across all active integrations
- Pipeline execution latency under 500 ms p95 for single-record synchronous mode
- At-least-once delivery semantics per integration
Data Model
- connectors: connector_id (UUID), type (ENUM: rest, graphql, db, s3, kafka), config (JSONB — encrypted credentials), owner_id, created_at
- integrations: integration_id (UUID), name, source_connector_id, destination_connector_id, trigger (ENUM: schedule, webhook, event), mapping_spec (JSONB), pipeline_spec (JSONB), enabled (BOOL)
- pipeline_runs: run_id (UUID), integration_id, started_at, finished_at, status (ENUM: running, success, partial, failed), records_in (INT), records_out (INT), records_failed (INT)
- dead_letter_records: dlr_id (UUID), run_id, integration_id, raw_payload (JSONB), error_message (TEXT), attempt_count (INT), last_attempted_at, resolved (BOOL)
Core Algorithms
Connector Framework
Each connector implements a standard interface with three methods: read(cursor, batch_size) returns a record batch and a next cursor; write(records) returns a write result with success and failure counts; test_connection() validates credentials. Connectors are packaged as isolated Docker images and loaded dynamically by the pipeline executor. Configuration schemas are defined as JSON Schema, validated at integration creation time.
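The three-method interface described above might look like the following sketch. The type names (`RecordBatch`, `WriteResult`, `InMemoryConnector`) are illustrative, not part of the platform's spec; the in-memory connector exists only to show the cursor-based pagination contract.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class RecordBatch:
    records: list
    next_cursor: Optional[str]  # None signals the source is exhausted

@dataclass
class WriteResult:
    succeeded: int
    failed: int
    errors: list = field(default_factory=list)

class Connector(ABC):
    """Standard interface every connector (REST, GraphQL, db, s3, kafka) implements."""

    @abstractmethod
    def read(self, cursor: Optional[str], batch_size: int) -> RecordBatch: ...

    @abstractmethod
    def write(self, records: list) -> WriteResult: ...

    @abstractmethod
    def test_connection(self) -> bool: ...

class InMemoryConnector(Connector):
    """Toy connector backed by a list; useful for testing pipeline logic."""

    def __init__(self, source=None):
        self.source = source or []
        self.sink = []

    def read(self, cursor, batch_size):
        start = int(cursor or 0)
        batch = self.source[start:start + batch_size]
        end = start + len(batch)
        return RecordBatch(batch, str(end) if end < len(self.source) else None)

    def write(self, records):
        self.sink.extend(records)
        return WriteResult(succeeded=len(records), failed=0)

    def test_connection(self):
        return True
```

The opaque cursor lets each connector choose its own pagination scheme (offset, timestamp, change token) while the executor treats it as a resumable checkpoint.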
Declarative Data Mapping
The mapping spec is a JSON array of field-level rules. Each rule specifies source path (JSONPath), destination path, and an optional transform expression (powered by a sandboxed JavaScript engine, e.g. GraalVM). The mapping engine processes each record in a single pass, applying rules in order. Computed fields (e.g., concatenate first and last name) use template expressions. Conditional mappings use a when predicate evaluated against the source record.
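A simplified version of this engine is sketched below. It substitutes dotted paths for full JSONPath and Python callables for the sandboxed JavaScript transform expressions, so the rule shape is illustrative rather than the platform's actual spec format.

```python
def get_path(obj, path):
    """Resolve a dotted path like 'user.name' against a nested dict."""
    for part in path.split("."):
        if not isinstance(obj, dict) or part not in obj:
            return None
        obj = obj[part]
    return obj

def set_path(obj, path, value):
    """Write a value at a dotted path, creating intermediate dicts as needed."""
    parts = path.split(".")
    for part in parts[:-1]:
        obj = obj.setdefault(part, {})
    obj[parts[-1]] = value

def apply_mapping(record, rules):
    """Single pass over the rules, in order, building the destination record."""
    out = {}
    for rule in rules:
        if "when" in rule and not rule["when"](record):
            continue  # conditional mapping: predicate evaluated against the source
        value = get_path(record, rule["src"])
        if "transform" in rule:
            value = rule["transform"](value, record)  # computed/cast fields
        set_path(out, rule["dst"], value)
    return out
```

A computed full-name field is then just a transform that reads two source fields, and a type cast is a transform like `lambda v, r: int(v)`.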
Pipeline Execution
The pipeline spec is a directed acyclic graph (DAG) of steps. The executor topologically sorts steps and executes independent branches in parallel using a worker thread pool. Step types include: Filter (drop records matching a predicate), Enrich (call an external lookup service and merge results), Split (fan one record into N based on an array field), Aggregate (buffer records and emit a summary), and Map (apply the field mapping spec). Each step is wrapped in a try-catch; step-level failures emit the record to the dead-letter store rather than halting the pipeline.
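The executor's core loop can be sketched as below. For brevity this runs the topologically sorted steps as a single linear chain over one record stream; a production executor would route each step's output only to its dependents and run independent branches on the worker pool. The step-spec shape is an assumption for illustration.

```python
from collections import deque

def topo_order(steps):
    """Kahn's algorithm over {step_id: {"deps": [...], "fn": callable}}."""
    indeg = {s: len(spec["deps"]) for s, spec in steps.items()}
    children = {s: [] for s in steps}
    for s, spec in steps.items():
        for d in spec["deps"]:
            children[d].append(s)
    ready = deque(s for s, n in indeg.items() if n == 0)
    order = []
    while ready:
        s = ready.popleft()
        order.append(s)
        for c in children[s]:
            indeg[c] -= 1
            if indeg[c] == 0:
                ready.append(c)
    if len(order) != len(steps):
        raise ValueError("pipeline spec contains a cycle")
    return order

def run_pipeline(records, steps, dead_letter):
    """Run each step over the batch; per-record failures go to the DLQ, not the run."""
    for step_id in topo_order(steps):
        fn, out = steps[step_id]["fn"], []
        for rec in records:
            try:
                # a step emits 0 (filter), 1 (map/enrich), or N (split) records
                out.extend(fn(rec))
            except Exception as err:
                dead_letter(rec, f"{step_id}: {err}")
        records = out
    return records
```

Having every step return a list gives Filter, Map, and Split one uniform signature, and the per-record try/except is what keeps a single poison record from failing the whole run.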
Scalability and Architecture
Integrations triggered by schedule are managed by a distributed scheduler (backed by Postgres advisory locks to prevent double-execution). Webhook-triggered integrations receive events via a gateway that publishes to a Kafka topic per integration. Pipeline executors are stateless workers that consume from Kafka, execute the DAG, and write outputs via the destination connector.
- Horizontal scaling: add executor workers to increase throughput; partition Kafka topic by integration_id for ordered processing
- Connector credential secrets are stored encrypted (AES-256-GCM) with key management via AWS KMS or HashiCorp Vault
- Dead-letter records are stored in Postgres with full payload; a replay API re-enqueues them to the head of the pipeline
- Observability: per-integration metrics (records/sec, error rate, latency histogram) exported to Prometheus; run history queryable via API for 90 days
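The partition-by-integration_id point above relies on a stable hash so that every producer assigns a given integration's records to the same partition. A minimal sketch (using CRC32 rather than any particular Kafka client's partitioner, which is an illustrative choice):

```python
import zlib

def partition_for(integration_id: str, num_partitions: int) -> int:
    """Deterministic partition assignment: all records of one integration land
    on the same Kafka partition, preserving per-integration ordering."""
    # crc32 is stable across processes and restarts, unlike Python's salted hash()
    return zlib.crc32(integration_id.encode("utf-8")) % num_partitions

# hypothetical usage when publishing a webhook event:
# producer.send(topic, key=integration_id, value=payload,
#               partition=partition_for(integration_id, 12))
```

In practice, setting the Kafka message key to the integration_id and letting the producer's default partitioner hash it achieves the same ordering guarantee.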
API Design
Integration Lifecycle
POST /v1/integrations — create integration with source, destination, mapping, and pipeline spec
PUT /v1/integrations/{integration_id}/enable (or /disable) — toggle integration
GET /v1/integrations/{integration_id}/runs?status=STRING&limit=INT — paginated run history

Dead-Letter Management
GET /v1/integrations/{integration_id}/dead-letter?resolved=false — list unresolved DLQ records
POST /v1/dead-letter/{dlr_id}/replay — re-enqueue a single record for reprocessing
POST /v1/integrations/{integration_id}/dead-letter/replay-all — bulk replay, returns a batch job ID
Connector Testing
POST /v1/connectors/{connector_id}/test — validates credentials and connectivity, returns status and a sample record from the source.
Interview Tips
Discuss the schema evolution problem: when a source system changes its payload structure, existing mapping specs may break silently. Mitigate with schema versioning on the source connector and a compatibility check step that validates sample records against the expected schema before enabling an integration. Also address the split-step fan-out risk: a single input record expanding to 10,000 output records can overwhelm the destination connector; implement a configurable per-step rate limiter and back-pressure mechanism that pauses the source read when the destination write queue exceeds a depth threshold.
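The back-pressure mechanism described above can be sketched as a bounded buffer with high/low watermarks between the pipeline and the destination connector. The class and watermark names are illustrative; a real executor would likely pause the Kafka consumer rather than poll a flag.

```python
class BackpressureQueue:
    """Bounded buffer between pipeline output and the destination connector.
    The source reader should stop pulling while should_pause() is True."""

    def __init__(self, high_watermark, low_watermark):
        assert low_watermark < high_watermark
        self.high, self.low = high_watermark, low_watermark
        self.items = []
        self.paused = False

    def push(self, record):
        self.items.append(record)
        if len(self.items) >= self.high:
            self.paused = True  # depth threshold crossed: pause the source read

    def drain(self, n):
        """Destination writer takes up to n records off the queue."""
        drained, self.items = self.items[:n], self.items[n:]
        if self.paused and len(self.items) <= self.low:
            self.paused = False  # hysteresis: resume only once below the low watermark
        return drained

    def should_pause(self):
        return self.paused
```

The two-watermark hysteresis prevents the source from flapping between paused and resumed when the queue hovers near a single threshold, which matters most during a Split-step fan-out burst.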