System Design: Design Blockchain Explorer (Etherscan) — Transaction Indexing, Smart Contract Decoding, Real-Time Blocks

Blockchain explorers like Etherscan and Blockscout provide a searchable interface to blockchain data: transactions, blocks, addresses, smart contracts, and token transfers. Designing an explorer tests your understanding of data indexing from an append-only distributed ledger, smart contract ABI decoding, and serving real-time block data. This is a unique system design question that combines ETL, search, and real-time updates.

Data Ingestion: Indexing the Blockchain

The blockchain is an append-only chain of blocks, each containing transactions. Ethereum produces a new block roughly every 12 seconds, with about 150-300 transactions per block, and the total history exceeds 20 million blocks. To make this data searchable, an explorer must index every block, transaction, and event log into a queryable database. Ingestion pipeline:
(1) The explorer runs a full Ethereum node (Geth or Erigon) that syncs the entire blockchain and exposes block data over JSON-RPC.
(2) An indexer process reads blocks sequentially from the node. For each block it extracts the block metadata (number, timestamp, miner, gas_used), all transactions (from, to, value, gas, input_data) together with their receipts, and all event logs emitted by smart contracts.
(3) The indexer decodes transaction input data and event logs using the contract ABI (Application Binary Interface). Raw input is hex-encoded: the first 4 bytes are the function selector, and the ABI maps the remaining bytes to named parameters. Example: 0xa9059cbb -> ERC-20 transfer(address, uint256).
(4) Decoded data is stored in PostgreSQL (relational queries) and Elasticsearch (search over addresses, transaction hashes, and contract names).
(5) Historical data: the initial sync processes 20M+ blocks and takes days to weeks. Use parallel processing: batch blocks into chunks and index the chunks concurrently. New blocks: subscribe to the node's new-block event over WebSocket and index in real time, aiming for sub-second latency between receiving a block and making it searchable.
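The chunked historical backfill in step (5) can be sketched as follows. This is a minimal illustration, not how any particular explorer does it: `fetch_block` and `store` are hypothetical callbacks standing in for the node RPC call and the database write, and the chunk size and worker count are arbitrary.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(start, end, size):
    """Yield inclusive (first, last) block ranges covering start..end."""
    for first in range(start, end + 1, size):
        yield first, min(first + size - 1, end)


def index_chunk(block_range, fetch_block, store):
    """Fetch and store every block in one chunk; return how many were indexed."""
    first, last = block_range
    for number in range(first, last + 1):
        store(fetch_block(number))
    return last - first + 1


def backfill(start, end, fetch_block, store, chunk_size=1000, workers=8):
    """Index blocks start..end by processing chunks concurrently."""
    ranges = list(chunk_ranges(start, end, chunk_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = pool.map(lambda r: index_chunk(r, fetch_block, store), ranges)
        return sum(counts)


# Demo with in-memory stand-ins for the node and the database (assumptions).
indexed = {}
lock = threading.Lock()


def fake_fetch_block(number):
    return {"number": number, "transactions": []}


def fake_store(block):
    with lock:  # the store callback must be safe to call from several threads
        indexed[block["number"]] = block


total = backfill(0, 2499, fake_fetch_block, fake_store, chunk_size=1000, workers=4)
print(total, len(indexed))
```

Chunks finish out of order, so anything that depends on block order (balances, for example) has to be derived after the backfill or written in a way that tolerates it.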

Data Model and Queries

Core entities:
(1) Block: block_number, timestamp, miner (the fee recipient since the move to proof of stake), gas_used, gas_limit, transaction_count, base_fee.
(2) Transaction: tx_hash, block_number, from_address, to_address, value (ETH transferred), gas_price, gas_used, input_data (raw), decoded_function, status (success/fail), nonce.
(3) Address: address, balance, transaction_count, is_contract, contract_name, contract_abi, token_balances.
(4) Event log: log_index, tx_hash, contract_address, topic0 (event signature), decoded_event_name, decoded_parameters.
(5) Token transfer: derived from Transfer events; from, to, amount, token_address, token_name, token_symbol.

Key queries:
(1) Transaction by hash: lookup by tx_hash (primary key). This is a single index lookup, O(log n) on a B-tree index and effectively constant in practice.
(2) Transactions for an address: all transactions where from = address OR to = address, ordered by block_number desc. Index on (from_address, block_number) and (to_address, block_number). Writing the query as a union of two single-column lookups lets each branch use its own index.
(3) Token transfers for an address: all ERC-20/721 transfers involving the address, derived from event logs.
(4) Contract source code and ABI: verified contracts have their Solidity source code uploaded and compiled to match the on-chain bytecode. The ABI enables input and event decoding.
(5) Block by number or latest: index on block_number. The latest block is cached in Redis for real-time display.
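The address-history query (2) can be sketched as below. The example uses SQLite from the Python standard library as a stand-in for PostgreSQL so that it runs anywhere. The table layout is an assumption: it is a trimmed version of the Transaction entity, with a tx_index column added for ordering within a block.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (
    tx_hash      TEXT PRIMARY KEY,
    block_number INTEGER NOT NULL,
    tx_index     INTEGER NOT NULL,
    from_address TEXT NOT NULL,
    to_address   TEXT,             -- NULL for contract creation
    value_wei    TEXT NOT NULL     -- uint256 does not fit in a 64-bit integer
);
CREATE INDEX idx_tx_from ON transactions (from_address, block_number DESC, tx_index DESC);
CREATE INDEX idx_tx_to   ON transactions (to_address, block_number DESC, tx_index DESC);
""")


def transactions_for_address(conn, address, limit=25):
    """Return the newest transactions sent or received by `address`."""
    return conn.execute(
        """
        SELECT tx_hash, block_number, tx_index FROM transactions WHERE from_address = :a
        UNION  -- UNION (not UNION ALL) drops the duplicate row of a self-transfer
        SELECT tx_hash, block_number, tx_index FROM transactions WHERE to_address = :a
        ORDER BY block_number DESC, tx_index DESC
        LIMIT :n
        """,
        {"a": address, "n": limit},
    ).fetchall()


# Made-up sample rows for illustration only.
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("0x01", 100, 0, "0xaaa", "0xbbb", "1"),
        ("0x02", 101, 3, "0xccc", "0xaaa", "2"),
        ("0x03", 101, 5, "0xaaa", "0xaaa", "0"),  # self-transfer
        ("0x04", 102, 0, "0xccc", "0xddd", "5"),
    ],
)
history = transactions_for_address(conn, "0xaaa")
print(history)
```

Each branch of the union filters on the leading column of one index, which avoids a scan over the whole table for the OR condition.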

Smart Contract Verification and Decoding

On-chain, smart contract code is stored as EVM bytecode (unreadable hex). Contract verification: the developer submits the Solidity source code and compiler settings. The explorer compiles the source with those settings and compares the resulting bytecode with the on-chain bytecode. If they match, the contract is “verified” and the source code and ABI are stored. This enables:
(1) Human-readable transaction decoding: instead of showing raw hex input, display “transfer(0xABC…, 1000000)” with parameter names and values.
(2) Read/write contract interaction: users can call view functions (read state) and send transactions (write state) directly from the explorer UI.
(3) Source code browsing: users read the contract logic to understand what it does.

ABI decoding for unverified contracts: even without verified source, the explorer can use:
(1) Known function signatures: a database of 4-byte function selectors mapped to function names (the 4byte.directory project hosts a large public collection). Because a selector is only 4 bytes, different signatures can share one, so these matches are best-effort.
(2) ERC standard detection: check whether the contract implements ERC-20, ERC-721, or ERC-1155 by looking for the standard function signatures and events.
(3) Proxy detection: detect delegatecall proxy patterns and resolve the implementation contract, whose ABI is then used for decoding.
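Selector-based decoding for unverified contracts can be sketched as follows. The three selectors are the standard ERC-20 ones. The `KNOWN_SELECTORS` table and the helper names are hypothetical, and the decoder only handles static `address` and `uint256` arguments, which is far less than a full ABI decoder.

```python
# Selector (first 4 bytes of calldata, hex) -> (function name, argument types).
KNOWN_SELECTORS = {
    "a9059cbb": ("transfer", ["address", "uint256"]),
    "095ea7b3": ("approve", ["address", "uint256"]),
    "23b872dd": ("transferFrom", ["address", "address", "uint256"]),
}


def decode_word(kind, word):
    """Decode one 32-byte ABI word of a static type."""
    if kind == "address":
        return "0x" + word[12:].hex()  # an address is the low 20 bytes
    if kind == "uint256":
        return int.from_bytes(word, "big")
    raise ValueError(f"unsupported type: {kind}")


def decode_input(input_hex):
    """Return (name, args) for a known selector, or None if it cannot be decoded."""
    raw = input_hex[2:] if input_hex.startswith("0x") else input_hex
    data = bytes.fromhex(raw)
    if len(data) < 4:
        return None
    entry = KNOWN_SELECTORS.get(data[:4].hex())
    if entry is None:
        return None
    name, kinds = entry
    body = data[4:]
    if len(body) < 32 * len(kinds):
        return None  # truncated calldata
    args = [decode_word(k, body[i * 32:(i + 1) * 32]) for i, k in enumerate(kinds)]
    return name, args


# Sample calldata: transfer(0x00...dead, 1000000), built from its parts.
recipient = "00" * 18 + "dead"  # 20-byte address as hex
sample = "0xa9059cbb" + "00" * 12 + recipient + f"{1_000_000:064x}"
decoded = decode_input(sample)
print(decoded)
```

A production decoder would also handle dynamic types (bytes, string, arrays) and treat a selector hit as a guess, since another function may share the same 4 bytes.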

Real-Time Updates

Users expect the explorer to show the latest blocks and transactions within seconds. Real-time pipeline:
(1) The indexer subscribes to the node's newHeads event (new block notification) via WebSocket.
(2) On a new block: fetch the full block with transactions and receipts, decode and index it, then update the “latest blocks” display by pushing to connected clients via WebSocket or SSE.
(3) Pending transactions (mempool): the node exposes pending transactions before they are included in a block. The explorer can show pending transactions for an address (useful for tracking: “your transaction is pending”). Note: mempool data is ephemeral and high-volume, and many pending transactions are replaced or dropped before inclusion. Only index pending transactions for tracked addresses, not all of them.
(4) Chain reorganizations (reorgs): occasionally the blockchain reorganizes (a competing block replaces the current head). The explorer must detect reorgs and re-index the affected blocks. The standard Ethereum JSON-RPC API has no dedicated reorg event, so the indexer compares each new header's parentHash with the hash it stored for the previous block. On a mismatch, it walks back to the common ancestor, marks transactions in the replaced blocks as potentially reverted, and re-fetches the canonical chain. Log subscriptions also re-deliver logs from replaced blocks with removed set to true.

Gas price tracker: aggregate gas prices from recent blocks to show current recommended gas prices (slow, average, fast), updated every block. This is a popular feature for users timing their transactions and is typically displayed as a widget on the homepage.
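Reorg detection by parent-hash comparison can be sketched as below. The header dictionaries mirror the number, hash, and parentHash fields that a node returns. `ChainTracker` and the `get_header_by_hash` callback (which would wrap a call such as eth_getBlockByHash) are hypothetical names, and the short string hashes are placeholders.

```python
class ChainTracker:
    """Remembers the hash indexed at each height and reports replaced blocks."""

    def __init__(self):
        self.hash_by_number = {}

    def apply_header(self, header, get_header_by_hash):
        """Record a new head; return the heights whose stored block was replaced."""
        stale = []
        previous = self.hash_by_number.get(header["number"])
        if previous is not None and previous != header["hash"]:
            stale.append(header["number"])  # same-height replacement

        # Walk back along the new chain until it rejoins what we have stored.
        cursor = header
        while True:
            parent_number = cursor["number"] - 1
            stored = self.hash_by_number.get(parent_number)
            if stored is None or stored == cursor["parentHash"]:
                break
            cursor = get_header_by_hash(cursor["parentHash"])
            self.hash_by_number[parent_number] = cursor["hash"]
            stale.append(parent_number)

        self.hash_by_number[header["number"]] = header["hash"]
        return sorted(stale)


# Demo: blocks 1..3 are indexed, then a head at height 4 arrives on a fork
# that replaced block 3 (h3 -> h3b).
headers = {
    "h1": {"number": 1, "hash": "h1", "parentHash": "h0"},
    "h2": {"number": 2, "hash": "h2", "parentHash": "h1"},
    "h3": {"number": 3, "hash": "h3", "parentHash": "h2"},
    "h3b": {"number": 3, "hash": "h3b", "parentHash": "h2"},
    "h4b": {"number": 4, "hash": "h4b", "parentHash": "h3b"},
}
tracker = ChainTracker()
results = [
    tracker.apply_header(headers[h], headers.__getitem__)
    for h in ("h1", "h2", "h3", "h4b")
]
print(results)
```

The returned heights are the blocks to re-index; a real indexer would also cap how far back it is willing to walk.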

API and Rate Limiting

Etherscan provides a REST API used by thousands of dApps, wallets, and analytics tools. Endpoints: getTransactionByHash, getTransactionsByAddress, getContractABI, getTokenBalance, getBlockByNumber, getLogs.
Rate limiting: free tier (5 calls/sec), paid tiers (higher limits). API keys track usage.
Caching: popular queries (latest block, top tokens, gas price) are cached with a 1-12 second TTL, so a cached entry is at most about one block old. Address pages for popular contracts (USDT, Uniswap) are cached aggressively.
Pagination: cursor-based for transaction lists, using (block_number, tx_index) as the cursor. Offset-based pagination breaks down for addresses with millions of transactions, because the database must scan and discard every skipped row.
Webhook notifications: users register webhooks to be notified when a specific address receives a transaction, a specific event is emitted by a contract, or a pending transaction is confirmed. The webhook service monitors the real-time indexing pipeline and dispatches notifications.
Scale: the Etherscan API handles millions of requests per day. The database is read-heavy (99%+ reads), so API queries go to read replicas while the indexer writes to the primary. Elasticsearch handles full-text search and complex log queries.
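A per-key limit such as 5 calls/sec can be enforced with a token bucket. The sketch below shows one common approach; the text does not say which algorithm Etherscan uses, and the `TokenBucket` class and its parameters are illustrative. The clock is injectable so the behavior can be checked without sleeping.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity` and a sustained `rate` of calls per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        """Consume one token if available; return whether the call may proceed."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Demo with a fake clock: a free-tier key limited to 5 calls/sec.
fake_now = [0.0]
bucket = TokenBucket(rate=5, capacity=5, clock=lambda: fake_now[0])
burst = [bucket.allow() for _ in range(6)]       # sixth call in the same instant
fake_now[0] = 0.5                                # half a second later: 2.5 tokens
after_wait = [bucket.allow() for _ in range(3)]
print(burst, after_wait)
```

In a service, one bucket would be kept per API key, usually in a shared store such as Redis so that every API server sees the same counts.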
