How does a blockchain explorer index the entire blockchain for search?

The explorer runs a full Ethereum node that syncs the entire blockchain. An indexer reads blocks sequentially via RPC. For each block it extracts block metadata, all transactions (from, to, value, gas, input data, receipt, logs), and all event logs. Transaction input and events are decoded using contract ABIs: for example, the 4-byte selector 0xa9059cbb maps to the ERC-20 function transfer(address,uint256). Decoded data is stored in PostgreSQL (relational queries) and Elasticsearch (full-text search). The initial sync of 20M+ blocks takes days, so use parallel chunk processing. For new blocks, subscribe to the node's newHeads event over WebSocket and index in real time (sub-second from block creation to searchability). Chain reorganizations require re-indexing the affected blocks when the canonical chain changes.
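The parallel chunk processing mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not how any particular explorer implements it: fetch_block and handle_block are hypothetical callbacks standing in for an RPC call (such as eth_getBlockByNumber) and a database write, and the chunk size and worker count are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def plan_chunks(start_block, end_block, chunk_size):
    """Split the inclusive range [start_block, end_block] into inclusive (lo, hi) chunks."""
    return [
        (lo, min(lo + chunk_size - 1, end_block))
        for lo in range(start_block, end_block + 1, chunk_size)
    ]


def index_chunk(chunk, fetch_block, handle_block):
    """Index one chunk in block order and return how many blocks were processed."""
    lo, hi = chunk
    for number in range(lo, hi + 1):
        # fetch_block and handle_block are hypothetical stand-ins for RPC and storage.
        handle_block(fetch_block(number))
    return hi - lo + 1


def backfill(start_block, end_block, fetch_block, handle_block, chunk_size=1000, workers=8):
    """Index chunks concurrently; blocks inside a chunk stay ordered, chunks may interleave."""
    chunks = plan_chunks(start_block, end_block, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = pool.map(lambda chunk: index_chunk(chunk, fetch_block, handle_block), chunks)
        return sum(counts)
```

Because chunks complete out of order, a design like this generally wants idempotent writes keyed by block number, so that a failed chunk can simply be retried.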

How does smart contract verification work on a blockchain explorer?

On-chain, contracts are stored as EVM bytecode (unreadable hex). To verify a contract, the developer submits the Solidity source code and compiler settings. The explorer compiles the source and compares the output bytecode with the on-chain bytecode. If they match, the contract is verified. Verification enables human-readable transaction decoding (function names and parameters instead of hex), read/write contract interaction from the explorer UI, and source code browsing. For unverified contracts, the explorer uses known function signature databases (4byte.directory, with millions of signatures) to decode common operations. ERC standard detection checks for standard function signatures to identify token contracts. Proxy pattern detection resolves the implementation contract so that calls through delegatecall proxies can be decoded.

System Design: Design Blockchain Explorer (Etherscan) — Transaction Indexing, Smart Contract Decoding, Real-Time Blocks

Blockchain explorers like Etherscan and Blockscout provide a searchable interface to blockchain data: transactions, blocks, addresses, smart contracts, and token transfers. Designing an explorer tests your understanding of data indexing from an append-only distributed ledger, smart contract ABI decoding, and serving real-time block data. It is an unusual system design question because it combines ETL, search, and real-time updates in one system.

Data Ingestion: Indexing the Blockchain

The blockchain is an append-only chain of blocks, each containing transactions. Ethereum produces a new block roughly every 12 seconds, typically with 150-300 transactions per block, and its history exceeds 20 million blocks. To make this data searchable, an explorer must index every block, transaction, and event into a queryable database.

Ingestion pipeline:

(1) The explorer runs a full Ethereum node (Geth or Erigon) that syncs the entire blockchain. The node provides RPC access to block data.

(2) An indexer process reads blocks sequentially from the node. For each block it extracts block metadata (number, timestamp, miner, gas_used), all transactions (from, to, value, gas, input_data, receipt, logs), and all event logs emitted by smart contracts. Since the Merge, the miner field holds the block's fee recipient.

(3) Decode transaction input data and event logs using the contract ABI (Application Binary Interface). Raw input is hex-encoded; the ABI maps it to function names and parameters. Example: the selector 0xa9059cbb maps to the ERC-20 function transfer(address,uint256).

(4) Store decoded data in PostgreSQL (relational queries) and Elasticsearch (full-text search on addresses, transaction hashes, and contract names).

(5) For historical data, the initial sync processes 20M+ blocks, which takes days to weeks. Use parallel processing: batch blocks into chunks and index them concurrently. For new blocks, subscribe to the node's new-block event over WebSocket and index in real time (sub-second from block creation to searchability).
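Step (3) can be made concrete with the transfer example. The sketch below decodes ERC-20 transfer calldata by hand, relying on the ABI encoding rule that the 4-byte selector is followed by one 32-byte word per static argument. KNOWN_SELECTORS and decode_transfer_input are hypothetical names for illustration; a real indexer would load selectors from stored ABIs and use a general-purpose ABI decoder rather than a single hard-coded function.

```python
# Selector -> canonical signature. Hard-coded here as an assumption for the sketch;
# an explorer would populate this from verified contract ABIs.
KNOWN_SELECTORS = {"a9059cbb": "transfer(address,uint256)"}


def decode_transfer_input(input_hex):
    """Decode ERC-20 transfer calldata; return None if it is not a recognizable transfer."""
    raw = input_hex[2:] if input_hex.startswith("0x") else input_hex
    data = bytes.fromhex(raw)
    signature = KNOWN_SELECTORS.get(data[:4].hex())
    if signature is None or len(data) < 4 + 2 * 32:
        return None
    to_word = data[4:36]        # address, left-padded to 32 bytes
    amount_word = data[36:68]   # uint256, big-endian
    return {
        "function": signature,
        "to": "0x" + to_word[12:].hex(),  # an address is the low 20 bytes of the word
        "amount": int.from_bytes(amount_word, "big"),
    }
```

The amount is the raw integer from the calldata; turning it into a display value requires the token's decimals, which the indexer has to look up separately.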

Data Model and Queries

Core entities:

(1) Block: block_number, timestamp, miner, gas_used, gas_limit, transaction_count, base_fee.

(2) Transaction: tx_hash, block_number, from_address, to_address, value (ETH transferred), gas_price, gas_used, input_data (raw), decoded_function, status (success/fail), nonce.

(3) Address: address, balance, transaction_count, is_contract, contract_name, contract_abi, token_balances.

(4) Event log: log_index, tx_hash, contract_address, topic0 (event signature), decoded_event_name, decoded_parameters.

(5) Token transfer: derived from Transfer events. Fields: from, to, amount, token_address, token_name, token_symbol.

Key queries:

(1) Transaction by hash: lookup by tx_hash (primary key). This is a single index lookup, O(log n) with a B-tree index and effectively constant in practice.

(2) Transactions for an address: all transactions where from = address OR to = address, ordered by block_number desc. Index on (from_address, block_number) and (to_address, block_number).

(3) Token transfers for an address: all ERC-20/721 transfers involving the address. Derived from event logs.

(4) Contract source code and ABI: verified contracts have their Solidity source code uploaded and compiled to match the on-chain bytecode. The ABI enables input/event decoding.

(5) Block by number or latest: index on block_number. The latest block is cached in Redis for real-time display.
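Key query (2) and its two indexes can be sketched as below. This uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so the example runs on its own; the table is a trimmed-down assumption (the tx_index column and the value_wei name are illustrative), not the full schema above. Writing the OR as a UNION of two single-column lookups is one common way to let each branch use its own index.

```python
import sqlite3

SCHEMA = """
CREATE TABLE transactions (
    tx_hash      TEXT PRIMARY KEY,
    block_number INTEGER NOT NULL,
    tx_index     INTEGER NOT NULL,   -- position inside the block (assumed column)
    from_address TEXT NOT NULL,
    to_address   TEXT,               -- NULL for contract creation
    value_wei    TEXT NOT NULL       -- text, since wei amounts can exceed 64 bits
);
CREATE INDEX idx_tx_from ON transactions (from_address, block_number);
CREATE INDEX idx_tx_to   ON transactions (to_address, block_number);
"""


def transactions_for_address(conn, address, limit=25):
    """Return tx hashes sent or received by address, newest first."""
    rows = conn.execute(
        """
        SELECT tx_hash, block_number, tx_index FROM transactions WHERE from_address = :addr
        UNION  -- UNION (not UNION ALL) removes the duplicate when from = to
        SELECT tx_hash, block_number, tx_index FROM transactions WHERE to_address = :addr
        ORDER BY block_number DESC, tx_index DESC
        LIMIT :limit
        """,
        {"addr": address, "limit": limit},
    ).fetchall()
    return [row[0] for row in rows]
```

A typical call would create the schema with conn.executescript(SCHEMA), load rows, and then call transactions_for_address(conn, "0x...") with a lowercased address.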

Smart Contract Verification and Decoding

On-chain, smart contract code is stored as EVM bytecode (unreadable hex). Contract verification: the developer submits their Solidity source code and compiler settings. The explorer compiles the source and compares the output bytecode with the on-chain bytecode. If they match, the contract is “verified” and the source code and ABI are stored. This enables:

(1) Human-readable transaction decoding: instead of showing raw hex input, display “transfer(0xABC…, 1000000)” with parameter names and values.

(2) Read/write contract interaction: users can call view functions (read state) and send transactions (write state) directly from the explorer UI.

(3) Source code browsing: users read the contract logic to understand what it does.

ABI decoding for unverified contracts: even without verified source, the explorer can use:

(1) Known function signatures: a database of 4-byte function selectors mapped to function names (the 4byte.directory project has millions).

(2) ERC standard detection: check whether the contract implements the ERC-20, ERC-721, or ERC-1155 standards by checking for standard function signatures and events.

(3) Proxy detection: detect delegatecall proxy patterns and resolve the implementation contract for decoding.
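The compile-and-compare step has one wrinkle worth sketching. The Solidity compiler by default appends a CBOR-encoded metadata section to the bytecode, with the final two bytes giving that section's length, so two builds of the same logic can differ only in those trailing bytes. The sketch below compares bytecode with and without that section. The function names and the three result labels are hypothetical, the metadata check is a simple heuristic, and real verifiers also have to account for details such as constructor arguments and immutable values, which are ignored here.

```python
def strip_metadata(bytecode: bytes) -> bytes:
    """Drop the trailing CBOR metadata section if the length field looks plausible."""
    if len(bytecode) < 2:
        return bytecode
    cbor_length = int.from_bytes(bytecode[-2:], "big")  # last 2 bytes: metadata length
    if cbor_length + 2 > len(bytecode):
        return bytecode  # implausible length: assume no metadata was appended
    return bytecode[: len(bytecode) - cbor_length - 2]


def _to_bytes(hex_string: str) -> bytes:
    """Accept hex with or without a 0x prefix."""
    return bytes.fromhex(hex_string[2:] if hex_string.startswith("0x") else hex_string)


def compare_bytecode(compiled_hex: str, onchain_hex: str) -> str:
    """Return 'exact', 'metadata_differs', or 'mismatch' (labels are illustrative)."""
    compiled, onchain = _to_bytes(compiled_hex), _to_bytes(onchain_hex)
    if compiled == onchain:
        return "exact"
    if strip_metadata(compiled) == strip_metadata(onchain):
        return "metadata_differs"
    return "mismatch"
```

Whether a metadata-only difference should count as verified is a product decision; an explorer could, for instance, show it as a weaker match than a byte-for-byte one.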

Real-Time Updates

Users expect the explorer to show the latest blocks and transactions within seconds. Real-time pipeline:

(1) The indexer subscribes to the node's newHeads event (new block notification) via WebSocket.

(2) On a new block: fetch the full block with transactions and receipts, then decode and index it. Update the “latest blocks” display by pushing to connected clients via WebSocket or SSE.

(3) Pending transactions (mempool): the node exposes pending transactions before they are included in a block. The explorer can show pending transactions for an address (useful for tracking: “your transaction is pending”). Note: mempool data is ephemeral and high-volume, so index pending transactions only for tracked addresses, not for all.

(4) Chain reorganizations (reorgs): occasionally the blockchain reorganizes, meaning a competing block replaces the current head. The explorer must detect reorgs and re-index the affected blocks. The standard Ethereum JSON-RPC API has no dedicated reorg event. Instead, the indexer compares each new header's parentHash with the hash it stored for the previous block, and log subscriptions re-deliver logs from dropped blocks with removed set to true. On a mismatch, mark the affected transactions as potentially reverted and re-fetch the canonical chain.

Gas price tracker: aggregate recent block gas prices to show current recommended gas prices (slow, average, fast), updated every block. This is a popular feature for users timing their transactions. Display it as a widget on the homepage.
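One common way to detect the reorgs described in step (4) is to follow parent hashes back from the new head until the indexed chain and the canonical chain agree. The sketch below assumes the indexer keeps a map of recently indexed block numbers to hashes, and that get_canonical_header is a hypothetical callback that asks the node for the header currently at a given height. Headers are plain dicts with integer numbers for simplicity; JSON-RPC responses encode numbers as hex strings.

```python
def find_stale_blocks(indexed_hashes, new_header, get_canonical_header):
    """Return the sorted block numbers whose indexed hash is no longer canonical."""
    stale = []
    number = new_header["number"]

    # The new head itself may replace a block already indexed at the same height.
    if number in indexed_hashes and indexed_hashes[number] != new_header["hash"]:
        stale.append(number)

    # Walk backwards: the canonical hash at each height is the child's parentHash.
    expected_hash = new_header["parentHash"]
    number -= 1
    while number in indexed_hashes and indexed_hashes[number] != expected_hash:
        stale.append(number)
        expected_hash = get_canonical_header(number)["parentHash"]
        number -= 1

    return sorted(stale)
```

The walk stops at the first height where the stored hash matches, which is the common ancestor; everything above it is re-fetched and re-indexed. Keeping only a bounded window of recent hashes is usually enough, since deep reorgs are rare.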

API and Rate Limiting

Etherscan provides a REST API used by thousands of dApps, wallets, and analytics tools. Endpoints: getTransactionByHash, getTransactionsByAddress, getContractABI, getTokenBalance, getBlockByNumber, getLogs.

Rate limiting: free tier (5 calls/sec), paid tiers (higher limits). API keys track usage.

Caching: popular queries (latest block, top tokens, gas price) are cached with a 1-12 second TTL. Address pages for popular contracts (USDT, Uniswap) are cached aggressively.

Pagination: cursor-based for transaction lists, using block_number + tx_index as the cursor. Offset-based pagination breaks down for addresses with millions of transactions, because the database must scan and discard every skipped row.

Webhook notifications: users register webhooks to be notified when a specific address receives a transaction, a specific event is emitted by a contract, or a pending transaction is confirmed. The webhook service monitors the real-time indexing pipeline and dispatches notifications.

Scale: the Etherscan API handles millions of requests per day. The database is read-heavy (99%+ reads), so use read replicas for API queries while the indexer writes to the primary. Elasticsearch handles full-text search and complex log queries.
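Per-key rate limiting such as the 5 calls/sec free tier is often implemented with a token bucket. The sketch below is one possible in-process version, not a description of how Etherscan does it: the TokenBucket class and its parameters are assumptions, and a multi-server deployment would typically keep the counters in a shared store such as Redis instead of process memory.

```python
import time


class TokenBucket:
    """Allow up to `capacity` calls in a burst, refilled at `rate_per_sec` tokens per second."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock              # injectable so the limiter can be tested
        self.tokens = float(capacity)   # start with a full bucket
        self.updated = clock()

    def allow(self):
        """Consume one token if available and report whether the call may proceed."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# One bucket per API key (hypothetical lookup used by a request handler).
buckets = {}


def check_rate_limit(api_key, rate_per_sec=5, capacity=5):
    bucket = buckets.setdefault(api_key, TokenBucket(rate_per_sec, capacity))
    return bucket.allow()
```

A request handler would call check_rate_limit(api_key) and respond with HTTP 429 when it returns False; paid tiers would simply get a higher rate and capacity.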