Infrastructure Architecture
Overview
OnChain Software's blockchain data infrastructure is built around a single high-performance node running an Erigon client that synchronizes blockchain data into a ClickHouse database. This foundation supports all our applications and provides reliable data access for trading algorithms, analytics platforms, and customer-facing products.
Node Architecture
Our infrastructure currently consists of a single production node with the following components:
- Erigon Client: High-performance Ethereum client for blockchain synchronization
- ClickHouse Database: Columnar database optimized for analytical queries
- Core Tables: Three fundamental tables capturing all on-chain activity:
  - Blocks: Block headers and metadata
  - Transactions: Individual transaction details
  - Logs: Smart contract events and logs
Data Pipeline
- Real-time Collection: The Erigon client synchronizes with the Ethereum network on every block
- Storage: Raw blockchain data is stored in the ClickHouse core tables
- Processing: Derived tables and views extract specific data patterns (DEX swaps, price feeds, etc.)
- Access: (In Development) Exposing a ClickHouse endpoint so applications can connect directly
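The collection and storage steps above can be sketched as a function that splits one block into rows for the three core tables. This is a minimal illustration, not the production loader: the input field names follow the standard Ethereum JSON-RPC block and receipt objects, while `split_block` and the output column names are hypothetical.

```python
from typing import Any, Dict, List


def hex_to_int(value: str) -> int:
    """Convert a JSON-RPC hex quantity such as '0x10' to an int."""
    return int(value, 16)


def split_block(
    block: Dict[str, Any], receipts: List[Dict[str, Any]]
) -> Dict[str, List[Dict[str, Any]]]:
    """Split a block (with full transaction objects) and its receipts
    into rows for hypothetical blocks, transactions, and logs tables."""
    number = hex_to_int(block["number"])

    block_row = {
        "number": number,
        "hash": block["hash"],
        "timestamp": hex_to_int(block["timestamp"]),
        "gas_used": hex_to_int(block["gasUsed"]),
    }

    tx_rows = [
        {
            "block_number": number,
            "hash": tx["hash"],
            "from": tx["from"],
            "to": tx.get("to"),  # None for contract-creation transactions
            "value": hex_to_int(tx["value"]),
        }
        for tx in block["transactions"]
    ]

    # Event logs are carried by transaction receipts, not by the block itself.
    log_rows = [
        {
            "block_number": number,
            "transaction_hash": log["transactionHash"],
            "log_index": hex_to_int(log["logIndex"]),
            "address": log["address"],
            "topics": log["topics"],
            "data": log["data"],
        }
        for receipt in receipts
        for log in receipt["logs"]
    ]

    return {"blocks": [block_row], "transactions": tx_rows, "logs": log_rows}
```

Keeping `block_number` on every row is a design choice that lets derived tables join or filter on it without going back to the Blocks table.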
Supported Networks
Current: Ethereum Mainnet
- Full historical data from genesis
- Real-time synchronization with latest blocks
- Complete transaction and event log coverage
Planned Network Expansion
We're planning to expand our data coverage to include:
- Base (Coinbase L2)
- Optimism (Ethereum L2)
- BNB Chain (Binance Smart Chain)
- Polygon (Polygon PoS)
- Additional networks based on product requirements
Data Quality & Update Frequencies
Node Data Freshness
- Real-time Sync: Blockchain data is updated with every new Ethereum block (roughly one every 12 seconds)
- Core Tables: Blocks, Transactions, and Logs are populated immediately upon block confirmation
- Derived Tables: Materialized views and aggregations update automatically as new data arrives
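A freshness check built on the ~12-second block interval could look like the sketch below. It estimates how many blocks the stored data trails the chain head, given the timestamp of the newest stored block. The function names and the three-block tolerance are assumptions for illustration, not part of the actual monitoring setup.

```python
import time
from typing import Optional

SECONDS_PER_BLOCK = 12  # approximate Ethereum block interval


def blocks_behind(latest_block_timestamp: int, now: Optional[float] = None) -> int:
    """Estimate how many blocks the newest stored block trails the chain head."""
    if now is None:
        now = time.time()
    lag_seconds = max(0.0, now - latest_block_timestamp)
    return int(lag_seconds // SECONDS_PER_BLOCK)


def is_fresh(
    latest_block_timestamp: int,
    now: Optional[float] = None,
    max_blocks_behind: int = 3,  # hypothetical tolerance
) -> bool:
    """Return True if the stored data is within the allowed lag."""
    return blocks_behind(latest_block_timestamp, now) <= max_blocks_behind
```

In practice the `latest_block_timestamp` argument would come from a query for the maximum timestamp in the Blocks table.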
Catalog Documentation Updates
- Minimum Frequency: Schema and documentation updates occur at least weekly
- Target Frequency: Daily updates to reflect infrastructure changes
- Version Control: All changes tracked through Git for complete audit trail
Data Quality Assurance
- Partitioned Storage: Tables partitioned by time for optimal query performance
- Automated Processing: Materialized views ensure derived data consistency
- Schema Validation: All incoming data validated against defined schemas
- Monitoring: Continuous monitoring of data pipeline health and sync status
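As a rough illustration of the schema validation and time partitioning described above, the sketch below checks a row against a column-to-type mapping and derives a monthly partition key from a block timestamp. The schema, the column names, and the monthly granularity are assumptions. The key format mirrors what ClickHouse's `toYYYYMM` function produces, but the source does not state which partitioning expression is actually used.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List

# Hypothetical schema for a logs row: column name -> expected Python type.
LOG_SCHEMA: Dict[str, type] = {
    "block_number": int,
    "transaction_hash": str,
    "log_index": int,
    "address": str,
}


def validate_row(row: Dict[str, Any], schema: Dict[str, type]) -> List[str]:
    """Return a list of validation errors; an empty list means the row is valid."""
    errors = []
    for column, expected in schema.items():
        if column not in row:
            errors.append(f"missing column: {column}")
        elif not isinstance(row[column], expected):
            actual = type(row[column]).__name__
            errors.append(f"{column}: expected {expected.__name__}, got {actual}")
    return errors


def partition_key(block_timestamp: int) -> int:
    """Monthly partition key in YYYYMM form, computed in UTC."""
    moment = datetime.fromtimestamp(block_timestamp, tz=timezone.utc)
    return moment.year * 100 + moment.month
```

Returning every error rather than stopping at the first one makes rejected rows easier to diagnose in pipeline monitoring.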
Available Datasets
Ethereum Network Data
Our Ethereum dataset provides comprehensive blockchain information including:
- Blocks: Core blockchain blocks with metadata and gas metrics
- Transactions: Individual transaction details and execution results
- Logs: Event logs emitted by smart contracts
- DEX Trading Data: Decentralized exchange swap transactions and liquidity metrics
Price & Market Data
Time-series pricing information optimized for analytics:
- Token Prices Hourly: Aggregated hourly price data for ERC-20 tokens
- WETH Price Data: Pricing metrics for Wrapped Ether (WETH)
- Uniswap Trading Metrics: Automated market maker trading statistics
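One plausible way to derive hourly prices from individual trades is a volume-weighted average price (VWAP) per hour bucket, sketched below. The aggregation method actually used for these datasets is not specified here, so the VWAP choice, the function name, and the `(timestamp, price, volume)` input shape are all assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

SECONDS_PER_HOUR = 3600


def hourly_vwap(trades: Iterable[Tuple[int, float, float]]) -> Dict[int, float]:
    """Compute the volume-weighted average price per hour.

    Each trade is (unix_timestamp, price, volume). The result maps the
    start of each hour (unix seconds) to that hour's VWAP.
    """
    notional: Dict[int, float] = defaultdict(float)
    volume: Dict[int, float] = defaultdict(float)

    for timestamp, price, traded_volume in trades:
        hour_start = timestamp - timestamp % SECONDS_PER_HOUR
        notional[hour_start] += price * traded_volume
        volume[hour_start] += traded_volume

    # Skip hours with zero total volume to avoid division by zero.
    return {
        hour: notional[hour] / volume[hour]
        for hour in sorted(volume)
        if volume[hour] > 0
    }
```

Weighting by volume keeps a single small trade at an unusual price from skewing the hourly figure.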
Materialized Views & Aggregations
Pre-computed data for enhanced query performance:
- Deduplicated DEX Swaps: Cleaned and standardized trading data
- Hourly WETH Aggregations: Time-series price and volume metrics
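Deduplication of swap records can be sketched as keeping the first row seen for each event. The example assumes each swap row carries `transaction_hash` and `log_index` columns (hypothetical names) and uses that pair as the identity of an event log. The key used by the actual materialized view may differ.

```python
from typing import Any, Dict, Iterable, List, Set, Tuple


def dedupe_swaps(swaps: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the first occurrence of each (transaction_hash, log_index) pair,
    preserving input order."""
    seen: Set[Tuple[str, int]] = set()
    unique: List[Dict[str, Any]] = []
    for swap in swaps:
        key = (swap["transaction_hash"], swap["log_index"])
        if key in seen:
            continue  # duplicate row, e.g. from a re-ingested block
        seen.add(key)
        unique.append(swap)
    return unique
```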