Getting Started
For OnChain Software Team Members
This guide helps internal team members understand and integrate with our blockchain data infrastructure.
1. Explore Available Data
Start by understanding our current datasets:

- Infrastructure Overview: Understand our technical architecture
- Ethereum Tables: Browse available data tables and schemas
- Data Access Patterns: Learn integration methods and best practices
2. Understand the Architecture
Review core concepts and terminology:

- Glossary: Technical terminology and system concepts
- Warehouse Diagrams: System architecture visualization
- Applications: How our products use this data
3. Plan Your Integration
Consider how your application will access data:

- Direct ClickHouse: For maximum performance and custom queries
- Future API Access: For standardized access with authentication
- MCP Server: For AI/ML workflow integration
4. Schema Design
Before requesting new tables or modifications:

- Review existing table structures and naming conventions
- Check whether materialized views can meet your needs
- Consider query performance and partitioning strategies
- Coordinate with the data team on schema changes
5. Query Optimization
Study existing patterns for optimal performance:

- Use time-based partitioning effectively
- Leverage materialized views for aggregations
- Select only necessary columns
- Monitor query execution times
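The guidelines above can be sketched in a small query builder: select only the columns you need and filter on the time column so ClickHouse can prune partitions. The table and column names (`eth.trades`, `block_date`, `volume_usd`) are illustrative placeholders, not our actual schema — check the catalog for the real names.

```python
from datetime import date

def build_daily_volume_query(start: date, end: date) -> str:
    """Build a query following the optimization guidelines:
    select only required columns and filter on the (assumed)
    time partition key so partition pruning applies."""
    return (
        "SELECT block_date, sum(volume_usd) AS daily_volume "
        "FROM eth.trades "
        f"WHERE block_date BETWEEN '{start.isoformat()}' AND '{end.isoformat()}' "
        "GROUP BY block_date "
        "ORDER BY block_date"
    )

sql = build_daily_volume_query(date(2024, 1, 1), date(2024, 1, 31))
```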
Development Workflow
1. Local Development Setup
```bash
# Clone the data catalog repository
git clone https://github.com/OnChain-Software/data-catalog.git

# Install documentation dependencies
pip install mkdocs mkdocs-material mkdocs-include-markdown-plugin

# Run local documentation server
mkdocs serve
```
2. Understanding Data Structures
- Use this catalog to explore available tables and columns
- Review example queries for common patterns
- Check data types and constraints before writing queries
- Understand partitioning schemes for optimal performance
3. Query Development Process
- Plan: Identify required data and expected query patterns
- Test: Develop queries against documented schemas
- Optimize: Review execution plans and performance
- Document: Update catalog with new query patterns
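For the Optimize step, ClickHouse exposes query plans via `EXPLAIN`; a small helper like the sketch below can wrap a candidate query before you send it. The `indexes = 1` setting, which reports partition and index pruning, is available in recent ClickHouse versions — verify support against the server version we run.

```python
def explain(query: str, indexes: bool = False) -> str:
    """Prefix a query with ClickHouse's EXPLAIN so its plan can be
    reviewed during the Optimize step. With indexes=True, ask the
    server to also report which parts/granules survive pruning."""
    prefix = "EXPLAIN indexes = 1" if indexes else "EXPLAIN"
    return f"{prefix} {query.strip()}"
```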
4. Integration Design
- Data Requirements: Identify specific tables and columns needed
- Access Patterns: Determine query frequency and performance needs
- Real-time vs Batch: Choose appropriate data freshness requirements
- Error Handling: Plan for data delays and connection issues
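The error-handling point above usually reduces to retrying transient failures with backoff. Here is a minimal stdlib-only sketch; `fetch` stands in for whatever driver call your integration makes, and the exception type and delay parameters are assumptions you would tune to your client library.

```python
import random
import time

def with_retries(fetch, attempts=4, base_delay=0.5):
    """Call `fetch` (a zero-argument callable that hits the
    warehouse) and retry transient failures with exponential
    backoff plus jitter. A sketch, not production error handling:
    real code would also distinguish retryable from fatal errors."""
    for attempt in range(attempts):
        try:
            return fetch()
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            # Double the delay each attempt, with random jitter to
            # avoid synchronized retry storms across clients.
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```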
5. Documentation Updates
When adding new tables or modifying schemas:

- Update relevant table documentation
- Add example queries and use cases
- Include performance considerations
- Submit changes via pull request
Common Integration Scenarios
Analytics Dashboard
Requirements:

- Historical price data aggregations
- Trading volume metrics
- Real-time updates
Recommended Approach:

- Use materialized views for pre-computed aggregations
- Query hourly aggregation tables for performance
- Consider caching for frequently accessed data
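The caching suggestion can be as simple as a time-based cache keyed by the query string, as in this sketch. A real dashboard backend would more likely use Redis or a similar shared cache; this illustrates the pattern only.

```python
import time

class TTLCache:
    """Tiny in-process cache for frequently requested aggregations.
    Entries expire after `ttl` seconds, bounding how stale a
    dashboard panel can get between warehouse queries."""
    def __init__(self, ttl: float = 60.0):
        self.ttl = ttl
        self._store = {}  # key -> (stored_at, value)

    def get_or_compute(self, key, compute):
        now = time.monotonic()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # fresh cached value, skip the query
        value = compute()  # e.g. run the aggregation query
        self._store[key] = (now, value)
        return value
```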
Trading Algorithm
Requirements:

- Real-time price feeds
- Low-latency data access
- High availability
Recommended Approach:

- Direct ClickHouse connection for minimal latency
- Use streaming queries for real-time updates
- Implement connection pooling and failover
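Failover can be sketched as trying replicas in order until one accepts a connection. The host names below are placeholders, and `connect` stands in for whatever your driver provides (e.g. clickhouse-driver's `Client`); pooling itself is best left to the driver or a proxy layer.

```python
def connect_with_failover(hosts, connect):
    """Try each ClickHouse host in order and return the first live
    connection; raise only after every host has failed. A sketch of
    the failover bullet above, not our production client."""
    last_error = None
    for host in hosts:
        try:
            return connect(host)
        except ConnectionError as exc:
            last_error = exc  # remember why this host failed, try next
    raise ConnectionError(f"all hosts failed: {hosts}") from last_error
```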
Research & Analytics
Requirements:

- Historical data analysis
- Complex aggregations
- Ad-hoc queries
Recommended Approach:

- Use core tables for maximum flexibility
- Leverage ClickHouse analytical functions
- Consider data export for large analyses
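For large exports, splitting the historical range into smaller time windows lets each chunk be fetched (and retried) independently instead of issuing one huge query. A minimal stdlib sketch of that windowing, with an assumed inclusive-start / exclusive-end convention:

```python
from datetime import date, timedelta

def day_windows(start: date, end: date, days: int = 7):
    """Yield (window_start, window_end) pairs covering [start, end),
    each at most `days` long, so a large export can proceed chunk
    by chunk. Window bounds: inclusive start, exclusive end."""
    cur = start
    while cur < end:
        nxt = min(cur + timedelta(days=days), end)
        yield cur, nxt
        cur = nxt
```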
Internal Support & Collaboration
Data Team Resources
This catalog is maintained by the OnChain Software data team. For assistance with:
- New Table Requests: Submit requirements through internal development process
- Schema Changes: Coordinate with data team before modifying existing structures
- Integration Support: Reach out to infrastructure team for connection guidance
- Performance Issues: Report slow queries or optimization opportunities
Communication Channels
- Schema Changes: Coordinate through data team before implementation
- Performance Issues: Report via internal monitoring systems
- Feature Requests: Submit through product development process
- Documentation Updates: Submit pull requests to this repository
Best Practices
- Version Control: All schema and documentation changes tracked in Git
- Testing: Validate queries in development before production deployment
- Documentation: Keep table documentation current with schema changes
- Performance: Monitor query performance and optimize proactively
Troubleshooting
Common Issues
Connection Problems
- Verify ClickHouse connection settings
- Check network connectivity and firewall rules
- Validate authentication credentials
Query Performance
- Review query execution plans
- Check if appropriate indexes are being used
- Consider using materialized views for complex aggregations
- Verify time-based partitioning is being leveraged
Data Freshness
- Check blockchain synchronization status
- Verify materialized view update schedules
- Monitor data pipeline health dashboards
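A basic freshness check compares the newest ingested block timestamp against the current time. The query and table name (`eth.blocks`, `block_timestamp`) below are illustrative, and the 5-minute threshold is an assumed default, not our actual monitoring configuration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical query for the newest ingested block timestamp;
# substitute the real table from the catalog.
FRESHNESS_QUERY = "SELECT max(block_timestamp) FROM eth.blocks"

def is_stale(latest_block_time: datetime,
             max_lag: timedelta = timedelta(minutes=5)) -> bool:
    """Report whether ingestion lags more than `max_lag` behind
    now, given the newest block timestamp (timezone-aware UTC)
    returned by the warehouse."""
    return datetime.now(timezone.utc) - latest_block_time > max_lag
```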
Getting Help
- Technical Issues: Contact infrastructure team
- Data Questions: Reach out to data team
- Integration Planning: Schedule consultation with architecture team
- Documentation: Submit issues or pull requests to this repository