Build a data analysis dashboard using a Data Warehouse
Continuing the series of articles on the Data Warehouse topic, today we come to a particularly interesting subject: building a data analysis dashboard on top of the Data Warehouse. Building dashboards is one of the most basic activities in BI (Business Intelligence); it visualizes data as charts so that users can easily understand the data and draw insights from it.
You can use Chainslake, a blockchain data warehouse that lets users query blockchain data and build and share data analysis dashboards completely free of charge.
Contents
- Overview of Metabase
- Install and configure Metabase to connect to Data warehouse
- Data tables in data warehouse
- Query data, build and share dashboards
- Conclusion
I am currently consulting, designing and implementing data analysis infrastructure, Data Warehouse, Lakehouse for individuals and organizations in need. You can see and try a system I have built here. Please contact me via email: lakechain.nguyen@gmail.com. Thank you!
Overview of Metabase
Metabase is a BI tool that lets you query data with SQL on many different databases and SQL engines through plugins, and present the query results as tables and charts on analytical dashboards; you can see a demo dashboard of mine here.
Metabase has both an open-source version with basic functions that can be installed on your own infrastructure, and an Enterprise version with advanced functions that can be used directly on Metabase’s cloud or self-hosted via a license key. You can see the details here.
Install and configure Metabase to connect to Data warehouse
I will install the open-source version of Metabase via Docker and connect it to our DWH cluster via Trino (you can review my article on how to install Trino here).
First, I will create a database for Metabase in Postgres on the DWH cluster:
postgres=# CREATE DATABASE metabase;
Install Metabase via Docker (downloading the Starburst Trino driver first so Metabase can connect to Trino):
# Download the Starburst Trino driver plugin for Metabase
$ wget https://github.com/starburstdata/metabase-driver/releases/download/5.0.0/starburst-5.0.0.metabase-driver.jar
# Run Metabase, storing its application data in the Postgres database created above
$ docker run -d -p 3000:3000 --name metabase \
    --network hadoop --add-host=node01:172.20.0.2 \
    -e "MB_DB_TYPE=postgres" \
    -e "MB_DB_DBNAME=metabase" \
    -e "MB_DB_PORT=5432" \
    -e "MB_DB_USER=postgres" \
    -e "MB_DB_PASS=password" \
    -e "MB_DB_HOST=node01" \
    -v $(pwd)/starburst-5.0.0.metabase-driver.jar:/plugins/starburst-5.0.0.metabase-driver.jar \
    metabase/metabase
Note: 172.20.0.2 is the container IP of node01 on my machine; replace it with the corresponding IP on your machine.
Access http://localhost:3000 and you will see the Metabase setup screen; after creating an admin account you will be taken to Metabase's main working interface.
To connect to Trino, go to Admin settings, select the Databases tab, choose Add database and set it up as shown below:

After configuring, click Exit admin and check that data appears in Browser/Databases. You can see Chainslake's data tables here.
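Once the connection is saved, a quick way to check that Metabase can actually reach Trino is to open a new native SQL question and run a trivial query. This is just a sanity check, and it assumes the Trino catalog was selected in the connection settings above:

-- List the schemas (folders) visible through the configured Trino catalog
SHOW SCHEMAS;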
Data tables in the data warehouse
In this section, I will briefly describe the data tables in Chainslake's data warehouse, as a basis for writing queries and building dashboards in the next section.
Chainslake's data warehouse is organized into folders, each containing tables on the same topic. Below I introduce some of these folders and tables (a minimal example query follows the list):
- ethereum: Folder containing Ethereum raw data (you can learn more about the data types of the Ethereum blockchain in my previous article here)
  - transactions: Contains all transaction data on Ethereum (from the beginning to the present).
  - logs: Contains all log data (events) emitted from contracts when they execute transactions.
  - traces: Contains all internal transaction data (the smallest transaction units on Ethereum).
- address_type: Contains all addresses that have ever appeared on Ethereum, of 2 types: wallet and contract.
- ethereum_decoded: Folder containing decoded data of some protocols on Ethereum. Currently, Chainslake has decoded data for the following popular protocols:
  - erc20_evt_transfer: ERC20 token transfer data (all tokens from the past to the present).
  - erc721_evt_transfer: ERC721 NFT transfer data.
  - erc1155_evt_transferbatch, erc1155_evt_transfer_single: ERC1155 NFT transfer data.
  - uniswap_v2_evt_swap, uniswap_v3_evt_swap: Token swap data on liquidity pools.
- ethereum_contract: Folder containing contract information of some popular protocols.
  - erc20_tokens: Information about ERC20 tokens.
  - erc721_tokens: Information about ERC721 NFTs.
  - erc1155_tokens: Information about ERC1155 NFTs.
  - uniswap_v2_info, uniswap_v3_info: Information about swap pool contracts (Uniswap and other protocols with the same standard).
- ethereum_dex: Folder containing trading data of ERC20 tokens on decentralized exchanges (DEX).
  - token_trades: Trading data of ERC20 tokens against WETH on DEX exchanges.
- ethereum_prices: Folder containing token price data derived from trades in the DEX data.
  - erc20_usd_day, erc20_usd_hour, erc20_usd_minute: ERC20 token prices in USD by day, hour and minute.
- ethereum_balances: Folder containing balances and balance changes of addresses on Ethereum.
  - erc20_native: Summary table of final balances and daily balance changes of all addresses for all tokens, including the native token on Ethereum.
- binance_cex: Folder containing price data of coins traded on the Binance exchange.
  - coin_token_address: List of coins listed on Binance that have corresponding tokens on Ethereum.
  - trade_minute: Price data of trading pairs by minute on Binance.
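To make the folder and table names above concrete, here is a minimal example query, written as a sketch: the schema.table naming follows the folder list above, block_date is the partition column described in the notes below, and no other columns are assumed.

-- Count ERC20 transfer events per day over the last 7 days
SELECT block_date, COUNT(*) AS transfers
FROM ethereum_decoded.erc20_evt_transfer
WHERE block_date >= current_date - INTERVAL '7' DAY
GROUP BY block_date
ORDER BY block_date;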
Query data, build and share dashboards
Writing queries and building dashboards on Chainslake is quite simple if you already have SQL skills. I have created a dashboard in the Demo collection so you can see how things work and easily start creating your own analytical dashboards.
Some notes when writing queries:
- All tables contain an extremely large number of records (because they hold all historical data) and are partitioned by block_date, so use this column to filter and join between tables whenever possible; it helps your queries run faster and more efficiently (see the example after this list).
- Always carefully consider query optimization techniques that reduce the amount of data scanned, so that the query can finish within the allowed time; this also improves your ability to write and optimize queries.
- Query results are automatically cached for 1 day; you can change this in the configuration of each query.
- Finally, don’t forget to share your dashboards with everyone so that Chainslake can have more users.
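As an illustration of the block_date note above, here is a hedged sketch of a query that filters both sides of a join on the partition column so that only the needed partitions are scanned. Column names such as contract_address and amount are assumptions and may differ from Chainslake's actual schema, and token decimals are ignored for simplicity.

-- Estimated daily ERC20 transfer volume in USD for January 2024,
-- filtering both tables on block_date so partition pruning applies
SELECT t.block_date,
       SUM(t.amount * p.price) AS est_volume_usd
FROM ethereum_decoded.erc20_evt_transfer t
JOIN ethereum_prices.erc20_usd_day p
  ON p.contract_address = t.contract_address
 AND p.block_date = t.block_date
WHERE t.block_date BETWEEN DATE '2024-01-01' AND DATE '2024-01-31'
  AND p.block_date BETWEEN DATE '2024-01-01' AND DATE '2024-01-31'
GROUP BY t.block_date
ORDER BY t.block_date;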
Conclusion
Above is my guide to building a data analysis dashboard on a data warehouse; I hope it is useful and look forward to your support. See you in the next articles.