
· 3 min read

Sqllogictest is a program designed to verify that an SQL database engine computes correct results by comparing the results to identical queries from other SQL database engines. Sqllogictest was originally designed to test SQLite, but it is database engine neutral and can just as easily be used to test other database products.

In the Rust ecosystem, sqllogictest-rs is a very good implementation of the sqllogictest framework, and thanks to it, Databend could easily and quickly switch its sqllogictest framework from Python to Rust.

Background

  • For historical reasons, Databend's original sqllogictest framework was written in Python, as described in the RFC for sqllogictest.
  • We decided to rewrite it in rust for the following reasons:
    • By having a unified codebase written in Rust, we can improve the speed of development and iteration on our framework in the long term. This is because the entire team is proficient in Rust.
    • The previous framework lacked a strict parser at the front end, which resulted in some errors going undetected.
    • The sqllogictest-rs crate is maturing, and building Databend's new sqllogictest framework on top of it saves a lot of labor.
    • Switching from Python to Rust brings a potential performance gain. The Python version of sqllogictest had a suboptimal runtime, which slowed down CI; the Rust version delivers a much better end result, with about a 10x improvement.

Implementation

  • Databend's first version of sqllogictest does not strictly follow the sqllogictest wiki, so the format of the test files needs to be adjusted (purely mechanical work), for example (see the sketch after this list):
    • query and ---- have extra blank lines in between.
    • The comment format is different.
    • Queries with empty results, such as select ' ', 1, display the empty value as a bare blank, which can easily cause confusion throughout the test file; it needs to be displayed as (empty) instead.
    • ...
  • Databend supports three client handlers, as described in handlers: MySQL, HTTP, and ClickHouse, each of which returns content in a different format. The MySQL handler is straightforward, but the HTTP handler returns JSON and the ClickHouse handler returns TSV. For HTTP and ClickHouse, the following substitutions need to be made:
    • inf -> Infinity
    • nan -> NaN
    • \\N -> NULL
  • Isolation between test files. To maximize parallelism (multiple test files can run in parallel), different files must be isolated so that a database or table is not misused or dropped by mistake. We introduced a sandbox tenant that gives each test file its own sandbox environment, allowing files to run in parallel and greatly reducing the test time.
  • ...
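
For reference, here is a minimal sketch of what an adjusted test file looks like; the table and data are hypothetical, and the layout follows the sqllogictest conventions described above:

# comments start with '#'
statement ok
create table t(a varchar)

statement ok
insert into t values ('x'), ('')

query T
select a from t order by a
----
(empty)
x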

Unsolved Issues

What is the best way to effectively test a query that has dynamic results?

  • For example, SHOW TABLE STATUS results include the creation time of the table, which is dynamic; it is worth discussing how to test such SQL.

Conclusion

Overall, switching from the Python version to the Rust version solves the problems mentioned in the background nicely, and RIIR (Rewrite It In Rust) has its justification! For more information on using the Databend sqllogictest framework, see the README. Some future to-dos are listed in sqllogictest tracking. And finally, thanks to sqllogictest-rs for the support!

· 4 min read

Databend is a powerful cloud data warehouse. Built for elasticity and efficiency. Free and open. Also available in the cloud: https://app.databend.com .

What's New

Check out what we've done this week to make Databend even better for you.

Features & Improvements ✨

Multiple Catalogs

  • implement show tables (from|in catalog.database) (#9153)

Planner

  • introduce histogram in column statistics (#9310)

Query

  • support attaching stage for insert values (#9249)
  • add native format in fuse table (#9279)
  • add internal_enable_sandbox_tenant config and sandbox_tenant (#9277)

Sqllogictest

  • introduce rust native sqllogictest framework (#9150)

Code Refactoring 🎉

*

  • unify apply_file_format_options for copy & insert (#9323)

IO

  • remove unused code (#9266)

meta

  • test watcher count (#9324)

Planner

  • replace TableContext in planner with PlannerContext (#9290)

Bug Fixes 🔧

Base

  • try fix SIGABRT when catch unwind (#9269)
  • replace #[thread_local] to thread_local macro (#9280)

Query

  • fix unknown database in query without relation to this database (#9250)
  • fix wrong current_role when drop the role (#9276)

What's On In Databend

Stay connected with the latest news about Databend.

Introduced a Rust Native Sqllogictest Framework

Sqllogictest verifies the results returned from a SQL database engine by comparing them with the results of other engines for the same queries.

In the past, Databend ran such tests using a program written in Python and migrated a large number of test cases from other popular databases. We recently reimplemented the program with sqllogictest-rs.

Learn More

Experimental: Native Format

PA is a native storage format based on Apache Arrow. Similar to Arrow IPC, PA aims at optimizing the storage layer.

Databend is introducing PA as a native storage format in the hope of getting a performance boost, though it's still at an early stage of development.

create table tmp (a int) ENGINE=FUSE STORAGE_FORMAT='native';

Learn More

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

Checking File Existence Before Returning Presigned URL

When presigning a file, Databend currently returns a potentially valid URL based on the filename without checking if the file really exists. Thus, the 404 error might occur if the file doesn't exist at all.
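
For context, a presigned URL is requested with the PRESIGN statement; a hedged example is below, where the stage and file names are hypothetical:

PRESIGN DOWNLOAD @my_stage/books.csv;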

Issue 8702: Before return presign url add file exist judgement

Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.

ariesdevil, b41sh, BohuTANG, ClSlaid, drmingdrmer, everpcpc, leiysky, mergify[bot], PsiACE, sandflee, soyeric128, sundy-li, Xuanwo, xudong963, youngsofun, zhang2014, ZhiHanZ, zhyass

Connect With Us

We'd love to hear from you. Feel free to run the code and see if Databend works for you. Submit an issue with your problem if you need help.

DatafuseLabs Community is open to everyone who loves data warehouses. Please join the community and share your thoughts.

· 2 min read

Profiling CPU and memory for Go applications is easy and can be of great help in performance troubleshooting, for example, with flamegraphs. For Rust applications, however, the profiling requires extra work. This post explains how to use flamegraphs to visualize performance data of your CPU and memory for Databend.

To support CPU and memory profiling, some APIs need to be included in the application; Databend already includes them in its code.

CPU Profiling

To do CPU profiling, simply run the following command on the Databend server:

go tool pprof -http="0.0.0.0:8081" http://localhost:8080/debug/pprof/profile?seconds=30
  • localhost:8080: Databend management address.
  • 0.0.0.0:8081: pprof server address.
  • seconds=30: Profiling lasts for 30 seconds.

Then open the URL <your-ip>:8081/ui/flamegraph in your browser to view the flamegraph.

Memory Profiling

Compared to CPU profiling, memory profiling is a bit more involved, and can be done in the following steps:

1. Enable Memory Profiling

cargo build --bin databend-query --release --features memory-profiling

2. Start with MALLOC_CONF

MALLOC_CONF=prof:true,lg_prof_interval:30 ./target/release/databend-query
  • lg_prof_interval:30: A profile is dumped to a file each time 1 GiB (2^30 bytes) of memory has been allocated.

3. Replace addr2line with a Faster One

This will rocket your jeprof run time from 30 minutes down to 3 seconds.

git clone https://github.com/gimli-rs/addr2line
cd addr2line
cargo b --examples -r
cp ./target/release/examples/addr2line <your-addr2line-find-with-whereis-addr2line>

4. Upgrade jeprof to the Latest Version

jeprof needs an upgrade because the old version doesn't support some parameters for creating flamegraphs. jeprof is a Perl script, so the way to upgrade it is a little bit rough-and-ready.

First, find out the path of your local jeprof file:

whereis jeprof

Open and copy the latest version of jeprof, then overwrite your local copy with the copied script EXCEPT for the following two parameters:

my $JEPROF_VERSION = "5.2.1-0-gea6b3e973b477b8061e0076bb257dbd7f3faa756";
my $PPROF_VERSION = "2.0";

5. Create a Flamegraph

jeprof ./databend-query-main ./jeprof.206330.563.i563.heap --collapse | flamegraph.pl --reverse --invert --minwidth 3 > heap.svg
  • flamegraph.pl: Download from GitHub.
  • databend-query-main: Path to your executable.
  • jeprof.206330.563.i563.heap: Selects a heap file.



· 5 min read

Databend is a powerful cloud data warehouse. Built for elasticity and efficiency. Free and open. Also available in the cloud: https://app.databend.com .

What's New

Check out what we've done this week to make Databend even better for you.

Features & Improvements ✨

Multiple Catalogs

  • extends show databases SQL (#9152)

Stage

  • support select from URI (#9247)

Streaming Load

  • support file_format syntax in streaming load insert sql (#9063)

Planner

  • push down limit to union (#9210)

Query

  • use analyze table instead of optimize table statistic (#9143)
  • fast parse insert values (#9214)

Storage

  • use distinct count calculated by the xor hash function (#9159)
  • read_parquet read meta before read data (#9154)
  • push down filter to parquet reader (#9199)
  • prune row groups before reading (#9228)

Open Sharing

  • add prototype open sharing and add sharing stateful tests (#9177)

Code Refactoring 🎉

*

  • simplify the global data registry logic (#9187)

Storage

  • refactor deletion (#8824)

Build/Testing/CI Infra Changes 🔌

  • release databend deb package and databend with hive (#9138, #9241, etc.)

Bug Fixes 🔧

Format

  • support ASCII control code hex as format field delimiter (#9160)

Planner

  • prewhere_column empty and predicate is not const will return empty (#9116)
  • don't push down topk to Merge when it's child is Aggregate (#9183)
  • fix nullable column validity not equal (#9220)

Query

  • address unit test hang on test_insert (#9242)

Storage

  • too many io requests for read blocks during compact (#9128)
  • collect orphan snapshots (#9108)

What's On In Databend

Stay connected with the latest news about Databend.

Breaking Change: Unified File Format Options

To simplify, we're rolling out a set of unified file format options as follows for the COPY INTO command, the Streaming Load API, and all the other cases where users need to describe their file formats:

[ FILE_FORMAT = ( TYPE = { CSV | TSV | NDJSON | PARQUET | XML} [ formatTypeOptions ] ) ]
  • Please note that the current format options starting with format_* will be deprecated.
  • ... FORMAT CSV ... will still be accepted by the ClickHouse handler.
  • Support for customized formats created by CREATE FILE FORMAT ... will be added in a future release: ... FILE_FORMAT = (format_name = 'MyCustomCSV') ....
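
For example, a COPY INTO statement written with the unified options might look like the following sketch, where the table and stage names are hypothetical:

COPY INTO my_table
FROM @my_stage
FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);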

Learn More

Open Sharing

Open Sharing is a simple and secure data-sharing protocol designed for databend-query nodes running in a multi-cloud environment.

  • Simple & Free: Open Sharing is open-source and basically a RESTful API implementation.
  • Secure: Open Sharing verifies incoming requesters' identities and access permissions, and provides an audit log.
  • Multi-Cloud: Open Sharing supports a variety of public cloud platforms, including AWS, Azure, GCP, etc.

Learn More

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

We're about to run stage-related tests again using the Streaming Load API to move files to a stage instead of an AWS command like this:

aws --endpoint-url ${STORAGE_S3_ENDPOINT_URL} s3 cp s3://testbucket/admin/data/ontime_200.csv s3://testbucket/admin/stage/internal/s1/ontime_200.csv >/dev/null 2>&1

This is because Databend users do not need to care about, and may not even know, the stage paths that the AWS command requires.

Issue 8528: refactor stage related tests

Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.

ariesdevil, b41sh, BohuTANG, Chasen-Zhang, ClSlaid, dantengsky, drmingdrmer, hantmac, lichuang, mergify[bot], PsiACE, RinChanNOWWW, soyeric128, sundy-li, wubx, Xuanwo, xudong963, youngsofun, ZhiHanZ, zhyass, zzzdong

Connect With Us

We'd love to hear from you. Feel free to run the code and see if Databend works for you. Submit an issue with your problem if you need help.

DatafuseLabs Community is open to everyone who loves data warehouses. Please join the community and share your thoughts.

· 4 min read

When working with Databend, you don't need to bother maintaining indexes yourself. Databend takes advantage of the following indexing techniques to automatically build and manage indexes on the fly:

  • Min/Max index
  • Bloom index
  • Cluster key

Min/Max Index

Min/Max Index is the key indexing technique for OLAP databases. Databend Fuse Engine uses it as the main indexing method to build indexes and store them in snapshots, segments, and blocks. The following shows how the Min/Max Index works for a table in Databend.

First, use SHOW CREATE TABLE to find the initial snapshot file created for the table:

show create table ontime(
`Year` INT, -- First column
...
) ENGINE=FUSE SNAPSHOT_LOCATION='1/458/_ss/71b460c61fa943d1a391d3118ebd984c_v1.json'

Download and open the snapshot file with VS Code:

{
"format_version": 1,
"snapshot_id": "71b460c6-1fa9-43d1-a391-d3118ebd984c",
"timestamp": "2022-11-29T03:44:03.419194Z",
"prev_snapshot_id": null,
"schema": {
"fields": [
... -- Field definitions
],
"metadata": {}
},
"summary": {
"row_count": 90673588,
"block_count": 200,
"perfect_block_count": 0,
"uncompressed_byte_size": 65821591614,
"compressed_byte_size": 2761791374,
"index_size": 1194623,
"col_stats": {
...
"0": { -- Min/Max indexes for the first column 'Year' in the table
"min": {
"Int64": 1987
},
"max": {
"Int64": 2004
},
"null_count": 0,
"in_memory_size": 362694352,
"distinct_of_values": 0
},
...
}
},
"segments": [
...
[
"1/458/_sg/ddccbb022ba74387be0b41eefd16bbbe_v1.json",
1
],
...
],
"cluster_key_meta": null
}

The file above indicates that the min value of the first column is 1987 and the max is 2004. The indexes in a snapshot file can tell you whether the data you want to retrieve exists in the table. For example, no data would be returned for the following query if Databend cannot find a matching Min/Max interval in all snapshots:

select avg(DepDelay) from ontime where Year='2003';

Databend Fuse Engine stores the most important indexes in segment files. At the end of a snapshot file, you can find information about which segments are related to the snapshot. Here's a sample segment file:

{
"format_version":1,
"blocks":[
{ -- block ...
...
"row_count": 556984,
"block_size": 405612604,
"file_size": 25302413,
"col_stats": {
...
"0": {
"min": {
"Int64": 2003
},
"max": {
"Int64": 2003
},
"null_count": 0,
"in_memory_size": 2227936,
"distinct_of_values": 1
},
...
},
"col_metas": {
-- Used to record the start position and length of each column
},
"cluster_stats": null,
"location": [
"1/458/_b/e4f3795c79004f22b80ed5ee821edf23_v0.parquet",
0
],
"bloom_filter_index_location": [
"1/458/_i_b_v2/e4f3795c79004f22b80ed5ee821edf23_v2.parquet",
2
],
"bloom_filter_index_size": 60207,
"compression": "Lz4Raw"
...
}
],
"summary": {
"row_count": 11243809,
"block_count": 25,
"perfect_block_count": 25,
"uncompressed_byte_size": 8163837349,
"compressed_byte_size": 339392734,
"index_size": 1200133,
"col_stats": {
...
"0": {
"min": {
"Int64": 1988
},
"max": {
"Int64": 2003
},
"null_count": 0,
"in_memory_size": 44975236,
"distinct_of_values": 0
},
...
}
}
}

From the sample above, we can see that a segment file contains its own Min/Max index information, and so does a block file. The Min/Max indexes are layered and distributed among snapshots, segments, and blocks.


When retrieving data for a query, Databend starts from the snapshot indexes and locates the corresponding segment by matching the Min/Max interval. Then, it looks up the indexes in the segment file to find the block where the required data is stored, and reads data from the block file using the start positions recorded in col_metas. In short, Databend processes a query by finding the right segments and blocks with the Min/Max Index.

Bloom Index

For queries requiring an exact string match, Databend uses the Min/Max Index to find the right block first, and then locates the offsets with the bloom index information in bloom_filter_index_location to retrieve data from the block.

For more information about the Bloom Index, see https://databend.rs/blog/xor-filter.

Cluster Key

The Min/Max Index seems to work perfectly for Databend, but in fact, data is usually written into a table out of order. As a result, segments and blocks might be created with overlapped Min/Max intervals.

For example, a query with a condition like Age = 20 & Age = 35 might need to access up to three Parquet files. If Age is set as the cluster key, Databend will sort the data by the Age column and combine as many small Parquet files as possible.
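
As a sketch of the scenario above (assuming a table named t with an Age column), defining the cluster key looks like this:

-- Sort data by Age so that blocks have minimal overlapping Min/Max intervals.
ALTER TABLE t CLUSTER BY (Age);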


For more information about the cluster key, see https://databend.rs/doc/sql-commands/ddl/clusterkey/.

· 4 min read

Databend is a powerful cloud data warehouse. Built for elasticity and efficiency. Free and open. Also available in the cloud: https://app.databend.com .

What's New

Check out what we've done this week to make Databend even better for you.

Features & Improvements ✨

Planner

  • optimize topk in cluser mode (#9092)

Query

  • support select * exclude [column_name | (col_name, col_name,...)] (#9009)
  • alter table flashback (#8967)
  • new table function read_parquet to read parquet files as a table (#9080)
  • support select * from @stage (#9123)

Storage

  • cache policy (#9062)
  • support hive nullable partition (#9064)

Code Refactoring 🎉

Memory Tracker

  • keep tracker state consistent (#8973)

REST API

  • drop ctx after query finished (#9091)

Bug Fixes 🔧

Configs

  • add more tests for hive config loading (#9074)

Planner

  • try to fix table name case sensibility (#9055)

Functions

  • vector_const like bug fix (#9082)

Storage

  • update last_snapshot_hint file when purge (#9060)

Cluster

  • try fix broken pipe or connect reset (#9104)

What's On In Databend

Stay connected with the latest news about Databend.

RESTORE TABLE

Using the snapshot ID or timestamp you specify in the command, Databend restores the table to the prior state in which the snapshot was created. To retrieve the snapshot IDs and timestamps of a table, use FUSE_SNAPSHOT.

-- Restore with a snapshot ID
ALTER TABLE <table> FLASHBACK TO (SNAPSHOT => '<snapshot-id>');
-- Restore with a snapshot timestamp
ALTER TABLE <table> FLASHBACK TO (TIMESTAMP => '<timestamp>'::TIMESTAMP);
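
To retrieve the snapshot IDs and timestamps mentioned above, a query along these lines can be used; the database and table names are placeholders:

SELECT snapshot_id, timestamp FROM FUSE_SNAPSHOT('<database>', '<table>');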

Learn More

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

Adding Build Information to Error Report

An error report currently only contains an error code and some information about why the error occurred. When build information is available, troubleshooting will become easier.

"Code: xx. Error: error msg... (version ...)"

Issue 9117: Add Build Information to the Error Report

Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.

andylokandy, b41sh, BohuTANG, dantengsky, drmingdrmer, everpcpc, lichuang, mergify[bot], PsiACE, RinChanNOWWW, sandflee, soyeric128, sundy-li, TCeason, Xuanwo, xudong963, youngsofun, zhang2014, ZhiHanZ

Connect With Us

We'd love to hear from you. Feel free to run the code and see if Databend works for you. Submit an issue with your problem if you need help.

DatafuseLabs Community is open to everyone who loves data warehouses. Please join the community and share your thoughts.

· 5 min read

Databend is a powerful cloud data warehouse. Built for elasticity and efficiency. Free and open. Also available in the cloud: https://app.databend.com .

What's New

Check out what we've done this week to make Databend even better for you.

Features & Improvements ✨

Format

  • better checking of format options (#8981)
  • add basic schema infer for parquet (#9043)

Query

  • QualifiedName support 'db.table.' and 'table.' (#8965)
  • support bulk insert without exprssion (#8966)

Storage

  • add cache layer for fuse engine (#8830)
  • add system table system.memory_statistics (#8945)
  • add optimize statistic ddl support (#8891)

Code Refactoring 🎉

Base

  • remove common macros (#8936)

Format

  • TypeDeserializer get rid of FormatSetting (#8950)

Planner

  • refactor extract or predicate (#8951)

Processors

  • optimize join by merging build data block (#8961)

New Expression

  • allow sparse column id in chunk, redo #8789 with a new approach. (#9008)

Documentation 📔

Bug Fixes 🔧

Base

  • try fix lost tracker (#8932)

Meta

  • fix share db bug, create DatabaseIdToName if need (#9006)

Mysql handler

  • fix mysql conns leak (#8894)

Processors

  • try fix update list memory leak (#9023)

Storage

  • read and write block in parallel when compact (#8921)

What's On In Databend

Stay connected with the latest news about Databend.

Infer Schema at a Glance

You usually need to create a table before loading data from a file stored on a stage or elsewhere. Unfortunately, sometimes you might not know the file's schema when creating the table, or you might be unable to type out the schema due to its complexity.

Introducing the capability to infer schema from an existing file will make the work much easier. You will even be able to query data directly from a stage using a SELECT statement like select * from @my_stage.

INFER 's3://mybucket/data.csv' FILE_FORMAT = ( TYPE = CSV );
+-------------+---------+----------+
| COLUMN_NAME | TYPE | NULLABLE |
|-------------+---------+----------|
| CONTINENT | TEXT | True |
| COUNTRY | VARIANT | True |
+-------------+---------+----------+

We've added support for inferring the basic schema from parquet files in #9043, and we're now working on #7211 to implement select from @stage.

Learn More

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

Add Tls Support for Mysql Handler

opensrv-mysql v0.3.0 that was released recently includes support for TLS. It sounds like a good idea to introduce it to Databend.

let (is_ssl, init_params) = opensrv_mysql::AsyncMysqlIntermediary::init_before_ssl(
&mut shim,
&mut r,
&mut w,
&Some(tls_config.clone()),
)
.await
.unwrap();

opensrv_mysql::secure_run_with_options(shim, w, ops, tls_config, init_params).await

Issue 8983: Feature: tls support for mysql handler

Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.

andylokandy, ariesdevil, b41sh, BohuTANG, dantengsky, drmingdrmer, everpcpc, flaneur2020, leiysky, lichuang, mergify[bot], PsiACE, sandflee, soyeric128, sundy-li, TCeason, TracyZYJ, Xuanwo, xudong963, youngsofun, yufan022, zhang2014, zhyass

Connect With Us

We'd love to hear from you. Feel free to run the code and see if Databend works for you. Submit an issue with your problem if you need help.

DatafuseLabs Community is open to everyone who loves data warehouses. Please join the community and share your thoughts.

· 2 min read

Background

Databend now supports the Hive catalog to run Hive queries. This post shows how to set up a Databend-Hive environment and run Hive SQL.

How to set up a Databend-Hive cluster

HiveServer, Hive Metastore, and HDFS are supposed to be pre-installed.

  1. Download a Databend release with Hive support, or build it from source:
## make sure JAVA_HOME is set
export JAVA_HOME=/path/to/java
export LD_LIBRARY_PATH=${JAVA_HOME}/lib/server:${LD_LIBRARY_PATH}
cargo build --features hive,storage-hdfs
  2. Set up a Databend cluster; refer to deploying-databend.
  3. Add the Hive catalog and HDFS storage to databend-query.toml:
[storage]
type = "hdfs"

[storage.hdfs]
# hdfs namenode address,such as 127.0.0.1:8020
name_node = "xx"
root = ""

[catalogs.hive]
type = "hive"
# hive metastore address, such as 127.0.0.1:9083
address = "xx"

  4. Run databend-query with the Java and Hadoop environment configured:
export HADOOP_HOME=xxx
export JAVA_HOME=xxx # e.g. /Library/Java/JavaVirtualMachines/openjdk-11.jdk/Contents/Home
export LD_LIBRARY_PATH=$JAVA_HOME/lib/server:$LD_LIBRARY_PATH

./bin/databend-query -c ./databend-query.toml > query.log 2>&1 &
  5. Set up Hive-related settings with the MySQL client:
set global sql_dialect = 'hive';

Suggested settings:

-- for chinese users
set global timezone = 'Asia/Shanghai';
set global max_execute_time = 180000;

-- support hive nvl function
create FUNCTION nvl as (a,b) -> ifnull(a,b);
  6. Query Hive data using the MySQL client or a MySQL JDBC client. Note: Hive tables must be referred to as hive.db.table.
select * from hive.$db.$table limit 10;

Limitations

  1. Only Parquet tables are supported; ORC and TXT are not.
  2. The struct, map, and decimal Hive data types are not supported.
  3. Only Hive SELECT queries are supported; DDL, INSERT, and other DML statements are not.
  4. Hive UDFs are not supported, and Hive functions have limited support.

The Hive feature is now in beta, so please feel free to report bugs and suggestions in Databend issues.

· 4 min read

Many optimizations are usually required for big data analytics to "reduce the distance to data." For example, using the Bloom Filter Index, queries can be filtered to determine whether data should be fetched from backend storage.

Why We Replaced the Bloom Filter

Most popular databases use Bloom Filters to handle equality queries and avoid unnecessary data reads. Databend also used the classic Bloom Filter algorithm in the first version (databend#6639). However, we found that the Bloom Filter Index required plenty of storage space, which sometimes even exceeded the size of the data itself (Databend automatically created Bloom indexes for some data types to make them easier for users to work with). Moreover, the Bloom Filter Index didn't show a significant performance improvement, because reading it is not much cheaper than reading the data directly from storage.

The reason is that the Bloom Filter does not know the cardinality of the data when it is generated. Take the Boolean type as an example: the algorithm allocates space for it without considering that the cardinality is only 2 (True or False).
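
As a rough illustration (this is the standard Bloom filter sizing formula, not something Databend-specific), the size of a classic Bloom filter depends only on the number of inserted keys n and the target false-positive rate p, regardless of how many distinct values the column actually has:

m \approx -\frac{n \ln p}{(\ln 2)^2} \ \text{bits}

So a Boolean column with only two distinct values still pays for a filter sized by its row count, which is where the wasted space comes from.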

As a result, the Databend community began exploring new solutions and determined a feasible solution that uses HyperLoglog to sort out distinct values before allocating space.

At the TiDB User Conference on a Saturday in September, I met XP (@drmingdrmer) and talked about the solution with him again. He came up with the idea of using a Trie to solve the problem: an excellent idea, but a lot of work.

XP is a master of Tries, so I'm sure the implementation would be no big deal for him. But I think some existing technologies might be able to help.


Why Xor Filter?

A few explorations later, the Xor Filter algorithm, proposed by Daniel Lemire and his team in 2019 in Xor Filters: Faster and Smaller Than Bloom Filters (https://lemire.me/blog/2019/12/19/xor-filters-faster-and-smaller-than-bloom-filters/), caught my attention.


I did a test (Xor Filter Bench) with the Rust version (xorfilter) and got a very positive result, so we replaced Bloom Filters with Xor Filters in databend#7860. Let's run a test and see how it performs with Xor Filters.

u64: 
xor bitmap encode:1230069 bytes, raw:8000000 bytes, ratio:0.15375863

bool:
xor bitmap encode:61 bytes, raw:1000000 bytes, ratio:0.000061

string:
xor bitmap encode:123067 bytes, raw:3000000 bytes, ratio:0.041022334

100000 records of the same key:
xor bitmap encode: 61 bytes, raw:3000000 bytes, ratio:0.000020333333

Test Environment

Databend: v0.8.122-nightly, single node
VM: 32 vCPU, 32 GiB (Cloud VM)
Object Storage: S3
Dataset: 10 billion records, 350G Raw Data, Xor Filter Index 700MB, all indexes and data are stored in object storage.
Table:

mysql> desc t10b;
+-------+-----------------+------+---------+-------+
| Field | Type | Null | Default | Extra |
+-------+-----------------+------+---------+-------+
| c1 | BIGINT UNSIGNED | NO | 0 | |
| c2 | VARCHAR | NO | | |
+-------+-----------------+------+---------+-------+

Deploying Databend

Step 1: Download installation package

wget https://github.com/datafuselabs/databend/releases/download/v0.8.122-nightly/databend-v0.8.122-nightly-x86_64-unknown-linux-musl.tar.gz 
tar zxvf databend-v0.8.122-nightly-x86_64-unknown-linux-musl.tar.gz

You can find the following content after extracting the package:

tree
.
├── bin
│ ├── databend-meta
│ ├── databend-metabench
│ ├── databend-metactl
│ └── databend-query
├── configs
│ ├── databend-meta.toml
│ └── databend-query.toml
├── readme.txt
└── scripts
├── start.sh
└── stop.sh

Step 2: Start Databend Meta

./bin/databend-meta -c configs/databend-meta.toml

Step 3: Configure Databend Query

For more information, refer to https://databend.rs/doc/deploy/deploying-databend

vim configs/databend-query.toml
... ...

[meta]
endpoints = ["127.0.0.1:9191"]
username = "root"
password = "root"
client_timeout_in_second = 60
auto_sync_interval = 60

# Storage config.
[storage]
# fs | s3 | azblob | obs
type = "s3"

# To use S3-compatible object storage, uncomment this block and set your values.
[storage.s3]
bucket = "<your-bucket-name>"
endpoint_url = "<your-s3-endpoint>"
access_key_id = "<your-key>"
secret_access_key = "<your-access-key>"

Step 4: Start Databend Query

./bin/databend-query -c configs/databend-query.toml

Step 5: Construct the test data set

mysql -uroot -h127.0.0.1 -P3307

Construct 10 billion rows of test data (time: 16 min 0.41 sec):

create table t10b as select number as c1, cast(rand() as string) as c2 from numbers(10000000000)

Run a query (no cache, all data and indexes are in object storage):

mysql> select * from t10b where  c2='0.6622377673133426';
+-----------+--------------------+
| c1 | c2 |
+-----------+--------------------+
| 937500090 | 0.6622377673133426 |
+-----------+--------------------+
1 row in set (20.57 sec)
Read 40000000 rows, 1009.75 MiB in 20.567 sec., 1.94 million rows/sec., 49.10 MiB/sec.

With the Xor Filter Index, it takes about 20 seconds for a single-node Databend to complete a point query on a scale of 10 billion rows. You can also speed up point queries by expanding a single Databend node into a cluster. Refer to https://databend.rs/doc/deploy/expanding-to-a-databend-cluster#deploying-a-new-query-node for details.

References

[1] Arxiv: Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters

[2] Daniel Lemire’s blog: Xor Filters: Faster and Smaller Than Bloom Filters

[3] Databend, Cloud Lakehouse: https://github.com/datafuselabs/databend

· 4 min read

Life is a journey; sometimes you must go back to move forward. When working with databases, you need to access the data history now and then. Time Travel is one of the most valuable features that Databend has rolled out. The Time Travel feature acts as a data recovery utility that enables you to restore a dropped table or get back a previous version of your data in a table. For example, when you accidentally delete a table or update some rows by mistake, you will need help from Time Travel. This post sheds some light on what you can do with Time Travel in Databend.

First things first: not all historical data can be restored, due to the Databend retention policy. The default retention period is 24 hours, which means you can restore historical data within 24 hours after it is deleted or outdated.

If you run SHOW TABLES HISTORY against a database, you will find the dropped tables (if any) and their drop time. The command does not list the dropped tables that have passed their retention period.

Restore a Dropped Table

When you delete a file from your computer, the file goes to the trash bin and you can restore the file by putting it back to its original folder.

In Databend, restoring a table is as easy as restoring a file from the trash bin. The UNDROP TABLE command makes a dropped table available again with the data of the latest version. The "latest version" means that Databend recovers both the table and the data it was holding when you deleted it.
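
A minimal sketch of the restore flow, using a hypothetical table name:

-- Oops.
DROP TABLE books;

-- List dropped tables that are still within the retention period.
SHOW TABLES HISTORY;

-- Bring the table back with its latest data.
UNDROP TABLE books;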

Query Old Data

This is the most glorious part of the Time Travel story in Databend. When we say that you can get back a previous version of your data in a table, it does not mean that you roll back your table to an earlier point in time; it shows you the table's data at that point instead.

Databend automatically takes and saves a snapshot of your tables after each transaction that updates your table data. A version of a table's data practically refers to a snapshot that saves the data of the table when the snapshot was taken.

Databend provides a system function named FUSE_SNAPSHOT that enables you to find the saved snapshots. Each snapshot comes with a snapshot ID and a timestamp.

The saved snapshots are the behind-the-scenes heroes that make time travel come true. So when you try to get back your historical data, you need to tell Databend which version you want by specifying the snapshot ID or the timestamp in an AT clause in the SELECT statement.
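
For example (the table name, snapshot ID, and timestamp below are placeholders):

-- Find the saved snapshots of a table.
SELECT snapshot_id, timestamp FROM FUSE_SNAPSHOT('default', 'my_table');

-- Query the table's data as of a specific snapshot or timestamp.
SELECT * FROM my_table AT (SNAPSHOT => '<snapshot-id>');
SELECT * FROM my_table AT (TIMESTAMP => '<timestamp>'::TIMESTAMP);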

Create a New Table from Old Data

The Time Travel feature makes it possible to create an "old" table, which means you can create a table that holds a previous version of your data and move on from there.

The CREATE TABLE statement can include a SNAPSHOT_LOCATION clause that allows you to specify a snapshot file that holds your old data. This command enables you to insert the data stored in the snapshot file when you create a table. Please note that the table you create must have the same column definitions as the data from the snapshot.
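
A sketch of what this looks like; the column definitions and snapshot path are placeholders and must match the snapshot's data:

CREATE TABLE my_table_restored (
  a INT,
  b VARCHAR
) SNAPSHOT_LOCATION = '<snapshot-file-path>';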

Go without Time Travel

Tables in Databend support Time Travel out of the box. However, you might not need it in some cases, for example, when you're running low on storage space or the data is big but unimportant. Databend currently does not provide a setting to switch it off, but you can CREATE TRANSIENT TABLE.

Transient tables are used to hold transitory data that does not require a data protection or recovery mechanism. Databend does not hold historical data for a transient table, so you will not be able to query a previous version of a transient table with the Time Travel feature; for example, the AT clause in the SELECT statement will not work for transient tables. Please note that you can still drop and undrop a transient table.
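
A minimal sketch of creating such a table; the table and column names are hypothetical:

CREATE TRANSIENT TABLE visits (
  visitor_id BIGINT,
  visited_at TIMESTAMP
);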

· 6 min read

JSON (JavaScript Object Notation) is a commonly used semi-structured data type. With the self-describing schema structure, JSON can hold all data types, including multi-level nested data types, such as Array, Object, etc. JSON takes advantage of high flexibility and easy dynamic expansion compared with the structured data types that must strictly follow the fields in a tabular data structure.

As data volumes have increased rapidly in recent years, many platforms have started to use and get the most out of semi-structured data types (such as JSON). Examples include the JSON data shared by various platforms through open interfaces, as well as public datasets and application logs stored in JSON format.

Databend supports structured data types, as well as JSON. This post dives deeply into the JSON data type in Databend.

Working with JSON in Databend

Databend stores semi-structured data as the VARIANT (also called JSON) data type:

CREATE TABLE test
(
id INT32,
v1 VARIANT,
v2 JSON
);

JSON data needs to be generated by calling the parse_json or try_parse_json function. The input string must be in standard JSON format, including Null, Boolean, Number, String, Array, and Object. If parsing fails because of an invalid string, the parse_json function returns an error, while the try_parse_json function returns a NULL value.

INSERT INTO test VALUES
(1, parse_json('{"a":{"b":1,"c":[1,2]}}'), parse_json('[["a","b"],{"k":"a"}]')),
(2, parse_json('{"a":{"b":2,"c":[3,4]}}'), parse_json('[["c","d"],{"k":"b"}]'));

SELECT * FROM test;
+----+-------------------------+-----------------------+
| id | v1 | v2 |
+----+-------------------------+-----------------------+
| 1 | {"a":{"b":1,"c":[1,2]}} | [["a","b"],{"k":"a"}] |
| 2 | {"a":{"b":2,"c":[3,4]}} | [["c","d"],{"k":"b"}] |
+----+-------------------------+-----------------------+

JSON usually holds data of Array or Object type. Due to the nested hierarchical structure, the internal elements can be accessed through JSON PATH. The syntax supports the following delimiters:

  • :: Colon can be used to obtain the elements in an object by the key.

  • .: Dot can be used to obtain the elements in an object by the key. Do NOT use a dot as the first delimiter in a statement, or Databend would consider the dot as the delimiter to separate the table name from the column name.

  • []: Brackets can be used to obtain the elements in an object by the key or the elements in an array by the index.

You can mix the three types of delimiters above.

SELECT v1:a.c, v1:a['b'], v1['a']:c, v2[0][1], v2[1].k FROM test;

+--------+-----------+-----------+----------+---------+
| v1:a.c | v1:a['b'] | v1['a']:c | v2[0][1] | v2[1].k |
+--------+-----------+-----------+----------+---------+
| [1,2] | 1 | [1,2] | "b" | "a" |
| [3,4] | 2 | [3,4] | "d" | "b" |
+--------+-----------+-----------+----------+---------+

The internal elements extracted through JSON PATH are also of JSON type, and they can be converted to basic types through the cast function or using the conversion operator ::.

SELECT cast(v1:a.c[0], int64), v1:a.b::int32, v2[0][1]::string FROM test;

+--------------------------+---------------+------------------+
| cast(v1:a.c[0] as int64) | v1:a.b::int32 | v2[0][1]::string |
+--------------------------+---------------+------------------+
| 1 | 1 | b |
| 3 | 2 | d |
+--------------------------+---------------+------------------+

Parsing JSON from GitHub

Many public datasets are stored in JSON format. We can import these data into Databend for parsing. The following introduction uses the GitHub events dataset as an example.

The GitHub events dataset (downloaded from GH Archive) uses the following JSON format:

{
"id":"23929425917",
"type":"PushEvent",
"actor":{
"id":109853386,
"login":"teeckyar-bot",
"display_login":"teeckyar-bot",
"gravatar_id":"",
"url":"https://api.github.com/users/teeckyar-bot",
"avatar_url":"https://avatars.githubusercontent.com/u/109853386?"
},
"repo":{
"id":531248561,
"name":"teeckyar/Times",
"url":"https://api.github.com/repos/teeckyar/Times"
},
"payload":{
"push_id":10982315959,
"size":1,
"distinct_size":1,
"ref":"refs/heads/main",
"head":"670e7ca4085e5faa75c8856ece0f362e56f55f09",
"before":"0a2871cb7e61ce47a6790adaf09facb6e1ef56ba",
"commits":[
{
"sha":"670e7ca4085e5faa75c8856ece0f362e56f55f09",
"author":{
"email":"support@teeckyar.ir",
"name":"teeckyar-bot"
},
"message":"1662804002 Timehash!",
"distinct":true,
"url":"https://api.github.com/repos/teeckyar/Times/commits/670e7ca4085e5faa75c8856ece0f362e56f55f09"
}
]
},
"public":true,
"created_at":"2022-09-10T10:00:00Z",
"org":{
"id":106163581,
"login":"teeckyar",
"gravatar_id":"",
"url":"https://api.github.com/orgs/teeckyar",
"avatar_url":"https://avatars.githubusercontent.com/u/106163581?"
}
}

From the data above, we can see that the actor, repo, payload, and org fields have a nested structure and can be stored as JSON. Others can be stored as basic data types. So we can create a table like this:

CREATE TABLE `github_data`
(
`id` VARCHAR,
`type` VARCHAR,
`actor` JSON,
`repo` JSON,
`payload` JSON,
`public` BOOLEAN,
`created_at` timestamp,
`org` json
);

Use the COPY INTO command to load the data:

COPY INTO github_data
FROM 'https://data.gharchive.org/2022-09-10-10.json.gz'
FILE_FORMAT = (
compression = auto
type = NDJSON
);

The following code returns the top 10 projects with the most commits:

SELECT   repo:name,
count(id)
FROM github_data
WHERE type = 'PushEvent'
GROUP BY repo:name
ORDER BY count(id) DESC
LIMIT 10;

+----------------------------------------------------------+-----------+
| repo:name | count(id) |
+----------------------------------------------------------+-----------+
| "Lombiq/Orchard" | 1384 |
| "maique/microdotblog" | 970 |
| "Vladikasik/statistic" | 738 |
| "brokjad/got_config" | 592 |
| "yanonono/booth-update" | 537 |
| "networkoperator/demo-cluster-manifests" | 433 |
| "kn469/web-clipper-bed" | 312 |
| "ufapg/jojo" | 306 |
| "bj5nj7oh/bj5nj7oh" | 291 |
| "appseed-projects2/500f32d3-8019-43ee-8f2a-a273163233fb" | 247 |
+----------------------------------------------------------+-----------+

The following code returns the top 10 users with the most forks:

SELECT   actor:login,
count(id)
FROM github_data
WHERE type='ForkEvent'
GROUP BY actor:login
ORDER BY count(id) DESC
LIMIT 10;

+-----------------------------------+-----------+
| actor:login | count(id) |
+-----------------------------------+-----------+
| "actions-marketplace-validations" | 191 |
| "alveraboquet" | 59 |
| "ajunlonglive" | 50 |
| "Shutch420" | 13 |
| "JusticeNX" | 13 |
| "RyK-eR" | 12 |
| "DroneMad" | 10 |
| "UnqulifiedEngineer" | 9 |
| "PeterZs" | 8 |
| "lgq2015" | 8 |
+-----------------------------------+-----------+

Performance Optimization

JSON data is generally saved in plain-text format and needs to be parsed into a serde_json::Value enum every time the data is read. Compared to other basic data types, handling JSON data takes more parsing time and more memory.

Databend has improved the read performance of JSON data using the following methods:

  • To speed up the parsing and reduce memory usage, Databend stores the JSON data as JSONB in binary format and uses the built-in j_entry structure to hold data type and offset position of each element.

  • Adding virtual columns to speed up the queries. Databend extracts the frequently queried fields and the fields of the same data type and stores them as separate virtual columns. Data will be directly read from the virtual columns when querying, which makes Databend achieve the same performance as querying other basic data types.

· 2 min read

The most impressive part of the movie Spider-Man: No Way Home is "three generations coming together". In the story, when Spider-Man's friend repeats the magic words "Find Peter Parker!", we surprisingly see two old friends on the screen: the previous generations of Spider-Man. They travel from other universes to join forces and develop cures for the villains.

Did you know that you have a similar magic power in Databend? That is, you can always get back the previous versions of your data in a few simple steps whenever you need them. The secret is that Databend automatically creates and saves timestamped snapshots of your tables whenever a data update occurs, so you can track the history of a table and see what it looked like at a point in time in the past.

The following code creates a table first, and then inserts values with three separate SQL statements:

create table spiderman(gen int, nickname varchar);

insert into spiderman values(1,'Peter-1');
insert into spiderman values(2,'Peter-2');
insert into spiderman values(3,'Peter-3');

Databend creates and saves three snapshots for the code above. Each one holds a historical version of the data in the table.

To find them, use the system function FUSE_SNAPSHOT. The function returns everything you may need to know about the saved snapshots of a table, such as the snapshot IDs, timestamps, and locations.

select snapshot_id,previous_snapshot_id, timestamp from fuse_snapshot('default','spiderman');

---
+----------------------------------+----------------------------------+----------------------------+
| snapshot_id | previous_snapshot_id | timestamp |
+----------------------------------+----------------------------------+----------------------------+
| 34b8df220edc4d8cb9e3e76118788686 | 4bb479751b7144d8aa2b53e5b281453f | 2022-08-30 01:18:53.202724 |
| 4bb479751b7144d8aa2b53e5b281453f | a2801ed9656d42c9812f2921214f0795 | 2022-08-30 01:18:35.597615 |
| a2801ed9656d42c9812f2921214f0795 | NULL | 2022-08-30 01:18:21.750208 |
+----------------------------------+----------------------------------+----------------------------+

You can now query the history data with a snapshot or timestamp by including the AT clause in the SELECT statement:

select * from spiderman at(snapshot=>'a2801ed9656d42c9812f2921214f0795');

---
+------+----------+
| gen | nickname |
+------+----------+
| 1 | Peter-1 |
+------+----------+
select * from spiderman at(timestamp=>'2022-08-30 01:18:21.750208'::timestamp);

---
+------+----------+
| gen | nickname |
+------+----------+
| 1 | Peter-1 |
+------+----------+

The "magic" explained above is part of the powerful Time Travel feature of Databend that enables you to query, back up, or restore from a specified historical version of your data. That's not all about the feature. You can do more with the snapshots to make your work easier. Join the Databend community to find out more "magic tricks".

· 7 min read

Hello, everyone! I'm Xuanwo. Today, on behalf of the Databend community, I would like to announce the official release of v0.8.

Development of Databend v0.8 started on March 28th, with 5000+ commits and 4600+ file changes. In the last 5 months, the community of 120+ contributors added 420K lines of code and removed 160K lines, equivalent to rewriting Databend once. In this release, the community made significant improvements to the SQL Planner framework and migrated all SQL statements to the new Planner, providing full JOIN and subquery support.

Click here to download Databend v0.8

Let's see what has been done in v0.8.

What's Databend?

Databend is a modern cloud data warehouse based on Rust that enables high-performance, elastic and scalable real-time data analysis and activates the data potential of users.


Significant improvements

New Planner: JOIN! JOIN! JOIN!

To better support complex SQL queries and improve user experience, Databend v0.8 is designed with a new Planner framework.

Databend has added JOIN and proper subquery support, driven by New Planner.

select vip_info.Client_ID, vip_info.Region 
from vip_info right
join purchase_records
on vip_info.Client_ID = purchase_records.Client_ID;

New Parser: The Best Parser!

While refactoring the Planner, the Databend community implemented a new nom-based Parser that balances development efficiency with user experience.

The new Parser makes it easy for developers to design, develop, and test complex SQL syntax in an intuitive way:

COPY
~ INTO ~ #copy_unit
~ FROM ~ #copy_unit
~ ( FILES ~ "=" ~ "(" ~ #comma_separated_list0(literal_string) ~ ")")?
~ ( PATTERN ~ "=" ~ #literal_string)?
~ ( FILE_FORMAT ~ "=" ~ #options)?
~ ( VALIDATION_MODE ~ "=" ~ #literal_string)?
~ ( SIZE_LIMIT ~ "=" ~ #literal_u64)?

It also gives the user specific and precise information about the error.

MySQL [(none)]> select number from numbers(10) as t inner join numbers(30) as t1 using(number);
ERROR 1105 (HY000): Code: 1065, displayText = error:
--> SQL:1:8
|
1 | select number from numbers(10) as t inner join numbers(30) as t1 using(number)
| ^^^^^^ column reference is ambiguous

No more worrying about not knowing what's wrong with SQL. Visit The New Databend SQL Planner for more information.

New Features

In addition to the newly designed Planner, the Databend community has implemented a number of new features.

COPY Enhancement

COPY capabilities have been greatly enhanced, and Databend can now:

  • Copy data from any supported storage service (even https!)
COPY 
INTO ontime200
FROM 'https://repo.databend.rs/dataset/stateful/ontime_2006_[200-300].csv'
FILE_FORMAT = (TYPE = 'CSV')
  • Support for copying compressed files
COPY 
INTO ontime200
FROM 's3://bucket/dataset/stateful/ontime.csv.gz'
FILE_FORMAT = (TYPE = 'CSV' COMPRESSION=AUTO)
  • UNLOAD data to any supported storage service
COPY 
INTO 'azblob://bucket/'
FROM ontime200
FILE_FORMAT = (TYPE = 'PARQUET')

Hive Support

Databend v0.8 designed and developed the Multi Catalog and implemented Hive Metastore support on top of it!

Databend can now interface directly to Hive and read data from HDFS.

select * from hive.default.customer_p2 order by c_nation;

Time Travel

A long time ago, the Databend community shared an implementation of the underlying FUSE Engine, From Git to Fuse Engine, where one of the most important features was the support for time travel, allowing us to query data tables at any point in time.

Starting from v0.8, this feature is officially available, and we can now:

  • Query the data table for a specified time
-- Travel to the time when the last row was inserted
select * from demo at (TIMESTAMP => '2022-06-22 08:58:54.509008'::TIMESTAMP);
+----------+
| c |
+----------+
| batch1.1 |
| batch1.2 |
| batch2.1 |
+----------+
  • Recover mistakenly deleted data tables
DROP TABLE test;

SELECT * FROM test;
ERROR 1105 (HY000): Code: 1025, displayText = Unknown table 'test'.

-- un-drop table
UNDROP TABLE test;

-- check
SELECT * FROM test;
+------+------+
| a | b |
+------+------+
| 1 | a |
+------+------+

This makes business data more secure!

CTE Support

CTE (Common Table Expression) is a frequently used feature in OLAP business to define a temporary result set within the execution of a single statement, which is valid only during the query period, enabling the reuse of code segments, improving readability and better implementation of complex queries.

Databend v0.8 re-implements the CTE based on New Planner and now users can happily use WITH to declare the CTE.

WITH customers_in_quebec 
AS (SELECT customername,
city
FROM customers
WHERE province = 'Québec')
SELECT customername
FROM customers_in_quebec
WHERE city = 'Montréal'
ORDER BY customername;

In addition to these features mentioned above, Databend v0.8 also supports UDFs, adds DELETE statements, further enhances support for semi-structured data types, not to mention the numerous SQL statement improvements and new methods added. Thanks to all the contributors to the Databend community, without you all the new features mentioned here would not have been possible!

Quality Enhancement

Feature implementation is just the first part of product delivery. In Databend v0.8, the community introduced the concept of engineering quality, which evaluates the quality of Databend development in three dimensions: users, contributors, and community.

Reassuring users

In order for users to use Databend with confidence, the community has added a lot of tests over the last three months, fetching stateless test sets from YDB and others, adding stateful tests for ontime, hits and other datasets, putting SQL Logic Test online to cover all interfaces, and enabling SQL Fuzz testing to cover boundary cases.

Furthermore, the community has also gone live with Databend Perf to do continuous performance testing of Databend in production environments to catch unexpected performance regressions in time.

Make contributors comfortable

Databend is a large Rust project that has been criticized by the community for its build time.

To improve this issue and make contributors feel comfortable, the community went live with a highly configurable, specially tuned Self-hosted Runner to perform integration tests for PR and enabled several services or tools such as Mergify, mold, dev-tools, etc. to optimize the CI process.

We also initiated a new plan to restructure the Databend project, splitting the original huge query crate into multiple sub-crates to avoid, as much as possible, the situation where changing one line of code requires a five-minute check run.

Keeping the community happy

Databend is a contributor and participant in the open source community. During the development of v0.8, the Databend community established the principle of Upstream First, actively following and adopting the latest upstream releases, giving feedback on known bugs, contributing patches, and opening tracking issues for upstream-first violations to keep up with the latest developments.

The Databend community is actively exploring integration with other open source projects and has already implemented integration and support for third-party drivers such as Vector, sqlalchemy, clickhouse-driver, etc.

Next Steps

Databend v0.8 is a solid foundation release with a new Planner that makes it easier to implement features and make optimizations. In version 0.9, we expect improvements in the following areas.

  • Query Result Cache
  • JSON Optimization
  • Table Share
  • Processor Profiling
  • Resource Quota
  • Data Caching

Please check the Release proposal: Nightly v0.9 for the latest news~

Get going now!

Visit the release log and download the latest version to learn more, and feel free to submit feedback using Github Issues if you encounter problems!

· 5 min read

This post gives you a general idea about the TPC-H benchmark and explains how to run a TPC-H benchmark on Databend.

What's TPC-H?

TPC-H is a decision support benchmark. It consists of a suite of business-oriented ad hoc queries and concurrent data modifications. The queries and the data populating the database have been chosen to have broad industry-wide relevance. This benchmark illustrates decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and give answers to critical business questions.

The TPC-H benchmark simulates a system for online sales of parts and components and defines eight tables in total, each with its own structure, data volume, and relationships to the other tables.

The benchmark workload consists of twenty-two decision support queries that must be executed as part of the TPC-H benchmark. Each TPC-H query asks a business question and includes the corresponding query to answer the question. More information about TPC-H can be found at https://www.tpc.org/tpch/.

Running TPC-H Benchmark on Databend

This section describes the steps to run the TPC-H benchmark on Databend and provides the related scripts.

Step 1: Generate test data with TPC-H Docker

The following code pulls a docker image and allocates the data in the path where you are running the TPC-H benchmark.

docker pull ghcr.io/databloom-ai/tpch-docker:main
docker run -it -v "$(pwd)":/data ghcr.io/databloom-ai/tpch-docker:main dbgen -vf -s 1

TPC-H comes with various data set sizes to test different scale factors. You can use the -s option to set the scale factor in the command (for example, the code above sets the scale factor to 1). For more information about the command, see https://github.com/databloom-ai/TPCH-Docker.

SF (Gigabytes) and size:

  • 1: Consists of the base row size (several million elements).
  • 10: Consists of the base row size x 10.
  • 100: Consists of the base row size x 100 (several hundred million elements).
  • 1000: Consists of the base row size x 1000 (several billion elements).

Step 2: Create database and tables

CREATE DATABASE IF NOT EXISTS tpch;

USE tpch;

CREATE TABLE IF NOT EXISTS nation
(
n_nationkey INTEGER NOT NULL,
n_name VARCHAR NOT NULL,
n_regionkey INT NOT NULL,
n_comment VARCHAR
);

CREATE TABLE IF NOT EXISTS region
(
r_regionkey INT NOT NULL,
r_name VARCHAR NOT NULL,
r_comment VARCHAR
);

CREATE TABLE IF NOT EXISTS part
(
p_partkey INT NOT NULL,
p_name VARCHAR NOT NULL,
p_mfgr VARCHAR NOT NULL,
p_brand VARCHAR NOT NULL,
p_type VARCHAR NOT NULL,
p_size INT NOT NULL,
p_container VARCHAR NOT NULL,
p_retailprice FLOAT NOT NULL,
p_comment VARCHAR NOT NULL
);

CREATE TABLE IF NOT EXISTS supplier
(
s_suppkey INT NOT NULL,
s_name VARCHAR NOT NULL,
s_address VARCHAR NOT NULL,
s_nationkey INT NOT NULL,
s_phone VARCHAR NOT NULL,
s_acctbal FLOAT NOT NULL,
s_comment VARCHAR NOT NULL
);

CREATE TABLE IF NOT EXISTS partsupp
(
ps_partkey INT NOT NULL,
ps_suppkey INT NOT NULL,
ps_availqty INT NOT NULL,
ps_supplycost FLOAT NOT NULL,
ps_comment VARCHAR NOT NULL
);

CREATE TABLE IF NOT EXISTS customer
(
c_custkey INT NOT NULL,
c_name VARCHAR NOT NULL,
c_address VARCHAR NOT NULL,
c_nationkey INT NOT NULL,
c_phone VARCHAR NOT NULL,
c_acctbal FLOAT NOT NULL,
c_mktsegment VARCHAR NOT NULL,
c_comment VARCHAR NOT NULL
);

CREATE TABLE IF NOT EXISTS orders
(
o_orderkey INT NOT NULL,
o_custkey INT NOT NULL,
o_orderstatus VARCHAR NOT NULL,
o_totalprice FLOAT NOT NULL,
o_orderdate DATE NOT NULL,
o_orderpriority VARCHAR NOT NULL,
o_clerk VARCHAR NOT NULL,
o_shippriority INT NOT NULL,
o_comment VARCHAR NOT NULL
);

CREATE TABLE IF NOT EXISTS lineitem
(
l_orderkey INT NOT NULL,
l_partkey INT NOT NULL,
l_suppkey INT NOT NULL,
l_linenumber INT NOT NULL,
l_quantity FLOAT NOT NULL,
l_extendedprice FLOAT NOT NULL,
l_discount FLOAT NOT NULL,
l_tax FLOAT NOT NULL,
l_returnflag VARCHAR NOT NULL,
l_linestatus VARCHAR NOT NULL,
l_shipdate DATE NOT NULL,
l_commitdate DATE NOT NULL,
l_receiptdate DATE NOT NULL,
l_shipinstruct VARCHAR NOT NULL,
l_shipmode VARCHAR NOT NULL,
l_comment VARCHAR NOT NULL
);

Step 3: Load test data to Databend

This step uses the HTTP API v1/streaming_load to load the test data to Databend. More information about this API can be found at https://databend.rs/doc/load-data/local.

The code below connects to Databend using the root user. Please note that the root user only works when you access Databend from localhost. To connect to Databend remotely, you will need to create new users and grant proper privileges first (a sketch follows the script below).

#!/bin/bash

for t in customer lineitem nation orders partsupp part region supplier
do
echo "$t"
curl -XPUT 'http://root:@127.0.0.1:8000/v1/streaming_load' -H 'insert_sql: insert into tpch.'$t' file_format = (type = "CSV" field_delimiter = "|" record_delimiter = "\n")' -F 'upload=@"./'$t'.tbl"'
done
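
If you do need remote access as mentioned above, a minimal sketch of creating a dedicated user might look like the following (the user name, password, and privilege scope are placeholders, and the exact CREATE USER/GRANT syntax may vary between Databend versions):

-- Placeholders for illustration; adjust the user name, password, and privileges.
CREATE USER tpch_user IDENTIFIED BY 'tpch_password';
GRANT ALL ON tpch.* TO tpch_user;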

Step 4: Run TPC-H queries

All the definitions of the TPC-H queries can be found at https://www.tpc.org/tpc_documents_current_versions/pdf/tpc-h_v3.0.1.pdf. You can simply run them by copying and pasting the scripts to Databend.
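
For example, a sketch of TPC-H Q1 (the pricing summary report) is shown below. It hard-codes the shipdate cutoff instead of using the interval arithmetic in the official specification, so treat it as a quick smoke test rather than the official query text:

-- TPC-H Q1 (simplified): pricing summary report over lineitem.
-- The official query uses DATE '1998-12-01' - INTERVAL '90' DAY;
-- the resulting date is hard-coded here to keep the sketch portable.
SELECT
    l_returnflag,
    l_linestatus,
    SUM(l_quantity) AS sum_qty,
    SUM(l_extendedprice) AS sum_base_price,
    SUM(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
    SUM(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
    AVG(l_quantity) AS avg_qty,
    AVG(l_extendedprice) AS avg_price,
    AVG(l_discount) AS avg_disc,
    COUNT(*) AS count_order
FROM tpch.lineitem
WHERE l_shipdate <= '1998-09-02'
GROUP BY l_returnflag, l_linestatus
ORDER BY l_returnflag, l_linestatus;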

The Databend team ran the TPC-H benchmark around two months ago and uploaded their queries and results to GitHub. You can find them at https://github.com/datafuselabs/databend/tree/main/tests/suites/0_stateless/13_tpch. Please note that Databend now uses the new planner by default, so you DO NOT need to enable it anymore before running the queries.

· 1 min read

image

Data is important. If you're moving your business to the cloud to take advantage of the cloud-native features, the first thing you need to consider might be how to load your data to the cloud.

There are many ways to load data into Databend. You can use the command-line interface (CLI), API, or the mysqldump client utility, depending on where your data is stored.

image

The powerful COPY INTO commands allow you to load data from the following sources (a sketch follows this list):

  • Files in an S3 bucket or a blob storage container.
  • Staged files (internal or external).
  • Files in a remote server.
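
As a minimal sketch (the bucket name, credentials, and table are placeholders, and the exact credential and file-format options may differ between Databend versions), loading CSV files from an S3 bucket could look like this:

-- Placeholders throughout: replace the bucket, credentials, and table name.
COPY INTO mydb.my_table
FROM 's3://my-bucket/data/'
credentials = (aws_key_id = '<access-key-id>' aws_secret_key = '<secret-access-key>')
file_format = (type = 'CSV' field_delimiter = ',' record_delimiter = '\n');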

If you have local data files to load, use the HTTP API v1/streaming_load to upload them to Databend.

Please note that Databend supports loading data from files in these formats:

  • CSV
  • JSON
  • Parquet

If you're coming from MySQL, Databend can also load your data from a dump file (*.sql) created by the mysqldump client utility.

Detailed explanations about loading data in different scenarios can be found at https://databend.rs/doc/load-data.

· 5 min read
TIP

This post was originally published by Anne-Laure Civeyrac on https://mergify.com.

image

Every day, major projects use Mergify to automate their GitHub workflow. Whether they have a core team of 3 or 50 people, the one thing they all have in common is that the project leads are willing to let their developers focus on what’s really important—code. So we decided to meet with some of them to get to know more about the challenges they face and discover how Mergify helps their teams be more efficient when it comes to pull requests. This time, we sat down (virtually) with Xuanwo, an infrastructure engineer who oversees automation, distributed systems and storage for the Databend project.

image

Xuanwo

Please could you give us a brief outline of the Databend project.

Of course! So Databend is a modern cloud data warehouse built for elasticity and performance. It uses the latest techniques in vectorized query processing to allow people to do blazing-fast data analytics on object storage platforms like S3, Azure Blob, or MinIO. It was mainly inspired by ClickHouse and Snowflake and focuses on online analytical processing.

How many people are currently working on the project?

Right now, we have more than 100 contributors on Databend, with about 30 of them contributing continuously.

What’s your GitHub workflow on this project?

Databend is a very new project and it doesn’t have a stable release yet, so our pull request [PR] workflow is quite simple. All our contributions go through GitHub PRs. For every PR, we use GitHub Actions as the CI, where we run cargo check, cargo fmt, cargo clippy, and all our test cases. If all the checks pass, we merge the PR. And once on the main branch, the PR runs production CI, which goes through all test cases with the release build. We then release a nightly version daily, uploading our release builds to GitHub releases and the Docker Hub Registry.

image

How many people need to approve PRs?

We need two approvals for a PR to be merged.

What are the main challenges with this workflow?

Mainly the PR merge speed! Our developers are coding in Rust, which is known for not being very good with compilation speed. We typically need about 30 minutes to finish all our checks, and most of this time is consumed by rustc, the compiler for Rust. To make Rust compile faster, we set up our own self-hosted GitHub Actions runners, which have a very high performance. Thanks to this change, our PR merge time was reduced from 60 to 30 minutes.

What made the project team start using Mergify in the first place?

Before coming across Mergify, we enabled the option on GitHub that requires code to be updated before merging. But it added a lot of work for our maintainers, who had to merge the main branch repeatedly.

image

So to make their lives easier, we implemented Mergify to update the branch automatically and merge PRs after all tests have passed.

Which Mergify features do you use the most?

The automatic merge! And with the help of the Mergify team, we enabled the merge queue feature as well. So now, we can merge multiple PRs simultaneously with only one CI check. This helped us significantly reduce the waiting time for the CI.  

What is your favorite Mergify feature and why?

Oh, I love PR actions. Although I can implement the same features with GitHub Actions, I find Mergify’s PR actions more simple and exciting. For example, last week, we introduced a new requirement that every PR must be semantic—we want all PRs to contain a valid title starting with a type like “fix”, “feat”, or “refactor”.

image

Actions 

With the help of Mergify, I only needed to create some rules, such as adding corresponding labels if the PR title starts with “fix”, commenting in the PR with a help message if the title doesn’t fulfill the requirements, and adding post checks so that we can mark the PR as not mergeable.

This feature is very cool, and I would love to have it on issues too!

What has been the most significant impact of using Mergify on your team’s performance so far?

Our teams don’t need to worry about merges anymore! We can start jobs without having to wait on the PRs.  

What would be your n°1 tip for someone new to Mergify?

Don’t try to migrate all your workloads to Mergify in one go. Migrating things one by one and progressively will make your lives easier.

If you had time to contribute to the Mergify project, what would your contribution be about?

I once contributed a fix for a small documentation typo to Mergify. But I am not comfortable contributing to more complex issues because Mergify uses Python, a programming language I’m unfamiliar with. But if I could contribute to one feature, it would be about commands. Mergify only supports a small set of commands. Maybe we could define new commands in PR rules and allow users to call them with Mergify bots. This feature would be exciting to me!

· 5 min read

To support complex SQL queries and improve user experience, a large-scale refactoring work for Databend's SQL planner was started several months ago. At present, the refactoring is coming to an end. You can now modify the Session settings of Databend as follows to enable the new planner for early access:

image
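
For reference, the change shown above boils down to a single session setting. A minimal sketch, assuming the flag used during the preview was named enable_planner_v2 (the name may differ in later releases):

-- Assumed setting name for the preview period; enables the new planner
-- for the current session only.
SET enable_planner_v2 = 1;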

Feature Highlights

A more friendly query experience

Data analysts and developers usually get various errors when coding SQL queries, and troubleshooting can be a nightmare when the queries are complex. I hate MySQL's error prompts, having once coded a query with dozens of JOIN clauses.

The new planner now includes some passes for strict semantic checking so that most errors can be intercepted during the compilation. A new error prompt algorithm was also introduced to help users locate the errors. When there is invalid syntax in your SQL query (for example, misspelled keywords or missing clauses), you will receive an error message that is more instructive.

image

If your SQL query has a semantic error (for example, you reference a column that is ambiguous, or a column does not exist at all), Databend can help you locate it.

image

You can also get a better experience when coding complex queries:

image

Support for JOIN queries and correlated subqueries

The new SQL planner supports JOIN queries (INNER JOIN, OUTER JOIN, CROSS JOIN) and correlated subqueries, and provides a Hash Join algorithm to execute JOIN queries.

For more information about how to use JOIN in Databend, go to https://databend.rs/doc/reference/sql/query-syntax/dml-join
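
As a quick illustration (the star-schema tables below are hypothetical and not taken from the documentation), an INNER JOIN between a fact table and a dimension table looks like this and is executed with the Hash Join algorithm:

-- Hypothetical star-schema tables, for illustration only.
CREATE TABLE dim_city (city_id INT, city_name VARCHAR);
CREATE TABLE fact_sales (city_id INT, amount INT);

SELECT c.city_name, SUM(s.amount) AS total_amount
FROM fact_sales AS s
INNER JOIN dim_city AS c ON s.city_id = c.city_id
GROUP BY c.city_name;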

JOIN is a very important part of OLAP queries. In traditional star and snowflake schemas, we join dimension tables with fact tables to generate the resulting reports.

TPC-H Benchmark is a set of OLAP query benchmarks developed by the TPC committee to evaluate the OLAP capabilities of database systems. It contains the following eight tables:

  • Lineitem: Holds order line item information.

  • Orders: Holds order information.

  • Customer: Holds customer information.

  • Part: Holds parts information.

  • Supplier: Holds supplier information.

  • Partsupp: Holds the part-supplier relationship information.

  • Nation: Holds nation information.

  • Region: Holds region information.

TPC-H includes 22 complex queries, corresponding to different business needs. The new SQL planner now supports the Q9 query that calculates the profit amount for a specified year and region using a large number of JOIN calculations:

image

Correlated subqueries are also an essential part of SQL for coding complex queries. The Q4 query of TPC-H shows the order delivery status of various priority levels over a period of time and uses a correlated subquery with the EXISTS clause to filter overdue orders:

image
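
A simplified sketch of that kind of correlated subquery is shown below. It follows the shape of TPC-H Q4 but hard-codes the date range instead of using the interval arithmetic from the official specification:

-- Count orders per priority that have at least one line item
-- received later than it was committed (overdue).
SELECT o_orderpriority, COUNT(*) AS order_count
FROM orders
WHERE o_orderdate >= '1993-07-01'
  AND o_orderdate < '1993-10-01'
  AND EXISTS (
      SELECT 1
      FROM lineitem
      WHERE l_orderkey = o_orderkey
        AND l_commitdate < l_receiptdate
  )
GROUP BY o_orderpriority
ORDER BY o_orderpriority;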

Brand New Architecture

We redesigned the process of SQL parsing for the new SQL planner to support more complex semantic analysis and SQL optimization.

After the client sends a SQL statement to the databend-query server, the components in the new SQL planner process the SQL statement in the order shown in the flowchart below before returning the query result to the client:

image

The Parser starts to parse a SQL query after receiving it. If a syntax error is found during the parsing, the error information will be directly returned to the client; if the parsing is successful, an AST (Abstract Syntax Tree) for the query will be constructed.

Parser

To provide more powerful syntax analysis and a better development experience, we developed a DSL (Domain Specific Language) called nom-rule based on the nom parser combinator library and rewrote the SQL parser on top of this framework.

With this framework, we can easily define the syntax for a statement. Taking the CREATE TABLE statement as an example, we can use DSL to describe it as follows:

image

The elegant syntax brings more fun to the work of coding a parser. Try it out if you’re interested.

Binder

After the Parser successfully produces an AST, the Binder semantically analyzes it and generates an initial logical plan. During this process, we perform different types of semantic analysis:

  • Name resolution: Check the validity of the variables referenced in the SQL query by querying the relevant table and column object information in the Databend Catalog and bind the valid variables to their corresponding objects for subsequent analysis.

  • Type check: Check the validity of the expression according to the information obtained in the name resolution, and find a proper return type for the expression.

  • Subquery unnesting: Extract the subquery from the expression and translate it into relational algebra.

  • Grouping check: For queries with aggregate calculations, check that columns referenced outside aggregate functions also appear in the GROUP BY clause.

With semantic analysis, we can eliminate most semantic errors and return them to the user during the compilation to provide the best troubleshooting experience.
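
For instance, the grouping check rejects queries like the hypothetical one below at compile time rather than at execution time (the table and columns are made up for illustration):

-- Hypothetical table, for illustration only.
CREATE TABLE sales (region VARCHAR, city VARCHAR, amount INT);

-- Rejected by the grouping check: city is neither aggregated
-- nor listed in GROUP BY.
SELECT region, city, SUM(amount) FROM sales GROUP BY region;

-- Accepted: every non-aggregated column appears in GROUP BY.
SELECT region, SUM(amount) AS total FROM sales GROUP BY region;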

Optimizer

After getting the initial logical plan, the optimizer will rewrite and optimize it and, finally, generate an executable physical plan.

The new planner introduced a Transformer Rule-based optimizer framework in the style of Volcano/Cascades. An independent rule can be implemented by defining a relational algebra sub-tree pattern together with the related transform logic.

Take Predicate Push Down as a simple example:

image

We only need to define the pattern of the input plan:

image

And then implement a conversion function:

image

Interpreter

After the physical plan is generated by the Optimizer, we will translate it into an executable pipeline and hand it over to Databend's processor execution framework for calculation.

What's Next

Building a SQL planner from the ground up is a very challenging job, but the redesign and development let us find the most suitable architecture and functionalities for the system. In the future, we will continue to improve and consolidate the new SQL planner on these functions:

  • Cost-based Optimization (CBO)

  • Distributed query optimization

  • More optimization rules

Currently, we’re in the middle of migrating to the new SQL planner. We will release an announcement when the migration is complete (around July 2022). Stay tuned.

· 7 min read

Databend, developed with Rust, is a new open-source data warehouse architected toward the cloud. It is committed to providing fast elastic expansion capabilities and a pay-as-you-go user experience. GitHub: https://github.com/datafuselabs/databend

Introduction

This post explains the Databend base: Fuse Engine, a powerful columnar storage engine. The engine was designed by the Databend community with the following principles: Powerful performance, simple architecture, and high reliability.

Before we start, check out a challenging task that Databend completed: With the Fuse Engine deployed on AWS S3, a transaction wrote 22.89 TB of raw data in around one and a half hours.

mysql> INSERT INTO ontime_new SELECT * FROM ontime_new;
Query OK, 0 rows affected (1 hour 34 min 36.82 sec)
Read 31619274180 rows, 22.89 TB in 5675.207 sec., 5.57 million rows/sec., 4.03 GB/sec.

Meanwhile, the following conditions were met as well:

  • Distributed transactions: Multiple computing nodes can read and write the same data simultaneously (This is the first problem that an architecture that separates storage from compute must solve).
  • Snapshot isolation: Different versions of data do not affect each other so you can do Zero-Copy Cloning for tables.
  • Retrospective ability: You are allowed to switch to any version of the data, so you can recover with the Time Travel feature.
  • Data merging: A new version of data can be generated after merging.
  • Simple and robust: Data relationships are described using files, and the entire data system can be recovered from these files.

From above, you will find that Fuse Engine is "Git-inspired". Before introducing the design of Fuse Engine, let's take a look at how the bottom layer of Git works.

How Git Works

Git implements data version control (including branch, commit, checkout, and merge) in a distributed environment. Based on Git semantics, it is possible to create a distributed storage engine. There are also some Git-like products on the market, such as Nessie (Transactional Catalog for Data Lakes) and lakeFS.

To better explore the underlying working mechanism of Git, we use Git semantics to complete a series of "data" operations from the perspective of the database.

  1. Prepare a file named cloud.txt with the content:
2022/05/06, Databend, Cloud
  2. Commit the file cloud.txt to Git.
git commit -m "Add cloud.txt"
  3. Git generates a snapshot (Commit ID: 7d972c7ba9213c2a2b15422d4f31a8cbc9815f71).
git log 
commit 7d972c7ba9213c2a2b15422d4f31a8cbc9815f71 (HEAD)
Author: BohuTANG <overred.shuttler@gmail.com>
Date: Fri May 6 16:44:21 2022 +0800

Add cloud.txt
  4. Prepare another file named warehouse.txt.
2022/05/07, Databend, Warehouse
  5. Commit the file warehouse.txt to Git.
git commit -m "Add warehouse.txt"
  6. Git generates another snapshot (Commit ID: 15af34e4d16082034e1faeaddd0332b3836f1424).
commit 15af34e4d16082034e1faeaddd0332b3836f1424 (HEAD)
Author: BohuTANG <overred.shuttler@gmail.com>
Date: Fri May 6 17:41:43 2022 +0800

Add warehouse.txt

commit 7d972c7ba9213c2a2b15422d4f31a8cbc9815f71
Author: BohuTANG <overred.shuttler@gmail.com>
Date: Fri May 6 16:44:21 2022 +0800

Add cloud.txt

Git now keeps two versions of the data:

ID 15af34e4d16082034e1faeaddd0332b3836f1424, Version 2
ID 7d972c7ba9213c2a2b15422d4f31a8cbc9815f71, Version 1

We can switch between versions using the Commit ID, which implements the functions of Time Travel and Table Zero-Copy. How does Git make this possible at the bottom layer? It's not rocket science. Git introduces the following types of object files to describe the relationships:

  • Commit: Describes tree object information
  • Tree: Describes blob object information
  • Blob: Describes file information

image

HEAD File

First, we need to know the HEAD pointer:

cat .git/HEAD
15af34e4d16082034e1faeaddd0332b3836f1424

Commit File

The Commit file records metadata related to the commit, such as the current tree and parent, as well as the committer, etc.

File path:

.git/objects/15/af34e4d16082034e1faeaddd0332b3836f1424

File content:

git cat-file -p 15af34e4d16082034e1faeaddd0332b3836f1424

tree 576c63e580846fa6df2337c1f074c8d840e0b70a
parent 7d972c7ba9213c2a2b15422d4f31a8cbc9815f71
author BohuTANG <overred.shuttler@gmail.com> 1651830103 +0800
committer BohuTANG <overred.shuttler@gmail.com> 1651830103 +0800

Add warehouse.txt

Tree File

The Tree file records all the files of the current version.

File path:

.git/objects/57/6c63e580846fa6df2337c1f074c8d840e0b70a

File content:

git cat-file -p 576c63e580846fa6df2337c1f074c8d840e0b70a

100644 blob 688de5069f9e873c7e7bd15aa67c6c33e0594dde cloud.txt
100644 blob bdea812b9602ed3c6662a2231b3f1e7b52dc1ccb warehouse.txt

Blob File

The Blob files are raw data files. You can view the file content using git cat-file (if you use Git to manage code, blobs are the code files).

git cat-file -p 688de5069f9e873c7e7bd15aa67c6c33e0594dde
2022/05/06, Databend, Cloud

git cat-file -p bdea812b9602ed3c6662a2231b3f1e7b52dc1ccb
2022/05/07, Databend, Warehouse

Fuse Engine

Databend's Fuse Engine was designed in a way similar to Git. It introduces three description files:

  • Snapshot: Describes segment object information.
  • Segment: Describes block object information.
  • Block: Describes parquet file information.

image

Let's repeat the operations we just did with Git in Fuse Engine.

  1. Create a table.
CREATE TABLE git(file VARCHAR, content VARCHAR);
  2. Write cloud.txt to Fuse Engine.

     INSERT INTO git VALUES('cloud.txt', '2022/05/06, Databend, Cloud');
  3. Fuse Engine generates a snapshot (Snapshot ID: 6450690b09c449939a83268c49c12bb2).

    CALL system$fuse_snapshot('default', 'git');
    *************************** 1. row ***************************
    snapshot_id: 6450690b09c449939a83268c49c12bb2
    snapshot_location: 53/133/_ss/6450690b09c449939a83268c49c12bb2_v1.json
    format_version: 1
    previous_snapshot_id: NULL
    segment_count: 1
    block_count: 1
    row_count: 1
    bytes_uncompressed: 68
    bytes_compressed: 351

  4. Write warehouse.txt to Fuse Engine.

    INSERT INTO git VALUES('warehouse.txt', '2022/05/07, Databend, Warehouse');
  5. Fuse Engine generates another snapshot (Snapshot ID: efe2687fd1fc48f8b414b5df2cec1e19) that is linked to the previous one (Snapshot ID: 6450690b09c449939a83268c49c12bb2).

    CALL system$fuse_snapshot('default', 'git');
    *************************** 1. row ***************************
    snapshot_id: efe2687fd1fc48f8b414b5df2cec1e19
    snapshot_location: 53/133/_ss/efe2687fd1fc48f8b414b5df2cec1e19_v1.json
    format_version: 1
    previous_snapshot_id: 6450690b09c449939a83268c49c12bb2
    segment_count: 2
    block_count: 2
    row_count: 2
    *************************** 2. row ***************************
    snapshot_id: 6450690b09c449939a83268c49c12bb2
    snapshot_location: 53/133/_ss/6450690b09c449939a83268c49c12bb2_v1.json
    format_version: 1
    previous_snapshot_id: NULL
    segment_count: 1
    block_count: 1
    row_count: 1

    Fuse Engine now keeps two versions of the data:

    ID efe2687fd1fc48f8b414b5df2cec1e19, Version 2
    ID 6450690b09c449939a83268c49c12bb2, Version 1

    That's very similar to Git. Right?

Git needs a HEAD as an entry. So does Fuse Engine. Check the HEAD of Fuse Engine:

SHOW CREATE TABLE git\G;
*************************** 1. row ***************************
Table: git
Create Table: CREATE TABLE `git` (
`file` VARCHAR,
`content` VARCHAR
) ENGINE=FUSE SNAPSHOT_LOCATION='53/133/_ss/efe2687fd1fc48f8b414b5df2cec1e19_v1.json'

SNAPSHOT_LOCATION is the HEAD, which by default points to the latest snapshot efe2687fd1fc48f8b414b5df2cec1e19. So how do we switch to the snapshot whose ID is 6450690b09c449939a83268c49c12bb2? First, check the snapshot information of the current table:

CALL system$fuse_snapshot('default', 'git')\G;
*************************** 1. row ***************************
snapshot_id: efe2687fd1fc48f8b414b5df2cec1e19
snapshot_location: 53/133/_ss/efe2687fd1fc48f8b414b5df2cec1e19_v1.json
format_version: 1
previous_snapshot_id: 6450690b09c449939a83268c49c12bb2
segment_count: 2
block_count: 2
row_count: 2
*************************** 2. row ***************************
snapshot_id: 6450690b09c449939a83268c49c12bb2
snapshot_location: 53/133/_ss/6450690b09c449939a83268c49c12bb2_v1.json
format_version: 1
previous_snapshot_id: NULL
segment_count: 1
block_count: 1
row_count: 1

Then create a new table (git_v1) and point SNAPSHOT_LOCATION to the snapshot file you need:

CREATE TABLE git_v1(`file` VARCHAR, `content` VARCHAR) SNAPSHOT_LOCATION='53/133/_ss/6450690b09c449939a83268c49c12bb2_v1.json';

SELECT * FROM git_v1;
+-----------+-----------------------------+
| file | content |
+-----------+-----------------------------+
| cloud.txt | 2022/05/06, Databend, Cloud |
+-----------+-----------------------------+

Snapshot File

Stores the segment information.

File path:

53/133/_ss/efe2687fd1fc48f8b414b5df2cec1e19_v1.json

File content:

{
"format_version":1,
"snapshot_id":"efe2687f-d1fc-48f8-b414-b5df2cec1e19",
"prev_snapshot_id":[
"6450690b-09c4-4993-9a83-268c49c12bb2",
1
],

"segments":[
[
"53/133/_sg/df56e911eb26446b9f8fac5acc65a580_v1.json"
],
[
"53/133/_sg/d0bff902b98846469480b23c2a8f93d7_v1.json"
]
]
... ...
}

Segment File

Stores block information.

File path:

 53/133/_sg/df56e911eb26446b9f8fac5acc65a580_v1.json

File content:

{
"format_version":1,
"blocks":[
{
"row_count":1,
"block_size":76,
"file_size":360,
"location":[
"53/133/_b/ba4a60013e27479e856f739aefeadfaf_v0.parquet",
0
],
"compression":"Lz4Raw"
}
]
... ...
}

Block File

The underlying data of Fuse Engine uses Parquet format, and each file is composed of multiple blocks.

Summary

In the early design period (October 2021) of Databend's Fuse Engine, the requirements were very clear, but the solution selection didn't go smoothly. At that time, the Databend community investigated a large number of Table Format solutions (such as Iceberg) on the market. The challenge was to choose between using an existing solution and building a new one. Finally, we decided to develop a simple and suitable Storage Engine that uses the Parquet standard as the storage format. Fuse Engine stores the Parquet Footer separately to reduce unnecessary Seek operations, and introduces a more flexible indexing mechanism, for example, operations such as Aggregation and Join can have their own indexes for acceleration.

Feel free to deploy Fuse Engine with your object storage for a different big data analysis experience: https://databend.rs/doc/deploy

Databend on GitHub: https://github.com/datafuselabs/databend

· 3 min read

Deploying Databend on Your Laptop in Minutes

Does deploying a data warehouse sound like a big job to you? Definitely NOT. Databend can be deployed to your laptop and uses the local file system as storage. You can complete the deployment in a few minutes even if you're new to Databend. Now let's get started!

TIP

Databend requires scalable storage (for example, object storage) to work. This blog uses the local file system to give you a hands-on experience. Never use a local file system as storage for production purposes.

STEP 1. Downloading Databend

a. Create a folder named databend in the directory /usr/local. Then create the following subfolders in the folder databend:

  • bin
  • data
  • etc
  • logs

b. Download and extract the latest Databend package for your platform from https://github.com/datafuselabs/databend/releases.

c. Move the extracted files databend-meta and databend-query to the folder /usr/local/databend/bin.

STEP 2. Deploying a Standalone databend-meta

a. Create a file named databend-meta.toml in the folder /usr/local/databend/etc with the following content:

dir = "metadata/_logs"
admin_api_address = "127.0.0.1:8101"
grpc_api_address = "127.0.0.1:9101"

[raft_config]
id = 1
single = true
raft_dir = "metadata/datas"

b. Open a terminal window and navigate to the folder /usr/local/databend/bin.

c. Run the following command to start databend-meta:

./databend-meta -c ../etc/databend-meta.toml > meta.log 2>&1 &

d. Run the following command to check if databend-meta was started successfully:

curl -I  http://127.0.0.1:8101/v1/health

STEP 3. Deploying a Standalone databend-query

a. Create a file named databend-query.toml in the folder /usr/local/databend/etc with the following content:

[log]
level = "INFO"
dir = "benddata/_logs"

[query]
# For admin REST API.
admin_api_address = "127.0.0.1:8001"

# Metrics.
metric_api_address = "127.0.0.1:7071"

# Cluster flight RPC.
flight_api_address = "127.0.0.1:9091"

# Query MySQL Handler.
mysql_handler_host = "127.0.0.1"
mysql_handler_port = 3307


# Query HTTP Handler.
http_handler_host = "127.0.0.1"
http_handler_port = 8081

tenant_id = "tenant1"
cluster_id = "cluster1"

[meta]
address = "127.0.0.1:9101"
username = "root"
password = "root"

[storage]
# s3
type = "fs"

[storage.fs]
data_path = "benddata/datas"

b. Open a terminal window and navigate to the folder /usr/local/databend/bin.

c. Run the following command to start databend-query:

./databend-query -c ../etc/databend-query.toml > query.log 2>&1 &

d. Run the following command to check if databend-query was started successfully:

curl -I  http://127.0.0.1:8001/v1/health

There you go! You have successfully deployed Databend on your computer. If you have a SQL client on your computer, try the steps below to verify the deployment:

a. Create a connection to 127.0.0.1 from your SQL client. In the connection, set the port to 3307, and set the username to root.

b. Run the following commands to check if the query is successful:

CREATE TABLE t1(a int);

INSERT INTO t1 VALUES(1), (2);

SELECT * FROM t1;