tab¶
A CLI tool for viewing, querying, and converting tabular data files. Files can be given as local paths or as AWS, Azure, or Google Cloud Storage URLs.
Supported Formats¶
- JSONL
- CSV
- TSV
- Parquet
- Avro
Usage¶
View data¶
Display rows from a tabular data file:
tab view data.csv
Output to different formats:
tab view data.parquet -o jsonl
tab view data.parquet -o csv
Show schema¶
tab schema data.parquet
Show summary¶
tab summary data.parquet
SQL queries¶
Run SQL queries against a data file. Inside the query, the file's contents are available as a table named t:
tab sql 'SELECT * FROM t WHERE Metric_A_Value > 80' test.csv
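To try the query above end to end, the following sketch creates a small test.csv and runs the same filter against it. The sample rows and the name column are made up for illustration; only Metric_A_Value comes from the query above. The second call assumes that -o can be combined with sql, which this page does not state explicitly.

```shell
# Hypothetical sample data; only the column name Metric_A_Value comes from the query above.
cat > test.csv <<'EOF'
name,Metric_A_Value
alpha,75
beta,92
gamma,80
EOF

if command -v tab >/dev/null 2>&1; then
  # Same query as above: keep rows whose Metric_A_Value exceeds 80.
  tab sql 'SELECT * FROM t WHERE Metric_A_Value > 80' test.csv
  # Assumption: -o is accepted by sql as well as by view and convert.
  tab sql 'SELECT name FROM t WHERE Metric_A_Value > 80' test.csv -o jsonl
else
  echo "tab not found on PATH; skipping" >&2
fi
```

Because the comparison is strict, a row with a value of exactly 80 should not match.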
Convert¶
Convert between formats:
tab convert data.csv data.parquet
tab convert data.parquet data.jsonl -o jsonl
Write partitioned output:
tab convert data.csv output_dir/ -o parquet -n 4
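As a rough sketch of the partitioned workflow, the snippet below generates a sample CSV, converts it into four Parquet parts, and lists the resulting directory. The names sample.csv and sample_parts/ are hypothetical, and the naming of the part files is not documented here, so the listing is only a way to inspect what was written.

```shell
# Build a 100-row sample file; file name and columns are made up for illustration.
{
  echo "id,value"
  for i in $(seq 1 100); do echo "$i,$((i * 2))"; done
} > sample.csv

if command -v tab >/dev/null 2>&1; then
  # Write the data as 4 Parquet partitions into a directory.
  tab convert sample.csv sample_parts/ -o parquet -n 4
  # Inspect the part files; their exact names depend on the tool.
  ls -l sample_parts/
else
  echo "tab not found on PATH; skipping" >&2
fi
```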
Concatenate multiple files¶
tab cat data1.csv data2.csv data3.csv -o jsonl > output.jsonl
Options¶
Common options¶
| Option | Description |
|---|---|
| -i | Input format (parquet, csv, tsv, jsonl). Auto-detected from extension. |
| -o | Output format (parquet, csv, tsv, jsonl). |
| --limit | Maximum number of rows to display. |
| --skip | Number of rows to skip from the beginning. |
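The common options can be sketched in use as follows. This example assumes that --skip and --limit can be combined in a single view call and that -i overrides detection when the extension is not recognized. The file names rows.csv and rows.data are hypothetical.

```shell
# Build a 20-row sample file; file name and columns are made up for illustration.
{
  echo "id,value"
  for i in $(seq 1 20); do echo "$i,$((i * 10))"; done
} > rows.csv

# Copy it under an extension that cannot be auto-detected, to exercise -i.
cp rows.csv rows.data

if command -v tab >/dev/null 2>&1; then
  # Assumed behavior: skip the first 10 rows, then display at most 5.
  tab view rows.csv --skip 10 --limit 5
  # The extension gives no hint, so state the input format explicitly.
  tab view rows.data -i csv --limit 3
else
  echo "tab not found on PATH; skipping" >&2
fi
```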
Convert options¶
| Option | Description |
|---|---|
| -n | Number of output partitions. Creates a directory with part files. |