Source
```yaml
id: caching
namespace: company.team

tasks:
  - id: transactions
    type: io.kestra.plugin.core.http.Download
    uri: https://huggingface.co/datasets/kestra/datasets/resolve/main/csv/cache_demo/transactions.csv

  - id: products
    type: io.kestra.plugin.core.http.Download
    uri: https://huggingface.co/datasets/kestra/datasets/resolve/main/csv/cache_demo/products.csv
    description: This task pulls the full product catalog once per day. Because the
      catalog changes infrequently and contains over 200k rows, running it only
      daily avoids unnecessary strain on that production DB, while ensuring
      downstream joins always use up-to-date reference data.
    taskCache:
      enabled: true
      ttl: PT24H

  - id: duckdb
    type: io.kestra.plugin.jdbc.duckdb.Query
    store: true
    inputFiles:
      products.csv: "{{ outputs.products.uri }}"
      transactions.csv: "{{ outputs.transactions.uri }}"
    sql: |-
      SELECT
        t.transaction_id,
        t.timestamp,
        t.quantity,
        t.sale_price,
        p.product_name,
        p.category,
        p.cost_price,
        p.supplier_id,
        (t.sale_price - p.cost_price) * t.quantity AS profit
      FROM
        read_csv_auto('transactions.csv') AS t
      JOIN
        read_csv_auto('products.csv') AS p
      USING (product_id);
```
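The SQL above joins transactions to the product catalog on `product_id` and computes `(sale_price - cost_price) * quantity` per row. A minimal Python sketch of the same join and profit calculation, using hypothetical sample data in place of the downloaded CSV files (column names mirror those in the query):

```python
import csv
import io

# Hypothetical stand-ins for the downloaded files; not the real datasets.
products_csv = """product_id,product_name,category,cost_price,supplier_id
1,Widget,Tools,2.50,s1
2,Gadget,Toys,5.00,s2
"""
transactions_csv = """transaction_id,timestamp,product_id,quantity,sale_price
t1,2024-01-01T10:00:00,1,3,4.00
t2,2024-01-01T11:00:00,2,1,7.50
"""

# Index the catalog by product_id -- the USING (product_id) join key.
products = {row["product_id"]: row
            for row in csv.DictReader(io.StringIO(products_csv))}

# Join each transaction to its product and compute profit per transaction,
# mirroring (t.sale_price - p.cost_price) * t.quantity from the query.
results = []
for t in csv.DictReader(io.StringIO(transactions_csv)):
    p = products[t["product_id"]]
    profit = (float(t["sale_price"]) - float(p["cost_price"])) * int(t["quantity"])
    results.append({"transaction_id": t["transaction_id"],
                    "product_name": p["product_name"],
                    "profit": profit})

print([r["profit"] for r in results])  # → [4.5, 2.5]
```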
About this blueprint
This flow illustrates Kestra's taskCache feature by caching a task that extracts a large product catalog, reducing load on the source system.
- The `transactions` task downloads recent transaction data without caching.
- The `products` task downloads the full product catalog and caches the result for 24 hours using the `taskCache` property, ensuring that downstream tasks use fresh data while avoiding repeated downloads within the TTL.
- The `duckdb` task joins the transactions and product data using DuckDB SQL, calculates profit per transaction, and stores the result.
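The caching behavior described above follows a standard TTL pattern: reuse a stored result until its age exceeds the TTL (here `PT24H`, 24 hours), then recompute. A minimal sketch of that idea in Python; the `CachedResult` class and `fetch` callback are illustrative, not Kestra APIs:

```python
import time

class CachedResult:
    """Reuse a fetched value until it is older than ttl_seconds."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.value = None
        self.fetched_at = None
        self.fetch_count = 0  # how many times the underlying fetch ran

    def get(self, fetch):
        now = time.monotonic()
        if self.value is None or now - self.fetched_at > self.ttl:
            self.value = fetch()      # cache miss or expired: refetch
            self.fetched_at = now
            self.fetch_count += 1
        return self.value             # cache hit: reuse stored result

# PT24H expressed in seconds.
cache = CachedResult(ttl_seconds=24 * 3600)
catalog = cache.get(lambda: ["...200k product rows..."])
catalog_again = cache.get(lambda: ["...200k product rows..."])  # served from cache
print(cache.fetch_count)  # → 1: the second call never hit the source
```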