Source
```yaml
id: caching
namespace: company.team

tasks:
  - id: transactions
    type: io.kestra.plugin.core.http.Download
    uri: https://huggingface.co/datasets/kestra/datasets/resolve/main/csv/cache_demo/transactions.csv

  - id: products
    type: io.kestra.plugin.core.http.Download
    uri: https://huggingface.co/datasets/kestra/datasets/resolve/main/csv/cache_demo/products.csv
    description: This task pulls the full product catalog once per day. Because the
      catalog changes infrequently and contains over 200k rows, running it only
      daily avoids unnecessary strain on that production DB, while ensuring
      downstream joins always use up-to-date reference data.
    taskCache:
      enabled: true
      ttl: PT24H

  - id: duckdb
    type: io.kestra.plugin.jdbc.duckdb.Query
    store: true
    inputFiles:
      products.csv: "{{ outputs.products.uri }}"
      transactions.csv: "{{ outputs.transactions.uri }}"
    sql: |-
      SELECT
        t.transaction_id,
        t.timestamp,
        t.quantity,
        t.sale_price,
        p.product_name,
        p.category,
        p.cost_price,
        p.supplier_id,
        (t.sale_price - p.cost_price) * t.quantity AS profit
      FROM
        read_csv_auto('transactions.csv') AS t
      JOIN
        read_csv_auto('products.csv') AS p
      USING (product_id);
```
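The SQL above joins transactions to the product catalog on `product_id` and computes `(sale_price - cost_price) * quantity` per row. A minimal Python sketch of the same join and profit calculation, using hypothetical sample data in place of the downloaded CSV files (column names mirror those in the query):

```python
import csv
import io

# Hypothetical stand-ins for the downloaded files; not the real datasets.
products_csv = """product_id,product_name,category,cost_price,supplier_id
1,Widget,Tools,2.50,s1
2,Gadget,Toys,5.00,s2
"""
transactions_csv = """transaction_id,timestamp,product_id,quantity,sale_price
t1,2024-01-01T10:00:00,1,3,4.00
t2,2024-01-01T11:00:00,2,1,7.50
"""

# Index the catalog by product_id -- the USING (product_id) join key.
products = {row["product_id"]: row
            for row in csv.DictReader(io.StringIO(products_csv))}

# Join each transaction to its product and compute profit per transaction,
# mirroring (t.sale_price - p.cost_price) * t.quantity from the query.
results = []
for t in csv.DictReader(io.StringIO(transactions_csv)):
    p = products[t["product_id"]]
    profit = (float(t["sale_price"]) - float(p["cost_price"])) * int(t["quantity"])
    results.append({"transaction_id": t["transaction_id"],
                    "product_name": p["product_name"],
                    "profit": profit})

print([r["profit"] for r in results])  # → [4.5, 2.5]
```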
About this blueprint
This flow illustrates Kestra's taskCache feature by caching a task that extracts a large product catalog, reducing load on the source system.
- The `transactions` task downloads recent transaction data without caching.
- The `products` task downloads the full product catalog and caches the result for 24 hours using the `taskCache` property, ensuring that downstream tasks use fresh data while avoiding repeated downloads within the TTL.
- The `duckdb` task joins the transactions and product data using DuckDB SQL, calculates profit per transaction, and stores the result.
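The caching behavior described above follows a standard TTL pattern: reuse a stored result until its age exceeds the TTL (here `PT24H`, 24 hours), then recompute. A minimal sketch of that idea in Python; the `CachedResult` class and `fetch` callback are illustrative, not Kestra APIs:

```python
import time

class CachedResult:
    """Reuse a fetched value until it is older than ttl_seconds."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.value = None
        self.fetched_at = None
        self.fetch_count = 0  # how many times the underlying fetch ran

    def get(self, fetch):
        now = time.monotonic()
        if self.value is None or now - self.fetched_at > self.ttl:
            self.value = fetch()      # cache miss or expired: refetch
            self.fetched_at = now
            self.fetch_count += 1
        return self.value             # cache hit: reuse stored result

# PT24H expressed in seconds.
cache = CachedResult(ttl_seconds=24 * 3600)
catalog = cache.get(lambda: ["...200k product rows..."])
catalog_again = cache.get(lambda: ["...200k product rows..."])  # served from cache
print(cache.fetch_count)  # → 1: the second call never hit the source
```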