fire_duck_ext

Search Shortcut cmd + k | ctrl + k

Documentation

fire_duck_ext

Downloads 1.0kthis week

GitHub stars 3

Extension repository on GitHub

Extension descriptor (YAML)

Query Google Cloud Firestore directly from DuckDB using SQL

Maintainer(s): BorisBesky

Installing and Loading

INSTALL fire_duck_ext FROM community;
LOAD fire_duck_ext;

Example

LOAD fire_duck_ext;

-- Authenticate with a service account JSON file
CREATE SECRET my_firestore (
    TYPE firestore,
    PROJECT_ID 'my-gcp-project',
    SERVICE_ACCOUNT_JSON '/path/to/service-account.json'
);

-- Or use inline JSON for the service account
CREATE SECRET my_firestore (
    TYPE firestore,
    PROJECT_ID 'my-gcp-project',
    SERVICE_ACCOUNT_JSON '{
        "type": "service_account",
        "project_id": "my-gcp-project",
        "private_key_id": "key123abc",
        "private_key": "-----BEGIN RSA PRIVATE KEY-----\n...\n-----END RSA PRIVATE KEY-----\n",
        "client_email": "sa@my-gcp-project.iam.gserviceaccount.com",
        "client_id": "123456789",
        "auth_uri": "https://accounts.google.com/o/oauth2/auth",
        "token_uri": "https://oauth2.googleapis.com/token"
    }'
);

-- Or authenticate with an API key (unauthenticated; request.auth is null)
CREATE SECRET my_firestore (
    TYPE firestore,
    PROJECT_ID 'my-gcp-project',
    API_KEY 'AIzaSyYourApiKeyHere'
);

-- Or authenticate as a Firebase Auth user (browser-safe; respects Security Rules)
-- Email/password sign-in:
CREATE SECRET my_firestore (
    TYPE firestore,
    PROJECT_ID 'my-gcp-project',
    API_KEY 'AIzaSyYourWebApiKey',
    EMAIL 'user@example.com',
    PASSWORD 'hunter2'
);

-- Anonymous sign-in:
CREATE SECRET my_firestore (
    TYPE firestore,
    PROJECT_ID 'my-gcp-project',
    API_KEY 'AIzaSyYourWebApiKey',
    ANONYMOUS true
);

-- Or pass a pre-obtained Firebase ID token (e.g. minted by your host app)
CREATE SECRET my_firestore (
    TYPE firestore,
    PROJECT_ID 'my-gcp-project',
    API_KEY 'AIzaSyYourWebApiKey',  -- optional; enables auto-refresh
    ID_TOKEN 'eyJhbGciOi...',
    REFRESH_TOKEN 'AMf-...'         -- optional; auto-refreshed on expiry
);

-- Read documents
SELECT * FROM firestore_scan('users');

-- Filter documents
SELECT __document_id, name, email
FROM firestore_scan('users')
WHERE status = 'active';

-- Update documents
CALL firestore_update('users', 'user123', 'status', 'verified');

-- Batch update with DuckDB filtering
SET VARIABLE ids = (
    SELECT list(__document_id)
    FROM firestore_scan('users')
    WHERE status = 'pending'
);
CALL firestore_update_batch('users', getvariable('ids'), 'status', 'reviewed', 'updated_at', now());

-- Insert documents from a subquery (auto-generated IDs)
CALL firestore_insert('users', (SELECT name, age FROM read_csv('new_users.csv')));

-- Insert with explicit document IDs from a column
CALL firestore_insert('users',
    (SELECT * FROM read_csv('new_users.csv')),
    document_id := 'user_id');

-- Insert with explicit document ID selected from a column
CALL firestore_insert('users',
    (SELECT 'Alice' AS name, 30 AS age, 'alice_user_id' AS user_id),
    document_id := 'user_id');

-- Target a specific named database
CREATE SECRET my_named_db (
    TYPE firestore,
    PROJECT_ID 'my-gcp-project',
    SERVICE_ACCOUNT_JSON '/path/to/credentials.json',
    DATABASE 'my-database'
);

-- Target multiple databases with a single secret
CREATE SECRET my_multi_db (
    TYPE firestore,
    PROJECT_ID 'my-gcp-project',
    SERVICE_ACCOUNT_JSON '/path/to/credentials.json',
    DATABASES ['(default)', 'analytics-db']
);

-- Target multiple databases with an API key
CREATE SECRET my_multi_db (
    TYPE firestore,
    PROJECT_ID 'my-gcp-project',
    API_KEY 'AIzaSyYourApiKeyHere',
    DATABASES ['(default)', 'analytics-db']
);

-- Connect to a specific database for the session
CALL firestore_connect('analytics-db');

-- Collection group queries
SELECT __document_id, name, email
FROM firestore_scan('~data_group')
WHERE status = 'active';

fire_duck_ext lets you work with Google Cloud Firestore directly from DuckDB using SQL. If you are looking for a way to access Firestore database data with SQL or connect Firebase/Firestore workflows to DuckDB, this extension provides read and write support through familiar SQL statements. Scan data with firestore_scan(), perform DML operations with firestore_update(), firestore_insert(), and firestore_delete(). Batch operations with firestore_update_batch() and firestore_delete_batch(). Transform arrays with firestore_array_union(), firestore_array_remove(), and firestore_array_append(). Query collection groups with firestore_scan('~collection'). List collection IDs with firestore_scan('collection/doc_id'). Manage credentials with DuckDB secrets. Push down filters to Firestore with firestore_scan().

Authentication

Secrets require PROJECT_ID and one of three auth modes:

Service account (SERVICE_ACCOUNT_JSON) — admin access that bypasses Security Rules. Accepts either a file path or the full inline JSON text of a Google Cloud service account key (must contain at least type, project_id, private_key, and client_email). Requires OpenSSL for RS256 signing, so it is native-only (not available in the WebAssembly build).
API key (API_KEY) — unauthenticated access (request.auth == null), suitable for development, testing, or public databases. Reads are still subject to Security Rules. API key auth does not support batchWrite, so batch operations fall back to individual requests.
Firebase Auth user — authenticated access that respects Security Rules and is browser-safe (works in the WebAssembly build). Provide the public Web API_KEY plus one of:
- EMAIL + PASSWORD — email/password sign-in,
- ANONYMOUS true — anonymous sign-in,
- ID_TOKEN (+ optional REFRESH_TOKEN) — a pre-obtained Firebase ID token minted by your host app. API_KEY is optional in this mode but enables automatic token refresh.
The extension sends Authorization: Bearer <id_token> and refreshes the token automatically on expiry.

ANONYMOUS true is not the same as a plain API_KEY. Both pass the API key, but the API key alone is unauthenticated (request.auth == null), while anonymous sign-in obtains an ID token for a real (anonymous) Firebase user, so request.auth != null and a uid is available in Security Rules. Use a plain API key for public collections whose rules allow unauthenticated reads; use ANONYMOUS true when rules require a signed-in user (request.auth != null) but no specific identity is needed. Each anonymous sign-in gets a fresh uid.

Admin-only features. Firestore's index metadata (composite indexes and single-field index configuration) is served by the Firestore Admin API, which is gated by Google Cloud IAM and readable only with a service account. With API key or Firebase user auth the extension cannot query it, skips it, and assumes Firestore's default single-field indexes. Features that depend on this metadata are therefore unavailable to non-admin auth — most notably composite-index detection for multi-field ORDER BY, which falls back to a single-field server-side sort (re-sorted in DuckDB) or surfaces Firestore's "query requires an index" error. Single-field filter and order pushdown are unaffected.

If the GOOGLE_APPLICATION_CREDENTIALS environment variable is set, the extension automatically creates an internal secret on startup that matches all databases (DATABASE '*'), so no CREATE SECRET is needed.

WebAssembly (DuckDB-WASM)

The extension builds and runs in DuckDB-WASM (e.g. in the browser). HTTP is routed through DuckDB's HTTPUtil instead of raw sockets, so reads, filtered :runQuery scans (including collection groups), and writes all work in the browser. Authentication in WASM uses API key or Firebase Auth user credentials (email/password, anonymous, or a pre-obtained ID token); service-account auth is native-only because it needs OpenSSL.

Two things to know for rules-governed (API key / Firebase user) access:

show_missing := true (the default) lists phantom/missing documents, which is an Admin-only operation; over rules-governed access Firestore returns 403 PERMISSION_DENIED even when rules allow the read. Pass show_missing := false.
The Firestore Admin API (index metadata) is reachable only with a service account; for API-key / Firebase-user auth the extension skips it and assumes default single-field indexes.

The WASM build of the extension must match the DuckDB version bundled by the @duckdb/duckdb-wasm runtime it is loaded into (the C++ extension ABI is not stable across versions).

Type Mapping

string → VARCHAR
integer → BIGINT
double → DOUBLE
boolean → BOOLEAN
timestamp → TIMESTAMP
array → LIST
map → VARCHAR (JSON string)
vector → ARRAY(DOUBLE, N) (embedding)
null → NULL
geoPoint → STRUCT(latitude DOUBLE, longitude DOUBLE)
reference → VARCHAR
bytes → BLOB

Schema Inference

When you call firestore_scan(), the extension automatically infers the schema by sampling up to 100 documents from the collection. For each field, it determines the DuckDB type using a voting system: the most common non-null Firestore type across the sampled documents wins. Fields that do not appear in every sampled document are marked nullable.

For array fields, element types are also inferred by sampling elements across documents and selecting the most common element type. If a field is only ever null across all samples, it defaults to VARCHAR.

A virtual __document_id column (VARCHAR) is always added as the first column, containing the Firestore document ID (or the full document path for collection group queries).

Documents with heterogeneous field types are handled gracefully at read time: values that cannot be converted to the inferred column type are returned as NULL.

Null Semantics

Both missing fields and explicit Firestore null values appear as NULL in DuckDB. There is no way to distinguish between the two on read.

Writing NULL to a field sets it to an explicit Firestore null value; it does not delete the field from the document.

WHERE field IS NULL is not pushed down to Firestore because Firestore's IS_NULL operator only matches fields that exist and are explicitly null, while DuckDB treats missing fields as NULL. WHERE field IS NOT NULL is pushed down safely.

Batch Operations

Batch updates and deletes group writes into requests of up to 500 operations each using Firestore's batchWrite API. Note that batchWrite is not atomic: individual writes within the batch may succeed or fail independently. If batchWrite is unavailable (e.g., API key auth), the extension falls back to individual operations automatically.

Database Targeting

By default, secrets target the (default) Firestore database. Use DATABASE 'name' for a single named database, DATABASE '*' as a wildcard that matches any database, or DATABASES ['db1', 'db2'] to allow a single secret to match multiple named databases. You cannot specify both DATABASE and DATABASES in the same secret.

Use CALL firestore_connect('db-name') to set the session database. All subsequent queries use this database until CALL firestore_disconnect() is called. A per-query database := parameter always takes priority over the connected session database.

Collection Group Queries

Use the ~collection prefix to query across all subcollections with a given name. For example, firestore_scan('~orders') reads all documents from every subcollection named "orders" regardless of parent path.

SQL ORDER BY / LIMIT Pushdown

firestore_scan() can automatically push simple SQL ORDER BY, LIMIT, and OFFSET clauses down to Firestore, reducing the number of documents fetched for sorted and top-N queries.

Supported patterns include:

ORDER BY field
ORDER BY field DESC
ORDER BY field1, field2
LIMIT n
ORDER BY ... LIMIT n
ORDER BY ... LIMIT n OFFSET m (pushed as LIMIT n + m, with DuckDB applying the final offset)

Named parameters still work and take precedence over SQL pushdown:

If order_by:= is provided, that server-side ordering is used and DuckDB applies any SQL ORDER BY afterward.
If scan_limit:= is provided, that fetch limit is used and DuckDB applies any SQL LIMIT afterward.

Use EXPLAIN to verify when SQL ordering and limits are being pushed. Firestore will show entries like Firestore Pushed Order: score DESC and Firestore Pushed Limit: 5.

Collection ID Listings

When firestore_scan() is given a document path instead of a collection path, it lists that document's direct subcollection IDs instead of reading documents. This is useful for discovering unknown nested collection names.

The result contains a single __document_id column, where each row is a subcollection ID.

Document-path scans support:

Pagination across large numbers of subcollections
SQL ORDER BY __document_id pushdown
SQL LIMIT pushdown
Named order_by:='__document_id' or order_by:='__document_id DESC', plus scan_limit:=...

Other ordering expressions still work, but they are evaluated in DuckDB after fetching the subcollection IDs.

Missing Documents

By default, firestore_scan() includes phantom/missing documents — documents that have no fields but exist as parent paths for subcollections. This matches Firebase Console behavior and is controlled by show_missing (default: true). Set show_missing := false to only return documents with fields. When a collection contains only phantom documents, the result includes just the __document_id column.

Listing phantom documents is an Admin-only operation, so with rules-governed access (API key or Firebase user) Firestore returns 403 PERMISSION_DENIED even when Security Rules allow the read. Use show_missing := false with those auth modes, or authenticate with a service account.

Added Functions

function_name	function_type	description	comment	examples
firestore_scan	table	Read all documents from a Firestore collection or collection group.	NULL	[SELECT * FROM firestore_scan('users');]
firestore_insert	table	Insert documents into a Firestore collection from a subquery.	NULL	[CALL firestore_insert('users', (SELECT name, age FROM read_csv('new_users.csv')));]
firestore_update	table	Update fields on a single Firestore document.	NULL	[CALL firestore_update('users', 'user123', 'status', 'verified');]
firestore_delete	table	Delete a single Firestore document.	NULL	[CALL firestore_delete('users', 'user123');]
firestore_update_batch	table	Batch update multiple Firestore documents by ID list.	NULL	[CALL firestore_update_batch('users', ['id1', 'id2'], 'status', 'reviewed');]
firestore_delete_batch	table	Batch delete multiple Firestore documents by ID list.	NULL	[CALL firestore_delete_batch('users', ['id1', 'id2']);]
firestore_array_union	table	Add elements to an array field without duplicates.	NULL	[CALL firestore_array_union('users', 'user123', 'tags', ['vip', 'active']);]
firestore_array_remove	table	Remove elements from an array field.	NULL	[CALL firestore_array_remove('users', 'user123', 'tags', ['inactive']);]
firestore_array_append	table	Append elements to an array field (allows duplicates).	NULL	[CALL firestore_array_append('users', 'user123', 'log', ['event1']);]
firestore_clear_cache	table	Clear the cached schema for all or a specific collection.	NULL	[CALL firestore_clear_cache();]
firestore_connect	table	Set the session-scoped Firestore database for subsequent queries.	NULL	[CALL firestore_connect('analytics-db');]
firestore_disconnect	table	Clear the session-scoped Firestore database override.	NULL	[CALL firestore_disconnect();]

name	description	input_type	scope	aliases
firestore_schema_cache_ttl	Schema cache TTL in seconds (0 to disable caching)	BIGINT	GLOBAL	[]