ProvSQL Studio

ProvSQL Studio is a Python-backed web UI for the ProvSQL extension. It runs as a separate package, connects to any PostgreSQL database with ProvSQL installed, and lets you inspect provenance interactively through two complementary inspection modes:

  • Circuit mode renders the provenance directed acyclic graph (DAG) behind a result’s UUID or aggregate token, with frontier expansion, a node inspector, and on-the-fly semiring evaluation.

  • Where mode highlights the source cells that contributed to each output value, against the live content of the provenance-tracked relations.

A schema panel, a configuration panel, and a mode-switcher round out the UI; all three are described below.

Studio is a query-and-inspect tool, not a database loader. Bulk fixture loads (psql -d mydb -f setup.sql) still go through psql; Studio takes over once the data is in place.

Installation

Studio targets Python ≥ 3.10. The released package is on PyPI:

pip install provsql-studio

Studio assumes the ProvSQL extension is already installed in the target database (see Getting ProvSQL) and is running at least the version listed under Compatibility.

For Studio developers, install from a local checkout in editable mode, so edits to the source take effect without reinstalling:

pip install -e ./studio

The Docker demonstration container ships a pre-installed Studio alongside the extension; see Docker Container.

Connecting

Launch Studio with a data source name (DSN):

provsql-studio --dsn postgresql://user@localhost:5432/mydb

If --dsn is omitted, Studio resolves the connection target in this order:

  • The DATABASE_URL environment variable, used as a DSN.

  • libpq’s standard environment variables (PGDATABASE, PGSERVICE, PGHOST…).

  • Fallback to the postgres maintenance database on libpq’s default host (a local Unix socket on Linux and macOS, localhost on Windows). Studio then surfaces a banner inviting you to pick a real database from the in-page switcher.

The browser reaches the UI at http://127.0.0.1:8000/ (override the bind address and port with --host and --port).

In-page connection editor

A plug icon next to the connection-status dot in the top navigation opens a popover where you can paste a DSN. Studio probes the new DSN with SELECT 1 before swapping pools, so a typo or wrong password leaves the existing connection up and surfaces the PostgreSQL error inline. The status dot polls every 5 seconds; its tooltip shows the active user@host:port endpoint.

Database switcher

Clicking the database name in the top nav lists every accessible database on the current server; pick one and the page reloads onto it. The query box is wiped on switch (the previous query rarely makes sense against a different database) but is pushed onto the history first, so Alt+↑ recovers it.

Search path

Studio pins provsql at the end of the per-batch search_path automatically, so the case-study idiom SET search_path TO public, provsql; is unnecessary. The header shows a lock chip for the pinned provsql; the rest of search_path is editable under the Session group of the Config panel (see Configuration).

Studio does not enforce a read-only PostgreSQL role: the query box accepts arbitrary SQL, including DDL (CREATE TABLE), DML (INSERT, UPDATE), and management helpers (add_provenance, set_prob, create_provenance_mapping). Connect Studio with a role that has the privileges your workflow expects.

Query box

The query box is a syntax-highlighted SQL textarea: PostgreSQL keywords, strings, comments, and identifiers are coloured inline.

Submit the current batch with Ctrl+Enter (or ⌘+Enter on macOS) inside the textarea, or by clicking the Send query button next to it.

Past queries are kept in a session history. Alt+↑ and Alt+↓ step through them in place; the History button opens a list view of recent queries to pick from.

The textarea accepts multiple semicolon-separated statements; Studio splits and runs them in a single transaction. Only the last statement’s result is rendered in the result table; preceding statements are useful for setup (SET, CREATE TABLE, fixture INSERTs…) before the interesting SELECT.

The result table is capped at Result rows (default 1000 rows, tunable in Configuration). When more rows are available, the result-table footer shows a (first 1000; more available) marker so the truncation is explicit.

Per-query toggles

Two checkboxes next to the query box flip the connection’s provenance behaviour for the next batch:

  • where_provenance (provsql.where_provenance) enables where-provenance tracking at circuit-build time: when on, the planner adds project and eq gates so each output value records the source cell it was copied from (see Where-Provenance). Where mode locks this toggle on and shows a lock icon next to it (without those gates the hover-to-trace would have nothing to display); Circuit mode leaves the toggle free, useful when you want a circuit that already carries where-provenance.

  • update_provenance (provsql.update_provenance) carries provenance through INSERT, UPDATE, and DELETE statements so the rows they create or modify keep their lineage (see Data Modification Tracking). The toggle is freely user-controllable in both modes.

Toggle states are sent to the server alongside each query.

Circuit mode

Circuit mode is the visual counterpart to view_circuit and the programmatic walks via get_gate_type, get_children, and identify_token. The query runs unwrapped, so the provsql UUID column and any agg_token cells appear in the result table as raw values; clicking one renders its provenance DAG in the sidebar.

Studio Circuit mode showing a small DISTINCT-circuit with the input-gate inspector open and the stored probability visible.

Circuit mode: a DISTINCT query renders a -rooted DAG. Pinning an input gate opens the inspector with its metadata and the stored probability, click-to-edit.

Hovering a node lights up its subtree; clicking pins it and opens the inspector panel. Drag a node to reposition it; the offset is preserved across frontier expansions and reset on every fresh circuit load. A Reset node positions toolbar button undoes all drag-to-move offsets at once. Wheel-to-zoom is supported on the canvas, and a Fit to screen button resets zoom and pan. A Fullscreen toggle pins the canvas to the viewport (Esc exits).

Worked example

Ask for the distinct cities mentioned in personnel:

SELECT DISTINCT city FROM personnel;

The result table shows one row per city with a provsql UUID column. Clicking a UUID cell renders the corresponding circuit: a single (plus) gate over the input gates of the underlying personnel rows.

Frontier expansion

Studio caps each circuit fetch at the value of --max-circuit-nodes (default 200, tunable in Configuration). Nodes whose children were not fetched in the current request carry a small gold + badge: clicking it requests another breadth-first search (BFS) layer rooted at the frontier node and merges it into the current scene. The cap is per-fetch, not per-scene, so a circuit grows interactively as you expand the frontiers that interest you.

Inspector panel

Pinning a node opens the inspector. Each gate type (see Inspecting the Circuit for the catalogue) renders a gate-specific metadata block: function and result type for agg, left and right attribute for eq, value for mulinput, relation id and column list for input and update, and so on. input and update gates additionally show the stored probability: clicking the value swaps it for a number input, Enter sends a set_prob to the server, Esc or blur cancels. Out-of-range values and ProvSQL errors land inline.

Semiring evaluation strip

Below the canvas, the semiring evaluation strip targets the pinned node (or, when no node is pinned, the root). The semiring select groups compiled entries into four sub-optgroups, with custom and “Other” entries below:

  • Compiled semirings

    • Boolean: boolexpr, boolean.

    • Lineage: formula, how, why, which. formula pretty-prints the provenance circuit as a symbolic expression (Green-Karvounarakis-Tannen); how is the same algebra in canonical \(\mathbb{N}[X]\) sum-of-products form, so two semantically-equal circuits collapse to identical strings : suitable for provenance-aware equivalence checks. why and which are the set-valued projections.

    • Numeric: counting, tropical, viterbi, lukasiewicz. Łukasiewicz is the continuous-valued fuzzy logic on numeric values in \([0, 1]\); for the discrete fuzzy / trust shape on a user-defined enum lattice, see maxmin under User-enum below.

    • Intervals: interval-union. One UI option backed by the sr_temporal, sr_interval_num and sr_interval_int kernels: the strip picks the right one from the selected mapping’s multirange type (tstzmultirange / nummultirange / int4multirange); requires PostgreSQL 14+. See Temporal Features for the interval-union algebra.

    • User-enum: minmax and maxmin. One UI option per shape, polymorphic over any user-defined PostgreSQL enum carrier (the bottom and top of the lattice come from pg_enum.enumsortorder). minmax is the security shape (alternatives combine to enum-min, joins to enum-max); maxmin is the discrete fuzzy / availability / trust shape (alternatives to enum-max, joins to enum-min). Backed by sr_minmax and sr_maxmin respectively.

  • Custom semirings: any user-defined wrapper over provenance_evaluate discovered in the schema.

  • Other: probability and PROV-XML export. The probability method picker (see Probabilities) groups exact methods ((default), independent, possible-worlds, tree-decomposition, compilation) and approximate methods (monte-carlo, weightmc). PROV-XML export uses to_provxml; see Provenance Export.

The mapping picker filters on the selected semiring’s expected value type: only boolean-typed mappings appear under boolean, only the numeric base types (smallint / integer / bigint / numeric / real / double precision) under the numeric group, only multirange-typed mappings under interval-union, and only mappings whose value column is a user-defined enum (pg_type.typtype = 'e') under minmax / maxmin. Polymorphic entries (boolexpr, formula, how, why, which) accept any mapping. boolexpr and PROV-XML export accept the mapping as optional: with one, leaves are labelled by the mapping’s value column; without one, leaves carry their gate UUID (PROV-XML) or a bare x<id> placeholder (boolexpr). Custom- semiring entries filter to mappings whose value type matches the wrapper’s return type. Mismatches are surfaced before the round-trip as (no compatible mappings : expected …) in the picker.

Run reports the result inline along with the runtime; Monte-Carlo runs additionally show a ± confidence band (Hoeffding bound, 95 % probability). Clear wipes the result so a verbose Why or Formula output does not obscure the canvas; Copy writes the just-rendered payload (with full precision for probability, regardless of the rounded display) to the clipboard.

The probability cell click-toggles between rounded (per the panel’s Probability decimals setting) and full double-precision; copies always carry the full-precision form.

Oversized circuits

When a top-level circuit fetch exceeds the cap, Studio shows a structured banner instead of an error: This subgraph has 4,521 nodes; the cap is 200 (rendering at depth 4) followed by a single-click Render at depth 1, then expand interactively button when the depth-1 envelope itself fits under the cap. Wide-bound circuits (for instance aggregations with high fan-in) leave the button out rather than promising a render that will exceed the cap again. The eval strip (above) still works against the unrendered root: it operates on the token UUID, not the rendered DAG.

Oversize-circuit banner showing 'This subgraph has X nodes; the cap is Y' and a 'Render at depth 1, then expand interactively' button.

The actionable oversize-circuit banner: when a fetch exceeds the cap, Studio surfaces a single button to start at depth 1 and expand on demand.

Where mode

Where mode is the visual counterpart to where_provenance (see Where-Provenance). It enables provsql.where_provenance on the connection and wraps every SELECT so the result carries the where-provenance of each output value. Hovering a result cell highlights the source cells (in the sidebar) that contributed to it. No explicit where_provenance call is required.

Studio Where mode with a result row hovered, highlighting the contributing rows in the personnel sidebar.

Where mode: source relations on the right; hovering a cell highlights the personnel rows that produced it.

The sidebar lists every provenance-tracked relation. Each panel header reports its row count: personnel 100 of ~50000 tuples makes the per-relation cap explicit when a relation is too large to materialise in full (default 100 rows, tunable in Configuration). The Input gates only toggle (default on) hides relations whose first provsql token is not an input gate, so derived materialisations do not crowd the panel.

Each result row gets a → Circuit button that switches to Circuit mode and pre-loads the provenance DAG of that row’s token; see Mode-switching.

Worked example

Ask for the people listed in Paris:

SELECT * FROM personnel WHERE city = 'Paris';

Each output cell becomes hover-aware: hovering name, classification, or city lights up the matching cells in the personnel panel of the sidebar.

Wrap-fallback notice

If a Where-mode query touches no provenance-tracked relation, Studio silently drops the wrap and surfaces an INFO banner instead of raising. This is the right default for case-study scripts that do bulk setup before the first interesting SELECT.

Schema panel

A button in the top nav opens a searchable popover listing every SELECT-able relation. Each gets one of two pills:

  • prov (purple) on a relation whose provsql column is injected by the planner: provenance tracking is active.

  • mapping (gold) on a relation shaped (value <T>, provenance uuid), including views from create_provenance_mapping_view. The two pills are mutually exclusive: a mapping view that also carries a planner- injected provsql column is classified as mapping (the more specific category).

On a tracked table, each column is a click target that prefills SELECT create_provenance_mapping('<table>_<col>_mapping', '<schema>.<table>', '<col>'); into the query box, so a fresh mapping is two clicks away.

On any provenance-eligible plain table, + prov and − prov action chips prefill SELECT add_provenance(...) / SELECT remove_provenance(...). They are hidden on views (regular and materialised) and on foreign tables, since the underlying ALTER TABLE ADD COLUMN does not work on those relation kinds, and on mappings.

Configuration

The Config panel groups its options into three sections:

  • Provenance mirrors the user-level Grand Unified Configuration (GUC) parameters documented in Configuration Reference: provsql.active, provsql.verbose_level, and provsql.tool_search_path. provsql.where_provenance and provsql.update_provenance live next to the query box instead (see Per-query toggles), since they are typically flipped per query rather than per session. Studio also forces provsql.aggtoken_text_as_uuid = on for the whole session: clickable agg_token cells in the result table need that.

  • Session wraps the per-request session state: the statement_timeout applied to every batch, and the visible part of search_path (provsql is always pinned at the end).

  • Display limits holds the size knobs: Max circuit depth (the BFS depth cap on the initial fetch), Nodes per fetch (the per-fetch node cap), Sidebar rows per relation (per-relation row cap in the Where-mode sidebar), Result rows (row cap on the result table), and Probability decimals (number of decimals in the eval-strip’s probability display).

Studio Config panel with the three section headings (Provenance, Session, Display limits) visible.

The Config panel: three section headings above the per-row options.

All values persist on disk in provsql-studio/config.json under the platform’s user-config directory, and survive a Studio restart:

  • Linux: $XDG_CONFIG_HOME/provsql-studio/ (defaults to ~/.config/provsql-studio/).

  • macOS: ~/Library/Application Support/provsql-studio/.

  • Windows: %APPDATA%\provsql-studio\ (typically C:\Users\<user>\AppData\Roaming\provsql-studio\).

Each option is also exposed on the CLI as a flag (--statement-timeout, --max-sidebar-rows, --max-result-rows, --max-circuit-nodes, --search-path, --tool-search-path); the CLI wins on startup, the panel writes back to the JSON.

Mode-switching

The mode tabs in the top nav switch between Where and Circuit. A switch carries the current SQL forward via sessionStorage; it auto-replays only when the user just ran the query, so unrun drafts and plain reloads never auto-execute (important for side-effecting statements like add_provenance).

In Where mode, every result row gets a → Circuit button that switches to Circuit mode and pre-loads the circuit for that row’s provenance UUID, so a hover-and-trace exploration can cross over to the DAG without retyping the query.

Limitations

  • Per-fetch circuit cap, not per-scene: --max-circuit-nodes bounds each fetch independently, so a scene assembled by repeated frontier expansions can exceed the cap. This is intentional: the cap exists to keep the browser responsive on initial render, not to prevent drilling deep where you want to.

  • Verbose semiring outputs are unbounded: formula, how, why, which, and PROV-XML export can return multi-megabyte strings on large circuits. Compute time is bounded by statement_timeout; output size is not. Use Clear after a bulky run, or evaluate against a pinned subnode to scope the output.

Compatibility

Studio’s version stream is independent of the extension’s. Studio is released on PyPI as provsql-studio from studio-vX.Y.Z git tags; the extension is released on PGXN as provsql from vX.Y.Z git tags. Each Studio release lists a minimum required extension version.

Studio

ProvSQL extension

Notes

1.0.x

1.4.0

First public release. Requires circuit_subgraph, resolve_input, and the provsql.aggtoken_text_as_uuid GUC (used for clickable agg_token cells), all introduced in 1.4.0.

When the installed extension predates this minimum, Studio’s startup check prints the mismatch and exits. Pass --ignore-version to override the check, for instance when running against a development branch.