ProvSQL Studio
ProvSQL Studio is a Python-backed web UI for the ProvSQL extension. It runs as a separate package, connects to any PostgreSQL database with ProvSQL installed, and lets you inspect provenance interactively through two complementary inspection modes:
Circuit mode renders the provenance directed acyclic graph (DAG) behind a result’s UUID or aggregate token, with frontier expansion, a node inspector, and on-the-fly semiring evaluation.
Where mode highlights the source cells that contributed to each output value, against the live content of the provenance-tracked relations.
A schema panel, a configuration panel, and a mode-switcher round out the UI; all three are described below.
Studio is a query-and-inspect tool, not a database loader. Bulk
fixture loads (psql -d mydb -f setup.sql) still go through
psql; Studio takes over once the data is in place.
Installation
Studio targets Python ≥ 3.10. The released package is on PyPI:
pip install provsql-studio
Studio assumes the ProvSQL extension is already installed in the target database (see Getting ProvSQL) and is running at least the version listed under Compatibility.
For Studio developers, install from a local checkout in editable mode, so edits to the source take effect without reinstalling:
pip install -e ./studio
The Docker demonstration container ships a pre-installed Studio alongside the extension; see Docker Container.
Connecting
Launch Studio with a data source name (DSN):
provsql-studio --dsn postgresql://user@localhost:5432/mydb
If --dsn is omitted, Studio resolves the connection target in
this order:
The
DATABASE_URLenvironment variable, used as a DSN.libpq’s standard environment variables (
PGDATABASE,PGSERVICE,PGHOST…).Fallback to the
postgresmaintenance database on libpq’s default host (a local Unix socket on Linux and macOS,localhoston Windows). Studio then surfaces a banner inviting you to pick a real database from the in-page switcher.
The browser reaches the UI at http://127.0.0.1:8000/ (override
the bind address and port with --host and --port).
In-page connection editor
A plug icon next to the connection-status dot in the top navigation
opens a popover where you can paste a DSN. Studio
probes the new DSN with SELECT 1 before swapping pools, so a typo
or wrong password leaves the existing connection up and surfaces the
PostgreSQL error inline. The status dot polls every 5 seconds; its
tooltip shows the active user@host:port endpoint.
Database switcher
Clicking the database name in the top nav lists every accessible database on the current server; pick one and the page reloads onto it. The query box is wiped on switch (the previous query rarely makes sense against a different database) but is pushed onto the history first, so Alt+↑ recovers it.
Search path
Studio pins provsql at the end of the per-batch search_path
automatically, so the case-study idiom
SET search_path TO public, provsql; is unnecessary. The header
shows a lock chip for the pinned provsql; the rest of
search_path is editable under the Session group of the
Config panel (see Configuration).
Studio does not enforce a read-only PostgreSQL role: the query box
accepts arbitrary SQL, including DDL (CREATE TABLE), DML
(INSERT, UPDATE), and management helpers
(add_provenance, set_prob,
create_provenance_mapping).
Connect Studio with a role that has the privileges your workflow
expects.
Query box
The query box is a syntax-highlighted SQL textarea: PostgreSQL keywords, strings, comments, and identifiers are coloured inline.
Submit the current batch with Ctrl+Enter (or ⌘+Enter on macOS) inside the textarea, or by clicking the Send query button next to it.
Past queries are kept in a session history. Alt+↑ and Alt+↓ step through them in place; the History button opens a list view of recent queries to pick from.
The textarea accepts multiple semicolon-separated statements;
Studio splits and runs them in a single transaction. Only the
last statement’s result is rendered in the result table;
preceding statements are useful for setup (SET,
CREATE TABLE, fixture INSERTs…) before the interesting
SELECT.
The result table is capped at Result rows (default 1000 rows,
tunable in Configuration). When more rows are available, the
result-table footer shows a (first 1000; more available) marker
so the truncation is explicit.
Per-query toggles
Two checkboxes next to the query box flip the connection’s provenance behaviour for the next batch:
where_provenance(provsql.where_provenance) enables where-provenance tracking at circuit-build time: when on, the planner addsprojectandeqgates so each output value records the source cell it was copied from (see Where-Provenance). Where mode locks this toggle on and shows a lock icon next to it (without those gates the hover-to-trace would have nothing to display); Circuit mode leaves the toggle free, useful when you want a circuit that already carries where-provenance.update_provenance(provsql.update_provenance) carries provenance throughINSERT,UPDATE, andDELETEstatements so the rows they create or modify keep their lineage (see Data Modification Tracking). The toggle is freely user-controllable in both modes.
Toggle states are sent to the server alongside each query.
Circuit mode
Circuit mode is the visual counterpart to view_circuit
and the programmatic walks via get_gate_type,
get_children, and identify_token. The query
runs unwrapped, so the provsql UUID column and any agg_token
cells appear in the result table as raw values; clicking one renders
its provenance DAG in the sidebar.
Circuit mode: a DISTINCT query renders a ⊕-rooted DAG.
Pinning an input gate opens the inspector with its metadata and
the stored probability, click-to-edit.
Hovering a node lights up its subtree; clicking pins it and opens the inspector panel. Drag a node to reposition it; the offset is preserved across frontier expansions and reset on every fresh circuit load. A Reset node positions toolbar button undoes all drag-to-move offsets at once. Wheel-to-zoom is supported on the canvas, and a Fit to screen button resets zoom and pan. A Fullscreen toggle pins the canvas to the viewport (Esc exits).
Worked example
Ask for the distinct cities mentioned in personnel:
SELECT DISTINCT city FROM personnel;
The result table shows one row per city with a provsql UUID
column. Clicking a UUID cell renders the corresponding circuit: a
single ⊕ (plus) gate over the input gates of the underlying
personnel rows.
Frontier expansion
Studio caps each circuit fetch at the value of
--max-circuit-nodes (default 200, tunable in
Configuration). Nodes whose children were not fetched in the
current request carry a small gold + badge: clicking it requests
another breadth-first search (BFS) layer rooted at the frontier node
and merges it into the current scene. The cap is per-fetch, not per-scene, so a circuit
grows interactively as you expand the frontiers that interest you.
Inspector panel
Pinning a node opens the inspector. Each gate type (see
Inspecting the Circuit for the catalogue) renders a gate-specific
metadata block: function and result type for agg, left and right
attribute for eq, value for mulinput, relation id and
column list for input and update, and so on.
input and update gates additionally show the stored
probability: clicking the value swaps it for a number input, Enter
sends a set_prob to the server, Esc or blur cancels.
Out-of-range values and ProvSQL errors land inline.
Semiring evaluation strip
Below the canvas, the semiring evaluation strip targets the pinned node (or, when no node is pinned, the root). The semiring select groups compiled entries into four sub-optgroups, with custom and “Other” entries below:
Compiled semirings
Boolean:
boolexpr,boolean.Lineage:
formula,how,why,which.formulapretty-prints the provenance circuit as a symbolic expression (Green-Karvounarakis-Tannen);howis the same algebra in canonical \(\mathbb{N}[X]\) sum-of-products form, so two semantically-equal circuits collapse to identical strings : suitable for provenance-aware equivalence checks.whyandwhichare the set-valued projections.Numeric:
counting,tropical,viterbi,lukasiewicz. Łukasiewicz is the continuous-valued fuzzy logic on numeric values in \([0, 1]\); for the discrete fuzzy / trust shape on a user-defined enum lattice, seemaxminunder User-enum below.Intervals:
interval-union. One UI option backed by thesr_temporal,sr_interval_numandsr_interval_intkernels: the strip picks the right one from the selected mapping’s multirange type (tstzmultirange/nummultirange/int4multirange); requires PostgreSQL 14+. See Temporal Features for the interval-union algebra.User-enum:
minmaxandmaxmin. One UI option per shape, polymorphic over any user-defined PostgreSQL enum carrier (the bottom and top of the lattice come frompg_enum.enumsortorder).minmaxis the security shape (alternatives combine to enum-min, joins to enum-max);maxminis the discrete fuzzy / availability / trust shape (alternatives to enum-max, joins to enum-min). Backed bysr_minmaxandsr_maxminrespectively.
Custom semirings: any user-defined wrapper over
provenance_evaluatediscovered in the schema.Other:
probabilityandPROV-XML export. The probability method picker (see Probabilities) groups exact methods ((default),independent,possible-worlds,tree-decomposition,compilation) and approximate methods (monte-carlo,weightmc). PROV-XML export usesto_provxml; see Provenance Export.
The mapping picker filters on the selected semiring’s expected value
type: only boolean-typed mappings appear under boolean, only
the numeric base types (smallint / integer / bigint /
numeric / real / double precision) under the numeric
group, only multirange-typed mappings under interval-union, and
only mappings whose value column is a user-defined enum
(pg_type.typtype = 'e') under minmax / maxmin.
Polymorphic entries (boolexpr, formula, how, why,
which) accept any mapping. boolexpr and PROV-XML export accept the
mapping as optional: with one, leaves are labelled by the mapping’s
value column; without one, leaves carry their gate UUID
(PROV-XML) or a bare x<id> placeholder (boolexpr). Custom-
semiring entries filter to mappings whose value type matches the
wrapper’s return type. Mismatches are surfaced before the round-trip
as (no compatible mappings : expected …) in the picker.
Run reports the result inline along with the runtime; Monte-Carlo runs additionally show a ± confidence band (Hoeffding bound, 95 % probability). Clear wipes the result so a verbose Why or Formula output does not obscure the canvas; Copy writes the just-rendered payload (with full precision for probability, regardless of the rounded display) to the clipboard.
The probability cell click-toggles between rounded (per the panel’s Probability decimals setting) and full double-precision; copies always carry the full-precision form.
Oversized circuits
When a top-level circuit fetch exceeds the cap, Studio shows a
structured banner instead of an error: This subgraph has 4,521
nodes; the cap is 200 (rendering at depth 4) followed by a
single-click Render at depth 1, then expand interactively button
when the depth-1 envelope itself fits under the cap. Wide-bound
circuits (for instance aggregations with high fan-in) leave the
button out rather than promising a render that will exceed the cap
again. The eval strip (above) still works against the unrendered
root: it operates on the token UUID, not the rendered DAG.
The actionable oversize-circuit banner: when a fetch exceeds the cap, Studio surfaces a single button to start at depth 1 and expand on demand.
Where mode
Where mode is the visual counterpart to where_provenance
(see Where-Provenance). It enables provsql.where_provenance
on the connection and wraps every SELECT so the result carries
the where-provenance of each output value. Hovering a result cell highlights the source cells (in
the sidebar) that contributed to it. No explicit
where_provenance call is required.
Where mode: source relations on the right; hovering a cell highlights the personnel rows that produced it.
The sidebar lists every provenance-tracked relation. Each panel
header reports its row count: personnel – 100 of ~50000 tuples
makes the per-relation cap explicit when a relation is too large to
materialise in full (default 100 rows, tunable in
Configuration). The Input gates only toggle (default on)
hides relations whose first provsql token is not an input
gate, so derived materialisations do not crowd the panel.
Each result row gets a → Circuit button that switches to Circuit mode and pre-loads the provenance DAG of that row’s token; see Mode-switching.
Worked example
Ask for the people listed in Paris:
SELECT * FROM personnel WHERE city = 'Paris';
Each output cell becomes hover-aware: hovering name,
classification, or city lights up the matching cells in the
personnel panel of the sidebar.
Wrap-fallback notice
If a Where-mode query touches no provenance-tracked relation, Studio
silently drops the wrap and surfaces an INFO banner instead of
raising. This is the right default for case-study scripts that do
bulk setup before the first interesting SELECT.
Schema panel
A button in the top nav opens a searchable popover listing every
SELECT-able relation. Each gets one of two pills:
prov (purple) on a relation whose
provsqlcolumn is injected by the planner: provenance tracking is active.mapping (gold) on a relation shaped
(value <T>, provenance uuid), including views fromcreate_provenance_mapping_view. The two pills are mutually exclusive: a mapping view that also carries a planner- injectedprovsqlcolumn is classified as mapping (the more specific category).
On a tracked table, each column is a click target that prefills
SELECT create_provenance_mapping('<table>_<col>_mapping',
'<schema>.<table>', '<col>'); into the query box, so a fresh
mapping is two clicks away.
On any provenance-eligible plain table, + prov and
− prov action chips prefill SELECT add_provenance(...)
/ SELECT remove_provenance(...). They are hidden on views
(regular and materialised) and on foreign tables, since the
underlying ALTER TABLE ADD COLUMN does not work on those
relation kinds, and on mappings.
Configuration
The Config panel groups its options into three sections:
Provenance mirrors the user-level Grand Unified Configuration (GUC) parameters documented in Configuration Reference:
provsql.active,provsql.verbose_level, andprovsql.tool_search_path.provsql.where_provenanceandprovsql.update_provenancelive next to the query box instead (see Per-query toggles), since they are typically flipped per query rather than per session. Studio also forcesprovsql.aggtoken_text_as_uuid = onfor the whole session: clickableagg_tokencells in the result table need that.Session wraps the per-request session state: the
statement_timeoutapplied to every batch, and the visible part ofsearch_path(provsqlis always pinned at the end).Display limits holds the size knobs: Max circuit depth (the BFS depth cap on the initial fetch), Nodes per fetch (the per-fetch node cap), Sidebar rows per relation (per-relation row cap in the Where-mode sidebar), Result rows (row cap on the result table), and Probability decimals (number of decimals in the eval-strip’s probability display).
The Config panel: three section headings above the per-row options.
All values persist on disk in provsql-studio/config.json under
the platform’s user-config directory, and survive a Studio restart:
Linux:
$XDG_CONFIG_HOME/provsql-studio/(defaults to~/.config/provsql-studio/).macOS:
~/Library/Application Support/provsql-studio/.Windows:
%APPDATA%\provsql-studio\(typicallyC:\Users\<user>\AppData\Roaming\provsql-studio\).
Each option is also exposed on the CLI as a flag
(--statement-timeout, --max-sidebar-rows, --max-result-rows,
--max-circuit-nodes, --search-path, --tool-search-path);
the CLI wins on startup, the panel writes back to the JSON.
Mode-switching
The mode tabs in the top nav switch between Where and Circuit. A
switch carries the current SQL forward via sessionStorage; it
auto-replays only when the user just ran the query, so unrun drafts
and plain reloads never auto-execute (important for side-effecting
statements like add_provenance).
In Where mode, every result row gets a → Circuit button that switches to Circuit mode and pre-loads the circuit for that row’s provenance UUID, so a hover-and-trace exploration can cross over to the DAG without retyping the query.
Limitations
Per-fetch circuit cap, not per-scene:
--max-circuit-nodesbounds each fetch independently, so a scene assembled by repeated frontier expansions can exceed the cap. This is intentional: the cap exists to keep the browser responsive on initial render, not to prevent drilling deep where you want to.Verbose semiring outputs are unbounded:
formula,how,why,which, andPROV-XML exportcan return multi-megabyte strings on large circuits. Compute time is bounded bystatement_timeout; output size is not. Use Clear after a bulky run, or evaluate against a pinned subnode to scope the output.
Compatibility
Studio’s version stream is independent of the extension’s. Studio
is released on PyPI as provsql-studio from studio-vX.Y.Z
git tags; the extension is released on PGXN as provsql from
vX.Y.Z git tags. Each Studio release lists a minimum required
extension version.
Studio |
ProvSQL extension |
Notes |
|---|---|---|
|
|
First public release. Requires |
When the installed extension predates this minimum, Studio’s startup
check prints the mismatch and exits. Pass --ignore-version to
override the check, for instance when running against a development
branch.