ProvSQL Studio
ProvSQL Studio (studio/, distributed on PyPI as
provsql-studio) is a Flask + vanilla-JavaScript web UI that
points a browser at a ProvSQL-enabled PostgreSQL database. This
chapter documents the layout for contributors; the user-facing
behaviour is in ProvSQL Studio, and the release plumbing
is in Studio Releases.
Studio’s version stream is independent of the extension’s. Each
Studio release lists a minimum extension version in the
compatibility table; the runtime
check that enforces it lives in
studio/provsql_studio/cli.py
(REQUIRED_PROVSQL_VERSION).
Module Layout
Path |
What it contains |
|---|---|
|
Holds |
|
Lets |
|
argparse front end + extension-version preflight check.
Resolves the connection target ( |
|
Flask app factory + all HTTP routes. Owns the
|
|
Everything that touches PostgreSQL: pool construction,
per-batch GUC application, |
|
Circuit fetch + DOT layout pipeline behind
|
|
The single page; |
|
Common shell + Where-mode logic: nav, schema panel, config panel, query box, history, result rendering, hover-to-trace. |
|
Circuit-mode logic: DAG layout, frontier expansion, node inspector, eval-strip dispatching. |
|
BEM-style stylesheet split into two disjoint prefix
namespaces. |
|
Design tokens (palette, typography). Edit this rather than
inlining colors into |
|
Self-hosted webfonts (EB Garamond, Fira Code, Jost), with OFL license files alongside. |
|
Pytest unit suite (one file per Flask blueprint area:
|
|
Developer-facing demo loaders ( |
|
Per-release notes consumed by |
HTTP API
The browser only talks to Studio itself, never directly to PostgreSQL. All data lookups go through the routes below; the front end is otherwise self-contained (no third-party CDN at runtime, fonts are bundled).
Path |
Method |
Purpose |
|---|---|---|
|
GET |
Static shell. Mode is URL-driven; the body class is set
server-side ( |
|
GET |
Static asset passthrough. |
|
GET / POST |
GET returns the active endpoint summary (user, host, db,
PG server version) used by the connection chip and the
polling status dot. POST swaps pools after probing the
new DSN with |
|
GET |
Lists databases the current role can access; powers the in-nav database switcher. |
|
GET |
Per-relation row dumps for the Where-mode sidebar
(capped at |
|
GET |
Schema-panel data: every selectable relation with its
PROV / MAPPING classification, columns, and click-target
hints ( |
|
POST |
Statement-timeout-bounded multi-statement runner. Splits
on |
|
POST |
Sends |
|
GET |
Returns the BFS-bounded subgraph rooted at |
|
POST |
Same as above but rooted at a frontier node, used by the
gold- |
|
GET |
Resolves a leaf gate to its (relation, primary-key, value)
triple via |
|
POST |
Click-to-edit probability on |
|
GET |
Mapping picker contents for the eval strip. |
|
GET |
User-defined semiring wrappers discovered in the schema. Drives the “Custom Semirings” optgroup in the eval strip. |
|
POST |
Compiled-semiring evaluation against a pinned token. The
handler in |
|
GET / POST |
Read / write the persisted Config-panel state (panel
options + GUC overrides). POST validates each field via
|
|
POST / DELETE / GET |
Notebook-kernel lifecycle: create a pinned session (429 at
the |
|
POST |
Run one notebook cell on its kernel: |
|
GET |
The bundled example notebooks (generated from the user
guide): list with titles, and the raw |
|
POST |
Create a database and best-effort install the provsql extension in it; backs the binding banner’s Create action. |
|
POST |
Drop every user schema in the connected database, reinstall provsql, and bounce the pool – the nav-bar “empty database” broom. |
The write endpoints that mutate the database (/api/set_prob,
/api/exec, /api/nb/exec, /api/databases,
/api/database/empty) trust the connecting role for
authorization: Studio does not enforce a read-only PostgreSQL
role itself. Connect with the privileges your workflow expects.
Frontend
The single index.html is rendered identically for every mode;
app.js reads document.body.classList to discover the active
mode and lazy-loads circuit.js only when the body carries
mode-circuit and notebook.js (plus the vendored
marked + DOMPurify Markdown renderer under static/vendor/) only
under mode-notebook.
A small window.__provsqlStudio shared-state object exposes the
current mode and a few utilities (escapeHtml / escapeAttr /
formatCell) so circuit.js and the inline runQuery form
handler can talk to app.js without globals.
Mode switching uses anchor-tag navigations (<a href="/circuit">)
rather than history.pushState: each mode is its own
server-rendered page so the JS bootstraps from a clean slate. The
query carries forward via sessionStorage; auto-replay only
fires when the SQL was actually run (an unrun draft survives the
switch but does not auto-execute, which matters for
side-effecting statements like add_provenance).
Per-Batch GUC Application
Every /api/exec request runs in a single transaction. Before
the user’s SQL, exec_batch issues a SET LOCAL for each
GUC the panel and per-query toggles imply:
provsql.active(panel)provsql.verbose_level(panel, 0..100)provsql.tool_search_path(panel)provsql.monte_carlo_seed(panel;-1= non-deterministic, any other integer reseeds the sharedstd::mt19937_64used by Monte Carlo and continuous sampling)provsql.rv_mc_samples(panel; sample budget for the analytical-evaluator MC fallback,0turns the fallback into an exception)provsql.simplify_on_load(panel; toggles the universal peephole pass at circuit-load time – changing it invalidates Studio’s layout cache because circuit shapes can shift)provsql.hybrid_evaluation(panel, debug-only; gates the in-evaluator hybrid path. Same cache-invalidation rule assimplify_on_load)provsql.provenance = 'where'(per-query toggle; locked on in Where mode)provsql.update_provenance(per-query toggle, free in both modes)provsql.aggtoken_text_as_uuid(alwayson: clickableagg_tokencells need the underlying UUID exposed in text representation)statement_timeout(panel, in milliseconds)search_path, withprovsqlalways pinned at the end (seeprovsql_studio.db.compose_search_path())
The four continuous-distribution panel GUCs
(monte_carlo_seed, rv_mc_samples, simplify_on_load,
hybrid_evaluation) are enumerated in _PANEL_GUCS in
studio/provsql_studio/db.py; simplify_on_load and
hybrid_evaluation additionally clear layout_cache in
POST /api/config so the next /api/circuit re-renders a
circuit whose folded shape may have changed.
SET LOCAL scopes the change to the transaction so a parallel
request on the same connection cannot see the override.
Notebook Kernels
Notebook mode replaces the per-request pool checkout with a pinned
connection per kernel. db.open_kernel_connection opens it with
the same session defaults as the pool (configure_connection) and
flips it to autocommit: nothing dangles between cells, while
anything a cell commits – and any session-scoped object: temporary
tables, plain SET s, prepared statements – persists for the
kernel’s lifetime. That is the Jupyter state model with the database
session in the interpreter’s role.
db.exec_kernel_cell runs one cell inside conn.transaction(),
reusing exec_batch_on verbatim (SET LOCAL prelude including
the per-cell scheme override, classifier savepoint, where-wrap, COPY
blocks, halt-on-first-error), so cell semantics match the query box
exactly; a failed cell rolls back cleanly while earlier committed
cells persist. At the transaction boundary it distinguishes a dead
kernel (wedged mid-protocol, broken socket – detected via
conn.closed / conn.broken / pgconn.transaction_status)
from a legitimate COMMIT-time failure: only the former closes the
connection and tells the front-end to discard the kernel.
The registry in app.extensions["provsql_kernels"] maps session
ids to entries {conn, lock, dsn, db, pid, last_used}. Each entry
has its own lock, so cells on one kernel serialize (a concurrent
/api/nb/exec gets a 409 kernel_busy) while different kernels
run concurrently. Kernels idle past KERNEL_IDLE_TIMEOUT are
garbage-collected lazily on the next kernel-touching request; a
connection switch drops them all (they are pinned to the old DSN).
MAX_KERNELS caps the registry (429 on create); the front-end
closes its kernel on pagehide via navigator.sendBeacon to the
POST twin /api/nb/session/<id>/close.
The .ipynb mapping is nbformat v4: SQL cells are code cells
whose outputs carry the block list under
application/vnd.provsql.blocks+json plus a text/html snapshot
for external viewers; circuit cells are code cells with a
SQL-comment source, the scene JSON under
application/vnd.provsql.scene+json and a self-contained
image/svg+xml snapshot; evaluation cells record their invocation
in metadata.provsql and the result under
application/vnd.provsql.eval+json plus text/plain. Studio
re-renders from the vnd.provsql payloads and never consumes the
viewer fallbacks; GitHub / nbviewer do the opposite. The notebook’s
database binding is metadata.provsql.database (name only).
In the Playground, kernels map onto the single shared PGlite session;
see ProvSQL Playground (browser build) for the single-session caveats (restart as
DISCARD ALL, the beacon bridge, examples in the Pyodide FS).
Config Persistence
Panel options and GUC overrides persist as a JSON file under the
platform’s user-config directory (platformdirs.user_config_path):
Linux $XDG_CONFIG_HOME/provsql-studio/config.json, macOS
~/Library/Application Support/provsql-studio/, Windows
%APPDATA%\provsql-studio\. The path is overridable via
$PROVSQL_STUDIO_CONFIG_DIR (the unit and e2e tests use this
to keep their state out of the developer’s real config).
Validation lives in db.validate_panel_option and is called
both from POST /api/config (rejection comes back as an inline
error) and from the app factory (rejection at startup logs and
falls back to defaults). The helper rejects forbidden characters
(;, --, /*, */) up front to keep an obvious
SQL-injection vector out of the per-batch SET line.
Circuit Fetch and Frontier Expansion
A circuit fetch issues circuit_subgraph server-side,
which performs a BFS rooted at a token and returns at most
--max-circuit-nodes rows. circuit.py then builds a DOT
source from the rows and runs dot -Tjson to obtain layout
coordinates; the front end consumes the JSON directly.
Nodes whose children were not fetched in the current request
(their gate has more children than the BFS layer admitted) are
flagged as frontier nodes and rendered with a gold +
badge. Clicking the badge calls /api/circuit/<token>/expand
rooted at the frontier node, the response gets merged into the
existing scene, and app.js repaints. The cap is per-fetch,
not per-scene, so a circuit grows interactively as you expand
the frontiers that interest you.
Test Harness
studio/tests/conftest.py provides a session-scoped
test_dsn fixture that creates a unique database, installs
the extension, and runs test/sql/setup.sql plus
test/sql/add_provenance.sql (the same fixture the extension
regression suite uses). The teardown drops the database.
Override with the $PROVSQL_STUDIO_TEST_DSN environment
variable to reuse an existing database (CI uses this). In that
case the harness only checks that the extension is installed and
the personnel table is present, leaving setup to the caller.
The unit suite exercises the Flask layer through
app.test_client() (no live HTTP). The Playwright e2e suite
under tests/e2e/ spawns the Studio CLI in a subprocess
against the same test_dsn (with PROVSQL_STUDIO_CONFIG_DIR
pointed at a tempdir so a developer’s persisted UI settings can
not override the --search-path provsql_test we pass at
startup) and drives the live UI through a chromium instance.
Both suites live under the same pytest invocation: make
studio-test from the repo root runs ruff check first, then
pytest tests (which walks both directories). For
finer-grained runs, pytest tests --ignore=tests/e2e skips the
chromium tax.