ProvSQL Logo
  • ProvSQL’s Documentation

User Guide

  • Introduction
    • What Is Provenance?
    • What Can ProvSQL Do?
    • How It Works
    • Where to Start
  • Getting ProvSQL
    • Installing from Source
      • Getting the Source
      • Prerequisites
      • Installation
      • Upgrading an Existing Installation
      • Testing Your Installation
      • Uninstalling
    • Via PGXN
    • Docker Container
    • ProvSQL Studio
    • License
  • Tutorial: Who Killed Daphine?
    • The Scenario
    • Setup
    • Step 1: Explore the Database
    • Step 2: Build a Sightings Table
    • Step 3: Enable Provenance
    • Step 4: Find Contradictions
    • Step 5: Display Provenance Formulas
    • Step 6: Build a Consistent Sightings Table
    • Step 7: Identify Suspects
    • Step 8: Count Confirming Sightings
    • Step 9: Assign Reliability Probabilities
    • Step 10: Find the Murderer
  • Adding Provenance to a Table
    • Loading the Extension
    • Disabling Provenance Temporarily
    • Enabling Provenance on a Table
    • Accessing the Provenance Token
    • Removing Provenance
    • Provenance Mappings
    • ProvSQL Studio
    • Inspecting the Circuit
  • Querying with Provenance
    • How It Works
    • Supported SQL Features
    • Unsupported SQL Features
    • Provenance in Nested Queries
    • CREATE TABLE … AS SELECT
    • INSERT … SELECT
    • The provenance() Function
  • Aggregation and Grouping
    • GROUP BY Queries
    • SELECT DISTINCT
    • Aggregate Functions
      • Arithmetic on Aggregate Results
    • Random-Variable Aggregates
    • HAVING
    • The choose Aggregate
    • Grouping Sets
  • Semiring Evaluation
    • Boolean Semiring
    • Boolean-Expression Semiring
    • Symbolic Representation (as a Formula)
    • Counting Semiring (m-semiring)
    • Why-Provenance
    • How-Provenance
    • Which-Provenance (Lineage)
    • Tropical Semiring (m-semiring)
    • Viterbi Semiring (m-semiring)
    • Łukasiewicz Fuzzy Semiring (m-semiring)
    • Temporal (Interval-Union) Semiring (m-semiring)
    • Numeric and Integer Interval-Union Semirings (m-semirings)
    • Min-Max and Max-Min Semirings (m-semirings)
    • Custom Semirings with provenance_evaluate
    • Provenance Mappings
    • Compatibility with Boolean-Provenance Rewriting
  • Probabilities
    • Setting Probabilities
    • Computing Query Probabilities
      • Computation Methods
    • Expected Values of Aggregates
    • HAVING with Probabilities
    • Independent Tuples and Block-Independent Databases
    • Boolean-Provenance Optimisations
      • Safe-query rewriting (provsql.boolean_provenance)
      • HAVING-COUNT closed-form shortcut
    • Continuous Random Variables
  • Continuous Distributions
    • Introduction
    • Distribution Constructors
    • Arithmetic on Random Variables
    • Probabilistic Queries
    • Configuration of the Monte Carlo Sampler
    • Closed-Form Evaluation
    • Moments and Support
    • Conditional Inference
    • Sampling and Histograms
    • Mixtures and Categorical Random Variables
    • Aggregation Over Random Variables
    • Studio Integration
    • Out of Scope / Open Follow-ups
  • Shapley and Banzhaf Values
    • Background
    • Computing Shapley Values
    • Computing Banzhaf Values
    • Computation Notes
    • Choosing Between Shapley and Banzhaf
  • Where-Provenance
    • Enabling Where-Provenance
    • Using Where-Provenance
    • Projection Gates
    • Example
    • Limitations
  • Data Modification Tracking
    • Enabling Update Provenance
    • INSERT
    • DELETE
    • UPDATE
    • Undoing Updates
    • Limitations
  • Temporal Features
    • Overview
    • Temporal Tables
    • Valid-Time Queries
    • Union of Validity Intervals
    • Temporal Query Functions
    • Relationship to Data Modification Tracking
  • Provenance Export
    • Symbolic Representation
    • PROV-XML Export
    • Circuit Visualisation
    • Verbosity
    • Subcircuit Introspection
    • Circuit Structure
  • ProvSQL Studio
    • Installation
    • Connecting
      • In-page connection editor
      • Database switcher
      • Search path
    • Query box
      • Per-query toggles
    • Circuit mode
      • Worked example
      • Frontier expansion
      • Inspector panel
      • Semiring evaluation strip
      • Oversized circuits
      • Distribution profile panel
      • Conditioning and the row-prov auto-preset
      • Simplified-circuit rendering
    • Where mode
      • Worked example
      • Wrap-fallback notice
    • Schema panel
    • Configuration
    • Mode-switching
    • Limitations
    • Compatibility
  • Configuration Reference

Case Studies

  • Case Study: The Intelligence Agency
    • The Scenario
    • Setup
    • Step 1: Explore the Database
    • Step 2: Enable Provenance and Create a Name Mapping
    • Step 3: Cities Shared by Multiple Agents
    • Step 4: Minimum Security Clearance (sr_minmax)
    • Step 5: Cities with Exactly One Agent (EXCEPT / Monus)
    • Step 6: Where-Provenance
    • Step 7: Assign Probabilities
    • Step 8: Probability – Exact
    • Step 9: Probability – Monte Carlo
    • Step 10: Probability – Knowledge Compiler
    • Step 11: Visualise a Provenance Circuit
    • Step 12: Export to XML
    • Step 13: Large Circuit Benchmark
    • Step 14: The Boolean Expression Behind a Token
    • Step 15: Programmatic Circuit Inspection
  • Case Study: The Open Science Database
    • The Scenario
    • Setup
    • Step 1: Explore the Database
    • Step 2: Enable Provenance and Join with Lookup Tables
    • Step 3: Create a Provenance Mapping
    • Step 4: Identify Single-Source Claims
    • Step 5: Why-Provenance for Replicated Findings
    • Step 6: Evidence Grade Semiring
    • Step 7: Where-Provenance
    • Step 8: Where-Provenance on the Base Table
    • Step 9: Assign Probabilities
    • Step 10: Build the Replication View
    • Step 11: Inspect Replication with sr_counting
    • Step 12: Probability of Replication
    • Step 13: Shapley Values
    • Step 14: Banzhaf Values
    • Step 15: Bulk Shapley/Banzhaf with shapley_all_vars and banzhaf_all_vars
    • Step 16: Arithmetic on Aggregate Results
  • Case Study: Île-de-France Public Transit
    • The Scenario
    • Setup
    • Step 1: Explore the Database
    • Step 2: Provenance and Wheelchair Mapping
    • Step 3: Reachable Stops from Bagneux
    • Step 4: Boolean Provenance – Full Wheelchair Accessibility
    • Step 5: Inspect Individual Results with sr_formula
  • Case Study: Government Ministers Over Time
    • The Scenario
    • Setup
    • Step 1: Explore the Database
    • Step 2: Union of Temporal Intervals
    • Step 3: Timeslice – Who Was in Government During Macron’s First Term?
    • Step 4: History – All Holders of the Minister of Justice Role
    • Step 5: Timetravel – The Socialist Government of July 1981
    • Step 6: Data Modification – Replace the Prime Minister
    • Step 7: Undo – Reinstate the Original Prime Minister
  • Case Study: The Wildlife Photo Archive
    • The Scenario
    • Setup
    • Step 1: Explore the Database
    • Step 2: Enable Provenance and Create a Name Mapping
    • Step 3: Inline Lookup with VALUES
    • Step 4: Conjunctive Query (Naive)
    • Step 5: Mutually Exclusive Candidates with repair_key
    • Step 6: Assign Probabilities and Verify Mutual Exclusion
    • Step 7: Probabilistic Ranking vs. Threshold Filtering
    • Step 8: Absence Constraint with EXCEPT
    • Step 9: Multi-Condition Query via a CTE
    • Step 10: Expected Species Counts with expected
  • Case Study: The City Air-Quality Sensor Network
    • The Scenario
    • Setup
    • Step 1: Inspect a Noisy Reading
    • Step 2: A First Probabilistic Threshold
    • Step 3: The Simplifier in Action
    • Step 4: Calibration via Mixtures
    • Step 5: Aggregation Over Random Variables
    • Step 6: Conditional Inference
    • Step 7: Diagnostic Sampling
    • Step 8: Combining Batches via UNION
    • Step 9: Filtering Grouped Random Variables by Expected Value
    • Step 10: Independent vs Monte Carlo

Developer Guide

  • Developer Guide
    • Chapters
  • PostgreSQL Extension Primer
    • Shared Libraries and shared_preload_libraries
    • Hooks
    • The Query Node Tree
    • Background Workers
    • Memory Management: palloc and Memory Contexts
    • Error Reporting: ereport / elog
    • SQL-Callable C Functions
    • OIDs and Catalog Lookups
    • GUCs (Configuration Parameters)
  • Architecture Overview
    • Extension Lifecycle
    • Component Map
    • Data Flow
    • The OID Cache: constants_t
    • Gate Types
  • Query Rewriting Pipeline
    • Entry Point: provsql_planner
    • Query-Time TID / BID Classifier
    • process_query: The Main Rewriting Function
      • Step 0: Early Exit
      • Step 1: CTE Inlining
      • Step 2: Strip Existing Provenance Columns
      • Step 3: Set-Operation Handling
      • Step 4: Aggregate DISTINCT Rewrite
      • Step 5: Discovery – get_provenance_attributes
      • Step 6: Unsupported Feature Checks
      • Step 7: Build Column Map
      • Step 8: Aggregation Rewriting
      • Step 9: Expression Building – make_provenance_expression
      • Step 10: Splicing – add_to_select
      • Step 11: Replace provenance() Calls
    • Rewriting Rules and Formal Semantics
      • Formal Verification
    • Probabilistic-Qual Classifier
    • Safe-Query Rewriter
      • FD-Aware Extensions
      • TID / BID Propagation Through Derived Relations
  • Memory Management
    • Why Not Regular Tables?
    • Background Worker: provsql_mmap
    • Inter-Process Communication
    • Shared Memory: provsql_shmem
    • Mmap-Backed Data Structures
      • Gate-Type ABI
      • Per-Table Provenance Metadata
    • Per-Backend Circuit Cache
    • Reading Circuits Back
  • Where-Provenance
    • Why a Separate Circuit Class?
    • The WhereCircuit Data Model
    • Evaluation: Locators
    • Building the Circuit During Query Rewriting
    • Sub-Circuit Materialization
    • Why-Provenance vs Where-Provenance
  • Data-Modification Tracking
    • The GUC and the Big Picture
    • The gate_update Gate Type
    • The Triggers
    • The update_provenance Table
    • Undo
    • Time-Travel Queries
  • Aggregation Provenance
    • The Semimodule Picture
      • Row-level provenance and the δ operator
    • The agg_token Type
    • What the Rewriter Builds
    • Currently Supported Aggregates
    • Random-Variable Aggregates
    • Step-by-Step: Adding a New Aggregate
  • Semiring Evaluation
    • The Semiring<V> Interface
      • Required Methods (Pure Virtual)
      • Optional Methods
      • Absorptive Semirings
      • Boolean-Rewrite Compatibility
      • Boolean-Identity Folding
    • Example: The Boolean Semiring
    • Example: The Counting Semiring
    • Step-by-Step: Adding a New Semiring
    • Evaluation Dispatch
    • Symbolic Representation Semirings
  • Probability Evaluation
    • Architecture
    • Background: d-DNNF, Tseytin, Knowledge Compilation
      • d-DNNF
      • Tseytin Encoding
      • Knowledge Compilers and the NNF Format
      • WeightMC: Approximate Weighted Model Counting
      • Tree Decomposition
      • Currently Supported Methods
      • Cmp-Probability Pre-Passes
    • Block-Independent Databases and Multivalued Inputs
      • The gate_mulinput Gate
      • Rewriting Blocks into Independent Booleans
    • Shapley and Banzhaf Values
      • Entry Point
      • The Shapley Recurrence
      • Banzhaf
    • Hybrid Evaluation for Continuous Distributions
    • Conditional Evaluation
    • Step-by-Step: Adding a New Probability Evaluation Method
  • Continuous Distributions
    • Gate Types
    • SQL Surface
    • Planner-Hook Rewriting
    • Monte Carlo Sampler
    • RangeCheck
    • AnalyticEvaluator
    • Analytical Moment Evaluator
    • HybridEvaluator
    • Conditional Evaluation
    • Aggregate Dispatch
    • Studio Rendering
  • Coding Conventions
    • Languages and File Layout
    • Naming
    • Error Reporting
    • Memory Management
    • Doxygen Comments
    • Wiring a New SQL-Callable C Function
    • Adding a New Test
    • Pitfalls to Avoid
  • Testing
    • Test Layout
    • The Schedule File
    • Running Tests
    • Writing a New Test
    • Optional-Tool Skip Pattern
    • Reading Test Failures
  • Debugging
    • Debug Builds
    • Verbose Logging
    • Inspecting Circuits
    • Visualizing Circuits
    • Disabling Provenance
    • GDB Tips
  • Build System
    • Makefile Structure
    • PostgreSQL Version Compatibility
    • Generated SQL
    • Extension Upgrades
      • Writing an Upgrade Script
      • The On-Disk mmap ABI
      • Workflow During Development
      • Automated Testing of the Upgrade Path
        • The Auto-Generated Dev-Cycle Upgrade Script
      • Manual Testing
    • Standalone Tools
    • Documentation Build
    • Releases
    • Studio Releases
    • Website and Deployment
    • CI Workflows
  • ProvSQL Studio
    • Module Layout
    • HTTP API
    • Frontend
    • Per-Batch GUC Application
    • Config Persistence
    • Circuit Fetch and Frontier Expansion
    • Test Harness

API Reference

  • SQL API Reference
  • C/C++ API Reference

References

  • Bibliography
ProvSQL
  • Search


Home · Documentation · Publications · Contributors · Source · Cite