Papers about ProvSQL

These papers describe ProvSQL itself or the theoretical foundations on which it is built, including its data model, query rewriting architecture, knowledge compilation approach, and the semiring provenance theory that motivates its design.

2026

  • Sen, A., Maniu, S., and Senellart, P. 2026. ProvSQL: A General System for Keeping Track of the Provenance and Probability of Data. Proc. IEEE 42nd International Conference on Data Engineering (ICDE), Montréal, Canada, May 2026. PDF BibTeX

    The primary reference for the ProvSQL system. Presents the complete data model (multiset-based relational algebra with semiring provenance), the circuit-based provenance representation stored in memory-mapped files, and the full query evaluation architecture. Includes TPC-H-inspired benchmarks comparing ProvSQL to GProM and MayBMS, demonstrating competitive performance and broader SQL coverage.

2025

  • Widiaatmaja, A.A., Djeffal, B., Dandekar, A., and Senellart, P. 2025. Demonstration of ProvSQL Update Provenance through Temporal Databases. Proceedings of the Provenance Week 2025, PW’25, Berlin, Germany, June 22-27, 2025, ACM, 71–76. PDF Poster DOI BibTeX

    Extends ProvSQL to track provenance for data modification operations (INSERT, UPDATE, DELETE) by introducing monus gates for deleted tuples and times gates for inserted ones. Demonstrates the utility of update provenance by implementing temporal database features—time travel, history tracking, and undo—using the union-of-intervals m-semiring.

  • Yunus, F., Karmakar, P., Senellart, P., Abdessalem, T., and Bressan, S. 2025. Using A Probabilistic Database in an Image Retrieval Application. Proceedings 28th International Conference on Extending Database Technology, EDBT 2025, Barcelona, Spain, March 25-28, 2025, OpenProceedings.org, 1106–1109. PDF Video DOI BibTeX

    Demonstrates ProvSQL as the probabilistic database backend for an image retrieval application. Images are indexed by uncertain feature vectors; queries return ranked results with tuple-independent probabilities computed via ProvSQL’s knowledge compilation pipeline. Shows that ProvSQL’s provenance-based probability management integrates naturally with a real-world multimedia retrieval use case.

2024

  • Senellart, P. 2024. On the Impact of Provenance Semiring Theory on the Design of a Provenance-Aware Database System. The Provenance of Elegance in Computation - Essays Dedicated to Val Tannen, Tannen’s Festschrift, University of Pennsylvania, Philadelphia, PA, USA, May 24-25, 2024, Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 9:1–9:10. PDF Slides DOI BibTeX

    A reflective essay on how the provenance semiring framework of Green, Karvounarakis, and Tannen directly shaped ProvSQL’s design. Discusses where theory translates cleanly into implementation (e.g., the universality of the polynomial semiring motivating UUID-based annotations), where SQL’s multiset semantics and aggregation required deviations, and where ProvSQL’s development still lags behind theory.

  • Karmakar, P., Monet, M., Senellart, P., and Bressan, S. 2024. Expected Shapley-Like Scores of Boolean Functions: Complexity and Applications to Probabilistic Databases. Proc. ACM Manag. Data 2, 2, 92. PDF Slides Poster DOI BibTeX

    Studies the complexity of expected Shapley and Banzhaf values for Boolean functions in probabilistic database settings, showing that their computation is interreducible in polynomial time with probabilistic query evaluation. Designs a polynomial-time algorithm for Boolean functions represented as d-DNNF circuits. Implements and experimentally validates this algorithm within the ProvSQL system, enabling Shapley value computation directly from provenance circuits.

2020

  • Amarilli, A., Capelli, F., Monet, M., and Senellart, P. 2020. Connecting Knowledge Compilation Classes and Width Parameters. Theory Comput. Syst. 64, 5, 861–914. PDF DOI BibTeX

    Establishes formal connections between knowledge compilation classes (OBDDs, d-DNNFs, SDDs, ...) and graph width parameters (treewidth, pathwidth, cliquewidth). Section 5.1 of this paper provides the algorithm directly implemented in ProvSQL’s internal tree-decomposition-based knowledge compiler (tdkc), which converts a provenance circuit into a d-DNNF for tractable probability evaluation.

2019

  • Senellart, P. 2019. Provenance in Databases: Principles and Applications. Reasoning Web. Explainable Artificial Intelligence - 15th International Summer School 2019, Bolzano, Italy, September 20-24, 2019, Tutorial Lectures, Springer, 104–109. PDF Slides DOI BibTeX

    A concise tutorial on database provenance covering Boolean provenance, semiring provenance, and key applications including probabilistic databases, view maintenance, and query explanation. Also surveys provenance beyond the relational setting (XML, graph, triple-store databases). Serves as an accessible introduction to the theoretical foundations underlying ProvSQL.

2018

  • Senellart, P., Jachiet, L., Maniu, S., and Ramusat, Y. 2018. ProvSQL: Provenance and Probability Management in PostgreSQL. Proc. VLDB Endow. 11, 12, 2034–2037. PDF Poster Video DOI BibTeX

    The original demonstration paper introducing ProvSQL. Describes the core design: transparent query rewriting via a PostgreSQL planner hook, the provenance term algebra circuit as a uniform representation for semiring provenance, where-provenance, and m-semiring provenance, and knowledge compilation to d-DNNF for probabilistic query evaluation.

2017

  • Senellart, P. 2017. Provenance and Probabilities in Relational Databases. SIGMOD Rec. 46, 4, 5–15. PDF DOI BibTeX

    An accessible overview of provenance and probabilistic query evaluation in relational databases, covering the semiring framework, knowledge compilation to d-DNNF, and the connection to probabilistic databases. Serves as a concise introduction to the theoretical and practical ideas behind ProvSQL.

Foundational Works

These papers establish the theoretical framework on which ProvSQL is built: why- and where-provenance, provenance semirings, aggregate provenance, m-semirings for non-monotone queries, the connection between provenance and probabilistic databases, circuit-based provenance representations, and algebraic provenance for update queries.

2020

  • Bourhis, P., Deutch, D., and Moskovitch, Y. 2020. Equivalence-Invariant Algebraic Provenance for Hyperplane Update Queries. Proceedings of the 2020 International Conference on Management of Data, SIGMOD Conference 2020, online conference [Portland, OR, USA], June 14-19, 2020, ACM, 415–429. DOI BibTeX

    Introduces an algebraic provenance framework for hyperplane update queries (INSERT, DELETE, UPDATE via linear constraints), defining a semiring that tracks provenance through data-modification operations in an equivalence-invariant manner. This work provides the theoretical foundation for ProvSQL’s update provenance support.

2014

  • Deutch, D., Milo, T., Roy, S., and Tannen, V. 2014. Circuits for Datalog Provenance. Proc. 17th International Conference on Database Theory (ICDT), Athens, Greece, March 24-28, 2014, OpenProceedings.org, 201–212. DOI BibTeX

    Shows how to represent provenance of Datalog queries compactly as arithmetic circuits rather than as (potentially exponential) polynomials or sets. Establishes circuit-based provenance as a practical representation for recursive queries, with efficient evaluation algorithms. ProvSQL’s circuit-based provenance representation for all gate types is directly inspired by this work.

2011

  • Amsterdamer, Y., Deutch, D., and Tannen, V. 2011. Provenance for aggregate queries. Proceedings of the 30th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2011, June 12-16, 2011, Athens, Greece, ACM, 153–164. DOI BibTeX

    Extends semiring provenance to SQL aggregate queries (COUNT, SUM, AVG, etc.), which fall outside the positive relational algebra covered by Green et al. Introduces a semimodule structure over semirings to handle aggregation provenance. ProvSQL’s support for provenance of GROUP BY queries and aggregation gates is grounded in this work.

2010

  • Geerts, F. and Poggi, A. 2010. On database query languages for K-relations. J. Appl. Log. 8, 2, 173–185. DOI BibTeX

    Introduces m-semirings (semirings equipped with a monus operator ⊖) to extend the provenance semiring framework to the full relational algebra including set difference and negation. Defines the corresponding notion of K-relations for m-semirings. ProvSQL’s handling of EXCEPT queries and its monus gate type are direct implementations of this framework.

2007

  • Green, T.J., Karvounarakis, G., and Tannen, V. 2007. Provenance semirings. Proceedings of the Twenty-Sixth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 11-13, 2007, Beijing, China, ACM, 31–40. DOI BibTeX

    The foundational paper introducing provenance semirings. Defines annotated relations over commutative semirings and shows that the standard relational algebra operators correspond to semiring operations, unifying many existing provenance formalisms (why-provenance, lineage, trio, bag semantics) as instances of a single algebraic framework. ProvSQL’s core data model is a direct implementation of this theory.

2006

  • Green, T.J. and Tannen, V. 2006. Models for Incomplete and Probabilistic Information. Current Trends in Database Technology - EDBT 2006, EDBT 2006 Workshops, Munich, Germany, March 26-31, 2006, Revised Selected Papers, Springer, 278–296. DOI BibTeX

    Proposes a unified model for incomplete and probabilistic databases based on c-tables annotated with provenance expressions over a Boolean semiring. Shows how probabilistic query evaluation reduces to evaluating the provenance expression of a result tuple over a probability distribution. This connection between provenance and probabilistic databases is the theoretical basis for ProvSQL’s probability computation features.

2001

  • Buneman, P., Khanna, S., and Tan, W.C. 2001. Why and Where: A Characterization of Data Provenance. Database Theory - ICDT 2001, 8th International Conference, London, UK, January 4-6, 2001, Proceedings, Springer, 316–330. DOI BibTeX

    The paper that introduced the concepts of why-provenance and where-provenance. Why-provenance tracks which source tuples contribute to a result; where-provenance tracks which specific attribute values (locations in the source) a result value was copied from. ProvSQL implements where-provenance via project and eq gates that record the column-level origin of each output value.


Download full bibliography (.bib)