ProvSQL C/C++ API
Adding support for provenance and uncertainty management to PostgreSQL databases
WhereCircuit.h
/**
 * @file WhereCircuit.h
 * @brief Where-provenance circuit tracking column-level data origin.
 *
 * Where-provenance records not just which base tuples contributed to a
 * query result, but also *which attribute values* in those base tuples
 * gave rise to each output column.
 *
 * @c WhereCircuit extends @c Circuit<WhereGate> with per-gate metadata
 * that describes:
 * - **Input gates** – the source table, row UUID, and number of columns.
 * - **Projection gates** – the list of attribute positions being projected.
 * - **Equality gates** – the pair of attribute positions being joined.
 *
 * The @c evaluate() method traverses the circuit and returns, for each
 * output position, a set of @c Locator values identifying the base
 * (table, tuple, column) triples that the value was copied from.
 */
#ifndef WHERE_CIRCUIT_H
#define WHERE_CIRCUIT_H

#include <set>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

#include "Circuit.hpp"

/**
 * @brief Gate types for a where-provenance circuit.
 *
 * - @c UNDETERMINED  Placeholder; should not appear in a complete circuit.
 * - @c TIMES         Product (conjunction) of child where-provenance sets.
 * - @c PLUS          Sum (disjunction) of child where-provenance sets.
 * - @c EQ            Equijoin gate recording the joined attribute pair.
 * - @c PROJECT       Projection gate recording which attributes are kept.
 * - @c IN            Input gate for a single base-relation tuple.
 */
enum class WhereGate {
  UNDETERMINED, ///< Placeholder; should not appear in a complete circuit
  TIMES,        ///< Product (conjunction) of child where-provenance sets
  PLUS,         ///< Sum (disjunction) of child where-provenance sets
  EQ,           ///< Equijoin gate recording the joined attribute pair
  PROJECT,      ///< Projection gate recording which attributes are kept
  IN            ///< Input gate for a single base-relation tuple
};

/**
 * @brief Circuit encoding where-provenance (column-level data origin).
 *
 * Stores extra per-gate metadata that relates each gate to its position
 * in the original query plan (table name, row UUID, column index).
 */
class WhereCircuit : public Circuit<WhereGate> {
 private:
  std::unordered_map<gate_t, uuid> input_token; ///< UUID of the source tuple for each IN gate
  std::unordered_map<gate_t, std::pair<std::string,int>> input_info; ///< (table name, nb_columns) for each IN gate
  std::unordered_map<gate_t, std::vector<int>> projection_info; ///< Projected attribute positions for PROJECT gates
  std::unordered_map<gate_t, std::pair<int,int>> equality_info; ///< Joined attribute pair (pos1, pos2) for EQ gates

 public:
  /** @copydoc Circuit::setGate(const uuid&, gateType) */
  gate_t setGate(const uuid &u, WhereGate type) override;

  /**
   * @brief Create an input gate for a specific table row.
   *
   * @param u          UUID of the source tuple (row token).
   * @param table      Name of the source relation.
   * @param nb_columns Number of columns in the source tuple.
   * @return Gate identifier.
   */
  gate_t setGateInput(const uuid &u, std::string table, int nb_columns);

  /**
   * @brief Create a projection gate with column mapping.
   *
   * @param u     UUID to associate with this gate.
   * @param infos List of attribute positions surviving the projection.
   * @return Gate identifier.
   */
  gate_t setGateProjection(const uuid &u, std::vector<int> &&infos);

  /**
   * @brief Create an equality (equijoin) gate for two attribute positions.
   *
   * @param u    UUID to associate with this gate.
   * @param pos1 Left-side attribute position.
   * @param pos2 Right-side attribute position.
   * @return Gate identifier.
   */
  gate_t setGateEquality(const uuid &u, int pos1, int pos2);

  /** @copydoc Circuit::toString() */
  std::string toString(gate_t g) const override;

 private:
  /**
   * @brief Recursive helper for @c toString that propagates the parent
   *        gate type for parenthesis elision.
   *
   * The @p parent parameter drives the wrap decision: at the root
   * (parent set to @c UNDETERMINED) the outer parens are dropped, and
   * when @p parent matches the current gate type (associative
   * TIMES/PLUS) the wrap is dropped to flatten same-op nesting. A
   * 1-wire TIMES/PLUS also bypasses the wrap and delegates to its
   * single child.
   */
  std::string toStringHelper(gate_t g, WhereGate parent) const;

 public:
  /**
   * @brief Describes the origin of a single attribute value.
   *
   * A @c Locator identifies a specific attribute of a specific tuple in a
   * specific base table as the origin of an output value.
   */
  struct Locator {
    std::string table; ///< Name of the source relation
    uuid tid;          ///< UUID (row token) of the source tuple
    int position;      ///< Zero-based column index within the tuple

    /**
     * @brief Construct a @c Locator.
     * @param t Source table name.
     * @param u Source tuple UUID.
     * @param i Column position.
     */
    Locator(std::string t, uuid u, int i) : table(t), tid(u), position(i) {}
    /**
     * @brief Lexicographic ordering for use in @c std::set.
     * @param that Other @c Locator.
     * @return @c true if @c *this is less than @p that.
     */
    bool operator<(Locator that) const;
    /**
     * @brief Return a human-readable representation of this locator.
     * @return String of the form "table[tid].position".
     */
    std::string toString() const;
  };

  /**
   * @brief Evaluate the where-provenance circuit at gate @p g.
   *
   * Returns one set of @c Locator values per output column position.
   * Each @c Locator in the set identifies a base-relation cell that
   * contributed the value at that output position.
   *
   * @param g Root gate to evaluate.
   * @return Vector (indexed by output column) of sets of @c Locator values.
   */
  std::vector<std::set<Locator>> evaluate(gate_t g) const;
};

#endif /* WHERE_CIRCUIT_H */
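
The header only declares `evaluate()`, so the following is a minimal, self-contained sketch of how per-column `Locator` sets might be combined for each gate type. It does not use ProvSQL itself. The `Locator` stand-in, the `Columns` alias, and all `eval*` helpers are hypothetical names introduced for illustration. The gate semantics (concatenation for TIMES, position-wise union for PLUS, merging the two joined columns for EQ, selection for PROJECT) are assumptions inferred from the doc comments above, as is the zero-based position convention. The actual implementation may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-in for WhereCircuit::Locator (not the ProvSQL type).
struct Locator {
  std::string table;
  std::string tid;  // the uuid type is documented as a std::string alias
  int position;     // assumed zero-based, as in the Locator doc comment

  // Lexicographic order on (table, tid, position), so it fits in std::set.
  bool operator<(const Locator &that) const {
    return std::tie(table, tid, position) <
           std::tie(that.table, that.tid, that.position);
  }
};

// One set of origins per output column.
using Columns = std::vector<std::set<Locator>>;

// IN gate: each column of a base tuple originates from itself.
Columns evalInput(const std::string &table, const std::string &tid,
                  int nb_columns) {
  assert(nb_columns >= 0);
  Columns out;
  for (int i = 0; i < nb_columns; ++i)
    out.push_back(std::set<Locator>{Locator{table, tid, i}});
  return out;
}

// TIMES gate (assumed): columns of the right child follow those of the left.
Columns evalTimes(Columns left, const Columns &right) {
  left.insert(left.end(), right.begin(), right.end());
  return left;
}

// PLUS gate (assumed): position-wise union of the children's origin sets.
Columns evalPlus(Columns left, const Columns &right) {
  if (left.size() < right.size()) left.resize(right.size());
  for (std::size_t i = 0; i < right.size(); ++i)
    left[i].insert(right[i].begin(), right[i].end());
  return left;
}

// EQ gate (assumed): both joined columns share the union of their origins.
Columns evalEquality(Columns child, int pos1, int pos2) {
  std::set<Locator> merged = child.at(pos1);
  merged.insert(child.at(pos2).begin(), child.at(pos2).end());
  child.at(pos1) = merged;
  child.at(pos2) = merged;
  return child;
}

// PROJECT gate (assumed): keep the listed positions, in the listed order.
Columns evalProject(const Columns &child, const std::vector<int> &positions) {
  Columns out;
  for (int p : positions) out.push_back(child.at(p));
  return out;
}

// Models: SELECT r.a, r.b, s.c FROM r JOIN s ON r.b = s.b
// for one tuple "r1" of r(a, b) and one tuple "s1" of s(b, c).
Columns exampleJoin() {
  Columns joined = evalTimes(evalInput("r", "r1", 2), evalInput("s", "s1", 2));
  Columns matched = evalEquality(joined, 1, 2);  // r.b = s.b
  return evalProject(matched, {0, 1, 3});        // keep a, b, c
}
```

In this toy model, the middle output column of `exampleJoin()` carries two locators, one for `r.b` and one for `s.b`, because the equality gate makes either cell a valid origin of the joined value. The other two columns each trace back to a single base cell.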