ProvSQL C/C++ API
Adding support for provenance and uncertainty management to PostgreSQL databases
Loading...
Searching...
No Matches
StructuredDNNF.h
Go to the documentation of this file.
1/**
2 * @file StructuredDNNF.h
3 * @brief In-process structured-d-DNNF construction over a query-derived variable
4 * order, for the inversion-free UCQ(OBDD) probability path.
5 *
6 * Given a @c BooleanCircuit (monotone AND / OR / IN over independent inputs) and
7 * a total order on its input variables, compile the function into a ProvSQL
8 * @c dDNNF and read off its probability (inputs independent Bernoulli). For an
9 * inversion-free query the query-derived (Prop. 4.5) order yields a d-DNNF of
10 * size linear in the lineage, where the generic methods (tree-decomposition,
11 * d4) blow up.
12 *
13 * Pure C++ (BooleanCircuit + STL, no PostgreSQL headers) so it builds both in
14 * the extension and the standalone @c -DTDKC harness.
15 */
16#ifndef STRUCTURED_DNNF_H
17#define STRUCTURED_DNNF_H
18
19#include "BooleanCircuit.h"
20#include "dDNNF.h"
21
22#include <map>
23#include <string>
24#include <tuple>
25#include <unordered_map>
26#include <vector>
27#include <cstddef>
28
29/**
30 * @brief Top-down structured-d-DNNF builder over a query-derived variable order.
31 *
32 * This is the §4 construction of the inversion-free plan: rather than the
33 * bottom-up @c apply of @c StructuredDNNF (kept above as a reference oracle), it
34 * compiles the monotone lineage **top-down** into a genuine ProvSQL @c dDNNF
35 * with two node kinds, placed by structure:
36 *
37 * - **Decomposable AND** at independence points. When the residual function is
38 * a product of variable-disjoint sub-functions @f$\varphi_1\wedge\dots@f$
39 * (the consistent-unification self-join's A-part and B-part once the shared
40 * separator variables are fixed; or a pure product term), emit a flat @c AND
41 * and recurse per factor. Decomposability (children share no variable) holds
42 * by construction and is the structural win an OBDD cannot express.
43 * - **Deterministic OR** at Shannon decisions. Otherwise pick the lowest-rank
44 * variable still present and branch on it, emitting
45 * @c OR(AND(v,hi),AND(¬v,lo)). Determinism (children mutually exclusive)
46 * holds because both branches come from a decision on the same variable.
47 *
48 * A component cache (keyed on the canonical residual) shares equal
49 * sub-d-DNNFs, which is what keeps the result linear in the lineage under the
50 * Prop. 4.5 order (and bounds the work). The function is taken as a monotone
51 * DNF expanded from the @c BooleanCircuit (AND distributes, OR unions, IN is a
52 * single literal; @c NOT and multivalued inputs are rejected, out of scope for
53 * the TID inversion-free path).
54 *
55 * Output is a finished @c dDNNF (root set, @c simplify() applied) ready for
56 * @c dDNNF::probabilityEvaluation(). Pure C++ so it builds in the extension and
57 * the standalone @c -DTDKC harness.
58 */
60public:
61 /**
62 * @param bc Boolean circuit (monotone AND/OR/IN, no multivalued
63 * inputs, no NOT).
64 * @param root Root gate of the function to compile.
65 * @param input_rank Maps every IN gate reachable from @p root to a distinct
66 * rank in @c [0,ninputs). Lower rank = decided earlier
67 * (a Prop. 4.5-consistent order, derived by the caller).
68 * @param max_nodes Size guard: throw @c CircuitException once this many
69 * d-DNNF gates have been created (0 = no guard).
70 * @throws CircuitException on multivalued/NOT/unsupported gates, a reachable
71 * input without a rank, or the size guard tripping.
72 */
74 const std::map<gate_t, int> &input_rank,
75 std::size_t max_nodes = 0);
76
77 /**
78 * @brief Structured per-input order key carried by the planner's markers.
79 *
80 * Locates an input variable in the query hierarchy: @c root is the root-class
81 * value (one independent block per value), @c sec the secondary-class value
82 * (one tile per value within a block), and @c factor which quantified factor
83 * (clause) the variable's atom belongs to, or @c GUARD_FACTOR for a self-join
84 * atom shared by every factor of its tile. For the witness
85 * @c S(x,y),A(x,y),S(x,z),B(x,z): the @c S tuple is @c {a,v,GUARD}, @c A is
86 * @c {a,v,0}, @c B is @c {a,v,1}. The caller flattens these keys into a total
87 * rank (root, then sec, then guard-before-payload, then factor) for the
88 * order-only constructor above; the struct is the carrier between the marker
89 * collection and that flattening.
90 */
91 struct InputKey { std::string root; std::string sec; int factor; };
92 static constexpr int GUARD_FACTOR = -1;
93
94 /** @brief The constructed d-DNNF (root set, simplified). */
95 const dDNNF &dnnf() const { return dd_; }
96
97 /** @brief Root gate of the constructed d-DNNF (current after simplify). */
98 gate_t rootGate() const { return dd_.root; }
99
100 /** @brief Probability that the function is true (independent inputs). */
101 double probability() const { return dd_.probabilityEvaluation(); }
102
103 /** @brief d-DNNF gates reachable from the root. */
104 std::size_t size() const;
105
106private:
107 using Var = int; ///< Variable rank in @c [0,ninputs).
108 using Term = std::vector<Var>; ///< A product: sorted, duplicate-free vars.
109 using DNF = std::vector<Term>; ///< A sum of products: canonicalised.
110
113 std::size_t max_nodes_;
114
115 std::vector<double> prob_; ///< prob_[rank] = P(variable)
116 std::vector<std::string> uuid_; ///< rank -> source input UUID (for labels)
117 std::vector<gate_t> in_gate_; ///< rank -> shared dDNNF IN gate
118 std::vector<gate_t> not_gate_; ///< rank -> shared dDNNF NOT(IN) gate (lazy)
119 gate_t true_gate_, false_gate_; ///< empty AND / empty OR terminals
120
121 /* Component cache: (residual, false-sink) -> gate. The sink is part of the
122 * key because the same residual built against a different OR-chain tail (§
123 * @c build) yields a different node. Hashed so a lookup costs one O(|d|)
124 * fingerprint rather than O(log N) comparisons of large DNF keys. */
125 struct CacheKey { DNF d; gate_t fs; bool operator==(const CacheKey &o) const; };
126 struct CacheKeyHash { std::size_t operator()(const CacheKey &k) const; };
127 std::unordered_map<CacheKey, gate_t, CacheKeyHash> cache_;
128
129 /* Expand the circuit's function under @p root into a canonical monotone DNF,
130 * memoised per gate. */
131 DNF extract(const BooleanCircuit &bc, gate_t g,
132 const std::map<gate_t, int> &rank,
133 std::map<gate_t, DNF> &memo) const;
134
135 static DNF canonical(DNF d); ///< sort, dedup, drop supersets
136 static DNF condition(const DNF &d, Var v, bool value);
137
138 /* Split @p d into variable-disjoint OR-components (terms sharing a variable,
139 * transitively, stay together), ordered by least variable; @c {d} if the
140 * co-occurrence graph is connected. These are OR'd, so they are chained, not
141 * AND'd -- see @c build. */
142 std::vector<DNF> orDecompose(const DNF &d) const;
143
144 /* If @p d is a product of variable-disjoint sub-DNFs, return the factors
145 * (size > 1); otherwise return @c {d}. Sound: only splits when an exact
146 * cross-product is verified, so an AND node is always decomposable. */
147 std::vector<DNF> andDecompose(const DNF &d) const;
148
149 gate_t mkAnd(const std::vector<gate_t> &children);
150 gate_t mkLit(Var v); ///< shared positive literal IN
151 gate_t mkNeg(Var v); ///< shared negative literal NOT(IN)
152 gate_t newGate(BooleanGate type); ///< setGate + size guard
153
154 /* Compile @p d into a d-DNNF node whose FALSE leaf is @p false_sink (rather
155 * than the global FALSE terminal). Threading a sink lets an OR-chain build
156 * each disjoint component against the node for "the rest of the disjunction",
157 * so the inert components are not dragged through the residual DNF. */
158 gate_t build(const DNF &d, gate_t false_sink);
159
160};
161
162#endif /* STRUCTURED_DNNF_H */
Boolean provenance circuit with support for knowledge compilation.
BooleanGate
Gate types for a Boolean provenance circuit.
gate_t
Strongly-typed gate identifier.
Definition Circuit.h:49
Boolean circuit for provenance formula evaluation.
gate_t rootGate() const
Root gate of the constructed d-DNNF (current after simplify).
double probability() const
Probability that the function is true (independent inputs).
gate_t mkNeg(Var v)
shared negative literal NOT(IN)
std::vector< DNF > andDecompose(const DNF &d) const
static DNF canonical(DNF d)
sort, dedup, drop supersets
StructuredDNNFBuilder(const BooleanCircuit &bc, gate_t root, const std::map< gate_t, int > &input_rank, std::size_t max_nodes=0)
DNF extract(const BooleanCircuit &bc, gate_t g, const std::map< gate_t, int > &rank, std::map< gate_t, DNF > &memo) const
std::size_t size() const
d-DNNF gates reachable from the root.
std::vector< std::string > uuid_
rank -> source input UUID (for labels)
std::vector< gate_t > in_gate_
rank -> shared dDNNF IN gate
static constexpr int GUARD_FACTOR
std::unordered_map< CacheKey, gate_t, CacheKeyHash > cache_
std::vector< gate_t > not_gate_
rank -> shared dDNNF NOT(IN) gate (lazy)
gate_t mkAnd(const std::vector< gate_t > &children)
static DNF condition(const DNF &d, Var v, bool value)
std::vector< double > prob_
prob_[rank] = P(variable)
std::vector< DNF > orDecompose(const DNF &d) const
gate_t false_gate_
empty AND / empty OR terminals
gate_t mkLit(Var v)
shared positive literal IN
const dDNNF & dnnf() const
The constructed d-DNNF (root set, simplified).
gate_t build(const DNF &d, gate_t false_sink)
std::vector< Term > DNF
A sum of products: canonicalised.
std::vector< Var > Term
A product: sorted, duplicate-free vars.
gate_t newGate(BooleanGate type)
setGate + size guard
int Var
Variable rank in [0,ninputs).
A d-DNNF circuit supporting exact probabilistic and game-theoretic evaluation.
Definition dDNNF.h:71
Decomposable Deterministic Negation Normal Form circuit.
std::size_t operator()(const CacheKey &k) const
bool operator==(const CacheKey &o) const
Structured per-input order key carried by the planner's markers.