Authors:
(1) Ankur Nath, Department of Computer Science and Engineering, Texas A&M University;
(2) Alan Kuhnle, Department of Computer Science and Engineering, Texas A&M University.
Table of Links
Summary and Outlook, References
4 EVALUATION for SAT
4.1 Problem Formulation
4.2 Learned Heuristics
In this subsection, we delve into GNNSAT (Yolcu and P´oczos, 2019) and why we think there is a need for reevaluating GNNSAT.
GNNSAT In their influential paper, Yolcu and P´oczos (2019) utilized a GNN to train on factor graph representations of boolean formulas in conjunctive normal form. Their goal was to learn a variable selection heuristic for a stochastic local search algorithm, WalkSAT (Selman et al., 1993). The algorithm demonstrated promising performance compared to WalkSAT. However, it fell short in terms of generalization abilities when compared to the WalkSAT. A direct comparison with SOTA SAT heuristics was omitted. As benchmark instances are extremely large and run time is the metric of performance for comparing with SAT heuristics, the scalability issues of GNNs represent a bottleneck for conducting a fair comparison with SOTA SAT heuristics.
It is important to note that there have been significant improvements (McAllester et al., 1997; Hoos et al., 2002; Li et al., 2007; Mazure et al., 1997) in SAT heuristics since the WalkSAT algorithm. However, many of these improvements do not currently represent SOTA SAT heuristics.
While we acknowledge that many neural networkbased approaches may face challenges in competing with SOTA SAT solvers in terms of run time due to scalability issues with GNNs, we believe it is still valuable to compare them with algorithms that demonstrate moderate performance. For instance, Kurin et al. (2020) compared their work with MiniSAT(E´en and S¨orensson, 2003). Since run time is an issue for GNN, they compared their algorithm with respect to the number of steps (one step is equivalent to choosing a variable and setting it to a value). This comparison provides insights into performance gaps between approaches and showcases the potential of learned SAT solvers to significantly reduce the number of steps. Such reductions, when coupled with advancements in domain-specific processors, can lead to a decrease in runtime, making learned SAT solvers more practical in the future.
Therefore, we focus on evaluating how much improvement GNNSAT provides in terms of steps to solve a problem rather than solely considering absolute runtime. Specifically, our goal is to evaluate whether the performance of the learned heuristic remains competitive in terms of steps, particularly for small instances, when compared to simple heuristics. Addressing this question holds significant implications for the research community in devising strategies to bridge the performance gap. To achieve this objective, we undertake a comparative analysis between SoftTabu and GNNSAT, as both algorithms involve the selection of a variable at each time step and setting its value.
4.3 Dataset
We conduct experiments utilizing randomly generated formulas derived from a diverse family of distributions, encompassing random 3-SAT, clique detection, graph coloring, and dominating set. These specific problem distributions were utilized in the evaluation of GNNSAT. Random K-SAT problems, especially when critically constrained, are hard, particularly in proximity to the satisfactory/unsatisfactory phase boundary (Mitchell et al., 1992; Selman et al., 1996). These problems serve as standard benchmarks for assessing SAT, with the threshold for 3-SAT occurring when problems have approximately 4.26 times as many clauses as variables. The other three problems fall under the category of NP-complete graph problems. For each of these problems, we randomly sample an Erd˝os-R´enyi graph denoted as G(N, p), where N is the number of vertices and p is the probability of an edge between any two vertices independently from every other edge. We use CNFgen (Lauria et al., 2017) for instance generation and MiniSAT for filtering out unsatisfiable formulas. It is worth noting that we exclude vertex covering—also a benchmark distribution for GNNSAT—due to the unavailability of support for this family of problems in the CNFgen package.
4.4 Is Scalability the Only Limiting Factor?
To evaluate the algorithms, we sample 100 satisfiable formulas from each of the four problem distributions discussed above to obtain evaluation sets. We then run 25 search trials (with each trial starting at a random initial assignment) for each problem.
Since comparing SoftTabu with GNNSAT in runtime will be unfair for the reasons stated above, we follow the evaluation method proposed in KhudaBukhsh et al. (2016). There are three performance metrics for each problem distribution: the average number of steps, the median of the median number of steps (the inner median is over trials on each problem and the outer median is over the problems in the evaluation sets), and the percentage of instances considered solved (the median number of steps less than the allowed number of steps). The main goal of this evaluation is to see if, with the help of GNN, GNNSAT can find the solution in fewer steps than SoftTabu. As shown in Table 3, SoftTabu demonstrates superior performance compared to the learned heuristic, thus shedding light on the performance limitations of the learned local search heuristic. Although GNNSAT is a novel attempt, it seems to show only marginal improvement in performance. Therefore, it is possible that GNNSAT is constrained by both performance and scalability.
This paper is available on arxiv under CC 4.0 DEED license.