
Lars Kotthoff

Templeton Associate Professor

Derecho Professor

Presidential Faculty Fellow

larsko@uwyo.edu

EERB 422b
Department of Electrical Engineering and Computer Science
School of Computing
University of Wyoming
1000 E University Ave
Laramie, WY 82071-2000

My research combines artificial intelligence and machine learning to build robust systems with state-of-the-art performance. I develop techniques to induce models of how algorithms for solving computationally difficult problems behave in practice. Such models make it possible to select the best algorithm and the best parameter configuration for solving a given problem. I lead the Meta-Algorithmics, Learning and Large-scale Empirical Testing (MALLET) lab at the University of Wyoming.

More broadly, I am interested in innovative ways of modelling and solving challenging problems and applying such approaches to the real world. Part of this is making cutting-edge research available to and usable by non-experts. Machine learning often plays a crucial role in this, and I am also working on making machine learning more accessible and easier to use.

Interested in coming to beautiful Wyoming and joining MALLET? Please drop me an email or, if you are already here, come by my office. I also have Master's projects available, as well as projects for undergraduates seeking research experience.

News

Publications

For citation numbers, please see my Google Scholar page.

2025

  • Raponi, Elena, Lars Kotthoff, Hyunsun Alicia Kim, and Marius Lindauer. “Automated Machine Learning For Computational Mechanics (Dagstuhl Seminar 24282).” Edited by Elena Raponi, Lars Kotthoff, Hyunsun Alicia Kim, and Marius Lindauer. Dagstuhl Reports 14, no. 7 (2025): 17–34. https://doi.org/10.4230/DagRep.14.7.17. preprint PDF bibTeX abstract

    Machine learning (ML) has achieved undeniable success in computational mechanics, an ever-growing discipline that impacts all areas of engineering, from structural and fluid dynamics to solid mechanics and vehicle simulation. Computational mechanics uses numerical models and time- and resource-consuming simulations to reproduce physical phenomena, usually with the goal of optimizing the parameter configuration of the model with respect to the desired properties of the system. ML algorithms enable the construction of surrogate models that approximate the outcome of the simulations, allowing faster identification of well-performing configurations. However, determining the best ML approach for a given task is not straightforward and depends on human experts. Automated machine learning (AutoML) aims to reduce the need for experts to obtain effective ML pipelines. It provides off-the-shelf solutions that can be used without prior knowledge of ML, allowing engineers to spend more time on domain-specific tasks. AutoML is underutilized in computational mechanics; there is almost no communication between the two communities, and engineers spend unnecessary effort selecting and configuring ML algorithms. Our Dagstuhl Seminar aimed to (i) raise awareness of AutoML in the computational mechanics community, (ii) discover strengths and challenges for applying AutoML in practice, and (iii) create a bilateral exchange so that researchers can mutually benefit from their complementary goals and needs.
  • Stein, Niki van, Anna V. Kononova, Lars Kotthoff, and Thomas Bäck. “Code Evolution Graphs: Understanding Large Language Model Driven Design of Algorithms.” In Genetic and Evolutionary Computation Conference (GECCO), 2025. preprint PDF bibTeX abstract

    Large Language Models (LLMs) have demonstrated great promise in generating code, especially when used inside an evolutionary computation framework to iteratively optimize the generated algorithms. However, in some cases they fail to generate competitive algorithms or the code optimization stalls, and we are left with no recourse because of a lack of understanding of the generation process and generated codes. We present a novel approach to mitigate this problem by enabling users to analyze the generated codes inside the evolutionary process and how they evolve over repeated prompting of the LLM. We show results for three benchmark problem classes and demonstrate novel insights. In particular, LLMs tend to generate more complex code with repeated prompting, but additional complexity can hurt algorithmic performance in some cases. Different LLMs have different coding "styles" and generated code tends to be dissimilar to other LLMs. These two findings suggest that using different LLMs inside the code evolution frameworks might produce higher performing code than using only one LLM.

2024

  • Bischl, Bernd, Raphael Sonabend, Lars Kotthoff, and Michel Lang, eds. Applied Machine Learning Using mlr3 in R. 1st ed. CRC Press, 2024. preprint PDF bibTeX

  • Rogers, Jen, Marie Anastacio, Jürgen Bernard, Mehdi Chakhchoukh, Rebecca Faust, Andreas Kerren, Steffen Koch, Lars Kotthoff, Cagatay Turkay, and Emily Wall. “Visualization and Automation in Data Science: Exploring the Paradox of Humans-in-the-Loop.” In IEEE Visualization in Data Science (VDS), 1–5, 2024. https://doi.org/10.1109/VDS63897.2024.00005. preprint PDF bibTeX abstract

    We explore the interplay between automation and human involvement in data science. Emerging from in-depth discussions at a Dagstuhl seminar, we synthesize perspectives from Automated Data Science (AutoDS) and Interactive Data Visualization (VIS) – two fields that traditionally represent opposing ends of the human-machine spectrum. While AutoDS seeks to enhance efficiency through increasing automation, VIS underscores the critical value of human involvement in providing nuanced understanding, creativity, innovation, and contextual relevance. We explore these dichotomies through an online survey and advocate for a balanced approach that harmonizes the speed and consistency of effective automation with the indispensable insights of human expertise and thought. Ultimately, we confront the essential question: what aspects of data science should we automate?
  • Crisan, Anamaria, Lars Kotthoff, Marc Streit, and Kai Xu. “Human-Centered Approaches for Provenance in Automated Data Science (Dagstuhl Seminar 23372).” Edited by Anamaria Crisan, Lars Kotthoff, Marc Streit, and Kai Xu. Dagstuhl Reports 13, no. 9 (2024): 116–36. https://doi.org/10.4230/DagRep.13.9.116. preprint PDF bibTeX abstract

    The scope of automated machine learning (AutoML) technology has extended beyond its initial boundaries of model selection and hyperparameter tuning and towards end-to-end development and refinement of data science pipelines. These advances, both theoretical and realized, make the tools of data science more readily available to domain experts that rely on low- or no-code tooling options to analyze and make sense of their data. To ensure that automated data science technologies are applied both effectively and responsibly, it becomes increasingly urgent to carefully audit the decisions made both automatically and with guidance from humans. This Dagstuhl Seminar examines human-centered approaches for provenance in automated data science. While prior research concerning provenance and machine learning exists, it does not address the expanded scope of automated approaches and the consequences of applying such techniques at scale to the population of domain experts. In addition, most of the previous works focus on the automated part of this process, leaving a gap on the support for the sensemaking tasks users need to perform, such as selecting the datasets and candidate models and identifying potential causes for poor performance. The seminar brought together experts from across provenance, information visualization, visual analytics, machine learning, and human-computer interaction to articulate the user challenges posed by AutoML and automated data science, discuss the current state of the art, and propose directions for new research. More specifically, this seminar: - articulates the state of the art in AutoML and automated data science for supporting the provenance of decision making, - describes the challenges that data scientists and domain experts face when interfacing with automated approaches to make sense of an automated decision, - examines the interface between data-centric, model-centric, and user-centric models of provenance and how they interact with automated techniques, and - encourages exploration of human-centered approaches; for example leveraging visualization.
See all

Software

  • Core contributor to the mlr3 R ecosystem (book) for all things machine learning in R.

  • Maintainer of the FSelector R package.

  • Author and maintainer of LLAMA, an R package to simplify common algorithm selection tasks such as training a classifier as a portfolio selector (see the short sketch after this list).

  • Leading the Auto-WEKA project, which brings automated machine learning to WEKA.
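
For illustration, here is a minimal, hypothetical sketch of the portfolio-selector idea using mlr3. The instance features, solver names, and data are invented for this example; it is not LLAMA's actual interface, just the underlying approach of training a classifier that predicts the best algorithm from instance features.

    # Hypothetical sketch: algorithm selection as classification with mlr3.
    # Each row describes a problem instance by (invented) features; best_solver
    # labels the algorithm that performed best on that instance.
    library(mlr3)
    library(mlr3learners)  # provides classif.ranger

    set.seed(1)
    instances <- data.frame(
      n_vars           = rpois(200, 50),
      n_clauses        = rpois(200, 200),
      clause_var_ratio = runif(200, 2, 6),
      best_solver      = factor(sample(c("solver_a", "solver_b", "solver_c"),
                                       200, replace = TRUE))
    )

    task    <- as_task_classif(instances, target = "best_solver")
    learner <- lrn("classif.ranger")  # random forest as the portfolio selector
    rr      <- resample(task, learner, rsmp("cv", folds = 5))
    rr$aggregate(msr("classif.acc"))  # how often the predicted solver matches the labeled best one

In practice, the features would come from a feature extractor for the problem domain and the labels from benchmarking the portfolio's algorithms; LLAMA and ASlib provide standardized formats and workflows for exactly this kind of data.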

Teaching

  • I am teaching COSC 3020 (Algorithm Design and Analysis) and COSC 4552/5552 (Advanced Topics in AI). Lecture materials, assignments, announcements, etc. are available on WyoCourses.

  • I am teaching and developing a course on Practical Machine Learning (materials offline because of a major revision, sorry).

Awards

Other

Apart from my main affiliation, I am a research associate with the Maya Research Program. If I'm not in the office, you may find me in the jungle of Belize excavating and/or mapping Maya ruins. Check out the interactive map.

I am also involved with the OpenML project and a core contributor to ASlib, the benchmark library for algorithm selection.

While you're here, have a look at my overview of the Algorithm Selection literature. For something more visual, have a look at my pictures on Flickr.