Homepage of Jurgen J. Vinju

What is code and what is coding? Understanding what programmers do is important, because what software does influences our daily lives in increasing scope and intensity. In fact, it is quite hard today to make a list of things we do and experience that are not influenced by some sort of software. Since programmers write all this software, they are accountable and we all must engage in conversation with them. Understanding what “code” is and what “coding” entails paves the way for a good conversation. Read about it here.

I am a researcher in the general fields of Software Engineering and Computer Science, with a particular interest in Programming Languages, Model Driven Engineering, Domain Specific Languages, Software Analysis and Software Maintenance & Evolution.

My interest is especially in what all these topics have to do with each other: How can we learn to understand software engineering practice? Based on this understanding, how can we design new tools for software engineers? The impact of software in society is very large; we must understand it better to be able to provide better guarantees. Guarantees about privacy and security but also about software budgets, software functionality, and software sustainability.

I was on sick leave a lot between February 2018 and February 2021, in case you have identified a gap in my activities. And from March 2024 up to now I am again sick at home. Hope to be back to you all soon!

Typical tasks and roles:

senior researcher in the SWAT - Software Analysis & Transformation group (former group leader) at CWI, and
part-time full professor at Eindhoven University of Technology since September 1st, 2014.
co-founder of a CWI spin-off company named SWAT.engineering in 2017.
leader and contributor (as a programmer) to several open-source software projects, with a current focus on the Rascal metaprogramming language.
member of the board of the IPA school
member of the steering committee of the SLE conference
co-founder and treasurer of VERSEN and the SEN symposium

Position statements about research and engineering can be found in the blog part of this homepage.

Download my CV as pdf

2024

🏆 OIL: an industrial case study in language engineering with Spoofax, Olav Bunte, Jasper Denkers, Louis C. M. van Gool, Jurgen J. Vinju, Eelco Visser †, Tim A. C. Willemse, Andy Zaidman. In Springer Journal of Software and Systems Modeling (SOSYM), 2024, SoSym First paper award.

Domain-specific languages (DSLs) promise to improve the software engineering process, e.g., by reducing software development and maintenance effort and by improving communication, and are therefore seeing increased use in industry. To support the creation and deployment of DSLs, language workbenches have been developed. However, little is published about the actual added value of a language workbench in an industrial setting, compared to not using a language workbench. In this paper, we evaluate the productivity of using the Spoofax language workbench by comparing two implementations of an industrial DSL, one in Spoofax and one in Python, that already existed before the evaluation. The subject is the Open Interaction Language (OIL): a complex DSL for implementing control software with requirements imposed by its industrial context at Canon Production Printing. Our findings indicate that it is more productive to implement OIL using Spoofax compared to using Python, especially if editor services are desired. Although Spoofax was sufficient to implement OIL, we find that Spoofax should especially improve on practical aspects to increase its adoptability in industry.

@article{bunte_oil_2024,
	author = {Bunte, Olav and Denkers, Jasper and van Gool, Louis C. M. and Vinju, Jurgen J. and Visser, Eelco and Willemse, Tim A. C. and Zaidman, Andy},
	doi = {10.1007/s10270-024-01185-x},
	issn = {1619-1374},
	journal = {Software and Systems Modeling},
	month = jun,
	title = { {OIL}: an industrial case study in language engineering with {Spoofax} },
	year = {2024}
}

2023

Taming complexity of industrial printing systems using a constraint-based DSL: An industrial experience report, Jasper Denkers, Marvin Brunner, Louis van Gool, Jurgen J. Vinju, Andy Zaidman, Eelco Visser †. In Journal of Software: Practice & Experience (2023), Wiley.

Flexible printing systems are highly complex systems that consist of printers, that print individual sheets of paper, and finishing equipment, that processes sheets after printing, for example, assembling a book. Integrating finishing equipment with printers involves the development of control software that configures the devices, taking hardware constraints into account. This control software is highly complex to realize due to (1) the intertwined nature of printing and finishing, (2) the large variety of print products and production options for a given product, and (3) the large range of finishers produced by different vendors. We have developed a domain-specific language called CSX that offers an interface to constraint solving specific to the printing domain. We use it to model printing and finishing devices and to automatically derive constraint solver-based environments for automatic configuration. We evaluate CSX on its coverage of the printing domain in an industrial context, and we report on lessons learned on using a constraint-based DSL in an industrial context.


@article{denkers2024,
	author = {Denkers, Jasper and Brunner, Marvin and van Gool, Louis and Vinju, Jurgen J. and Zaidman, Andy and Visser, Eelco},
	doi = {https://doi.org/10.1002/spe.3239},
	journal = {Software: Practice and Experience},
	number = {10},
	pages = {2026-2064},
	title = {Taming complexity of industrial printing systems using a constraint-based DSL: An industrial experience report},
	volume = {53},
	year = {2023}
}

Comparing Bottom-up with Top-down Parsing Architectures for the Syntax Definition Formalism from a Disambiguation standpoint, Jurgen J. Vinju. OASIcs, Volume 109, EVCS: Eelco Visser's Commemorative Symposium.

Context-free general parsing and disambiguation algorithms are threaded throughout the research and engineering career of Eelco Visser. Both our Ph.D. theses featured the study of "disambiguation." Disambiguation is the declarative definition of choices among different parse trees, derived using the same context-free grammar, for the same input sentence. This essay highlights the differences between syntactic disambiguation for context-free general parsing in a top-down architecture and a bottom-up architecture. The differences between top-down and bottom-up are mainly observed as practical aspects of the software architecture and software implementation. Eventually, the concept of data-dependent context-free grammar brings all engineering perspectives of disambiguation back into a conceptual (declarative) framework independent of the parsing architecture. The novelty in this essay is the juxtaposition of three general parsing architectures from a disambiguation point of view: SGLR, SGLL, and DDGLL. It also motivates design decisions in the parsing architectures for SDF{1,2} and Rascal with previously unpublished detail. The essay falls short of a literature review and a tool evaluation since it does not investigate the disambiguation methods of the many other parser generator tools that exist. The fact that only the implementation algorithms are different between the compared parsing architectures, while the syntax definition formalisms have practically the same formal semantics for historical reasons, nicely "isolates the variable" of interest. We hope this essay lives up to the enormous enthusiasm, curiosity, and drive for perfection in syntax definition and parsing that Eelco always radiated. We dearly miss him.
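The definition of disambiguation given above (a declarative choice among different parse trees for the same sentence) can be illustrated with a minimal sketch. It is not taken from the essay and does not model SGLR, SGLL, or DDGLL: it simply enumerates every tree for the toy grammar E ::= E op E | num and then filters them with a left-associativity rule. The names `all_parses`, `violates_left_assoc`, and `disambiguate` are hypothetical.

```python
from typing import List, Tuple, Union

# A tree is a leaf (token text) or a triple (left, operator, right).
Tree = Union[str, Tuple["Tree", str, "Tree"]]

def all_parses(tokens: List[str]) -> List[Tree]:
    """Enumerate every parse tree of E ::= E op E | num for the tokens."""
    if len(tokens) == 1:
        return [tokens[0]]
    trees: List[Tree] = []
    # Operators sit at the odd positions; split the sentence at each one.
    for i in range(1, len(tokens), 2):
        for left in all_parses(tokens[:i]):
            for right in all_parses(tokens[i + 1:]):
                trees.append((left, tokens[i], right))
    return trees

def violates_left_assoc(tree: Tree) -> bool:
    """True if some operator has the same operator nested as its right child."""
    if isinstance(tree, str):
        return False
    left, op, right = tree
    nested_right = isinstance(right, tuple) and right[1] == op
    return nested_right or violates_left_assoc(left) or violates_left_assoc(right)

def disambiguate(trees: List[Tree]) -> List[Tree]:
    """Keep only the trees that satisfy the left-associativity declaration."""
    return [t for t in trees if not violates_left_assoc(t)]

forest = all_parses(["1", "-", "2", "-", "3"])  # two trees: the sentence is ambiguous
chosen = disambiguate(forest)                   # one tree: (1 - 2) - 3
```

In this sketch the filter runs after parsing; the architectures compared in the essay differ precisely in where and how such filters are applied during parsing.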

@InProceedings{vinju:OASIcs.EVCS.2023.31,
  author =	{Vinju, Jurgen J.},
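  title =	{Comparing Bottom-up with Top-down Parsing Architectures for the Syntax Definition Formalism from a Disambiguation standpoint},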
  booktitle =	{Eelco Visser Commemorative Symposium (EVCS 2023)},
  pages =	{31:1--31:16},
  series =	{Open Access Series in Informatics (OASIcs)},
  year =	{2023},
  volume =	{109},
  editor =	{L\"{a}mmel, Ralf and Mosses, Peter D. and Steimann, Friedrich},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  doi =		{10.4230/OASIcs.EVCS.2023.31},
}

2022

Large-scale semi-automated migration of legacy C/C++ test code, Mathijs T. W. Schuts, Rodin T. A. Aarssen, Paul M. Tielemans, and Jurgen J. Vinju. Software: Practice & Experience.

This is an industrial experience report on a large semi-automated migration of legacy test code in C and C++. The particular migration was enabled by automating most of the maintenance steps. Without automation this particular large-scale migration would not have been conducted, due to the risks involved in manual maintenance (risk of introducing errors and risk of unexpected rework and loss of productivity). We describe and evaluate the method of automation we used on this real-world case. The benefits were that by automating analysis, we could make sure that we understood all the relevant details for the envisioned maintenance, without having to manually read and check our theories. Furthermore, by automating transformations we could reiterate and improve over complex and large-scale source code updates, until they were "just right". The drawbacks were that, first, we had to learn new metaprogramming skills. Second, our automation scripts are not readily reusable for other contexts; they were necessarily developed for this ad-hoc maintenance task. Our analysis shows that automated software maintenance as compared to the (hypothetical) manual alternative method seems to be better both in terms of avoiding mistakes and avoiding rework because of such mistakes. It seems that necessary and beneficial source code maintenance need not be avoided, if software engineers are enabled to create bespoke (and ad-hoc) analysis and transformation tools to support it.


@article{spe.3082,
	author = {Schuts, Mathijs T. W. and Aarssen, Rodin T. A. and Tielemans, Paul M. and Vinju, Jurgen J.},
	doi = {https://doi.org/10.1002/spe.3082},
	journal = {Software: Practice and Experience},
	keywords = {parsers, pattern matching, program analysis, refactoring, source code generation},
	number = {n/a},
	title = {Large-scale semi-automated migration of legacy C/C++ test code},
	url = {https://onlinelibrary.wiley.com/doi/abs/10.1002/spe.3082},
}

2021

Breaking Bad? Semantic Versioning and Impact of Breaking Changes in Maven Central, L. Ochoa, T. Degueule, J-R. Falleri, J. Vinju. Empirical Software Engineering (EMSE), 2021

Just like any software, libraries evolve to incorporate new features, bug fixes, security patches, and refactorings. However, when a library evolves, it may break the contract previously established with its clients by introducing Breaking Changes (BCs) in its API. These changes might trigger compile-time, link-time, or run-time errors in client code. As a result, clients may hesitate to upgrade their dependencies, raising security concerns and making future upgrades even more difficult. Understanding how libraries evolve helps client developers to know which changes to expect and where to expect them, and library developers to understand how they might impact their clients. In the most extensive study to date, Raemaekers et al. investigate to what extent developers of Java libraries hosted on the Maven Central Repository (MCR) follow semantic versioning conventions to signal the introduction of BCs and how these changes impact client projects. Their results suggest that BCs are widespread without regard for semantic versioning, with a significant impact on clients. In this paper, we conduct an external and differentiated replication study of their work. We identify and address some limitations of the original protocol and expand the analysis to a new corpus spanning seven more years of the MCR. We also present a novel static analysis tool for Java bytecode, Maracas, which provides us with: (i) the set of all BCs between two versions of a library; and (ii) the set of locations in client code impacted by individual BCs. Our key findings, derived from the analysis of 119,879 library upgrades and 293,817 clients, contrast with the original study and show that 83.4% of these upgrades do comply with semantic versioning. Furthermore, we observe that the tendency to comply with semantic versioning has significantly increased over time. Finally, we find that most BCs affect code that is not used by any client, and that only 7.9% of all clients are affected by BCs.
These findings should help (i) library developers to understand and anticipate the impact of their changes; (ii) library users to estimate library upgrading effort and to pick libraries that are less likely to break; and (iii) researchers to better understand the dynamics of library-client co-evolution in Java.
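As a rough illustration of what "complying with semantic versioning" means for a single library upgrade, here is a minimal sketch. It is not the study's protocol and does not use Maracas: the number of breaking changes is assumed to be given, versions are assumed to have the plain MAJOR.MINOR.PATCH form, and treating 0.y.z versions as always compliant is a simplifying assumption. All function names are hypothetical.

```python
from typing import Tuple

def parse_version(text: str) -> Tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch

def upgrade_kind(old: str, new: str) -> str:
    """Classify an upgrade as 'major', 'minor', or 'patch'."""
    o, n = parse_version(old), parse_version(new)
    if n <= o:
        raise ValueError("new version must be greater than old version")
    if n[0] != o[0]:
        return "major"
    if n[1] != o[1]:
        return "minor"
    return "patch"

def complies_with_semver(old: str, new: str, breaking_changes: int) -> bool:
    """Breaking changes are only allowed in major upgrades.

    Assumption of this sketch: 0.y.z versions are initial development
    versions in which anything may change, so they always comply.
    """
    if parse_version(old)[0] == 0:
        return True
    return breaking_changes == 0 or upgrade_kind(old, new) == "major"
```

For example, `complies_with_semver("1.2.3", "1.3.0", 2)` is false because a minor upgrade introduces breaking changes, while the same two breaking changes in an upgrade to `2.0.0` would comply.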

@article{ochoa21,
  author = {L. Ochoa and T. Degueule and J-R. Falleri and J. Vinju},
  journal = {Empirical Software Engineering},
  title = {Breaking Bad? Semantic Versioning and Impact of Breaking Changes in Maven Central},
year = {2021}
}

Contract-Based Return-Value Commutativity: Safely exploiting contract-based commutativity for faster serializable transactions, Tim Soethout, Tijs van der Storm, Jurgen J. Vinju. AGERE 2021.

A key challenge of designing distributed software systems is maintaining data consistency. We can define data consistency and data isolation guarantees (e.g. serializability) in terms of schedules of atomic reads and writes, but this excludes schedules that would be semantically consistent. Others use manually provided information on "non-conflicting operations" to define guarantees that work for more applications allowing more parallel schedules. To be safe, an engineer might avoid marking operations as non-conflicting, with detrimental effects to efficiency. To be fast, they might mark more non-conflicting operations than is strictly safe. Our goal is to help engineers by automatically deriving commutative operations (using their respective contracts) such that more parallel schedules with global consistency are possible. We define a new general consistency and isolation guarantee named "Return-Value Serializability" to check consistency claims automatically, and we present distributed event processing algorithms that make use of the same "Contract-based Commutativity" information. We validated both the definitions and the algorithms using model-checking with TLA+. Previous work provided evidence that local coordination avoidance such as applied here has a significant positive effect on the performance of distributed transaction systems. Client-centric return-value commutativity promises to hit a sweet spot in design trade-offs for business applications, such as payment systems, that must scale-out while their operations are not embarrassingly parallel and consistency guarantees are of the highest priority. It can also provide design feedback, indicating that some operations will simply not scale together even before a line of code has been written.
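The idea of deriving commutative operations from contracts can be sketched as follows. This is an illustration only, not the paper's definitions or algorithms: contracts are modelled as a precondition and an effect on an integer balance, and commutativity is checked by brute force over a bounded set of states, which is an assumption of this sketch. Two operations are treated as commuting when both orders yield the same final state and the same return values (accepted or rejected). All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple

@dataclass(frozen=True)
class Operation:
    name: str
    pre: Callable[[int], bool]    # contract: may the operation run in this state?
    effect: Callable[[int], int]  # contract: the state after the operation

def apply(op: Operation, state: int) -> Tuple[int, bool]:
    """Return (new state, return value); a rejected operation keeps the state."""
    if op.pre(state):
        return op.effect(state), True
    return state, False

def commute(a: Operation, b: Operation, states: Iterable[int]) -> bool:
    """Check that both orders give the same final state and return values."""
    for s in states:
        s_a, ret_a_first = apply(a, s)
        s_ab, ret_b_second = apply(b, s_a)
        s_b, ret_b_first = apply(b, s)
        s_ba, ret_a_second = apply(a, s_b)
        if (s_ab, ret_a_first, ret_b_second) != (s_ba, ret_a_second, ret_b_first):
            return False
    return True

def deposit(amount: int) -> Operation:
    return Operation(f"deposit({amount})", lambda bal: True, lambda bal: bal + amount)

def withdraw(amount: int) -> Operation:
    return Operation(f"withdraw({amount})", lambda bal: bal >= amount, lambda bal: bal - amount)

balances = range(0, 100)  # bounded state space, an assumption of this sketch
```

In this sketch two deposits always commute, whereas `withdraw(10)` and `withdraw(20)` commute only on states where the balance covers both, for example `range(30, 100)`.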

@inproceedings{Soethout2021,
 author = {Soethout, Tim and van der Storm, Tijs and Vinju, Jurgen J.},
 title = {Contract-Based Return-Value Commutativity: Safely exploiting contract-based commutativity for faster serializable transactions},
 booktitle = {Proceedings of the 11th ACM SIGPLAN International Workshop on Programming Based on Actors, Agents, and Decentralized Control},
 series = {AGERE 2021},
 year = {2021},
 publisher = {ACM},
}

🏆 Getting Grammars into Shape for Block-based Editors, Mauricio Verano Merino, Tom Beckmann, Tijs van der Storm, Robert Hirschfeld, and Jurgen J. Vinju. SLE 2021. Won the Distinguished Artifact Award.

Block-based programming environments allow users to program by interactively arranging visual jigsaw-like program elements. They have been shown to be helpful in several domains, but often require experienced developers for their creation. Previous research investigated the use of language frameworks to generate block-based editors based on grammars, but often the results provided too many, unnecessary kinds of blocks, leading to verbose and less concise environments and also programs. To reduce the number of interactions, we propose the use of a pipeline of transformations to simplify the original grammar, yielding a reduction of the number of (useful) kinds of blocks available in the resulting editors. We show that, up to a certain complexity, our generated block-based editors are significantly improved with respect to a set of observed aesthetic criteria. As such, analyzing and simplifying grammars before generating block-based editors allows us to derive more compact and potentially more usable block-based editors, making reuse of existing grammars through automatic generation feasible.

@InProceedings{Verano2021,
 title                = {Getting Grammars into Shape for Block-based Editors},
 author               = {Mauricio Verano Merino and Tom Beckmann and Tijs van der Storm and Robert Hirschfeld and Jurgen J. Vinju},
 booktitle            = {Proceedings of the 14th ACM SIGPLAN International Conference on Software Language Engineering},
 year                 = 2021,
 month                = oct
}

Modeling with Mocking, Jouke Stoel, Tijs van der Storm and Jurgen J. Vinju. 14th IEEE Conference on Software Testing, Verification and Validation (ICST)

Writing formal specifications often requires users to abstract from the original problem, especially when verification techniques such as model checking are used. Without applying abstraction the search space the model checker needs to traverse tends to grow quickly beyond the scope of what can be checked within reasonable time. The downside of this need to omit details is that it increases the distance to the implementation. Ideally, the created specifications could be used to generate software (either manually or automatically). But having an incomplete description of the desired system is not enough for this purpose. In this work we introduce the Rebel2 specification language. Rebel2 lets the user write full system specifications in the form of state machines with data without the need to apply abstraction while still preserving the ability to verify non-trivial properties. This is done by allowing the user to forget and mock specifications when running the model checker. The original specifications are untouched by these techniques. We compare the expressiveness of Rebel2 and the effectiveness of mock and forget by implementing two case studies: one from the automotive domain and one from the banking domain. We find that Rebel2 is expressive enough to implement both case studies in a concise manner. Next to that, when performing checks in isolation, mocking can speed up model checking significantly.

@INPROCEEDINGS{Stoel2021,
  author={Stoel, Jouke and Storm, Tijs van der and Vinju, Jurgen},
  booktitle={2021 14th IEEE Conference on Software Testing, Verification and Validation (ICST)},
  title={Modeling with Mocking},
  year={2021},
  pages={59-70},
  doi={10.1109/ICST49551.2021.00018}}

Path-Sensitive Atomic Commit - Local Coordination Avoidance for Distributed Transactions, Tim Soethout, Jurgen J. Vinju and Tijs van der Storm, <Programming> Journal

Context: Concurrent objects with asynchronous messaging are an increasingly popular way to structure highly available, high performance, large-scale software systems. To ensure data-consistency and support synchronization between objects such systems often use distributed transactions with Two-Phase Locking (2PL) for concurrency control and Two-Phase commit (2PC) as atomic commitment protocol. Inquiry: In highly available, high-throughput systems, such as large banking infrastructure, however, 2PL becomes a bottleneck when objects are highly contended, when an object is queuing a lot of messages because of locking.

Approach: In this paper we introduce Path-Sensitive Atomic Commit (PSAC) to address this situation. We start from message handlers (or methods), which are decorated with pre- and post-conditions, describing their guards and effect.

Knowledge: This allows the PSAC lock mechanism to check whether the effects of two messages arriving at the same time are independent, and to avoid locking if this is the case. As a result, more messages are directly accepted or rejected, and higher overall throughput is obtained.

Grounding: We have implemented PSAC for a state machine-based DSL called Rebel, on top of a runtime based on the Akka actor framework. Our performance evaluation shows that PSAC exhibits the same scalability and latency characteristics as standard 2PL/2PC, and obtains up to 1.8 times higher median throughput in congested scenarios.

Importance: We believe PSAC is a step towards enabling organizations to build scalable distributed applications, even if their consistency requirements are not embarrassingly parallel.
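The core idea of using pre- and post-conditions to accept or reject a message without waiting for a lock can be sketched as below. This is a single-process illustration under strong assumptions, not the PSAC implementation: there is no distribution, no 2PC, and the object state is a single integer balance. The names `Message`, `possible_states`, and `decide` are hypothetical.

```python
from typing import Callable, List, NamedTuple

class Message(NamedTuple):
    name: str
    pre: Callable[[int], bool]  # guard over the object state
    post: Callable[[int], int]  # effect on the object state

def possible_states(committed: int, in_progress: List[Message]) -> List[int]:
    """All states the object may reach, depending on which in-progress
    messages eventually commit or abort."""
    states = [committed]
    for msg in in_progress:
        # Each message either aborts (state kept) or commits (effect applied).
        states = states + [msg.post(s) for s in states if msg.pre(s)]
    return states

def decide(committed: int, in_progress: List[Message], incoming: Message) -> str:
    """Accept or reject immediately when every possible outcome agrees."""
    outcomes = {incoming.pre(s) for s in possible_states(committed, in_progress)}
    if outcomes == {True}:
        return "accept"
    if outcomes == {False}:
        return "reject"
    return "delay"  # the guard depends on an undecided message, so wait

def withdraw(amount: int) -> Message:
    return Message(f"withdraw({amount})", lambda bal: bal >= amount, lambda bal: bal - amount)
```

With a committed balance of 100 and `withdraw(30)` still in progress, `withdraw(50)` is accepted at once and `withdraw(150)` is rejected at once; only `withdraw(80)` has to wait for the outcome of the in-progress message.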

@article{Soethout2021,
  author    = {Tim Soethout and
               Tijs van der Storm and
               Jurgen J. Vinju},
  title     = {Path-Sensitive Atomic Commit - Local Coordination Avoidance for Distributed
               Transactions},
  journal   = {Art Sci. Eng. Program.},
  volume    = {5},
  number    = {1},
  pages     = {3},
  year      = {2021},
  doi       = {10.22152/programming-journal.org/2021/5/3},
}

2020

Automated Validation of State-Based Client-Centric Isolation with TLA+, Tim Soethout, Tijs van der Storm and Jurgen J. Vinju. ASYDE 2020.

Clear consistency guarantees on data are paramount for the design and implementation of distributed systems. When implementing distributed applications, developers require approaches to verify the data consistency guarantees of an implementation choice. Crooks et al. define a state-based and client-centric model of database isolation. This paper formalizes this state-based model in TLA+, reproduces their examples and shows how to model check algorithms with this formalization. The formalized model in TLA+ enables semi-automatic model checking for different implementation choices for transactional operations and allows checking of conformance to isolation levels. We reproduce examples of the original paper and confirm the isolation guarantees of the combination of the well-known 2-phase locking and 2-phase commit algorithms. Using model checking, this formalization can also help find bugs in incorrect specifications. This improves feasibility of automated checking of isolation guarantees of synthesized synchronization implementations and it provides an environment for experimenting with new designs.

@inproceedings{Soethout2020,
  author    = {Tim Soethout and Tijs van der Storm and Jurgen J. Vinju},
  title     = {Automated Validation of State-Based Client-Centric Isolation with TLA+},
  booktitle   = {Proceedings of ASYDE},
  year      = {2020},
}

Bacatá: Notebooks for DSLs, Almost for Free, Mauricio Verano Merino, Jurgen J. Vinju and Tijs van der Storm. <Programming> Journal, 2020

Context: Computational notebooks are a contemporary style of literate programming, in which users can communicate and transfer knowledge by interleaving executable code, output, and prose in a single rich document. A Domain-Specific Language (DSL) is an artificial software language tailored for a particular application domain. Usually, DSL users are domain experts that may not have a software engineering background. As a consequence, they might not be familiar with Integrated Development Environments (IDEs). Thus, the development of tools that offer different interfaces for interacting with a DSL is relevant. Inquiry: However, resources available to DSL designers are limited. We would like to leverage tools used to interact with general purpose languages in the context of DSLs. Computational notebooks are an example of such tools. Then, our main question is: What is an efficient and effective method of designing and implementing notebook interfaces for DSLs? By addressing this question we might be able to speed up the development of DSL tools, and ease the interaction between end-users and DSLs. Approach: In this paper, we present Bacatá, a mechanism for generating notebook interfaces for DSLs in a language parametric fashion. We designed this mechanism in a way in which language engineers can reuse as many language components (e.g., language processors, type checkers, code generators) as possible. Knowledge: Our results show that notebook interfaces generated by Bacatá can be automatically generated with little manual configuration. There are few considerations and caveats that should be addressed by language engineers that rely on language design aspects. The creation of a notebook for a DSL with Bacatá becomes a matter of writing the code that wires existing language components in the Rascal language workbench with the Jupyter platform. 
Grounding: We evaluate Bacatá by generating functional computational notebook interfaces for three different non-trivial DSLs, namely: a small subset of Halide (a DSL for digital image processing), SweeterJS (an extended version of JavaScript), and QL (a DSL for questionnaires). Additionally, it is relevant to generate notebook implementations rather than implementing them manually. We measured and compared the number of Source Lines of Code (SLOCs) that we reused from existing implementations of those languages. Importance: The adoption of notebooks by novice programmers and end-users has made them very popular in several domains such as exploratory programming, data science, data journalism, and machine learning. Why are they popular? In (data) science, it is essential to make results reproducible as well as understandable. However, notebooks are only available for GPLs. This paper opens up the notebook metaphor for DSLs to improve the end-user experience when interacting with code and to increase DSL adoption.

@article{Mauricio2020,
  author    = {Mauricio Verano Merino and
               Jurgen J. Vinju and
               Tijs van der Storm},
  title     = {Bacat{\'{a}}: Notebooks for DSLs, Almost for Free},
  journal   = {The Art, Science, and Engineering of Programming},
  volume    = {4},
  number    = {3},
  pages     = {11},
  year      = {2020},
}

2019

🏆 Rascal, 10 years later, Paul Klint, Tijs van der Storm and Jurgen J. Vinju. SCAM 2019 (related to the IEEE SCAM Most Influential Paper award for Rascal: A Domain Specific Language for Source Code Analysis and Manipulation)

Concrete Syntax with Black Box Parsers, Rodin Aarssen, Tijs van der Storm, Jurgen J. Vinju. <Programming> Journal, 2019

Context: Meta programming consists for a large part of matching, analyzing, and transforming syntax trees. Many meta programming systems process abstract syntax trees, but this requires intimate knowledge of the structure of the data type describing the abstract syntax. As a result, meta programming is error-prone, and meta programs are not resilient to evolution of the structure of such ASTs, requiring invasive, fault-prone change to these programs.
Inquiry: Concrete syntax patterns alleviate this problem by allowing the meta programmer to match and create syntax trees using the actual syntax of the object language. Systems supporting concrete syntax patterns, however, require a concrete grammar of the object language in their own formalism. Creating such grammars is a costly and error-prone process, especially for realistic languages such as Java and C++.
Approach: In this paper we present Concretely, a technique to extend meta programming systems with pluggable concrete syntax patterns, based on external, black box parsers. We illustrate Concretely in the context of Rascal, an open-source meta programming system and language workbench, and show how to reuse existing parsers for Java, JavaScript, and C++. Furthermore, we propose Tympanic, a DSL to declaratively map external AST structures to Rascal’s internal data structures. Tympanic allows implementors of Concretely to solve the impedance mismatch between object-oriented class hierarchies in Java and Rascal’s algebraic data types. Both the algebraic data type and AST marshalling code is automatically generated.
Knowledge: The conceptual architecture of Concretely and Tympanic supports the reuse of pre-existing, external parsers, and their AST representation in meta programming systems that feature concrete syntax patterns for matching and constructing syntax trees. As such this opens up concrete syntax pattern matching for a host of realistic languages for which writing a grammar from scratch is time consuming and error-prone, but for which industry-strength parsers exist in the wild.
Grounding: We evaluate Concretely in terms of source lines of code (SLOC), relative to the size of the AST data type and marshalling code. We show that for real programming languages such as C++ and Java, adding support for concrete syntax patterns takes an effort only in the order of dozens of SLOC. Similarly, we evaluate Tympanic in terms of SLOC, showing an order of magnitude of reduction in SLOC compared to manual implementation of the AST data types and marshalling code.
Importance: Meta programming has applications in reverse engineering, reengineering, source code analysis, static analysis, software renovation, domain-specific language engineering, and many others. Processing of syntax trees is central to all of these tasks. Concrete syntax patterns improve the practice of constructing meta programs. The combination of Concretely and Tympanic has the potential to make concrete syntax patterns available with very little effort, thereby improving and promoting the application of meta programming in the general software engineering context.

@article{Aarsen2019,
  author    = {Rodin Aarssen and Jurgen J. Vinju and Tijs van der Storm},
  title     = {Concrete Syntax with Black Box Parsers},
  journal   = {The Art, Science, and Engineering of Programming},
  volume    = {3},
  number    = {3},
  pages     = {15},
  year      = {2019},
}

AlleAlle: Bounded Relational Model Finding with Unbounded Data, Jouke Stoel, Tijs van der Storm, Jurgen J. Vinju. SPLASH Onward! 2019

Relational model finding is a successful technique which has been used in a wide range of problems during the last decade. This success is partly due to the fact that many problems contain relational structure which can be explored using relational model finders. Although these model finders allow for the exploration of such structures they often struggle with incorporating the non-relational elements.
In this paper we introduce AlleAlle, a method and language that integrates reasoning on both relational structure and non-relational elements —the data— of a problem. By combining first order logic with Codd’s relational algebra, transitive closure, and optimization criteria, we obtain a rich input language for expressing constraints on both relational and scalar values.
We present the semantics of AlleAlle and the translation of AlleAlle specifications to SMT constraints, and use the off-the-shelf SMT solver Z3 to find solutions. We evaluate AlleAlle by comparing its performance with Kodkod, a state-of-the-art relational model finder, and by encoding a solution to the optimal package resolution problem. Initial benchmarking shows that although the translation times of AlleAlle can be improved, the resulting SMT constraints can efficiently be solved by the underlying solver.

@inproceedings{stoel2019,
 author = {Stoel, Jouke and van der Storm, Tijs and Vinju, Jurgen J.},
 title = {AlleAlle: Bounded Relational Model Finding with Unbounded Data},
 booktitle = {Proceedings of the 2019 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software},
 series = {Onward! 2019},
 year = {2019},
 location = {Athens, Greece},
 pages = {46--61},
 numpages = {16},
 publisher = {ACM},
}

Static Local Coordination Avoidance for Distributed Objects Tim Soethout, Tijs van der Storm, Jurgen J. Vinju. SPLASH AGERE 2019.

In high-throughput, distributed systems, such as large-scale banking infrastructure, synchronization between actors becomes a bottleneck in high-contention scenarios. This results in delays for users, and reduces opportunities for scaling such systems. This paper proposes Static Local Coordination Avoidance, which analyzes application invariants at compile time to detect whether messages are independent, so that synchronization at run time is avoided, and parallelism is increased. Analysis shows that in industry scenarios up to 60% of operations are independent. Initial performance evaluation shows that, in comparison to a standard 2-phase commit baseline, throughput is increased, and latency is reduced. As a result, scalability bottlenecks in high-contention scenarios in distributed actor systems are reduced for independent messages.

@inproceedings{Soethout2019,
 author = {Soethout, Tim and van der Storm, Tijs and Vinju, Jurgen J.},
 title = {Static Local Coordination Avoidance for Distributed Objects},
 booktitle = {Proceedings of the 9th ACM SIGPLAN International Workshop on Programming Based on Actors, Agents, and Decentralized Control},
 series = {AGERE 2019},
 year = {2019},
 location = {Athens, Greece},
 pages = {21--30},
 numpages = {10},
 publisher = {ACM},
}

2018

Bacatá: a language parametric notebook generator (tool demo). Mauricio Verano Merino, Jurgen J. Vinju and Tijs van der Storm. (SLE)

Interactive notebooks allow people to communicate and collaborate through a single rich document that might include live code, multimedia, computed results, and documentation, which is persisted as a whole for reproducibility. Notebooks are currently being used extensively in domains such as data science, data journalism, and machine learning. However, constructing a notebook interface for a new language requires a lot of effort. In this tool paper, we present Bacatá, a language parametric notebook generator for domain-specific languages (DSL) based on the Jupyter framework. Bacatá is designed so that language engineers may reuse existing language components (such as parsers, code generators, interpreters, etc.) as much as possible. Moreover, we explain the design of Bacatá and how DSL notebooks can be generated with minimum effort in the context of the Rascal meta programming system and language workbench.

@InProceedings{Verano2018,
 title                = {Bacat\'a: A Language Parametric Notebook Generator (Tool Demo)},
 author               = {Verano Merino, Mauricio and Vinju, Jurgen and van der Storm, Tijs},
 pages                = {210--214},
 booktitle            = {Proceedings of the 11th ACM SIGPLAN International Conference on Software Language Engineering},
 year                 = 2018,
 month                = jan
}

To-many or To-one? All-in-one! Efficient Purely Functional Multi-Maps with Type-Heterogeneous Hash-Tries. Michael Steindorfer and Jurgen J. Vinju. Programming Languages Design and Implementation (PLDI).

An immutable multi-map is a many-to-many thread-friendly map data structure with expected fast insert and lookup operations. This data structure is used for applications processing graphs or many-to-many relations as applied in compilers, runtimes of programming languages, or in static analysis of object-oriented systems. Collection data structures are assumed to carefully balance execution time of operations with memory consumption characteristics and need to scale gracefully from a few elements to multiple gigabytes at least. When processing larger in-memory data sets the overhead of the data structure encoding itself becomes a memory usage bottleneck, dominating the overall performance. In this paper we propose AXIOM, a novel hash-trie data structure that allows for a highly efficient and type-safe multi-map encoding by distinguishing inlined values of singleton sets from nested sets of multi-mappings. AXIOM strictly generalizes over previous hash-trie data structures by supporting fine-grained type-heterogeneous content. We detail the design and optimizations of AXIOM and further compare it against state-of-the-art immutable maps and multi-maps in Java, Scala and Clojure. We isolate key differences using microbenchmarks and validate the resulting conclusions on a real world case in static analysis. AXIOM reduces the key-value storage overhead by 1.87 x; with specializing and inlining across collection boundaries it improves by 5.1 x.
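
The encoding idea named in this abstract, keeping the value of a singleton set inline and nesting a set only once a key maps to several values, can be sketched in a few lines. This is a hypothetical Python illustration on top of a plain dictionary, not the AXIOM hash-trie itself; the names `EMPTY`, `insert` and `lookup` are invented for the example, and it assumes that the stored values are never `frozenset` instances themselves.

```python
from types import MappingProxyType

# Hypothetical sketch: a read-only dict stands in for the hash-trie.
EMPTY = MappingProxyType({})

def insert(mm, key, value):
    """Return a new multi-map with (key, value) added; mm itself is not modified."""
    new = dict(mm)
    if key not in new:
        new[key] = value                        # 1:1 case, value stored inline
    else:
        old = new[key]
        if isinstance(old, frozenset):
            new[key] = old | {value}            # already 1:n, extend the nested set
        elif old != value:
            new[key] = frozenset({old, value})  # promote inline value to a nested set
    return MappingProxyType(new)

def lookup(mm, key):
    """Return all values for key as a frozenset, hiding the inline/nested distinction."""
    if key not in mm:
        return frozenset()
    found = mm[key]
    return found if isinstance(found, frozenset) else frozenset({found})

one = insert(EMPTY, "a", 1)   # "a" maps to a single inlined value
many = insert(one, "a", 2)    # "a" now maps to a nested set of two values
```

Keys with exactly one value pay no cost for a nested collection, which is the kind of storage overhead the abstract refers to.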

@inproceedings{pldi18,
  author = {Michael Steindorfer and Jurgen Vinju},
  title = {To-many or To-one? All-in-one! Efficient Purely Functional Multi-Maps with Type-Heterogeneous Hash-Tries},
  booktitle = {Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2018},
  publisher = {ACM},
  year = {2018},
}

An Empirical Evaluation of OSGi Dependencies Best Practices in the Eclipse IDE Lina Ochoa, Thomas Degueule and Jurgen J. Vinju. Mining Software Repositories (MSR)

OSGi is a module system and service framework that aims to fill Java's lack of support for modular development. Using OSGi, developers divide software into multiple bundles that declare constrained dependencies towards other bundles. However, there are various ways of declaring and managing such dependencies, and it can be confusing for developers to choose one over another. Over the course of time, experts and practitioners have defined "best practices" related to dependency management in OSGi. The underlying assumptions are that these best practices (i) are indeed relevant and (ii) help to keep OSGi systems manageable and efficient. In this paper, we investigate these assumptions by first conducting a systematic review of the best practices related to dependency management issued by the OSGi Alliance and OSGi-endorsed organizations. Using a large corpus of OSGi bundles (1,124 core plug-ins of the Eclipse IDE), we then analyze the use and impact of 6 selected best practices. Our results show that the selected best practices are not widely followed in practice. Besides, we observe that following them strictly reduces classpath size of individual bundles by up to 23% and results in up to 13% impact on performance at bundle resolution time. In summary, this paper contributes an initial empirical validation of industry-standard OSGi best practices. Our results should influence practitioners especially, by providing evidence of the impact of these best practices in real-world systems.

@inproceedings{msr17,
  author = {Lina Ochoa and Thomas Degueule and Jurgen J. Vinju},
  title = {An Empirical Evaluation of OSGi Dependencies Best Practices in the Eclipse IDE},
  booktitle = {Proceedings of the 15th International Conference on Mining Software Repositories},
  publisher = {IEEE},
  year = {2018},
}

2017

Corrigendum: Empirical analysis of the relationship between CC and SLOC in a large corpus of Java methods and C functions published on 9 December 2015 Davy Landman, Alexander Serebrenik, Eric Bouwers, Jurgen Vinju

During the preparation of the corresponding chapter in Davy Landman's PhD thesis, some minor graphical and statistical discrepancies were found in the paper “Empirical analysis of the relationship between CC and SLOC in a large corpus of Java methods and C functions.” To support future reproduction and use of this work, we prepared the current erratum, containing several updated figures, a diagnosis of the cause of the errors, and an explanation of the effect on the original paper. None of the issues reported in this erratum influence the conclusions of the original paper.

@Article{corrlandman,
 title                = {Corrigendum to: Empirical analysis of the relationship between CC and SLOC in a large corpus of Java methods and C functions published on 9 December 2015},
 author               = {Landman, Davy and Serebrenik, Alexander and Bouwers, Eric and Vinju, Jurgen},
 journal              = {Journal of Software: Evolution and Process},
 volume               = {29},
 number               = {10},
 year                 = 2017,
 month                = oct,
 doi                  = {10.1002/smr.1914}
}

Bacatá: a generic notebook generator for DSLs Mauricio Verano Merino, Jurgen J. Vinju, and Tijs van der Storm (DSLDI 2017)

Interactive notebooks, such as provided by the Jupyter platform, are gaining traction in scientific computing, data science, and machine learning. Developing a Jupyter kernel machinery for a new language, however, requires considerable effort. In this extended abstract, we present Bacatá, a language-parametric bridge between Jupyter and the Rascal language workbench. Reusing existing language components, such as parsers, interpreters, Read-Eval-Print Loops (REPLs) and autocomplete, Bacatá generates a Jupyter kernel machinery so that the DSL can be used in notebook form. We sketch the architecture of Bacatá and demonstrate it in action using a DSL for image processing, called Amalga.

@inproceedings{mauricio17,
  title={Bacat{\'a}: a generic notebook generator for {DSLs}},
  author={Mauricio Verano Merino and Jurgen J. Vinju and Tijs van der Storm},
  year={2017},
  booktitle={Proceedings of the Workshop on Domain-Specific Language Design and Implementation},
}

Manifesto from Dagstuhl Perspectives Workshop 16252 - Engineering Academic Software Allen, Alice ; Aragon, Cecilia ; Becker, Christoph ; Carver, Jeffrey ; Chis, Andrei ; Combemale, Benoit ; Croucher, Mike ; Crowston, Kevin ; Garijo, Daniel ; Gehani, Ashish ; Goble, Carole ; Haines, Robert ; Hirschfeld, Robert ; Howison, James ; Huff, Kathryn ; Jay, Caroline ; Katz, Daniel S. ; Kirchner, Claude ; Kuksenok, Katie ; Lämmel, Ralf ; Nierstrasz, Oscar ; Turk, Matt ; van Nieuwpoort, Rob ; Vaughn, Matthew ; Vinju, Jurgen J.

Software is often a critical component of scientific research. It can be a component of the academic research methods used to produce research results, or it may itself be an academic research result. Software, however, has rarely been considered to be a citable artifact in its own right. With the advent of open-source software, artifact evaluation committees of conferences, and journals that include source code and running systems as part of the published artifacts, we foresee that software will increasingly be recognized as part of the academic process. The quality and sustainability of this software must be accounted for, both a priori and a posteriori. The Dagstuhl Perspectives Workshop on "Engineering Academic Software" has examined the strengths, weaknesses, risks, and opportunities of academic software engineering. A key outcome of the workshop is this Dagstuhl Manifesto, serving as a roadmap towards future professional software engineering for software-based research instruments and other software produced and used in an academic context. The manifesto is expressed in terms of a series of actionable "pledges" that users and developers of academic research software can take as concrete steps towards improving the environment in which that software is produced.

@Article{allen_et_al:DM:2017:7146,
  author =	{Alice Allen and Cecilia Aragon and Christoph Becker and Jeffrey Carver and Andrei Chis and Benoit Combemale and Mike Croucher and Kevin Crowston and Daniel Garijo and Ashish Gehani and Carole Goble and Robert Haines and Robert Hirschfeld and James Howison and Kathryn Huff and Caroline Jay and Daniel S. Katz and Claude Kirchner and Katie Kuksenok and Ralf L{\"a}mmel and Oscar Nierstrasz and Matt Turk and Rob van Nieuwpoort and Matthew Vaughn and Jurgen J. Vinju},
  title =	{Engineering Academic Software (Dagstuhl Perspectives Workshop 16252)},
  pages =	{1--20},
  journal =	{Dagstuhl Manifestos},
  ISSN =	{2193-2433},
  year =	{2017},
  volume =	{6},
  number =	{1},
  editor =	{Alice Allen et al.},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{http://drops.dagstuhl.de/opus/volltexte/2017/7146},
  URN =		{urn:nbn:de:0030-drops-71468},
  doi =		{10.4230/DagMan.6.1.1},
  annote =	{Keywords: Academic software, Research software, Software citation, Software sustainability}
}

Hoe Zwaar is Licht, a book on the Dutch "Nationale Wetenschaps Agenda" (NWA), 2017, Beatrice de Graaf and Alexander Rinnooy Kan, with a contribution of yours truly on pages 311-313 on the question whether software could 'evolve itself', Uitgeverij Balans.

🏆 Challenges for Static Analysis of Java Reflection – Literature Review and Empirical Study. Davy Landman, Alexander Serebrenik and Jurgen J. Vinju. ICSE 2017. ACM SIGSOFT Distinguished Paper Award.

The behavior of software using the Java Reflection API is fundamentally hard to predict by analyzing code. Only recently static analysis approaches resolve reflection in the context of a set of unsound yet pragmatic assumptions. In this paper we survey what approaches exist and what their limitations are. We then analyze how real-world Java code uses the Reflection API, and how many Java projects contain code challenging state-of-the-art static analysis. Using a systematic literature review we collected and categorized all known methods of statically approximating reflective Java code. Next to this we constructed a representative corpus of Java systems and collected descriptive statistics of the usage of the Reflection API. We then applied an analysis on the abstract syntax trees of all source code to count code idioms which go beyond the limitation boundaries of static analysis approaches. The resulting data answers the research questions. The corpus, the tool and the results are openly available. We conclude that the need for unsound assumptions to resolve reflection is widely supported. In our corpus, reflection cannot be ignored for 78% of the projects. Common challenges for analysis tools such as non-exceptional exceptions, programmatic filtering of meta objects, semantics of collections, and dynamic proxies, widely occur in the corpus. For Java Software Engineers prioritizing on robustness, we list tactics to obtain reflection code that is easier to analyze, and for static analysis tool builders we provide a list of opportunities to have significant impact on real Java code.

@inproceedings{icse17,
  author = {Davy Landman and Alexander Serebrenik and Jurgen J. Vinju},
  title = {Challenges for Static Analysis of Java Reflection – Literature Review and Empirical Study},
  booktitle = {Proceedings of IEEE International Conference on Software Engineering (ICSE 2017)},
  publisher = {IEEE},
  year = {2017},
  month = may,
}

2016

Making Sense of Source Code. Bits & Chips, May 2016, Techwatch.

@inproceedings{msosc,
  author = "Jurgen J. Vinju",
  title = "Making sense of source code",
  booktitle = {Bits \& Chips},
  year = 2016,
  month = may,
  publisher = "Techwatch",
}

Legacy is Leuk en Leerzaam Automatiseringsgids, April 2016, AG Connect.

@inproceedings{lilel,
 author = {Jurgen J. Vinju},
 title = {Legacy is leuk en leerzaam},
 booktitle = {Automatiseringsgids},
 year = 2016,
 month = apr,
 day = 29,
 publisher = {AG Connect},
}

Solving the Bank with Rebel - On the design of the Rebel specification language and its application inside a bank Joost Bosman, Jouke Stoel, Tijs van der Storm, Jurgen J. Vinju. Industry Track for Software Language Engineering 2016 (ITSLE).

Large organizations like banks suffer from the ever growing complexity of their systems. Evolving the software becomes harder and harder since a single change can affect a much larger part of the system than predicted upfront. A large contributing factor to this problem is that the actual domain knowledge is often implicit, incomplete, or out of date, making it difficult to reason about the correct behavior of the system as a whole. With Rebel we aim to capture and centralize the domain knowledge and relate it to the running systems. Rebel is a formal specification language for controlling the intrinsic complexity of software for financial enterprise systems. In collaboration with ING, a large Dutch bank, we developed the Rebel specification language and an Integrated Specification Environment (ISE), currently offering automated simulation and checking of Rebel specifications using a Satisfiability Modulo Theories (SMT) solver. In this paper we report on our design choices for Rebel, the implementation and features of the ISE, and our initial observations on the application of Rebel inside the bank.

@inproceedings{rebel16,
  author = {Joost Bosman and Jouke Stoel and Tijs van der Storm and Jurgen J. Vinju},
  title = {Solving the Bank with Rebel - On the design of the Rebel specification language and its application inside a bank},
  booktitle = {Proceedings of the Industry Track for Software Language Engineering (ITSLE)},
  year = 2016,
  publisher = {ACM DL},
}

Lightning Talk: "I solemnly pledge" - A Manifesto for Personal Responsibility in the Engineering of Academic Software. Alice Allen, Cecilia Aragon, Christophe Becker, Jeffrey C. Carver, Andrei Chis, Benoit Combemale, Mike Croucher, Kevin Crowston, Daniel Garijo, Ashish Gehani, Carole Goble, Robert Haines, Robert Hirschfeld, James Howison, Kathryn Huff, Caroline Jay, Daniel S. Katz, Claude Kirchner, Kateryna Kuksenok, Ralf Lämmel, Oscar Nierstrasz, Matthew Turk, Rob van Nieuwpoort, Matthew Vaughn and Jurgen J. Vinju

Software is fundamental to academic research work, both as part of the method and as the result of research. In June 2016, 25 people gathered at Schloss Dagstuhl for a week-long Perspectives Workshop and began to develop a manifesto which places emphasis on the scholarly value of academic software and on personal responsibility. Twenty pledges cover the recognition of academic software, the academic software process and the intellectual content of academic software. This is still work in progress. Through this lightning talk, we aim to get feedback and hone these further, as well as to inspire the WSSSPE audience to think about actions they can take themselves rather than actions they want others to take. We aim to publish a more fully developed Dagstuhl Manifesto by December 2016.

@inproceedings{wssspe,
title = {Lightning Talk: ``{I} solemnly pledge'' --- A Manifesto for Personal Responsibility in the Engineering of Academic Software},
author = {Alice Allen and Cecilia Aragon and Christophe Becker and Jeffrey C. Carver and Andrei Chis and Benoit Combemale and Mike Croucher and Kevin Crowston and Daniel Garijo and Ashish Gehani and Carole Goble and Robert Haines and Robert Hirschfeld and James Howison and Kathryn Huff and Caroline Jay and Daniel S. Katz and Claude Kirchner and Kateryna Kuksenok and Ralf L{\"a}mmel and Oscar Nierstrasz and Matthew Turk and Rob van Nieuwpoort and Matthew Vaughn and Jurgen Vinju},
booktitle = {Proceedings of the Fourth Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE4)},
year = {2016},
}

🏆 🏆 Towards a Universal Code Formatter through Machine Learning. Terence Parr and Jurgen J. Vinju. International Conference on Software Language Engineering (SLE) 2016. See also the technical report which includes all the algorithms. This paper won the best paper award in 2016 and the MIP (Most Influential Paper) award in 2026.

There are many declarative frameworks that allow us to implement code formatters relatively easily for any specific language, but constructing them is cumbersome. The first problem is that "everybody" wants to format their code differently, leading to either many formatter variants or a ridiculous number of configuration options. Second, the size of each implementation scales with a language's grammar size, leading to hundreds of rules. In this paper, we solve the formatter construction problem using a novel approach, one that automatically derives formatters for any given language without intervention from a language expert. We introduce a code formatter called CODEBUFF that uses machine learning to abstract formatting rules from a representative corpus, using a carefully designed feature set. Our experiments on Java, SQL, and ANTLR grammars show that CODEBUFF is efficient, has excellent accuracy, and is grammar invariant for a given language. It also generalizes to a 4th language tested during manuscript preparation.

@inproceedings{sle16,
 author = {Terence Parr and Jurgen J. Vinju},
 title = {Towards a Universal Code Formatter through Machine Learning},
 booktitle = {Proceedings of the 2016 International Conference on Software Language Engineering},
 series = {SLE 2016},
 year = {2016},
 publisher = {ACM},
}

Michael Steindorfer and Jurgen J. Vinju. Towards a Software Product Line of Trie-Based Collections. Generative Programming and Component Engineering (GPCE) 2016.

Collection data structures in standard libraries of programming languages are designed to excel for the average case by carefully balancing memory footprint and runtime performance. These implicit design decisions and hard-coded trade-offs do constrain users from using an optimal variant for a given problem. Although a wide range of specialized collections is available for the Java Virtual Machine (JVM), they introduce yet another dependency and complicate user adoption by requiring specific Application Program Interfaces (APIs) incompatible with the standard library. A product line for collection data structures would relieve library designers from optimizing for the general case. Furthermore, a product line allows evolving the potentially large code base of a collection family efficiently. The challenge is to find a small core framework for collection data structures which covers all variations without exhaustively listing them, while supporting good performance at the same time. We claim that the concept of Array Mapped Tries (AMTs) embodies a high degree of commonality in the sub-domain of immutable collection data structures. AMTs are flexible enough to cover most of the variability, while minimizing code bloat in the generator and the generated code. We implemented a Data Structure Code Generator (DSCG) that emits immutable collections based on an AMT skeleton foundation. The generated data structures outperform competitive hand-optimized implementations, and the generator still allows for customization towards specific workloads.

@inproceedings{gpce16,
 author = {Steindorfer, Michael J. and Vinju, Jurgen J.},
 title = {Towards a Software Product Line of Trie-Based Collections},
 booktitle = {Proceedings of the 2016 International Conference on Generative Programming: Concepts and Experiences},
 series = {GPCE 2016},
 year = {2016},
 publisher = {ACM},
}

Mark Hills, Paul Klint and Jurgen J. Vinju. Enabling PHP software engineering research in Rascal Science of Computer Programming, 2016.

Today, PHP is one of the most popular programming languages, and is commonly used in the open source community and in industry to build large application frameworks and web applications. In this paper, we discuss our ongoing work on PHP AiR, a framework for PHP Analysis in Rascal. PHP AiR is focused especially on program analysis and empirical software engineering, and is being used actively and effectively in work on evaluating PHP feature usage and system evolution, on program analysis for refactoring and security validation, and on source code metrics. We describe the requirements and design decisions for PHP AiR, summarize current research using PHP AiR, discuss lessons learned, and briefly sketch future work.

@article{Hills2016,
title = "Enabling {PHP} software engineering research in Rascal",
journal = "Science of Computer Programming",
year = "2016",
issn = "0167-6423",
doi = "10.1016/j.scico.2016.05.003",
url = "http://www.sciencedirect.com/science/article/pii/S0167642316300296",
author = "Mark Hills and Paul Klint and Jurgen J. Vinju",
}

🏆 Michael Steindorfer and Jurgen J. Vinju. Performance Modeling of Maximal Sharing - Experience Report. 7th ACM/SPEC International Conference on Performance Engineering (ICPE) 2016. Best paper award.

It is noticeably hard to predict the effect of optimization strategies in Java without implementing them. “Maximal sharing” (a.k.a. “hash-consing”) is one of these strategies that may have great benefit in terms of time and space, or may have detrimental overhead. It all depends on the redundancy of data and the use of equality. We used a combination of new techniques to predict the impact of maximal sharing on existing code: Object Redundancy Profiling (ORP) to model the effect on memory when sharing all immutable objects, and Equals-Call Profiling (ECP) to reason about how removing redundancy impacts runtime performance. With comparatively low effort, using the MAximal SHaring Oracle (MASHO), a prototype profiler based on ORP and ECP, we can uncover optimization opportunities that otherwise would remain hidden. We report on the experience of applying MASHO to a real and complex case: we conclude that ORP and ECP combined can accurately predict gains and losses of maximal sharing, and also that (by isolating variables) a cheap predictive model can sometimes provide more accurate information than an expensive experiment can.
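
For readers unfamiliar with the strategy, maximal sharing itself can be sketched briefly: every structurally equal immutable object is created only once, so that equality reduces to an identity check. The Python class below is a hypothetical illustration (the name `Node` and its fields are invented here); it shows the strategy whose costs and benefits the paper predicts, not the MASHO profiler, and it uses a strong table where real implementations would typically use weak references so that unused nodes can be collected.

```python
class Node:
    """Tree node that is treated as immutable and is maximally shared (hash-consed)."""

    _table = {}  # (label, children) -> the unique Node with that structure
    __slots__ = ("label", "children")

    def __new__(cls, label, *children):
        # Children are already shared, so comparing this key compares them by identity.
        key = (label, children)
        node = cls._table.get(key)
        if node is None:
            node = super().__new__(cls)
            node.label = label
            node.children = children
            cls._table[key] = node
        return node

left = Node("add", Node("x"), Node("1"))
right = Node("add", Node("x"), Node("1"))
print(left is right)  # structural equality has become reference equality
```

Whether this pays off depends, as the abstract says, on how redundant the data is and how often equality is called: each construction now costs a table lookup, while each equality test becomes a pointer comparison.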

@inproceedings{michi16,
 author = {Michael Steindorfer and Jurgen J. Vinju},
 title = {Performance Modeling of Maximal Sharing},
 booktitle = {7th ACM/SPEC International Conference on Performance Engineering (ICPE)},
 year = 2016,
}

Harald Altinger, Yanja Dajsuren, Franz Wotawa, Jurgen Vinju and Sebastian Siegl. On Error-Class Distribution in Automotive Model-Based Software. SANER 2016.

Software fault prediction promises to be a powerful tool in supporting test engineers in deciding where to define testing hotspots. However, there are limitations on cross-project prediction and a lack of reports on application to industrial software, as well as on the power of metrics to represent bugs. In this paper, we present a novel analysis based upon faults discovered in model-based automotive software projects and their relationship to metrics used to perform fault prediction. Using our previously released dataset on software metrics, we report bug classes discovered during heavy testing of that automotive software. As the software has been developed following strict coding and development guidelines, we present the results based on a comparison between the discovered error classes and those which might derive a reduced potential error set. Using the three projects from our dataset we determine if any of these bug classes are project specific.

@inproceedings{saner16,
  title = {On Error-Class Distribution in Automotive Model-Based Software},
  author = {Harald Altinger and Yanja Dajsuren and Franz Wotawa and Jurgen Vinju and Sebastian Siegl},
  booktitle = {IEEE International Conference on Software Analysis, Evolution, and Reengineering, SANER},
  year = 2016,
}

Davy Landman, Alexander Serebrenik, Eric Bouwers and Jurgen J. Vinju. Empirical analysis of the relationship between CC and SLOC in a large corpus of Java methods and C functions Journal of Software: Evolution and Process. 2016.

Measuring the internal quality of source code is one of the traditional goals of making software development into an engineering discipline. Cyclomatic Complexity (CC) is an often used source code quality metric, next to Source Lines of Code (SLOC). However, the use of the CC metric is challenged by the repeated claim that CC is redundant with respect to SLOC due to strong linear correlation. We conducted an extensive literature study of the CC/SLOC correlation results. Next, we tested correlation on large Java (17.6 M methods) and C (6.3 M functions) corpora. Our results show that linear correlation between SLOC and CC is only moderate as caused by increasingly high variance. We further observe that aggregating CC and SLOC as well as performing a power transform improves the correlation.
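
To make the two metrics concrete, the sketch below computes both for a small piece of code. It is only an illustration under stated assumptions: the paper measures Java methods and C functions, whereas this example uses Python's standard `ast` module on Python source, and the counting rules (one plus the number of branching constructs for CC, non-blank non-comment lines for SLOC) are a simplification chosen for the example. The function names and the `EXAMPLE` snippet are hypothetical.

```python
import ast

# Constructs counted as one decision point each (a simplifying assumption).
BRANCHES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)

def cyclomatic_complexity(source):
    """Approximate CC: 1 + decision points, counting each extra and/or operand."""
    cc = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, BRANCHES):
            cc += 1
        elif isinstance(node, ast.BoolOp):
            cc += len(node.values) - 1
    return cc

def sloc(source):
    """Count lines that are neither blank nor pure comments."""
    lines = (line.strip() for line in source.splitlines())
    return sum(1 for line in lines if line and not line.startswith("#"))

EXAMPLE = '''
def sign(x):
    # classify a number
    if x > 0 and x != 0:
        return 1
    elif x < 0:
        return -1
    return 0
'''

print(cyclomatic_complexity(EXAMPLE), sloc(EXAMPLE))
```

Collecting such (CC, SLOC) pairs for every method in a corpus gives the kind of data on which a correlation analysis like the one in the paper is run.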

@ARTICLE{Landman2015,
  author = { Davy Landman and Alexander Serebrenik and Eric Bouwers and Jurgen J. Vinju },
  title = { {Empirical analysis of the relationship between CC and SLOC in a large corpus of Java methods and C functions} },
  journal = { Journal of Software: Evolution and Process },
  year = { 2016 },
}

2015

Towards multilingual programming environments Tijs van der Storm and Jurgen Vinju. Science of Computer Programming. Volume 97, issue 1. 2015.

Software projects consist of different kinds of artifacts: build files, configuration files, markup files, source code in different software languages, and so on. At the same time, however, most integrated development environments (IDEs) are focused on a single (programming) language. Even if a programming environment supports multiple languages (e.g., Eclipse), IDE features such as cross-referencing, refactoring, or debugging, do not often cross language boundaries. What would it mean for a programming environment to be truly multilingual? In this short paper we sketch a vision of a system that integrates IDE support across language boundaries. We propose to build this system on a foundation of unified source code models and metaprogramming. Nevertheless, a number of important and hard research questions still need to be addressed.

@article{multilingual,
 author = {van der Storm, Tijs and Vinju, Jurgen J.},
 title = {Towards Multilingual Programming Environments},
 journal = {Sci. Comput. Program.},
 issue_date = {January 2015},
 volume = {97},
 number = {P1},
 month = jan,
 year = {2015},
 issn = {0167-6423},
 pages = {143--149},
 numpages = {7},
 publisher = {Elsevier},
}

Bas Basten, Jeroen van den Bos, Mark Hills, Paul Klint, Arnold Lankamp, Bert Lisser, Atze van der Ploeg, Tijs van der Storm, Jurgen Vinju. Modular Language Implementation in Rascal: Experience Report. In: Science of Computer Programming. Elsevier 2015.

All software evolves, and programming languages and programming language tools are no exception. And just like in ordinary software construction, modular implementations can help ease the process of changing a language implementation and its dependent tools. However, the syntactic and semantic dependencies between language features make this a challenging problem. In this paper we detail how programming languages can be implemented in a modular fashion using the Rascal meta-programming language. Rascal supports extensible definition of concrete syntax, abstract syntax and operations on concrete and abstract syntax trees like matching, traversal and transformation. As a result, new language features can be added without having to change existing code. As a case study, we detail our solution of the LDTA’11 Tool Challenge: a modular implementation of Oberon-0, a relatively simple imperative programming language. The approach we sketch can be applied equally well to the implementation of domain-specific languages.

@article{rascal2015,
title = "Modular language implementation in Rascal – experience report",
journal = "Science of Computer Programming",
volume = "114",
number = "",
pages = "7--19",
year = "2015",
author = "Bas Basten and Jeroen van den Bos and Mark Hills and Paul Klint and Arnold Lankamp and Bert Lisser and Atze van der Ploeg and Tijs van der Storm and Jurgen Vinju",
publisher = {Elsevier},
}

Cleverton Hentz and Jurgen J. Vinju and Anamaria Martins Moreira. Reducing the Cost of Grammar-Based Testing Using Pattern Coverage ICTSS 2015

@inproceedings{hentz15,
  author    = {Cleverton Hentz and
               Jurgen J. Vinju and
               Anamaria Martins Moreira},
  title     = {Reducing the Cost of Grammar-Based Testing Using Pattern Coverage},
  booktitle = {Testing Software and Systems - 27th {IFIP} {WG} 6.1 International
               Conference, {ICTSS} 2015, Sharjah and Dubai, United Arab Emirates,
               November 23-25, 2015, Proceedings},
  pages     = {71--85},
  year      = {2015},
}

Bas Basten, Mark Hills, Paul Klint, Davy Landman, Ashim Shahi, Michael Steindorfer and Jurgen J. Vinju. M3: a General Model for Code Analytics in Rascal SWAN 2015.

This short paper introduces M3, a simple and extensible model for capturing facts about source code for future analysis. M3 is a core part of the standard library of the Rascal meta programming language. We motivate it, position it to related work and detail the key design aspects.

@INPROCEEDINGS{Basten2015,
  title = { {$M^3$: a General Model for Code Analytics in Rascal} },
  author = { Bas Basten and Mark Hills and Paul Klint and Davy Landman and Ashim Shahi and Michael Steindorfer and Jurgen Vinju },
  booktitle = { Proceedings of the first International Workshop on Software Analytics, SWAN },
  year = { 2015 },
}

Davide Di Ruscio, Dimitrios S. Kolovos, Ioannis Korkontzelos, Nicholas Matragkas and Jurgen Vinju. OSSMETER: A Software Measurement Platform for Automatically Analysing Open Source Software Projects. ESEC/FSE 2015 Tool Demonstrations Track.

@INPROCEEDINGS{,
     author = {Di Ruscio, Davide and Kolovos, Dimitrios S. and Korkontzelos, Ioannis and Matragkas, Nicholas and Vinju, Jurgen},
      title = {OSSMETER: A Software Measurement Platform for Automatically Analysing Open Source Software Projects},
  booktitle = {ESEC/FSE 2015 Tool Demonstrations Track},
       year = {2015}
}

Almeida, B., Ananiadou, S., Bagnato, A., Barbero, A. B., Di Rocco, J., Di Ruscio, D., Kolovos, D. S., Korkontzelos, I., Hansen, S., Malo, P., Matragkas, N., Paige, R. F. and Vinju, J. OSSMETER: Automated Measurement and Analysis of Open Source Software. In: Proceedings of the Projects Showcase at the Software Technologies: Applications and Foundations 2015 (STAF 2015).

Michael Steindorfer and Jurgen J. Vinju. Optimizing Hash-Array Mapped Tries for Fast and Lean Immutable JVM Collections. OOPSLA 2015.

The data structures underpinning collection APIs (e.g. lists, sets, maps) in the standard libraries of programming languages are used intensively in many applications. The standard libraries of recent Java Virtual Machine languages, such as Clojure or Scala, contain scalable and well-performing immutable collection data structures that are implemented as Hash-Array Mapped Tries (HAMTs). HAMTs already feature efficient lookup, insert, and delete operations; however, due to their tree-based nature, their memory footprints and the runtime performance of iteration and equality checking lag behind array-based counterparts. This particularly prohibits their application in programs which process larger data sets. In this paper, we propose changes to the HAMT design that increase the overall performance of immutable sets and maps. The resulting general purpose design increases cache locality and features a canonical representation. It outperforms Scala’s and Clojure’s data structure implementations in terms of memory footprint and runtime efficiency of iteration (1.3–6.7x) and equality checking (3–25.4x).
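
As a rough illustration of the data structure discussed above, the following Python sketch shows the classic bitmap indexing that HAMT nodes commonly use: a 5-bit slice of the hash selects one of 32 branches, and a population count over the bitmap locates the entry in a dense array. This is only the textbook scheme, not the optimized design proposed in the paper; all names (`hash_fragment`, `is_present`, `slot_index`, `example_bitmap`) are hypothetical and chosen for this example.

```python
BITS_PER_LEVEL = 5                      # 2**5 = 32 possible branches per node
FRAGMENT_MASK = (1 << BITS_PER_LEVEL) - 1


def hash_fragment(hash_code: int, level: int) -> int:
    """Return the 5-bit slice of the hash that selects a branch at `level`."""
    return (hash_code >> (level * BITS_PER_LEVEL)) & FRAGMENT_MASK


def is_present(bitmap: int, fragment: int) -> bool:
    """A set bit means the node stores an entry for this branch."""
    return bool(bitmap & (1 << fragment))


def slot_index(bitmap: int, fragment: int) -> int:
    """Index into the node's dense array: count the set bits below `fragment`."""
    return bin(bitmap & ((1 << fragment) - 1)).count("1")


# A hypothetical node that stores entries only for branches 1, 4 and 9.
example_bitmap = (1 << 1) | (1 << 4) | (1 << 9)
```

Because only occupied branches take up array slots, a sparse node stays small; the price is the extra bit arithmetic on every lookup.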

@inproceedings{steindorfer2015,
  title = {Optimizing Hash-Array Mapped Tries for Fast and Lean Immutable JVM Collections},
  author = {Michael Steindorfer and Jurgen J. Vinju},
  year = 2015,
  booktitle = {Proceedings of the Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA)},
  editor = {Patrick Eugster},
}

2014

Magiel Bruntink and Jurgen J. Vinju, Looking Towards a Future where Software is Controlled by the Public (and not the other way around). ERCIM News 99, 2014.

Nowadays, software has a ubiquitous presence in everyday life and this phenomenon gives rise to a range of challenges that affect both individuals and society as a whole. In this article we argue that in the future, the domain of software should no longer belong to technical experts and system integrators alone. Instead it should transition to a firmly engaged public domain, similar to city planning, social welfare and security. The challenge that lies at the heart of this problem is the ability to understand, on a technical level, what all the different software actually is and what it does with our information. Read more.

@article{ercim991,
  author = {Magiel Bruntink and Jurgen J. Vinju},
  title = {Looking Towards a Future where Software is Controlled by the Public (and not the other way around)},
  journal = {ERCIM News},
  issue = 99,
  year = 2014,
}

Anthony Cleve and Jurgen J. Vinju, Software Quality - Introduction to the Special Theme. ERCIM News 99.

The introduction of fast and cheap computer and networking hardware enables the spread of software. Software, in a nutshell, represents an unprecedented ability to channel creativity and innovation. The joyful act of simply writing computer programs for existing ICT infrastructure can change the world. We are currently witnessing how our lives can change rapidly as a result, at every level of organization and society and in practically every aspect of the human condition: work, play, love and war. The act of writing software does not imply an understanding of the resulting creation. We are surprised by failing software (due to bugs), the inability of rigid computer systems to "just do what we want", the loss of privacy and information security, and last but not least, the million euro software project failures that occur in the public sector. These surprises are generally not due to negligence or unethical behaviour but rather reflect our incomplete understanding of what we are creating. Our creations, at present, are all much too complex and this lack of understanding leads to a lack of control.

@article{ercim991,
  author = {Anthony Cleve and Jurgen J. Vinju},
  title = {Software Quality - Introduction to the Special Theme},
  journal = {ERCIM News},
  issue = 99,
  year = 2014,
}

Michael Steindorfer and Jurgen J. Vinju, Code Specialization for Memory Efficient Hash Tries (Short Paper). GPCE 2014, Vasteras, Sweden.

The hash trie data structure is a common part in standard collection libraries of JVM programming languages such as Clojure and Scala. It enables fast immutable implementations of maps, sets, and vectors, but it requires considerably more memory than an equivalent array-based data structure. This hinders the scalability of functional programs and the further adoption of this otherwise attractive style of programming. In this paper we present a product family of hash tries. We generate Java source code to specialize them using knowledge of JVM object memory layout. The number of possible specializations is exponential. The optimization challenge is thus to find a minimal set of variants which lead to a maximal loss in memory footprint on any given data. Using a set of experiments we measured the distribution of internal tree node sizes in hash tries. We used the results as a guidance to decide which variants of the family to generate and which variants should be left to the generic implementation. A preliminary validating experiment on the implementation of sets and maps shows that this technique leads to a median decrease of 55% in memory footprint for maps (and 78% for sets), while still maintaining comparable performance. Our combination of data analysis and code specialization proved to be effective.

@inproceedings{gpce14,
 author = {Steindorfer, Michael J. and Vinju, Jurgen J.},
 title = {Code Specialization for Memory Efficient Hash Tries (Short Paper)},
 booktitle = {Proceedings of the 2014 International Conference on Generative Programming: Concepts and Experiences},
 series = {GPCE 2014},
 year = {2014},
 isbn = {978-1-4503-3161-6},
 location = {V{\"a}ster{\aa}s, Sweden},
 pages = {11--14},
 numpages = {4},
 url = {http://doi.acm.org/10.1145/2658761.2658763},
 doi = {10.1145/2658761.2658763},
 acmid = {2658763},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {Code generation, Hash trie, Immutability, Memory optimization, Performance, Persistent data structure, Specialization},
}

Mark Hills, Paul Klint and Jurgen J. Vinju, Static, Lightweight Includes Resolution for PHP. ASE 2014, Vasteras, Sweden.

Dynamic languages include a number of features that are challenging to model properly in static analysis tools. In PHP, one of these features is the include expression, where an arbitrary expression provides the path of the file to include at runtime. In this paper we present two complementary analyses for statically resolving PHP includes, one that works at the level of individual PHP files and one targeting PHP programs, possibly consisting of multiple scripts. To evaluate the effectiveness of these analyses we have applied the first to a corpus of 20 open-source systems, totaling more than 4.5 million lines of PHP, and the second to a number of programs from a subset of these systems. Our results show that, in many cases, includes can be either resolved to a specific file or a small subset of possible files, enabling better IDE features and more advanced program analysis tools for PHP.

@inproceedings{ase2014,
 author = {Hills, Mark and Klint, Paul and Vinju, Jurgen J.},
 title = {Static, Lightweight Includes Resolution for PHP},
 booktitle = {Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering},
 series = {ASE '14},
 year = {2014},
 isbn = {978-1-4503-3013-8},
 location = {Vasteras, Sweden},
 pages = {503--514},
 numpages = {12},
 doi = {10.1145/2642937.2643017},
 acmid = {2643017},
 publisher = {ACM},
 address = {New York, NY, USA},
}

OSSMETER Deliverable D3.2 – Report on Source Code Activity Metrics. EU FP7 STREP Project Deliverable for OSSMETER.

OSSMETER Deliverable D3.3 – Language Agnostic Source Code Quality Analysis. EU FP7 STREP Project Deliverable for OSSMETER.

OSSMETER Deliverable D3.4 – Language-Specific Source Code Quality Analysis. EU FP7 STREP Project Deliverable for OSSMETER.

Davy Landman, Alexander Serebrenik and Jurgen J. Vinju. Empirical analysis of the relationship between CC and SLOC in a large corpus of Java methods. In 30th IEEE International Conference on Software Maintenance and Evolution, ICSME 2014.

Measuring the internal quality of source code is one of the traditional goals of making software development into an engineering discipline. Cyclomatic Complexity (CC) is an often used source code quality metric, next to Source Lines of Code (SLOC). However, the use of the CC metric is challenged by the repeated claim that CC is redundant with respect to SLOC due to strong linear correlation. We test this claim by studying a corpus of 17.8M methods in 13K open-source Java projects. Our results show that direct linear correlation between SLOC and CC is only moderate, as caused by high variance. We observe that aggregating CC and SLOC over larger units of code improves the correlation, which explains reported results of strong linear correlation in literature. We suggest that the primary cause of correlation is the aggregation. Our conclusion is that there is no strong linear correlation between CC and SLOC of Java methods, so we do not conclude that CC is redundant with SLOC. This conclusion contradicts earlier claims from literature, but concurs with the widely accepted practice of measuring of CC next to SLOC.

@INPROCEEDINGS{Landman2014,
  author = { Davy Landman and Alexander Serebrenik and Jurgen Vinju },
  title = { {Empirical analysis of the relationship between CC and SLOC in a large corpus of Java methods} },
  booktitle = { 30th IEEE International Conference on Software Maintenance and
  Evolution, ICSME 2014 },
  year = { 2014 },
  datalink = { http://homepages.cwi.nl/~landman/icsme2014/ },
}

2013

OSSMETER Deliverable D3.1 – Report on Domain Analysis of OSS Quality Attributes. EU FP7 STREP Project Deliverable for OSSMETER.

Tijs van der Storm and Jurgen Vinju. Towards multilingual programming environments. Science of Computer Programming.

Software projects consist of different kinds of artifacts: build files, configuration files, markup files, source code in different software languages, and so on. At the same time, however, most integrated development environments (IDEs) are focused on a single (programming) language. Even if a programming environment supports multiple languages (e.g., Eclipse), IDE features such as cross-referencing, refactoring, or debugging, do not often cross language boundaries. What would it mean for a programming environment to be truly multilingual? In this short paper we sketch a vision of a system that integrates IDE support across language boundaries. We propose to build this system on a foundation of unified source code models and metaprogramming. Nevertheless, a number of important and hard research questions still need to be addressed.

@article{vanderStorm2013,
title = "Towards multilingual programming environments ",
journal = "Science of Computer Programming ",
issn = "0167-6423",
doi = "http://dx.doi.org/10.1016/j.scico.2013.11.041",
url = "http://www.sciencedirect.com/science/article/pii/S0167642313003341",
author = "Tijs van der Storm and Jurgen Vinju",
}

Anastasia Izmaylova, Paul Klint, Ashim Shahi and Jurgen J. Vinju. M3: An Open Model For Measuring Code Artifacts. BENEVOL 2013.

In the context of the EU FP7 project "OSSMETER" we are developing an infrastructure for measuring source code. The goal of OSSMETER is to obtain insight into the quality of open-source projects from all possible perspectives, including product, process and community. This is a "white paper" on M3, a set of code models, which should be easy to construct, easy to extend to include language specifics and easy to consume to produce metrics and other analyses. We solicit feedback on its usability.

@techreport{21830,
author       = {Izmaylova, A. and Klint, P. and Shahi, A. and Vinju, J. J.},
title        = {M3: {An} {Open} {Model} {For} {Measuring} {Code} {Artifacts}},
series       = {BENEVOL},
year         = {2013},
month        = {December},
number       = {arXiv-1312.1188},
publisher    = {Cornell University Library},
institution  = {CWI},
url          = {http://arxiv.org/abs/1312.1188},
}

Davy Landman, Paul Klint, and Jurgen J. Vinju. Exploring the Limits of Domain Model Recovery. ICSM 2013.

We are interested in re-engineering families of legacy applications towards using Domain-Specific Languages (DSLs). Is it worth investing in harvesting domain knowledge from the source code of legacy applications? Reverse engineering domain knowledge from source code is sometimes considered very hard or even impossible. Is it also difficult for "modern legacy systems"? In this paper we select two open-source applications and answer the following research questions: which parts of the domain are implemented by the application, and how much can we manually recover from the source code? To explore these questions, we compare manually recovered domain models to a reference model extracted from domain literature, and measure precision and recall. The recovered models are accurate: they cover a significant part of the reference model and they do not contain much junk. We conclude that domain knowledge is recoverable from "modern legacy" code and therefore domain model recovery can be a valuable component of a domain re-engineering process.

@INPROCEEDINGS{limits,
  author = { Paul Klint and Davy Landman and Jurgen Vinju },
  title = {Exploring the Limits of Domain Model Recovery},
  booktitle = { 29th IEEE International Conference on Software Maintenance (ICSM)},
  year = { 2013 },
}

Ali Afroozeh, Mark van den Brand, Adrian Johnstone, Elizabeth Scott and Jurgen J. Vinju. Safe Specification of Operator Precedence Rules. SLE 2013.

In this paper we present an approach to specifying operator precedence based on declarative disambiguation constructs and an implementation mechanism based on grammar rewriting. We identify a problem with existing generalized context-free parsing and disambiguation technology: generating a correct parser for a language such as OCaml using declarative precedence specification is not possible without resorting to some manual grammar transformation. Our approach provides a fully declarative solution to operator precedence specification for context-free grammars, is independent of any parsing technology, and is safe in that it guarantees that the language of the resulting grammar will be the same as the language of the specification grammar. We evaluate our new approach by specifying the precedence rules from the OCaml reference manual against the highly ambiguous reference grammar and validate the output of our generated parser.

@inproceedings{sle2013-1,
  title = {Safe Specification of Operator Precedence Rules},
  author = {Ali Afroozeh and Mark van den Brand and Adrian Johnstone and Elizabeth Scott and Jurgen J. Vinju},
  booktitle = {International Conference on Software Language Engineering (SLE)},
  year = 2013,
  publisher = {Springer},
  series = {LNCS},
}

Anastasia Izmaylova and Jurgen J. Vinju. A Modular Language Parametric Framework for Type Constraint Based Refactorings. (DRAFT).

Refactoring tools are among the most desirable in the programmer's toolbox. Any refactoring tool, specific for a particular language and for a specific kind of refactoring, represents a considerable investment. At an increasing rate new languages are introduced, and new features are introduced to existing languages. The development of refactoring tools is forced to keep up with this evolution. The extension of a general purpose language like Java with generics is a good example that requires both adaptations to existing refactoring tools, as well as the introduction of new refactoring tools specific for generics. We propose a modular language-parametric framework, called "TyMoRe" (TYpe-related MOdular REfactoring), for constraint-based type refactorings. It enables reuse between languages and reuse between different refactorings for the same language. The framework uses functional monadic composition to achieve the desired modularity and compositionality. The effectiveness of TyMoRe is demonstrated by our prototype of the "Infer Generic Type Arguments" refactoring for a large subset of Java.

This article is an unpublished draft.

Mark Hills, Paul Klint and Jurgen J. Vinju. An empirical study of PHP feature usage. Proceedings of the International Symposium on Software Testing and Analysis (ISSTA), July 2013. Lugano, Switzerland.

PHP is one of the most popular languages for server-side application development. The language is highly dynamic, providing programmers with a large amount of flexibility. However, these dynamic features also have a cost, making it difficult to apply traditional static analysis techniques used in standard code analysis and transformation tools. As part of our work on creating analysis tools for PHP, we have conducted a study over a significant corpus of open-source PHP systems, looking at the sizes of actual PHP programs, which features of PHP are actually used, how often dynamic features appear, and how distributed these features are across the files that make up a PHP website. We have also looked at whether uses of these dynamic features are truly dynamic or are, in some cases, statically understandable, allowing us to identify specific patterns of use which can then be taken into account to build more precise tools. We believe this work will be of interest to creators of analysis tools for PHP, and that the methodology we present can be leveraged for other dynamic languages with similar features.

@inproceedings{issta,
  author = {Hills, Mark and Klint, Paul and Vinju, Jurgen J.},
  title = {An Empirical Study of PHP feature usage: a static analysis perspective},
  booktitle = {ISSTA},
  editor = {Pezz\`e, Mauro and Harman, Mark},
  pages = {325--335},
  publisher = {ACM},
  year = 2013,
}

2012

Mark Hills, Paul Klint and Jurgen J. Vinju. Scripting a refactoring with Rascal and Eclipse. Proceedings of the Fifth Workshop on Refactoring Tools.

@inproceedings{WRT2012,
 author = {Hills, Mark and Klint, Paul and Vinju, Jurgen J.},
 title = {Scripting a refactoring with Rascal and Eclipse},
 booktitle = {Proceedings of the Fifth Workshop on Refactoring Tools},
 series = {WRT '12},
 year = {2012},
 pages = {40--49},
 publisher = {ACM},
}

Mark Hills, Paul Klint and Jurgen Vinju. Program Analysis Scenarios in Rascal. 9th International Workshop on Rewriting Logic and its Applications (WRLA 2012).

Rascal is a meta programming language focused on the implementation of domain-specific languages and on the rapid construction of tools for software analysis and software transformation. In this paper we focus on the use of Rascal for software analysis. We illustrate a range of scenarios for building new software analysis tools through a number of examples, including one showing integration with an existing Maude-based analysis. We then focus on ongoing work on alias analysis and type inference for PHP, showing how Rascal is being used, and sketching a hypothetical solution in Maude. We conclude with a high-level discussion on the commonalities and differences between Rascal and Maude when applied to program analysis.

@inproceedings{wrla12,
  title = "Program Analysis Scenarios in Rascal",
  author = {Mark Hills and Paul Klint and Jurgen J. Vinju},
  booktitle = {9th International Workshop on Rewriting Logic and Its Applications (WRLA 2012)},
  note = {Invited Paper},
  series = {Lecture Notes in Computer Science},
  publisher = {Springer},
  year = 2012
}

Mark Hills, Paul Klint and Jurgen J. Vinju. Meta-Language Support for Type-Safe Access to External Resources. International Conference on Software Language Engineering (SLE).

Meta-programming applications often require access to heterogeneous sources of information, often from different technological spaces (grammars, models, ontologies, databases), that have specialized ways of defining their respective data schemas. Without direct language support, obtaining typed access to this external, potentially changing, information is a tedious and error-prone engineering task. The Rascal meta-programming language aims to support the import and manipulation of all of these kinds of data in a type-safe manner. The goal is to lower the engineering effort to build new meta programs that combine information about software in unforeseen ways. In this paper we describe built-in language support, so-called resources, for incorporating external sources of data and their corresponding data-types while maintaining type safety. We demonstrate the applicability of Rascal resources by example, showing resources for RSF files, CSV files, JDBC-accessible SQL databases, and SDF2 grammars. For RSF and CSV files this requires a type inference step, allowing the data in the files to be loaded in a type-safe manner without requiring the type to be declared in advance. For SQL and SDF2 a direct translation from their respective schema languages into Rascal is instead constructed, providing a faithful translation of the declared types or sorts into equivalent types in the Rascal type system. An overview of related work and a discussion conclude the paper.

@inproceedings{sle2012,
  title = {Meta-Language Support for Type-Safe Access to External Resources},
  author = {Mark Hills and Paul Klint and Jurgen J. Vinju},
  booktitle = {International Conference on Software Language Engineering (SLE)},
  year = 2012,
  publisher = {Springer},
  series = {LNCS},
}

Jurgen J. Vinju and Michael W. Godfrey. What Does Control Flow Really Look Like? Eyeballing the Cyclomatic Complexity Metric. International Working Conference on Source Code Analysis and Manipulation.

Assessing the understandability of source code remains an elusive yet highly desirable goal for software developers and their managers. While many metrics have been suggested and investigated empirically, the McCabe cyclomatic complexity metric (CC), which is based on control flow complexity, seems to hold enduring fascination within both industry and the research community. However, the CC metric also has obvious limitations. For example, it is easy to produce example code that seems trivial to understand yet has a high CC value; at the same time, one can also produce "spaghetti" code with many GOTOs that has the same CC value as a well-structured alternative. In this work, we explore the causal relationship between CC and understandability through quantitative and qualitative studies, and through thought experiments and discussion. Empirically, we examine eight well-known open source Java systems by grouping the abstract control flow patterns of the methods into equivalence classes and exploring the results. We found several surprising results: first, the number of unique control flow patterns is relatively low; second, CC often does not accurately reflect the intricacies of Java control flow; and third, methods with high CC often have very low entropy, suggesting that they may be relatively easy to understand. These findings appear to challenge the widely-held belief that there is a clear-cut causal relationship between understandability and cyclomatic complexity, and suggest that at the very least CC and similar measures need to be reconsidered and refined if they are to be used as a metric for code understandability.
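
To make the metric concrete, here is a small Python sketch that approximates CC as one plus the number of decision points found in a syntax tree. It is an illustration only: the paper studies Java methods, whereas this sketch analyses Python source with the standard `ast` module, and the choice of which node types count as decision points is an assumption of this example. The names `cyclomatic_complexity`, `FLAT` and `NESTED` are hypothetical.

```python
import ast

# Node types treated as one decision point each (an assumption of this sketch).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe CC: 1 + number of decision points in the source."""
    cc = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, BRANCH_NODES):
            cc += 1
        elif isinstance(node, ast.BoolOp):
            # 'a and b and c' adds two short-circuit branches.
            cc += len(node.values) - 1
    return cc


# A flat chain of checks: arguably trivial to read.
FLAT = """
def name(day):
    if day == 0:
        return "Mon"
    if day == 1:
        return "Tue"
    if day == 2:
        return "Wed"
    if day == 3:
        return "Thu"
    return "?"
"""

# Nested loops with a compound condition: a different shape of control flow.
NESTED = """
def search(rows, target):
    for row in rows:
        for cell in row:
            if cell is not None and cell == target:
                return True
    return False
"""
```

Under this counting both snippets score the same, which mirrors the abstract's point that a single CC number does not distinguish between quite different control flow shapes.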

@inproceedings{cc,
	Author = {Jurgen J. Vinju and Michael W. Godfrey},
	Title = {What does control flow really look like? Eyeballing the Cyclomatic Complexity Metric},
	Booktitle = {Ninth IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM)},
	Publisher = {IEEE Computer Society},
	Year = {2012},
}

Mark Hills, Paul Klint, Tijs van der Storm and Jurgen J. Vinju. A one-stop-shop for Software Evolution Tool Construction. ERCIM News 2012-88, 2012.

Real problems in software evolution render impossible a fixed, one-size-fits-all approach, and these problems are usually solved by gluing together various tools and languages. Such ad-hoc integration is cumbersome and costly. With the Rascal meta-programming language the Software Analysis and Transformation research group at CWI explores whether it is feasible to develop an approach that offers all necessary meta-programming and visualization techniques in a completely integrated language environment. We have applied Rascal with success in constructing domain specific languages and experimental refactoring and visualization tools.

@article{ERCIM2012,
  author    = {Mark Hills and
	       Paul Klint and
	       Tijs van der Storm and
	       Jurgen J. Vinju},
  title     = {A One-Stop-Shop for Software Evolution Tool Construction},
  journal   = {ERCIM News},
  volume    = {2012},
  number    = {88},
  year      = {2012},
  ee        = {http://ercim-news.ercim.eu/en88/special/a-one-stop-shop-for-software-evolution-tool-construction},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

2011

Mark Hills, Paul Klint, and Jurgen J. Vinju. A case of visitor versus interpreter pattern. In Proceedings of the 49th International Conference on Objects, Models, Components and Patterns, TOOLS, 2011.

We compare the Visitor pattern with the Interpreter pattern, investigating a single case in point for the Java language. We have produced and compared two versions of an interpreter for a programming language. The first version makes use of the Visitor pattern. The second version was obtained by using an automated refactoring to transform uses of the Visitor pattern to uses of the Interpreter pattern. We compare these two nearly equivalent versions on their maintenance characteristics and execution efficiency. Using a tailored experimental research method we can highlight differences and the causes thereof. The contributions of this paper are that it isolates the choice between Visitor and Interpreter in a realistic software project and makes the difference experimentally observable.

@inproceedings{TOOLS2011,
  title = {A Case of Visitor versus Interpreter Pattern},
  author = {Mark Hills and Paul Klint and Jurgen J. Vinju},
  year = {2011},
  booktitle = {Proceedings of the 49th International Conference on Objects, Models, Components and Patterns},
  series = {TOOLS},
}

Jeroen van den Bos, Mark Hills, Paul Klint, Tijs van der Storm, and Jurgen J. Vinju. Rascal: From Algebraic Specification to Meta-Programming. AMMSE 2011, EPTCS Volume 56, pp 15-32, 2011.

Algebraic specification has a long tradition in bridging the gap between specification and programming by making specifications executable. Building on extensive experience in designing, implementing and using specification formalisms that are based on algebraic specification and term rewriting (namely Asf and Asf+Sdf), we are now focusing on using the best concepts from algebraic specification and integrating these into a new programming language: Rascal. This language is easy to learn by non-experts but is also scalable to very large meta-programming applications. We explain the algebraic roots of Rascal and its main application areas: software analysis, software transformation, and design and implementation of domain-specific languages. Some example applications in the domain of Model-Driven Engineering (MDE) are described to illustrate this.

@Inproceedings{EPTCS56.2,
  author    = "van den Bos, Jeroen and Hills, Mark and Klint, Paul and van der Storm, Tijs and Vinju, Jurgen J.",
  year      = "2011",
  title     = "Rascal: From Algebraic Specification to Meta-Programming",
  editor    = "Dur\'an, Francisco and Rusu, Vlad",
  booktitle = "Proceedings Second International Workshop on Algebraic Methods in Model-based Software Engineering (AMMSE)",
  series    = "Electronic Proceedings in Theoretical Computer Science",
  volume    = "56",
  publisher = "Open Publishing Association",
  pages     = "15-32",
}

Bas Basten, Paul Klint, and Jurgen Vinju. Ambiguity detection: Scaling to scannerless. In International Conference on Software Language Engineering (SLE), LNCS. Springer, 2011.

Static ambiguity detection would be an important aspect of language workbenches for textual software languages. The challenge is that automatic ambiguity detection of context-free grammars is undecidable. Sophisticated approximations and optimizations do exist, but these do not scale to grammars for so-called "scannerless parsers", as of yet. We extend previous work on ambiguity detection for context-free grammars to cover disambiguation techniques that are typical for scannerless parsing, such as longest match and reserved keywords. This paper contributes a new algorithm for ambiguity detection in character-level grammars, a prototype implementation of this algorithm and validation on several real grammars. The total run-time of ambiguity detection for character-level grammars for languages such as C and Java is dramatically reduced by several orders of magnitude, without loss of precision. The result is that ambiguity detection for realistic grammars can be done efficiently and may now become a tool in language workbenches.

@inproceedings{sle2,
  title = {Ambiguity Detection: Scaling to Scannerless},
  author = {Bas Basten and Paul Klint and Jurgen Vinju},
  booktitle = {International Conference on Software Language Engineering (SLE)},
  year = 2011,
  publisher = {Springer},
  series = {LNCS},
}

Bas Basten and Jurgen Vinju. Parse forest diagnostics with Dr. Ambiguity. In International Conference on Software Language Engineering (SLE), LNCS. Springer, 2011.

In this paper we propose and evaluate a method for locating causes of ambiguity in context-free grammars by automatic analysis of parse forests. A parse forest is the set of parse trees of an ambiguous sentence. Deducing causes of ambiguity from observing parse forests is hard for grammar engineers because of (a) the size of the parse forests, (b) the complex shape of parse forests, and (c) the diversity of causes of ambiguity.
We first analyze the diversity of ambiguities in grammars for programming languages and the diversity of solutions to these ambiguities. Then we introduce Dr. Ambiguity: a parse forest diagnostics tool that explains the causes of ambiguity by analyzing differences between parse trees and proposes solutions. We demonstrate its effectiveness using a small experiment with a grammar for Java 5.
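
To make the notion of a parse forest concrete, the following minimal Python sketch enumerates every parse tree of a sentence for a deliberately ambiguous toy grammar, E -> E '+' E | 'n'. The grammar and the name `parse_forest` are assumptions made for illustration only; this is not the algorithm used by Dr. Ambiguity.

```python
from functools import lru_cache


def parse_forest(tokens):
    """Return all parse trees for the toy grammar E -> E '+' E | 'n'."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def trees(i, j):
        # All trees that derive tokens[i:j] from the non-terminal E.
        result = []
        if j - i == 1 and tokens[i] == "n":
            result.append("n")
        # Try every '+' as the top-level operator of the span.
        for k in range(i + 1, j - 1):
            if tokens[k] == "+":
                for left in trees(i, k):
                    for right in trees(k + 1, j):
                        result.append((left, "+", right))
        return tuple(result)

    return trees(0, len(tokens))


if __name__ == "__main__":
    for tree in parse_forest("n+n+n"):
        print(tree)
```

The number of trees grows quickly with sentence length even for this tiny grammar, which hints at why inspecting forests by hand is hard.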

@inproceedings{sle3,
  title = {Parse Forest Diagnostics with Dr. Ambiguity},
  author = {Bas Basten and Jurgen Vinju},
  booktitle = {International Conference on Software Language Engineering (SLE)},
  year = 2011,
  publisher = {Springer},
  series = {LNCS},
}

Mark Hills, Paul Klint, and Jurgen Vinju. RLSRunner: Linking Rascal with K for Program Analysis. In International Conference on Software Language Engineering (SLE), LNCS. Springer, 2011.

The Rascal meta-programming language provides a number of features supporting the development of program analysis tools. However, sometimes the analysis to be developed is already implemented by another system. In this case, Rascal can provide a useful front-end for this system, handling the parsing of the input program, any transformation (if needed) of this program into individual analysis tasks, and the display of the results generated by the analysis. In this paper we describe a tool, RLSRunner, which provides this integration with static analysis tools defined using the K framework, a rewriting-based framework for defining the semantics of programming languages.

@inproceedings{sle1,
  title = {RLSRunner: Linking Rascal with K for Program Analysis},
  author = {Mark Hills and Paul Klint and Jurgen Vinju},
  booktitle = {International Conference on Software Language Engineering (SLE)},
  year = 2011,
  publisher = {Springer},
  series = {LNCS},
}

2010

Stijn de Gouw, Frank de Boer, and Jurgen Vinju. Prototyping a tool environment for run-time assertion checking in JML with communication histories. In 12th Workshop on Formal Techniques for Java-like Programs, 2010.

In this paper we present prototype tool support for the run-time assertion checking of the Java Modeling Language (JML) extended with communication histories specified by attribute grammars. Our tool suite integrates Rascal, a meta-programming language, and ANTLR, a popular parser generator. Rascal instantiates a generic model of history updates for a given Java program annotated with history specifications. ANTLR is used for the actual evaluation of history assertions.

@inproceedings{FTfJP2010,
  Author = {Stijn de Gouw and Frank de Boer and Jurgen Vinju},
  Booktitle = {12th Workshop on Formal Techniques for Java-like Programs},
  Title = {Prototyping a tool environment for run-time assertion checking in JML with Communication Histories},
  Year = {2010}}

Diego Ordóñez Camacho, Kim Mens, Mark van den Brand, and Jurgen Vinju. Automated Generation of Program Translation and Verification Tools using Annotated Grammars. Science of Computer Programming, 72(1):3-20, jan 2010.

Automatically generating program translators from source and target language specifications is a non-trivial problem. In this paper we focus on the problem of automating the process of building translators between operations languages, a family of DSLs used to program satellite operations procedures. We exploit their similarities to semi-automatically build transformation tools between these DSLs. The input to our method is a collection of annotated context-free grammars. To simplify the overall translation process even more, we also propose an intermediate representation common to all operations languages. Finally, we discuss how to enrich our annotated grammars model with more advanced semantic annotations to provide a verification system for the translation process. We validate our approach by semi-automatically deriving translators between some real world operations languages, using the prototype tool which we implemented for that purpose.

@article{SCP2010,
	Title = {Automated Generation of Program Translation and Verification Tools using Annotated Grammars},
	Author = {Diego Ord\'o\~nez Camacho and Kim Mens and Mark van den Brand and Jurgen Vinju},
	Doi = {http://dx.doi.org/10.1016/j.scico.2009.10.003},
	Journal = {Science of Computer Programming},
	Publisher = {Elsevier},
	Month = {jan},
	Number = {1},
	Pages = {3-20},
	Volume = {72},
	Year = {2010},
}

Paul Klint, Tijs van der Storm, and Jurgen Vinju. On the Impact of DSL Tools on the Maintainability of Language Implementations. In Proceedings of the tenth workshop on Language Descriptions Tools and Applications, 2010.

Does the use of DSL tools improve the maintainability of language implementations compared to implementations from scratch? We present empirical results on aspects of maintainability of six implementations of the same DSL using different languages (Java, JavaScript, C#) and DSL tools (ANTLR, OMeta, Microsoft “M”). Our evaluation indicates that the maintainability of language implementations is indeed higher when constructed using DSL tools.

@inproceedings{ldta2010,
  Author = {Paul Klint and Tijs van der Storm and Jurgen Vinju},
  Booktitle = {Proceedings of the tenth workshop on Language Descriptions Tools and Applications (LDTA)},
  Title = {On the Impact of DSL tools on the Maintainability of Language Implementations.},
	Series = {Electronic Notes in Theoretical Computer Science},
  Publisher = {Elsevier},
  Year = {2010}
}

Vincent Lussenburg, Tijs van der Storm, Jurgen J. Vinju, and Jos Warmer. Mod4j: A Qualitative Case Study of Model-driven Software Development. In Dorina Petriu, Nicolas Rouquette, and Øystein Haugen, editors, Model Driven Engineering Languages and Systems, 13th International Conference, MODELS 2010, Oslo, Norway, October 3-8, 2010. Proceedings, Lecture Notes in Computer Science. Springer, 2010.

Model-driven software development (MDSD) has been on the rise over the past few years and is becoming more and more mature. However, evaluation in real-life industrial context is still scarce. In this paper, we present a case-study evaluating the applicability of a state-of-the-art MDSD tool, MOD4J, a suite of domain specific languages (DSLs) for developing administrative enterprise applications. MOD4J was used to partially rebuild an industrially representative application. This implementation was then compared to a base implementation based on elicited success criteria. Our evaluation leads to a number of recommendations to improve MOD4J. We conclude that having extension points for hand-written code is a good feature for a model driven software development environment.

@inproceedings{MODELS2010,
	  Author = {Vincent Lussenburg and Tijs {van der Storm} and Jurgen J. Vinju and Jos Warmer},
	  Title = {Mod4J: A Qualitative Case Study of Model-Driven Software Development},
	  Booktitle = {Model Driven Engineering Languages and Systems, 13th International Conference, MODELS 2010, Oslo, Norway, October 3-8, 2010. Proceedings},
	  Editor = {Dorina Petriu and Nicolas Rouquette and {\O}ystein Haugen},
	  Publisher = {Springer},
	  Series = {Lecture Notes in Computer Science},
	  Year = {2010}
  }

Bas Basten and Jurgen Vinju. Faster ambiguity detection by grammar filtering. In Claus Brabrand and Pierre-Etienne Moreau, editors, Proceedings of the tenth workshop on Language Descriptions Tools and Applications, 2010.

Real programming languages are often defined using ambiguous context-free grammars. Some ambiguity is intentional while other ambiguity is accidental. A good grammar development environment should therefore contain a static ambiguity checker to help the grammar engineer. Ambiguity of context-free grammars is an undecidable property. Nevertheless, various imperfect ambiguity checkers exist. Exhaustive methods are accurate, but suffer from non-termination. Termination is guaranteed by approximative methods, at the expense of accuracy. In this paper we combine an approximative method with an exhaustive method. We present an extension to the Noncanonical Unambiguity Test that identifies production rules that do not contribute to the ambiguity of a grammar and show how this information can be used to significantly reduce the search space of exhaustive methods. Our experimental evaluation on a number of real world grammars shows orders of magnitude gains in efficiency in some cases and negligible losses of efficiency in others.

  @inproceedings{LDTA2010,
	  Author = {Bas Basten and Jurgen Vinju},
	  Title = {Faster Ambiguity Detection by Grammar Filtering},
	  Booktitle = {Proceedings of the tenth workshop on Language Descriptions Tools and Applications},
	  Editor = {Claus Brabrand and Pierre-Etienne Moreau},
	  Publisher = {Elsevier Electronic Notes in Theoretical Computer Science},
	  Year = {2010}
  }

2009

Paul Klint, Tijs van der Storm, and Jurgen Vinju. EASY Meta-programming with Rascal. Leveraging the Extract-Analyze-Synthesize Paradigm for Meta-programming. In Proceedings of the 3rd International Summer School on Generative and Transformational Techniques in Software Engineering (GTTSE'09), LNCS. Springer, 2010.

  @inproceedings{RascalGTTSE,
    title = {EASY Meta-Programming with Rascal. Leveraging the Extract-Analyze-SYnthesize Paradigm for Meta-Programming},
    author = {Paul Klint and Tijs van der Storm and Jurgen J. Vinju},
    year = {2010},
    booktitle = {Proceedings of the 3rd International Summer School on Generative and Transformational Techniques in Software Engineering (GTTSE'09)},
    location = {Braga, Portugal},
    series = {LNCS},
    publisher = {Springer},
  }

🏆 Paul Klint, Tijs van der Storm, and Jurgen J. Vinju. Rascal: A Domain Specific Language for Source Code Analysis and Manipulation. In Ninth IEEE International Working Conference on Source Code Analysis and Manipulation, SCAM 2009, Edmonton, Alberta, Canada, September 20-21, 2009, pages 168-177. IEEE Computer Society, 2009. (Won the IEEE SCAM MIP award in 2019)

Many automated software engineering tools require tight integration of techniques for source code analysis and manipulation. State-of-the-art tools exist for both, but the domains have remained notoriously separate because different computational paradigms fit each domain best. This impedance mismatch hampers the development of each new problem solution since desired functionality and scalability can only be achieved by repeated, ad hoc, integration of different techniques. RASCAL is a domain-specific language that takes away most of this boilerplate by providing high-level integration of source code analysis and manipulation on the conceptual, syntactic, semantic and technical level. We give an overview of the language and assess its merits by implementing a complex refactoring.

  @inproceedings{rascal,
	  Author = {Paul Klint and Tijs van der Storm and Jurgen J. Vinju},
	  Title = {RASCAL: A Domain Specific Language for Source Code Analysis and Manipulation},
	  Booktitle = {Ninth IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM)},
	  Doi = {http://doi.ieeecomputersociety.org/10.1109/SCAM.2009.28},
	  Isbn = {978-0-7695-3793-1},
	  Pages = {168-177},
	  Publisher = {IEEE Computer Society},
	  Year = {2009},
  }

Paul Klint, Jurgen J. Vinju, and Tijs van der Storm. Language Design for Meta-programming in the Software Composition Domain. In Alexandre Bergel and Johan Fabry, editors, Software Composition, 8th International Conference, SC 2009, Zurich, Switzerland, July 2-3, 2009. Proceedings, volume 5634 of Lecture Notes in Computer Science, pages 1-4. Springer, 2009.

  @inproceedings{SC2009,
	  Author = {Paul Klint and Jurgen J. Vinju and Tijs van der Storm},
	  Title = {Language Design for Meta-programming in the Software Composition Domain},
	  Booktitle = {Software Composition},
	  Doi = {http://dx.doi.org/10.1007/978-3-642-02655-3_1},
	  Editor = {Alexandre Bergel and Johan Fabry},
	  Isbn = {978-3-642-02654-6},
	  Pages = {1-4},
	  Publisher = {Springer},
	  Series = {Lecture Notes in Computer Science},
	  Volume = {5634},
	  Year = {2009}
  }

Giorgios Economopoulos, Paul Klint, and Jurgen J. Vinju. Faster scannerless GLR parsing. In Oege de Moor and Michael I. Schwartzbach, editors, Compiler Construction, 18th International Conference, CC 2009, York, UK, March 22-29, 2009. Proceedings, volume 5501 of Lecture Notes in Computer Science, pages 126-141. Springer, 2009.

Analysis and renovation of large software portfolios requires syntax analysis of multiple, usually embedded, languages and this is beyond the capabilities of many standard parsing techniques. The traditional separation between lexer and parser falls short due to the limitations of tokenization based on regular expressions when handling multiple lexical grammars. In such cases scannerless parsing provides a viable solution. It uses the power of context-free grammars to be able to deal with a wide variety of issues in parsing lexical syntax. However, it comes at the price of less efficiency. The structure of tokens is obtained using a more powerful but more time and memory intensive parsing algorithm. Scannerless grammars are also more non-deterministic than their tokenized counterparts, increasing the burden on the parsing algorithm even further. In this paper we investigate the application of the Right-Nulled Generalized LR parsing algorithm (RNGLR) to scannerless parsing. We adapt the Scannerless Generalized LR parsing and filtering algorithm (SGLR) to implement the optimizations of RNGLR. We present an updated parsing and filtering algorithm, called SRNGLR, and analyze its performance in comparison to SGLR on ambiguous grammars for the programming languages C, Java, Python, SASL, and C++. Measurements show that SRNGLR is on average 33% faster than SGLR, but is 95% faster on the highly ambiguous SASL grammar. For the mainstream languages C, C++, Java and Python the average speedup is 16%.

  @inproceedings{CC2009,
	  Author = {Giorgios R. Economopoulos and Paul Klint and Jurgen J. Vinju},
	  Title = {Faster Scannerless {GLR} Parsing},
	  Booktitle = {Compiler Construction (CC)},
	  Doi = {http://dx.doi.org/10.1007/978-3-642-00722-4_10},
	  Editor = {Oege de Moor and Michael I. Schwartzbach},
	  Isbn = {978-3-642-00721-7},
	  Pages = {126-141},
	  Publisher = {Springer},
	  Series = {Lecture Notes in Computer Science},
	  Volume = {5501},
	  Year = {2009},
  }

Philippe Charles, Robert M. Fuhrer, Stanley M. Sutton Jr., Evelyn Duesterwald, and Jurgen Vinju. Accelerating the Creation of Customized, Language-specific IDEs in Eclipse. In Shail Arora and Gary T. Leavens, editors, Proceedings of the 24th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2009, October 25-29, 2009, Orlando, Florida, USA., pages 191-206, 2009.

Full-featured integrated development environments have become critical to the adoption of new programming languages. Key to the success of these IDEs is the provision of services tailored to the languages. However, modern IDEs are large and complex, and the cost of constructing one from scratch can be prohibitive. Generators that work from language specifications reduce costs but produce environments that do not fully reflect distinctive language characteristics. We believe that there is a practical middle ground between these extremes that can be effectively addressed by an open, semi-automated strategy to IDE development. This strategy is to reduce the burden of IDE development as much as possible, especially for internal IDE details, while opening opportunities for significant customizations to IDE services. To reduce the effort needed for customization we provide a combination of frameworks, templates, and generators. We demonstrate an extensible IDE architecture that embodies this strategy, and we show that this architecture can be used to produce customized IDEs, with a moderate amount of effort, for a variety of interesting languages.

  @inproceedings{imp,
	  Author = {Philippe Charles and Robert M. Fuhrer and Stanley M. Sutton Jr. and Evelyn Duesterwald and Jurgen Vinju},
	  Title = {Accelerating the Creation of Customized, Language-Specific IDEs in Eclipse},
	  Booktitle = {Proceedings of the 24th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA)},
	  Editor = {Shail Arora and Gary T. Leavens},
	  Pages = {191-206},
	  Year = {2009}
  }

2008

Paul Klint, Taeke Kooiker, and Jurgen J. Vinju. Language Parametric Module Management for IDEs. Electronic Notes in Theoretical Computer Science, 203(2):3-19, 2008.

An integrated development environment (IDE) monitors all the changes that a user makes to source code modules and responds accordingly by flagging errors, by re-parsing, by rechecking, or by recompiling modules and by adjusting visualizations or other information derived from a module. A module manager is the central component of the IDE that is responsible for this behavior. Although the overall functionality of a module manager in a given IDE is fixed, its actual behavior strongly depends on the programming languages it has to support. What is a module? How do modules depend on each other? What is the effect of a change to a module? We propose a concise design for a language parametric module manager: a module manager that is parameterized with the module behavior of a specific language. We describe the design of our module manager and discuss some of its properties. We also report on the application of the module manager in the construction of IDEs for the specification language ASF+SDF as well as for Java. Our overall goal is the rapid development (generation) of IDEs for programming languages and domain specific languages. The module manager presented here represents a next step in the creation of such generic language workbenches.

  @article{LDTA2008,
    title = {Language Parametric Module Management for IDEs},
    author = {Paul Klint and Taeke Kooiker and Jurgen J. Vinju},
    year = {2008},
    doi = {http://dx.doi.org/10.1016/j.entcs.2008.03.041},
    tags = {programming languages, SDF, code generation, language design, programming, Meta-Environment, ASF+SDF, Java, IDE, generic programming},
    journal = {Electronic Notes in Theoretical Computer Science},
    volume = {203},
    number = {2},
    pages = {3-19},
  }

2007

Jurgen J. Vinju. Annotated parse trees for a language parametric IDE. In PLIDE, November 2007.

M.G.J. van den Brand, M. Bruntink, G.R. Economopoulos, H.A. de Jong, P. Klint, T. Kooiker, T. van der Storm, and Jurgen J. Vinju. Using The Meta-environment for Maintenance and Renovation. In Proceedings of the Conference on Software Maintenance and Reengineering (CSMR'07). IEEE Computer Society Press, 2007.

The Meta-Environment is a flexible framework for language development, source code analysis and source code transformation. We highlight new features and demonstrate how the system supports key functionalities for software evolution: fact extraction, software analysis, visualization, and software transformation.

@InProceedings{MetaEnv07,
    author = {M.G.J. van den Brand and M. Bruntink and G.R. Economopoulos and H.A. de Jong and P. Klint and T. Kooiker and T. van der Storm and J.J. Vinju},
    title = {Using {T}he {M}eta-Environment for {M}aintenance and {R}enovation},
    booktitle = {Proceedings of the 11th European Conference on Software Maintenance and Reengineering ({CSMR'07})},
    pages = {331--332},
    year = {2007},
    publisher = {IEEE Computer Society Press}
}

2006

Jurgen J. Vinju and J.R. Cordy. How to make a bridge between transformation and analysis technologies? In J.R. Cordy, R. Lämmel, and A. Winter, editors, Transformation Techniques in Software Engineering, number 05161 in Dagstuhl Seminar Proceedings. Internationales Begegnungs- und Forschungszentrum (IBFI), Schloss Dagstuhl, Germany, 2006.

  @inproceedings{dagstuhl,
	  Author = {J.J. Vinju and J.R. Cordy},
	  Booktitle = {Transformation Techniques in Software Engineering},
	  Editor = {J.R. Cordy and R. L{\"a}mmel and A. Winter},
	  Issn = {1862-4405},
	  Number = {05161},
	  Publisher = {Internationales Begegnungs- und Forschungszentrum (IBFI), Schloss Dagstuhl, Germany},
	  Series = {Dagstuhl Seminar Proceedings},
	  Title = {How to make a bridge between transformation and analysis technologies?},
	  Year = {2006}
  }

Diego Ordóñez Camacho, Kim Mens, Mark van den Brand, and Jurgen J. Vinju. Automated derivation of translators from annotated grammars. In Language Descriptions Tools and Applications, ENTCS, pages 121-137, 2006.

M.G.J. van den Brand, A.T. Kooiker, Jurgen J. Vinju, and N.P. Veerman. A Language Independent Framework for Context-sensitive Formatting. In CSMR '06: Proceedings of the Conference on Software Maintenance and Reengineering, pages 103-112, Washington, DC, USA, 2006. IEEE Computer Society Press.

Automated formatting is an important technique for the software maintainer. It is either applied separately to improve the readability of source code, or as part of a source code transformation tool chain. In this paper we report on the application of generic tools for constructing formatters. In an industrial setting automated formatters need to be tailored to the requirements of the customer. The (legacy) programming language or dialect and the corporate formatting conventions are specific and non-negotiable. Can generic formatting tools deal with such unexpected requirements? Driven by an industrial case of nearly 80 thousand lines of Cobol code, several limitations in existing formatting technology have been addressed. We improved its flexibility by replacing a generative phase by a generic tool, and we added a little expressiveness to the formatting back end. Most importantly, we employed a multi-stage formatting framework that can cope with any kind of formatting convention using more computational power.

@inproceedings{van2006language,
  title={A language independent framework for context-sensitive formatting},
  author={van den Brand, Mark GJ and Kooiker, A Taeke and Vinju, Jurgen J and Veerman, Niels P},
  booktitle={Software Maintenance and Reengineering, 2006. CSMR 2006. Proceedings of the 10th European Conference on},
  pages={103--112},
  year={2006},
  organization={IEEE}
}

Jurgen J. Vinju. UPTR: a simple parse tree representation format. In Software Transformation Systems Workshop, October 2006.

2005

J.J. Vinju. Analysis and Transformation of Source Code by Parsing and Rewriting. PhD thesis, Universiteit van Amsterdam, November 2005.

In this thesis the subject of study is source code. More precisely, I am interested in tools that help in describing, analyzing and transforming source code. The overall question is how well qualified and versatile the programming language ASF+SDF is when applied to source code analysis and transformation. The main technical issues that are addressed are ambiguity of context-free languages and improving two important quality attributes of analyses and transformations: conciseness and fidelity. The overall result of this research is a version of the language that is better tuned to the domain of source code analysis and transformation, but is still firmly grounded on the original: a hybrid of context-free grammars and term rewriting. The results that are presented have a broad technical spectrum because they cover the entire scope of ASF+SDF. They include disambiguation by filtering parse forests, the type-safe automation of tree traversal for conciseness, improvements in language design resulting in higher resolution and fidelity, and better interfacing with other programming environments. Each solution has been validated in practice, by me and by others, mostly in the context of industrial sized case studies. In this introductory chapter we first set the stage by sketching the objectives and requirements of computer aided software engineering. Then the technological background of this thesis is introduced: generic language technology and ASF+SDF. We zoom in on two particular technologies: parsing and term rewriting. We identify research questions as we go and summarize them at the end of this chapter.

  @phdthesis{thesis2005,
	  Author = {J.J. Vinju},
	  Month = nov,
	  Supervisor = {Paul Klint and {Mark van} den Brand},
	  School = {Universiteit van Amsterdam},
	  Title = {Analysis and Transformation of Source Code by Parsing and Rewriting},
	  Year = {2005}}

M.G.J. van den Brand, A.T. Kooiker, N.P. Veerman, and Jurgen J. Vinju. An architecture for context-sensitive formatting (extended abstract). In International Conference on Software Maintenance, 2005.

M. Bravenboer, R. Vermaas, Jurgen J. Vinju, and E. Visser. Generalized type-based disambiguation of meta programs with concrete object syntax. In Generative Programming and Component Engineering (GPCE), 2005.

In meta programming with concrete object syntax, object-level programs are composed from fragments written in concrete syntax. The use of small program fragments in such quotations and the use of meta-level expressions within these fragments (anti-quotation) often leads to ambiguities. This problem is usually solved through explicit disambiguation, resulting in considerable syntactic overhead. A few systems manage to reduce this overhead by using type information during parsing. Since this is hard to achieve with traditional parsing technology, these systems provide specific combinations of meta and object languages, and their implementations are difficult to reuse. In this paper, we generalize these approaches and present a language independent method for introducing concrete object syntax without explicit disambiguation. The method uses scannerless generalized-LR parsing to parse meta programs with embedded object-level fragments, which produces a forest of all possible parses. This forest is reduced to a tree by a disambiguating type checker for the meta language. To validate our method we have developed embeddings of several object languages in Java, including AspectJ and Java itself.

  @inproceedings{BVVV05,
	  Author = {M. Bravenboer and R. Vermaas and J.J. Vinju and E. Visser},
	  Booktitle = {Generative Programming and Component Engineering (GPCE)},
	  Title = {Generalized Type-Based Disambiguation of Meta Programs with Concrete Object Syntax},
	  Year = {2005}
  }

M.G.J. van den Brand, B. Cornelissen, P.A. Olivier, and J.J. Vinju. TIDE: a Generic Debugging Framework. In J. Boyland and G. Hedin, editors, Language Design Tools and Applications, June 2005.

A language specific interactive debugger is one of the tools that we expect in any mature programming environment. We present applications of TIDE: a generic debugging framework that is related to the ASF+SDF Meta-Environment. TIDE can be applied to different levels of debugging that occur in language design. Firstly, TIDE was used to obtain a full-fledged debugger for language specifications based on term rewriting. Secondly, TIDE can be instantiated for any other programming language, including but not limited to domain specific languages that are defined and implemented using ASF+SDF. We demonstrate the common debugging interface, and indicate the amount of effort needed to instantiate new debuggers based on TIDE.

  @inproceedings{ldta05,
	  Author = {Brand, {M.G.J. van den} and B. Cornelissen and Olivier, P.A. and Vinju, J.J.},
	  Booktitle = {Language Design Tools and Applications},
	  Series = {Electronic Notes in Theoretical Computer Science},
	  Publisher = {Elsevier},
	  Editor = {J. Boyland and G. Hedin},
	  Month = jun,
	  Title = {TIDE: a generic debugging framework},
	  Year = 2005
  }

M.G.J. van den Brand, P.E. Moreau, and Jurgen J. Vinju. A Generator of Efficient Strongly Typed Abstract Syntax Trees in Java. IEE Proceedings-Software, 2005.

Abstract syntax trees are a very common data-structure in language related tools. For example compilers, interpreters, documentation generators, and syntax-directed editors use them extensively to extract, transform, store and produce information that is key to their functionality. We present a Java back-end for ApiGen, a tool that generates implementations of abstract syntax trees. The generated code is characterized by strong typing combined with a generic interface and maximal sub-term sharing for memory efficiency and fast equality checking. The goal of this tool is to obtain safe and more efficient programming interfaces for abstract syntax trees. The contribution of this work is the combination of generating a strongly typed data-structure with maximal sub-term sharing in Java. Practical experience shows that this approach is beneficial for extremely large as well as smaller data types.
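
The idea of maximal sub-term sharing mentioned above can be sketched in a few lines of Python. This is only an illustration under assumptions: the class name `Term` and the factory `make` are hypothetical, the sketch is untyped, and it does not reflect the Java code that ApiGen actually generates. It shows one thing: when every node is created through a factory that consults a table of existing nodes, structurally equal terms become the same object, so equality checking reduces to an identity comparison.

```python
class Term:
    """Immutable tree node; create instances only through Term.make()."""

    __slots__ = ("label", "children")
    _table = {}  # maps (label, children) to the single shared node

    def __init__(self, label, children):
        self.label = label
        self.children = children

    @classmethod
    def make(cls, label, *children):
        # Children are already shared, so hashing them by identity
        # (the default for objects) is enough to detect duplicates.
        key = (label, children)
        node = cls._table.get(key)
        if node is None:
            node = cls(label, children)
            cls._table[key] = node
        return node


if __name__ == "__main__":
    a = Term.make("add", Term.make("x"), Term.make("x"))
    b = Term.make("add", Term.make("x"), Term.make("x"))
    print(a is b)            # structurally equal terms are one object
    print(len(Term._table))  # number of distinct nodes stored
```

A production implementation would also need to let unused nodes be garbage collected, for example with weak references; the sketch keeps every node alive for simplicity.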

  @article{IEE2005,
      title = {A Generator of Efficient Strongly Typed Abstract Syntax Trees in Java},
      author = {Van Den Brand, Mark and Moreau, Pierre-Etienne and Vinju, Jurgen},
      publisher = {IEEE},
      pages = {70--87},
      journal = {IEE Proceedings - Software Engineering},
      volume = {152},
      number = {2},
      year = {2005},
  }

Jurgen J. Vinju. Type-driven automatic quotation of concrete object code in meta programs. In N. Guelfi and A. Savidis, editors, Rapid Integration of Software Engineering Techniques, volume 3943 of LNCS, 2005.

Meta programming can be facilitated by the ability to represent program fragments in concrete syntax instead of abstract syntax. The resulting meta programs are more self-documenting. One caveat in concrete meta programming is the syntactic separation between the meta language and the object language. To solve this problem, many meta programming systems use quoting and anti-quoting to indicate precisely where level switches occur. These “syntactic hedges” can obfuscate the concrete program fragments. This paper describes an algorithm for inferring quotes, such that the meta programmer no longer needs to explicitly indicate transitions between the meta and object languages.

  @inproceedings{RISE2005,
    title = {Type-Driven Automatic Quotation of Concrete Object Code in Meta Programs},
    author = {Jurgen J. Vinju},
    year = {2005},
    pages = {97-112},
    booktitle = {Rapid Integration of Software Engineering Techniques, Second International Workshop, RISE 2005, Heraklion, Crete, Greece, September 8-9, 2005, Revised Selected Papers},
    editor = {Nicolas Guelfi and Anthony Savidis},
    volume = {3943},
    series = {Lecture Notes in Computer Science},
    publisher = {Springer},
    isbn = {3-540-34063-7},
  }

Jurgen J. Vinju, Paul Klint, and Tijs van der Storm. Term Rewriting Meets Aspect Oriented Programming. In Aart Middeldorp, Vincent van Oostrom, Femke van Raamsdonk, and Roel C. de Vrijer, editors, Processes, Terms and Cycles: Steps on the Road to Infinity, Essays Dedicated to Jan Willem Klop, on the Occasion of His 60th Birthday, volume 3838 of Lecture Notes in Computer Science. Springer, 2005.

2004

M.G.J. van den Brand and J.J. Vinju. Generation by Transformation in ASF+SDF. In GPCE Workshop on Software Transformation Systems (STS), 2004.

2003

M.G.J. van den Brand, P. Klint, and J.J. Vinju. Term Rewriting with Traversal Functions. ACM Transactions on Software Engineering and Methodology (TOSEM), 12(2):152-190, 2003.

Term rewriting is an appealing technique for performing program analysis and program transformation. Tree (term) traversal is frequently used but is not supported by standard term rewriting. We extend many-sorted, first-order term rewriting with traversal functions that automate tree traversal in a simple and type safe way. Traversal functions can be bottom-up or top-down traversals and can either traverse all nodes in a tree or can stop the traversal at a certain depth as soon as a matching node is found. They can either define sort preserving transformations or mappings to a fixed sort. We give small and somewhat larger examples of traversal functions and describe their operational semantics and implementation. An assessment of various applications and a discussion conclude the paper.
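
As a rough illustration of the two kinds of traversal functions described above (sort-preserving transformations and mappings to a fixed sort), here is a minimal Python sketch. It is not ASF+SDF and not type safe; terms are encoded as tuples `(label, child, ...)` with integers as leaves, and the names `bottom_up`, `accumulate`, `simplify` and `count_leaves` are assumptions made for this example. The variant that stops at the first matching node is left out.

```python
def bottom_up(rule, term):
    """Transformer: rebuild the tree, applying `rule` to every node, children first."""
    if isinstance(term, tuple):
        label, *children = term
        term = (label, *(bottom_up(rule, child) for child in children))
    return rule(term)


def accumulate(rule, term, acc):
    """Accumulator: visit every node top-down and fold it into a single value."""
    acc = rule(term, acc)
    if isinstance(term, tuple):
        for child in term[1:]:
            acc = accumulate(rule, child, acc)
    return acc


def simplify(term):
    # Only the interesting case is spelled out: add(x, 0) -> x.
    # Every other node is returned unchanged; the traversal does the rest.
    if isinstance(term, tuple) and term[0] == "add" and term[2] == 0:
        return term[1]
    return term


def count_leaves(term, acc):
    return acc + (0 if isinstance(term, tuple) else 1)


if __name__ == "__main__":
    expr = ("mul", ("add", ("add", 3, 0), 0), 2)
    print(bottom_up(simplify, expr))
    print(accumulate(count_leaves, expr, 0))
```

The point of the design is that `simplify` and `count_leaves` mention only the nodes they care about, while the traversal order lives in one reusable function.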

@article{journals/tosem/BrandKV03,
  author = {van den Brand, Mark and Klint, Paul and Vinju, Jurgen J.},
  journal = {ACM Trans. Softw. Eng. Methodol.},
  number = 2,
  pages = {152-190},
  title = {Term rewriting with traversal functions.},
  volume = 12,
  year = 2003
}

M.G.J. van den Brand, S. Klusener, L. Moonen, and Jurgen J. Vinju. Generalized Parsing and Term Rewriting - Semantics Directed Disambiguation. In Barret Bryant and João Saraiva, editors, Third Workshop on Language Descriptions Tools and Applications, Electronic Notes in Theoretical Computer Science, 2003.

Generalized parsing technology provides the power and flexibility to attack real-world parsing applications. However, many programming languages have syntactical ambiguities that can only be solved using semantical analysis. In this paper we propose to apply the paradigm of term rewriting to filter ambiguities based on semantical information. We start with the definition of a representation of ambiguous derivations. Then we extend term rewriting with means to handle such derivations. Finally, we apply these tools to some real world examples, namely C and COBOL. The resulting architecture is simple and efficient as compared to semantic directed parsing.

  @inproceedings{BMV03,
	  Author = {Brand, {M.G.J. van den} and Klusener, S. and Moonen, L. and Vinju, J.J.},
	  Title = { {G}eneralized {P}arsing and {T}erm {R}ewriting - {S}emantics {D}irected {D}isambiguation},
	  Booktitle = {Third Workshop on Language Descriptions Tools and Applications},
	  Editor = {Barret Bryant and Jo{\~a}o Saraiva},
	  Series = {Electronic Notes in Theoretical Computer Science},
	  Publisher = {Elsevier},
	  Year = 2003
  }

M.G.J. van den Brand, P.E. Moreau, and Jurgen J. Vinju. Environments for Term Rewriting Engines for Free! In R. Nieuwenhuis, editor, Proceedings of the 14th International Conference on Rewriting Techniques and Applications (RTA'03). Springer-Verlag, 2003.

2002

M.G.J. van den Brand, P. Klint, and Jurgen J. Vinju. Term Rewriting with Type-safe Traversal Functions. In B. Gramlich and S. Lucas, editors, Second International Workshop on Reduction Strategies in Rewriting and Programming (WRS 2002), volume 70 of Electronic Notes in Theoretical Computer Science. Elsevier Science Publishers, 2002.

M.G.J. van den Brand, J. Scheerder, Jurgen J. Vinju, and E. Visser. Disambiguation Filters for Scannerless Generalized LR Parsers. In R. Nigel Horspool, editor, Compiler Construction, volume 2304 of LNCS, pages 143-158. Springer-Verlag, 2002.

In this paper we present the fusion of generalized LR parsing and scannerless parsing. This combination supports syntax definitions in which all aspects (lexical and context-free) of the syntax of a language are defined explicitly in one formalism. Furthermore, there are no restrictions on the class of grammars, thus allowing a natural syntax tree structure. Ambiguities that arise through the use of unrestricted grammars are handled by explicit disambiguation constructs, instead of implicit defaults that are taken by traditional scanner and parser generators. Hence, a syntax definition becomes a full declarative description of a language. Scannerless generalized LR parsing is a viable technique that has been applied in various industrial and academic projects.

2001

M.G.J. van den Brand, A. van Deursen, J. Heering, H.A. de Jong, M. de Jonge, T. Kuipers, P. Klint, L. Moonen, P. A. Olivier, J. Scheerder, Jurgen J. Vinju, E. Visser, and J. Visser. The ASF+SDF Meta-Environment: a Component-Based Language Development Environment. In R. Wilhelm, editor, CC'01, volume 2027 of LNCS, pages 365-370. Springer-Verlag, 2001.

The ASF+SDF Meta-Environment is an interactive development environment for the automatic generation of interactive systems for constructing language definitions and generating tools for them. Over the years, this system has been used in a variety of academic and commercial projects ranging from formal program manipulation to conversion of COBOL systems. Since the existing implementation of the Meta-Environment started exhibiting more and more characteristics of a legacy system, we decided to build a completely new, component-based, version. We demonstrate this new system and stress its open architecture.

@inproceedings{BDHJJKKMOSVVV01,
author =     {Brand, {M.G.J. van den} and Deursen, {A. van} and J. Heering and
              Jong, {H.A. de} and Jonge, {M. de} and T. Kuipers and
              P. Klint and L. Moonen and P.A. Olivier and J. Scheerder and
              J.J. Vinju and E. Visser and J. Visser},
title =     {The {ASF}+{SDF} {M}eta-{E}nvironment: a {C}omponent-{B}ased
             {L}anguage {D}evelopment {E}nvironment},
editor =    {R. Wilhelm},
series =    {Lecture Notes in Computer Science},
Volume =    {2027},
booktitle = {Compiler Construction (CC '01)},
year =      {2001},
pages =	    {365--370},
publisher = {Springer-Verlag}
}

2000

M.G.J. van den Brand and Jurgen J. Vinju. Rewriting with Layout. In Claude Kirchner and Nachum Dershowitz, editors, Proceedings of RULE2000, 2000.

Rewriting technology has proved to be an adequate and powerful mechanism to perform source code transformations. These transformations can not only be efficiently implemented using rewriting technology, but it also provides a firmer grip on the source code syntax. However, an important shortcoming of rewriting technology is that source code comments and layout are lost during rewriting. We propose "rewriting with layout" to solve this problem. We present a rewriting algorithm that keeps the layout of sub-terms that are not rewritten, and reuses the layout occurring in the right-hand side of the rewrite rules.

@inproceedings{withlayout,
  author = {M.G.J. van den Brand and J.J. Vinju},
  title = {Rewriting with Layout},
  editor = {Claude Kirchner and Nachum Dershowitz},
  booktitle = {Proceedings of the Second International Workshop on Rule-based Programming {(RULE'00)}},
  year = 2000
}

1999

🏆 J.J. Vinju. Optimizations of List Matching in the ASF+SDF compiler. Master's thesis, University of Amsterdam, September 1999. Cum laude.

June 19, 2025	- Waardevolle software bestaat niet, op zichzelf
June 02, 2022	- Metaprogramming for Maintenance at Philips
January 21, 2022	- Noisy numbers
March 07, 2019	- Unroll: three ways of reasoning while debugging
June 26, 2018	- Groupthink is a helpful concept for improving online PC meetings
May 22, 2018	- Why software engineering researchers should engineer software
May 14, 2018	- What is code and what is coding?
May 06, 2016	- Making sense of source code
February 12, 2016	- Automatic software analysis in context - inaugural speech TU Eindhoven
September 24, 2014	- Looking towards a future where software is controlled by the public (and not the other way round)

I am contributing or have contributed to the following projects:

Rascal - Metaprogramming Language (2009-present)
ASF+SDF Meta-Environment (2000-2010)
Syntax Definition Formalism (SDF2) (2000-2010)
Eclipse IDE Metatooling Platform (2007-present)
ATerm library (2000-2010)
ELAN4 (2003-2004)

This is a selection of presentation slides. Some talks and interviews have been recorded; where possible I've linked the media content.

2026

🏆🎓 SLE Conference 2026, Rennes. Towards a Universal Code Formatter through Machine Learning, Tuesday July 2nd, 2026. Most influential paper from 2016 honorary talk. Together with Terence Parr, Google.

2025

🏭🎓 LangDev conference 2025, Amsterdam. High fidelity source-to-source transformations with parse tree diff, Thursday October 16th, 2025

2024

🏭 Adviesbureau ICT Toetsing (ACICT) congres 2024 (besloten), Wat is legacy software waard?, joint presentation with Gernot Eggen (ASML), Thursday November 14th, 2024

📰 Volkskrant, Gezocht: stokoude computerkennis, newspaper interview in connection with Future of COBOL and Mainframe in The Netherlands, Saturday January 20th, 2024.

🏭 Software Engineering with COBOL and Mainframe: how special is that?, January 18th, 2024. At the Future of COBOL and Mainframe in The Netherlands, CWI Business & Society event, CWI Amsterdam.

📰 Binnenlands Bestuur, Hoe een stokoude programmeertaal de overheid in zijn greep houdt, digital newspaper item in connection with Future of COBOL and Mainframe in The Netherlands, January 19th, 2024.

📻 NOS Radio 1 Journaal, Programmeertaal COBOL verdwijnt langzaam; hoe erg is dat?, January 18th, 2024, national radio interview in connection with Future of COBOL and Mainframe in The Netherlands.

📰 Financieel Dagblad Tekort specialisten oude computertaal risico voor continuïteit betalingsverkeer, frontpage newspaper interview in connection with Future of COBOL and Mainframe in The Netherlands, Wednesday January 17th, 2024.

📻 Business News Radio BNR Digitaal, radio interview in connection with Future of COBOL and Mainframe in The Netherlands, Wednesday January 10th, 2024.

2023

🎓 Comparing Bottom-up with Top-down Parsing Architectures for the Syntax Definition Formalism from a Disambiguation Standpoint, April 5th 2023. Eelco Visser Commemorative Symposium, TU Delft.

🎓 Rascal Lab: Sustainable Research Software Infrastructure for Software Engineering, March 29th 2023. Symposium on Software Engineering on the occasion of the doctoral defense of Lina Ochoa Venegas. TU Eindhoven.

🏭 Rapid Prototyping of Language Servers with Rascal, VScode, and Gitpod, March 23rd, 2023. Gitpod community event @ Adyen, Amsterdam.

📺 NOS Acht uur Journaal, News item on Cool:gen and COBOL software at the Dutch Tax office, interview in the 8 o’clock news. Wednesday, March 1st, 2023.

2022

🏭 Pecha Kucha - Rekenen zonder Fouten (Dutch), October 14th, 2022, Pecha Kucha Peelland, with video impression

🏭 Automating maintenance: the way out of the software renovation paradox, June 9th, 2022, Bits & Chips event, Eindhoven. Talk to go with the article Metaprogramming for Maintenance at Philips in Bits & Chips magazine, which reports on the journal article Large-scale semi-automated migration of legacy C/C++ test code by Mathijs Schuts, Rodin Aarssen, Paul Tielemans and Jurgen Vinju.

🎓📺 RadCal: Design and Theory of Reliable Numerical Programming Languages with First-class Errors, poster at ICT.OPEN 2022 with a two-minute pitch video.

🎓 Path-Sensitive Atomic Commit: Local Coordination Avoidance for Distributed Transactions, March 23rd 2022, <Programming> conference 2022. Porto, Portugal, paper by Tim Soethout, Jurgen Vinju, Tijs van der Storm.

🎓 Bacatá: Notebooks for DSLs, Almost for Free, March 23rd 2022, <Programming> conference 2022. Porto, Portugal, paper by Mauricio Verano Merino, Jurgen Vinju and Tijs van der Storm.

🏭🎓 Generating VScode extensions for DSLs using Rascal, March 3rd 2022, Strumenta Online Community.

2021

🏫📺 What if [y]our code were data? Analyzing large code bases using Rascal, September 22nd, 2021, CODAM college, Amsterdam. The talk was recorded and published on YouTube.

🏭📺 Software Maintenance Competences, April 13th, 2021, TNO International Digital Enablement Week IDEW’21. The talk was recorded and published on Vimeo.

2017

📺 AVRO-TROS Een Vandaag: De voorspelbare mens 1: wat is een algoritme?, interview recorded on television and published on avrotros.nl

🏆🎓 Challenges for Static Analysis of Java Reflection - Literature Review and Empirical Study, May 26, 2017, International Conference on Software Engineering (ICSE), Buenos Aires, Argentina.

2016

🎓 Inaugural lecture Automatische software-analyse in context, and presentation slides for the inaugural lecture, February 12th, 2016. Eindhoven, The Netherlands.

2015

🎓 OSSMETER Pitch EU Concertation Meeting - Turning cloud research into innovative software & services, March 25th 2015, Brussels, Belgium.

🎓 Software Engineering: The War Against Complexity, Keynote Open Tool Demonstrations Day Cha-Q project; Change-centric Software Engineering, Antwerp University, February 24, 2015, Antwerp, Belgium.

🎓 Public/Private Collaboration {in,for,with} Software Engineering, EARMA conference, June 30th, Leiden, The Netherlands.

🎓 Challenges and Opportunities of Big Software-based Innovation, NWO Big Software Match Making Day, July 1st, 2015, Utrecht, The Netherlands.

2014

📻 NPO Radio 1, De Kennis van Nu, Interview (in Dutch) about the first programmer and software philosopher Ada Lovelace. Recorded live on radio and published at the NTR website.

🎓 SEN Symposium Introduction, December 3, 2014, CWI, Amsterdam.

🎓 Optimizing Hash-tries for Fast and Lean Immutable Collection Libraries, IFIP WG2.4, Stellenbosch, SA.

🏭 Software Research at CWI, Breakfast Meeting Amsterdam Economic Board, August 28, 2014, Amsterdam.

2013

🎓 M3: an open model for measuring source code artifacts, December 17, BENEVOL in Mons, Belgium.

🎓 CWI SWAT & Rascal, November 14th, 2013, NWO Special Interest Group Software Engineering, Nikhef, Amsterdam, The Netherlands.

🎓 Introducing SLE 2014 in Vasteras, Sweden

🏫 Debugging and all that for Master Software Engineering, May 2nd, Centrum Wiskunde & Informatica.

🏫 Slides on Modularity for Bachelor Computer Science, Jan 13th, Universiteit van Amsterdam, The Netherlands.

🎓 Software Analysis and Transformation with Rascal, Jan 11th, 2013, BioAssist Meeting, Utrecht, The Netherlands.

2012

🎓 Introduction to Rascal and Eyeballing the Cyclomatic Complexity Metric, May 11th, 2012, INRIA Lille Software Engineering.

🏭 Constructing specialist software tools using Rascal: Metrics. April 24 2012, Sogyo

🎓 The mechanics of building a DSL using Rascal. April 17th 2012, IPA Spring Days, Gelderen

🎓 Professional Feedback. March 29th, 2012, CSMR Doctoral Symposium, Szeged (Hungary).

2011

🎓 A case of visitor versus interpreter pattern. June 30th, 2011. Zurich. TOOLS conference. This presentation explains our paper, which compares how choosing between these two functionally interchangeable design patterns affects the maintainability of an AST-based language interpreter.

🎓 Controlled Experiments in Software Engineering, October 28th, 2011. Theoretical Computer Science Amsterdam (TCSA) Day, Amsterdam.

2006

🎓 UPTR: a universal parse tree representation format (relevant to Parsing@SLE audience about how to compare parsers), Software Transformation Systems Workshop, Vancouver.

2002

🎓 Realities of Scientific Software Engineering (an old presentation on software development in an academic environment, as presented to the researchers of the Proteo group in INRIA-LORIA, Nancy, France)

PhD theses

Rodin Aarssen, TU Eindhoven (TBA)
Jasper Denkers, TU Delft (2024), Domain Specific Languages For Digital Printing Systems, promotors Eelco Visser†, Andy Zaidman, TU Delft and Jurgen Vinju, TU Eindhoven. Also supervised by Louis C. M. van Gool at Canon Production Printing, Venlo.
Jouke Stoel, TU Eindhoven (2023), Solving the Bank - Lightweight Specification and Verification Techniques for Enterprise Software (Jurgen Vinju first promotor, Tijs van der Storm second promotor, Mark van den Brand co-promotor)
🏆 Lina María Ochoa Venegas (2023), Break the Code? Breaking Changes and their Impact on Software Evolution, TU Eindhoven (Jurgen Vinju first promotor, Mark van den Brand second promotor and Thomas Degueule co-promotor), cum laude.
🏆 Tim Soethout (2022), Banking on Domain Knowledge for Faster Transactions - Leveraging Models to Avoid Coordination, TU Eindhoven (Jurgen Vinju first promotor, Tijs van der Storm second promotor), VERSEN first national PhD thesis award 2023.
🏆 Mauricio Verano Merino (2022), Engineering Language-Parametric End-User Programming Environments for DSLs, TU Eindhoven (Jurgen Vinju first promotor, Mark van den Brand second promotor, Tijs van der Storm co-promotor), VERSEN third national PhD thesis award 2023.
⓶ Ali Afroozeh and Anastasia Izmaylova (2019), Practical General Top-down Parsers, Universiteit van Amsterdam (Paul Klint first promotor, Jurgen Vinju second promotor).
🏆 Davy Landman (2017), Reverse Engineering Source Code. Universiteit van Amsterdam (Paul Klint first promotor, Jurgen Vinju second promotor), awarded with IPA Best Dissertation 2017, and SIGSOFT ICSE Distinguished Paper Award for one of the chapters.
🏆 Michael Steindorfer (2017), Efficient Immutable Collections. Universiteit van Amsterdam (Paul Klint first promotor, Jurgen Vinju second promotor), awarded ICPE distinguished paper award for one of the chapters.
Bas Basten (2011), Ambiguity Detection for Programming Language Grammars. Universiteit van Amsterdam (Paul Klint promotor, Jurgen Vinju co-promotor)

Master's theses

Tar van Krieken (2023), Deriving Syntax Highlighting Grammars From Character-level Context-free Grammars: Algorithm Development, Analysis, and Future Directions, TU Eindhoven
Ruichen Hu (2023), An Automated Approach to Check Software Architecture Erosion, TU Eindhoven at Philips Healthcare. (Michel Chaudron, Mathijs Schuts co-supervisors)
Amber Schippers (2022), A Generalised Implementation of Symbolic Execution Using the Z3 Theorem Prover, TU Eindhoven at Codean
Guillermo Antoñanzas Martínez (2022), Business Intelligence Adoption of DevOps Methodologies, TU Eindhoven (Lina Maria Ochoa Venegas primary supervisor)
Mohammed El Mochoui (2021), Universiteit van Amsterdam, Deriving metric thresholds for the SIG Test Code Quality Model: A benchmarking study
Jaro Reinders (2021), TU Eindhoven, Automatic Generation of C Library Bindings - Inferring Nullability through Structure Fields
Adrian Zborowski (2017), Universiteit van Amsterdam, Oxidize Open Framework for Idiomatic Rule Preservation in Rust Programming Language
Nick Lodewijks (2017), Universiteit van Amsterdam, Clone-and-Own Analysis of an Industrial Automation System
Alex Kok (2017), Universiteit van Amsterdam, Property-based testing Rebel semantics in the generated code.
Thanusjan Tharumarajah (2017), Universiteit van Amsterdam, Runtime testing generated systems from Rebel specifications.
Kinson Michel (2017), Universiteit van Amsterdam, Comparison of IDE extracted effort versus static metrics for assessing software maintainability. (supervised by Aiko Yamashita)
Ruud van der Weijde (2017), Universiteit van Amsterdam, Type inference for PHP: A constraint based type inference written in Rascal.
Tom van Duist (2016), Universiteit van Amsterdam, Scaling CEP - Using Distributed Stream Computing to Scale Complex Event Processing.
Roy de Wildt (2016), Universiteit van Amsterdam, Assessing the Effectiveness of Fault-Proneness Prediction Models Across Software Systems.
Iwan Flameling (2015), Universiteit van Amsterdam, An automatic CSRF protection tool.
Omar Pakker (2015), Universiteit van Amsterdam, Graph-Based Querying On top of the Entity Framework
Maria Gouseti (2014), Universiteit van Amsterdam, A General Framework for Concurrency Aware Refactorings. Awarded top grade.
Arie van der Veek (2013), Universiteit van Amsterdam, Coupling as a trade-off in an Enterprise Service Bus
Peter Klijn (2013), Universiteit van Amsterdam, How accurately do Java profilers predict runtime performance bottlenecks?
Richard Bos (2013), Universiteit van Amsterdam, Finding lightweight opportunities for parallelism in .NET C♯
Vlad Lep (2013), Universiteit van Amsterdam, Noise detection in software engineering datasets using Gaussian Processes (Magiel Bruntink, co-supervisor)
Ioana Rucareanu (2013), Universiteit van Amsterdam, PHP: Securing Against SQL Injection (Mark Hills, co-supervisor)
Henk Bosman (2013), Universiteit van Amsterdam, Predicting bugs and issues with automated code reviews
Dimitrios Kyritsis (2013), Universiteit van Amsterdam, PHP re-factoring: HTML templates (Mark Hills co-supervisor)
Chris Mulder (2013), Universiteit van Amsterdam, Reducing Dynamic Feature Usage in PHP Code (Mark Hills co-supervisor)
Christos K. Tsigkanos (2013), Universiteit van Amsterdam, Stateful discovery of attack manifestations on networks and systems
Koen G.L. Hanselman (2013), Universiteit van Amsterdam, Detection of the Abstract Factory Pattern: an experimental study
Vladimir Komsiyski (2013), Universiteit van Amsterdam, Binary Differencing for Media Files
Hans van Bakel (2012), Universiteit van Amsterdam, Reducing coupling to lower maintenance effort
Jorge Nicolas Barrionuevo (2012), Universiteit van Amsterdam, The Core of Open Source Systems
Pieter Bregman (2012), Universiteit van Amsterdam, Onderhoudbaarheid vs. betrouwbaarheid "een case study"
Dennis van Leeuwen (2012), Universiteit van Amsterdam, Comprehensible Method Names: Focusing on the Nouns
Ashim Shahi (2012), Universiteit van Amsterdam, Classifying the classifiers for file fragment classification (Jeroen van den Bos co-supervisor)
Luuk Stevens (2012), Universiteit van Amsterdam, Automatically Analyzing the Consistency and Preciseness of Class Names
Jouke Stoel (2012), Universiteit van Amsterdam, Exploring the Detection of Method Naming Anomalies
Aart van den Dolder (2011), Universiteit van Amsterdam, Bepaling van de geschiktheid van Oracle Forms applicaties voor inbeheername door middel van automatische code review volgens het SIG Maintainability model
Randy Fluit (2011), Universiteit van Amsterdam, Differencing Context-free Grammars (Tijs van der Storm co-supervisor)
Rob van der Horst (2011), Universiteit van Amsterdam, The Influence of First-Class Relations on Coupling and Cohesion : A Case Study
Marvin Jacobsz (2011), Universiteit van Amsterdam, Een performance analyse van "Hiphop for PHP"
Christian Köppe (2011), Universiteit van Amsterdam, DoKRe - A Method for Automated Domain Knowledge Recovery from Source Code
Jeroen Bach (2010), Universiteit van Amsterdam, Theory and experimental evaluation of object-relational mapping optimization techniques : How to ORM and how not to ORM
Steven Raemaekers (2010), Universiteit van Amsterdam, Testing Semantic Clone Detection Candidates
Nico Schoenmaker (2010), Universiteit van Amsterdam, Over de understandability van subtype polymorfisme in objectgeoriënteerde systemen
Waruzjan Shahbazian (2010), Universiteit van Amsterdam, Rminer: An integrated model for repository mining using Rascal : A feasibility study
Sander Vellinga (2010), Universiteit van Amsterdam, Identifying behavior changes after PHP language migration using static source-code analysis
David Walschots (2010), Universiteit van Amsterdam, A case study on the cost and benefits of bus-oriented architectures
Maarten Wullink (2010), Universiteit van Amsterdam, Data model Maintainability : A comparative study of maintainability metrics
Jeldert Pol (2009), Universiteit van Amsterdam, Extreme Team Collaboration : Synchronous collaboration in Eclipse (Paul Klint supervisor)
David van Dijk (2009), Universiteit van Amsterdam, Changeability in Model Driven Web Development
Vincent Lussenburg (2009), Universiteit van Amsterdam, Mod4j : A qualitative case study of industrially applied model-driven software development
Karel Pieterson (2009), Universiteit van Amsterdam, Leerbaarheid van Programmeertalen
Arend van Beelen (2008), Universiteit van Amsterdam, Distributed Database Design for Social Network Graphs
Jan Derriks (2007), Universiteit van Amsterdam, Fortran grammatica-extractie (Paul Klint co-supervisor)
Anton Gerdessen (2007), Universiteit van Amsterdam, Framework comparison method. Comparing two frameworks based on technical domains, focussing on customisability and modifiability
Ricardo Lindooren (2007), Universiteit van Amsterdam, Testability of Dependency injection. An attempt to find out how the testability of source code is affected when the dependency injection principle is applied to it.
Arjen van Schie (2007), Universiteit van Amsterdam, Programming for a parallel future. "Improving the modularity and encapsulation for the implementation of concurrency concerns."
Ron Valkering (2007), Universiteit van Amsterdam, Syntax Error Handling in Scannerless Generalized LR Parsers
Renze de Vries (2007), Universiteit van Amsterdam, Service Oriented Architecture Degradatie onderhoudbaarheid referentiearchitectuur
Paul Bakker (2006), Universiteit van Amsterdam, The Framework Productivity Measurement Method. Meten van de productiviteitwinst bij het gebruik van een webframework
Sannie Kwakman (2006), Universiteit van Amsterdam, Variability through Aspect Oriented Programming in J2ME game development (used to be confidential)
Maarten Pater (2006), Universiteit van Amsterdam, Searching in public protein databases for novel Peroxisomal PTS1 containing Proteins (confidential) (Jan van Eijck co-supervisor)
Bart den Haak (2006), Universiteit van Amsterdam, Dynamic configurable web visualization of complex data relations
Tim Prijn (2006), Universiteit van Amsterdam, Framework Software Quality Analysis: A Case Study Analyzing the software quality supported by a J2EE meta-framework
Julien Rentrop (2006), Universiteit van Amsterdam, Software Metrics as Benchmarks for Source Code Quality of Software
Youri op 't Roodt (2006), Universiteit van Amsterdam, The effect of Ajax on performance and usability in web environments
Said Lakhloufi (2004), Universiteit van Amsterdam, JFC/Swing Editor voor ASF+SDF Meta-Environment (Mark van den Brand, Taeke Kooiker, Hayco de Jong, Paul Klint co-supervisors)
Bas Cornelissen (2004), Universiteit van Amsterdam, Using TIDE to Debug ASF+SDF at Multiple Levels (Paul Klint co-supervisor)

Bachelor's theses

Damien DeCampos (2022), Ada-air: Ada Analysis in Rascal, Université Paris Saclay, (Pierre van de Laar, TNO-ESI, co-supervisor)
Elephtera Hendriks (2009), Universiteit van Amsterdam, Parsing macros without the pre-processor

Email	Jurgen.Vinju@cwi.nl jurgen@vinju.org
Snailmail	Science Park 123 P.O. Box 95079 NL-1090 GB AMSTERDAM
Visit	Science Park 123 1098 XG AMSTERDAM Room L221
Phone	+31205924102
LinkedIn	http://nl.linkedin.com/in/jurgenvinju
Twitter	http://www.twitter.com/jurgenvinju
Researchr	http://researchr.org/profile/jurgenjvinju
ResearchGate	https://www.researchgate.net/profile/Jurgen_Vinju/