I am a researcher in the field of Software Engineering. My academic position is group leader of SWAT - Software Analysis & Transformation at CWI, and group leader of ATEAMS at INRIA Lille - Nord Europe, and I am part-time full professor at Eindhoven University of Technology starting September 1st, 2014. ATEAMS and SWAT are the same team hosted by both CWI and INRIA as a form of international collaboration.


Research interests

In theory, source code is written text which can be changed at any time. In reality, the source code of real software systems is mostly too complex to read and understand, and it is quite difficult to manipulate and adapt to changing circumstances and requirements. Perhaps it should not have been called software after all. To make matters more interesting, the older systems get, the more complex they become.

My personal goals are to:

  • help software engineers analyze source code so that they can maintain it efficiently.
  • help software engineers effectively improve source code by code generation, refactoring and source-to-source transformation.
  • understand which design decisions influence the flexibility and understandability of source code.
  • enable the construction of software tools for source code generation, analysis, transformation, and visualization by a larger group of software engineers.

Private interests

  • My family
  • Programming

When time permits:


Davide Di Ruscio, Dimitrios S. Kolovos, Ioannis Korkontzelos, Nicholas Matragkas and Jurgen Vinju. OSSMETER: A Software Measurement Platform for Automatically Analysing Open Source Software Projects. ESEC/FSE 2015 Tool Demonstrations Track.

     author = {Di Ruscio, Davide and Kolovos, Dimitrios S. and Korkontzelos, Ioannis and Matragkas, Nicholas and Vinju, Jurgen},
      title = {OSSMETER: A Software Measurement Platform for Automatically Analysing Open Source Software Projects},
  booktitle = {ESEC/FSE 2015 Tool Demonstrations Track},
       year = {2015}

Almeida, B., Ananiadou, S., Bagnato, A., Barbero, A. B., Di Rocco, J., Di Ruscio, D., Kolovos, D. S., Korkontzelos, I., Hansen, S., Malo, P., Matragkas, N., Paige, R. F. and Vinju, J. OSSMETER: Automated Measurement and Analysis of Open Source Software. In: Proceedings of the Projects Showcase at the Software Technologies: Applications and Foundations 2015 (STAF 2015).

Michael Steindorfer and Jurgen J. Vinju, Optimizing Hash-Array Mapped Tries for Fast and Lean Immutable JVM Collections. To appear in OOPSLA 2015.

The data structures underpinning collection APIs (e.g. lists, sets, maps) in the standard libraries of programming languages are used intensively in many applications. The standard libraries of recent Java Virtual Machine languages, such as Clojure or Scala, contain scalable and well-performing immutable collection data structures that are implemented as Hash-Array Mapped Tries (HAMTs). HAMTs already feature efficient lookup, insert, and delete operations; however, due to their tree-based nature, their memory footprints and the runtime performance of iteration and equality checking lag behind array-based counterparts. This particularly prohibits their application in programs which process larger data sets. In this paper, we propose changes to the HAMT design that increase the overall performance of immutable sets and maps. The resulting general purpose design increases cache locality and features a canonical representation. It outperforms Scala’s and Clojure’s data structure implementations in terms of memory footprint and runtime efficiency of iteration (1.3–6.7x) and equality checking (3–25.4x).
  title = {Optimizing Hash-Array Mapped Tries for Fast and Lean Immutable JVM Collections},
  author = {Michael Steindorfer and Jurgen J. Vinju},
  year = 2015,
  booktitle = {Proceedings of the Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA)},
  editor = {Patrick Eugster},
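
For readers unfamiliar with the data structure named in the abstract above, here is a minimal sketch of a classic bitmap-compressed HAMT. It is not the optimized design from the paper: the names `Node`, `insert`, `lookup`, `hash32` and `compact_index`, the 32-bit hash width and the 5-bit branching factor are all assumptions chosen for illustration, and full hash collisions are deliberately left unhandled.

```python
from typing import Any, List, Optional

BITS_PER_LEVEL = 5                        # assumed 2**5 = 32-way branching
BRANCH_MASK = (1 << BITS_PER_LEVEL) - 1   # 0b11111
MAX_SHIFT = 30                            # last level that still consumes hash bits


def hash32(key: Any) -> int:
    """Truncate Python's hash to 32 bits (an assumption of this sketch)."""
    return hash(key) & 0xFFFFFFFF


def bit_position(key_hash: int, shift: int) -> int:
    """One-hot bit for the 5-bit hash fragment used at this trie level."""
    return 1 << ((key_hash >> shift) & BRANCH_MASK)


def compact_index(bitmap: int, bitpos: int) -> int:
    """Index into the dense slot array: count the set bits below bitpos."""
    return bin(bitmap & (bitpos - 1)).count("1")


class Node:
    """Trie node; each slot is either a (key, value) tuple or a child Node."""

    def __init__(self, bitmap: int = 0, slots: Optional[List[Any]] = None):
        self.bitmap = bitmap
        self.slots = slots if slots is not None else []


def lookup(node: Node, key: Any, shift: int = 0) -> Any:
    """Return the value stored for key, or None when the key is absent."""
    bitpos = bit_position(hash32(key), shift)
    if not node.bitmap & bitpos:
        return None
    slot = node.slots[compact_index(node.bitmap, bitpos)]
    if isinstance(slot, Node):
        return lookup(slot, key, shift + BITS_PER_LEVEL)
    return slot[1] if slot[0] == key else None


def insert(node: Node, key: Any, value: Any, shift: int = 0) -> Node:
    """Return a new trie containing key; the input trie is never mutated."""
    bitpos = bit_position(hash32(key), shift)
    idx = compact_index(node.bitmap, bitpos)
    if not node.bitmap & bitpos:
        # Free branch: copy the slot array with the new entry spliced in.
        return Node(node.bitmap | bitpos,
                    node.slots[:idx] + [(key, value)] + node.slots[idx:])
    slot = node.slots[idx]
    if isinstance(slot, Node):
        child = insert(slot, key, value, shift + BITS_PER_LEVEL)
    elif slot[0] == key:
        child = (key, value)              # overwrite the existing binding
    else:
        if shift >= MAX_SHIFT:
            # Distinct keys with identical 32-bit hashes: out of scope here.
            raise NotImplementedError("full hash collision")
        # Push both entries one level down, where later hash bits differ.
        child = insert(Node(), slot[0], slot[1], shift + BITS_PER_LEVEL)
        child = insert(child, key, value, shift + BITS_PER_LEVEL)
    return Node(node.bitmap, node.slots[:idx] + [child] + node.slots[idx + 1:])
```

Starting from `Node()`, repeated calls such as `m = insert(m, key, value)` build a map, and every earlier version stays valid because each update copies only the nodes on the path to the changed slot. The tree of small arrays this produces is the source of the memory and iteration overhead that the abstract refers to.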


Magiel Bruntink and Jurgen J. Vinju, Looking Towards a Future where Software is Controlled by the Public (and not the other way around). ERCIM News 99, 2014.

Nowadays, software has a ubiquitous presence in everyday life and this phenomenon gives rise to a range of challenges that affect both individuals and society as a whole. In this article we argue that in the future, the domain of software should no longer belong to technical experts and system integrators alone. Instead it should transition to a firmly engaged public domain, similar to city planning, social welfare and security. The challenge that lies at the heart of this problem is the ability to understand, on a technical level, what all the different software actually is and what it does with our information.
  author = {Magiel Bruntink and Jurgen J. Vinju},
  title = {Looking Towards a Future where Software is Controlled by the Public (and not the other way around)},
  journal = {ERCIM News},
  issue = 99,
  year = 2014,

Anthony Cleve and Jurgen J. Vinju, Software Quality - Introduction to the Special Theme. ERCIM News 99, 2014.

The introduction of fast and cheap computer and networking hardware enables the spread of software. Software, in a nutshell, represents an unprecedented ability to channel creativity and innovation. The joyful act of simply writing computer programs for existing ICT infrastructure can change the world. We are currently witnessing how our lives can change rapidly as a result, at every level of organization and society and in practically every aspect of the human condition: work, play, love and war. The act of writing software does not imply an understanding of the resulting creation. We are surprised by failing software (due to bugs), the inability of rigid computer systems to “just do what we want”, the loss of privacy and information security, and last but not least, the million euro software project failures that occur in the public sector. These surprises are generally not due to negligence or unethical behaviour but rather reflect our incomplete understanding of what we are creating. Our creations, at present, are all much too complex and this lack of understanding leads to a lack of control.
  author = {Anthony Cleve and Jurgen J. Vinju},
  title = {Software Quality - Introduction to the Special Theme},
  journal = {ERCIM News},
  issue = 99,
  year = 2014,

Michael Steindorfer and Jurgen J. Vinju, Code Specialization for Memory Efficient Hash Tries (Short Paper). GPCE 2014, Vasteras, Sweden.

The hash trie data structure is a common part in standard collection libraries of JVM programming languages such as Clojure and Scala. It enables fast immutable implementations of maps, sets, and vectors, but it requires considerably more memory than an equivalent array-based data structure. This hinders the scalability of functional programs and the further adoption of this otherwise attractive style of programming. In this paper we present a product family of hash tries. We generate Java source code to specialize them using knowledge of JVM object memory layout. The number of possible specializations is exponential. The optimization challenge is thus to find a minimal set of variants which lead to a maximal loss in memory footprint on any given data. Using a set of experiments we measured the distribution of internal tree node sizes in hash tries. We used the results as a guidance to decide which variants of the family to generate and which variants should be left to the generic implementation. A preliminary validating experiment on the implementation of sets and maps shows that this technique leads to a median decrease of 55% in memory footprint for maps (and 78% for sets), while still maintaining comparable performance. Our combination of data analysis and code specialization proved to be effective.
 author = {Steindorfer, Michael J. and Vinju, Jurgen J.},
 title = {Code Specialization for Memory Efficient Hash Tries (Short Paper)},
 booktitle = {Proceedings of the 2014 International Conference on Generative Programming: Concepts and Experiences},
 series = {GPCE 2014},
 year = {2014},
 isbn = {978-1-4503-3161-6},
 location = {V{\"a}ster{\aa}s, Sweden},
 pages = {11--14},
 numpages = {4},
 url = {http://doi.acm.org/10.1145/2658761.2658763},
 doi = {10.1145/2658761.2658763},
 acmid = {2658763},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {Code generation, Hash trie, Immutability, Memory optimization, Performance, Persistent data structure, Specialization},

Mark Hills, Paul Klint and Jurgen J. Vinju, Static, Lightweight Includes Resolution for PHP. ASE 2014, Vasteras, Sweden.

Dynamic languages include a number of features that are challenging to model properly in static analysis tools. In PHP, one of these features is the include expression, where an arbitrary expression provides the path of the file to include at runtime. In this paper we present two complementary analyses for statically resolving PHP includes, one that works at the level of individual PHP files and one targeting PHP programs, possibly consisting of multiple scripts. To evaluate the effectiveness of these analyses we have applied the first to a corpus of 20 open-source systems, totaling more than 4.5 million lines of PHP, and the second to a number of programs from a subset of these systems. Our results show that, in many cases, includes can be either resolved to a specific file or a small subset of possible files, enabling better IDE features and more advanced program analysis tools for PHP.
 author = {Hills, Mark and Klint, Paul and Vinju, Jurgen J.},
 title = {Static, Lightweight Includes Resolution for PHP},
 booktitle = {Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering},
 series = {ASE '14},
 year = {2014},
 isbn = {978-1-4503-3013-8},
 location = {Vasteras, Sweden},
 pages = {503--514},
 numpages = {12},
 doi = {10.1145/2642937.2643017},
 acmid = {2643017},
 publisher = {ACM},
 address = {New York, NY, USA},


Davy Landman, Alexander Serebrenik and Jurgen J. Vinju, Empirical analysis of the relationship between CC and SLOC in a large corpus of Java methods. In 30th IEEE International Conference on Software Maintenance and Evolution (ICSME), 2014.

Measuring the internal quality of source code is one of the traditional goals of making software development into an engineering discipline. Cyclomatic Complexity (CC) is an often used source code quality metric, next to Source Lines of Code (SLOC). However, the use of the CC metric is challenged by the repeated claim that CC is redundant with respect to SLOC due to strong linear correlation. We test this claim by studying a corpus of 17.8M methods in 13K open-source Java projects. Our results show that direct linear correlation between SLOC and CC is only moderate, as caused by high variance. We observe that aggregating CC and SLOC over larger units of code improves the correlation, which explains reported results of strong linear correlation in literature. We suggest that the primary cause of correlation is the aggregation. Our conclusion is that there is no strong linear correlation between CC and SLOC of Java methods, so we do not conclude that CC is redundant with SLOC. This conclusion contradicts earlier claims from literature, but concurs with the widely accepted practice of measuring CC next to SLOC.
  author = { Davy Landman and Alexander Serebrenik and Jurgen Vinju },
  title = { {Empirical analysis of the relationship between CC and SLOC in a large corpus of Java methods} },
  booktitle = { 30th IEEE International Conference on Software Maintenance and Evolution, ICSME 2014 },
  year = { 2014 },
  datalink = { http://homepages.cwi.nl/~landman/icsme2014/ },


T. van der Storm and J. J. Vinju, Towards multilingual programming environments. Science of Computer Programming, 2013.

Software projects consist of different kinds of artifacts: build files, configuration files, markup files, source code in different software languages, and so on. At the same time, however, most integrated development environments (IDEs) are focused on a single (programming) language. Even if a programming environment supports multiple languages (e.g., Eclipse), IDE features such as cross-referencing, refactoring, or debugging do not often cross language boundaries. What would it mean for a programming environment to be truly multilingual? In this short paper we sketch a vision of a system that integrates IDE support across language boundaries. We propose to build this system on a foundation of unified source code models and metaprogramming. Nevertheless, a number of important and hard research questions still need to be addressed.
title = "Towards multilingual programming environments",
journal = "Science of Computer Programming",
issn = "0167-6423",
doi = "http://dx.doi.org/10.1016/j.scico.2013.11.041",
url = "http://www.sciencedirect.com/science/article/pii/S0167642313003341",
author = "Tijs van der Storm and Jurgen Vinju",


Anastasia Izmaylova, Paul Klint, Ashim Shahi and Jurgen J. Vinju. M3: An Open Model For Measuring Code Artifacts. BENEVOL 2013.

In the context of the EU FP7 project "OSSMETER" we are developing an infrastructure for measuring source code. The goal of OSSMETER is to obtain insight into the quality of open-source projects from all possible perspectives, including product, process and community. This is a "white paper" on M3, a set of code models, which should be easy to construct, easy to extend to include language specifics and easy to consume to produce metrics and other analyses. We solicit feedback on its usability.
author       = {Izmaylova, A. and Klint, P. and Shahi, A. and Vinju, J. J.},
title        = {M3: {An} {Open} {Model} {For} {Measuring} {Code} {Artifacts}},
series       = {BENEVOL},
year         = {2013},
month        = {December},
number       = {arXiv-1312.1188},
publisher    = {Cornell University Library},
institution  = {CWI},
url          = {http://arxiv.org/abs/1312.1188},

Davy Landman, Paul Klint, and Jurgen J. Vinju. Exploring the Limits of Domain Model Recovery. 29th IEEE International Conference on Software Maintenance (ICSM), 2013.

We are interested in re-engineering families of legacy applications towards using Domain-Specific Languages (DSLs). Is it worth investing in harvesting domain knowledge from the source code of legacy applications? Reverse engineering domain knowledge from source code is sometimes considered very hard or even impossible. Is it also difficult for "modern legacy systems"? In this paper we select two open-source applications and answer the following research questions: which parts of the domain are implemented by the application, and how much can we manually recover from the source code? To explore these questions, we compare manually recovered domain models to a reference model extracted from domain literature, and measure precision and recall. The recovered models are accurate: they cover a significant part of the reference model and they do not contain much junk. We conclude that domain knowledge is recoverable from "modern legacy" code and therefore domain model recovery can be a valuable component of a domain re-engineering process.
  author = { Paul Klint and Davy Landman and Jurgen Vinju },
  title = {Exploring the Limits of Domain Model Recovery},
  booktitle = { 29th IEEE International Conference on Software Maintenance (ICSM)},
  year = { 2013 },

Ali Afroozeh, Mark van den Brand, Adrian Johnstone, Elizabeth Scott and Jurgen J. Vinju. Safe Specification of Operator Precedence Rules. SLE 2013.

In this paper we present an approach to specifying operator precedence based on declarative disambiguation constructs and an implementation mechanism based on grammar rewriting. We identify a problem with existing generalized context-free parsing and disambiguation technology: generating a correct parser for a language such as OCaml using declarative precedence specification is not possible without resorting to some manual grammar transformation. Our approach provides a fully declarative solution to operator precedence specification for context-free grammars, is independent of any parsing technology, and is safe in that it guarantees that the language of the resulting grammar will be the same as the language of the specification grammar. We evaluate our new approach by specifying the precedence rules from the OCaml reference manual against the highly ambiguous reference grammar and validate the output of our generated parser.
  title = {Safe Specification of Operator Precedence Rules},
  author = {Ali Afroozeh and Mark van den Brand and Adrian Johnstone and Elizabeth Scott and Jurgen J. Vinju},
  booktitle = {International Conference on Software Language Engineering (SLE)},
  year = 2013,
  publisher = {Springer},
  series = {LNCS},


Anastasia Izmaylova and Jurgen J. Vinju. A Modular Language Parametric Framework for Type Constraint Based Refactorings. (DRAFT).

Refactoring tools are among the most desirable in the programmer's toolbox. Any refactoring tool, specific to a particular language and to a specific kind of refactoring, represents a considerable investment. At an increasing rate new languages are introduced, and new features are introduced to existing languages. The development of refactoring tools is forced to keep up with this evolution. The extension of a general purpose language like Java with generics is a good example that requires both adaptations to existing refactoring tools and the introduction of new refactoring tools specific to generics. We propose a modular language-parametric framework, called "TyMoRe" (TYpe-related MOdular REfactoring), for constraint-based type refactorings. It enables reuse between languages and reuse between different refactorings for the same language. The framework uses functional monadic composition to achieve the desired modularity and compositionality. The effectiveness of TyMoRe is demonstrated by our prototype of the "Infer Generic Type Arguments" refactoring for a large subset of Java.
This article is an unpublished draft.

Mark Hills, Paul Klint and Jurgen J. Vinju. An empirical study of PHP feature usage. Proceedings of the International Symposium on Software Testing and Analysis (ISSTA), July 2013, Lugano, Switzerland.

PHP is one of the most popular languages for server-side application development. The language is highly dynamic, providing programmers with a large amount of flexibility. However, these dynamic features also have a cost, making it difficult to apply traditional static analysis techniques used in standard code analysis and transformation tools. As part of our work on creating analysis tools for PHP, we have conducted a study over a significant corpus of open-source PHP systems, looking at the sizes of actual PHP programs, which features of PHP are actually used, how often dynamic features appear, and how distributed these features are across the files that make up a PHP website. We have also looked at whether uses of these dynamic features are truly dynamic or are, in some cases, statically understandable, allowing us to identify specific patterns of use which can then be taken into account to build more precise tools. We believe this work will be of interest to creators of analysis tools for PHP, and that the methodology we present can be leveraged for other dynamic languages with similar features.
  author = {Hills, Mark and Klint, Paul and Vinju, Jurgen J.},
  title = {An Empirical Study of PHP feature usage: a static analysis perspective},
  booktitle = {ISSTA},
  editor = {Pezz\`e, Mauro and Harman, Mark},
  pages = {325-335},
  publisher = {ACM},
  year = 2013,


Mark Hills, Paul Klint and Jurgen J. Vinju. Scripting a refactoring with Rascal and Eclipse. Proceedings of the Fifth Workshop on Refactoring Tools.

 author = {Hills, Mark and Klint, Paul and Vinju, Jurgen J.},
 title = {Scripting a refactoring with Rascal and Eclipse},
 booktitle = {Proceedings of the Fifth Workshop on Refactoring Tools},
 series = {WRT '12},
 year = {2012},
 pages = {40--49},
 publisher = {ACM},

Mark Hills, Paul Klint and Jurgen Vinju. Program Analysis Scenarios in Rascal. 9th International Workshop on Rewriting Logic and its Applications (WRLA 2012).

Rascal is a meta programming language focused on the implementation of domain-specific languages and on the rapid construction of tools for software analysis and software transformation. In this paper we focus on the use of Rascal for software analysis. We illustrate a range of scenarios for building new software analysis tools through a number of examples, including one showing integration with an existing Maude-based analysis. We then focus on ongoing work on alias analysis and type inference for PHP, showing how Rascal is being used, and sketching a hypothetical solution in Maude. We conclude with a high-level discussion on the commonalities and differences between Rascal and Maude when applied to program analysis.
  title = "Program Analysis Scenarios in Rascal",
  author = {Mark Hills and Paul Klint and Jurgen J. Vinju},
  booktitle = {9th International Workshop on Rewriting Logic and Its Applications (WRLA 2012)},
  note = {Invited Paper},
  series = {Lecture Notes in Computer Science},
  publisher = {Springer},
  year = 2012

Mark Hills, Paul Klint and Jurgen J. Vinju. Meta-Language Support for Type-Safe Access to External Resources. International Conference on Software Language Engineering (SLE).

Meta-programming applications often require access to heterogeneous sources of information, often from different technological spaces (grammars, models, ontologies, databases), that have specialized ways of defining their respective data schemas. Without direct language support, obtaining typed access to this external, potentially changing, information is a tedious and error-prone engineering task. The Rascal meta-programming language aims to support the import and manipulation of all of these kinds of data in a type-safe manner. The goal is to lower the engineering effort to build new meta programs that combine information about software in unforeseen ways. In this paper we describe built-in language support, so-called resources, for incorporating external sources of data and their corresponding data-types while maintaining type safety. We demonstrate the applicability of Rascal resources by example, showing resources for RSF files, CSV files, JDBC-accessible SQL databases, and SDF2 grammars. For RSF and CSV files this requires a type inference step, allowing the data in the files to be loaded in a type-safe manner without requiring the type to be declared in advance. For SQL and SDF2 a direct translation from their respective schema languages into Rascal is instead constructed, providing a faithful translation of the declared types or sorts into equivalent types in the Rascal type system. An overview of related work and a discussion conclude the paper.
  title = {Meta-Language Support for Type-Safe Access to External Resources},
  author = {Mark Hills and Paul Klint and Jurgen J. Vinju},
  booktitle = {International Conference on Software Language Engineering (SLE)},
  year = 2012,
  publisher = {Springer},
  series = {LNCS},

Jurgen J. Vinju and Michael W. Godfrey. What Does Control Flow Really Look Like? Eyeballing the Cyclomatic Complexity Metric. International Working Conference on Source Code Analysis and Manipulation.

Assessing the understandability of source code remains an elusive yet highly desirable goal for software developers and their managers. While many metrics have been suggested and investigated empirically, the McCabe cyclomatic complexity metric (CC), which is based on control flow complexity, seems to hold enduring fascination within both industry and the research community. However, the CC metric also has obvious limitations. For example, it is easy to produce example code that seems trivial to understand yet has a high CC value; at the same time, one can also produce "spaghetti" code with many GOTOs that has the same CC value as a well-structured alternative. In this work, we explore the causal relationship between CC and understandability through quantitative and qualitative studies, and through thought experiments and discussion. Empirically, we examine eight well-known open source Java systems by grouping the abstract control flow patterns of the methods into equivalence classes and exploring the results. We found several surprising results: first, the number of unique control flow patterns is relatively low; second, CC often does not accurately reflect the intricacies of Java control flow; and third, methods with high CC often have very low entropy, suggesting that they may be relatively easy to understand. These findings appear to challenge the widely-held belief that there is a clear-cut causal relationship between understandability and cyclomatic complexity, and suggest that at the very least CC and similar measures need to be reconsidered and refined if they are to be used as a metric for code understandability.
	Author = {Jurgen J. Vinju and Michael W. Godfrey},
	Title = {What does control flow really look like? Eyeballing the Cyclomatic Complexity Metric},
	Booktitle = {Ninth IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM)},
	Publisher = {IEEE Computer Society},
	Year = {2012},
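
The limitation described in the abstract above, that trivially readable code and tangled code can share one CC value, can be illustrated with a rough sketch. It approximates CC as the number of decision points plus one, and it works on Python source through the standard `ast` module rather than on the Java systems studied in the paper. The function name `cyclomatic_complexity`, the choice of which node types count as decisions, and the two sample snippets are all assumptions made for illustration.

```python
import ast

# Node types treated as one decision point each (an approximation).
_DECISION_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)


def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe CC of a code snippet: decision points + 1."""
    decisions = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _DECISION_NODES):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds one short-circuit branch per extra operand.
            decisions += len(node.values) - 1
    return decisions + 1


# A flat chain of independent checks: easy to read.
FLAT = '''
def describe(n):
    if n == 0: return "zero"
    if n == 1: return "one"
    if n == 2: return "two"
    if n == 3: return "three"
    return "many"
'''

# Nested control flow with a compound condition: harder to follow.
NESTED = '''
def tangled(a, b, c, d):
    if a:
        while b:
            if c and d:
                b -= 1
    return b
'''

print(cyclomatic_complexity(FLAT), cyclomatic_complexity(NESTED))
```

Both snippets contain four decision points, so this approximation assigns them the same value even though their control flow has a very different shape.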

Mark Hills, Paul Klint, Tijs van der Storm and Jurgen J. Vinju. A one-stop-shop for Software Evolution Tool Construction. ERCIM News 2012-88, 2012.

Real problems in software evolution render impossible a fixed, one-size-fits-all approach, and these problems are usually solved by gluing together various tools and languages. Such ad-hoc integration is cumbersome and costly. With the Rascal meta-programming language the Software Analysis and Transformation research group at CWI explores whether it is feasible to develop an approach that offers all necessary meta-programming and visualization techniques in a completely integrated language environment. We have applied Rascal with success in constructing domain specific languages and experimental refactoring and visualization tools.
  author    = {Mark Hills and
	       Paul Klint and
	       Tijs van der Storm and
	       Jurgen J. Vinju},
  title     = {A One-Stop-Shop for Software Evolution Tool Construction},
  journal   = {ERCIM News},
  volume    = {2012},
  number    = {88},
  year      = {2012},
  ee        = {http://ercim-news.ercim.eu/en88/special/a-one-stop-shop-for-software-evolution-tool-construction},
  bibsource = {DBLP, http://dblp.uni-trier.de}


Mark Hills, Paul Klint, and Jurgen J. Vinju. A case of visitor versus interpreter pattern. In Proceedings of the 49th International Conference on Objects, Models, Components and Patterns, TOOLS, 2011.

We compare the Visitor pattern with the Interpreter pattern, investigating a single case in point for the Java language. We have produced and compared two versions of an interpreter for a programming language. The first version makes use of the Visitor pattern. The second version was obtained by using an automated refactoring to transform uses of the Visitor pattern to uses of the Interpreter pattern. We compare these two nearly equivalent versions on their maintenance characteristics and execution efficiency. Using a tailored experimental research method we can highlight differences and the causes thereof. The contributions of this paper are that it isolates the choice between Visitor and Interpreter in a realistic software project and makes the difference experimentally observable.
  title = {A Case of Visitor versus Interpreter Pattern},
  author = {Mark Hills and Paul Klint and Jurgen J. Vinju},
  year = {2011},
  booktitle = {Proceedings of the 49th International Conference on Objects, Models, Components and Patterns},
  series = {TOOLS},

Jeroen van den Bos, Mark Hills, Paul Klint, Tijs van der Storm, and Jurgen J. Vinju. Rascal: From Algebraic Specification to Meta-Programming. AMMSE 2011, EPTCS Volume 56, pp 15-32, 2011.

Algebraic specification has a long tradition in bridging the gap between specification and programming by making specifications executable. Building on extensive experience in designing, implementing and using specification formalisms that are based on algebraic specification and term rewriting (namely Asf and Asf+Sdf), we are now focusing on using the best concepts from algebraic specification and integrating these into a new programming language: Rascal. This language is easy to learn by non-experts but is also scalable to very large meta-programming applications. We explain the algebraic roots of Rascal and its main application areas: software analysis, software transformation, and design and implementation of domain-specific languages. Some example applications in the domain of Model-Driven Engineering (MDE) are described to illustrate this.
  author    = "van den Bos, Jeroen and Hills, Mark and Klint, Paul and van der Storm, Tijs and Vinju, Jurgen J.",
  year      = "2011",
  title     = "Rascal: From Algebraic Specification to Meta-Programming",
  editor    = "Dur\'an, Francisco and Rusu, Vlad",
  booktitle = "Proceedings Second International Workshop on Algebraic Methods in Model-based Software Engineering (AMMSE)",
  series    = "Electronic Proceedings in Theoretical Computer Science",
  volume    = "56",
  publisher = "Open Publishing Association",
  pages     = "15-32",

Bas Basten, Paul Klint, and Jurgen Vinju. Ambiguity detection: Scaling to scannerless. In International Conference on Software Language Engineering (SLE), LNCS. Springer, 2011.

Static ambiguity detection would be an important aspect of language workbenches for textual software languages. The challenge is that automatic ambiguity detection of context-free grammars is undecidable. Sophisticated approximations and optimizations do exist, but these do not scale to grammars for so-called "scannerless parsers", as of yet. We extend previous work on ambiguity detection for context-free grammars to cover disambiguation techniques that are typical for scannerless parsing, such as longest match and reserved keywords. This paper contributes a new algorithm for ambiguity detection in character-level grammars, a prototype implementation of this algorithm and validation on several real grammars. The total run-time of ambiguity detection for character-level grammars for languages such as C and Java is dramatically reduced by several orders of magnitude, without loss of precision. The result is that ambiguity detection for realistic grammars can be done efficiently and may now become a tool in language workbenches.
  title = {Ambiguity Detection: Scaling to Scannerless},
  author = {Bas Basten and Paul Klint and Jurgen Vinju},
  booktitle = {International Conference on Software Language Engineering (SLE)},
  year = 2011,
  publisher = {Springer},
  series = {LNCS},

Bas Basten and Jurgen Vinju. Parse forest diagnostics with Dr. Ambiguity. In International Conference on Software Language Engineering (SLE), LNCS. Springer, 2011.

In this paper we propose and evaluate a method for locating causes of ambiguity in context-free grammars by automatic analysis of parse forests. A parse forest is the set of parse trees of an ambiguous sentence. Deducing causes of ambiguity from observing parse forests is hard for grammar engineers because of (a) the size of the parse forests, (b) the complex shape of parse forests, and (c) the diversity of causes of ambiguity.
We first analyze the diversity of ambiguities in grammars for programming languages and the diversity of solutions to these ambiguities. Then we introduce Dr. Ambiguity: a parse forest diagnostics tool that explains the causes of ambiguity by analyzing differences between parse trees and proposes solutions. We demonstrate its effectiveness using a small experiment with a grammar for Java 5.
  title = {Parse Forest Diagnostics with Dr. Ambiguity},
  author = {Bas Basten and Jurgen Vinju},
  booktitle = {International Conference on Software Language Engineering (SLE)},
  year = 2011,
  publisher = {Springer},
  series = {LNCS},

Mark Hills, Paul Klint, and Jurgen Vinju. RLSRunner: Linking Rascal with K for Program Analysis. In International Conference on Software Language Engineering (SLE), LNCS. Springer, 2011.

The Rascal meta-programming language provides a number of features supporting the development of program analysis tools. However, sometimes the analysis to be developed is already implemented by another system. In this case, Rascal can provide a useful front-end for this system, handling the parsing of the input program, any transformation (if needed) of this program into individual analysis tasks, and the display of the results generated by the analysis. In this paper we describe a tool, RLSRunner, which provides this integration with static analysis tools defined using the K framework, a rewriting-based framework for defining the semantics of programming languages.
  title = {RLSRunner: Linking Rascal with K for Program Analysis},
  author = {Mark Hills and Paul Klint and Jurgen Vinju},
  booktitle = {International Conference on Software Language Engineering (SLE)},
  year = 2011,
  publisher = {Springer},
  series = {LNCS},


Stijn de Gouw, Frank de Boer, and Jurgen Vinju. Prototyping a tool environment for run-time assertion checking in JML with communication histories. In 12th Workshop on Formal Techniques for Java-like Programs, 2010.

In this paper we present prototype tool-support for the run-time assertion checking of the Java Modeling Language (JML) extended with communication histories specified by attribute grammars. Our tool suite integrates Rascal, a meta-programming language, and ANTLR, a popular parser generator. Rascal instantiates a generic model of history updates for a given Java program annotated with history specifications. ANTLR is used for the actual evaluation of history assertions.
  Author = {Stijn de Gouw and Frank de Boer and Jurgen Vinju},
  Booktitle = {12th Workshop on Formal Techniques for Java-like Programs},
  Title = {Prototyping a tool environment for run-time assertion checking in JML with Communication Histories},
  Year = {2010}}

Diego Ordóñez Camacho, Kim Mens, Mark van den Brand, and Jurgen Vinju. Automated Generation of Program Translation and Verification Tools using Annotated Grammars. Science of Computer Programming, 72(1):3-20, jan 2010.

Automatically generating program translators from source and target language specifications is a non-trivial problem. In this paper we focus on the problem of automating the process of building translators between operations languages, a family of DSLs used to program satellite operations procedures. We exploit their similarities to semi-automatically build transformation tools between these DSLs. The input to our method is a collection of annotated context-free grammars. To simplify the overall translation process even more, we also propose an intermediate representation common to all operations languages. Finally, we discuss how to enrich our annotated grammars model with more advanced semantic annotations to provide a verification system for the translation process. We validate our approach by semi-automatically deriving translators between some real world operations languages, using the prototype tool which we implemented for that purpose.
	Title = {Automated Generation of Program Translation and Verification Tools using Annotated Grammars},
	Author = {Diego Ord\'o\~nez Camacho and Kim Mens and Mark van den Brand and Jurgen Vinju},
	Doi = {http://dx.doi.org/10.1016/j.scico.2009.10.003},
	Journal = {Science of Computer Programming},
	Publisher = {Elsevier},
	Month = {jan},
	Number = {1},
	Pages = {3-20},
	Volume = {72},
	Year = {2010},

Paul Klint, Tijs van der Storm, and Jurgen Vinju. On the Impact of DSL Tools on the Maintainability of Language Implementations. In Proceedings of the tenth workshop on Language Descriptions Tools and Applications, 2010.

Does the use of DSL tools improve the maintainability of language implementations compared to implementations from scratch? We present empirical results on aspects of maintainability of six implementations of the same DSL using different languages (Java, JavaScript, C#) and DSL tools (ANTLR, OMeta, Microsoft “M”). Our evaluation indicates that the maintainability of language implementations is indeed higher when constructed using DSL tools.
  Author = {Paul Klint and Tijs van der Storm and Jurgen Vinju},
  Booktitle = {Proceedings of the tenth workshop on Language Descriptions Tools and Applications (LDTA)},
  Title = {On the Impact of DSL Tools on the Maintainability of Language Implementations},
  Series = {Electronic Notes in Theoretical Computer Science},
  Publisher = {Elsevier},
  Year = {2010}

Vincent Lussenburg, Tijs van der Storm, Jurgen J. Vinju, and Jos Warmer. Mod4j: A Qualitative Case Study of Model-driven Software Development. In Dorina Petriu, Nicolas Rouquette, and Øystein Haugen, editors, Model Driven Engineering Languages and Systems, 13th International Conference, MODELS 2010, Oslo, Norway, October 3-8, 2010. Proceedings, Lecture Notes in Computer Science. Springer, 2010.

Model-driven software development (MDSD) has been on the rise over the past few years and is becoming more and more mature. However, evaluation in real-life industrial context is still scarce. In this paper, we present a case-study evaluating the applicability of a state-of-the-art MDSD tool, MOD4J, a suite of domain specific languages (DSLs) for developing administrative enterprise applications. MOD4J was used to partially rebuild an industrially representative application. This implementation was then compared to a base implementation based on elicited success criteria. Our evaluation leads to a number of recommendations to improve MOD4J. We conclude that having extension points for hand-written code is a good feature for a model driven software development environment.
	  Author = {Vincent Lussenburg and Tijs {van der Storm} and Jurgen J. Vinju and Jos Warmer},
	  Title = {Mod4J: A Qualitative Case Study of Model-Driven Software Development},
	  Booktitle = {Model Driven Engineering Languages and Systems, 13th International Conference, MODELS 2010, Oslo, Norway, October 3-8, 2010. Proceedings},
	  Editor = {Dorina Petriu and Nicolas Rouquette and {\O}ystein Haugen},
	  Publisher = {Springer},
	  Series = {Lecture Notes in Computer Science},
	  Year = {2010}

Bas Basten and Jurgen Vinju. Faster ambiguity detection by grammar filtering. In Claus Brabrand and Pierre-Etienne Moreau, editors, Proceedings of the tenth workshop on Language Descriptions Tools and Applications, 2010.

Real programming languages are often defined using ambiguous context-free grammars. Some ambiguity is intentional while other ambiguity is accidental. A good grammar development environment should therefore contain a static ambiguity checker to help the grammar engineer. Ambiguity of context-free grammars is an undecidable property. Nevertheless, various imperfect ambiguity checkers exist. Exhaustive methods are accurate, but suffer from non-termination. Termination is guaranteed by approximative methods, at the expense of accuracy. In this paper we combine an approximative method with an exhaustive method. We present an extension to the Noncanonical Unambiguity Test that identifies production rules that do not contribute to the ambiguity of a grammar and show how this information can be used to significantly reduce the search space of exhaustive methods. Our experimental evaluation on a number of real world grammars shows orders of magnitude gains in efficiency in some cases and negligible losses of efficiency in others.
	  Author = {Bas Basten and Jurgen Vinju},
	  Title = {Faster Ambiguity Detection by Grammar Filtering},
	  Booktitle = {Proceedings of the tenth workshop on Language Descriptions Tools and Applications},
	  Editor = {Claus Brabrand and Pierre-Etienne Moreau},
	  Publisher = {Elsevier Electronic Notes in Theoretical Computer Science},
	  Year = {2010}


Paul Klint, Tijs van der Storm, and Jurgen Vinju. EASY Meta-programming with Rascal. Leveraging the Extract-Analyze-Synthesize Paradigm for Meta-programming. In Proceedings of the 3rd International Summer School on Generative and Transformational Techniques in Software Engineering (GTTSE'09), LNCS. Springer, 2010.

    title = {EASY Meta-Programming with Rascal. Leveraging the Extract-Analyze-SYnthesize Paradigm for Meta-Programming},
    author = {Paul Klint and Tijs van der Storm and Jurgen J. Vinju},
    year = {2010},
    booktitle = {Proceedings of the 3rd International Summer School on Generative and Transformational Techniques in Software Engineering (GTTSE'09)},
    location = {Braga, Portugal},
    series = {LNCS},
    publisher = {Springer},

Paul Klint, Tijs van der Storm, and Jurgen J. Vinju. Rascal: A Domain Specific Language for Source Code Analysis and Manipulation. In Ninth IEEE International Working Conference on Source Code Analysis and Manipulation, SCAM 2009, Edmonton, Alberta, Canada, September 20-21, 2009, pages 168-177. IEEE Computer Society, 2009.

Many automated software engineering tools require tight integration of techniques for source code analysis and manipulation. State-of-the-art tools exist for both, but the domains have remained notoriously separate because different computational paradigms fit each domain best. This impedance mismatch hampers the development of each new problem solution since desired functionality and scalability can only be achieved by repeated, ad hoc, integration of different techniques. RASCAL is a domain-specific language that takes away most of this boilerplate by providing high-level integration of source code analysis and manipulation on the conceptual, syntactic, semantic and technical level. We give an overview of the language and assess its merits by implementing a complex refactoring.
	  Author = {Paul Klint and Tijs van der Storm and Jurgen J. Vinju},
	  Title = {RASCAL: A Domain Specific Language for Source Code Analysis and Manipulation},
	  Booktitle = {Ninth IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM)},
	  Doi = {http://doi.ieeecomputersociety.org/10.1109/SCAM.2009.28},
	  Isbn = {978-0-7695-3793-1},
	  Pages = {168-177},
	  Publisher = {IEEE Computer Society},
	  Year = {2009},

Paul Klint, Jurgen J. Vinju, and Tijs van der Storm. Language Design for Meta-programming in the Software Composition Domain. In Alexandre Bergel and Johan Fabry, editors, Software Composition, 8th International Conference, SC 2009, Zurich, Switzerland, July 2-3, 2009. Proceedings, volume 5634 of Lecture Notes in Computer Science, pages 1-4. Springer, 2009.

	  Author = {Paul Klint and Jurgen J. Vinju and Tijs van der Storm},
	  Title = {Language Design for Meta-programming in the Software Composition Domain},
	  Booktitle = {Software Composition},
	  Doi = {http://dx.doi.org/10.1007/978-3-642-02655-3_1},
	  Editor = {Alexandre Bergel and Johan Fabry},
	  Isbn = {978-3-642-02654-6},
	  Pages = {1-4},
	  Publisher = {Springer},
	  Series = {Lecture Notes in Computer Science},
	  Volume = {5634},
	  Year = {2009}

Giorgios Economopoulos, Paul Klint, and Jurgen J. Vinju. Faster scannerless GLR parsing. In Oege de Moor and Michael I. Schwartzbach, editors, Compiler Construction, 18th International Conference, CC 2009, York, UK, March 22-29, 2009. Proceedings, volume 5501 of Lecture Notes in Computer Science, pages 126-141. Springer, 2009.

Analysis and renovation of large software portfolios requires syntax analysis of multiple, usually embedded, languages and this is beyond the capabilities of many standard parsing techniques. The traditional separation between lexer and parser falls short due to the limitations of tokenization based on regular expressions when handling multiple lexical grammars. In such cases scannerless parsing provides a viable solution. It uses the power of context-free grammars to be able to deal with a wide variety of issues in parsing lexical syntax. However, it comes at the price of less efficiency. The structure of tokens is obtained using a more powerful but more time and memory intensive parsing algorithm. Scannerless grammars are also more non-deterministic than their tokenized counterparts, increasing the burden on the parsing algorithm even further. In this paper we investigate the application of the Right-Nulled Generalized LR parsing algorithm (RNGLR) to scannerless parsing. We adapt the Scannerless Generalized LR parsing and filtering algorithm (SGLR) to implement the optimizations of RNGLR. We present an updated parsing and filtering algorithm, called SRNGLR, and analyze its performance in comparison to SGLR on ambiguous grammars for the programming languages C, Java, Python, SASL, and C++. Measurements show that SRNGLR is on average 33% faster than SGLR, but is 95% faster on the highly ambiguous SASL grammar. For the mainstream languages C, C++, Java and Python the average speedup is 16%.
	  Author = {Giorgios R. Economopoulos and Paul Klint and Jurgen J. Vinju},
	  Title = {Faster Scannerless {GLR} Parsing},
	  Booktitle = {Compiler Construction (CC)},
	  Doi = {http://dx.doi.org/10.1007/978-3-642-00722-4_10},
	  Editor = {Oege de Moor and Michael I. Schwartzbach},
	  Isbn = {978-3-642-00721-7},
	  Pages = {126-141},
	  Publisher = {Springer},
	  Series = {Lecture Notes in Computer Science},
	  Volume = {5501},
	  Year = {2009},

Philippe Charles, Robert M. Fuhrer, Stanley M. Sutton Jr., Evelyn Duesterwald, and Jurgen Vinju. Accelerating the Creation of Customized, Language-specific IDEs in Eclipse. In Shail Arora and Gary T. Leavens, editors, Proceedings of the 24th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2009, October 25-29, 2009, Orlando, Florida, USA, pages 191-206, 2009.

Full-featured integrated development environments have become critical to the adoption of new programming languages. Key to the success of these IDEs is the provision of services tailored to the languages. However, modern IDEs are large and complex, and the cost of constructing one from scratch can be prohibitive. Generators that work from language specifications reduce costs but produce environments that do not fully reflect distinctive language characteristics. We believe that there is a practical middle ground between these extremes that can be effectively addressed by an open, semi-automated strategy to IDE development. This strategy is to reduce the burden of IDE development as much as possible, especially for internal IDE details, while opening opportunities for significant customizations to IDE services. To reduce the effort needed for customization we provide a combination of frameworks, templates, and generators. We demonstrate an extensible IDE architecture that embodies this strategy, and we show that this architecture can be used to produce customized IDEs, with a moderate amount of effort, for a variety of interesting languages.
	  Author = {Philippe Charles and Robert M. Fuhrer and Stanley M. Sutton Jr. and Evelyn Duesterwald and Jurgen Vinju},
	  Title = {Accelerating the Creation of Customized, Language-Specific IDEs in Eclipse},
	  Booktitle = {Proceedings of the 24th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA)},
	  Editor = {Shail Arora and Gary T. Leavens},
	  Pages = {191-206},
	  Year = {2009}


Paul Klint, Taeke Kooiker, and Jurgen J. Vinju. Language Parametric Module Management for IDEs. Electronic Notes in Theoretical Computer Science, 203(2):3-19, 2008.

An integrated development environment (IDE) monitors all the changes that a user makes to source code modules and responds accordingly by flagging errors, by re-parsing, by rechecking, or by recompiling modules and by adjusting visualizations or other information derived from a module. A module manager is the central component of the IDE that is responsible for this behavior. Although the overall functionality of a module manager in a given IDE is fixed, its actual behavior strongly depends on the programming languages it has to support. What is a module? How do modules depend on each other? What is the effect of a change to a module? We propose a concise design for a language parametric module manager: a module manager that is parameterized with the module behavior of a specific language. We describe the design of our module manager and discuss some of its properties. We also report on the application of the module manager in the construction of IDEs for the specification language ASF+SDF as well as for Java. Our overall goal is the rapid development (generation) of IDEs for programming languages and domain specific languages. The module manager presented here represents a next step in the creation of such generic language workbenches.
    title = {Language Parametric Module Management for IDEs},
    author = {Paul Klint and Taeke Kooiker and Jurgen J. Vinju},
    year = {2008},
    doi = {http://dx.doi.org/10.1016/j.entcs.2008.03.041},
    tags = {programming languages, SDF, code generation, language design, programming, Meta-Environment, ASF+SDF, Java, IDE, generic programming},
    journal = {Electronic Notes in Theoretical Computer Science},
    volume = {203},
    number = {2},
    pages = {3-19},


Jurgen J. Vinju. Annotated parse trees for a language parametric IDE. In PLIDE, November 2007.


M.G.J. van den Brand, M. Bruntink, G.R. Economopoulos, H.A. de Jong, P. Klint, T. Kooiker, T. van der Storm, and Jurgen J. Vinju. Using The Meta-environment for Maintenance and Renovation. In Proceedings of the Conference on Software Maintenance and Reengineering (CSMR'07). IEEE Computer Society Press, 2007.

The Meta-Environment is a flexible framework for language development, source code analysis and source code transformation. We highlight new features and demonstrate how the system supports key functionalities for software evolution: fact extraction, software analysis, visualization, and software transformation.
    author = {M.G.J. van den Brand and M. Bruntink and G.R. Economopoulos and H.A. de Jong and P. Klint and T. Kooiker and T. van der Storm and J.J. Vinju},
    title = {Using {T}he {M}eta-Environment for {M}aintenance and {R}enovation},
    booktitle = {Proceedings of the 11th European Conference on Software Maintenance and Reengineering ({CSMR'07})},
    pages = {331--332},
    year = {2007},
    publisher = {IEEE Computer Society Press}



Jurgen J. Vinju and J.R. Cordy. How to make a bridge between transformation and analysis technologies? In J.R. Cordy, R. Lämmel, and A. Winter, editors, Transformation Techniques in Software Engineering, number 05161 in Dagstuhl Seminar Proceedings. Internationales Begegnungs- und Forschungszentrum (IBFI), Schloss Dagstuhl, Germany, 2006.

	  Author = {J.J. Vinju and J.R. Cordy},
	  Booktitle = {Transformation Techniques in Software Engineering},
	  Editor = {J.R. Cordy and R. L{\"a}mmel and A. Winter},
	  Issn = {1862-4405},
	  Number = {05161},
	  Publisher = {Internationales Begegnungs- und Forschungszentrum (IBFI), Schloss Dagstuhl, Germany},
	  Series = {Dagstuhl Seminar Proceedings},
	  Title = {How to make a bridge between transformation and analysis technologies?},
	  Year = {2006}

Diego Ordóñez Camacho, Kim Mens, Mark van den Brand, and Jurgen J. Vinju. Automated derivation of translators from annotated grammars. In Language Descriptions Tools and Applications, ENTCS, pages 121-137, 2006.


M.G.J. van den Brand, A.T. Kooiker, Jurgen J. Vinju, and N.P. Veerman. A Language Independent Framework for Context-sensitive Formatting. In CSMR '06: Proceedings of the Conference on Software Maintenance and Reengineering, pages 103-112, Washington, DC, USA, 2006. IEEE Computer Society Press.

Automated formatting is an important technique for the software maintainer. It is either applied separately to improve the readability of source code, or as part of a source code transformation tool chain. In this paper we report on the application of generic tools for constructing formatters. In an industrial setting automated formatters need to be tailored to the requirements of the customer. The (legacy) programming language or dialect and the corporate formatting conventions are specific and non-negotiable. Can generic formatting tools deal with such unexpected requirements? Driven by an industrial case of nearly 80 thousand lines of Cobol code, several limitations in existing formatting technology have been addressed. We improved its flexibility by replacing a generative phase by a generic tool, and we added a little expressiveness to the formatting back end. Most importantly, we employed a multi-stage formatting framework that can cope with any kind of formatting convention using more computational power.
  title={A language independent framework for context-sensitive formatting},
  author={van den Brand, Mark GJ and Kooiker, A Taeke and Vinju, Jurgen J and Veerman, Niels P},
  booktitle={Software Maintenance and Reengineering, 2006. CSMR 2006. Proceedings of the 10th European Conference on},

Jurgen J. Vinju. UPTR: a simple parse tree representation format. In Software Transformation Systems Workshop, October 2006.



J.J. Vinju. Analysis and Transformation of Source Code by Parsing and Rewriting. PhD thesis, Universiteit van Amsterdam, November 2005.

In this thesis the subject of study is source code. More precisely, I am interested in tools that help in describing, analyzing and transforming source code. The overall question is how well qualified and versatile the programming language ASF+SDF is when applied to source code analysis and transformation. The main technical issues that are addressed are ambiguity of context-free languages and improving two important quality attributes of analyses and transformations: conciseness and fidelity. The overall result of this research is a version of the language that is better tuned to the domain of source code analysis and transformation, but is still firmly grounded on the original: a hybrid of context-free grammars and term rewriting. The results that are presented have a broad technical spectrum because they cover the entire scope of ASF+SDF. They include disambiguation by filtering parse forests, the type-safe automation of tree traversal for conciseness, improvements in language design resulting in higher resolution and fidelity, and better interfacing with other programming environments. Each solution has been validated in practice, by me and by others, mostly in the context of industrial sized case studies. In this introductory chapter we first set the stage by sketching the objectives and requirements of computer aided software engineering. Then the technological background of this thesis is introduced: generic language technology and ASF+SDF. We zoom in on two particular technologies: parsing and term rewriting. We identify research questions as we go and summarize them at the end of this chapter.
	  Author = {J.J. Vinju},
	  Month = nov,
	  Supervisor = {Paul Klint and {Mark van} den Brand},
	  School = {Universiteit van Amsterdam},
	  Title = {Analysis and Transformation of Source Code by Parsing and Rewriting},
	  Year = {2005}}

M.G.J. van den Brand, A.T. Kooiker, N.P. Veerman, and Jurgen J. Vinju. An industrial application of context-sensitive formatting. In International Conference on Software Maintenance, 2005.


M. Bravenboer, R. Vermaas, Jurgen J. Vinju, and E. Visser. Generalized type-based disambiguation of meta programs with concrete object syntax. In Generative Programming and Component Engineering (GPCE), 2005.

In meta programming with concrete object syntax, object-level programs are composed from fragments written in concrete syntax. The use of small program fragments in such quotations and the use of meta-level expressions within these fragments (anti-quotation) often leads to ambiguities. This problem is usually solved through explicit disambiguation, resulting in considerable syntactic overhead. A few systems manage to reduce this overhead by using type information during parsing. Since this is hard to achieve with traditional parsing technology, these systems provide specific combinations of meta and object languages, and their implementations are difficult to reuse. In this paper, we generalize these approaches and present a language independent method for introducing concrete object syntax without explicit disambiguation. The method uses scannerless generalized-LR parsing to parse meta programs with embedded object-level fragments, which produces a forest of all possible parses. This forest is reduced to a tree by a disambiguating type checker for the meta language. To validate our method we have developed embeddings of several object languages in Java, including AspectJ and Java itself.
	  Author = {M. Bravenboer and R. Vermaas and J.J. Vinju and E. Visser},
	  Booktitle = {Generative Programming and Component Engineering (GPCE)},
	  Title = {Generalized Type-Based Disambiguation of Meta Programs with Concrete Object Syntax},
	  Year = {2005}

M.G.J. van den Brand, B. Cornelissen, P.A. Olivier, and J.J. Vinju. TIDE: a Generic Debugging Framework. In J. Boyland and G. Hedin, editors, Language Design Tools and Applications, June 2005.

A language specific interactive debugger is one of the tools that we expect in any mature programming environment. We present applications of TIDE: a generic debugging framework that is related to the ASF+SDF Meta-Environment. TIDE can be applied to different levels of debugging that occur in language design. Firstly, TIDE was used to obtain a full-fledged debugger for language specifications based on term rewriting. Secondly, TIDE can be instantiated for any other programming language, including but not limited to domain specific languages that are defined and implemented using ASF+SDF. We demonstrate the common debugging interface, and indicate the amount of effort needed to instantiate new debuggers based on TIDE.
	  Author = {Brand, {M.G.J. van den} and B. Cornelissen and Olivier, P.A. and Vinju, J.J.},
	  Booktitle = {Language Design Tools and Applications},
	  Series = {Electronic Notes in Theoretical Computer Science},
	  Publisher = {Elsevier},
	  Editor = {J. Boyland and G. Hedin},
	  Month = jun,
	  Title = {TIDE: a generic debugging framework},
	  Year = 2005

M.G.J. van den Brand, P.E. Moreau, and Jurgen J. Vinju. A Generator of Efficient Strongly Typed Abstract Syntax Trees in Java. IEE Proceedings-Software, 2005.

Abstract syntax trees are a very common data-structure in language related tools. For example compilers, interpreters, documentation generators, and syntax-directed editors use them extensively to extract, transform, store and produce information that is key to their functionality. We present a Java back-end for ApiGen, a tool that generates implementations of abstract syntax trees. The generated code is characterized by strong typing combined with a generic interface and maximal sub-term sharing for memory efficiency and fast equality checking. The goal of this tool is to obtain safe and more efficient programming interfaces for abstract syntax trees. The contribution of this work is the combination of generating a strongly typed data-structure with maximal sub-term sharing in Java. Practical experience shows that this approach is beneficial for extremely large as well as smaller data types.
      title = {A Generator of Efficient Strongly Typed Abstract Syntax Trees in Java},
      author = {Van Den Brand, Mark and Moreau, Pierre-Etienne and Vinju, Jurgen},
      publisher = {IEEE},
      pages = {70--87},
      journal = {IEE Proceedings - Software Engineering},
      volume = {152},
      number = {2},
      year = {2005},

Jurgen J. Vinju. Type-driven automatic quotation of concrete object code in meta programs. In N. Guelfi and A. Savidis, editors, Rapid Integration of Software Engineering Techniques, volume 3943 of LNCS, 2005.

Meta programming can be facilitated by the ability to represent program fragments in concrete syntax instead of abstract syntax. The resulting meta programs are more self-documenting. One caveat in concrete meta programming is the syntactic separation between the meta language and the object language. To solve this problem, many meta programming systems use quoting and anti-quoting to indicate precisely where level switches occur. These “syntactic hedges” can obfuscate the concrete program fragments. This paper describes an algorithm for inferring quotes, such that the meta programmer no longer needs to explicitly indicate transitions between the meta and object languages.
    title = {Type-Driven Automatic Quotation of Concrete Object Code in Meta Programs},
    author = {Jurgen J. Vinju},
    year = {2005},
    pages = {97-112},
    booktitle = {Rapid Integration of Software Engineering Techniques, Second International Workshop, RISE 2005, Heraklion, Crete, Greece, September 8-9, 2005, Revised Selected Papers},
    editor = {Nicolas Guelfi and Anthony Savidis},
    volume = {3943},
    series = {Lecture Notes in Computer Science},
    publisher = {Springer},
    isbn = {3-540-34063-7},

Jurgen J. Vinju, Paul Klint, and Tijs van der Storm. Term Rewriting Meets Aspect Oriented Programming. In Aart Middeldorp, Vincent van Oostrom, Femke van Raamsdonk, and Roel C. de Vrijer, editors, Processes, Terms and Cycles: Steps on the Road to Infinity, Essays Dedicated to Jan Willem Klop, on the Occasion of His 60th Birthday, volume 3838 of Lecture Notes in Computer Science. Springer, 2005.



M.G.J. van den Brand and J.J. Vinju. Generation by Transformation in ASF+SDF. In GPCE Workshop on Software Transformation Systems (STS), 2004.



M.G.J. van den Brand, P. Klint, and J.J. Vinju. Term Rewriting with Traversal Functions. ACM Transactions on Software Engineering and Methodology (TOSEM), 12(2):152-190, 2003.

Term rewriting is an appealing technique for performing program analysis and program transformation. Tree (term) traversal is frequently used but is not supported by standard term rewriting. We extend many-sorted, first-order term rewriting with traversal functions that automate tree traversal in a simple and type safe way. Traversal functions can be bottom-up or top-down traversals and can either traverse all nodes in a tree or can stop the traversal at a certain depth as soon as a matching node is found. They can either define sort preserving transformations or mappings to a fixed sort. We give small and somewhat larger examples of traversal functions and describe their operational semantics and implementation. An assessment of various applications and a discussion conclude the paper.
  author = {van den Brand, Mark and Klint, Paul and Vinju, Jurgen J.},
  journal = {ACM Trans. Softw. Eng. Methodol.},
  number = 2,
  pages = {152-190},
  title = {Term rewriting with traversal functions},
  volume = 12,
  year = 2003

M.G.J. van den Brand, S. Klusener, L. Moonen, and Jurgen J. Vinju. Generalized Parsing and Term Rewriting - Semantics Directed Disambiguation. In Barret Bryant and João Saraiva, editors, Third Workshop on Language Descriptions Tools and Applications, Electronic Notes in Theoretical Computer Science, 2003.

Generalized parsing technology provides the power and flexibility to attack real-world parsing applications. However, many programming languages have syntactical ambiguities that can only be solved using semantical analysis. In this paper we propose to apply the paradigm of term rewriting to filter ambiguities based on semantical information. We start with the definition of a representation of ambiguous derivations. Then we extend term rewriting with means to handle such derivations. Finally, we apply these tools to some real world examples, namely C and COBOL. The resulting architecture is simple and efficient as compared to semantic directed parsing.
	  Author = {Brand, {M.G.J. van den} and Klusener, S. and Moonen, L. and Vinju, J.J.},
	  Title = { {G}eneralized {P}arsing and {T}erm {R}ewriting - {S}emantics {D}irected {D}isambiguation},
	  Booktitle = {Third Workshop on Language Descriptions Tools and Applications},
	  Editor = {Barret Bryant and Jo{\~a}o Saraiva},
	  Series = {Electronic Notes in Theoretical Computer Science},
	  Publisher = {Elsevier},
	  Year = 2003

M.G.J. van den Brand, P.E. Moreau, and Jurgen J. Vinju. Environments for Term Rewriting Engines for Free! In R. Nieuwenhuis, editor, Proceedings of the 14th International Conference on Rewriting Techniques and Applications (RTA'03). Springer-Verlag, 2003.



M.G.J. van den Brand, P. Klint, and Jurgen J. Vinju. Term Rewriting with Type-safe Traversal Functions. In B. Gramlich and S. Lucas, editors, Second International Workshop on Reduction Strategies in Rewriting and Programming (WRS 2002), volume 70 of Electronic Notes in Theoretical Computer Science. Elsevier Science Publishers, 2002.


M.G.J. van den Brand, J. Scheerder, Jurgen J. Vinju, and E. Visser. Disambiguation Filters for Scannerless Generalized LR Parsers. In R. Nigel Horspool, editor, Compiler Construction, volume 2304 of LNCS, pages 143-158. Springer-Verlag, 2002.

In this paper we present the fusion of generalized LR parsing and scannerless parsing. This combination supports syntax definitions in which all aspects (lexical and context-free) of the syntax of a language are defined explicitly in one formalism. Furthermore, there are no restrictions on the class of grammars, thus allowing a natural syntax tree structure. Ambiguities that arise through the use of unrestricted grammars are handled by explicit disambiguation constructs, instead of implicit defaults that are taken by traditional scanner and parser generators. Hence, a syntax definition becomes a full declarative description of a language. Scannerless generalized LR parsing is a viable technique that has been applied in various industrial and academic projects.


M.G.J. van den Brand, A. van Deursen, J. Heering, H.A. de Jong, M. de Jonge, T. Kuipers, P. Klint, L. Moonen, P. A. Olivier, J. Scheerder, Jurgen J. Vinju, E. Visser, and J. Visser. The ASF+SDF Meta-Environment: a Component-Based Language Development Environment. In R. Wilhelm, editor, CC'01, volume 2027 of LNCS, pages 365-370. Springer-Verlag, 2001.

The ASF+SDF Meta-Environment is an interactive development environment for the automatic generation of interactive systems for constructing language definitions and generating tools for them. Over the years, this system has been used in a variety of academic and commercial projects ranging from formal program manipulation to conversion of COBOL systems. Since the existing implementation of the Meta-Environment started exhibiting more and more characteristics of a legacy system, we decided to build a completely new, component-based, version. We demonstrate this new system and stress its open architecture.
author =     {Brand, {M.G.J. van den} and Deursen, {A. van} and J. Heering and
              Jong, {H.A. de} and Jonge, {M. de} and T. Kuipers and
              P. Klint and L. Moonen and P.A. Olivier and J. Scheerder and
              J.J. Vinju and E. Visser and J. Visser},
title =     {The {ASF}+{SDF} {M}eta-{E}nvironment: a {C}omponent-{B}ased
             {L}anguage {D}evelopment {E}nvironment},
editor =    {R. Wilhelm},
series =    {Lecture Notes in Computer Science},
Volume =    {2027},
booktitle = {Compiler Construction (CC '01)},
year =      {2001},
pages =	    {365--370},
publisher = {Springer-Verlag}


M.G.J. van den Brand and Jurgen J. Vinju. Rewriting with Layout. In Claude Kirchner and Nachum Dershowitz, editors, Proceedings of RULE2000, 2000.

Rewriting technology has proved to be an adequate and powerful mechanism to perform source code transformations. These transformations can not only be efficiently implemented using rewriting technology, but it also provides a firmer grip on the source code syntax. However, an important shortcoming of rewriting technology is that source code comments and layout are lost during rewriting. We propose "rewriting with layout" to solve this problem. We present a rewriting algorithm that keeps the layout of sub-terms that are not rewritten, and reuses the layout occurring in the right-hand side of the rewrite rules.


J.J. Vinju. Optimizations of List Matching in the ASF+SDF compiler. Master's thesis, University of Amsterdam, September 1999.


Researchers in software engineering should do two things: publish their research prototypes as open-source software, and immerse themselves in the activity of software engineering. The reason for the first is that software lends itself perfectly to sharing, especially if it is government-funded software. There is no excuse not to do this. The reason for the second is that software engineering is so wickedly complex and rapidly evolving that, without doing it yourself, it is easy to misunderstand what the problems are or to fail to recognize good solutions.

I am contributing or have contributed to the following projects:

This is a selection of presentation slides.


OSSMETER Pitch EU Concertation Meeting - Turning cloud research into innovative software & services, March 25th 2015, Brussels, Belgium.

Software Engineering: The War Against Complexity, Keynote Open Tool Demonstrations Day Cha-Q project; Change-centric Software Engineering, Antwerp University, February 24, 2015, Antwerp, Belgium.

Public/Private Collaboration {in,for,with} Software Engineering EARMA conference, Leiden, June 30th, The Netherlands

Challenges and Opportunities of Big Software-based Innovation NWO Big Software Match Making Day, July 1st, 2015, Utrecht, The Netherlands.


SEN Symposium Introduction, December 3, 2014, CWI, Amsterdam.

Optimizing Hash-tries for Fast and Lean Immutable Collection Libraries, IFIP WG2.4, Stellenbosch, SA.

Software Research at CWI, Breakfast Meeting Amsterdam Economic Board, August 28, 2014, Amsterdam.


M3: an open model for measuring source code artifacts, December 17, BENEVOL in Mons, Belgium.

CWI SWAT & Rascal, November 14th, 2013, NWO Special Interest Group Software Engineering, Nikhef, Amsterdam, The Netherlands.

Introducing SLE 2014 in Vasteras, Sweden

Debugging and all that for Master Software Engineering, May 2nd, Centrum Wiskunde & Informatica,

Slides on Modularity for Bachelor Computer Science, Jan 13th, Universiteit van Amsterdam, The Netherlands.

Software Analysis and Transformation with Rascal, Jan 11th, 2013, BioAssist Meeting, Utrecht, The Netherlands.


Introduction to Rascal and Eyeballing the Cyclomatic Complexity Metric May 11th, 2012, INRIA Lille Software Engineering.

Constructing specialist software tools using Rascal: Metrics. April 24 2012, Sogyo

The mechanics of building a DSL using Rascal. April 17th 2012, IPA Spring Days, Gelderen

Professional Feedback. March 29th, 2012, CSMR Doctoral Symposium, Szeged (Hungary).


A case of visitor versus interpreter pattern. June 30th, 2011. Zurich. TOOLS conference. This presentation explains our paper comparing how the choice between these two functionally interchangeable design patterns affects the maintainability of an AST-based language interpreter.


UPTR: a universal parse tree representation format (relevant to Parsing@SLE audience about how to compare parsers), Software Transformation Systems Workshop, Vancouver.


Realities of Scientific Software Engineering (an old presentation on software development in an academic environment, as presented to the researchers of the Proteo group in INRIA-LORIA, Nancy, France)

PhD theses

Master's theses

  1. Iwan Flameling (2015), An automatic CSRF protection tool.
  2. Omar Pakker (2015), Graph-Based Querying On top of the Entity Framework
  3. Maria Gouseti (2014), A General Framework for Concurrency Aware Refactorings
  4. Arie van der Veek (2013), Coupling as a trade-off in an Enterprise Service Bus
  5. Peter Klijn (2013), How accurately do Java profilers predict runtime performance bottlenecks?
  6. Richard Bos (2013), Finding lightweight opportunities for parallelism in .NET C#
  7. Vlad Lep (2013), Noise detection in software engineering datasets using Gaussian Processes (Magiel Bruntink, co-supervisor)
  8. Ioana Rucareanu (2013), PHP: Securing Against SQL Injection (Mark Hills, co-supervisor)
  9. Henk Bosman (2013), Predicting bugs and issues with automated code reviews
  10. Dimitrios Kyritsis (2013), PHP re-factoring: HTML templates (Mark Hills co-supervisor)
  11. Chris Mulder (2013), Reducing Dynamic Feature Usage in PHP Code (Mark Hills co-supervisor)
  12. Christos K. Tsigkanos (2013), Stateful discovery of attack manifestations on networks and systems
  13. Koen G.L. Hanselman (2013) Detection of the Abstract Factory Pattern: an experimental study
  14. Vladimir Komsiyski (2013), Binary Differencing for Media Files
  15. Hans van Bakel (2012) Reducing coupling to lower maintenance effort
  16. Jorge Nicolas Barrionuevo (2012) The Core of Open Source Systems
  17. Pieter Bregman (2012) Onderhoudbaarheid vs. betrouwbaarheid "een case study"
  18. Dennis van Leeuwen (2012) Comprehensible Method Names: Focusing on the Nouns
  19. Ashim Shahi (2012) Classifying the classifiers for file fragment classification (Jeroen van den Bos co-supervisor)
  20. Luuk Stevens (2012) Automatically Analyzing the Consistency and Preciseness of Class Names
  21. Jouke Stoel (2012) Exploring the Detection of Method Naming Anomalies
  22. Aart van den Dolder (2011), Bepaling van de geschiktheid van Oracle Forms applicaties voor inbeheername door middel van automatische code review volgens het SIG Maintainability model
  23. Randy Fluit (2011) Differencing Context-free Grammars (Tijs van der Storm co-supervisor)
  24. Rob van der Horst (2011), The Influence of First-Class Relations on Coupling and Cohesion : A Case Study
  25. Marvin Jacobsz (2011), Een performance analyse van "Hiphop for PHP"
  26. Christian Köppe (2011), DoKRe - A Method for Automated Domain Knowledge Recovery from Source Code
  27. Jeroen Bach (2010), Theory and experimental evaluation of object-relational mapping optimization techniques : How to ORM and how not to ORM
  28. Steven Raemaekers (2010), Testing Semantic Clone Detection Candidates
  29. Nico Schoenmaker (2010), Over de understandability van subtype polymorfisme in objectgeoriënteerde systemen
  30. Waruzjan Shahbazian (2010), Rminer: An integrated model for repository mining using Rascal : A feasibility study
  31. Sander Vellinga (2010), Identifying behavior changes after PHP language migration using static source-code analysis
  32. David Walschots (2010), A case study on the cost and benefits of bus-oriented architectures
  33. Maarten Wullink (2010), Data model Maintainability : A comparative study of maintainability metrics
  34. Jeldert Pol (2009), Extreme Team Collaboration : Synchronous collaboration in Eclipse (Paul Klint supervisor)
  36. Vincent Lussenburg (2009), Mod4j : A qualitative case study of industrially applied model-driven software development
  37. Karel Pieterson (2009), Leerbaarheid van Programmeertalen
  38. Arend van Beelen (2008), Distributed Database Design for Social Network Graphs
  39. Jan Derriks (2007) Fortran grammatica-extractie (Paul Klint co-supervisor)
  40. Anton Gerdessen (2007) Framework comparison method. Comparing two frameworks based on technical domains, focussing on customisability and modifiability
  41. Ricardo Lindooren (2007) Testability of Dependency injection. An attempt to find out how the testability of source code is affected when the dependency injection principle is applied to it.
  42. Arjen van Schie (2007) Programming for a parallel future. "Improving the modularity and encapsulation for the implementation of concurrency concerns."
  43. Ron Valkering (2007) Syntax Error Handling in Scannerless Generalized LR Parsers
  44. Renze de Vries (2007) Service Oriented Architecture Degradatie onderhoudbaarheid referentiearchitectuur
  45. Paul Bakker (2006) The Framework Productivity Measurement Method. Meten van de productiviteitwinst bij het gebruik van een webframework
  46. Sannie Kwakman (2006) Variability through Aspect Oriented Programming in J2ME game development (confidential)
  47. Maarten Pater (2006) Searching in public protein databases for novel Peroxisomal PTS1 containing Proteins (confidential) (Jan van Eijck co-supervisor)
  48. Bart den Haak (2006) Dynamic configurable web visualization of complex data relations
  49. Tim Prijn (2006) Framework Software Quality Analysis: A Case Study Analyzing the software quality supported by a J2EE meta-framework
  50. Julien Rentrop (2006) Software Metrics as Benchmarks for Source Code Quality of Software
  51. Youri op 't Roodt (2006) The effect of Ajax on performance and usability in web environments
  52. Said Lakhloufi (2004) JFC/Swing Editor voor ASF+SDF Meta-Environment (Mark van den Brand, Taeke Kooiker, Hayco de Jong, Paul Klint co-supervisors)
  53. Bas Cornelissen (2004), Using TIDE to Debug ASF+SDF at Multiple Levels (Paul Klint co-supervisor)

Bachelor's theses

  1. Elephtera Hendriks (2009), Parsing macros without the pre-processor



 Email Jurgen.Vinju@cwi.nl
 Snailmail Science Park 123
P.O. Box 95079
 Visit Science Park 123
Room L221
 Phone +31205924102
 Skype skype://jurgen.vinju
 LinkedIn http://nl.linkedin.com/in/jurgenvinju
 Twitter http://www.twitter.com/jurgenvinju 
 Facebook http://www.facebook.com/jurgen.vinju
 Researchr http://researchr.org/profile/jurgenjvinju
 ResearchGate https://www.researchgate.net/profile/Jurgen_Vinju/

profile for jurgenv at Stack Overflow, Q&A for professional and enthusiast programmers