Davy Landman

PhD student

SWAT, CWI, Netherlands

Data for ICSME 2014

Publication

  • Davy Landman, Alexander Serebrenik and Jurgen Vinju, Empirical analysis of the relationship between CC and SLOC in a large corpus of Java methods, 30th IEEE International Conference on Software Maintenance and Evolution, ICSME 2014, 2014.

    Measuring the internal quality of source code is one of the traditional goals of making software development into an engineering discipline. Cyclomatic Complexity (CC) is an often used source code quality metric, next to Source Lines of Code (SLOC). However, the use of the CC metric is challenged by the repeated claim that CC is redundant with respect to SLOC due to strong linear correlation.

    We test this claim by studying a corpus of 17.8M methods in 13K open-source Java projects. Our results show that direct linear correlation between SLOC and CC is only moderate, as caused by high variance. We observe that aggregating CC and SLOC over larger units of code improves the correlation, which explains reported results of strong linear correlation in literature. We suggest that the primary cause of correlation is the aggregation.

    Our conclusion is that there is no strong linear correlation between CC and SLOC of Java methods, so we do not conclude that CC is redundant with SLOC. This conclusion contradicts earlier claims from literature, but concurs with the widely accepted practice of measuring CC next to SLOC.

    @INPROCEEDINGS{Landman2014,
      author = { Davy Landman and Alexander Serebrenik and Jurgen Vinju },
      title = { {Empirical analysis of the relationship between CC and SLOC in a large corpus of Java methods} },
      booktitle = { 30th IEEE International Conference on Software Maintenance and
      Evolution, ICSME 2014 },
      year = { 2014 },
      datalink = { http://homepages.cwi.nl/~landman/icsme2014/ },
      abstract = { Measuring the internal quality of source code is one of the traditional
          goals of making software development into an engineering discipline.
          Cyclomatic Complexity (CC) is an often used source code quality metric,
          next to Source Lines of Code (SLOC). However, the use of the CC metric is
          challenged by the repeated claim that CC is redundant with respect to SLOC
          due to strong linear correlation.
    
          We test this claim by studying a corpus of 17.8M methods in 13K
          open-source Java projects. Our results show that direct linear correlation
          between SLOC and CC is only moderate, as caused by high variance. We
          observe that aggregating CC and SLOC over larger units of code improves
          the correlation, which explains reported results of strong linear
          correlation in literature. We suggest that the primary cause of
          correlation is the aggregation.
    
          Our conclusion is that there is no strong linear correlation between CC
          and SLOC of Java methods, so we do not conclude that CC is redundant with
          SLOC. This conclusion contradicts earlier claims from literature, but
          concurs with the widely accepted practice of measuring CC next to SLOC.
       }}
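
The aggregation effect described in the abstract can be illustrated with a minimal sketch on synthetic data. This is not the analysis from the paper and it does not use the published corpus. The way methods are generated here (SLOC between 1 and 30, a per-method branch density between 0 and 0.4, between 1 and 50 methods per file) is an assumption made purely for illustration, and the names `synthetic_method` and `simulate` are hypothetical. The sketch computes Pearson's correlation between SLOC and CC once per method and once after summing both metrics per file.

```python
import math
import random


def pearson(xs, ys):
    """Pearson's linear correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def synthetic_method(rng):
    """Return (sloc, cc) for one made-up method; NOT drawn from the real corpus."""
    sloc = rng.randint(1, 30)
    # Assumption: branch density differs widely between methods, which
    # gives CC a high variance for any given SLOC.
    branch_density = rng.uniform(0.0, 0.4)
    branches = sum(rng.random() < branch_density for _ in range(sloc))
    return sloc, 1 + branches  # CC = number of decision points + 1


def simulate(num_files=1000, seed=2014):
    """Return (method-level r, file-level r) for a synthetic corpus."""
    rng = random.Random(seed)
    method_sloc, method_cc = [], []
    file_sloc, file_cc = [], []
    for _ in range(num_files):
        # Assumption: a hypothetical file holds between 1 and 50 methods.
        methods = [synthetic_method(rng) for _ in range(rng.randint(1, 50))]
        method_sloc.extend(s for s, _ in methods)
        method_cc.extend(c for _, c in methods)
        # Aggregate by summing both metrics over all methods in the file.
        file_sloc.append(sum(s for s, _ in methods))
        file_cc.append(sum(c for _, c in methods))
    return pearson(method_sloc, method_cc), pearson(file_sloc, file_cc)


if __name__ == "__main__":
    r_method, r_file = simulate()
    print(f"method level: r = {r_method:.2f}")
    print(f"file level:   r = {r_file:.2f}")
```

Under these assumptions the file-level coefficient should come out higher than the method-level one, because the number of methods in a file drives both sums at once. That is the mechanism the abstract names as the primary cause of the strong correlations reported for aggregated data.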

Data

This site contains the data and scripts used in the ICSME 2014 (IEEE International Conference on Software Maintenance and Evolution) publication listed above.