Thousands of lines of code, millions of dollars

Djun Kim - November 16, 2007 - 5:45pm

Inspired by Ohloh, and a need to start scoping migration of a large cluster of Drupal sites from Drupal 4.7 to Drupal 5.x/6.x, I've started some work on a static analysis / code metrics tool specifically intended for PHP / Drupal.

Code is available from Bryght's public svn repository, with repo URL https://svn.bryght.com/dev/svn/scripts/metrics.

The metrics.php program is currently implemented as a command line script, which is pointed at a directory containing code to be analyzed:

% php metrics.php code_dir

Reports are generated on standard output.

History

This code started from the sloccount.php script written by Arto Bendiken. I've tried to extend the script by making it more comprehensive and precise, but also more flexible and general. The architecture is intended to be pluggable, allowing users to easily write analysis tools for new types of files.

Currently there are two supported file types: generic, and PHP.

The 'generic' code analysis supports 'C' style code, with in-line comments initiated by '//' and comment blocks set off by '/*' and '*/'. This gives rough metrics for JavaScript and CSS files.
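As a rough illustration of the generic approach, here is a minimal sketch of a line classifier for C-style source. The function name count_c_style_lines and the returned keys are assumptions made for this example and are not taken from metrics.php. String literals are not parsed, so the counts are only approximate.

```php
<?php
// Hypothetical helper (not from metrics.php): classify each line of C-style
// source as blank, comment, or code. String literals are not parsed, so a
// "//" or "/*" inside a quoted string is miscounted.
function count_c_style_lines(string $source): array {
  $counts = ['blank' => 0, 'comment' => 0, 'code' => 0];
  $in_block = false;
  foreach (preg_split('/\R/', $source) as $line) {
    $line = trim($line);
    if ($in_block) {
      // Still inside a /* ... */ block: the whole line counts as comment.
      $counts['comment']++;
      if (strpos($line, '*/') !== false) {
        $in_block = false;
      }
    }
    elseif ($line === '') {
      $counts['blank']++;
    }
    elseif (strpos($line, '//') === 0) {
      $counts['comment']++;
    }
    elseif (strpos($line, '/*') === 0) {
      $counts['comment']++;
      // A block comment that does not close on its opening line continues.
      $in_block = strpos($line, '*/', 2) === false;
    }
    else {
      $counts['code']++;
      // Code followed by a block-comment opener that is not closed here.
      $open = strrpos($line, '/*');
      if ($open !== false && strpos($line, '*/', $open + 2) === false) {
        $in_block = true;
      }
    }
  }
  return $counts;
}
```

A line that mixes code with a trailing comment is counted as code here; a stricter tool might count it under both headings.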

The PHP analyzer uses PHP's tokenizer and a rudimentary parser to obtain additional information about PHP code.
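To give a flavour of what the tokenizer makes possible, here is a minimal sketch built on PHP's token_get_all(). The function name summarize_php_tokens and the shape of its result are assumptions for illustration, not the code in metrics.php.

```php
<?php
// Hypothetical helper (not from metrics.php): tally comments, doc comments,
// and named function definitions in PHP source using token_get_all().
function summarize_php_tokens(string $source): array {
  $summary = ['comments' => 0, 'doc_comments' => 0, 'functions' => []];
  $tokens = token_get_all($source);
  $count = count($tokens);
  for ($i = 0; $i < $count; $i++) {
    // Single-character tokens such as ';' or '{' arrive as plain strings.
    if (!is_array($tokens[$i])) {
      continue;
    }
    list($id, $text, $line) = $tokens[$i];
    if ($id === T_COMMENT) {
      $summary['comments']++;
    }
    elseif ($id === T_DOC_COMMENT) {
      $summary['doc_comments']++;
    }
    elseif ($id === T_FUNCTION) {
      // Scan ahead for the function name; stop at '(' so that anonymous
      // functions (closures) are skipped.
      for ($j = $i + 1; $j < $count; $j++) {
        if ($tokens[$j] === '(') {
          break;
        }
        if (is_array($tokens[$j]) && $tokens[$j][0] === T_STRING) {
          // Record the line on which the "function" keyword appears.
          $summary['functions'][$tokens[$j][1]] = $line;
          break;
        }
      }
    }
  }
  return $summary;
}
```

The result maps each function name to the line where it is defined, which is the kind of listing a report can print later.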

Analysis is completely separated from reporting: the analysis phase builds an array of metrics per file, where each metric can itself be structured.
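The split between the two phases might look something like the sketch below. All function names and array keys here (analyze_file_sketch, find_variable_lines, report_sketch, 'path', 'lines', 'variables') are hypothetical and chosen only to show the idea of a structured metric consumed by a separate reporter.

```php
<?php
// Hypothetical shape of the per-file metrics array; the key names are
// illustrative and not taken from metrics.php.
function analyze_file_sketch(string $path, string $source): array {
  $lines = preg_split('/\R/', $source);
  return [
    'path' => $path,
    'lines' => count($lines),
    // A metric can itself be structured: variable name => line numbers.
    'variables' => find_variable_lines($lines),
  ];
}

// Map each variable name to the (1-based) lines on which it appears.
function find_variable_lines(array $lines): array {
  $found = [];
  foreach ($lines as $index => $line) {
    if (preg_match_all('/\$[A-Za-z_]\w*/', $line, $matches)) {
      foreach (array_unique($matches[0]) as $name) {
        $found[$name][] = $index + 1;
      }
    }
  }
  return $found;
}

// Reporting only reads the metrics array; it never touches the source.
function report_sketch(array $metrics_per_file): string {
  $out = '';
  foreach ($metrics_per_file as $m) {
    $out .= sprintf("%s: %d lines, %d distinct variables\n",
      $m['path'], $m['lines'], count($m['variables']));
  }
  return $out;
}
```

Because the reporter sees only arrays, a different output format can be added without re-running or changing the analysis.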

Current reporting includes basic statistics about the number of lines, comments (inline and doc style), functions, and tokens (identified by class, e.g., control, operator, ...), as well as variable and constant listings (with the lines where they are used), functions and the lines where they are defined, a basic CoCoMo estimate, and a frequency distribution for PHP tokens.

Future work

This is currently very rough. Now that I've written an initial version, I can see that it needs to be completely re-done.

Currently the code is extensible to handle different file types - though the details of this need to be made more sane. However, really the analysis and reporting should be object oriented to take advantage of inheritance - there should be a Drupal analyzer, which extends the PHP analyzer, which in turn extends the generic analyzer.
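One possible shape for that inheritance chain is sketched below. None of these classes exist in the current code; the class names, the analyze() method, and the metric keys are all assumptions made for illustration.

```php
<?php
// Hypothetical class layout; none of these names exist in metrics.php.
class GenericAnalyzer {
  public function analyze(string $source): array {
    $lines = $source === '' ? 0 : count(preg_split('/\R/', $source));
    return ['lines' => $lines];
  }
}

class PhpAnalyzer extends GenericAnalyzer {
  public function analyze(string $source): array {
    // Start from the generic metrics, then add PHP-specific ones.
    $metrics = parent::analyze($source);
    $metrics['functions'] = $this->functionNames($source);
    return $metrics;
  }

  protected function functionNames(string $source): array {
    $names = [];
    $tokens = token_get_all($source);
    foreach ($tokens as $i => $token) {
      if (is_array($token) && $token[0] === T_FUNCTION) {
        // In "function name(" the name sits two tokens ahead, after the
        // whitespace token. Closures and by-reference functions are skipped.
        $next = $tokens[$i + 2] ?? null;
        if (is_array($next) && $next[0] === T_STRING) {
          $names[] = $next[1];
        }
      }
    }
    return $names;
  }
}

class DrupalAnalyzer extends PhpAnalyzer {
  private $module;

  public function __construct(string $module) {
    $this->module = $module;
  }

  public function analyze(string $source): array {
    $metrics = parent::analyze($source);
    // Functions prefixed with the module name are candidate hooks.
    $prefix = $this->module . '_';
    $metrics['module_functions'] = array_values(array_filter(
      $metrics['functions'],
      function ($name) use ($prefix) {
        return strpos($name, $prefix) === 0;
      }
    ));
    return $metrics;
  }
}
```

Each level only adds to the array returned by its parent, so a report written for the generic metrics keeps working on PHP and Drupal results.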

There's lots of room for creativity in terms of packaging this, handling more file types, displaying output, and building more meaningful models.

An easy project (at least the first 80%) would be to write a CSS tokenizer / parser.

One interest I would like to pursue is cost models that are statistically based, with parameters computed from analyzing known costs of real-world code.

Measuring complexity based on factors such as Boolean and arithmetic expressions, recursion, coverage of particular parts of the API (e.g., complexity of SQL statements) - all of these are possible extensions.
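As one example of such an extension, here is a minimal sketch of a McCabe-style count of decision points using the tokenizer. This measure is swapped in purely for illustration and is not something metrics.php computes; the function name approximate_complexity is an assumption.

```php
<?php
// Hypothetical helper (not from metrics.php): approximate cyclomatic
// complexity as 1 + the number of branching tokens in the source.
function approximate_complexity(string $source): int {
  $branching = [
    T_IF, T_ELSEIF, T_FOR, T_FOREACH, T_WHILE, T_CASE, T_CATCH,
    T_BOOLEAN_AND, T_BOOLEAN_OR, T_LOGICAL_AND, T_LOGICAL_OR,
  ];
  $complexity = 1;
  foreach (token_get_all($source) as $token) {
    if (is_array($token)) {
      if (in_array($token[0], $branching, true)) {
        $complexity++;
      }
    }
    elseif ($token === '?') {
      // Ternary operator. Nullable type hints such as "?int" in newer PHP
      // code would also be counted, which slightly overstates the result.
      $complexity++;
    }
  }
  return $complexity;
}
```

Running this per function rather than per file would need the function boundaries from the parser, which is left out here.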

Another interest is Drupal-specific analysis, which could look at the number of hooks implemented, the version of Drupal the code is written for, and the number of core vs. contrib vs. non-Drupal functions called per code unit.
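A minimal sketch of the call-classification part might look like this. It assumes the caller can supply lists of known core and contrib function names (for example, harvested from a checkout); no such list ships with this sketch, and the function name classify_calls and the 'non_drupal' label are hypothetical.

```php
<?php
// Hypothetical helper, not part of metrics.php. $known maps a category name
// to a list of function names, e.g. ['core' => [...], 'contrib' => [...]];
// the caller must supply these lists. Unlisted names count as 'non_drupal'.
function classify_calls(string $source, array $known): array {
  $category_of = [];
  foreach ($known as $category => $names) {
    foreach ($names as $name) {
      $category_of[strtolower($name)] = $category;
    }
  }
  // Drop whitespace and comments so related tokens sit next to each other.
  $skip = [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT];
  $tokens = [];
  foreach (token_get_all($source) as $token) {
    if (!is_array($token) || !in_array($token[0], $skip, true)) {
      $tokens[] = $token;
    }
  }
  $not_calls = [T_FUNCTION, T_OBJECT_OPERATOR, T_DOUBLE_COLON, T_NEW];
  $counts = [];
  foreach ($tokens as $i => $token) {
    // A call looks like: T_STRING immediately followed by '('.
    if (!is_array($token) || $token[0] !== T_STRING
        || ($tokens[$i + 1] ?? null) !== '(') {
      continue;
    }
    // Skip definitions, method calls, static calls, and "new Foo(".
    $prev = $tokens[$i - 1] ?? null;
    if (is_array($prev) && in_array($prev[0], $not_calls, true)) {
      continue;
    }
    $name = strtolower($token[1]);
    $category = $category_of[$name] ?? 'non_drupal';
    $counts[$category][$name] = ($counts[$category][$name] ?? 0) + 1;
  }
  return $counts;
}
```

PHP built-ins such as strlen() land in 'non_drupal' under this scheme; splitting them out would need one more list.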

An interesting project would be to develop a cost model for updating a particular site to a given version of Drupal, based on API changes and estimates of the amount of code that needs to be changed.

Selection and automatic generation of test cases (unit test stubs) might also be an application.

Feedback and contributions are very welcome!

See attached text file for an example of (simplified) output of metrics.php run on the CVS checkout of the contributions repository DRUPAL-4-7 tag on Nov 9, 2007:

  • 2,990 PHP files
  • 267 JavaScript files
  • 495 CSS files
  • 632 other ('interesting') files
for a total of 4,384 files analyzed. The files considered hold 45,248 KB of code, containing 874,838 lines of code and 23,790 function definitions. Total CoCoMo 'organic' model cost: $10,245,457.00 at $60,000 per year per developer.
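For readers unfamiliar with the model, here is a minimal sketch of the basic COCOMO 'organic' calculation, using the standard published coefficients (effort = 2.4 x KLOC^1.05 person-months, schedule = 2.5 x effort^0.38 months). The function name cocomo_organic_cost is hypothetical. Which line count metrics.php feeds into the model, and whether it applies any overhead multiplier, is not stated here, so this sketch is not guaranteed to reproduce the dollar figure above.

```php
<?php
// Hypothetical helper (not from metrics.php): basic COCOMO, organic mode.
// $sloc is the number of source lines; $salary_per_year is the assumed
// cost of one developer for one year.
function cocomo_organic_cost(int $sloc, float $salary_per_year = 60000.0): array {
  $kloc = $sloc / 1000;
  // Effort in person-months.
  $person_months = 2.4 * pow($kloc, 1.05);
  // Nominal schedule in calendar months.
  $schedule_months = 2.5 * pow($person_months, 0.38);
  return [
    'person_months' => $person_months,
    'schedule_months' => $schedule_months,
    'cost' => $person_months / 12 * $salary_per_year,
  ];
}
```

Because the exponent is greater than 1, the estimated cost grows slightly faster than the line count, so one large code base costs more than the same lines split across independent small projects.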

Super useful!

mcantelon - November 27, 2007 - 11:51am

I ended up adding output of periods to see progress:

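/** Iterate over files in given directory, gathering statistics */
function analyze_dir($dir, &$file_counts) {
  foreach (glob("$dir/*") as $file) {
    print '.';  // added: emit one period per file to show progress
    // ... rest of the original loop body unchanged ...
  }
}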

Hello, I'm very interested

nicoduka - April 10, 2008 - 12:35pm

Hello

I'm very interested in this item. Could someone give me more information and links to see more of this?

Thanks

Bye
