[metrics] Mythical man month 10 lines per developer day - how close on large projects?

Everybody always says that they can beat the "10 lines per developer per day" from "The Mythical Man-Month", and when starting a project, I can usually get a couple hundred lines done in a day.

At my previous employer, however, all the developers were very sharp, but it was a large project: over a million lines of code, with very onerous certification requirements, and interfacing with other multi-million-line projects. At some point, out of curiosity, I plotted the lines of code in my group's shipping product (not counting tools we developed), and sure enough, incrementally, it came to around 12 lines net added per developer per day. That does not count changes, test code, or the fact that developers weren't working on the actual project code every day.

How are other people doing? And what sort of requirements do you face (I imagine it's a factor)?

This question is related to metrics

The answer is

It would be much better to realize that talking about physical lines of code is pretty meaningless. The number of physical Lines of Code (LoC) is so dependent on coding style that it can vary by an order of magnitude from one developer to another.

In the .NET world there is a convenient way to count LoC: the sequence point. A sequence point is a unit of debugging; it is the portion of code highlighted in dark red when you set a breakpoint. With sequence points we can talk about logical LoC, and this metric can be compared across the various .NET languages. The logical LoC code metric is supported by most .NET tools, including the Visual Studio code metrics, NDepend and NCover.

For example, here is an 8-LoC method (the sequence points of the opening and closing brackets are not taken into account):

[Image not preserved: the 8-LoC method]

The production of LoC must be counted over the long term. Some days you'll spit out more than 200 LoC; other days you'll spend 8 hours fixing a bug without adding a single LoC. Some days you'll clean up dead code and remove LoC; some days you'll spend all your time refactoring existing code without adding any new LoC to the total.

Personally, I count a single LoC in my own productivity score only when:

  1. It is covered by unit-tests
  2. It is associated with some sort of code contract (if possible; of course not all LoC can be checked by contracts).

Under these conditions, my personal score over the last 5 years of coding the NDepend tool for .NET developers is an average of 80 physical LoC per day, without sacrificing code quality by any means. The rhythm is sustained and I don't see it decreasing any time soon. All in all, NDepend is a C# code base that currently weighs in at around 115K physical LoC.

For those who hate counting LoC (I saw many of them in the comments here), I attest that, once adequately calibrated, counting LoC is an excellent estimation tool. After coding and measuring dozens of features in my particular development context, I reached the point where I can precisely estimate the size of any TODO feature in LoC, and the time it will take me to deliver it to production.

It's easy to get a couple of hundred lines of code per day. But try to get a couple of hundred quality lines of code per day and it's not so easy. Add debugging, and days with little or no new code, and the average comes down rather quickly. I've spent weeks debugging difficult issues, only for the fix to be 1 or 2 lines of code.

Our codebase is about 2.2 MLoC for about 150 man-years of effort. That makes it about 75 lines of C++ or C# per developer per day, over the whole life of the project.
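As a rough check of that figure, here is a minimal Python sketch of the division. The 2.2 MLoC and 150 man-years come from the answer above; the 195 working days per year is only an assumption (the answer does not state one), picked because it lands close to the quoted 75.

```python
def loc_per_dev_day(total_loc, man_years, working_days_per_year):
    """Average lines of code per developer per working day."""
    return total_loc / (man_years * working_days_per_year)

# 2.2 MLoC and 150 man-years are taken from the answer above.
# 195 working days per year is an assumed value, not stated there.
rate = loc_per_dev_day(2_200_000, 150, 195)
print(round(rate))  # 75
```

With a more generous assumption of 220 working days per year the same division gives about 67, so the result is sensitive to how a "developer day" is defined.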

Steve McConnell gives an interesting statistic in his book "Software Estimation" (p. 62, Table 5.2). He distinguishes between project types (Avionic, Business, Telco, etc.) and project sizes (10 kLOC, 100 kLOC, 250 kLOC). The numbers are given for each combination in LOC per staff-month, e.g.:

  Avionic: 200, 50, 40
  Intranet Systems (Internal): 4000, 800, 600
  Embedded Systems: 300, 70, 60

Which means, e.g., that for an Avionic 250-kLOC project the rate is 40 (LOC/month) / 22 (days/month), i.e. fewer than 2 LOC/day!
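The same conversion can be applied to every figure quoted above. This is just a sketch: the dictionary only holds the three rows cited in this answer, and the names used here are made up for illustration.

```python
WORKING_DAYS_PER_MONTH = 22  # same divisor as used in the answer above

# LOC per staff-month for project sizes of 10, 100 and 250 kLOC,
# as quoted above from "Software Estimation", Table 5.2.
LOC_PER_STAFF_MONTH = {
    "Avionic": (200, 50, 40),
    "Intranet Systems (Internal)": (4000, 800, 600),
    "Embedded Systems": (300, 70, 60),
}

def loc_per_day(loc_per_month, days_per_month=WORKING_DAYS_PER_MONTH):
    """Convert LOC per staff-month into LOC per working day."""
    return loc_per_month / days_per_month

for kind, rates in LOC_PER_STAFF_MONTH.items():
    per_day = [round(loc_per_day(r), 1) for r in rates]
    print(kind, per_day)
```

For the Avionic row this prints roughly 9.1, 2.3 and 1.8 LOC/day, the last being the "<2" figure above.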

On one of my current projects, in some modules, I am proud to have contributed a negative line count to the code base. Identifying which areas of code have grown unnecessary complexity and can be simplified with a cleaner and clearer design is a useful skill.

Of course some problems are inherently complex and require complex solutions, but on most large projects, areas that have had poorly defined or changing requirements tend to have overly complex solutions with a higher number of issues per line.

Given a problem to solve, I much prefer the solution that reduces the line count. Of course, at the start of a small project I can generate many more than ten lines of code per day, but I tend not to think of the amount of code that I've written, only what it does and how well it does it. I certainly wouldn't aim to beat ten lines per day or consider it an achievement to do so.

How are other people doing?

I am the only full-time dev at our company and have written 500,000 lines of OCaml and F# code over the past 7 years, which equates to about 200 lines of code per day. However, the vast majority of that code is tutorial examples consisting of hundreds of separate projects each a few hundred lines long. Also, there is a lot of duplication between the OCaml and the F#. We are not maintaining any in-house code bases larger than 50kLOC.

In addition to developing and maintaining our own software, I have also consulted for many clients in industry over the past 7 years. For the first client, I wrote 2,000 lines of OCaml over 3 months which is 20 lines of code per day. For the next client, four of us wrote a compiler that generated millions of lines of C/C++/Python/Java/OCaml code as well as documentation in 6 months which is 2,000 lines of code per day per developer. For another client, I replaced 50kLOC of C++ with 6kLOC of F# in 6 months which is -352 lines of code per day. For yet another client, I am rewriting 15kLOC of OCaml in F# which will be the same size so 0 lines of code per day.

For our current client, I will replace 1,600,000 lines of C++ and Mathematica code with ~160kLOC of F# in 1 year (by writing a bespoke compiler) which will be -6,000 lines of code per day. This will be my most successful project to date and will save our client millions of dollars a year in on-going costs. I think everyone should aim to write -6,000 lines of code per day.
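The negative figures for the two replacement projects can be reproduced with a small sketch. The line counts are from the answer; the working-day counts (125 for 6 months, 240 for 1 year) are assumptions chosen because they match the quoted results, since the answer does not say which day counts it used.

```python
def net_loc_per_day(loc_before, loc_after, working_days):
    """Net change in code base size per working day (negative = shrinking)."""
    return (loc_after - loc_before) / working_days

# 50 kLOC of C++ replaced by 6 kLOC of F# in 6 months (125 working days assumed)
print(net_loc_per_day(50_000, 6_000, 125))       # -352.0

# 1.6 MLoC replaced by ~160 kLOC of F# in 1 year (240 working days assumed)
print(net_loc_per_day(1_600_000, 160_000, 240))  # -6000.0
```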

You should stop using this metric; it is meaningless for the most part. Cohesion, coupling and complexity are more important metrics than lines of code.

There is no such thing as a silver bullet.

A single metric like that is useless by itself.

For instance, I have my own class library. Currently, the following statistics are true:

Total lines: 252.682
Code lines: 127.323
Comments: 99.538
Empty lines: 25.821

Let's assume I don't write any comments at all, that is, 127.323 lines of code. With your ratio, that code library would take me around 10610 days to write. That's 29 years.

I certainly didn't spend 29 years writing that code, since it's all C#, and C# hasn't been around that long.

Now, you can argue that the code isn't all that good, since obviously I must've surpassed your 12 lines a day metric, and yes, I'll agree to that. But even if I stretch the timeline back to when 1.0 was released (and I didn't actually start writing it until 2.0 was released), which is 2002-02-13, about 2600 days, the average is 48 lines of code a day.

All of those lines of code are good? Heck no. But down to 12 lines of code a day?

Heck no.
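For anyone who wants to redo the arithmetic, here is a minimal sketch using only the numbers quoted above (127.323 code lines, 12 lines a day, about 2600 days). The function names are made up for this example.

```python
def days_needed(code_lines, lines_per_day):
    """Days it would take to write code_lines at a fixed daily rate."""
    return code_lines / lines_per_day

def average_rate(code_lines, days):
    """Average lines per day over a given number of calendar days."""
    return code_lines / days

code_lines = 127_323  # "Code lines" from the statistics above

days = days_needed(code_lines, 12)
print(round(days), round(days / 365, 1))        # about 10610 days, about 29 years

print(round(average_rate(code_lines, 2600), 1)) # just under 49 lines a day
```

The last figure is the 48 quoted above, rounded down.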

Everything depends.

You can have a top notch programmer churning out code in the order of thousands of lines a day, and a medium programmer churning out code in the order of hundreds of lines a day, and the quality is the same.

And yes, there will be bugs.

The total you want is the balance. Amount of code changed, versus the number of bugs found, versus the complexity of the code, versus the hardship of fixing those bugs.

I like this quote:

If we wish to count lines of code, we should not regard them as "lines produced" but as "lines spent". - Edsger Dijkstra

Sometimes you contribute more by removing code than by adding it.

I think project size and the number of developers involved are big factors in this. I've been far above this figure over my career, but I've worked alone all that time, so there's no loss from coordinating with other programmers.

Good planning, good design and good programmers. You get all that together and you will not spend 30 minutes writing one line. Yes, all projects require you to stop and plan, think things over, discuss, test and debug, but at two lines per day every company would need an army to get Tetris to work...

Bottom line: if you were working for me at 2 lines per hour, you'd better be getting me a lot of coffee and massaging my feet so you didn't get fired.

I think this comes from the waterfall development days, where the actual development phase of a project could be as little as 20-30% of the total project time. Take the total lines of code and divide by the entire project time and you'll get around 10 lines/day. Divide by just the coding period, and you'll get closer to what people are quoting.
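To put rough numbers on that, here is a small sketch. It assumes (my simplification, not something the book says) that all the code is written during the coding phase, so the whole-project average is simply scaled up by the share of time spent coding.

```python
def coding_phase_rate(overall_rate, coding_fraction):
    """Lines/day during the coding phase, if all code is written in that phase."""
    return overall_rate / coding_fraction

# 10 lines/day over the whole project, coding phase = 20% or 30% of the time
for fraction in (0.2, 0.3):
    print(fraction, round(coding_phase_rate(10, fraction)))
```

Under that assumption, 10 lines/day over the whole project corresponds to somewhere between about 33 and 50 lines/day while actually coding.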

One suspects this perennial bit of manager-candy was coined when everything was a sys app written in C, because if nothing else the magic number would vary by orders of magnitude depending on the language, scale and nature of the application. And then you have to discount comments and attributes. And ultimately, who cares about the number of lines of code written? Are you supposed to be finished when you've reached 10K lines? 100K? So arbitrary.

It's useless.

Without actually checking my copy of "The Mythical Man-Month" (everybody reading this should really have a copy readily available), there was a chapter in which Brooks looked at productivity by lines written. The interesting point, to him, was not the actual number of lines written per day, but the fact that it seemed to be roughly the same in assembler and in PL/I (I think that was the higher-level language used).

Brooks wasn't about to throw out some sort of arbitrary figure of productivity, but he was working from data on real projects, and for all I can remember they might have been 12 lines/day on the average.

He did point out that productivity could be expected to vary. He said that compilers were three times as hard as application programs, and operating systems three times as hard as compilers. (He seems to have liked using multipliers of three to separate categories.)

I don't know if he appreciated then the individual differences in programmer productivity (although in an order-of-magnitude argument he did postulate a factor of seven difference), but as we know, superior productivity isn't just a matter of writing more code, but also of writing the right code to do the job.

There's also the question of the environment. Brooks speculated a bit about what would make developers faster or slower. Like lots of people, he questioned whether the current fads (interactive debugging using time-sharing systems) were any better than the old ways (careful preplanning for a two-hour shot using the whole machine).

Given that, I would disregard any actual productivity number he came up with as useless; the continuing value of the book is in the principles and more general lessons that people persist in not learning. (Hey, if everybody had learned them, the book would be of historical interest only, much like all of Freud's arguments that there is something like a subconscious mind.)