Thursday, February 4, 2010

Intro to Programming

As I mentioned earlier, I am teaching a programming course at Centennial College.

I decided to use a nonstandard introduction, starting with back-of-the-envelope calculations. In particular, I used what I refer to as Penzias numbers; to the best of my recollection, I found them in a biographical sketch of Penzias. They are somewhat nonstandard, since Google doesn't find any references to them.

This is a number system that uses just the digits 0, 1, and 3, and defines only multiplication, where 3 x 3 = 10. Mathematicians will note that 3 stands in for the square root of 10 (about 3.16), and that this is a minor refinement of order-of-magnitude calculations.
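A minimal Python sketch of how Penzias-number multiplication could work, assuming each value is rounded to the nearest power of the square root of 10 (a "half-decade"). Odd powers are written with a 3 and even powers with a 1, so multiplying reduces to adding exponents. The function names are my own, for illustration only.

```python
import math

def to_half_decades(x: float) -> int:
    """Round a positive number to the nearest power of sqrt(10) and return that power.

    1 -> 0, 3 -> 1 (3 stands in for sqrt(10)), 10 -> 2, 30 -> 3, ...
    """
    if x <= 0:
        raise ValueError("Penzias numbers are only defined for positive values")
    return round(2 * math.log10(x))

def render(k: int) -> str:
    """Render a half-decade exponent as a Penzias number: 1eN or 3eN."""
    mantissa = 3 if k % 2 else 1  # odd powers of sqrt(10) are written as 3
    return f"{mantissa}e{k // 2}"

def multiply(*values: float) -> str:
    """Multiply by adding half-decade exponents, so that 3 x 3 = 10."""
    return render(sum(to_half_decades(v) for v in values))

print(multiply(3, 3))     # 1e1, i.e. 10
print(multiply(30, 300))  # 1e4; the exact product is 9000
```

The estimate is off by about 10% in the second case, which is well inside what a back-of-the-envelope calculation needs.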

This worked well. The examples (such as how much water flows through the Mississippi at its widest point) generated big numbers, which led into scientific notation and multiplication using exponents, which in turn led into the binary number system.

The examples also demonstrated the technique of solving a problem by breaking it down into smaller parts, which will be a constant theme in the course.

Wednesday, February 3, 2010

Fundamental Class Design Principles

From outside the classes, the obvious complexity metric for a class library is the number of interconnections between classes, in particular getter and setter connections.
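A rough sketch of the metric, assuming we already know which classes call getters or setters on which other classes. The class names and the dictionary representation are hypothetical; in practice the data would have to come from reading or analysing the code.

```python
# Hypothetical record of which classes call getters/setters on which other classes.
accessor_calls: dict[str, set[str]] = {
    "Controller": {"Model"},            # sets model values
    "View": {"Model"},                  # reads model values
    "ReportWriter": {"Model", "View"},  # reaches into two other classes
}

def connection_count(calls: dict[str, set[str]]) -> int:
    """Count distinct caller -> callee getter/setter connections between classes."""
    # A class using its own accessors is not an interconnection, so exclude it.
    return sum(len(callees - {caller}) for caller, callees in calls.items())

print(connection_count(accessor_calls))  # 4
```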

There are some exceptions to this metric. In a Model-View-Controller design, the controller needs set access to the model values and the view needs get access. In a language like C++, which has the concept of friend classes, this tight relationship can be specified explicitly. Other languages have no such mechanism, so these connections have to be observed by convention.
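Python has nothing like C++ friend classes, so the Model-View-Controller split can only be kept by convention. A minimal sketch with hypothetical class names: only the controller calls the model's setter, and only the view calls its getter.

```python
class Model:
    """Holds the application state; knows nothing about views or controllers."""
    def __init__(self) -> None:
        self._value = 0

    def get_value(self) -> int:
        return self._value

    def set_value(self, value: int) -> None:
        self._value = value


class View:
    """Needs get access only: renders the model's current value."""
    def __init__(self, model: Model) -> None:
        self._model = model

    def render(self) -> str:
        return f"value = {self._model.get_value()}"


class Controller:
    """Needs set access only: turns user input into model updates."""
    def __init__(self, model: Model) -> None:
        self._model = model

    def handle_input(self, text: str) -> None:
        self._model.set_value(int(text))


model = Model()
view, controller = View(model), Controller(model)
controller.handle_input("42")
print(view.render())  # value = 42
```

Any other class calling `set_value` would be the kind of connection that signals a design problem.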

Direct access between classes that are further apart than these functionally coupled ones is an indication of either a design issue or a violation of the existing design.

Tuesday, February 2, 2010

Back to Blogging

The blog has been on hiatus. Time to get back to it.

I am now teaching web programming part time at Centennial.  I have some comments to make about that (and I expect more in the future).

When I gave the talk at XP Toronto about the subject of this blog, I made an error. I said that it's only about function call tree design. I have since decided that is not true. Given a system implemented in an object-oriented language (almost all systems are now), the design must include the class partitioning as well.

Thursday, January 21, 2010

Span of Control - The Mosque

One of the ways to evaluate a design is to look at the span of control of each component. If the span is too wide, the component may have absorbed too many details from lower components. If it is too narrow, the partitioning may not be effective at reducing complexity.

The too narrow case has a sharp bound.  If there is only one thing below a component, that component should be looked at.

The too wide case is a little fuzzier.  It's probably something like 5 plus or minus 2. 
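A sketch of a span-of-control check, assuming the call graph is available as a dictionary from each routine to the routines it calls directly (the routine names are hypothetical). It flags a span of one, and spans above 7 (5 plus 2); the results are prompts for review, not verdicts.

```python
# Hypothetical call graph: routine -> routines it calls directly.
call_graph: dict[str, list[str]] = {
    "main": ["load_config", "run"],
    "load_config": ["read_file"],  # span of one: worth a look
    "run": ["parse", "validate", "transform", "report",
            "log", "notify", "archive", "cleanup"],  # span of eight
}

def span_warnings(graph: dict[str, list[str]], upper: int = 7) -> list[str]:
    """Report routines whose span of control is exactly 1 or larger than `upper`."""
    warnings = []
    for routine, callees in graph.items():
        span = len(set(callees))  # count each distinct callee once
        if span == 1:
            warnings.append(f"{routine}: only one callee")
        elif span > upper:
            warnings.append(f"{routine}: {span} callees")
    return warnings

for warning in span_warnings(call_graph):
    print(warning)
```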

There are exceptions. If a too-narrow component is called from many places and makes a semantic transformation of what it calls, that is probably a good thing.

If the too wide case is something like a command parser, where the core is a giant switch statement with one case per command, then it is probably not usefully reducible. In that case there should be no other control flow statements in the procedure, and each switch case should be a single line. Hoist the surrounding control flow into an outer function and create functions for any multiline cases.
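A sketch of that shape in Python, which has no classic switch statement, so a dictionary serves as the dispatch table here. The commands are hypothetical. Each entry is one line, any case needing more than one line gets its own function, and the surrounding control flow lives in an outer function.

```python
def cmd_help(args: list[str]) -> str:
    return "commands: help, echo, add"

def cmd_echo(args: list[str]) -> str:
    return " ".join(args)

def cmd_add(args: list[str]) -> str:
    # A multi-step case lives in its own function, not inline in the dispatcher.
    total = sum(int(a) for a in args)
    return str(total)

# The "switch": one line per command and no other control flow.
COMMANDS = {
    "help": cmd_help,
    "echo": cmd_echo,
    "add": cmd_add,
}

def execute(line: str) -> str:
    """Outer function: surrounding control flow hoisted out of the dispatch table."""
    words = line.split()
    if not words:
        return ""
    handler = COMMANDS.get(words[0])
    if handler is None:
        return f"unknown command: {words[0]}"
    return handler(words[1:])

print(execute("add 2 3"))  # 5
```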

One of the consequences of this is that good designs tend to show a mosque shape: they get wider down to a maximum level and then narrow slightly. This arises because the delegation structure at the top calls unique routines, while the final work is done by fewer common routines (possibly from a library).
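One way to see the shape is to count the distinct routines at each depth of the call tree. A sketch using breadth-first traversal over a hypothetical call graph; a routine shared by several callers is counted once, at the shallowest depth where it appears.

```python
from collections import deque

# Hypothetical call graph: unique routines near the top delegate to shared helpers below.
call_graph: dict[str, list[str]] = {
    "main": ["import_data", "build_report", "export_data"],
    "import_data": ["read_csv", "clean_rows"],
    "build_report": ["summarize", "format_table"],
    "export_data": ["write_csv", "format_table"],
    "read_csv": ["open_file"],
    "clean_rows": ["strip_text"],
    "summarize": ["strip_text"],
    "format_table": ["strip_text"],
    "write_csv": ["open_file"],
}

def level_widths(graph: dict[str, list[str]], root: str) -> list[int]:
    """Return how many distinct routines first appear at each depth below `root`."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        routine = queue.popleft()
        for callee in graph.get(routine, []):
            if callee not in depth:  # shared routines are counted only once
                depth[callee] = depth[routine] + 1
                queue.append(callee)
    widths = [0] * (max(depth.values()) + 1)
    for d in depth.values():
        widths[d] += 1
    return widths

print(level_widths(call_graph, "main"))  # [1, 3, 5, 2]: widens, then narrows
```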

Friday, January 15, 2010

Coupling

Coupling is even more critical than cohesion as a design principle.  Bad cohesion leads to comprehension problems, which (as a first order effect) leads to slower code modification.  Bad coupling leads directly to bugs.

Types of coupling (worst to best)

Content - uses internal structure
Common - uses global data
External - externally imposed structure
Control - passing down a control switch
Stamp - composite parameters
Data - elementary parameters
Message - no parameters
None - no coupling at all

You do need to couple to get anything done.  The key is to keep the coupling as loose as possible.

The more egregious forms of coupling are obviated by modern language design. It is not possible to modify local variables from outside the routine they are defined in. Globals are still possible, so discipline is required to avoid using them.

Polymorphism in OOP is the attack on control coupling.  Instead of passing down a flag to be tested, the controlling routine creates an object with the desired behaviour so the controlled routine does not have to handle any decisions.
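A minimal sketch contrasting the two, with hypothetical names. The first function is control coupled: the caller passes a flag that the callee must test. In the second, the caller picks an object, and the called routine makes no decision.

```python
from abc import ABC, abstractmethod

# Control coupling: the caller passes a flag that the callee must test.
def format_total_flag(amount: float, as_html: bool) -> str:
    if as_html:
        return f"<b>{amount:.2f}</b>"
    return f"{amount:.2f}"

# Polymorphic alternative: the caller chooses behaviour by choosing the object.
class Formatter(ABC):
    @abstractmethod
    def emphasize(self, text: str) -> str: ...

class PlainFormatter(Formatter):
    def emphasize(self, text: str) -> str:
        return text

class HtmlFormatter(Formatter):
    def emphasize(self, text: str) -> str:
        return f"<b>{text}</b>"

def format_total(amount: float, formatter: Formatter) -> str:
    """No decisions here: the formatter object already embodies the choice."""
    return formatter.emphasize(f"{amount:.2f}")

print(format_total(3.5, HtmlFormatter()))  # <b>3.50</b>
```

Adding a new output style means adding a class, not another branch in every routine that tests the flag.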

The costs of stamp coupling can be mitigated by encapsulation.  If the external interface can be held constant it doesn't matter to the routine that the implementation has changed.

The Don't Repeat Yourself (DRY) principle is a powerful attack on the problems caused by coupling.  If the changes can be made in one place, and everything derives behaviour (correctly) from that definition, changes cannot cause bugs. Correctly is of course the key word there.  Careful data structure design is still required.
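A small illustration, with a hypothetical record layout: the field order is defined once and every reader and writer derives from it, so the header and the rows cannot drift apart. For brevity it assumes field values contain no commas; real CSV handling would use the `csv` module.

```python
# Single definition of the record layout; everything else derives from it.
FIELDS = ("name", "email", "age")

def to_csv_header() -> str:
    return ",".join(FIELDS)

def to_csv_row(record: dict[str, object]) -> str:
    # Column order follows FIELDS, so it always matches the header.
    return ",".join(str(record[field]) for field in FIELDS)

def from_csv_row(line: str) -> dict[str, str]:
    # Parsing uses the same FIELDS, so adding a field is a one-place change.
    return dict(zip(FIELDS, line.split(",")))

row = to_csv_row({"name": "Ada", "email": "ada@example.com", "age": 36})
print(to_csv_header())  # name,email,age
print(row)              # Ada,ada@example.com,36
```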

Thursday, January 14, 2010

Cohesion

Cohesion is the first principle to consider when doing design. At this level, it is the concept that each procedure is responsible for only one thing. My earlier post about commenting rules was really all about how to identify the cohesion level of a procedure.

Note: Cohesion can also be applied to higher order groupings, such as classes.

There is a ranked taxonomy of cohesion (from worst to best).

Coincidental - There really isn't any.

Logical - It all does the same kind of thing (more applicable to higher order groupings).

Temporal - It all happens at the same time.

Procedural - One thing is required to follow another.

Communicational - They all work on the same stuff.

Sequential - The results of one part are the input to the next.

Functional - It just does one thing.

It is not always possible to get good cohesion in a procedure (e.g., an initialization routine). When confronting that situation, it is important to simplify the procedure as much as possible. A low cohesion routine that is a list of function calls (each of which is a higher cohesion routine) will be part of a better system than one that is full of if statements and loops.
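A sketch of that advice, with hypothetical setup steps. `initialize` has only temporal cohesion (its steps merely happen at startup), but it is a flat list of calls with no ifs or loops, and each step it calls does one thing.

```python
# Each step is a small routine that does one thing.
def load_settings(state: dict) -> None:
    state["settings"] = {"verbose": False}

def open_log(state: dict) -> None:
    state["log"] = []

def register_handlers(state: dict) -> None:
    state["handlers"] = ["on_start", "on_stop"]

def initialize() -> dict:
    """Build the startup state.

    Temporal cohesion only, so it is kept as a flat list of calls.
    """
    state: dict = {}
    load_settings(state)
    open_log(state)
    register_handlers(state)
    return state

print(sorted(initialize()))  # ['handlers', 'log', 'settings']
```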

Wednesday, January 13, 2010

What is Design?

Once we have a problem that is bigger than we want to deal with in a single block of code (which happens much sooner than some people seem to believe), we have to decide how to partition the code to solve the problem. This partitioning, and the data transfer decisions that derive from it, is what I am calling design.

Tuesday, January 12, 2010

Programming Taxonomy

There are some assumptions that I am making about how programming is done that I think I should make explicit, even if they are common to the point of near ubiquity.

Almost all programming is done using the approach of procedural programming, broadly defined. That is, we program a computer by specifying a procedure to follow. It is possible to use different approaches; Prolog, in particular, is a language that does not embody the procedural approach.

But, I hear the objections, what about object oriented programming or functional programming?

These are both subsets of procedural programming.  Object oriented programming is all about how to package the procedures (and the data they operate on) to cope with the complexity of the programs we are trying to write. 

Functional programming restricts the procedures available to the programmer to those that do not change state. This simplifies what we need to understand about the program in order to write it correctly. (It also makes the program easier to partition during execution, which helps when trying to increase parallelism.)
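A small contrast, with illustrative names. The first function changes shared state, so concurrent calls would need coordination; the second depends only on its argument, so the work can be split up freely. Threads are used here only to show the partitioning; in CPython, CPU-bound threads do not actually run in parallel, so real speedup would need a process pool.

```python
from concurrent.futures import ThreadPoolExecutor

# State-changing style: each call updates shared state, so concurrent
# calls would need coordination (e.g. a lock) to be safe.
total = 0

def add_square_to_total(x: int) -> None:
    global total
    total += x * x

# Functional style: the result depends only on the argument; nothing is modified.
def square(x: int) -> int:
    return x * x

numbers = list(range(10))

for n in numbers:  # sequential use of the stateful version
    add_square_to_total(n)

# Because square() changes no state, the calls can be handed to workers in any order.
with ThreadPoolExecutor(max_workers=4) as pool:
    result = sum(pool.map(square, numbers))

print(total, result)  # 285 285
```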

Monday, January 11, 2010

History of Design Principles

Given that I am going to be writing blog entries on design principles I thought I would start by making some introductory remarks on the history of said principles.

The first thinking about computer programming focused on data structures and algorithms.  The approach to these was and is mostly mathematical.  There is no inherent attention paid to the implementation in this discipline.

Programming design principles derive almost universally from the work around structured programming.  Object Orientation is a development of linguistic structures and their application to package the design principles developed by the structured programming movement.

The opening salvo in the movement was Dijkstra's letter (retitled by Wirth), "Go-to Statement Considered Harmful". This led fairly directly to a programming style, and to languages, that used if, for, while, and similar constructs. These principles are pretty much universally accepted now.

This sufficed at the lowest level of code construction, but further principles were required when considering design at a slightly higher level. The question here is not what happens inside a subroutine (to use an early word) but how to decide what to do in a subroutine and how subroutines should communicate.

The basis of this will be the subject of my next post.

Friday, January 8, 2010

Design Driven Design

So it's a redundant redundancy.

This is my theoretical push back against low level and mechanical Test Driven Design (TDD).

TDD says take a spec, write the test, have it fail, implement the spec, have the test pass, repeat.
When you come across a resulting bad design, refactor.

I think we can do a little better than this within the scope of a sprint.

Design Driven Design (DDD) says take the specs for a sprint, do a design, evaluate the design against design principles, redo if necessary, write the tests for the design, redo the design if you can't write the tests, then implement.

We do have design principles.  I described one in my last post about how a commenting style can expose bad design.  I will review more of them in coming posts.

Thursday, January 7, 2010

How Simple Commenting Rules Improve Code Structure

In my last post I described my rules for commenting code.

Here I am going to talk about how following those rules makes your code better.

The statement is slightly backwards. What I am really saying is that if you can legitimately describe your code using the commenting structure I describe, your code will be better than if you can't.

In particular: if you can accurately and completely describe a function with one simple statement (with no ifs, ands, or buts), that function is much more likely to belong to a well designed system than if you can't.

The bad news is that you have to design the system and write the comments before you can apply the test to see if the system is well designed. The good news is that this happens relatively early in system development.

This is not the holy grail of a rule that will allow for the mechanical generation of high quality designs. It is an after the fact test that should indicate design quality, which is, I think, as good as we are going to get.

Wednesday, January 6, 2010

Comments on Comments

Jason Baker wrote a blog entry, "Myths about Comments". I'm going to make some comments on it here.

My standard for expected comments is this:
One comment at the start of every function describing what it does.
One comment per argument describing what it is.
One comment for anything that must be consistent with something else in another place (e.g., these outputs need to be in the same order as the inputs in foo).
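A sketch of a function commented to this standard: one description, one line per argument, and one consistency comment. The function is hypothetical, and so is `report_table()`, which stands in for the "other place" that depends on the ordering.

```python
def merge_scores(names: list[str], scores: list[int]) -> list[tuple[str, int]]:
    """Pair each player name with that player's score.

    names:  player names, in entry order.
    scores: one score per player, in the same order as `names`.
    """
    # Consistency: output order must match input order, because the
    # (hypothetical) report_table() elsewhere relies on it.
    return list(zip(names, scores))

print(merge_scores(["ann", "bo"], [3, 5]))  # [('ann', 3), ('bo', 5)]
```

Note that the description is one simple statement with no ifs, ands, or buts.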

Wednesday, July 8, 2009

Why did it do THAT?

In keeping with this blog's theme of decomposition (insert Beethoven joke here), bug hunting can be divided into two parts: identifying the bug and determining its cause. Repair takes us back to the "How do you do THIS?" question.

Here I will focus on tracking down the cause. Identification is more about testing than code writing.

Finding hard bugs is a subspecies of the scientific method. Create a theory of why the bug occurs and prove it wrong. Repeat. Eventually you find a correct theory.

Being able to easily construct mental models of what is going on is critical to this process. These models come in two flavours, a model of how things are supposed to work and models that explain a given (erroneous) behaviour.

A good model of how things were designed will lead you to the software component that is generating the error. At that point, an inspection of the local code and its inputs can start you on the trail of the bug. Breakpoints, tracing, and debugging output come into play here.

Lacking such a model (the joys of legacy code), you need to start constructing possible models of what could be going on. At this point the inventiveness referred to in the design post becomes very useful.

How do you do THIS?

There are three components to the ability to solve hard design problems: knowledge of the tools at hand, the skill to combine them in novel ways to create solutions, and the ability to evaluate the results and select the best solution.

Knowledge of the tools at hand is the easy part, both in application and in identifying individuals who have this knowledge. Schools, training courses, books, conferences, websites, the list of resources available to improve in this area is long and varied.

It is not entirely clear how much inventiveness is a learned skill as opposed to an innate aptitude. There are large differences between individuals in this area, but it also seems apparent that solving a lot of different problems can increase someone's skill level.

The ability to evaluate the results is a very important part of solving hard problems. If there are metrics that can be used to compare results this component is not particularly difficult. However, that is a very large "if". In general this requires judgment, and good judgment is frequently associated with a history of burned fingers.

Hard Problems

The first question when considering the issue of hard problems is: why put out the effort to solve them? There are two basic reasons for attacking hard problems. Either the problem is significant or finding the solution is satisfying. Puzzles and games appeal purely to the latter motivation. Solving significant problems has a strong tendency to provide economic benefits to the solver, either directly or via an organization they are a member of.

When looking at problems, taxonomy arises quickly. What kind of hard problems are we looking at? At this point in this blog I am looking at two kinds of problems, software design and software debugging. Or, "How do you do THIS?" and "Why is it doing THAT?"

Code Quality

First: which aspect of code quality am I concerned with here? Of all the qualities that code has, correctness takes pride of place; if it is not correct, not a lot else matters. However, this is not what most people mean by the phrase "code quality", since correctness is assumed as a basic attribute rather than a distinguishing one. Then there are the performance qualities, time and space. Given Moore's Law (even after running off the end of it for speed), optimizing compilers, and other advances, these are not an issue in the small. If there are performance issues, they need to be addressed (in the absence of pathology) with algorithmic and architectural attacks.

For me, the prime code quality issue that is contentious enough to write about is comprehensibility. If people can read the code it will be maintainable, portable, and all the other things that people want in code when they are looking at it.

What drives comprehensibility? Or, to flip the question, what drives incomprehensibility? In computer programs the primary barrier to understanding is complexity. One of the primary controllable factors driving complexity is scale (this also applies to looking at computer systems and projects at higher levels).

To reduce complexity, reduce the scale.

This means that computer programs need to be factored into small subroutines. Keeping routines small requires constructing them from lower level abstractions. Identifying those abstractions and designing good interfaces for them are the primary skills required to create good code.
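A small sketch of that factoring, with a hypothetical task (totalling amounts per name from "name,amount" lines). The top routine reads as one statement because it is built from two lower level abstractions, each of which can be understood on its own.

```python
def parse_line(line: str) -> tuple[str, float]:
    """Split one 'name,amount' line into its parts."""
    name, amount = line.split(",")
    return name.strip(), float(amount)

def totals_by_name(entries: list[tuple[str, float]]) -> dict[str, float]:
    """Sum the amounts for each name."""
    totals: dict[str, float] = {}
    for name, amount in entries:
        totals[name] = totals.get(name, 0.0) + amount
    return totals

def summarize(lines: list[str]) -> dict[str, float]:
    """Total the amounts per name in a list of 'name,amount' lines."""
    return totals_by_name([parse_line(line) for line in lines if line.strip()])

print(summarize(["alice, 2.5", "bob, 1", "alice, 0.5"]))  # {'alice': 3.0, 'bob': 1.0}
```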