New programming model

Last week I started tweeting about a new programming language/model that I'm working on with Tijs van der Storm. If you've talked to me about software in the last 10 years, you've probably been exposed to some of my ideas on this topic. I've been trying to make sense of it for a while, but now I'm pleased to say that it is coming together in a concrete way. We aren't ready to announce anything yet, but I can give you some idea of our guiding principles.

* Program with forests, not trees
The idea here is that all data/information should be represented explicitly as semantically integrated networks of typed values with attributes and relationships. There are two important aspects to this idea, which immediately distinguish our approach from both OO and FP. In contrast to OO, we declare structures at a larger granularity, to capture the semantic integrity of collections of objects rather than of individual objects. An OO programmer sees individual objects (the "trees") but cannot really see the "forest". Relative to FP, we allow explicit cycles, so our representation is graph-based rather than tree-based as in FP. I know that lazy functional programs can express cyclic structures, but the cycles are not observable. We tend to call our forests "models", although it is best not to import too many assumptions from MDD or UML when we use the term.
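As a toy illustration (the Node structure and all the names here are invented for this sketch, not our actual system), here is what a tiny "forest" with an observable cycle might look like in Ruby:

```ruby
# A "forest": a network of typed nodes with attributes and relationships,
# where cycles are explicit and observable (unlike pure trees).
Node = Struct.new(:type, :attrs, :links)

# A City and a Road that loops back to the same city.
city = Node.new("City", { name: "Austin" }, {})
road = Node.new("Road", { length: 5 }, {})
city.links[:roads] = [road]
road.links[:from]  = city
road.links[:to]    = city   # an explicit cycle: road -> city -> road

# The cycle is observable by identity: following links gets you back
# to the very same road object, not a copy.
puts road.links[:to].links[:roads].first.equal?(road)  # => true
```

The point is that the cycle is part of the data itself, and identity is observable when you traverse it.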

* Support many languages
This means that we support domain-specific languages. In effect, every information model you create is a language, and it can have multiple interpretations. The distinction between textual and visual languages is unimportant, because text and graphics are just two different presentations of an underlying information structure.

* Dynamic checking
The structure of a model is described by other models, which represent structural and behavioral constraints. That is, all data is described by metadata, and the metadata can be more interesting than just structural types. At the top we use the typical self-describing models. However, since everything is a value, all checking is done dynamically.
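Here is a minimal Ruby sketch of the idea that schemas are ordinary values checked at run time (Field, Schema, and conforms? are made-up names for illustration):

```ruby
# Metadata as plain values: a schema is just data describing other data.
Field  = Struct.new(:name, :type)
Schema = Struct.new(:name, :fields)

# A schema for points, built like any other value.
Point = Schema.new("Point", [Field.new(:x, Integer), Field.new(:y, Integer)])

# Conformance is checked dynamically, by walking the metadata.
def conforms?(value, schema)
  schema.fields.all? { |f| value.key?(f.name) && value[f.name].is_a?(f.type) }
end

puts conforms?({ x: 1, y: 2 }, Point)       # => true
puts conforms?({ x: 1, y: "oops" }, Point)  # => false
```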

* Generic operations
Because our "types" carry lots of useful information, and can be manipulated just like any other value, it's easy to write very generic operations, including equality, differencing, parsing, analysis, etc. We do extreme polytypic/generic programming, but don't worry about static checking. We'll worry about that later :-) Richer metadata (aka types or meta-models) means more powerful generic operations.
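For example, once a schema is a plain value, a generic equality can be written once and driven entirely by metadata. A hypothetical sketch (these names are invented, not our real API):

```ruby
# Schemas as values, as before.
Field  = Struct.new(:name, :type)
Schema = Struct.new(:name, :fields)

Point = Schema.new("Point", [Field.new(:x, Integer), Field.new(:y, Integer)])

# Generic, metadata-driven equality: compares exactly the declared fields,
# ignoring anything the schema doesn't mention.
def generic_eq(a, b, schema)
  schema.fields.all? { |f| a[f.name] == b[f.name] }
end

puts generic_eq({ x: 1, y: 2 }, { x: 1, y: 2, z: 9 }, Point)  # => true (z not declared)
puts generic_eq({ x: 1, y: 2 }, { x: 1, y: 3 }, Point)        # => false
```

The same pattern extends to differencing, printing, and so on: one operation per behavior, not one per type.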

* Use code for transformations, but never generate code
We like code. It's great for projecting models onto models or computing analyses of models. We are developing a family of cyclic maps, which are like FP maps but work on our circular structures. But you should never ever explicitly generate code. This is the big mistake of a lot of work on model-driven development. Instead, we use partial evaluation to generate code. Partial evaluation is great because it turns interpreters into compilers automatically (if you are careful!). Model-to-model transformations are fine and can be written either in code or as an interpretation of some other transformation language. But requiring all transformations to be models (not code), or generating code from models, is bad. I know others might disagree, but this is what we believe.
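The interpreter-to-compiler idea can be hinted at with hand-written staging in Ruby (only a sketch of the flavor; a real partial evaluator derives the staged version from the plain interpreter automatically):

```ruby
# A tiny interpreter over expression trees: [:lit, n], [:var], [:add, e1, e2].
def interp(expr, env)
  case expr.first
  when :lit then expr[1]
  when :var then env
  when :add then interp(expr[1], env) + interp(expr[2], env)
  end
end

# A hand-staged "compiler": dispatch on the expression happens once, up
# front, leaving a residual closure that only does arithmetic at run time.
def compile(expr)
  case expr.first
  when :lit
    n = expr[1]
    ->(env) { n }
  when :var
    ->(env) { env }
  when :add
    f = compile(expr[1])
    g = compile(expr[2])
    ->(env) { f.call(env) + g.call(env) }
  end
end

prog = [:add, [:add, [:var], [:var]], [:lit, 1]]  # 2*env + 1
puts interp(prog, 5)             # => 11
double_plus_one = compile(prog)
puts double_plus_one.call(5)     # => 11, with no tree-walking per call
```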

* Extreme feature-oriented modularity
That is, every idea should be written once. Allow mixins and inheritance/composition at all levels. Composing and merging models are very natural operations. It's not easy, but we think we can make it work. You have to compose the syntax and the semantics cleanly. We are inspired by Don Batory's work here.
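Ruby mixins already give a small taste of the feature-composition style (Persistence, Printing, and Invoice are invented examples, much simpler than composing whole models):

```ruby
# Each "feature" is written once, as a mixin.
module Persistence
  def save
    "saving #{name}"
  end
end

module Printing
  def to_report
    "report for #{name}"
  end
end

# A product is assembled by composing features, not by copying code.
class Invoice
  include Persistence
  include Printing
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

inv = Invoice.new("INV-1")
puts inv.save       # => "saving INV-1"
puts inv.to_report  # => "report for INV-1"
```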

Our goal is to create the "Smalltalk of Modeling": a simple and elegant system that is based on models all the way down. It has a small, well-defined kernel, and we are building real applications as we build the system. We are implementing in Ruby, though only because it is such a great language for this kind of reflective exploration. Our new system is not object-oriented; it is model-oriented. We are looking for the key ideas in the modeling world, not necessarily adopting any conventional wisdom. We are exploring!

Paul Graham on Objects in Arc

I just read Paul Graham's explanation for why Arc isn't especially object-oriented. (I've since realized that Paul's note is about 10 years old. I'll leave my comments here, but I'm sure a lot has changed since then.) My comments below correspond to his points:
  1. "Object-oriented programming is exciting if you have a statically-typed language without lexical closures or macros." Smalltalk, Ruby, C#, Scala, and Python all have lexical closures, and their use of objects is quite exciting (lexical closures are being added to Java real soon now). All Smalltalk control structures are user-defined as well, although it doesn't have full macros. It is true that objects can be used as a stand-in for closures, so there is a little truth to this comment. On the other hand, object-oriented programming is quite popular in dynamically typed languages, so I'm not sure why Paul thinks OO is tied to static typing.
  2. "Object-oriented programming is popular in big companies, because it suits the way they write software." This is ridiculous. Smalltalk, Ruby, PHP, Python, and Lua (to name a few) are all quite popular but are not tied to "big companies". Lots of people like C++ too, at big and small companies. I think that Paul is showing a surprising lack of awareness of reality here.
  3. "Object-oriented programming generates a lot of what looks like work." Object-oriented programs are often more verbose than other styles. Partly it's all the types, which is why Smalltalk, Ruby, Python, etc. are more concise than Java. But partly it is because OO languages encourage (require?) programmers to create modules and put in extensibility hooks everywhere, and these take up space. These hooks are called classes and methods. Haskell programs are usually concise, but are often not very extensible or interoperable.
  4. "If a language is itself an object-oriented program, it can be extended by users." "Overloading"? This has nothing to do with objects! What are you thinking, Paul? Overloading is about selecting an appropriate method based on its static type.
  5. "Object-oriented abstractions map neatly onto the domains of certain specific kinds of programs, like simulations and CAD systems." Yes, OO abstractions map very neatly into certain kinds of programs, like GUIs, operating systems, services, plugin architectures, etc. They are not good for everything, certainly, but they are good for lots of domains.
Object-oriented programming is different from normal programming. There is so much confusion about objects that I begin to wonder if very many people really understand what object-oriented programming is. Certainly there isn't much in Paul's comments to provide evidence that he really understands it.

Here is a quick dictionary to translate OO names into Lispish descriptions.
  • "Dynamic dispatch" is just calling a function value.
  • "Polymorphism" is two different function values that have the same interface.
  • "Objects" are just functional representations of data.
  • "Classes" are just functions that create collections of first-class functions.
OK, so my definition of "object" looks funny. But most common definitions are wrong. Objects are just collections of first-class functions (you might call them "multi-closures" since they are closures with multiple entry points). Go back and look at how SIMULA was implemented -- it just captured the current environment and returned it as a value.
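Here is the multi-closure reading made concrete in Ruby (make_counter is my own toy example): an object is a closure with several entry points, selected by message name.

```ruby
# An "object" as a multi-closure: one captured environment (n),
# several entry points selected by message.
def make_counter(n)
  lambda do |msg, *args|
    case msg
    when :get then n
    when :inc then n += 1
    when :add then n += args.first
    end
  end
end

c = make_counter(10)
c.call(:inc)       # state lives in the captured environment
c.call(:add, 5)
puts c.call(:get)  # => 16
```

There is no class anywhere; the "object" is just a function that captured its environment, exactly as in the SIMULA implementation described above.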

It is interesting to note that OO programs make more use of higher-order first-class functions (because all objects are collections of first-class functions) than most functional programs. This is another reason that OO is hard to grok. But Paul shouldn't have a problem with that.

As a small example, which do you think is a better approach to files? Here is the conventional approach without objects:
(define (scan stream)
  (if (not (at-end? stream))
      (begin
        (print (read stream))
        (scan stream))))

(scan (open-input-file "testdata.txt"))
This is very limiting, because it requires a global read function that can understand how to read from every kind of stream! If I want to create my own kind of stream, I'm out of luck.

Now here is the OO version:
(define (scan stream)
  (if (not (stream 'at-end?))
      (begin
        (print (stream 'read))
        (scan stream))))

(scan (open-input-file "testdata.txt"))
This is very nice, because anyone can implement a function that understands the 'at-end? and 'read messages. It's immediately extensible!

Remember Paul, that the lambda-calculus was the first object-oriented language: all its data is represented behaviorally as objects. Are you sure you aren't using objects?
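For the record, here are Church booleans and pairs written as Ruby lambdas: all the "data" is pure behavior, interrogated by applying it, which is exactly the message-passing view of objects.

```ruby
# Church booleans: a boolean is a function that selects one of two arguments.
TRUE  = ->(t) { ->(f) { t } }
FALSE = ->(t) { ->(f) { f } }

# Church pairs: a pair is a function awaiting a selector.
PAIR = ->(a) { ->(b) { ->(sel) { sel.call(a).call(b) } } }
FST  = ->(p) { p.call(TRUE) }
SND  = ->(p) { p.call(FALSE) }

p12 = PAIR.call(1).call(2)
puts FST.call(p12)  # => 1
puts SND.call(p12)  # => 2
```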