Depending Upon the Kindness of Strangers:
Notes on Open Source and Free Software
J. L. Sloan
2006-01-01
Disclaimer
This is a set of notes on my observations
regarding open source software from my personal perspective, which is as a
developer of open source software, as a user of open source software, as an
employee of a multinational corporation that uses open source software, and as
the owner of a small technology business. I am not a lawyer, nor any kind of
legal expert, nor do I play one on TV. My degrees are in computer science, not
law. These notes do not in any way constitute legal advice.
Definition
Microsoft defines
open source software as “software in which both source and binaries are distributed
or accessible for a given product, usually for free.” What distinguishes open
source from “shareware” or “public domain software” is that open source
software is licensed by its copyright holder in such a way as to prevent any
restrictions being placed on the distribution of the source and binaries of the
software, and to require the distribution of any modifications that may be made
to the software. “Distribution” here means “to make available”. This can range
from providing software on CD-ROMs to placing it on a publicly accessible web
site.
The Open Source Initiative
The Open Source
Initiative (OSI) (http://www.opensource.org) is a not-for-profit organization founded in
1998 that promotes the creation and use of open source software. They define open
source software as software having a license which conforms to a set of
principles which include the following.
- The license cannot restrict anyone
anywhere from redistributing the software. Redistribution includes either
selling it or giving it away. This prohibits the charging of a royalty
fee.
- The distribution of the software must
include source code. The license must allow this source code to be
distributed with the software.
- The license must allow modifications to
the source code. It must allow new works to be derived from the software.
It must allow the modifications and derived works to be redistributed
under the same license terms.
- The license cannot discriminate against
persons, groups, or fields of endeavor. It cannot require that the software
only be used in a specific product. It cannot restrict other software,
including requiring that other software be open source.
As is often said,
open source software is “free” in the sense of “free speech”, not “free beer”.
For example, you can sell open source software on magnetic or optical media,
but you cannot prevent the person to whom you sell it from selling copies of it,
with or without modifications, or even giving it away. As we will see later,
the various open source license agreements do prevent that person from
restricting access to that same open source code in any way.
The Free Software Foundation
A related concept is
that of free software, in the same sense of “freedom”, not “free beer”. The
Free Software Foundation (FSF) (http://www.fsf.org)
is a not-for-profit foundation founded in 1985 by Richard Stallman, a
researcher at MIT. Its purpose is to promote the creation and use of free
software.
The FSF has produced
a number of free software packages under the brand GNU, a recursive acronym
meaning “GNU’s Not UNIX”. Although they did not produce the Linux operating
system kernel itself, many of the standard utilities that ship with every Linux
distribution are part of the GNU software distribution. Most notably, the GNU
Compiler Collection (GCC), which includes what are probably the most used C and
C++ compilers and associated run-time libraries in the world, is distributed
under the GNU banner.
According to the
FSF, free software has a license which conforms to a set of principles which
include the following.
- You must be allowed to run the software
for any purpose.
- You must be allowed to study how the
software works and to adapt it to your needs.
- You must be allowed to redistribute
copies of the software to anyone anywhere.
- You must be allowed to modify the
software, and distribute your improvements to anyone anywhere.
The GNU General Public License
Besides producing
one of the most used collections of free software, the FSF is also known for
originating a series of software licenses widely used to promote free software.
The core of these licenses is the GNU General Public License (GPL). The GPL is
referred to by the FSF as a “copyleft” (as opposed to a “copyright”) license.
The FSF defines copyleft as “a general method for
making a program or other work free, and requiring all modified and extended
versions of the program to be free as well”.
The GPL is perhaps best
known for its “viral licensing” clause:
You must cause any work that you distribute or publish, that in whole or in part
contains or is derived from the Program or any part thereof, to be licensed as
a whole at no charge to all third parties under the terms of this License.
This clause requires
any work derived from software licensed under the GPL to also be licensed under
the GPL. Just what constitutes a “derived work” is a matter of some debate, as
you might imagine, and requires some fairly subtle distinctions for those that
are not technically inclined.
For example, the
Linux kernel is licensed under the GPL. Source code modifications to the Linux
kernel are clearly covered by the GPL. In addition, device drivers compiled as
part of the kernel are also covered under the GPL. However, device drivers
incorporated dynamically at run time as separate, dynamically loadable, modules
are not covered by the GPL. This allows a company to develop a “closed” device
driver using proprietary technology, yet use the driver with Linux.
Furthermore, programs compiled by the GNU Compiler Collection and run under the
Linux kernel, both of which are licensed under the GPL, are not themselves covered
by the GPL.
The viral licensing
implications for derived works were sufficiently restrictive that a second, less
restrictive, license, the GNU Lesser General Public License (LGPL), was
formulated by the FSF. The LGPL was intended to be applied to software
libraries. The LGPL includes the following clause.
A program that contains no derivative
of any portion of the Library, but is designed to work with the Library by
being compiled or linked with it, is called a "work that uses the Library". Such a work, in isolation, is not a
derivative work of the Library, and therefore falls outside the scope of this License.
However, linking a "work that uses
the Library" with the Library creates an executable that is a derivative
of the Library (because it contains portions of the Library), rather than a "work
that uses the library". The
executable is therefore covered by this License.
Again, what constitutes a “work that uses the library” is
open to interpretation. It is generally accepted that an application that is
dynamically linked uses the library but is not a derived work, while an
application that is statically linked is a derived work. The distinction is a
technical one. When an application is statically linked, portions of the
library are incorporated into the binary executable file of the application
itself. When you copy the application, you are copying that portion of the
library as well. When an application is dynamically linked, it makes use of a
separate copy of the library in memory that is shared among all applications on
the same computer. There is no portion of the library incorporated into the
application. Shared libraries are a technology used in Linux and most variants
of UNIX including MacOS, as well as in Windows where they are called Dynamic
Link Libraries (DLLs).
The prohibition against static linking is enough of an issue
that some open source software, such as that intended for embedded applications
which typically require static linking, frequently includes a clause modifying
the LGPL that specifically permits static linking without any viral licensing
implications. Here is an example of such a clause.
As a special exception, if other files
instantiate templates or use macros or inline functions from this file, or you
compile this file and link it with other works to produce a work based on this
file, this file does not by itself cause the resulting work to be covered by
the GNU Lesser General Public License. However the source code for this file
must still be made available in accordance with the GNU Lesser General Public
License.
This exception does not invalidate any
other reasons why a work based on this file might be covered by the GNU Lesser
General Public License.
(I have adopted just such a clause for my own open source
development intended for embedded applications. For an example, see the Digital
Aggregates Desperado library available at http://www.diag.com/navigation/downloads/Desperado.html.)
The FSF also offers licenses for things other than software,
for example documentation.
Other Licenses
The FSF licenses are very widely used. It is obviously a
good idea to make sure you know which variant of the FSF license applies to the
software you are using. In addition, there are many other licenses for open
source software that are also widely used. These licenses were written for specific
software projects, then adopted by other, perhaps unrelated, projects. These
include the BSD License, the Artistic License, the Open Software License, the
Apache License, the Common Public License, the Mozilla Public License, and many
others. They merit close scrutiny before you make use of any of the software to
which they apply in any commercial application.
The OSI has a section of its website indicating which of
these licenses meet with their approval. The FSF has a section of its website
discussing whether or not these licenses are compatible with the GPL or whether
they exhibit a copyleft provision.
It may be an overly broad statement, but my interpretation is
that all free software is open source, but not all open source software is
free, at least not in the sense meant by the FSF. The FSF has a strong
philosophical viewpoint against proprietary intellectual property, particularly
when applied to software and algorithms. It is a view echoed in the term
“copyleft”. It is not necessarily held by other open source advocates.
It is possible for software to be dual licensed: the
copyright holder can license the software under more than one license. For
example, the software can be licensed under the GPL, but also under a
commercial license that allows the user to modify the software in a closed,
proprietary way, typically for a fee paid to the copyright holder. The FSF
discourages this kind of thing since it violates their charter, but it may be
an attractive option for some software and applications.
(Although it seems controversial in the politically charged
climate of open source and free software advocates, I personally believe dual
licensing offers the best of both worlds to both the producer and consumer of
open source software.)
Open Source Developers
There are several mechanisms through which open source
software is produced. The most common is that some individual writes some
software, typically for their own use, that they think might also be useful to
someone else, applies an appropriate open source license (frequently the GPL)
to it, and announces its availability to the world. There are web sites devoted
to just such announcements, such as http://www.sourceforge.net
and http://www.freshmeat.net. Large
established projects with large user communities will have dedicated web sites
for the distribution of their product(s), such as http://www.fsf.org,
http://www.linux.org, and http://www.eclipse.org. All of these web sites host
discussion groups in which announcements can be made when new releases are
available, and in which users can ask for assistance, report bugs, and submit
suggested modifications.
Eric S. Raymond, an open source advocate, has said that much
open source software is the result of a single developer scratching a personal
itch. In fact, the Linux kernel itself was originally the result of one such
developer, Finnish computer scientist Linus Torvalds.
CIO Magazine reported that 58% of open source developers are
professional software developers with 11 years of experience on average, and
30% are paid to write open source.
The flip side is that 70% of open source developers are
doing it sans monetary compensation. They write open source code for the sheer
joy of writing code. They write it because they need to use the software themselves
and their efforts are informed and magnified by the contributions of a larger
user community. They write it to get respect among their peers.
(This last motivation is frequently cited as the most
common. I find this remarkable from the perspective of a consumer of open
source software, but completely understandable as a producer, and will remark
on it further, below.)
The smaller open source software projects remain the products
of individuals working on them as a hobby. The larger open source software
projects necessarily evolve into multi-developer efforts. For example, the
Linux kernel is now maintained by a group of dedicated developers under the
“benign dictatorship” of Torvalds. The Apache web server, which has more than
60% market share for web server software, beating even Microsoft, is maintained
by a dedicated group of developers with a rotating leadership.
There are some common forces at work on every successful
open source project. Most projects, large or small, have at their core a single
lead developer (perhaps the only developer) whose responsibility it is to
provide the technical vision, and to approve changes and fixes. (The Apache
project is a rarity in that apparently its members vote on changes.) Each
project is a collaborative effort between its developer(s) and its user
community. Users of open source software have a much more direct link to the
developer(s) than is typical with closed software, where multiple tiers of
(frequently dysfunctional, in my own experience as a consumer) technical
support may separate the user from the engineering staff.
Because the source code is always available for any open
source software, users can not only report bugs, but develop fixes. Approval
and incorporation of the fixes into the new release of software is ultimately
the decision of the lead developer. And depending on the open source license,
users may be required to submit their fixes to the software developer, who is
in turn required to distribute those fixes that are accepted into the code base
to the user community. This creates a lot of synergy.
Because the source is required to be made available, its
distribution also serves as a very broad system of backup and escrow. Open
source software cannot be orphaned. If the original developer dies, goes mad,
gets a life, or otherwise loses interest, there is at least the potential for
another developer to pick up the project. Open source software cannot “go out
of business”, although the consumer of it may be left with the responsibility to
maintain it themselves.
Open Source Users
Open source developers are responsible for deciding what
contributions from the user community to incorporate and how, and for the initial
testing of the software. But the user community serves as a large base of beta
testers. There is a strong innovation adoption curve (as described by Everett
Rogers in his book Diffusion of Innovations) at work here that is visible
in projects like the Linux kernel.
Innovators have
read-only anonymous access directly to the Linux source code repository used by
the software developers. They can download the latest source code that is “hot
off the presses”, frequently unstable, undergoing a lot of churn, but which
incorporates critical bug fixes or new, perhaps experimental, features in which
they are interested or even to which they are contributing themselves. Even so,
the changes in the code base will have been tested and vetted by the core group
of Linux developers under the leadership of Torvalds. The innovators provide
quick feedback to those developers, often including bug fixes, through the
various web-based discussion groups.
The Linux kernel convention is that odd numbered releases
(e.g. 2.7) incorporate new features that have not been widely tested. Early adopters download the odd numbered
releases with the expectation that the software will mostly work. These users
can also be expected to provide bug reports and occasionally bug fixes.
Both innovators and early adopters are frequently researchers
that are depending on the newest features of Linux, or may be developing those
features, as part of their own work. But they also may be companies which develop
their own applications on top of the Linux kernel. Increasingly, mainstream
companies like Oracle are marketing Linux versions of their own closed,
proprietary products, particularly as Linux continues to gain market share
among server (as opposed to desktop or laptop) operating systems. These
companies are depending on getting the “latest and greatest” in order to get a
head start on their own development, or maintain their technical lead over
their competitors, even though the official release of their Linux-based
products may be for a later, more stable, version of the kernel.
Even numbered releases (e.g. 2.6) of the Linux kernel are
considered “ready for prime time”, well tested, and consist of bug fixes to the
odd numbered release. Early majority
users download the latest even numbered releases, which are a good compromise
between stability and innovation.
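The odd/even convention described above can be sketched in a few lines of code. This is only an illustration of the rule as stated in these notes, assuming version strings of the form MAJOR.MINOR or MAJOR.MINOR.PATCH; the function name classify_kernel_release is hypothetical and is not part of any kernel tooling.

```python
def classify_kernel_release(version: str) -> str:
    """Return "stable" for an even minor number and "development" for an odd one.

    Hypothetical helper: it encodes only the odd/even convention described
    in the text, applied to the second component of the version string.
    """
    parts = version.split(".")
    if len(parts) < 2:
        raise ValueError(f"expected at least MAJOR.MINOR, got {version!r}")
    minor = int(parts[1])  # second component, e.g. the 6 in "2.6.15"
    return "stable" if minor % 2 == 0 else "development"


if __name__ == "__main__":
    # Example version strings taken from the releases mentioned in these notes.
    for release in ("2.4", "2.6.15", "2.7"):
        print(release, classify_kernel_release(release))
```

Under this sketch, an early adopter would be tracking the releases classified as "development", while early majority users would wait for those classified as "stable".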
Early majority users are frequently companies which
specialize in selling packaged Linux distributions on CD-ROM and DVD. There are
many of these companies now, each with a slightly different market focus.
Red Hat specializes in large enterprise customers. SUSE (a division of Novell)
has a distribution that is particularly suited for deployment on laptops.
MontaVista specializes in configurations for real-time and embedded
applications. Ubuntu has tools and modifications for localized languages and
accessibility for those with disabilities. Gentoo is Linux for übergeeks. (For
an example outside the Linux kernel, Cygnus Solutions, now owned by Red Hat,
specialized in packaging and selling the GNU Compiler Collection and related
tools from the Free Software Foundation, with a particular emphasis on support for
the embedded market.)
Late majority users may wait for one of the companies that provide
pre-packaged Linux distributions on CD-ROM or DVD, such as Red Hat, SUSE, etc.
By the time these users install Linux, the features in their release will be
well tested by thousands of earlier users. Installation of these packaged
distributions is frequently driven by a graphical user interface (GUI) front
end, simplifying configuration.
Laggards will
stick with the prior even-numbered Linux release (e.g. 2.4), happy that they
have an extraordinarily stable, well tested, well understood product.
Smaller open source projects, often the product of a single
individual, go through a similar process, albeit with a much smaller user community.
Eric Raymond describes how he evolved from a user of the open source e-mail
utility “fetchmail” (originally called “popclient” in an earlier incarnation)
into becoming its maintainer when the original developer got a life. Raymond
quickly found himself as the central point of contact to a community of about
250 users (no doubt a fraction of the total number of fetchmail users) who were
submitting bug reports, suggestions for new features, and bug fixes. The
product eventually reached maturity at which point bug reports became rare. But
at the project’s height, Raymond was releasing a new version about once a day.
Concluding Remarks
Open source development exploits the “Wisdom of Crowds”
phenomenon in which a large base of collaborators contribute in one way or
another to a project, typically managed by the vision of a single coordinator.
It leverages the fact that, thanks to the Internet, communication between
users and developers is cheap and fast, that computing resources, thanks to the
personal computer, are cheap enough that many users can run and test a software
product, and that, apparently, the needs of a vast number of computer users are
not being met by commercial, closed, proprietary, software.
The open source development process is remarkably market and
consensus driven when compared to the centrally planned process that drives
most closed software development. That such a process can generate software of
a quality equal to or surpassing commercial software, without resorting to the
formal (and expensive) requirements and testing efforts typified in closed
software development organizations, nor conforming to any but the lowest level
of the Capability Maturity Model (CMM) put forth by the Software Engineering
Institute (SEI), cannot be doubted. Les Hatton is a professor of Forensic
Software Engineering at the
Open source does not dominate the entire computing community,
however. Windows is still the predominant platform for the desktop, and my
personal experience indicates it will stay that way for some time to come.
However, Windows, and Microsoft in general, has lost the battle for server operating
system share. Linux is the predominant platform for servers, and Apache has at
least 60% of the web server market share. Relatively new open source applications
like Asterisk, an open source PBX, are making inroads in markets that have
been traditionally serviced by closed systems.
It is an interesting dynamic that so many companies have
“bet the farm” on open source by leveraging it extensively in their own closed products.
Often this is because no credible commercial alternative exists. These
companies are depending upon the kindness of strangers. The numbers seem to
suggest that much of this open source software is being produced by disaffected
developers whose salaries they pay to perform other work.
The reason most cited by developers for doing open source
development is to gain the respect of their peers. You have to wonder if, as
corporations treat their engineers more and more like disposable assets, an
increasing number of engineers will turn to writing open source software to get
the satisfaction they lack from their “day jobs”, while perhaps subsidizing
their employers who make use of this software.
You also have to wonder how much open source software could
be produced if the developers of such software were not being subsidized by
their “day jobs” developing closed software. Open source advocates like Eric
Raymond argue in favor of open source business models. Raymond cites business
models such as using open source software to sell closed software, to sell
hardware, or to sell services. I remain unconvinced, but open minded, and even
hopeful.
The 70% number cited by CIO Magazine implies that there is
an immense “natural resource” of technical talent going un-exploited by high
tech companies. Until upper management understands that engineers do not
respond to “shut up and row” as a motivation, these resources are probably not
only un-exploited, but un-exploitable, by their employers.
Bibliography
M. Fink, The Business and Economics of Linux and Open Source, Prentice-Hall, 2003
FSF, http://www.fsf.org
L. Hatton, “Linux and the CMM”, http://www.leshatton.org/WEB_1199.html, 1999
C. Koch, “Your Open Source Plan”, CIO Magazine,
OSI, http://www.opensource.org
E. Raymond, The Cathedral & The Bazaar, O’Reilly, 2001
A. St. Laurent, Understanding Open Source & Free Software Licensing, O’Reilly, 2004
J. Surowiecki, The Wisdom of Crowds, Random House, 2004
Author
John Sloan is a principal in the
Digital Aggregates Corporation (http://www.diag.com),
which specializes in applying object oriented design and implementation to
real-time, embedded, and device software applications. Digital Aggregates
maintains Desperado, a library of components for embedded applications,
amounting to about 60,000 lines of C++ open source code and documentation
licensed under a version of an FSF license. John is also a developer in R&D
of a multinational corporation that leverages open source for its own extensive
product line.
