K&K Software Lynxviewer

Ergebnis für URL: http://www.ibiblio.org/osrt/develpro.html
                       A Quantitative Profile of a Community of

                             Open Source Linux Developers

              Bert J Dempsey, Debra Weiss, Paul Jones, and Jane Greenberg

                     UNC Open Source Research Team [1](see note 1)

                       School of Information and Library Science

                      University of North Carolina at Chapel Hill

                        Chapel Hill, North Carolina 27599-3360

                                [2]osrt@metalab.unc.edu

                                    October 6, 1999

                                       Abstract

   Open source software, or free software, has generated much interest and debate in
   the wake of a number of high-impact applications and systems produced under open
   source models for development and distribution. Despite the high degree of
   interest, little hard data exists to-date on the membership of collaborative open
   source communities and the evolutionary process of their repositories. This paper
   contributes a baseline quantitative study of one of the oldest continuous
   repositories for the Linux open source project (the UNC MetaLab Linux Archives),
   including demographic information on its broad community of developers. Our
   methodology is a close examination of collection statistics, including custom
   monitoring scripts on the server, as well as an analysis of the contents of
   user-generated metadata embedded within the Archives. User-generated metadata
   files in a format known as the Linux Software Map (LSM) are required when
   submitting open source software for inclusion in non-mirrored portions of the
   MetaLab Linux Archives. The over 4500 LSMs in the Archives then provide a
   demographic profile of contributors of LSM-accompanied software as well as other
   information on this broad subset of the Linux community. To explore repository
   evolution directly, an instrumented Linux Archives mirror was developed, and
   aggregate statistics on content changes seen over a month-long period are
   reported. In sum, our results quantify aspects of the global Linux development
   effort in dimensions that have not been documented before now, as well as
   providing a guide for more detailed future studies.


   Introduction

   Open source development communities have successfully created, distributed, and
   continued to evolve many important software projects---the GNU project's
   utilities and libraries including the gcc compiler and Emacs editor, the Perl and
   Tcl languages, the Apache WWW server, and the Linux and FreeBSD operating system.
   Open source, or free software, means more than access to source code (see
   Appendix A), and there is not universal agreement on a single open-source
   development model. Nonetheless, the guiding principle for open source software is
   that, by sharing source code, developers cooperate under a model of rigorous
   peer-review and take advantage of "parallel debugging" that leads to innovation
   and rapid advancement in developing and evolving software products. Open-source
   licensing, moreover, ensures an open market in integration and support for these
   products downstream.

   Software production and distribution driven by the open source model thus has
   strong practical advantages as well as its strong appeal to those who, in Richard
   Stallman's words, see open source software in a "social advantage, allowing users
   to cooperate, and an ethical advantage, respecting the user's freedom. [3]"
   Advocates emphasizing the business reasons for adopting an open source model have
   engendered in recent years an on-going---and often acrimonious--- debate over the
   ultimate impact of open source communities. Some have proposed that free software
   methods leveraging the Internet represent an alternative economic model for
   engendering and managing robust software that will dramatically reshape the
   multi-billion dollar commercial software industry. Skeptics meanwhile continue to
   challenge the idea that the technical and organizational approach represented by
   open-source development can really scale up in the coming years and produce the
   robust software required for large-scale mainstream computing [1]. The stakes in
   this debate are clearly quite high.

   A prime difficulty in understanding and drawing conclusions about open source
   collaborative development has been the sketchy information available on exactly
   who participates in open source development and how their software archives
   evolve. This lack of information is understandable given the distributed, organic
   process of collaborative development in open source communities.

   The contribution of this paper is a baseline quantitative study of a broad
   community of developers within the Linux open source effort, which, due to its
   influence and increasing user base, is widely regarded as a cornerstone project
   for large-scale open-source development. Our work characterizes a very large
   repository of Linux-related materials and analyzes information embedded within
   the collection on the nature of its contributors. Derived from a variety of
   collection meta-data statistics, the data and analysis here supports the
   assertions that Linux community is indeed very vibrant, geographically diverse,
   and engaged in a broadening the quantity and scope of the freely available Linux
   software and documentation.

   Background on Open Source Development

   The genesis of the open-source model for software development and distribution
   goes back to the earliest days of software in university environments.
   Open-source software is an alternative term for "free software", which was
   popularized by the seminal Free Software Foundation, founded in 1984 by MIT
   researcher Richard Stallman. The Free Software Foundation is the parent
   organization for the GNU (GNU's Not Unix) project. Stallman's vision was to
   develop a free operating system, complete with standard software tools such as
   compilers, interpreters, text editors, mailers, and so forth, in order to
   recreate a community of cooperating hackers that he felt had been lost . Under
   his direction, the Free Software Foundation popularized the term "free software"
   as explained in the now-classic distinction, free as in "free speech", not "free
   beer". That is, free software may or may not be distributed with a monetary cost,
   but the knowledge that underlies the program, i.e., the source code, should be
   freely available in order to empower future innovation. Software source code is a
   form of scientific knowledge, and just as scientists publish so that other
   scientists can build on their results, computer scientists must publish their
   source code in order to foster continued innovation in computing.

   Unfortunately, the term "free software" has negative connotations for many in the
   commercial computing world, and the tone adopted by Stallman, the most prominent
   free software advocate for some time, was distinctly anti-business. In early
   1997, a group of leaders in the free software community decided to address this
   problem head-on with a marketing campaign designed to "argue for 'free software'
   on pragmatic grounds of reliability, cost, and strategic business risk". . They
   were goaded to action largely by frustration over what they felt was the
   unrecognized potential of free software as a driver of innovation and the basis
   for the development of commercial-grade software, despite the successes of
   Apache, Linux, and other projects. An initial decision of the group, which would
   become the Open Source Initiative, was to choose the term "open source" for their
   campaign to avoid the baggage being carried by the term "free software".

   A key component of Stallman's effort in developing a successful free software
   organization was to formulate a licensing agreement that would prevent businesses
   from taking free software and using it in binary-only redistributions for
   commercial gain. Stallman developed the GNU General Public License, known as the
   GPL or "copyleft", to address this issue. In subsequent years, other open-source
   efforts adopted variations on copyright statements designed to enable open-source
   works to thrive while not hampering the ability of developers to incorporate
   open-source work effectively [3](see note 2). For its part, The Open Source
   Initiative adopted a set of criteria, titled "The Open Source Definition", for
   open-source licensing. Based on an earlier document by Bruce Perens, the Open
   Source Definition explicitly mentions some example licenses that fit its
   criteria, including that of the GNU project (the GNU GPL), the Berkeley Unix
   Project (BSD), the X Consortium, and a few others. For reference, the Open Source
   Definition, Version 1.7, is reproduced in Appendix A.

   Linux: Open-Source Development on a Global Scale

   Internet connectivity has enabled the open-source notion of cooperative,
   peer-reviewed software development to be deployed on a global scale. Perhaps the
   most influential open-source project to-date has been and continues to be the
   Linux operating system. Linux began as a personal project of a graduate student
   in Finland, Linus Torvalds, in 1991. The Linux project now represents a mature
   operating system that runs on the popular hardware platforms. Linux is playing an
   increasingly significant role in the business plans of established computing
   companies, in university research labs, and in the development of a new set of
   companies focused on Linux support and integration issues. According to April
   1999 statistics at the Internet Operating System Counter site , Linux is now the
   operating system at over 30% of Internet server sites. Linux has been estimated
   to have 10 percent of the server market share in the Unix market with growth
   trends suggesting it will dominate the Unix arena in a few years.

   The Linux Kernel Project continues to be led by Linus Torvalds himself, with a
   significant array of co-developers throughout the world. In addition, the Linux
   community of application-level developers and documenters has grown in proportion
   with the rising tide of users and installed systems. In a recent interview, Linus
   has said that in the near future "the most exciting developments for Linux will
   happen in user space, not kernel space ". Thus, increasing focus and energy are
   now being directed towards creating applications and utilities that will spread
   the use and usefulness of the Linux kernel work.

   The focus of this paper is with this latter group, which we call Application
   Contributors to distinguish them from the contributors within the Kernel Project
   itself. A priori, the overlap between these Application Contributors and
   developers in the Kernel Project is unknown. Linus and other prominent Linux
   developers do show up in the set of Application Contributors. As a group, the
   Application Contributors would be expected to represent a broader range of
   contributors since their contributions are non-juried and include those who make
   small contributions of specific applications or utilities. (Application
   contributions can also be very large and complex programs, e.g., a Linux port of
   the Sendmail server.)

   To explore the nature of contributions by Application Contributors and their
   collective profile, the paper presents a discussion of collection and server
   statistics gleaned from one of the oldest and most comprehensive Linux repository
   site, metalab.unc.edu, run by UNC MetaLab (formerly the original SunSITE). With
   components of the repository at MetaLab mirrored from other key Linux sites
   (e.g., [4]www.redhat.com and kernel.org), the MetaLab collection includes
   virtually all Linux-related materials available on the WWW, including all major
   distributions of the base kernel code, the Linux Documentation Project
   (coordinated and hosted by MetaLab), and a large archive of contributed software
   and auxiliary materials. Our study specifically focuses on this latter portion of
   the MetaLab archives, and its contributors are, by definition, the community of
   Application Contributors that we profile.

   Towards this effort, we analyze the approximately 4500 user-generated metadata
   files (Linux Software Maps (LSMs)) embedded within the collection materials. The
   LSMs offer self-reported information on the demographics of Linux developers
   involved mostly in application-level tools and utilities (as opposed to the
   largely separate Linux Kernel Project) as recorded over a 5-year period. Also, we
   collected data on repository changes across a large portion of the MetaLab
   archives by instrumenting a month-long monitoring experiment on the MetaLab
   server. This experiment yielded data on the global pattern of change in
   repository contents.

   Below we briefly review the history of the Linux repository at MetaLab and the
   role of LSMs in the archive. The sections following then present our quantitative
   profile of the collective set of LSM-based contributors and patterns of content
   change.


   History of Linux Repository at UNC MetaLab

   Not long after Linus Torvalds released his first copy of the Linux kernel to the
   Internet in 1991, an American mirror of his FTP site was established at
   banjo.concert.net in the Raleigh-Durham area of North Carolina. However the Linux
   project was growing quickly and soon banjo was short of disk space. In 1992,
   Jonathan Magid, who had just become the student systems administrator of a new
   project called SunSITE at the University of North Carolina, agreed to take on not
   only the mirror but to collect contributions of Linux-related software. Since
   then the site, now MetaLab.unc.edu, has been a major resource for the Linux
   community--both developers and users.

   The over 125,000 files in the Linux repository at UNC MetaLab is available
   through FTP and HTTP under the /pub/Linux portion of the server. We will refer to
   these materials henceforth in this paper as the MetaLab (ML) Linux Archives or
   the MetaLab (ML) Linux repository. As noted above, our LSM-based analysis focuses
   on the subset of the ML Linux Archives in which LSMs play a prominent role,
   namely the /pub/Linux tree excluding the docs and distributions subdirectories.
   Along with other collections, /pub/Linux is served from high-performance machines
   connected with excellent high-speed Internet connectivity. Not infrequently,
   access counts to the ML Linux Archives via HTTP and FTP exceed 100,000
   transactions per 24-hour period.

   Contributions to the ML Linux Archives are required to be accompanied by a small
   metadata file in a format called the Linux Software Map (LSM). This convention
   arose naturally from the Linux community needs, as detailed in the history below,
   and as such is widely adhered to by contributors. Since our analysis relies in
   part on summary information taken from the set of all LSMs, we present a short
   overview that clarifies the origins and role of LSMs.

   History of Linux Software Maps

   Although the Linux Archives in 1992 was minuscule compared to its size today, the
   speed of access for many users and contributors was so slow that just downloading
   random but interesting files to see if they contained useful software or not was
   impractical. Jeff Kopmanis of Michigan decided that a small descriptive metadata
   file to be called a Linux Software Map or LSM should be associated with each
   entry [5](see note 3). After surfing several gopher sites and posting to various
   Linux newsgroups and an e-mail exchange with Jonathan Magid, Kopmanis took a
   close look at a working proposal to the IETF for metadata called Internet
   Anonymous FTP Archives or IAFA. Kopmanis was working from the 1992 version of
   IAFA, which he modified to better suit the specific needs of the Linux community.
   (Interestingly enough the IAFA metadata description has yet to become an RFC and
   is not widely used).

   Kopmanis slightly revised the LSM description (in a version noted by a beginning
   tag of Begin2) after receiving feedback from other users and contributors. Before
   changing jobs in August 1994, Kopmanis posted notes to Linux newsgroups asking
   for a new LSM keeper. Lars Wirzenius of Finland was selected as the new LSM
   keeper. He instituted several changes to the LSM including:
     * no limits on line lengths
     * less awkward multi-line format
     * nicer method (for the user) for specifying the FTP site and files
     * one entry for all files comprising a package

   This new LSM (noted by Begin3 as the first tag) went into effect on August 4,
   1994. The template is included in Appendix B.

   In October 1996, Wirzenius passed on the job of keeping the LSMs to Aaron Schrab
   of Indiana who has the job of LSM keeper presently. People with specific interest
   in LSMs are subscribed to the LSM-workers mailing list at execpc.com. This list
   is maintained by Schrab.

   Aaron Schrab notes in the LSM.README [6](see note 4):

   All entries have been entered by volunteers all over the world via email using
       the template below [in the file named LSM.README]. New versions [of the LSM]
       will appear first on sunsite.unc.edu and will be announced in the newsgroup
       comp.os.linux.announce. Discussions pertaining to the LSM will be held in the
       newsgroup, comp.os.linux.misc.

   Purpose of LSMs within MetaLab Linux Archives

   From their beginning, Linux Software Map entries were designed to help developers
   make their contributions highly available to users and to other developers by
   serving as finding aids as well as a standardized means of announcing new
   software (to comp.os.linux.announce and other newsgroups). LSMs also insure that
   authors are properly credited if and when their software is integrated into Linux
   distributions.

   LSMs are created according to the LSM metadata template consisting of 14 metadata
   elements, five of which are mandatory. The five mandatory fields are: Title,
   Version, Entered-date, Description, and Primary-site fields. Information on
   creating LSMs is kept at
   [7]http://metalab.unc.edu/pub/Linux/docs/linux-software-map/lsm-template. The LSM
   metadata schema is based on the IAFA (Internet Anonymous FTP Archives) metadata
   schema that was developed for Archie . As noted above, the LSM metadata schema
   has gone through a series of revisions that were initiated and overseen mainly by
   Jeff Kopmanis, with input from members of the Linux community, and it is now in
   its third revision.

   LSM generation permits the authors to record their expert knowledge about the
   resource that has been created, rather than a second hand representation, which
   is practiced with many other metadata schemas in the networked environment. When
   a contributor submits software to the MetaLab Linux Archives, he places the
   software and an associated LSM into [8]ftp://metalab.unc.edu/incoming/Linux/ This
   area is inspected daily by the Archives maintainers using a program called keeper
   which was written by Eric Raymond then modified by Miles Efron. The Linux
   Archivist using keeper reviews the LSM information and places the software and
   LSM in their correct home in the Archive. LSMs help the archivist replace older
   obsoleted versions of software by use of standard names and version numbers. The
   LSM is then forwarded to the LSM maintainer for inclusion in the definitive LSM
   list at execpc.com. The LSM list at execpc.com is regularly mirrored by major
   Linux sites worldwide including MetaLab [9](see note 5).

   Many, but not all by any means, contributors of Linux software use LSMs as their
   means of describing their software as they send announcements to
   comp.os.linux.announce and other newsgroups. Several LSM searches assist users
   and developers in finding software including:
     * [10]http://www.linux.org/apps/lsm.html (last updated 13-Nov-1998)
     * [11]http://www.boutell.com/lsm/ (keyword and title searching only)
     * [12]http://metalab.unc.edu/linsearch/ (complete LSM template-based searching
       with support for complex searches)


   Profiles of the ML Linux Archives and LSM-Based Contributors

   In this section we present data sets and analysis designed to provide selected
   quantitative characterizations of the open source contribution process as
   reflected in the ML Linux Archives. The data presented is derived from
   server-side programs that (1) summarize

                                   [primarysite.jpg]

   filesystem statistics of the repository, (2) analyze the contents of the over
   4,500 LSM files found in the ML Archives, and (3) monitored and recorded content
   changes over a month-long period. For the data reported on LSM content, the
   number of LSM files varies across statistics since some LSMs contain missing or
   unusable (e.g., a date field as "Thursday") metadata information in some of their
   fields.

   Role of the ML Archives in Linux Community

   Since MetaLab mirrors most other popular Linux archives including most
   distributions, MetaLab constitutes a superset of most archives and includes most,
   if not all, current LSM documented software. Evidence of the central role of the
   ML Linux Archives can be found by examining the Primary-Site field in LSMs. This
   field allows authors to specify the primary Internet server (and path on that
   server) on which their Linux contribution resides. As seen in Figure 1, 58% of
   all LSMs list metalab.unc.edu (or, the

                                    [lsmbydate.jpg]

   older domain name, sunsite.unc.edu) as the primary repository site, and the next
   most popular sites appear in 3% or less of the LSMs. This statistic may be biased
   by the fact that the ML Archives require an LSM with each submission whereas
   other repositories may not.

   Characteristics of the Content in the ML Linux Archives

   Table 1 provides summary statistics on the six top-level subdirectories in the ML
   Linux Archives with the most LSM metadata files. This summary was taken on
   September 21, 1999. At this time, the ML Linux Archives had a total file count of
   129,109 files with 4633 LSMs. However, most of these files (94, 401) are located
   in the distributions subdirectory of the archive where only 104 of the LSM files
   are found. LSM contributions may find their way into individual distributions of
   Linux, although this study does not investigate the extent to which this occurs.
   The distributions generally are using the Redhat Package Manager (over one-third
   of files under the distributions subdirectory have the extension .rpm) or other
   archiving tools to encapsulate related files and their metadata. Another 14,075
   files are under the docs subdirectory, where only 23 LSMs were located since LSMs
   are oriented towards metadata for software contributions.


                     Top-level directory in /pub/Linux on MetaLab

                                    Number of LSMs

                                     Total Number

                                  of Files and Bytes

                                       Number of

                                 .tar.gz or .tgz Files

   apps

                                         1312

                                  3904 files (994 MB)

                                         1382

   system

                                         1301

                                     3865 (391MB)

                                         1555

   X11

                                          397

                                     2495 (567MB)

                                         1039

   utils

                                          373

                                     1670 (218 MB)

                                          579

   games

                                          297

                                     896 (130 MB)

                                          356

   devel

                                          288

                                     1433 (1.3 GB)

                                          487

       Table 1: Six Subdirectories in ML Linux Archives Containing the Most LSMs

   Excluding the distributions and docs subdirectories, there are 4455 LSMs covering
   the 20,633 files in the remaining portion of the ML Linux Archives. Table 1 shows
   the top six subdirectories in terms of number of LSMs. It also presents the total
   number of files ending in .tar.gz or .tgz to indicate the high density of
   compressed file archives of this form, the most common format for submission of
   software source code.

                                 [lsmbyemailsuff.jpg]

                                [lsmcontribbyemail.jpg]

   Profile of LSM-Accompanied Contributions

   As noted earlier, the ML Archives require the Linux Software Map metadata file to
   accompany contributions sent to the ML Linux Archives. Figure 2 breaks down the
   LSMs in the ML Linux Archives by year for the LSM files that included full date
   information. In interpreting this graph, it is important to remember that the
   policy of the MetaLab archive has been to replace old LSMs when a new version of
   a software package arrives. Thus, this data is not an accurate longitudinal study
   of how many contributions have been made in which years. Rather, it shows that
   portions of the existing archive extend back to 1993, but many of the
   contributions have been added or updated in the recent past, e.g., almost
   one-fourth of the LSMs are additions or changes during the first 6 months of the
   1999 year.

   The LSM files also contain a field for including an author's email information.
   Figures 3 and 4 give a breakdown of this information by email suffix. The data
   shows that Linux Application Contributors indeed come from both the commercial
   and the educational world. Moreover, while com, edu, and net are difficult to
   accurately associate with geographical information, the demographics of
   contributors reveals a strikingly strong

                                 [numberoflsmcont.jpg]

   European influence within the Linux community. To clarify, Figure 4 presents the
   same data but with all suffixes representing European countries aggregated. Note
   that Figure 4 underrepresents European participation in Linux development since
   some authors with .com email suffixes are presumably located in Europe.

   A separate question of interest is how often Linux developers contribute to
   open-source software. We extracted a frequency count of authors' last names from
   the LSM files that contain author data and present this information in Figure 5.
   As seen there, the vast majority of LSM authors have contributed only one or two
   items, with only a very small number of developers having produced five or more
   contributions. Only thirteen contributors have ten or more contributions to their
   credit. This data then speaks to the breadth of the Linux developer community: as
   seen in the data for Application Contributors, the open-source development effort
   has not been dominated by a few very prolific developers, but rather, over time,
   many participants adding isolated contributions.

                                 [dailycontribsum.jpg]

   Studying the ML Linux Repository Dynamics

   As suggested in the data of Figure 2, the ML Linux Archives is an active
   repository where contributors continue to submit new and updated materials on a
   daily basis. To provide a baseline profile of the repository contribution
   process, specifically that driven by the Application Contributors, a repository
   monitoring experiment was undertaken.

   The experiment took the form of a local mirror that gathered change information
   from a portion of the ML Linux Archives on a twice-daily basis. Due to resource
   limitations, we restricted our observations to an 8-gigabyte subset of the ML
   Linux Archives, specifically the six subdirectories under /pub/Linux with the
   most LSM files (see Table 1) as well as the docs subdirectory. These directories
   then represent important areas where Application Contributors send
   LSM-accompanied software and documentation. The mirror script for the experiment
   ran twice a day, at 8:30 a.m. and 8:30 p.m., to synchronize with the source
   directories on metalab.unc.edu. Full details of the mirroring experiment are
   described in [13]http://ils.unc.edu/ils/research/reports/TR-1999-01.pdf.

   Figure 6 shows the rate of activity seen in our month-long monitoring experiment.
   As seen here, the Archives receive constant traffic from open source contributors
   updating existing materials and adding new items. At times, the process is very
   bursty, most probably due to external events such as a major release of new
   software as well as occasional update backlogs when MetaLab staff are off-line
   over holidays.

   More significantly, Figure 7 shows that about one-third of all activity observed
   in this month-long period resulted from changes to existing files in the
   repository, e.g., new releases of existing software utilities, whereas about
   two-thirds represents the addition of new files to the repository. Figure 8
   breaks out this activity by some common file types. We note that about 1.5% (59)
   of the LSMs in the Archives were updated in this month while over 4% (179) of the
   total were added in this month. In line with these changes then, many of the
   changes are seen to involve compressed archives using the .gz compression; in
   fact, over one-half of activity represents new or updated .gz files as software
   packages are added and changed. (Note that the standard practice at MetaLab has
   been to remove old files when new versions of open source software or
   documentation is submitted. The data in the figures does not track file deletions
   as a separate operation in the update process.)

   While our data is not conclusive, the data in Figures 7 and 8 corroborate that in
   Figure 2 to suggest that the ML Linux Archives is vibrant and contains little
   stale or outdated materials as a percentage of the entire collection. An
   interesting follow-on study would be to connect our baseline numbers directly
   with the question of how often do Application Contributors update (or,
   alternatively, abandon) their contributions after

                                   [newcontrib.jpg]

   creating them? Definitive information on this aspect of open source contribution
   would speak to the robustness of the open-source process over time and the extent
   to which the community is able to carry forward the work of other contributors
   from the past.

                                  [changebyfile.jpg]

   Conclusion

   This article reports on quantitative summaries of the large repository of
   Linux-related materials found in the MetaLab Linux Archives, one of the oldest
   continuous repositories for Linux materials. User-generated metadata files in a
   format known as the Linux Software Map (LSM) are required when submitting open
   source software for inclusion in the MetaLab Linux Archives. Our study examined
   information inside the over 4500 LSMs, along with longitudinal collection
   statistics from the MetaLab filesystem, in order to extract a quantitative
   profile of the large group of Linux developers represented there, a group we have
   called Application Contributors in the Linux community.

   Our results confirm a very broad participation at this level in the open source
   software associated with the Linux Archives. A strong bias towards European
   developers is revealed, reflecting the European roots of Linux perhaps, as well
   as a balance between .com and .edu contributors. Date information in LSMs and our
   month-long monitoring experiment reveal a very active repository where users
   submit many changes to existing files as well as adding new files. This
   phenomenon along with other questions raised by our baseline study warrant
   further investigation to develop a fuller understanding of the dynamics of
   large-scale cooperation over time, as exhibited in the Linux open-source
   development effort. Such work will shed light on what aspects of the Linux effort
   can be taken to be representative in determining the viability and robustness of
   large-scale open-source development projects in the future.

   References


   Appendix A

   The Open Source Definition, Version 1.7
       (reproduced from [14]http://opensource.org/osd.html)

   Open source doesn't just mean access to the source code. The distribution terms
   of open-source software must comply with the following criteria:

   1. Free Redistribution

   The license may not restrict any party from selling or giving away the software
   as a component of an aggregate software distribution containing programs from
   several different sources. The license may not require a royalty or other fee for
   such sale. [15](rationale)

   2. Source Code

   The program must include source code, and must allow distribution in source code
   as well as compiled form. Where some form of a product is not distributed with
   source code, there must be a well-publicized means of obtaining the source code
   for no more than a reasonable reproduction cost -- preferably, downloading via
   the Internet without charge. The source code must be the preferred form in which
   a programmer would modify the program. Deliberately obfuscated source code is not
   allowed. Intermediate forms such as the output of a preprocessor or translator
   are not allowed. [16](rationale)

   3. Derived Works

   The license must allow modifications and derived works, and must allow them to be
   distributed under the same terms as the license of the original software.
   [17](rationale)

   4. Integrity of The Author's Source Code.

   The license may restrict source-code from being distributed in modified form only
   if the license allows the distribution of "patch files" with the source code for
   the purpose of modifying the program at build time. The license must explicitly
   permit distribution of software built from modified source code. The license may
   require derived works to carry a different name or version number from the
   original software. [18](rationale)

   5. No Discrimination Against Persons or Groups.

   The license must not discriminate against any person or group of persons.
   [19](rationale)

   6. No Discrimination Against Fields of Endeavor.

   The license must not restrict anyone from making use of the program in a specific
   field of endeavor. For example, it may not restrict the program from being used
   in a business, or from being used for genetic research. [20](rationale)

   7. Distribution of License.

   The rights attached to the program must apply to all to whom the program is
   redistributed without the need for execution of an additional license by those
   parties. [21](rationale)

   8. License Must Not Be Specific to a Product.

   The rights attached to the program must not depend on the program's being part of
   a particular software distribution. If the program is extracted from that
   distribution and used or distributed within the terms of the program's license,
   all parties to whom the program is redistributed should have the same rights as
   those that are granted in conjunction with the original software distribution.
   [22](rationale)

   9. License Must Not Contaminate Other Software.

   The license must not place restrictions on other software that is distributed
   along with the licensed software. For example, the license must not insist that
   all other programs distributed on the same medium must be open-source software.
   [23](rationale)

   Conformance

   (This section is not part of the Open Source Definition.)

   We think the Open Source Definition captures what the great majority of the
   software community originally meant, and still mean, by the term "Open Source".
   However, the term has become widely used and its meaning has lost some precision.
   The OSI Certified mark is OSI's way of certifying that the license under which
   the software is distributed conforms to the OSD; the generic term "Open Source"
   cannot provide that assurance, but we still encourage use of the term "Open
   Source" to mean conformance to the OSD. For information about the OSI Certified
   mark, and for a list of licenses that OSI has approved as conforming to the OSD,
   see [24]this page.


   Bruce Perens wrote the first draft of this document as `The Debian Free Software
   Guidelines', and refined it using the comments of the Debian developers in a
   month-long e-mail conference in June, 1997. He removed the Debian-specific
   references from the document to create the `Open Source Definition'.


   Appendix B

   Annotated Linux Software Map Template
       (excerpted from
       [25]http://metalab.unc.edu/pub/Linux/docs/linux-software-map/lsm-template)

   Begin3

   Title: The name of the package. Please use the same title for the LSM entry of
   each version, so as to make it easier to find entries for new versions of
   packages that already have one in the data base.

   Version: Version number or other designation. Use a date if nothing else is
   appropriate.

   Entered-date: Date in format ddMMMyy of when the LSM entry was last modified,
   where dd is 2-digit day of month, MMM is ALL-CAPITALIZED first 3 English month
   letters, and yy is last two digits of the year in the Gregorian calendar. Note
   that you should fill in both Version and Entered-date.

   Description: Short description of the package.

   Keywords: A short list of carefully selected keywords that describe

   the package.

   Author: Original author(s) of package. In RFC822 format (i.e., something that
   will fit into a From: or To: header of anormal Internet mail message). Preferred
   format:

   mailname@site.domain.top (Full name)

   Other formats will be converted to this format, if time and energy of LSM
   maintainer will allow it. Multiple persons may be given, one per line.

   Maintained-by: Maintainer(s) of Linux port. Same format as Author.

   Primary-site: A specification of on which site, in which directory, and which
   files are part of the package. First line gives site and base directory, the rest
   give the sizes and names of all files. Names are either relative to the base
   directory, or full pathnames. If the ftp site does not use Unix style pathname
   syntax, then the full pathname must be given every time. The pathname must not
   contain spaces. Example:

   Primary-site: sunsite.unc.edu /pub/Linux/docs

   10kB lsm-1994.01.01.tar.gz

   997 lsm-template

   22 M /pub/Linux/util/lsm-util.tar.gz

   The file size may be given in bytes (no suffix), kilobytes (k, kb), or megabytes
   (M, MB). The suffix may be separated with spaces, and may be in upper case or
   lower case. The size can be left off.

   For very large packages that are contained within one directory (say, a
   distribution), only the directory need be listed. Adding a trailing slash makes
   it clear that it is a directory.

   The filename should be the final location, not an "incoming" directory. If you
   don't know the final location, at least make a good guess (since files _will_ be
   moved from incoming, it is not a good guess).

   Alternate-site: One alternate site may be given. It should not be a site that
   mirrors the primary site (these are best found from a list of mirror sites), but
   should be one that maintained separately. More sites carrying the package can be
   found using Archie. The syntax is the same as for Primary-site, but if there is
   only one line (i.e., no files are specified), they are assumed to be the same as
   for Primary-site.

   Alternate-site: ftp.funet.fi /pub/OS/Linux/doc/lsm

   Alternate-site: foo.bar /pub/lsm

   11 kB lsm-1994-01-01.cpio.Z

   0.1 kB lsm-template.Z

   22 MB lsm-util.tar.gz

   Original-site: The original package, if this is a port to Linux. Syntax is as in
   Primary-site, with the same handling of missing filenames as in Alternate-site.

   Platforms: Software or hardware that is required, if unusual. A C compiler or
   floppy disk would not be unusual, but a Python interpreter or tape drive probably
   would be. If the requirements are evident from the description, it need not be
   repeated here.

   Copying-policy: Copying policy. Use "GPL" for GNU Public License, "BSD" for the
   Berkeley style of copyright, "Shareware" for shareware, and some other
   description for other styles of copyrights. If the use or copying requires
   payment, it must be indicated.

   End


   Endnotes:

   1. See [26]http://metalab.unc.edu/osrt/ for related work by the Open Source
   Research Team. [27](Back)

   2. It is interesting to note that the GNU Emacs FAQ
   ([28]http://www.gnu.org/software/emacs/emacs-faq.text) dated February 1999 points
   out: "The real legal meaning of the GNU General Public License (copyleft) will
   only be known if and when a judge rules on its validity and scope. There has
   never been a copyright infringement case involving the GPL to set any
   precedents". [29](Back)

   3. Personal correspondence, Jeff Kopmanis, August 1999. [30](Back)

   4. [31]http://MetaLab.unc.edu/pub/linux/docs/LSM/LSM.README [32](Back)

   5. The numbers of active LSMs in the MetaLab archive actually outnumber those in
   the official LSM listing maintained by Aaron Schrab by about 500 entries in
   September 1999. [33](Back)

References

   1. http://www.ibiblio.org/osrt/develpro.html#endnotes
   2. mailto:osrt@metalab.unc.edu
   3. http://www.ibiblio.org/osrt/develpro.html#endnotes
   4. http://www.redhat.com/
   5. http://www.ibiblio.org/osrt/develpro.html#endnotes
   6. http://www.ibiblio.org/osrt/develpro.html#endnotes
   7. http://metalab.unc.edu/pub/Linux/docs/linux-software-map/lsm-template
   8. ftp://metalab.unc.edu/incoming/Linux/
   9. http://www.ibiblio.org/osrt/develpro.html#endnotes
  10. http://www.linux.org/apps/lsm.html
  11. http://www.boutell.com/lsm/
  12. http://metalab.unc.edu/linsearch/
  13. http://ils.unc.edu/ils/research/reports/TR-1999-01.pdf
  14. http://opensource.org/osd.html
  15. http://opensource.org/osd-rationale.html#clause1
  16. http://opensource.org/osd-rationale.html#clause2
  17. http://opensource.org/osd-rationale.html#clause3
  18. http://opensource.org/osd-rationale.html#clause4
  19. http://opensource.org/osd-rationale.html#clause5
  20. http://opensource.org/osd-rationale.html#clause6
  21. http://opensource.org/osd-rationale.html#clause7
  22. http://opensource.org/osd-rationale.html#clause8
  23. http://opensource.org/osd-rationale.html#clause9
  24. http://opensource.org/certification-mark.html
  25. http://metalab.unc.edu/pub/Linux/docs/linux-software-map/lsm-template
  26. http://metalab.unc.edu/osrt/
  27. http://www.ibiblio.org/osrt/develpro.html#note1
  28. http://www.gnu.org/software/emacs/emacs-faq.text
  29. http://www.ibiblio.org/osrt/develpro.html#note2
  30. http://www.ibiblio.org/osrt/develpro.html#note3
  31. http://MetaLab.unc.edu/pub/linux/docs/LSM/LSM.README
  32. http://www.ibiblio.org/osrt/develpro.html#note4
  33. http://www.ibiblio.org/osrt/develpro.html#note5
Usage: http://www.kk-software.de/kklynxview/get/URL
e.g. http://www.kk-software.de/kklynxview/get/http://www.kk-software.de
Errormessages are in German, sorry ;-)