The license is an important element of open source projects, as it defines acceptable use cases, user rights, and contribution guidelines. There are different ways to identify the license from the project source code, such as SPDX headers, the LICENSE file, and the COPYING file. However, an open source project may contain files from other projects and may use different licenses for different files.
Apertis has certain licensing expectations. In order to improve the accuracy of the licensing information, Apertis performs license scanning as part of its continuous integration process.
Ensuring continuous maintenance of open source license documentation
Maintaining the open source license documentation is an incremental process:
- When Apertis is rebased on a new version of Debian or new packages are added, the licensing is checked for all packages involved. From a project perspective, the Apertis team tries to do a full scan of all projects at each release cycle.
- During development, updates are monitored. Two operations can result in a license update: the integration of a new project into Apertis and the update of the source code of an existing one. New projects can be integrated into Apertis at any time. When new sources are received for a project already in Apertis, the licensing of the project can change, or the licensing of some distributables within the project can differ from the prevalent license.
Open source software shipped with devices that users buy adds significant licensing constraints to the software stack of preview and product releases. These constraints do not affect development releases, and it is possible to save some work on those releases.
Regular checks of the whole archive have been integrated into Apertis CI pipelines to provide early detection of any change to the licensing status of each package. A copyright report is generated and kept updated using scan-copyrights, helping Apertis maintainers to detect problematic licenses or missing information which may require a manual check.
Source code scanning with scan-copyrights
In order to validate the licensing of a package, the entire package source tree
needs to be scanned to detect copyright holders and known licenses for
each file. To achieve that, the scan-copyrights tool has been integrated into
the Apertis CI pipeline, rescanning and updating a copyright report for each
package on every commit.
Written in Perl, the scan-copyrights tool from libconfig-model-dpkg-perl uses
licensecheck to parse the source files and detect known licenses and copyright
statements, outputting the result in plain text or in the Debian copyright file
format.
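As a rough illustration of the kind of per-file detection described above, the following Python sketch looks for SPDX headers and copyright statements in a block of text. It does not reflect how licensecheck or scan-copyrights are implemented: the regular expressions, the `scan_text` function, and the sample header are all hypothetical, and real scanners recognise many more license and copyright forms.

```python
import re

# Hypothetical, deliberately narrow patterns (assumptions for illustration only).
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*(?P<expr>[A-Za-z0-9.+\-() ]+)")
COPYRIGHT_RE = re.compile(
    r"Copyright\s+(?:\(c\)|©)?\s*(?P<holder>\d{4}.*)", re.IGNORECASE
)


def scan_text(text):
    """Collect SPDX license expressions and copyright statements, line by line."""
    licenses = []
    holders = []
    for line in text.splitlines():
        spdx = SPDX_RE.search(line)
        if spdx:
            licenses.append(spdx.group("expr").strip())
        notice = COPYRIGHT_RE.search(line)
        if notice:
            holders.append(notice.group("holder").strip())
    return {"licenses": licenses, "copyright": holders}


# Made-up file header used only to exercise the sketch.
sample = (
    "/* SPDX-License-Identifier: MPL-2.0 */\n"
    " * Copyright (c) 2020 Example Corp\n"
)
result = scan_text(sample)
print(result)
```

A file for which such a scan finds no license at all is the kind of entry that would need manual review.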
Apertis packages keep an exhaustive copyright report in
debian/apertis/copyright, containing information for every file in the
source tree. During this process, missing information and unacceptable
licenses are reported, which may require manual review from developers to
complete the package copyright report. Two files are used for this purpose:
- debian/apertis/copyright.yml: Contains a YAML mapping, where each key is a Perl pattern used to match a path, to manually provide the correct copyright information. See Filling_the_blanks.
- debian/apertis/copyright.whitelist: Using git ignore format, lists files that will be ignored if reported with a missing or unacceptable license. Note that the CI pipeline updates the copyright report with information for every file and fails, reporting the problematic entries that were not whitelisted.
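The whitelist check described above can be pictured with a minimal Python sketch. This is not the Apertis CI implementation: the `unresolved_entries` function, the report layout, and the license names are hypothetical, and `fnmatch` only approximates git ignore pattern syntax (it does not handle negation or `**`, for example).

```python
from fnmatch import fnmatch


def unresolved_entries(report, unacceptable, whitelist_patterns):
    """Return paths with a missing or unacceptable license that no pattern whitelists."""
    failing = []
    for path, license_name in sorted(report.items()):
        problematic = license_name is None or license_name in unacceptable
        whitelisted = any(fnmatch(path, pattern) for pattern in whitelist_patterns)
        if problematic and not whitelisted:
            failing.append(path)
    return failing


# Hypothetical scan result: None stands for "no license detected".
report = {
    "src/main.c": "MPL-2.0",
    "doc/manual.txt": "Example-Forbidden-1.0",
    "tests/data/blob.bin": None,
    "tools/gen.py": None,
}
unacceptable = {"Example-Forbidden-1.0"}  # placeholder policy, not the Apertis one
whitelist = ["tests/data/*"]              # stands in for copyright.whitelist lines

failing = unresolved_entries(report, unacceptable, whitelist)
print(failing)
```

In this sketch a non-empty result corresponds to the situation in which the pipeline would fail, leaving the developer to either whitelist the path or provide the correct information in debian/apertis/copyright.yml.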
By default, scan-copyrights tries to create short reports by using wildcards to
create generic entries for folders.
Unfortunately, in some scenarios this also requires the use of wildcards when
whitelisting or overriding licenses that do not comply with Apertis policies.
Since the use of wildcards in these scenarios could lead to errors, this approach
should only be used if it is strictly needed.
To avoid wildcards, another file can be used to tell scan-copyrights to not use
them and to report the license of every file, even if all files in a given
folder have the same license.
This is activated when the file debian/apertis/copyright-long is present.
The content of the file is ignored, but it is good practice to write a comment in
it explaining why it is there.
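The difference between the short and the long report can be sketched in Python as follows. This is a simplified model, not the grouping algorithm that scan-copyrights actually uses: `build_entries` and `wants_long_report` are hypothetical helpers, and only the debian/apertis/copyright-long path comes from the text above.

```python
from collections import defaultdict
from pathlib import Path, PurePosixPath


def wants_long_report(package_root):
    """Long reports are requested by the mere presence of the marker file."""
    return (Path(package_root) / "debian/apertis/copyright-long").exists()


def build_entries(file_licenses, long_format=False):
    """Map report entries to licenses, collapsing uniform folders unless long_format."""
    if long_format:
        return dict(sorted(file_licenses.items()))

    by_dir = defaultdict(dict)
    for path, license_name in file_licenses.items():
        by_dir[str(PurePosixPath(path).parent)][path] = license_name

    entries = {}
    for directory, files in by_dir.items():
        if len(files) > 1 and len(set(files.values())) == 1:
            # Every file in this folder shares one license: emit a wildcard entry.
            pattern = "*" if directory == "." else f"{directory}/*"
            entries[pattern] = next(iter(files.values()))
        else:
            entries.update(files)
    return dict(sorted(entries.items()))


# Made-up scan result for illustration.
files = {
    "src/a.c": "MPL-2.0",
    "src/b.c": "MPL-2.0",
    "lib/x.c": "MIT",
    "lib/y.c": "BSD-3-Clause",
}
short_report = build_entries(files)
long_report = build_entries(files, long_format=True)
print(short_report)
print(long_report)
```

With the long form, an override or whitelist entry can name src/a.c directly instead of having to match the src/* wildcard entry.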
Future improvements
FOSSology is a license reporting tool. It is being integrated into Apertis as a
replacement for scan-copyrights as part of an effort to enable end-to-end
tracking of licensing information. Although scan-copyrights has helped a lot
in automating the process, the approach using FOSSology will result in
finer-grained and more reliable license identification, extending to the
identification of the licensing applicable to each binary package.