The Linking Open Data cloud diagram

This web page is the home of the LOD cloud diagram. The image shows datasets that have been published in Linked Data format by contributors to the Linking Open Data community project and by other individuals and organisations. It is based on metadata collected and curated by contributors to the Data Hub. Clicking the image takes you to an interactive SVG version, where each dataset is a hyperlink to its entry in the Data Hub.

The diagram is maintained by Andrejs Abele and John McCrae (Insight Centre for Data Analytics at NUI Galway). For questions and comments, please email the maintainers. The original version was developed by Richard Cyganiak and Anja Jentzsch.

Last updated: 2017-08-22

Can I use this diagram in my slides, paper, book?

Yes. This work is available under a CC-BY-SA license. This means you can include it in any other work under the condition that you give proper attribution. If you create derivative works (such as modified or extended versions of the diagram), then you must also license them as CC-BY-SA.

Please give attribution along the following lines:

"Linking Open Data cloud diagram 2017, by Andrejs Abele, John P. McCrae, Paul Buitelaar, Anja Jentzsch and Richard Cyganiak."

The diagram is available in PNG and SVG versions.

How can I get my dataset into the diagram?

First, make sure that you publish data according to the Linked Data principles. We interpret this as:

  • There must be resolvable http:// (or https://) URIs.
  • They must resolve, with or without content negotiation, to RDF data in one of the popular RDF formats (RDFa, RDF/XML, Turtle, N-Triples).
  • The dataset must contain at least 1000 triples. (Hence, your FOAF file most likely does not qualify.)
  • The dataset must be connected via RDF links to a dataset that is already in the diagram. This means that either your dataset must use URIs from the other dataset, or vice versa. We arbitrarily require at least 50 links.
  • Access to the entire dataset must be possible via RDF crawling, via an RDF dump, or via a SPARQL endpoint.
  • Please add the dataset to the Data Hub, the open registry of data and content packages. See the Guidelines for Collecting Metadata on Linked Datasets in the Data Hub for more details. (Before creating a new Data Hub record, please double-check whether a record already exists for your dataset.)
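As a rough self-check before submitting, the two numeric criteria above (at least 1000 triples, at least 50 links) can be tested locally. The following is only a minimal sketch, not the maintainers' validation procedure. It assumes the dataset is already loaded as (subject, predicate, object) string tuples, that a link is a triple whose subject is in your namespace and whose object is in the other dataset's namespace, and that the `Accept` header shown is a reasonable way to ask for RDF. All names and example URIs are hypothetical.

```python
from urllib.request import Request

MIN_TRIPLES = 1000  # size threshold stated in the criteria
MIN_LINKS = 50      # link threshold stated in the criteria

# Assumption: media types for Turtle, RDF/XML and N-Triples.
RDF_ACCEPT = "text/turtle, application/rdf+xml, application/n-triples"


def build_rdf_request(uri):
    """Prepare (but do not send) an HTTP request that asks for RDF."""
    return Request(uri, headers={"Accept": RDF_ACCEPT})


def count_links(triples, own_ns, other_ns):
    """Count triples with a subject in own_ns and an object in other_ns."""
    return sum(
        1
        for s, _p, o in triples
        if s.startswith(own_ns) and o.startswith(other_ns)
    )


def meets_thresholds(triples, own_ns, other_ns):
    """Check the two numeric criteria: dataset size and number of links."""
    return (
        len(triples) >= MIN_TRIPLES
        and count_links(triples, own_ns, other_ns) >= MIN_LINKS
    )


if __name__ == "__main__":
    # Hypothetical namespaces and synthetic triples, for illustration only.
    own = "http://example.org/mydata/"
    other = "http://example.com/other/"
    same_as = "http://www.w3.org/2002/07/owl#sameAs"
    triples = [(f"{own}{i}", same_as, f"{other}{i}") for i in range(60)]
    triples += [
        (f"{own}{i}", "http://example.org/p", f"{own}x{i}") for i in range(1000)
    ]
    print(meets_thresholds(triples, own, other))
```

The request built by `build_rdf_request` can be passed to `urllib.request.urlopen` to check that an entity URI actually resolves to RDF. The sketch does not cover the case where the other dataset links to yours, which the criteria also accept.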

If you have any problems, please contact John McCrae.

Why is my dataset not included?

See the question above: please make sure that the dataset meets the criteria, is in the Data Hub, and that we know about it. Other possible reasons why we exclude some datasets are:

  • The dataset is published through a SPARQL endpoint, without resolvable entity URIs.
  • The dataset is published as an RDF dump, without resolvable entity URIs.
  • The dataset is a cache, copy or aggregation of existing RDF datasets without original data.
  • The dataset is a service that produces RDF in response to the client submitting some input data (other than an entity URI).
  • The dataset is not interlinked with other datasets. (This applies to several large-scale FOAF/SIOC/GoodRelations enabled websites.)

Datasets of these kinds are important and valuable. They are, however, outside of the scope that we (somewhat arbitrarily) choose to display in this particular diagram.

Are all these datasets really open?

Probably not. Unfortunately, most publishers do not publish their data with an explicit license. This leaves re-users in the dark about the specific rights that are granted or reserved by the publisher.

Given this state of affairs, we take a liberal view of what we consider “open”. If the data is openly accessible from a network point of view – that is, it's not behind an authorization check or paywall – then we will probably add it to the Cloud.

Before using any data, you should always check the publisher's website for the terms and conditions. If you don't find anything, then the safest course of action is to assume that the publisher reserves all rights…

(Note that the Data Hub takes a stricter view on openness and considers a dataset “open” only if it has an explicit license that meets the Open Definition.)

Why don't you also show XYZ in the diagram?

This diagram shows a particular perspective on the Web of Data. There are many other possible, perfectly valid, and valuable perspectives as well, that focus on other data formats, on other publishing methods, and on highlighting other aspects besides size, topic and interlinks. We chose to show this particular view, and encourage everyone to explore and visualise other views as well. See the Related Resources section for similar visualisations.

When will you update the diagram?

We occasionally update the diagram. Ask us if you need a more precise answer.

Can I get the older versions?


What exactly does the diagram mean?

The image shows datasets that are published in Linked Data format and are interlinked with other datasets in the cloud.

The size of each circle corresponds to the number of edges connected to that dataset. The numbers are calculated based on connected datasets in the diagram.
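The edge count behind each circle can be illustrated with a short sketch. It assumes that edges are given as pairs of dataset names and that an edge is counted once per pair of connected datasets regardless of direction. Both are assumptions made for illustration, since the exact counting rule is not spelled out here, and the dataset names in the usage note are hypothetical.

```python
from collections import defaultdict


def edge_counts(edges):
    """Return {dataset: number of distinct datasets it shares an edge with}."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # a dataset linking to itself does not connect two datasets
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {name: len(others) for name, others in neighbours.items()}
```

For example, `edge_counts([("A", "B"), ("B", "A"), ("A", "C")])` gives A an edge count of 2 and B and C an edge count of 1 each, because the two A/B entries describe the same pair of datasets.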


The lines indicate the existence of at least one link between two datasets. A link, for our purposes, is an RDF triple where subject and object URIs are in the namespaces of different datasets.

In the interactive version, the color of a line indicates the direction of the link. For example, a green line between A and B means that dataset A contains RDF triples that use identifiers from B; a red line means that dataset B contains RDF triples that use identifiers from A.
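The link definition and its direction can be made concrete with a small sketch. It is not the code used to build the diagram. It assumes each dataset is identified by a single namespace prefix, that triples are (subject, predicate, object) string tuples, and that a triple belongs to the dataset whose namespace contains its subject. The dataset names and namespaces used in the usage note are hypothetical.

```python
def dataset_of(uri, namespaces):
    """Return the dataset whose namespace is the longest prefix of uri, or None."""
    matches = [(ns, name) for name, ns in namespaces.items() if uri.startswith(ns)]
    if not matches:
        return None
    return max(matches, key=lambda m: len(m[0]))[1]


def directed_links(triples, namespaces):
    """Count cross-dataset triples per (source dataset, target dataset) pair."""
    counts = {}
    for s, _p, o in triples:
        src = dataset_of(s, namespaces)
        dst = dataset_of(o, namespaces)
        # Only triples that span two different known datasets count as links.
        if src is not None and dst is not None and src != dst:
            counts[(src, dst)] = counts.get((src, dst), 0) + 1
    return counts
```

With namespaces `{"A": "http://a.example/", "B": "http://b.example/"}`, a triple whose subject is `http://a.example/1` and whose object is `http://b.example/2` is counted under `("A", "B")`, meaning A uses identifiers from B. Triples with literal objects or with both URIs in the same namespace are ignored.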

State of the LOD Cloud

Some versions of the cloud have separate pages with statistics about these datasets. There is no plan to continue this for any future versions of the diagram.

Related resources

Other projects provide deeper information on the datasets in the LOD Cloud diagram:

  • LODStats: Computes comprehensive statistics about publicly available RDF datasets, including many that are in the LOD Cloud diagram.
  • Linked Open Vocabularies: Analyses the RDF vocabularies used in these datasets.
  • Mondeca Labs: SPARQL Endpoint Status: Keeps track of the uptime and health of SPARQL endpoints that provide access to these datasets.

Here are some similar or related efforts that visualise the Web of Data on a high level.

