In the EO4EU Platform, the knowledge graph functionality empowers users with an expanded search capability, enabling them to perform free text queries that tap into semantic search principles. This functionality grants access to a wide range of earth observation data in a way that is intuitive for both experts and non-experts, unlocking previously undiscovered resources.
The knowledge graph orchestrates the semantic processing of available datasets, sourced from diverse providers such as Copernicus Services and third-party platforms, as mentioned earlier. These datasets are often accompanied by textual descriptions, which the knowledge graph converts, using embedding techniques, into structured representations that can be processed effectively. This is what provides the extended search capacity and allows users to perform free text queries.
This semantic search capability lets users enter natural language queries and search for earth observation data without specialised knowledge. This democratises access to earth observation data, making it accessible to experts and non-experts alike. Users can explore and discover resources they might not have known existed, significantly expanding the reach and utility of earth observation data.
Using the EO4EU Knowledge Graph
- The search process is initiated through textual queries.
- Whether it's a user-generated query or an automatic request from another component via a dedicated API, the knowledge graph takes these queries and transforms them into internal representations in the form of vectors. These vectors represent the semantic essence of the query.
- The knowledge graph assesses the similarity between the internal representation of the user's query or component's request and the vectors generated for each dataset in the repository, and then sorts the results by their similarity score.
- This semantic matching process identifies datasets that align with the user's intent, presenting them as a list of results. By default, the knowledge graph considers all sources when trying to match the user query; however, it is also possible to make source-specific queries. Each result in the list includes the name of the dataset and its origin, as well as other metadata, offering users valuable context for their selection.
- After selecting a dataset of interest, users can further refine their selection of the contained data by providing filters such as products, features, or other options. This refinement step ensures that users obtain data that aligns precisely with their needs.
- Finally, users can request the code necessary to access the selected dataset. The knowledge graph responds with the code corresponding to the user's chosen dataset and provides the appropriate API call for the relevant data provider. This feature streamlines the integration of the selected data into users' workflows.
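The matching steps above (turn the query into a vector, compare it with each dataset's vector, sort by similarity, optionally restrict to one source) can be sketched as follows. This is only an illustration under stated assumptions, not the platform's implementation: the `embed` function is a toy bag-of-words stand-in for whatever embedding model the knowledge graph actually uses, cosine similarity is assumed as the similarity measure, and the `CATALOGUE` entries, the source labels, and the `search` function are hypothetical names introduced for this example.

```python
import math
import re
from collections import Counter

# Hypothetical catalogue entries: (dataset name, source label, description).
# The descriptions are shortened from the dataset list in this document;
# the source labels are assumptions made for this example.
CATALOGUE = [
    ("CAMS forecast NO2", "CAMS",
     "forecasts for nitrogen dioxide levels in the atmosphere"),
    ("AgERA5 2m temperature", "AgERA5",
     "records of 2-metre air temperature for agriculture"),
    ("ERA5-Land - Precipitation", "ERA5-Land",
     "historical precipitation data at detailed resolution"),
]


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, source=None):
    """Rank catalogue entries by similarity to the query.

    If `source` is given, only datasets from that source are considered,
    mirroring the source-specific queries described above.
    """
    query_vector = embed(query)
    results = [
        (cosine(query_vector, embed(description)), name, src)
        for name, src, description in CATALOGUE
        if source is None or src == source
    ]
    # Highest similarity first.
    return sorted(results, key=lambda r: r[0], reverse=True)


if __name__ == "__main__":
    for score, name, src in search("nitrogen dioxide forecasts"):
        print(f"{score:.2f}  {name}  ({src})")
```

A real embedding model would also match queries that share no words with a description (for example, "air pollution" against the NO2 entry), which this word-count stand-in cannot do.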
Key benefits
- Linking and harmonising data from multiple sources
- Representing hierarchical and complex relationships
- Visualising data flows and community structures
- Adapting easily to schema changes
- Enabling advanced analytics with graph algorithms and machine learning
Importance in Earth Observation
Earth observation relies on data from satellites, sensors, and instruments to monitor our planet. Knowledge graphs integrate these diverse datasets, enabling cross-domain analysis and interdisciplinary research. They capture critical metadata—such as spatial, temporal, and sensor details—to improve data reliability and interpretation.
Unlike traditional platforms that store large, disconnected data dumps, knowledge graphs create a web of linked information, enhancing exploration and visualisation of Earth systems. This approach supports better decision-making and communication across scientific and policy communities.
Available Data
The available data come from a wide range of providers, including satellite missions, ground-based sensors, environmental monitoring networks, and other relevant repositories of earth observation data. Datasets were selected to align with the project's objectives and to support the eight (8) distinct use cases organised within the project framework. To complement the existing sources, i.e. the Copernicus Services (ADS, CDS, CLMS, Marine, Sentinel, Data Space) and third-party platforms including ADAM, Istat.it, INSPIRE, CMCC, FAO, ECMWF, and NOAA, several further datasets have been included to support the use case demonstration, as described in Table 6.
- Full list of available datasets (94)
ECMWF high resolution forecast dew point: This dataset consists of high-resolution weather forecasts generated by the European Centre for Medium-Range Weather Forecasts (ECMWF) that specifically focus on predicting dew point temperatures.
ECMWF high resolution forecast air temperature: This dataset comprises high-resolution weather forecasts produced by the ECMWF, with a primary focus on predicting air temperatures.
CAMS forecast O3: This dataset provides forecasts for atmospheric ozone (O3) concentrations. It is essential for understanding air quality, stratosphere-troposphere interactions, and the impact of ozone on human health and the environment.
CAMS forecast CO: CAMS Forecast Carbon Monoxide (CO) provides predictions for carbon monoxide levels in the atmosphere. Monitoring CO is crucial for assessing air quality and understanding its sources, such as vehicle emissions and biomass burning.
CAMS forecast SO2: This dataset offers forecasts for sulphur dioxide (SO2) concentrations in the atmosphere. SO2 is a significant air pollutant, and its monitoring is essential for environmental and health assessments.
CAMS forecast NO2: CAMS Forecast Nitrogen Dioxide (NO2) provides forecasts for nitrogen dioxide levels in the atmosphere. NO2 is a key component of urban air pollution and is linked to respiratory problems and environmental impacts.
CAMS forecast PM10: This dataset delivers forecasts for particulate matter with a diameter of 10 micrometres or less (PM10). PM10 includes fine dust particles that can affect air quality and human health.
CAMS forecast PM2.5: CAMS Forecast Fine Particulate Matter (PM2.5) offers predictions for very fine particulate matter with a diameter of 2.5 micrometres or less. PM2.5 is associated with respiratory and cardiovascular health issues and is a critical air quality indicator.
CAMS forecast birch: CAMS provides forecasts related to birch pollen levels, aiding individuals with allergies and healthcare professionals in managing allergic reactions during birch pollen seasons.
CAMS forecast olive: This dataset offers forecasts for olive pollen levels. It helps individuals with allergies prepare for periods of high olive pollen and take preventive measures.
CAMS forecast grass: CAMS forecast grass provides predictions for grass pollen levels. Grass pollen is a common allergen, and these forecasts assist allergy sufferers in planning and managing their symptoms.
CAMS forecast ragweed: This dataset provides forecasts for ragweed pollen levels. Ragweed is a significant source of allergies, and these forecasts are valuable for individuals seeking to minimise exposure.
CAMS forecast alder: CAMS forecast alder offers predictions for alder pollen levels. These forecasts are beneficial for individuals with allergies to alder pollen, helping them prepare for peak pollen seasons.
CAMS forecast mugwort: This dataset provides forecasts for mugwort pollen levels. Mugwort is a common allergen, and these forecasts assist individuals with allergies in taking precautionary measures during peak pollen times.
Sensory operational information (speed, draft, cons., power, fuel cons., etc): This dataset comprises essential sensory operational information related to maritime or transportation activities. It includes real-time data on various critical parameters such as vessel speed, draft (the depth of the ship's hull below the waterline), consumption rates (e.g., fuel consumption), power usage, and other relevant operational metrics.
Weather data (NOAA): This dataset includes comprehensive weather information sourced from the National Oceanic and Atmospheric Administration (NOAA). It encompasses meteorological data such as temperature, precipitation, humidity, wind speed, and more, providing critical insights for weather forecasting, climate research, and environmental monitoring.
AgERA5 2m temperature: The AgERA5 dataset offers precise records of 2-metre air temperature. It is particularly useful for agricultural and environmental applications, enabling the monitoring of temperature variations that impact crop growth, frost risk, and climate-related decision-making.
AgERA5 Precipitation flux: AgERA5 Precipitation Flux provides data on precipitation patterns and amounts. This information is vital for agriculture, hydrology, and water resource management, aiding in drought monitoring, flood prediction, and irrigation planning.
AgERA5 2m relative humidity: This dataset records relative humidity levels at a 2-metre height above the ground. It plays a crucial role in agriculture and climate studies by assessing moisture levels in the air, which influence crop health and evaporation rates.
AgERA5 10m wind speed: AgERA5 10m wind speed data offers insights into wind speeds at a 10-metre elevation. This information is invaluable for wind energy assessments, weather forecasting, and understanding wind patterns affecting agriculture, forestry, and construction industries.
AgERA5 solar radiation flux: The AgERA5 solar radiation flux dataset measures incoming solar radiation. It is indispensable for solar energy generation assessments, weather forecasting, and environmental modelling, aiding in understanding solar energy availability and its impact on ecosystems.
Sentinel-1 GRD: Sentinel-1 GRD is a dataset generated by the Sentinel-1 satellite mission, providing ground range detected synthetic aperture radar (SAR) imagery. It is used for applications like monitoring changes in Earth's surface, including land deformation, sea ice detection, and disaster management.
Sentinel-2 L1C - L2A: Sentinel-2 offers Level-1C (L1C) and Level-2A (L2A) products. L1C includes top-of-atmosphere reflectance data useful for various earth observation applications, while L2A provides atmospherically corrected and cloud-screened imagery, making it ideal for land cover classification, vegetation monitoring, and land use analysis.
PRISMA: This dataset includes a wealth of hyperspectral imagery and associated data collected by the PRISMA satellite (an earth observation satellite mission developed and operated by ASI). The hyperspectral data allows for precise and comprehensive analysis of various Earth phenomena, including agriculture, forestry, land use, environmental monitoring, and more.
ESA World Cover: The ESA World Cover dataset offers global land cover information derived from satellite imagery. It provides high-resolution land cover classification data, aiding in environmental modelling, biodiversity assessments, and urban planning.
Sicily CTR (Regional Technical Chart): Sicily CTR, or Regional Technical Chart, is a geospatial dataset providing detailed geographic information about the region of Sicily. It includes topographic, cartographic, and spatial data, serving various purposes in land management, navigation, and urban planning within Sicily.
ISTAT crop and production: The ISTAT (Italian National Institute of Statistics) Crop and Production dataset offers comprehensive information on agricultural crops and production in Italy. It includes data on crop types, yields, and production volumes, supporting agricultural planning, market analysis, and policy decision-making.
EURO-CORDEX (historical simulations and projections) - Temperature: This dataset comprises historical climate simulations and future climate projections for temperature across the European region. It provides valuable insights into past temperature trends and forecasts temperature changes, aiding climate research, adaptation strategies, and policy development.
EURO-CORDEX (historical simulations and projections) - Precipitation: This dataset offers historical climate simulations and future projections of precipitation patterns in Europe. It is crucial for understanding past precipitation trends and anticipating changes in precipitation, which have implications for water resource management, agriculture, and flood risk assessment.
EURO-CORDEX (historical simulations and projections) - Relative humidity: EURO-CORDEX provides historical climate simulations and projections of relative humidity levels in Europe. This data is essential for studying changes in humidity patterns, assessing their impact on ecosystems, and supporting climate adaptation efforts.
EURO-CORDEX (historical simulations and projections) - wind speed: This dataset includes historical climate simulations and future projections of wind speed across Europe. It is valuable for analysing wind patterns, understanding their influence on renewable energy resources and weather-related risks, and planning for wind energy production.
EURO-CORDEX (historical simulations and projections) - Solar radiation: EURO-CORDEX offers historical climate simulations and projections of solar radiation levels in Europe. This data is critical for assessing solar energy potential, optimising solar power generation, and studying the effects of solar radiation on various environmental and economic factors.
CMCC VHR (historical simulations and projections) - Temperature: This dataset includes historical climate simulations and future climate projections for temperature at a very high resolution. It provides detailed information on past temperature trends and offers forecasts of temperature changes, facilitating climate research and adaptation planning.
CMCC VHR (historical simulations and projections) - Precipitation: This dataset offers historical climate simulations and future projections of precipitation patterns at a very high resolution. It is essential for analysing past precipitation trends and predicting changes in precipitation, which have significant implications for water resource management, agriculture, and flood risk assessment.
CMCC VHR (historical simulations and projections) - Dew point temperature: This dataset provides historical climate simulations and projections of dew point temperature at a very high resolution. Dew point temperature is a key parameter for understanding humidity levels and atmospheric moisture content, impacting various environmental and meteorological processes.
CMCC VHR (historical simulations and projections) - Wind speed: This dataset includes historical climate simulations and future projections of wind speed at a very high resolution. It offers detailed information on wind patterns, enabling the assessment of their influence on renewable energy resources, weather phenomena, and wind energy production.
CMCC VHR (historical simulations and projections) - Solar radiation: This dataset comprises historical climate simulations and projections of solar radiation levels at a very high resolution. It is crucial for assessing solar energy potential, optimising solar power generation, and studying the effects of solar radiation on various environmental and economic factors.
ERA5-Land - Temperature: This dataset provides temperature data derived from the ERA5-Land reanalysis. It includes historical temperature records at a high spatial and temporal resolution, offering insights into past temperature trends.
ERA5-Land - Precipitation: ERA5-Land Precipitation offers historical precipitation data at a detailed spatial and temporal resolution, aiding in the analysis of past precipitation patterns.
ERA5-Land - Radiation: This dataset includes historical radiation data derived from the ERA5-Land reanalysis. It provides information on solar radiation, which is essential for various applications, including renewable energy assessment and climate studies.
ERA5-Land - VPD: ERA5-Land VPD provides historical data on vapour pressure deficit, a critical parameter for understanding atmospheric moisture. It is valuable for studies related to agricultural and environmental processes.
CERRA - Temperature: This dataset offers temperature data from the Copernicus European Regional ReAnalysis (CERRA). It provides historical temperature records at a high resolution.
CERRA - Precipitation: CERRA Precipitation offers historical precipitation data derived from the CERRA reanalysis, aiding in the analysis of past precipitation patterns.
CERRA - Radiation: This dataset includes historical radiation data derived from the CERRA project, offering insights into solar radiation levels over time.
CERRA - VPD: CERRA VPD provides historical data on vapour pressure deficit, contributing to atmospheric moisture analyses and environmental studies.
EURO-CORDEX (historical simulations and projections) - VPD: This dataset includes historical climate simulations and future projections of vapour pressure deficit (VPD) for the European region. VPD is crucial for understanding atmospheric moisture conditions and their impact on ecosystems.
SOILGRIDS: SOILGRIDS provides information on soil properties and characteristics at a global scale. It is used for soil-related research, land management, and environmental modelling.
ERA5-Land - Available Soil Water: This dataset offers historical records of available soil water content, aiding in the assessment of soil moisture levels.
CERRA - Available Soil Water: CERRA Available Soil Water provides historical data on soil moisture content, supporting studies related to soil and hydrology.
ESDB - Available Soil Water: The ESDB (European Soil Database) Available Soil Water dataset offers information on soil water content, which is critical for agricultural and environmental research.
Site soil fertility rating: This dataset provides an assessment or rating of soil fertility at specific locations or sites. Soil fertility rating typically takes into account various soil properties, such as nutrient content, pH levels, organic matter content, and texture, to determine how suitable the soil is for supporting plant growth and agriculture.
EU-DEM: EU-DEM provides detailed elevation data for the European region, supporting topographic analysis and geospatial applications.
PROFUND - CO2 Content: PROFUND offers data on carbon dioxide (CO2) content, contributing to studies on greenhouse gases and climate change.
Soil class: This dataset provides information about soil classifications within the geographical region of Austria. Soil classes categorise soils based on their properties, such as texture, composition, and structure, and are typically used for land use planning, agriculture, and environmental management.
Seedling mass (species specific): This dataset provides the mass of seedlings of various plant species, measured at specific stages of their development, such as during germination or early growth.
Stand mass (species specific): This dataset provides the biomass of plant stands or vegetation communities, broken down by plant species. It includes measurements of the biomass of different species within a defined area or stand, often in forests or other natural ecosystems.
WF (species specific): This dataset contains the foliage (leaf) biomass of various plant species during the initial stages of their growth. Its species-specific data allow researchers and ecologists to quantify and compare the amount of foliage produced by different plant species at the start of their life cycles.
WR (species specific): This dataset provides the root biomass of various plant species at the initial stages of a plant's growth cycle.
WS (species specific): This dataset contains the stem biomass of various plant species at the initial stages of a plant's growth cycle.
Stocking (species specific): This dataset provides the initial population or density of plant species within a given area or ecosystem, that is, the number or density of individual plants or trees of each species when they are first established in a particular location.
ASW: The ASW (Available Soil Water) dataset provides information about the amount of water that is available for plant use in the soil. It represents the portion of soil moisture that can be accessed by plant roots for growth and development.
Silvicultural "events": Fertilising data: This dataset contains information about specific events or activities related to forest management, specifically focusing on fertilisation. Silviculture is the practice of managing forests, and in this context, the dataset tracks and records details about fertilisation activities in forested areas.
Silvicultural "events": Min ASW data: This dataset contains information related to silvicultural (forest management) activities, specifically focusing on monitoring and recording minimum available soil water (ASW) levels. ASW represents the amount of water available for plants in the soil, and this dataset tracks the lowest ASW values observed during various forest management events or operations.
Silvicultural "events": Irrigation data: This dataset contains information about specific irrigation activities carried out in forested areas as part of silvicultural (forest management) practices. This dataset includes details such as the timing, location, method, and volume of irrigation applied to forested lands.
Silvicultural "events": Thinning data: This dataset contains information about thinning activities conducted in forested areas as part of silvicultural (forest management) practices. Thinning involves the selective removal of certain trees or vegetation to improve the health and growth of the remaining trees. This dataset includes details such as the date of thinning, the location within the forest where thinning occurred, the species of trees that were thinned, the number of trees removed, and the size or age of the trees involved in the thinning operation.
Silvicultural "events": Defoliation data: This dataset contains information about instances of defoliation events in forested areas. Defoliation refers to the removal or damage of leaves or foliage on trees, often caused by factors such as insect infestations, diseases, or environmental stressors. This dataset includes details about the type of defoliating agent (e.g., insects, pathogens), the affected tree species, the location and extent of the defoliation event, and the timing of the event.
Species specific parameters: This dataset contains a collection of data and information that is uniquely tailored to individual species. This dataset includes a wide range of parameters, characteristics, or attributes that are specific to each species, such as biological traits, ecological requirements, genetic information, and habitat preferences.
Point scale gauges (retrieved observations): This dataset consists of data collected from point-scale measurement devices, such as weather gauges or sensors, that retrieve specific observations at localised points within a geographical area. This dataset includes a range of retrieved observations, which may encompass meteorological data like temperature, humidity, precipitation, wind speed, and atmospheric pressure, as well as environmental parameters such as soil moisture, water levels, or air quality measurements.
Point scale gauges (processed observations): This dataset comprises data collected from point-scale measurement devices, such as weather gauges or sensors, which have been processed and analysed to extract meaningful information. This dataset includes processed and quality-controlled measurements of various environmental variables, such as temperature, humidity, precipitation, wind speed, and atmospheric pressure. The data have undergone calibration, error correction, or data filtering to ensure accuracy and reliability.
Point scale gauges (digitised observations): These datasets comprise digitised observations from point scale gauges, providing ground-level meteorological and environmental data.
UERRA (reanalysis): UERRA is a reanalysis dataset offering high-resolution historical climate data.
E-OBS (interpolation): E-OBS provides interpolated climate data for Europe, assisting in climate studies and regional climate assessments.
Climate (sub-)seasonal forecasts: These datasets include climate forecasts at sub-seasonal and seasonal timescales, aiding in seasonal weather predictions.
Copernicus Land vegetation products: These products offer information on land vegetation cover and health, supporting ecological and land use studies.
Copernicus Land land cover products: These products provide land cover information, aiding in land use and land change analyses.
Soil products (SOILGRIDS and/or ESDB): These datasets offer comprehensive information on soil properties and characteristics.
OpenStreetMap - roads and railways: OpenStreetMap data includes information on roads and railways, supporting navigation, transportation, and urban planning.
ERA5 Land LAI (leaf area index): This dataset provides information on leaf area index, a critical parameter for vegetation studies and environmental modelling.
FAO locust location reference swarms: These datasets offer reference information on locust swarms, supporting pest management and agricultural assessments.
FAO locust location reference hoppers: These datasets provide reference information on locust hoppers, supporting pest management and agricultural assessments.
ECMWF high resolution forecast total precipitation: This dataset contains high-resolution forecasts of total precipitation generated by the European Centre for Medium-Range Weather Forecasts (ECMWF). It predicts the amount of precipitation, such as rain or snow, expected to fall over specific geographic regions and time periods.
ECMWF high resolution forecast solar radiation: This dataset contains high-resolution forecasts of solar radiation generated by ECMWF. It predicts the amount of solar energy expected to reach the Earth's surface over specific geographic regions and time periods.
ECMWF high resolution forecast soil water content: This dataset consists of high-resolution forecasts of soil moisture content generated by ECMWF. It predicts the amount of water present in the soil at various depths and locations over specific geographic regions and time periods.
Seasonal weather forecast - temperature: These datasets offer seasonal weather forecasts, including temperature predictions.
Seasonal weather forecast - solar radiation: These datasets include seasonal forecasts of solar radiation levels.
Seasonal weather forecast - soil water content: These datasets offer seasonal forecasts of soil water content, aiding in agricultural planning and drought prediction.
ENDVI10_metop LST: This dataset provides information on land surface temperatures (LST) as measured by the ENDVI10_metop sensor. It offers insights into the temperature of the Earth's surface, which is crucial for climate studies, agriculture, and environmental monitoring.
ENDVI10_metop NIR SWIR: This dataset includes near-infrared (NIR) and shortwave infrared (SWIR) data captured by the ENDVI10_metop sensor. NIR and SWIR data are valuable for a range of applications, including vegetation health assessments and geological studies.
ENDVI10_metop NIR SWIR archive: This dataset archives historical near-infrared and shortwave infrared data from the ENDVI10_metop sensor. It allows for retrospective analysis and long-term studies of Earth's surface characteristics.
ENDVI10_metop NDVI: This dataset provides measurements of the Normalized Difference Vegetation Index (NDVI) from the ENDVI10_metop sensor. NDVI is widely used to assess vegetation health, land cover changes, and environmental conditions.
ENDVI10_metop NDVI archive: This dataset archives historical NDVI measurements from the ENDVI10_metop sensor. It facilitates long-term analyses of vegetation dynamics and environmental changes.
Eumetsat LSA-SAF FRMv2: This dataset offers land surface analysis data from the Land Surface Analysis Satellite Application Facility (LSA-SAF) of the European Organisation for the Exploitation of Meteorological Satellites (EUMETSAT). It includes various land surface parameters for meteorological and environmental applications.
Static regional vegetation fire danger: This dataset focuses on assessing and presenting the potential risk and danger of vegetation-based wildfires within specific regional areas in Sicily. This dataset includes static or long-term evaluations of factors like vegetation types, fuel loads, weather conditions, topography, and historical fire data to estimate the likelihood and severity of wildfires in different regions of Sicily, Italy.
FIRMS Fire HotSpots: FIRMS Fire HotSpots provides information on fire hotspots and fire-related data. It is a valuable resource for monitoring wildfires, assessing fire risk, and supporting fire management efforts.