Grouping#
Annexes - Data analysis - 3rd cycle#
This section contains some of the exploratory data analysis that supported the revision of the data model.
It is not relevant for the understanding of the proposed model, but may be informative for data providers involved in the testing phase of the 4th cycle dataflows.
About the EEA discodata service
The example queries can be executed interactively in the EEA discodata service.
Note that some queries may timeout.
If you need to analyse the entire European dataset, download it in CSV format.
For example, the link https://discodata.eea.europa.eu/download/WISE_WFD/v2r1/GWB_GroundWaterBody
downloads the data of the [WISE_WFD].[v2r1].[GWB_GroundWaterBody] table.
The data for this section can be derived from the following tables:
[WISE_WFD].[v2r1].[SWB_SurfaceWaterBody_QualityElement]
[WISE_WFD].[v2r1].[SWB_SurfaceWaterBody_QualityElement_qeGrouping]
[WISE_WFD].[v2r1].[Monitoring_MonitoringSite_ChemicalEcologicalQuantitativeMonitoring]
Published WISE Spatial datasets
To simplify the analysis, the information was extracted into a graph structure with nodes representing waterbodies, and edges representing the link between monitored waterbodies and the water bodies with grouping.
Show description
SELECT [countryCode]
,[fromNodeId] -- waterbody providing the monitoring data
,[toNodeId] -- waterbody being assessed
,[qeCode] -- quality element being assessed
FROM [edges]
-- Waterbodies as "nodes"
SELECT [countryCode] -- country code
,[nodeId] -- waterbody identifier
,[waterCategory] -- RW, LW, TW, CW
,[lat] -- latitude in WGS84
,[lon] -- longitude in WGS84
,[outDegree] -- number of waterbody X QE using monitoring data from this waterbody
,[inDegree] -- number of waterbody X QE providing monitoring data for the assessment of this waterbody
FROM [nodes]
-- "nodes" with qeStatus plus derived information
SELECT [countryCode] -- country code
,[nodeId] -- waterbody identifier
,[waterCategory] -- waterbody category
,[qeCode] -- quality element code
,[qeStatus] -- quality element status ('1','2','3','4','5','unknown',...)
,[assessmentMethod] -- assessment method
,[outDegree] -- number of waterbodies that are assessed based on data from this waterbody for the quality element
,[inDegree] -- number of waterbodies provide data for the assessment of this waterbody and quality element
,[isMonitored] -- Monitoring schema: 1 if the waterbody is monitored for the quality element, 0 otherwise,
FROM [nodes_qeStatus]
-- data from the Monitoring schema reporting
SELECT [countryCode] -- country code
,[nodeId] -- waterbody identifier
,[qeCode] -- quality element code
,[isMonitored] -- Monitoring schema: 1 if the waterbody is monitored for the quality element, 0 otherwise
FROM [nodes_qeCode_isMonitored]
Grouping and quality elements#
“Reporting on the status assessment of Quality Elements (QEs) is expected not only where monitoring results are available for specific water bodies but also for all water bodies for which this information is available (e.g. through grouping). A status value should, therefore, be given for each of the relevant QEs that have been assessed for the water body and subsequently used to classify the ecological status or potential of the water body.” – From the WFD Reporting Guidance 2022
Grouping with waterbodies that were not assessed by monitoring#
“Schema element: qeGrouping
Guidance on completion of schema element: Conditional. If no monitoring data is available for this quality element in this surface water body and status has been derived through grouping by extrapolating monitoring data from other surface water bodies, indicate the codes of the surface water bodies which have been monitored and used for the classification of this water body. For example, if the status of surface water body A has been determined by extrapolating monitoring data from surface water bodies B and C, then the euSurfaceWaterBodyCode for surface water bodies B and C should be reported in this element. Quality checks: Conditional check: Report if and only if qeMonitoringResults is ‘Grouping’. Within-schema check: Each water body code reported must be identical to a thematicIdIdentifier reported for surface water bodies in spatial data. Cross-schema check: The SurfaceWaterBodyCode belongs to a different RBD. Please check that for that euSurfaceWaterBodyCode and qeCode, the value of qeMonitoringResults is ‘Grouping’. Cross-schema check: If a euSurfaceWaterBodyCode is valid and belongs to the same RBD, then the following condition must be true: for that euSurfaceWaterBodyCode and qeCode, the value of qeMonitoringResults must be ‘Grouping’.” – From the WFD Reporting Guidance 2022
The “Cross-schema check” specifications are incorrect and inconsistent with the “Guidance on completion of schema element” that requires the extrapolation to be made from a waterbody with qeMonitoringResults = ‘Monitoring’ (and not ‘Grouping’). The implementation of the quality control (i.e. the code base) was imperfect:
For grouping across waterbodies in different river basin districts, only an indicative ERROR could be flagged (because the reporting was done at RBD scale). The resulting inconsistencies affect 11861 of 230115 records (5.2%), mostly in Greece.
For grouping with waterbodies in the same river basin district, the inconsistencies affect only 669 records (0.3%). The cause was possibly a failure in the quality control process, but it is not possible to trace it back at this point.
Show results
country |
qeCode |
wb in different RBD |
wb in same RBD |
waterbodies |
|---|---|---|---|---|
BE |
QE1-1 - Phytoplankton |
0 |
1 |
1 |
BE |
QE3-1-6-2 - Phosphorus conditions |
0 |
1 |
1 |
BG |
QE1-2-3 - Macrophytes |
0 |
2 |
2 |
BG |
QE1-2-4 - Phytobenthos |
0 |
1 |
1 |
BG |
QE1-4 - Fish |
0 |
1 |
1 |
EL |
QE1-1 - Phytoplankton |
0 |
2 |
2 |
EL |
QE1-2-1 - Macroalgae |
0 |
2 |
2 |
EL |
QE1-2-2 - Angiosperms |
0 |
9 |
9 |
EL |
QE1-2-3 - Macrophytes |
1577 |
1 |
1578 |
EL |
QE1-2-4 - Phytobenthos |
985 |
26 |
1011 |
EL |
QE1-3 - Benthic invertebrates |
1155 |
59 |
1214 |
EL |
QE1-4 - Fish |
1100 |
53 |
1153 |
EL |
QE2-2 - River continuity conditions |
1801 |
78 |
1879 |
EL |
QE2-3 - Morphological conditions |
977 |
104 |
1081 |
EL |
QE3-1-1 - Transparency conditions |
0 |
76 |
76 |
EL |
QE3-1-3 - Oxygenation conditions |
931 |
6 |
937 |
EL |
QE3-1-6-1 - Nitrogen conditions |
931 |
6 |
937 |
EL |
QE3-1-6-2 - Phosphorus conditions |
821 |
6 |
827 |
EL |
QE3-3 - River Basin Specific Pollutants |
1574 |
22 |
1596 |
ES |
QE1-2-3 - Macrophytes |
0 |
13 |
13 |
ES |
QE1-2-4 - Phytobenthos |
0 |
6 |
6 |
ES |
QE1-3 - Benthic invertebrates |
0 |
5 |
5 |
ES |
QE1-4 - Fish |
0 |
1 |
1 |
ES |
QE2-3 - Morphological conditions |
0 |
3 |
3 |
ES |
QE3-1-3 - Oxygenation conditions |
0 |
4 |
4 |
ES |
QE3-1-4 - Salinity conditions |
0 |
6 |
6 |
ES |
QE3-1-5 - Acidification status |
0 |
2 |
2 |
ES |
QE3-3 - River Basin Specific Pollutants |
0 |
3 |
3 |
HR |
QE1-2-1 - Macroalgae |
0 |
2 |
2 |
IE |
QE1-2 - Other aquatic flora |
0 |
36 |
36 |
IT |
QE1-2-3 - Macrophytes |
1 |
5 |
6 |
IT |
QE1-2-4 - Phytobenthos |
0 |
3 |
3 |
IT |
QE1-3 - Benthic invertebrates |
1 |
7 |
8 |
IT |
QE1-4 - Fish |
0 |
4 |
4 |
IT |
QE2-1 - Hydrological or tidal regime |
1 |
1 |
2 |
IT |
QE2-2 - River continuity conditions |
0 |
1 |
1 |
IT |
QE2-3 - Morphological conditions |
0 |
1 |
1 |
IT |
QE3-1-2 - Thermal conditions |
0 |
1 |
1 |
IT |
QE3-1-3 - Oxygenation conditions |
1 |
3 |
4 |
IT |
QE3-1-5 - Acidification status |
0 |
1 |
1 |
IT |
QE3-1-6-1 - Nitrogen conditions |
1 |
3 |
4 |
IT |
QE3-1-6-2 - Phosphorus conditions |
1 |
3 |
4 |
IT |
QE3-3 - River Basin Specific Pollutants |
3 |
7 |
10 |
LT |
QE2-2 - River continuity conditions |
0 |
1 |
1 |
LT |
QE3-1-3 - Oxygenation conditions |
0 |
1 |
1 |
LV |
QE3-1-6-1 - Nitrogen conditions |
0 |
1 |
1 |
LV |
QE3-1-6-2 - Phosphorus conditions |
0 |
1 |
1 |
SE |
QE1-1 - Phytoplankton |
0 |
7 |
7 |
SE |
QE1-4 - Fish |
0 |
3 |
3 |
SE |
QE3-1-1 - Transparency conditions |
0 |
4 |
4 |
SE |
QE3-1-5 - Acidification status |
0 |
58 |
58 |
SE |
QE3-1-6-2 - Phosphorus conditions |
0 |
14 |
14 |
SK |
QE3-3 - River Basin Specific Pollutants |
0 |
2 |
2 |
Show code
1SELECT [countryCode]
2 ,[qeCode]
3 ,SUM(IIF([to_euRBDCode] != [from_euRBDCode],1,0)) AS [numberInDifferentBasin]
4 ,SUM(IIF([to_euRBDCode] = [from_euRBDCode],1,0)) AS [numberInSameBasin]
5 ,COUNT(*) AS [totalNumber]
6 FROM
7(SELECT -- toNode
8
9 a.[countryCode]
10 ,a.[qeCode]
11
12 ,a.[euRBDCode] as [to_euRBDCode]
13 ,a.[euSurfaceWaterBodyCode] as [to_euSurfaceWaterBodyCode]
14 ,a.[surfaceWaterBodyCategory] as [to_surfaceWaterBodyCategory]
15 ,a.[qeMonitoringResults] as [to_qeMonitoringResults]
16 ,a.[qeGrouping] as [to_qeGrouping]
17
18 -- fromNode
19 ,b.[euRBDCode] as [from_euRBDCode]
20 ,b.[surfaceWaterBodyCategory] as [from_surfaceWaterBodyCategory]
21 ,b.[qeCode] as [from_qeCode]
22 ,b.[qeMonitoringResults] as [from_qeMonitoringResults]
23 ,b.[qeGrouping] as [from_qeGrouping]
24
25 FROM [WISE_WFD].[v2r1].[SWB_SurfaceWaterBody_QualityElement_qeGrouping] a
26
27 LEFT JOIN [WISE_SOW].[v2r1].[SWB_SurfaceWaterBody_QualityElement_qeGrouping] b
28 ON a.[countryCode] = b.[countryCode]
29 AND a.[qeGrouping] = b.[euSurfaceWaterBodyCode]
30 AND a.[qeCode] = b.[qeCode]
31 AND a.[cYear] = b.[cYear]
32
33 WHERE a.[cYear] = 2022 AND a.[hasDescriptiveData] = 1
34 AND a.[qeMonitoringResults] = 'Grouping'
35 --AND a.[qeCode] != 'QE3-3 - River Basin Specific Pollutants'
36 ) a
37
38 WHERE [from_qeMonitoringResults] != 'Monitoring'
39 GROUP BY [countryCode], [qeCode]
40 ORDER BY [countryCode], [qeCode]
Inconsistencies with the data in the Monitoring schema#
The information provided in the Monitoring schema might have provided an alternative data source to understand which waterbodies where actually monitored.
However, in that specific case there was no quality control specified between the grouping data and the Monitoring schema data.
Therefore, and as the results of the Table 70 show, the inconsistencies between the grouping data and the monitoring schema data are high. According to the grouping data, 3550 monitored waterbodies had their data extrapolated for the assessment of waterbodies not monitored. However, for 2225 of those waterbodies (62%) there is no evidence (in the Monitoring schema) of the waterbody having been monitored for (at least one of) the expected Quality Element.
qeCode |
countries |
waterBodies |
with inconsistency |
percentage |
|---|---|---|---|---|
QE1-1 - Phytoplankton |
13 |
247 |
29 |
11 |
QE1-2 - Other aquatic flora |
4 |
45 |
22 |
48 |
QE1-2-1 - Macroalgae |
3 |
29 |
3 |
10 |
QE1-2-2 - Angiosperms |
3 |
44 |
5 |
11 |
QE1-2-3 - Macrophytes |
11 |
684 |
125 |
18 |
QE1-2-4 - Phytobenthos |
11 |
1643 |
351 |
21 |
QE1-3 - Benthic invertebrates |
15 |
1875 |
306 |
16 |
QE1-4 - Fish |
14 |
1022 |
108 |
10 |
QE2-1 - Hydrological or tidal regime |
6 |
250 |
12 |
4 |
QE2-2 - River continuity conditions |
7 |
325 |
99 |
30 |
QE2-3 - Morphological conditions |
8 |
441 |
126 |
28 |
QE3-1-1 - Transparency conditions |
8 |
66 |
55 |
83 |
QE3-1-2 - Thermal conditions |
7 |
386 |
171 |
44 |
QE3-1-3 - Oxygenation conditions |
16 |
1616 |
1034 |
63 |
QE3-1-4 - Salinity conditions |
9 |
374 |
161 |
43 |
QE3-1-5 - Acidification status |
11 |
644 |
369 |
57 |
QE3-1-6-1 - Nitrogen conditions |
16 |
1717 |
1058 |
61 |
QE3-1-6-2 - Phosphorus conditions |
17 |
1800 |
1189 |
66 |
QE3-3 - River Basin Specific Pollutants |
13 |
1820 |
544 |
29 |
«all» |
19 |
3550 |
2225 |
62 |
Show code
1SELECT [qeCode]
2 ,COUNT(DISTINCT [countryCode]) AS [numberOfCountries]
3 ,COUNT(DISTINCT [nodeId]) AS [numberOfWaterBodies]
4 ,COUNT(DISTINCT IIF([isMonitored] = 0,[nodeId],NULL)) AS [number_withInconsistency]
5 ,COUNT(DISTINCT IIF([isMonitored] = 0,[nodeId],NULL))*100/ COUNT(DISTINCT [nodeId]) AS [percentage_withInconsistency]
6 FROM [nodes_qeStatus]
7 WHERE NOT ([outDegree] > 0 AND [inDegree] > 0)
8 AND [outDegree] > 0
9 GROUP BY [qeCode] WITH ROLLUP
10 ORDER BY [qeCode]
Basic errors#
countryCode |
from |
to |
qeCode |
|---|---|---|---|
IE |
IEWE_160_0700 |
IEWE_160_0700 |
QE1-2 - Other aquatic flora |
IT |
IT19CW04794 |
IT19CW04794 |
QE3-1-3 - Oxygenation conditions |
IT |
IT19CW04794 |
IT19CW04794 |
QE3-1-6-1 - Nitrogen conditions |
IT |
IT19CW04794 |
IT19CW04794 |
QE3-1-6-2 - Phosphorus conditions |
IT |
ITF015RWN011012156SABATO18SS3 |
ITF015RWN011012156SABATO18SS3 |
QE1-2-4 - Phytobenthos |
IT |
ITF015RWN011012156SABATO18SS3 |
ITF015RWN011012156SABATO18SS3 |
QE1-3 - Benthic invertebrates |
Show code
1SELECT [countryCode]
2 ,[fromNodeId]
3 ,[toNodeId]
4 ,[qeCode]
5 FROM [edges]
6 WHERE [fromNodeId] = [toNodeId];
country |
nodeId |
waterCategory |
qeCode |
qeStatus |
outDegree |
inDegree |
isMonitored |
|---|---|---|---|---|---|---|---|
IE |
IESW_060_0100 |
TW |
QE1-2 - Other aquatic flora |
3 |
1 |
1 |
0 |
IE |
IEWE_160_0700 |
CW |
QE1-2 - Other aquatic flora |
3 |
3 |
1 |
0 |
IT |
IT09CI_R000OM173CA |
RW |
QE3-1-6-2 - Phosphorus conditions |
2 |
1 |
1 |
0 |
IT |
IT09CI_R000OM173CA |
RW |
QE3-1-3 - Oxygenation conditions |
2 |
1 |
1 |
0 |
IT |
IT09CI_R000OM173CA |
RW |
QE3-1-6-1 - Nitrogen conditions |
2 |
1 |
1 |
0 |
IT |
IT10N0100121AF |
RW |
QE3-1-5 - Acidification status |
1 |
1 |
1 |
0 |
IT |
IT10N0100121AF |
RW |
QE3-1-2 - Thermal conditions |
1 |
1 |
1 |
0 |
IT |
IT10N0100121AF |
RW |
QE3-1-3 - Oxygenation conditions |
1 |
1 |
1 |
0 |
IT |
IT10N0100121AF |
RW |
QE1-3 - Benthic invertebrates |
2 |
1 |
1 |
0 |
IT |
IT10N0100121AF |
RW |
QE3-1-6-2 - Phosphorus conditions |
1 |
1 |
1 |
0 |
IT |
IT10N0100121AF |
RW |
QE3-1-6-1 - Nitrogen conditions |
1 |
1 |
1 |
0 |
IT |
IT10N0100121AF |
RW |
QE3-3 - River Basin Specific Pollutants |
1 |
1 |
1 |
0 |
IT |
IT19CW04794 |
CW |
QE3-1-6-2 - Phosphorus conditions |
2 |
1 |
1 |
0 |
IT |
IT19CW04794 |
CW |
QE3-3 - River Basin Specific Pollutants |
2 |
1 |
1 |
0 |
IT |
IT19CW04794 |
CW |
QE3-1-3 - Oxygenation conditions |
2 |
1 |
1 |
0 |
IT |
IT19CW04794 |
CW |
QE3-1-6-1 - Nitrogen conditions |
2 |
1 |
1 |
0 |
IT |
ITF015RWN011012156SABATO18SS3 |
RW |
QE1-3 - Benthic invertebrates |
5 |
1 |
1 |
0 |
IT |
ITF015RWN011012156SABATO18SS3 |
RW |
QE1-2-4 - Phytobenthos |
1 |
1 |
1 |
0 |
HR |
HRJMO013 |
CW |
QE1-2-1 - Macroalgae |
1 |
2 |
1 |
1 |
IT |
IT09CI_R000OM173CA |
RW |
QE3-3 - River Basin Specific Pollutants |
3 |
1 |
1 |
1 |
Show code
1 SELECT [countryCode]
2 ,[nodeId]
3 ,[waterCategory]
4 ,[qeCode]
5 ,[qeStatus]
6 ,[outDegree]
7 ,[inDegree]
8 ,[isMonitored]
9 FROM [WISE_Sandbox].[tmp_grouping].[nodes_qeStatus]
10 WHERE [outDegree] > 0 AND [inDegree] > 0
11 ORDER BY [isMonitored],[nodeId];
Subset of countries for further analysis#
The proportion of problematic records is relatively low: 1081 out of 88156 records (1.2%) and 343 out of 16958 waterbodies (2%).
Nevertheless, a decision was taken to further analyse only a subset of 8 countries that have data regarding grouping where none of above issues was detected.
From the subset of countries:
Austria, Romania and Portugal have the highest absolute number of water bodies where grouping was applied,
Austria uses several monitored waterbody for the assessment of a given waterbody. Other countries reported similar approaches (EL,ES,IT,SE,SK), but were not included in the subset, due to data quality issues.
The remaining countries use only one monitored waterbody for each waterbody assessed via grouping. Monitored waterbodies may supply data for the assessment of several non-monitored water bodies.
Show results
country |
qeCount |
wbMonitoring |
wbGrouping |
wbEither |
maxOutDegree |
maxInDegree |
|---|---|---|---|---|---|---|
AT |
6 |
1278 |
8002 |
8076 |
1718 |
25 |
CZ |
11 |
24 |
29 |
53 |
3 |
1 |
DK |
9 |
10 |
13 |
23 |
3 |
1 |
HU |
6 |
72 |
78 |
149 |
2 |
1 |
NL |
11 |
31 |
61 |
88 |
14 |
1 |
PT |
16 |
148 |
228 |
376 |
7 |
1 |
RO |
15 |
318 |
1497 |
1760 |
145 |
1 |
SI |
6 |
2 |
3 |
5 |
2 |
1 |
Show code
1SELECT [countryCode]
2 ,COUNT(DISTINCT qeCode) AS [qeCount]
3 ,COUNT(DISTINCT IIF([outDegree] > 0, nodeId,NULL)) AS [wbMonitoring]
4 ,COUNT(DISTINCT IIF([inDegree] > 0, nodeId,NULL)) AS [wbGrouping]
5 ,COUNT(DISTINCT nodeId) AS [wbEither]
6 ,MAX(outDegree) AS [maxOutDegree]
7 ,MAX(inDegree) AS [maxInDegree]
8 FROM [nodes_qeStatus]
9 WHERE [countryCode] NOT IN
10 (
11 SELECT [countryCode]
12 -- ,COUNT(*) as [count]
13 FROM [WISE_Sandbox].[tmp_grouping].[nodes_qeStatus]
14 WHERE ([outDegree] > 0 AND [inDegree] > 0) -- error
15 OR ([outDegree] > 0 AND [assessmentMethod] != 'Monitoring') -- also an error
16 GROUP BY [countryCode] )
17 GROUP BY [countryCode]