Useful Links

The following links lead to external training courses, tutorials, projects, documents, and papers related to topics covered by the Work Packages of the AIML4OS project.

Earth Observation

  • Introduction to Machine Learning for Earth Observation
  • The OECD Laboratory for Geospatial Analysis
  • ESA’s Newcomers Earth Observation Guide | Eurostat CROS
  • Copernicus Data Space Ecosystem | Europe’s eyes on Earth
  • EUSPA - European Union Agency for the Space Programme (requires login)
  • Copernicus MOOC – Learn to harness the power of space data
  • Basics of Optical Remote Sensing – EO4GEO
  • Image processing and analysis – EO4GEO
  • FAO Webinar Series: Earth observation data for agricultural statistics
  • ESA - EO science for society
  • 12th ESA Training Course on Earth Observation 2022
  • Space4Climate (UK agency): its activities enable a seamless supply chain of climate data from space assets, helping to identify end-user requirements and to facilitate the development of trusted climate services that meet them, promoting global economic and societal benefit.

EO resources without links

  • 2021_ EO_Use of satellite data - file missions, file indices 1 and 2
  • From strategies to practical use of Earth observation data for official statistics (2023)
  • Statistical Cartography (2024)
  • Integration of statistics and geospatial information – From geocoding to statistical maps (2024)
  • Advanced Earth observation (2024)
  • Usage of the Copernicus data space ecosystem for earth observation data for official statistics (2025)
  • Educational materials from the 12th ESA Training Course on Earth Observation 2022 held in Riga, Latvia, from 27 June – 01 July 2022.

Programming

  • Insee - Best programming practices with Git and R
  • Lino Galiana from Insee - Data science with Python
  • Insee - Introduction to MLOps with MLflow (slides)
  • Insee - Putting data science projects into production
  • Insee - Introduction to ensemble algorithms
  • Python Data Science Handbook
  • R for Data Science
  • Stanford Machine Learning Notes
  • An Introduction to Statistical Learning with Applications in R
  • UNECE - Machine Learning for Official Statistics
  • Puts, Daas - Machine Learning from the Perspective of Official Statistics
  • UNECE - A quality framework for statistical algorithms
  • A review of ensemble learning and data augmentation models for class imbalanced problems: Combination, implementation and evaluation
  • UNECE - Organisational aspects of implementing ML based data editing in statistical production
  • UNECE – HLG-MOS Machine Learning Project Edit and Imputation Theme Report

Quality

  • Van Delden, A., J. Burger, and M. Puts. 2023. “Ten Propositions on Machine Learning in Official Statistics.” AStA Wirtschafts- und Sozialstatistisches Archiv 17 (3): 195–221. DOI: https://doi.org/10.1007/s11943-023-00330-0
  • Kowarik, A., et al (2020). “Deliverable K3: Revised Version of the Quality Guidelines for the Acquisition and Usage of Big Data”, Work Package K Methodology and quality, ESSnet Big Data II
  • Reinert, R., et al (2016). “Work Package 1: Checklist for Evaluating the Quality of Input Data”, ESSnet KOMUSO
  • De Waal, T., et al (2019). “Quality measures for multisource statistics.” Statistical Journal of the IAOS 35.2
  • Kowarik, A., et al (2025). “Deliverable 4.5: Quality Guidelines for acquiring and using web scraped data”, Work Package 4 Methodology and quality, ESSnet Trusted Smart Statistics – Web Intelligence Network
  • AI Act: High Level Summary
  • Piela R., (2024). “Incorporating AI into Statistical Standards: Enhancing GSBPM with (Generative) AI” UNECE|ModernStats World Workshop, October 2024
  • GPAI models guidelines
  • Saidani, Y., Dumpert, F., Borgs, C., Brand, A., Nickl, A., Rittmann, A., Rohde, J., Salwiczek, C., Storfinger, N., and Straub, S. 2023. “Qualitätsdimensionen maschinellen Lernens in der amtlichen Statistik.” AStA Wirtschafts- und Sozialstatistisches Archiv 17: 253–303. DOI: https://doi.org/10.1007/s11943-023-00329-7
  • Y. Saidani, F. Dumpert, Quality dimensions and quality guidelines for machine learning in official statistics, in Foundations and Advances of Machine Learning in Official Statistics, F. Dumpert, Chap. 4 (Springer, Berlin, 2025). DOI: https://doi.org/10.1007/978-3-032-10004-7_4
  • United Nations Economic Commission for Europe (UNECE) (2021) Machine learning for official statistics
  • Yung W, Karkimaa J, Scannapieco M, Barcaroli G, Zardetto D, Sanchez JAR, Braaksma B, Buelens B, Burger J. 2018. The use of machine learning in official statistics.
  • Yung, W., T. Siu-Ming, B. Buelens, H. Chipman, F. Dumpert, G. Ascari, F. Rocci, J. Burger, and I. Choi. 2022. “A quality framework for statistical algorithms.” Statistical Journal of the IAOS 38: 291-308.

Other resources

  • Text classification for Ecoicop classification
  • Machine learning introductory lecture (Statistics Norway)
  • Handbook of Statistical Data Editing and Imputation
  • The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics
  • Machine Learning Refined: Foundations, Algorithms, and Applications. Cambridge University Press

Papers and publications

  • Foundations and Advances of Machine Learning in Official Statistics. Florian Dumpert, Springer (2025)
  • Robust quasi-randomization-based estimation with ensemble learning for missing data. Scandinavian Journal of Statistics
  • A review of ensemble learning and data augmentation models for class imbalanced problems: Combination, implementation and evaluation. Expert Systems with Applications
  • Probability machines: consistent probability estimation using nonparametric learning machines. Methods Inf Med.
  • Splitting on categorical predictors in random forests. PeerJ.
  • A Score Function to Prioritize Editing in Household Survey Data: A Machine Learning Approach. Journal of Official Statistics
  • Let the data speak: Machine learning methods for data editing and imputation. UNECE / Work Session on Statistical Data Editing
  • Machine Learning tool for editing in the Italian Register of the Public Administration, a proposal. UNECE HLG-MOS Machine Learning Project
  • Data cleaning and machine learning: a systematic literature review. Autom Softw Eng
  • Moving towards the standardized process of automatic statistical data editing using machine learning techniques. UNECE Expert Meeting on Statistical Data Editing
  • Stacking machine-learning models for anomaly detection: comparing AnaCredit to other banking datasets. UNECE Expert Meeting on Statistical Data Editing
  • Improving statistical data editing with Machine Learning: first use cases in Statistics Spain (INE). UNECE Expert Meeting on Statistical Data Editing
  • Automatic selective editing approach using machine learning: an application to VAT data. UNECE Expert Meeting on Statistical Data Editing

USGS

A selection of resources from the United States Geological Survey about use of R.

Beyond Basic R

  • The case for reproducibility
  • Beyond Basic R - Introduction and Best Practices
  • Beyond Basic R - Version Control with Git
  • Beyond Basic R - Mapping
  • Beyond Basic R - Plotting with ggplot2 and Multiple Plots in One Figure
  • Beyond Basic R - Data Munging

Reproducible Data Science in R

  • Reproducible Data Science in R: Say the quiet part out loud with assertion tests
  • Reproducible Data Science in R: Flexible functions using tidy evaluation
  • Reproducible Data Science in R: Writing better functions
  • Reproducible Data Science in R: Writing functions that work for you
  • Reproducible Data Science in R: Iterate, don’t duplicate

Other topics

  • Duplicating Quarto elements with code templates to reduce copy and paste errors
  • Charting ‘tidycensus’ data with R