Useful links
The following links lead to external trainings, tutorials, projects, documents, and papers related to topics covered by the Work Packages of the AIML4OS project.
Earth Observation
- Introduction to Machine Learning for Earth Observation
- The OECD Laboratory for Geospatial Analysis
- ESA’s Newcomers Earth Observation Guide | Eurostat CROS
- Copernicus Data Space Ecosystem | Europe’s eyes on Earth
- EUSPA - European Union Agency for the Space Programme (requires login)
- Copernicus MOOC – Learn to harness the power of space data
- Basics of Optical Remote Sensing – EO4GEO
- Image processing and analysis – EO4GEO
- FAO Webinar Series: Earth observation data for agricultural statistics
- ESA - EO science for society
- 12th ESA Training Course on Earth Observation 2022
- Space4Climate (UK Agency): its activities enable a seamless supply chain of climate data from space assets, helping to identify end-user requirements and facilitate the development of trusted climate services that meet them, promoting global economic and societal benefit.
EO resources without links
- 2021_ EO_Use of satellite data - file missions, file indices 1 and 2
- From strategies to practical use of Earth observation data for official statistics (2023)
- Statistical Cartography (2024)
- Integration of statistics and geospatial information – From geocoding to statistical maps (2024)
- Advanced Earth observation (2024)
- Usage of the Copernicus data space ecosystem for earth observation data for official statistics (2025)
- Educational materials from the 12th ESA Training Course on Earth Observation 2022, held in Riga, Latvia, from 27 June to 1 July 2022.
Programming
- Insee - Best programming practices with Git and R
- Lino Galiana from Insee - Data science with Python
- Insee - Introduction to MLOps with MLflow (slides)
- Insee - Putting data science projects into production
- Insee - Introduction to ensemble algorithms
- Python Data Science Handbook
- R for Data Science
- Stanford Machine Learning Notes
- An Introduction to Statistical Learning with Applications in R
- UNECE - Machine Learning for Official Statistics
- Puts, Daas - Machine Learning from the Perspective of Official Statistics
- UNECE - A quality framework for statistical algorithms
- A review of ensemble learning and data augmentation models for class imbalanced problems: Combination, implementation and evaluation
- UNECE - Organisational aspects of implementing ML based data editing in statistical production
- UNECE – HLG-MOS Machine Learning Project Edit and Imputation Theme Report
Quality
- Van Delden, A., J. Burger, and M. Puts. 2023. “Ten Propositions on Machine Learning in Official Statistics.” AStA Wirtschafts- und Sozialstatistisches Archiv 17 (3): 195–221. DOI: https://doi.org/10.1007/s11943-023-00330-0
- Kowarik, A., et al. (2020). “Deliverable K3: Revised Version of the Quality Guidelines for the Acquisition and Usage of Big Data”, Work Package K Methodology and quality, ESSnet Big Data II
- Reinert, R., et al. (2016). “Work Package 1: Checklist for Evaluating the Quality of Input Data”, ESSnet KOMUSO
- De Waal, T., et al. (2019). “Quality measures for multisource statistics.” Statistical Journal of the IAOS 35 (2)
- Kowarik, A., et al. (2025). “Deliverable 4.5: Quality Guidelines for acquiring and using web scraped data”, Work Package 4 Methodology and quality, ESSnet Trusted Smart Statistics – Web Intelligence Network
- AI Act: High Level Summary
- Piela, R. (2024). “Incorporating AI into Statistical Standards: Enhancing GSBPM with (Generative) AI”, UNECE ModernStats World Workshop, October 2024
- GPAI models guidelines
- Saidani, Y., Dumpert, F., Borgs, C., Brand, A., Nickl, A., Rittmann, A., Rohde, J., Salwiczek, C., Storfinger, N., and Straub, S. 2023. “Qualitätsdimensionen maschinellen Lernens in der amtlichen Statistik.” AStA Wirtschafts- und Sozialstatistisches Archiv 17: 253–303. DOI: https://doi.org/10.1007/s11943-023-00329-7
- Y. Saidani, F. Dumpert, Quality dimensions and quality guidelines for machine learning in official statistics, in Foundations and Advances of Machine Learning in Official Statistics, F. Dumpert, Chap. 4 (Springer, Berlin, 2025). DOI: https://doi.org/10.1007/978-3-032-10004-7_4
- United Nations Economic Commission for Europe (UNECE) (2021) Machine learning for official statistics
- Yung W, Karkimaa J, Scannapieco M, Barcaroli G, Zardetto D, Sanchez JAR, Braaksma B, Buelens B, Burger J. 2018. The use of machine learning in official statistics.
- Yung, W., S.-M. Tam, B. Buelens, H. Chipman, F. Dumpert, G. Ascari, F. Rocci, J. Burger, and I. Choi. 2022. “A quality framework for statistical algorithms.” Statistical Journal of the IAOS 38: 291–308.
Other resources
- Text classification for ECOICOP classification
- Machine learning introductory lecture, Statistics Norway
- Handbook of Statistical Data Editing and Imputation
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics
- Machine Learning Refined: Foundations, Algorithms, and Applications. Cambridge University Press
Papers and publications
- Foundations and Advances of Machine Learning in Official Statistics. Florian Dumpert, Springer (2025)
- Robust quasi-randomization-based estimation with ensemble learning for missing data. Scandinavian Journal of Statistics
- A review of ensemble learning and data augmentation models for class imbalanced problems: Combination, implementation and evaluation. Expert Systems with Applications
- Probability machines: consistent probability estimation using nonparametric learning machines. Methods Inf Med.
- Splitting on categorical predictors in random forests. PeerJ.
- A Score Function to Prioritize Editing in Household Survey Data: A Machine Learning Approach. Journal of Official Statistics
- Let the data speak: Machine learning methods for data editing and imputation. UNECE / Work Session on Statistical Data Editing
- Machine Learning tool for editing in the Italian Register of the Public Administration, a proposal. UNECE HLG-MOS Machine Learning Project
- Data cleaning and machine learning: a systematic literature review. Autom Softw Eng
- Moving towards the standardized process of automatic statistical data editing using machine learning techniques. UNECE Expert Meeting on Statistical Data Editing
- Stacking machine-learning models for anomaly detection: comparing AnaCredit to other banking datasets. UNECE Expert Meeting on Statistical Data Editing
- Improving statistical data editing with Machine Learning: first use cases in Statistics Spain (INE). UNECE Expert Meeting on Statistical Data Editing
- Automatic selective editing approach using machine learning: an application to VAT data. UNECE Expert Meeting on Statistical Data Editing
USGS
A selection of resources from the United States Geological Survey on the use of R.
Beyond Basic R
Reproducible Data Science in R
- Reproducible Data Science in R: Say the quiet part out loud with assertion tests
- Reproducible Data Science in R: Flexible functions using tidy evaluation
- Reproducible Data Science in R: Writing better functions
- Reproducible Data Science in R: Writing functions that work for you
- Reproducible Data Science in R: Iterate, don’t duplicate