Andrés Isaza-Giraldo, Manuel Pereira, Rafael Candeias, Lucas Pereira
ICoWEFS 2023
Publication year: 2023

Abstract

This paper proposes a methodology for visualizing satellite-based machine learning (ML) datasets in order to understand the visual components that will serve as inputs for developing ML models. The proposed methodology uses t-Distributed Stochastic Neighbor Embedding (t-SNE) to create visualizations of satellite images, leveraging models pre-trained on ImageNet. t-SNE is an unsupervised dimensionality-reduction technique that transforms high-dimensional spaces into two- or three-dimensional embeddings, making it easier to visualize a large dataset in a single image or space. The methodology is demonstrated on the LUCAS Copernicus dataset with satellite images from Sentinel-2. The dataset was constructed using the TerraSense Toolkit (TSTK) and information from the LUCAS Survey, an effort of the European Soil Data Centre. The t-SNE visualization tool aims to improve ML research by providing a clearer understanding of satellite-based datasets.
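
The projection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn's `TSNE` as the embedding method, and it replaces the image features that a pre-trained ImageNet model would produce with synthetic 512-dimensional vectors drawn around three hypothetical class centers. The function name `embed_features`, the feature size, and the cluster layout are all illustrative assumptions.

```python
import numpy as np
from sklearn.manifold import TSNE


def embed_features(features: np.ndarray, n_components: int = 2, seed: int = 0) -> np.ndarray:
    """Project (n_samples, n_features) vectors to a low-dimensional t-SNE embedding."""
    # scikit-learn requires perplexity < n_samples; cap it for small datasets.
    perplexity = min(30.0, (features.shape[0] - 1) / 3.0)
    tsne = TSNE(
        n_components=n_components,
        perplexity=perplexity,
        init="pca",
        random_state=seed,
    )
    return tsne.fit_transform(features)


# Synthetic stand-in for features extracted by a pre-trained network (assumption):
# three hypothetical land-cover classes, 40 samples each, 512 dimensions.
rng = np.random.default_rng(0)
features = np.vstack(
    [rng.normal(loc=center, scale=0.5, size=(40, 512)) for center in (0.0, 3.0, 6.0)]
)
labels = np.repeat([0, 1, 2], 40)

embedding = embed_features(features)
print(embedding.shape)  # (120, 2)
```

Each row of `embedding` is a 2D coordinate that could be used to place the corresponding image thumbnail on a canvas, or plotted as a scatter colored by `labels`. Passing `n_components=3` would give a three-dimensional embedding instead.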