Photo by Jason Blackeye on Unsplash
Here you are with a dataset indexed by timestamps. Your data might describe storage demand and supply, and you are tasked with predicting the ideal replenishment intervals for a strategic product. Or maybe you need to translate historical sales information into actionable insights for your team. Perhaps your data is financial, with historical interest rates and a selection of stock prices.
Colosseum by Hank Paul on Unsplash
Metric learning for instance recognition and information retrieval is a technique widely applied across many fields. It is highly relevant to novel research applications, such as the recent AI breakthrough in biology [2] with DeepMind's AlphaFold [11]. It is also mature and proven enough to be widely deployed in industry, from contextual information retrieval in Google Search [12] to image similarity for face recognition [7], which you might use every day to unlock your phone.
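To make the idea concrete, here is a minimal sketch, assuming the embeddings have already been produced by some trained encoder (the vectors below are hypothetical toy values, not output from any real model). A triplet margin loss, one common metric-learning objective, penalizes an anchor that is not closer to a positive example of the same instance than to a negative by at least a margin; once the embedding space is trained that way, retrieval reduces to nearest-neighbour search. Nothing here reflects how AlphaFold, Google Search, or any face-recognition product is actually implemented.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector, margin: float = 1.0) -> float:
    """Triplet margin loss: zero once the positive is closer to the anchor
    than the negative is, by at least `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def retrieve(query: Vector, gallery: Dict[str, Vector], k: int = 1) -> List[str]:
    """Return the k gallery keys whose embeddings lie nearest to the query."""
    ranked = sorted(gallery, key=lambda name: euclidean(query, gallery[name]))
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical 2-D embeddings standing in for a trained encoder's output.
    gallery = {
        "alice_1": [0.0, 0.1],
        "bob_1": [3.0, 3.0],
        "carol_1": [-2.0, 4.0],
    }
    query = [0.2, 0.0]  # embedding of a new, hypothetical photo of "alice"
    print(retrieve(query, gallery, k=1))  # ['alice_1']
    # The positive is already far closer than the negative, so the loss is zero.
    print(triplet_loss(query, gallery["alice_1"], gallery["bob_1"]))  # 0.0
```

Euclidean distance is used here for simplicity; cosine similarity on normalized embeddings is an equally common choice in practice.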
On the left: Wildfire, photo by Mike Newbry. In the center: Tropical Storm, photo by Jeffrey Grospe. On the right: Pandemic Dashboard, photo by Martin Sanchez. Original images on Unsplash.
Cluster analysis, an unsupervised learning technique, is widely used across many fields of data science. When applied to data suited to hierarchical or partitional clustering, it can reveal latent groups in the dataset and deepen your understanding of the key features that describe individuals and sort them into clusters meaningful for your use case.
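As an illustration of the partitional side, below is a minimal k-means (Lloyd's algorithm) sketch in plain Python. It is one common partitional method, chosen here for brevity and not necessarily the approach the article goes on to use; hierarchical clustering is not shown. To keep the sketch deterministic it seeds the centroids with the first k points, whereas production libraries typically use random or k-means++ initialization.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iterations: int = 100) -> Tuple[List[int], List[Point]]:
    """Plain k-means. Returns one cluster label per point and the final centroids."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Deterministic seeding with the first k points (a simplification).
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids


if __name__ == "__main__":
    # Toy 2-D data with two visually obvious groups.
    data = [(1.0, 1.0), (1.2, 0.8), (8.0, 8.0), (8.2, 7.9), (0.9, 1.1)]
    labels, centroids = kmeans(data, k=2)
    print(labels)  # [0, 0, 1, 1, 0]
```

The label numbers themselves are arbitrary; what matters is which points share a label.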
On the left: Canada Rent Rankings — May 2022. Report summary by Rentals.ca. On the right: Preprocessed image with cluster-defined table layout.
A crucial step in document parsing and recognition tasks, extracting table data from image and PDF files is a widely explored problem with its own challenges. While working on a small personal project, I dove deep into it and discovered a wide range of solutions of varying complexity.
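The caption of the Rentals.ca image mentions a "cluster-defined table layout". The article does not spell out that method here, so the following is only a hypothetical sketch of the general idea, not the author's pipeline. It assumes an OCR step has already produced words with pixel-center coordinates (the `Box` format, the `gap_clusters` and `boxes_to_table` helpers, and the `col_gap`/`row_gap` thresholds are all illustrative inventions). Column positions are found by 1-D clustering of x-centers, rows by clustering y-centers, and each word lands in the nearest (row, column) cell.

```python
from typing import List, Tuple

# Hypothetical OCR output format: (text, x_center, y_center) in pixels.
Box = Tuple[str, float, float]


def gap_clusters(values: List[float], max_gap: float) -> List[float]:
    """1-D clustering: sort the values and start a new cluster wherever two
    neighbours are more than `max_gap` apart. Returns each cluster's mean."""
    if not values:
        return []
    ordered = sorted(values)
    groups: List[List[float]] = [[ordered[0]]]
    for v in ordered[1:]:
        if v - groups[-1][-1] > max_gap:
            groups.append([v])
        else:
            groups[-1].append(v)
    return [sum(g) / len(g) for g in groups]


def nearest(centres: List[float], v: float) -> int:
    """Index of the cluster centre closest to v."""
    return min(range(len(centres)), key=lambda i: abs(centres[i] - v))


def boxes_to_table(boxes: List[Box], col_gap: float = 40.0, row_gap: float = 10.0) -> List[List[str]]:
    """Place each word in a (row, column) cell defined by clusters of y and x centres."""
    if not boxes:
        return []
    cols = gap_clusters([x for _, x, _ in boxes], col_gap)
    rows = gap_clusters([y for _, _, y in boxes], row_gap)
    table = [["" for _ in cols] for _ in rows]
    for text, x, y in boxes:
        r, c = nearest(rows, y), nearest(cols, x)
        # Words falling into the same cell are joined with a space.
        table[r][c] = (table[r][c] + " " + text).strip()
    return table


if __name__ == "__main__":
    # Placeholder words and coordinates, not real report data.
    words = [
        ("Name", 50, 20), ("Value", 300, 20),
        ("alpha", 55, 60), ("1", 305, 61),
        ("beta", 60, 100), ("2", 298, 99),
    ]
    print(boxes_to_table(words))  # [['Name', 'Value'], ['alpha', '1'], ['beta', '2']]
```

A fixed gap threshold is the simplest possible choice; real documents with uneven spacing may need thresholds derived from the data or a proper clustering algorithm.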