How Transfer Learning Speeds Up AI/ML Development in Small Data Scenarios

Artificial Intelligence (AI) and Machine Learning (ML) hold vast promise to remake businesses and industries. Unfortunately, developing customized AI/ML models typically requires massive datasets, which are expensive and time-consuming to collect and annotate. This leaves most companies severely limited when trying to harness the power of AI/ML.

In such cases, transfer learning comes to the rescue. It lets us reuse the knowledge an existing model has already learned from large datasets and transfer it to a new model that must be trained on a much smaller dataset.

The Fundamentals of Transfer Learning

Transfer learning, in a nutshell, means transferring learned features and knowledge from one model to another. This technique is widely applied in AI/ML development services because many features learned from large datasets are equally useful in related models.

For example, image recognition models learn to detect edges, shapes, textures and so on. We can reuse those learnings to train new models even with small datasets; the new models then only need to learn the specific features of the narrow task they handle, as the sketch below illustrates.
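
To make this concrete, here is a minimal sketch (assuming TensorFlow/Keras; the dummy image is a stand-in for real data) of reusing a model pre-trained on ImageNet as a ready-made feature extractor:

```python
# A minimal sketch, assuming TensorFlow/Keras is installed.
import tensorflow as tf

# ResNet50 pre-trained on ImageNet, with its classification head removed.
base = tf.keras.applications.ResNet50(
    weights="imagenet", include_top=False, pooling="avg",
    input_shape=(224, 224, 3))
base.trainable = False  # reuse the learned edge/shape/texture detectors as-is

# Any image becomes a compact feature vector a small new model can build on.
dummy_image = tf.random.uniform((1, 224, 224, 3))  # stand-in for a real image
features = base(dummy_image)
print(features.shape)  # (1, 2048)
```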

Key Elements of Transfer Learning

Transfer learning involves two models – the base model and the target model:

  • Base Model – A large, complex model trained on a very large dataset for a general task. For example, the ResNet model is trained on the ImageNet dataset for image recognition.
  • Target Model – A smaller model trained on a much smaller dataset for a narrow, specific task. For instance, a model trained to identify manufacturing defects from just 100 images.

Thus, the learned features and knowledge in the base model help the target model train with less data and fewer resources while still achieving high accuracy. The base model itself remains unchanged.
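
A hedged sketch of this base/target pairing follows; the defect-detection task, its two classes and the tiny dataset are assumptions for illustration, not a reference implementation:

```python
# A minimal sketch, assuming TensorFlow/Keras; the defect dataset is hypothetical.
import tensorflow as tf

# Base model: ResNet50 trained on ImageNet, kept frozen and unchanged.
base = tf.keras.applications.ResNet50(
    weights="imagenet", include_top=False, pooling="avg",
    input_shape=(224, 224, 3))
base.trainable = False

# Target model: a small new head for the narrow task (defect vs. no defect).
target_model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(2, activation="softmax"),
])
target_model.compile(optimizer="adam",
                     loss="sparse_categorical_crossentropy",
                     metrics=["accuracy"])
# target_model.fit(defect_images, defect_labels, epochs=10)  # ~100 images
```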

There are two common techniques used:

  • Frozen Base Layers – The early layers of the base model, which capture general features, are frozen so their weights do not change during training. The new dataset is used only to tune the more task-specific later layers. This technique is preferred when the new dataset is very small.
  • Fine-tuning – Here the later layers of the base model are also trained, but at a much lower learning rate so the pre-trained weights shift only slightly. It is used when the new dataset is somewhat larger. Both techniques are sketched below.
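
A hedged sketch of both techniques on one Keras model; the number of layers to unfreeze and the learning rates are illustrative choices, not fixed rules:

```python
# Minimal sketch of both techniques (TensorFlow/Keras assumed; values illustrative).
import tensorflow as tf

base = tf.keras.applications.ResNet50(
    weights="imagenet", include_top=False, pooling="avg",
    input_shape=(224, 224, 3))
model = tf.keras.Sequential([base, tf.keras.layers.Dense(2, activation="softmax")])

# Technique 1: frozen base layers -- only the new head trains (very small data).
base.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(tiny_dataset, epochs=10)

# Technique 2: fine-tuning -- unfreeze the later base layers and train them at a
# much lower learning rate so the pre-trained weights shift only slightly.
base.trainable = True
for layer in base.layers[:-20]:  # keep the earliest, most general layers frozen
    layer.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),  # ~100x lower rate
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(larger_dataset, epochs=10)
```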

The pre-trained base model essentially provides a head start to the training process for the target model. This speeds up the overall development and lowers compute resource requirements.

The Small Data Challenge

While there are exceptions, most real-world ML problems involve only small datasets. This could be due to various practical constraints:

  • Limited Accessibility – For niche domains, collecting thousands or millions of data samples may simply not be feasible. Cybersecurity threats, rare diseases and device failures all fall into this category.
  • Regulations & Privacy – In domains like healthcare and financial services, users’ data cannot be freely collected and used due to regulations and privacy concerns.
  • Annotation Costs – Raw data collection is only the first step. It can be even more expensive to annotate the data to identify the features, patterns and structures needed for ML training.
  • Massive Data Collection and Annotation – Gathering and annotating massive datasets may take far longer than developing the product itself, particularly for startups.

According to a Kaggle survey, around 49% of ML projects work with fewer than 1,000 samples in their training datasets, and 70% of projects use fewer than 10,000 samples.

Building customized ML solutions with small datasets poses several challenges:

  • Overfitting due to lack of sufficient representative data
  • Lower prediction accuracy due to inadequate training
  • Increased bias, as anomalies can unfairly skew small samples
  • Need for lengthy manual feature engineering
  • Difficulty in deploying and maintaining models that are retrained frequently on new small data batches

These challenges either force compromise on capabilities or delay deployments.

How Transfer Learning Helps Small Data Models

The knowledge gained by base models from large datasets can help the target models in multiple ways, even when training data is scarce:

  • Faster Convergence – Rather than initializing weights randomly, transfer learning starts from learned, sufficiently relevant weight values, so training converges much faster (see the sketch after this list).
  • Better Generalization – The pre-trained features act as regularizers, preventing the model from fitting the small dataset too closely and overfitting.
  • Reduced Compute – With faster convergence and fewer training epochs, training times and compute/hardware requirements drop significantly. According to research from Uber, transfer learning results in 87% less training time than training from scratch.
  • Enhanced Performance – As per benchmarks, transfer learning models can achieve far superior performance with 50-100 times less data than custom models. For the same performance bar, the data requirement is greatly reduced.
  • Less Manual Engineering – Ready-made pre-trained models reduce tedious, specialized feature engineering and model-design work, freeing teams to focus on target use cases.
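
As referenced in the first bullet, here is a hedged experiment sketch of the convergence benefit: the same architecture built twice, once from pre-trained weights and once from random initialization (small_ds is a placeholder for a small labeled dataset):

```python
# A hedged experiment sketch (TensorFlow/Keras assumed; "small_ds" is a
# placeholder tf.data.Dataset of (image, label) batches).
import tensorflow as tf

def build(weights):
    base = tf.keras.applications.MobileNetV2(
        weights=weights, include_top=False, pooling="avg",
        input_shape=(224, 224, 3))
    model = tf.keras.Sequential(
        [base, tf.keras.layers.Dense(2, activation="softmax")])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

pretrained = build("imagenet")  # transfer learning: starts from learned weights
scratch = build(None)           # baseline: random initialization
# On the same small dataset, the pre-trained variant typically reaches usable
# accuracy in far fewer epochs than the from-scratch baseline:
# pretrained.fit(small_ds, epochs=5) vs. scratch.fit(small_ds, epochs=50)
```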

The practical benefits of transfer learning with small data allow for tackling more meaningful real-world problems with AI/ML at lower costs.

Real-World Examples and Impact

Many leading AI solutions today have benefited from transfer learning in small data scenarios across domains:

Healthcare

The startup Paige used transfer learning to train its cancer tumor detection model on just a few hundred pathology images. That level of accuracy would not have been possible without leveraging existing knowledge.

Industrial Automation

Siemens has built AI-based visual quality inspection solutions that greatly improve flaw detection in manufacturing processes. Together with Basler and MVTec, Siemens integrates machine vision into manufacturing automation, enabling customers to use AI to identify structural and logical flaws on objects and thereby improve component quality.

Financial Services

AI startup Upstart built a highly accurate credit risk model trained on just 1,600 loan applicant records by leveraging knowledge from a large public dataset. Without transfer learning, they would have required data on millions of applicants to build such a performant model.

Autonomous Vehicles

Tesla leverages transfer learning techniques by utilizing labeled road data from one geography to train autonomous vehicle models for new geographies with different road conditions, layouts and driving habits. This fast-tracks launches across multiple cities.

These examples validate how transfer learning unlocks AI/ML potential for small and medium-sized organizations, even if they have limited data.

Today, transfer learning has become an easily adoptable technique thanks to pre-trained models and readily available open-source algorithms. Google, Amazon, Microsoft and startups like Algorithmia are also democratizing access to transfer learning capabilities for building custom solutions.
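
For instance, a pre-trained model is often a one-line download today. A minimal sketch using two popular open-source sources (Keras Applications and Hugging Face Transformers, chosen here as illustrative examples of this ecosystem):

```python
# A minimal sketch of how readily available pre-trained models have become.
import tensorflow as tf
from transformers import AutoModelForSequenceClassification  # Hugging Face

# Vision: ResNet50 with ImageNet weights, one line.
vision_base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False)

# Text: pre-trained BERT with a fresh 2-class head, ready for fine-tuning.
text_model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)
```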

As per a survey of ML practitioners and experts by Algorithmia, transfer learning could provide cost savings of up to 75% compared to developing models from scratch. For complex perception tasks like image, video and speech recognition, the savings could be as high as 90%, indicating massive productivity potential.

The Road Ahead

Innovation in ML models and techniques, coupled with exponential growth in data and compute infrastructure, is fast-tracking further advancement.

With the growing adoption of transfer learning, expect more commercial as well as open-source pre-trained models to become available. Cloud platforms are also making it easier to leverage transfer learning. These factors will expand its reach across a wider spectrum of industry.

In the future, techniques like Federated Learning and TinyML could complement transfer learning to provide additional solutions for constrained data scenarios across edge devices. Automated Machine Learning (AutoML) platforms can also simplify transfer learning to benefit non-experts directly.

As 5G and IoT spread, there is growing potential for cross-industry and cross-domain transfer learning. For instance, a manufacturing model could gain insights from an agriculture model and vice versa.

In a nutshell, transfer learning is set to accelerate AI/ML innovation by magnifying the value of data. Even small and mid-sized organizations can punch above their weight to stay competitive. With rising industry adoption, transfer learning represents an important milestone to unlock inclusive and scalable AI.

Conclusion

Transfer learning allows us to develop high-accuracy AI/ML models without big datasets or lengthy training. The productivity gains it unlocks are huge for the small data scenarios that a large majority of ML teams face today. Real-world case studies of transfer learning show 2x to 5x gains in model accuracy alongside cost and time reductions. Its adoption is further boosted by cloud platforms, open-source libraries and AutoML tools. Transfer learning is still evolving, but it is a breakthrough that makes AI/ML markedly more accessible. That’s a step forward for wider AI proliferation down the road.


Daniel Raymond

Daniel Raymond, a project manager with over 20 years of experience, is the former CEO of a successful software company called Websystems. With a strong background in managing complex projects, he applied his expertise to develop AceProject.com and Bridge24.com, innovative project management tools designed to streamline processes and improve productivity. Throughout his career, Daniel has consistently demonstrated a commitment to excellence and a passion for empowering teams to achieve their goals.
