Marek Suppa
Serverless deployment of (large) NLP models
#1 · about 9 minutes
Exploring practical NLP applications at Slido
Several NLP-powered features are used to enhance user experience, including keyphrase extraction, sentiment analysis, and similar question detection.
#2 · about 4 minutes
Choosing serverless for ML model deployment
Serverless was chosen for its ease of deployment and minimal maintenance, but it introduces challenges like cold starts and strict package size limits.
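One way to picture the cold-start concern: in a Python Lambda, anything initialised at module scope runs once per container and is reused by warm invocations, so the model is typically loaded there rather than inside the handler. A minimal sketch of that pattern (the file name, payload shape and the use of ONNX Runtime, which the talk only introduces in chapter 5, are illustrative, not the speaker's actual code):

```python
import json
import numpy as np
import onnxruntime as ort  # lightweight inference runtime (see chapter 5)

# Module scope runs once per Lambda container: only cold starts pay the
# model-loading cost; warm invocations reuse the session below.
session = ort.InferenceSession("model.onnx")

def handler(event, context):
    # Assumed payload: the caller sends pre-tokenised ids and attention mask.
    feeds = {
        "input_ids": np.asarray(event["input_ids"], dtype=np.int64),
        "attention_mask": np.asarray(event["attention_mask"], dtype=np.int64),
    }
    logits = session.run(None, feeds)[0]
    return {"statusCode": 200, "body": json.dumps(logits.tolist())}
```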
#3 · about 8 minutes
Shrinking large BERT models for sentiment analysis
Knowledge distillation is used to train smaller, faster models like TinyBERT from a large, fine-tuned BERT base model without significant performance loss.
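The chapter describes the idea rather than code, but the usual distillation objective is easy to sketch: blend the ordinary cross-entropy on the gold labels with a KL-divergence term that pulls the student's softened logits towards the teacher's. A minimal PyTorch illustration (temperature `T` and mixing weight `alpha` are illustrative hyperparameters, not the talk's settings):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Standard supervised loss on the gold labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    # KL divergence between temperature-softened student and teacher outputs;
    # the T^2 factor keeps gradient magnitudes comparable across temperatures.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * hard_loss + (1 - alpha) * soft_loss

# Toy usage with random tensors standing in for real model outputs.
student = torch.randn(8, 3)           # 8 examples, 3 sentiment classes
teacher = torch.randn(8, 3)           # logits from the fine-tuned teacher
labels = torch.randint(0, 3, (8,))
loss = distillation_loss(student, teacher, labels)
```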
#4 · about 8 minutes
Building an efficient similar question detection model
Sentence-BERT (SBERT) provides an efficient alternative to standard BERT for semantic similarity, and knowledge distillation helps create smaller, deployable versions.
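A small illustration of why the SBERT setup is cheap at inference time (using a public MiniLM checkpoint from the sentence-transformers library rather than the distilled model built in the talk): each question is embedded once, and similarity between any pair is a cosine comparison instead of a full cross-encoder pass.

```python
from sentence_transformers import SentenceTransformer, util

# A small public SBERT-style checkpoint; the talk's own distilled model differs.
model = SentenceTransformer("paraphrase-MiniLM-L6-v2")

questions = [
    "Will the slides be shared after the talk?",
    "Can we get the presentation afterwards?",
    "What time is lunch?",
]

# Embeddings can be cached; a new question needs one forward pass plus
# cosine similarity against the stored vectors.
embeddings = model.encode(questions, convert_to_tensor=True)
similarities = util.cos_sim(embeddings, embeddings)

print(similarities[0][1])  # high: the first two questions are near-duplicates
print(similarities[0][2])  # low: unrelated question
```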
#5 · about 3 minutes
Using ONNX Runtime for lightweight model inference
The large PyTorch library is replaced with the much smaller ONNX Runtime to fit the model and its dependencies within AWS Lambda's package size limits.
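A rough sketch of how that swap typically looks: the export runs once, offline, where the full PyTorch and transformers stack is available, and only the resulting model.onnx plus the comparatively small onnxruntime wheel ship in the Lambda package (the checkpoint name and opset version below are illustrative, and dedicated exporters such as the optimum tooling can be used instead of calling torch.onnx.export directly).

```python
# Offline export: run once on a build machine, ship only model.onnx + onnxruntime.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "distilbert-base-uncased-finetuned-sst-2-english"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, return_dict=False)
model.eval()

dummy = tokenizer("An example sentence.", return_tensors="pt")
torch.onnx.export(
    model,
    (dummy["input_ids"], dummy["attention_mask"]),
    "model.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={  # allow variable batch size and sequence length at inference
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "logits": {0: "batch"},
    },
    opset_version=14,
)
```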
#6 · about 3 minutes
Analyzing serverless ML performance and cost-effectiveness
Increasing allocated RAM for a Lambda function improves inference speed, potentially making serverless more cost-effective than a dedicated server for uneven workloads.
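A back-of-the-envelope version of that trade-off (all prices and workload figures below are illustrative assumptions, not numbers from the talk): Lambda bills per GB-second, so doubling the memory roughly doubles the per-second price, but if it also roughly halves the inference time the per-request cost stays flat, and unlike a dedicated server there is no charge for idle time.

```python
# Illustrative Lambda vs. dedicated-server cost comparison; all numbers are
# assumptions for the sake of the arithmetic, not figures from the talk.

PRICE_PER_GB_SECOND = 0.0000166667   # approximate Lambda compute price
PRICE_PER_REQUEST = 0.0000002        # approximate Lambda request price

def lambda_monthly_cost(requests, duration_s, memory_gb):
    compute = requests * duration_s * memory_gb * PRICE_PER_GB_SECOND
    return compute + requests * PRICE_PER_REQUEST

# More RAM also means more CPU, so assume a 2x speed-up for 2x memory.
low_mem = lambda_monthly_cost(500_000, duration_s=1.0, memory_gb=1)
high_mem = lambda_monthly_cost(500_000, duration_s=0.5, memory_gb=2)
dedicated = 80.0  # assumed flat monthly price of an always-on instance

print(f"Lambda @ 1 GB: ${low_mem:.2f}/month")
print(f"Lambda @ 2 GB: ${high_mem:.2f}/month (same compute cost, half the latency)")
print(f"Dedicated:     ${dedicated:.2f}/month, paid even when the workload is idle")
```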
#7 · about 3 minutes
Key takeaways for deploying NLP models serverlessly
Successful serverless deployment of large NLP models requires aggressive model size reduction, lightweight inference libraries, and an understanding of the platform's limitations.
Matching moments
22:15 MIN
Q&A on monoliths, serverless, and specific use cases
Why you shouldn’t build a microservice architecture
36:27 MIN
Project learnings and future development opportunities
Leverage Cloud Computing Benefits with Serverless Multi-Cloud ML
14:15 MIN
Delivering customizations via decoupled ML models
Building the platform for providing ML predictions based on real-time player activity
07:13 MIN
Identifying the key challenges of serverless functions
Fun with PaaS – How to use Cloud Foundry and its uniqueness in creative ways
12:31 MIN
Practical use cases for serverless architectures
Serverless: Past, Present and Future
13:39 MIN
Why AI agents require modern serverless infrastructure
Postgres in the Age of AI (and Devin)
06:38 MIN
Identifying the key challenges of serverless functions
The Future of Cloud is WebAssembly
46:14 MIN
Audience Q&A on serverless IoT development
Building your way to a serverless powered IOT Buzzwire game
Related Videos
DevOps for AI: running LLMs in production with Kubernetes and KubeFlow
Aarno Aukia
Leverage Cloud Computing Benefits with Serverless Multi-Cloud ML
Linda Mohamed
Unveiling the Magic: Scaling Large Language Models to Serve Millions
Patrick Koss
End the Monolith! Lessons learned adopting Serverless
Nočnica Fee
Multilingual NLP pipeline up and running from scratch
Kateryna Hrytsaienko
Self-Hosted LLMs: From Zero to Inference
Roberto Carratalá & Cedric Clyburn
From ML to LLM: On-device AI in the Browser
Nico Martin
Serverless: Past, Present and Future
Oliver Arafat
Related Articles
View all articles

From learning to earning
Jobs that call for the skills explored in this talk.

AI Systems and MLOps Engineer for Earth Observation
Forschungszentrum Jülich GmbH
Jülich, Germany
Intermediate
Senior
Linux
Docker
AI Frameworks
Machine Learning

Machine Learning Engineer - Large Language Models (LLM) - Startup
Startup
Charing Cross, United Kingdom
PyTorch
Machine Learning

Machine Learning Engineer | NLP | AWS · Barcelona · Hybrid
Tecdata
Barcelona, Spain
Intermediate
Machine Learning
Amazon Web Services (AWS)

Deep Learning Engineer for Language Technologies (RE2)
Barcelona Supercomputing Center
Barcelona, Spain
Intermediate
Python
PyTorch
Machine Learning

Machine Learning Performance Engineer, London
Isomorphic Labs
Charing Cross, United Kingdom
£62K
Machine Learning

MLOps Engineer (Kubernetes, Cloud, ML Workflows)
FitNext Co
Charing Cross, United Kingdom
Remote
Intermediate
DevOps
Python
Docker
Grafana
+6