Siddharth Sharma
24 Followers

Nov 22

AWS Blog In Collaboration With Nvidia — Optimizing Inference For Seq2Seq And Encoder Only Models Using Nvidia GPU And Triton Model Server

Posted on November 22, 2023 by Siddharth Sharma. Blurb: Deep learning Transformer models have complex architectures and can have hundreds of millions (or even billions) of parameters, which leads to slow real-time inference. …

Nvidia

1 min read


Nov 22

Compressing LLMs With Low Rank Decomposition Of Attention Matrices

Colab Link To Reproduce Experiment: LLM Compression Via Low Rank Decomposition.ipynb. Models Used: Flan-T5-Base, Flan-T5-… Context: A neural network contains many dense layers which perform matrix multiplication. In the case of Transformers, the Attention module has Key, Query, Value and Output matrices (along with the FF layer) that typically have full…

Llm

5 min read


Apr 21

Summary Of Adapter Based Performance Efficient Fine Tuning (PEFT) Techniques For Large Language Models

The two most common transfer learning techniques in NLP were feature-based transfer (generating input text embeddings from a large pre-trained model and using them as features in your custom model) and fine-tuning (fine-tuning the pre-trained model on a custom dataset). It is notoriously hard to fine-tune Large…

Machine Learning

5 min read


Jan 19

Neural Ranking Architectures

Glimpses On Implicit/Explicit, Dense/Sparse, Gated/Non Gated, Low Rank And Many More Layered Interactions

Neural ranking models are the most important component in a multi-stage retrieval and ranking pipeline. Whether it is e-commerce search, ads targeting, music search or browse feed ranking, the ranking model will have the final say in selecting…

Machine Learning

8 min read


Jan 14

Anatomy Of A Model Inference Service

Context: This document discusses the high-level architecture and components required to create a model prediction service. Here we won’t be discussing or comparing particular frameworks in detail. The underlying intention is to provide a holistic view of model serving pipelines, the APIs needed and the complexities involved. This document would help the…

Machine Learning

8 min read


Jan 13

Feature Fusion For The Uninitiated

Consider a typical e-commerce product. It would have a variety of content-specific features like product title, brand, thumbnail, etc., and other engagement-driven features like number of clicks, click-through rate, etc. Any machine learning model ingesting features of this product (e.g. product ranker, recommendation model, etc.) would have to…

Machine Learning

7 min read


Jan 25, 2021

Search Query Understanding

Introduction: The journey of a search query through the e-commerce engineering stack can be broadly divided into the following phases: the search query text processing phase, the retrieval phase where relevant products are fetched from the indexer and, last but not least, the product re-ranking phase where a machine learning ranking engine re-sorts the…

Query Understanding

12 min read


May 9, 2016

Of Bandits And Bidding

Real-time bidding (RTB) refers to the buying and selling of online ad impressions through real-time auctions that occur in the time it takes a webpage to load. Those auctions are often facilitated by ad exchanges or supply-side platforms (SSPs). An RTB agent has a set of active ad campaigns each with…

Advertising

14 min read

Machine Learning Tech Lead, Amazon. https://www.linkedin.com/in/siddharth-sharma-31140210/
