
Embeddings: Data to Numbers

Hugman Sangkeun Jung
14 min read · Mar 27, 2024

(You can find the Korean version of the post at this link.)

This post concludes our series on ā€œRepresentationā€. Throughout this journey, we have explored 1) Representation Learning, 2) Sequence-to-Sequence Learning, and 3) hands-on practice with Sequence-to-Sequence models.

Whether we are developing artificial intelligence or any other ICT software, our starting point must always be capturing the world in a format that computers can process. This means transforming any form of data into numbers. The data we refer to here encompasses not just the text and images we encounter daily, but also chemical formulas, Excel files, weather conditions, and even 50 years of stock market trends: virtually all types and forms of data. This process can aptly be described as ā€œData to Numbersā€.
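As a toy illustration of ā€œData to Numbersā€ (a sketch of my own, not an example from the post), here is about the simplest possible mapping from text to numbers: a one-hot encoding, where each word becomes a binary vector over a small vocabulary. The sentence and vocabulary below are made up purely for demonstration.

```python
# Toy sketch: turning text into numbers via one-hot encoding.
# The sentence and vocabulary are invented for illustration only.

sentence = "data to numbers"
vocab = sorted(set(sentence.split()))           # ['data', 'numbers', 'to']
index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    """Represent a word as a binary vector over the vocabulary."""
    vec = [0] * len(vocab)
    vec[index[word]] = 1
    return vec

# Each word in the sentence is now a list of numbers.
vectors = [one_hot(w) for w in sentence.split()]
```

One-hot vectors are a deliberately crude representation (they carry no notion of similarity between words), but they make the core idea concrete: any symbolic data can be mapped into a numeric form a computer can operate on.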

This method is known by several other names as well:

  • Encoding
  • Embedding
  • Feature Extraction
  • Vectorization
  • …

The concept of ā€œData to Numbersā€ is not exclusively tied to recent technological advancements. While neural network-based techniques are frequently highlighted nowadays, various forms of technology have existed even before the advent of deep learning. In this post, we aim to…
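To make the point that ā€œData to Numbersā€ predates deep learning, here is a sketch (my own, under the assumption of a classic bag-of-words scheme) of how documents were commonly vectorized long before neural embeddings: each document becomes a vector of word counts over a shared vocabulary. The documents are invented for illustration.

```python
from collections import Counter

# Classic bag-of-words vectorization: each document is represented
# by its word counts over a shared vocabulary.
# The documents below are made up for illustration.
docs = ["the market rose", "the market fell", "rose prices rose"]
vocab = sorted({w for d in docs for w in d.split()})

def bow_vector(doc):
    """Count how often each vocabulary word appears in the document."""
    counts = Counter(doc.split())
    return [counts[w] for w in vocab]  # Counter returns 0 for absent words

matrix = [bow_vector(d) for d in docs]
```

Representations like this (and refinements such as TF-IDF weighting) powered search and text classification for decades, which is exactly the kind of pre-deep-learning technology the paragraph above alludes to.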



Hugman Sangkeun Jung is a professor at Chungnam National University, with expertise in AI, machine learning, NLP, and medical decision support.
