
MultiChoice Question Answering In HuggingFace

Unveiling the power of question answering

Mina Ghashami

Published in

TDS Archive

15 min read · Feb 7, 2024

Pre-trained language models have become highly capable at question answering (QA) tasks. In this post, we use the HuggingFace libraries to tackle a multiple-choice question answering challenge.

Specifically, we fine-tune a pre-trained BERT model on a multiple-choice question dataset using the Trainer API. This adapts the bidirectional representations of pre-trained BERT to our target task. With a classification head added on top, the model learns textual patterns that help it pick the correct choice from the set of answer options for each question. We then evaluate performance using accuracy on the held-out test set.
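To make the evaluation concrete, here is a minimal sketch in plain Python of how per-choice scores become a prediction and then an accuracy. The scores below are made up for illustration; in the actual pipeline they would come from the classification head on top of BERT. The helper names `predict_choices` and `accuracy` are ours, not part of any library.

```python
def predict_choices(logits):
    """Return, for each question, the index of its highest-scoring choice."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


def accuracy(predictions, labels):
    """Fraction of questions whose predicted index matches the label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical scores: 3 questions, 4 answer options each.
example_logits = [
    [0.1, 2.3, -0.5, 0.0],
    [1.7, 0.2, 0.4, -1.0],
    [-0.3, 0.0, 0.9, 1.5],
]
example_labels = [1, 0, 2]

example_preds = predict_choices(example_logits)
example_acc = accuracy(example_preds, example_labels)
print(example_preds, example_acc)
```

The key point is that multiple-choice QA reduces to an argmax over one score per answer option, so accuracy is simply the share of questions where that argmax matches the labeled index.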

The Transformers library makes it quick to experiment with different model architectures, tokenizer options, and training approaches. In this post, we walk through a step-by-step recipe for achieving competitive performance on multiple-choice QA with HuggingFace Transformers.

First step: Install and Import libraries

The first step is to install and import the libraries. To install them, use the pip install command as follows:

!pip install datasets transformers[torch] --quiet

and then import the necessary libraries:

import numpy as np
import pandas as pd
import os
import json
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

from transformers.modeling_outputs import SequenceClassifierOutput
from transformers import (
    AutoTokenizer,
    Trainer,
    TrainingArguments,
    set_seed,
    DataCollatorWithPadding,
    DefaultDataCollator
)
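# Note: load_metric was removed in datasets 3.0; on newer versions,
# use the separate `evaluate` package instead.
from datasets import load_dataset, load_metric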
from dataclasses import dataclass, field
from typing import Optional, Union

Second step: Load the dataset

In the second step, we load the train and test datasets. We use the CODAH dataset, which is available for commercial use under the “odc-by” license [1].

from datasets import load_dataset

codah = load_dataset("codah", "codah")
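As a rough sketch of what comes next, each record has to be turned into one (prompt, choice) pair per answer option before tokenization. The example below does this for a single hand-written record. The field names `question_propmt`, `candidate_answers`, and `correct_answer_idx` are our assumption about the CODAH schema, so check them against the `features` of the loaded dataset. The toy record is invented for illustration and is not taken from the dataset, and `build_choice_pairs` is a hypothetical helper.

```python
def build_choice_pairs(
    record,
    prompt_key="question_propmt",      # assumed field name, verify on the dataset
    choices_key="candidate_answers",   # assumed field name
    label_key="correct_answer_idx",    # assumed field name
):
    """Pair the prompt with every candidate answer and keep the gold index."""
    prompt = record[prompt_key]
    pairs = [(prompt, choice) for choice in record[choices_key]]
    return {"pairs": pairs, "label": record[label_key]}


# Hand-written toy record (not from CODAH), shaped like the assumed schema.
toy_record = {
    "question_propmt": "The chef tasted the soup and",
    "candidate_answers": [
        "painted the ceiling.",
        "added a pinch of salt.",
        "drove to the airport.",
        "folded the laundry.",
    ],
    "correct_answer_idx": 1,
}

toy_example = build_choice_pairs(toy_record)
for prompt, choice in toy_example["pairs"]:
    print(prompt, "->", choice)
print("label:", toy_example["label"])
```

Each pair can then be passed to the tokenizer as a two-segment input, so that the model produces one score per answer option.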
