TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

MultiChoice Question Answering In HuggingFace

Unveiling the power of question answering

Mina Ghashami
Published in TDS Archive
15 min read · Feb 7, 2024

Image from unsplash.com

Natural language processing models have become highly capable at question answering (QA) tasks. In this post, we use the Hugging Face Transformers library to tackle a multiple-choice question answering task.

Specifically, we fine-tune a pre-trained BERT model on a multiple-choice question dataset using the Trainer API. This adapts the bidirectional representations of pre-trained BERT to our target task. With a classification head added on top, the model learns textual patterns that help it pick the correct choice from the set of answer options for each question. We then evaluate performance by measuring accuracy on the held-out test set.
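To make the role of the classification head and the accuracy metric concrete, here is a minimal, library-free sketch. It assumes the head has already produced one raw score (logit) per answer option. The function names `softmax`, `predict_choice`, and `accuracy` and the sample scores are illustrative only and are not taken from the article's code.

```python
import math

def softmax(logits):
    """Turn raw per-choice scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_choice(logits):
    """Return the index of the highest-scoring answer option."""
    return max(range(len(logits)), key=lambda i: logits[i])

def accuracy(batch_logits, labels):
    """Fraction of questions whose predicted option matches the label."""
    correct = sum(predict_choice(l) == y for l, y in zip(batch_logits, labels))
    return correct / len(labels)

# Hypothetical scores for two questions with four options each.
batch_logits = [[0.2, 2.1, -0.5, 0.0], [1.3, 0.1, 0.4, 1.9]]
labels = [1, 0]  # index of the correct option for each question

print(softmax(batch_logits[0]))
print(accuracy(batch_logits, labels))
```

During fine-tuning, a cross-entropy loss over these per-choice scores is what would typically push the score of the correct option above the others. The sketch only covers the prediction and evaluation side.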

The Transformers library makes it quick to experiment with different model architectures, tokenizer options, and training approaches. In this post, we walk through a step-by-step recipe for achieving competitive performance on multiple-choice QA with Hugging Face Transformers.

First step: Install and import libraries

The first step is to install and import the libraries. To install them, use the pip install command as follows:

!pip install datasets transformers[torch] --quiet

and then import the necessary libraries:

import numpy as np
import pandas as pd
import os
import json
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

from transformers.modeling_outputs import SequenceClassifierOutput
from transformers import (
AutoTokenizer,
Trainer,
TrainingArguments,
set_seed,
DataCollatorWithPadding,
DefaultDataCollator
)
from datasets import load_dataset, load_metric
from dataclasses import dataclass, field
from typing import Optional, Union

Second step: Load the dataset

In the second step, we load the train and test datasets. We use the CODAH dataset, which is licensed under "odc-by" and is available for commercial use [1].

from datasets import load_dataset

codah = load_dataset("codah", "codah")
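A multiple-choice model usually sees each question as a set of (prompt, candidate answer) pairs, one pair per option. The sketch below shows that pairing step on a single record. It is an assumption-laden illustration: the field names `question_propmt` (spelled this way), `candidate_answers`, and `correct_answer_idx` are what I believe the CODAH dataset card lists and should be checked against `codah["train"].features`; the helper `build_pairs` and the sample record are hypothetical and hand-written, not an actual CODAH row.

```python
def build_pairs(example,
                prompt_key="question_propmt",      # assumed field name
                answers_key="candidate_answers",   # assumed field name
                label_key="correct_answer_idx"):   # assumed field name
    """Pair the question prompt with every candidate answer."""
    prompt = example[prompt_key]
    candidates = list(example[answers_key])
    return {
        # The prompt is repeated once per option so both lists line up.
        "first_sentences": [prompt] * len(candidates),
        "second_sentences": candidates,
        "label": example[label_key],
    }

# Hand-written record for illustration only.
example = {
    "question_propmt": "I am always very hungry before I go to bed. I am",
    "candidate_answers": [
        "concerned that this is an illness.",
        "glad that I do not have a kitchen.",
        "fearful that there are monsters under my bed.",
        "tempted to snack when I feel this way.",
    ],
    "correct_answer_idx": 3,
}

pairs = build_pairs(example)
for first, second in zip(pairs["first_sentences"], pairs["second_sentences"]):
    print(first, "->", second)
print("label:", pairs["label"])
```

The two aligned lists could then be passed to a tokenizer as sentence pairs, for example `tokenizer(first_sentences, second_sentences, truncation=True)`, before regrouping the encodings by question.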

Written by Mina Ghashami

Applied Scientist @Amazon AWS | Adjunct lecturer @NYU | Previously, Adjunct lecturer @Stanford
