Implementing Computer use with the OpenAI API

npaka

April 5, 2025, 07:05

This article summarizes how to implement Computer use with the OpenAI API.

1. Computer Using Agent Sample App

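The Computer Using Agent Sample App is sample code published by OpenAI that implements Computer use. In this article we run it first and then read through the code.

[Note] Computer use is in preview. Because it may be susceptible to exploits and inadvertent mistakes, it is recommended not to trust it in authenticated environments or for high-stakes tasks.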

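2. Running cli.py

cli.py is one of the scripts in the Computer Using Agent Sample App and is the easiest way to get started with CUA (Computer-Using Agent).

(1) Prepare a Python virtual environment on your local PC.

(2) Set your API key in the OPENAI_API_KEY environment variable on your local PC.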

・Mac

export OPENAI_API_KEY=sk-...

・Windows

set OPENAI_API_KEY=sk-...

(3) Clone the repository and install the packages.

git clone https://github.com/openai/openai-cua-sample-app.git
cd openai-cua-sample-app
pip install -r requirements.txt

(4) Install Playwright.
This time we use Computer use in the local-playwright environment. local-playwright automates browser operations with Playwright on the local machine: the agent performs web operations through a Playwright browser running locally.

playwright install

(5) Run cli.py.

openai-cua-sample-app % python cli.py --computer local-playwright --input "ぼっち・ざ・ろっくのWebサイトを探してください。"

New page created
screenshot({})
click({'button': 'left', 'x': 393, 'y': 185})
type({'text': 'ぼっち・ざ・ろっく'})
keypress({'keys': ['ENTER']})
wait({})
scroll({'x': 524, 'y': 699, 'scroll_x': 0, 'scroll_y': 483})
click({'button': 'left', 'x': 346, 'y': 593})
wait({})
New page created
I found the official website for "ぼっち・ざ・ろっく!" listed in the search results. Should I visit it?
>

Trying out openai-cua-sample-app, the Computer use sample code.
Per-turn inference with computer-use-preview feels slower than Operator. pic.twitter.com/UkwOySlxVw
— 布留川英一 / Hidekazu Furukawa (@npaka123) April 4, 2025

3. cli.py parameters

The parameters of cli.py are as follows.

--computer : Computer environment to use (default: local-playwright)
--input : Initial input for the agent (if omitted, the CLI prompts for it)
--debug : Enable debug mode
--show : Show images (screenshots) during execution
--start-url : Initial page URL for the browser (default: https://bing.com)
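
Flags like these map naturally onto Python's argparse. The following is a minimal sketch that assumes the defaults listed above; it is not the repository's actual cli.py, and build_parser is a hypothetical helper name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the flags listed above (hypothetical helper, not the repo's code)."""
    parser = argparse.ArgumentParser(description="Run a Computer use agent.")
    parser.add_argument("--computer", default="local-playwright",
                        help="Computer environment to use")
    parser.add_argument("--input", default=None,
                        help="Initial input for the agent; prompt if omitted")
    parser.add_argument("--debug", action="store_true", help="Enable debug mode")
    parser.add_argument("--show", action="store_true",
                        help="Show screenshots while running")
    parser.add_argument("--start-url", default="https://bing.com",
                        help="Initial page URL for the browser")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--input", "Find the official site", "--debug"])
print(args.computer, args.input, args.debug, args.show, args.start_url)
```

argparse converts --start-url into the attribute name start_url, which is why the last value is read as args.start_url.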

4. Code walkthrough

4-1. File structure

The file structure is as follows.

・openai-cua-sample-app
　・main.py : main entry point
　・agent : Agent
　　・agent.py
　・computers : Computer
　　・base_playwright.py
　　・browserbase.py
　　・computer.py
　　・docker.py
　　・local_playwright.py
　　・scrapybara.py
　・utils.py : utilities

4-2. main.py

main.py implements the main loop.

(1) Receive user input.
(2) Run a full agent turn (loop until an assistant message is obtained).
(3) Repeat until the app exits.

from agent.agent import Agent
from computers import LocalPlaywrightComputer

# Define the main function
def main(user_input=None):
    with LocalPlaywrightComputer() as computer:
        # Prepare the Agent and the message list
        agent = Agent(computer=computer)
        items = []

        # Main loop
        while True:
            # User input
            user_input = input("> ")
            items.append({"role": "user", "content": user_input})

            # Run a full turn
            output_items = agent.run_full_turn(items, debug=True, show_images=True)
            items += output_items

# Run the main function
if __name__ == "__main__":
    main()
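
To see how the items list carries the conversation across turns without calling the API, the loop can be exercised with a stand-in agent. This is a sketch under assumptions: StubAgent and run_turns are hypothetical names, and the stub returns a simplified canned assistant message instead of running the model.

```python
class StubAgent:
    """Hypothetical stand-in for Agent: replies with a canned assistant message."""

    def run_full_turn(self, input_items, **kwargs):
        last_user_text = input_items[-1]["content"]
        return [{"role": "assistant", "content": f"echo: {last_user_text}"}]


def run_turns(agent, user_inputs):
    """Same shape as the main loop, but driven by a list instead of input()."""
    items = []
    for user_input in user_inputs:
        items.append({"role": "user", "content": user_input})
        # Each full turn sees the whole history and returns only the new items.
        items += agent.run_full_turn(items)
    return items


history = run_turns(StubAgent(), ["first", "second"])
for item in history:
    print(item["role"], "->", item["content"])
```

Because run_full_turn() returns only the newly produced items, the caller is responsible for appending them to the history before the next turn.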

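4-3. agent.py

agent.py implements the agent loop.

(1) Run inference with computer-use-preview.
(2) Append the inference result to the message list.
(3) Operate the Computer and append a screenshot to the message list.
(4) Loop until an assistant message is obtained.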
from computers import Computer
from utils import (
    create_response,
    show_image,
    pp,
    sanitize_message,
    check_blocklisted_url,
)
import json
from typing import Callable

# Agent
class Agent:
    # Initialization
    def __init__(
        self,
        model="computer-use-preview",
        computer: Computer = None,
        tools: list[dict] = [],
        acknowledge_safety_check_callback: Callable = lambda message: False,
    ):
        self.model = model
        self.computer = computer
        self.tools = list(tools)  # copy so the shared default list is never mutated
        self.print_steps = True
        self.debug = False
        self.show_images = False
        self.acknowledge_safety_check_callback = acknowledge_safety_check_callback

        if computer:
            self.tools += [
                {
                    "type": "computer-preview",
                    "display_width": computer.dimensions[0],
                    "display_height": computer.dimensions[1],
                    "environment": computer.environment,
                },
            ]

    # Debug output
    def debug_print(self, *args):
        if self.debug:
            pp(*args)

    # Handle each item
    def handle_item(self, item):
        # Print messages to standard output
        if item["type"] == "message":
            if self.print_steps:
                print(item["content"][0]["text"])

        # Execute the function call and return its result
        if item["type"] == "function_call":
            name, args = item["name"], json.loads(item["arguments"])
            if self.print_steps:
                print(f"{name}({args})")

            # Call the function if it exists on the Computer
            if hasattr(self.computer, name):
                method = getattr(self.computer, name)
                method(**args)
            return [
                {
                    "type": "function_call_output",
                    "call_id": item["call_id"],
                    "output": "success",
                }
            ]

        # Operate the Computer and return a screenshot
        if item["type"] == "computer_call":
            action = item["action"]
            action_type = action["type"]
            action_args = {k: v for k, v in action.items() if k != "type"}
            if self.print_steps:
                print(f"{action_type}({action_args})")

            method = getattr(self.computer, action_type)
            method(**action_args)

            screenshot_base64 = self.computer.screenshot()
            if self.show_images:
                show_image(screenshot_base64)

            # Raise an error unless the user acknowledges every pending safety check
            pending_checks = item.get("pending_safety_checks", [])
            for check in pending_checks:
                message = check["message"]
                if not self.acknowledge_safety_check_callback(message):
                    raise ValueError(
                        f"Safety check failed: {message}. Cannot continue with unacknowledged safety checks."
                    )

            call_output = {
                "type": "computer_call_output",
                "call_id": item["call_id"],
                "acknowledged_safety_checks": pending_checks,
                "output": {
                    "type": "input_image",
                    "image_url": f"data:image/png;base64,{screenshot_base64}",
                },
            }

            # Additional URL safety check for browser environments
            if self.computer.environment == "browser":
                current_url = self.computer.get_current_url()
                check_blocklisted_url(current_url)
                call_output["output"]["current_url"] = current_url

            return [call_output]
        return []

    # Run a full turn
    def run_full_turn(
        self, input_items, print_steps=True, debug=False, show_images=False
    ):
        self.print_steps = print_steps
        self.debug = debug
        self.show_images = show_images
        new_items = []

        # Loop until an assistant message is obtained
        while new_items[-1].get("role") != "assistant" if new_items else True:
            self.debug_print([sanitize_message(msg) for msg in input_items + new_items])

            # Run inference with computer-use-preview
            response = create_response(
                model=self.model,
                input=input_items + new_items,
                tools=self.tools,
                truncation="auto",
            )
            self.debug_print(response)

            if "output" not in response and self.debug:
                print(response)
                raise ValueError("No output from model")
            else:
                # Append the inference result to the message list
                new_items += response["output"]

                # Operate the Computer and append the screenshot to the message list
                for item in response["output"]:
                    new_items += self.handle_item(item)

        return new_items
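
The computer_call branch of handle_item() boils down to looking up a method on the Computer by the action's type and calling it with the remaining fields. The following is a minimal sketch of that dispatch with a fake computer. FakeComputer and run_action are hypothetical names, the call_id is made up, and the screenshot is a placeholder string rather than real PNG data. Safety checks and the URL check are left out.

```python
class FakeComputer:
    """Hypothetical Computer that records calls instead of driving a browser."""

    def __init__(self):
        self.calls = []

    def click(self, x, y, button="left"):
        self.calls.append(("click", x, y, button))

    def type(self, text):
        self.calls.append(("type", text))

    def screenshot(self):
        return "ZmFrZQ=="  # base64 of b"fake", a placeholder for PNG data


def run_action(computer, item):
    """Dispatch one computer_call item and build its computer_call_output."""
    action = item["action"]
    action_type = action["type"]
    action_args = {k: v for k, v in action.items() if k != "type"}
    # Look up the method by name, e.g. computer.click(button=..., x=..., y=...)
    getattr(computer, action_type)(**action_args)
    return {
        "type": "computer_call_output",
        "call_id": item["call_id"],
        "output": {
            "type": "input_image",
            "image_url": f"data:image/png;base64,{computer.screenshot()}",
        },
    }


computer = FakeComputer()
item = {
    "type": "computer_call",
    "call_id": "call_1",  # made-up identifier for illustration
    "action": {"type": "click", "button": "left", "x": 393, "y": 185},
}
output = run_action(computer, item)
print(computer.calls)
print(output["call_id"], output["output"]["type"])
```

With this design, supporting a new action type only requires adding a method of the same name to the Computer class; the dispatch code itself does not change.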

4-4. utils.py

utils.py provides utility functions. create_response() runs inference with computer-use-preview through the Responses API.
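
For orientation, the JSON body posted to the Responses API endpoint can be sketched as a plain dictionary. The field values mirror what agent.py passes in this article. build_request_body is a hypothetical helper, the display size is an assumed example value, and nothing is sent over the network here.

```python
import json


def build_request_body(items, display_width=1024, display_height=768):
    """Assemble the keyword arguments that agent.py hands to create_response()."""
    return {
        "model": "computer-use-preview",
        "input": items,
        "tools": [
            {
                "type": "computer-preview",
                "display_width": display_width,    # assumed example size
                "display_height": display_height,  # assumed example size
                "environment": "browser",
            }
        ],
        "truncation": "auto",
    }


body = build_request_body([{"role": "user", "content": "Find the official site."}])
print(json.dumps(body, indent=4))
```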

import os
import requests
from dotenv import load_dotenv
import json
import base64
from PIL import Image
from io import BytesIO
import io
from urllib.parse import urlparse

load_dotenv(override=True)

# Blocklist
BLOCKED_DOMAINS = [
    "maliciousbook.com",
    "evilvideos.com",
    "darkwebforum.com",
    "shadytok.com",
    "suspiciouspins.com",
    "ilanbigio.com",
]

# Pretty-print as JSON
def pp(obj):
    print(json.dumps(obj, indent=4))

# Show an image
def show_image(base_64_image):
    image_data = base64.b64decode(base_64_image)
    image = Image.open(BytesIO(image_data))
    image.show()

# Get the image dimensions
def calculate_image_dimensions(base_64_image):
    image_data = base64.b64decode(base_64_image)
    image = Image.open(io.BytesIO(image_data))
    return image.size

# For computer_call_output messages, return a copy with image_url omitted
def sanitize_message(msg: dict) -> dict:
    if msg.get("type") == "computer_call_output":
        output = msg.get("output", {})
        if isinstance(output, dict):
            sanitized = msg.copy()
            sanitized["output"] = {**output, "image_url": "[omitted]"}
            return sanitized
    return msg

# Run inference
def create_response(**kwargs):
    url = "https://api.openai.com/v1/responses"
    headers = {
        "Authorization": f"Bearer {os.getenv('OPENAI_API_KEY')}",
        "Content-Type": "application/json"
    }

    openai_org = os.getenv("OPENAI_ORG")
    if openai_org:
        headers["Openai-Organization"] = openai_org

    response = requests.post(url, headers=headers, json=kwargs)

    if response.status_code != 200:
        print(f"Error: {response.status_code} {response.text}")

    return response.json()

# Raise an error if the given URL (including subdomains) is on the blocklist
def check_blocklisted_url(url: str) -> None:
    hostname = urlparse(url).hostname or ""
    if any(
        hostname == blocked or hostname.endswith(f".{blocked}")
        for blocked in BLOCKED_DOMAINS
    ):
        raise ValueError(f"Blocked URL: {url}")
