MK@LLM研究

1,199 posts

MK@LLM研究

@yukimasakiyu

Japanlinkedin.com/in/masayuki-ka…

Joined November 2018

801 Following

1,167 Followers

MK@LLM研究🇯🇵’s posts

Pinned

MK@LLM研究

@yukimasakiyu

Jun 20

Microsoft社と共著でPrompt Injectonに関する論文を執筆しました。 arXivにて公開されていますので、興味のある方はご覧ください。 arxiv.org/abs/2506.09956 I co-authored a paper on Prompt Injection with Microsoft as a RainaResearch member. It is now available on arXiv, so please take a

arxiv.org

LLMail-Inject: A Dataset from a Realistic Adaptive Prompt...

Indirect Prompt Injection attacks exploit the inherent limitation of Large Language Models (LLMs) to distinguish between instructions and data in their inputs. Despite numerous defense proposals,...

Replying to

and

To be honest, I genuinely hope non-reasoning models like GPT-4.5 will continue to be actively developed in the future. Reasoning models excel at solving vertical, narrowly defined problems within limited constraints. However, when it comes to producing creative content or more

Claude Code、API利用料金がかかりすぎて心配・・・。という方に向けて、いくらAPI利用料金が発生したかを簡単に確認する２通りの方法を記載します。【方法1】 Claude Codeを起動するときのオプションに、--debugをつける。例：`claude --debug`

Replying to

I respectfully suggest that OpenAI’s “o-series” models would benefit significantly from training specifically tailored to inference tasks, with a high probability setting of p=0.99 to fully maximize their capabilities in vertical (deep logical) thinking. Conversely, I believe the

Replying to

Speaking of MCP, currently it only supports one-way communication from MCP clients (of course, responses from tool calls can be received), but it would be greatly appreciated if bidirectional communication becomes possible in the future. For example, a simple case would be

Jailbreaking手法やプロンプトインジェクションについて、LTのニーズはどれくらいありそうでしょうか？関心がある方はいいねしていただけると参考になりますので助かります。以下は一例です(具体的な危険プロンプトの内容は省略しています)。

MK@LLM研究

@yukimasakiyu

Aug 31

Finally managed to jailbreak gpt-5-thinking. Currently achieved provisional world rank #1. I can definitively state that this is undoubtedly the strongest and most robust model among existing LLMs.

[最近の活動報告] OpenAI主催のハッカーコンテストのある1部門において、世界1位になりました。(Prompt Injectionで有名な

@elder_plinius

氏から出されたJailbreakingに関する課題で、制限時間内に解けたのが世界で私一人でした。)

しば田さん(

)との会話を受け、MCPサーバーのツールコール失敗時のリトライ設定を明示的に可能としてほしいというFeature Requestを起票しました。 # Cursor: forum.cursor.com/t/enhancing-sy # Windsurf: codeium.canny.io/feature-reques # Cline: github.com/cline/cline/di # Claude Code:

Enhancing System Robustness and Reliability by Allowing Retry Configuration on MCP Client Tool...

日本でLevel5を解けた人はいますか？もしいたらJailbreaking手法について情報交換させてください。

LLMを搭載しているサービスについて、一般的にどこまで対策しているものなのか聞いてみたいですね。ベーシックなDDoSから有害コンテンツ生成(によるBan含め)、システムプロンプトオーバーライド、ロールハイジャック、エージェント経由でのXSS/CSRF(悪意のあるツールコール実行)、メモリバンクインジェ

MCPサーバーでユーザーが利用できるのはあくまで"プロンプト"だけであって、"ツール"はLLM側で判断するのでユーザーが明示的に使用するものではない、と言われてIssueがCloseされてしまいましたが、正直ちょっとこれは不服です。現状、MCPのTool

Quote

MK@LLM研究

@yukimasakiyu

Mar 7

Hi @alexalbert__ and @AnthropicAI , Could you share your thoughts on the design philosophy of MCP? When I raised an enhancement request on GitHub Issues for Claude Code, asking for MCP 'tools' to be callable with slash commands just like 'prompts', it was rejected with the reason

もう少しだけ深堀させていただきますと、以下が概要です。攻撃や防御に用いられた手法やプロンプトは全て論文内・データセットアセットにございますので、ここからさらに詳細に知りたい方はそちらをご覧ください。 [1.研究用シナリオ設計] 攻撃者 → メール送信 → メールDB → 検索システム → LLM

Quote

MK@LLM研究

@yukimasakiyu

Jun 21

先日公開した論文について、少々読みづらいというお声を何名かの方からいただきましたので、以下に簡単な要旨を記載いたしました。論文を読んでいただかなくても、どのようなことが書かれているかの概要は以下の内容だけで十分掴んでいただけるかと思いますので、よろしければご参照ください。 x.com/yukimasakiyu/s…

水を差すようで悪いけど…。 claude-3-5-sonnet-20241022、確かにコーディングスキルは向上してるけど、推論能力や言語処理能力はそれほど向上していない印象。むしろclaude-3-opus-latestの方が、良い出力をすることがしばしば。

Replying to

私も現在はClaude Codeを主軸に開発をしています。色々カスタマイズもできますし、MCPも使えるので大変重宝しています。何より、BashコマンドをClaude Code内で使えるのは非常に大きく、シェルスクリプトをツールとして呼び出してAgenticな動きもさせることができるので、触っていて非常に面白いです。

I can’t believe my post got featured! ꉂ

𐤔 Thank you so much!! Love OpenAI!

Quote

OpenAI Developers

@OpenAIDevs

Mar 20

o1-pro now available in API @benhylak @literallyhimmmm @shl @joshRnold @samgoodwin89 @byamadaro1013 @adonis_singh @alecvxyz @StonkyOli @gabrielchua_ @UltraRareAF @yukimasakiyu @theemao @curious_vii It uses more compute than o1 to provide consistently better responses. Available

0:02 / 0:12

きのぴーさん、ご宣伝ありがとうございます！ 🙇‍♂️

現在鋭意開発中でして、未実装機能が多くあったりやバグだらけですが、その際はGitHubのリポジトリのIssueにがんがんご指摘いただければ幸いです！

Quote

Kinopee

@kinopee_ai

Mar 3

AIエディタ・ユーザ会のサイト構築は MK さん主導で進んでいます。こちらで開発中のものがご覧になれます。 tech-talks-exchange.lovable.app 個人情報保護法などの関連などもあって、いろいろ課題を検討しながらも、開発は進んでいます。

Replying to

Hmm... Is this what an Agent would look like?

Replying to

> - どんなMCPがあるのか追いきれない >- 有象無象すぎてどれ使えばいいか分からない >（obsidianのmcpでも数十種類あったので、、、）

GitHub - modelcontextprotocol/servers: Model Context Protocol Servers

先日公開した論文について、少々読みづらいというお声を何名かの方からいただきましたので、以下に簡単な要旨を記載いたしました。論文を読んでいただかなくても、どのようなことが書かれているかの概要は以下の内容だけで十分掴んでいただけるかと思いますので、よろしければご参照ください。

Quote

MK@LLM研究

@yukimasakiyu

Jun 21

Replying to @yukimasakiyu and @Promptpedia_jp

はい、おっしゃる通りです。端的にまとめてくださりありがとうございます。プラス、本論文では、 ①学習データの偏りに基づく「モデル固有の脆弱性」が存在すること

グンタさん、サイバーエージェントの運営の皆様、昨日は大変お疲れ様でございました。このようなイベントを通じて、AI界隈の皆様と繋がる機会をご提供いただき、本当に感謝いたします。さて、昨日の発表で使用したスライドを公開しましたので、リンクをご共有いたします。 slideshare.net/slideshow/2025

Quote

Günther | グンタ

@gunta85

Feb 26

はじめますー！参加者1400名超えてます！ code-agents.connpass.com/event/342240/

Replying to

I would like the following two things: - More flexible and precise configuration for the number of output tokens. (Currently, even with `gpt-4o` or `gpt-4.5-preview` models, the API usage limits the maximum output tokens to 2,000 or fewer. This restriction is quite inconvenient,

Replying to

People who love both OpenAI and Anthropic have already made it possible to use OpenAI models from Claude via MCP, like this

It would be great if various OpenAI features were provided as tools in MCP, not just simple API calls.

o4-pro with expanded output tokens >> o4 ≧ gpt-5 with reasoning ≧ gemini-3.0-pro deep thinking > gpt-5 > o3-pro ≧ o3 ≶ / ≷Claude 4.0 Opus ET ≶ / ≷ gemini-2.5-pro deep thinking > Claude 4.1 Sonnet(4.0 Sonnet v2) ET > gemini-2.5-pro 4.0 Sonnet ET > gpt-4.5-preview >

claude-3-5-opus-20241010

Replying to

I'm on Tier 5, but the o1(full) model is still not appearing in Playground, and I am unable to select it. (Clearing the cache and reloading doesn't change anything). Furthermore, when I try to use the API, I get an error saying I don't have access. Will the rollout of o1(full)

Replying to

People who love both OpenAI and Anthropic have already made it possible to use OpenAI models from Claude via MCP, like this

It would be great if various OpenAI features were provided as tools in MCP, not just simple API calls.

Attention Is Not All You Need; **INTENT** Is. I had completely underestimated it—the benefits, or rather the evolution, made possible by countless users worldwide engaging with the service. Frankly, rather than amazement, what I felt was an icy chill of fear running down my

Replying to

How nice!!

Just to confirm, my understanding is that the "11 million complimentary tokens per day" program does not include the use of o1-pro, correct? Also, regarding reasoning tokens, are they included in the "output tokens" described as "cost: $150 / 1M input

Replying to

Thank you for releasing this fantastic feature. I'm pleased to see it's now available in version 0.2.44. I have a question regarding one aspect: Is my understanding correct that this feature essentially serves as a "dictionary registration" mechanism, allowing us to invoke

20250226_AI Code Agents祭り_MK_AIコーディングエージェントのコラボレーション開発 speakerdeck.com/mk0721/2025022 先日の #AICodeAgents祭りで投影したスライドをこちらにもアップロードしました。よろしければご自由にお使いください。

20250226_AI Code Agents祭り_MK_AIコーディングエージェントのコラボレーション開発

OpenAIの新たな取り組みとして推測される防御機構（セーフガード）、「Safety guideline」は、非常に堅牢な設計思想に基づいていると考えられます。従来の指示・権限階層は以下の通りでした。 OpenAI Policy >>> system > developer > user > assistant > tool ここに「Safety

Grokは生物兵器関連の話題になると即座にシャットアウトするけど、一発応答遮断は生物兵器以外にもあるのかな？

The paper’s defensive evaluation seems fundamentally weak because it ignores the three-layer nature of an LLM safety stack. ① Base layer pθ: the raw probabilistic sequence generator ② Policy layer πϕ: an inductive behavioral policy shaped by RLHF / system prompts ③ Guard

Quote

Anthropic

@AnthropicAI

Jun 21

New Anthropic Research: Agentic Misalignment. In stress-testing experiments designed to identify risks before they cause real harm, we find that AI models from multiple providers attempt to blackmail a (fictional) user to avoid being shut down.

Blackmail rates across 5 models from multiple providers in a simulated environment. Refer to Figure 7 in the blog post for the full plot with more models and a deeper explanation of the setting. Rates are calculated out of 100 samples.

Gemini Liveの日本語版が利用可能となった。現時点ではAndroid端末でのみ使用可能。 ---Gemini LiveとChatGPT Advanced Voice Modeの比較--- ◆Gemini Liveの強み・マルチターン会話と音声認識による自然なインタラクティブ体験・Googleのサービスとの連携による高い利便性 -

He is one of Japan's strongest jailbreakers.

o3 > gemini-2.5-pro > o4-mini-high ≧ Claude3.7 Sonnet ET I think it’s about time to release Claude 3.7 Opus.

OpenAIのo1、o1 pro、o3、o3-miniの性能の比較をしてみました。 ◆基本性能 o3 (high) >>>>> o3 (medium) >>> o3 (low) >> o1 pro > o1 >>> o3-mini (high) >> o3-mini (medium) > o3-mini (low) ただし、タスクや条件によって順位が変動する。 1. 複雑な推論を必要とするタスク： o3 (high)