N-gramモデルのエントロピーに基づくパラメータ削減に関する検討 A Study on Entropy-based Compression Algorithms for N-gram Parameters


Abstract

Large vocabulary continuous speech recognition (LVCSR), commonly called dictation, converts human speech into text. It is an essential technology for reducing the labor of keyboard input and for realizing voice interfaces between humans and computers in a variety of environments, and it is an active area of research. A recognition system incorporates a language model that plays the role of human linguistic knowledge; statistical N-gram models are generally used for this purpose. However, the number of parameters of an N-gram model grows exponentially with N and the vocabulary size. Especially for tasks with large vocabularies (from a few thousand to several tens of thousands of words), the resulting memory requirement poses a serious obstacle to system implementation. In this paper we compare previously proposed algorithms for reducing the number of parameters of an N-gram model, and we report preliminary experiments on extending our compression algorithm to handle (N-1)-grams.

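The abstract's central motivation — that N-gram parameter counts grow exponentially with N and the vocabulary size — can be made concrete with a small arithmetic sketch. The function below is illustrative only (it is not from the paper); it computes the raw upper bound |V|^N on distinct N-grams, which is what makes uncompressed models impractical at the vocabulary sizes the abstract mentions:

```python
def max_ngram_params(vocab_size: int, n: int) -> int:
    """Upper bound on the number of distinct N-grams over a
    vocabulary of the given size: |V| ** N."""
    return vocab_size ** n

# For a 20,000-word vocabulary (within the range the abstract cites),
# the potential table size explodes as N grows:
for n in (1, 2, 3):
    print(f"{n}-gram upper bound: {max_ngram_params(20_000, n):,}")
```

In practice only a fraction of these N-grams occur in training data, but even the observed subset is large enough that parameter-reduction (pruning/compression) methods such as those compared in the paper are needed.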

Published in

  • 情報処理学会論文誌 (IPSJ Journal)

    IPSJ Journal 42(2), 327-333, 2001-02-15

    Information Processing Society of Japan (IPSJ)


Identifiers

  • NII Article ID (NAID)
    110002725740
  • NII Bibliographic ID (NCID)
    AN00116647
  • Text language code
    JPN
  • Material type
    Journal Article
  • ISSN
    1882-7764
  • NDL article registration ID
    5667528
  • NDL journal classification
    ZM13 (Science and technology--Science and technology in general--Data processing, computers)
  • NDL call number
    Z14-741
  • Data source
    CJP書誌  NDL  NII-ELS  IR  IPSJ