Using the Amazon Product Advertising API with Python and bottlenose

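Overview

For details on the Product Advertising API itself, see the Product Advertising API documentation.
In principle you can just write a program that follows the rules described there, but that is tedious.

If someone has already written a wrapper for the API, using it is much quicker.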
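
To give a feel for what "following the rules" by hand involves, here is a minimal sketch of signing a request yourself using only the standard library. The host name, the parameter names, and the dummy credentials are my assumptions based on how the signed REST requests are generally described, not something taken from this post, so treat it as an illustration of the chore a wrapper saves you rather than a reference implementation.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def build_signed_url(params, secret_key,
                     host="webservices.amazon.co.jp",  # assumed JP endpoint
                     path="/onca/xml"):
    """Return a signed GET URL for the given query parameters (sketch)."""
    # Sort parameters by name and percent-encode names and values,
    # leaving only the RFC 3986 unreserved characters untouched.
    canonical_query = "&".join(
        "{}={}".format(quote(k, safe="-_.~"), quote(str(v), safe="-_.~"))
        for k, v in sorted(params.items())
    )
    # The string to sign is the HTTP verb, host, path and canonical query.
    string_to_sign = "\n".join(["GET", host, path, canonical_query])
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha256).digest()
    signature = quote(base64.b64encode(digest).decode("ascii"), safe="-_.~")
    return "https://{}{}?{}&Signature={}".format(
        host, path, canonical_query, signature)


# Dummy values for illustration only; a real request needs a current timestamp.
params = {
    "Service": "AWSECommerceService",
    "Operation": "ItemLookup",
    "ItemId": "4873117569",
    "AWSAccessKeyId": "DUMMY_ACCESS_KEY",
    "AssociateTag": "dummy-22",
    "Timestamp": "2016-01-01T00:00:00Z",
}
url = build_signed_url(params, "DUMMY_SECRET_KEY")
print(url)
```

With a wrapper, all of this collapses into a single method call.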

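Wrappers exist for Perl, Python, Ruby, PHP and other languages, so pick whichever you like.

Among the Python libraries, bottlenose looked the easiest to use, so I tried it.

I will skip how to obtain the AWSAccessKeyId, the AWSSecretKey and the associate ID.
Other people have already explained that in detail, so please look it up yourself.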

About bottlenose

Installation

pip install bottlenose

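Getting product information as XML with ItemLookup

Use ItemLookup when you want to search by ASIN.
An ASIN is the product code that Amazon uses.
For books it is almost always the same as the ISBN (the 10-digit form).
As an example, I fetched the information for ASIN 4873117569.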
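
Since a book's ASIN usually matches its 10-digit ISBN, a 13-digit ISBN has to be converted before it can be used as an ItemId. The helper below is a small sketch of that conversion; the function name is my own, and it only handles 978-prefixed ISBNs because only those have an ISBN-10 counterpart.

```python
def isbn13_to_isbn10(isbn13):
    """Convert a 978-prefixed ISBN-13 to ISBN-10 (hyphens are ignored)."""
    digits = isbn13.replace("-", "")
    if len(digits) != 13 or not digits.isdigit() or not digits.startswith("978"):
        raise ValueError("expected a 978-prefixed ISBN-13: %r" % isbn13)
    body = digits[3:12]  # the nine digits shared by both forms
    # ISBN-10 check digit: weights 10 down to 2, modulo 11, with 10 written as "X".
    total = sum(int(d) * w for d, w in zip(body, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return body + ("X" if check == 10 else str(check))


print(isbn13_to_isbn10("978-4-87311-756-0"))  # prints 4873117569
```

The result is the same code used in the ItemLookup example.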

Sample code

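import bottlenose
from bs4 import BeautifulSoup

# Replace the placeholders with your own credentials.
AWSAccessKeyId = "your access key ID"
AWSSecretKey = "your secret key"
AssociateId = "your associate ID"
amazon = bottlenose.Amazon(AWSAccessKeyId,
                           AWSSecretKey,
                           AssociateId,
                           Region='JP')

# Look up a single item by ASIN; the response is an XML document.
res = amazon.ItemLookup(ItemId="4873117569")
# The "lxml" HTML parser lowercases tag names and wraps the document in <html><body>.
soup = BeautifulSoup(res, "lxml")

print(soup.prettify())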

Output (excerpt, item element only)

<?xml version="1.0" ?>
<html>
 <body>  
    <item>
     <asin>
      4873117569
     </asin>
     <detailpageurl>
      http://www.amazon.co.jp/Effective-Python-%E2%80%95Python%E3%83%97%E3%83%AD%E3%82%B0%E3%83%A9%E3%83%A0%E3%82%92%E6%94%B9%E8%89%AF%E3%81%99%E3%82%8B59%E9%A0%85%E7%9B%AE-Brett-Slatkin/dp/4873117569%3FSubscriptionId%3DAKIAJGREZ6P3ZM45HYGQ%26tag%3Doneshotlifetom-22%26linkCode%3Dxm2%26camp%3D2025%26creative%3D165953%26creativeASIN%3D4873117569
     </detailpageurl>
     <itemlinks>
      <itemlink>
       <description>
        Add To Wishlist
       </description>
       <url>
        http://www.amazon.co.jp/gp/registry/wishlist/add-item.html%3Fasin.0%3D4873117569%26SubscriptionId%3DAKIAJGREZ6P3ZM45HYGQ%26tag%3Doneshotlifetom-22%26linkCode%3Dxm2%26camp%3D2025%26creative%3D5143%26creativeASIN%3D4873117569
       </url>
      </itemlink>
      <itemlink>
       <description>
        Tell A Friend
       </description>
       <url>
        http://www.amazon.co.jp/gp/pdp/taf/4873117569%3FSubscriptionId%3DAKIAJGREZ6P3ZM45HYGQ%26tag%3Doneshotlifetom-22%26linkCode%3Dxm2%26camp%3D2025%26creative%3D5143%26creativeASIN%3D4873117569
       </url>
      </itemlink>
      <itemlink>
       <description>
        All Customer Reviews
       </description>
       <url>
        http://www.amazon.co.jp/review/product/4873117569%3FSubscriptionId%3DAKIAJGREZ6P3ZM45HYGQ%26tag%3Doneshotlifetom-22%26linkCode%3Dxm2%26camp%3D2025%26creative%3D5143%26creativeASIN%3D4873117569
       </url>
      </itemlink>
      <itemlink>
       <description>
        All Offers
       </description>
       <url>
        http://www.amazon.co.jp/gp/offer-listing/4873117569%3FSubscriptionId%3DAKIAJGREZ6P3ZM45HYGQ%26tag%3Doneshotlifetom-22%26linkCode%3Dxm2%26camp%3D2025%26creative%3D5143%26creativeASIN%3D4873117569
       </url>
      </itemlink>
     </itemlinks>
     <itemattributes>
      <author>
       Brett Slatkin
      </author>
      <creator role="監修">
       石本 敦夫
      </creator>
      <creator role="翻訳">
       黒川 利明
      </creator>
      <manufacturer>
       オライリージャパン
      </manufacturer>
      <productgroup>
       Book
      </productgroup>
      <title>
       Effective Python ―Pythonプログラムを改良する59項目
      </title>
     </itemattributes>
    </item>
   </items>
  </itemlookupresponse>
 </body>
</html>

Parsing the XML

With BeautifulSoup, parsing the response is easy.

Sample code

# Continues from the previous sample: res holds the ItemLookup response.
soup = BeautifulSoup(res, "lxml")

# Take the first matching element for each field.
dic = {}
dic["asin"] = soup.find("asin").text
dic["detailpageurl"] = soup.find("detailpageurl").text
dic["author"] = soup.find("author").text
dic["manufacturer"] = soup.find("manufacturer").text
dic["productgroup"] = soup.find("productgroup").text

for key, val in dic.items():
    print("|", key, "|", val, "|")

Output

asin	4873117569
manufacturer	オライリージャパン
detailpageurl	http://www.amazon.co.jp/Effective-Python-%E2%80%95Python%E3%83%97%E3%83%AD%E3%82%B0%E3%83%A9%E3%83%A0%E3%82%92%E6%94%B9%E8%89%AF%E3%81%99%E3%82%8B59%E9%A0%85%E7%9B%AE-Brett-Slatkin/dp/4873117569%3FSubscriptionId%3DAKIAJGREZ6P3ZM45HYGQ%26tag%3Doneshotlifetom-22%26linkCode%3Dxm2%26camp%3D2025%26creative%3D165953%26creativeASIN%3D4873117569
productgroup	Book
author	Brett Slatkin
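
The dictionary above keeps only the first match for each tag, so repeated elements such as creator (with its role attribute) are lost. As a rough sketch of one way to keep them, the example below uses the standard library's xml.etree.ElementTree on a hand-written, abbreviated stand-in for the response. The CamelCase tag names and the Role attribute spelling are my assumptions about the raw XML (the listing above was lowercased by the HTML parser), and a real response also declares an XML namespace, which this stand-in leaves out.

```python
import xml.etree.ElementTree as ET

# Hand-written, abbreviated stand-in for part of an ItemLookup response.
SAMPLE_XML = """
<Item>
  <ASIN>4873117569</ASIN>
  <ItemAttributes>
    <Author>Brett Slatkin</Author>
    <Creator Role="監修">石本 敦夫</Creator>
    <Creator Role="翻訳">黒川 利明</Creator>
    <Manufacturer>オライリージャパン</Manufacturer>
    <ProductGroup>Book</ProductGroup>
  </ItemAttributes>
</Item>
"""

item = ET.fromstring(SAMPLE_XML)
attrs = item.find("ItemAttributes")

# Single-valued fields: findtext returns the element text (or None if absent).
record = {
    "asin": item.findtext("ASIN"),
    "author": attrs.findtext("Author"),
    "manufacturer": attrs.findtext("Manufacturer"),
}
# Repeated elements: collect every Creator together with its Role attribute.
record["creators"] = [(c.get("Role"), c.text) for c in attrs.findall("Creator")]

for key, val in record.items():
    print(key, val, sep="\t")
```

The same idea works with BeautifulSoup by looping over every creator match instead of taking only the first one.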

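In closing

Fetching information through web APIs is fun. In most cases the data comes back as XML or JSON, so once you know how to parse XML and JSON, getting at the information is easy. The people who provide this kind of structured information mostly lay the data out neatly, so it is far simpler than grinding through HTML!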
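
For comparison with the XML handling earlier, here is a tiny sketch of the JSON side using the standard json module. The payload is hypothetical and made up for illustration; it is not the output of any particular API.

```python
import json

# Hypothetical JSON payload; not the output of any particular API.
SAMPLE_JSON = (
    '{"item": {"asin": "4873117569", '
    '"title": "Effective Python", '
    '"authors": ["Brett Slatkin"]}}'
)

# json.loads turns the text into plain dicts and lists.
data = json.loads(SAMPLE_JSON)
item = data["item"]
print(item["asin"], item["authors"][0])
```

Once parsed, the data is ordinary Python dictionaries and lists, so no tag searching is needed at all.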