Crawling speedup

"Crawling Specialists Talk About What Goes On Behind the Scenes of Crawler Operations"
powered by "Bayside Tech Bridge"

Published in: Engineering

Crawling speedup

  1. 2016/08/21
  2. 15 12 Masahiro Aasakura

  3. API
  4. API
  5. API
  6. ( ) ( ) ( )
  7. API
  8. API
  9. API ♪
  10. Source Source Data DB
  11. Source Source Data DB
  12. Source • • • Sidekiq • Resque • Delayed •
  13. Source Source Data DB
  14. Data DB • DB • DB • • Redis • Memcached •
  15. Source Source Data DB
  16. Ruby Nokogiri Nokogiri Tutorials http://www.nokogiri.org/
  17. XPATH CSS
      # html
      doc = Nokogiri::HTML.parse(html)
      # "XPATH" or "CSS"
      title_text = doc.xpath('//title').text
      # > title_text -> "hoge"
  18. XPATH CSS "XPATH" or "CSS"
  19. XML(API) SAX
  20. SAX SAX Simple API For XML XML API DOM
  21. XPATH, CSS SAX xml
  22. SAX
  23. SAX Nokogiri::XML::SAX::Parser.new(
  24. http://caramelbit.com
