PostgreSQL Materialized Views with Active Record
How to use PostgreSQL Materialized Views with Active Record to simplify code and improve the performance of your Ruby on Rails application

Published in: Software

Transcript

  • 1. PostgreSQL Materialized Views and Active Record David Roberts
  • 2. The Problem How do you quickly filter data represented by multiple ActiveRecord associations and calculations?
  • 3. Data Model • City: Philly, Boston • Technology: Ruby, Python • Club: Philly.rb • Talk: PostgreSQL Materialized Views • Feedback: “Super rad talk!!”, “The whole world is now dumber” • Author: David Roberts
  • 4. View all Comments class Feedback < ActiveRecord::Base belongs_to :talk INVALID_COMMENTS = ['', 'NA', 'N/A', 'not applicable'] scope :filled_out, -> { where.not(comment: INVALID_COMMENTS) } end Feedback.filled_out Feedback Load (2409.6ms) SELECT "feedbacks".* FROM "feedbacks" WHERE ("feedbacks"."comment" NOT IN ('', 'NA', 'N/A', 'not applicable'))
  • 5. Highest Scoring Talk with Valid Comments Feedback.filled_out .select('talk_id, avg(score) as overall_score') .group('talk_id').order('overall_score desc') .limit(10) # 665ms SELECT talk_id, avg(score) as overall_score FROM "feedbacks" WHERE ("feedbacks"."comment" NOT IN ('', 'NA', 'N/A', 'not applicable')) GROUP BY talk_id ORDER BY overall_score desc LIMIT 10;
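
To make explicit what the query on this slide computes, here is a minimal plain-Ruby sketch of the same filter, group, average, sort and limit steps over an in-memory array. `FeedbackRow` and `top_talks` are hypothetical names used only for illustration; they are not part of the talk's code, and in the real application PostgreSQL does this work.

```ruby
# Hypothetical in-memory stand-in for a row of the feedbacks table.
FeedbackRow = Struct.new(:talk_id, :score, :comment)

INVALID_COMMENTS = ['', 'NA', 'N/A', 'not applicable'].freeze

# Mirrors: WHERE comment NOT IN (...) GROUP BY talk_id
#          ORDER BY avg(score) DESC LIMIT n
# nil comments are rejected too, because in SQL "NULL NOT IN (...)" is
# never true, so such rows do not survive the WHERE clause.
def top_talks(rows, limit: 10)
  rows
    .reject { |row| row.comment.nil? || INVALID_COMMENTS.include?(row.comment) }
    .group_by(&:talk_id)
    .map { |talk_id, group| [talk_id, group.sum(&:score).to_f / group.size] }
    .sort_by { |_talk_id, average| -average }
    .first(limit)
end

rows = [
  FeedbackRow.new(1, 5, 'Super rad talk!!'),
  FeedbackRow.new(1, 3, 'Good pace'),
  FeedbackRow.new(2, 5, 'N/A'),      # filtered out: invalid comment
  FeedbackRow.new(2, 2, 'Too fast'),
  FeedbackRow.new(3, 4, '')          # filtered out: empty comment
]

p top_talks(rows) # => [[1, 4.0], [2, 2.0]]
```

Every request repeats all of these steps over the full table, which is why the query takes hundreds of milliseconds.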
  • 6. Highest Scoring Talk with Valid Comments results = Feedback.filled_out .select('talk_id, avg(score) as overall_score') .group('talk_id').order('overall_score desc') .limit(10) results.first.inspect => "#<Feedback id: nil, talk_id: 24510>"
  • 7. Highest Scoring Talks in PA by Authors named Parker Feedback.filled_out.joins(talk: [:author, { club: :city }]) .select('feedbacks.talk_id, avg(feedbacks.score) as overall_score') .where("cities.state_abbr = ?", 'PA') .where("authors.name LIKE ?", '%Parker%') .group('feedbacks.talk_id') .order('overall_score desc') .limit(10) # 665ms SELECT feedbacks.talk_id, avg(feedbacks.score) as overall_score FROM "feedbacks" INNER JOIN "talks" ON "talks"."id" = "feedbacks"."talk_id" INNER JOIN "authors" ON "authors"."id" = "talks"."author_id" INNER JOIN "clubs" ON "clubs"."id" = "talks"."club_id" INNER JOIN "cities" ON "cities"."id" = "clubs"."city_id" WHERE ("feedbacks"."comment" NOT IN ('', 'NA', 'N/A', 'not applicable')) AND (cities.state_abbr = 'PA') AND (authors.name LIKE '%Parker%') GROUP BY feedbacks.talk_id ORDER BY overall_score desc LIMIT 10;
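
One detail of the `LIKE` filter is worth spelling out. Active Record substitutes a quoted value for each `?`, so the `%` wildcards belong in the bound value, not in the SQL fragment around the placeholder. The sketch below uses a hypothetical `like_pattern` helper (not from the talk) that also escapes wildcard characters in user input. Rails ships `sanitize_sql_like` for the escaping step; this is a plain-Ruby stand-in so the example runs without Rails.

```ruby
# Hypothetical helper: builds the value to bind to a "LIKE ?" condition.
# Escapes the LIKE wildcards (% and _) and the escape character (\) so
# that user input is matched literally, then wraps it for a substring match.
def like_pattern(term)
  escaped = term.gsub(/[\\%_]/) { |char| "\\#{char}" }
  "%#{escaped}%"
end

puts like_pattern('Parker')     # prints %Parker%
puts like_pattern('100%_sure')  # prints %100\%\_sure%
```

In a query this would be used as `.where("authors.name LIKE ?", like_pattern('Parker'))`.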
  • 8. What’s Wrong with these Examples? • Long, ugly queries • Slow queries are bad for web applications • Fighting the ActiveRecord framework • SQL aggregates are difficult to access • Returned object no longer corresponds to the model • Repetitive code to set up joins and filter invalid comments
  • 9. Using Database Views to clean up ActiveRecord models
  • 10. Views • Uses a stored / pre-defined Query • Uses live data from corresponding tables when queried • Can reference data from many tables • Great for hiding complex SQL statements • Allows you to push functionality to database
  • 11. But this is a Ruby talk!
  • 12. class CreateTalkView < ActiveRecord::Migration def up connection.execute <<-SQL CREATE VIEW v_talks_report AS SELECT cities.id as city_id, cities.name as city_name, cities.state_abbr as state_abbr, technologies.id as technology_id, clubs.id as club_id, clubs.name as club_name, talks.id as talk_id, talks.name as talk_name, authors.id as author_id, authors.name as author_name, feedback_agg.overall_score as overall_score FROM ( SELECT talk_id, avg(score) as overall_score FROM feedbacks WHERE feedbacks.comment NOT IN ('', 'NA', 'N/A', 'not applicable') GROUP BY talk_id ) as feedback_agg INNER JOIN talks ON feedback_agg.talk_id = talks.id INNER JOIN authors ON talks.author_id = authors.id INNER JOIN clubs ON talks.club_id = clubs.id INNER JOIN cities ON clubs.city_id = cities.id INNER JOIN technologies ON clubs.technology_id = technologies.id SQL end def down connection.execute 'DROP VIEW IF EXISTS v_talks_report' end end
  • 13. Encapsulate in ActiveRecord Model class TalkReport < ActiveRecord::Base # Use associations just like any other ActiveRecord object belongs_to :author belongs_to :talk belongs_to :club belongs_to :city belongs_to :technology # take advantage of talks has_many relationship delegate :feedbacks, to: :talk self.table_name = 'v_talks_report' # views cannot be changed since they are virtual; ActiveRecord checks readonly? before saving def readonly? true end end
  • 14. Highest Scoring Talks with Valid Comments results = TalkReport.order(overall_score: :desc) .limit(10) results.first.inspect => "#<TalkReport city_id: 16, city_name: "Burlington", state_abbr: "VT", technology_id: 2, club_id: 73, club_name: "Python - Burlington", talk_id: 6508, talk_name: "The SQL protocol is down, reboot the optical syste...", author_id: 100, author_name: "Jerrell Gleichner", overall_score: #<BigDecimal:7ff3f80f1678,'0.4E1',9(27)>>"
  • 15. Highest Scoring Talks in PA by Authors named Parker TalkReport.where(state_abbr: 'PA') .where("author_name LIKE '%Parker%'") .order(overall_score: :desc).limit(10)
  • 16. Using Materialized Views to Improve Performance
  • 17. Materialized Views • Act like a database view, but results persist for future queries • Create a table on disk with the result set • Can be indexed • Ideal for capturing frequently used joins and aggregations • Allow tables to be optimized for updating and materialized views for reporting • Must be refreshed to pick up the most recent data
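
The difference between a plain view and a materialized view can be sketched in plain Ruby. This is only an in-memory analogy under assumed names (`PlainView`, `MaterializedView`); it does not describe how PostgreSQL implements either. It shows the trade-off the slide names: stored results are cheap to read but go stale until refreshed.

```ruby
# A plain view re-runs its query against live data on every read.
class PlainView
  def initialize(&query)
    @query = query
  end

  def rows
    @query.call
  end
end

# A materialized view stores the query result and serves the stored copy
# until it is explicitly refreshed.
class MaterializedView
  attr_reader :rows

  def initialize(&query)
    @query = query
    refresh
  end

  # Analogous to REFRESH MATERIALIZED VIEW: re-run the query, keep the result.
  def refresh
    @rows = @query.call.dup.freeze
  end
end

scores = [4, 5]
average = -> { [scores.sum.to_f / scores.size] }

view = PlainView.new(&average)
materialized = MaterializedView.new(&average)

scores << 0 # new feedback arrives in the base table

p view.rows          # => [3.0] (live data)
p materialized.rows  # => [4.5] (stale snapshot)
materialized.refresh
p materialized.rows  # => [3.0]
```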
  • 18. class CreateTalkReportMv < ActiveRecord::Migration def up connection.execute <<-SQL CREATE MATERIALIZED VIEW mv_talks_report AS SELECT cities.id as city_id, cities.name as city_name, cities.state_abbr as state_abbr, technologies.id as technology_id, clubs.id as club_id, clubs.name as club_name, talks.id as talk_id, talks.name as talk_name, authors.id as author_id, authors.name as author_name, feedback_agg.overall_score as overall_score FROM ( SELECT talk_id, avg(score) as overall_score FROM feedbacks WHERE feedbacks.comment NOT IN ('', 'NA', 'N/A', 'not applicable') GROUP BY talk_id ) as feedback_agg INNER JOIN talks ON feedback_agg.talk_id = talks.id INNER JOIN authors ON talks.author_id = authors.id INNER JOIN clubs ON talks.club_id = clubs.id INNER JOIN cities ON clubs.city_id = cities.id INNER JOIN technologies ON clubs.technology_id = technologies.id; CREATE INDEX ON mv_talks_report (overall_score); SQL end def down connection.execute 'DROP MATERIALIZED VIEW IF EXISTS mv_talks_report' end end
  • 19. ActiveRecord Model Change class TalkReport < ActiveRecord::Base # No changes to associations belongs_to … self.table_name = 'mv_talks_report' def self.repopulate connection.execute("REFRESH MATERIALIZED VIEW #{table_name}") end # Materialized Views cannot be modified; ActiveRecord checks readonly? before saving def readonly? true end end
  • 20. Highest Scoring Talks 99% reduction in runtime 577ms Feedback.filled_out .select('talk_id, avg(score) as score') .group('talk_id').order('score desc').limit(10) 1ms TalkReport.order(overall_score: :desc).limit(10)
  • 21. Highest Scoring Talks in PA by Authors named “Parker” 95% reduction in runtime 400ms Feedback.filled_out.joins(talk: [:author, { club: :city }]) .select('feedbacks.talk_id, avg(feedbacks.score) as overall_score') .where("cities.state_abbr = ?", 'PA') .where("authors.name LIKE ?", '%Parker%') .group('feedbacks.talk_id') .order('overall_score desc') .limit(10) 19ms TalkReport.where(state_abbr: 'PA') .where("author_name LIKE ?", '%Parker%') .order(overall_score: :desc).limit(10)
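
The percentages quoted on the last two slides follow directly from the timings shown. As a quick check, a small helper (the name `runtime_reduction` is made up for this example) computes the reduction, rounded down to a whole percent:

```ruby
# Percentage reduction in runtime, rounded down to a whole percent.
def runtime_reduction(before_ms, after_ms)
  ((before_ms - after_ms) * 100.0 / before_ms).floor
end

puts runtime_reduction(577, 1)   # slide 20 timings, prints 99
puts runtime_reduction(400, 19)  # slide 21 timings, prints 95
```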
  • 22. Why Use Materialized Views in your Rails application? • ActiveRecord models allow for easy representation in Rails • Capture commonly used joins / filters • Allows for fast, live filtering and sorting of complex associations or calculated fields • Push data intensive processing out of Ruby to Database • Make use of advanced Database functions • Can Optimize Indexes for reporting only • When Performance is more important than Storage
  • 23. Downsides • Requires PostgreSQL 9.3 or later • Entire Materialized View must be refreshed to update • Bad when live data is required; for that use case, roll your own Materialized View using standard tables
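
The talk does not show what a hand-rolled materialized view looks like. One possible reading, sketched here in plain Ruby under assumed names, is a summary that is updated incrementally as each feedback arrives, so the full aggregate never has to be recomputed. `TalkScoreSummary` and its methods are hypothetical. In a real application the running totals would presumably live in an ordinary table, kept current by triggers or model callbacks.

```ruby
# Hypothetical hand-maintained summary: keep a running sum and count per
# talk and update only the affected entry when feedback arrives.
class TalkScoreSummary
  INVALID_COMMENTS = ['', 'NA', 'N/A', 'not applicable'].freeze

  def initialize
    @totals = Hash.new { |hash, talk_id| hash[talk_id] = { sum: 0, count: 0 } }
  end

  # Called once per new feedback row; invalid comments are ignored,
  # matching the filter used throughout the talk.
  def record(talk_id:, score:, comment:)
    return if comment.nil? || INVALID_COMMENTS.include?(comment)

    entry = @totals[talk_id]
    entry[:sum] += score
    entry[:count] += 1
  end

  # Current average for a talk, or nil if it has no valid feedback yet.
  def overall_score(talk_id)
    return nil unless @totals.key?(talk_id)

    entry = @totals[talk_id]
    entry[:sum].to_f / entry[:count]
  end
end

summary = TalkScoreSummary.new
summary.record(talk_id: 1, score: 5, comment: 'Super rad talk!!')
summary.record(talk_id: 1, score: 2, comment: 'Too fast')
summary.record(talk_id: 1, score: 1, comment: 'N/A') # ignored
p summary.overall_score(1) # => 3.5
```

The cost moves from read time to write time: every insert does a little extra work, but reads always see current data.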
  • 24. Downsides • Migrations are painful! • Recommend writing the view definition in SQL, so ActiveRecord scopes cannot be reused • Entire Materialized View must be dropped and redefined for any changes to the View or referring tables • Hard to read and track what changed
  • 25. Resources • Source Code used in talk • https://github.com/droberts84/materialized-view-demo • PostgreSQL Documentation • https://wiki.postgresql.org/wiki/Materialized_Views