Symbol GC

RubyKaigi 2014

Presentation Transcript

  • Symbol GC #rubykaigi 2014 Narihiro Nakamura - @nari3
  • Self introduction ✔ Nari, @nari3, authorNari ✔ A CRuby committer. ✔ I work at NaCl. ✔ “Nakamura” is the most powerful clan in the Ruby world.
  • Author: http://tatsu-zine.com/books/gcbook
  • An unmotivated rubyist.
  • Today's topic

        obj = Object.new
        100_000.times do |i|
          obj.respond_to?("sym#{i}".to_sym)
        end
        GC.start
        puts "symbol : #{Symbol.all_symbols.size}"

        $ ruby-2.1.2 a.rb
        symbol : 102416
        $ ruby-trunk
        symbol : 2833
  • What is a Symbol?
  • Symbol ✔ A symbol is a primitive data type whose instances have a unique human-readable form. ✔ Symbols can be used as identifiers.
  • :symbol
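To see the “unique instance” property from Ruby, here is a small sketch (any CRuby; the variable names are only illustrative). Two equal Strings are separate objects, but converting them to symbols yields one and the same object:

```ruby
a = String.new("sym")
b = String.new("sym")

# Equal content, but two separate String objects.
puts a == b          # true
puts a.equal?(b)     # false

# to_sym interns the name: every "sym" maps to the single Symbol :sym.
puts a.to_sym.equal?(b.to_sym)   # true
puts a.to_sym.equal?(:sym)       # true
```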
  • A pitfall of Symbol ✔ Symbols are never garbage collected. ✔ Many beginners don't know this. ✔ Even good Rubyists make this mistake. ✔ Prone to vulnerabilities: ✔ User input → symbol ✔ Puts pressure on memory
  • Simple cases ✖ if user.respond_to?(params[:method].to_sym) (Is this method callable?) NG: params[:method] is user input. ✖ params[params[:attr].to_sym] (Get a hash value via a symbol key.) NG: params[:attr] is user input.
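On Ruby versions without Symbol GC, the usual defence for cases like these is to never call to_sym on raw input. A minimal sketch, assuming a hypothetical whitelist `ALLOWED_ATTRS` and helper `safe_attr` (neither comes from the talk):

```ruby
# Hypothetical list of attribute names the application expects.
ALLOWED_ATTRS = %w[name email].freeze

# Convert user input to a symbol only after it matches a known name,
# so arbitrary strings from a request never become new symbols.
def safe_attr(input)
  ALLOWED_ATTRS.include?(input) ? input.to_sym : nil
end

params = { name: "nari", email: "nari@example.com" }

p params[safe_attr("name")]    # prints "nari"
p safe_attr("sym12345")        # prints nil; no new symbol is created
```

The whitelist check compares Strings, so rejected input never reaches to_sym.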
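  • Rails DoS vulnerability CVE-2012-3424: the Digest authentication parameters of an HTTP request were parsed into a hash whose keys were converted with to_sym, so a client could send arbitrary extra keys (foo="xxx", ...) and create as many symbols as it liked.

        HTTP Request:
        GET ….
        WWW-Authenticate: Digest realm="..", nonce="...", algorithm=MD5, qop="auth", foo="xxx", ..

        Parse to a hash (each key → to_sym):
        digest = { :realm => "..", :nonce => "..", ..., :foo => "..", ... }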
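The root cause is the to_sym in the parse step. A minimal sketch of the safer shape, keeping the keys as Strings (this is not the actual Rails fix; `parse_digest_params` is a hypothetical name, and the parser is deliberately naive):

```ruby
# Parse the parameters of a Digest header into a Hash with String keys.
# Keeping keys as Strings means attacker-chosen names such as foo="xxx"
# never become symbols. Naive on purpose: it does not handle commas
# inside quoted values.
def parse_digest_params(header)
  header.sub(/\ADigest\s+/, "").split(/,\s*/).each_with_object({}) do |pair, params|
    key, value = pair.split("=", 2)
    params[key.strip] = value.to_s.delete('"')
  end
end

digest = parse_digest_params('Digest realm="app", nonce="abc", algorithm=MD5, qop="auth", foo="xxx"')
p digest["realm"]   # prints "app"
p digest.keys       # prints ["realm", "nonce", "algorithm", "qop", "foo"]
```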
  • We want Symbol GC ✔ This has been requested for a long time. ✔ Sasada-san had an idea. ✔ I implemented this idea.
  • Are symbols in other programming languages garbage collectable?
  • Programming languages that support symbols ✔ Too Many Parentheses languages (the Lisp family) ✔ Erlang ✔ Smalltalk ✔ Scala
  • Symbol GC support (○ = symbols are collected, ✖ = not collected)

        Language                  Symbol GC
        Erlang                    ✖
        Gauche                    ✖
        Clojure                   ○
        EmacsLisp                 ✖
        VisualWorks (Smalltalk)   ○
        Scala                     ○
  • Implementation-dependent? ✔ Not unified across languages. ✔ Symbol GC is not documented in programming language specifications. ✔ Implementation = specification?
  • EmacsLisp ✔ Function: unintern ✔ (unintern 'foo) declares that a symbol is no longer needed. ✔ It's like manual memory management.
  • Scala / Java

        // main.scala
        var a = 'sym

    The symbol table refers to the String “sym”.
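  • Scala / Java

        // main.scala
        var a = 'sym   // var, not val: a is reassigned below
        a = null

    The symbol table holds the String “sym” only through a weak reference, so once a = null and the GC starts, the symbol can be collected.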
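The weak-reference idea can be sketched in Ruby itself. This is a toy, not how Scala or the JVM implements it: `ToySymbol` and `WeakSymbolTable` are hypothetical names, and it assumes Ruby 2.5+ so that `String#-@` deduplicates unfrozen strings. Once nothing else refers to a `ToySymbol`, the table does not keep it alive, so the GC may reclaim it.

```ruby
# A stand-in for a symbol object: it only remembers its name.
class ToySymbol
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

# Toy interning table that holds its entries weakly. ObjectSpace::WeakMap
# does not keep keys or values alive, and it compares keys by identity.
class WeakSymbolTable
  def initialize
    @table = ObjectSpace::WeakMap.new
  end

  def intern(str)
    # String#-@ returns a deduplicated frozen copy, so equal names map
    # to the same key object while that copy is alive.
    key = -str
    @table[key] || (@table[key] = ToySymbol.new(key))
  end
end

table = WeakSymbolTable.new
s1 = table.intern(String.new("sym"))
s2 = table.intern(String.new("sym"))
p s1.equal?(s2)   # prints true while s1 is still referenced
```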
  • Details of CRuby's Symbol
  • “sym”.to_sym: the String “sym” is frozen and registered in global_symbols. sym_id (a hash) maps the frozen String “sym” to a new ID, and last_id (a long) goes from 1000 to 1001.
  • “sym”.to_sym (continued): sym_id now maps the frozen String “sym” to ID 1001, and ID2SYM(ID) turns ID 1001 into the SYMBOL (VALUE) :sym.
  • ID ✔ Used at the C level. ✔ IDs are stored in method tables and variable tables. ✔ A unique number that corresponds to a symbol. ✔ Created by rb_intern(“foo”) in the C API. ✔ :sym == :sym → 1001 == 1001
  • SYMBOL (VALUE) ✔ Used at the Ruby level. ✔ The raw data of :sym or “sym”.to_sym. ✔ Uncollectable.
  • Why can't garbage symbols be collected?
  • For example, a C extension may store an ID in a static variable: global_symbols.sym_id maps “foo” to 1001 (last_id is 1001), :foo is the SYMBOL (VALUE), and the extension keeps static ID id; holding SYM2ID(:foo), i.e. 1001.
  • If :foo is collected, its ID entry in sym_id is deleted. When the GC starts, the “foo” → 1001 entry disappears, but the C extension's static ID id still holds 1001.
  • Then “foo”.to_sym is called: :foo == :foo, but with a different ID. sym_id now maps “foo” to 1002, while the C extension still holds 1001, so SYM2ID(:foo) != id.
  • Why garbage symbols can't be collected ✔ Problem: IDs remain on the C side. ✔ We can't detect and manage every ID held by C extensions. ✔ The same symbol would get a different ID. ✔ That leaves inconsistent IDs behind.
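The failure above can be replayed with a toy model of the global_symbols table. A minimal sketch: `ToyGlobalSymbols`, `intern` and `forget` are hypothetical names that only mirror the slides' sym_id and last_id, not CRuby's actual C code:

```ruby
# Toy model of the pre-Symbol-GC global_symbols table from the slides:
# sym_id maps a name to an ID, and last_id is the last numbered ID.
class ToyGlobalSymbols
  def initialize(last_id = 1000)
    @sym_id = {}
    @last_id = last_id
  end

  # Like rb_intern: return the existing ID, or number a new one.
  def intern(name)
    @sym_id[name] ||= (@last_id += 1)
  end

  # What naively collecting an unreachable symbol would do.
  def forget(name)
    @sym_id.delete(name)
  end
end

global_symbols = ToyGlobalSymbols.new
stored_id = global_symbols.intern("foo")  # a C extension keeps this in a static ID
global_symbols.forget("foo")              # suppose :foo were garbage collected
new_id = global_symbols.intern("foo")     # "foo".to_sym runs again

p stored_id            # prints 1001
p new_id               # prints 1002
p stored_id == new_id  # prints false: same name, inconsistent IDs
```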
  • In the Ruby world: RIP. A symbol is dead... (Photo by MIKI Yoshihito, https://www.flickr.com/photos/mujitra/7571022490)
  • In the C world: WRRRYYYYY!!! ID: “I'm still alive....!” (Photo by Zufallsfaktor, https://www.flickr.com/photos/zufallsfaktor/5911338959)
  • How do we implement Symbol GC?
  • Idea
  • Separate symbols into two types ✔ Immortal Symbol: the C world ✔ Mortal Symbol: the Ruby world
  • Immortal Symbol ✔ These symbols have a corresponding ID. ✔ e.g. method names, variable names, constant names, etc. ✔ Used mainly at the C level. ✔ Uncollectable. ✔ Once an ID has been numbered, the symbol stays alive. ✔ There is no transition back to Mortal Symbol.
  • def foo; end: global_symbols.sym_id maps the frozen String “foo” to ID 1001, and last_id is 1001.
  • Store the ID in the method table: sym_id maps “foo” to 1001, and the method table entry for def foo; end is keyed by ID 1001.
  • ID2SYM(ID) → VALUE: ID 1001 is converted by ID2SYM(ID) into the Immortal Symbol (VALUE) :foo, which refers to the frozen String “foo”; the method table still holds 1001.
  • Mortal Symbol ✔ These symbols don't have an ID. ✔ “sym”.to_sym → Mortal Symbol ✔ Used mainly at the Ruby level. ✔ Collectable: unreachable symbols are collected. ✔ Can transition to Immortal Symbol.
  • “bar”.to_sym: next to the Immortal Symbol :foo (ID 1001 in sym_id), a Mortal Symbol (VALUE) :bar is created with the frozen String “bar”; no ID is numbered for it.
  • Split objects into uncollectable and collectable: ID 1001 and the Immortal Symbol (VALUE) :foo are uncollectable; the Mortal Symbol (VALUE) :bar and its frozen String “bar” are collectable.
  • :bar will be collected once it is unreachable, while the Immortal Symbol :foo and its ID 1001 stay.
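Mortal symbols being collected can be observed from plain Ruby. A minimal sketch, assuming Ruby 2.2 or later (the release that shipped Symbol GC); `live_symbol_count` is a hypothetical helper, and the exact numbers vary by version and by what the conservative GC happens to find on the stack:

```ruby
# Count symbols after a full GC so that already-dead symbols are not counted.
def live_symbol_count
  GC.start
  Symbol.all_symbols.size
end

before = live_symbol_count

# Each "tmp_sym_N".to_sym creates a Mortal (dynamic) symbol that nothing keeps.
10_000.times { |i| "tmp_sym_#{i}".to_sym }

after = live_symbol_count
puts "created 10000 symbols, net growth: #{after - before}"
```

On Ruby 2.1 the growth would be about 10,000; with Symbol GC it should stay close to zero.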
  • If an Immortal Symbol of the same name already exists
  • def foo; end: sym_id maps “foo” to ID 1001, and ID2SYM(1001) gives the Immortal Symbol (VALUE) :foo with the frozen String “foo”.
  • “foo”.to_sym: before creating a Mortal Symbol (VALUE) :foo, Ruby checks sym_id, finds the existing ID 1001, and uses that Immortal Symbol :foo instead.
  • From Mortal Symbol to Immortal Symbol
  • define_method(“foo”.to_sym){}: “foo”.to_sym first creates a Mortal Symbol (VALUE) :foo, registered in global_symbols.sym_id as “foo” → :foo.
  • define_method(“foo”.to_sym){}: defining a method needs an ID, so SYM2ID(VALUE) is called. The symbol's address, 0x2c8d0, becomes its ID (Address = ID) and is stored in the method table for def foo; end. :foo is pinned down and changes from Mortal to Immortal: it is now uncollectable.
  • CAUTION
  • A new pitfall is coming!
  • Immortal Symbol ✔ Not all symbols are garbage collected. ✔ Immortal symbols are not garbage collected. ✔ A Mortal symbol becomes Immortal when an ID is numbered for it. ✔ This can still lead to vulnerabilities!
  • A new pitfall ✔ Immortal Symbols can increase unintentionally. ✔ For instance, getting a name from a symbol with rb_id2str(SYM2ID(sym)) turns it from Mortal → Immortal. ✔ Please use rb_sym2str() instead. ✔ Please watch out for careless SYM2ID() calls.
  • Please keep monitoring ✔ Check Symbol.all_symbols.size. ✔ If the number of symbols keeps increasing, please report a bug to ruby-core or to the library author. ✔ We are in a transition period now. ✔ It will get better gradually.
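A small helper makes this kind of monitoring repeatable. A minimal sketch, assuming Ruby 2.2 or later; `symbol_growth` is a hypothetical name. The usage below measures the define_method pitfall: every method name needs an ID, so those symbols are pinned as Immortal and survive GC on the versions I know of:

```ruby
# Hypothetical helper: run a block and report how many symbols
# survive a full GC afterwards.
def symbol_growth
  GC.start
  before = Symbol.all_symbols.size
  yield
  GC.start
  Symbol.all_symbols.size - before
end

# define_method needs an ID, so each name is pinned as an Immortal symbol.
growth = symbol_growth do
  klass = Class.new
  500.times { |i| klass.send(:define_method, "monitor_demo_#{i}".to_sym) {} }
end

puts "symbols that survived GC: #{growth}"
```

Even though the anonymous class becomes garbage, the 500 method-name symbols remain, which is exactly the growth Symbol.all_symbols.size would reveal.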
  • Implementation details (for CRuby hackers)
  • Static Symbol, Dynamic Symbol ✔ Static Symbol = immediate value: always Immortal. ✔ Dynamic Symbol = RVALUE (heap object): Mortal or Immortal. ✔ A Dynamic Symbol changes to Immortal when it needs an ID. ✔ Similar to Float and FLONUM (some Floats are immediates, the rest are heap objects).
  • Details of RSymbol struct

        struct RSymbol {
            struct RBasic basic;
            VALUE fstr;   /* frozen String, e.g. "foo" */
            ID type;      /* ID_LOCAL 0b00000, ID_INSTANCE 0b00010,
                             ID_GLOBAL 0b00110, ID_ATTRSET 0b01000, ... */
        };
  • ID structure ✔ 0bxxx.....xxx 000: high-order 61 bits = counter, low-order 3 bits = ID type. ✔ 0bxxx.....xx 000 1: low-order 1 bit = Static Symbol flag.
  • Fast ID recognition ✔ Low-order bit = 1 → Static Symbol ✔ Dynamic Symbol ID = RVALUE address, whose low-order bit is always 0 because RVALUE addresses are aligned. ✔ Only the lowest bit needs to be checked.
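The bit layout and the one-bit check can be modelled with plain integers. A minimal sketch: the flag in bit 0 and the type values come from the slides, but the helper names (`static_id`, `static_symbol?`, `id_type`, `id_counter`) and the exact shift are illustrative assumptions, not CRuby's definitions:

```ruby
# Toy model of the ID layout on the slides.
STATIC_SYM_FLAG = 0b0001   # lowest bit: 1 = Static Symbol
ID_TYPE_MASK    = 0b1110   # next 3 bits: ID type
COUNTER_SHIFT   = 4        # remaining high bits: counter

ID_LOCAL    = 0b0000
ID_INSTANCE = 0b0010
ID_GLOBAL   = 0b0110
ID_ATTRSET  = 0b1000

def static_id(counter, type)
  (counter << COUNTER_SHIFT) | type | STATIC_SYM_FLAG
end

# The fast check from the slide: only the lowest bit is inspected.
def static_symbol?(id)
  (id & STATIC_SYM_FLAG) == 1
end

def id_type(id)
  id & ID_TYPE_MASK
end

def id_counter(id)
  id >> COUNTER_SHIFT
end

id = static_id(1001, ID_INSTANCE)
p static_symbol?(id)           # prints true
p id_type(id) == ID_INSTANCE   # prints true
p id_counter(id)               # prints 1001

# A Dynamic Symbol's ID is its (aligned, hence even) RVALUE address,
# e.g. 0x2c8d0 from the define_method slide.
p static_symbol?(0x2c8d0)      # prints false
```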
  • Conclusion ✔ Most symbols will be garbage collected. ✔ But some symbols won't be garbage collected. ✔ “sym”.to_sym → OK ✔ define_method(“sym”.to_sym){} → NG
  • Acknowledgments ✔ Sasada-san: taught me the idea of Symbol GC and refined its code. ✔ Nakada-san, Tsujimoto-san, U.Nakamura-san, etc.: fixed many bugs. ✔ NaCl members
  • Thank you!