Skip to main content

If you don't have an IBM ID and password, register here.

By clicking Submit, you agree to the developerWorks terms of use.

The first time you sign into developerWorks, a profile is created for you. Select information in your developerWorks profile is displayed to the public, but you may edit the information at any time. Your first name, last name (unless you choose to hide them), and display name will accompany the content that you post.

All information submitted is secure.

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerworks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

By clicking Submit, you agree to the developerWorks terms of use.

All information submitted is secure.

Scripting the Vim editor, Part 4: Dictionaries

Learn when to use dictionaries for cleaner, faster code

Damian Conway (damian@conway.org), CEO and Chief Trainer, Thoughtstream
author photo - damian conway
Damian Conway is an Adjunct Associate Professor of Computer Science at Monash University, Australia, and CEO of Thoughtstream, an international IT training company. Vim is his primary code development environment, and Vimscript is one of his two favorite programming languages.

Summary:  A dictionary is a container data structure that offers different optimizations and trade-offs from a list. In particular, in a dictionary the order of the elements stored is irrelevant and the identity of each element is explicit. In this fourth article in a series introducing Vimscript, Damian Conway introduces you to dictionaries, including an overview of their basic syntax and many functions. He concludes with several examples that illustrate the use of dictionaries for more efficient data processing and cleaner code.

View more content in this series

Date:  10 Feb 2010
Level:  Introductory PDF:  A4 and Letter (51KB | 13 pages)Get Adobe® Reader®
Also available in:   Korean  Japanese

Activity:  44981 views
Comments:  

A dictionary in Vimscript is essentially the same as an AWK associative array, a Perl hash, or a Python dictionary. That is, it's an unordered container, indexed by strings rather than integers.

This fourth article in a series on Vimscript introduces this important data structure and explains its various functions for copying, filtering, extending, and pruning. The examples focus on the differences between lists and dictionaries, and on those cases where the use of a dictionary is a better alternative to the list-based solutions developed in Part 3 on built-in lists.

Dictionaries in Vimscript

You create a dictionary in Vimscript by using curly braces around a list of key/value pairs. In each pair, the key and value are separated by a colon. For example:


Listing 1. Creating a dictionary
let seen = {}   " Haven't seen anything yet

let daytonum = { 'Sun':0, 'Mon':1, 'Tue':2, 'Wed':3, 'Thu':4, 'Fri':5, 'Sat':6 }
let diagnosis = {
    \   'Perl'   : 'Tourettes',
    \   'Python' : 'OCD',
    \   'Lisp'   : 'Megalomania',
    \   'PHP'    : 'Idiot-Savant',
    \   'C++'    : 'Savant-Idiot',
    \   'C#'     : 'Sociopathy',
    \   'Java'   : 'Delusional',
    \}

Once you have created a dictionary, you can access its values using the standard square-bracket indexing notation, but using a string as the index instead of a number:

let lang = input("Patient's name? ")

let Dx = diagnosis[lang]

If the key doesn't exist in the dictionary, an exception is thrown:

let Dx = diagnosis['Ruby']
**E716: Key not present in Dictionary: Ruby**

However, you can access potentially non-existent entries safely, using the get() function. get() takes two arguments: the dictionary itself, and a key to look up in it. If the key exists in the dictionary, the corresponding value is returned; if the key doesn't exist, get() returns zero. Alternately, you can specify a third argument, in which case get() returns that value if the key isn't found:

let Dx = get(diagnosis, 'Ruby')                     
" Returns: 0

let Dx = get(diagnosis, 'Ruby', 'Schizophrenia')    
" Returns: 'Schizophrenia'

There's a third way to access a particular dictionary entry. If the entry's key consists only of identifier characters (alphanumerics and underscores), you can access the corresponding value using the "dot notation," like so:

let Dx = diagnosis.Lisp                    " Same as: diagnosis['Lisp']

diagnosis.Perl = 'Multiple Personality'    " Same as: diagnosis['Perl']

This special limited notation makes dictionaries very easy to use as records or structs:

let user = {}

let user.name    = 'Bram'
let user.acct    = 123007
let user.pin_num = '1337'


Batch-processing of dictionaries

Vimscript provides functions that allow you to get a list of all the keys in a dictionary, a list of all its values, or a list of all its key/value pairs:

let keylist   = keys(dict)
let valuelist = values(dict)
let pairlist  = items(dict)

This items() function actually returns a list of lists, where each "inner" list has exactly two elements: one key and the corresponding value. Hence items() is especially handy for iterating through the entries of a dictionary:

for [next_key, next_val] in items(dict)
    let result = process(next_val)
    echo "Result for " next_key " is " result
endfor


Assignments and identities

Assignments in dictionaries work exactly as they do for Vimscript lists. Dictionaries are represented by references (that is, pointers), so assigning a dictionary to another variable aliases the two variables to the same underlying data structure. You can get around this by first copying or deep-copying the original:

let dict2 = dict1             " dict2 just another name for dict1
                                  
let dict3 = copy(dict1)       " dict3 has a copy of dict1's top-level elements

let dict4 = deepcopy(dict1)   " dict4 has a copy of dict1 (all the way down)

Just as for lists, you can compare identity with the is operator, and value with the == operator:

if dictA is dictB
    " They alias the same container, so must have the same keys and values
elseif dictA == dictB
    " Same keys and values, but maybe in different containers
else
    " Different keys and/or values, so must be different containers
endif


Adding and removing entries

To add an entry to a dictionary, just assign a value to a new key:

let diagnosis['COBOL'] = 'Dementia'

To merge in multiple entries from another dictionary, use the extend() function. Both the first argument (which is being extended) and the second argument (which contains the extra entries) must be dictionaries:

call extend(diagnosis, new_diagnoses)

extend() is also convenient when you want to add multiple entries explicitly:

call extend(diagnosis, {'COBOL':'Dementia', 'Forth':'Dyslexia'})

There are two ways to remove a single entry from a dictionary: the built-in remove() function, or the unlet command:

let removed_value = remove(dict, "key")

unlet dict["key"]

When removing multiple entries from a dictionary, it is cleaner and more efficient to use filter(). The filter() function works much the same way as for lists, except that in addition to testing each entry's value using v:val, you can also test its key using v:key. For example:


Listing 2. Testing values and keys
" Remove any entry whose key starts with C...
call filter(diagnosis, 'v:key[0] != "C"')

" Remove any entry whose value doesn't contain 'Savant'...
call filter(diagnosis, 'v:val =~ "Savant"')

" Remove any entry whose value is the same as its key...
call filter(diagnosis, 'v:key != v:val')


Other dictionary-related functions

In addition to filter(), dictionaries can use several other of the same built-in functions and procedures as lists. In almost every case (the notable exception being string()), a list function applied to a dictionary behaves as if the function had been passed a list of the values of the dictionary. Listing 3 shows the most commonly used functions.


Listing 3. Other list functions that also work on dictionaries

let is_empty = empty(dict)           " True if no entries at all

let entry_count = len(dict)          " How many entries?

let occurrences = count(dict, str)   " How many values are equal to str?

let greatest = max(dict)             " Find largest value of any entry
let least    = min(dict)             " Find smallest value of any entry

call map(dict, value_transform_str)  " Transform values by eval'ing string

echo string(dict)                    " Print dictionary as key/value pairs

The filter() built-in is particularly handy for normalizing the data in a dictionary. For example, given a dictionary containing the preferred names of users (perhaps indexed by userids), you could ensure that each name was correctly capitalized, like so:

call map( names, 'toupper(v:val[0]) . tolower(v:val[1:])' )

The call to map() walks through each value, aliases it to v:val, evaluates the expression in the string, and replaces the value with the result of that expression. In this example, it converts the first character of the name to uppercase, and the remaining characters to lowercase, and then uses that modified string as the new name value.


Deploying dictionaries for cleaner code

The third article in this series explained Vimscript's variadic function arguments with a small example that generated comment boxes around a specified text. Optional arguments could be added after the text string to specify the comment introducer, the character used as the "box," and the width of the comment. Listing 4 reproduces the original function.


Listing 4. Passing optional arguments as variadic parameters

function! CommentBlock(comment, ...)
    " If 1 or more optional args, first optional arg is introducer...
    let introducer =  a:0 >= 1  ?  a:1  :  "//"

    " If 2 or more optional args, second optional arg is boxing character...
    let box_char   =  a:0 >= 2  ?  a:2  :  "*"

    " If 3 or more optional args, third optional arg is comment width...
    let width      =  a:0 >= 3  ?  a:3  :  strlen(a:comment) + 2 

    " Build the comment box and put the comment inside it...
    return introducer . repeat(box_char,width) . "\<CR>"
    \    . introducer . " " . a:comment        . "\<CR>"
    \    . introducer . repeat(box_char,width) . "\<CR>"
endfunction

Variadic arguments are convenient for specifying function options but suffer from two major drawbacks: they impose an explicit ordering on the function's parameters, and they leave that ordering implicit in function calls.

Revisiting autocomments

As Listing 4 illustrates, when any arguments are optional, it is usually necessary to decide in advance the order in which they must be specified. This necessity presents a design problem, however: in order to specify a later option, the user will have to explicitly specify all the options before it as well. Ideally, the first option would be the most commonly used one, the second would be the second-most commonly used, etc. In reality, deciding on this order before the function is widely deployed can be difficult: how are you supposed to know which option will be most important to most people?

The CommentBlock() function in Listing 4, for example, assumes that the comment introducer is the optional argument that is most likely to be needed, and so places it first in the parameter list. But what if a user of the function only ever programs in C and C++, and so never alters the default introducer? Worse, what if it turns out that the width of comment blocks varies for every new project? This will prove very annoying, because developers will now have to specify all three optional arguments every time, even though the first two are always given their default values:

" Comment of required width, with standard delimiter and box character...
let new_comment = CommentBlock(comment_text, '//', '*', comment_width)

This leads directly to the second issue, namely that when any options do need to be specified explicitly, it is likely that several of them will have to be specified. However, because options default to the most commonly needed values, the user may be unfamiliar with specifying options, and hence unfamiliar with the necessary order. This can lead to implementation errors like the following:

" Box comment using ==== to standard line width...
let new_comment = CommentBlock(comment_text, '=', 72)

…which, rather disconcertingly, produces a (non-)comment that looks like this:

=727272727272727272727272727272 = A bad comment =727272727272727272727272727272

The problem is that the optional arguments have nothing explicit to indicate which option they are supposed to set. Their meaning is determined implicitly by their position in the argument list, and so any mistake in their ordering silently changes their meaning.

This is a classic case of using the wrong tool for the job. Lists are perfect when order is significant and identity is best implied by position. But, in this example, the order of the optional arguments is more a nuisance than a benefit and their positions are easily confused, which can lead to subtle errors of misidentification.

What's wanted is, in a sense, the exact opposite of a list: a data structure where order is irrelevant, and identity is explicit. In other words, a dictionary. Listing 5 shows the same function, but with its options specified via a dictionary, rather than with variadic parameters.


Listing 5. Passing optional arguments in a dictionary

function! CommentBlock(comment, opt)
    " Unpack optional arguments...
    let introducer = get(a:opt, 'intro', '//'                 )
    let box_char   = get(a:opt, 'box',   '*'                  )
    let width      = get(a:opt, 'width', strlen(a:comment) + 2)

    " Build the comment box and put the comment inside it...
    return introducer . repeat(box_char,width) . "\<CR>"
    \    . introducer . " " . a:comment        . "\<CR>"
    \    . introducer . repeat(box_char,width) . "\<CR>"
endfunction

In this version of the function, only two arguments are passed: the essential comment text, followed by a dictionary of options. The built-in get() function is then used to retrieve each option, or its default value, if the option was not specified. Calls to the function then use the named option/value pairs to configure its behavior. The implementation of the parameter parsing within the function becomes a little cleaner, and calls to the function becomes much more readable, and less error-prone. For example:

" Comment of required width, with standard delimiter and box character...
let new_comment = CommentBlock(comment_text, {'width':comment_width})

" Box comment using ==== to standard line width...
let new_comment = CommentBlock(comment_text, {'box':'=', 'width':72})


Refactoring autoalignments

In the third article in this series, we updated an earlier example function called AlignAssignments(), converting it to use lists to store the text lines it was modifying. Listing 6 reproduces that updated version of the function.


Listing 6. The updated AlignAssignments() function

function! AlignAssignments ()
    " Patterns needed to locate assignment operators...
    let ASSIGN_OP   = '[-+*/%|&]\?=\@<!=[=~]\@!'
    let ASSIGN_LINE = '^\(.\{-}\)\s*\(' . ASSIGN_OP . '\)\(.*\)$'

    " Locate block of code to be considered (same indentation, no blanks)...
    let indent_pat = '^' . matchstr(getline('.'), '^\s*') . '\S'
    let firstline  = search('^\%('. indent_pat . '\)\@!','bnW') + 1
    let lastline   = search('^\%('. indent_pat . '\)\@!', 'nW') - 1
    if lastline < 0
        let lastline = line('$')
    endif

    " Decompose lines at assignment operators...
    let lines = []
    for linetext in getline(firstline, lastline)
        let fields = matchlist(linetext, ASSIGN_LINE)
        call add(lines, fields[1:3])
    endfor

    " Determine maximal lengths of lvalue and operator...
    let op_lines = filter(copy(lines),'!empty(v:val)')
    let max_lval = max( map(copy(op_lines), 'strlen(v:val[0])') ) + 1
    let max_op   = max( map(copy(op_lines), 'strlen(v:val[1])'  ) )

    " Recompose lines with operators at the maximum length...
    let linenum = firstline
    for line in lines
        if !empty(line)
            let newline
            \    = printf("%-*s%*s%s", max_lval, line[0], max_op, line[1], line[2])
            call setline(linenum, newline)
        endif
        let linenum += 1
    endfor
endfunction

This version greatly improved the efficiency of the function, by caching data rather than reloading it, but it did so at the expense of maintainability. Specifically, because it stored the various components of each line in small three-element arrays, the code is littered with "magic indexes" (such as v:val[0] and line[1]) whose names give no clue as to their purpose.

Dictionaries are tailor-made for solving this problem, because, like lists, they aggregate data into a single structure, but, unlike lists, they label each datum with a string, rather than with a number. If those strings are selected carefully, they can make the resulting code much clearer. Instead of magic indexes, we get meaningful names (such as v:val.lval for each line's lvalue and line.op for each line's operator).

Rewriting the function using dictionaries is trivially easy, as Listing 7 demonstrates.


Listing 7. A further-improved AlignAssignments() function

function! AlignAssignments ()
    " Patterns needed to locate assignment operators...
    let ASSIGN_OP   = '[-+*/%|&]\?=\@<!=[=~]\@!'
    let ASSIGN_LINE = '^\(.\{-}\)\s*\(' . ASSIGN_OP . '\)\(.*\)$'

    " Locate block of code to be considered (same indentation, no blanks)...
    let indent_pat = '^' . matchstr(getline('.'), '^\s*') . '\S'
    let firstline  = search('^\%('. indent_pat . '\)\@!','bnW') + 1
    let lastline   = search('^\%('. indent_pat . '\)\@!', 'nW') - 1
    if lastline < 0
        let lastline = line('$')
    endif

    " Decompose lines at assignment operators...
    let lines = []
    for linetext in getline(firstline, lastline)
        let fields = matchlist(linetext, ASSIGN_LINE)
        if len(fields) 
            call add(lines, {'lval':fields[1], 'op':fields[2], 'rval':fields[3]})
        else
            call add(lines, {'text':linetext,  'op':''                         })
        endif
    endfor

    " Determine maximal lengths of lvalue and operator...
    let op_lines = filter(copy(lines),'!empty(v:val.op)')
    let max_lval = max( map(copy(op_lines), 'strlen(v:val.lval)') ) + 1
    let max_op   = max( map(copy(op_lines), 'strlen(v:val.op)'  ) )

    " Recompose lines with operators at the maximum length...
    let linenum = firstline
    for line in lines
        let newline = empty(line.op)
        \ ? line.text
        \ : printf("%-*s%*s%s", max_lval, line.lval, max_op, line.op, line.rval)
        call setline(linenum, newline)
        let linenum += 1
    endfor
endfunction

The differences in this new version are marked in bold. There are only two: the record for each line is now a dictionary rather than a hash, and the subsequent accesses to elements of each record use named lookups instead of numeric indexing. The overall result is that the code is more readable and less prone to the kinds of off-by-one errors common to array indexing.


Dictionaries as data structures

Vim provides a built-in command that allows you to remove duplicate lines from a file:

:%sort u

The u option causes the built-in sort command to remove duplicate lines (once they've been sorted), and the leading % applies that special sort to the entire file. That's handy, but only if you don't care about preserving the original order of the unique lines in the file. This might be a problem if the lines are a list of prize winners, a sign-up sheet for a finite resource, a to-do list, or any other sequence in which first-in should remain best-dressed.

Sort-free uniqueness

The keys of a dictionary are inherently unique, so it's possible to use a dictionary to remove duplicate lines from a file, and to do so in a way that preserves the original ordering of those lines. Listing 8 illustrates a simple function that achieves this goal.


Listing 8. A function for order-preserving uniqueness

function! Uniq () range
    " Nothing unique seen yet...
    let have_already_seen = {}
    let unique_lines = []

    " Walk through the lines, remembering only the hitherto-unseen ones...
    for original_line in getline(a:firstline, a:lastline)
        let normalized_line = '>' . original_line 
        if !has_key(have_already_seen, normalized_line)
            call add(unique_lines, original_line)
            let have_already_seen[normalized_line] = 1
        endif
    endfor

    " Replace the range of original lines with just the unique lines...
    exec a:firstline . ',' . a:lastline . 'delete'
    call append(a:firstline-1, unique_lines)
endfunction

The Uniq() function is declared to take a range, so it will only be called once, even when invoked on a range of lines in the buffer.

When called, it first sets up an empty dictionary (have_already_seen) that will be used to track which lines have already been encountered within the specified range. Lines that haven't been seen before will then be added to the list stored in unique_lines.

The function then provides a loop that does precisely that. It grabs the specified range of lines from the buffer with a getline() and iterates through each. It first adds a leading '>' to each line to ensure it is not empty. Vimscript dictionaries cannot store an entry whose key is an empty string, so empty lines from the buffer would not be correctly added to have_already_seen.

Once the line is normalized, the function then checks whether that line has already been used as a key in the have_already_seen dictionary. If so, an identical line must already have been seen and added to unique_lines, so the copy can be ignored. Otherwise, the line is being encountered for the first time, so the original (un-normalized) line must be added to unique_lines, and the normalized version must be added as a key in have_already_seen.

When all the lines have been filtered in this way, unique_lines will contain only the unique subset of them, in the order in which they were first encountered. All that remains is to delete the original set of lines and replace it (via an append()) with these accumulated unique lines.

With such a function available, you could set up a Normal-mode keymap to invoke the command on entire files, like so:

nmap ;u :%call Uniq()<CR>

Or you could apply it to a specific set of lines (for example, a range that had been selected in Visual mode), like so:

vmap  u :call Uniq()<CR>


Looking ahead

The basic features of Vimscript covered so far (statements and functions, arrays, and hashes) are sufficient to create almost any kind of addition to Vim's core feature set. But all the extensions we have seen have required the user to explicitly request behavior, by issuing a Normal-mode command or typing a particular sequence in Insert mode.

In the next article in this series, we'll investigate Vim's built-in event model and explore how to set up user-defined functions that trigger automatically as the user edits.


Resources

Learn

Get products and technologies

Discuss

  • Get involved in the My developerWorks community. Connect with other developerWorks users while exploring the developer-driven blogs, forums, groups, and wikis.

About the author

author photo - damian conway

Damian Conway is an Adjunct Associate Professor of Computer Science at Monash University, Australia, and CEO of Thoughtstream, an international IT training company. Vim is his primary code development environment, and Vimscript is one of his two favorite programming languages.

Report abuse help

Report abuse

Thank you. This entry has been flagged for moderator attention.


Report abuse help

Report abuse

Report abuse submission failed. Please try again later.


developerWorks: Sign in

If you don't have an IBM ID and password, register here.


Forgot your IBM ID?


Forgot your password?
Change your password


By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. This profile includes the first name, last name, and display name you identified when you registered with developerWorks. Select information in your developerWorks profile is displayed to the public, but you may edit the information at any time. Your first name, last name (unless you choose to hide them), and display name will accompany the content that you post.

Choose your display name

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

(Must be between 3 – 31 characters.)


By clicking Submit, you agree to the developerWorks terms of use.

 


Rate this article

Comments

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Linux
ArticleID=467580
ArticleTitle=Scripting the Vim editor, Part 4: Dictionaries
publish-date=02102010
author1-email=damian@conway.org
author1-email-cc=

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

For articles in technology zones (such as Java technology, Linux, Open source, XML), Popular tags shows the top tags for all technology zones. For articles in product zones (such as Info Mgmt, Rational, WebSphere), Popular tags shows the top tags for just that product zone.

For articles in technology zones (such as Java technology, Linux, Open source, XML), My tags shows your tags for all technology zones. For articles in product zones (such as Info Mgmt, Rational, WebSphere), My tags shows your tags for just that product zone.

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Special offers