The development of the user interface for a large commercial software product like Microsoft® Windows 95 involves many people, broad design goals, and an aggressive work schedule. This design briefing describes how the usability engineering principles of iterative design and problem tracking were successfully applied to make the development of the UI more manageable. Specific design problems and their solutions are also discussed.
Keywords: iterative design, Microsoft Windows, problem tracking, rapid prototyping, usability engineering, usability testing.
Windows 95 is a comprehensive upgrade to the Windows 3.1 and Windows for Workgroups 3.11 products. Many changes have been made in almost every area of Windows, and the user interface is no exception. This paper discusses the design team, its goals, and its process, then explains how usability engineering principles such as iterative design and problem tracking were applied to the project, using specific design problems and their solutions as examples.
The Windows 95 user interface design team was formed in October 1992, during the early stages of the project. I joined the team as an adjunct member in December 1992 to provide usability services. The design team was truly interdisciplinary, with people trained in product design, graphic design, usability testing, and computer science. The size of the team fluctuated during the project but averaged approximately twelve people. The software developers dedicated to implementing the user interface accounted for another twelve or so.
The design team was chartered with two very broad goals: make Windows easier to learn for people just getting started with computers and Windows, and make Windows easier to use for people who already knew and relied on Windows 3.1.
With over 50 million units of Windows 3.1 and 3.11 installed, plus a largely untapped home market, it was clear from the outset that making a better product would not be a trivial exercise. Without careful design and testing, we were likely to make a product that improved usability for some users while worsening it for millions of others (existing or potential). We understood fairly well the problems that intermediate and advanced users had, but we knew little about the problems beginning users had.
Given very broad design goals and an aggressive schedule for shipping the product (approximately 18 months to design and code the user interface), we knew from the outset that a traditional "waterfall" development process would not allow us sufficient flexibility to attain the best possible solution. In fact, we were concerned that the traditional approach would yield a very unusable system.
In the "waterfall" approach, the design of the system is compartmentalized (usually limited to a specification writing phase) and usability testing typically occurs near the end of the process, during quality assurance activities. We recognized that we needed much more opportunity to create a design, try it out with users (perhaps comparing it to other designs), make changes, and gather more user feedback. Our desire to abandon the waterfall model and opt for iterative design fortunately followed similar efforts in other areas of the company, so we had concrete examples of its benefits and feasibility.
Figure 1 outlines the process that we used. The process was typical of most products designed iteratively: paper or computer-based prototypes were used to try out design ideas and to gather usability data in the lab. Once a design had been coded, it was refined in the usability lab. When enough of the product had been coded and refined, it was examined more broadly, over time, in the field. Minor usability problems identified in the field were fixed before shipping the product. More importantly, the data gathered in the field is being used to guide work on the next version.
Our iterative design process was divided into three major phases: exploration, rapid prototyping, and fine tuning.
Figure 1: Windows 95 iterative design process.
In this first phase we experimented with design directions and gathered initial user data. We began with a solid foundation for the visual design of the user interface by leveraging work done by the "Cairo" team. We inherited from them much of the fundamental UI and interaction design (the desktop, the "Tray", context menus, three-dimensional look and feel, etc.). We also collected data from product support about users' top twenty problems with Windows 3.1.
Figure 2 shows a prototype Windows 95 desktop design that we usability tested in January 1993. This design was based on Cairo and incorporated a first pass at fixing some of the known problems with Windows 3.1 (window management in particular).
Figure 2: Early Windows 95 desktop (with callouts to enhance clarity).
The top icon, File Cabinet, showed a Windows 3.1 File Manager-type view (the left pane showed the hierarchy, the right pane its contents). The second icon, World, showed items on the network. The third icon, Programs, was a folder containing other folders full of links to the programs on the computer. Along the bottom was the "Tray", which featured three buttons (System, Find, and Help) and a file storage area. Another icon, Wastebasket, was a container for deleted files.
The usability studies of the prototype desktop were conducted in the Microsoft usability lab, as were later tests. We conducted typical iterative usability studies: three to four users representing each distinct group of interest (typically beginning and intermediate Windows 3.1 users) completed tasks that exercised the prototype. The questions we addressed in testing were sometimes very broad (e.g., "Do users like it?") and sometimes very specific (e.g., "After ten minutes of use, do users discover drag and drop to copy a file?"). We collected data typical for iterative studies: verbal protocols, time per task, number of errors, types of errors, and rating information.
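To make these measures concrete, here is a minimal, hypothetical sketch (in Python) of the kind of record one such study session produces. The structure, field names, and example values are assumptions for illustration only, not data or tooling from the actual studies.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TaskObservation:
    """One user's performance on one task, using the measures listed above."""
    participant_id: str           # e.g., "beginner-03" (hypothetical labeling)
    task: str                     # e.g., "copy a file using drag and drop"
    time_seconds: float           # time per task
    errors: list[str] = field(default_factory=list)  # error types; len() gives the count
    rating: Optional[int] = None  # e.g., a response on a rating scale
    verbal_protocol: str = ""     # think-aloud notes

# Example record with invented values:
obs = TaskObservation(
    participant_id="intermediate-01",
    task="copy a file using drag and drop",
    time_seconds=94.0,
    errors=["dragged with wrong mouse button"],
    rating=5,
)
```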
Our usability testing of this prototype revealed much, including several surprises:
Figure 3: File Cabinet, an early file system viewer.
From the first lab studies it became clear that we needed a baseline with Windows 3.1, to better understand what problems existed prior to Windows 95 and what problems were unique to the new design. First, we gathered market research data about Windows 3.1 users' twenty most-frequent tasks. We then conducted several lab studies comparing Windows 3.1 and Windows 95, focusing on the top twenty tasks derived from the market research data. We also interviewed professional Windows 3.1 (and Macintosh, for comparison) educators, to learn what they found easy and difficult to teach about the operating system.
The key findings were:
The results from these studies and interviews greatly changed the design of the Windows 95 UI. In the early Windows 95 prototype, we had purposefully changed some things from Windows 3.1 (e.g., the desktop was now a real container) but not others (e.g., the File Manager- and Program Manager-like icons on the desktop), because we were afraid of going too far with the design. We were aware that creating a product radically different from Windows 3.1 could confuse and disappoint millions of existing users, which would clearly be unacceptable.
However, the data we collected with the Windows 95 prototype and with Windows 3.1 showed us that we couldn't continue down the current path. The results with beginning users on basic tasks were unacceptably poor and many intermediate users thought that Windows 95 was just different, not better.
We decided to step back and take a few days to think about the situation. The design team held an offsite retreat and reviewed all the data collected to date: baseline usability studies, interviews, market research, and product support information. As we discussed the data, we realized that we needed to focus on users' most-frequent tasks. We also realized that we had been focusing too much on consistency with Windows 3.1.
Essentially, we realized that a viable solution might not look or act like Windows 3.1 but would definitely provide enough value to be attractive for users of all levels, for potentially different reasons. We realized that a truly usable system would scale to the needs of different users: it would be easy to discover and learn yet would provide efficiency (through shortcuts and alternate methods) for more-experienced users.
As we started working on new designs, we hoped to avoid the classic "easy to learn but hard to use" paradox by always keeping in mind that the basic features of the UI must scale. To achieve this goal, we knew we needed to try many different ideas quickly, compare them, and iterate on those that seemed most promising. To do this, we needed to make our design and evaluation processes very efficient.
Although we had opted for an iterative design approach from the beginning, one legacy of the waterfall approach remained: the monolithic design specification ("spec"). During the first few months of the project, the spec had grown by leaps and bounds and reflected hundreds of person-hours of effort. However, because of the problems we found through user testing, the design documented in the spec was suddenly out of date. The team faced a major decision: spend weeks changing the spec to reflect the new ideas, losing valuable time for iterating, or stop updating the spec and let the prototypes and code serve as a "living" spec.
After some debate, the team decided to take the latter approach. While this change made it somewhat more difficult for outside groups to keep track of what we were doing, it allowed us to iterate at top speed. The change also had an unexpected effect: it brought the whole team closer together, because much of the spec now existed in conversations and on whiteboards in people's offices. Many "hallway" conversations ensued and continued for the duration of the project.
To ensure that interested parties stayed informed about the design, we:
The first major design direction we investigated was a separate UI ("shell") for beginning users. The design was quickly mocked up in Visual Basic and tested in the usability lab. (See Figure 4.) While the design tested well, because it successfully constrained user actions to a very small set, we quickly began to see the limitations as more users were tested:
Figure 4: Partial view of separate shell for beginners.
For these reasons and others, we abandoned the idea. Importantly, because we used a prototyping tool and tested immediately in the usability lab, we still had plenty of time to investigate other directions.
Below are overviews of five areas in which we designed and tested three or more major design iterations. There are many more areas than we have space to discuss here.
Figure 5: Start menu with Programs item open.
Figure 6: "Plate" visualization for minimized windows.
Figure 7: Task bar with Start button, programs, and clock.
Figure 8: Windows 3.1 File.Open dialog box.
Figure 9: Windows 95 File.Open dialog box.
Figure 10: Main Windows 3.1 printer setup dialog box.
Figure 11: Screen from Windows 95 Add Printer wizard.
Figure 12: Windows 3.1 Help.Search dialog.
Figure 13: Windows 95 Help.Index tab.
Once we had designed all of the major areas of the product, we realized that we had to take a step back and see how all of the pieces fit together. To accomplish this, we conducted summative lab tests and a longitudinal field study.
Throughout the course of designing and testing the Windows 95 UI, we applied various usability engineering principles and practices [2] [4]. With a project the size of Windows 95, we knew we needed a standard way to note all of the usability issues identified, record when and how they were to be fixed, and then close them once the fix was implemented and tested successfully with users.
We designed a relational database to meet this need. (See Figure 14.) After every phase of lab testing, I entered new problems as well as positive findings and assigned them to the appropriate owners, usually a designer and a user education person together. The status of existing problems was also updated: either left open if more work was needed or closed if solved. Every couple of weeks I ran a series of reports that listed all of the remaining problems, by owner, and distributed them to the team members. (See Figure 15.) We met to discuss progress on solutions and when the changed designs would be ready to test with users.
Figure 14: Sample tracking database record.
Figure 15: Sample tracking database report.
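The briefing does not reproduce the database schema, but the fields implied by the text (positive finding versus problem, a three-level severity, owners, open/closed status, and a resolution category) suggest a structure along the following lines. This is a hypothetical Python sketch for illustration, not the actual Microsoft database; all names and types are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Severity(Enum):
    LEVEL_1 = 1  # most severe
    LEVEL_2 = 2
    LEVEL_3 = 3  # least severe

class Resolution(Enum):
    ADDRESSED = "addressed"
    SOMEWHAT_ADDRESSED = "somewhat addressed"
    NOT_ADDRESSED = "not addressed"
    PLANNED = "planned"      # a fix was agreed on but not yet implemented
    UNDECIDED = "undecided"

@dataclass
class UsabilityStatement:
    statement_id: int
    description: str
    is_problem: bool              # False for positive findings
    severity: Optional[Severity]  # problems only; None for positive findings
    owners: list[str]             # e.g., a designer plus a user education person
    status: str = "open"          # "open" until the fix tests well with users
    resolution: Resolution = Resolution.UNDECIDED

def open_problems_by_owner(statements: list[UsabilityStatement]):
    """Group still-open problems by owner, mimicking the periodic team reports."""
    report = defaultdict(list)
    for s in statements:
        if s.is_problem and s.status == "open":
            for owner in s.owners:
                report[owner].append(s)
    return report
```

The periodic report described above then amounts to running this query and printing each owner's open problems.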
As with any project, "the proof is in the pudding," so sharing some summary statistics is in order.
We conducted sixty-four phases of lab testing, using 560 subjects. Fifty percent of the users were intermediate Windows 3.1 users; the rest were beginners, advanced users, and users of other operating systems. These numbers do not include testing done on components delivered to us by other teams (Exchange email client, fax software, etc.); that testing accounted for approximately 25 additional phases and 175 additional users.
For the core shell components, 699 different "usability statements" were entered into the database during the project. Of that number, 148 were positive findings and 551 were problems. Each problem was rated with one of three levels of severity, from level 1 (most severe) to level 3 (least severe).
Of the 551 problems identified, 15% were judged to be level 1, 43% level 2, and 42% level 3.
During the project, a problem could have one of five types of resolution: addressed, somewhat addressed, not addressed, planned (a fix was agreed on but not yet implemented), or undecided.
By the end of the project, all problems with resolution "planned" or "undecided" had migrated to one of the other categories. Eighty-one percent of the problems were resolved "addressed", 8% "somewhat addressed", and 11% "not addressed". Most of the issues that were not addressed were left that way because of a technical limitation or, in some cases, a scheduling limitation.
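For concreteness, the reported percentages can be converted back into approximate problem counts; the short sketch below does the rounding (the text reports percentages only, so these counts are estimates).

```python
# Approximate counts implied by the reported percentages of the 551 problems.
TOTAL_PROBLEMS = 551

severity_share = {"level 1": 0.15, "level 2": 0.43, "level 3": 0.42}
resolution_share = {
    "addressed": 0.81,
    "somewhat addressed": 0.08,
    "not addressed": 0.11,
}

for label, share in (severity_share | resolution_share).items():
    print(f"{label}: ~{round(TOTAL_PROBLEMS * share)} problems")

# Each set of shares should account for all of the problems.
assert round(sum(severity_share.values()), 2) == 1.0
assert round(sum(resolution_share.values()), 2) == 1.0
```

Rounded, the severity shares work out to roughly 83, 237, and 231 problems, and the resolution shares to roughly 446, 44, and 61.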
The Windows 95 project was the first experience many of the team members had with iterative design, usability testing, and problem tracking.
Perhaps the best testament to our belief in iterative design is that literally no detail of the initial UI design for Windows 95 survived unchanged in the final product. At the beginning of the design process, we didn't envision the scope and volume of changes that we ended up making. Iterative design, using prototypes and the product as the spec, and our constant testing with users allowed us to explore many different solutions to problems quickly.
The design team became so used to iterating on a design that we felt rushed when, near the end of the project, we had to do some last-minute design work. There wasn't sufficient time to iterate more than once. We were disappointed that we didn't have time to continue fine tuning and re-testing the design.
The "prototype or code are the spec" approach overall worked well, although we naturally have refined the process over time. For example, all the prototypes for a given release of the product now reside in a common location on the network and include instructions for installing and running them.
The design team continues to write initial specification documents and circulate them for early feedback. Once prototyping and usability testing have begun, however, the spec often refers readers to the prototype for details. We have found that the prototype is essentially a richer type of specification, for less work, since it has other uses (usability testing, demos, etc.). A prototype also invites richer feedback, because the reviewer has to imagine less about how the system would work.
Although doing design and user testing iteratively allowed us to create usable task areas or features of the product, user testing the product holistically was key to polishing the fit between the pieces. As discussed previously, we made changes to wording in the UI and in Help topics based on the data collected. If we had not done this testing, users' overall experience with the product would have been less productive and enjoyable.
The high fix rate for usability problems would not have been possible without the intense dedication of all the team members. The tracking database made the whole process more manageable and ensured that issues didn't slip through the cracks. However, the fixes would not have been made if the team had not believed in making the most usable product possible. Key to this belief was our understanding that we probably weren't going to get it right the first time, and that not getting it right was as useful and interesting a part of creating the product as getting it right.
In the tracking database, all of the issues marked "somewhat addressed" or "not addressed" were rolled over into a new database as a starting point for design work on the next version of Windows. Product planners and designers worked with this information on a daily basis, along with reports from product support.
Thanks to Jane Dailey, Chris Guzak, Francis Hogle, Marshall McClintock, Mark Malamud, Suzan Marashi, and Mark Simpson for reviewing this design briefing and providing comments. Thanks to Lauren Gallagher, Shawna Sandeno, and Jennifer Shetterly for graphic design assistance.