MobileNets: Open-Source Models for Efficient On-Device Vision
Wednesday, June 14, 2017
Posted by Andrew G. Howard, Senior Software Engineer, and Menglong Zhu, Software Engineer
(Cross-posted on the Google Open Source Blog)
Deep learning has fueled tremendous progress in the field of computer vision in recent years, with neural networks repeatedly pushing the frontier of visual recognition technology. While many of those technologies, such as object, landmark, logo and text recognition, are provided for internet-connected devices through the Cloud Vision API, we believe that the ever-increasing computational power of mobile devices can enable the delivery of these technologies into the hands of our users, anytime, anywhere, regardless of internet connection. However, visual recognition for on-device and embedded applications poses many challenges: models must run quickly and with high accuracy in a resource-constrained environment, making use of limited computation, power and space.
Today we are pleased to announce the release of MobileNets, a family of mobile-first computer vision models for TensorFlow, designed to effectively maximize accuracy while being mindful of the restricted resources of an on-device or embedded application. MobileNets are small, low-latency, low-power models parameterized to meet the resource constraints of a variety of use cases. They can be built upon for classification, detection, embeddings and segmentation, similar to how other popular large-scale models, such as Inception, are used.
Example use cases include detection, fine-grained classification, attributes and geo-localization.
This release contains the model definition for MobileNets in TensorFlow using TF-Slim, as well as 16 pre-trained ImageNet classification checkpoints for use in mobile projects of all sizes. The models can be run efficiently on mobile devices with TensorFlow Mobile.
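For readers who want to experiment before diving into the full library, a checkpoint can be loaded with a few lines of TF-Slim code. The sketch below is a minimal illustration, not the official usage: it assumes the mobilenet_v1 module from the TensorFlow-Slim image classification library is on your Python path and that the MobileNet_v1_1.0_224 checkpoint has been downloaded (exact module paths, argument names and the checkpoint filename may differ between releases).

```python
import tensorflow as tf
import tensorflow.contrib.slim as slim
# mobilenet_v1 lives in the TF-Slim image classification library
# (tensorflow/models, research/slim/nets).
from nets import mobilenet_v1

# A batch of 224x224 RGB images, preprocessed to the [-1, 1] range.
images = tf.placeholder(tf.float32, [None, 224, 224, 3])

# Build MobileNet_v1_1.0_224: width multiplier 1.0, 1001 ImageNet classes
# (1000 classes plus a background class).
with slim.arg_scope(mobilenet_v1.mobilenet_v1_arg_scope(is_training=False)):
    logits, end_points = mobilenet_v1.mobilenet_v1(
        images, num_classes=1001, depth_multiplier=1.0, is_training=False)
predictions = end_points['Predictions']

# Restore the pre-trained ImageNet weights (checkpoint path is illustrative).
saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, 'mobilenet_v1_1.0_224.ckpt')
    # probabilities = sess.run(predictions, feed_dict={images: my_batch})
```

The smaller variants in the table below are obtained by lowering depth_multiplier (0.75, 0.50, 0.25) and the input resolution, and restoring the matching checkpoint.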
Model Checkpoint        | Million MACs | Million Parameters | Top-1 Accuracy (%) | Top-5 Accuracy (%)
------------------------|--------------|--------------------|--------------------|-------------------
MobileNet_v1_1.0_224    | 569          | 4.24               | 70.7               | 89.5
MobileNet_v1_1.0_192    | 418          | 4.24               | 69.3               | 88.9
MobileNet_v1_1.0_160    | 291          | 4.24               | 67.2               | 87.5
MobileNet_v1_1.0_128    | 186          | 4.24               | 64.1               | 85.3
MobileNet_v1_0.75_224   | 317          | 2.59               | 68.4               | 88.2
MobileNet_v1_0.75_192   | 233          | 2.59               | 67.4               | 87.3
MobileNet_v1_0.75_160   | 162          | 2.59               | 65.2               | 86.1
MobileNet_v1_0.75_128   | 104          | 2.59               | 61.8               | 83.6
MobileNet_v1_0.50_224   | 150          | 1.34               | 64.0               | 85.4
MobileNet_v1_0.50_192   | 110          | 1.34               | 62.1               | 84.0
MobileNet_v1_0.50_160   | 77           | 1.34               | 59.9               | 82.5
MobileNet_v1_0.50_128   | 49           | 1.34               | 56.2               | 79.6
MobileNet_v1_0.25_224   | 41           | 0.47               | 50.6               | 75.0
MobileNet_v1_0.25_192   | 34           | 0.47               | 49.0               | 73.6
MobileNet_v1_0.25_160   | 21           | 0.47               | 46.0               | 70.7
MobileNet_v1_0.25_128   | 14           | 0.47               | 41.3               | 66.2
Choose the MobileNet model that fits your latency and size budget. The size of the network in memory and on disk is proportional to the number of parameters. The latency and power usage of the network scale with the number of Multiply-Accumulates (MACs), which counts the number of fused multiply-and-add operations. Top-1 and Top-5 accuracies are measured on the ILSVRC dataset.
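As a rule of thumb drawn from the MobileNets paper (an approximation, not an exact accounting), compute cost shrinks roughly with the square of the width multiplier and the square of the input resolution, which is easy to sanity-check against the table above:

```python
def scaled_macs(base_macs, width_multiplier=1.0, resolution_multiplier=1.0):
    """Approximate MAC count after applying MobileNet's two knobs.

    Multiply-accumulates scale roughly with the square of the width
    multiplier and the square of the input resolution.
    """
    return base_macs * width_multiplier ** 2 * resolution_multiplier ** 2

base = 569  # Million MACs for MobileNet_v1_1.0_224 (from the table above)

# Resolution scaling is almost exact: shrinking the input from 224 to 128
# predicts ~186 Million MACs, matching MobileNet_v1_1.0_128.
print(scaled_macs(base, resolution_multiplier=128.0 / 224.0))

# Width scaling is only approximate: a 0.50 multiplier predicts ~142
# Million MACs, while the table lists 150 for MobileNet_v1_0.50_224,
# because the first convolution and the classifier layers do not follow
# the square law exactly.
print(scaled_macs(base, width_multiplier=0.50))
```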
We are excited to share MobileNets with the open-source community. Information for getting started can be found at the TensorFlow-Slim Image Classification Library. To learn how to run models on-device, please go to TensorFlow Mobile. You can read more about the technical details of MobileNets in our paper, MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications.
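On device, inference typically runs against a frozen GraphDef rather than a training checkpoint. The snippet below is a hedged sketch of host-side inference against such a graph: the filename and the tensor names ('input' and 'MobilenetV1/Predictions/Reshape_1') are the ones commonly used for MobileNet v1 exports, but you should verify them against your own frozen graph before relying on them.

```python
import numpy as np
import tensorflow as tf

GRAPH_PB = 'mobilenet_v1_1.0_224_frozen.pb'  # illustrative filename

# Load the serialized GraphDef and import it into a fresh graph.
graph_def = tf.GraphDef()
with tf.gfile.GFile(GRAPH_PB, 'rb') as f:
    graph_def.ParseFromString(f.read())

graph = tf.Graph()
with graph.as_default():
    tf.import_graph_def(graph_def, name='')

# Assumed tensor names; check them against your export (for example with
# the summarize_graph tool) before using them.
input_tensor = graph.get_tensor_by_name('input:0')
output_tensor = graph.get_tensor_by_name('MobilenetV1/Predictions/Reshape_1:0')

with tf.Session(graph=graph) as sess:
    # A random image stands in for a real, preprocessed input in [-1, 1].
    image = np.random.uniform(-1, 1, size=(1, 224, 224, 3)).astype(np.float32)
    probabilities = sess.run(output_tensor, feed_dict={input_tensor: image})
    print(probabilities.argmax())  # index of the highest-scoring class
```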
Acknowledgements
MobileNets were made possible with the hard work of many engineers and researchers throughout Google. Specifically, we would like to thank:
Core Contributors:
Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam
Special thanks to:
Benoit Jacob, Skirmantas Kligys, George Papandreou, Liang-Chieh Chen, Derek Chow, Sergio Guadarrama, Jonathan Huang, Andre Hentz, Pete Warden