DeepDrive
self-driving car AI
Motivation
Self-driving cars are poised to become the most impactful technology of the 21st century. Yet, to work on them, you must join one of the few companies that have taken on the task. Even then, you would be working in isolation from other groups doing the same thing. This does not have to be the case. Linux, Postgres, and ROS are just a few examples of a different approach, one where companies, researchers, and individuals work together on foundational technology to provide a common platform for bringing better products to market. DeepDrive aims to be such a platform for self-driving cars powered by deep learning. By combining a highly realistic driving simulation with hooks for perception and control, DeepDrive gives anyone with a computer the opportunity to build better self-driving cars.
The current generation of self-driving cars struggles in urban settings at speeds above 25 mph, requiring people to be ever-ready at the wheel. The problem is aggravated by the high risk of getting into an accident and the resulting limits on testing new approaches on real roads. Modern video games like GTAV, however, present a world where you can test self-driving car AIs in large, complex urban areas replete with realistic roads, weather, pedestrians, cyclists, and vehicles, at zero risk or cost. Another important advantage of this type of simulation is that you can quickly run the car through a barrage of safety-critical situations, some of which may only occur once every several million miles in reality. This dramatically decreases the time involved in properly vetting cars while giving transparency to the decisions different types of AIs will make.
The high fidelity and vast open world of GTAV also make it the most complex virtual RL environment to date for testing sensorimotor AI. This enables a new level of safety testing for AI, as existing environments don't offer the same opportunities for reward hacking, distributional shift, and negative side effects. Finally, developing self-driving cars out in the open provides a level of transparency not usually seen in AI applications, in an area where it is crucial to have visibility into both the safety and correctness of the system.
Baseline model
An initial 8-layer neural net with the AlexNet architecture is being made available, along with the dataset it was trained on. Training was done on raw image input from a forward-mounted camera, regressed against steering, throttle, yaw, and forward-speed control values produced by an in-game AI.
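As a rough sketch of what that setup amounts to, something like the following would do; PyTorch/torchvision and the exact target ordering here are assumptions for illustration, not the released code (the actual model definition and hyperparameters ship with the download):

```python
import torch
import torch.nn as nn
from torchvision.models import alexnet

NUM_TARGETS = 4  # steering, throttle, yaw, forward speed (ordering assumed)

model = alexnet(pretrained=True)                     # start from ImageNet weights
model.classifier[-1] = nn.Linear(4096, NUM_TARGETS)  # regression head instead of class scores

criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.002, momentum=0.9)

def train_step(images, targets):
    """One supervised step: a batch of camera frames in, control values out."""
    optimizer.zero_grad()
    loss = criterion(model(images), targets)
    loss.backward()
    optimizer.step()
    return loss.item()
```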
The model is able to steer the car to stay in the lane and stop for other cars, and it works well in a variety of weather and lighting conditions, as shown below:
Demo
Download
Work is in progress on an Amazon Machine Image as well as integration with OpenAI Gym to facilitate easy experimentation in the simulator.
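A hypothetical sketch of what the Gym integration could look like once it lands; the environment id, observation format, and action layout below are assumptions, not a released API:

```python
import gym

env = gym.make('DeepDrive-v0')            # hypothetical environment id
observation = env.reset()                 # e.g. the forward-camera frame

for _ in range(1000):
    action = env.action_space.sample()    # stand-in for a trained agent's output
    observation, reward, done, info = env.step(action)
    if done:
        observation = env.reset()
env.close()
```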
Please let me know what other types of setups and environments you'd like supported at craig@deepdrive.io.
Tips and Tricks
- Adding examples of course correction to the training data is crucial. NVIDIA does this by simulating rotation of real-world images, and ALVINN created its own simulation to add variety to its training. Since we are already in simulation, course correction can be added by stopping recording, steering the car off course, and recording the actions and images taken during the correction. This was done at three levels of severity: levels one and two consisted of driving the car with a previous model for one and two seconds respectively, then recording the corrective actions taken by the in-game AI; the most severe level consisted of performing a random action (hard right, hard left, brake, or strong accelerate) and relinquishing control to the in-game AI after 230ms. A rough sketch of this recording scheme appears after this list.
- Training takes less than one day on a GeForce 980, starting from pretrained ImageNet weights, using 0.002 as the base learning rate and decaying it by 10x when performance plateaus. See the model for all hyperparameters used.
- Don't mirror images without mirroring targets. Horizontally asymmetric targets like steering, rotational velocity, and direction need to be negated in order to stay consistent with mirrored images. Mirroring was not done in the initial release, but is likely a good direction for improved generalization (a minimal augmentation sketch follows this list).
- Predicting steering, speed, etc. ahead of time works by adding future targets to the last fully connected layer, but causes more overfitting (e.g. 2x worse test performance with 3 frames, or 775ms, of advance data). So if you predict the future, it's probably a good idea to add more regularization. Only the current desired state of the vehicle is output by the initial model; however, support for arbitrary prediction duration is provided (see the widened-output sketch after this list).
- Feeding the current steering, speed, etc. as input to the net causes over-reliance on these inputs vs. the images, so follow the advice from ALVINN and feed random values to these units during early stages of training so that the image garners more influence over the output. This was also not done in the initial release; however, examples are provided on how to do so (a warm-up sketch follows this list).
- DeepDriving at Princeton suggests that detecting lane markers and other cars to determine the desired steering and throttle is a more tractable strategy than directly inferring steering and throttle from the image. It's possible to get lane and nearby-car locations from GTAV for this approach, although it is not yet implemented.
- The car currently has no preference for turning left or right and will wobble until a certain direction is obvious. To combat this, adversarial manipulation of input pixels could be used to perturb activations towards a desired direction (a possible sketch follows this list).
- Reinforcement learning is a very promising direction for future improvement. A DQN was attempted before supervised learning, but the agent developed unsatisfyingly circuitous paths to a goal given on-road/off-road and distance-based rewards. Using supervised pre-training and more precise rewards will likely improve this.
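For the course-correction tip, here is a rough sketch of the three-severity recording scheme described above; the simulator calls (pause_recording, drive_with_model, perform_action, and so on) are placeholders rather than the actual DeepDrive interface:

```python
import random

def record_course_correction(sim, severity, previous_model=None):
    """Drift off course, then record the in-game AI correcting back."""
    sim.pause_recording()
    if severity in (1, 2):
        # Levels one and two: drive with a previous model for 1 or 2 seconds.
        sim.drive_with_model(previous_model, seconds=float(severity))
    else:
        # Most severe level: random action, then hand control back after 230 ms.
        action = random.choice(['hard_left', 'hard_right', 'brake', 'accelerate'])
        sim.perform_action(action, seconds=0.23)
    sim.resume_recording()
    sim.record_in_game_ai_correction()  # capture frames and corrective controls
```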
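For the mirroring tip, a minimal augmentation sketch; which target indices count as horizontally asymmetric is an assumption for illustration:

```python
import numpy as np

ASYMMETRIC_TARGETS = [0, 2]  # e.g. steering and yaw; indices are illustrative

def mirror(image, targets):
    """Flip an H x W x C frame left-right and negate direction-sensitive targets."""
    flipped = image[:, ::-1, :].copy()
    mirrored = np.array(targets, dtype=np.float32)
    mirrored[ASYMMETRIC_TARGETS] *= -1
    return flipped, mirrored
```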
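For the ahead-of-time prediction tip, widening the regression head is roughly all it takes; the framework, frame count, and dropout value here are assumptions:

```python
import torch.nn as nn
from torchvision.models import alexnet

NUM_TARGETS = 4
FRAMES_AHEAD = 3                 # roughly 775 ms of advance predictions

model = alexnet(pretrained=True)
model.classifier[-1] = nn.Linear(4096, NUM_TARGETS * (1 + FRAMES_AHEAD))
model.classifier[0] = nn.Dropout(p=0.7)   # heavier dropout to counter the added overfitting
```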
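For the state-input tip, a sketch of the ALVINN-style warm-up; the shapes and schedule are assumptions:

```python
import torch

def maybe_randomize_state(state, step, warmup_steps=10000):
    """state: a batch of current [steering, speed, ...] values fed alongside the image."""
    if step < warmup_steps:
        return torch.rand_like(state) * 2 - 1   # uniform noise in [-1, 1]
    return state
```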
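For the wobble tip, one way such an adversarial nudge could look (an FGSM-style step; the epsilon, steering index, and interface are assumptions):

```python
import torch

def nudge_towards(model, image, desired_steering, epsilon=0.01, steering_idx=0):
    """Perturb input pixels so the predicted steering moves toward a chosen direction."""
    image = image.clone().requires_grad_(True)
    pred = model(image)
    loss = (pred[:, steering_idx] - desired_steering).pow(2).mean()
    loss.backward()
    return (image - epsilon * image.grad.sign()).detach()  # step that reduces the gap
```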