How Triplebyte solved its office Wi-Fi problems
By Mike Robbins on Sep 26, 2018
Our team just moved to a larger office in downtown San Francisco. On moving day, I was shocked to discover a bundle of rough-cut unterminated ethernet cables on one end, ripped-out punch-down jacks on the other, no uplink, and no Wi-Fi!
There’s no IT team at startups, and as software engineers, we might be called on to step up in a pinch. Here’s a smorgasbord of suggestions — some well-known and others obscure — that helped me get a reliable network running fast.
Use one SSID for automatic roaming
Multiple access points should share the same SSID. They must have exactly the same security settings (same password and same mode, e.g. WPA2 Personal, also called WPA2-PSK) for clients to be able to roam between APs automatically.
Multiple bands (2.4 GHz and 5 GHz) should also have the same SSID. Don’t put 5 GHz on its own SSID.
If you use one single SSID globally, clients will choose the best AP and channel to connect to automatically. If you use separate SSIDs but they’re within overlapping coverage range, users need to connect and disconnect manually. This is a bad user experience and will often lead to laptop users remaining marginally connected to an AP they’re barely within range of.
Also check out Apple’s docs on wireless roaming behavior for iOS and macOS.
Statically assign Wi-Fi channels
Automatic Wi-Fi channel selection is problematic because the AP can change channels whenever it likes, which causes a global disruption in service for connected clients.
Statically assign different access points to different channels.
Use a tool (such as the Wireless Diagnostics app built into macOS) to identify channels that aren’t in use by your next-door neighbors to avoid packet collisions and retransmits.
Use non-overlapping channels. On 2.4 GHz, this means you should only choose channels 1, 6, or 11. On 5 GHz, it’s a bit more complicated, but any channel number is OK and non-overlapping if you're following my advice about 20 MHz bandwidth and avoiding DFS frequencies below.
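To see why 1, 6, and 11 are the usual 2.4 GHz picks, here is a minimal Python sketch. It assumes the standard channel numbering (channel n is centered at 2407 + 5n MHz for channels 1 through 13) and a conservative 22 MHz occupied width per transmission; the function names are made up for illustration.

```python
from itertools import combinations

# Assumption: conservative legacy channel width; modern OFDM is nominally 20 MHz.
CHANNEL_WIDTH_MHZ = 22


def center_mhz(channel: int) -> int:
    """Center frequency of a 2.4 GHz channel (valid for channels 1-13)."""
    if not 1 <= channel <= 13:
        raise ValueError(f"unsupported 2.4 GHz channel: {channel}")
    return 2407 + 5 * channel


def overlaps(a: int, b: int) -> bool:
    """True if the occupied spectrum of two channels overlaps."""
    return abs(center_mhz(a) - center_mhz(b)) < CHANNEL_WIDTH_MHZ


def plan_is_clean(channels) -> bool:
    """True if no pair of channels in the plan overlaps."""
    return not any(overlaps(a, b) for a, b in combinations(channels, 2))


print(plan_is_clean([1, 6, 11]))  # True: centers are 25 MHz apart
print(plan_is_clean([1, 4, 8]))   # False: centers are only 15-20 MHz apart
```

Channels five apart sit 25 MHz from each other, which is just enough to clear the assumed width; anything closer bleeds into its neighbor.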
Use narrow bandwidths (20 MHz only)
If you’re in a large office building, force your AP radios to use only 20 MHz bandwidth on both 2.4 GHz and 5 GHz bands. (If you’re in a more suburban or rural setting and have few other transmitters in radio range, you may be able to increase this.)
Wider radio bandwidths of 40, 80, or 160 MHz definitely increase the maximum theoretical throughput — for example if you have only a single client and a single AP in a patch of rural farmland. But in a real-world noisy radio environment, a wider bandwidth also increases the probability of a radio packet loss, because a collision with a competing transmission on any sub-channel corrupts the entire signal — whether due to random noise or interference from another transmitter.
Collisions are incredibly expensive as both transmitters have to back off and retry sending the packet again. In a noisy environment it may take a while before the wider bandwidth channel is entirely clear again, and devices may spend yet more time negotiating their channel bandwidths up and down dynamically.
Another way to think about this is in terms of signal-to-noise ratio (SNR). Using a wider bandwidth spreads the same signal power over a wider slice of spectrum, so there's lower power spectral density (W/Hz), and higher total noise integrated over the wider frequency band.
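The noise half of that argument can be put into rough numbers. The sketch below computes only the ideal thermal noise floor (kTB) at an assumed reference temperature of 290 K; it ignores receiver noise figure and interference from other transmitters, so treat the absolute values as a lower bound and focus on the differences between bandwidths.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23
TEMPERATURE_K = 290.0  # assumption: conventional reference temperature


def thermal_noise_dbm(bandwidth_hz: float) -> float:
    """Ideal thermal noise power (kTB) in dBm for the given bandwidth."""
    noise_watts = BOLTZMANN_J_PER_K * TEMPERATURE_K * bandwidth_hz
    return 10 * math.log10(noise_watts / 1e-3)


for mhz in (20, 40, 80, 160):
    print(f"{mhz:>3} MHz: {thermal_noise_dbm(mhz * 1e6):.1f} dBm")
```

Each doubling of bandwidth raises the noise floor by about 3 dB, so at the same transmit power a 160 MHz channel starts roughly 9 dB worse off than a 20 MHz one.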
Optimize your user experience for minimal packet loss, rather than maximum theoretical throughput.
Avoid DFS channels
On the 5 GHz band, there is a set of extra channels made available through Dynamic Frequency Selection (DFS). This is a complicated process in which the AP listens for radar (such as weather radar) on those frequencies and changes channels if it detects any.
The problem is that, by law, any detected interference must trigger the AP to stop transmitting immediately and go to a listen-only mode to find a new, unused channel. Obviously, this silencing behavior makes for a bad user experience.
To compound this problem, DFS channels may not be clearly marked on your AP’s configuration! The macOS Wireless Diagnostics tool shows the DFS status in the Info pane.
In summary, just avoid using channels 52 through 144 on the 5 GHz band.
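As a quick reference, this Python sketch applies the "avoid 52 through 144" rule to a list of 20 MHz channel numbers. The channel list is an assumption based on what is commonly offered in the US; check what your own AP and regulatory domain actually allow.

```python
# Assumption: 20 MHz channel numbers commonly available on 5 GHz in the US.
US_5GHZ_20MHZ_CHANNELS = [
    36, 40, 44, 48, 52, 56, 60, 64,
    100, 104, 108, 112, 116, 120, 124, 128,
    132, 136, 140, 144, 149, 153, 157, 161, 165,
]


def is_dfs(channel: int) -> bool:
    """True if the channel falls in the 52-144 range to avoid."""
    return 52 <= channel <= 144


def non_dfs_channels(channels=US_5GHZ_20MHZ_CHANNELS):
    """Channels that can be statically assigned without DFS behavior."""
    return [ch for ch in channels if not is_dfs(ch)]


print(non_dfs_channels())  # [36, 40, 44, 48, 149, 153, 157, 161, 165]
```

That leaves nine non-overlapping 20 MHz channels to spread across access points.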
Learn to crimp and punch ethernet
I found existing ethernet cables running through the walls and floors, but they were all unterminated on both ends.
For male cable ends: Buy an RJ-45 crimping tool and a bag of connectors. Cleanly cut the cable end. Trim away about 3/4 inch of the outer jacket without nicking the wires inside. Un-twist the twisted pairs and straighten the wires. Arrange them in a row in the correct order (orange-white, orange, green-white, blue, blue-white, green, brown-white, brown) with the wires facing toward the ceiling. Trim them all flush in a single cut. Slide the plug onto the wires with the latching tab facing away from you until the wires are flush with the end of the plug. Insert into the crimping tool, and crimp firmly. Double check the color ordering, that all pins are seated and have punctured down through the wire insulation, that the strain relief grips the cable jacket, and that the pins and plastic are undamaged.
For female keystone jacks: Trim away the outer jacket. Un-twist the twisted pairs. Bend them at approximately a 90 degree angle to reach the appropriate teeth on the left or right sides of the jack. Use a punch-down tool (or even a small flat-head screwdriver) to push each wire down firmly until the teeth engage, puncturing through the insulation and grabbing the wire.
Both of these skills take some physical practice and finesse. Don’t expect to produce working cables on your first try.
Always use EIA-568-B color coding
If you search for the specific mapping of wire colors to pins for ethernet cables, you’ll find two color-code standards: EIA-568-A and EIA-568-B. These color codes are also usually marked on female keystone jacks.
Always use B. It’s the de facto standard. Never use A unless you’re intentionally making a crossover cable, and even that is effectively obsolete now that gigabit devices support Auto MDI-X, which crosses over automatically within the device when needed.
Definitely don’t invent your own color ordering. It’s important for signal integrity to keep particular twisted wire pairs together.
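For a written-down reference to compare against while crimping, here is a small Python sketch. It encodes the B ordering given earlier, derives A by swapping the orange and green pairs (which is the only difference between the two standards), and reports which pins of an observed plug disagree with B. The helper names are hypothetical.

```python
# Pin 1 through pin 8, in the B ordering.
T568B = ["orange-white", "orange", "green-white", "blue",
         "blue-white", "green", "brown-white", "brown"]


def swap_orange_green(order):
    """Return the ordering with the orange and green pairs exchanged."""
    swap = {"orange": "green", "green": "orange"}
    result = []
    for color in order:
        base, _, suffix = color.partition("-")
        base = swap.get(base, base)
        result.append(base + ("-" + suffix if suffix else ""))
    return result


T568A = swap_orange_green(T568B)


def wrong_pins(observed, standard=T568B):
    """1-based pin numbers whose color differs from the standard."""
    return [pin for pin, (got, want)
            in enumerate(zip(observed, standard), start=1) if got != want]


print(T568A)
print(wrong_pins(T568A))  # [1, 2, 3, 6]
```

A plug accidentally wired as A differs from B on pins 1, 2, 3, and 6, so those are the positions to look at first when a cable tests badly.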
Always test for gigabit
Ethernet ports autodetect whether to use 100 or 1000 Mbps when they’re first connected. A 100 Mbps link (100BASE-TX) uses only two of the cable's four wire pairs, so counting both ends of a cable, it requires only 8 of the 16 physical connections to be made successfully. A working 1000BASE-T (gigabit) connection requires all 16 of 16!
If you’re new to making cables — or even if you’re experienced — it’s easy to have 1 not-quite-right connection every now and then. If you don’t explicitly test, you’ll end up with a seems-OK-but-actually-degraded user experience.
You can check this with something like ifconfig | grep media, or you can look at the LEDs on some ethernet switches.
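If you want to automate that check, a minimal Python sketch might parse the media line and flag anything below gigabit. The sample strings are assumptions modeled on typical macOS ifconfig output; the exact format varies by operating system, so adjust the pattern to what your machines print.

```python
import re

# Matches e.g. "media: autoselect (1000baseT <full-duplex>)" and captures 1000.
MEDIA_RE = re.compile(r"media:.*?\((\d+)base", re.IGNORECASE)


def link_speed_mbps(ifconfig_output: str):
    """Negotiated link speed in Mbps, or None if no speed is reported."""
    match = MEDIA_RE.search(ifconfig_output)
    return int(match.group(1)) if match else None


def is_gigabit(ifconfig_output: str) -> bool:
    """True if the interface negotiated at least 1000 Mbps."""
    speed = link_speed_mbps(ifconfig_output)
    return speed is not None and speed >= 1000


# Illustrative samples, not captured from a real machine.
sample_good = "\tmedia: autoselect (1000baseT <full-duplex>)\n"
sample_bad = "\tmedia: autoselect (100baseTX <full-duplex>)\n"

print(is_gigabit(sample_good))  # True
print(is_gigabit(sample_bad))   # False
```

A cable that negotiates 100 Mbps still "works", which is exactly why an explicit check like this is worth running on every new drop.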
Note: if you have wireless clients connecting to an AP that seem to be maxing out internet speed tests almost exactly at the ~90-100 Mbps level, you probably have a bad ethernet connection to the AP. I'm speaking from experience on this one!
Identify power users and provide a wired option
I identified two types of internet power users at Triplebyte:
On one end of the spectrum, our engineers are power users with large bulk downloads — tens-of-gigabyte downloads are typical. Throughput is more important than latency.
On the other end, we have several teams with demand for low-latency interactive voice and video conferencing. Our customer success and sales teams communicate with companies hiring engineers. Our talent managers communicate with engineers going through our process. Our technical interviewers communicate with engineers signing up at the start of our process. All of these teams use software like Google Hangouts, Twilio (SIP), and Zoom. Latency and packet loss are far more important than throughput.
Make wired gigabit ethernet easily available for power users. In my case, this involved wiring up every engineer’s desk plus all 11 conference and call rooms and distributing a stockpile of USB-C gigabit ethernet adapters. This lets power users guarantee a trouble-free network connection for themselves, and it removes high-priority traffic from the shared radio bands.
Physically protect networking equipment
Our fiber-to-ethernet transceiver, primary router, and gigabit ethernet switch are located in a cabinet that looks incredibly tempting to use for general office storage. It’s easy to imagine office supplies and extra company swag piling up on top!
Post signs and notify people that this is fragile, important equipment — not a general storage area.
Use static IPs for infrastructure
Assign static IPs for infrastructure like access points. This makes them easy to reach when reconfiguration is needed, and avoids them having to pull their own addresses dynamically.
Have a large enough DHCP pool
We currently have a full-time team of 30, plus a rotating crew of remote technical interviewers who come in to our office for a few weeks of training before they go home to conduct interviews for us remotely. I expect each person to have at least two devices: laptop and phone. That's 60-80 IPs right off the bat.
Many routers are configured with a fairly small IP pool allocated to DHCP out of the box. This will cause issues if the DHCP pool is exhausted. I configured our primary router to reserve 200 IPs for DHCP, leaving us about 50 for static IP assignments. We use a 1-day DHCP lease time so unused addresses can be returned to the pool fairly quickly, while not requiring DHCP renewals during the workday.
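The arithmetic behind that split can be sketched in Python with the standard ipaddress module. The 192.168.1.0/24 network is a hypothetical example, and putting the static block at the low end of the range is just one reasonable convention.

```python
import ipaddress


def split_pool(cidr: str, dhcp_size: int):
    """Split a network's usable hosts into (static, dhcp) address lists.

    The DHCP pool takes the last dhcp_size usable addresses; everything
    before it is left for static assignments.
    """
    hosts = list(ipaddress.ip_network(cidr).hosts())
    if not 0 <= dhcp_size <= len(hosts):
        raise ValueError(f"DHCP pool of {dhcp_size} does not fit in {cidr}")
    cut = len(hosts) - dhcp_size
    return hosts[:cut], hosts[cut:]


# Hypothetical LAN: a /24 has 254 usable addresses.
static, dhcp = split_pool("192.168.1.0/24", 200)
print(len(static), "static addresses:", static[0], "-", static[-1])
print(len(dhcp), "DHCP addresses:", dhcp[0], "-", dhcp[-1])
```

With 200 addresses reserved for DHCP, a /24 leaves 54 for static assignments, one of which goes to the router itself.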
If you have multiple APs, make sure only a single device (usually your primary router which is doing NAT) is configured as a DHCP server.
Use fast DNS servers
When wired or wireless clients request an IP address via DHCP, they're also provided with DNS servers to use.
I found Cloudflare's 1.1.1.1 and 1.0.0.1 provided the fastest query resolution at about 2ms, versus about 20ms for Google's 8.8.8.8 and 8.8.4.4, so I used these in the DHCP server configuration.
Another option is to use your primary router as a caching DNS resolver, and then point all clients to it. On paper this makes a lot of sense as it can locally cache for faster performance. However, the public DNS servers are so fast that I decided to have each client use them directly, avoiding any potential local caching issues.
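To reproduce resolver timings like the ones above on your own network, one option is to send a raw DNS query over UDP and time the round trip. This Python sketch is illustrative only: the function names are made up, it handles a single A-record question with no retries, and it does not parse the response.

```python
import socket
import struct
import time


def build_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for the A record of hostname."""
    # Header: ID, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def time_query_ms(server: str, hostname: str = "example.com",
                  timeout: float = 2.0) -> float:
    """Round-trip time in milliseconds for one UDP query to server."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.perf_counter()
        sock.sendto(build_query(hostname), (server, 53))
        sock.recvfrom(512)  # wait for any reply; contents are ignored
        return (time.perf_counter() - start) * 1000
```

Calling time_query_ms("1.1.1.1") and time_query_ms("8.8.8.8") a few dozen times each and comparing medians gives a fairer picture than a single query, since individual lookups vary.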
Check yourself on GeoIP
Our new public IP address doesn't have our San Francisco location in the GeoIP database — it just says “United States.” I suspect that geographic-based DNS lookups and CDNs are giving us worse-than-optimal performance because of this. We requested a GeoIP database correction, but it hasn't been accepted yet.
This issue also led to tools like Speedtest connecting to the wrong test servers many states away and severely underestimating performance.
Speed up S3 downloads with multiple parallel connections
I regularly download datasets from S3 which are single files that are gigabytes to tens of gigabytes. I found that single stream downloads from S3 are fairly slow — nowhere close to maximizing our downlink. The download can be split into multiple segments, downloaded concurrently, and reassembled, making the overall download substantially faster.
For example, this downloads with 16 streams in parallel (brew install aria2 to install):
aria2c -x 16 -s 16 -k 4M -o "${OUTPUT_FILENAME}" "${DOWNLOAD_S3_URL}"
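For a sense of what a segmented download involves, here is a rough Python sketch of the split, fetch, and reassemble idea using HTTP Range requests. It is not how aria2 is implemented: the function names are hypothetical, the file size is assumed to be known already (for example from a Content-Length header), and the whole file is held in memory, which is fine for illustration but not for tens of gigabytes.

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request


def split_ranges(total_bytes: int, segments: int):
    """Split [0, total_bytes) into contiguous inclusive (start, end) ranges."""
    if total_bytes <= 0 or segments <= 0:
        raise ValueError("total_bytes and segments must be positive")
    segments = min(segments, total_bytes)
    base, extra = divmod(total_bytes, segments)
    ranges, start = [], 0
    for i in range(segments):
        size = base + (1 if i < extra else 0)  # spread the remainder evenly
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def fetch_range(url: str, byte_range) -> bytes:
    """Download one inclusive byte range with an HTTP Range header."""
    start, end = byte_range
    request = urllib.request.Request(
        url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(request) as response:
        return response.read()


def parallel_download(url: str, total_bytes: int, segments: int = 16) -> bytes:
    """Fetch all ranges concurrently and join them in order."""
    ranges = split_ranges(total_bytes, segments)
    with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
        parts = pool.map(lambda r: fetch_range(url, r), ranges)
        return b"".join(parts)


print(split_ranges(10, 3))  # [(0, 3), (4, 6), (7, 9)]
```

In practice a dedicated tool like aria2 is the better choice, since it also writes segments to disk and retries failed pieces.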
Label and document everything
While the previous office tenants left me with dozens of hastily cut ethernet cables coming out from a wall panel, I'm thankful that the cables were individually numbered and labeled with permanent marker — phew!
I created a shared Quip document detailing configuration settings for all APs and routers, screenshots, static IP assignments, wireless channel map, and upstream provider support contact info.
The results
For throughput: the Speedtest numbers speak for themselves. I see a symmetric 940 Mbps on wired and generally 100-150 Mbps wireless over our entire office.
For reliability: we definitely had issues the first few days, but after debugging a few issues described above (DFS channels, 20 MHz radio bandwidth, and one AP on a non-gigabit ethernet cable), I'm happy to report that the network is basically trouble-free — i.e. no more complaints from the sales team!
I highly recommend the High Density Wi-Fi Deployment Guide from Meraki if you're interested in reading more.
If you're a software engineer interested in seeing offers from top tech companies that already have working Wi-Fi, you should take the Triplebyte quiz — and if you care about compensation, see our software engineer salary data.