Inside AI – If I Had a Hammer: Skydio’s AI, Deep Learning and other Autonomous Solutions Let Users Concentrate on the Mission

Hayk Martiros is VP of Autonomy at Skydio, which in 2021 became the first U.S. drone manufacturer to exceed $1 billion in value. One of the company’s first employees after it launched in 2014, Martiros now leads 50 researchers and engineers in developing and deploying AI and other autonomy tools for precise video capture, infrastructure inspection and situational awareness. His goal is to use robust “algorithmic approaches to visual autonomy to enable widespread impact of drones as trustworthy and intelligent tools.” He also is the “initial creator” of Symforce.org, an open source library for symbolic computation, code generation and nonlinear optimization in robotics that went public earlier this year.

Over time, Skydio has been described in different ways: a camera-carrying flying robot; a drone services company; today’s largest U.S. drone manufacturer. Hayk Martiros, VP of autonomy for the San Mateo, California-based company, mentioned each of these representations. But he added an intriguing duo.

“A hammer.” Or, if you prefer, “a wrench.”

Skydio’s concept is to take the human pilot out of flying, synthesizing high-tech AI, deep learning and the like so users can concentrate on the tasks at hand. “The core premise,” Martiros said, “is that we make the drones and the autonomy, and the systems around that do things easier and safer and better. We want to cross the gap into really mainstream use, where it’s just a tool. That’s what it means to really succeed. It sounds undramatic, but it’s actually the secret sauce.”

Skydio’s 3D Scan™ creates an autonomous “mesh” to build a high resolution model out of thousands of photos.

TECH FROM THE START

Skydio initially focused on consumer cinematography—“a Hollywood film crew in your backpack,” as Martiros put it. The initial idea was to build “an intelligent and trustworthy flying robot that has a camera, and that uses vision-based autonomy. Those cameras let you see and understand the world, build 3D maps of it, and then plan actions in a way that makes tasks, flying and automated stuff easier. We ended up building the hardware ourselves, as well as the autonomy and the sensors, and just vertically integrating everything to work well together.”

Intrinsic hurdles had to be overcome to field competent sUAS such as Skydio’s X2 variants, which are optimized for tasks such as situational awareness and infrastructure inspection. “A drone is a really complex product,” Martiros explained. “It needs to be optimized for basically every axis. You really care about size, weight, battery life, noise, cost, safety. It has to be stable; it has to do its job. And getting information from images automatically is very difficult.”

He voiced another challenge. “Every aspect to make a mass-produced effective drone that is autonomous—supply chain, manufacturing, hardware design, electronic design of the sensors, operating system, high-level algorithms, APIs, mobile apps—it crosses a vast stretch of approaches.”

Enter breakthrough technologies such as AI, computer vision and robotics. “The field of computer vision AI has developed tremendously in the last decade,” Martiros said. “I think we kind of grew along with that. To learn from data, and improve by seeing failures and feeding that into training an algorithm, is incredibly powerful. But there are a lot of systems around it for geometric processing, nonlinear optimization, the design of an architecture of a robot that’s interacting with humans and supposed to be intelligent and trustworthy, but also reliable, stable, understandable, secure. So it is really intertwined.”

A foundational solution that emerged is Skydio Autonomy™, which rapidly uploads more than a million data points a second to create 3D models of the surroundings. “We’ve optimized a tremendous amount of our workflows and process around, ‘How do we make that loop quicker?’ ‘How do we move fast?’ The ability of working in simulation, taking an idea, scaling it out, testing it on a large amount of log data, real flight testing, metrics—that workflow is really, really key. For example, we were the first to use deep learning for an obstacle avoidance system, and that took probably four or five of our best people, about two years really dedicated heads-down effort.”

Skydio, Martiros reiterated, has benefitted by integrating autonomy from the beginning. “It’s really hard to take an existing robot or a platform and make it more autonomous. The entirety of our drones are designed from the very first concept to support fully autonomous operation.”

Skydio’s autonomy tools allow its sUAS drones to perform ultra-close inspections.

DEPLOYING THE TECHNOLOGY

Per its cinematographic roots, Skydio’s approach begins with the cameras. “There are six of them, three on top and three on the bottom,” Martiros noted. “Each has a 200-degree field of view, so each can see more than an entire hemisphere. And that gives kind of 360-degree coverage in space, and then in time as well.

“You basically have to start cobbling together an understanding of the world from that data.”

This involves two threads. “One is the geometric thread, where basically you’re comparing pixels between the images and finding matches—you know, row 100, column 100 is the same thing in the world as pixel 150-50.”

Getting beyond that requires resolving a computational issue. “They vary because of position and timing, because the drone is moving. That’s the most fundamental problem. We use a lot of deep networks for this task. If you can find a lot of those matches, you can start triangulating the location of things. So, the SLAM [simultaneous localization and mapping] of the process over a whole bunch of those matches and a whole bunch of camera images: you’re solving for the geometry of the 3D world as well as your own motion.”
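
To make the triangulation step concrete, here is a minimal Python sketch of how a single pixel match between two views can be turned into a 3D point. It is an illustration under simplifying assumptions, not Skydio's implementation: the camera positions and viewing-ray directions are taken as already known, and the function name `triangulate_midpoint` and its helpers are hypothetical. A full SLAM system would also solve for the camera motion itself.

```python
# Illustrative sketch only: midpoint triangulation of one matched pixel pair.
# Assumes each camera's position and the viewing ray through the matched
# pixel are already known; all names here are hypothetical.

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def scale(a, s):
    return tuple(x * s for x in a)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def triangulate_midpoint(c1, d1, c2, d2):
    """Return the 3D point closest to two viewing rays.

    c1, c2: camera centers; d1, d2: ray directions through the matched pixels.
    """
    w = sub(c1, c2)
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; depth cannot be recovered")
    s = (b * e - c * d) / denom   # distance parameter along ray 1
    t = (a * e - b * d) / denom   # distance parameter along ray 2
    p1 = add(c1, scale(d1, s))    # closest point on ray 1
    p2 = add(c2, scale(d2, t))    # closest point on ray 2
    return scale(add(p1, p2), 0.5)

# Two cameras one meter apart, both seeing the same feature.
point = triangulate_midpoint((0.0, 0.0, 0.0), (1.0, 2.0, 5.0),
                             (1.0, 0.0, 0.0), (0.0, 2.0, 5.0))
print(point)
```

With noisy matches the two rays rarely intersect exactly, which is why the sketch returns the midpoint of their closest approach rather than an intersection.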

Enter deep networks. “A lot of deep networks that are finding these matches between images, they’re trained on a combination of synthetic and real data to accomplish this task in very difficult environments, such as bright sun glare. Our cameras are rolling shutter, which means that as you expose the image, every row is taken at a different time. So, if the drone’s rotating quickly, you get this ‘whirly’ effect. You really have to model and account for that to understand 3D correctly, so there’s a lot of deep learning, a lot of geometric modeling code and nonlinear optimization code, and volumetric processing kernels that all work together to support these tasks.”
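
The rolling shutter effect Martiros describes can be sketched in a few lines of Python. This is a simplified illustration, not Skydio's model: it assumes a constant yaw rate during readout and a pinhole camera (the wide-angle lenses described above would need a different projection), and every name and number below is hypothetical.

```python
import math

# Illustrative sketch only: how far a feature appears to slide sideways
# because each image row is exposed slightly later than the one above it.
# Assumes constant yaw rate and a pinhole camera; names are hypothetical.

def row_capture_time(frame_start_s, row, line_time_s):
    """Time at which a given row is read out."""
    return frame_start_s + row * line_time_s

def yaw_at_row(row, line_time_s, yaw_rate_rad_s):
    """Yaw accumulated between the first row and this row."""
    return yaw_rate_rad_s * row * line_time_s

def horizontal_shift_px(row, line_time_s, yaw_rate_rad_s, focal_px):
    """Approximate column shift, relative to the first row, of a point near the image center."""
    return focal_px * math.tan(yaw_at_row(row, line_time_s, yaw_rate_rad_s))

# Hypothetical sensor: 1000 rows, 30 microseconds per row, drone yawing at 2 rad/s.
shift = horizontal_shift_px(row=999, line_time_s=30e-6,
                            yaw_rate_rad_s=2.0, focal_px=400.0)
print(f"bottom row is skewed by about {shift:.1f} px relative to the top row")
```

Even in this toy case the bottom of the frame is displaced by a couple dozen pixels relative to the top, which is why the skew has to be modeled before pixel matches can be trusted for 3D geometry.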

Skydio has expanded beyond its consumer roots to provide solutions for enterprise, public safety and defense users.

CHOOSING COMPONENTS WISELY

The ability to use automation technology has benefitted from others’ advances in GPUs and miniaturization. “Part of the AI revolution that’s supported all this is just bringing down the cost,” Martiros said. “You see GPT models, large language models, be trained on massive GPU clusters because the cost has come down. Nvidia and Qualcomm, specifically, just massively invested in this, so we’re riding that wave.”

For Martiros, developments in the software ecosystem have been even more important than hardware progress. “How flexible the whole software ecosystem around these platforms is really key to saying, ‘Here’s how our team can move faster, iterate our algorithms, update changes.’ The really nice thing about deep learning specifically is it’s made up of some really common modular components. People stack up deep networks with layers and layers of these, and you can build a lot of creative, interesting things. And they run really fast, because each of those modular blocks has been accelerated dramatically in the hardware.”

Skydio’s goal is to merge sophisticated autonomy with simplicity of use.

BRIDGING THE GAP

Martiros offered an anecdote to illustrate autonomy in action.

“Put yourself in the shoes of someone who works at, say, the New York Department of Transportation. You’ve got a bridge out near Albany. It’s a pretty big bridge, a lot of concrete, and it needs to get inspected. So, you need imagery of the underside, over a river, detailed enough that you can look for millimeter-sized cracks.”

Pre-drone, Martiros said, there were “a lot of bad options,” such as descending in traffic-stopping bucket trucks or photographing upward from a boat. “The ability of a small electric drone that can fly in a GPS-denied environment like we can, and get up close and personal to the bridge, is just so much better of an experience.” That said, Martiros reiterated a foundational need: “to trust the drone isn’t going to crash. That’s why we say we have the best obstacle avoidance in the world. You’re controlling the camera; you’re not controlling a robot. You want all these things integrated together, effortless, so the camera doesn’t crash.”

Skydio also aims to automate collecting voluminous photos through 3D Scan™ to generate high resolution 3D models.

“It’s one of my favorite things I ever worked on,” Martiros said about the product. “The basic premise is that you not only don’t have to be a drone pilot expert, you don’t have to be a photogrammetry expert.” First, the drone makes a low resolution 3D model. “Then you specify the high-level parameters you want. You set these pillars in AR [augmented reality] so on your screen you see the AR volume you’re drawing, you drop a few of them around, and then you set a floor and a ceiling and say, ‘Go.’

“The drone zips around and it’s building a mesh. With the ‘visual observer view,’ the drone takes a photo at some angle and you’re viewing that photo as if you had a camera up there. The drone basically aims to capture those thousands of photos totally on its own, because it knows what makes a good 3D model.” This “tailor-made flight plan,” he said, is superior to a 2D lawnmower pattern that won’t capture the “sides.”

The result realizes Skydio’s ironic goal of using complex tech to simplify tasks. “The pilot isn’t on the sticks, trying to remember what’s happening,” Martiros said. “The more autonomy you have, the more you’re focused on the mission.”

PROCESSING THE DATA

Skydio’s visual navigation system yields quantities of metadata that verify where the photos were taken, which makes them easier to organize later. Skydio Cloud Media Sync automatically transfers, configures and manages uploads to the cloud over Wi-Fi, where that option exists. “You can pull that data, or feed it into a photogrammetry engine, build a high resolution 3D model and then pull that data,” Martiros said. “Increasingly in our cloud, you’ll be able to review results yourself and plan new missions.”

This is vital, because, as Martiros noted, drone use includes “an unbelievable amount of edge cases,” such as being under that proverbial bridge when it’s getting really dark, or flying into a cloud. “One aspect is just having a really robust idea of all the things that can go wrong with the drone, and the reasonable fallbacks. The tricky thing about the fallbacks is the way you need to respond varies by the use case: stay still and wait for the user to intervene, or backtrack a certain way.” That can be a major design challenge. “From the algorithmic perspective, if we’ve got an issue and we fixed it, how do we know we didn’t make five other things worse? Having a benchmark, scenarios and real logs without a lot of manual effort feels like we’ve got our bases covered.”
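
The “did we make five other things worse?” question is, at its core, a regression check across benchmark scenarios. The Python sketch below shows one simple way such a check might look. It is purely illustrative: the scenario names, the scores and the `find_regressions` helper are invented for this example and say nothing about how Skydio's tooling is built.

```python
# Illustrative sketch only: flag benchmark scenarios where a candidate
# algorithm scores worse than the current baseline. Scenario names, scores
# and the function itself are hypothetical. Higher scores are better.

def find_regressions(baseline, candidate, tolerance=0.0):
    """Return sorted scenario names whose score dropped by more than `tolerance`.

    A scenario missing from `candidate` is treated as a regression.
    """
    return sorted(
        name
        for name, base_score in baseline.items()
        if candidate.get(name, float("-inf")) < base_score - tolerance
    )

baseline = {"under_bridge_dusk": 0.92, "sun_glare": 0.88, "fog": 0.75}
candidate = {"under_bridge_dusk": 0.97, "sun_glare": 0.80, "fog": 0.75}

# The fix helped under the bridge but hurt performance in sun glare.
print(find_regressions(baseline, candidate, tolerance=0.02))  # ['sun_glare']
```

Running a check like this automatically over recorded flight logs and simulated scenarios is what removes the “lot of manual effort” from the loop.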

All this returned Martiros to his axiom of simplicity through autonomy. “If you have a way to tell it a mission that can do a bunch of different things (inspection and mapping, and also responding to a 911 call), then the drone is not a specific-purpose thing; it’s a really powerful general-purpose tool.

“Our goal is to bring autonomy to it in the maximum way we can.”

AN ARRAY OF PRODUCTS

Martiros noted additional products Skydio deploys to realize efficient autonomy:

Autonomy Core/Autonomy Enterprise Foundation provide a base layer of helper tools, including 360-degree obstacle avoidance (Core) and superzoom/close proximity avoidance (Foundation). “We’ve got these autonomous workflows where you’re making sure it does the right thing but you’re not on the hook to be controlling the drone the whole time, while visually navigating without GPS and capturing the data you’re looking for.

“For example, normally there’s about a one-meter obstacle bubble around the drone. But sometimes you need to get in a tighter space, so we allow reducing that a couple of levels. And when you really need it, like an active shooter scenario and you have to get through this narrow doorway, you bring it down even more. Its tools are just helping the pilot.”

Skydio Remote Ops offers real-time visibility.

“It’s kind of the foundation of our cloud system,” Martiros said of Skydio Cloud, which connects with Skydio 2 and X2 drones to provide real-time telemetry, photos and 3D Scan models. “That kind of teleoperation technology is really powerful,” he said, “especially when combined with docking stations. You can basically livestream, which is tremendously useful for a lot of different use cases. For search and rescue, it makes a ton of sense, because you want people viewing this video and actively providing feedback.”

Return to home (RTH) is a fallback/health monitoring system.

“When something goes wrong, there’s a lot of different ways of getting back: go to a safe point, backtrack, use GPS, sometimes use visual navigation to go in a straight line. The kind of configuration of that system and its reliability is really key to making this thing work.”

KeyFrame™ is consumer-centric but is also used by commercial customers.

“One really sweet product is called KeyFrame. As you go, you set these waypoints, and then the keyframes just basically create a smooth spline [function around polynomials] like a Hollywood dolly shot, that interpolates those views. Then you can control it—‘Here’s the speed I want it to go’—and it can go forward or backward on this spline, and do all kinds of interesting tweaks. You don’t have to be a good pilot, you just have to get to the waypoints once, and then you can get this amazing cinematic video.”
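
The idea of a smooth spline through waypoints can be illustrated with a short Python sketch. It uses a uniform Catmull-Rom spline, a common piecewise-cubic curve that passes through every waypoint. That choice is an assumption made for illustration: the article does not say which spline KeyFrame uses, and the function names and waypoints below are hypothetical. The sketch interpolates position only; camera orientation would need similar treatment.

```python
# Illustrative sketch only: a uniform Catmull-Rom spline through waypoints.
# The spline type is an assumption; names and waypoints are hypothetical.

def catmull_rom(p0, p1, p2, p3, u):
    """Point on the segment between p1 and p2 for u in [0, 1]."""
    u2, u3 = u * u, u * u * u
    return tuple(
        0.5 * (2 * b
               + (-a + c) * u
               + (2 * a - 5 * b + 4 * c - d) * u2
               + (-a + 3 * b - 3 * c + d) * u3)
        for a, b, c, d in zip(p0, p1, p2, p3)
    )

def sample_path(waypoints, s):
    """Position at path parameter s, where s = k lands exactly on waypoint k."""
    n = len(waypoints)
    if n < 2:
        raise ValueError("need at least two waypoints")
    s = min(max(s, 0.0), n - 1.0)      # clamp to the ends of the path
    i = min(int(s), n - 2)             # index of the segment's first waypoint
    u = s - i
    p0 = waypoints[max(i - 1, 0)]      # repeat end points as neighbors
    p3 = waypoints[min(i + 2, n - 1)]
    return catmull_rom(p0, waypoints[i], waypoints[i + 1], p3, u)

waypoints = [(0.0, 0.0, 2.0), (5.0, 1.0, 3.0), (9.0, 6.0, 3.0), (12.0, 6.0, 5.0)]

# Step s upward to fly the shot forward, or downward to fly it in reverse.
forward = [sample_path(waypoints, k * 0.25) for k in range(13)]
backward = list(reversed(forward))
print(forward[4], backward[0])
```

Changing how quickly the parameter `s` advances per video frame plays the role of the speed control Martiros mentions, without the pilot ever re-flying the route.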

The Skydio Dock™ offers continuous inspection and mapping.

Martiros described the Dock, which also supports ongoing monitoring, as Skydio’s next step along “the Arc of Autonomy.”

“You’ve got this drone installed at many different infrastructure sites. They can run on a schedule, and can build 3D maps and send data to the cloud. You’re getting this maximal kind of situational awareness, and you can say, ‘I want a map of this area every day,’ or ‘I want to look for changes that happened here.’ When our ecosystem of hardware and Dock products come together in our cloud, it will be a really powerful, dynamic system, and you can project more refined operations as a result of what you’re learning this time around.”

All images courtesy of Skydio.