Finding cheap multi-city flights with machine learning and constraint programming

Presto Trip is a multi city travel search engine

Let's start with a quick demo, and then I'll get into the background of how and why we built it.



Why build a multi-city travel search engine?

Last year, my family tasked me with planning a trip to Greece and Croatia. Pretty quickly, planning this trip became a very time-consuming and painstaking activity.

1st problem: Where are we going?

We started with a few places in Greece and a few in Croatia we’d want to visit.
Simple enough, right?

We researched the places, and put together a few options. We found that even trying to get a basic price estimate for each option was problematic.

You have to get the route and dates right, and maybe even consider more complex factors like taking a cheap flight into one place, taking a bus to a nearby city, and then taking a connecting flight to a 3rd city on a given day. I ended up opening lots of tabs using the “Multi City” feature of various flight search websites.

2nd problem: How do we fit this vacation into each of our schedules?
As if finding the right route and dates wasn’t painful enough, the moment we started factoring in our schedules, things got even more complicated.

3rd problem: How do we structure the trip to get good-value direct flights?
We wanted to make the most out of our vacation. We didn’t want to be stuck waiting in an airport, so even when we did find flights, we still had to go look around and perhaps consider a completely different itinerary that could have gotten us cheaper direct flights.

4th problem: What places should we visit that we may not have considered?
When you plan a trip, adding some places to your itinerary can at times make the trip both cheaper and easier, and also opportunistically let you see interesting places you might have missed otherwise. In particular, our original plan didn’t consider spending a day in Athens, and that made our trip more expensive with more layovers. After we added a day in Athens, our trip became both cheaper and easier in terms of direct flight connections.

Any time I picked a bad day or prices changed, I’d have to redo all this work, for many different possible itineraries. This was frustrating because these felt like exactly the kind of problems a computer can solve much better than a human. The cost of these errors was also quite high: picking a bad route or flight meant losing a few hundred dollars, or losing half a vacation day to layovers. Multiply that by the number of people going on the trip, and it adds up.

User research
The motivation for Presto Trip was rooted in trip planning being painful. There were many pain points planning that trip, though in general we felt that there were lots of substitutes for the organizational components of trip planning. The one thing with no real solution was easily getting the cheapest prices. Current travel sites make it easy to find flights on a given day; they don’t make it easy to find the cheapest trip, as explained above. We talked to many travelers, showed them mocks of a travel planning app, and got them to stack rank various pain points of taking multi-city trips. We found that though there were many moving pieces to getting a trip right, saving on costs was what excited travelers the most. This was aligned with our own instincts.
So for our first version, we decided to have a singular focus on saving users money -- all features would be in support of cost savings, such as controls around scheduling and picking direct flights.

Our first question: Could we save users money, and if so, how much?
Before we even built out Presto Trip fully, the very first thing we did was create a bunch of sample evaluation itineraries. We wanted to be intellectually honest to ensure that the savings we produce would be meaningful, so we took all our own multi city itineraries from the last 10 years or so, and first built a prototype of the price optimization engine to see if we could optimize them.
We priced them manually by hand, then looked at whether we could generate itineraries better than the ones users would come up with. We had a hunch that we were producing better itineraries by hand, though we wanted to quantify just how much better the itineraries would be if we produced them in an automated manner.
And to make the comparison rigorous, we spent an entire day trying to find the best prices by hand so it was a comparison against someone who was serious about getting a great itinerary.

Initial prototype:
Flight search providers typically offer two types of APIs:

1. A fast API for approximate prices for an entire month: 

This is generally (relatively) fast to query. It is similar to the data you see on other travel sites, which show the cheapest flight price for each day in a calendar-like view. Some of this data can have gaps, depending on the provider.

2. A slow API for exact prices for a given day: 
APIs to get exact prices for an exact selection of flights on a specific day. These are very expensive to query, heavily rate limited, and not feasible to use for really playing around with an itinerary. Since we consider many possible itineraries to select the best ones for our users, the slow APIs are infeasible for search. So we use the fast approximate API, which gives a full month's worth of predicted cheapest prices, and the exact API only when users actually book flights.

First off, we need to compute routes for trips. To compute the best route for a given set of places in an itinerary, one needs data between all pairs of places in the itinerary.
In theory, finding routes and dates has to deal with the combinatorial explosion of possible routes and dates and should be computation bound. In practice, fetching the data between all pairs of cities was the major bottleneck for the itinerary sizes we were targeting (roughly 3-10 places).
For the first prototype, the slowness of fetching all this data was okay, since we were just testing whether we could do price optimization offline. We started by writing our own optimizer, which used a recursive search to find better itineraries for the early tests, and sure enough it showed the savings could be substantial, typically in the range of 35%. That felt compelling: it translated to $350 or so for a trip in the $1,000 range.
Once we were aware of the extent of the price savings, we started building out the optimization engine so it could run fast in the context of a search request.

Generating model data for routing:
For an itinerary of 3 places, we would need to make 6 API calls; for an itinerary of 5 places, this goes up to 20 API calls. At 700ms per API call, that made our product technically infeasible.
And really, since we felt the frustration of nobody having solved multi-city flights well, we wanted a principled solution that works for as many places as technically feasible. Even if we made this many calls in parallel to keep request latency low, we would quickly overwhelm the rate limits of our API calls. Instead, our approach was to train a machine learning model that obviates the need to fetch data for all pairs of cities: we do route computation entirely with the model, and then fetch pricing data only for the route we picked. Running batched inference on the model turns the long wait for price data into code that runs in near "constant" time. I say constant time somewhat imprecisely, since strictly it depends on the number of data points we need to infer; but for all practical purposes, predicting approximate flight prices between all pairs of cities in an itinerary within a given time window is much faster than even a single API call.
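To make the API-call scaling concrete, here is a small sketch (the 700ms figure is the per-call latency quoted above; the 10-place bound is our itinerary cap mentioned later):

```python
# Directed city pairs needed to route an n-place itinerary: n * (n - 1).
def api_calls(n_places: int) -> int:
    return n_places * (n_places - 1)

# 3 places -> 6 calls, 5 places -> 20 calls, matching the numbers above.
# At ~700 ms per sequential call, a 10-place itinerary would spend
# 90 * 0.7 = 63 seconds just fetching pair data before any optimization runs.
print(api_calls(3), api_calls(5), api_calls(10))  # 6 20 90
```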

Our goal was to build a regression model that predicts flight prices for a given day. We experimented with different losses, starting with mean squared error, though we later switched to mean squared logarithmic error after looking at the price data and finding that the range of prices could be quite large. In the past, when working on Smart Reply, hyperparameter tuning was very useful for improving models, and I did want to do some work to tune our models.
We found the simplest framework for this to be Keras Tuner. This did mandate that we switch over to the TensorFlow 2 beta well before it was released. We kept our serving stack based on TensorFlow v1, and had no problems serving the models trained using v2.
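To illustrate why mean squared logarithmic error suits widely ranging prices, here is a minimal sketch (not our actual training code): MSLE penalizes relative rather than absolute error, so a $50 miss on a cheap fare counts for much more than a $50 miss on an expensive one.

```python
import math

def msle(y_true, y_pred):
    """Mean squared logarithmic error over paired lists of prices."""
    return sum((math.log1p(t) - math.log1p(p)) ** 2
               for t, p in zip(y_true, y_pred)) / len(y_true)

# The same $50 absolute miss, on a cheap fare vs. an expensive one:
cheap_miss = msle([100.0], [150.0])     # large relative error
pricey_miss = msle([2000.0], [2050.0])  # tiny relative error
assert cheap_miss > pricey_miss
```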

Constraint programming based optimizer
For the prototype, we wrote a simple recursive optimizer by hand to pick the best dates. This code became rather complex and special-cased, and most importantly, it was naively recursive and thus quite inefficient. At the same time, we knew from our user research that we needed to support some rather rich scheduling options if we really wanted to handle the kinds of date controls users commonly want when we generate itineraries for them. So while the simple hand-written recursive optimizer was fine for the prototype, it was time to switch to a constraint programming based optimizer.

For a solver library, we looked around and thought the OR-Tools library was a good fit. This switch shifted much of the burden to preprocessing data and describing constraints in a form we could hand to the solver. All in all, the switch made the code to find the best dates more robust and faster. Although it added a lot more code than our hand-written optimizer, the nature of the code was a lot cleaner. One trade-off is that if you have a bug in the rules you feed to a solver, you can end up with no solution and no feedback on what went wrong, so you do have to be careful. While we wrote unit tests for a lot of our code, the unit tests here in particular were especially useful.

Fusing model data and real data for picking which days to fly
We found minimal loss in price quality when switching from real data to using only our model for routing. However, the model did not have the precision to be used for selecting dates. Additionally, for many rarer routes, the APIs that return prices for an entire month return very few results. This meant that to get routing right, we sometimes needed to compare two routes for which only partial data was available. Ultimately, we went with fusing model predictions and real data for date picking.

At the same time, in many cases the data simply isn't there, and we know the model can predict prices well enough to decide which days will be cheaper to fly on. So we penalize model predictions by the average error on our evaluation set, biasing the prices we show in the UI to come from real data as much as possible and falling back to the model only when absolutely necessary. Thus, in most cases, the solver picks days for which real data is available.
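A toy sketch of the fusion rule (the penalty constant is illustrative; in our system it comes from the measured evaluation-set error):

```python
# Illustrative average model error, stand-in for the measured eval-set error.
MODEL_EVAL_PENALTY = 42.0

def fused_price(real_price, model_price):
    """Prefer real fares; otherwise return a pessimistically penalized estimate.

    Returns (price, is_estimate). Penalizing estimates biases the solver
    toward days with real data and makes shown estimates err on the high side.
    """
    if real_price is not None:
        return real_price, False
    return model_price + MODEL_EVAL_PENALTY, True

print(fused_price(300.0, 280.0))  # (300.0, False) -- real data wins
print(fused_price(None, 280.0))   # (322.0, True)  -- penalized fallback
```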

One question one might ask is: "If you have an approximate price, doesn't that result in a poor user experience compared to an exact price?" The way we think of this is that you have to use some price as a starting point; even on existing travel websites, you only get a real price while selecting flights and options. In our UI, model predictions show up with an “est.” next to them in the initial itinerary view, and since prices coming from the model are penalized, in most cases the estimate translates into a slightly cheaper price during actual flight selection. We’d much rather err on the side of the user seeing a higher price upfront, so they can trust the itineraries we show and think of the estimate as pessimistic.

Showing alternative itineraries
After we had both the model and the constraint solver set up, we found that we could complete searches for even complex trips relatively quickly. We also wanted to show users that, unlike other travel apps, it’s often quite cheap to add more cities to an itinerary: we don’t add places in a simple manner such as appending them to the start or the end; we find the best route and the best dates, so when a city is added, both routes and dates can change.
To strike a balance between UI responsiveness and per-request overheads, our web frontend fetches a few itineraries at a time. For one itinerary, we make data fetches proportional to the number of legs. For multiple itineraries, the data fetches run in sublinear time with respect to the total number of legs, since compared to the base itinerary, most of each alternative itinerary is the same.
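One way to get that sublinear behavior is a simple leg-level cache, sketched below with a hypothetical `fetch` function; alternative itineraries then only pay for the legs they don't share with the base itinerary:

```python
from typing import Callable, Dict, Tuple

_cache: Dict[Tuple[str, str, str], float] = {}

def leg_price(origin: str, dest: str, day: str,
              fetch: Callable[[str, str, str], float]) -> float:
    """Fetch a leg's price once; itineraries sharing the leg hit the cache."""
    key = (origin, dest, day)
    if key not in _cache:
        _cache[key] = fetch(origin, dest, day)
    return _cache[key]

# Usage: a 3-leg base itinerary plus an alternative that swaps one city.
calls = []
def fake_fetch(o, d, day):
    calls.append((o, d, day))
    return 100.0  # stand-in fare

base = [("SFO", "ATH", "d1"), ("ATH", "DBV", "d2"), ("DBV", "SFO", "d3")]
alt  = [("SFO", "ATH", "d1"), ("ATH", "SPU", "d2"), ("DBV", "SFO", "d3")]
for leg in base + alt:
    leg_price(*leg, fetch=fake_fetch)
assert len(calls) == 4  # only the one new leg triggered an extra fetch
```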
For really large itineraries, computation rather than data fetching starts to dominate. We cap itineraries at 10 total places, which is still quite generous and much higher than other travel websites allow, especially considering we are not computing just one optimized itinerary; we show you many of them simultaneously in the UI.
Additionally, using our models we can also spot great deals for clusters of cities, and we surface those deals too. These are the most computationally complex to find, so we first load the main itinerary, then suggestions for single cities, and finally clusters of cities. We don't block the user while computing these alternative itineraries, so they are free to add more places, change other settings, and do more searches.

User interface design:
We initially started building Presto Trip with a trip-planner-based UI, since at its core we wanted to engage users very early in the process of planning a trip, and current travel sites didn’t really have a great way to play around with and edit an itinerary. A bit further down the road, we came to the realization that planning applications were in some ways not what users thought of when it came to booking a trip.

Our choices were:
  1. A trip planner UI which also had booking functionality, like travel search engines.
  2. A travel search engine like UI which also had trip planning features.
We ultimately settled on a travel-search oriented UI, though we kept the features we liked from a trip-planner-centric UI, especially a nice map that updates with itinerary changes. One former coworker we showed our product to even joked it reminded them of the Muppets’ “travel by map”. We’ll take it. :)



Also we really wanted users to be able to see just how cheap it is to add more places to their itinerary,
and how sometimes you could just visit an entire cluster of cities for just a bit more than visiting one
city.

Sample itineraries (current as of 10/08/2019):
Price savings are the bread and butter of our product. While we evaluate a vastly larger set internally, here is one sample comparison of a trip from San Francisco to Europe (Zurich, Paris, London, Munich, Basel, & Milan). For reference, a round trip flight to Zurich alone was more than $700 at the time of this writing ($775 on Google Flights).

Other than price quality, I would also suggest looking at trip structure: the Presto Trip version has far more direct flights and fewer layovers. By contrast, in some cases the trips suggested elsewhere not only cost more and have more layovers, but also involve intra-city airport transfers. The Presto Trip itinerary could have been even cheaper, by about $30, but the bar we've always used for our work is to only suggest flights we'd take ourselves.

Priced on 10/08/2019:

$601
$830
$851
$838
$3,675
$1,361

Priced on 10/15/2019:

$477
$832
$922
$902
$2,512
$1,881

You are welcome to benchmark us on price against any flight search engine of your choice on multi-city itineraries (specifically, itineraries which have two or more destinations) to convince yourself the price savings are real and this isn't a contrived example.

Tools used:

React:
We liked the idea of functional reactive programming with React. We used AngularJS for the last startup we worked on together, and in many ways React felt far more intuitive. For frontend testing, we used Jest with mocks. We toyed with the idea of using gRPC for client-to-backend communication, though settled on passing JSON objects back and forth for easier in-browser debugging. Instead, to get some of the typing benefits of gRPC, we came up with a scheme where we use protobufs internally on the server, and then convert to JSON on the wire.

Our client uses a mixture of TypeScript and JavaScript. If we decompose our UI into the model-view-controller paradigm, all the "model" code is in TypeScript. That is to say, any code that can run without UI, such as data processing and communicating with the server, is strongly typed. On the client, we use protobuf.js and its associated utility pbts to generate type declarations for the frontend, so even though we don’t use protobufs on the wire, the client still has typing information based on the protobuf declarations, and we can detect type errors in our code during development.

Keras + Tensorflow 2 + Keras Tuner:
We use a machine learning model for optimizing your route, and powering our suggestion
features. We used Keras + Tensorflow 2 + Keras Tuner to build and tune these models.

Keras Tuner lets you specify which hyperparameters you want to tune and the number of trials to conduct; it then trains many different models across those hyperparameters and tracks the metrics you want for each trial. At the end of all the trials, you can see how the different models did on various metrics, and you get a model configuration file you can use to fix the hyperparameters of the model you actually train.

OR Tools:
We used OR-Tools for the constraint programming engine that powers our scheduling features.
OR-Tools in general worked well, though we did run into some issues along the way. Just like TensorFlow, the core is written in C++; some language bindings are at times undocumented, and you really just have to read or step through the C++ code to understand some concepts, or to get advanced features working that are present in the C++ core but not exposed in a language binding. That said, the OR-Tools community group was very helpful, and we are grateful to them for answering some of the questions we had.

Testing:
We use absl-py + bazel for our server tests, and Jest + yarn for our front end unit tests.
Using bazel means we only rerun tests whose dependencies have actually changed.
While bazel compiles protobuf declarations during testing, for the release process we have a script that does this when building our deployment container images. Before any release,
all unit tests must pass for both the frontend and backend.

Closing thoughts:
By the time we built Presto Trip, I had already booked tickets to Greece and Croatia, but I hadn’t booked tickets back. Once we got it working, I used Presto Trip for the return part of the trip. I found great deals on cheap direct flights to Lisbon, Naples, Bilbao, and Seville, and by every possible metric, from absolute cost and cost per mile to the number of direct flights, it was hands down better than the flights for the first half of the trip that I had booked manually, even though I booked these tickets relatively close to the flight dates.

I could really talk more about the code here, but I'll just let the pictures do most of the talking.




We hope Presto Trip can similarly help other travelers plan and book great vacations at great prices.
