But in a series of recent papers, researchers at MIT’s Laboratory for Information and Decision Systems (LIDS) have shown that, in a number of different contexts, a little versatility can go a long way. Their theoretical analyses could have implications for operations management, cloud computing — and possibly even health-care delivery and manufacturing.
Take, for instance, a product-support call center for a company with a wide range of products. It’s far more cost-effective to train each customer service representative on the technical specifications of a single product than on all the products. But what if a bunch of calls about a product come in, and the center doesn’t have enough specialists to field them?
Kuang Xu, a graduate student in the Department of Electrical Engineering and Computer Science, and his advisor, John Tsitsiklis, the Clarence J. Lebel Professor of Electrical Engineering, model such problems as requests arriving at a bank of network servers. In a 2012 paper in the journal Stochastic Systems, they showed that if a small percentage of servers — or call-center reps — can field any type of request, the result is an exponential reduction in delays. (More technically, the rate at which delays increase with the volume of requests is exponentially lower.)
That paper won the annual award for best student paper from the Institute for Operations Research and the Management Sciences, the largest professional society in operations research. But “that model is not actually very relevant for, let’s say, a call center,” Xu says. It would be infeasible, Xu explains, for a company with, say, 200 products to train even 10 percent of its call-center employees on all of them.
Mix and match
So this summer, at the annual meeting of the Association for Computing Machinery’s Special Interest Group on Performance Evaluation (Sigmetrics), Xu and Tsitsiklis will present a follow-up paper in which all the servers in the network (or reps at the call center) have just a little versatility. “The second paper was motivated by, ‘Well, what is a more natural and more powerful kind of flexible structure?’” Xu explains.
In that scenario, too, the LIDS researchers find that versatility pays dividends. The specifics vary with the number of types of requests and the number of servers, but in a wide range of cases where specialized servers will incur delays, servers that can each handle a few different types of requests — approximately the logarithm of the total number of request types — can reduce delay time to near zero.
That scenario is, as Xu says, a more realistic model of the kind of expertise that could be expected from call-center reps. But it’s also a more realistic model of how to distribute information among servers in a Web services company’s data center.
On its own, the versatility of the individual servers is no guarantee of short wait times. “You also need a clever scheduling algorithm,” Tsitsiklis says. “That’s the hard part.” In particular, the LIDS researchers showed, the system has to wait for an adequate number of requests to come in before parceling them out to servers. That number is not large: For a bank of 1,000 servers, it might be around 50, which would take a fraction of a second to arrive at a large Web services site.
But that slight delay is necessary to ensure that all the requests can be handled in parallel. If the first 25 requests are parceled out as they arrive, there’s some chance that the only servers that can handle the 26th will already be busy when it arrives. But waiting for 50 provides strong statistical guarantees that a server can be found for every request. A minuscule delay at the outset insures against longer delays later.
The road ahead
Xu and Tsitsiklis are currently pursuing this line of research down several different avenues. In the Sigmetrics paper, the assignment of different types of tasks to different servers is random; consequently, the performance guarantees are probabilistic. There’s still a chance, no matter how tiny, that tasks will be assigned in a way that introduce delays. The LIDS researchers are currently working on a deterministic version of their algorithm, meaning that tasks are assigned according to a regular pattern, offering stronger performance guarantees.
They’re also exploring variations on their model in which the flexibility is not on the supply side — the servers — but the demand side — the requests. They haven’t validated the model yet, but there’s some evidence that a variation of their algorithm could be used to assign scarce health-care resources to, say, patients in an emergency room, some of whom might be able to wait longer than others before receiving treatment.
“The topic of flexibility has been explored in various directions,” says Ton Dieker, an assistant professor at Georgia Tech’s Algorithms and Randomness Center and Thinktank. Indeed, the classic literature on flexibility in manufacturing systems includes several papers by David Simchi-Levi, of the MIT Department of Civil and Environmental Engineering, and the MIT Sloan School of Management’s Stephen Graves.
What differentiates the LIDS researchers’ work is that “they say something about what happens to the performance of these systems with flexibility if the systems get very large — and we are in the era of large systems,” Dieker says. “What is interesting there is that in these large systems, even a little bit [of flexibility] helps a lot.”
The application of the researchers’ work to computer systems is obvious, Dieker says. But, he adds, “this is fundamental work, so it might find later applications elsewhere, as well.”