# Papers and reports

##### Lecture Slides and Videos

**Lecture slides for a course in Reinforcement Learning and Optimal Control (January 8-February 21, 2019) at Arizona State University:**

Slides-Lecture 1, Slides-Lecture 2, Slides-Lecture 3, Slides-Lecture 4, Slides-Lecture 5, Slides-Lecture 6, Slides-Lecture 7, Slides-Lecture 8, Slides-Lecture 9, Slides-Lecture 10, Slides-Lecture 11, Slides-Lecture 12, Slides-Lecture 13.

Lecture 13 is an overview of the entire course.

Ten Key Ideas for Reinforcement Learning and Optimal Control: Slides for an extended lecture/summary of the textbook “Reinforcement Learning and Optimal Control”.

**Video lectures from the Reinforcement Learning and Optimal Control course at Arizona State University:**

Video-Lecture 1, Video-Lecture 2, Video-Lecture 3, Video-Lecture 4, Video-Lecture 5, Video-Lecture 6, Video-Lecture 7, Video-Lecture 8, Video-Lecture 9, Video-Lecture 10, Video-Lecture 11, Video-Lecture 12, Video-Lecture 13.

Click here for slides from overview lecture on Reinforcement Learning and Optimal Control, Stanford and Berkeley, March 2019.

Click here for video of overview lecture on Reinforcement Learning and Optimal Control from Stanford, March 2019.

**Video lectures on Exact and Approximate Finite Horizon DP:** Videos from a 4-lecture, 4-hour short course at the University of Cyprus on finite horizon DP, Nicosia, 2017. Videos on YouTube. (Lecture Slides: Lecture 1, Lecture 2, Lecture 3, Lecture 4.) Based on Chapters 1 and 6 of the book Dynamic Programming and Optimal Control, Vol. I, 4th Edition, Athena Scientific.

**Video lectures on Exact and Approximate Infinite Horizon DP:** Videos from a 6-lecture, 12-hour short course at Tsinghua Univ. on approximate DP, Beijing, China, 2014. From the Tsinghua course site, and from YouTube. (Complete Set of Lecture Slides.) Based on the book Dynamic Programming and Optimal Control, Vol. II, 4th Edition: Approximate Dynamic Programming, Athena Scientific.

Abstract Dynamic Programming, a lecture slide overview of the book Abstract Dynamic Programming, Athena Scientific, 2013; (Additional related lecture slides on regular policies in Abstract DP); (Related Video Lectures on semicontractive DP, the solutions of Bellman’s equation, and applications to stable optimal control); click here for a copy of the book.

Dynamic Programming and Stochastic Control, 2015, Lecture Slides for MIT course 6.231, Fall 2015. Based on the 2-Vol. book Dynamic Programming and Optimal Control, Athena Scientific.

Video from a January 2017 slide presentation on the relation of Proximal Algorithms and Temporal Difference Methods, for solving large linear systems of equations arising in Dynamic Programming, among other contexts. The slides are hard to read at times in the video, so you may wish to download the PDF version of the slides. See also Related slides, with a more numerical, deterministic, non-DP point of view, from NIPS 2017. Click here for a Related Report; a version appears in Computational Optimization and Applications J., Vol. 70, 2018, pp. 709-736. Based on the books Convex Optimization Algorithms, Nonlinear Programming, 3rd Edition, and Dynamic Programming and Optimal Control, Vol. II, 4th Edition: Approximate Dynamic Programming, Athena Scientific.

Nonlinear Programming, Lecture Slides for MIT course 6.252, 2005.

Convex Analysis and Optimization, 2014 Lecture Slides for MIT course 6.253, Spring 2014. Based on the book “Convex Optimization Theory,” Athena Scientific, 2009, and the book “Convex Optimization Algorithms,” Athena Scientific, 2014.

Convex Analysis and Optimization, 2003, Lecture Slides for MIT course 6.253, Fall 2003.

Enhanced Fritz John Conditions and Pseudonormality, a lecture slide overview of a major part of the book “Convex Analysis and Optimization,” Athena Scientific, 2003.

Slides on Convex Optimization: A 60-Year Journey, a lecture on the history and the evolution of the subject.

Ten Simple Rules for Mathematical Writing, a tutorial lecture on writing engineering/mathematics papers and books.

##### Survey and Expository Documents

**Dynamic and Neuro-Dynamic Programming – Reinforcement Learning**

“Feature-Based Aggregation and Deep Reinforcement Learning: A Survey and Some New Implementations,” Lab. for Information and Decision Systems Report, MIT, April 2018 (revised August 2018); arXiv preprint arXiv:1804.04577; a version published in IEEE/CAA Journal of Automatica Sinica. A survey of policy iteration methods for approximate Dynamic Programming, with a focus on feature-based aggregation methods and their connection with deep reinforcement learning schemes. (Lecture Slides). (Related Video Lecture).

D. P. Bertsekas, “Approximate Policy Iteration: A Survey and Some New Methods,” *Journal of Control Theory and Applications, Vol. 9, 2011, pp. 310-335.* A survey of policy iteration methods for approximate Dynamic Programming, with a particular emphasis on two approximate policy evaluation methods, projection/temporal difference and aggregation, as well as the pathologies of policy improvement.

D. P. Bertsekas, “Dynamic Programming and Suboptimal Control: A Survey from ADP to MPC,” *in Fundamental Issues in Control, European J. of Control, Vol. 11, Nos. 4-5, 2005*; From 2005 CDC, Seville, Spain. A selective survey of approximate dynamic programming (ADP), with a particular emphasis on two directions of research: rollout algorithms and model predictive control (MPC), and their connection.

D. P. Bertsekas, “Rollout Algorithms for Discrete Optimization: A Survey,” *Handbook of Combinatorial Optimization, Springer, 2013.* A 19-page expository article providing a summary overview of the subject.

D. P. Bertsekas, “Lambda-Policy Iteration: A Review and a New Implementation,” *Lab. for Information and Decision Systems Report LIDS-P-2874, MIT, October 2011.* Appears in “Reinforcement Learning and Approximate Dynamic Programming for Feedback Control,” by F. Lewis and D. Liu (eds.), IEEE Press Computational Intelligence Series, 2012. A review of lambda-policy iteration, a method for exact and approximate dynamic programming, and the theoretical foundation of the LSPE(lambda) method. We discuss various implementations of the method, including one that is new and introduces a natural form of exploration enhancement in LSTD(lambda), LSPE(lambda), and TD(lambda). (Video of a related lecture from ADPRL 2014.) (Lecture Slides from ADPRL 2014.)

D. P. Bertsekas, and S. E. Shreve, “Mathematical Issues in Dynamic Programming,” an unpublished expository paper that provides orientation on the central mathematical issues for a comprehensive and rigorous theory of dynamic programming and stochastic control, as given in the authors’ book “Stochastic Optimal Control: The Discrete-Time Case,” Bertsekas and Shreve, Academic Press, 1978 (republished by Athena Scientific, 1996). For an extended version see the Appendix of the book “Dynamic Programming and Optimal Control, Vol. II, 4th Edition,” (by D. Bertsekas, Athena Scientific, 2012).

D. P. Bertsekas, “Neuro-Dynamic Programming,” *Encyclopedia of Optimization,* Kluwer, 2001. A 9-page expository article providing orientation, references, and a summary overview of the subject. You may also find helpful the following introductory slide presentation: “Neuro-Dynamic Programming: An Overview.”

D. P. Bertsekas, “Weighted Sup-Norm Contractions in Dynamic Programming: A Review and Some New Applications,” *Lab. for Information and Decision Systems Report LIDS-P-2884, MIT, May 2012.* A review of algorithms for generalized dynamic programming models based on weighted sup-norm contractions. The analysis parallels and extends the one available for discounted MDP and for generalized models based on unweighted sup-norm contractions.

**Optimization and Distributed Computation**

D. P. Bertsekas, “Auction Algorithms for Network Flow Problems: A Tutorial Introduction,” *Computational Optimization and Applications, Vol. 1, pp. 7-66, 1992*. An extensive tutorial paper that surveys auction algorithms, a comprehensive class of algorithms for solving the classical linear network flow problem and its various special cases such as shortest path, max-flow, assignment, transportation, and transshipment problems. An account of this material may also be found in the internet-accessible book “Linear Network Optimization,” (D. Bertsekas, 1991).

D. P. Bertsekas, “Auction Algorithms,” *Encyclopedia of Optimization,* Kluwer, 2001. An 8-page expository article providing orientation, references, and a summary overview of the subject, as given in the author’s book “Network Optimization: Continuous and Discrete Models,” Athena Scientific, 1998; the book is also internet-accessible.

D. P. Bertsekas, and J. N. Tsitsiklis, “Some Aspects of Parallel and Distributed Iterative Algorithms – A Survey,” *Automatica,* Vol. 27, 1991, pp. 3-21. A survey of some topics from the 1989 “Parallel and Distributed Computation” book by the authors. It includes some new results on asynchronous iterative algorithms. Also click here for a follow-up paper on termination of asynchronous iterative algorithms.

**Convex Optimization**

D. P. Bertsekas, “Min Common/Max Crossing Duality: A Geometric View of Conjugacy in Convex Optimization,” *Lab. for Information and Decision Systems Report LIDS-P-2796, MIT, August 2008; revised Jan. 2009.* A long expository paper on the geometric foundations of duality and convex optimization, as more extensively discussed in the book “Convex Optimization Theory,” (Athena Scientific, 2009); the book is also internet-accessible.

D. P. Bertsekas, “Incremental Gradient, Subgradient, and Proximal Methods for Convex Optimization: A Survey,” *Lab. for Information and Decision Systems Report LIDS-P-2848, MIT, August 2010*; this is an extended version of a paper in the edited volume *Optimization for Machine Learning*, by S. Sra, S. Nowozin, and S. J. Wright, MIT Press, Cambridge, MA, 2012, pp. 85-119. A survey of incremental methods for minimizing a sum $\sum_{i=1}^m f_i(x)$, and their applications in inference/machine learning, signal processing, and large-scale and distributed optimization. (Related Video Lecture).
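
To illustrate the incremental idea surveyed above, here is a minimal Python sketch (a hypothetical least-squares instance; function and variable names are invented for illustration) that cycles through the component gradients, stepping along a single $\nabla f_i$ per iteration rather than the full gradient:

```python
import numpy as np

def incremental_gradient(grads, x0, steps, alpha=0.01):
    """Minimize sum_i f_i(x) by cycling through the component gradients.

    grads : list of functions, grads[i](x) = gradient of f_i at x
    """
    x = np.asarray(x0, dtype=float)
    m = len(grads)
    for k in range(steps):
        i = k % m                      # cycle through the components
        x = x - alpha * grads[i](x)    # step along a single component gradient
    return x

# Example: least squares, f_i(x) = 0.5 * (a_i @ x - b_i)^2
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 3))
b = rng.normal(size=20)
grads = [lambda x, a=A[i], bi=b[i]: (a @ x - bi) * a for i in range(20)]

x = incremental_gradient(grads, np.zeros(3), steps=5000, alpha=0.01)
x_star = np.linalg.lstsq(A, b, rcond=None)[0]  # exact least-squares solution
print(np.linalg.norm(x - x_star))
```

With a constant stepsize the iterates converge to a neighborhood of the minimum; the survey discusses diminishing stepsizes and the subgradient and proximal variants.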

##### Most Recent Papers

Bertsekas, D., “Multiagent Value Iteration Algorithms in Dynamic Programming and Reinforcement Learning,” ASU Report, April 2020.

Bertsekas, D., “Constrained Multiagent Rollout and Multidimensional Assignment with the Auction Algorithm,” arXiv preprint, arXiv:2002.07407 February 2020.

S. Bhattacharya, S. Badyal, T. Wheeler, S. Gil, D. P. Bertsekas, “Reinforcement Learning for POMDP: Partitioned Rollout and Policy Iteration with Application to Autonomous Sequential Repair Problems,” to appear in IEEE Robotics and Automation Letters, 2020; arXiv preprint arXiv:2002.04175.

D. P. Bertsekas, “Multiagent Rollout Algorithms and Reinforcement Learning,” arXiv preprint arXiv:1910.00120, September 2019.

D. P. Bertsekas, “Biased Aggregation, Rollout, and Enhanced Policy Improvement for Reinforcement Learning,” Lab. for Information and Decision Systems Report, MIT, October 2018; a shorter version appears as arXiv preprint arXiv:1910.02426, Oct. 2019.

D. P. Bertsekas, “Feature-Based Aggregation and Deep Reinforcement Learning: A Survey and Some New Implementations,” Lab. for Information and Decision Systems Report, MIT, April 2018 (revised August 2018); arXiv preprint arXiv:1804.04577; a version published in IEEE/CAA Journal of Automatica Sinica. (Lecture Slides). (Related Video Lecture).

D. P. Bertsekas, “Stable Optimal Control and Semicontractive Dynamic Programming,” SIAM J. on Control and Optimization, Vol. 56, 2018, pp. 231-252. (Related Lecture Slides). (Related Video Lecture from MIT, May 2017). (Related Lecture Slides from UConn, Oct. 2017). (Related Video Lecture from UConn, Oct. 2017).

D. P. Bertsekas, “Proper Policies in Infinite-State Stochastic Shortest Path Problems,” IEEE Transactions on Automatic Control, Vol. 63, 2018, pp. 3787-3792. (Related Lecture Slides).

D. P. Bertsekas, “Proximal Algorithms and Temporal Differences for Large Linear Systems: Extrapolation, Approximation, and Simulation,” Lab. for Information and Decision Systems Report LIDS-P-3205, MIT, October 2016; arXiv preprint arXiv:1610.05427; a version appears in Computational Optimization and Applications J., Vol. 70, 2018, pp. 709-736. (Related Video Lecture from INFORMS ICS Conference, Jan 2017). (Slides from INFORMS ICS Conference). (Related Slides from NIPS 2017).

D. P. Bertsekas, “Affine Monotonic and Risk-Sensitive Models in Dynamic Programming,” Lab. for Information and Decision Systems Report LIDS-3204, MIT, June 2016 (revised Nov. 2017); arXiv preprint arXiv:1608.01393; IEEE Transactions on Aut. Control, Vol. 64, 2019, pp. 3117-3128.

D. P. Bertsekas, “Incremental Aggregated Proximal and Augmented Lagrangian Algorithms,” Lab. for Information and Decision Systems Report LIDS-P-3176, MIT, July 2015; revised September 2015; arXiv preprint arXiv:1507.1365936. (Related Lecture Slides). (Related Video Lecture).

D. P. Bertsekas, “Regular Policies in Abstract Dynamic Programming,” Lab. for Information and Decision Systems Report LIDS-P-3173, MIT, May 2015; SIAM J. on Optimization, Vol. 27, No. 3, pp. 1694-1727. (Related Lecture Slides); (Related Video Lectures).

D. P. Bertsekas, “Value and Policy Iteration in Deterministic Optimal Control and Adaptive Dynamic Programming,” Lab. for Information and Decision Systems Report LIDS-P-3174, MIT, May 2015 (revised Sept. 2015); arXiv preprint arXiv:1507.01026; IEEE Transactions on Neural Networks and Learning Systems, Vol. 28, 2017, pp. 500-509.

D. P. Bertsekas, “Robust Shortest Path Planning and Semicontractive Dynamic Programming,” Lab. for Information and Decision Systems Report LIDS-P-2915, MIT, Feb. 2014 (revised Jan. 2015 and June 2016); arXiv preprint arXiv:1608.01670; Naval Research Logistics (NRL), 66(1), pp.15-37.

D. P. Bertsekas and H. Yu, “Stochastic Shortest Path Problems Under Weak Conditions,” Lab. for Information and Decision Systems Report LIDS-P-2909, MIT, Jan. 2016.

H. Yu and D. P. Bertsekas, “A Mixed Value and Policy Iteration Method for Stochastic Control with Universally Measurable Policies,” Lab. for Information and Decision Systems Report LIDS-P-2905, MIT, July 2013; Journal Version, Mathematics of Operations Research, Vol. 40, 2015, pp. 926-968.

M. Wang and D. P. Bertsekas, “Incremental Constraint Projection-Proximal Methods for Nonsmooth Convex Optimization,” Lab. for Information and Decision Systems Report LIDS-P-2907, MIT, July 2013. (Related Lecture Slides). (Related Video Lecture).

M. Wang and D. P. Bertsekas, “Stochastic First-Order Methods with Random Constraint Projection,” SIAM Journal on Optimization, Vol. 26, 2016, pp. 681-717. (Related Lecture Slides). (Related Video Lecture).

M. Wang and D. P. Bertsekas, “Incremental Constraint Projection Methods for Variational Inequalities,” Lab. for Information and Decision Systems Report LIDS-P-2898, MIT, December 2012; Mathematical Programming, Vol. 150, 2015, pp. 321-363.

H. Yu and D. P. Bertsekas, “Weighted Bellman Equations and their Applications in Dynamic Programming,” Lab. for Information and Decision Systems Report LIDS-P-2876, MIT, October 2012. (Related Lecture Slides from INFORMS 2012.) (Related Lecture Slides from ADPRL 2014.) (Video of the lecture from ADPRL 2014.)

D. P. Bertsekas, “Weighted Sup-Norm Contractions in Dynamic Programming: A Review and Some New Applications,” Lab. for Information and Decision Systems Report LIDS-P-2884, MIT, May 2012.

M. Wang and D. P. Bertsekas, “Stabilization of Stochastic Iterative Methods for Singular and Nearly Singular Linear Systems,” Lab. for Information and Decision Systems Report LIDS-P-2878, MIT, December 2011 (revised March 2012); Mathematics of Operations Research, Vol. 39, pp. 1-30, 2013. (Related slide presentation; related poster presentation.)

M. Wang and D. P. Bertsekas, “Convergence of Iterative Simulation-Based Methods for Singular Linear Systems,” Lab. for Information and Decision Systems Report LIDS-P-2879, MIT, December 2011 (revised April 2012); Stochastic Systems, Vol. 3, pp. 39-96, 2013. (Related slide presentation; related poster presentation.)

D. P. Bertsekas, “Lambda-Policy Iteration: A Review and a New Implementation,” Lab. for Information and Decision Systems Report LIDS-P-2874, MIT, October 2011. In “Reinforcement Learning and Approximate Dynamic Programming for Feedback Control,” by F. Lewis and D. Liu (eds.), IEEE Press Computational Intelligence Series. (Video of a related lecture from ADPRL 2014.) (Lecture Slides from ADPRL 2014.)

H. Yu and D. P. Bertsekas, “Q-Learning and Policy Iteration Algorithms for Stochastic Shortest Path Problems,” Lab. for Information and Decision Systems Report LIDS-P-2871, MIT, September 2011; revised March 2012; Annals of Operations Research Vol. 208, 2013, pp. 95-132.

H. Yu and D. P. Bertsekas, “On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems,” Lab. for Information and Decision Systems Report LIDS-P-2859, MIT, March 2011; revised Sept. 2011; Mathematics of Operations Research 38(2), pp. 209-227, 2013.

D. P. Bertsekas, “Temporal Difference Methods for General Projected Equations,” IEEE Trans. on Automatic Control, Vol. 56, pp. 2128 – 2139, 2011. (Related Lecture Slides).

D. P. Bertsekas, “Centralized and Distributed Newton Methods for Network Optimization and Extensions,” Lab. for Information and Decision Systems Report LIDS-P-2866, MIT, April 2011.

D. P. Bertsekas and H. Yu, “Distributed Asynchronous Policy Iteration in Dynamic Programming,” Proc. of 2010 Allerton Conference on Communication, Control, and Computing, Allerton Park, IL, Sept. 2010. (Related Lecture Slides) (An extended version with additional algorithmic analysis) (A counterexample by Williams and Baird that motivates in part this paper).

D. P. Bertsekas, “Incremental Proximal Methods for Large Scale Convex Optimization,” Mathematical Programming, Vol. 129, 2011, pp.163-195. An extended survey version: “Incremental Gradient, Subgradient, and Proximal Methods for Convex Optimization: A Survey,” Lab. for Information and Decision Systems Report LIDS-P-2848, MIT, August 2010. (Related Lecture Slides). (Related Video Lecture).

D. P. Bertsekas, “Pathologies of Temporal Difference Methods in Approximate Dynamic Programming,” Proc. 2010 IEEE Conference on Decision and Control, Atlanta, GA, Dec. 2010. (Related Lecture Slides).

D. P. Bertsekas and H. Yu, “Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming,” Lab. for Information and Decision Systems Report LIDS-P-2831, MIT, April 2010 (revised November 2011); Math. of Operations Research, Vol. 37, 2012, pp. 66-94; a shorter version appears in Proc. of 2010 IEEE Conf. on Decision and Control, Atlanta, GA, Dec. 2010. (Related Lecture Slides) (A counterexample by Williams and Baird that motivates in part this paper)

D. P. Bertsekas, “Approximate Policy Iteration: A Survey and Some New Methods,” Journal of Control Theory and Applications, Vol. 9, 2011, pp. 310-335.

D. P. Bertsekas and H. Yu, “A Unifying Polyhedral Approximation Framework for Convex Optimization,” Lab. for Information and Decision Systems Report LIDS-P-2820, MIT, September 2009 (revised December 2010), published in SIAM J. on Optimization, Vol. 21, 2011, pp. 333-360. (Related Video Lecture, Dec. 2008) (Related Lecture Slides).

H. Yu and D. P. Bertsekas, “Error Bounds for Approximations from Projected Linear Equations,” Mathematics of Operations Research, Vol. 35, 2010, pp. 306-329. — A shorter/abridged version appeared at European Workshop on Reinforcement Learning (EWRL’08), 2008, Lille, France. (Related Lecture Slides).

A. Nedich and D. P. Bertsekas, “The Effect of Deterministic Noise in Subgradient Methods,” Math. Programming, Ser. A, Vol. 125, pp. 75-99, 2010.

##### Dynamic and Neuro-Dynamic Programming – Reinforcement Learning

Bertsekas, D., “Multiagent Value Iteration Algorithms in Dynamic Programming and Reinforcement Learning,” ASU Report, April 2020. We consider infinite horizon dynamic programming problems, where the control at each stage consists of several distinct decisions, each one made by one of several agents. In an earlier work we introduced a policy iteration algorithm, where the policy improvement is done one-agent-at-a-time in a given order, with knowledge of the choices of the preceding agents in the order. As a result, the amount of computation for each policy improvement grows linearly with the number of agents, as opposed to exponentially for the standard all-agents-at-once method. For the case of a finite-state discounted problem, we showed convergence to an agent-by-agent optimal policy. In this paper, this result is extended to value iteration and optimistic versions of policy iteration, as well as to more general DP problems where the Bellman operator is a contraction mapping, such as stochastic shortest path problems with all policies being proper.
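
To make the one-agent-at-a-time idea concrete, here is a minimal Python sketch on a hypothetical two-agent discounted problem (the random instance and all names are invented for illustration). The agent-by-agent Bellman update minimizes over each agent's decision in turn, with 2 + 2 Q-factor evaluations per state instead of the 2 × 2 of the all-agents-at-once update:

```python
import itertools
import numpy as np

# Hypothetical 2-agent problem: states 0..n-1, each agent picks an action
# in {0, 1}; stage cost and (deterministic) successor depend on the joint action.
n, gamma = 5, 0.9
rng = np.random.default_rng(1)
g = rng.random((n, 2, 2))                 # stage cost g[s, u1, u2]
f = rng.integers(0, n, size=(n, 2, 2))    # successor state f[s, u1, u2]

def vi_all_at_once(J):
    """Standard VI: minimize jointly over (u1, u2) -- 4 evaluations per state."""
    return np.array([min(g[s, u1, u2] + gamma * J[f[s, u1, u2]]
                         for u1, u2 in itertools.product((0, 1), (0, 1)))
                     for s in range(n)])

def vi_agent_by_agent(J, policy):
    """Agent-by-agent VI: agent 1 minimizes with agent 2's choice fixed, then
    agent 2 minimizes given agent 1's new choice -- 2 + 2 evaluations per state."""
    Jnew = np.empty(n)
    for s in range(n):
        _, u2 = policy[s]
        u1 = min((0, 1), key=lambda a: g[s, a, u2] + gamma * J[f[s, a, u2]])
        u2 = min((0, 1), key=lambda a: g[s, u1, a] + gamma * J[f[s, u1, a]])
        policy[s] = (u1, u2)
        Jnew[s] = g[s, u1, u2] + gamma * J[f[s, u1, u2]]
    return Jnew

J, policy = np.zeros(n), [(0, 0)] * n
for _ in range(200):
    J = vi_agent_by_agent(J, policy)
```

With $k$ agents the per-state work grows as $2k$ rather than $2^k$; the paper shows convergence of such iterations to an agent-by-agent optimal policy.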

Bertsekas, D., “Constrained Multiagent Rollout and Multidimensional Assignment with the Auction Algorithm,” arXiv preprint, arXiv:2002.07407 February 2020. We consider an extension of the rollout algorithm that applies to constrained deterministic dynamic programming, including challenging combinatorial optimization problems. The algorithm relies on a suboptimal policy, called base heuristic. Under suitable assumptions, we show that if the base heuristic produces a feasible solution, the rollout algorithm has a cost improvement property: it produces a feasible solution, whose cost is no worse than the base heuristic’s cost. We then focus on multiagent problems, where the control at each stage consists of multiple components (one per agent), which are coupled either through the cost function or the constraints or both. We show that the cost improvement property is maintained with an alternative implementation that has greatly reduced computational requirements, and makes possible the use of rollout in problems with many agents. We demonstrate this alternative algorithm by applying it to layered graph problems that involve both a spatial and a temporal structure. We consider in some detail a prominent example of such problems: multidimensional assignment, where we use the auction algorithm for 2-dimensional assignment as a base heuristic. This auction algorithm is particularly well-suited for our context, because through the use of prices, it can advantageously use the solution of an assignment problem as a starting point for solving other related assignment problems, and this can greatly speed up the execution of the rollout algorithm.
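
The cost improvement property for a base heuristic that is itself a policy can be seen in a small deterministic example. The Python sketch below (a hypothetical unconstrained instance with invented names; the paper's constrained case adds a feasibility check at each step, omitted here) runs rollout with the fixed policy $u = 0$ as base heuristic:

```python
import numpy as np

# Hypothetical deterministic finite-horizon problem: N stages, states 0..n-1,
# controls {0, 1}, stage cost c[k, s, u], successor f[s, u].
N, n = 10, 6
rng = np.random.default_rng(2)
c = rng.random((N, n, 2))
f = rng.integers(0, n, size=(n, 2))

def base_cost(k, s):
    """Cost of completing the trajectory from (k, s) with the base policy u = 0."""
    total = 0.0
    while k < N:
        total += c[k, s, 0]
        s = f[s, 0]
        k += 1
    return total

def rollout(s):
    """At each stage, pick the control whose base-policy completion is cheapest."""
    total = 0.0
    for k in range(N):
        u = min((0, 1), key=lambda a: c[k, s, a] + base_cost(k + 1, f[s, a]))
        total += c[k, s, u]
        s = f[s, u]
    return total

print(rollout(0), base_cost(0, 0))
```

Since the base heuristic is a legitimate policy, the rollout trajectory's cost is by construction never worse than the base policy's cost from the same start.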

S. Bhattacharya, S. Badyal, T. Wheeler, S. Gil, D. P. Bertsekas, “Reinforcement Learning for POMDP: Partitioned Rollout and Policy Iteration with Application to Autonomous Sequential Repair Problems,” to appear in IEEE Robotics and Automation Letters, 2020; arXiv preprint arXiv:2002.04175. In this paper we consider infinite horizon discounted dynamic programming problems with finite state and control spaces, and partial state observations. We discuss an algorithm that uses multistep lookahead, truncated rollout with a known base policy, and a terminal cost function approximation. This algorithm is also used for policy improvement in an approximate policy iteration scheme, where successive policies are approximated by using a neural network classifier. A novel feature of our approach is that it is well suited for distributed computation through an extended belief space formulation and the use of a partitioned architecture, which is trained with multiple neural networks. We apply our methods in simulation to a class of sequential repair problems where a robot inspects and repairs a pipeline with potentially several rupture sites under partial information about the state of the pipeline.

D. P. Bertsekas, “Multiagent Rollout Algorithms and Reinforcement Learning,” arXiv preprint arXiv:1910.00120, September 2019. We consider finite and infinite horizon dynamic programming problems, where the control at each stage consists of several distinct decisions, each one made by one of several agents. We introduce an algorithm, whereby at every stage, each agent’s decision is made by executing a local rollout algorithm that uses a base policy, together with some coordinating information from the other agents. The amount of local computation required at every stage by each agent is independent of the number of agents, while the amount of global computation (over all agents) grows linearly with the number of agents. By contrast, with the standard rollout algorithm, the amount of global computation grows exponentially with the number of agents. Despite the drastic reduction in required computation, we show that our algorithm has the fundamental cost improvement property of rollout: an improved performance relative to the base policy. We also explore related reinforcement learning and approximate policy iteration algorithms, and we discuss how this cost improvement property is affected when we attempt to improve further the method’s computational efficiency through parallelization of the agents’ computations.
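
A minimal Python sketch of the agent-by-agent decomposition (a hypothetical two-agent instance; names invented for illustration): at each stage agent 1 chooses its component with agent 2 held at its base-policy value, then agent 2 chooses given agent 1's new choice, so the per-stage Q-factor evaluations grow additively rather than multiplicatively with the number of agents, while cost improvement over the base policy is preserved:

```python
import numpy as np

# Hypothetical instance: 2 agents, joint control (u1, u2), each in {0, 1};
# deterministic dynamics and stage costs; base policy is (0, 0) everywhere.
N, n = 8, 5
rng = np.random.default_rng(3)
c = rng.random((N, n, 2, 2))              # stage cost c[k, s, u1, u2]
f = rng.integers(0, n, size=(n, 2, 2))    # successor f[s, u1, u2]

def base_cost(k, s):
    """Cost-to-go of the base policy (0, 0) from stage k, state s."""
    total = 0.0
    while k < N:
        total += c[k, s, 0, 0]
        s = f[s, 0, 0]
        k += 1
    return total

def multiagent_rollout(s):
    """Each agent minimizes in turn: u1 with u2 fixed at its base value, then
    u2 given the chosen u1 -- 2 + 2 Q-factor evaluations per stage instead of
    the 2 * 2 of standard (all-agents-at-once) rollout."""
    total = 0.0
    for k in range(N):
        def q(u1, u2):
            return c[k, s, u1, u2] + base_cost(k + 1, f[s, u1, u2])
        u1 = min((0, 1), key=lambda a: q(a, 0))    # u2 fixed at base value
        u2 = min((0, 1), key=lambda a: q(u1, a))   # u1 fixed at its new value
        total += c[k, s, u1, u2]
        s = f[s, u1, u2]
    return total

print(multiagent_rollout(0), base_cost(0, 0))
```

With $k$ agents of $m$ actions each, the per-stage work is $km$ evaluations instead of $m^k$, which is the computational point of the paper.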

D. P. Bertsekas, “Biased Aggregation, Rollout, and Enhanced Policy Improvement for Reinforcement Learning,” Lab. for Information and Decision Systems Report, MIT, October 2018; a shorter version appears as arXiv preprint arXiv:1910.02426, Oct. 2019. We propose a new aggregation framework for approximate dynamic programming, which provides a connection with rollout algorithms, approximate policy iteration, and other single and multistep lookahead methods. The central novel characteristic is the use of a *scoring function* $V$ of the state, which biases the values of the aggregate cost function towards their correct levels. The classical aggregation framework is obtained when $V \equiv 0$, but our scheme works best when $V$ is a known reasonably good approximation to the optimal cost function $J^*$. When $V$ is equal to the cost function $J_{\mu}$ of some known policy $\mu$, our scheme is equivalent to the rollout algorithm based on $\mu$, in the extreme case where there is a single aggregate state. In the case of hard aggregation with multiple aggregate states, our scheme is equivalent to approximation in value space with lookahead function $\tilde J$ equal to $J_{\mu}$ plus local corrections that are constant within each aggregate state. The local correction levels are obtained by solving a low-dimensional aggregate DP problem, yielding an arbitrarily close approximation $\tilde J$ to $J^*$ when the number of aggregate states is sufficiently large. As a result, for $V = J_{\mu}$, our score-based aggregation approach can be used as an enhanced form of improvement of the policy $\mu$. When combined with an approximate policy evaluation scheme, it can form the basis for a new and enhanced form of approximate policy iteration. When $V$ is a generic scoring function, the aggregation scheme is equivalent to approximation in value space based on $V$, in the extreme case of a single aggregate state. It can yield an arbitrarily close approximation to $J^*$ when a sufficiently large number of aggregate states is used, through local corrections to $V$, obtained by solving an aggregate problem. Except for the scoring function, the aggregate problem is similar to that of the classical aggregation framework, and its algorithmic solution by simulation or other methods is nearly identical to the one for classical aggregation, assuming values of $V$ are available when needed.

D. P. Bertsekas, “Feature-Based Aggregation and Deep Reinforcement Learning: A Survey and Some New Implementations,” Lab. for Information and Decision Systems Report, MIT, April 2018 (revised August 2018); arXiv preprint arXiv:1804.04577; a version published in IEEE/CAA Journal of Automatica Sinica. (Lecture Slides). (Related Video Lecture). In this paper we discuss policy iteration methods for approximate solution of a finite-state discounted Markov decision problem, with a focus on feature-based aggregation methods and their connection with deep reinforcement learning schemes. We introduce features of the states of the original problem, and we formulate a smaller “aggregate” Markov decision problem, whose states relate to the features. The optimal cost function of the aggregate problem, a nonlinear function of the features, serves as an architecture for approximation in value space of the optimal cost function or the cost functions of policies of the original problem. We discuss properties and possible implementations of this type of aggregation, including a new approach to approximate policy iteration. In this approach the policy improvement operation combines feature-based aggregation with reinforcement learning based on deep neural networks, which is used to obtain the needed features. We argue that the cost function of a policy may be approximated much more accurately by the nonlinear function of the features provided by aggregation, than by the linear function of the features provided by deep reinforcement learning, thereby potentially leading to more effective policy improvement.

D. P. Bertsekas, “Proximal Algorithms and Temporal Differences for Large Linear Systems: Extrapolation, Approximation, and Simulation,” Lab. for Information and Decision Systems Report LIDS-P-3205, MIT, October 2016; arXiv preprint arXiv:1610.05427; a version appears in Computational Optimization and Applications J., Vol. 70, 2018, pp. 709-736. (Related Video Lecture from INFORMS ICS Conference, Jan 2017). (Slides from INFORMS ICS Conference). (Related Slides from NIPS 2017). We consider large linear and nonlinear fixed point problems, and solution with proximal algorithms. We show that there is a close connection between two seemingly different types of methods from distinct fields: 1) Proximal iterations for linear systems of equations, which are prominent in numerical analysis and convex optimization, and 2) Temporal difference (TD) type methods, such as TD($\lambda$), LSTD($\lambda$), and LSPE($\lambda$), which are central in simulation-based approximate dynamic programming/reinforcement learning (DP/RL), and its recent prominent successes in large-scale game contexts, among others. One benefit of this connection is a new and simple way to accelerate the standard proximal algorithm by extrapolation towards the TD iteration, which generically has a faster convergence rate. Another benefit is the potential integration into the proximal algorithmic context of several new ideas that have emerged in the DP/RL context. We discuss some of the possibilities, and in particular, algorithms that project each proximal iterate onto the subspace spanned by a small number of basis functions, using low-dimensional calculations and simulation. A third benefit is that insights and analysis from proximal algorithms can be brought to bear on the enhancement of TD methods.
The linear fixed point methodology can be extended to nonlinear fixed point problems involving a contraction, thus providing guaranteed and potentially substantial acceleration of the proximal and forward-backward splitting algorithms at no extra cost. Moreover, the connection of proximal and TD methods can be extended to nonlinear (nondifferentiable) fixed point problems through new proximal-like algorithms that involve successive linearization, similar to policy iteration in DP.
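
For the linear case $x = Ax + b$, the connection can be stated concretely: the proximal iterate with parameter $c$ corresponds to the multistep TD mapping with $\lambda = c/(c+1)$, and extrapolating each proximal iterate by the factor $1/c$ yields the TD iterate, which generically converges faster. A small numerical sketch in Python (the instance and names are invented for illustration):

```python
import numpy as np

# Hypothetical small instance of x = A x + b with spectral radius of A < 1.
A = np.diag([0.9, -0.5, 0.3])
b = np.array([1.0, 2.0, -1.0])
x_star = np.linalg.solve(np.eye(3) - A, b)  # exact fixed point
c = 1.0                                      # proximal parameter, lambda = 1/2

def prox(x):
    # Proximal iterate: P(x) = ((c+1)/c * I - A)^{-1} (b + x/c)
    return np.linalg.solve((c + 1) / c * np.eye(3) - A, b + x / c)

x_p = x_td = np.zeros(3)
for _ in range(20):
    x_p = prox(x_p)                          # plain proximal iteration
    p = prox(x_td)
    x_td = p + (1.0 / c) * (p - x_td)        # extrapolation toward the TD iterate
```

In the scalar case $x = ax + b$ the proximal error is multiplied by $1/(c+1-ca)$ per iteration, while the extrapolated (TD) error is multiplied by $a/(c+1-ca)$, a strictly smaller modulus whenever $|a| < 1$.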

D. P. Bertsekas, “Stable Optimal Control and Semicontractive Dynamic Programming,” SIAM J. on Control and Optimization, Vol. 56, 2018, pp. 231-252. (Related Lecture Slides). We consider discrete-time infinite horizon deterministic optimal control problems with nonnegative cost, and a destination that is cost-free and absorbing. The classical linear-quadratic regulator problem is a special case. Our assumptions are very general, and allow the possibility that the optimal policy may not stabilize the system, e.g., may not reach the destination either asymptotically or in a finite number of steps. We introduce a new unifying notion of stable feedback policy, based on perturbation of the cost per stage, which in addition to implying convergence of the generated states to the destination, quantifies the speed of convergence. We consider the properties of two distinct cost functions: $J^*$, the overall optimal, and $\hat J$, the restricted optimal over just the stable policies. Different classes of stable policies (with different speeds of convergence) may yield different values of $\hat J$. We show that for any class of stable policies, $\hat J$ is a solution of Bellman’s equation, and we characterize the smallest and the largest solutions: they are $J^*$ and $J^+$, the restricted optimal cost function over the class of (finitely) terminating policies. We also characterize the regions of convergence of various modified versions of value and policy iteration algorithms, as substitutes for the standard algorithms, which may not work in general.

D. P. Bertsekas, “Proper Policies in Infinite-State Stochastic Shortest Path Problems,” IEEE Transactions on Automatic Control, Vol. 63, 2018, pp. 3787-3792. (Related Lecture Slides). We consider stochastic shortest path problems with infinite state and control spaces, and a nonnegative cost per stage. We extend the notion of a proper policy from the context of finite state space to the context of infinite state space. We consider the optimal cost function $J^*$, and the optimal cost function $\hat J$ over just the proper policies. Assuming that there exists at least one proper policy, we show that $J^*$ and $\hat J$ are the smallest and largest solutions of Bellman’s equation, respectively, within a class of functions with a boundedness property. The standard value iteration algorithm may be attracted to either $J^*$ or $\hat J$, depending on the initial condition.

D. P. Bertsekas, “Proximal Algorithms and Temporal Differences for Large Linear Systems: Extrapolation, Approximation, and Simulation,” Lab. for Information and Decision Systems Report LIDS-P-3205, MIT, October 2016; arXiv preprint arXiv:1610.05427; a version appears in Computational Optimization and Applications, Vol. 70, 2018, pp. 709-736. (Related Video Lecture from INFORMS ICS Conference, Jan 2017). (Slides from INFORMS ICS Conference). (Related Slides from NIPS 2017). In this paper we consider large linear fixed point problems and their solution with proximal algorithms. We show that, under certain assumptions, there is a close connection between proximal iterations, which are prominent in numerical analysis and optimization, and multistep methods of the temporal difference type such as TD($\lambda$), LSTD($\lambda$), and LSPE($\lambda$), which are central in simulation-based approximate dynamic programming. As an application of this connection, we show that we may accelerate the standard proximal algorithm by extrapolation towards the multistep iteration, which generically has a faster convergence rate. We also use the connection with multistep methods to integrate into the proximal algorithmic context several new ideas that have emerged in the approximate dynamic programming context. In particular, we consider algorithms that project each proximal iterate onto the subspace spanned by a small number of basis functions, using low-dimensional calculations and simulation, and we discuss various algorithmic options from approximate dynamic programming.
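As a numerical sanity check of the proximal/multistep connection (my own sketch; the extrapolation identity checked below is my reconstruction of the relation the abstract describes, for the linear case $x = Ax + b$ with $A$ a sup-norm contraction and $\lambda = c/(c+1)$):

```python
import numpy as np

# Linear fixed point problem x = A x + b (my random test instance).
n, c = 4, 2.0
lam = c / (c + 1)                       # multistep parameter paired with c
rng = np.random.default_rng(0)
A = rng.random((n, n))
A *= 0.9 / np.abs(A).sum(axis=1).max()  # scale to a sup-norm contraction
b = rng.random(n)
x_star = np.linalg.solve(np.eye(n) - A, b)

def prox(x):
    # Proximal step for x = A x + b: solve ((1+c) I - c A) y = x + c b.
    return np.linalg.solve((1 + c) * np.eye(n) - c * A, x + c * b)

def multistep(x):
    # TD(lambda)-style multistep mapping: (1 - lam) * sum_m lam^m T^{m+1} x,
    # truncated once the geometric weights are negligible.
    y, out, w = x, np.zeros(n), 1 - lam
    for _ in range(200):
        y = A @ y + b
        out += w * y
        w *= lam
    return out

x = rng.random(n)
px = prox(x)
# Extrapolating the proximal iterate reproduces the multistep iterate:
print(np.allclose(multistep(x), px + (1 / c) * (px - x)))   # True
```

Since the multistep iterate is closer to the fixed point than the proximal iterate, the extrapolation gives acceleration at essentially no extra cost, consistent with the abstract’s claim.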

D. P. Bertsekas, “Affine Monotonic and Risk-Sensitive Models in Dynamic Programming,” Lab. for Information and Decision Systems Report LIDS-3204, MIT, June 2016; arXiv preprint arXiv:1608.01393; IEEE Transactions on Aut. Control, Vol. 64, 2019, pp. 3117-3128. In this paper we consider a broad class of infinite horizon discrete-time optimal control models that involve a nonnegative cost function and an affine mapping in their dynamic programming equation. They include as special cases classical models such as stochastic undiscounted nonnegative cost problems, stochastic multiplicative cost problems, and risk-sensitive problems with exponential cost. We focus on the case where the state space is finite and the control space has some compactness properties. We assume that the affine mapping has a semicontractive character, whereby for some policies it is a contraction, while for others it is not. In one line of analysis, we impose assumptions that guarantee that the latter policies cannot be optimal. Under these assumptions, we prove strong results that resemble those for discounted Markovian decision problems, such as the uniqueness of solution of Bellman’s equation, and the validity of forms of value and policy iteration. In the absence of these assumptions, the results are weaker and unusual in character: the optimal cost function need not be a solution of Bellman’s equation, and an optimal policy may not be found by value or policy iteration. Instead the optimal cost function over just the contractive policies solves Bellman’s equation, and can be computed by a variety of algorithms.

D. P. Bertsekas, “Regular Policies in Abstract Dynamic Programming,” Lab. for Information and Decision Systems Report LIDS-P-3173, MIT, May 2015; SIAM J. on Optimization, Vol. 27, No. 3, pp. 1694-1727. (Related Lecture Slides); (Related Video Lectures). We consider an abstract dynamic programming model, and an analysis based on regular policies that are well-behaved with respect to value iteration. We show that the optimal cost function over regular policies may have favorable fixed point and value iteration properties, which the optimal cost function over all policies need not have. We accordingly develop a methodology that can deal with long-standing analytical and algorithmic issues in undiscounted dynamic programming models, such as stochastic shortest path, positive cost, negative cost, mixed positive-negative cost, risk-sensitive, and multiplicative cost problems. Among others, we use our approach to obtain new results for convergence of value and policy iteration in deterministic discrete-time optimal control with nonnegative cost per stage.

D. P. Bertsekas, “Value and Policy Iteration in Deterministic Optimal Control and Adaptive Dynamic Programming,” Lab. for Information and Decision Systems Report LIDS-P-3174, MIT, May 2015 (revised Sept. 2015); arXiv preprint arXiv:1507.01026; IEEE Transactions on Neural Networks and Learning Systems, Vol. 28, 2017, pp. 500-509. In this paper, we consider discrete-time infinite horizon problems of optimal control to a terminal set of states. These are the problems that are often taken as the starting point for adaptive dynamic programming. Under very general assumptions, we establish the uniqueness of solution of Bellman’s equation and we provide convergence results for value and policy iteration.

D. P. Bertsekas, “Robust Shortest Path Planning and Semicontractive Dynamic Programming,” Lab. for Information and Decision Systems Report LIDS-P-2915, MIT, Feb. 2014 (revised Jan. 2015 and June 2016); Naval Research Logistics (NRL), 66(1), pp. 15-37. In this paper we consider shortest path problems in a directed graph where the transitions between nodes are subject to uncertainty. We use a minimax formulation, where the objective is to guarantee that a special destination state is reached with a minimum cost path even under the worst possible instance of the uncertainty. Problems of this type arise, among others, in planning and pursuit-evasion contexts, and in model predictive control. Our analysis makes use of the recently developed theory of abstract semicontractive dynamic programming models. We investigate questions of existence and uniqueness of solution of the optimality equation, existence of optimal paths, and the validity of various algorithms patterned after the classical methods of value and policy iteration, as well as a new Dijkstra-like algorithm for problems with nonnegative arc lengths.
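On a toy graph (my own instance, to fix ideas, not the paper’s), the minimax formulation reads $J(i) = \min_u [g(i,u) + \max_{j \in Y(i,u)} J(j)]$ with $J(t) = 0$, where the adversary picks the successor $j$ from an uncertainty set $Y(i,u)$; value iteration applies directly:

```python
# Toy robust shortest path instance (my own, not the paper's):
# nodes 1, 2 and destination t; the adversary chooses the successor
# from the set Y(i, u) after we pick control u at node i.
arcs = {  # (node, control) -> (arc length g, successor set Y)
    (1, 'safe'):  (3.0, ('t',)),
    (1, 'risky'): (1.0, (2, 't')),
    (2, 'go'):    (1.0, ('t',)),
}

J = {1: 0.0, 2: 0.0, 't': 0.0}   # t is cost-free and absorbing
for _ in range(10):               # Gauss-Seidel value iteration sweeps
    for i in (1, 2):
        J[i] = min(g + max(J[j] for j in Y)
                   for (node, u), (g, Y) in arcs.items() if node == i)

print(J[1], J[2])   # 2.0 1.0: "risky" beats "safe" even in the worst case
```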

D. P. Bertsekas and H. Yu, “Stochastic Shortest Path Problems Under Weak Conditions,” Lab. for Information and Decision Systems Report LIDS-P-2909, MIT, Jan. 2016; to appear in Math. of OR. In this paper we weaken the conditions under which some of the basic analytical and algorithmic results for finite-state stochastic shortest path problems hold. We provide an analysis under three types of assumptions, under all of which the standard form of policy iteration may fail, and other anomalies may occur. In the first type of assumptions, we require a standard compactness and continuity condition, as well as the existence of an optimal proper policy, thereby allowing positive and negative costs per stage, and improper policies with finite cost at all states. The analysis is based on introducing an additive perturbation $d>0$ to the cost per stage, which drives the cost of improper policies to infinity. By considering the $d$-perturbed problem and taking the limit as $d \downarrow 0$, we show the validity of Bellman’s equation and value iteration, and we construct a convergent policy iteration algorithm that uses a diminishing sequence of perturbations. In the second type of assumptions we require nonpositive one-stage costs and we give policy iteration algorithms that are optimistic and do not require the use of perturbations. In the third type of assumptions we require nonnegative one-stage costs, as well as the compactness and continuity condition, and we convert the problem to an equivalent stochastic shortest path problem for which the existing theory applies. Using this transformation, we address the uniqueness of solution of Bellman’s equation, the convergence of value iteration, and the convergence of some variants of policy iteration. Our analysis and algorithms under the second and third type of assumptions fully apply to finite-state positive (reward) and negative (reward) dynamic programming models.
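The perturbation mechanism can be illustrated on a one-state toy problem (my own construction, not from the paper): an improper policy (“stay”, cost 0) coexists with a proper one (“go”, cost 1), so $J^*=0$ while the restricted optimal cost is 1; adding $d>0$ to each stage cost drives the improper policy’s cost to infinity, and the perturbed optimal cost tends to the restricted optimal as $d \downarrow 0$:

```python
# One-state toy SSP (my construction, not from the paper):
#   "stay": cost 0, remain at state 1     (improper policy)
#   "go":   cost 1, reach the destination (proper policy)
# d-perturbed Bellman equation:  J = min(d + J, 1 + d).

def perturbed_vi(d, iters=5000):
    J = 0.0
    for _ in range(iters):
        J = min(d + J, 1.0 + d)   # perturbed Bellman operator
    return J

for d in (0.1, 0.01, 0.001):
    print(d, perturbed_vi(d))     # 1 + d, approaching 1 as d -> 0

print(perturbed_vi(0.0))          # 0.0: unperturbed VI from 0 stalls at J*
```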

H. Yu and D. P. Bertsekas, “A Mixed Value and Policy Iteration Method for Stochastic Control with Universally Measurable Policies,” Lab. for Information and Decision Systems Report LIDS-P-2905, MIT, July 2013; Mathematics of Operations Research, Vol. 40, 2015, pp. 926-968. We consider the stochastic control model with Borel spaces and universally measurable policies. For this model the standard policy iteration is known to have difficult measurability issues and cannot be carried out in general. We present a mixed value and policy iteration method that circumvents this difficulty. The method allows the use of stationary policies in computing the optimal cost function, in a manner that resembles policy iteration. It can also be used to address similar difficulties of policy iteration in the context of upper and lower semicontinuous models. We analyze the convergence of the method in infinite horizon total cost problems, for the discounted case where the one-stage costs are bounded, and for the undiscounted case where the one-stage costs are nonpositive or nonnegative. For the undiscounted total cost problems with nonnegative one-stage costs, we also give a new convergence theorem for value iteration, which shows that value iteration converges whenever it is initialized with a function that is above the optimal cost function and yet bounded by a multiple of the optimal cost function. This condition resembles Whittle’s bridging condition and is partly motivated by it. The theorem is also partly motivated by a result of Maitra and Sudderth, which showed that value iteration, when initialized with the constant function zero, could require a transfinite number of iterations to converge. We use the new convergence theorem for value iteration to establish the convergence of our mixed value and policy iteration method for the nonnegative cost models.

H. Yu and D. P. Bertsekas, “Weighted Bellman Equations and their Applications in Dynamic Programming,” Lab. for Information and Decision Systems Report LIDS-P-2876, MIT, October 2012; related slide presentation (Related Lecture Slides from ADPRL 2014.) (Video of the lecture from ADPRL 2014.) We consider approximation methods for Markov decision processes in the learning and simulation context. For policy evaluation based on solving approximate versions of a Bellman equation, we propose the use of weighted Bellman mappings. Such mappings comprise weighted sums of one-step and multistep Bellman mappings, where the weights depend on both the step and the state. For projected versions of the associated Bellman equations, we show that their solutions have the same nature and essential approximation properties as the commonly used approximate solutions from TD($\lambda$). The most important feature of our framework is that each state can be associated with a different type of mapping. Compared with the standard TD($\lambda$) framework, this gives a more flexible way to combine multistage costs and state transition probabilities in approximate policy evaluation, and provides alternative means for bias-variance control. With weighted Bellman mappings, there is also greater flexibility to design learning and simulation-based algorithms. We demonstrate this with examples, including new TD-type algorithms with state-dependent $\lambda$ parameters, as well as block versions of the algorithms. Weighted Bellman mappings can also be applied in approximate policy iteration: we provide several examples, including some new optimistic policy iteration schemes. Another major feature of our framework is that the projection need not be based on a norm, but rather can use a semi-norm. This allows us to establish a close connection between projected equation and aggregation methods, and to develop for the first time multistep aggregation methods, including some of the TD($\lambda$)-type.
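A minimal sketch of a weighted Bellman mapping for policy evaluation (a two-state example of my own; the state-dependent weights `lam` mix the one-step and two-step mappings, as a simple special case of the weighted sums described above):

```python
import numpy as np

# Two-state policy evaluation (my toy instance). One-step Bellman mapping:
#   (T J)(i) = g(i) + a * sum_j P[i, j] * J(j).
# Weighted mapping with state-dependent weight lam[i] on the two-step term:
#   (T_w J)(i) = (1 - lam[i]) * (T J)(i) + lam[i] * (T^2 J)(i).
# T_w has the same fixed point J_pi as T, but mixes multistep costs
# differently at each state.

a = 0.9                               # discount factor
P = np.array([[0.5, 0.5],
              [0.2, 0.8]])            # transition matrix of the policy
g = np.array([1.0, 2.0])              # expected one-stage costs
lam = np.array([0.3, 0.7])            # state-dependent weights

T = lambda J: g + a * (P @ J)
J = np.zeros(2)
for _ in range(500):                  # iterate the weighted mapping
    TJ = T(J)
    J = (1 - lam) * TJ + lam * T(TJ)

J_pi = np.linalg.solve(np.eye(2) - a * P, g)   # exact policy cost
print(np.allclose(J, J_pi))           # True: same fixed point as T
```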

D. P. Bertsekas, “Weighted Sup-Norm Contractions in Dynamic Programming: A Review and Some New Applications,” Lab. for Information and Decision Systems Report LIDS-P-2884, MIT, May 2012. We consider a class of generalized dynamic programming models based on weighted sup-norm contractions. We provide an analysis that parallels the one available for discounted MDP and for generalized models based on unweighted sup-norm contractions. In particular, we discuss the main properties and associated algorithms of these models, including value iteration, policy iteration, and their optimistic and approximate variants. The analysis relies on several earlier works that use more specialized assumptions. In particular, we review and extend the classical results of Denardo [Den67] for unweighted sup-norm contraction models, as well as more recent results relating to approximation methods for discounted MDP. We also apply the analysis to stochastic shortest path problems where all policies are assumed proper. For these problems we extend three results that are known for discounted MDP. The first relates to the convergence of optimistic policy iteration and extends a result of Rothblum [Rot79], the second relates to error bounds for approximate policy iteration and extends a result of Bertsekas and Tsitsiklis [BeT96], and the third relates to error bounds for approximate optimistic policy iteration and extends a result of Thiery and Scherrer [ThS10b].

M. Wang and D. P. Bertsekas, “Stabilization of Stochastic Iterative Methods for Singular and Nearly Singular Linear Systems,” Lab. for Information and Decision Systems Report LIDS-P-2878, MIT, December 2011 (revised March 2012); Mathematics of Operations Research, Vol. 39, pp. 1-30, 2013; related slide presentation, related poster presentation. Abstract: We consider linear systems of equations, $Ax=b$, of various types frequently arising in large-scale applications, with an emphasis on the case where $A$ is singular. Under certain conditions, necessary as well as sufficient, linear deterministic iterative methods generate sequences $\{x_k\}$ that converge to a solution, as long as there exists at least one solution. We show that this convergence property is frequently lost when these methods are implemented with simulation, as is often done in important classes of large-scale problems. We introduce additional conditions and novel algorithmic stabilization schemes under which $\{x_k\}$ converges to a solution when $A$ is singular, and may also be used with substantial benefit when $A$ is nearly singular. Moreover, we establish the mathematical foundation for related work that deals with special cases of singular systems, including some arising in approximate dynamic programming, where convergence may be obtained without a stabilization mechanism.

M. Wang and D. P. Bertsekas, “Convergence of Iterative Simulation-Based Methods for Singular Linear Systems,” Lab. for Information and Decision Systems Report LIDS-P-2879, MIT, December 2011 (revised April 2012); Stochastic Systems, 2013; related slide presentation, related poster presentation. Abstract: We consider simulation-based algorithms for linear systems of equations, $Ax=b$, where $A$ is singular. The convergence properties of iterative solution methods can be impaired when the methods are implemented with simulation, as is often done in important classes of large-scale problems. We focus on special cases of singular systems, including some arising in approximate dynamic programming, where convergence of the residual sequence may be obtained without a stabilization mechanism, while the sequence of iterates may diverge. For some of these special cases, under additional assumptions, we show that the sequence is guaranteed to converge. For situations where the sequence of iterates diverges, we propose schemes for extracting from the divergent sequence another sequence that converges to a solution of $Ax=b$.

D. P. Bertsekas, “Lambda-Policy Iteration: A Review and a New Implementation,” Lab. for Information and Decision Systems Report LIDS-P-2874, MIT, October 2011. In “Reinforcement Learning and Approximate Dynamic Programming for Feedback Control,” by F. Lewis and D. Liu (eds.), IEEE Press Computational Intelligence Series, 2012. Abstract: In this paper we discuss lambda-policy iteration, a method for exact and approximate dynamic programming. It is intermediate between the classical value iteration (VI) and policy iteration (PI) methods, and it is closely related to optimistic (also known as modified) PI, whereby each policy evaluation is done approximately, using a finite number of value iterations. We review the theory of the method and associated questions of bias and exploration arising in simulation-based cost function approximation. We then discuss various implementations, which offer advantages over well-established PI methods that use LSPE($\lambda$), LSTD($\lambda$), or TD($\lambda$) for policy evaluation with cost function approximation. One of these implementations is based on a new simulation scheme, called geometric sampling, which uses multiple short trajectories rather than a single infinitely long trajectory.

H. Yu and D. P. Bertsekas, “Q-Learning and Policy Iteration Algorithms for Stochastic Shortest Path Problems,” Lab. for Information and Decision Systems Report LIDS-P-2871, MIT, September 2011; revised March 2012; Annals of Operations Research Vol. 208, 2013, pp. 95-132. Abstract: We consider the stochastic shortest path problem, a classical finite-state Markovian decision problem with a termination state, and we propose new convergent Q-learning algorithms that combine elements of policy iteration and classical Q-learning/value iteration. These algorithms are related to the ones introduced by the authors for discounted problems in [BeY10]. The main difference from the standard policy iteration approach is in the policy evaluation phase: instead of solving a linear system of equations, our algorithm solves an optimal stopping problem inexactly with a finite number of value iterations. The main advantage over the standard Q-learning approach is lower overhead: most iterations do not require a minimization over all controls, in the spirit of modified policy iteration. We prove the convergence of asynchronous stochastic lookup table implementations of our method for undiscounted, total cost stochastic shortest path problems, thereby overcoming some of the traditional convergence difficulties of asynchronous modified policy iteration, and providing policy iteration-like alternative Q-learning schemes with as reliable convergence as classical Q-learning. We also discuss methods that use basis function approximations of Q-factors and we give associated error bounds.
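For orientation, the deterministic skeleton that such Q-learning schemes approximate stochastically is Q-factor value iteration for a stochastic shortest path problem (a toy instance of my own, not from the paper):

```python
# Toy SSP (my own instance): states 0 and 1, absorbing cost-free
# destination t.  Exact Q-factor value iteration:
#   Q(i, u) = g(i, u) + sum_j p_ij(u) * min_v Q(j, v),   with Q(t, .) = 0.
# Q-learning algorithms are asynchronous stochastic approximations of
# iterations of this kind.

# p[(i, u)] = transition probabilities to (state 0, state 1, t)
p = {(0, 'a'): (0.0, 0.5, 0.5), (0, 'b'): (0.0, 0.0, 1.0),
     (1, 'a'): (0.5, 0.0, 0.5)}
g = {(0, 'a'): 1.0, (0, 'b'): 3.0, (1, 'a'): 1.0}   # one-stage costs

Q = {k: 0.0 for k in p}
for _ in range(200):
    J = {i: min(Q[(j, u)] for (j, u) in Q if j == i) for i in (0, 1)}
    Q = {(i, u): g[(i, u)] + p[(i, u)][0] * J[0] + p[(i, u)][1] * J[1]
         for (i, u) in p}

print(Q)   # fixed point: Q(0,'a') = 2, Q(0,'b') = 3, Q(1,'a') = 2
```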

H. Yu and D. P. Bertsekas, “On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems,” Lab. for Information and Decision Systems Report LIDS-P-2859, MIT, March 2011; revised Sept. 2011; Mathematics of Operations Research 38(2), pp. 209-227, 2013. Abstract: We consider a totally asynchronous stochastic approximation algorithm, Q-learning, for solving finite space stochastic shortest path (SSP) problems, which are total cost Markov decision processes with an absorbing and cost-free state. For the most commonly used SSP models, existing convergence proofs assume that the sequence of Q-learning iterates is bounded with probability one, or some other condition that guarantees boundedness. We prove that the sequence of iterates is naturally bounded with probability one, thus furnishing the boundedness condition in the convergence proof by Tsitsiklis [Tsi94] and establishing completely the convergence of Q-learning for these SSP models.

D. P. Bertsekas, “Temporal Difference Methods for General Projected Equations,” IEEE Trans. on Automatic Control, Vol. 56, pp. 2128-2139, 2011. Abstract: We consider projected equations for approximate solution of high-dimensional fixed point problems within low-dimensional subspaces. We introduce an analytical framework based on an equivalence with variational inequalities, and a class of iterative algorithms that may be implemented with low-dimensional simulation. These algorithms originated in approximate dynamic programming (DP), where they are collectively known as temporal difference (TD) methods. Even when specialized to DP, our methods include extensions/new versions of TD methods, which offer special implementation advantages and reduced overhead over the standard LSTD and LSPE methods, and can deal with rank deficiency in the associated matrix inversion. There is a sharp qualitative distinction between the deterministic and the simulation-based versions: the performance of the former is greatly affected by direction and feature scaling, yet the latter have the same asymptotic convergence rate regardless of scaling, because of their common simulation-induced performance bottleneck. (Related Lecture Slides)
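In the linear DP special case, the projected equation takes the familiar matrix form $Cr = d$ (a standard textbook construction consistent with the abstract, with my own toy data; the exact computation below stands in for the low-dimensional simulation such methods use):

```python
import numpy as np

# Projected evaluation of a policy with cost J = g + a P J, approximated
# within the subspace spanned by Phi, with projection weighted by the
# steady-state distribution xi of P (standard construction; my toy data).
#   Phi r = Pi T(Phi r)   <=>   C r = d,
#   C = Phi' Xi (I - a P) Phi,   d = Phi' Xi g.

a = 0.95
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
g = np.array([1.0, 0.0])
Phi = np.array([[1.0],
                [2.0]])               # a single basis function

# steady-state distribution xi of P (left eigenvector for eigenvalue 1)
w, v = np.linalg.eig(P.T)
xi = np.real(v[:, np.argmax(np.real(w))])
xi /= xi.sum()
Xi = np.diag(xi)

C = Phi.T @ Xi @ (np.eye(2) - a * P) @ Phi
d = Phi.T @ Xi @ g
r = np.linalg.solve(C, d)             # LSTD-type solution
print(Phi @ r)                        # low-dimensional approximation of J
```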

N. Polydorides, M. Wang, and D. P. Bertsekas, “A Quasi Monte Carlo Method for Large Scale Inverse Problems,” Proc. of The 9th International Conference on Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing (MCQMC 2010); in “Monte Carlo and Quasi-Monte Carlo Methods 2010,” by H. Wozniakowski and L. Plaskota (eds), Springer-Verlag. Abstract: We consider large-scale linear inverse problems with a simulation-based algorithm that approximates the solution within a low-dimensional subspace. The algorithm uses Tikhonov regularization, regression, and low-dimensional linear algebra calculations and storage. For sampling efficiency, we implement importance sampling schemes, specially tailored to the structure of inverse problems. We emphasize various alternative methods for approximating the optimal sampling distribution and we demonstrate their impact on the reduction of simulation noise. The performance of our algorithm is tested on a practical inverse problem arising from Fredholm integral equations of the first kind.

D. P. Bertsekas and H. Yu, “Distributed Asynchronous Policy Iteration in Dynamic Programming,” Proc. of 2010 Allerton Conference on Communication, Control, and Computing, Allerton Park, IL, Sept. 2010. (Related Lecture Slides) (An extended version with additional algorithmic analysis) (A counterexample by Williams and Baird that motivates in part this paper) Abstract: We consider the distributed solution of dynamic programming (DP) problems by policy iteration. We envision a network of processors, each updating asynchronously a local policy and a local cost function, defined on a portion of the state space. The computed values are communicated asynchronously between processors and are used to perform the local policy and cost updates. The natural algorithm of this type can fail even under favorable circumstances, as shown by Williams and Baird [WiB93]. We propose an alternative and almost as simple algorithm, which converges to the optimum under the most general conditions, including asynchronous updating by multiple processors using outdated local cost functions of other processors.

D. P. Bertsekas, “Pathologies of Temporal Difference Methods in Approximate Dynamic Programming,” Proc. 2010 IEEE Conference on Decision and Control, Atlanta, GA, Dec. 2010. (Related Lecture Slides) Abstract: Approximate policy iteration methods based on temporal differences are popular in practice, and have been tested extensively, dating to the early nineties, but the associated convergence behavior is complex, and not well understood at present. An important question is whether the policy iteration process is seriously hampered by oscillations between poor policies, roughly similar to the attraction of gradient methods to poor local minima. There has been little apparent concern in the approximate DP/reinforcement learning literature about this possibility, even though it has been documented with several simple examples. Recent computational experimentation with the game of tetris, a popular testbed for approximate DP algorithms over a 15-year period, has brought the issue to sharp focus. In particular, using a standard set of 22 features and temporal difference methods, an average score of a few thousand was achieved. Using the same features and a random search method, an overwhelmingly better average score was achieved (600,000-900,000). The paper explains the likely mechanism of this phenomenon, and derives conditions under which it will not occur.

D. P. Bertsekas and H. Yu, “Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming,” Lab. for Information and Decision Systems Report LIDS-P-2831, MIT, April 2010 (revised November 2011); Math. of Operations Research, Vol. 37, 2012, pp. 66-94; a shorter version appears in Proc. of 2010 IEEE Conf. on Decision and Control, Atlanta, GA, Dec. 2010. (Related Lecture Slides) (A counterexample by Williams and Baird that motivates in part this paper) Abstract: We consider the classical finite-state discounted Markovian decision problem, and we introduce a new policy iteration-like algorithm for finding the optimal Q-factors. Instead of policy evaluation by solving a linear system of equations, our algorithm requires (possibly inexact) solution of a nonlinear system of equations, involving estimates of state costs as well as Q-factors. This is Bellman’s equation for an optimal stopping problem that can be solved with simple Q-learning iterations, in the case where a lookup table representation is used; it can also be solved with the Q-learning algorithm of Tsitsiklis and Van Roy [TsV99], in the case where feature-based Q-factor approximations are used. In exact/lookup table representation form, our algorithm admits asynchronous and stochastic iterative implementations, in the spirit of asynchronous/modified policy iteration, with lower overhead and more reliable convergence advantages over existing Q-learning schemes. Furthermore, for large-scale problems, where linear basis function approximations and simulation-based temporal difference implementations are used, our algorithm resolves effectively the inherent difficulties of existing schemes due to inadequate exploration.

D. P. Bertsekas, “Approximate Policy Iteration: A Survey and Some New Methods,” Journal of Control Theory and Applications, Vol. 9, 2011, pp. 310-335. Abstract: We consider the classical policy iteration method of dynamic programming (DP), where approximations and simulation are used to deal with the curse of dimensionality. We survey a number of issues: convergence and rate of convergence of approximate policy evaluation methods, singularity and susceptibility to simulation noise of policy evaluation, exploration issues, constrained and enhanced policy iteration, policy oscillation and chattering, and optimistic policy iteration. Our discussion of policy evaluation is couched in general terms, and aims to unify the many approaches in the field in the light of recent research developments, and to compare the two main policy evaluation approaches: projected equations and temporal differences (TD), and aggregation. In the context of these approaches, we survey two different types of simulation-based algorithms: matrix inversion methods such as LSTD, and iterative methods such as LSPE and TD($\lambda$), and their scaled variants. We discuss a recent method, based on regression and regularization, which rectifies the unreliability of LSTD for nearly singular projected Bellman equations. An iterative version of this method belongs to the LSPE class of methods, and provides the connecting link between LSTD and LSPE. Our discussion of policy improvement focuses on the role of policy oscillation and its effect on performance guarantees. We illustrate that policy evaluation when done by the projected equation/TD approach may lead to policy oscillation, but when done by aggregation it does not. This implies better error bounds and more regular performance for aggregation, at the expense of some loss of generality in cost function representation capability.
Hard aggregation provides the connecting link between projected equation/TD-based and aggregation-based policy evaluation, and is characterized by favorable error bounds.

N. Polydorides, M. Wang, and D. P. Bertsekas, “Approximate Solution of Large-Scale Linear Inverse Problems with Monte Carlo Simulation,” Lab. for Information and Decision Systems Report LIDS-P-2822, MIT, November 2009. Abstract: We consider the approximate solution of linear ill-posed inverse problems of high dimension with a simulation-based algorithm that approximates the solution within a low-dimensional subspace. The algorithm uses Tikhonov regularization, regression, and low-dimensional linear algebra calculations and storage. For sampling efficiency, we use variance reduction/importance sampling schemes, specially tailored to the structure of inverse problems. We demonstrate the implementation of our algorithm in a series of practical large-scale examples arising from Fredholm integral equations of the first kind.

M. Wang, N. Polydorides, and D. P. Bertsekas, “Approximate Simulation-Based Solution of Large-Scale Least Squares Problems,” Lab. for Information and Decision Systems Report LIDS-P-2819, MIT, September 2009. Abstract: We consider linear least squares problems of very large dimension, such as those arising for example in inverse problems. We introduce an associated approximate problem, within a subspace spanned by a relatively small number of basis functions, and solution methods that use simulation, importance sampling, and low-dimensional calculations. The main components of this methodology are a regression/regularization approach that can deal with nearly singular problems, and an importance sampling design approach that exploits existing continuity structures in the underlying models, and allows the solution of very large problems.

D. P. Bertsekas, “Projected Equations, Variational Inequalities, and Temporal Difference Methods,” Lab. for Information and Decision Systems Report LIDS-P-2808, MIT, March 2009; a shorter/abridged version appeared in Proc. of IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning 2009, Nashville, TN. Abstract: We consider projected equations for approximate solution of high-dimensional fixed point problems within low-dimensional subspaces. We introduce a unified framework based on an equivalence with variational inequalities (VIs), and a class of iterative feasible direction methods that may be implemented with low-dimensional simulation. These methods originated in approximate dynamic programming (DP), where they are collectively known as temporal difference (TD) methods. Even when specialized to DP, our methods include extensions/new versions of TD algorithms, which offer special implementation advantages, reduced overhead, and improved computational complexity over the standard LSTD and LSPE methods. We demonstrate a sharp qualitative distinction between the deterministic and the simulation-based versions: the performance of the former is greatly affected by direction and feature scaling, yet the latter asymptotically perform identically, regardless of scaling. (Related Lecture Slides)

H. Yu and D. P. Bertsekas, “Basis Function Adaptation Methods for Cost Approximation in MDP,” Proc. of IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning 2009, Nashville, TN. Click here for an extended version.Abstract: We generalize a basis adaptation method for cost approximation in Markov decision processes (MDP), extending earlier work of Menache, Mannor, and Shimkin. In our context, basis functions are parametrized and their parameters are tuned by minimizing an objective function involving the cost function approximation obtained when a temporal differences (TD) or other method is used. The adaptation scheme involves only low order calculations and can be implemented in a way analogous to policy gradient methods. In the generalized basis adaptation framework we provide extensions to TD methods for nonlinear optimal stopping problems and to alternative cost approximations beyond those based on TD. (Related Lecture Slides)

H. Yu and D. P. Bertsekas, “Error Bounds for Approximations from Projected Linear Equations,” Mathematics of Operations Research, Vol. 35, 2010, pp. 306-329. — A shorter/abridged version appeared at European Workshop on Reinforcement Learning (EWRL’08), 2008, Lille, France.Abstract: We consider linear fixed point equations and their approximations by projection on a low dimensional subspace. We derive new bounds on the approximation error of the solution, which are expressed in terms of low dimensional matrices and can be computed by simulation. When the fixed point mapping is a contraction, as is typically the case in Markov decision processes (MDP), one of our bounds is always sharper than the standard contraction-based bounds, and another one is often sharper. The former bound is also tight in a worst-case sense. Our bounds also apply to the non-contraction case, including policy evaluation in MDP with nonstandard projections that enhance exploration. There are no error bounds currently available for this case to our knowledge.(Related Lecture Slides)

D. P. Bertsekas and H. Yu, “Projected Equation Methods for Approximate Solution of Large Linear Systems,” Journal of Computational and Applied Mathematics, Vol. 227, 2009, pp. 27-50.Abstract: We consider linear systems of equations and solution approximations derived by projection on a low-dimensional subspace. We propose stochastic iterative algorithms, based on simulation, which converge to the approximate solution and are suitable for very large-dimensional problems. The algorithms are extensions of recent approximate dynamic programming methods, known as temporal difference methods, which solve a projected form of Bellman’s equation by using simulation-based approximations to this equation, or by using a projected value iteration method. (Related Lecture Slides) An extended report: “Solution of Large Systems of Equations Using Approximate Dynamic Programming Methods,” Lab. for Information and Decision Systems Report 2754, MIT, June 2007.

H. Yu and D. P. Bertsekas, “Q-Learning Algorithms for Optimal Stopping Based on Least Squares,” Proc. European Control Conference 2007, Kos, Greece, July 2007. (Related Lecture Slides) An extended report: “A Least Squares Q-Learning Algorithm for Optimal Stopping Problems,” Lab. for Information and Decision Systems Report 2731, MIT, February 2007 (revised June 2007).Abstract: We consider the solution of discounted optimal stopping problems using linear function approximation methods. A $Q$-learning algorithm for such problems, proposed by Tsitsiklis and Van Roy, is based on the method of temporal differences and stochastic approximation. We propose alternative algorithms, which are based on projected value iteration ideas and least squares. We prove the convergence of some of these algorithms and discuss their properties. (Lecture Slides)

H. Yu and D. P. Bertsekas, “On Near-Optimality of the Set of Finite-State Controllers for Average Cost POMDP,” Lab. for Information and Decision Systems Report 2689, MIT, April 2006; Mathematics of Operations Research, 33(1), pp. 1-11, February 2008. Abstract: We consider the average cost problem for partially observable Markov decision processes (POMDP) with finite state, observation, and control spaces. We prove that there exists an $\epsilon$-optimal finite-state controller functionally independent of initial distributions for any $\epsilon > 0$, under the assumption that the optimal liminf average cost function of the POMDP is constant. As part of our proof, we establish that if the optimal liminf average cost function is constant, then the optimal limsup average cost function is also constant, and the two are equal. We also discuss the connection between the existence of nearly optimal finite-history controllers and two other important issues for average cost POMDP: the existence of an average cost that is independent of the initial state distribution, and the existence of a bounded solution to the constant average cost optimality equation. (Related slide presentation)

H. Yu and D. P. Bertsekas, “Convergence Results for Some Temporal Difference Methods Based on Least Squares,” Lab. for Information and Decision Systems Report 2697, MIT, June 2006; revised August, 2008; IEEE Trans. on Aut. Control, Vol. 54, 2009, pp. 1515-1531.Abstract: We consider finite-state Markovian Decision Problems and prove convergence and rate of convergence results for certain least squares policy evaluation algorithms. These are temporal difference methods for constructing a linear function approximation of the cost function of a stationary policy, within the context of infinite-horizon discounted and average cost dynamic programming. We introduce an average cost method, patterned after the discounted cost method, which uses a constant stepsize, and we prove its convergence. We also show that the convergence rate of both the discounted and the average cost methods is optimal within the class of temporal difference methods. Analysis and experiment indicate that our methods are substantially and often dramatically faster than TD(Lambda), as well as more reliable. (Related slide presentation)

D. P. Bertsekas, “Separable Dynamic Programming and Approximate Decomposition Methods,” Lab. for Information and Decision Systems Report 2684, MIT, Feb. 2006; IEEE Trans. on Aut. Control, Vol. 52, 2007, pp. 911-916.Abstract: We consider control, planning, and resource allocation problems involving several independent subsystems that are coupled through a control/decision constraint. We discuss one-step lookahead methods that use an approximate cost-to-go function derived from the solution of single subsystem problems. We propose a new method for constructing such approximations, and derive bounds on the performance of the associated suboptimal policies. We then specialize this method to problems of reachability of target tubes that have the form of a box (a Cartesian product of subsystem tubes). This yields inner approximating tubes, which have the form of a union of a finite number of boxes, each involving single subsystem calculations.

D. P. Bertsekas, “Dynamic Programming and Suboptimal Control: A Survey from ADP to MPC,” *in Fundamental Issues in Control, European J. of Control, Vol. 11, Nos. 4-5, 2005*; From 2005 CDC, Seville, Spain.Abstract: We survey selectively the field of approximate dynamic programming (ADP), with a particular emphasis on two recent directions of research: rollout algorithms and model predictive control (MPC). We argue that while motivated by different concerns, these two methodologies are closely connected, and the mathematical essence of their desirable properties (cost improvement and stability, respectively) is couched on the central dynamic programming idea of policy iteration. In particular, among other things, we show that the most common MPC schemes can be viewed as rollout algorithms or one-step policy iteration methods. Furthermore, we embed rollout and MPC within a new unifying suboptimal control framework, based on a concept of restricted or constrained structure policies, which contains these schemes as special cases.

D. P. Bertsekas, “Rollout Algorithms for Constrained Dynamic Programming,” Lab. for Information and Decision Systems Report 2646, MIT, April 2005.Abstract: The rollout algorithm is a suboptimal control method for deterministic and stochastic problems that can be solved by dynamic programming. In this short note, we derive an extension of the rollout algorithm that applies to constrained deterministic dynamic programming problems, and relies on a suboptimal policy, called base heuristic. Under suitable assumptions, we show that if the base heuristic produces a feasible solution, the rollout algorithm also produces a feasible solution, whose cost is no worse than the cost corresponding to the base heuristic.

H. Yu and D. P. Bertsekas, “Discretized Approximations for POMDP with Average Cost,” The 20th Conference on Uncertainty in Artificial Intelligence, 2004, Banff, Canada. Abstract: In this paper, we propose a new lower approximation scheme for POMDP with discounted and average cost criterion. The approximating functions are determined by their values at a finite number of belief points, and can be computed efficiently using value iteration algorithms for finite-state MDP. While for discounted problems several lower approximation schemes have been proposed earlier, ours seems the first of its kind for average cost problems. We focus primarily on the average cost case, and we show that the corresponding approximation can be computed efficiently using multi-chain algorithms for finite-state MDP. We give a preliminary analysis showing that regardless of the existence of the optimal average cost in the POMDP, the approximation obtained is a lower bound of the liminf optimal average cost function, and can also be used to calculate an upper bound on the limsup optimal average cost function, as well as bounds on the cost of executing the stationary policy associated with the approximation. We show the convergence of the cost approximation, when the optimal average cost is constant and the optimal differential cost is continuous.

D. P. Bertsekas, V. Borkar, and A. Nedic, “Improved Temporal Difference Methods with Linear Function Approximation,” in “Learning and Approximate Dynamic Programming”, by A. Barto, W. Powell, J. Si, (Eds.), IEEE Press, 2004, pp. 231-255.Abstract: We consider temporal difference algorithms within the context of infinite-horizon finite-state dynamic programming problems with discounted cost, and linear cost function approximation. We show, under standard assumptions, that a least squares-based temporal difference method, proposed by Nedic and Bertsekas [NeB03], converges with a stepsize equal to 1. To our knowledge, this is the first iterative temporal difference method that converges without requiring a diminishing stepsize. We discuss the connections of the method with Sutton’s TD(Lambda) and with various versions of least-squares based value iteration, and we show via analysis and experiment that the method is substantially and often dramatically faster than TD(Lambda), as well as simpler and more reliable. We also discuss the relation of our method with the LSTD method of Boyan [Boy02], and Bradtke and Barto [BrB96].

A. Nedic and D. P. Bertsekas, “Least-Squares Policy Evaluation Algorithms with Linear Function Approximation,” *LIDS Report LIDS-P-2537,* Dec. 2001; *J. of Discrete Event Systems*, Vol. 13, 2003, pp. 79-110.Abstract: We consider two policy evaluation methods for discounted dynamic programming, which use simulation, temporal differences, and linear cost function approximation. The first method is a new gradient-like algorithm involving least-squares subproblems and a diminishing stepsize, which is based on the lambda-policy iteration method of Bertsekas and Ioffe. The second method is the LSTD(lambda) algorithm recently proposed by Boyan, which for lambda=0 coincides with the linear least-squares temporal-difference algorithm of Bradtke and Barto. At present, there is only a convergence result by Bradtke and Barto for the LSTD(0) algorithm. Here, we strengthen this result by showing the convergence of LSTD(lambda), with probability 1, for every lambda in [0,1].

C. C. Wu and D. P. Bertsekas, “Admission Control for Wireless Networks,” accepted in *IEEE Trans. on Vehicular Technology*. Abstract: With the population of wireless subscribers increasing at a rapid rate, overloaded situations are likely to become an increasing problem. Admission control can be used to balance the goals of maximizing bandwidth utilization and ensuring sufficient resources for high priority events. In this paper, we formulate the admission control problem as a Markov decision problem. While dynamic programming can be used to solve such problems, the large size of the state space makes this impractical. We propose an approximate dynamic programming technique, which involves creating an approximation of the original model with a state space sufficiently small so that dynamic programming can be applied. Our results show that the method improves significantly on policies that are generally in use, in particular, the greedy policy and the reservation policy. Much of the computation required for our method can be done off-line, and the real-time computation required is easily distributed between the cells.

D. P. Bertsekas and D. A. Castanon, “Rollout Algorithms for Stochastic Scheduling Problems,” *J. of Heuristics*, Vol. 5, 1999, pp. 89-108.Abstract: Stochastic scheduling problems are difficult stochastic control problems with combinatorial decision spaces. In this paper we focus on a class of stochastic scheduling problems, the quiz problem and its variations. We discuss the use of heuristics for their solution, and we propose rollout algorithms based on these heuristics which approximate the stochastic dynamic programming algorithm. We show how the rollout algorithms can be implemented efficiently, and we delineate circumstances under which they are guaranteed to perform better than the heuristics on which they are based. We also show computational results which suggest that the performance of the rollout policies is near-optimal, and is substantially better than the performance of their underlying heuristics.

D. P. Bertsekas, M. L. Homer, D. A. Logan, S. D. Patek, and N. R. Sandell, “Missile Defense and Interceptor Allocation by Neuro-Dynamic Programming,” *IEEE Trans. on Systems, Man, and Cybernetics*, 1999.Abstract: The purpose of this paper is to propose a solution methodology for a missile defense problem involving the sequential allocation of defensive resources over a series of engagements. The problem is cast as a dynamic programming/Markovian decision problem, which is computationally intractable by exact methods because of its large number of states and its complex modeling issues. We have employed a Neuro-Dynamic Programming (NDP) framework, whereby the cost-to-go function is approximated using neural network architectures that are trained on simulated data. We report on the performance obtained using several different training methods, and we compare this performance with the optimal.

J. Abounadi, D. Bertsekas, and V. Borkar, “Learning Algorithms for Markov Decision Processes,” *Report LIDS-P-2434,* Lab. for Info. and Decision Systems, 1998; *SIAM J. on Control and Optimization,* Vol. 40, 2001, pp. 681-698. Abstract: This paper gives the first rigorous convergence analysis of analogs of Watkins’ Q-learning algorithm, applied to average cost control of finite-state Markov chains. We discuss two algorithms which may be viewed as stochastic approximation counterparts of two existing algorithms for recursively computing the value function of the average cost problem – the traditional relative value iteration algorithm and a recent algorithm of Bertsekas based on the stochastic shortest path (SSP) formulation of the problem. Both synchronous and asynchronous implementations are considered and are analyzed using the “ODE” method. This involves establishing asymptotic stability of associated ODE limits. The SSP algorithm also uses ideas from two time scale stochastic approximation.

J. Abounadi, D. Bertsekas, and V. Borkar, “Stochastic Approximation for Non-Expansive Maps: Q-Learning Algorithms,” *Report LIDS-P-2433, Lab. for Info. and Decision Systems, 1998; SIAM J. on Control and Optimization,* Vol. 41, 2002, pp. 1-22. Abstract: We discuss synchronous and asynchronous variants of fixed point iterations of the form $x^{k+1} = x^k + \gamma(k)\bigl(F(x^k, w^k) - x^k\bigr)$, where $F$ is a non-expansive mapping under a suitable norm, and $\{w^k\}$ is a stochastic sequence. These are stochastic approximation iterations that can be analyzed using the ODE approach based either on Kushner and Clark’s Lemma for the synchronous case or Borkar’s Theorem for the asynchronous case. However, the analysis requires that the iterates $\{x^k\}$ are bounded, a fact which is usually hard to prove. We develop a novel framework for proving boundedness, which is based on scaling ideas and properties of Lyapunov functions. We then combine the boundedness property with Borkar’s stability analysis of ODE’s involving non-expansive mappings to prove convergence with probability 1. We also apply our convergence analysis to Q-learning algorithms for stochastic shortest path problems and we are able to relax some of the assumptions of the currently available results.
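As an illustration of the shape of this iteration, the sketch below runs $x^{k+1} = x^k + \gamma(k)(F(x^k, w^k) - x^k)$ on a toy scalar map. The map $F(x, w) = 0.5x + 1 + w$, the noise model, and the stepsize choice are assumptions made for the example, not taken from the paper.

```python
import random

# Minimal sketch of the stochastic approximation iteration
#   x^{k+1} = x^k + gamma(k) * (F(x^k, w^k) - x^k)
# for the illustrative scalar map F(x, w) = 0.5*x + 1 + w, whose
# expected-value fixed point is x* = 2.
def stochastic_iteration(num_iters=20000, seed=0):
    rng = random.Random(seed)
    x = 0.0
    for k in range(num_iters):
        w = rng.uniform(-0.5, 0.5)    # zero-mean noise
        gamma = 1.0 / (k + 1)         # diminishing stepsize
        x += gamma * ((0.5 * x + 1.0 + w) - x)
    return x

print(stochastic_iteration())
```

With the diminishing stepsize $\gamma(k) = 1/(k+1)$ the noise averages out and the iterates settle near the fixed point $x^* = 2$ of the expected map.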

S. D. Patek and D. P. Bertsekas, “Stochastic Shortest Path Games,” *SIAM J. on Control and Optimization*, Vol. 36, 1999, pp. 804-824. Abstract: We consider dynamic, two-player, zero-sum games where the “minimizing” player seeks to drive an underlying finite-state dynamic system to a special terminal state along a least expected cost path. The “maximizer” seeks to interfere with the minimizer’s progress so as to maximize the expected total cost. We consider, for the first time, undiscounted finite-state problems, with compact action spaces, and transition costs that are not strictly positive. We admit that there are policies for the minimizer which permit the maximizer to prolong the game indefinitely. Under assumptions which generalize deterministic shortest path problems, we establish (i) the existence of a real-valued equilibrium cost vector achievable with stationary policies for the opposing players and (ii) the convergence of value iteration and policy iteration to the unique solution of Bellman’s equation.

D. P. Bertsekas, “A New Value Iteration Method for the Average Cost Dynamic Programming Problem,” *SIAM J. on Control and Optimization, Vol. 36, 1998, pp. 742-759*.Abstract: We propose a new value iteration method for the classical average cost Markovian Decision problem, under the assumption that all stationary policies are unichain and furthermore there exists a state that is recurrent under all stationary policies. This method is motivated by a relation between the average cost problem and an associated stochastic shortest path problem. Contrary to the standard relative value iteration, our method involves a weighted sup norm contraction and for this reason it admits a Gauss-Seidel implementation. Computational tests indicate that the Gauss-Seidel version of the new method substantially outperforms the standard method for difficult problems.

D. P. Bertsekas, “Differential Training of Rollout Policies,” *Proc. of the 35th Allerton Conference on Communication, Control, and Computing, Allerton Park, Ill., October 1997*.Abstract: We consider the approximate solution of stochastic optimal control problems using a neuro-dynamic programming/reinforcement learning methodology. We focus on the computation of a rollout policy, which is obtained by a single policy iteration starting from some known base policy and using some form of exact or approximate policy improvement. We indicate that, in a stochastic environment, the popular methods of computing rollout policies are particularly sensitive to simulation and approximation error, and we present more robust alternatives, which aim to estimate relative rather than absolute Q-factor and cost-to-go values. In particular, we propose a method, called differential training, that can be used to obtain an approximation to cost-to-go differences rather than cost-to-go values by using standard methods such as TD(lambda) and lambda-policy iteration. This method is suitable for recursively generating rollout policies in the context of simulation-based policy iteration methods.

D. P. Bertsekas, J. N. Tsitsiklis, and C. Wu, “Rollout Algorithms for Combinatorial Optimization,” *J. of Heuristics*, Vol. 3, 1997, pp. 245-262.Abstract: We consider the approximate solution of discrete optimization problems using procedures that are capable of magnifying the effectiveness of any given heuristic algorithm through sequential application. In particular, we embed the problem within a dynamic programming framework, and we introduce several types of rollout algorithms, which are related to notions of policy iteration. We provide conditions guaranteeing that the rollout algorithm improves the performance of the original heuristic algorithm. The method is illustrated in the context of a machine maintenance and repair problem.
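The rollout idea in this abstract can be sketched in a few lines: score each candidate move by its immediate cost plus the base heuristic's completion cost from the resulting state. The toy graph, function names, and greedy base heuristic below are illustrative assumptions, not taken from the paper.

```python
# Sketch of a rollout algorithm for a deterministic path problem.
def heuristic_completion_cost(state, successors, is_terminal):
    """Base heuristic: greedily take the cheapest move until termination."""
    total = 0.0
    while not is_terminal(state):
        state, cost = min(successors(state), key=lambda sc: sc[1])
        total += cost
    return total

def rollout(start, successors, is_terminal):
    """One-step lookahead using the base heuristic's cost-to-go."""
    state, path, total = start, [start], 0.0
    while not is_terminal(state):
        state, cost = min(
            successors(state),
            key=lambda sc: sc[1] + heuristic_completion_cost(sc[0], successors, is_terminal))
        path.append(state)
        total += cost
    return path, total

# Toy graph: A -> B (cost 1) or A -> C (cost 4); B -> D (cost 5); C -> D (cost 1).
edges = {"A": [("B", 1.0), ("C", 4.0)], "B": [("D", 5.0)], "C": [("D", 1.0)], "D": []}
path, cost = rollout("A", lambda s: edges[s], lambda s: s == "D")
print(path, cost)
```

On this graph the greedy base heuristic alone returns the path A-B-D with cost 6, while the rollout policy built on it finds A-C-D with cost 5, illustrating the cost improvement property described in the abstract.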

D. P. Bertsekas and S. Ioffe, “Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming,” Report LIDS-P-2349, Lab. for Information and Decision Systems, MIT, 1996.Abstract: We consider a new policy iteration method for dynamic programming problems with discounted and undiscounted cost. The method is based on the notion of temporal differences, and is primarily geared to the case of large and complex problems where the use of approximations is essential. We develop the theory of these methods without approximation, we describe how to embed them within a neuro-dynamic programming/reinforcement learning context where feature-based approximation architectures are used, we relate them to TD(Lambda) methods, and we illustrate their use in the training of a tetris playing program.

L. C. Polymenakos, D. P. Bertsekas, and J. N. Tsitsiklis, “Efficient Algorithms for Continuous-Space Shortest Path Problems,” *IEEE Transactions on Automatic Control, Vol. 43, 1998, pp. 278-283*. Abstract: We consider a continuous-space shortest path problem in a two-dimensional plane. This is the problem of finding a trajectory that starts at a given point, ends at the boundary of a compact set, and minimizes a cost function of the form $\int_0^T r(x(t))\,dt + q(x(T))$. For a discretized version of this problem, a Dijkstra-like method that requires one iteration per discretization point has been developed by Tsitsiklis. Here we develop some new label correcting-like methods based on the Small Label First methods of Bertsekas. We prove the finite termination of these methods, and we present computational results showing that they are competitive and often superior to the Dijkstra-like method, and are also much faster than the traditional Jacobi and Gauss-Seidel methods.

S. D. Patek and D. P. Bertsekas, “Play Selection in American Football: a Case Study in Neuro-Dynamic Programming”, Chapter 7 in *Advances in Computational and Stochastic Optimization, Logic Programming, and Heuristic Search: Interfaces in Computer Science and Operations Research,* David L. Woodruff, editor. Kluwer Academic Publishers, Boston, 1997. Abstract: We present a computational case study of neuro-dynamic programming, a recent class of reinforcement learning methods. We cast the problem of play selection in American football as a stochastic shortest path Markov Decision Problem (MDP). In particular, we consider the problem faced by a quarterback in attempting to maximize the net score of an offensive drive. The resulting optimization problem serves as a medium-scale testbed for numerical algorithms based on policy iteration. The algorithms we consider evolve as a sequence of approximate policy evaluations and policy updates. An (exact) evaluation amounts to the computation of the reward-to-go function associated with the policy in question. Approximations of reward-to-go are obtained either as the solution or as a step toward the solution of a training problem involving simulated state/reward data pairs. Within this methodological framework there is a great deal of flexibility. In specifying a particular algorithm, one must select a parametric form for estimating the reward-to-go function as well as a training algorithm for tuning the approximation. One example we consider, among many others, is the use of a multilayer perceptron (i.e. neural network) which is trained by backpropagation. The objective of this paper is to illustrate the application of neuro-dynamic programming methods in solving a well-defined optimization problem. We will contrast and compare various algorithms mainly in terms of performance, although we will also consider complexity of implementation. Because our version of football leads to a medium-scale Markov decision problem, it is possible to compute the optimal solution numerically, providing a yardstick for meaningful comparison of the approximate methods.

B. Van Roy, D. P. Bertsekas, Y. Lee, and J. N. Tsitsiklis, “A Neuro-Dynamic Programming Approach to Retailer Inventory Management,” *Proceedings of the IEEE Conference on Decision and Control, 1997*; this is an extended version which appeared as a Lab. for Information and Decision Systems Report, MIT, Nov. 1996. Abstract: We present a model of two-echelon retailer inventory systems, and we cast the problem of generating optimal control strategies into the framework of dynamic programming. We formulate two specific case studies for which the underlying dynamic programming problems involve thirty three and forty six state variables, respectively. Because of the enormity of these state spaces, classical algorithms of dynamic programming are inapplicable. To address these complex problems we develop approximate dynamic programming algorithms. The algorithms are motivated by recent research in artificial intelligence involving simulation-based methods and neural network approximations, and they are representative of algorithms studied in the emerging field of neuro-dynamic programming. We assess performance of resulting solutions relative to optimized s-type (“order-up-to”) policies, which are generally accepted as reasonable heuristics for the types of problems we consider. In both case studies, we are able to generate control strategies substantially superior to the heuristics, reducing inventory costs by approximately ten percent.

D. P. Bertsekas, F. Guerriero, and R. Musmanno, “Parallel Asynchronous Label Correcting Methods for Shortest Paths,” *J. of Optimization Theory and Applications*, Vol. 88, 1996, pp. 297-320.Abstract: In this paper we develop parallel asynchronous implementations of some known and some new label correcting methods for finding a shortest path from a single origin to all the other nodes of a directed graph. We compare these implementations on a shared memory multiprocessor, the Alliant FX/80, using several types of randomly generated problems. Excellent (sometimes superlinear) speedup is achieved with some of the methods, and it is found that the asynchronous versions of these methods are substantially faster than their synchronous counterparts.

D. P. Bertsekas, “Generic Rank One Corrections for Value Iteration in Markovian Decision Problems,” *Operations Research Letters*, Vol. 17, 1995, pp. 111-119. Abstract: Given a linear iteration of the form $x := F(x)$, we consider modified versions of the form $x := F(x + g d)$, where $d$ is a fixed direction, and $g$ is chosen to minimize the norm of the residual $\|x + g d - F(x + g d)\|$. We propose ways to choose $d$ so that the convergence rate of the modified iteration is governed by the subdominant eigenvalue of the original. In the special case where $F$ relates to a Markovian decision problem, we obtain a new extrapolation method for value iteration. In particular, our method accelerates the Gauss-Seidel version of the value iteration method for discounted problems in the same way that MacQueen’s error bounds accelerate the standard version. Furthermore, our method applies equally well to Markov Renewal and undiscounted problems.
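For a linear map $F(x) = Ax + b$ the residual of the modified iterate is $(x - F(x)) + g(I - A)d$, so the minimizing $g$ has a closed form. The sketch below runs this corrected iteration on a 2x2 toy; the matrix, the choice $d = e_1$ (the dominant eigenvector of this diagonal $A$), and all names are illustrative assumptions, not from the paper.

```python
# Sketch of the modified iteration x := F(x + g*d) for a linear map
# F(x) = A x + b, with g chosen to minimize ||x + g*d - F(x + g*d)||.
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def corrected_iteration(A, b, d, x, num_iters=30):
    F = lambda y: [dot(row, y) + bi for row, bi in zip(A, b)]
    m = [di - dot(row, d) for row, di in zip(A, d)]    # (I - A) d
    for _ in range(num_iters):
        r = [xi - fi for xi, fi in zip(x, F(x))]       # residual x - F(x)
        g = -dot(r, m) / dot(m, m)                     # least-squares choice of g
        x = F([xi + g * di for xi, di in zip(x, d)])
    return x

A = [[0.9, 0.0], [0.0, 0.5]]
b = [1.0, 1.0]
x = corrected_iteration(A, b, d=[1.0, 0.0], x=[0.0, 0.0])
print(x)  # converges to the fixed point [10, 2]
```

Because $d$ points along the dominant eigenvector, the correction eliminates the slow component in a single step here, and the remaining error decays at the subdominant rate 0.5 rather than the dominant rate 0.9.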

D. P. Bertsekas, “A Counterexample to Temporal Difference Learning,” *Neural Computation*, Vol. 7, 1995, pp. 270-279. Abstract: Sutton’s TD($\lambda$) method aims to provide a representation of the cost function in an absorbing Markov chain with transition costs. A simple example is given where the representation obtained depends on $\lambda$. For $\lambda=1$ the representation is optimal with respect to a least squares error criterion, but as $\lambda$ decreases towards 0 the representation becomes progressively worse and, in some cases, very poor. The example suggests a need to understand better the circumstances under which TD(0) and Q-learning obtain satisfactory neural network-based compact representations of the cost function. A variation of TD(0) is also proposed, which performs better on the example.
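For reference, a minimal tabular TD($\lambda$) evaluation loop of the kind discussed here, run on a deterministic two-state absorbing chain A -> B -> terminal with transition costs 1 and 2 (exact costs-to-go J(A) = 3, J(B) = 2). The chain, stepsize, and $\lambda = 1$ are illustrative assumptions, not the paper's example.

```python
# Tabular TD(lambda) sketch with eligibility traces on an undiscounted
# deterministic absorbing chain: A -> B (cost 1), B -> terminal (cost 2).
def td_lambda(num_episodes=200, alpha=0.1, lam=1.0):
    theta = {"A": 0.0, "B": 0.0}                   # cost-to-go estimates
    episode = [("A", "B", 1.0), ("B", None, 2.0)]  # (state, next, cost)
    for _ in range(num_episodes):
        z = {"A": 0.0, "B": 0.0}                   # eligibility traces
        for s, s_next, cost in episode:
            for state in z:
                z[state] *= lam                    # gamma = 1 (undiscounted)
            z[s] += 1.0
            d = cost + (theta[s_next] if s_next else 0.0) - theta[s]
            for state in theta:
                theta[state] += alpha * d * z[state]
    return theta

print(td_lambda())  # approaches {"A": 3.0, "B": 2.0}
```

On a deterministic chain like this the estimates converge to the exact costs for any $\lambda$; the paper's point is that under function approximation the limit depends on $\lambda$.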

D. P. Bertsekas and J. N. Tsitsiklis, “An Analysis of Stochastic Shortest Path Problems,” *Mathematics of Operations Research,* Vol. 16, 1991, pp. 580-595.Abstract: We consider a stochastic version of the classical shortest path problem whereby for each node of a graph, we must choose a probability distribution over the set of successor nodes so as to reach a certain destination node with minimum expected cost. The costs of transition between successive nodes can be positive as well as negative. We prove natural generalizations of the standard results for the deterministic shortest path problem, and we extend the corresponding theory for undiscounted finite state Markovian decision problems by removing the usual restriction that costs are either all nonnegative or all nonpositive.
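Value iteration for a stochastic shortest path problem applies the Bellman mapping $J(i) = \min_u \sum_j p_{ij}(u)(g(i,u,j) + J(j))$ with $J = 0$ at the destination. The two-state toy model below is an illustrative assumption, not from the paper.

```python
# Value iteration sketch for a tiny stochastic shortest path problem.
# States 1 and 2, plus a cost-free destination 0. Each action is a list
# of (probability, successor, cost) triples.
actions = {
    1: {"direct": [(1.0, 0, 3.0)],
        "risky":  [(0.5, 2, 1.0), (0.5, 1, 1.0)]},
    2: {"go":     [(1.0, 0, 0.5)]},
}

def value_iteration(num_iters=100):
    J = {0: 0.0, 1: 0.0, 2: 0.0}
    for _ in range(num_iters):
        J_new = dict(J)
        for i, acts in actions.items():
            J_new[i] = min(
                sum(p * (cost + J[j]) for p, j, cost in triples)
                for triples in acts.values())
        J = J_new
    return J

J = value_iteration()
print(J)  # J(1) = 2.5 (via the "risky" action), J(2) = 0.5
```

Here J(2) = 0.5 and J(1) solves J(1) = min(3, 1 + 0.5 J(2) + 0.5 J(1)) = 2.5, so the "risky" action, which may prolong the path, is nevertheless optimal.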

D. P. Bertsekas and D. A. Castanon, “Adaptive Aggregation Methods for Infinite Horizon Dynamic Programming,” *IEEE Transactions on Aut. Control*, Vol. 34, 1989, pp. 589-598.Abstract: We propose a class of iterative aggregation algorithms for solving infinite horizon dynamic programming problems. The idea is to interject aggregation iterations in the course of the usual successive approximation method. An important new feature that sets our method apart from earlier proposals is that the aggregate groups of states change adaptively from one aggregation iteration to the next, depending on the progress of the computation. This allows acceleration of convergence in difficult problems involving multiple ergodic classes for which methods using fixed groups of aggregate states are ineffective. No knowledge of special problem structure is utilized by the algorithms.

D. P. Bertsekas, “Distributed Dynamic Programming,” *IEEE Transactions on Aut. Control*, Vol. AC-27, 1982, pp. 610-616. Abstract: We consider distributed algorithms for solving dynamic programming problems whereby several processors participate simultaneously in the computation while maintaining coordination by information exchange via communication links. A model of asynchronous distributed computation is developed which requires very weak assumptions on the ordering of computations, the timing of information exchange, the amount of local information needed at each computation node, and the initial condition for the algorithm. The class of problems considered is very broad and includes shortest path problems, and finite and infinite horizon stochastic optimal control problems. When specialized to the shortest path problem, the algorithm reduces to the algorithm originally implemented for routing messages in the internet.

D. P. Bertsekas and S. E. Shreve, “Existence of Optimal Stationary Policies in Deterministic Optimal Control,” *J. of Math Analysis and Applications*, Vol. 69, 1979, pp. 607-620. Abstract: This paper considers deterministic discrete-time optimal control problems over an infinite horizon involving a stationary system and a nonpositive cost per stage. Various results are provided relating to the existence of an epsilon-optimal stationary policy, and the existence of an optimal stationary policy assuming an optimal policy exists.

S. E. Shreve and D. P. Bertsekas, “Alternative Theoretical Frameworks for Finite Horizon Discrete-Time Stochastic Optimal Control,” *SIAM J. on Control and Optimization,* Vol. 16, 1978, pp. 953-978. Abstract: Stochastic optimal control problems are usually analyzed under one of three types of assumptions: a) Countability assumptions on the underlying probability space – this eliminates all difficulties of measure-theoretic nature; b) Semicontinuity assumptions under which the existence of optimal Borel measurable policies can be guaranteed; and c) Borel measurability assumptions under which the existence of p-optimal or p-epsilon-optimal Borel measurable policies can be guaranteed (Blackwell, Strauch). In this paper, we introduce a general theoretical framework based on outer integration which contains these three models as special cases. Within this framework all known results for finite horizon problems together with some new ones are proved and subsequently specialized. An important new feature of our specialization to the Borel measurable model is the introduction of universally measurable policies. We show that everywhere optimal or nearly optimal policies exist within this class and this enables us to dispense with the notion of p-optimality.

D. P. Bertsekas, “Monotone Mappings with Application in Dynamic Programming,” *SIAM J. on Control and Optimization,* Vol. 15, 1977, pp. 438-464; a version from the Proceedings of the 1975 IEEE Conference on Decision and Control, Houston, TX: “Monotone Mappings in Dynamic Programming.” Abstract: The structure of many sequential optimization problems over a finite or infinite horizon can be summarized in the mapping that defines the related dynamic programming algorithm. In this paper, we take as a starting point this mapping and obtain results that are applicable to a broad class of problems. This approach has also been taken earlier by Denardo under contraction assumptions. The analysis here is carried out without contraction assumptions and thus the results obtained can be applied, for example, to the positive and negative dynamic programming models of Blackwell and Strauch. Most of the existing results for these models are generalized and several new results are obtained relating mostly to the convergence of the dynamic programming algorithm and the existence of optimal stationary policies.

D. P. Bertsekas, “Convergence of Discretization Procedures in Dynamic Programming,” *IEEE Transactions on Aut. Control,* June 1975, pp. 415-419. Abstract: The computational solution of discrete-time stochastic optimal control problems by dynamic programming requires, in most cases, discretization of the state and control spaces whenever these spaces are infinite. In this short paper we consider a discretization procedure often employed in practice. Under certain compactness and Lipschitz continuity assumptions we show that the solution of the discretized algorithm converges to the solution of the continuous algorithm, as the discretization grid becomes finer and finer. Furthermore, any control law obtained from the discretized algorithm results in a value of the cost functional which converges to the optimal value of the problem.

D. P. Bertsekas, “Linear Convex Stochastic Control Problems Over an Infinite Horizon,” *IEEE Transactions on Aut. Control,* Vol. AC-18, 1973, pp. 314-315. Abstract: A stochastic control problem over an infinite horizon which involves a linear system and a convex cost functional is analyzed. We prove the convergence of the dynamic programming algorithm associated with the problem, and we show the existence of a stationary Borel measurable optimal control law. The approach used illustrates how results on infinite time reachability [1] can be used for the analysis of dynamic programming algorithms over an infinite horizon subject to state constraints.

D. P. Bertsekas, “Control of Uncertain Systems with a Set-Membership Description of the Uncertainty,” *Ph.D. Thesis, Dept. of Electrical Engineering, M.I.T.*, 1971. Abstract: The problem of optimal feedback control of uncertain discrete-time dynamic systems is considered, where the uncertain quantities do not have a stochastic description but instead are known to belong to given sets. The problem is converted to a sequential minimax problem and dynamic programming is suggested as a general method for its solution. The notion of a sufficiently informative function, which parallels the notion of a sufficient statistic of optimal control, is introduced, and the possible decomposition of the optimal controller into an estimator and an actuator is demonstrated. Some special cases involving a linear system are further examined. A problem involving a convex cost functional and perfect state information for the controller is considered in detail. Particular attention is given to a special case, the problem of reachability of a target tube, and an ellipsoidal approximation algorithm is obtained which leads to linear control laws. State estimation problems are also examined, and some algorithms are derived which offer distinct advantages over existing estimation schemes. These algorithms are subsequently used in the solution of some reachability problems with imperfect state information for the controller.

##### Nonlinear Programming and Optimization Applications

D. P. Bertsekas, “Proximal Algorithms and Temporal Differences for Large Linear Systems: Extrapolation, Approximation, and Simulation,” Lab. for Information and Decision Systems Report LIDS-P-3205, MIT, October 2016 (Related Lecture Slides); a version appears in Computational Optimization and Applications J., Vol. 70, 2018, pp. 709-736. (Related Video Lecture). Abstract: In this paper we consider large linear fixed point problems and solution with proximal algorithms. We show that, under certain assumptions, there is a close connection between proximal iterations, which are prominent in numerical analysis and optimization, and multistep methods of the temporal difference type such as TD(lambda), LSTD(lambda), and LSPE(lambda), which are central in simulation-based approximate dynamic programming. As an application of this connection, we show that we may accelerate the standard proximal algorithm by extrapolation towards the multistep iteration, which generically has a faster convergence rate. We also use the connection with multistep methods to integrate into the proximal algorithmic context several new ideas that have emerged in the approximate dynamic programming context. In particular, we consider algorithms that project each proximal iterate onto the subspace spanned by a small number of basis functions, using low-dimensional calculations and simulation, and we discuss various algorithmic options from approximate dynamic programming.

D. P. Bertsekas, “Incremental Aggregated Proximal and Augmented Lagrangian Algorithms,” Lab. for Information and Decision Systems Report LIDS-P-3176, MIT, June 2015. (Related Video Lecture). Abstract: We consider minimization of the sum of a large number of convex functions, and we propose an incremental aggregated version of the proximal algorithm, which bears similarity to the incremental aggregated gradient and subgradient methods that have received a lot of recent attention. Under cost function differentiability and strong convexity assumptions, we show linear convergence for a sufficiently small constant stepsize. We then consider dual versions of incremental proximal algorithms, which are incremental augmented Lagrangian methods for separable constrained optimization problems. Contrary to the standard augmented Lagrangian method, these methods admit decomposition in the minimization of the augmented Lagrangian. The incremental aggregated augmented Lagrangian method also bears similarity to several known decomposition algorithms, which, however, are not incremental in nature: the alternating direction method of multipliers (ADMM), the Stephanopoulos and Westerberg algorithm [StW75], and the related methods of Tadjewski [Tad89] and Ruszczynski [Rus95].

M. Wang and D. P. Bertsekas, “Stochastic First-Order Methods with Random Constraint Projection,” SIAM Journal on Optimization, 2016, Vol. 26, pp. 681-717. (Related Lecture Slides) (Related Video Lecture). Abstract: We consider convex optimization problems with structures that are suitable for sequential treatment or online sampling. In particular, we focus on problems where the objective function is an expected value, and the constraint set is the intersection of a large number of simpler sets. We propose an algorithmic framework for stochastic first-order methods using random projection/proximal updates and random constraint updates, which contains as special cases several known algorithms as well as many new algorithms. To analyze the convergence of these algorithms in a unified manner, we prove a general coupled convergence theorem. It states that the convergence is obtained from an interplay between two coupled processes: progress toward feasibility and progress toward optimality. Under suitable stepsize assumptions, we show that the optimality error decreases at a rate of $O(1/\sqrt{k})$ and the feasibility error decreases at a rate of $O(\log k/k)$. We also consider a number of typical sampling processes for generating stochastic first-order information and random constraints, which are common in data-intensive applications, online learning, and simulation optimization. By using the coupled convergence theorem as a modular architecture, we are able to analyze the convergence of stochastic algorithms that use arbitrary combinations of these sampling processes.
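The random constraint projection idea can be shown on a toy one-dimensional problem (an illustrative sketch, not the paper's general method): a gradient step on f(x) = (x − 5)²/2 followed by projection onto one randomly sampled constraint interval. With intervals [0, 2] and [1, 3], the feasible set is their intersection [1, 2], and the constrained minimizer is x = 2.

```python
import random

# Toy sketch of random projection updates: gradient step on
# f(x) = (x - 5)^2 / 2, then projection onto ONE randomly chosen
# constraint interval per iteration. All names are illustrative.
def random_projection_method(intervals, iters=5000, seed=0):
    rng = random.Random(seed)
    x = 0.0
    for k in range(iters):
        alpha = 1.0 / (k + 1)           # diminishing stepsize
        x -= alpha * (x - 5.0)          # gradient step toward x = 5
        lo, hi = rng.choice(intervals)  # sample one constraint at random
        x = min(max(x, lo), hi)         # project onto the sampled interval
    return x
```

Even though no iteration ever projects onto the full intersection, the interplay of the two coupled processes (feasibility and optimality progress) drives the iterates toward the constrained minimizer x = 2.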

M. Wang and D. P. Bertsekas, “Incremental Constraint Projection Methods for Variational Inequalities,” Lab. for Information and Decision Systems Report LIDS-P-2898, MIT, December 2012; Mathematical Programming, Vol. 150, 2015, pp. 321-363. Abstract: We consider the solution of strongly monotone variational inequalities of the form $F(x^*)'(x-x^*)\geq 0$, for all $x\in X$. We focus on special structures that lend themselves to sampling, such as when $X$ is the intersection of a large number of sets, and/or $F$ is an expected value or is the sum of a large number of component functions. We propose new methods that combine elements of incremental constraint projection and stochastic gradient. We analyze the convergence and the rate of convergence of these methods with various types of sampling schemes, and we establish a substantial rate of convergence advantage for random sampling over cyclic sampling.

D. P. Bertsekas, “Incremental Proximal Methods for Large Scale Convex Optimization,” Mathematical Programming, Vol. 129, 2011, pp. 163-195. (Related Lecture Slides). (Related Video Lecture). Abstract: We consider the minimization of a sum $\sum_{i=1}^mf_i(x)$ consisting of a large number of convex component functions $f_i$. For this problem, incremental methods consisting of gradient or subgradient iterations applied to single components have proved very effective. We propose new proximal versions of incremental methods, consisting of proximal iterations applied to single components, as well as combinations of gradient, subgradient, and proximal iterations. We provide a convergence and rate of convergence analysis of a variety of such methods, including some that involve randomization in the selection of components. We also discuss applications in various contexts, including signal processing and inference/machine learning.
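A minimal one-dimensional sketch of the incremental proximal idea (illustrative only): with quadratic components $f_i(x) = (x - a_i)^2/2$, each single-component proximal step has the closed form $(x + \alpha a_i)/(1 + \alpha)$, and the minimizer of the sum is the mean of the $a_i$.

```python
# Illustrative sketch of incremental proximal iterations for
# minimizing sum_i (x - a_i)^2 / 2. One closed-form proximal step is
# applied per component; names and stepsize rule are illustrative.
def incremental_proximal(a, iters=200):
    x = 0.0
    for k in range(iters):
        alpha = 1.0 / (k + 1)               # diminishing stepsize
        for ai in a:                        # one proximal step per component
            x = (x + alpha * ai) / (1.0 + alpha)
    return x
```

For a = [1, 2, 3] the iterates approach the mean, 2; unlike a pure incremental gradient step, each proximal step is a genuine minimization and cannot overshoot its component's minimizer.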

D. P. Bertsekas, “Centralized and Distributed Newton Methods for Network Optimization and Extensions,” Lab. for Information and Decision Systems Report LIDS-P-2866, MIT, April 2011. Abstract: We consider Newton methods for common types of single commodity and multi-commodity network flow problems. Despite the potentially very large dimension of the problem, they can be implemented using the conjugate gradient method and low-dimensional network operations, as shown nearly thirty years ago. We revisit these methods, compare them to more recent proposals, and describe how they can be implemented in a distributed computing system. We also discuss generalizations, including the treatment of arc gains, linear side constraints, and related special structures.

D. P. Bertsekas and H. Yu, “A Unifying Polyhedral Approximation Framework for Convex Optimization,” Lab. for Information and Decision Systems Report LIDS-P-2820, MIT, September 2009 (revised December 2010), published in SIAM J. on Optimization, Vol. 21, 2011, pp. 333-360. (Related Video Lecture, Dec. 2008) (Related Lecture Slides) Abstract: We propose a unifying framework for polyhedral approximation in convex optimization. It subsumes classical methods, such as cutting plane and simplicial decomposition, but also includes new methods, and new versions/extensions of old methods, such as a simplicial decomposition method for nondifferentiable optimization, and a new piecewise linear approximation method for convex single commodity network flow problems. Our framework is based on an extended form of monotropic programming, a broadly applicable model, which includes as special cases Fenchel duality and Rockafellar’s monotropic programming, and is characterized by an elegant and symmetric duality theory. Our algorithm flexibly combines outer and inner linearization of the cost function. The linearization is progressively refined by using primal and dual differentiation, and the roles of outer and inner linearization are reversed in a mathematically equivalent dual algorithm. We provide convergence results and error bounds for the general case where outer and inner linearization are combined in the same algorithm.

A. Nedic and D. P. Bertsekas, “The Effect of Deterministic Noise in Subgradient Methods,” Math. Programming, Ser. A, Vol. 125, pp. 75-99, 2010. Abstract: In this paper, we study the influence of noise on subgradient methods for convex constrained optimization. The noise may be due to various sources, and is manifested in inexact computation of the subgradients. Assuming that the noise is deterministic and bounded, we discuss the convergence properties for two cases: the case where the constraint set is compact, and the case where this set need not be compact but the objective function has a sharp set of minima (for example the function is polyhedral). In both cases, using several different stepsize rules, we prove convergence to the optimal value within some tolerance that is given explicitly in terms of the subgradient errors. In the first case, the tolerance is nonzero, but in the second case, somewhat surprisingly, the optimal value can be obtained exactly, provided the size of the error in the subgradient computation is below some threshold. We then extend these results to objective functions that are the sum of a large number of convex functions, in which case an incremental subgradient method can be used.

H. Yu, D. P. Bertsekas, and J. Rousu, “An Efficient Discriminative Training Method for Generative Models,” Extended Abstract, the 6th International Workshop on Mining and Learning with Graphs (MLG), 2008. Abstract: We propose an efficient discriminative training method for generative models under supervised learning. In our setting, fully observed instances are given as training examples, together with a specification of variables of interest for prediction. We formulate the training as a convex programming problem, incorporating the SVM-type large margin constraints to favor parameters under which the maximum a posteriori (MAP) estimates of the prediction variables, conditioned on the rest, are close to their true values given in the training instances. The resulting optimization problem is, however, more complex than its quadratic programming (QP) counterpart resulting from the SVM-type training of conditional models, because of the presence of non-linear constraints on the parameters. We present an efficient optimization method, which combines several techniques, namely, a data-dependent reparametrization of dual variables, restricted simplicial decomposition, and the proximal point algorithm. Our method extends the one for solving the aforementioned QP counterpart, proposed earlier by some of the authors.

D. P. Bertsekas, “Extended Monotropic Programming and Duality,” Lab. for Information and Decision Systems Report 2692, MIT, March 2006, corrected in Feb. 2010; a version appeared in JOTA, 2008, Vol. 139, pp. 209-225. Abstract: We consider the separable problem of minimizing $\sum_{i=1}^mf_{i}(x_{i})$ subject to $x\in S$, where $x_i$ are multidimensional subvectors of $x$, $f_i$ are convex functions, and $S$ is a subspace. Monotropic programming, extensively studied by Rockafellar, is the special case where the subvectors $x_i$ are the scalar components of $x$. We show a strong duality result that parallels Rockafellar’s result for monotropic programming, and contains other known and new results as special cases. The proof is based on the use of $\epsilon$-subdifferentials and the $\epsilon$-descent method, which is used here as an analytical vehicle.

D. P. Bertsekas and P. Tseng, “Set Intersection Theorems and Existence of Optimal Solutions,” Lab. for Information and Decision Systems Report 2628, MIT, Nov. 2004; revised August 2005; Math. Programming J., Vol. 110, 2007, pp. 287-314. Abstract: The question of nonemptiness of the intersection of a nested sequence of closed sets is fundamental in a number of important optimization topics, including the existence of optimal solutions, the validity of the minimax inequality in zero sum games, and the absence of a duality gap in constrained optimization. We introduce the new notion of an asymptotic direction of a sequence of closed sets, and the associated notions of retractive, horizon, and critical directions, and we provide several conditions that guarantee the nonemptiness of the corresponding intersection. We show how these conditions can be used to obtain simple proofs of some known results on existence of optimal solutions, and to derive some new results, including an extension of the Frank-Wolfe Theorem for (nonconvex) quadratic programming. (Related Slide Presentation)

D. P. Bertsekas, “Lagrange Multipliers with Optimal Sensitivity Properties in Constrained Optimization,” Lab. for Information and Decision Systems Report 2632, MIT, October 2004; in Proc. of the 2004 Erice Workshop on Large Scale Nonlinear Optimization, Erice, Italy, Kluwer, 2005. Abstract: We consider optimization problems with inequality and abstract set constraints, and we derive sensitivity properties of Lagrange multipliers under very weak conditions. In particular, we do not assume uniqueness of a Lagrange multiplier or continuity of the perturbation function. We show that the Lagrange multiplier of minimum norm defines the optimal rate of improvement of the cost per unit constraint violation. (Related Slide Presentation)

D. P. Bertsekas, A. E. Ozdaglar, and P. Tseng, “Enhanced Fritz John Conditions for Convex Programming,” Lab. for Information and Decision Systems Report 2631, MIT, July 2004; SIAM J. on Optimization, Vol. 16, 2006, p. 766. Abstract: We consider convex constrained optimization problems, and we enhance the classical Fritz John optimality conditions to assert the existence of multipliers with special sensitivity properties. In particular, we prove the existence of Fritz John multipliers that are informative in the sense that they identify constraints whose relaxation, at rates proportional to the multipliers, strictly improves the primal optimal value. Moreover, we show that if the set of geometric multipliers is nonempty, then the minimum-norm vector of this set is informative, and defines the optimal rate of cost improvement per unit constraint violation. Our assumptions are very general, and do not include the absence of duality gap or the existence of optimal solutions. In particular, for the case where there is a duality gap, we establish enhanced Fritz John conditions involving the dual optimal value and dual optimal solutions. (Related Slide Presentation)

A. E. Ozdaglar and D. P. Bertsekas, “The Relation Between Pseudonormality and Quasiregularity in Constrained Optimization,” *Optimization Methods and Software, Vol. 19 (2004), pp. 493–506*. Abstract: We consider optimization problems with equality, inequality, and abstract set constraints. We investigate the relations between various characteristics of the constraint set that imply the existence of Lagrange multipliers. For problems with no abstract set constraint, the classical condition of quasiregularity provides the connecting link between the most common constraint qualifications and existence of Lagrange multipliers. In earlier work, we introduced a new and general condition, pseudonormality, that is central within the theory of constraint qualifications, exact penalty functions, and existence of Lagrange multipliers. In this paper, we explore the relations between pseudonormality, quasiregularity, and existence of Lagrange multipliers in the presence of an abstract set constraint. In particular, under a regularity assumption on the abstract constraint set, we show that pseudonormality implies quasiregularity. However, contrary to pseudonormality, quasiregularity does not imply the existence of Lagrange multipliers, except under additional assumptions.

A. E. Ozdaglar and D. P. Bertsekas, “Optimal Solution of Integer Multicommodity Flow Problems with Application in Optical Networks,” *Proc. of Symposium on Global Optimization, Santorini, Greece, June 2003*. Abstract: In this paper, we propose methods for solving broadly applicable integer multicommodity flow problems. We focus in particular on the problem of routing and wavelength assignment (RWA), which is critically important for increasing the efficiency of wavelength-routed all-optical networks. Our methodology can be applied as a special case to the problem of routing in a circuit-switched network. We discuss an integer-linear programming formulation, which can be addressed with highly efficient linear (not integer) programming methods, to obtain optimal or nearly optimal solutions. *Note:* A comparative computational evaluation of the methodology of this paper is given in the thesis by Ali Meli.

A. E. Ozdaglar and D. P. Bertsekas, “Routing and Wavelength Assignment in Optical Networks,” *Report LIDS-P-2535,* Dec. 2001; *IEEE Trans. on Networking,* no. 2, Apr. 2003, pp. 259-272. Abstract: The problem of routing and wavelength assignment (RWA) is critically important for increasing the efficiency of wavelength-routed all-optical networks. Given the physical network structure and the required connections, the RWA problem is to select a suitable path and wavelength among the many possible choices for each connection so that no two paths sharing a link are assigned the same wavelength. In work to date, this problem has been formulated as a difficult integer programming problem that does not lend itself to efficient solution or insightful analysis. In this work, we propose several novel optimization problem formulations that offer the promise of radical improvements over the existing methods. We adopt a (quasi-)static view of the problem and propose new integer-linear programming formulations, which can be addressed with highly efficient linear (*not* integer) programming methods and yield optimal or near-optimal RWA policies. The fact that this is possible is surprising, and is the starting point for new and greatly improved methods for RWA. Aside from its intrinsic value, the quasi-static solution method can form the basis for suboptimal solution methods for the stochastic/dynamic settings. *Note:* A comparative computational evaluation of the methodology of this paper is given in the thesis by Ali Meli.

D. P. Bertsekas and A. E. Ozdaglar, “Pseudonormality and a Lagrange Multiplier Theory for Constrained Optimization,” *Report LIDS-P-2489, Nov. 2000; JOTA, Vol. 114, (2002), pp. 287–343.* Abstract: We consider optimization problems with equality, inequality, and abstract set constraints, and we explore various characteristics of the constraint set that imply the existence of Lagrange multipliers. We prove a generalized version of the Fritz-John theorem, and we introduce new and general conditions that extend and unify the major constraint qualifications. Among these conditions, two new properties, pseudonormality and quasinormality, emerge as central within the taxonomy of interesting constraint characteristics. In the case where there is no abstract set constraint, these properties provide the connecting link between the classical constraint qualifications and two distinct pathways to the existence of Lagrange multipliers: one involving the notion of quasiregularity and Farkas’ Lemma, and the other involving the use of exact penalty functions. The second pathway also applies in the general case where there is an abstract set constraint. (Related Slide Presentation)

P. Tseng and D. P. Bertsekas, “An Epsilon-Relaxation Method for Separable Convex Cost Generalized Network Flow Problems,” *Math. Programming*, Vol. 88, 2000, pp. 85–104. Abstract: We generalize the epsilon-relaxation method for the single commodity, linear or separable convex cost network flow problem to network flow problems with positive gains. The method maintains epsilon-complementary slackness at all iterations and adjusts the arc flows and the node prices so as to satisfy flow conservation upon termination. Each iteration of the method involves either a price change on a node or a flow change along an arc or a flow change along a simple cycle. Complexity bounds for the method are derived. For one implementation employing epsilon-scaling, the bound is polynomial in the number of nodes $N$, the number of arcs $A$, a certain constant $\Gamma$ depending on the arc gains, and $\ln(\epsilon^0/\bar\epsilon)$, where $\epsilon^0$ and $\bar\epsilon$ denote, respectively, the initial and the final tolerance.

D. P. Bertsekas and A. E. Koksal-Ozdaglar, “Enhanced Optimality Conditions and Exact Penalty Functions,” *Proc. of the 38th Allerton Conference on Communication, Control, and Computing, Allerton Park, Ill., September 2000*. Abstract: We consider optimization problems with equality, inequality, and abstract set constraints, and we explore various characteristics of the constraint set that imply the existence of Lagrange multipliers. We prove a generalized version of the Fritz-John theorem, and we introduce new and general conditions that extend and unify the major constraint qualifications. Among these conditions, a new property, pseudonormality, provides the connecting link between the classical constraint qualifications and the use of exact penalty functions. (Related Slide Presentation)

A. Nedic and D. P. Bertsekas, “Incremental Subgradient Methods for Nondifferentiable Optimization,” *Report LIDS-P-2460, Dec. 2000, SIAM J. on Optimization,* Vol. 12, 2001, pp. 109-138. Abstract: We consider a class of subgradient methods for minimizing a convex function that consists of the sum of a large number of component functions. This type of minimization arises in a dual context from Lagrangian relaxation of the coupling constraints of large scale separable problems. The idea is to perform the subgradient iteration incrementally, by sequentially taking steps along the subgradients of the component functions, with intermediate adjustment of the variables after processing each component function. This incremental approach has been very successful in solving large differentiable least squares problems, such as those arising in the training of neural networks, and it has resulted in a much better practical rate of convergence than the steepest descent method.
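The incremental iteration described above can be sketched on a toy nondifferentiable problem (illustrative, not from the paper): minimizing $\sum_i |x - a_i|$, whose minimizer is the median of the $a_i$; each step moves along the subgradient sign(x − a_i) of a single component.

```python
import math

# Illustrative sketch of an incremental subgradient method for
# minimizing sum_i |x - a_i|. A step is taken along the subgradient of
# one component at a time, with a diminishing stepsize.
def incremental_subgradient(a, iters=2000):
    x = 0.0
    for k in range(iters):
        alpha = 1.0 / (k + 1)                     # diminishing stepsize
        for ai in a:                              # one component per step
            g = 0.0 if x == ai else math.copysign(1.0, x - ai)
            x -= alpha * g                        # subgradient step
    return x
```

For a = [1, 2, 10] the iterates oscillate around the median 2 with amplitude on the order of the stepsize, so they settle near 2 as the stepsize diminishes; note the outlier 10 shifts the mean but not the median.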

A. Nedic, D. P. Bertsekas, and V. Borkar, “Distributed Asynchronous Incremental Subgradient Methods,” *Proceedings of the March 2000 Haifa Workshop “Inherently Parallel Algorithms in Feasibility and Optimization and Their Applications”, D. Butnariu, Y. Censor, and S. Reich, Eds., Elsevier, Amsterdam, 2001.* Abstract: We propose and analyze a distributed asynchronous subgradient method for minimizing a convex function that consists of the sum of a large number of component functions. The idea is to distribute the computation of the component subgradients among a set of processors, which communicate only with a coordinator. The coordinator performs the subgradient iteration incrementally and asynchronously, by taking steps along the subgradients of the component functions that are available at the update time. The incremental approach has performed very well in centralized computation, and the parallel implementation should improve its performance substantially, particularly for typical problems where computation of the component subgradients is relatively costly.

A. Nedic and D. P. Bertsekas, “Convergence Rate of Incremental Subgradient Algorithms,” *Stochastic Optimization: Algorithms and Applications (S. Uryasev and P. M. Pardalos, Editors), pp. 263-304, Kluwer Academic Publishers, 2000*. Abstract: We consider a class of subgradient methods for minimizing a convex function that consists of the sum of a large number of component functions. This type of minimization arises in a dual context from Lagrangian relaxation of the coupling constraints of large scale separable problems. The idea is to perform the subgradient iteration incrementally, by sequentially taking steps along the subgradients of the component functions, with intermediate adjustment of the variables after processing each component function. This incremental approach has been very successful in solving large differentiable least squares problems, such as those arising in the training of neural networks, and it has resulted in a much better practical rate of convergence than the steepest descent method. In this paper, we present convergence results and estimates of the convergence rate of a number of variants of incremental subgradient methods, including some that use randomization. The convergence rate estimates are consistent with our computational results, and suggest that the randomized variants perform substantially better than their deterministic counterparts.

D. P. Bertsekas and J. N. Tsitsiklis, “Gradient Convergence in Gradient Methods with Errors,” *Report LIDS-P-2404, Lab. for Info. and Decision Systems, November 1997, SIAM J. on Optimization, Vol. 10, 2000, pp. 627-642*. Abstract: For the classical gradient method and several deterministic and stochastic variants for unconstrained minimization, we discuss the issue of convergence of the gradient sequence and the attendant issue of stationarity of limit points of the iterate sequence. We assume that the cost function gradient is Lipschitz continuous, and that the stepsize diminishes to 0 and satisfies standard stochastic approximation conditions. We show that either the cost sequence diverges to $-\infty$ or else the cost sequence converges to a finite value and the gradient sequence converges to 0 (with probability 1 in the stochastic case). Existing results assume various boundedness conditions, such as boundedness of the cost sequence or the gradient sequence or the iterate sequence, which we do not assume.

D. P. Bertsekas, “A Note on Error Bounds for Convex and Nonconvex Programs,” *COAP (Comp. Optimization and Applications), Vol. 12, 1999, pp. 41-51*. Abstract: Given a single feasible solution $x_F$ and a single infeasible solution $x_I$ of a mathematical program, we provide an upper bound to the optimal dual value. We assume that $x_F$ satisfies a weakened form of the Slater condition. We apply the bound to convex programs and we discuss its relation to Hoffman-like bounds. As a special case, we recover a bound due to Mangasarian [Man97] on the distance of a point to a convex set specified by inequalities.

C. C. Wu and D. P. Bertsekas, “Distributed Power Control Algorithms for Wireless Networks,” *IEEE Trans. on Vehicular Technology*, Vol. 50, pp. 504-514, 2001. Abstract: Power control has been shown to be an effective way to increase capacity in wireless systems. In previous work on power control, it has been assumed that power levels can be assigned from a continuous range. In practice, however, power levels are assigned from a discrete set. In this work, we consider the minimization of the total power transmitted over given discrete sets of available power levels subject to maintaining an acceptable signal quality for each mobile. We have developed distributed iterative algorithms for solving a more general version of this integer programming problem, which is of independent interest, and have shown that they find the optimal solution in a finite number of iterations that is polynomial in the number of power levels and the number of mobiles.

D. P. Bertsekas, “A New Class of Incremental Gradient Methods for Least Squares,” *SIAM J. on Optimization*, Vol. 7, 1997, pp. 913-926. Abstract: The LMS method for linear least squares problems differs from the steepest descent method in that it processes data blocks one-by-one, with intermediate adjustment of the parameter vector under optimization. This mode of operation often leads to faster convergence when far from the eventual limit, and to slower (sublinear) convergence when close to the optimal solution. We embed both LMS and steepest descent, as well as other intermediate methods, within a one-parameter class of algorithms, and we propose a hybrid class of methods that combine the faster early convergence rate of LMS with the faster ultimate linear convergence rate of steepest descent. These methods are well-suited for neural network training problems with large data sets.
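A minimal sketch of the incremental (LMS-style) mode of operation described above, applied to a linear least squares problem: data rows are processed one at a time, with the parameter vector updated after each row. The data, stepsize, and number of passes are made-up illustrations, not from the paper:

```python
import numpy as np

# Incremental gradient (LMS) sketch for min ||Ax - b||^2: each inner step
# uses the gradient of a single squared residual rather than the full sum.
def incremental_least_squares(A, b, passes=200, stepsize=0.01):
    x = np.zeros(A.shape[1])
    for _ in range(passes):
        for a_i, b_i in zip(A, b):            # one data row per step
            x -= stepsize * (a_i @ x - b_i) * a_i
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 3))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true                                 # consistent system
x_hat = incremental_least_squares(A, b)
```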

D. P. Bertsekas, “Incremental Least Squares Methods and the Extended Kalman Filter,” *SIAM J. on Optimization*, Vol. 6, 1996, pp. 807-822. Abstract: In this paper we propose and analyze nonlinear least squares methods, which process the data incrementally, one data block at a time. Such methods are well suited for large data sets and real-time operation, and have received much attention in the context of neural network training problems. We focus on the Extended Kalman Filter, which may be viewed as an incremental version of the Gauss-Newton method. We provide a nonstochastic analysis of its convergence properties, and we discuss variants aimed at accelerating its convergence.

D. P. Bertsekas, “Thevenin Decomposition and Network Optimization,” *J. of Optimization Theory and Applications,* Vol. 89, 1996, pp. 1-15. Abstract: Thevenin’s theorem, one of the most celebrated results of electric circuit theory, provides a two-parameter characterization of the behavior of an arbitrarily large circuit, as seen from two of its terminals. We interpret the theorem as a sensitivity result in an associated minimum energy/network flow problem, and we abstract its main idea to develop a decomposition method for convex quadratic programming problems with linear equality constraints, such as those arising in Newton’s method, interior point methods, and least squares estimation. Like Thevenin’s theorem, our method is particularly useful in problems involving a system consisting of several subsystems, connected to each other with a small number of coupling variables.

D. P. Bertsekas and P. Tseng, “Partial Proximal Minimization Algorithms for Convex Programming,” *SIAM J. on Optimization, Vol. 4, 1994, pp. 551-572*. Abstract: We consider an extension of the proximal minimization algorithm where only some of the minimization variables appear in the quadratic proximal term. We interpret the resulting iterates in terms of the iterates of the standard algorithm and we show a uniform descent property, which holds independently of the proximal terms used. This property is used to give simple convergence proofs of parallel algorithms where multiple processors simultaneously execute proximal iterations using different partial proximal terms. We also show that partial proximal minimization algorithms are dual to multiplier methods with partial elimination of constraints, and we establish a relation between parallel proximal minimization algorithms and parallel constraint distribution algorithms.

D. P. Bertsekas and P. Tseng, “On the Convergence of the Exponential Multiplier Method for Convex Programming,” *Math Programming,* Vol. 60, pp. 1-19, 1993. Abstract: In this paper, we analyze the exponential method of multipliers for convex constrained minimization problems, which operates like the usual Augmented Lagrangian method, except that it uses an exponential penalty function in place of the usual quadratic. We also analyze a dual counterpart, the entropy minimization algorithm, which operates like the proximal minimization algorithm, except that it uses a logarithmic/entropy “proximal” term in place of a quadratic. We strengthen substantially the available convergence results for these methods, and we derive their convergence rate when applied to linear programs.

J. Eckstein and D. P. Bertsekas, “On the Douglas-Rachford Splitting Method and the Proximal Point Algorithm for Maximal Monotone Operators,” *Mathematical Programming,* Vol. 55, 1992, pp. 293-318. Abstract: This paper shows, by means of an operator called a splitting operator, that the Douglas-Rachford splitting method for finding a zero of the sum of two monotone operators is a special case of the proximal point algorithm. Therefore, applications of Douglas-Rachford splitting, such as the alternating direction method of multipliers for convex programming decomposition, are also special cases of the proximal point algorithm. This observation allows the unification and generalization of a variety of convex programming algorithms. By introducing a modified version of the proximal point algorithm, we derive a new, generalized alternating direction method of multipliers for convex programming. Advances of this sort illustrate the power and generality gained by adopting monotone operator theory as a conceptual framework.
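As one concrete instance of the alternating direction method of multipliers mentioned above, here is a minimal ADMM sketch for an l1-regularized least squares (lasso) problem; the problem data, penalty parameter rho, regularization weight, and iteration count are illustrative assumptions, not from the paper:

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t*||.||_1 (componentwise shrinkage).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# ADMM sketch for min 0.5*||Ax - b||^2 + lam*||x||_1, splitting x = z.
def admm_lasso(A, b, lam=0.01, rho=1.0, iters=200):
    n = A.shape[1]
    x = np.zeros(n)
    z = np.zeros(n)
    u = np.zeros(n)                            # scaled dual variable
    AtA, Atb = A.T @ A, A.T @ b
    M = np.linalg.inv(AtA + rho * np.eye(n))   # cached x-update solve
    for _ in range(iters):
        x = M @ (Atb + rho * (z - u))          # quadratic x-minimization
        z = soft_threshold(x + u, lam / rho)   # proximal step for l1 term
        u = u + x - z                          # multiplier update
    return z

rng = np.random.default_rng(1)
A = rng.normal(size=(30, 5))
x_true = np.array([2.0, 0.0, 0.0, -1.5, 0.0])
b = A @ x_true
x_hat = admm_lasso(A, b)
```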

P. Tseng and D. P. Bertsekas, “Relaxation Methods for Strictly Convex Costs and Linear Constraints,” *Mathematics of Operations Research,* Vol. 16, 1991, pp. 462-481. Abstract: Consider the problem of minimizing a strictly convex (possibly nondifferentiable) cost subject to linear constraints. We propose a dual coordinate ascent method for this problem that uses inexact line search and either essentially cyclic or Gauss-Southwell order of coordinate relaxation. We show, under very weak conditions, that this method generates a sequence of primal vectors converging to the optimal primal solution. Under an additional regularity assumption, and assuming that the effective domain of the cost function is polyhedral, we show that a related sequence of dual vectors converges in cost to the optimal cost. If the constraint set has an interior point in the effective domain of the cost function, then this sequence of dual vectors is bounded and each of its limit points is an optimal dual solution. When the cost function is strongly convex, we show that the order of coordinate relaxation can become progressively more chaotic. These results significantly improve upon those obtained previously.

P. Tseng and D. P. Bertsekas, “Relaxation Methods for Monotropic Programs,” *Math. Programming, Vol. 46, (1990), pp. 127–151.* Abstract: We propose a dual ascent method for the problem of minimizing a convex, possibly nondifferentiable, separable cost subject to linear constraints. The method has properties reminiscent of the Gauss-Seidel method in numerical analysis and uses the epsilon-complementary slackness mechanism introduced in Bertsekas, Hosein and Tseng (1987) to ensure finite convergence to near optimality. As special cases we obtain the methods in Bertsekas, Hosein and Tseng (1987) for network flow programs and the methods in Tseng and Bertsekas (1987) for linear programs.

J. Dunn and D. P. Bertsekas, “Efficient Dynamic Programming Implementations of Newton’s Method for Unconstrained Optimal Control Problems,” *J. Opt. Theory and Appl., Vol. 63, 1989, pp. 23-38.* Abstract: Naive implementations of Newton’s method for unconstrained N-stage discrete-time optimal control problems with Bolza objective functions tend to increase in cost like N^3 as N increases. However, if the inherent recursive structure of the Bolza problem is properly exploited, the cost of computing a Newton step will increase only linearly with N. The efficient Newton implementation scheme proposed here is similar to Mayne’s DDP (differential dynamic programming) method but produces the Newton step exactly, even when the dynamical equations are nonlinear. The proposed scheme is also related to a Riccati treatment of the linear, two-point boundary-value problems that characterize optimal solutions. For discrete-time problems, the dynamic programming approach and the Riccati substitution differ in an interesting way; however, these differences essentially vanish in the continuous-time limit.

P. Tseng and D. P. Bertsekas, “Relaxation Methods for Linear Programs,” *Math. of Operations Research, Vol. 12, (1987), pp. 569–596.* Abstract: In this paper we propose a new method for solving linear programs. This method may be viewed as a generalized coordinate descent method whereby the descent directions are chosen from a finite set. The generation of the descent directions is based on results from monotropic programming theory. The method may be alternatively viewed as an extension of the relaxation method for network flow problems [1], [2]. Node labeling, cuts, and flow augmentation paths in the network case correspond to, respectively, tableau pivoting, rows of tableaus, and columns of tableaus possessing special sign patterns in the linear programming case.

E. M. Gafni, and D. P. Bertsekas, “Two-Metric Projection Methods for Constrained Optimization,” *SIAM J. on Control and Optimization,* Vol. 22, 1984, pp. 936-964. Abstract: This paper is concerned with the problem min{f(x)|x\in X}, where X is a convex subset of a linear space H, and f is a smooth real-valued function on H. We propose the class of methods x_{k+1}=P(x_k-\alpha_k g_k), where P denotes projection on X with respect to the Hilbert space norm ||.||, g_k denotes the Frechet derivative of f at x_k with respect to another Hilbert space norm ||.||_k on H, and \alpha_k is a positive scalar stepsize. We thus remove an important restriction in the original proposal of Goldstein, and Levitin and Poljak, where the norms ||.|| and ||.||_k must be the same. It is therefore possible to match the norm ||.|| with the structure of X so that the projection operation is simplified while at the same time reserving the option to choose ||.||_k on the basis of approximations to the Hessian of f so as to attain a typically superlinear rate of convergence. The resulting methods are particularly attractive for large-scale problems with specially structured constraint sets such as optimal control and nonlinear multi-commodity network flow problems. The latter class of problems is discussed in some detail.
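A minimal sketch of the two-metric idea above on the nonnegative orthant: the projection is taken in the plain Euclidean norm (a simple componentwise clip), while the gradient is rescaled in a second, diagonal Hessian-based metric. The quadratic objective and scaling choice are made-up examples, not from the paper:

```python
import numpy as np

# Two-metric projection sketch: clip-based Euclidean projection on
# X = {x >= 0}, gradient scaled by a diagonal Hessian approximation.
def two_metric_projection(grad, hess_diag, x0, alpha=1.0, iters=100):
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = grad(x) / hess_diag(x)             # gradient in the scaled metric
        x = np.maximum(x - alpha * g, 0.0)     # simple Euclidean projection
    return x

# f(x) = 0.5*(x1 - 1)^2 + 2*(x2 + 1)^2; constrained minimizer is (1, 0).
grad = lambda x: np.array([x[0] - 1.0, 4.0 * (x[1] + 1.0)])
hess = lambda x: np.array([1.0, 4.0])          # diagonal of the Hessian
x_star = two_metric_projection(grad, hess, [5.0, 5.0])
```

Because the projection remains a cheap clip while the scaling carries the second-order information, this split is what makes the approach attractive for large, simply constrained problems.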

D. P. Bertsekas, G. S. Lauer, N. R. Sandell, and T. A. Posbergh, “Optimal Short-Term Scheduling of Large-Scale Power Systems,” *IEEE Trans. on Aut. Control,* Vol. AC-28, 1983, pp. 1-11. Abstract: This paper is concerned with the long-standing problem of optimal unit commitment in an electric power system. We follow the traditional formulation of this problem which gives rise to a large-scale, dynamic, mixed-integer programming problem. We describe a solution methodology based on duality, Lagrangian relaxation and nondifferentiable optimization that has two unique features. First, computational requirements typically grow only linearly with the number of generating units. Second, the duality gap decreases in relative terms as the number of units increases, and as a result our algorithm tends to actually perform better for problems of large size. This allows for the first time consistently reliable solution of large practical problems involving several hundreds of units within realistic time constraints. Aside from the unit commitment problem, this methodology is applicable to a broad class of large-scale dynamic scheduling and resource allocation problems involving integer variables.

D. P. Bertsekas, and N. R. Sandell, “Estimates of the Duality Gap for Large-Scale Separable Nonconvex Optimization Problems,” *Proc. of 21st IEEE Conference on Decision and Control, Volume 21, Part 1, Dec. 1982, pp. 782-785.* Abstract: We derive some estimates of the duality gap for separable constrained optimization problems involving nonconvex, possibly discontinuous, objective functions, and nonconvex, possibly discrete, constraint sets. The main result is that as the number of separable terms increases to infinity the duality gap as a fraction of the optimal cost decreases to zero. The analysis is related to the one of Aubin and Ekeland, and is based on the Shapley-Folkman theorem. Our assumptions are different and our estimates are sharper and more convenient for integer programming problems.

D. P. Bertsekas and E. Gafni, “Projection Methods for Variational Inequalities with Applications to the Traffic Assignment Problem,” *Math. Progr. Studies,* Vol. 17, 1982, pp. 139-159. Abstract: It is well known [2, 3, 16] that if $\bar T:\mathbb{R}^n\mapsto\mathbb{R}^n$ is a Lipschitz continuous, strongly monotone operator and $X$ is a closed convex set, then a solution $x^*\in X$ of the variational inequality $(x-x^*)'\bar T(x^*)\geq 0$, for all $x\in X$, can be found iteratively by means of the projection method $x_{k+1}=P_X[x_k-\alpha \bar T(x_k)]$, $x_0\in X$, provided the stepsize $\alpha$ is sufficiently small. We show that the same is true if $\bar T$ is of the form $\bar T=A'TA$ where $A:\mathbb{R}^n\mapsto\mathbb{R}^m$ is a linear mapping, provided $T:\mathbb{R}^m\mapsto\mathbb{R}^m$ is Lipschitz continuous and strongly monotone, and the set $X$ is polyhedral. This fact is used to construct an effective algorithm for finding a network flow which satisfies given demand constraints, and is positive only on paths of minimum delay or travel time.
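The projection iteration x_{k+1} = P_X[x_k - alpha*T(x_k)] quoted above can be sketched for a strongly monotone affine operator on a box; the operator, the box X, and the stepsize are made-up example data, not from the paper:

```python
import numpy as np

# Projection method for the variational inequality: find x* in X with
# (x - x*)' T(x*) >= 0 for all x in X, via repeated projected steps.
def vi_projection(T, project, x0, alpha=0.1, iters=500):
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = project(x - alpha * T(x))
    return x

M = np.array([[2.0, 1.0], [-1.0, 2.0]])        # M + M' positive definite,
q = np.array([-2.0, 1.0])                      # so T is strongly monotone
T = lambda x: M @ x + q
project = lambda y: np.clip(y, 0.0, 5.0)       # X = [0, 5]^2
x_star = vi_projection(T, project, [5.0, 5.0])
```

Here the solution is the point where T vanishes inside the box, so the projected iteration contracts to it for a small enough stepsize.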

D. P. Bertsekas, “Projected Newton Methods for Optimization Problems with Simple Constraints,” *SIAM J. Control and Optimization,* Vol. 20, 1982, pp. 221-246. Abstract: We consider the problem min{f(x)|x>=0} and propose algorithms of the form x_{k+1}=P(x_k-a_kD_k grad f(x_k)) where P denotes projection on the positive orthant, a_k is a stepsize chosen by an Armijo-like rule, and D_k is a positive definite symmetric matrix, which is partly diagonal. We show that D_k can be calculated simply on the basis of second derivatives of f so that the resulting Newton-like algorithm has a typically superlinear rate of convergence. With other choices of D_k convergence at a typically linear rate is obtained. The algorithms are almost as simple as their unconstrained counterparts. They are well suited for problems of large dimension such as those arising in optimal control while being competitive with existing methods for low-dimensional problems. The effectiveness of the Newton-like algorithm is demonstrated via computational examples involving as many as 10,000 variables. Extensions to general linearly constrained problems are also provided. These extensions utilize a notion of an active generalized rectangle patterned after the notion of an active manifold used in manifold suboptimization methods. By contrast with these methods, many constraints can be added or subtracted from the binding set at each iteration without the need to solve a quadratic programming problem.

D. P. Bertsekas, “Enlarging the Region of Convergence of Newton’s Method for Constrained Optimization,” *J. Optimization Th. and Applications,* Vol. 36, 1982, pp. 221-252. Abstract: In this paper, we consider Newton’s method for solving the system of necessary optimality conditions of optimization problems with equality and inequality constraints. The principal drawbacks of the method are the need for a good starting point, the inability to distinguish between local maxima and local minima, and when inequality constraints are present, the necessity to solve a quadratic programming problem at each iteration. We show that all these drawbacks can be overcome to a great extent without sacrificing the superlinear convergence rate by making use of exact differentiable penalty functions introduced by Di Pillo and Grippo. We also show that there is a close relationship between the class of penalty functions of Di Pillo and Grippo and the class of Fletcher, and that the region of convergence of a variation of Newton’s method can be enlarged by making use of one of Fletcher’s penalty functions.

D. P. Bertsekas, “Convexification Procedures and Decomposition Methods for Nonconvex Optimization Problems,” *J. of Optimization Theory and Applications,* Vol. 29, 1979, pp. 169-197. Abstract: In order for primal-dual methods to be applicable to a constrained minimization problem, it is necessary that restrictive convexity conditions are satisfied. In this paper, we consider a procedure by means of which a nonconvex problem is convexified and transformed into one which can be solved with the aid of primal-dual methods. Under this transformation, separability of the type necessary for application of decomposition algorithms is preserved. This feature extends the range of applicability of such algorithms to nonconvex problems. Relations with multiplier methods are explored with the aid of a local version of a conjugate convex function.

D. P. Bertsekas, “Local Convex Conjugacy and Fenchel Duality,” *Preprints of Triennial World Congress of IFAC, Helsinki, June 1978,* Vol. 2, 1978, pp. 1079-1084. Abstract: In this paper we introduce a notion of a convex conjugate function of a nonlinear function defined on a manifold specified by nonlinear equality constraints. Under certain assumptions the conjugate is defined locally around a point and upon conjugation yields the original function. Local versions of the Fenchel duality theorem are also proved.

D. P. Bertsekas, “On the Convergence Properties of Second-Order Multiplier Methods,” *J. of Optimization Theory and Applications,* Vol. 25, 1978, pp. 443-449. Abstract: The purpose of this note is to provide some estimates relating to Newton-type methods of multipliers. These estimates can be used to infer that convergence in such methods can be achieved for an arbitrary choice of the initial multiplier vector by selecting the penalty parameter sufficiently large.

D. P. Bertsekas, “Approximation Procedures Based on the Method of Multipliers,” *J. of Optimization Theory and Applications,* Vol. 23, 1977. Abstract: In this paper, we consider a method for solving certain optimization problems with constraints, nondifferentiabilities, and other ill-conditioning terms in the cost functional by approximating them by well-behaved optimization problems. The approach is based on the method of multipliers. The convergence properties of the methods proposed can be inferred from corresponding properties of multiplier methods with partial elimination of constraints. A related analysis is provided in this paper.

D. P. Bertsekas, “Multiplier Methods: A Survey,” *Automatica,* Vol. 12, 1976, pp. 133-145. Abstract: The purpose of this paper is to provide a survey of convergence and rate of convergence aspects of a class of recently proposed methods for constrained minimization: the so-called multiplier methods. The results discussed highlight the operational aspects of multiplier methods and demonstrate their significant advantages over ordinary penalty methods.

D. P. Bertsekas, “On Penalty and Multiplier Methods for Constrained Minimization,” *SIAM J. on Control and Optimization,* Vol. 14, 1976, pp. 216-235. Abstract: In this paper we consider a generalized class of quadratic penalty function methods for the solution of nonconvex nonlinear programming problems. This class contains as special cases both the usual quadratic penalty function method and the recently proposed multiplier method. We obtain convergence and rate of convergence results for the sequences of primal and dual variables generated. The convergence results for the multiplier method are global in nature and constitute a substantial improvement over existing local convergence results. The rate of convergence results show that the multiplier method should be expected to converge considerably faster than the pure penalty method. At the same time, we construct a global duality framework for nonconvex optimization problems. The dual functional is concave, everywhere finite, and has strong differentiability properties. Furthermore, its value, gradient and Hessian matrix within an arbitrary bounded set can be obtained by unconstrained minimization of a certain augmented Lagrangian.

D. P. Bertsekas, “On the Goldstein-Levitin-Polyak Gradient Projection Method,” *IEEE Trans. on Aut. Control,* Vol. AC-21, 1976. Abstract: This paper considers some aspects of the gradient projection method proposed by Goldstein, Levitin and Polyak, and more recently, in a less general context by McCormick. We propose and analyze some convergent step-size rules to be used in conjunction with the method. These rules are similar in spirit to the efficient Armijo rule for the method of steepest descent and under mild assumptions, they have the desirable property that they identify the set of active inequality constraints in a finite number of iterations.

D. P. Bertsekas, “A New Algorithm for Solution of Resistive Networks Involving Diodes,” *IEEE Trans. on Circuits and Systems,* Vol. CAS-23, 1976, pp. 599-608. Abstract: The solution of electric network problems by various algorithms such as for example Newton’s method is often hampered by the presence of physical diodes with steeply rising exponential characteristics which cause overflow and slow convergence during numerical computation. In this paper we propose and analyze an algorithm which bypasses these difficulties by successively approximating the steep diode characteristics by smoother exponential functions. The algorithm may be modified to be used in the presence of ideal diodes and is related to penalty and multiplier methods for constrained minimization and Davidenko’s method for solving certain ill-conditioned systems of nonlinear equations.

D. P. Bertsekas, “Combined Primal-Dual and Penalty Methods for Constrained Minimization,” *SIAM J. on Control,* Vol. 13, pp. 521-544, 1975. Abstract: In this paper we consider a class of combined primal-dual and penalty methods often called methods of multipliers. The analysis focuses mainly on the rate of convergence of these methods. It is shown that this rate is considerably more favorable than the corresponding rate for penalty function methods. Some efficient versions of multiplier methods are also considered whereby the intermediate unconstrained minimizations involved are approximate and only asymptotically exact. It is shown that such approximation schemes may lead to a substantial deterioration of the convergence rate, and a special approximation scheme is proposed which exhibits the same rate as the method with exact minimization. Finally, we analyze the properties of the step size rule of the multiplier method in relation to other possible step sizes, and we consider a modified step size rule for the case of the convex programming problem.

D. P. Bertsekas, “Necessary and Sufficient Conditions for a Penalty Method to be Exact,” *Mathematical Programming,* Vol. 9, pp. 87-99, 1975. Abstract: This paper identifies necessary and sufficient conditions for a penalty method to yield an optimal solution or a Lagrange multiplier of a convex programming problem by means of a single unconstrained minimization. The conditions are given in terms of properties of the objective and constraint functions of the problem as well as the penalty function adopted. It is shown among other things that all linear programs with finite optimal value satisfy such conditions when the penalty function is quadratic.

D. P. Bertsekas, “Nondifferentiable Optimization via Approximation,” *in Mathematical Programming Study 3, Nondifferentiable Optimization,* M. L. Balinski, P. Wolfe, (eds.), North-Holland Publ. Co., pp. 1-15, 1975. Abstract: This paper presents a systematic approach for minimization of a wide class of nondifferentiable functions. The technique is based on approximation of the nondifferentiable function by a smooth function and is related to penalty and multiplier methods for constrained minimization. Some convergence results are given and the method is illustrated by means of examples from nonlinear programming.
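A toy instance of the smoothing idea above: the nondifferentiable function |t| is replaced by a Huber-style smooth approximation of the kind obtainable from quadratic penalty arguments. The specific formula and epsilon value are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Smooth approximation of |t|: quadratic within [-eps, eps], exact outside.
# The two branches match in value and slope at |t| = eps, so the result is
# continuously differentiable, with approximation error at most eps/2.
def smooth_abs(t, eps=1e-2):
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= eps,
                    t**2 / (2 * eps) + eps / 2,   # smooth near zero
                    np.abs(t))                    # exact away from zero
```

Minimizing a sum of such smoothed terms with a standard gradient method, and letting eps shrink, is the general pattern the paper's approximation approach formalizes.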

D. P. Bertsekas, “On the Method of Multipliers for Convex Programming,” *IEEE Transactions on Aut. Control,* June 1975, pp. 385-388. Abstract: It is known that the method of multipliers for constrained minimization can be viewed as a fixed stepsize gradient method for solving a certain dual problem. In this short paper it is shown that for convex programming problems the method converges globally for a wide range of possible stepsizes. This fact is proved for both cases where unconstrained minimization is exact and approximate. The results provide the basis for considering modifications of the basic stepsize of the method of multipliers which are aimed at acceleration of its speed of convergence. A few such modifications are discussed and some computational results are presented relating to a problem in optimal control.

D. P. Bertsekas, “Necessary and Sufficient Conditions for Existence of an Optimal Portfolio,” *Journal of Economic Theory,* Vol. 8, No. 2, pp. 235-247, 1974. Abstract: This paper identifies necessary and sufficient conditions for existence of a solution to a class of optimization problems under uncertainty. This class includes certain problems of optimal portfolio selection when rates of return of risky assets are uncertain, as well as problems of optimal choice of inputs and outputs by a perfectly competitive firm facing uncertain prices.

D. P. Bertsekas, “Partial Conjugate Gradient Methods for a Class of Optimal Control Problems,” *IEEE Trans. on Aut. Control,* Vol. AC-19, 1974, pp. 209-217. Abstract: In this paper we examine the computational aspects of a certain class of discrete-time optimal control problems. We propose and analyze two partial conjugate gradient algorithms which operate in cycles of s+1 conjugate gradient steps (s ≤ n = space dimension). The algorithms are motivated by the special form of the Hessian matrix of the cost functional. The first algorithm exhibits a linear convergence rate and offers some advantages over steepest descent in certain cases such as when the system is unstable. The second algorithm requires second-order information with respect to the control variables at the beginning of each cycle and exhibits (s+1)-step superlinear convergence rate. Furthermore, it solves a linear-quadratic problem in s+1 steps as compared with the mN steps (m = control space dimension, N = number of stages) required by the ordinary conjugate gradient method.

D. P. Bertsekas, “Stochastic Optimization Problems with Nondifferentiable Cost Functionals,” *Journal of Optimization Theory and Applications,* Vol. 12, 1973, pp. 218-231. Abstract: In this paper, we examine a class of stochastic optimization problems characterized by nondifferentiability of the objective function. It is shown that, in many cases, the expected value of the objective function is differentiable and, thus, the resulting optimization problem can be solved by using classical analytical or numerical methods. The results are subsequently applied to the solution of a problem of economic resource allocation.

D. P. Bertsekas, and S. K. Mitter, “A Descent Numerical Method for Optimization Problems with Nondifferentiable Cost Functionals,” *SIAM Journal on Control,* Vol. 11, 1973, pp. 637-652. Abstract: In this paper we consider the numerical solution of convex optimization problems with nondifferentiable cost functionals. We propose a new algorithm, the epsilon-subgradient method, a large step, double iterative algorithm which converges rapidly under very general assumptions. We discuss the application of the algorithm in some problems of nonlinear programming and optimal control and we show that the epsilon-subgradient method contains as a special case a minimax algorithm due to Pschenichnyi.

B. W. Kort and D. P. Bertsekas, “A New Penalty Function Method for Constrained Minimization,” *Proc. of 1972 IEEE Conference on Decision and Control,* New Orleans, La., 1972, pp. 162-166. Abstract: During recent years it has been shown that the performance of penalty function methods for constrained minimization can be improved significantly by introducing gradient type iterations for solving the dual problem. In this paper we present a new penalty function algorithm of this type which offers significant advantages over existing schemes for the case of the convex programming problem. The algorithm treats inequality constraints explicitly and can also be used for the solution of general mathematical programming problems.

D. P. Bertsekas, and S. K. Mitter, “Steepest Descent for Optimization Problems with Nondifferentiable Cost Functionals,” *Proc. of Princeton Conference on Information Sciences and Systems,* 1971, pp. 347-351.

##### Network Optimization

D. P. Bertsekas, “Centralized and Distributed Newton Methods for Network Optimization and Extensions,” Lab. for Information and Decision Systems Report LIDS-P-2866, MIT, April 2011. Abstract: We consider Newton methods for common types of single commodity and multi-commodity network flow problems. Despite the potentially very large dimension of the problem, they can be implemented using the conjugate gradient method and low-dimensional network operations, as shown nearly thirty years ago. We revisit these methods, compare them to more recent proposals, and describe how they can be implemented in a distributed computing system. We also discuss generalizations, including the treatment of arc gains, linear side constraints, and related special structures.

A. E. Ozdaglar and D. P. Bertsekas, “Optimal Solution of Integer Multicommodity Flow Problems with Application in Optical Networks,” *Proc. of Symposium on Global Optimization, Santorini, Greece, June 2003; Frontiers in global optimization, pp. 411–435, Nonconvex Optim. Appl., 74, Kluwer Acad. Publ., Boston, MA, 2004.* Abstract: In this paper, we propose methods for solving broadly applicable integer multicommodity flow problems. We focus in particular on the problem of routing and wavelength assignment (RWA), which is critically important for increasing the efficiency of wavelength-routed all-optical networks. Our methodology can be applied as a special case to the problem of routing in a circuit-switched network. We discuss an integer-linear programming formulation, which can be addressed with highly efficient linear (not integer) programming methods, to obtain optimal or nearly optimal solutions. *Note:* A comparative computational evaluation of the methodology of this paper is given in the thesis by Ali Meli.

A. E. Ozdaglar and D. P. Bertsekas, “Routing and Wavelength Assignment in Optical Networks,” *Report LIDS-P-2535,* Dec. 2001; *IEEE Trans. on Networking,* no. 2, Apr. 2003, pp. 259-272. Abstract: The problem of routing and wavelength assignment (RWA) is critically important for increasing the efficiency of wavelength-routed all-optical networks. Given the physical network structure and the required connections, the RWA problem is to select a suitable path and wavelength among the many possible choices for each connection so that no two paths sharing a link are assigned the same wavelength. In work to date, this problem has been formulated as a difficult integer programming problem that does not lend itself to efficient solution or insightful analysis. In this work, we propose several novel optimization problem formulations that offer the promise of radical improvements over the existing methods. We adopt a (quasi-)static view of the problem and propose new integer-linear programming formulations, which can be addressed with highly efficient linear (*not* integer) programming methods and yield optimal or near-optimal RWA policies. The fact that this is possible is surprising, and is the starting point for new and greatly improved methods for RWA. Aside from its intrinsic value, the quasi-static solution method can form the basis for suboptimal solution methods for the stochastic/dynamic settings. *Note:* A comparative computational evaluation of the methodology of this paper is given in the thesis by Ali Meli.

P. Tseng and D. P. Bertsekas, “An Epsilon-Relaxation Method for Separable Convex Cost Generalized Network Flow Problems,” *Math. Programming,* Vol. 88, 2000, pp. 85-104. Abstract: We generalize the epsilon-relaxation method for the single commodity, separable convex cost network flow problem to network flow problems with positive gains. We show that the method terminates with a near optimal solution, and we provide an associated complexity analysis.

D. P. Bertsekas, L. C. Polymenakos, and P. Tseng, “Epsilon-Relaxation and Auction Methods for Separable Convex Cost Network Flow Problems,” *in Network Optimization,* by P. M. Pardalos, D. W. Hearn, and W. W. Hager (eds.), Lecture Notes in Economics and Mathematical Systems, Springer-Verlag, N.Y., 1998, pp. 103-126; also given in the book by Bertsekas “Network Optimization: Continuous and Discrete Models,” Athena Scientific, 1998. Abstract: We consider a generic auction method for the solution of the single commodity, separable convex cost network flow problem. This method provides a unifying framework for the epsilon-relaxation method and the auction/sequential shortest path algorithm and, as a consequence, we develop a unified complexity analysis for the two methods. We also present computational results showing that these methods are much faster than earlier relaxation methods, particularly for ill-conditioned problems.

D. P. Bertsekas, L. C. Polymenakos, and P. Tseng, “An Epsilon-Relaxation Method for Convex Network Optimization Problems,” *SIAM J. on Optimization*, Vol. 7, 1997, pp. 853-870. Abstract: We propose a new method for the solution of the single commodity, separable convex cost network flow problem. The method generalizes the epsilon-relaxation method developed for linear cost problems, and reduces to that method when applied to linear cost problems. We show that the method terminates with a near optimal solution, and we provide an associated complexity analysis. We also present computational results showing that the method is much faster than earlier relaxation methods, particularly for ill-conditioned problems.

D. P. Bertsekas, F. Guerriero, and R. Musmanno, “Parallel Asynchronous Label Correcting Methods for Shortest Paths,” *J. of Optimization Theory and Applications*, Vol. 88, 1996, pp. 297-320. Abstract: In this paper we develop parallel asynchronous implementations of some known and some new label correcting methods for finding a shortest path from a single origin to all the other nodes of a directed graph. We compare these implementations on a shared memory multiprocessor, the Alliant FX/80, using several types of randomly generated problems. Excellent (sometimes superlinear) speedup is achieved with some of the methods, and it is found that the asynchronous versions of these methods are substantially faster than their synchronous counterparts.

D. P. Bertsekas, “An Auction Algorithm for the Max-Flow Problem,” *J. of Optimization Theory and Applications*, Vol. 87, 1995, pp. 69-101. Abstract: We propose a new algorithm for the max-flow problem. It consists of a sequence of augmentations along paths constructed by an auction-like algorithm. These paths are not necessarily shortest, that is, they need not contain a minimum number of arcs. However, they typically can be found with much less computation than the shortest augmenting paths used by competing methods. Our algorithm outperforms these latter methods as well as state-of-the-art preflow-push algorithms by a very large margin in tests with standard randomly generated problems.

D. P. Bertsekas, S. Pallottino, and M. G. Scutella, “Polynomial Auction Algorithms for Shortest Paths,” *Computational Optimization and Applications,* Vol. 4, 1995, pp. 99-125. Abstract: In this paper we consider strongly polynomial variations of the auction algorithm for the single origin/many destinations shortest path problem. These variations are based on the idea of graph reduction, that is, deleting unnecessary arcs of the graph by using certain bounds naturally obtained in the course of the algorithm. We study the structure of the reduced graph and we exploit this structure to obtain algorithms with $O(n\min\{m,n\log n\})$ and $O(n^2)$ running time. Our computational experiments show that these algorithms outperform their closest competitors on randomly generated dense all destinations problems, and on a broad variety of few destination problems.

D. P. Bertsekas, “An Auction/Sequential Shortest Path Algorithm for the Minimum Cost Flow Problem,” *Report LIDS-P-2146, Lab. for Info. and Decision Systems, Revision of Feb. 1995*. Abstract: We propose a new algorithm for the solution of the linear minimum cost network flow problem, based on a sequential shortest path augmentation approach. Each shortest path is constructed by means of the recently proposed auction/shortest path algorithm. This approach allows useful information to be passed from one shortest path construction to the next. However, the naive implementation of this approach, where the length of each arc is equal to its reduced cost, fails because of the presence of zero cost cycles. We overcome this difficulty by using as arc lengths epsilon-perturbations of reduced costs and by using epsilon-complementary slackness conditions in place of the usual complementary slackness conditions. We present several variants of the main algorithm, including one that has proved very efficient for the max-flow problem. We also discuss the possibility of combining the algorithm with the relaxation method and we provide encouraging computational results.

D. P. Bertsekas, “Mathematical Equivalence of the Auction Algorithm for Assignment and the Epsilon-Relaxation (Preflow-Push) Method for Min Cost Flow,” *In: Large Scale Optimization,* Hager, W. W., Hearn, D. W., and Pardalos, P. M. (eds), Springer, Boston, MA. Abstract: It is well known that the linear minimum cost network flow problem can be converted to an equivalent assignment problem. We show here that when the auction algorithm is applied to this equivalent problem with some special rules for choosing the initial object prices and the person submitting a bid at each iteration, one obtains the generic form of the epsilon-relaxation method. The reverse equivalence is already known, that is, if we view the assignment problem as a special case of a minimum cost flow problem and we apply the epsilon-relaxation method with some special rules for choosing the node to iterate on, we obtain the auction algorithm. Thus, the two methods are mathematically equivalent.

D. P. Bertsekas and D. A. Castanon, “A Generic Auction Algorithm for the Minimum Cost Network Flow Problem,” *Computational Optimization and Applications,* Vol. 2, 1993, pp. 229-260. Abstract: In this paper we broadly generalize the assignment auction algorithm to solve linear minimum cost network flow problems. We introduce a generic algorithm, which contains as special cases a number of known algorithms, including the epsilon-relaxation method, and the auction algorithm for assignment and for transportation problems. The generic algorithm can serve as a broadly useful framework for the development and the complexity analysis of specialized auction algorithms that exploit the structure of particular network problems. Using this framework, we develop and analyze two new algorithms, an algorithm for general minimum cost flow problems, called network auction, and an algorithm for the k node-disjoint shortest path problem.

D. P. Bertsekas, D. A. Castanon, and H. Tsaknakis, “Reverse Auction and the Solution of Asymmetric Assignment Problems,” *SIAM J. on Optimization,* Vol. 3, 1993, pp. 268-299. Abstract: In this paper we propose auction algorithms for solving several types of assignment problems with inequality constraints. Included are asymmetric problems with different numbers of persons and objects, and multiassignment problems, where persons may be assigned to several objects and reversely. A central new idea in all these algorithms is to combine regular auction, where persons bid for objects by raising their prices, with reverse auction, where objects compete for persons by essentially offering discounts. Reverse auction can also be used to accelerate substantially (and sometimes dramatically) the convergence of regular auction for symmetric assignment problems.

D. P. Bertsekas, “A Simple and Fast Label Correcting Algorithm for Shortest Paths,” *Networks,* Vol. 23, pp. 703-709, 1993. Abstract: We propose a new method for ordering the candidate nodes in label correcting methods for shortest path problems. The method is equally simple but much faster than the D’Esopo-Pape algorithm. It is similar to the threshold algorithm in that it tries to scan nodes with small labels as early as possible, and performs comparably with that algorithm. Our algorithm can also be combined with the threshold algorithm, thereby considerably improving the practical performance of both algorithms.
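
The ordering rule described in this abstract is simple enough to sketch. The following is an illustrative Python rendering (not the paper's code) of a label correcting method with an SLF-style rule: a relabeled node is inserted at the front of the candidate queue when its label is small, otherwise at the back. It assumes a graph with no negative cycles, given as adjacency lists.

```python
from collections import deque

def slf_shortest_paths(graph, source):
    """Label correcting shortest paths with an SLF-style queue discipline.
    `graph` maps every node (including sinks) to a list of
    (successor, arc_length) pairs; returns the label (distance) of each node."""
    dist = {node: float("inf") for node in graph}
    dist[source] = 0
    queue = deque([source])
    while queue:
        i = queue.popleft()
        for j, length in graph[i]:
            if dist[i] + length < dist[j]:
                dist[j] = dist[i] + length
                if j not in queue:
                    # SLF rule: scan nodes with small labels as early as possible
                    if queue and dist[j] <= dist[queue[0]]:
                        queue.appendleft(j)
                    else:
                        queue.append(j)
    return dist
```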

D. P. Bertsekas and D. A. Castanon, “Parallel Primal-Dual Methods for the Minimum Cost Flow Problem,” *Computational Optimization and Applications,* Vol. 2, pp. 317-336, 1993. Abstract: In this paper we discuss the parallel asynchronous implementation of the classical primal-dual method for solving the linear minimum cost network flow problem. Multiple augmentations and price rises are simultaneously attempted starting from several nodes with possibly outdated price and flow information. The results are then merged asynchronously subject to rather weak compatibility conditions. We show that this algorithm is valid, terminating finitely to an optimal solution. We also present computational results using an Encore Multimax that illustrate the speedup that can be obtained by parallel implementation.

D. P. Bertsekas and D. A. Castanon, “A Forward Reverse Auction Algorithm for Asymmetric Assignment Problems,” Report LIDS-P-2159, Lab. for Information and Decision Systems, also *Computational Optimization and Applications,* Vol. 1, pp. 277-297, 1992. Abstract: In this paper we consider the asymmetric assignment problem and we propose a new auction algorithm for its solution. The algorithm uses in a novel way the recently proposed idea of reverse auction, where in addition to persons bidding for objects by raising their prices, we also have objects competing for persons by essentially offering discounts. In practice, the new algorithm apparently deals better with price wars than the currently existing auction algorithms. As a result it frequently does not require epsilon-scaling for good practical performance, and tends to terminate substantially (and often dramatically) faster than its competitors.

D. P. Bertsekas and D. A. Castanon, “Parallel Synchronous and Asynchronous Implementations of the Auction Algorithm,” *Parallel Computing,* Vol. 17, pp. 707-732, 1991. Abstract: In this paper we discuss the parallel implementation of the auction algorithm for the classical assignment problem. We show that the algorithm admits a totally asynchronous implementation and we consider several implementations on a shared memory machine, with varying degrees of synchronization. We also discuss and explore computationally the tradeoffs involved in using asynchronism to reduce the synchronization penalty.

D. P. Bertsekas, “An Auction Algorithm for Shortest Paths,” *SIAM J. on Optimization,* Vol. 1, 1991, pp. 425-447. Abstract: We propose a new and simple algorithm for finding shortest paths in a directed graph. In the single origin/single destination case, the algorithm maintains a single path starting at the origin, which is extended or contracted by a single node at each iteration. Simultaneously, at most one dual variable is adjusted at each iteration so as to either improve or maintain the value of a dual function. For the case of multiple origins, the algorithm is well suited for parallel computation. It maintains multiple paths that can be extended or contracted in parallel by several processors that share the results of their computations. Based on experiments with randomly generated problems on a serial machine, the algorithm outperforms substantially its closest competitors for problems with few origins and a single destination. It also seems better suited for parallel computation than other shortest path algorithms.
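
The single origin/single destination mechanism described here can be sketched in a few lines. The following is a hedged illustration (not the paper's implementation): the path is extended or contracted by one node per iteration while node prices are raised to keep $p_i \le a_{ij} + p_j$ on every arc. It assumes nonnegative arc lengths, all cycles of positive length, the destination reachable from the origin, and an outgoing arc at every node the search reaches.

```python
def auction_shortest_path(graph, origin, dest):
    """Auction algorithm sketch for the shortest path from origin to dest.
    `graph` maps each node to a list of (successor, arc_length) pairs."""
    p = {i: 0.0 for i in graph}   # zero prices are valid for nonnegative lengths
    path = [origin]
    while path[-1] != dest:
        i = path[-1]
        # best continuation of the path at the current prices
        best_j, best_val = min(((j, a + p[j]) for j, a in graph[i]),
                               key=lambda pair: pair[1])
        if p[i] < best_val:
            p[i] = best_val       # price rise at the terminal node of the path
            if len(path) > 1:
                path.pop()        # contract the path by one node
        else:
            path.append(best_j)   # extend the path by one node
    return path
```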

P. Tseng, D. P. Bertsekas and J. N. Tsitsiklis, “Partially Asynchronous Algorithms for Network Flow and Other Problems,” *SIAM J. on Control and Optimization,* Vol. 28, 1990, pp. 678-710. Abstract: The problem of computing a fixed point of a nonexpansive function f is considered. Sufficient conditions are provided under which a parallel, partially asynchronous implementation of the iteration x:=f(x) converges. These results are then applied to (i) quadratic programming subject to box constraints, (ii) strictly convex cost network flow optimization, (iii) an agreement and a Markov chain problem, (iv) neural network optimization, and (v) finding the least element of a polyhedral set determined by a weakly diagonally dominant, Leontief system. Finally, simulation results illustrating the attainable speedup and the effects of asynchronism are presented.
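
A toy simulation can convey the partially asynchronous model studied above: one coordinate of x:=f(x) is updated at a time, using a view of x whose entries may be outdated by a bounded number of iterations. The function and its contraction example below are illustrative assumptions, not code from the paper.

```python
import random

def async_fixed_point(components, n, delay=3, steps=200, seed=0):
    """Simulate a partially asynchronous iteration x := f(x): at each step a
    randomly chosen coordinate i is replaced by f_i evaluated on a view of x
    whose entries may lag the current iterate by up to `delay` steps."""
    rng = random.Random(seed)
    history = [[0.0] * n]          # history[k] is the k-th iterate
    for _ in range(steps):
        x = list(history[-1])
        i = rng.randrange(n)
        # read each coordinate from a possibly stale past iterate
        stale = [history[max(0, len(history) - 1 - rng.randrange(delay + 1))][j]
                 for j in range(n)]
        x[i] = components[i](stale)
        history.append(x)
    return history[-1]

# Example: f(x) = (0.5*x1 + 1, 0.5*x0 + 1) is a maximum-norm contraction with
# fixed point (2, 2); the iteration converges despite the outdated reads.
f = [lambda x: 0.5 * x[1] + 1.0, lambda x: 0.5 * x[0] + 1.0]
result = async_fixed_point(f, 2)
```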

D. P. Bertsekas and D. A. Castanon, “The Auction Algorithm for the Transportation Problem,” *Annals of Operations Research,* Vol. 20, pp. 67-96, 1989. Abstract: This paper generalizes the auction algorithm to solve linear transportation problems. The idea is to convert the transportation problem into an assignment problem, and then to modify the auction algorithm to exploit the special structure of this problem. Computational results show that this modified version of the auction algorithm is very efficient for certain types of transportation problems.

D. P. Bertsekas and J. Eckstein, “Dual Coordinate Step Methods for Linear Network Flow Problems,” *Mathematical Programming, Series B,* Vol. 42, 1988, pp. 203-243. Abstract: We review a class of recently-proposed linear-cost network flow methods which are amenable to distributed implementation. All the methods in the class use the notion of epsilon-complementary slackness, and most do not explicitly manipulate any “global” objects such as paths, trees, or cuts. Interestingly, these methods have also stimulated a number of new serial computational complexity results. We develop the basic theory of these methods and present two specific methods, the epsilon-relaxation algorithm for the minimum-cost flow problem, and the auction algorithm for the assignment problem. We show how to implement these methods with serial complexities of O(N^3 log NC) and O(NA log NC), respectively. We also discuss practical implementation issues and computational experience to date. Finally, we show how to implement epsilon-relaxation in a completely asynchronous, “chaotic” environment in which some processors compute faster than others, and there can be arbitrarily large communication delays.
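
The epsilon-complementary slackness notion shared by these methods can be stated compactly in code. The checker below is a sketch under one common sign convention (reduced cost of arc (i, j) taken as cost + p_j - p_i); the data layout is an assumption for illustration only.

```python
def satisfies_eps_cs(arcs, flow, price, eps):
    """Check epsilon-complementary slackness for a candidate min-cost flow.
    `arcs[(i, j)] = (cost, capacity)`.  Flow may be strictly below capacity
    only if the reduced cost is >= -eps, and strictly positive only if the
    reduced cost is <= eps (eps = 0 recovers ordinary complementary slackness)."""
    for (i, j), (cost, capacity) in arcs.items():
        reduced = cost + price[j] - price[i]
        if flow[(i, j)] < capacity and reduced < -eps:
            return False
        if flow[(i, j)] > 0 and reduced > eps:
            return False
    return True
```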

D. P. Bertsekas, “The Auction Algorithm: A Distributed Relaxation Method for the Assignment Problem,” *Annals of Operations Research,* Vol. 14, 1988, pp. 105-123. Abstract: We propose a massively parallelizable algorithm for the classical assignment problem. The algorithm operates like an auction whereby unassigned persons bid simultaneously for objects thereby raising their prices. Once all bids are in, objects are awarded to the highest bidder. The algorithm can also be interpreted as a Jacobi-like relaxation method for solving a dual problem. Its (sequential) worst-case complexity, for a particular implementation that uses scaling, is O(NA log(NC)), where N is the number of persons, A is the number of pairs of persons and objects that can be assigned to each other, and C is the maximum absolute object value. Computational results show that, for large problems, the algorithm is competitive with existing methods even without the benefit of parallelism. When executed on a parallel machine, the algorithm exhibits substantial speedup.
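
The bidding cycle described here fits in a few lines. The following naive one-person-at-a-time sketch is an illustration (not the paper's implementation, which bids in parallel and uses scaling); with integer values and eps < 1/n the final assignment is within n*eps of optimal, hence optimal.

```python
def auction_assignment(value, eps=0.01):
    """Naive auction algorithm sketch for the n x n assignment problem.
    `value[i][j]` is the value of assigning person i to object j."""
    n = len(value)
    prices = [0.0] * n
    owner = [None] * n       # owner[j]: person currently holding object j
    assigned = [None] * n    # assigned[i]: object currently held by person i
    unassigned = list(range(n))
    while unassigned:
        i = unassigned.pop()
        # person i bids for its best object at the current prices
        gains = [value[i][j] - prices[j] for j in range(n)]
        best = max(range(n), key=lambda j: gains[j])
        second = max(gains[j] for j in range(n) if j != best) if n > 1 else gains[best]
        # raise the price by the bidding increment (gain difference plus eps)
        prices[best] += gains[best] - second + eps
        if owner[best] is not None:      # outbid the previous holder, if any
            assigned[owner[best]] = None
            unassigned.append(owner[best])
        owner[best] = i
        assigned[i] = best
    return assigned, prices
```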

D. P. Bertsekas and P. Tseng, “The RELAX codes for linear minimum cost network flow problems,” *Annals of Operations Research,* Vol. 13, 1988, pp. 125-190. Abstract: We describe a relaxation algorithm for solving the classical minimum cost network flow problem. Our implementation is compared with mature state-of-the-art primal simplex and primal-dual codes and is found to be several times faster on all types of randomly generated network flow problems. Furthermore, the speed-up factor increases with problem dimension. The codes, called RELAX-II and RELAXT-II, have a facility for efficient reoptimization and sensitivity analysis, and are in the public domain.

D. P. Bertsekas and P. Tseng, “Relaxation Methods for Minimum Cost Ordinary and Generalized Network Flow Problems,” *Operations Research,* Vol. 36, 1988, pp. 93-114. Abstract: We propose a new class of algorithms for linear network flow problems with and without gains. These algorithms are based on iterative improvement of a dual cost and operate in a manner that is reminiscent of coordinate ascent and Gauss-Seidel relaxation methods. We compare our coded implementations of these methods with mature state-of-the-art primal simplex and primal-dual codes, and find them to be several times faster on standard benchmark problems, and faster by an order of magnitude on large, randomly generated problems. Our experiments indicate that the speedup factor increases with problem dimension.

D. P. Bertsekas, P. A. Hosein, and P. Tseng, “Relaxation Methods for Network Flow Problems with Convex Arc Costs,” *SIAM J. on Control and Optimization*, Vol. 25, 1987. Abstract: We consider the standard single commodity network flow problem with both linear and strictly convex, possibly nondifferentiable, arc costs. For the case where all arc costs are strictly convex we study the convergence of a dual Gauss-Seidel type relaxation method that is well suited for parallel computation. We then extend this method to the case where some of the arc costs are linear. As a special case we recover a relaxation method for the linear minimum cost network flow problem proposed in Bertsekas [1] and Bertsekas and Tseng [2].

D. P. Bertsekas and D. ElBaz, “Distributed Asynchronous Relaxation Algorithms for Convex Network Flow Problems,” *SIAM J. Control and Optimization*, Vol. 25, 1987, pp. 74-85. Abstract: We consider the solution of the single commodity strictly convex network flow problem in a distributed asynchronous computation environment. The dual of this problem is unconstrained, differentiable, and well suited for solution via Gauss-Seidel relaxation. We show that the structure of the dual allows the successful application of a distributed asynchronous method whereby relaxation iterations are carried out in parallel by several processors in arbitrary order and with arbitrarily large interprocessor communication delays.

D. P. Bertsekas, “Distributed Relaxation Methods for Linear Network Flow Problems,” *Proc. of 25th CDC, Athens, Greece,* 1986, pp. 2101-2106. Abstract: We consider distributed solution of the classical linear minimum cost network flow problem. We formulate a dual problem which is unconstrained, piecewise linear, and involves a dual variable for each node. We propose a dual algorithm that resembles a Gauss-Seidel relaxation method. At each iteration the dual variable of a single node is changed based on local information from adjacent nodes. In a distributed setting each node can change its variable independently of the variables of other nodes. The algorithm is efficient for some classes of problems, notably for the max-flow problem for which it resembles a recent algorithm by Goldberg [11].

J. N. Tsitsiklis, and D. P. Bertsekas, “Distributed Asynchronous Optimal Routing in Data Networks,” *IEEE Trans. on Aut. Control,* Vol. AC-31, 1986, pp. 325-332. Abstract: In this paper we study the performance of a class of distributed optimal routing algorithms of the gradient projection type under weaker and more realistic assumptions than those considered thus far. In particular, we show convergence to an optimal routing without assuming synchronization of computation at all nodes and measurement of link lengths at all links, while taking into account the probability of link flow transients caused by routing updates. This demonstrates the robustness of these algorithms in a realistic distributed operating environment.

D. P. Bertsekas, “A Unified Framework for Primal-Dual Methods in Minimum Cost Network Flow Problems,” *Math. Programming,* Vol. 32, pp. 125-145, 1985. Abstract: We introduce a broad class of algorithms for finding a minimum cost flow in a capacitated network. The algorithms are of the primal-dual type. They maintain primal feasibility with respect to capacity constraints, while trying to satisfy the conservation of flow equation at each node by means of a wide variety of procedures based on flow augmentation, price adjustment, and ascent of a dual functional. The manner in which these procedures are combined is flexible, thereby allowing the construction of algorithms that can be tailored to the problem at hand for maximum effectiveness. Particular attention is given to methods that incorporate features from classical relaxation procedures. Experimental codes based on these methods outperform by a substantial margin the fastest available primal-dual and primal simplex codes on standard benchmark problems.

E. Gafni, and D. P. Bertsekas, “Dynamic Control of Session Input Rates in Communication Networks,” *IEEE Trans. on Aut. Control,* Vol. AC-29, 1984, pp. 1009-1016. Abstract: We consider a distributed iterative algorithm for dynamically adjusting the input rate of each session of a voice or data network using virtual circuits so as to exercise flow control. Each session origin periodically receives information regarding the level of congestion along the session path and iteratively corrects its input rate. In this paper, we place emphasis on voice networks, but the ideas involved are also relevant for dynamic flow control in data networks. The algorithm provides for the addition of new and termination of old sessions and maintains at all times feasibility of link flows with respect to capacity constraints. Fairness with respect to all sessions is built into the algorithm and a mechanism is provided to control link utilization and average delay per packet at any desired level.

D. P. Bertsekas, E. Gafni, and R. G. Gallager, “Second Derivative Algorithms for Minimum Delay Distributed Routing in Networks,” *IEEE Trans. on Communications,* Vol. COM-32, 1984, p. 911. Abstract: We propose a class of algorithms for finding an optimal quasi-static routing in a communication network. The algorithms are based on Gallager’s method [1] and provide methods for iteratively updating the routing table entries of each node in a manner that guarantees convergence to a minimum delay routing. Their main feature is that they utilize second derivatives of the objective function and may be viewed as approximations to a constrained version of Newton’s method. The use of second derivatives results in improved speed of convergence and automatic stepsize scaling with respect to level of traffic input. These advantages are of crucial importance for the practical implementation of the algorithm using distributed computation in an environment where input traffic statistics gradually change.

D. P. Bertsekas and E. Gafni, “Projected Newton Methods and Optimization of Multicommodity Flows,” *IEEE Trans. on Automatic Control,* Vol. AC-28, 1983, pp. 1090-1096. Abstract: A superlinearly convergent Newton-like method for linearly constrained optimization problems is adapted for solution of multicommodity network flow problems of the type arising in communication and transportation networks. We show that the method can be implemented approximately by making use of conjugate gradient iterations without the need to compute explicitly the Hessian matrix. Preliminary computational results suggest that this type of method is capable of yielding highly accurate solutions of nonlinear multicommodity flow problems far more efficiently than any of the methods available at present.

D. P. Bertsekas and E. Gafni, “Projection Methods for Variational Inequalities with Applications to the Traffic Assignment Problem,” *Math. Progr. Studies,* Vol. 17, 1982, pp. 139-159. Abstract: It is well known [2, 3, 16] that if $\bar T:\mathbb{R}^n\mapsto\mathbb{R}^n$ is a Lipschitz continuous, strongly monotone operator and $X$ is a closed convex set, then a solution $x^*\in X$ of the variational inequality $(x-x^*)'\bar T(x^*)\geq 0$, for all $x\in X$, can be found iteratively by means of the projection method $x_{k+1}=P_X[x_k-\alpha \bar T(x_k)]$, $x_0\in X$, provided the stepsize $\alpha$ is sufficiently small. We show that the same is true if $\bar T$ is of the form $\bar T=A'TA$, where $A:\mathbb{R}^n\mapsto\mathbb{R}^m$ is a linear mapping, provided $T:\mathbb{R}^m\mapsto\mathbb{R}^m$ is Lipschitz continuous and strongly monotone, and the set $X$ is polyhedral. This fact is used to construct an effective algorithm for finding a network flow which satisfies given demand constraints, and is positive only on paths of minimum delay or travel time.
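
The projection iteration $x_{k+1}=P_X[x_k-\alpha \bar T(x_k)]$ in this abstract is easy to render directly. Below is a minimal sketch (assumed names, not the paper's code) applied to a box-constrained, strongly monotone gradient example.

```python
def projection_method(T, project, x0, alpha=0.1, iters=500):
    """Projection iteration x_{k+1} = P_X[x_k - alpha * T(x_k)] for a
    Lipschitz continuous, strongly monotone operator T, with the closed
    convex set X given through its projection map `project`."""
    x = list(x0)
    for _ in range(iters):
        t = T(x)
        x = project([x[j] - alpha * t[j] for j in range(len(x))])
    return x

# Example: T is the gradient of (x0 - 1)^2 + (x1 - 3)^2 and X = [0, 1]^2,
# so the solution of the variational inequality is (1, 1).
T = lambda x: [2 * x[0] - 2, 2 * x[1] - 6]
box = lambda v: [min(1.0, max(0.0, z)) for z in v]
solution = projection_method(T, box, [0.0, 0.0])
```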

D. P. Bertsekas, “A New Algorithm for the Assignment Problem,” *Mathematical Programming,* Vol. 21, pp. 152-171, 1981. Abstract: We propose a new algorithm for the classical assignment problem. The algorithm resembles in some ways the Hungarian method but differs substantially in other respects. The average computational complexity of an efficient implementation of the algorithm seems to be considerably better than that of the Hungarian method. In a large number of randomly generated problems the algorithm has consistently outperformed an efficiently coded version of the Hungarian method by a broad margin. The factor of improvement increases with the problem dimension N and reaches an order of magnitude for N equal to several hundreds.

D. P. Bertsekas, “A Distributed Algorithm for the Assignment Problem,” Lab. for Information and Decision Systems Report, MIT, May 1979. Abstract: This paper describes a new algorithm for solving the classical assignment problem. The algorithm is of a primal-dual nature and in some ways resembles the Hungarian and subgradient methods, but is substantially different in other respects. Its main feature is that it is well suited for distributed operation whereby each node participates in the computation on the basis of limited local information about the topology of the network and the data of the problem. The algorithmic process resembles an auction where economic agents compete for resources by making successively higher bids. The algorithm terminates in a finite number of iterations after resource prices reach levels where no further bidding is profitable. (This is the original paper on the auction algorithm.)

##### Parallel and Distributed Algorithms

D. P. Bertsekas, “Centralized and Distributed Newton Methods for Network Optimization and Extensions,” Lab. for Information and Decision Systems Report LIDS-P-2866, MIT, April 2011. Abstract: We consider Newton methods for common types of single commodity and multi-commodity network flow problems. Despite the potentially very large dimension of the problem, they can be implemented using the conjugate gradient method and low-dimensional network operations, as shown nearly thirty years ago. We revisit these methods, compare them to more recent proposals, and describe how they can be implemented in a distributed computing system. We also discuss generalizations, including the treatment of arc gains, linear side constraints, and related special structures.

D. P. Bertsekas and H. Yu, “Distributed Asynchronous Policy Iteration in Dynamic Programming,” Proc. of 2010 Allerton Conference on Communication, Control, and Computing, Allerton Park, IL, Sept. 2010. (Related Lecture Slides) (An extended version with additional algorithmic analysis). Abstract: We consider the distributed solution of dynamic programming (DP) problems by policy iteration. We envision a network of processors, each updating asynchronously a local policy and a local cost function, defined on a portion of the state space. The computed values are communicated asynchronously between processors and are used to perform the local policy and cost updates. The natural algorithm of this type can fail even under favorable circumstances, as shown by Williams and Baird [WiB93]. We propose an alternative and almost as simple algorithm, which converges to the optimum under the most general conditions, including asynchronous updating by multiple processors using outdated local cost functions of other processors.

D. P. Bertsekas and J. N. Tsitsiklis, “Comment on Coordination of Groups of Mobile Autonomous Agents Using Nearest Neighbor Rules,” Lab. for Information and Decision Systems Report, MIT, June 2006; to appear in IEEE Trans. on Aut. Control. Abstract: We clarify the relation of the model and the convergence results of Jadbabaie et al. [3] to those studied by Bertsekas et al. [6, 5, 1]. We show that the update equations in [3] are a very special case of those in [5]. Furthermore, the main convergence results in [3] are special cases of those in [5], except for a small difference in the connectivity assumptions which, however, does not affect the proof.

A. Nedic, D. P. Bertsekas, and V. Borkar, “Distributed Asynchronous Incremental Subgradient Methods,” *Proceedings of the March 2000 Haifa Workshop “Inherently Parallel Algorithms in Feasibility and Optimization and Their Applications,” D. Butnariu, Y. Censor, and S. Reich, Eds., Elsevier, Amsterdam, 2001.* Abstract: We propose and analyze a distributed asynchronous subgradient method for minimizing a convex function that consists of the sum of a large number of component functions. The idea is to distribute the computation of the component subgradients among a set of processors, which communicate only with a coordinator. The coordinator performs the subgradient iteration incrementally and asynchronously, by taking steps along the subgradients of the component functions that are available at the update time. The incremental approach has performed very well in centralized computation, and the parallel implementation should improve its performance substantially, particularly for typical problems where computation of the component subgradients is relatively costly.

S. A. Savari and D. P. Bertsekas, “Finite Termination of Asynchronous Iterative Algorithms,” *Parallel Computing*, Vol. 22, 1996, pp. 39-56. Abstract: We consider $n$-processor distributed systems where the $i$th processor executes asynchronously the iteration $x_i=f_i(x)$. It is natural to terminate the iteration of the $i$th processor when some local condition, such as $x_i-f_i(x)$ being “small,” holds. However, local termination conditions of this type may not lead to global termination because of the asynchronous character of the algorithm. In this paper, we propose several approaches to modify the original algorithm and/or supplement it with an interprocessor communication protocol so that this difficulty does not arise.

E. A. Varvarigos and D. P. Bertsekas, “A Conflict Sense Routing Protocol and Its Performance for Hypercubes”, *IEEE Trans. Computers*, Vol. 45, 1996, pp. 693-703 (copy of this paper available from the first author). Abstract: We propose a new switching format for multiprocessor networks, which we call Conflict Sense Routing Protocol. This switching format is a hybrid of packet and circuit switching, and combines advantages of both. We initially present the protocol in a way applicable to a general topology. We then present an implementation of this protocol for a hypercube computer and a particular routing algorithm. We also analyze the steady-state throughput of the hypercube implementation for random node-to-node communications.

D. P. Bertsekas, D. A. Castanon, J. Eckstein, and S. Zenios, “Parallel Computing in Network Optimization”, *Handbooks in OR & MS, (M. O. Ball et al., Eds.)*, Vol. 7, 1995, pp. 331-399.

D. P. Bertsekas, F. Guerriero and R. Musmanno, “Parallel Shortest Path Methods for Globally Optimal Trajectories,” *High Performance Computing: Technology, Methods, and Applications*, (J. Dongarra et al., Eds.), Elsevier, 1995. Abstract: In this paper we consider a special type of trajectory optimization problem that can be viewed as a continuous-space analog of the classical shortest path problem. This problem is approached by space discretization and solution of a discretized version of the associated Hamilton-Jacobi equation. It was recently shown by Tsitsiklis that some of the ideas of classical shortest path methods, such as those underlying Dijkstra’s algorithm, can be applied to solve the discretized Hamilton-Jacobi equation. In more recent work, Polymenakos, Bertsekas, and Tsitsiklis have carried this analogy further to show that some efficient label correcting methods for shortest path problems, the SLF and SLF/LLL methods of Bertsekas, can be fruitfully adapted to solve the discretized Hamilton-Jacobi equation. In this paper we discuss parallel asynchronous implementations of these methods on a shared memory multiprocessor, the Alliant FX/80. Our results show that these methods are well suited for parallelization and achieve excellent speedup.
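The serial SLF (Smallest Label First) method adapted in this paper is simple to state: a label-correcting shortest path method in which a node whose label just decreased enters the candidate queue at the front if its label is below the front node's label, and at the back otherwise. A minimal serial sketch follows (the graph encoding and names are illustrative; the paper's contribution is the parallel asynchronous implementation).

```python
from collections import deque

def slf_shortest_paths(graph, source):
    """Label-correcting shortest paths with the SLF queue discipline.
    `graph[u]` maps each successor v to the length of arc (u, v)."""
    dist = {source: 0.0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v, w in graph[u].items():
            new = dist[u] + w
            if new < dist.get(v, float("inf")):
                dist[v] = new
                if v not in queue:
                    # SLF rule: small labels jump to the front of the queue
                    if queue and new < dist[queue[0]]:
                        queue.appendleft(v)
                    else:
                        queue.append(v)
    return dist

g = {
    "s": {"a": 1.0, "b": 4.0},
    "a": {"b": 2.0, "t": 6.0},
    "b": {"t": 1.0},
    "t": {},
}
dist = slf_shortest_paths(g, "s")   # shortest s->t path is s-a-b-t, length 4
```

The linear-time membership test on the deque is acceptable for a sketch; production implementations keep an in-queue flag per node.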

E. A. Varvarigos and D. P. Bertsekas, “Multinode Broadcast in Hypercubes and Rings with Randomly Distributed Length of Packets,” *IEEE Transactions on Parallel and Distributed Systems,* Vol. 4, pp. 144-154, 1993. Abstract: We consider a multinode broadcast (MNB) in a hypercube and in a ring network of processors. This is the communication task where we want each node of the network to broadcast a packet to all the other nodes. The communication model that we use is different than those considered in the literature so far. In particular, we assume that the lengths of the packets that are broadcast are not fixed, but are distributed according to some probabilistic rule, and we compare the optimal times required to execute the MNB for variable and for fixed packet lengths.

D. P. Bertsekas and D. A. Castanon, “Parallel Asynchronous Hungarian Methods for the Assignment Problem,” *ORSA J. on Computing,* Vol. 5, pp. 261-274, 1993. Abstract: In this paper we discuss the parallel asynchronous implementation of the Hungarian method for solving the classical assignment problem. Multiple augmentations and price rises are simultaneously attempted starting from several unassigned sources and using possibly outdated price and assignment information. The results are then merged asynchronously subject to rather weak compatibility conditions. We show the validity of this algorithm and we demonstrate computationally that an asynchronous implementation is often faster than its synchronous counterpart.

D. P. Bertsekas and D. A. Castanon, “Parallel Primal-Dual Methods for the Minimum Cost Flow Problem,” *Computational Optimization and Applications,* Vol. 2, pp. 317-336, 1993. Abstract: In this paper we discuss the parallel asynchronous implementation of the classical primal-dual method for solving the linear minimum cost network flow problem. Multiple augmentations and price rises are simultaneously attempted starting from several nodes with possibly outdated price and flow information. The results are then merged asynchronously subject to rather weak compatibility conditions. We show that this algorithm is valid, terminating finitely to an optimal solution. We also present computational results using an Encore Multimax that illustrate the speedup that can be obtained by parallel implementation.

D. P. Bertsekas and D. A. Castanon, “Parallel Synchronous and Asynchronous Implementations of the Auction Algorithm,” *Parallel Computing,* Vol. 17, 1991, pp. 707-732. Abstract: In this paper we discuss the parallel implementation of the auction algorithm for the classical assignment problem. We show that the algorithm admits a totally asynchronous implementation and we consider several implementations on a shared memory machine, with varying degrees of synchronization. We also discuss and explore computationally the tradeoffs involved in using asynchronism to reduce the synchronization penalty.

D. P. Bertsekas, and J. N. Tsitsiklis, “Some Aspects of Parallel and Distributed Iterative Algorithms – A Survey,” *Automatica,* Vol. 27, 1991, pp. 3-21. Abstract: We consider iterative algorithms of the form x:=f(x), executed by a parallel or distributed computing system. We first consider synchronous executions of such iterations and study their communication requirements, as well as issues related to processor synchronization. We also discuss the parallelization of iterations of the Gauss-Seidel type. We then consider asynchronous implementations whereby each processor iterates on a different component of x, at its own pace, using the most recently received (but possibly outdated) information on the remaining components of x. While certain algorithms may fail to converge when implemented asynchronously, a large number of positive convergence results is available. We classify asynchronous algorithms into three main categories, depending on the amount of asynchronism they can tolerate, and survey the corresponding convergence results. We also discuss issues related to their termination.

D. P. Bertsekas, C. Ozveren, G. D. Stamoulis, P. Tseng, and J. N. Tsitsiklis, “Optimal Communication Algorithms for Hypercubes,” *J. of Parallel and Distributed Computing,* Vol. 11, 1991, pp. 263-275. Abstract: We consider the following basic communication problems in a hypercube network of processors: the problem of a single processor sending a different packet to each of the other processors, the problem of simultaneous broadcast of the same packet from every processor to all other processors, and the problem of simultaneous exchange of different packets between every pair of processors. The algorithms proposed for these problems are optimal in terms of execution time and communication resource requirements; that is, they require the minimum possible number of time steps and packet transmissions. In contrast, algorithms in the literature are optimal only within an additive or multiplicative factor.

P. Tseng, D. P. Bertsekas and J. N. Tsitsiklis, “Partially Asynchronous Algorithms for Network Flow and Other Problems,” *SIAM J. on Control and Optimization,* Vol. 28, 1990, pp. 678-710. Abstract: The problem of computing a fixed point of a nonexpansive function f is considered. Sufficient conditions are provided under which a parallel, partially asynchronous implementation of the iteration x:=f(x) converges. These results are then applied to (i) quadratic programming subject to box constraints, (ii) strictly convex cost network flow optimization, (iii) an agreement and a Markov chain problem, (iv) neural network optimization, and (v) finding the least element of a polyhedral set determined by a weakly diagonally dominant, Leontief system. Finally, simulation results illustrating the attainable speedup and the effects of asynchronism are presented.

D. P. Bertsekas and J. N. Tsitsiklis, “Convergence Rate and Termination of Asynchronous Iterative Algorithms”, *Proceedings of the 1989 International Conference on Supercomputing,* Crete, Greece, pp. 461-470, June 1989. Abstract: We consider iterative algorithms of the form z := f(z), executed by a parallel or distributed computing system. We focus on asynchronous implementations whereby each processor iterates on a different component of z, at its own pace, using the most recently received (but possibly outdated) information on the remaining components of z. We provide results on the convergence rate of such algorithms and make a comparison with the convergence rate of the corresponding synchronous methods in which the computation proceeds in phases. We also present results on how to terminate asynchronous iterations in finite time with an approximate solution of the computational problem under consideration.

D. P. Bertsekas and J. Eckstein, “Dual Coordinate Step Methods for Linear Network Flow Problems,” *Mathematical Programming, Series B,* Vol. 42, 1988, pp. 203-243. Abstract: We review a class of recently proposed linear-cost network flow methods which are amenable to distributed implementation. All the methods in the class use the notion of epsilon-complementary slackness, and most do not explicitly manipulate any “global” objects such as paths, trees, or cuts. Interestingly, these methods have also stimulated a number of serial computational complexity results. We develop the basic theory of these methods and present two specific methods, the epsilon-relaxation algorithm for the minimum-cost flow problem, and the auction algorithm for the assignment problem. We show how to implement these methods with serial complexities of O(N^3 log NC) and O(NA log NC), respectively. We also discuss practical implementation issues and computational experience to date. Finally, we show how to implement epsilon-relaxation in a completely asynchronous, “chaotic” environment in which some processors compute faster than others, and there can be arbitrarily large communication delays.

D. P. Bertsekas, P. A. Hosein, and P. Tseng, “Relaxation Methods for Network Flow Problems with Convex Arc Costs,” *SIAM J. on Control and Optimization*, Vol. 25, 1987. Abstract: We consider the standard single commodity network flow problem with both linear and strictly convex, possibly nondifferentiable, arc costs. For the case where all arc costs are strictly convex we study the convergence of a dual Gauss-Seidel type relaxation method that is well suited for parallel computation. We then extend this method to the case where some of the arc costs are linear. As a special case we recover a relaxation method for the linear minimum cost network flow problem proposed in Bertsekas [1] and Bertsekas and Tseng [2].

D. P. Bertsekas and D. ElBaz, “Distributed Asynchronous Relaxation Algorithms for Convex Network Flow Problems,” *SIAM J. Control and Optimization*, Vol. 25, 1987, pp. 74-85. Abstract: We consider the solution of the single commodity strictly convex network flow problem in a distributed asynchronous computation environment. The dual of this problem is unconstrained, differentiable, and well suited for solution via Gauss-Seidel relaxation. We show that the structure of the dual allows the successful application of a distributed asynchronous method whereby relaxation iterations are carried out in parallel by several processors in arbitrary order and with arbitrarily large interprocessor communication delays.

D. P. Bertsekas, “Distributed Relaxation Methods for Linear Network Flow Problems,” *Proc. of 25th CDC, Athens, Greece,* 1986, pp. 2101-2106. Abstract: We consider distributed solution of the classical linear minimum cost network flow problem. We formulate a dual problem which is unconstrained, piecewise linear, and involves a dual variable for each node. We propose a dual algorithm that resembles a Gauss-Seidel relaxation method. At each iteration the dual variable of a single node is changed based on local information from adjacent nodes. In a distributed setting each node can change its variable independently of the variables of other nodes. The algorithm is efficient for some classes of problems, notably for the max-flow problem for which it resembles a recent algorithm by Goldberg [11].

J. N. Tsitsiklis, and D. P. Bertsekas, “Distributed Asynchronous Optimal Routing in Data Networks,” *IEEE Trans. on Aut. Control,* Vol. AC-31, 1986, pp. 325-332. Abstract: In this paper we study the performance of a class of distributed optimal routing algorithms of the gradient projection type under weaker and more realistic assumptions than those considered thus far. In particular, we show convergence to an optimal routing without assuming synchronization of computation at all nodes and measurement of link lengths at all links, while taking into account the probability of link flow transients caused by routing updates. This demonstrates the robustness of these algorithms in a realistic distributed operating environment.

J. N. Tsitsiklis, D. P. Bertsekas, and M. Athans, “Distributed Asynchronous Deterministic and Stochastic Gradient Optimization Algorithms,” *IEEE Trans. on Aut. Control,* Vol. AC-31, 1986, pp. 803-812. Abstract: We present a model for asynchronous distributed computation and then proceed to analyze the convergence of natural asynchronous distributed versions of a large class of deterministic and stochastic gradient-like algorithms. We show that such algorithms retain the desirable convergence properties of their centralized counterparts, provided that the time between consecutive interprocessor communications and the communication delays are not too large.

D. P. Bertsekas, “Distributed Asynchronous Computation of Fixed Points,” *Mathematical Programming*, Vol. 27, 1983, pp. 107-120. Abstract: We present an algorithmic model for distributed computation of fixed points whereby several processors participate simultaneously in the calculations while exchanging information via communication links. We place essentially no restrictions on the ordering of computation and communication between processors thereby allowing for completely uncoordinated execution. We provide a general convergence theorem for algorithms of this type, and demonstrate its applicability to several classes of problems, including the calculation of fixed points of contraction and monotone mappings arising in linear and nonlinear systems of equations, optimization problems, shortest path problems, and dynamic programming.
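For the contraction case covered by the paper's convergence theorem, totally asynchronous execution is easy to simulate: each “processor” updates its own component using a possibly stale snapshot of the others, in random order. The linear map below and all names are illustrative, not from the paper.

```python
import random
import numpy as np

def async_fixed_point(A, b, sweeps=200, seed=0):
    """Totally asynchronous iteration x := Ax + b for a contraction
    (here ||A||_inf < 1): a randomly chosen component is updated from
    a possibly outdated snapshot of the other components."""
    rng = random.Random(seed)
    n = len(b)
    x = np.zeros(n)
    stale = [x.copy() for _ in range(n)]   # each processor's old view of x
    for _ in range(sweeps * n):
        i = rng.randrange(n)
        if rng.random() < 0.5:             # occasionally receive a fresh view
            stale[i] = x.copy()
        x[i] = A[i] @ stale[i] + b[i]
        stale[i][i] = x[i]                 # a processor knows its own value
    return x

A = np.array([[0.0, 0.3], [0.2, 0.0]])
b = np.array([1.0, 1.0])
x = async_fixed_point(A, b)                # approaches the fixed point of x = Ax + b
```

Despite the arbitrary update order and stale information, the iterates converge to the unique fixed point, which is the phenomenon the paper's general theorem explains.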

D. P. Bertsekas, “Distributed Dynamic Programming,” *IEEE Transactions on Aut. Control*, Vol. AC-27, 1982, pp. 610-616. Abstract: We consider distributed algorithms for solving dynamic programming problems whereby several processors participate simultaneously in the computation while maintaining coordination by information exchange via communication links. A model of asynchronous distributed computation is developed which requires very weak assumptions on the ordering of computations, the timing of information exchange, the amount of local information needed at each computation node, and the initial condition for the algorithm. The class of problems considered is very broad and includes shortest path problems, and finite and infinite horizon stochastic optimal control problems. When specialized to the shortest path problem, the algorithm reduces to the algorithm originally implemented for routing messages in the ARPANET.
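The local DP update behind the shortest path special case is the one-line relaxation d_i := min over arcs (i, j) of (c_ij + d_j); the paper's point is that executing it asynchronously, in any order and with outdated neighbor estimates, still converges. A synchronous sketch with illustrative names:

```python
def distributed_shortest_paths(arcs, dest):
    """Each node repeatedly applies the local DP (Bellman) update
    dist[u] := min over arcs (u, v) of c(u, v) + dist[v], using only
    its neighbors' current estimates, until nothing changes."""
    inf = float("inf")
    dist = {u: (0.0 if u == dest else inf) for u in arcs}
    changed = True
    while changed:
        changed = False
        for u in arcs:                  # any update order works
            if u == dest:
                continue
            best = min((c + dist[v] for v, c in arcs[u].items()), default=inf)
            if best < dist[u]:
                dist[u] = best
                changed = True
    return dist

arcs = {"a": {"b": 1.0, "c": 4.0}, "b": {"c": 2.0}, "c": {}}
dist = distributed_shortest_paths(arcs, "c")
```

In the distributed setting each node runs only its own line of the inner loop, broadcasting its estimate to neighbors whenever it changes.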

J. Kimemia, S. Gershwin, and D. P. Bertsekas, “Computation of Production Control Policies by a Dynamic Programming Technique,” LIDS Report LIDS-P-1236, MIT; also in *Analysis and Optimization of Systems*, A. Bensoussan and J. L. Lions (eds.), Springer, N. Y., 1982, pp. 243-269. Abstract: The problem of production management for an automated manufacturing system is described. The system consists of machines that can perform a variety of tasks on a family of parts. The machines are unreliable, and the main difficulty the control system faces is to meet production requirements while machines fail and are repaired at random times. A multi-level hierarchical control algorithm is proposed which involves a stochastic optimal control problem at the first level. Optimal production policies are characterized and a computational scheme is described.

E. Gafni, and D. P. Bertsekas, “Distributed Algorithms for Generating Loop-Free Routes in Networks with Frequently Changing Topology,” *IEEE Trans. on Communications,* Vol. COM-29, 1981, pp. 11-18. Abstract: We consider the problem of maintaining communication between the nodes of a data network and a central station in the presence of frequent topological changes as, for example, in mobile packet radio networks. We argue that flooding schemes have significant drawbacks for such networks, and propose a general class of distributed algorithms for establishing new loop-free routes to the station for any node left without a route due to changes in the network topology. By virtue of built-in redundancy, the algorithms are typically activated very infrequently and, even when they are, they do not involve any communication within the portion of the network that has not been materially affected by a topological change.

D. P. Bertsekas, “A Distributed Algorithm for the Assignment Problem,” Lab. for Information and Decision Systems Report, MIT, May 1979. Abstract: This paper describes a new algorithm for solving the classical assignment problem. The algorithm is of a primal-dual nature and in some ways resembles the Hungarian and subgradient methods, but is substantially different in other respects. Its main feature is that it is well suited for distributed operation whereby each node participates in the computation on the basis of limited local information about the topology of the network and the data of the problem. The algorithmic process resembles an auction where economic agents compete for resources by making successively higher bids. The algorithm terminates in a finite number of iterations after resource prices reach levels where no further bidding is profitable.
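The bidding process described in this abstract became the auction algorithm. A minimal synchronous sketch is below (the benefit matrix and eps value are illustrative; with integer benefits and eps < 1/n the final assignment is known to be optimal).

```python
def auction_assignment(benefit, eps=0.01):
    """Auction algorithm for the assignment problem: each unassigned
    person bids for its best object, raising the price by the margin
    over its second-best value plus eps; the prior owner is evicted."""
    n = len(benefit)
    price = [0.0] * n
    owner = [None] * n                    # owner[j]: person holding object j
    assigned = [None] * n                 # assigned[i]: object held by person i
    while None in assigned:
        i = assigned.index(None)          # pick any unassigned bidder
        values = [benefit[i][j] - price[j] for j in range(n)]
        j_best = max(range(n), key=values.__getitem__)
        best = values[j_best]
        second = max(v for j, v in enumerate(values) if j != j_best) if n > 1 else best
        price[j_best] += best - second + eps   # the bid increment
        if owner[j_best] is not None:
            assigned[owner[j_best]] = None     # outbid the previous owner
        owner[j_best] = i
        assigned[i] = j_best
    return assigned

result = auction_assignment([[10, 5], [7, 3]])   # optimal pairing: 0->0, 1->1
```

The eps term guarantees every bid strictly raises a price, which is what forces termination; the paper's distributed variant lets bids proceed with only local information.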

##### Set-Membership Estimation and Control

D. P. Bertsekas, “Control of Uncertain Systems with a Set-Membership Description of the Uncertainty,” *Ph.D. Thesis, Dept. of Electrical Engineering, M.I.T.*, 1971. Abstract: The problem of optimal feedback control of uncertain discrete-time dynamic systems is considered, where the uncertain quantities do not have a stochastic description but instead they are known to belong to given sets. The problem is converted to a sequential minimax problem and dynamic programming is suggested as a general method for its solution. The notion of a sufficiently informative function, which parallels the notion of a sufficient statistic of stochastic optimal control, is introduced, and the possible decomposition of the optimal controller into an estimator and an actuator is demonstrated. Some special cases involving a linear system are further examined. A problem involving a convex cost functional and perfect state information for the controller is considered in detail. Particular attention is given to a special case, the problem of reachability of a target tube, and an ellipsoidal approximation algorithm is obtained which leads to linear control laws. State estimation problems are also examined, and some algorithms are derived which offer distinct advantages over existing estimation schemes. These algorithms are subsequently used in the solution of some reachability problems with imperfect state information for the controller.

D. P. Bertsekas and I. B. Rhodes, “On the Minimax Reachability of Target Sets and Target Tubes,” *Automatica*, Vol. 7, pp. 233-241, March 1971. Abstract: This paper is concerned with the closed-loop control of discrete-time systems in the presence of uncertainty. The uncertainty may arise as disturbances in the system dynamics, disturbances corrupting the output measurements or incomplete knowledge of the initial state of the system. In all cases, the uncertain quantities are assumed unknown except that they lie in given sets. Attention is first given to the problem of driving the system state at the final time into a prescribed target set under the worst possible combination of disturbances. This is then extended to the problem of keeping the entire state trajectory in a given target “tube.” Necessary and sufficient conditions for reachability of a target set and a target tube are given in the case where the system state can be measured exactly, while sufficient conditions for reachability are given for the case where only disturbance corrupted output measurements are available. An algorithm is given for the efficient construction of ellipsoidal approximations to the sets involved and it is shown that this algorithm leads to linear control laws. The application of the results in this paper to pursuit-evasion games is also discussed.

D. P. Bertsekas and I. B. Rhodes, “Recursive State Estimation with a Set-Membership Description of the Uncertainty,” *IEEE Trans. on Automatic Control*, Vol. AC-16, pp. 117-128, April 1971. Abstract: This paper is concerned with the problem of estimating the state of a linear dynamic system using noise-corrupted observations, when input disturbances and observation errors are unknown except for the fact that they belong to given bounded sets. The cases of both energy constraints and individual instantaneous constraints for the uncertain quantities are considered. In the former case, the set of possible system states compatible with the observations received is shown to be an ellipsoid, and equations for its center and weighting matrix are given, while in the latter case, equations describing a bounding ellipsoid to the set of possible states are derived. All three problems of filtering, prediction, and smoothing are examined by relating them to standard tracking problems of optimal control theory. The resulting estimators are similar in structure and comparable in simplicity to the corresponding stochastic linear minimum-variance estimators, and it is shown that they provide distinct advantages over existing schemes for recursive estimation with a set-membership description of uncertainty.
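In one dimension the paper's ellipsoids reduce to intervals, so the flavor of the recursion can be shown exactly: propagate the state set through the dynamics, inflate it by the disturbance bound, and intersect it with the set of states consistent with the new measurement. All names below are illustrative, not the paper's notation.

```python
def set_membership_filter(a, W, V, x0_bounds, ys):
    """Scalar set-membership filter: dynamics x' = a*x + w with
    |w| <= W (a >= 0 assumed), measurement y = x + v with |v| <= V.
    Returns the interval of states consistent with all measurements."""
    lo, hi = x0_bounds
    for y in ys:
        lo, hi = a * lo - W, a * hi + W          # predict through the dynamics
        lo, hi = max(lo, y - V), min(hi, y + V)  # intersect with measurement set
    return lo, hi

# Two measurements of 0.0 shrink the initial set [-1, 1] to [-0.5, 0.5].
lo, hi = set_membership_filter(1.0, 0.1, 0.5, (-1.0, 1.0), [0.0, 0.0])
```

In higher dimensions the exact set is not an ellipsoid, which is why the paper derives recursions for a bounding ellipsoid instead.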

D. P. Bertsekas, “Infinite Time Reachability of State Space Regions by Using Feedback Control,” *IEEE Trans. on Automatic Control*, Vol. AC-17, pp. 604-613, October 1972. Abstract: In this paper we consider some aspects of the problem of feedback control of a time-invariant uncertain system subject to state constraints over an infinite-time interval. The central question that we investigate is under what conditions can the state of the uncertain system be forced to stay in a specified region of the state space for all times by using feedback control. At the same time we study the behavior of the region of n-step reachability as n tends to infinity. It is shown that in general this region may exhibit instability as we pass to the limit, and that under a compactness assumption this region converges to a steady state. A special case involving a linear finite-dimensional system is examined in more detail. It is shown that there exist ellipsoidal regions in state space where the state can be confined by making use of a linear time-invariant control law, provided that the system is stabilizable. Such control laws can be calculated efficiently through the solution of a recursive matrix equation of the Riccati type.

D. P. Bertsekas, “On the Solution of Some Minimax Problems,” *Proceedings of the 1972 IEEE Conference on Decision and Control*, pp. 328-332. Abstract: In dynamic minimax and stochastic optimization problems frequently one is forced to use a suboptimal controller since the computation and implementation of the optimal controller based on dynamic programming is impractical in many cases. In this paper we study the performance of some suboptimal controllers in relation to the performance of the optimal feedback controller and the optimal open-loop controller. Attention is focused on some classes of so-called open-loop-feedback controllers. It is shown under quite general assumptions that these open-loop-feedback controllers perform at least as well as the optimal open-loop controller. The results are developed for general minimax problems with perfect and imperfect state information. In the latter case the open-loop-feedback controller makes use of an estimator which is required to perform at least as well as a pure predictor in order for the results to hold. Some of the results presented have stochastic counterparts.

D. P. Bertsekas and I. B. Rhodes, “Sufficiently Informative Functions and the Minimax Feedback Control of Uncertain Dynamic Systems,” *IEEE Trans. on Automatic Control*, Vol. AC-18, pp. 117-124, April 1973. Abstract: The problem of optimal feedback control of uncertain discrete-time dynamic systems is considered where the uncertain quantities do not have a stochastic description but instead are known to belong to given sets. The problem is converted to a sequential minimax problem and dynamic programming is suggested as a general method for its solution. The notion of a sufficiently informative function, which parallels the notion of a sufficient statistic of stochastic optimal control, is introduced, and conditions under which the optimal controller decomposes into an estimator and an actuator are identified. A limited class of problems for which this decomposition simplifies the computation and implementation of the optimal controller is delineated.

D. P. Bertsekas, “Linear Convex Stochastic Control Problems Over an Infinite Horizon,” *IEEE Transactions on Aut. Control,* Vol. AC-18, 1973, pp. 314-315. Abstract: A stochastic control problem over an infinite horizon which involves a linear system and a convex cost functional is analyzed. We prove the convergence of the dynamic programming algorithm associated with the problem, and we show the existence of a stationary Borel measurable optimal control law. The approach used illustrates how results on infinite time reachability can be used for the analysis of dynamic programming algorithms over an infinite horizon subject to state constraints.

D. P. Bertsekas, “Separable Dynamic Programming and Approximate Decomposition Methods,” Lab. for Information and Decision Systems Report 2684, MIT, Feb. 2006; *IEEE Trans. on Aut. Control*, Vol. 52, 2007, pp. 911-916. Abstract: We consider control, planning, and resource allocation problems involving several independent subsystems that are coupled through a control/decision constraint. We discuss one-step lookahead methods that use an approximate cost-to-go function derived from the solution of single subsystem problems. We propose a new method for constructing such approximations, and derive bounds on the performance of the associated suboptimal policies. We then specialize this method to problems of reachability of target tubes that have the form of a box (a Cartesian product of subsystem tubes). This yields inner approximating tubes, which have the form of a union of a finite number of boxes, each involving single subsystem calculations.

D. P. Bertsekas, “Robust Shortest Path Planning and Semicontractive Dynamic Programming,” Lab. for Information and Decision Systems Report LIDS-P-2915, MIT, Feb. 2014 (revised Jan. 2015 and June 2016); *Naval Research Logistics (NRL)*, Vol. 66, 2019, pp. 15-37. Abstract: In this paper we consider shortest path problems in a directed graph where the transitions between nodes are subject to uncertainty. We use a minimax formulation, where the objective is to guarantee that a special destination state is reached with a minimum cost path even under the worst possible instance of the uncertainty. Problems of this type arise, among others, in planning and pursuit-evasion contexts, and in model predictive control. Our analysis makes use of the recently developed theory of abstract semicontractive dynamic programming models. We investigate questions of existence and uniqueness of solution of the optimality equation, existence of optimal paths, and the validity of various algorithms patterned after the classical methods of value and policy iteration, as well as a new Dijkstra-like algorithm for problems with nonnegative arc lengths.