PROJECT 4.1
| Difficulty | Advanced |
|---|---|
| Estimated Time | 5–7 hours |
| Prerequisites | Module 1.3 (Numerical Methods), Module 1.4 (Higher-Order ODEs), Module 3.4 (RL for Dynamical Systems) |
The gravitational three-body problem is one of the oldest unsolved problems in classical mechanics. Given three point masses interacting through Newtonian gravity, the system of coupled second-order ODEs is:

$$\ddot{\mathbf{r}}_i = \sum_{j \neq i} \frac{G m_j \,(\mathbf{r}_j - \mathbf{r}_i)}{|\mathbf{r}_j - \mathbf{r}_i|^3}, \qquad i = 1, 2, 3,$$

where $\mathbf{r}_i$ is the position of body $i$ and $m_j$ is the mass of body $j$.
Naively integrating this system with a fixed time step and a single integration method leads to either excessive computation (small fixed step) or numerical blow-up near close encounters (large fixed step). Your task is to build a reinforcement learning agent that adaptively selects both the time step size and the integration method at each simulation step, aiming to maintain the relative energy error below a prescribed threshold while minimizing the total number of force evaluations (a proxy for computational cost).
The agent must handle the full range of dynamical regimes that arise in the three-body problem: wide hierarchical orbits where large steps suffice, close binary interactions requiring small steps and symplectic methods, and chaotic scattering events where the trajectory is sensitive to integration accuracy. A successful solution demonstrates that learned policies can outperform hand-tuned adaptive stepping heuristics on this challenging benchmark.
There is no external dataset for this project. Instead, you must generate initial conditions
for the three-body system and use numerical integration to produce reference trajectories.
The following Python code generates random initial conditions and computes
high-accuracy reference solutions using scipy.integrate.solve_ivp with the
DOP853 method (an explicit 8th-order Runge-Kutta scheme) at very tight tolerances.
```python
import numpy as np
from scipy.integrate import solve_ivp

# --- Constants ---
G = 4.0 * np.pi**2   # AU^3 / (M_sun * yr^2)
SOFTENING = 1e-4     # AU
T_FINAL = 5.0        # years
N_SCENARIOS = 50     # number of initial conditions to generate


def generate_initial_conditions(seed=None):
    """Generate random initial conditions for a 3-body system in 2D.

    Returns:
        masses: array of shape (3,), in solar masses [1, 10]
        state0: array of shape (12,), [x1,y1,x2,y2,x3,y3,vx1,vy1,vx2,vy2,vx3,vy3]
    """
    rng = np.random.default_rng(seed)
    # Random masses between 1 and 10 solar masses
    masses = rng.uniform(1.0, 10.0, size=3)
    # Random positions in [-5, 5] AU
    positions = rng.uniform(-5.0, 5.0, size=6)
    # Shift to center-of-mass frame
    total_mass = masses.sum()
    com_x = (masses[0]*positions[0] + masses[1]*positions[2] + masses[2]*positions[4]) / total_mass
    com_y = (masses[0]*positions[1] + masses[1]*positions[3] + masses[2]*positions[5]) / total_mass
    positions[0] -= com_x; positions[1] -= com_y
    positions[2] -= com_x; positions[3] -= com_y
    positions[4] -= com_x; positions[5] -= com_y
    # Random velocities in [-2, 2] AU/yr
    velocities = rng.uniform(-2.0, 2.0, size=6)
    # Shift to zero total momentum (each body absorbs one third of the excess)
    px = masses[0]*velocities[0] + masses[1]*velocities[2] + masses[2]*velocities[4]
    py = masses[0]*velocities[1] + masses[1]*velocities[3] + masses[2]*velocities[5]
    velocities[0] -= px / (3*masses[0]); velocities[1] -= py / (3*masses[0])
    velocities[2] -= px / (3*masses[1]); velocities[3] -= py / (3*masses[1])
    velocities[4] -= px / (3*masses[2]); velocities[5] -= py / (3*masses[2])
    state0 = np.concatenate([positions, velocities])
    return masses, state0


def gravitational_rhs(t, state, masses):
    """Right-hand side of the 3-body equations of motion in 2D."""
    pos = state[:6].reshape(3, 2)
    vel = state[6:].reshape(3, 2)
    acc = np.zeros_like(pos)
    for i in range(3):
        for j in range(3):
            if i == j:
                continue
            rij = pos[j] - pos[i]
            dist = np.sqrt(np.dot(rij, rij) + SOFTENING**2)
            acc[i] += G * masses[j] * rij / dist**3
    return np.concatenate([vel.flatten(), acc.flatten()])


def total_energy(state, masses):
    """Compute kinetic + potential energy of the 3-body system."""
    pos = state[:6].reshape(3, 2)
    vel = state[6:].reshape(3, 2)
    KE = 0.5 * sum(masses[i] * np.dot(vel[i], vel[i]) for i in range(3))
    PE = 0.0
    for i in range(3):
        for j in range(i + 1, 3):
            rij = pos[j] - pos[i]
            dist = np.sqrt(np.dot(rij, rij) + SOFTENING**2)
            PE -= G * masses[i] * masses[j] / dist
    return KE + PE


def generate_reference_solution(masses, state0, t_final=T_FINAL, n_eval=10000):
    """High-accuracy reference solution using DOP853."""
    t_eval = np.linspace(0, t_final, n_eval)
    sol = solve_ivp(
        gravitational_rhs, [0, t_final], state0,
        args=(masses,), method='DOP853',
        rtol=1e-13, atol=1e-15, t_eval=t_eval, max_step=1e-3
    )
    return sol.t, sol.y.T  # shape (n_eval, 12)


# --- Generate all scenarios ---
scenarios = []
for i in range(N_SCENARIOS):
    masses, state0 = generate_initial_conditions(seed=42 + i)
    E0 = total_energy(state0, masses)
    t_ref, y_ref = generate_reference_solution(masses, state0)
    scenarios.append({
        'masses': masses, 'state0': state0,
        'E0': E0, 't_ref': t_ref, 'y_ref': y_ref
    })
    print(f"Scenario {i}: masses={masses.round(2)}, E0={E0:.6f}")
```
Generate at least 50 scenarios. Use 40 for training the RL agent and hold out 10 for evaluation. Store each scenario as a dictionary containing the masses, initial state, initial energy, and reference trajectory.
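A minimal sketch of persisting and splitting the scenarios (the flat `.npz` key layout and the fixed 40/10 split boundary are assumptions; any consistent serialization works):

```python
import numpy as np

SCENARIO_KEYS = ("masses", "state0", "E0", "t_ref", "y_ref")

def save_scenarios(scenarios, path="scenarios.npz"):
    """Flatten the list of scenario dicts into a single .npz archive."""
    arrays = {}
    for i, sc in enumerate(scenarios):
        for key in SCENARIO_KEYS:
            arrays[f"{key}_{i}"] = np.asarray(sc[key])
    np.savez_compressed(path, n=len(scenarios), **arrays)

def load_scenarios(path="scenarios.npz"):
    """Reload scenarios and split: first 40 for training, last 10 held out."""
    data = np.load(path)
    scenarios = [
        {key: data[f"{key}_{i}"] for key in SCENARIO_KEYS}
        for i in range(int(data["n"]))
    ]
    return scenarios[:40], scenarios[40:]
```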
The classical baseline uses the standard fourth-order Runge-Kutta method with a fixed time step of $dt = 0.001$ years. This is the simplest "safe" choice: it provides reasonable accuracy for moderately separated configurations but wastes computation during quiescent phases and may still violate the energy threshold during close encounters.
```python
def rk4_step(f, t, state, dt, masses):
    """Single step of the classical RK4 method."""
    k1 = f(t, state, masses)
    k2 = f(t + 0.5*dt, state + 0.5*dt*k1, masses)
    k3 = f(t + 0.5*dt, state + 0.5*dt*k2, masses)
    k4 = f(t + dt, state + dt*k3, masses)
    return state + (dt / 6.0) * (k1 + 2*k2 + 2*k3 + k4)


def run_fixed_rk4(masses, state0, t_final, dt=0.001):
    """Run fixed-step RK4 simulation. Returns trajectory and evaluation count."""
    t = 0.0
    state = state0.copy()
    trajectory = [state.copy()]
    times = [0.0]
    n_evals = 0
    while t < t_final:
        dt_actual = min(dt, t_final - t)
        state = rk4_step(gravitational_rhs, t, state, dt_actual, masses)
        t += dt_actual
        n_evals += 4  # RK4 uses 4 function evaluations per step
        trajectory.append(state.copy())
        times.append(t)
    return np.array(times), np.array(trajectory), n_evals
```
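Deliverable D1 also calls for Euler and Verlet steppers alongside `rk4_step`. A sketch of a velocity-Verlet step with the same interface; it reads accelerations from the last six components of `gravitational_rhs`'s output, which is valid here because gravitational acceleration does not depend on velocity:

```python
import numpy as np

def verlet_step(f, t, state, dt, masses):
    """Single velocity-Verlet step (2 force evaluations vs. 4 for RK4)."""
    pos, vel = state[:6], state[6:]
    acc = f(t, state, masses)[6:]                   # a(t)
    pos_new = pos + dt * vel + 0.5 * dt**2 * acc    # drift positions
    # a(t + dt): acceleration depends only on positions, so the stale
    # velocities passed here do not affect the result
    acc_new = f(t + dt, np.concatenate([pos_new, vel]), masses)[6:]
    vel_new = vel + 0.5 * dt * (acc + acc_new)      # kick velocities
    return np.concatenate([pos_new, vel_new])
```

Being symplectic, this stepper keeps the energy error bounded over long integrations rather than letting it drift, which is why the agent may prefer it during close binary interactions.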
Record the total number of force evaluations, final energy error, and maximum energy error over the trajectory. These serve as the primary comparison metrics.
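These can be computed directly from the arrays `run_fixed_rk4` returns. A sketch (pass `total_energy` from the data-generation code as `energy_fn`):

```python
import numpy as np

def energy_error_metrics(trajectory, masses, E0, energy_fn):
    """Final and maximum relative energy error |E(t) - E0| / |E0| over a trajectory."""
    errs = np.array([abs(energy_fn(s, masses) - E0) / abs(E0) for s in trajectory])
    return {"final_err": float(errs[-1]), "max_err": float(errs.max())}
```

Typical usage: `times, traj, n_evals = run_fixed_rk4(masses, state0, T_FINAL)` followed by `energy_error_metrics(traj, masses, E0, total_energy)`.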
A simple adaptive heuristic adjusts the time step inversely proportional to the maximum velocity magnitude in the system. When bodies are moving fast (close encounter), the step shrinks; when they are moving slowly (wide separation), the step grows.
```python
def run_velocity_adaptive(masses, state0, t_final, dt_base=0.001):
    """Heuristic adaptive stepping based on max velocity magnitude."""
    t = 0.0
    state = state0.copy()
    trajectory = [state.copy()]
    times = [0.0]
    n_evals = 0
    dt = dt_base
    while t < t_final:
        # Compute max velocity magnitude
        vel = state[6:].reshape(3, 2)
        v_max = max(np.linalg.norm(vel[i]) for i in range(3))
        # Adaptive step: dt ~ 1 / v_max, clamped to [1e-6, 0.1]
        dt = np.clip(dt_base / max(v_max, 0.1), 1e-6, 0.1)
        dt_actual = min(dt, t_final - t)
        state = rk4_step(gravitational_rhs, t, state, dt_actual, masses)
        t += dt_actual
        n_evals += 4
        trajectory.append(state.copy())
        times.append(t)
    return np.array(times), np.array(trajectory), n_evals
```
This heuristic is intuitive but does not account for the structure of the integration error (it does not switch methods) and cannot learn from past experience across different initial conditions.
At each simulation step $k$, the agent observes the state vector

$$s_k = \bigl(\mathbf{r}_1, \mathbf{v}_1, \mathbf{r}_2, \mathbf{v}_2, \mathbf{r}_3, \mathbf{v}_3,\; \Delta E_k,\; \log dt_k,\; \mathrm{onehot}(m_{\text{method}})\bigr),$$

where $\mathbf{r}_i, \mathbf{v}_i \in \mathbb{R}^2$ are the positions and velocities of the three bodies (12 components), $\Delta E_k = |E(t_k) - E_0| / |E_0|$ is the current relative energy error (1 component), $dt_k$ is the current time step in log-scale (1 component), and $m_{\text{method}} \in \{0, 1, 2\}$ is the index of the currently selected integration method (1 component, one-hot encoded as 3 components in practice, giving 17 total inputs).
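A sketch of assembling this observation (function and argument names are assumptions; `energy_fn` would be `total_energy` from the data-generation code):

```python
import numpy as np

def build_observation(state, masses, E0, dt, method_idx, energy_fn):
    """Stack the 12 phase-space components, the relative energy error,
    the log time step, and a one-hot of the current method (17 inputs)."""
    dE = abs(energy_fn(state, masses) - E0) / abs(E0)
    onehot = np.zeros(3)
    onehot[method_idx] = 1.0
    return np.concatenate([state, [dE, np.log(dt)], onehot]).astype(np.float32)
```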
The agent outputs a discrete action $a_k \in \{0, 1, \ldots, 14\}$ that encodes a joint decision: a multiplicative adjustment to the time step (one of five scaling factors that shrink, keep, or grow $dt_k$) and the integration method to use for the next step (Euler, Verlet, or RK4). The total action space therefore has $5 \times 3 = 15$ discrete actions.
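One convenient flat encoding, sketched below; the project fixes only the counts (five step factors, three methods), so the specific multipliers here are assumptions:

```python
# Assumed step-size multipliers; the project specifies only that there are five.
DT_FACTORS = [0.25, 0.5, 1.0, 2.0, 4.0]
METHODS = ["euler", "verlet", "rk4"]

def decode_action(a):
    """Map a flat index in {0, ..., 14} to a (dt multiplier, method) pair."""
    dt_idx, method_idx = divmod(a, 3)
    return DT_FACTORS[dt_idx], METHODS[method_idx]
```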
The per-step reward combines a progress term with a violation penalty, for example

$$r_k = \frac{dt_k}{c_{\text{method}}} - P \cdot \mathbb{1}\!\left[\Delta E_k > \epsilon_{\text{tol}}\right],$$

where $c_{\text{method}}$ is the computational cost of the chosen method ($c_{\text{Euler}} = 1$, $c_{\text{Verlet}} = 2$, $c_{\text{RK4}} = 4$), $P \gg 1$ is a large constant, and $\epsilon_{\text{tol}}$ is the prescribed relative energy-error threshold. This reward structure incentivizes the agent to advance the simulation as cheaply as possible (fewest force evaluations per unit of simulated time) while maintaining energy conservation. The large penalty for violating the threshold ensures constraint satisfaction.
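As a concrete sketch, the cost-and-penalty scheme just described can be computed per step like this (the shape of the progress term, the penalty magnitude, and the threshold value are assumptions, not fixed by the project):

```python
def compute_reward(dt, method_cost, rel_energy_err, threshold=1e-6, penalty=100.0):
    """Reward = simulated time gained per unit of method cost, minus a large
    penalty when the relative energy error exceeds the threshold.
    The threshold and penalty defaults are placeholders, not project-specified."""
    reward = dt / method_cost
    if rel_energy_err > threshold:
        reward -= penalty  # large constraint-violation penalty
    return reward
```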
| # | Item | Description |
|---|---|---|
| D1 | `threebody_env.py` | Gymnasium environment wrapping the 3-body integrator with Euler, RK4, and Verlet methods. Includes `reset()`, `step()`, and `render()`. |
| D2 | `generate_scenarios.py` | Script to generate 50+ initial conditions and reference trajectories. Saves to `scenarios.npz`. |
| D3 | `train_agent.py` | PPO training script with logging. Saves checkpoints every 50k steps. |
| D4 | `evaluate.py` | Evaluation script: runs agent, fixed-RK4, and heuristic on held-out scenarios. Outputs CSV with metrics. |
| D5 | Report (PDF, 4–6 pages) | Sections: Introduction, Methods (environment + agent), Results (plots + tables), Discussion, Conclusion. |
| D6 | Plots | (a) Orbital trajectories for 3 representative scenarios. (b) Energy error vs. time for agent, fixed-RK4, and heuristic. (c) Cumulative force evaluations vs. time for all three methods. (d) Training reward curve (mean ± std over 5 seeds). (e) Heatmap of agent's method selection over the trajectory. |
| D7 | Metrics table | For each method and each held-out scenario: max energy error, mean energy error, total force evaluations, wall-clock time, and completion rate. |
| Criterion | Weight | Description |
|---|---|---|
| Correctness | 30 | Euler, RK4, and Verlet are implemented correctly and pass unit tests against known two-body analytical solutions. Energy conservation for fixed-step RK4 matches reference. Agent actions are applied correctly (dt clipping, method switching). |
| Efficiency | 25 | The trained agent uses fewer total force evaluations than fixed-step RK4 on at least 7 out of 10 held-out scenarios while maintaining the energy threshold. Improvement over the heuristic baseline on at least 5 out of 10 scenarios. |
| Robustness | 20 | Agent completes at least 8 out of 10 held-out scenarios without triggering the energy violation penalty or ejection termination. Performance is consistent across 3 random training seeds. |
| Reproducibility | 15 | A single `requirements.txt` and `README.md` allow the grader to reproduce all results by running `python generate_scenarios.py && python train_agent.py && python evaluate.py`. Random seeds are fixed. All hyperparameters are in a config file or argparse. |
| Documentation | 10 | Report is clearly written with proper figures. Code has docstrings and inline comments for non-obvious steps. Plots are labeled and legible. |
| Total | 100 | |