Implementation Details
RL Framework and Chip Representation
NVIDIA's solution recasts chip floorplanning as a Markov Decision Process (MDP). The state is the current partial placement, represented as a graph whose nodes are macros and cells and whose edges are nets (connectivity). A graph convolutional neural network (GCNN) processes this graph and outputs embeddings consumed by the policy and value networks.[1] Each action selects a position and orientation for the next macro from a discrete action space, which keeps exploration of candidate layouts tractable.
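To make the MDP framing concrete, the following is a minimal sketch of a partial-placement state with a discrete action space. It assumes a uniform placement grid and axis-aligned rectangular macros, and it omits the graph encoding and the neural networks entirely. The names `Macro`, `PlacementState`, `legal_actions`, and `place` are hypothetical and are not taken from NVIDIA's system.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Macro:
    name: str
    width: int   # in grid cells (assumed unit)
    height: int


@dataclass
class PlacementState:
    """Partial placement on a grid_w x grid_h grid (illustrative only)."""
    grid_w: int
    grid_h: int
    placed: dict = field(default_factory=dict)  # name -> (x, y, w, h)

    def _overlaps(self, x, y, w, h):
        # Standard axis-aligned rectangle intersection test.
        for px, py, pw, ph in self.placed.values():
            if x < px + pw and px < x + w and y < py + ph and py < y + h:
                return True
        return False

    def legal_actions(self, macro):
        """Enumerate (x, y, rotated) choices that fit and do not overlap."""
        actions = []
        for rotated in (False, True):
            w, h = (macro.height, macro.width) if rotated else (macro.width, macro.height)
            for x in range(self.grid_w - w + 1):
                for y in range(self.grid_h - h + 1):
                    if not self._overlaps(x, y, w, h):
                        actions.append((x, y, rotated))
        return actions

    def place(self, macro, action):
        """Return the successor state after placing `macro` with `action`."""
        if action not in self.legal_actions(macro):
            raise ValueError(f"illegal action {action} for macro {macro.name}")
        x, y, rotated = action
        w, h = (macro.height, macro.width) if rotated else (macro.width, macro.height)
        successor = dict(self.placed)
        successor[macro.name] = (x, y, w, h)
        return PlacementState(self.grid_w, self.grid_h, successor)


state = PlacementState(grid_w=4, grid_h=4)
first = Macro("m0", width=2, height=1)
state_after = state.place(first, (0, 0, False))
```

Restricting the policy to the output of `legal_actions` is one common way to keep an agent from proposing overlapping placements; whether the system described here masks actions this way is not stated in the source.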
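Rewards are multi-objective: the negative half-perimeter wirelength (HPWL, the sum over all nets of each net's bounding-box half-perimeter), density penalties for overlaps, and congestion scores from routing estimators. Together these guide the agent toward Pareto-optimal power, performance, and area (PPA).[2]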
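The reward terms can be illustrated with a short sketch. HPWL is computed as defined above, from the bounding box of each net's pins. The weighted-sum combination and the weights `w_density` and `w_congestion` are assumptions made for illustration, and the density and congestion values are taken as precomputed inputs because the source does not specify how they are estimated.

```python
def hpwl(nets, positions):
    """Half-perimeter wirelength: sum over nets of bounding-box width + height.

    nets:      iterable of pin-name lists, one list per net
    positions: dict mapping pin name -> (x, y)
    """
    total = 0.0
    for pins in nets:
        xs = [positions[p][0] for p in pins]
        ys = [positions[p][1] for p in pins]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def placement_reward(nets, positions, density_penalty, congestion,
                     w_density=1.0, w_congestion=1.0):
    """Scalar reward as a negated weighted cost (weights are hypothetical)."""
    cost = (hpwl(nets, positions)
            + w_density * density_penalty
            + w_congestion * congestion)
    return -cost


positions = {"a": (0, 0), "b": (3, 4), "c": (1, 1)}
nets = [["a", "b"], ["a", "b", "c"]]
wirelength = hpwl(nets, positions)
r = placement_reward(nets, positions, density_penalty=2.0, congestion=1.0,
                     w_density=0.5, w_congestion=2.0)
```

A single weighted sum picks out one point on the trade-off curve per weight setting, so exploring a Pareto front would require sweeping the weights or using a different scalarization.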
Training Pipeline
Training starts on small benchmarks (MCNC/GSRC) with sequence-pair or slicing-tree encodings and scales to larger instances via curriculum learning. Proximal Policy Optimization (PPO), a model-free RL algorithm, stabilizes training over millions of episodes. NVIDIA ran rollouts in parallel on its own GPUs, reaching convergence in days.[3] Transfer learning from synthetic data bootstraps the policies used on real chips.
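PPO's stabilizing effect comes from its clipped surrogate objective, which limits how far a single update can move the policy away from the one that collected the data. The sketch below computes that loss in plain Python for a batch of sampled actions. The function name `ppo_clip_loss` and the default `clip_eps=0.2` are illustrative assumptions, not settings reported for this system, and a real pipeline would compute this on tensors with automatic differentiation.

```python
import math


def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Negated mean of the PPO clipped surrogate objective.

    new_logp:   log-probabilities of the taken actions under the current policy
    old_logp:   log-probabilities under the policy that generated the rollouts
    advantages: advantage estimates for the same actions
    """
    terms = []
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the unclipped and clipped estimates.
        terms.append(min(ratio * adv, clipped * adv))
    return -sum(terms) / len(terms)


# Probability ratio of 1.5 with a positive advantage: the gain is capped at 1.2.
loss_capped = ppo_clip_loss([math.log(1.5)], [0.0], [1.0])
# Same ratio with a negative advantage: the penalty is not clipped away.
loss_uncapped = ppo_clip_loss([math.log(1.5)], [0.0], [-1.0])
```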
For the target chip, fine-tuning took hours, after which the agent was deployed to generate 10-100 layouts ranked by estimated quality. Human experts review the ranked layouts and select among them, closing the loop.
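The generate-then-rank step can be sketched as a simple top-k selection over candidate layouts. The cost estimator is passed in as a function because the source does not say how quality is estimated; `rank_layouts`, `estimate_cost`, and the dictionary layout records are hypothetical.

```python
import heapq


def rank_layouts(layouts, estimate_cost, top_k=5):
    """Return the top_k lowest-cost layouts as (cost, layout) pairs, best first."""
    # The index breaks ties so that layout objects are never compared directly.
    scored = [(estimate_cost(layout), i, layout) for i, layout in enumerate(layouts)]
    best = heapq.nsmallest(top_k, scored)
    return [(cost, layout) for cost, _, layout in best]


candidates = [
    {"id": "a", "hpwl": 30.0},
    {"id": "b", "hpwl": 10.0},
    {"id": "c", "hpwl": 20.0},
]
shortlist = rank_layouts(candidates, lambda layout: layout["hpwl"], top_k=2)
```

The shortlist is what a human reviewer would inspect in the loop described above.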
Scaling to Production
The system handled NVIDIA's 2.7-million-cell GPU block with 320 macros in 3 hours, producing layouts competitive in wirelength (an estimated 15-20% reduction), routability (100% success), and area utilization. Integrated into EDA flows such as Innovus, it iterates with detailed placement.[4]
Challenges such as action-space explosion were addressed with hierarchical (coarse-to-fine) placement, while GNN attention mechanisms captured long-range dependencies. Validation on held-out benchmarks confirmed generalization.[5]
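A small sketch shows why coarse-to-fine placement shrinks the action space. It assumes a square grid divided into square blocks, with the agent first choosing a block and then a cell inside it; the function names and the 128/16 sizes are illustrative and do not come from the source.

```python
def decode_hierarchical_action(coarse_idx, fine_idx, grid_size, block_size):
    """Map a (block choice, in-block cell choice) pair to an (x, y) grid cell."""
    if grid_size % block_size != 0:
        raise ValueError("block_size must divide grid_size")
    blocks_per_side = grid_size // block_size
    bx, by = coarse_idx % blocks_per_side, coarse_idx // blocks_per_side
    fx, fy = fine_idx % block_size, fine_idx // block_size
    return bx * block_size + fx, by * block_size + fy


def action_space_sizes(grid_size, block_size):
    """Compare a flat per-cell action space with the two-stage one."""
    flat = grid_size ** 2
    hierarchical = (grid_size // block_size) ** 2 + block_size ** 2
    return flat, hierarchical


flat, hierarchical = action_space_sizes(grid_size=128, block_size=16)
cell = decode_hierarchical_action(coarse_idx=9, fine_idx=17,
                                  grid_size=128, block_size=16)
```

Under these assumptions the policy scores 64 blocks and then 256 cells per decision, instead of 16,384 cells at once.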
Integration and Tools
Built atop PyTorch and NVIDIA's cuGraph for GNNs, the pipeline runs on DGX systems. After placement, standard design rule checking (DRC) and layout-versus-schematic (LVS) verification ensure manufacturability. This end-to-end automation shortened design cycles, enabling faster iterations for the Hopper and Blackwell architectures.