flowchart TD
    START([Start: Process K/V Tiles]) --> INIT[Initialize:<br/>M = -∞, L = 0, O = 0]
    
    INIT --> LOOP{For each K,V tile}
    
    LOOP --> LOAD[Load Q, K_i, V_i tiles]
    LOAD --> QK[Compute Q×K_i<br/>using tensor cores]
    QK --> SCALE[Scale by 1/√d]
    
    SCALE --> NEWMAX[Compute new row max:<br/>M_new = max(M_old, rowmax(QK_scaled))]
    
    NEWMAX --> RESCALE[Compute rescale factor:<br/>α = exp(M_old - M_new)]
    
    RESCALE --> SOFTMAX[Apply softmax to QK:<br/>P_i = exp(QK_scaled - M_new)]
    
    SOFTMAX --> ROWSUM[Compute row sum:<br/>L_new = α×L_old + rowsum(P_i)]
    
    ROWSUM --> UPDATE[Update output:<br/>O = α×O + P_i×V_i]
    
    UPDATE --> NEXT{More tiles?}
    NEXT -->|Yes| LOOP
    NEXT -->|No| FINAL[Final normalization:<br/>O = O / L_new]
    
    FINAL --> END([Output O])
    
    subgraph "Key Insight"
        INSIGHT[No need to store<br/>full attention matrix!<br/>Memory: O(N) not O(N²)]
    end
    
    style NEWMAX fill:#ffecb3
    style RESCALE fill:#ffecb3
    style UPDATE fill:#c8e6c9
    style INSIGHT fill:#f8bbd9