* add a profiling workflow.
* fix
* fix
* more clarification
* add points.
* up
* cache hooks
* improve readme.
* propagate deletion.
* up
* up
* wan fixes.
* more
* up
* add more traces.
* up
* better title
* cuda graphs.
* up
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* add torch.compile link.
* approach -> How the tooling works
* table
* unavoidable gaps.
* make important
* note on regional compilation
* Apply suggestions from code review
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* make regional compilation note clearer.
* Apply suggestions from code review
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
* clarify scheduler related changes.
* Apply suggestions from code review
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
* Update examples/profiling/README.md
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
* up
* formatting
* benchmarking runtime
* up
* up
* up
* up
* Update examples/profiling/README.md
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>