
PyTorch/XLA 2.0 release

Released by @miladm on 12 Aug 07:23 · 2267 commits to master since this release · 500e1c2

Cloud TPUs now support the PyTorch 2.0 release via PyTorch/XLA integration. On top of the underlying improvements and bug fixes in the PyTorch 2.0 release, this release introduces several new features and PyTorch/XLA-specific bug fixes.

Beta Features

PJRT runtime

  • Check out our newest document; PJRT is the default runtime in 2.0 (a minimal usage sketch follows this list).
  • New implementation of xm.rendezvous based on XLA collective communication, which scales better (#4181)
  • New PJRT TPU backend through the C-API (#4077)
  • Default to PJRT if no runtime is configured (#4599)
  • Experimental support for torch.distributed and DDP on TPU v2 and v3 (#4520)
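
A minimal, hedged sketch of running on the PJRT runtime follows; the PJRT_DEVICE values ("CPU", "GPU", or "TPU" on a Cloud TPU VM) and any additional flags should be verified against the PJRT document referenced above.

```python
# Minimal sketch of running on the default PJRT runtime; verify the environment
# variables against the PJRT document linked above.
import os
os.environ.setdefault("PJRT_DEVICE", "CPU")  # use "TPU" on a Cloud TPU VM

import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()            # XLA device backed by the PJRT runtime
x = torch.randn(2, 2, device=device)
y = (x @ x).sum()
xm.mark_step()                      # cut the lazy graph and execute it
print(y.item())
```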

FSDP

  • Add auto_wrap_policy to XLA FSDP for automatic wrapping (#4318); see the sketch after this list.
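
A short sketch of the new argument follows. It assumes the wrap-policy helpers are importable from torch_xla.distributed.fsdp.wrap, mirroring upstream FSDP; check #4318 for the exact names and signatures.

```python
# Sketch only: the import path of size_based_auto_wrap_policy is an assumption based on
# the upstream FSDP layout; verify against #4318.
import functools
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm
from torch_xla.distributed.fsdp import XlaFullyShardedDataParallel as FSDP
from torch_xla.distributed.fsdp.wrap import size_based_auto_wrap_policy

device = xm.xla_device()
model = nn.Sequential(
    nn.Linear(4096, 4096),
    nn.ReLU(),
    nn.Linear(4096, 4096),
).to(device)

# Automatically wrap every submodule holding more than ~1M parameters in its own
# FSDP unit, instead of nesting FSDP wrappers by hand.
policy = functools.partial(size_based_auto_wrap_policy, min_num_params=1_000_000)
fsdp_model = FSDP(model, auto_wrap_policy=policy)
```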

Stable Features

Lazy Tensor Core Migration

  • The migration is complete; check out this dev discussion for more detail.
  • Naively inherits LazyTensor (#4271)
  • Adopt even more LazyTensor interfaces (#4317)
  • Introduce XLAGraphExecutor (#4270)
  • Inherits LazyGraphExecutor (#4296)
  • Adopt more LazyGraphExecutor virtual interfaces (#4314)
  • Roll back to using xla::Shape instead of torch::lazy::Shape (#4111)
  • Use TORCH_LAZY_COUNTER/METRIC (#4208)

Improvements & Additions

  • Add an option to increase the worker thread efficiency for data loading (#4727)
  • Improve numerical stability of torch.sigmoid (#4311)
  • Add an API to clear counters and metrics (#4109)
  • Add met.short_metrics_report to display a more concise metrics report (#4148); a usage sketch follows this list
  • Document environment variables (#4273)
  • Op Lowering
    • _linalg_svd (#4537)
    • Upsample_bilinear2d with scale (#4464)
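
A usage sketch of the metrics additions follows: met.short_metrics_report comes straight from #4148, while the exact names of the clearing helpers (clear_counters/clear_metrics) are assumptions based on #4109 and should be verified against torch_xla.debug.metrics.

```python
# Sketch of the metrics additions in this release. short_metrics_report() is from #4148;
# the clear_counters()/clear_metrics() names are assumptions based on #4109.
import torch
import torch_xla.core.xla_model as xm
import torch_xla.debug.metrics as met

device = xm.xla_device()
x = torch.randn(8, 8, device=device)
_ = (x @ x).sum()
xm.mark_step()

print(met.short_metrics_report())  # concise alternative to the full met.metrics_report()

met.clear_counters()               # reset counters between experiments
met.clear_metrics()                # reset timing metrics
```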

Experimental Features

TorchDynamo (torch.compile) support

  • Check out our newest doc; a minimal torch.compile sketch follows this list.
  • Dynamo bridge python binding (#4119)
  • Dynamo bridge backend implementation (#4523)
  • Training optimization: make execution async (#4425)
  • Training optimization: reduce graph execution per step (#4523)
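
The sketch below shows inference through the Dynamo bridge. The backend string ("torchxla_trace_once") is the one documented for the 2.0 release and was renamed in later releases, so treat it as release-specific and confirm it against the doc linked above.

```python
# Minimal torch.compile sketch against the XLA Dynamo bridge. The backend name
# "torchxla_trace_once" is specific to the 2.0 release; confirm against the Dynamo doc.
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()

def matmul_sum(a, b):
    return (a @ b).sum()

compiled = torch.compile(matmul_sum, backend="torchxla_trace_once")

a = torch.randn(128, 128, device=device)
b = torch.randn(128, 128, device=device)
print(compiled(a, b).item())  # traced once by Dynamo, then executed through XLA
```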

PyTorch/XLA GSPMD on single host

  • Preserve parameter sharding with sharded data placeholder (#4721)
  • Transfer shards from server to host (#4508)
  • Store the sharding annotation within XLATensor (#4390)
  • Use d2d replication for more efficient input sharding (#4336)
  • Mesh to support custom device order (#4162)
  • Introduce virtual SPMD device to avoid unpartitioned data transfer (#4091); a minimal sharding sketch follows this list
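
A heavily hedged, single-host sharding sketch follows; the module path (torch_xla.experimental.xla_sharding), the XLA_USE_SPMD switch, and the hard-coded device count are assumptions about the experimental 2.0 API and should be checked against the PRs above.

```python
# Experimental GSPMD sketch (assumptions: xla_sharding module path, XLA_USE_SPMD switch,
# and an 8-device single host such as a TPU v3-8; verify against the PRs above).
import os
os.environ.setdefault("XLA_USE_SPMD", "1")   # assumed switch for the virtual SPMD device

import numpy as np
import torch
import torch_xla.core.xla_model as xm
import torch_xla.experimental.xla_sharding as xs
from torch_xla.experimental.xla_sharding import Mesh

num_devices = 8                               # adjust to your host topology
mesh = Mesh(np.arange(num_devices), (2, 4), ("data", "model"))

t = torch.randn(16, 64).to(xm.xla_device())
# Shard dim 0 across mesh axis 0 ("data") and dim 1 across mesh axis 1 ("model").
xs.mark_sharding(t, mesh, (0, 1))
```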

Ongoing development

Ongoing Dynamic Shape implementation

  • Implement missing XLASymNodeImpl::Sub (#4551)
  • Make empty_symint support dynamism (#4550)
  • Add dynamic shape support to SigmoidBackward (#4322)
  • Add a forward pass NN model with dynamism test (#4256)

Ongoing SPMD multi-host execution (#4573)

Bug fixes & improvements

  • Support int as index type (#4602)
  • Only alias inputs and outputs when force_ltc_sync == True (#4575)
  • Fix race condition between execution and buffer tear down on GPU when using bfc_allocator (#4542)
  • Release the GIL during TransferFromServer (#4504)
  • Fix type annotations in FSDP (#4371)