Tube
Tube is more flexible than Tin but slower, in that it helps you create the necessary fields and does automatic batching.
Registrations
All you need to do is register:
- Input/intermediate/output tensor shapes instead of fields
- At least one kernel that takes the following as arguments:
  - Taichi fields: correspond to tensors (may or may not require gradients)
  - (Optional) Extra arguments: will NOT receive gradients
Acceptable dimensions of tensors to be registered:
- None: means the flexible batch dimension; it must be the first dimension, e.g. (None, 2, 3, 4)
- Positive integers: fixed dimensions with the indicated dimensionality
- Negative integers:
  - -1: means any number in [1, +inf); only usable in the registration of input tensors
  - Negative integers < -1: indices of dimensions that must be of the same dimensionality
    - Restriction: negative indices must be "declared" in the registration of input tensors first, and only then used in the registration of intermediate and output tensors
    - Example 1: tensors a and b of shapes a: (2, -2, 3) and b: (-2, 5, 6) mean the dimensions marked -2 must match
    - Example 2: tensors a and b of shapes a: (-1, 2, 3) and b: (-1, 5, 6) put no restriction on their first dimensions
Registration order:
Input tensors/intermediate fields/output tensors must be registered first, and then the kernel.
```python
import taichi as ti
import torch
from stannum import Tube

@ti.kernel
def ti_add(arr_a: ti.template(), arr_b: ti.template(), output_arr: ti.template()):
    for i in arr_a:
        output_arr[i] = arr_a[i] + arr_b[i]

ti.init(ti.cpu)
cpu = torch.device("cpu")
a = torch.ones(10)
b = torch.ones(10)
tube = Tube(cpu) \
    .register_input_tensor((10,), torch.float32, "arr_a", False) \
    .register_input_tensor((10,), torch.float32, "arr_b", False) \
    .register_output_tensor((10,), torch.float32, "output_arr", False) \
    .register_kernel(ti_add, ["arr_a", "arr_b", "output_arr"]) \
    .finish()
out = tube(a, b)
```
When registering a kernel, a list of field/tensor names is required, for example, the above ["arr_a", "arr_b", "output_arr"]. This list must correspond to the fields in the arguments of the kernel (e.g. the above ti_add()), and the order of the input tensors must match the order of the kernel's input fields.
Automatic batching
Automatic batching is done simply by running the kernel once per batch entry. The batch number is determined by the leading dimension of tensors registered with shape (None, ...).
It is required that if any input tensors or intermediate fields are batched (meaning their first registered dimension is None), all output tensors must be registered as batched.
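For instance, the rule above implies that a registration like the following must mark the output as batched because an input is batched. This is a minimal, hypothetical sketch; the kernel and shapes are illustrative and not from the library docs:

```python
import taichi as ti
import torch
from stannum import Tube

@ti.kernel
def copy_vec(x: ti.template(), y: ti.template()):
    for i in x:  # per batch entry, x and y are 1-D fields of length 3
        y[i] = x[i]

ti.init(ti.cpu)
# "x" is batched (leading None), so "y" must be registered as batched too;
# registering "y" with shape (3,) instead would violate the rule above.
tube = Tube(torch.device("cpu")) \
    .register_input_tensor((None, 3), torch.float32, "x", False) \
    .register_output_tensor((None, 3), torch.float32, "y", False) \
    .register_kernel(copy_vec, ["x", "y"]) \
    .finish()
out = tube(torch.ones(5, 3))  # copy_vec runs 5 times, once per batch entry
```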
Examples
Simple one without negative indices or batch dimension:
```python
@ti.kernel
def ti_add(arr_a: ti.template(), arr_b: ti.template(), output_arr: ti.template()):
    for i in arr_a:
        output_arr[i] = arr_a[i] + arr_b[i]

ti.init(ti.cpu)
cpu = torch.device("cpu")
a = torch.ones(10)
b = torch.ones(10)
tube = Tube(cpu) \
    .register_input_tensor((10,), torch.float32, "arr_a", False) \
    .register_input_tensor((10,), torch.float32, "arr_b", False) \
    .register_output_tensor((10,), torch.float32, "output_arr", False) \
    .register_kernel(ti_add, ["arr_a", "arr_b", "output_arr"]) \
    .finish()
out = tube(a, b)
```
With negative dimension index:
```python
# Reuses the ti_add kernel defined above.
ti.init(ti.cpu)
cpu = torch.device("cpu")
tube = Tube(cpu) \
    .register_input_tensor((-2,), torch.float32, "arr_a", False) \
    .register_input_tensor((-2,), torch.float32, "arr_b", False) \
    .register_output_tensor((-2,), torch.float32, "output_arr", False) \
    .register_kernel(ti_add, ["arr_a", "arr_b", "output_arr"]) \
    .finish()
dim = 10
a = torch.ones(dim)
b = torch.ones(dim)
out = tube(a, b)
assert torch.allclose(out, torch.full((dim,), 2.))
dim = 100
a = torch.ones(dim)
b = torch.ones(dim)
out = tube(a, b)
assert torch.allclose(out, torch.full((dim,), 2.))
```
With batch dimension:
```python
@ti.kernel
def int_add(a: ti.template(), b: ti.template(), out: ti.template()):
    out[None] = a[None] + b[None]

ti.init(ti.cpu)
b = torch.tensor(1., requires_grad=True)
batched_a = torch.ones(10, requires_grad=True)
tube = Tube() \
    .register_input_tensor((None,), torch.float32, "a") \
    .register_input_tensor((), torch.float32, "b") \
    .register_output_tensor((None,), torch.float32, "out", True) \
    .register_kernel(int_add, ["a", "b", "out"]) \
    .finish()
out = tube(batched_a, b)
loss = out.sum()
loss.backward()
assert torch.allclose(torch.ones_like(batched_a) + 1, out)
assert b.grad == 10.  # b is shared by all 10 batch entries, so its gradients accumulate
assert torch.allclose(torch.ones_like(batched_a), batched_a.grad)
```
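Combining the two previous examples, a batch dimension and a negative dimension index can be registered together. The following is a hedged sketch, assuming Tube accepts shapes that mix None with a negative index; the kernel and names are illustrative:

```python
import taichi as ti
import torch
from stannum import Tube

@ti.kernel
def scale(src: ti.template(), dst: ti.template()):
    for i in src:  # per batch entry, src and dst are 1-D fields whose lengths are tied by -2
        dst[i] = src[i] * 2.0

ti.init(ti.cpu)
tube = Tube(torch.device("cpu")) \
    .register_input_tensor((None, -2), torch.float32, "src", False) \
    .register_output_tensor((None, -2), torch.float32, "dst", False) \
    .register_kernel(scale, ["src", "dst"]) \
    .finish()

x = torch.ones(4, 7)  # batch of 4 entries, each of length 7
out = tube(x)         # expected shape (4, 7), filled with 2.0
```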
For more examples of invalid uses, please see the tests in tests/test_tube.
Advanced field construction with FieldManager
There is a way to tweak how fields are constructed in order to gain performance improvements in kernel calculations.
By supplying a customized FieldManager when registering a field, you can construct a field however you want. Please refer to the code of FieldManager in src/stannum/auxiliary.py for more information.
If you don't know why constructing fields differently can improve performance, don't use this feature.
If you don't know how to construct fields differently, please refer to the Taichi documentation on fields.