Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.

Last update: Jan 7, 2023

Related tags

Machine Learning python machine-learning tensorflow numpy scikit-learn pandas pytorch xgboost lightgbm tensor dask ray dataframe statsmodels joblib

Overview

Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and many other libraries.

Documentation, 中文文档

Installation

Mars is easy to install by

pip install pymars

Installation for Developers

When you want to contribute code to Mars, you can follow the instructions below to install Mars for development:

git clone https://github.com/mars-project/mars.git
cd mars
pip install -e ".[dev]"

More details about installing Mars can be found at installation section in Mars document.

Architecture Overview

Getting Started

Starting a new runtime locally via:

>>> import mars
>>> mars.new_session()

Or connecting to a Mars cluster which is already initialized.

>>> import mars
>>> mars.new_session('http://
   
    :
    
     '
    
   )

Mars Tensor

Mars tensor provides a familiar interface like Numpy.

Numpy

Mars tensor

import numpy as np
N = 200_000_000
a = np.random.uniform(-1, 1, size=(N, 2))
print((np.linalg.norm(a, axis=1) < 1)
      .sum() * 4 / N)

import mars.tensor as mt
N = 200_000_000
a = mt.random.uniform(-1, 1, size=(N, 2))
print(((mt.linalg.norm(a, axis=1) < 1)
        .sum() * 4 / N).execute())

3.14174502
CPU times: user 11.6 s, sys: 8.22 s,
           total: 19.9 s
Wall time: 22.5 s

3.14161908
CPU times: user 966 ms, sys: 544 ms,
           total: 1.51 s
Wall time: 3.77 s

Mars can leverage multiple cores, even on a laptop, and could be even faster for a distributed setting.

Mars DataFrame

Mars DataFrame provides a familiar interface like pandas.

Pandas

Mars DataFrame

import numpy as np
import pandas as pd
df = pd.DataFrame(
    np.random.rand(100000000, 4),
    columns=list('abcd'))
print(df.sum())

import mars.tensor as mt
import mars.dataframe as md
df = md.DataFrame(
    mt.random.rand(100000000, 4),
    columns=list('abcd'))
print(df.sum().execute())

CPU times: user 10.9 s, sys: 2.69 s,
           total: 13.6 s
Wall time: 11 s

CPU times: user 1.21 s, sys: 212 ms,
           total: 1.42 s
Wall time: 2.75 s

Mars Learn

Mars learn provides a familiar interface like scikit-learn.

Scikit-learn

Mars learn

from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
X, y = make_blobs(
    n_samples=100000000, n_features=3,
    centers=[[3, 3, 3], [0, 0, 0],
             [1, 1, 1], [2, 2, 2]],
    cluster_std=[0.2, 0.1, 0.2, 0.2],
    random_state=9)
pca = PCA(n_components=3)
pca.fit(X)
print(pca.explained_variance_ratio_)
print(pca.explained_variance_)

from mars.learn.datasets import make_blobs
from mars.learn.decomposition import PCA
X, y = make_blobs(
    n_samples=100000000, n_features=3,
    centers=[[3, 3, 3], [0, 0, 0],
              [1, 1, 1], [2, 2, 2]],
    cluster_std=[0.2, 0.1, 0.2, 0.2],
    random_state=9)
pca = PCA(n_components=3)
pca.fit(X)
print(pca.explained_variance_ratio_)
print(pca.explained_variance_)

Mars learn also integrates with many libraries:

Mars remote

Mars remote allows users to execute functions in parallel.

Vanilla function calls

Mars remote

import numpy as np


def calc_chunk(n, i):
    rs = np.random.RandomState(i)
    a = rs.uniform(-1, 1, size=(n, 2))
    d = np.linalg.norm(a, axis=1)
    return (d < 1).sum()

def calc_pi(fs, N):
    return sum(fs) * 4 / N

N = 200_000_000
n = 10_000_000

fs = [calc_chunk(n, i)
      for i in range(N // n)]
pi = calc_pi(fs, N)
print(pi)

import numpy as np
import mars.remote as mr

def calc_chunk(n, i):
    rs = np.random.RandomState(i)
    a = rs.uniform(-1, 1, size=(n, 2))
    d = np.linalg.norm(a, axis=1)
    return (d < 1).sum()

def calc_pi(fs, N):
    return sum(fs) * 4 / N

N = 200_000_000
n = 10_000_000

fs = [mr.spawn(calc_chunk, args=(n, i))
      for i in range(N // n)]
pi = mr.spawn(calc_pi, args=(fs, N))
print(pi.execute().fetch())

3.1416312
CPU times: user 32.2 s, sys: 4.86 s,
           total: 37.1 s
Wall time: 12.4 s

3.1416312
CPU times: user 616 ms, sys: 307 ms,
           total: 923 ms
Wall time: 3.99 s

DASK on Mars

Refer to DASK on Mars for more information.

Eager Mode

Mars supports eager mode which makes it friendly for developing and easy to debug.

Users can enable the eager mode by options, set options at the beginning of the program or console session.

>>> from mars.config import options
>>> options.eager_mode = True

Or use a context.

>>> from mars.config import option_context
>>> with option_context() as options:
>>>     options.eager_mode = True
>>>     # the eager mode is on only for the with statement
>>>     ...

If eager mode is on, tensor, DataFrame etc will be executed immediately by default session once it is created.

>>> import mars.tensor as mt
>>> import mars.dataframe as md
>>> from mars.config import options
>>> options.eager_mode = True
>>> t = mt.arange(6).reshape((2, 3))
>>> t
array([[0, 1, 2],
       [3, 4, 5]])
>>> df = md.DataFrame(t)
>>> df.sum()
0    3
1    5
2    7
dtype: int64

Mars on Ray

Mars also has deep integration with Ray and can run on Ray efficiently and interact with the large ecosystem of machine learning and distributed systems built on top of the core Ray.

Starting a new Mars on Ray runtime locally via:

import ray
ray.init()
import mars
mars.new_ray_session(worker_num=2)
import mars.tensor as mt
mt.random.RandomState(0).rand(1000_0000, 5).sum().execute()

Or connecting to a Mars on Ray runtime which is already initialized.

import mars
mars.new_ray_session('http://
   
    :
    
     '
    
   )
# perform computation

Interact with Ray Dataset:

0.5).show(5) # Convert ray dataset to mars dataframe df2 = md.read_ray_dataset(ds) print(df2.head(5).execute()) ">

import mars.tensor as mt
import mars.dataframe as md
df = md.DataFrame(
    mt.random.rand(1000_0000, 4),
    columns=list('abcd'))
# Convert mars dataframe to ray dataset
ds = md.to_ray_dataset(df)
print(ds.schema(), ds.count())
ds.filter(lambda row: row["a"] > 0.5).show(5)
# Convert ray dataset to mars dataframe
df2 = md.read_ray_dataset(ds)
print(df2.head(5).execute())

Refer to Mars on Ray for more information.

Easy to scale in and scale out

Mars can scale in to a single machine, and scale out to a cluster with thousands of machines. It's fairly simple to migrate from a single machine to a cluster to process more data or gain a better performance.

Bare Metal Deployment

Mars is easy to scale out to a cluster by starting different components of mars distributed runtime on different machines in the cluster.

A node can be selected as supervisor which integrated a web service, leaving other nodes as workers. The supervisor can be started with the following command:

mars-supervisor -h <host_name> -p <supervisor_port> -w <web_port>

Workers can be started with the following command:

mars-worker -h <host_name> -p <worker_port> -s <supervisor_endpoint>

After all mars processes are started, users can run

>>> sess = new_session('http://
   
    :
    
     '
    
   )
>>> # perform computation

Kubernetes Deployment

Refer to Run on Kubernetes for more information.

Yarn Deployment

Refer to Run on Yarn for more information.

Getting involved

Read development guide.
Join the mailing list: send an email to [email protected].
Please report bugs by submitting a GitHub issue.
Submit contributions using pull requests.

Thank you in advance for your contributions!

Comments

Win7 32-bit, when I install mars, it reports file not found

Environment：win7 32-bit+python3.7.0 Execute：pip install pymars Error Message：

Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\ADMINI~1\AppData\Local\Temp\pip-install-h838j__8\pymars\set
up.py", line 42, in <module>
        with open(os.path.join(repo_root, 'requirements-extra.txt'), 'r') as f:
    FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\ADMINI~1
\\AppData\\Local\\Temp\\pip-install-h838j__8\\pymars\\requirements-extra.txt'

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in C:\Users\ADMINI~1
\AppData\Local\Temp\pip-install-h838j__8\pymars\

opened by eagleLiu82 27

Add _binary_roc_auc_score method

What do these changes do?

Contains a utility function similar to sklearn _binary_roc_auc_score. The function will be used for building the general sklearn.metrics.roc_auc_score functionality. The code format and documentation are kept similar to the sklearn function.

Related issue number

Fixes #2402
type: feature backported already mod: learn

opened by Divyanshu-Singh-Chauhan 20
Implemented hypergeometric functions

What do these changes do?

This PR implemented hyper-geometric functions as listed in https://docs.scipy.org/doc/scipy/reference/special.html#hypergeometric-functions

Related issue number

Resolves #759
mod: tensor type: feature backported already

opened by Alfa-Shashank 16
Fix `md.concat` error when there are same fetch chunk data
What do these changes do?

There is a GraphContainsCycleError when concatenating two DataFrame in which there are same fetch chunks.

Related issue number

Fixes #3284

Check code requirements

[ ] tests added / passed (if needed)

[ ] Ensure all linting tests pass, see here for how to run them
opened by zhongchun 15

Failed when use read_sql_query

Describe the bug mars.dataframe.read_sql_query()

To Reproduce my python version is 3.7.4 the version of mars is 0.4.4 the version of sqlalchemy is 1.3.10 the version of MySQL is 5.7.26

Here is my code:

import mars.dataframe as md
import mars.tensor as mt
from sqlalchemy import create_engine

engine = create_engine('mysql+mysqlconnector://user:password@localhost:3306/dbname?charset=utf8mb4', echo=True)
sql1 = 'SELECT S_INFO_UNIQUECODE, REPORT_PERIOD, MYFIRST_INDICATOR FROM databasetable1'
df = md.read_sql_query(sql1, con=engine)
print(df.head().execute())

and the error is

D:\Anaconda3\envs\py37\python.exe E:/mycode/read_data.py
2020-08-03 15:54:52,383 INFO sqlalchemy.engine.base.Engine SHOW VARIABLES LIKE 'sql_mode'
2020-08-03 15:54:52,383 INFO sqlalchemy.engine.base.Engine {}
2020-08-03 15:54:52,387 INFO sqlalchemy.engine.base.Engine SHOW VARIABLES LIKE 'lower_case_table_names'
2020-08-03 15:54:52,387 INFO sqlalchemy.engine.base.Engine {}
2020-08-03 15:54:52,389 INFO sqlalchemy.engine.base.Engine SELECT DATABASE()
2020-08-03 15:54:52,389 INFO sqlalchemy.engine.base.Engine {}
2020-08-03 15:54:52,390 INFO sqlalchemy.engine.base.Engine show collation where `Charset` = 'utf8mb4' and `Collation` = 'utf8mb4_bin'
2020-08-03 15:54:52,390 INFO sqlalchemy.engine.base.Engine {}
2020-08-03 15:54:52,391 INFO sqlalchemy.engine.base.Engine SELECT CAST('test plain returns' AS CHAR(60)) AS anon_1
2020-08-03 15:54:52,391 INFO sqlalchemy.engine.base.Engine {}
2020-08-03 15:54:52,392 INFO sqlalchemy.engine.base.Engine SELECT CAST('test unicode returns' AS CHAR(60)) AS anon_1
2020-08-03 15:54:52,392 INFO sqlalchemy.engine.base.Engine {}
2020-08-03 15:54:52,393 INFO sqlalchemy.engine.base.Engine SELECT CAST('test collated returns' AS CHAR CHARACTER SET utf8mb4) COLLATE utf8mb4_bin AS anon_1
2020-08-03 15:54:52,393 INFO sqlalchemy.engine.base.Engine {}
D:\Anaconda3\envs\py37\lib\site-packages\pymysql\cursors.py:170: Warning: (1366, "Incorrect string value: '\\xD6\\xD0\\xB9\\xFA\\xB1\\xEA...' for column 'VARIABLE_VALUE' at row 485")
  result = self._query(query)
2020-08-03 15:54:52,404 INFO sqlalchemy.engine.base.Engine SHOW CREATE TABLE `SELECT S_INFO_UNIQUECODE, REPORT_PERIOD, MYFIRST_INDICATOR FROM databasetable1`
2020-08-03 15:54:52,404 INFO sqlalchemy.engine.base.Engine {}
2020-08-03 15:54:52,405 INFO sqlalchemy.engine.base.Engine ROLLBACK
Traceback (most recent call last):
  File "D:\Anaconda3\envs\py37\lib\site-packages\sqlalchemy\engine\base.py", line 1278, in _execute_context
    cursor, statement, parameters, context
  File "D:\Anaconda3\envs\py37\lib\site-packages\sqlalchemy\engine\default.py", line 593, in do_execute
    cursor.execute(statement, parameters)
  File "D:\Anaconda3\envs\py37\lib\site-packages\pymysql\cursors.py", line 170, in execute
    result = self._query(query)
  File "D:\Anaconda3\envs\py37\lib\site-packages\pymysql\cursors.py", line 328, in _query
    conn.query(q)
  File "D:\Anaconda3\envs\py37\lib\site-packages\pymysql\connections.py", line 517, in query
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "D:\Anaconda3\envs\py37\lib\site-packages\pymysql\connections.py", line 732, in _read_query_result
    result.read()
  File "D:\Anaconda3\envs\py37\lib\site-packages\pymysql\connections.py", line 1075, in read
    first_packet = self.connection._read_packet()
  File "D:\Anaconda3\envs\py37\lib\site-packages\pymysql\connections.py", line 684, in _read_packet
    packet.check_error()
  File "D:\Anaconda3\envs\py37\lib\site-packages\pymysql\protocol.py", line 220, in check_error
    err.raise_mysql_exception(self._data)
  File "D:\Anaconda3\envs\py37\lib\site-packages\pymysql\err.py", line 109, in raise_mysql_exception
    raise errorclass(errno, errval)
pymysql.err.InternalError: (1059, "Identifier name 'SELECT S_INFO_UNIQUECODE, REPORT_PERIOD, MYFIRST_INDICATOR FROM databasetable1' is too long")

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "E:/mycode/read_data.py", line 7,
    df1 = md.read_sql('SELECT S_INFO_UNIQUECODE, REPORT_PERIOD, MYFIRST_INDICATOR FROM databasetable1', con=engine)
  File "D:\Anaconda3\envs\py37\lib\site-packages\mars\dataframe\datasource\read_sql.py", line 577, in read_sql
    low_limit=low_limit, high_limit=high_limit)
  File "D:\Anaconda3\envs\py37\lib\site-packages\mars\dataframe\datasource\read_sql.py", line 479, in _read_sql
    return op(test_rows, chunk_size)
  File "D:\Anaconda3\envs\py37\lib\site-packages\mars\dataframe\datasource\read_sql.py", line 222, in __call__
    selectable = self._get_selectable(con)
  File "D:\Anaconda3\envs\py37\lib\site-packages\mars\dataframe\datasource\read_sql.py", line 175, in _get_selectable
    autoload_with=engine_or_conn, schema=self._schema)
  File "<string>", line 2, in __new__
  File "D:\Anaconda3\envs\py37\lib\site-packages\sqlalchemy\util\deprecations.py", line 139, in warned
    return fn(*args, **kwargs)
  File "D:\Anaconda3\envs\py37\lib\site-packages\sqlalchemy\sql\schema.py", line 559, in __new__
    metadata._remove_table(name, schema)
  File "D:\Anaconda3\envs\py37\lib\site-packages\sqlalchemy\util\langhelpers.py", line 69, in __exit__
    exc_value, with_traceback=exc_tb,
  File "D:\Anaconda3\envs\py37\lib\site-packages\sqlalchemy\util\compat.py", line 178, in raise_
    raise exception
  File "D:\Anaconda3\envs\py37\lib\site-packages\sqlalchemy\sql\schema.py", line 554, in __new__
    table._init(name, metadata, *args, **kw)
  File "D:\Anaconda3\envs\py37\lib\site-packages\sqlalchemy\sql\schema.py", line 648, in _init
    resolve_fks=resolve_fks,
  File "D:\Anaconda3\envs\py37\lib\site-packages\sqlalchemy\sql\schema.py", line 672, in _autoload
    _extend_on=_extend_on,
  File "D:\Anaconda3\envs\py37\lib\site-packages\sqlalchemy\engine\base.py", line 1654, in run_callable
    return callable_(self, *args, **kwargs)
  File "D:\Anaconda3\envs\py37\lib\site-packages\sqlalchemy\engine\default.py", line 470, in reflecttable
    table, include_columns, exclude_columns, resolve_fks, **opts
  File "D:\Anaconda3\envs\py37\lib\site-packages\sqlalchemy\engine\reflection.py", line 649, in reflecttable
    table_name, schema, **table.dialect_kwargs
  File "D:\Anaconda3\envs\py37\lib\site-packages\sqlalchemy\engine\reflection.py", line 314, in get_table_options
    self.bind, table_name, schema, info_cache=self.info_cache, **kw
  File "<string>", line 2, in get_table_options
  File "D:\Anaconda3\envs\py37\lib\site-packages\sqlalchemy\engine\reflection.py", line 52, in cache
    ret = fn(self, con, *args, **kw)
  File "D:\Anaconda3\envs\py37\lib\site-packages\sqlalchemy\dialects\mysql\base.py", line 2624, in get_table_options
    connection, table_name, schema, **kw
  File "D:\Anaconda3\envs\py37\lib\site-packages\sqlalchemy\dialects\mysql\base.py", line 2870, in _parsed_state_or_create
    info_cache=kw.get("info_cache", None),
  File "<string>", line 2, in _setup_parser
  File "D:\Anaconda3\envs\py37\lib\site-packages\sqlalchemy\engine\reflection.py", line 52, in cache
    ret = fn(self, con, *args, **kw)
  File "D:\Anaconda3\envs\py37\lib\site-packages\sqlalchemy\dialects\mysql\base.py", line 2898, in _setup_parser
    connection, None, charset, full_name=full_name
  File "D:\Anaconda3\envs\py37\lib\site-packages\sqlalchemy\dialects\mysql\base.py", line 2998, in _show_create_table
    ).execute(st)
  File "D:\Anaconda3\envs\py37\lib\site-packages\sqlalchemy\engine\base.py", line 1006, in execute
    return self._execute_text(object_, multiparams, params)
  File "D:\Anaconda3\envs\py37\lib\site-packages\sqlalchemy\engine\base.py", line 1181, in _execute_text
    parameters,
  File "D:\Anaconda3\envs\py37\lib\site-packages\sqlalchemy\engine\base.py", line 1318, in _execute_context
    e, statement, parameters, cursor, context
  File "D:\Anaconda3\envs\py37\lib\site-packages\sqlalchemy\engine\base.py", line 1512, in _handle_dbapi_exception
    sqlalchemy_exception, with_traceback=exc_info[2], from_=e
  File "D:\Anaconda3\envs\py37\lib\site-packages\sqlalchemy\util\compat.py", line 178, in raise_
    raise exception
  File "D:\Anaconda3\envs\py37\lib\site-packages\sqlalchemy\engine\base.py", line 1278, in _execute_context
    cursor, statement, parameters, context
  File "D:\Anaconda3\envs\py37\lib\site-packages\sqlalchemy\engine\default.py", line 593, in do_execute
    cursor.execute(statement, parameters)
  File "D:\Anaconda3\envs\py37\lib\site-packages\pymysql\cursors.py", line 170, in execute
    result = self._query(query)
  File "D:\Anaconda3\envs\py37\lib\site-packages\pymysql\cursors.py", line 328, in _query
    conn.query(q)
  File "D:\Anaconda3\envs\py37\lib\site-packages\pymysql\connections.py", line 517, in query
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "D:\Anaconda3\envs\py37\lib\site-packages\pymysql\connections.py", line 732, in _read_query_result
    result.read()
  File "D:\Anaconda3\envs\py37\lib\site-packages\pymysql\connections.py", line 1075, in read
    first_packet = self.connection._read_packet()
  File "D:\Anaconda3\envs\py37\lib\site-packages\pymysql\connections.py", line 684, in _read_packet
    packet.check_error()
  File "D:\Anaconda3\envs\py37\lib\site-packages\pymysql\protocol.py", line 220, in check_error
    err.raise_mysql_exception(self._data)
  File "D:\Anaconda3\envs\py37\lib\site-packages\pymysql\err.py", line 109, in raise_mysql_exception
    raise errorclass(errno, errval)
sqlalchemy.exc.InternalError: (pymysql.err.InternalError) (1059, "Identifier name 'SELECT S_INFO_UNIQUECODE, REPORT_PERIOD, MYFIRST_INDICATOR FROM databasetable1' is too long")
[SQL: SHOW CREATE TABLE `SELECT S_INFO_UNIQUECODE, REPORT_PERIOD, MYFIRST_INDICATOR FROM databasetable1`]
(Background on this error at: http://sqlalche.me/e/13/2j85)

Process finished with exit code 1

When I use the code below , everything is fine.

import mars.dataframe as md
import mars.tensor as mt
from sqlalchemy import create_engine

engine = create_engine('mysql+mysqlconnector://user:password@localhost:3306/dbname?charset=utf8mb4', echo=True)
sql1 = 'SELECT * FROM databasetable1'
df = md.read_sql_query(sql1, con=engine)
print(df.head().execute())

type: bug mod: dataframe

opened by bytedynamic 15

MARS sample with Distributed Flaml/AutoML?

Hello Hello, I'm trying to run a distributed AutoML with MARS + LightGBM for huge (TB) csv input file. I can't find how to do it any idea? TY! https://microsoft.github.io/FLAML/

opened by wil70 14
[Metric] Add common metrics
What do these changes do?

This pr adds custom defined metrics to record graph building, subtask execution time.

Metric name | Tag | Meaning | Source -- | -- | -- | -- mars.tileable_graph_gen_time_secs | addresssession_id | Time consuming in seconds to generate a tileable graph | drvier mars.chunk_graph_gen_time_secs | session_idtask_id | Time consuming in seconds to generate a chunk graph | supervisor mars.subtask_graph_gen_time_secs | session_idtask_idstage_id | Time consuming in seconds to generate a subtask graph | supervisor mars.subtask_execution_time_secs | session_idsubtask_id | Time consuming in seconds to execute a subtask | worker subpool mars.stage_execution_time_secs | session_idtask_idstage_id | Time consuming in seconds to execute a stage | supervisor mars.task_execution_time_secs | session_idtask_id | Time consuming in seconds to execute a task | supervisor

And we propose a naming convention for metrics as follows:

[namespace].[component]_name[_units]

namespace could be mars, component could be supervisor or worker and can be omitted. For example, we can naming a mars.subtask_execution_time_secs to present the subtask execution time.

Related issue number

Closes #2743

Check code requirements

[ ] tests added / passed (if needed)

[ ] Ensure all linting tests pass, see here for how to run them

type: feature mod: actor mod: task service mod: subtask service
opened by zhongchun 14
Implements `linear_model.LinearRegression`
What do these changes do?

Here is a draft PR.

Implement learn.linear_model.LinearRegression.

Align with sklearn::0.24.x.

Replace spicy.lstsq with closed form solutions via mars.tensor, but still cannot catch LinAlgError by singular matrices

Thus only pass 18/30 tests, 3 of which are highly related to sparse/csr issues.

Remain many np/scipy operations to be replaced.

Related issue number
type: feature backported already mod: learn
opened by Fernadoo 14
Implement {DataFrame, Series}.empty

What do these changes do?

I re-implemented {DataFrame, Series}.empty and add related test cases.

Related issue number

Issue #1830 should be resolved
type: feature mod: dataframe backported already

opened by Fernadoo 14
Implementation of `Dataframe.merge/join` by shuffle

What do these changes do?

A proof-of-concept implementation of DataFrame.merge/join (in pandas the join is implemented using merge).

The merge is implemented by shuffle on the keys that to be used to merge.

Related issue number

Resolves #571
type: feature mod: dataframe backported already

opened by sighingnow 13

Performance doubt about svd

My code is here.

from time import time

def test_numpy():
    start_time = time()
    import numpy as np
    a = np.random.rand(5000, 4000)
    U, s, V = np.linalg.svd(a)
    print(time()-start_time)

def test_mars():
    import mars.tensor as mt
    start_time = time()
    a = mt.random.rand(5000, 4000,chunk_size=100)
    U, s, V = mt.linalg.svd(a).execute()
    print(time()-start_time)

if __name__ == '__main__':
    test_numpy()
    test_mars()

The result is:

36.3145694732666
141.55802178382874

Does this mean that this function is four or five times slower than numpy when the Tall Skinny QR (TSQR) Matrix Factorization condition is not met?

opened by SunYanCN 13

Update Mars on Ray doc
What do these changes do?

Related issue number

Fixes #xxxx

Check code requirements

[ ] tests added / passed (if needed)

[ ] Ensure all linting tests pass, see here for how to run them

type: docs
opened by fyrestone 0
Bump json5, babel-loader and react-hot-loader in /mars/services/web/ui
Bumps json5 to 2.2.3 and updates ancestor dependencies json5, babel-loader and react-hot-loader. These dependencies need to be updated together.

Updates json5 from 2.2.0 to 2.2.3

Release notes

Sourced from json5's releases.

v2.2.3

Fix: [email protected] is now the 'latest' release according to npm instead of v1.0.2. (#299)

v2.2.2

Fix: Properties with the name __proto__ are added to objects and arrays. (#199) This also fixes a prototype pollution vulnerability reported by Jonathan Gregson! (#295).

v2.2.1

Fix: Removed dependence on minimist to patch CVE-2021-44906. (#266)

Changelog

Sourced from json5's changelog.

v2.2.3 [code, diff]

Fix: [email protected] is now the 'latest' release according to npm instead of v1.0.2. (#299)

v2.2.2 [code, diff]

Fix: Properties with the name __proto__ are added to objects and arrays. (#199) This also fixes a prototype pollution vulnerability reported by Jonathan Gregson! (#295).

v2.2.1 [code, diff]

Fix: Removed dependence on minimist to patch CVE-2021-44906. (#266)

Commits

c3a7524 2.2.3

94fd06d docs: update CHANGELOG for v2.2.3

3b8cebf docs(security): use GitHub security advisories

f0fd9e1 docs: publish a security policy

6a91a05 docs(template): bug -> bug report

14f8cb1 2.2.2

10cc7ca docs: update CHANGELOG for v2.2.2

7774c10 fix: add proto to objects and arrays

edde30a Readme: slight tweak to intro

97286f8 Improve example in readme

Additional commits viewable in compare view

Updates babel-loader from 8.2.2 to 8.3.0

Release notes

Sourced from babel-loader's releases.

v8.3.0

New features

Pass external dependencies from Babel to Webpack by @nicolo-ribaudo in babel/babel-loader#971

Full Changelog: https://github.com/babel/babel-loader/compare/v8.2.5...v8.3.0

v8.2.5

What's Changed

fix: respect inputSourceMap loader option by @alan-agius4 in babel/babel-loader#896

New Contributors

@alan-agius4 made their first contribution in babel/babel-loader#896

Full Changelog: https://github.com/babel/babel-loader/compare/v8.2.4...v8.2.5

v8.2.4

What's Changed

doc(README.md): fix a broken markdown link by @loveDstyle in babel/babel-loader#919

Bump loader-utils to 2.x by @stianjensen in babel/babel-loader#931

Use md5 hashing for OpenSSL 3 by @pathmapper in babel/babel-loader#924

Thanks @loveDstyle, @stianjensen and @pathmapper for your first PRs!

8.2.3

This release fixes compatibility with Node.js 17

Use md5 hash for caching on node v17 (babel/babel-loader#918)

Thanks @Reptarsrage!

Commits

9bf5652 8.3.0

80ab7d0 Update @babel/ dependencies

493ac6c Pass external dependencies from Babel to Webpack (#971)

df28fe3 Fix broken main test (#950)

0b338e4 update hash method so it doesn't fail on a fips enabled machine (#939)

1f98d3c 8.2.5

c622868 fix: respect inputSourceMap loader option (#896)

f7982c1 8.2.4

4bb9e21 Use md5 hashing for OpenSSL 3 (#924)

247c94b Bump loader-utils to 2.x (#931)

Additional commits viewable in compare view

Updates react-hot-loader from 4.13.0 to 4.13.1

Changelog

Sourced from react-hot-loader's changelog.

Changelog

All notable changes to this project will be documented in this file. See standard-version for commit guidelines.

Commits

07d1f4e 4.13.1

bddd179 update loader-utils version (#1849)

f58a931 Remove space after peerDependency version definition (#1843)

642b705 README.mdfix typo (#1811)

ed8b310 Note create-react-app support for fast refresh in readme (#1708)

See full diff in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) You can disable automated security fix PRs for this repo from the Security Alerts page.

type: enhancement mod: web to be backported
opened by dependabot[bot] 0
[PROPOSAL] Worker node failover based on subtask lineage
Problem

Currently, the supervisor will cancel the execution of the entire stage after receiving an error report that the execution of the subtask fails when the MainPool process exits in Mars. For traditional batch processing, just rerun is good. However, this is very unfriendly to the scenario of a large job, because it has many subtasks and takes a long time to run. Once it fails, it will be expensive to rerun. At the same time, large jobs generally require much more nodes, and the probability of corresponding node failures will also increase. Once a node failure causes the MainPool to exit, the data on the corresponding node will be lost, and subsequent dependencies execution will fail. For example, a job has been running for more than 20 hours, and the execution is 90% complete, but because a certain MainPool exits, the entire job will fail. Large jobs are relatively common in modern data processing. A job will take up 1200 nodes or more, and it will take about 40 hours. In order to solve the node failure problem and ensure the stable and normal operation of jobs, a complete node failover solution is required.

Solution

As shown in the figure above, the green arrow is the normal execution flow, and the red arrow is the exception handling execution flow.

TaskSupervisorService first tiles a high-level computing graph into a low-level fine-grained computing graph. A node in this graph is a subtask; then, the subtask graph is handed over to SchedulingSupervisorService for scheduling.

Under normal circumstances, TaskManagerActor submits coarse-grained graphs to TaskProcessor, and after processing (such as tiles, fuses, and lineage records) it is then handed over to SubtaskManagerActor for scheduling and execution. When a subtask finishes executing, the TaskManagerActor is notified.

When an error occurs, such as a node fails, it will go to the exception handling process of the red arrow. First, perform error detection to determine whether recovery is possible, if possible, perform error recovery, and if not, report an error.

Lineage relationship management

Lineage relationship management is mainly for the convenience of backtracking data. When a subtask fails due to lack of upstream dependent data (node failure will cause data loss on it), lineage relationship can help us quickly find the subtask that generated the data. Then we can rerun these subtasks and restore the data. There are two types of lineage relationship data in the same SubtaskGraph and in different SubtaskGraph.

In the same SubtaskGraph: When a subtask in a task only depends on its own subtasks, only one SubtaskGraph is involved; at this time, it is only necessary to find the upstream in the current graph.

In different SubtaskGraph: When a task needs to use the data of the previous task, two SubtaskGraph will be involved; at this time, it is necessary to find the upstream in the two SubtaskGraph. For the first case, the existing upstream and downstream information in the SubtaskGraph already meets the requirements; while the second case, we need to find the upstream across the SubtaskGraph, so we should record a concise lineage relationship to facilitate quick search. In fact, lineage relationship management only needs to record the last subtasks of each SubtaskGraph, that is, the result subtasks.

Error Detection

Error analysis

After Node exits, what errors will there be? How to detect? How to recover?

It will report ServerClosed error if SubtaskManagerActor continue to submit subtasks to the failed node.

Subtasks that depend on the data of the failed node will fail to execute, and there are two types of errors:

ServerClosed will be reported when MainPool is preparing data while the dependent node has exited.

DataNotExist will be reported when the MainPool is executing while the dependent node has exited.

How to catch these errors

In the first case, we can catch the ServerClosed directly after SubtaskManagerActor is submitted.

In the second case, we need to catch exceptions in TaskStageProcessor when set_subtask_result.

Error Recovery

For the first case:

Mark the ServerClosed node as stopped to avoid submitting other subtasks to it.

Resubmit failed subtasks to other nodes.

Resubmit subtasks queued on the failed node to other nodes. For the second case, we need to process as shown in the figure below:

Find the dependent subtasks of the failed subtask. If the dependent subtask does not support re-running, cancel the current stage and report an error; if it supports re-running, add these subtasks to the lost_objects_queue (there is a priority field in the subtask, which can be determined according to this priority to schedule).

When crossing subtask graphs, it is not enough to distinguish the dependent subtasks found only by priority, and the earlier subtask must be rerun first. In order to better distinguish priorities, the execution order + subtask priority are used as priority considerations. That is, when recording lineage relationship, it is necessary to record the execution order of the subtask graph.

Record the dependent subtasks and failed subtasks, and when the dependent subtasks backtrack successfully, continue to schedule the failed subtasks.

Every time when scheduling, the subtask in the lost_objects_queue is prioritized, and the subtask is selected according to the priority; then the subtask in the worker queue is scheduled.

Node Management

Node management mainly includes two aspects:

Just mark it as stopped when a node fails, which is equivalent to removing it from the cluster to avoid subsequent submissions.

Use autoscale to add new nodes to the cluster if the cluster resources are insufficient.

Todos

Checkpoint

There will be a lot of backtracking, and the cost will be relatively high if the execution chain of the job is relatively long, especially when there are wide dependencies. In order to solve this problem, we checkpoint the critical operation stage and materialize the intermediate data to external storage. In this way, excessive lineage searches and subtask backtracking can be avoided. Of course, the checkpoint function is configurable and does not depend strongly on it.
opened by zhongchun 0
[WIP][Scheduling] Add worker node failover with lineage
What do these changes do?

Currently, the supervisor will cancel the execution of the entire stage after receiving an error report that the execution of the subtask fails when the MainPool process exits in Mars. For traditional batch processing, just rerun is good. However, this is very unfriendly to the scenario of a large job, because it has many subtasks and takes a long time to run. Once it fails, it will be expensive to rerun. At the same time, large jobs generally require much more nodes, and the probability of corresponding node failures will also increase. Once a node failure causes the MainPool to exit, the data on the corresponding node will be lost, and subsequent dependencies execution will fail. For example, a job has been running for more than 20 hours, and the execution is 90% complete, but because a certain MainPool exits, the entire job will fail. Large jobs are relatively common in modern data processing. A job will take up 1200 nodes or more, and it will take about 40 hours. In order to solve the node failure problem and ensure the stable and normal operation of jobs, a complete node failover solution is required.

Related issue number

Issue #3308

Check code requirements

[ ] tests added / passed (if needed)

[ ] Ensure all linting tests pass, see here for how to run them
opened by zhongchun 0
Bump express from 4.17.1 to 4.17.3 in /mars/services/web/ui
Bumps express from 4.17.1 to 4.17.3.

Release notes

Sourced from express's releases.

4.17.3

deps: accepts@~1.3.8

deps: mime-types@~2.1.34

deps: [email protected]

deps: [email protected]

deps: [email protected]

deps: [email protected]

deps: [email protected]

deps: [email protected]

deps: [email protected]

Fix handling of __proto__ keys

pref: remove unnecessary regexp for trust proxy

4.17.2

Fix handling of undefined in res.jsonp

Fix handling of undefined when "json escape" is enabled

Fix incorrect middleware execution with unanchored RegExps

Fix res.jsonp(obj, status) deprecation message

Fix typo in res.is JSDoc

deps: [email protected]

deps: [email protected]

deps: [email protected]

deps: [email protected]

deps: [email protected]

deps: [email protected]

deps: type-is@~1.6.18

deps: [email protected]

deps: [email protected]

deps: [email protected]

Fix maxAge option to reject invalid values

deps: proxy-addr@~2.0.7

Use req.socket over deprecated req.connection

deps: [email protected]

deps: [email protected]

deps: [email protected]

deps: [email protected]

deps: [email protected]

deps: [email protected]

deps: [email protected]

pref: ignore empty http tokens

deps: [email protected]

deps: [email protected]

deps: [email protected]

Changelog

Sourced from express's changelog.

4.17.3 / 2022-02-16

deps: accepts@~1.3.8

deps: mime-types@~2.1.34

deps: [email protected]

deps: [email protected]

deps: [email protected]

deps: [email protected]

deps: [email protected]

deps: [email protected]

deps: [email protected]

Fix handling of __proto__ keys

pref: remove unnecessary regexp for trust proxy

4.17.2 / 2021-12-16

Fix handling of undefined in res.jsonp

Fix handling of undefined when "json escape" is enabled

Fix incorrect middleware execution with unanchored RegExps

Fix res.jsonp(obj, status) deprecation message

Fix typo in res.is JSDoc

deps: [email protected]

deps: [email protected]

deps: [email protected]

deps: [email protected]

deps: [email protected]

deps: [email protected]

deps: type-is@~1.6.18

deps: [email protected]

deps: [email protected]

deps: [email protected]

Fix maxAge option to reject invalid values

deps: proxy-addr@~2.0.7

Use req.socket over deprecated req.connection

deps: [email protected]

deps: [email protected]

deps: [email protected]

deps: [email protected]

deps: [email protected]

deps: [email protected]

deps: [email protected]

pref: ignore empty http tokens

deps: [email protected]

deps: [email protected]

deps: [email protected]

Commits

3d7fce5 4.17.3

f906371 build: update example dependencies

6381bc6 deps: [email protected]

a007863 deps: [email protected]

e98f584 Revert "build: use [email protected] for Node.js < 4"

a659137 tests: use strict mode

a39e409 tests: prevent leaking changes to NODE_ENV

82de4de examples: fix path traversal in downloads example

12310c5 build: use nyc for test coverage

884657d examples: remove bitwise syntax for includes check

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) You can disable automated security fix PRs for this repo from the Security Alerts page.

type: enhancement mod: web to be backported
opened by dependabot[bot] 0
Add supports for CGroup V2
What do these changes do?

Add supports for CGroup V2 under new kernels.

Related issue number

Fixes #xxxx

Check code requirements

[x] tests added / passed (if needed)

[x] Ensure all linting tests pass, see here for how to run them
opened by wjsi 0

Releases(v0.10.0a1)

v0.10.0a1(Jun 12, 2022)
This is the release notes of v0.10.0a1. See here for the complete list of solved issues and merged PRs.

New Features

Oscar

Stop importing main module when starting Mars local cluster (#3110)

Tensor

Integrate special error functions (#3060)

Integrate part of scipy elliptic functions and integrals (#3111)

DataFrame

Support sort=True for Groupby (#2959, thanks @sak2002!)

Enhancements

Disable bloom filter in merge for now (#2967)

[Ray] Implement ray task executor progress (#3008)

Dump remote tracebacks to make local ones more friendly (#3028)

Use tell when remove mapper data after execution (#3027)

Optimize import speed for Mars package (#3022)

Do not aggressively choose tree method in tile of groupby for distributed setting (#3032)

[Ray] Implements get_chunks_result for Ray execution context (#3023)

Refine ThreadedServiceContext.get_chunks_meta usage (#3037)

Shuffle both sides at the same time for md.merge (#3041)

Assign reducer ops in task assigner to make them more balanced across cluster (#3048)

[Ray] Destroy Ray executor when the task finish (#3049)

[Ray] Implements get_chunks_meta for Ray execution context (#3052)

[Ray] Support basic subtask retry and lineage reconstruction (#2969)

Combine tree and shuffle methods in DataFrameGroupBy.agg tile (#3051)

[Ray] Implements get_total_n_cpu for Ray execution context (#3059)

[Ray] Implement cancel method on Ray task executor (#3044)

Use OS-designated ports instead of random ports to create sub pools (#3053)

Unify DataFrameGroupByAgg's tile logic for auto method (#3084)

Simplify router clean up when pools or clusters ends (#3086)

Call immutable web API only once when previous call blocks (#3085)

[Ray] Create RayTaskState actor as needed by default (#3081)

[Ray] Implement gc for ray task executor context (#3061)

Simplify argument passing in actor batch calls (#3098)

Optimize performance of transfer (#3091)

Add n_reducers and reducer_ordinal to shuffle operands (#3055)

Optimize serializable memory (#3120)

Bug fixes

Fix errors when deleting mapper data (#3018)

Fix recursive_tile that it may cause duplicated tile for one tileable (#3021)

Fix error message when sparse data format not supported (#3046)

Patch pandas to make pickle compatible between 1.2 and 1.3 (#3047)

Fix chunk index error in auto_merge_chunks (#3057)

[Ray] Fix ray worker failover (#3080)

[Metric] Fix prometheus metric backend (#3124)

Fix mt.{cumsum, cumprod} when the first chunk is empty (#3134)

Tests

Check initialization of serializables on CI (#3007)

Use @pytest_asyncio.fixture instead of @pytest.fixture for async fixtures (#3025)

Change code owners to Mars PMC maintainers (#3031)

[Ray] Fix ray executor progress test (#3033)

[Ray] Optimize Ray CI execution time and stability (#3102)

Make test_session_set_progress more stable under Ray tests (#3103)

Update pytest imports for test_special.py (#3129)

[Ray] Fix flaky test test_optional_supervisor_node (#3133)

Others

Build web code before CIBW when deploying to PyPI (#3014)

Make PyPI user name configurable (#3130)

Source code(tar.gz)
Source code(zip)
v0.9.0(Jun 12, 2022)
This is the release notes of v0.9.0. See here for the complete list of solved issues and merged PRs.

This release note only covers the difference from v0.9.0rc3; for all highlights and changes, please refer to the release notes of the pre-releases:

alpha1 alpha2 beta1 beta2 rc1 rc2 rc3

Changes that break compatibility

From v0.9 on, Python 3.6 is dropped support.

Highlights

Performance is fully optimized in this version, welcome to give your feedback.

New Features

Oscar

Stop importing main module when starting Mars local cluster (#3113)

Tensor

Integrate special error functions (#3062)

Integrate part of scipy elliptic functions and integrals (#3112)

DataFrame

Support sort=True for Groupby (#3063, thanks @sak2002!)

Enhancements

Dump remote tracebacks to make local ones more friendly (#3030)

Optimize import speed for Mars package (#3035)

[Ray] Implement ray task executor progress (#3065)

Shuffle both sides at the same time for md.merge (#3066)

Refine ThreadedServiceContext.get_chunks_meta usage (#3067)

Do not aggressively choose tree method in tile of groupby for distributed setting (#3070)

Disable bloom filter in merge for now (#3071)

[Ray] Implements get_chunks_result for Ray execution context (#3072)

Use tell when remove mapper data after execution (#3073)

Assign reducer ops in task assigner to make them more balanced across cluster (#3075)

[Ray] Destroy Ray executor when the task finish (#3074)

Combine tree and shuffle methods in DataFrameGroupBy.agg tile (#3077)

[Ray] Implements get_chunks_meta for Ray execution context (#3076)

Use OS-designated ports instead of random ports to create sub pools (#3087)

Call immutable web API only once when previous call blocks (#3088)

Unify DataFrameGroupByAgg's tile logic for auto method (#3094)

[Ray] Support basic subtask retry and lineage reconstruction (#3097)

Simplify argument passing in actor batch calls (#3100)

[Ray] Implements get_total_n_cpu for Ray execution context (#3104)

Optimize performance of transfer (#3105)

Add n_reducers and reducer_ordinal to shuffle operands (#3107)

[Ray] Implement cancel method on Ray task executor (#3093)

[Ray] Create RayTaskState actor as needed by default (#3114)

[Ray] Implement gc for ray task executor context (#3116)

Optimize serializable memory (#3126)

Bug fixes

Patch pandas to make pickle compatible between 1.2 and 1.3 (#3050)

Fix errors when deleting mapper data (#3064)

Fix chunk index error in auto_merge_chunks (#3068)

Fix recursive_tile that it may cause duplicated tile for one tileable (#3069)

[Ray] Fix ray worker failover (#3115)

[Ray] Fix pandas schema parsing when reading Ray dataset (#3117)

[Ray] fix auto scale-in hang (#3125)

[Metric] Fix prometheus metric backend (#3127)

Fix mt.{cumsum, cumprod} when the first chunk is empty (#3136)

Tests

Check initialization of serializables on CI (#3013)

[Ray] Optimize Ray CI execution time and stability (#3121)

Update pytest imports for test_special.py (#3131)

[Ray] Fix flaky test test_optional_supervisor_node (#3135)

Others

Build web code before CIBW when deploying to PyPI (#3016)

Source code(tar.gz)
Source code(zip)
v0.8.7(May 10, 2022)
This is the release notes of v0.8.7.

Bug fixes

Fixes missing web packages in Linux wheels (#3014)

Source code(tar.gz)
Source code(zip)
v0.9.0rc3(May 7, 2022)
This is the release notes of v0.9.0rc3. See here for the complete list of solved issues and merged PRs.

New Features

Tensor

Implementing Ellipsoidal Harmonics Functions (#2891, thanks @shantam-8!)

Services

Support worker meta service (#2909)

Basic Ray execution backend (#2921)

Enhancements

Add execution API to enable custimization of Mars Task Service (#2894)

Optimize serialization performance (#2914)

Skip adding band in meta when fetch shuffle data (#2922)

Store complete meta on worker and update supervisor meta via fetching from workers (#2912)

Use cython to accelerate core serialization (#2924)

Refine lifecycle api to support incref or decref with ref counts (#2926)

Ignore fetch operands when assign initial nodes (#2929)

Use cython to accelerate message serialization (#2932)

Ignore broadcaster's locality when assign subtasks (#2943)

Allow spawning serialization to threads for large objects (#2944)

Add metrics and event report for Ray channels (#2936)

Add more logs about execution info (#2940)

Add support for dask.persist (#2953, thanks @loopyme!)

Remove should_be_monotonic property (#2949)

Add metrics on operand and subtask executions (#2947, thanks @zhongchun!)

[Ray] optimize ray fetcher by query in remote node (#2957)

Improve deploy backend (#2958)

Support reporting tile progress (#2954)

Add logic key for tileable graph (#2961, thanks @zhongchun!)

[Ray] Loads the subtask inputs from meta (#2976)

New ExecutionConfig API (#2968)

Fix speculative execution compatibility with coloring (#2995)

Make functions that may take long run in thread for lifecycle tracker (#2992)

Optimize metric configs (#2996, thanks @zhongchun!)

Expand the ability of resource evaluator (#2997, thanks @zhongchun!)

Optimize gen subtask graph (#3004)

[Ray] Ray execution state (#3002)

Bug fixes

Fix paramter issue of worker actor pool (#2911, thanks @zhongchun!)

Fix default config to ensure storage backends configured (#2935)

Wrap errors in operand execution to protect scheduling service (#2964)

Fix dtype of series result for DataFrame.apply (#2978)

Fix potential data leak for shuffle tasks (#2975)

Fix potential empty chunks when creating DataFrame from pandas (#2987)

[Ray] Support new ray cluster through ray client (#2981)

Fix missing extra_params when constructing operands (#2999)

Fix msg_to_simple_str in Ray backend and add tests (#3003)

Fix incorrect result for df.sort_values when specifying multiple ascending (#2984)

Documentation

Add development documents for metrics (#2955, thanks @zhongchun!)

Tests

Add TPC-H benchmarks (#2937)

Fix Ray cases (#2983)

Fix version mismatch between kubernetes and minikube (#2986)

Allow selecting TPC queries (#3005)

Source code(tar.gz)
Source code(zip)
v0.8.6(May 7, 2022)
This is the release notes of v0.8.6. See here for the complete list of solved issues and merged PRs.

New Features

Tensor

Implementing Ellipsoidal Harmonics Functions (#2927, thanks @shantam-8!)

Enhancements

Add support for dask.persist (#2990, thanks @loopyme!)

Optimize gen subtask graph (#3006)

Ignore broadcaster's locality when assign subtasks (#2994)

Bug fixes

Fix task hang when error object cannot be pickled (#2913)

Fix potential KeyError in actor_ref calls when running with multiple processes (#2962)

Wrap errors in operand execution to protect scheduling service (#2971)

Fix dtype of series result for DataFrame.apply (#2979)

Fix default config to ensure storage backends configured (#2989)

Fix potential empty chunks when creating DataFrame from pandas (#2991)

Fix incorrect result for df.sort_values when specifying multiple ascending (#3006)

Fix missing extra_params when constructing operands (#3006)

Tests

Fix version mismatch between kubernetes and minikube (#2988)

Source code(tar.gz)
Source code(zip)
v0.9.0rc2(Apr 9, 2022)
This is the release notes of v0.9.0rc2. See here for the complete list of solved issues and merged PRs.

New Features

Web

Add stack display page on Mars Web (#2876)

Enhancements

Avoid printing too many messages in Oscar (#2871)

Expand slot scheduler to resource scheduler (#2846, thanks @zhongchun!)

Optimized iterative tiling by pruning unrelated chunks (#2874)

Optimize DataFrameIsin's tile (#2864)

Add benchmark for serialization (#2901)

[Ray] Ray client channel get recv when first complied (#2740, thanks @Catch-Bull!)

Use bloom filter to optimize df.merge execution (#2895)

Stop recording all mapper meta (#2900)

[Ray] Use main pool as owner when autoscale disabled (#2878)

Bug fixes

Fix XGBoost when some workers do not have evals data (#2861)

Fix duplicate node iteration in GraphAssigner (#2857)

Raise ActorNotExist when no supervisors available (#2859)

Fix dtype infer in DataFrame arithmetic on datetime consts (#2879)

Fix timeout for wait_task (#2883)

Make sure error can be raised in Actor.__pre_destroy__ (#2887)

Tests

Upgrade azure-pipelines to Python 3.9 (#2862)

Adapt to official cancel of Github Actions (#2902)

Source code(tar.gz)
Source code(zip)
v0.8.5(Apr 9, 2022)
This is the release notes of v0.8.5. See here for the complete list of solved issues and merged PRs.

New Features

Web

Add stack display page on Mars Web (#2881)

Enhancements

Avoid printing too many messages in Oscar (#2880)

[Ray] Use main pool as owner when autoscale disabled (#2903)

Bug fixes

Fix XGBoost when some workers do not have evals data (#2863)

Raise ActorNotExist when no supervisors available (#2869)

Fix dtype infer in DataFrame arithmetic on datetime consts (#2880)

Fix duplicate node iteration in GraphAssigner (#2880)

Fix timeout for wait_task (#2890)

Make sure errors can be raised in Actor.__pre_destroy__ (#2892)

Tests

Upgrade azure-pipelines to Python 3.9 (#2886)

Adapt to official cancel of Github Actions (#2903)

Source code(tar.gz)
Source code(zip)
v0.9.0rc1(Mar 23, 2022)
This is the release notes of v0.9.0rc1. See here for the complete list of solved issues and merged PRs.

New Features

Tensor

Implements mars.tensor.setdiff1d (#2823)

Learn

Added support for mars.learn.metrics.roc_auc_score (#2832)

Services

A speculative execution based task scheduler (#2576)

Metric

[ray] Add metric for ray object store (#2776, thanks @Catch-Bull!)

Others

Use versioneer to manage release versions (#2806)

Enhancements

Support generating a DOT file for subtask graph (#2803)

Support generating dtypes, index_value etc lazily for DataFrame chunks (#2756)

[ray] Default enable fault tolerance for ray (#2801)

Improve subtask details in logs (#2836)

Accurate resource management for global slot manager (#2732)

Configure nthread of XGBoost jobs (#2844)

Improved performance of mars.learn.metrics.{roc_curve, roc_auc_score} (#2838)

Bump minimist and nanoid in Mars UI due to security alerts (#2849)

Fix store duplicate chunk and meta per subtask (#2845)

Bug fixes

Fix default value of gpu property for some operands (#2811)

Fixes the failure on Vineyard CI by ensure the input tensor chunk is a numpy's ndarray (#2817)

Fix race condition of set_subtask_result (#2784)

Fix duplicate subtask submit (#2815)

Change StorageHandlerActor to stateful (#2824)

Fix running xgboost on Ray cluster (#2826)

Fix FileSystem.ls for OSS (#2837)

Stop fetching data when pure dependencies specified (#2840)

Fix dirty version number caused by versioneer when building with cibuildwheel (#2855)

Tests

[Ray] Refine ray tests (#2793)

Build docker images cronically (#2804)

Introduce asv benchmark (#2798)

Source code(tar.gz)
Source code(zip)
v0.8.4(Mar 23, 2022)
This is the release notes of v0.8.4. See here for the complete list of solved issues and merged PRs.

New Features

Tensor

Implements mars.tensor.setdiff1d (#2829)

Learn

Added support for mars.learn.metrics.roc_auc_score (#2841)

Others

Use versioneer to manage release versions (#2807)

Use cibuildwheel to release wheels (#2854)

Enhancements

Support generating a DOT file for subtask graph (#2818)

Enhance subtask details in logs (#2842)

Configure cores of XGBoost jobs (#2847)

Improved performance of mars.learn.metrics.{roc_curve, roc_auc_score} (#2850)

Fix store duplicate chunk and meta per subtask (#2851)

Bump minimist and nanoid in Mars UI due to security alerts (#2851)

Bug fixes

Fix race condition of set_subtask_result (#2819)

Fix duplicate subtask submit (#2819)

Fixes the failure on Vineyard CI by ensure the input tensor chunk is a numpy's ndarray (#2819)

Fix default value of gpu property for some operands (#2820)

Fix running xgboost on Ray cluster (#2830)

Change StorageHandlerActor to stateful (#2830)

Fix FileSystem.ls for OSS (#2842)

Stop fetching data when pure dependencies specified (#2843)

Tests

[Ray] Refine ray tests (#2810)

Build docker images cronically (#2807)

Source code(tar.gz)
Source code(zip)
v0.9.0b2(Mar 8, 2022)
This is the release notes of v0.9.0b2. See here for the complete list of solved issues and merged PRs.

New Features

Metric

Add metric framework (#2742, thanks @zhongchun!)

Add prometheus metric implementation (#2752, thanks @zhongchun!)

Add ray metrics implementation (#2749, thanks @zhongchun!)

Add common metrics (#2760, thanks @zhongchun!)

Enhancements

Simplify rechunk implementation (#2745)

Stop inferring outputs when args provided (#2759)

Add broadcast merge support for DataFrame (#2772)

Remove deprecate warnings when import mars.tensor (#2788)

Optimize in-process actor calls (#2763)

[ray] New ray actor creation model (#2783)

Bug fixes

Fix duplicate dec object ref (#2741, thanks @Catch-Bull!)

Fix long exception of asyncio.gather (#2748)

Fix NameError: name 'pq' is not defined if pyarrow is not installed (#2751)

Fix profiling band_subtasks and most_calls are empty if the slow duration is large (#2755)

Fix the wrong result of df.merge (#2774)

Fix DataFrame initializer when Mars object exists in list (#2770)

[ray] support ray client mode (#2773)

Tests

Increase test stability for command-line tests (#2779)

Source code(tar.gz)
Source code(zip)
v0.8.3(Mar 8, 2022)
This is the release notes of v0.8.3. See here for the complete list of solved issues and merged PRs.

Enhancements

Stop inferring outputs when args provided (#2761)

Remove deprecate warnings when import mars.tensor (#2790)

[Ray] New ray actor creation model (#2794)

Bug fixes

Fix long exception of asyncio.gather (#2753)

Fix wrong result of df.merge (#2777)

Fix DataFrame initializer when Mars object exists in list (#2778)

Fix duplicate dec object ref (#2789, thanks @Catch-Bull!)

[Ray] Support Ray client mode (#2796)

Tests

Increase test stability for command-line tests (#2786)

Source code(tar.gz)
Source code(zip)
v0.9.0b1(Feb 21, 2022)
This is the release notes of v0.9.0b1. See here for the complete list of solved issues and merged PRs.

Highlights

A new coloring-based fusion algorithm is introduced in #2719, performance is expected to have a significant increase compared to previous releases, however, some unexpected situations may happen, feel free to reach out to us if you find any.

New Features

DataFrame

Support inclusive argument for pd.date_range (#2718)

Others

Add cibuildwheel with Linux AArch64 wheel build support (#2672, thanks @odidev!)

Enhancements

Refine failure recovery log and exception (#2633)

Optimize eval-setitem expressions as single eval expressions (#2695)

Auto merge small chunks when df.groupby().apply(func) is doing aggregation (#2708)

Optimize GroupBy's aggregation algorithm (#2696)

[Ray] refine ray dataset integration (#2705)

Improve profiling (#2629)

Add support for reading partitioned parquet for fastparquet (#2724)

Introduce coloring based fusion algorithm (#2719)

Fix duplicate exceptions in log (#2723)

Bug fixes

Fix sort_values for empty DataFrame or Series (#2681)

Eliminate redundant eval node in optimization (#2683)

Avoid iterative tiling for df.loc[:, fields] (#2685)

[hotfix][ray] fix ray dataset compatibility (#2693)

Fix use_arrow_dtype parameter for read_parquet (#2698)

Fix error on dependent DataFrame setitems (#2701)

Fix estimate_pandas_size for pd.MultiIndex (#2707)

Import vineyard.data.pickle to make members available. (#2714)

Fix shuffle when ndim of input tensors are different (#2727)

Documentation

Add Slack invite link (#2704, thanks @yuyiming!)

Source code(tar.gz)
Source code(zip)
v0.8.2(Feb 21, 2022)
This is the release notes of v0.8.2. See here for the complete list of solved issues and merged PRs.

New Features

DataFrame

Support inclusive argument for pd.date_range (#2721)

Enhancements

Optimize eval-setitem expressions as single eval expressions (#2699)

[Ray] Refine raydataset integration (#2712)

[Ray] refine ray dataset integration (#2726)

Add support for reading partitioned parquet for fastparquet (#2729)

Fix duplicate exceptions in log (#2736)

Bug fixes

Fix sort_values for empty DataFrame or Series (#2686)

Eliminate redundant eval node in optimization (#2688)

Avoid iterative tiling for df.loc[:, fields] (#2689)

Fix use_arrow_dtype parameter for read_parquet (#2702)

Fix error on dependent DataFrame setitems (#2703)

Fix estimate_pandas_size on pd.MultiIndex (#2710)

Import vineyard.data.pickle to make members available (#2716)

Fix shuffle when ndim of input tensors are different (#2728)

Source code(tar.gz)
Source code(zip)
v0.9.0a2(Feb 3, 2022)
This is the release notes of v0.9.0a2. See here for the complete list of solved issues and merged PRs.

New Features

DataFrame

Add support for GroupBy.{ffill, bfill,fillna} (#2639, thanks @Marascax!)

Add nunique support for DataFrameGroupBy (#2662)

Others

Add wheel support for Python 3.10 and drop Python 3.6 (#2622)

Enhancements

Added merging small files support for md.{read_parquet, read_csv} (#2661)

Add support for HTTP request rewriter (#2664)

Optimize filtering DataFrame with its fields (#2571)

Add pyproject.toml to config build packages (#2674)

Bug fixes

Fix backward compatibility for pandas 1.1 and 1.2 (#2624)

Fix backward compatibility for pandas 1.0 (#2628)

Fix NotImplementedError for mo.batch when single call not implemented (#2635)

Fix IndexError raise by aggregation of DataFrameGroupBy (#2641)

Fix compatibility for pandas 1.4 (#2650)

Fix df.loc[:] to make sure same index_value key generated (#2643)

Fix aggregation with comparison (#2647)

Fix the wrong index_value generated by df.loc[:] (#2658)

Fix optimizing DataFrame query with timestamp in conditions (#2671)

Fix as_index when calling agg on SeriesGroupBy (#2676)

Source code(tar.gz)
Source code(zip)
v0.8.1(Feb 3, 2022)
This is the release notes of v0.8.1. See here for the complete list of solved issues and merged PRs.

New Features

DataFrame

Add support for GroupBy.{ffill, bfill,fillna} (#2657, thanks @Marascax!)

Add nunique support for DataFrameGroupBy (#2667)

Enhancements

Add support for HTTP request rewriter (#2665)

Add merging small files support for md.{read_parquet, read_csv} (#2669)

Optimize filtering DataFrame with its fields (#2668)

Bug fixes

Allow specifying multiple supervisor processes (#2625)

Fix backward compatibility for pandas 1.0 (#2630)

Fix NotImplementedError for mo.batch when single call not implemented (#2637)

Fix compatibility for pandas 1.4 (#2652)

Fix IndexError raise by aggregation of DataFrameGroupBy (#2653)

Fix df.loc[:] to make sure same index_value key generated (#2654)

Fix aggregation with comparison (#2655)

Fix the wrong index_value generated by df.loc[:] (#2666)

Fix as_index when calling groupby-agg (#2678)

Source code(tar.gz)
Source code(zip)
v0.9.0a1(Dec 16, 2021)
This is the release notes of v0.9.0a1. See here for the complete list of solved issues and merged PRs.

New Features

Tensor

Implements mt.bincount (#2548)

DataFrame

Support Series.median() (#2566, thanks @perfumescent!)

Learn

Add mars.learn.metrics.multilabel_confusion_matrix and derivative metrics (#2554)

Services

Add basic profiling support for supervisor (#2586)

Enhancements

Add app_queue in new_cluster (#2550, thanks @xxxxsk!)

Implement web API of get_infos (#2558)

Reduce time cost of cpu_percent() calls (#2567)

Reduce estimation time cost (#2577)

[ray] refine mars on ray usability (#2580)

[ray] Refine raydataset integration (#2579)

Optimize tileable graph construction (#2583)

Stop calling user funcs when dtypes is specified (#2587)

Supports adding Mars extensions via setup entrypoints (#2589)

Skip details of shuffled chunks in meta (#2600)

Reduce the time cost of fetching tileable data (#2594)

Use batched request to apply for slots (#2601)

Reduce RPC cost of oscar by removing unnecessary tasks (#2597)

Bug fixes

Fix index series.apply when result index unchanged (#2557)

Stop using asdict to handle dataclasses (#2561)

Fix tests under cudf 21.10 (#2608)

Fix DataFrame getitem when exists duplicate columns (#2581)

Upgrade required version of vineyard. (#2588)

Fix progress always is 0 or 100% (#2591)

Make Proxima work with latest Mars (#2599, thanks @yuyiming!)

Fix None dtype for some unary tensor functions (#2603)

Fix duplicate decref of subtask input chunk (#2611, thanks @Catch-Bull!)

Documentation

Add a document about how to implement a Mars operand (#2562)

Source code(tar.gz)
Source code(zip)
v0.8.0(Dec 16, 2021)
This is the release notes of v0.8.0. See here for the complete list of solved issues and merged PRs.

This release note only covers the difference from v0.8.0rc1; for all highlights and changes, please refer to the release notes of the pre-releases:

alpha1 alpha2 alpha3 beta1 beta2 rc1

New Features

Tensor

Implements mt.bincount (#2552)

DataFrame

Support Series.median (#2570, thanks @perfumescent!)

Learn

Add mars.learn.metrics.multilabel_confusion_matrix and derivative metrics (#2568)

Enhancements

Implement web API of get_infos (#2564)

Reduce time cost of cpu_percent() calls (#2572)

Stop calling user funcs when dtypes is specified (#2596)

Supports adding Mars extensions via setup entrypoints (#2598)

[Ray] Refine mars on ray usability (#2606)

Reduce estimation time cost (#2607)

Skip details of shuffled chunks in meta (#2609)

Reduce the time cost of fetching tileable data (#2616)

Reduce RPC cost of oscar by removing unnecessary tasks (#2613)

Use batched request to apply for slots (#2615)

Bug fixes

Fix index series.apply when result index unchanged (#2563)

Fix DataFrame getitem when exists duplicate columns (#2582)

Upgrade required version of vineyard (#2593)

Fix progress always is 0 or 100% (#2595)

Fix None dtype for some unary tensor functions (#2604)

Make Proxima work with latest Mars (#2605, thanks @yuyiming!)

Fix tests for cudf 21.10 (#2608)

Fix duplicate decref of subtask input chunk (#2614, thanks @Catch-Bull!)

Source code(tar.gz)
Source code(zip)
v0.8.0rc1(Oct 23, 2021)
This is the release notes of v0.8.0rc1. See here for the complete list of solved issues and merged PRs.

New Features

Tensor

Add preliminary implementations for ufunc methods (#2510)

Add partial support for setitem with fancy indexing (#2453)

DataFrame

Support md.get_dummies() (#2323, thanks @hoarjour!)

Learn

Add make_regression support for learn module (#2515)

Implements fit and predict methods for bagging (#2516)

Implements mars.learn.ensemble.IsolationForest (#2531)

Implements mars.learn.preprocessor.LabelEncoder (#2542)

Services

Add web API for scheduling (#2533)

Web

Display tileable properties on web (#2525, thanks @RandomY-2!)

Others

Support mutable tensor on oscar (#2432, thanks @Coco58323!)

Add experimental support for CUDA under WSL for Windows 11 (#2538)

Enhancements

Use black to enforce code style (#2492)

Reduce indentation of frontend code (#2540)

Bug fixes

Fix output of df.groupby(as_index=False).size() (#2507)

[Ray] Fix web serialize lambda (#2512)

Fix reduction result on empty series (#2520)

Fix DataFrame.loc when df is empty (#2524)

Fix df.loc when providing empty list (#2528)

Documentation

Add doc for reading csv in oss (#2514, thanks @Catch-Bull!)

Source code(tar.gz)
Source code(zip)
v0.7.5(Oct 23, 2021)
This is the release notes of v0.7.5. See here for the complete list of solved issues and merged PRs.

New Features

Tensor

Add preliminary implementations for ufunc methods (#2513)

Add partial support for setitem with fancy indexing (#2544)

DataFrame

Implements md.get_dummies (#2534, thanks @hoarjour!)

Learn

Add make_regression support for learn module (#2517)

Implements mars.learn.preprocessor.LabelEncoder (#2545)

Services

Add web API for scheduling (#2535)

Web

Display tileable properties on web (#2539, thanks @RandomY-2!)

Others

Add experimental support for CUDA under WSL for Windows 11 (#2543)

Enhancements

Reduce indentation of frontend code (#2541)

Bug fixes

Fix output of df.groupby(as_index=False).size() (#2508)

Fix reduction result on empty series (#2522)

Fix df.loc when df is empty (#2526)

[Ray] Fix serializing lambdas in web (#2529)

Fix df.loc when providing empty list (#2532)

Documentation

Add doc for reading csv in oss (#2530, thanks @Catch-Bull!)

Source code(tar.gz)
Source code(zip)
v0.8.0b2(Oct 9, 2021)
This is the release notes of v0.8.0b2. See here for the complete list of solved issues and merged PRs.

New Features

Learn

Implements glm.LogisticRegression (#2466, thanks @Fernadoo!)

Implements bagging sampling (#2496)

Services

Basic reschedule subtask (#2467)

Web

Split tileable information and subtask graph into two tabs (#2480, thanks @RandomY-2!)

Include tileable property in detail api (#2493, thanks @RandomY-2!)

Enhancements

Support specified vineyard socket and skip the launching vineyardd process (#2481)

Refine MarsDMatrix & support more parameters for XGB classifier and regressor (#2498)

Bug fixes

Compatible with scikit-learn 1.0 (#2486)

Fix bug that failed to execute query when there are multiple arguments (#2490, thanks @perfumescent!)

Documentation

Fix wrong translation in cluster deployment. (#2489, thanks @perfumescent!)

Tests

Fix version of statsmodels to pass CI (#2497)

Source code(tar.gz)
Source code(zip)
v0.7.4(Oct 9, 2021)
This is the release notes of v0.7.4. See here for the complete list of solved issues and merged PRs.

New Features

Web

Split tileable information and subtask graph into two tabs (#2482, thanks @RandomY-2!)

Include tileable property in detail api (#2499, thanks @RandomY-2!)

Enhancements

Support specified vineyard socket and skip the launching vineyardd process (#2500)

Refine MarsDMatrix & support more parameters for XGB classifier and regressor (#2501)

Bug fixes

Compatible with scikit-learn 1.0 (#2487)

Fix bug that failed to execute query when there are multiple arguments (#2491, thanks @perfumescent!)

Source code(tar.gz)
Source code(zip)
v0.8.0b1(Sep 21, 2021)
This is the release notes of v0.8.0b1. See here for the complete list of solved issues and merged PRs.

New Features

DataFrame

Integrate Mars DataFrame with Ray Dataset (#2393, thanks @vcfgv!)

Learn

Add _binary_roc_auc_score method (#2403, thanks @Divyanshu-Singh-Chauhan!)

Web

Support visualizing subtask graphs on Mars Web (#2426, thanks @RandomY-2!)

Others

Revisit {from,to}_vineyard for tensors and dataframes. (#2419)

[Ray] Reconstruct worker (#2413)

Make cmdline support third party modules (#2454)

Add nightly builds for docker images (#2456)

Enhancements

Refine and unify subtask detail APIs (#2465, thanks @RandomY-2!)

Bug fixes

Fix df/series.{apply, map_chunk} when some chunk output empty data (#2434)

Fix missing DAGs when initializing with empty API results (#2439, thanks @RandomY-2!)

Fix skew and kurt errors under MacOS (#2443)

Add tests for public kubernetes image (#2446)

Fix timeout error when waiting for a submitted task (#2457)

Print the error message when error happens in TaskProcessor (#2458)

Fix misuse of name parameter in DataFrame align (#2469, thanks @qxzhou1010!)

Fix hang when start sub pool fails (#2468)

Installation

Build and upload docker images in continuous deployment (#244)

Tests

Fix coverage for Azure pipeline (#2474)

Source code(tar.gz)
Source code(zip)
v0.7.3(Sep 21, 2021)
This is the release notes of v0.7.3. See here for the complete list of solved issues and merged PRs.

New Features

Learn

Add _binary_roc_auc_score method (#2477, thanks @Divyanshu-Singh-Chauhan!)

Web

Support visualizing subtask graphs on Mars Web (#2471, thanks @RandomY-2!)

Others

Revisit {from,to}_vineyard for tensors and dataframes (#2436)

Add nightly builds for docker images (#2462)

Make cmdline support third party modules (#2472)

Bug fixes

Fix df/series.{apply, map_chunk} when some chunk output empty data (#2437)

Fix missing DAGs when initializing with empty API results (#2442, thanks @RandomY-2!)

Fix skew and kurt errors under MacOS (#2445)

Fix usage of kubernetes image (#2448)

Fix timeout error when waiting for a submitted task (#2461)

Fix misuse of name parameter in DataFrame align (#2473, thanks @qxzhou1010!)

Fix hang when start sub pool fails (#2476)

Tests

Fix coverage for Azure pipeline (#2475)

Source code(tar.gz)
Source code(zip)
v0.7.2post1(Sep 10, 2021)
This release is a hotfix of v0.7.2 in order to fix the public docker image.

Bug fixes

Fix usage of kubernetes image for v0.7.2 (#2447)

Source code(tar.gz)
Source code(zip)
v0.8.0a3(Sep 5, 2021)
This is the release notes of v0.8.0a3. See here for the complete list of solved issues and merged PRs.

New Features

Tensor

Implemented hypergeometric functions (#2397, thanks @Alfa-Shashank!)

Implements mars.tensor.append (#2417)

DataFrame

Implements Series.between (#2368, thanks @gowrijsuria!)

Integrate Mars DataFrame with Ray MLDataset (#2294, thanks @vcfgv!)

Support DataFrame.transpose() (#2327, thanks @hoarjour!)

Learn

Add mars.learn.ensemble.{BlockwiseVotingClassifier, BlockwiseVotingRegressor} (#2390)

Implements linear_model.LinearRegression (#2260, thanks @Fernadoo!)

Add TensorFlow dataset (#2383, thanks @yuanchongtt!)

Implements mars.learn.preprocessing.{LabelBinarizer,label_binarize} (#2415)

Implements mars.learn.metrics.log_loss (#2418)

Implements mars.learn.wrappers.ParallelPostFit (#2425)

Services

Initially support auto scaling (#2210)

Web

API for subtask DAG structure (#2389, thanks @RandomY-2!)

Bug fixes

Fix raising wrong error for an operand when post_execute implemented and error occurs in execute (#2395)

[Ray] Fix occasionally failed unittest test_ownership_when_scale_in (#2401)

[Oscar] Fix possible ActorCaller.call hang (#2404, thanks @Catch-Bull!)

Documentation

Highlight dask-on-mars in doc (#2399)

Tests

Improve case stability (#2381)

Upgrade vineyard to v0.2.7 (#2193)

Add checks for file mode changes and absolute imports (#2398)

[Ray] Fix ray version (#2406)

Change all tests to use relative import (#2407)

Source code(tar.gz)
Source code(zip)
v0.7.2(Sep 5, 2021)
This is the release notes of v0.7.2. See here for the complete list of solved issues and merged PRs.

New Features

Tensor

Implements hypergeometric functions (#2408, thanks @Alfa-Shashank!)

Implements mars.tensor.append (#2422)

DataFrame

Implements Series.between (#2382, thanks @gowrijsuria!)

Implements DataFrame.transpose() (#2423, thanks @hoarjour!)

Learn

Add mars.learn.ensemble.{BlockwiseVotingClassifier, BlockwiseVotingRegressor} (#2391)

Add TensorFlow dataset (#2409, thanks @yuanchongtt!)

Implements linear_model.LinearRegression (#2411, thanks @Fernadoo!)

Implements mars.learn.preprocessing.{LabelBinarizer, label_binarize} (#2421)

Implements mars.learn.metrics.log_loss (#2424)

Implements mars.learn.wrappers.ParallelPostFit (#2427)

Web

API for subtask DAG structure (#2410, thanks @RandomY-2!)

Bug fixes

Fix raising wrong error for an operand when post_execute implemented and error occurs in execute (#2396)

Tests

Improve case stability (#2387)

Change all tests to use relative import (#2412)

Source code(tar.gz)
Source code(zip)
v0.8.0a2(Aug 23, 2021)
This is the release notes of v0.8.0a2. See here for the complete list of solved issues and merged PRs.

New Features

DataFrame

Support initializing Mars objects from CUDA (#2308)

Support md.to_numeric (#2290, thanks @hoarjour!)

Gives an error when input DataFrame has unknown dtypes (#2355)

Added assign to DataFrame (#2362, thanks @hxri!)

Support reading csv file from oss (#2292, thanks @zebivy!)

Tensor

Implements mars.tensor.stats.ks_2samp (#2324)

Implements mars.tensor.stats.ks_1samp (#2335)

Learn

Support PyTorch Dataset for oscar (#2246, thanks @yuanchongtt!)

Add KFold support (#2363)

Services

Add API to retrieve progress and status of tileables (#2357)

Web

Add visualization page for tileable graphs (#2282, thanks @RandomY-2!)

Add storage infos in web (#2317)

Display tileable progress, status and dependency link type on task detail page (#2360, thanks @RandomY-2!)

Others

[Ray] Rerun subtask for ray backend (#2288, thanks @keyile!)

Add experimental Dask-on-Mars support (#2289, thanks @loopyme!)

Enhancements

Support setting multiple columns in DataFrame (#2303)

Refactor tileable visualization classes (#2318)

Create service classes to manage service and session operations (#2326)

Improve wait_actor_pool_recovered (#2328, thanks @keyile!)

Remove bokeh from package requirements (#2339)

Optimize mars supervisor scheduling (#2325)

Bug fixes

Fix hangs when worker main pool has failures. (#2286)

Fix the error when multiple subtasks fetch the same data (#2322)

[Ray] Fix ray ci (#2343, thanks @keyile!)

Fix error in Dask-on-Mars when compute multiple objects (#2348, thanks @loopyme!)

Fix KeyError when remote function returns None (#2371)

Fix DataFrame comparison when data type is period (#2373)

Documentation

Fix untranslated strings in doc (#2346)

Fix docs of DataFrame.assign (#2367)

Source code(tar.gz)
Source code(zip)
v0.7.1(Aug 23, 2021)
This is the release notes of v0.7.1. See here for the complete list of solved issues and merged PRs.

New Features

DataFrame

Support md.to_numeric (#2334, thanks @hoarjour!)

Gives an error when input DataFrame has unknown dtypes (#2359)

Implements DataFrame.assign (#2369, thanks @hxri!)

Support reading csv file from oss (#2374, thanks @zebivy!)

Tensor

Implements mars.tensor.stats.ks_2samp (#2332)

Implements mt.stats.ks_1samp (#2341)

Learn

Support PyTorch Dataset for oscar (#2364, thanks @yuanchongtt!)

Add KFold support (#2365)

Services

Add API to retrieve progress and status of tileables (#2358)

Web

Add visualization page for tileable graphs (#2319, thanks @RandomY-2!)

Add storage infos in web (#2333)

Display tileable progress, status and dependency link type on task detail page (#2377, thanks @RandomY-2!)

Enhancements

Support setting multiple columns in DataFrame (#2313)

Create service classes to manage service and session operations (#2331)

Remove bokeh from package requirements (#2344)

Optimize scheduling service on supervisors (#2347)

Improve wait_actor_pool_recovered (#2350, thanks @keyile!)

Bug fixes

Fix the error when multiple subtasks fetch the same data (#2340)

Fix KeyError when remote function returns None (#2375)

Fix DataFrame comparison when data type is period (#2376)

Documentation

Fix untranslated strings in doc (#2349)

Source code(tar.gz)
Source code(zip)
v0.8.0a1(Aug 6, 2021)
This is the release notes of v0.8.0a1. See here for the complete list of solved issues and merged PRs.

New Features

Tensor

Add partial support of bessel functions (#2258, thanks @JuntaoMa!)

Implements mars.tensor.in1d (#2297)

Learn

Implements mars.learn.utils.multiclass.unique_label (#2295)

Services

Add get_storage_level_info api (#2228)

Basic rerun subtask (#2198)

Add API to fetch tileable graph as JSON (#2253, thanks @RandomY-2!)

Enable running on GPU for oscar (#2284)

Others

Add support for seek method in memory cases (#2250)

Enhancements

Support choosing aggregation algorithm at runtime (#2213)

Add support for stateless actors (#2218)

Add status filters for Cluster service (#2214)

Reassign subtasks and filter nodes with status (#2159, thanks @vcfgv!)

Add methods to sessions to get web endpoint (#2236)

Ensure range index incremental for data source op like md.read_csv (#2232)

Use Kubernetes Service to discover Mars Supervisors (#2227)

Record mapper meta for shuffle task (#2248)

Support data dependency for run_script (#2251)

Refine oscar debugging (#2252)

Support fetch_log for web session (#2257)

Use batch method to reduce transferring cost for shuffle tasks (#2233)

Allow turning off actor killing (#2273)

Assign bands given devices of subtasks (#2276)

Add bind method to facilitate extracting batch args (#2280)

Reduce memory estimation for specific operands (#2283)

Bug fixes

Fix NoDataToSpill when multiple storage quota requests happen simultaneously (#2203)

Pass logging config file name into sub pools (#2222)

Stop using thread local to store default session. (#2217)

Fix possible CI failure when destroying remote object for incremental index (#2239)

Fix service errors in Windows (#2237)

Documentation

Doc refinement for Oscar (#2234)

Add docs for batch methods (#2293)

Installation

Merge default & distributed requirements (#2263)

Tests

Add separate check pipeline (#2299)

Fix delocate version to 0.8.2 to avoid deploy error (#2305)

Source code(tar.gz)
Source code(zip)
v0.7.0(Aug 6, 2021)
This is the release notes of v0.7.0. See here for the complete list of solved issues and merged PRs.

This release note only covers the difference from v0.7.0rc2; for all highlights and changes, please refer to the release notes of the pre-releases:

alpha1 alpha2 alpha3 alpha4 alpha5 alpha6 alpha7 alpha8 beta1 beta2 rc1 rc2

Changes that break compatibility

v0.7.0 has unified local and distributed execution layer, local thread-based scheduling has been removed, instead, the unified runtime is based on multiprocess-based scheduling which could get rid of infamous GIL problem .

Thus, for local usage, please new a local default session via:

import mars mars.new_session() # create a default local session

If not doing so, it will be initialized once in the background, however, keep in mind that the initialization of multiprocess scheduling consumes more time compared to multithread one.

We tried our best to keep other compatibilities, if you find any incompatible place, please open an issue to reach out to us.

Highlights

v0.7.0 implements a unified execution layer, all deployment including bare metal, Kubernetes, Ray as well as Yarn shares the same fundamental components. This unified execution layer optimized many aspects compare to the old one including:

Better serialization based on pickle5 protocol, which is 5-7x faster than old version.

Completely rewritten execution layer which has better performance, even 20%-50% faster than the old version on a laptop.

Based on multiprocess scheduling which avoids infamous GIL issue.

Mars on Ray is way more better due to the reason that Ray actor is leveraged to build the Ray backend of Oscar which is a lightweight actor framework that is the fundamental part of the entire execution layer.

GPU can be supported more better with the new architecture.

New Features

Tensor

Add partial support of bessel functions (#2274, thanks @JuntaoMa!)

Implements mars.tensor.in1d (#2301)

Learn

Implements mars.learn.utils.multiclass.unique_label (#2300)

Services

Add get_storage_level_info api (#2242)

Add API to fetch tileable graph as JSON (#2271, thanks @RandomY-2!)

Enable running on GPU for oscar (#2306)

Others

Add support for seek method in memory cases (#2264)

Enhancements

Add support for stateless actors (#2220)

Add status filters for Cluster service (#2221)

Pass logging config file name into sub pools (#2225)

Support choosing aggregation algorithm at runtime (#2226)

Add method to session to get web endpoint (#2238)

Use Kubernetes Service to discover Mars Supervisors (#2240)

Ensure range index incremental for data source op like md.read_csv (#2244)

Record mapper meta for shuffle task (#2255)

Support data dependency for run_script (#2256)

Refine oscar debugging (#2261)

Support fetch_log for web session (#2262)

Allow turning off actor killing (#2277)

Use batch method to reduce transferring cost for shuffle tasks (#2279)

Assign bands given devices of subtasks (#2278)

Add bind method to facilitate extracting batch args (#2281)

Reduce memory estimation for specific operands (#2285)

Bug fixes

Fix NoDataToSpill when multiple storage quota requests happen simultaneously (#2223)

Stop using thread local to store default session (#2243)

Fix service errors in Windows (#2247)

Documentation

Doc refinement for Oscar (#2291)

Add docs for batch methods (#2298)

Installation

Merge default & distributed requirements (#2270)

Tests

Add separate check pipeline (#2302)

Source code(tar.gz)
Source code(zip)