[DEPRECATED] Tensorflow wrapper for DataFrames on Apache Spark

Overview

TensorFrames (Deprecated)

Note: TensorFrames is deprecated. You can use pandas UDFs instead.
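
As an illustration, the "add 3 to a column" transform shown in the first example below could be written as a scalar pandas UDF. This is only a minimal sketch, not part of TensorFrames, and assumes Spark 2.4+ with PyArrow installed:

import pandas as pd
from pyspark.sql import Row, SparkSession
from pyspark.sql.functions import pandas_udf, PandasUDFType

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([Row(x=float(x)) for x in range(10)])

# A scalar pandas UDF receives a pandas Series per batch and returns a Series.
@pandas_udf("double", PandasUDFType.SCALAR)
def plus_three(s):
    return s + 3

df.withColumn("z", plus_three(df["x"])).show()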

Experimental TensorFlow binding for Scala and Apache Spark.

TensorFrames (TensorFlow on Spark DataFrames) lets you manipulate Apache Spark's DataFrames with TensorFlow programs.

This package is experimental and is provided as a technical preview only. While the interfaces are all implemented and working, there are still some areas of low performance.

Supported platforms:

This package officially supports only Linux 64-bit platforms as a target. Contributions are welcome for other platforms.

See the file project/Dependencies.scala for adding your own platform.

Officially TensorFrames supports Spark 2.4+ and Scala 2.11.

See the user guide for extensive information about the API.

For questions, see the TensorFrames mailing list.

TensorFrames is available as a Spark package.

Requirements

  • A working version of Apache Spark (2.4 or greater)

  • Java 8+

  • (Optional) Python 2.7+ / 3.6+ if you want to use the Python interface.

  • (Optional) the Python TensorFlow package if you want to use the Python interface. See the official instructions on how to get the latest release of TensorFlow.

  • (Optional) pandas >= 0.19.1 if you want to use the Python interface.

Additionally, for development, you need the following dependencies (an example install command is sketched after this list):

  • protoc 3.x

  • nose >= 1.3
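
A typical way to install the Python-side dependencies is with pip. The pins below are only a sketch (the TensorFlow pin mirrors the last published python/requirements.txt; adjust to your setup), and protoc is a system tool rather than a pip package:

pip install "tensorflow==1.15.0" "pandas>=0.19.1" "nose>=1.3"
# protoc 3.x is usually installed separately, e.g. via the protobuf-compiler
# package on Debian/Ubuntu or `brew install protobuf` on macOS.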

How to run in Python

Assuming that SPARK_HOME is set, you can use PySpark like any other Spark package.

$SPARK_HOME/bin/pyspark --packages databricks:tensorframes:0.6.0-s_2.11

Here is a small program that uses TensorFlow to add 3 to an existing column.

import tensorflow as tf
import tensorframes as tfs
from pyspark.sql import Row

data = [Row(x=float(x)) for x in range(10)]
df = sqlContext.createDataFrame(data)
with tf.Graph().as_default() as g:
    # The TensorFlow placeholder that corresponds to column 'x'.
    # The shape of the placeholder is automatically inferred from the DataFrame.
    x = tfs.block(df, "x")
    # The output that adds 3 to x
    z = tf.add(x, 3, name='z')
    # The resulting dataframe
    df2 = tfs.map_blocks(z, df)

# The transform is lazy as for most DataFrame operations. This will trigger it:
df2.collect()

# Notice that z is an extra column next to x

# [Row(z=3.0, x=0.0),
#  Row(z=4.0, x=1.0),
#  Row(z=5.0, x=2.0),
#  Row(z=6.0, x=3.0),
#  Row(z=7.0, x=4.0),
#  Row(z=8.0, x=5.0),
#  Row(z=9.0, x=6.0),
#  Row(z=10.0, x=7.0),
#  Row(z=11.0, x=8.0),
#  Row(z=12.0, x=9.0)]
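
The snippet above assumes the pyspark shell, where sqlContext is already created for you. If you run it as a standalone script submitted with the same --packages coordinates, you need to create the Spark entry point yourself; a minimal sketch for Spark 2.x:

from pyspark.sql import Row, SparkSession

# In Spark 2.x, the SparkSession replaces sqlContext for creating DataFrames.
spark = SparkSession.builder.appName("tensorframes-example").getOrCreate()
df = spark.createDataFrame([Row(x=float(x)) for x in range(10)])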

The second example shows the block-wise reducing operations: we compute the sum of a field containing vectors of integers, working with blocks of rows for more efficient processing.

# Build a DataFrame of vectors
data = [Row(y=[float(y), float(-y)]) for y in range(10)]
df = sqlContext.createDataFrame(data)
# Because the dataframe contains vectors, we need to analyze it first to find the
# dimensions of the vectors.
df2 = tfs.analyze(df)

# The information gathered by TensorFrames can be printed to check the content:
tfs.print_schema(df2)
# root
#  |-- y: array (nullable = false) double[?,2]

# Let's use the analyzed dataframe to compute the sum and the elementwise minimum 
# of all the vectors:
# First, let's make a copy of the 'y' column. This will be very cheap in Spark 2.0+
df3 = df2.select(df2.y, df2.y.alias("z"))
with tf.Graph().as_default() as g:
    # The placeholders. Note the special names that end with '_input':
    y_input = tfs.block(df3, 'y', tf_name="y_input")
    z_input = tfs.block(df3, 'z', tf_name="z_input")
    y = tf.reduce_sum(y_input, [0], name='y')
    z = tf.reduce_min(z_input, [0], name='z')
    # The resulting dataframe
    (data_sum, data_min) = tfs.reduce_blocks([y, z], df3)

# The final results are numpy arrays:
print(data_sum)
# [45., -45.]
print(data_min)
# [0., -9.]

Notes

Note the scoping of the graphs above. This is important because TensorFrames determines which DataFrame column to feed to TensorFlow based on the names of the placeholders in the graph. Also, it is good practice to keep graphs small when sending them to Spark.

For small tensors (scalars and vectors), TensorFrames usually infers the shapes of the tensors without requiring a preliminary analysis. If it cannot do it, an error message will indicate that you need to run the DataFrame through tfs.analyze() first.

Look at the Python documentation of the TensorFrames package to see what methods are available.
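
From a pyspark session, Python's built-in help function is a quick way to browse those docstrings, for example:

import tensorframes as tfs

help(tfs)             # lists the public functions of the package
help(tfs.map_blocks)  # shows the docstring of a single function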

How to run in Scala

The Scala support is a bit more limited than the Python support. In Scala, operations can be loaded from an existing graph defined in the ProtocolBuffers format, or built with a simple Scala DSL. The Scala DSL only covers a subset of TensorFlow transforms, but it is easy to extend, so other transforms will be added without much effort in the future.

You simply use the published package:

$SPARK_HOME/bin/spark-shell --packages databricks:tensorframes:0.6.0-s_2.11

Here is the same program as before:

import org.tensorframes.{dsl => tf}
import org.tensorframes.dsl.Implicits._

val df = spark.createDataFrame(Seq(1.0->1.1, 2.0->2.2)).toDF("a", "b")

// As in Python, scoping is recommended to prevent name collisions.
val df2 = tf.withGraph {
    val a = df.block("a")
    // Unlike python, the scala syntax is more flexible:
    val out = a + 3.0 named "out"
    // The 'mapBlocks' method is added using implicits to dataframes.
    df.mapBlocks(out).select("a", "out")
}

// The transform is all lazy at this point, let's execute it with collect:
df2.collect()
// res0: Array[org.apache.spark.sql.Row] = Array([1.0,4.0], [2.0,5.0])   

How to compile and install for developers

It is recommended that you use a Conda environment to guarantee that the build environment can be reproduced. Once you have installed Conda, you can create the environment from the root of the project:

conda create -q -n tensorframes-environment python=$PYTHON_VERSION

This will create an environment for your project. We recommend using Python version 3.7 or 2.7.13. After the environment is created, you can activate it and install all dependencies as follows:

conda activate tensorframes-environment
pip install --user -r python/requirements.txt

You also need to compile the Scala code. The recommended procedure is to use the assembly:

build/sbt tfs_testing/assembly
# Builds the spark package:
build/sbt distribution/spDist

Assuming that SPARK_HOME is set and that you are in the root directory of the project:

$SPARK_HOME/bin/spark-shell --jars $PWD/target/testing/scala-2.11/tensorframes-assembly-0.6.1-SNAPSHOT.jar

If you want to run the python version:

PYTHONPATH=$PWD/target/testing/scala-2.11/tensorframes-assembly-0.6.1-SNAPSHOT.jar \
$SPARK_HOME/bin/pyspark --jars $PWD/target/testing/scala-2.11/tensorframes-assembly-0.6.1-SNAPSHOT.jar

Acknowledgements

Before TensorFlow released its Java API, this project was built on the great javacpp project, which implements the low-level bindings between TensorFlow and the Java virtual machine.

Many thanks to Google for the release of TensorFlow.

Comments
  •  java.lang.ClassNotFoundException: org.tensorframes.impl.DebugRowOps

    I built the jar by following the readme, and then ran it in PyCharm https://www.dropbox.com/s/qmrs72l0p8p4bc2/Screen%20Shot%202016-07-06%20at%2011.40.26%20PM.png?dl=0. I added the self-built jar as a content root; I guess that's what causes the error.

    line 11 is x = tfs.block(df, "x")

    code:

    import tensorflow as tf
    import tensorframes as tfs
    from pyspark.shell import sqlContext
    from pyspark.sql import Row
    
    data = [Row(x=float(x)) for x in range(10)]
    df = sqlContext.createDataFrame(data)
    
    with tf.Graph().as_default() as g:
        # The TensorFlow placeholder that corresponds to column 'x'.
        # The shape of the placeholder is automatically inferred from the DataFrame.
        x = tfs.block(df, "x")
        # The output that adds 3 to x
        z = tf.add(x, 3, name='z')
        # The resulting dataframe
        df2 = tfs.map_blocks(z, df)
    
    # The transform is lazy as for most DataFrame operations. This will trigger it:
    df2.collect()
    

    log

    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    16/07/06 23:28:43 INFO SparkContext: Running Spark version 1.6.1
    16/07/06 23:28:44 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    16/07/06 23:28:44 INFO SecurityManager: Changing view acls to: julian_qian
    16/07/06 23:28:44 INFO SecurityManager: Changing modify acls to: julian_qian
    16/07/06 23:28:44 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(julian_qian); users with modify permissions: Set(julian_qian)
    16/07/06 23:28:44 INFO Utils: Successfully started service 'sparkDriver' on port 60597.
    16/07/06 23:28:45 INFO Slf4jLogger: Slf4jLogger started
    16/07/06 23:28:45 INFO Remoting: Starting remoting
    16/07/06 23:28:45 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:60598]
    16/07/06 23:28:45 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 60598.
    16/07/06 23:28:45 INFO SparkEnv: Registering MapOutputTracker
    16/07/06 23:28:45 INFO SparkEnv: Registering BlockManagerMaster
    16/07/06 23:28:45 INFO DiskBlockManager: Created local directory at /private/var/folders/9c/h8czn5n53yd69xz45wjhk0fw0000gn/T/blockmgr-5174cef3-29d9-4d2a-a84e-279a0e3d2f83
    16/07/06 23:28:45 INFO MemoryStore: MemoryStore started with capacity 511.1 MB
    16/07/06 23:28:45 INFO SparkEnv: Registering OutputCommitCoordinator
    16/07/06 23:28:45 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
    16/07/06 23:28:45 INFO Utils: Successfully started service 'SparkUI' on port 4041.
    16/07/06 23:28:45 INFO SparkUI: Started SparkUI at http://10.63.21.172:4041
    16/07/06 23:28:45 INFO Executor: Starting executor ID driver on host localhost
    16/07/06 23:28:45 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 60599.
    16/07/06 23:28:45 INFO NettyBlockTransferService: Server created on 60599
    16/07/06 23:28:45 INFO BlockManagerMaster: Trying to register BlockManager
    16/07/06 23:28:45 INFO BlockManagerMasterEndpoint: Registering block manager localhost:60599 with 511.1 MB RAM, BlockManagerId(driver, localhost, 60599)
    16/07/06 23:28:45 INFO BlockManagerMaster: Registered BlockManager
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /__ / .__/\_,_/_/ /_/\_\   version 1.6.1
          /_/
    
    Using Python version 2.7.10 (default, Dec  1 2015 20:00:13)
    SparkContext available as sc, HiveContext available as sqlContext.
    16/07/06 23:28:46 INFO HiveContext: Initializing execution hive, version 1.2.1
    16/07/06 23:28:46 INFO ClientWrapper: Inspected Hadoop version: 2.6.0
    16/07/06 23:28:46 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.6.0
    16/07/06 23:28:46 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
    16/07/06 23:28:46 INFO ObjectStore: ObjectStore, initialize called
    16/07/06 23:28:46 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
    16/07/06 23:28:46 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
    16/07/06 23:28:46 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
    16/07/06 23:28:47 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
    16/07/06 23:28:48 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
    16/07/06 23:28:48 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
    16/07/06 23:28:48 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
    16/07/06 23:28:49 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
    16/07/06 23:28:49 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
    16/07/06 23:28:49 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
    16/07/06 23:28:49 INFO ObjectStore: Initialized ObjectStore
    16/07/06 23:28:49 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
    16/07/06 23:28:49 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
    16/07/06 23:28:49 INFO HiveMetaStore: Added admin role in metastore
    16/07/06 23:28:49 INFO HiveMetaStore: Added public role in metastore
    16/07/06 23:28:49 INFO HiveMetaStore: No user is added in admin role, since config is empty
    16/07/06 23:28:49 INFO HiveMetaStore: 0: get_all_databases
    16/07/06 23:28:49 INFO audit: ugi=julian_qian   ip=unknown-ip-addr  cmd=get_all_databases   
    16/07/06 23:28:49 INFO HiveMetaStore: 0: get_functions: db=default pat=*
    16/07/06 23:28:49 INFO audit: ugi=julian_qian   ip=unknown-ip-addr  cmd=get_functions: db=default pat=* 
    16/07/06 23:28:49 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
    16/07/06 23:28:49 INFO SessionState: Created HDFS directory: /tmp/hive/julian_qian
    16/07/06 23:28:49 INFO SessionState: Created local directory: /var/folders/9c/h8czn5n53yd69xz45wjhk0fw0000gn/T/julian_qian
    16/07/06 23:28:49 INFO SessionState: Created local directory: /var/folders/9c/h8czn5n53yd69xz45wjhk0fw0000gn/T/f9d3c8e6-6b5d-4a0c-b2cf-a50aa101fb62_resources
    16/07/06 23:28:49 INFO SessionState: Created HDFS directory: /tmp/hive/julian_qian/f9d3c8e6-6b5d-4a0c-b2cf-a50aa101fb62
    16/07/06 23:28:49 INFO SessionState: Created local directory: /var/folders/9c/h8czn5n53yd69xz45wjhk0fw0000gn/T/julian_qian/f9d3c8e6-6b5d-4a0c-b2cf-a50aa101fb62
    16/07/06 23:28:49 INFO SessionState: Created HDFS directory: /tmp/hive/julian_qian/f9d3c8e6-6b5d-4a0c-b2cf-a50aa101fb62/_tmp_space.db
    16/07/06 23:28:49 INFO HiveContext: default warehouse location is /user/hive/warehouse
    16/07/06 23:28:49 INFO HiveContext: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
    16/07/06 23:28:49 INFO ClientWrapper: Inspected Hadoop version: 2.6.0
    16/07/06 23:28:49 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.6.0
    16/07/06 23:28:50 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
    16/07/06 23:28:50 INFO ObjectStore: ObjectStore, initialize called
    16/07/06 23:28:50 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
    16/07/06 23:28:50 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
    16/07/06 23:28:50 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
    16/07/06 23:28:50 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
    16/07/06 23:28:51 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
    16/07/06 23:28:51 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
    16/07/06 23:28:51 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
    16/07/06 23:28:52 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
    16/07/06 23:28:52 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
    16/07/06 23:28:52 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
    16/07/06 23:28:52 INFO ObjectStore: Initialized ObjectStore
    16/07/06 23:28:52 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
    16/07/06 23:28:52 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
    16/07/06 23:28:52 INFO HiveMetaStore: Added admin role in metastore
    16/07/06 23:28:52 INFO HiveMetaStore: Added public role in metastore
    16/07/06 23:28:52 INFO HiveMetaStore: No user is added in admin role, since config is empty
    16/07/06 23:28:52 INFO HiveMetaStore: 0: get_all_databases
    16/07/06 23:28:52 INFO audit: ugi=julian_qian   ip=unknown-ip-addr  cmd=get_all_databases   
    16/07/06 23:28:53 INFO HiveMetaStore: 0: get_functions: db=default pat=*
    16/07/06 23:28:53 INFO audit: ugi=julian_qian   ip=unknown-ip-addr  cmd=get_functions: db=default pat=* 
    16/07/06 23:28:53 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
    16/07/06 23:28:53 INFO SessionState: Created local directory: /var/folders/9c/h8czn5n53yd69xz45wjhk0fw0000gn/T/77eb618d-61cc-470e-abb4-18d356833efb_resources
    16/07/06 23:28:53 INFO SessionState: Created HDFS directory: /tmp/hive/julian_qian/77eb618d-61cc-470e-abb4-18d356833efb
    16/07/06 23:28:53 INFO SessionState: Created local directory: /var/folders/9c/h8czn5n53yd69xz45wjhk0fw0000gn/T/julian_qian/77eb618d-61cc-470e-abb4-18d356833efb
    16/07/06 23:28:53 INFO SessionState: Created HDFS directory: /tmp/hive/julian_qian/77eb618d-61cc-470e-abb4-18d356833efb/_tmp_space.db
    

    error log:

    Traceback (most recent call last):
      File "/Users/julian_qian/PycharmProjects/tensorflow/tfs.py", line 11, in <module>
        x = tfs.block(df, "x")
      File "/Users/julian_qian/etc/work/python/tensorframes/tensorframes-assembly-0.2.3.jar/tensorframes/core.py", line 315, in block
      File "/Users/julian_qian/etc/work/python/tensorframes/tensorframes-assembly-0.2.3.jar/tensorframes/core.py", line 333, in _auto_placeholder
      File "/Users/julian_qian/etc/work/python/tensorframes/tensorframes-assembly-0.2.3.jar/tensorframes/core.py", line 30, in _java_api
      File "/usr/local/Cellar/apache-spark/1.6.1/libexec/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
      File "/usr/local/Cellar/apache-spark/1.6.1/libexec/python/lib/pyspark.zip/pyspark/sql/utils.py", line 45, in deco
      File "/usr/local/Cellar/apache-spark/1.6.1/libexec/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
    py4j.protocol.Py4JJavaError: An error occurred while calling o32.loadClass.
    : java.lang.ClassNotFoundException: org.tensorframes.impl.DebugRowOps
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
        at py4j.Gateway.invoke(Gateway.java:259)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:209)
        at java.lang.Thread.run(Thread.java:745)
    
    opened by jq 13
  • Updated to tensorflow 1.6 and spark 2.3.

    The current version is not compatible with graphs generated by TF 1.6, and this is preventing us from releasing dl-pipelines with TF 1.6 support.

    • updated protobuf files and regenerated their java sources.
    • few minor changes related to Tensor taking a type parameter in tf1.6.
    opened by tomasatdatabricks 8
  • tensorframes is not working with variables.

    data = [Row(x=float(x)) for x in range(5)]
    df = sqlContext.createDataFrame(data)
    with tf.Graph().as_default() as g:
        # The placeholder that corresponds to column 'x'
        x = tf.placeholder(tf.double, shape=[None], name="x")
        # The output that adds 3 to x
        b = tf.Variable(float(3), name='a', dtype=tf.double)
        z = tf.add(x, b, name='z')
        # with or without `sess.run(tf.global_variables_initializer())`, the following will fail
        
        df2 = tfs.map_blocks(z, df)
    
    df2.show()
    
    opened by yupbank 7
  • Does not work with Python3

    I just started using this with Python 3; these are the commands I ran and the output messages.

    $SPARK_HOME/bin/pyspark --packages databricks:tensorframes:0.2.3-s_2.10

    Python 3.4.3 (default, Mar 26 2015, 22:03:40) [GCC 4.9.2] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    Ivy Default Cache set to: /root/.ivy2/cache
    The jars for the packages stored in: /root/.ivy2/jars
    :: loading settings :: url = jar:file:/opt/spark-1.5.2/assembly/target/scala-2.10/spark-assembly-1.5.2-hadoop2.2.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
    databricks#tensorframes added as a dependency
    :: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
        confs: [default]
        found databricks#tensorframes;0.2.3-s_2.10 in spark-packages
        found org.apache.commons#commons-lang3;3.4 in central
    :: resolution report :: resolve 98ms :: artifacts dl 4ms
        :: modules in use:
        databricks#tensorframes;0.2.3-s_2.10 from spark-packages in [default]
        org.apache.commons#commons-lang3;3.4 from central in [default]
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   2   |   0   |   0   |   0   ||   2   |   0   |
        ---------------------------------------------------------------------
    :: retrieving :: org.apache.spark#spark-submit-parent
        confs: [default]
        0 artifacts copied, 2 already retrieved (0kB/3ms)
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /__/ .__/\_,_/_/ /_/\_\   version 1.5.2
          /_/

    Using Python version 3.4.3 (default, Mar 26 2015 22:03:40)
    SparkContext available as sc, SQLContext available as sqlContext.

    import tensorflow as tf
    import tensorframes as tfs

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/tmp/spark-349c9955-ccd8-4fcd-938a-7e719fc45653/userFiles-bb935142-224f-4238-a144-f1cece7a5aa2/databricks_tensorframes-0.2.3-s_2.10.jar/tensorframes/__init__.py", line 36, in <module>
    ImportError: No module named 'core'

    opened by ushnish 6
  • Scala example does not work

    I'm having trouble running the provided Scala example in the spark shell.

    My local environment is:

    • Spark 2.1.0
    • Scala version 2.11.8
    • Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_121

    I ran the spark-shell with: spark-shell --packages databricks:tensorframes:0.2.5-rc2-s_2.11

    I get the following stacktrace which shuts down my spark process:

    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    #  SIGSEGV (0xb) at pc=0x00007fff90451b52, pid=64869, tid=0x0000000000001c03
    #
    # JRE version: Java(TM) SE Runtime Environment (8.0_121-b13) (build 1.8.0_121-b13)
    # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.121-b13 mixed mode bsd-amd64 compressed oops)
    # Problematic frame:
    # C  [libsystem_c.dylib+0x1b52]  strlen+0x12
    #
    # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
    #
    # An error report file with more information is saved as:
    # /Users/ndrizard/projects/temps/hs_err_pid64869.log
    #
    # If you would like to submit a bug report, please visit:
    #   http://bugreport.java.com/bugreport/crash.jsp
    # The crash happened outside the Java Virtual Machine in native code.
    # See problematic frame for where to report the bug.
    #
    

    Thanks for your help!

    opened by nicodri 5
  • Py4JError("Answer from Java side is empty") while testing

    I have been experimenting with TensorFrames for quite some days. I have spark-1.6.1 and openjdk7 installed on my ubuntu 14.04 64bit machine. I am using IPython notebook for testing.

    The import tensorframes as tfs command works perfectly fine, but when I do tfs.print_schema(df), where df is a DataFrame, the error below pops up recursively until the maximum depth is reached.

    ERROR:py4j.java_gateway:Error while sending or receiving.
    Traceback (most recent call last):
      File "/home/prakhar/utilities/spark-1.6.1/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 746, in send_command
        raise Py4JError("Answer from Java side is empty")
    Py4JError: Answer from Java side is empty

    opened by prakhar21 4
  • [ML-7986] Update tensorflow to 1.14.0

    • Update tensorflow version to 1.14.0 in environment.yml, project/Dependencies.scala, and python/requirements.txt
    • Auto update *.proto with the script. All of this type update comes from tensorflow.
    opened by lu-wang-dl 3
  • Support Spark 2.3.1, TF 1.10.0 and drop Spark 2.1/2.2 (and hence Scala 2.10, Java 7)

    • Drop support for Spark 2.1 and 2.2 and hence scala 2.10 and java 7
    • Update TF to 1.10 release
    • Remove nix files, which are not used
    • Update README

    We will support Spark 2.4 once RC is released.

    opened by mengxr 3
  • Usage of tf.contrib.distributions.percentile fails

    Consider the following dummy example using tf.contrib.distributions.percentile:

    from pyspark.context import SparkContext
    from pyspark.conf import SparkConf
    import tensorflow as tf
    import tensorframes as tfs
    from pyspark import SQLContext
    from pyspark.sql import Row
    from pyspark.sql.functions import *
    
    conf = SparkConf().setAppName("repro")
    sc = SparkContext(conf=conf)
    sqlContext = SQLContext(sc)
    
    data = [Row(x=[1.111, 0.516, 12.759]), Row(x=[2.222, 1.516, 13.759]), Row(x=[3.333, 2.516, 14.759]), Row(x=[4.444, 3.516, 15.759])]
    df = tfs.analyze(sqlContext.createDataFrame(data))
    
    with tf.Graph().as_default() as g:
    	x = tfs.block(df, "x")
    	q = tf.constant(90, 'float64', name='Percentile')
    	qntl = tf.contrib.distributions.percentile(x, q, axis=1)
    	result = tfs.map_blocks(x, df)
    	
    

    This fails with

    Traceback (most recent call last):
      File "/usr/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2752, in _as_graph_element_locked
        return op.outputs[out_n]
    IndexError: list index out of range
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "<stdin>", line 5, in <module>
      File "/tmp/spark-7f93e8f4-b6dc-4356-98fc-4c6de58c1e71/userFiles-f204b475-06b5-44e9-9a11-dbee001767d4/databricks_tensorframes-0.2.9-s_2.11.jar/tensorframes/core.py", line 312, in map_blocks
      File "/tmp/spark-7f93e8f4-b6dc-4356-98fc-4c6de58c1e71/userFiles-f204b475-06b5-44e9-9a11-dbee001767d4/databricks_tensorframes-0.2.9-s_2.11.jar/tensorframes/core.py", line 152, in _map
      File "/tmp/spark-7f93e8f4-b6dc-4356-98fc-4c6de58c1e71/userFiles-f204b475-06b5-44e9-9a11-dbee001767d4/databricks_tensorframes-0.2.9-s_2.11.jar/tensorframes/core.py", line 83, in _add_shapes
      File "/usr/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2880, in get_tensor_by_name
        return self.as_graph_element(name, allow_tensor=True, allow_operation=False)
      File "/usr/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2708, in as_graph_element
        return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
      File "/usr/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2757, in _as_graph_element_locked
        % (repr(name), repr(op_name), len(op.outputs)))
    KeyError: "The name 'percentile/assert_integer/statically_determined_was_integer:0' refers to a Tensor which does not exist. The operation, 'percentile/assert_integer/statically_determined_was_integer', exists but only has 0 outputs."
    
    opened by martinstuder 3
  • Readme Example throwing Py4J error

    I am using Spark 2.0.2, Python 2.7.12, iPython 5.1.0 on macOS 10.12.1.

    I am launching pyspark like this

    $SPARK_HOME/bin/pyspark --packages databricks:tensorframes:0.2.3-s_2.10

    From the demo, this block

    with tf.Graph().as_default() as g:
        x = tfs.block(df, "x")
        z = tf.add(x, 3, name='z')
        df2 = tfs.map_blocks(z, df)
    

    crashes with the following traceback:

    ---------------------------------------------------------------------------
    Py4JJavaError                             Traceback (most recent call last)
    <ipython-input-3-e7ae284146c3> in <module>()
          4     # The TensorFlow placeholder that corresponds to column 'x'.
          5     # The shape of the placeholder is automatically inferred from the DataFrame.
    ----> 6     x = tfs.block(df, "x")
          7     # The output that adds 3 to x
          8     z = tf.add(x, 3, name='z')
    
    /private/var/folders/tb/r74wwyk17b3fn_frdb0gd0780000gn/T/spark-b3c869d7-6d28-4bce-9a2e-d46f43fc83df/userFiles-64edc1b1-03db-40e6-9d7f-f062d0491a77/databricks_tensorframes-0.2.3-s_2.10.jar/tensorframes/core.py in block(df, col_name, tf_name)
        313     :return: a TensorFlow placeholder.
        314     """
    --> 315     return _auto_placeholder(df, col_name, tf_name, block = True)
        316
        317 def row(df, col_name, tf_name = None):
    
    /private/var/folders/tb/r74wwyk17b3fn_frdb0gd0780000gn/T/spark-b3c869d7-6d28-4bce-9a2e-d46f43fc83df/userFiles-64edc1b1-03db-40e6-9d7f-f062d0491a77/databricks_tensorframes-0.2.3-s_2.10.jar/tensorframes/core.py in _auto_placeholder(df, col_name, tf_name, block)
        331
        332 def _auto_placeholder(df, col_name, tf_name, block):
    --> 333     info = _java_api().extra_schema_info(df._jdf)
        334     col_shape = [x.shape() for x in info if x.fieldName() == col_name]
        335     if len(col_shape) == 0:
    
    /private/var/folders/tb/r74wwyk17b3fn_frdb0gd0780000gn/T/spark-b3c869d7-6d28-4bce-9a2e-d46f43fc83df/userFiles-64edc1b1-03db-40e6-9d7f-f062d0491a77/databricks_tensorframes-0.2.3-s_2.10.jar/tensorframes/core.py in _java_api()
         28     # You cannot simply call the creation of the the class on the _jvm due to classloader issues
         29     # with Py4J.
    ---> 30     return _jvm.Thread.currentThread().getContextClassLoader().loadClass(javaClassName) \
         31         .newInstance()
         32
    
    /Users/damien/spark-2.0.2-bin-hadoop2.7/python/lib/py4j-0.10.3-src.zip/py4j/java_gateway.py in __call__(self, *args)
       1131         answer = self.gateway_client.send_command(command)
       1132         return_value = get_return_value(
    -> 1133             answer, self.gateway_client, self.target_id, self.name)
       1134
       1135         for temp_arg in temp_args:
    
    /Users/damien/spark-2.0.2-bin-hadoop2.7/python/pyspark/sql/utils.py in deco(*a, **kw)
         61     def deco(*a, **kw):
         62         try:
    ---> 63             return f(*a, **kw)
         64         except py4j.protocol.Py4JJavaError as e:
         65             s = e.java_exception.toString()
    
    /Users/damien/spark-2.0.2-bin-hadoop2.7/python/lib/py4j-0.10.3-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
        317                 raise Py4JJavaError(
        318                     "An error occurred while calling {0}{1}{2}.\n".
    --> 319                     format(target_id, ".", name), value)
        320             else:
        321                 raise Py4JError(
    
    Py4JJavaError: An error occurred while calling o47.loadClass.
    : java.lang.NoClassDefFoundError: org/apache/spark/Logging
    	at java.lang.ClassLoader.defineClass1(Native Method)
    	at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
    	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    	at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
    	at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    	at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
    	at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    	at java.security.AccessController.doPrivileged(Native Method)
    	at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:498)
    	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
    	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    	at py4j.Gateway.invoke(Gateway.java:280)
    	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    	at py4j.commands.CallCommand.execute(CallCommand.java:79)
    	at py4j.GatewayConnection.run(GatewayConnection.java:214)
    	at java.lang.Thread.run(Thread.java:745)
    Caused by: java.lang.ClassNotFoundException: org.apache.spark.Logging
    	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    	... 22 more
    
    opened by damienstanton 3
  • Spark 2.0.0 + ScalaTest 3.0.0 + updates sbt plugins

    The subject says it all.

    WARNING: doit works (since it disabled tests in assembly), but I could not get sbt test working. It fails with the following error, which is more about TensorFlow, about which I know nothing:

    ➜  tensorframes git:(spark-200-and-other-upgrades) sbt
    [info] Loading global plugins from /Users/jacek/.sbt/0.13/plugins
    [info] Loading project definition from /Users/jacek/dev/oss/tensorframes/project
    [info] Set current project to tensorframes (in build file:/Users/jacek/dev/oss/tensorframes/)
    > testOnly org.tensorframes.dsl.BasicOpsSuite
    16/08/04 23:52:22 DEBUG Paths$: Request for x -> 0
    16/08/04 23:52:22 DEBUG Paths$: Request for y -> 0
    16/08/04 23:52:22 DEBUG Paths$: Request for z -> 0
    
    import tensorflow as tf
    
    x = tf.constant(1, name='x')
    y = tf.constant(2, name='y')
    z = tf.add(x, y, name='z')
    
    g = tf.get_default_graph().as_graph_def()
    for n in g.node:
        print ">>>>>", str(n.name), "<<<<<<"
        print n
    
    [info] BasicOpsSuite:
    [info] - Add *** FAILED ***
    [info]   1 did not equal 0 (1,===========
    [info]   
    [info]   import tensorflow as tf
    [info]   
    [info]   x = tf.constant(1, name='x')
    [info]   y = tf.constant(2, name='y')
    [info]   z = tf.add(x, y, name='z')
    [info]         
    [info]   g = tf.get_default_graph().as_graph_def()
    [info]   for n in g.node:
    [info]       print ">>>>>", str(n.name), "<<<<<<"
    [info]       print n
    [info]          
    [info]   ===========) (ExtractNodes.scala:40)
    [info] Run completed in 1 second, 772 milliseconds.
    [info] Total number of tests run: 1
    [info] Suites: completed 1, aborted 0
    [info] Tests: succeeded 0, failed 1, canceled 0, ignored 0, pending 0
    [info] *** 1 TEST FAILED ***
    [error] Failed tests:
    [error]         org.tensorframes.dsl.BasicOpsSuite
    [error] (test:testOnly) sbt.TestsFailedException: Tests unsuccessful
    [error] Total time: 2 s, completed Aug 4, 2016 11:52:22 PM
    

    I'm proposing the PR hoping the issue is a minor one that could easily be fixed with enough guidance.

    opened by jaceklaskowski 3
  • Bump tensorflow from 1.15.0 to 2.9.3 in /python

    Bumps tensorflow from 1.15.0 to 2.9.3.

    Release notes

    Sourced from tensorflow's releases.

    TensorFlow 2.9.3

    Release 2.9.3

    This release introduces several vulnerability fixes:

    TensorFlow 2.9.2

    Release 2.9.2

    This release introduces several vulnerability fixes:

    ... (truncated)

    Changelog

    Sourced from tensorflow's changelog.

    Release 2.9.3

    This release introduces several vulnerability fixes:

    Release 2.8.4

    This release introduces several vulnerability fixes:

    ... (truncated)

    Commits
    • a5ed5f3 Merge pull request #58584 from tensorflow/vinila21-patch-2
    • 258f9a1 Update py_func.cc
    • cd27cfb Merge pull request #58580 from tensorflow-jenkins/version-numbers-2.9.3-24474
    • 3e75385 Update version numbers to 2.9.3
    • bc72c39 Merge pull request #58482 from tensorflow-jenkins/relnotes-2.9.3-25695
    • 3506c90 Update RELEASE.md
    • 8dcb48e Update RELEASE.md
    • 4f34ec8 Merge pull request #58576 from pak-laura/c2.99f03a9d3bafe902c1e6beb105b2f2417...
    • 6fc67e4 Replace CHECK with returning an InternalError on failing to create python tuple
    • 5dbe90a Merge pull request #58570 from tensorflow/r2.9-7b174a0f2e4
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • Support with deep learning model plugging

    Can you guys help to plug this deep learning model, https://github.com/hongzimao/decima-sim, into tensorframes? Is it possible to do? Any help will be highly appreciated.

    opened by jahidhasanlinix 0
  • Need help with enabling GPUs while predicting through fine-tuned BERT Tensorflow Model on Azure Databricks

    Hi, I am referring to this code (https://github.com/google-research/bert/blob/master/predicting_movie_reviews_with_bert_on_tf_hub.ipynb for classification) and running it on Azure Databricks Runtime 7.2 ML (includes Apache Spark 3.0.0, GPU, Scala 2.12). I was able to train a model, but predictions are still taking a very long time even though I am using a 4-GPU cluster. I suspect that my cluster is not fully utilized and is in fact still being used as CPU only. Is there anything I need to change to ensure that the GPU cluster is utilized and able to function in a distributed manner?

    I also referred to the Databricks documentation (https://docs.microsoft.com/en-us/azure/databricks/applications/machine-learning/train-model/tensorflow) and installed the GPU-enabled TensorFlow mentioned there:

    %pip install https://databricks-prod-cloudfront.cloud.databricks.com/artifacts/tensorflow/runtime-7.x/tensorflow-1.15.3-cp37-cp37m-linux_x86_64.whl

    But even after that, print([tf.version, tf.test.is_gpu_available()]) still shows FALSE and there is no improvement in my cluster utilization. Can anyone help with how I can enable full cluster utilization (on the worker nodes) for my predictions through the fine-tuned BERT model?

    I would really appreciate the help.

    opened by samvygupta 0
  • Having java.lang.NoSuchMethodError: org.tensorflow.framework.GraphDef.toByteString()

    Hi, I want to use DeepImageFeaturizer combined with Spark ML logistic regression on Spark 2.4.5 / Scala 2.11.12, but it's not working. I have been trying to resolve it for many days.

    I have this issue : java.lang.NoSuchMethodError: org.tensorflow.framework.GraphDef.toByteString()Lorg/tensorframes/protobuf3shade/ByteString;

    It seems a library is missing, but I think I've already referenced all the needed ones:

    delta-core_2.11-0.6.0.jar
    libtensorflow-1.15.0.jar
    libtensorflow_jni-1.15.0.jar
    libtensorflow_jni_gpu-1.15.0.jar
    proto-1.15.0.jar
    scala-logging-api_2.11-2.1.2.jar
    scala-logging-slf4j_2.11-2.1.2.jar
    scala-logging_2.11-3.9.2.jar
    spark-deep-learning-1.5.0-spark2.4-s_2.11.jar
    spark-sql-kafka-0-10_2.11-2.4.5.jar
    spark-tensorflow-connector_2.11-1.6.0.jar
    tensorflow-1.15.0.jar
    tensorflow-hadoop-1.15.0.jar
    tensorframes-0.8.2-s_2.11.jar
    

    Full trace:

    20/05/15 21:17:28 DEBUG impl.TensorFlowOps$: Outputs: Set(InceptionV3_sparkdl_output__)
    Exception in thread "main" java.lang.reflect.InvocationTargetException
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:498)
    	at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:65)
    	at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
    Caused by: java.lang.NoSuchMethodError: org.tensorflow.framework.GraphDef.toByteString()Lorg/tensorframes/protobuf3shade/ByteString;
    	at org.tensorframes.impl.TensorFlowOps$.graphSerial(TensorFlowOps.scala:69)
    	at org.tensorframes.impl.TensorFlowOps$.analyzeGraphTF(TensorFlowOps.scala:114)
    	at org.tensorframes.impl.DebugRowOps.mapRows(DebugRowOps.scala:408)
    	at com.databricks.sparkdl.DeepImageFeaturizer.transform(DeepImageFeaturizer.scala:135)
    	at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:161)
    	at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:149)
    	at scala.collection.Iterator$class.foreach(Iterator.scala:891)
    	at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
    	at scala.collection.IterableViewLike$Transformed$class.foreach(IterableViewLike.scala:44)
    	at scala.collection.SeqViewLike$AbstractTransformed.foreach(SeqViewLike.scala:37)
    	at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:149)
    

    Can someone from the team tell me what is going wrong? Thanks for your support.

    opened by eleite77 0
  • Could not initialize class org.tensorframes.impl.SupportedOperations

    Py4JJavaError: An error occurred while calling o162.analyze. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 6, 10.244.31.75, executor 1): java.lang.NoClassDefFoundError: Could not initialize class org.tensorframes.impl.SupportedOperations$ at org.tensorframes.ExtraOperations$.analyzeData(ExperimentalOperations.scala:148) at org.tensorframes.ExtraOperations$$anonfun$15.apply(ExperimentalOperations.scala:146) at org.tensorframes.ExtraOperations$$anonfun$15.apply(ExperimentalOperations.scala:146) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at org.tensorframes.ExtraOperations$.analyzeData(ExperimentalOperations.scala:146) at org.tensorframes.ExtraOperations$$anonfun$3$$anonfun$4$$anonfun$5.apply(ExperimentalOperations.scala:97) at org.tensorframes.ExtraOperations$$anonfun$3$$anonfun$4$$anonfun$5.apply(ExperimentalOperations.scala:97) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.immutable.Range.foreach(Range.scala:160) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at org.tensorframes.ExtraOperations$$anonfun$3$$anonfun$4.apply(ExperimentalOperations.scala:97) at org.tensorframes.ExtraOperations$$anonfun$3$$anonfun$4.apply(ExperimentalOperations.scala:95) at scala.collection.Iterator$$anon$11.next(Iterator.scala:410) at scala.collection.Iterator$class.foreach(Iterator.scala:891) at scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at scala.collection.TraversableOnce$class.reduceLeft(TraversableOnce.scala:185) at scala.collection.AbstractIterator.reduceLeft(Iterator.scala:1334) at scala.collection.TraversableOnce$class.reduceLeftOption(TraversableOnce.scala:203) at scala.collection.AbstractIterator.reduceLeftOption(Iterator.scala:1334) at scala.collection.TraversableOnce$class.reduceOption(TraversableOnce.scala:210) at scala.collection.AbstractIterator.reduceOption(Iterator.scala:1334) at org.tensorframes.ExtraOperations$$anonfun$3.apply(ExperimentalOperations.scala:100) at org.tensorframes.ExtraOperations$$anonfun$3.apply(ExperimentalOperations.scala:93) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at org.apache.spark.scheduler.Task.run(Task.scala:121) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)

    Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1887) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1875) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1874) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1874) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926) at scala.Option.foreach(Option.scala:257) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2108) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2057) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2046) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:737) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126) at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:945) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.rdd.RDD.withScope(RDD.scala:363) at org.apache.spark.rdd.RDD.collect(RDD.scala:944) at org.tensorframes.ExtraOperations$.deepAnalyzeDataFrame(ExperimentalOperations.scala:113) at org.tensorframes.ExperimentalOperations$class.analyze(ExperimentalOperations.scala:41) at org.tensorframes.impl.DebugRowOps.analyze(DebugRowOps.scala:281) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at py4j.Gateway.invoke(Gateway.java:282) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.GatewayConnection.run(GatewayConnection.java:238) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.tensorframes.impl.SupportedOperations$ at org.tensorframes.ExtraOperations$.analyzeData(ExperimentalOperations.scala:148) at org.tensorframes.ExtraOperations$$anonfun$15.apply(ExperimentalOperations.scala:146) at org.tensorframes.ExtraOperations$$anonfun$15.apply(ExperimentalOperations.scala:146) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at 
scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at org.tensorframes.ExtraOperations$.analyzeData(ExperimentalOperations.scala:146) at org.tensorframes.ExtraOperations$$anonfun$3$$anonfun$4$$anonfun$5.apply(ExperimentalOperations.scala:97) at org.tensorframes.ExtraOperations$$anonfun$3$$anonfun$4$$anonfun$5.apply(ExperimentalOperations.scala:97) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.immutable.Range.foreach(Range.scala:160) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at org.tensorframes.ExtraOperations$$anonfun$3$$anonfun$4.apply(ExperimentalOperations.scala:97) at org.tensorframes.ExtraOperations$$anonfun$3$$anonfun$4.apply(ExperimentalOperations.scala:95) at scala.collection.Iterator$$anon$11.next(Iterator.scala:410) at scala.collection.Iterator$class.foreach(Iterator.scala:891) at scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at scala.collection.TraversableOnce$class.reduceLeft(TraversableOnce.scala:185) at scala.collection.AbstractIterator.reduceLeft(Iterator.scala:1334) at scala.collection.TraversableOnce$class.reduceLeftOption(TraversableOnce.scala:203) at scala.collection.AbstractIterator.reduceLeftOption(Iterator.scala:1334) at scala.collection.TraversableOnce$class.reduceOption(TraversableOnce.scala:210) at scala.collection.AbstractIterator.reduceOption(Iterator.scala:1334) at org.tensorframes.ExtraOperations$$anonfun$3.apply(ExperimentalOperations.scala:100) at org.tensorframes.ExtraOperations$$anonfun$3.apply(ExperimentalOperations.scala:93) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at org.apache.spark.scheduler.Task.run(Task.scala:121) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ... 1 more

    opened by lee2015new 2
Releases (v0.6.0)
  • v0.6.0(Nov 16, 2018)

  • v0.5.0(Aug 21, 2018)

  • v0.4.0(Jun 18, 2018)

  • v0.2.9(Sep 13, 2017)

    This is the final release for 0.2.9.

    Notable changes since 0.2.8:

    • Upgrades tensorflow dependency from version 1.1.0 to 1.3.0
    • map_blocks, map_row APIs now accept Pandas DataFrames as input
    • Adds support for tensorflow variables. Note that these variables cannot be shared between the worker nodes.
  • v0.2.8(Apr 25, 2017)

    This is the final release for 0.2.8.

    Notable changes since 0.2.5:

    • uses the official java API for tensorflow
    • support for image ingest (see inception example)
    • support for multiple hardware platforms (CPU, GPU) and operating systems (linux, macos). Windows should also work but it has not been tested.
    • support for Spark 2.1.x and Spark 2.2.x
    • some usability and performance fixes, which should give a better experience for users
    • more flexible input names for mapRows.
  • v0.2.8-rc0(Apr 24, 2017)

    This is the first release candidate for 0.2.8.

    Notable changes:

    • uses the official java API for tensorflow
    • support for image ingest (see inception example)
    • support for Spark 2.1.x
    • the same release should support both CPU and GPU clusters
    • some usability and performance fixes, which should give a better experience for users