😇 A pyTorch implementation of the DeepMoji model: a state-of-the-art deep learning model for analyzing sentiment, emotion, sarcasm, etc.

Hugging Face

Last update: Dec 24, 2022

Overview

------ Update September 2018 ------

It's been a year since TorchMoji and DeepMoji were released. We're trying to understand how it's being used such that we can make improvements and design better models in the future.

You can help us achieve this by answering this 4-question Google Form. Thanks for your support!

😇 TorchMoji

Read our blog post about the implementation process here.

TorchMoji is a pyTorch implementation of the DeepMoji model developed by Bjarke Felbo, Alan Mislove, Anders Søgaard, Iyad Rahwan and Sune Lehmann.

The model was trained on 1.2 billion tweets with emojis to understand how language is used to express emotions. Through transfer learning, the model can obtain state-of-the-art performance on many emotion-related text modeling tasks.

Try the online demo of DeepMoji at http://deepmoji.mit.edu. See the paper, blog post or FAQ for more details.

Overview

torchmoji/ contains all the underlying code needed to convert a dataset to the vocabulary and use the model.
examples/ contains short code snippets showing how to convert a dataset to the vocabulary, load up the model and run it on that dataset.
scripts/ contains code for processing and analysing datasets to reproduce results in the paper.
model/ contains the pretrained model and vocabulary.
data/ contains raw and processed datasets that we include in this repository for testing.
tests/ contains unit tests for the codebase.

To start out with, have a look inside the examples/ directory. See score_texts_emojis.py for how to use DeepMoji to extract emoji predictions, encode_texts.py for how to convert text into 2304-dimensional emotional feature vectors or finetune_youtube_last.py for how to use the model for transfer learning on a new dataset.
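
The 2304 figure follows from the layer sizes in the model summary that torchMoji prints when it loads (a 256-dimensional embedding and two bidirectional LSTMs with 512 hidden units each), assuming that the attention layer receives the embedding and both LSTM outputs concatenated, as the PackedSequence concatenation quoted in the issues further down this page suggests. A minimal arithmetic sketch; the function name is illustrative and not part of torchMoji:

```python
def attention_input_size(embed_dim, hidden_size, num_bilstm_layers):
    """Width of the vector obtained by concatenating the embedding
    with the output of each bidirectional LSTM layer."""
    # A bidirectional LSTM emits forward and backward states: 2 * hidden_size.
    bilstm_out = 2 * hidden_size
    return embed_dim + num_bilstm_layers * bilstm_out


# Sizes taken from the printed model summary: Embedding(50000, 256) and
# two LSTMHardSigmoid layers with 512 hidden units each.
feature_dim = attention_input_size(embed_dim=256, hidden_size=512, num_bilstm_layers=2)
print(feature_dim)  # 2304
```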

Please consider citing the paper of DeepMoji if you use the model or code (see below for citation).

Installation

We assume that you're using Python 2.7-3.5 with pip installed.

First you need to install pyTorch (version 0.2+), currently by:

conda install pytorch -c pytorch

At the present stage the model can't make efficient use of CUDA. See details in the Hugging Face blog post.

When pyTorch is installed, run the following in the root directory to install the remaining dependencies:

pip install -e .

This will install the following dependencies:

Then, run the download script to download the pretrained torchMoji weights (~85MB) from here and put them in the model/ directory:

python scripts/download_weights.py
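
If this step is skipped, the example scripts fail with a "No such file or directory" error for model/pytorch_model.bin. The following sketch checks for the expected files before running an example. The helper name is hypothetical; the file names are the ones that appear in torchMoji's own log output.

```python
from pathlib import Path


def missing_model_files(repo_root):
    """Return the expected model files that are absent under repo_root/model."""
    model_dir = Path(repo_root) / "model"
    # File names as they appear in torchMoji's log output.
    expected = ["pytorch_model.bin", "vocabulary.json"]
    return [name for name in expected if not (model_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files(".")
    if missing:
        print("Missing in model/: " + ", ".join(missing))
        print("If pytorch_model.bin is listed, run: python scripts/download_weights.py")
    else:
        print("Pretrained weights and vocabulary found.")
```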

Testing

To run the tests, install nose. After installing, navigate to the tests/ directory and run:

cd tests
nosetests -v

By default, this will also run finetuning tests. These tests train the model for one epoch and then check the resulting accuracy, which may take several minutes to finish. If you'd prefer to exclude those, run the following instead:

cd tests
nosetests -v -a '!slow'

Disclaimer

This code has been tested to work with Python 2.7 and 3.5 on Ubuntu 16.04 and macOS Sierra machines. It has not been optimized for efficiency, but should be fast enough for most purposes. We do not give any guarantees that there are no bugs; use the code at your own risk!

Contributions

We welcome pull requests if you feel like something could be improved. You can also greatly help us by telling us how you felt when writing your most recent tweets. Just click here to contribute.

License

This code and the pretrained model are licensed under the MIT license.

Benchmark datasets

The benchmark datasets are uploaded to this repository for convenience only. They were not released by us and we do not claim any rights to them. Use the datasets at your own responsibility and make sure you comply with the licenses under which they were released. If you use any of the benchmark datasets, please consider citing the original authors.

Citation

@inproceedings{felbo2017,
  title={Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm},
  author={Felbo, Bjarke and Mislove, Alan and S{\o}gaard, Anders and Rahwan, Iyad and Lehmann, Sune},
  booktitle={Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year={2017}
}

Comments

Finetuning does not work for more than 2 classes.

I am getting RuntimeError: Expected object of type Variable[torch.LongTensor] but found type Variable[torch.FloatTensor] for argument #1 'target'. Maybe we need to replace CrossEntropyLoss with something else.
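
The message says the loss received floating-point targets where it expected integer (Long) class indices, which is the target format PyTorch's CrossEntropyLoss takes for multi-class labels. If the labels are stored as one-hot float vectors (an assumption; the thread does not show the data), converting each row to its class index before building the target tensor is one possible fix. A plain-Python sketch with a hypothetical helper name:

```python
def one_hot_to_class_indices(labels):
    """Convert rows like [0.0, 1.0, 0.0] to integer class ids like 1."""
    indices = []
    for row in labels:
        # The position of the largest entry is the class id.
        indices.append(max(range(len(row)), key=lambda i: row[i]))
    return indices


labels = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
print(one_hot_to_class_indices(labels))  # [1, 0, 2]
```

The resulting integers can then be wrapped in a Long tensor, for example with torch.LongTensor(indices), before being passed to the loss.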

opened by pushpankar 5

AssertionError while running text_emojize.py

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-11-6c7dc2606552> in <module>()
      3 
      4 for i in flatten_list:
----> 5   deepmojify(i, top_n = 5)

1 frames
/content/torchMoji/torchmoji/sentence_tokenizer.py in tokenize_sentences(self, sentences, reset_stats, max_sentences)
    117         # may filter the sentences etc.
    118         if not self.uses_custom_wordgen and not self.ignore_sentences_with_only_custom:
--> 119             assert len(sentences) == next_insert
    120         else:
    121             # adjust based on actual tokens received

AssertionError:

Hi,

The above error keeps coming when I run text_emojize.py file.

I have given a list of around 4700+ sentences for the model to convert it into 5 emojis.

I made changes to this block of code >> st = SentenceTokenizer(vocabulary, 100)

What am I doing wrong? Is it because I gave too many sentences?
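
One plausible cause, not confirmed in this thread, is that some inputs are empty or are not strings, so fewer sentences come out of the tokenizer than were passed in and the length assertion fails. Under that assumption, the sketch below drops such inputs while remembering their original positions, and splits the rest into smaller batches. Both helper names are hypothetical and not part of torchMoji.

```python
def clean_sentences(sentences):
    """Keep non-empty strings, remembering each one's original position."""
    kept = []
    for position, text in enumerate(sentences):
        if isinstance(text, str) and text.strip():
            kept.append((position, text.strip()))
    return kept


def batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


raw = ["I love this", "", None, "   ", "so tired today"]
kept = clean_sentences(raw)
print(kept)  # [(0, 'I love this'), (4, 'so tired today')]
print([len(b) for b in batches(list(range(10)), 4)])  # [4, 4, 2]
```

Each batch of cleaned texts can then be passed to the tokenizer separately, and the stored positions used to put results back in the original order.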

opened by vidyap-xgboost 4

How to replace Emoji ID's with actual emojis or their unicodes? Help

Hi. When we run score_texts_emojis.py, we get an output like the below picture saved in a .CSV file.

I want to convert the emoji IDs (Emoji_1, Emoji_2, Emoji_3, Emoji_4, Emoji_5) into either actual emojis or their Unicode code points.

Can someone help me with that?

TIA.
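
The IDs are positions in the model's 64-way emoji output, so converting them needs a table with one entry per position, in that order. The text_emojize.py example described in a later entry on this page takes its mapping from emoji_overview.png. The sketch below shows only the lookup step: `ID_TO_ALIAS` is a placeholder with made-up entries, not the real torchMoji ordering, and the sample CSV layout apart from the Emoji_1 to Emoji_5 column names is an assumption.

```python
import csv
import io

# Placeholder table: list position = emoji ID. These three entries are
# illustrative only; the real table must list all 64 emojis in model order.
ID_TO_ALIAS = [":joy:", ":heart:", ":sob:"]

EMOJI_COLUMNS = ["Emoji_1", "Emoji_2", "Emoji_3", "Emoji_4", "Emoji_5"]


def replace_ids(csv_text, table):
    """Return CSV text with every Emoji_N column mapped from ID to alias."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for column in EMOJI_COLUMNS:
            if column in row:
                row[column] = table[int(row[column])]
        writer.writerow(row)
    return out.getvalue()


sample = "Text,Emoji_1,Emoji_2\nI love cooking,1,0\n"
print(replace_ids(sample, ID_TO_ALIAS))
```

The aliases can then be rendered as actual emojis with the emoji package, which is how the text_emojize.py example on this page produces its output.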

opened by vidyap-xgboost 2

ValueError: too many values to unpack (expected 2)

I'm trying to get 2304 fixed length vector embeddings for a set of tweets for my dataset.

I'm using Google Colaboratory(Python 3) and I get this error with this command

!python3 examples/text_emojize.py --text f"This is the shit!"

as well as using this

encoding = model(tokenized)

opened by agoel00 2

Attention weights not initialized

The attention weights are not initialized properly; right now, they take on random values up to 10^38, which makes training a model from scratch virtually impossible.

In contrast, the original Keras implementation uses a uniform random initialization scheme with values from -0.05 to 0.05. Obviously this doesn't affect pretrained models, but it would still be nice to have this fixed.
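
As a plain-Python illustration of the initialization range described above (not the actual fix, which would be applied to the attention parameter inside the model), the sketch below draws values uniformly from -0.05 to 0.05. The helper name is hypothetical; the size 2304 is taken from the Attention(2304, ...) line in the printed model summary.

```python
import random


def uniform_init(size, low=-0.05, high=0.05, seed=None):
    """Return `size` floats drawn uniformly between low and high."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(size)]


weights = uniform_init(2304, seed=0)
print(len(weights))  # 2304
print(min(weights) >= -0.05 and max(weights) <= 0.05)  # True
```

In PyTorch, the in-place initializer nn.init.uniform_ plays the same role and takes the bounds as a and b, as the deprecation warnings quoted elsewhere on this page show for the embedding weights.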

opened by k15z 2

Add an example: Output emoji visualization from a single text input

A fun experiment of mine. It is basically the same as score_texts_emojis.py, but there are 2 differences:

I mapped the emoji IDs from emoji_overview.png and render the output with the emoji package, using use_aliases=True.

The script takes a single text input from the user rather than using placeholder example text.

Demo:

python examples/text_emojize.py --text "I love mom's cooking\!" # => I love mom's cooking! 😋 😍 💓 💛 ❤
opened by hiepph 2

python3.6 no cuda

I tried running examples/encode_texts.py; it yielded the error at the end of this issue.

Story: I installed torch from the original torch downloads, and it appears to expect CUDA, even though I assumed it would just run on the CPU (especially given the notification about GPU in the README).

Strangely, looking here: http://pytorch.org/

Whenever I select "pip", "3.6" and "none" (gpu), it still contains cu75 in the url which might indicate it needs CUDA?

Trace

pascal@archbook:~/gits/torchMoji$ python examples/encode_texts.py
Tokenizing using dictionary from /home/pascal/gits/torchMoji/model/vocabulary.json
Loading model from /home/pascal/gits/torchMoji/model/pytorch_model.bin.
Loading weights for embed.weight
Loading weights for lstm_0.weight_ih_l0
Loading weights for lstm_0.weight_hh_l0
Loading weights for lstm_0.bias_ih_l0
Loading weights for lstm_0.bias_hh_l0
Loading weights for lstm_0.weight_ih_l0_reverse
Loading weights for lstm_0.weight_hh_l0_reverse
Loading weights for lstm_0.bias_ih_l0_reverse
Loading weights for lstm_0.bias_hh_l0_reverse
Loading weights for lstm_1.weight_ih_l0
Loading weights for lstm_1.weight_hh_l0
Loading weights for lstm_1.bias_ih_l0
Loading weights for lstm_1.bias_hh_l0
Loading weights for lstm_1.weight_ih_l0_reverse
Loading weights for lstm_1.weight_hh_l0_reverse
Loading weights for lstm_1.bias_ih_l0_reverse
Loading weights for lstm_1.bias_hh_l0_reverse
Loading weights for attention_layer.attention_vector
Ignoring weights for output_layer.0.weight
Ignoring weights for output_layer.0.bias
TorchMoji (
  (embed): Embedding(50000, 256)
  (embed_dropout): Dropout2d (p=0)
  (lstm_0): LSTMHardSigmoid(256, 512, batch_first=True, bidirectional=True)
  (lstm_1): LSTMHardSigmoid(1024, 512, batch_first=True, bidirectional=True)
  (attention_layer): Attention(2304, return attention=False)
)
Encoding texts..
Traceback (most recent call last):
  File "examples/encode_texts.py", line 34, in <module>
    encoding = model(tokenized)
  File "/home/pascal/.pyenv/versions/3.6.2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/pascal/gits/torchMoji/torchmoji/model_def.py", line 233, in forward
    x, att_weights = self.attention_layer(input_seqs, input_lengths)
  File "/home/pascal/.pyenv/versions/3.6.2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/pascal/gits/torchMoji/torchmoji/attlayer.py", line 56, in forward
    mask = Variable((idxes < input_lengths.unsqueeze(1)).float())
  File "/home/pascal/.pyenv/versions/3.6.2/lib/python3.6/site-packages/torch/tensor.py", line 351, in __lt__
    return self.lt(other)
TypeError: lt received an invalid combination of arguments - got (torch.LongTensor), but expected one of:

(int value) didn't match because some of the arguments have invalid types: (torch.LongTensor)

(torch.cuda.LongTensor other) didn't match because some of the arguments have invalid types: (torch.LongTensor)
opened by kootenpv 2

Bump numpy from 1.13.1 to 1.21.0

Bumps numpy from 1.13.1 to 1.21.0.

Release notes

Sourced from numpy's releases.

v1.21.0

NumPy 1.21.0 Release Notes

The NumPy 1.21.0 release highlights are

continued SIMD work covering more functions and platforms,

initial work on the new dtype infrastructure and casting,

universal2 wheels for Python 3.8 and Python 3.9 on Mac,

improved documentation,

improved annotations,

new PCG64DXSM bitgenerator for random numbers.

In addition there are the usual large number of bug fixes and other improvements.

The Python versions supported for this release are 3.7-3.9. Official support for Python 3.10 will be added when it is released.

:warning: Warning: there are unresolved problems compiling NumPy 1.21.0 with gcc-11.1 .

Optimization level -O3 results in many wrong warnings when running the tests.

On some hardware NumPy will hang in an infinite loop.

New functions

Add PCG64DXSM BitGenerator

Uses of the PCG64 BitGenerator in a massively-parallel context have been shown to have statistical weaknesses that were not apparent at the first release in numpy 1.17. Most users will never observe this weakness and are safe to continue to use PCG64. We have introduced a new PCG64DXSM BitGenerator that will eventually become the new default BitGenerator implementation used by default_rng in future releases. PCG64DXSM solves the statistical weakness while preserving the performance and the features of PCG64.

See upgrading-pcg64 for more details.

(gh-18906)

Expired deprecations

The shape argument numpy.unravel_index cannot be passed as dims keyword argument anymore. (Was deprecated in NumPy 1.16.)

... (truncated)

Commits

b235f9e Merge pull request #19283 from charris/prepare-1.21.0-release

34aebc2 MAINT: Update 1.21.0-notes.rst

493b64b MAINT: Update 1.21.0-changelog.rst

07d7e72 MAINT: Remove accidentally created directory.

032fca5 Merge pull request #19280 from charris/backport-19277

7d25b81 BUG: Fix refcount leak in ResultType

fa5754e BUG: Add missing DECREF in new path

61127bb Merge pull request #19268 from charris/backport-19264

143d45f Merge pull request #19269 from charris/backport-19228

d80e473 BUG: Removed typing for == and != in dtypes

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot use these labels will set the current labels as the default for future PRs for this repo and language

@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language

@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language

@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.

dependencies
opened by dependabot[bot] 1

pytorch_model.bin module missing

I am trying to replicate this paper as per my project proposal. Could you please assist with this?

While running

python examples/text_emojize.py --text "I love mom's cooking!"

I got C:\...\torchMoji/model/pytorch_model.bin' No such file or directory: issue.

opened by akchaudhary57 1
Compatibility with PyTorch 0.4.1

On newest PyTorch PackedSequence takes in *args so the current constructor calls don't work anymore (although it shouldn't really be constructed from user land). This patch fixes that.

Also see the relevant https://github.com/pytorch/pytorch/pull/9864, which adds support for this on master. However, that patch isn't making it into the new release we (the PyTorch team) are preparing, so this change will be needed for the upcoming PyTorch release.
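
To illustrate why keyword construction stops working once a constructor takes only *args, here is a self-contained sketch using a stand-in class. `PackedLike` is hypothetical and is not PyTorch's PackedSequence; it only mimics a namedtuple subclass whose `__new__` accepts positional arguments.

```python
from collections import namedtuple

_Packed = namedtuple("_Packed", ["data", "batch_sizes"])


class PackedLike(_Packed):
    """Stand-in (not PyTorch code) for a namedtuple subclass whose
    constructor only accepts positional arguments."""

    def __new__(cls, *args):
        return super().__new__(cls, *args)


# Positional construction works.
packed = PackedLike([1.0, 2.0, 3.0], [2, 1])
print(packed.batch_sizes)  # [2, 1]

# Keyword construction fails because __new__ declares only *args.
try:
    PackedLike(data=[1.0, 2.0, 3.0], batch_sizes=[2, 1])
except TypeError as error:
    print("TypeError:", error)
```

Passing the fields positionally, as in PackedSequence(x, packed_input.batch_sizes), avoids the error; this is the change suggested in the "Cannot run score_texts_emojis.py" issue on this page.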

opened by ssnl 1

Cannot run score_texts_emojis.py

I get a few errors when I run score_texts_emojis.py. I'm using pytorch-cpu=0.4.0=py35_cpu_1 and master of torchMoji (b3dd91958ba2a752e0fdee2fd1839ae2bbea549a).

It seems the errors are caused by changes to PyTorch.

e.g.

$ python score_texts_emojis.py

Tokenizing using dictionary from /home/dev/Downloads/torchMoji/model/vocabulary.json
Loading model from /home/dev/Downloads/torchMoji/model/pytorch_model.bin.
/home/dev/Downloads/torchMoji/torchmoji/model_def.py:159: UserWarning: nn.init.uniform is now deprecated in favor of nn.init.uniform_.
  nn.init.uniform(self.embed.weight.data, a=-0.5, b=0.5)
/home/dev/Downloads/torchMoji/torchmoji/model_def.py:161: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
  nn.init.xavier_uniform(t)
/home/dev/Downloads/torchMoji/torchmoji/model_def.py:163: UserWarning: nn.init.orthogonal is now deprecated in favor of nn.init.orthogonal_.
  nn.init.orthogonal(t)
/home/dev/Downloads/torchMoji/torchmoji/model_def.py:165: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
  nn.init.constant(t, 0)
/home/dev/Downloads/torchMoji/torchmoji/model_def.py:167: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
  nn.init.xavier_uniform(self.output_layer[0].weight.data)
TorchMoji(
  (embed): Embedding(50000, 256)
  (embed_dropout): Dropout2d(p=0)
  (lstm_0): LSTMHardSigmoid(256, 512, batch_first=True, bidirectional=True)
  (lstm_1): LSTMHardSigmoid(1024, 512, batch_first=True, bidirectional=True)
  (attention_layer): Attention(2304, return attention=False)
  (final_dropout): Dropout(p=0)
  (output_layer): Sequential(
    (0): Linear(in_features=2304, out_features=64, bias=True)
    (1): Softmax()
  )
)
Running predictions.
Traceback (most recent call last):
  File "score_texts_emojis.py", line 48, in <module>
    prob = model(tokenized)
  File "/home/dev/miniconda3/envs/hub/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/dev/Downloads/torchMoji/torchmoji/model_def.py", line 218, in forward
    packed_input = PackedSequence(data=x, batch_sizes=packed_input.batch_sizes)
TypeError: __new__() got an unexpected keyword argument 'data'

I have got the example running after the following modifications:

Need to remove argument names from calls to PackedSequence.

https://github.com/huggingface/torchMoji/blob/b3dd91958ba2a752e0fdee2fd1839ae2bbea549a/torchmoji/model_def.py#L218

Change to packed_input = PackedSequence(x, packed_input.batch_sizes)

https://github.com/huggingface/torchMoji/blob/b3dd91958ba2a752e0fdee2fd1839ae2bbea549a/torchmoji/model_def.py#L226-L229

Change to:

        packed_input = PackedSequence(torch.cat((lstm_1_output.data,
                                                 lstm_0_output.data,
                                                 packed_input.data), dim=1),
                                      packed_input.batch_sizes)

Cannot reverse batch_sizes.

https://github.com/huggingface/torchMoji/blob/b3dd91958ba2a752e0fdee2fd1839ae2bbea549a/torchmoji/lstm.py#L266

Change to for batch_size in reversed(list(batch_sizes)):

opened by DavidRayner 1

Bump numpy from 1.13.1 to 1.22.0
Bumps numpy from 1.13.1 to 1.22.0.

Release notes

Sourced from numpy's releases.

v1.22.0

NumPy 1.22.0 Release Notes

NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements, highlights are:

Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.

A preliminary version of the proposed Array-API is provided. This is a step in creating a standard collection of functions that can be used across application such as CuPy and JAX.

NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.

New methods for quantile, percentile, and related functions. The new methods provide a complete set of the methods commonly found in the literature.

A new configurable allocator for use by downstream projects.

These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.

The Python versions supported in this release are 3.8-3.10, Python 3.7 has been dropped. Note that 32 bit wheels are only provided for Python 3.8 and 3.9 on Windows, all other wheels are 64 bits on account of Ubuntu, Fedora, and other Linux distributions dropping 32 bit support. All 64 bit wheels are also linked with 64 bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.

Expired deprecations

Deprecated numeric style dtype strings have been removed

Using the strings "Bytes0", "Datetime64", "Str0", "Uint32", and "Uint64" as a dtype will now raise a TypeError.

(gh-19539)

Expired deprecations for loads, ndfromtxt, and mafromtxt in npyio

numpy.loads was deprecated in v1.15, with the recommendation that users use pickle.loads instead. ndfromtxt and mafromtxt were both deprecated in v1.17 - users should use numpy.genfromtxt instead with the appropriate value for the usemask parameter.

(gh-19615)

... (truncated)

Commits

4adc87d Merge pull request #20685 from charris/prepare-for-1.22.0-release

fd66547 REL: Prepare for the NumPy 1.22.0 release.

125304b wip

c283859 Merge pull request #20682 from charris/backport-20416

5399c03 Merge pull request #20681 from charris/backport-20954

f9c45f8 Merge pull request #20680 from charris/backport-20663

794b36f Update armccompiler.py

d93b14e Update test_public_api.py

7662c07 Update init.py

311ab52 Update armccompiler.py

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 0

No module named 'torchmoji' on PySpark

Hello,

I'm trying to make a deployable version of torchmoji.. I'm still very new to Pyspark and I'm doing this project on Databricks.

My code:

import pyspark.sql.functions as F
from pyspark.sql.types import *

def deepmojify(sentence,top_n=1):
  tokenized, _, _ = st.tokenize_sentences([sentence])
  prob = model(tokenized)[0]
  emoji_ids = top_elements(prob, top_n)
  emojis = map(lambda x: EMOJIS[x], emoji_ids)

  # returning the emojis as a list named as list_emojis
  return emoji.emojize(f"{' '.join(emojis)}", use_aliases=True)

udf_deepmojify = udf(deepmojify, StringType())
test_udf_deepmojify = df.withColumn("emojis", udf_deepmojify("review_by_customer"))

display(test_udf_deepmojify)

The error I keep getting is Py4JJavaError and ModuleNotFoundError: No module named 'torchmoji'.

What do I do?

Full error:

---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<command-4055905865243119> in <module>
----> 1 final_df_test.show()

/databricks/spark/python/pyspark/sql/dataframe.py in show(self, n, truncate, vertical)
    382         """
    383         if isinstance(truncate, bool) and truncate:
--> 384             print(self._jdf.showString(n, 20, vertical))
    385         else:
    386             print(self._jdf.showString(n, int(truncate), vertical))

/databricks/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1255         answer = self.gateway_client.send_command(command)
   1256         return_value = get_return_value(
-> 1257             answer, self.gateway_client, self.target_id, self.name)
   1258 
   1259         for temp_arg in temp_args:

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
     61     def deco(*a, **kw):
     62         try:
---> 63             return f(*a, **kw)
     64         except py4j.protocol.Py4JJavaError as e:
     65             s = e.java_exception.toString()

/databricks/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    326                 raise Py4JJavaError(
    327                     "An error occurred while calling {0}{1}{2}.\n".
--> 328                     format(target_id, ".", name), value)
    329             else:
    330                 raise Py4JError(

Py4JJavaError: An error occurred while calling o12923.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 77.0 failed 4 times, most recent failure: Lost task 0.3 in stage 77.0 (TID 201, 10.139.64.5, executor 0): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/databricks/spark/python/pyspark/worker.py", line 464, in main
    func, profiler, deserializer, serializer = read_udfs(pickleSer, infile, eval_type)
  File "/databricks/spark/python/pyspark/worker.py", line 316, in read_udfs
    arg_offsets, udf = read_single_udf(pickleSer, infile, eval_type, runner_conf)
  File "/databricks/spark/python/pyspark/worker.py", line 170, in read_single_udf
    f, return_type = read_command(pickleSer, infile)
  File "/databricks/spark/python/pyspark/worker.py", line 73, in read_command
    command = serializer.loads(command.value)
  File "/databricks/spark/python/pyspark/serializers.py", line 695, in loads
    return pickle.loads(obj, encoding=encoding)
ModuleNotFoundError: No module named 'torchmoji'

	at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:540)
	at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$1.read(PythonUDFRunner.scala:81)
	at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$1.read(PythonUDFRunner.scala:64)
	at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:494)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:640)
	at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.encodeUnsafeRows(UnsafeRowBatchUtils.scala:62)
	at org.apache.spark.sql.execution.collect.Collector$$anonfun$2.apply(Collector.scala:159)
	at org.apache.spark.sql.execution.collect.Collector$$anonfun$2.apply(Collector.scala:158)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.doRunTask(Task.scala:140)
	at org.apache.spark.scheduler.Task.run(Task.scala:113)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$13.apply(Executor.scala:537)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1541)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:543)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:2362)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:2350)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:2349)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2349)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:1102)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:1102)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1102)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2582)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2529)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2517)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:897)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2280)
	at org.apache.spark.sql.execution.collect.Collector.runSparkJobs(Collector.scala:270)
	at org.apache.spark.sql.execution.collect.Collector.collect(Collector.scala:280)
	at org.apache.spark.sql.execution.collect.Collector$.collect(Collector.scala:80)
	at org.apache.spark.sql.execution.collect.Collector$.collect(Collector.scala:86)
	at org.apache.spark.sql.execution.ResultCacheManager.getOrComputeResult(ResultCacheManager.scala:508)
	at org.apache.spark.sql.execution.CollectLimitExec.executeCollectResult(limit.scala:57)
	at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectResult(Dataset.scala:2905)
	at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3517)
	at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2634)
	at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2634)
	at org.apache.spark.sql.Dataset$$anonfun$54.apply(Dataset.scala:3501)
	at org.apache.spark.sql.Dataset$$anonfun$54.apply(Dataset.scala:3496)
	at org.apache.spark.sql.execution.SQLExecution$$anonfun$withCustomExecutionEnv$1$$anonfun$apply$1.apply(SQLExecution.scala:112)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:232)
	at org.apache.spark.sql.execution.SQLExecution$$anonfun$withCustomExecutionEnv$1.apply(SQLExecution.scala:98)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:835)
	at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:74)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:184)
	at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withAction(Dataset.scala:3496)
	at org.apache.spark.sql.Dataset.head(Dataset.scala:2634)
	at org.apache.spark.sql.Dataset.take(Dataset.scala:2848)
	at org.apache.spark.sql.Dataset.getRows(Dataset.scala:279)
	at org.apache.spark.sql.Dataset.showString(Dataset.scala:316)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
	at py4j.Gateway.invoke(Gateway.java:295)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:251)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/databricks/spark/python/pyspark/worker.py", line 464, in main
    func, profiler, deserializer, serializer = read_udfs(pickleSer, infile, eval_type)
  File "/databricks/spark/python/pyspark/worker.py", line 316, in read_udfs
    arg_offsets, udf = read_single_udf(pickleSer, infile, eval_type, runner_conf)
  File "/databricks/spark/python/pyspark/worker.py", line 170, in read_single_udf
    f, return_type = read_command(pickleSer, infile)
  File "/databricks/spark/python/pyspark/worker.py", line 73, in read_command
    command = serializer.loads(command.value)
  File "/databricks/spark/python/pyspark/serializers.py", line 695, in loads
    return pickle.loads(obj, encoding=encoding)
ModuleNotFoundError: No module named 'torchmoji'

	at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:540)
	at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$1.read(PythonUDFRunner.scala:81)
	at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$1.read(PythonUDFRunner.scala:64)
	at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:494)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:640)
	at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.encodeUnsafeRows(UnsafeRowBatchUtils.scala:62)
	at org.apache.spark.sql.execution.collect.Collector$$anonfun$2.apply(Collector.scala:159)
	at org.apache.spark.sql.execution.collect.Collector$$anonfun$2.apply(Collector.scala:158)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.doRunTask(Task.scala:140)
	at org.apache.spark.scheduler.Task.run(Task.scala:113)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$13.apply(Executor.scala:537)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1541)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:543)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	... 1 more
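
The final `ModuleNotFoundError` is raised while the Python worker unpickles the UDF, which suggests that `torchmoji` is importable on the driver but not on the executors. The sketch below is a minimal check using only the standard library. The helper name `module_available` is hypothetical and is not part of torchMoji or PySpark. Because it does not import `torchmoji` itself, it could be called from inside a UDF to report what each worker can see.

```python
import importlib.util


def module_available(name):
    """Return True if `name` can be imported in the current Python process."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec raises for dotted names whose parent package is missing,
        # or for already-loaded modules whose __spec__ is unset.
        return False


# Run on the driver first, then inside a UDF to compare with the workers.
print("json available:", module_available("json"))
print("torchmoji available:", module_available("torchmoji"))
```

If the check returns False on the executors, the package probably needs to be installed on every worker node rather than only on the driver.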

opened by vidyap-xgboost 2

Providing a working environment for the current version

(This should really be a pull request, but I don't know how to make one; I am not an expert GitHub user.) To get the provided torchMoji working, follow these instructions:

Use Ubuntu (version 16.04 in my case) and install Miniconda or Anaconda
Create the file torchmoji_env.yml containing:

name: torchmoji_env
channels:
  - pytorch
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - alabaster=0.7.12=py36_0
  - astroid=2.4.2=py36_0
  - attrs=19.3.0=py_0
  - babel=2.8.0=py_0
  - backcall=0.2.0=py_0
  - blas=1.0=mkl
  - bleach=3.1.5=py_0
  - brotlipy=0.7.0=py36h7b6447c_1000
  - ca-certificates=2020.6.24=0
  - certifi=2020.6.20=py36_0
  - cffi=1.14.0=py36h2e261b9_0
  - chardet=3.0.4=py36_1003
  - cloudpickle=1.4.1=py_0
  - cryptography=2.9.2=py36h1ba5d50_0
  - cudatoolkit=10.0.130=0
  - dbus=1.13.16=hb2f20db_0
  - decorator=4.4.2=py_0
  - defusedxml=0.6.0=py_0
  - docutils=0.16=py36_1
  - entrypoints=0.3=py36_0
  - expat=2.2.9=he6710b0_2
  - fontconfig=2.13.0=h9420a91_0
  - freetype=2.10.2=h5ab3b9f_0
  - glib=2.63.1=h5a9c865_0
  - gst-plugins-base=1.14.0=hbbd80ab_1
  - gstreamer=1.14.0=hb453b48_1
  - icu=58.2=he6710b0_3
  - idna=2.10=py_0
  - imagesize=1.2.0=py_0
  - importlib-metadata=1.7.0=py36_0
  - importlib_metadata=1.7.0=0
  - intel-openmp=2019.4=243
  - ipykernel=5.3.0=py36h5ca1d4c_0
  - ipython=7.16.1=py36h5ca1d4c_0
  - ipython_genutils=0.2.0=py36_0
  - isort=4.3.21=py36_0
  - jedi=0.17.1=py36_0
  - jeepney=0.4.3=py_0
  - jinja2=2.11.2=py_0
  - jpeg=9b=h024ee3a_2
  - jsonschema=3.2.0=py36_0
  - jupyter_client=6.1.3=py_0
  - jupyter_core=4.6.3=py36_0
  - keyring=21.2.1=py36_0
  - lazy-object-proxy=1.4.3=py36h7b6447c_0
  - libedit=3.1.20191231=h7b6447c_0
  - libffi=3.2.1=hd88cf55_4
  - libgcc-ng=9.1.0=hdf63c60_0
  - libgfortran-ng=7.3.0=hdf63c60_0
  - libpng=1.6.37=hbc83047_0
  - libsodium=1.0.18=h7b6447c_0
  - libstdcxx-ng=9.1.0=hdf63c60_0
  - libuuid=1.0.3=h1bed415_2
  - libxcb=1.14=h7b6447c_0
  - libxml2=2.9.10=he19cac6_1
  - markupsafe=1.1.1=py36h7b6447c_0
  - mccabe=0.6.1=py36_1
  - mistune=0.8.4=py36h7b6447c_0
  - mkl=2018.0.3=1
  - nbconvert=5.6.1=py36_0
  - nbformat=5.0.7=py_0
  - ncurses=6.2=he6710b0_1
  - ninja=1.9.0=py36hfd86e86_0
  - numpydoc=1.0.0=py_0
  - openssl=1.1.1g=h7b6447c_0
  - packaging=20.4=py_0
  - pandoc=2.9.2.1=0
  - pandocfilters=1.4.2=py36_1
  - parso=0.7.0=py_0
  - pcre=8.44=he6710b0_0
  - pexpect=4.8.0=py36_0
  - pickleshare=0.7.5=py36_0
  - pip=20.1.1=py36_1
  - prompt-toolkit=3.0.5=py_0
  - psutil=5.7.0=py36h7b6447c_0
  - ptyprocess=0.6.0=py36_0
  - pycodestyle=2.6.0=py_0
  - pycparser=2.20=py_0
  - pyflakes=2.2.0=py_0
  - pygments=2.6.1=py_0
  - pylint=2.5.3=py36_0
  - pyopenssl=19.1.0=py36_0
  - pyparsing=2.4.7=py_0
  - pyqt=5.9.2=py36h05f1152_2
  - pyrsistent=0.16.0=py36h7b6447c_0
  - pysocks=1.7.1=py36_0
  - python=3.6.7=h0371630_0
  - python-dateutil=2.8.1=py_0
  - pytorch=1.0.1=py3.6_cuda10.0.130_cudnn7.4.2_2
  - pytz=2020.1=py_0
  - pyzmq=19.0.1=py36he6710b0_1
  - qt=5.9.7=h5867ecd_1
  - qtawesome=0.7.2=py_0
  - qtconsole=4.7.5=py_0
  - qtpy=1.9.0=py_0
  - readline=7.0=h7b6447c_5
  - requests=2.24.0=py_0
  - rope=0.17.0=py_0
  - secretstorage=3.1.2=py36_0
  - setuptools=47.3.1=py36_0
  - sip=4.19.8=py36hf484d3e_0
  - six=1.15.0=py_0
  - snowballstemmer=2.0.0=py_0
  - sphinx=3.1.1=py_0
  - sphinxcontrib-applehelp=1.0.2=py_0
  - sphinxcontrib-devhelp=1.0.2=py_0
  - sphinxcontrib-htmlhelp=1.0.3=py_0
  - sphinxcontrib-jsmath=1.0.1=py_0
  - sphinxcontrib-qthelp=1.0.3=py_0
  - sphinxcontrib-serializinghtml=1.1.4=py_0
  - spyder=3.3.6=py36_0
  - spyder-kernels=0.5.2=py36_0
  - sqlite=3.32.3=h62c20be_0
  - testpath=0.4.4=py_0
  - tk=8.6.10=hbc83047_0
  - toml=0.10.1=py_0
  - tornado=6.0.4=py36h7b6447c_1
  - traitlets=4.3.3=py36_0
  - typed-ast=1.4.1=py36h7b6447c_0
  - urllib3=1.25.9=py_0
  - wcwidth=0.2.5=py_0
  - webencodings=0.5.1=py36_1
  - wheel=0.34.2=py36_0
  - wrapt=1.11.2=py36h7b6447c_0
  - wurlitzer=2.0.0=py36_0
  - xz=5.2.5=h7b6447c_0
  - zeromq=4.3.2=he6710b0_2
  - zipp=3.1.0=py_0
  - zlib=1.2.11=h7b6447c_3
  - pip:
    - emoji==0.4.5
    - jupyter-client==6.1.5
    - nose==1.3.7
    - numpy==1.13.1
    - pandas==0.20.3
    - scikit-learn==0.19.0
    - scipy==0.19.1
    - text-unidecode==1.0
prefix: /home/user/Miniconda3/ana/envs/torchmoji_env

Then, install and activate it as:

conda env create -f torchmoji_env.yml
conda activate torchmoji_env

Now, in order to run it, you have to fix some bugs in the code (possibly incompatibility issues). Look at https://github.com/huggingface/torchMoji/issues/20 and https://github.com/huggingface/torchMoji/issues/21#issuecomment-512642707 for details. When you hit an error on something like .numpy()[0], replace it with .numpy()
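
The `.numpy()[0]` failure mentioned above is presumably caused by scalar values now being zero-dimensional: indexing a 0-d NumPy array with `[0]` raises an `IndexError`. The sketch below uses NumPy alone. The variable names are illustrative, and the array is only a stand-in for what `.numpy()` would return on a scalar tensor.

```python
import numpy as np

# Stand-in for the result of `.numpy()` on a scalar (0-d) tensor.
loss_value = np.array(0.25)

try:
    loss_value[0]  # old-style access that expects a 1-element array
    indexing_failed = False
except IndexError:
    indexing_failed = True

# Dropping the `[0]` keeps the 0-d array, which still converts to a float.
as_float = float(loss_value)
print(indexing_failed, as_float)
```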

opened by TasosR83 0

GPU not used ?

I am trying to retrain the model, but I don't think the GPU is ever used.

The part of the code I use is:

model = torchmoji_transfer(nb_classes, weight_path=pretrained_path)
model, acc = finetune(model, split_texts, split_labels, nb_classes, batch_size, nb_epochs=1000, method='chain-thaw')

and GPU usage does not appear to increase.

opened by TasosR83 1

ValueError: too many values to unpack (expected 2) error in example script

Hi,

I cloned the repository and downloaded the weights with the provided script. I then tried to run examples/text_emojize.py with --text "Today is great!" --maxlen 30, but it raised a ValueError with the following traceback:

➜  torchMoji git:(master) python3 examples/text_emojize.py --text "Today is great!" --maxlen 30
Traceback (most recent call last):
  File "examples/text_emojize.py", line 55, in <module>
    prob = model(tokenized)[0]
  File "/Users/zafer/.pyenv/versions/3.6.7/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/repos/torchMoji/torchmoji/model_def.py", line 222, in forward
    lstm_0_output, _ = self.lstm_0(packed_input, hidden)
  File "/Users/zafer/.pyenv/versions/3.6.7/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/repos/torchMoji/torchmoji/lstm.py", line 78, in forward
    input, batch_sizes = input
ValueError: too many values to unpack (expected 2)

My torch version is 1.2.0 and Python version is 3.6.7
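
The failing line unpacks the packed input into exactly two values. A likely cause is that the packed sequence object in this PyTorch version carries more than two fields, since newer releases add `sorted_indices` and `unsorted_indices` alongside `data` and `batch_sizes`. The sketch below reproduces the error with a hypothetical stand-in, `FakePackedSequence`, built from `collections.namedtuple` so that it runs without PyTorch. It then shows an access pattern that tolerates extra fields.

```python
from collections import namedtuple

# Hypothetical stand-in for a packed sequence with four fields.
FakePackedSequence = namedtuple(
    "FakePackedSequence",
    ["data", "batch_sizes", "sorted_indices", "unsorted_indices"],
)

packed = FakePackedSequence(
    data=[1, 2, 3],
    batch_sizes=[2, 1],
    sorted_indices=None,
    unsorted_indices=None,
)

try:
    data, batch_sizes = packed  # same pattern as lstm.py line 78
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Reading the fields by name works regardless of how many fields exist.
data, batch_sizes = packed.data, packed.batch_sizes
print(error_message)
print(data, batch_sizes)
```

Under that assumption, the analogous change in `lstm.py` would be to read `input.data` and `input.batch_sizes` instead of unpacking. This suggestion has not been tested against the repository.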

opened by zafercavdar 3

tweet training dataset

As mentioned in the DeepMoji GitHub repo https://github.com/bfelbo/DeepMoji, the large Twitter dataset of tweets with emojis has not been released.

I wonder if there is still a chance to get the original training dataset, even if permission is required. If I understand correctly, torchMoji is also trained on the same dataset, right? Could you share how you obtained the training dataset? In the original paper, I saw that the authors wrote:

The authors would like to thank Janys Analytics for generously allowing us to use their dataset of human-rated tweets

Should I contact Janys Analytics in order to get the training dataset?

opened by ydshieh 0



Owner

Hugging Face

SEOVER: Sentence-level Emotion Orientation Vector based Conversation Emotion Recognition Model

tsai is an open-source deep learning package built on top of Pytorch & fastai focused on state-of-the-art techniques for time series classification, regression and forecasting.

State of the Art Neural Networks for Deep Learning

Deepparse is a state-of-the-art library for parsing multinational street addresses using deep learning

Implementation of the state of the art beat-detection, downbeat-detection and tempo-estimation model

Implementation of NÜWA, state of the art attention network for text to video synthesis, in Pytorch

Implementation of 🦩 Flamingo, state-of-the-art few-shot visual question answering attention net out of Deepmind, in Pytorch

Implementation of ETSformer, state of the art time-series Transformer, in Pytorch

Deep Text Search is an AI-powered multilingual text search and recommendation engine with state-of-the-art transformer-based multilingual text embedding (50+ languages).

This is the unofficial code of Deep Dual-resolution Networks for Real-time and Accurate Semantic Segmentation of Road Scenes. which achieve state-of-the-art trade-off between accuracy and speed on cityscapes and camvid, without using inference acceleration and extra data

Tensorflow Implementation for "Pre-trained Deep Convolution Neural Network Model With Attention for Speech Emotion Recognition"

Implementation of "StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis"

A state of the art of new lightweight YOLO model implemented by TensorFlow 2.

LaneDet is an open source lane detection toolbox based on PyTorch that aims to pull together a wide variety of state-of-the-art lane detection models

State-of-the-art data augmentation search algorithms in PyTorch

🤗 Transformers: State-of-the-art Natural Language Processing for Pytorch, TensorFlow, and JAX.

TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale.

Implementation of the state-of-the-art vision transformers with tensorflow

QuickAI is a Python library that makes it extremely easy to experiment with state-of-the-art Machine Learning models.
