KakaoBrain KoGPT (Korean Generative Pre-trained Transformer)

Kakao Brain

Last update: Dec 28, 2022


Overview

KoGPT

KoGPT (Korean Generative Pre-trained Transformer)
- https://github.com/kakaobrain/kogpt
- https://huggingface.co/kakaobrain/kogpt

Model Descriptions

KoGPT6B-ryan1.5b

Hyperparameter	Value
$n_{parameters}$	6,166,502,400
$n_{layers}$	28
$d_{model}$	4,096
$d_{ff}$	16,384
$n_{heads}$	16
$d_{head}$	256
$n_{ctx}$	2,048
$n_{vocab}$	64,512
Positional Encoding	Rotary Position Embedding (RoPE)
RoPE Dimensions	64
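
The parameter count in the table can be cross-checked against the other hyperparameters. The sketch below assumes the layer layout of the Hugging Face GPT-J implementation (the checkpoint reports the model type 'gptj'): bias-free attention projections, biased feed-forward layers, untied input and output embeddings, and an output head with a bias. The function name count_parameters is illustrative and is not part of the kogpt package.

```python
def count_parameters(n_layers=28, d_model=4096, d_ff=16384, n_vocab=64512):
    """Count parameters under an assumed GPT-J style layout."""
    layer_norm = 2 * d_model                        # weight + bias
    attention = 4 * d_model * d_model               # q, k, v, out projections, no bias
    feed_forward = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    per_layer = layer_norm + attention + feed_forward

    token_embedding = n_vocab * d_model
    final_layer_norm = 2 * d_model
    output_head = n_vocab * d_model + n_vocab       # untied head with bias (assumption)
    return n_layers * per_layer + token_embedding + final_layer_norm + output_head


total = count_parameters()
print(f"{total:,}")  # 6,166,502,400

# d_head follows from d_model / n_heads
print(4096 // 16)    # 256
```

Under these assumptions the result matches the $n_{parameters}$ value listed above.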

Hardware requirements

KoGPT6B-ryan1.5b

GPU

The following is the recommended minimum GPU hardware for running the KoGPT examples.

A minimum of 32GB of GPU RAM is required.

KoGPT6B-ryan1.5b-float16

GPU

The following is the recommended minimum GPU hardware for running the KoGPT examples.

Half-precision requires NVIDIA GPUs based on the Volta, Turing, or Ampere architecture.
A minimum of 16GB of GPU RAM is required.
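
The two memory figures follow roughly from the parameter count: the weights alone need 4 bytes per parameter in float32 and 2 bytes in float16, with activations, the attention cache, and framework overhead on top. A back-of-the-envelope sketch (the helper name weight_memory_gib is illustrative only):

```python
N_PARAMETERS = 6_166_502_400  # from the model description table


def weight_memory_gib(n_parameters: int, bytes_per_parameter: int) -> float:
    """Memory needed for the weights alone, in GiB."""
    return n_parameters * bytes_per_parameter / 2**30


fp32 = weight_memory_gib(N_PARAMETERS, 4)
fp16 = weight_memory_gib(N_PARAMETERS, 2)
print(f"float32 weights: {fp32:.1f} GiB")  # float32 weights: 23.0 GiB
print(f"float16 weights: {fp16:.1f} GiB")  # float16 weights: 11.5 GiB
```

This only estimates the weights; actual usage during generation is higher, which is consistent with the 32GB and 16GB minimums stated above.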

Usage

prompt

python -m kogpt --help
usage: KoGPT inference [-h] [--model MODEL] [--revision {KoGPT6B-ryan1.5b}]
                       [--device {cpu,cuda}] [-d]

KakaoBrain Korean(hangul) Generative Pre-Training Model

optional arguments:
  -h, --help            show this help message and exit
  --model MODEL         huggingface repo (default:kakaobrain/kogpt)
  --revision {KoGPT6B-ryan1.5b}
  --device {cpu,cuda}   (default:cuda)
  -d, --debug

python -m kogpt
prompt> 인간처럼 생각하고, 행동하는 '지능'을 통해 인류가 이제까지 풀지 못했던
temperature(0.8)> 
max_length(128)> 64
인간처럼 생각하고, 행동하는 '지능'을 통해 인류가 이제까지 풀지 못했던 문제의 해답을 찾을 수 있을 것이다. 과학기술이 고도로 발달한 21세기를 살아갈 우리 아이들에게 가장 필요한 것은 사고력 훈련이다. 사고력 훈련을 통해, 세상

prompt>  
...

python

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM 

tokenizer = AutoTokenizer.from_pretrained(
  'kakaobrain/kogpt', revision='KoGPT6B-ryan1.5b-float16',  # or float32 version: revision=KoGPT6B-ryan1.5b
  bos_token='[BOS]', eos_token='[EOS]', unk_token='[UNK]', pad_token='[PAD]', mask_token='[MASK]'
)
model = AutoModelForCausalLM.from_pretrained(
  'kakaobrain/kogpt', revision='KoGPT6B-ryan1.5b-float16',  # or float32 version: revision=KoGPT6B-ryan1.5b
  pad_token_id=tokenizer.eos_token_id,
  torch_dtype='auto', low_cpu_mem_usage=True
).to(device='cuda', non_blocking=True)
_ = model.eval()

prompt = '인간처럼 생각하고, 행동하는 \'지능\'을 통해 인류가 이제까지 풀지 못했던'
with torch.no_grad():
  tokens = tokenizer.encode(prompt, return_tensors='pt').to(device='cuda', non_blocking=True)
  gen_tokens = model.generate(tokens, do_sample=True, temperature=0.8, max_length=64)
  generated = tokenizer.batch_decode(gen_tokens)[0]
  
print(generated)  # print: 인간처럼 생각하고, 행동하는 '지능'을 통해 인류가 이제까지 풀지 못했던 문제의 해답을 찾을 수 있을 것이다. 과학기술이 고도로 발달한 21세기를 살아갈 우리 아이들에게 가장 필요한 것은 사고력 훈련이다. 사고력 훈련을 통해, 세상
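
The do_sample=True and temperature=0.8 arguments above make generation stochastic: the logits are divided by the temperature before the softmax, so values below 1.0 sharpen the distribution toward the most likely tokens. The following is a minimal pure-Python sketch of that idea on a toy logit vector. It is not the transformers implementation, and the function names are illustrative.

```python
import math
import random


def softmax_with_temperature(logits, temperature=0.8):
    """Turn logits into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                       # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature=0.8, rng=random):
    """Draw one token index according to the tempered distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


toy_logits = [2.0, 1.0, 0.1]
print(softmax_with_temperature(toy_logits, 1.0))
print(softmax_with_temperature(toy_logits, 0.8))  # more mass on index 0
print(sample_token(toy_logits))
```

Because sampling is random, repeated runs of the README example will generally produce different continuations.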

Experiments

In-context Few-Shots

Models	#params	NSMC (Acc.)	YNAT (F1)	KLUE-STS (F1)
HyperCLOVA[1]	1.3B	83.9	58.7	60.9
HyperCLOVA[1]	6.9B	83.8	67.5	59.3
HyperCLOVA[1]	13.0B	87.9	67.9	60.0
HyperCLOVA[1]	39.0B	88.0	71.4	61.6
HyperCLOVA[1]	82.0B	88.2	72.7	65.1
Ours	6.0B	87.8	78.0	64.3

Finetuning / P-Tuning

Models	#params	method	NSMC (Acc.)	KorSTS(spearman)
SKT-AI/KoGPT-2 2.0[2]	125M	`finetuning`	93.3	78.4
SKT-AI/KoGPT-2 Trinity[3]	1.2B	`finetuning`	93.2	83.4
HyperCLOVA[1]	1.3B	`p-tuning`	91.7	-
HyperCLOVA[1]	39.0B	`p-tuning`	93.0	-
Ours	135M	`finetuning`	95.1	83.0
Ours	6.0B	`finetuning`	95.7	85.3

We conducted these experiments using [4], with the same hyperparameters.

Limitations

KakaoBrain KoGPT was trained on the rayn dataset, a dataset known to contain profanity, lewd content, politically charged language, and other harsh language, none of which was filtered out. Therefore, KoGPT can generate socially unacceptable text. As with all language models, it is difficult to predict in advance how KoGPT will respond to particular prompts, and it may produce offensive content without warning.

Primarily Korean: KoGPT is trained primarily on Korean text and is best suited to classifying, searching, summarizing, or generating such text. By default, KoGPT performs worse on inputs that differ from the data distribution it was trained on, including non-Korean text and Korean dialects that are not well represented in the training data.

If abnormal or socially unacceptable text is generated during testing, please send the "prompt" and the "generated text" to [email protected].

Citation

If you apply this library or model to any project and research, please cite our code:

@misc{kakaobrain2021kogpt,
  title         = {KoGPT: KakaoBrain Korean(hangul) Generative Pre-trained Transformer},
  author        = {Ildoo Kim and Gunsoo Han and Jiyeon Ham and Woonhyuk Baek},
  year          = {2021},
  howpublished  = {\url{https://github.com/kakaobrain/kogpt}},
}

Contact

This project is released as open source in the hope that it will be helpful to research institutes and startups for research purposes. We look forward to hearing from anyone who wishes to cooperate with us.

[email protected]

License

The source code of KakaoBrain KoGPT is licensed under the Apache 2.0 License.
The pretrained weights of KakaoBrain KoGPT are licensed under the CC-BY-NC-ND 4.0 License.

When using the model, the code, or the pretrained weights, please comply with the license terms. The full license texts are available in the Apache 2.0 and LICENSE.cc-by-nc-nd-4.0 files.

References

[1] HyperCLOVA: Kim, Boseop, et al. "What changes can large-scale language models bring? intensive study on hyperclova: Billions-scale korean generative pretrained transformers." arXiv preprint arXiv:2109.04650 (2021).
[2] SKT-AI/KoGPT-2 2.0: "SKT-AI/KoGPT2: Korean GPT-2 pretrained cased (KoGPT2)." https://github.com/SKT-AI/KoGPT2 (2021).
[3] SKT-AI/KoGPT-2 Trinity: "Ko-GPT-Trinity 1.2B." https://huggingface.co/skt/ko-gpt-trinity-1.2B-v0.5 (2021).
[4] KoGPT2-subtasks: "KoGPT2 v2.0 한국어 평가 모듈" https://github.com/haven-jeon/KoGPT2-subtasks (2021).

Contribution

Disclaimer

The contribution section is not an official KakaoBrain product.

AK391's Web Demo on Huggingface Spaces

see demo: https://huggingface.co/spaces/akhaliq/kogpt
- Web Demo is integrated to Huggingface Spaces with Gradio.
- Contributors: AK391

Comments

On the toxicity of the model's generated output
Hello. Having seen the remarkable performance of the KoGPT model, I am opening this issue to make two suggestions.

The README.md currently appears to state, in English, that the model's output can be quite toxic, as follows.

KakaoBrain KoGPT was trained on rayn dataset, a dataset known to contain profanity, lewd, political changed, and other harsh language. Therefore, KoGPT can generate socially unacceptable texts. As with all language models, It is difficult to predict in advance how KoGPT will response to particular prompts and offensive content without warning.

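I would like to make two suggestions.

(1) Just as the License section is also provided in Korean, would you consider providing the Limitations section in Korean as well? This appears to be the most powerful model released so far, so could the limitations also be stated in Korean, so that Korean speakers are more aware of the risky situations they may run into when using the model?

(2) In the if kakao video about the model, there was a part where you discussed the ethical risks that can arise when using the model. Having tried KoGPT with a few particular prompts, I am somewhat concerned to see it generate offensive content at what (subjectively) felt like a fairly severe level. I am not aware of any precedent, so it is hard to make a concrete proposal, but besides [email protected], would it be possible to open a separate channel that collects, studies, and curates, even after the fact, information about prompts that particularly bring out the released model's toxicity?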
enhancement
opened by combacsa 4
Is the torch_dtype=torch.float16 option working correctly? The memory requirement can be lowered by preprocessing the model.
Hello,

First of all, thank you sincerely for releasing the model. The details include the following:

GPU The following is the recommended minimum GPU hardware guidance for a handful of example KoGPT.

half-precision requires NVIDIA GPUS based on Volta, Turing or Ampere 32GB GPU RAM in the required minimum memory size

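However, given the model's parameter count and data type, its physical size should be roughly 12GB. The tensors in the released model file also appear to be stored as fp32.

At GPU load time, the model is actually loaded as fp32 regardless of the torch_dtype=torch.float16 setting, which allocates a large amount of GPU memory.

A simple script like the one below can cut the model's memory requirement roughly in half. If the issue I described was also affecting computation, speed should improve substantially as well.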

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

path = "/home/noah/.cache/huggingface/transformers/1386e39caf0b158682709eb063f0231e03f868a0f87846c1eb777a79f161f87d.ce4d05ebacaac5ad33896c20e5373d786588147616bced327805834cb4beaf8f"
model = torch.load(path)
for i in model:
    t = model.get(i).dtype
    if (t == torch.float32):
        print (t)
        model[i] = model.get(i).half()
torch.save(model, './fp16.pth')
bug enhancement
opened by go-noah 3
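Apply model parallelism so that a large model can be split across several smaller GPUs

When I tried the model, I found that the original code uses only one GPU, so it cannot run on consumer-level GPUs, which usually have less than 16GB of memory. For such environments, I made a modified version that adds a --model_parallel option for spreading the model across multiple GPUs, and I am sharing it here. With this option, the full-precision model can run on four GPUs of about 10GB each. Possibly fix #19.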
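
The idea behind this kind of layer-wise model parallelism can be sketched as assigning contiguous blocks of the 28 transformer layers to different devices. The code below is not the implementation from this pull request; partition_layers and the device names are hypothetical and only show how an even split could be computed.

```python
from typing import Dict, List


def partition_layers(n_layers: int, devices: List[str]) -> Dict[str, List[int]]:
    """Assign contiguous layer indices to devices as evenly as possible."""
    base, extra = divmod(n_layers, len(devices))
    assignment: Dict[str, List[int]] = {}
    start = 0
    for index, device in enumerate(devices):
        size = base + (1 if index < extra else 0)  # first `extra` devices get one more
        assignment[device] = list(range(start, start + size))
        start += size
    return assignment


plan = partition_layers(28, ["cuda:0", "cuda:1", "cuda:2", "cuda:3"])
for device, layers in plan.items():
    print(device, "layers", layers[0], "to", layers[-1])
```

With four devices each one holds 7 layers, so each GPU only has to store about a quarter of the layer weights plus whatever embeddings are placed on it.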
hacktoberfest-accepted

opened by cosine0 2
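Could you upload some simple example code for text classification, search, and summarization?

Hello! I am a student working hard at learning deep learning.

The generation code in the README is written in such detail that I was able to implement it without trouble.

Reading through the README, it says the model is well suited to text classification, summarization, and search. I would like to study from example code, but I have no idea where to start, so I am taking the liberty of opening this issue.

With thanks to everyone at Kakao Brain for releasing such a great model!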
documentation example

opened by Aramir94 2

Unable to load weights from pytorch checkpoint file

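Hello, and first of all, thank you for opening up such great research results!

I ran the python code from the README as shown below and got an error.

I confirmed that the float16 version works correctly. Does the float32 version need to be loaded in a different way?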

from transformers import AutoTokenizer, AutoModelForCausalLM 

vFLOAT16 = False
model_name = 'kakaobrain/kogpt'
revision = 'KoGPT6B-ryan1.5b' + ('-float16' if vFLOAT16 else '')

tokenizer = AutoTokenizer.from_pretrained(
  model_name, revision=revision,
  bos_token='[BOS]', eos_token='[EOS]', unk_token='[UNK]', pad_token='[PAD]', mask_token='[MASK]'
)

model = AutoModelForCausalLM.from_pretrained(
  model_name, revision=revision,
  pad_token_id=tokenizer.eos_token_id,
  torch_dtype='auto', low_cpu_mem_usage=True,
).to(device='cuda', non_blocking=True)
model.eval()

Traceback (most recent call last):
  File "/home/ihl7029/anaconda3/envs/test-env/lib/python3.10/site-packages/transformers/modeling_utils.py", line 399, in load_state_dict
    return torch.load(checkpoint_file, map_location="cpu")
  File "/home/ihl7029/anaconda3/envs/test-env/lib/python3.10/site-packages/torch/serialization.py", line 705, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
  File "/home/ihl7029/anaconda3/envs/test-env/lib/python3.10/site-packages/torch/serialization.py", line 242, in __init__
    super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ihl7029/anaconda3/envs/test-env/lib/python3.10/site-packages/transformers/modeling_utils.py", line 403, in load_state_dict
    if f.read().startswith("version"):
  File "/home/ihl7029/anaconda3/envs/test-env/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 64: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ihl7029/works/kogpt.py", line 13, in <module>
    model = AutoModelForCausalLM.from_pretrained(
  File "/home/ihl7029/anaconda3/envs/test-env/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 463, in from_pretrained
    return model_class.from_pretrained(
  File "/home/ihl7029/anaconda3/envs/test-env/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2182, in from_pretrained
    state_dict = load_state_dict(resolved_archive_file)
  File "/home/ihl7029/anaconda3/envs/test-env/lib/python3.10/site-packages/transformers/modeling_utils.py", line 415, in load_state_dict
    raise OSError(
OSError: Unable to load weights from pytorch checkpoint file for '/home/ihl7029/.cache/huggingface/hub/models--kakaobrain--kogpt/snapshots/2ba50fcfde0792e92bec63e039e5f57bf3cd55b4/pytorch_model.bin' at '/home/ihl7029/.cache/huggingface/hub/models--kakaobrain--kogpt/snapshots/2ba50fcfde0792e92bec63e039e5f57bf3cd55b4/pytorch_model.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
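
The first exception in the chain, "failed finding central directory", is raised when torch.load cannot read the file as a zip archive, which is the container format torch.save has written by default since PyTorch 1.6. One possible cause, which is an assumption and is not confirmed in this thread, is an incomplete or corrupted download in the local cache. A quick standard-library check (the helper name is illustrative):

```python
import zipfile
from pathlib import Path


def looks_like_complete_checkpoint(path) -> bool:
    """Return True if the file exists and has a readable zip central directory."""
    checkpoint = Path(path)
    return checkpoint.is_file() and zipfile.is_zipfile(checkpoint)
```

If this returns False for the pytorch_model.bin path shown in the error message, deleting that cached file and downloading it again may be worth trying before looking for other causes.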

opened by Holim0711 1

Evaluation issue with downstream evaluation codes

Issues with our downstream evaluation have been reported to us, such as the one in the following link: https://github.com/haven-jeon/KoGPT2-subtasks/pull/1

We investigated the scope of the problem and confirmed that, in the following evaluation table, only the NSMC finetuning accuracy is affected.

Models	#params	method	NSMC (Acc.)	KorSTS(spearman)
SKT-AI/KoGPT-2 2.0[2]	125M	finetuning	93.3	78.4
SKT-AI/KoGPT-2 Trinity[3]	1.2B	finetuning	93.2	83.4
HyperCLOVA[1]	1.3B	p-tuning	91.7	-
HyperCLOVA[1]	39.0B	p-tuning	93.0	-
Ours	6.0B	finetuning	95.7	85.3

We plan to share the corrected evaluation results as soon as possible.
bug evaluation

opened by wbaek 1
Can I access the 135M model instead of the 6B model?

In README.md, I saw the results for the Ours 135M model. However, on Hugging Face I can only see the 6B model, which is too big for me.

Is the 135M model publicly available?

opened by L0Z1K 1
half() error when running on CPU on a Mac M1

As in https://github.com/kakaobrain/pororo/issues/11,

when running on CPU on an M1, entering a sentence at the inference step appears to raise RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'.
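
The error arises because the half-precision weights are being run on CPU, where, in the PyTorch version used in this report, the LayerNorm kernel is not implemented for Half. One way to avoid it is to fall back to the float32 revision whenever the target device is not CUDA. A minimal sketch follows; choose_revision is a hypothetical helper, and the two revision names are the ones from the Usage section.

```python
FLOAT32_REVISION = "KoGPT6B-ryan1.5b"
FLOAT16_REVISION = "KoGPT6B-ryan1.5b-float16"


def choose_revision(device: str) -> str:
    """Use float16 weights only on CUDA devices; fall back to float32 elsewhere."""
    return FLOAT16_REVISION if device.startswith("cuda") else FLOAT32_REVISION


print(choose_revision("cuda"))  # KoGPT6B-ryan1.5b-float16
print(choose_revision("cpu"))   # KoGPT6B-ryan1.5b
```

The returned string can be passed as the revision argument of from_pretrained. Note that the float32 revision needs roughly twice the memory of the float16 one.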
wontfix

opened by dharana77 1
About the KeyError: 'gptj' problem

Hello. I tried to run the example, but the error below occurred.

I am currently using pytorch==1.10.0 and transformers==4.2.2. Could the problem be caused by an outdated version of the huggingface transformers package?

Traceback (most recent call last):
  File "inference.py", line 8, in <module>
    bos_token='[BOS]', eos_token='[EOS]', unk_token='[UNK]', pad_token='[PAD]', mask_token='[MASK]')
  File "/anaconda3/envs/pytorch/lib/python3.7/site-packages/transformers/models/auto/tokenization_auto.py", line 352, in from_pretrained
    config = AutoConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)
  File "/anaconda3/envs/pytorch/lib/python3.7/site-packages/transformers/models/auto/configuration_auto.py", line 363, in from_pretrained
    config_class = CONFIG_MAPPING[config_dict["model_type"]]
KeyError: 'gptj'
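
The KeyError means that the installed transformers release has no entry for the 'gptj' model type in its configuration mapping. GPT-J support is believed to have been added in transformers 4.11.0; treat that threshold as an assumption and confirm it against the transformers release notes. A small sketch of a version check that compares numerically rather than as strings (the helper names are illustrative):

```python
import re

MIN_GPTJ_VERSION = (4, 11, 0)  # assumed first release with GPT-J support


def parse_version(version: str) -> tuple:
    """Parse 'major.minor.patch' into a tuple of ints, ignoring suffixes like 'rc1'."""
    numbers = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def supports_gptj(version: str) -> bool:
    return parse_version(version) >= MIN_GPTJ_VERSION


print(supports_gptj("4.2.2"))   # False
print(supports_gptj("4.11.0"))  # True
```

Comparing tuples avoids the pitfall that the string "4.2.2" sorts after "4.11.0". In practice the installed version is available as transformers.__version__.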

opened by jin8 1
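Are there any plans to release the 135M model?

Hello. I found my way here from a community post. First, thank you for opening up results from your ongoing internal research so that many people can use them.

I happen to have an RTX A6000, so fortunately I meet the 32GB minimum requirement, but for real-world use a model with fewer parameters would be more practical.

Looking at the benchmarks you posted, there is a 135M model whose performance seems close to that of the 6B model, so I am writing to ask whether you have any plans to release it.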
question

opened by rippertnt 1
A new novel. I have to write a new novel. As an unknown writer I have written a few literary novels so far, but none of them brought any real success. My wife and children are looking at me. I have worked several part-time and temporary jobs and, with trembling hands and one eye on my wife's mood, handed her my meager pay envelope to help with the household, but I have never found a proper job and have been clinging to writing alone for years. If I fail this time too, my writing is over for good. Let's write a genre novel. They say genre fiction is the trend these days, more so than literary fiction. It is more popular, and the income is decent too.

But what kind of novel should I write? That is the problem. Material that can yield plenty of episodes, a story that can draw readers' interest. What on earth could that be?

Originally posted by @whyleeee in https://github.com/kakaobrain/kogpt/issues/23#issuecomment-1288035889
opened by whyleeee 0
null values in the model call output on M1

When I run the existing example code with some modifications, as shown below, I get an error where null values come out.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

device = torch.device("mps")

tokenizer = AutoTokenizer.from_pretrained(
    "kakaobrain/kogpt",
    revision="KoGPT6B-ryan1.5b-float16",
    bos_token="[BOS]",
    eos_token="[EOS]",
    unk_token="[UNK]",
    pad_token="[PAD]",
    mask_token="[MASK]",
)
model = AutoModelForCausalLM.from_pretrained(
    "kakaobrain/kogpt",
    revision="KoGPT6B-ryan1.5b",
    pad_token_id=tokenizer.eos_token_id,
    torch_dtype="auto",
    low_cpu_mem_usage=True,
).to(device=device, non_blocking=True)
_ = model.eval()

prompt = "인간처럼 생각하고, 행동하는 '지능'을 통해 인류가 이제까지 풀지 못했던"
with torch.no_grad():
    # tokens = tokenizer.encode(prompt, return_tensors="pt").to(device=device, non_blocking=True)
    inputs = tokenizer(prompt, return_tensors="pt")
    print(inputs)
    inputs = {k: v.to(device=device, non_blocking=True) for k, v in inputs.items()}
    # gen_tokens = model.generate(**inputs, do_sample=True, temperature=0.8, max_length=32, top_k=8)
    gen_tokens = model(**inputs)
    print(gen_tokens)
    generated = tokenizer.batch_decode(gen_tokens)

print(generated)
# print: 인간처럼 생각하고, 행동하는 '지능'을 통해 인류가 이제까지 풀지 못했던 문제의 해답을 찾을 수 있을 것이다. 과학기술이 고도로 발달한 21세기를 살아갈 우리 아이들에게 가장 필요한 것은 사고력 훈련이다. 사고력 훈련을 통해, 세상

The results are as follows.

'input_ids': tensor([[ 6577, 1290, 1032, 12519, 118, 2243, 385, 378, 882, 6261, 113, 387, 1132, 5321, 402, 2092, 841, 2182, 404, 993, 551, 726]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}

CausalLMOutputWithPast(loss=None, logits=tensor([[[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], ..., [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]]], device='mps:0'), past_key_values=((tensor([[[[nan, nan, nan, ..., nan, nan, nan],

Everything works up to the tokenizer, but the output from the model looks wrong.

I would appreciate your help with this.

opened by nuri428 0

Releases(KoGPT6B-ryan1.5b)

KoGPT6B-ryan1.5b(Nov 12, 2021)

release: KoGPT6B-ryan1.5b

Owner

Kakao Brain

Kakao Brain Corp.

