Trying a Llama 3.2 Korean model in Ollama (feat. Hugging Face)

Hello, this is Dalso.

This time I'll show how to run a Llama 3.2 Korean model on the Ollama setup I configured earlier.
The process takes a GGUF file from Hugging Face, writes a Modelfile for it, and builds a custom model that Ollama can run.

In a way, it's the same as pulling an image from Docker Hub and building your own container image from it with a Dockerfile.

The model I'll use here is the GGUF below, released by a lab at Seoul National University of Science and Technology (SeoulTech).

Bllossom/llama-3.2-Korean-Bllossom-3B-gguf-Q4_K_M · Hugging Face (huggingface.co)

What is GGUF?

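The Hugging Face Hub supports all file formats, but it has built-in features for GGUF, a binary format optimized for quickly loading and saving models, which makes it very efficient for inference. GGUF is designed for use with GGML and other executors. It was developed by @ggerganov, who also created llama.cpp, a popular C/C++ LLM inference framework. Models originally developed in frameworks such as PyTorch can be converted to GGUF for use with those engines.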

So that's what it is, apparently..

See the link below for details.

GGUF · Hugging Face Hub documentation (huggingface.co)
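To make "binary format" a bit more concrete: according to the GGUF spec, a file starts with a small fixed header: the 4-byte magic "GGUF", a uint32 format version, then uint64 counts of tensors and metadata key/value pairs (versions 2 and later), little-endian by default. The sketch below reads just that header, which is handy for checking that a download really is a GGUF file. read_gguf_header and GGUFHeader are hypothetical names for illustration; it deliberately rejects version 1 (which used 32-bit counts) and big-endian files.

```python
import struct
from typing import BinaryIO, NamedTuple

GGUF_MAGIC = b"GGUF"


class GGUFHeader(NamedTuple):
    version: int
    tensor_count: int
    metadata_kv_count: int


def read_gguf_header(f: BinaryIO) -> GGUFHeader:
    """Parse the fixed-size GGUF header (assumes a little-endian, v2+ file)."""
    magic = f.read(4)
    if magic != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file (magic={magic!r})")
    (version,) = struct.unpack("<I", f.read(4))  # uint32 format version
    if version < 2:
        # Version 1 stored the counts as uint32, so the layout below would be wrong.
        raise ValueError(f"unsupported GGUF version {version}")
    # Two uint64 counts: number of tensors, number of metadata key/value pairs.
    tensor_count, kv_count = struct.unpack("<QQ", f.read(16))
    return GGUFHeader(version, tensor_count, kv_count)
```

Usage: open the downloaded file in binary mode, e.g. `with open("llama-3.2-Korean-Bllossom-3B-gguf-Q4_K_M.gguf", "rb") as f: print(read_gguf_header(f))`. Only the first 24 bytes are read, so this is instant even on a multi-gigabyte model.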

Let's get straight to it.

Downloading the GGUF model

Bllossom/llama-3.2-Korean-Bllossom-3B-gguf-Q4_K_M at main (huggingface.co)

Go to the link above and download the .gguf file.
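If you'd rather fetch the file from a script (say, on a headless Linux box), Hugging Face serves repository files at https://huggingface.co/{repo_id}/resolve/{revision}/{filename}. Here's a minimal stdlib-only sketch; the file name is an assumption taken from the FROM line of the Modelfile below, so double-check it against the repo's file list. resolve_url and download are hypothetical helpers. The official huggingface_hub package's hf_hub_download(repo_id=..., filename=...) does the same job with caching.

```python
import shutil
import urllib.request
from urllib.parse import quote

REPO_ID = "Bllossom/llama-3.2-Korean-Bllossom-3B-gguf-Q4_K_M"
# Assumption: the .gguf in the repo has the same name the Modelfile refers to.
FILENAME = "llama-3.2-Korean-Bllossom-3B-gguf-Q4_K_M.gguf"


def resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build the direct-download URL of a file in a Hugging Face model repo."""
    return f"https://huggingface.co/{repo_id}/resolve/{quote(revision, safe='')}/{quote(filename)}"


def download(url: str, dest: str) -> None:
    """Stream the response to disk so a large model file is never held in memory."""
    # urlopen follows the redirect Hugging Face issues to its file storage.
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
```

Usage: `download(resolve_url(REPO_ID, FILENAME), FILENAME)` saves the model into the current folder, which is where the Modelfile will go.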

Once the download finishes, you need to write a Modelfile.

Here's an example. Save the content below as a file named Modelfile.

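# Base weights: the GGUF downloaded above, kept in the same folder as this Modelfile
FROM ./llama-3.2-Korean-Bllossom-3B-gguf-Q4_K_M.gguf

# Sampling: lower temperature gives more deterministic output;
# top_p samples only from the smallest set of tokens whose probabilities add up to 0.9
PARAMETER temperature 0.6
PARAMETER top_p 0.9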

TEMPLATE """<|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023

{{ if .System }}{{ .System }}
{{- end }}
{{- if .Tools }}When you receive a tool call response, use the output to format an answer to the orginal user question.

You are a helpful assistant with tool calling capabilities.
{{- end }}<|eot_id|>
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 }}
{{- if eq .Role "user" }}<|start_header_id|>user<|end_header_id|>
{{- if and $.Tools $last }}

Given the following functions, please respond with a JSON for a function call with its proper arguments that best answers the given prompt.

Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}. Do not use variables.

{{ range $.Tools }}
{{- . }}
{{ end }}
{{ .Content }}<|eot_id|>
{{- else }}

{{ .Content }}<|eot_id|>
{{- end }}{{ if $last }}<|start_header_id|>assistant<|end_header_id|>

{{ end }}
{{- else if eq .Role "assistant" }}<|start_header_id|>assistant<|end_header_id|>
{{- if .ToolCalls }}
{{ range .ToolCalls }}
{"name": "{{ .Function.Name }}", "parameters": {{ .Function.Arguments }}}{{ end }}
{{- else }}

{{ .Content }}
{{- end }}{{ if not $last }}<|eot_id|>{{ end }}
{{- else if eq .Role "tool" }}<|start_header_id|>ipython<|end_header_id|>

{{ .Content }}<|eot_id|>{{ if $last }}<|start_header_id|>assistant<|end_header_id|>

{{ end }}
{{- end }}
{{- end }}"""

SYSTEM """You are a helpful AI assistant. Please answer the user's questions kindly. 당신은 유능한 AI 어시스턴트 입니다. 사용자의 질문에 대해 친절하게 답변해주세요."""
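The TEMPLATE block is Go template syntax that Ollama fills in to produce the raw Llama 3 chat prompt, and the `{{-` markers trim the newlines around each tag. To show what the model actually sees, here is a hypothetical Python helper, render_prompt, that mirrors only the no-tools path of the template above (system text plus user/assistant turns). It's a sketch for reading the template, not something Ollama uses; Ollama renders the real template itself.

```python
from __future__ import annotations


def render_prompt(system: str, messages: list[dict[str, str]]) -> str:
    """Mirror the no-tools branch of the Modelfile TEMPLATE (illustration only)."""
    parts = [
        "<|start_header_id|>system<|end_header_id|>\n\n",
        "Cutting Knowledge Date: December 2023\n\n",
        system,  # the SYSTEM text; an empty string if none is set
        "<|eot_id|>",
    ]
    for i, msg in enumerate(messages):
        last = i == len(messages) - 1
        role, content = msg["role"], msg["content"]
        if role == "user":
            parts.append(f"<|start_header_id|>user<|end_header_id|>\n\n{content}<|eot_id|>")
            if last:
                # Open an assistant turn so the model writes the reply from here.
                parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
        elif role == "assistant":
            parts.append(f"<|start_header_id|>assistant<|end_header_id|>\n\n{content}")
            if not last:
                # Earlier assistant turns are closed; a final one is left open.
                parts.append("<|eot_id|>")
        else:
            raise ValueError(f"role {role!r} is not covered by this sketch")
    return "".join(parts)
```

For a single user message "hi" with system text "SYS", this yields the system block, then `<|start_header_id|>user<|end_header_id|>\n\nhi<|eot_id|>`, and finally an open assistant header that the model completes.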

Next, put the Modelfile in the same folder as the downloaded .gguf file, then right-click the folder and choose "Open in Terminal".

In the terminal, create the model with the following command.

ollama create llama3.2-korean -f Modelfile
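The most common reason this step fails is that the .gguf isn't where the FROM line says it is. Below is a small preflight sketch that reads the first FROM instruction and, if it points at a .gguf file, checks that the file exists before handing back the ollama create command. It assumes (labelled in the code too) that a relative FROM path is resolved against the Modelfile's folder, which matches keeping both files together as above. modelfile_source and create_command are hypothetical helpers.

```python
from __future__ import annotations

from pathlib import Path


def modelfile_source(modelfile: Path) -> str:
    """Return the argument of the first FROM instruction in a Modelfile."""
    for line in modelfile.read_text(encoding="utf-8").splitlines():
        parts = line.strip().split(maxsplit=1)
        if len(parts) == 2 and parts[0].upper() == "FROM":
            return parts[1]
    raise ValueError(f"{modelfile}: no FROM instruction found")


def create_command(model_name: str, modelfile: str | Path) -> list[str]:
    """Verify a local .gguf referenced by FROM exists, then build the create argv."""
    modelfile = Path(modelfile)
    source = modelfile_source(modelfile)
    if source.endswith(".gguf"):
        # Assumption: a relative FROM path is resolved against the Modelfile's folder.
        gguf = modelfile.parent / source
        if not gguf.is_file():
            raise FileNotFoundError(f"GGUF not found: {gguf}")
    return ["ollama", "create", model_name, "-f", str(modelfile)]
```

Usage: `subprocess.run(create_command("llama3.2-korean", "Modelfile"), check=True)` runs the same command as above, but only after the check passes.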

Then run Ollama with this model.

ollama run llama3.2-korean
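Besides the interactive CLI, Ollama also serves an HTTP API (by default on http://localhost:11434), and that's the piece you'd wire into a workflow or bot. Here is a minimal stdlib sketch against the /api/generate endpoint with streaming turned off, so the whole answer comes back as one JSON object whose "response" field holds the text. The model name assumes you created it as llama3.2-korean; build_generate_request and generate are hypothetical helpers.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default listen address


def build_generate_request(prompt: str, model: str = "llama3.2-korean",
                           host: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3.2-korean",
             host: str = OLLAMA_URL, timeout: float = 120) -> str:
    """Send the prompt and return the generated text (needs a running Ollama)."""
    req = build_generate_request(prompt, model, host)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

Usage: with Ollama running, `print(generate("자기소개를 해줘"))` asks the model to introduce itself. The SYSTEM prompt and TEMPLATE from the Modelfile are applied on the server side, so the client only sends the raw prompt.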

The Korean definitely sounds natural, hehe..

But the longer the conversation goes, the more it drifts off track, so anything complex is clearly too much for it.

Now if I set this up on Linux and hook it into our internal workflow... I wonder if it could stand in for a release bot.

I'll try that over a weekend, or at least sometime next month..