LangGraph vs CrewAI vs AutoGen: A Complete Guide to Multi-Agent AI Frameworks in 2026

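In 2025, building a single AI agent felt like an achievement. In 2026, you are expected to run several of them at once.

Instead of one do-everything agent that buckles under its own workload, the trend now is to split the work across specialist agents and have them collaborate like a team. There is one catch, though: there are three major frameworks to choose from, LangGraph, CrewAI, and AutoGen. Their philosophies are completely different, so a wrong pick can mean a rewrite later.

In this post I'll lay out how the frameworks differ and when to use which one, with real code along the way.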

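Why isn't one agent enough?

Before getting into frameworks, let's look at why you would want multiple agents in the first place.

The limits of the do-everything agent

Say you are building an AI that handles customer inquiries. Here is what that agent has to do:

Figure out what the customer wants
Pull the relevant information from a database
Check the customer's account details
Write an appropriate answer
Hand off to a human agent when needed

If a single agent does all of that, it looks like this:

# The so-called "god agent" anti-pattern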
class CustomerServiceAgent:
    def handle_request(self, message: str) -> str:
        intent = self.classify_intent(message)
        context = self.search_knowledge_base(intent)
        account_info = self.get_account_info()
        response = self.generate_response(context, account_info)
        
        if self.should_escalate(response):
            return self.escalate_to_human()
        
        return response

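Here is why that's a problem:

Tokens run out fast: cram everything into one prompt and the context window is gone in no time
The LLM gets confused: it keeps switching between classifying and writing responses
It's slow: steps that could run in parallel have to run one after another
Debugging hell: when something breaks, you end up digging through a 2,000-line prompt

Solving it with multiple agents

Split the agents by role and you get this: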

┌──────────────────────────────────────────────────────────────┐
│                         Orchestrator                         │
│     (receives requests, routes them to the right agent)      │
└─────────────────┬──────────────────────────────┬─────────────┘
                  │                              │
    ┌─────────────▼─────────────┐  ┌─────────────▼─────────────┐
    │   Classifier agent        │  │   Retrieval agent         │
    │   (intent detection)      │  │   (RAG)                   │
    └─────────────┬─────────────┘  └─────────────┬─────────────┘
                  │                              │
    ┌─────────────▼─────────────┐  ┌─────────────▼─────────────┐
    │   Account agent           │  │   Response agent          │
    │   (CRM lookups)           │  │   (answer drafting)       │
    └───────────────────────────┘  └───────────────────────────┘

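What you get out of it:

Each agent does only its own job, so prompts stay short
Independent agents can run at the same time
If one agent fails, the whole system doesn't go down with it
You can test each agent separately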
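
To make the "run at the same time" and "one failure doesn't take down everything" points concrete, here is a minimal sketch in plain Python with asyncio. It uses no framework, and the `classify`, `lookup_account`, and `handle` functions are hypothetical stand-ins for real agents, with the CRM lookup rigged to fail.

```python
import asyncio


async def classify(message: str) -> str:
    # Stand-in for the classifier agent's LLM call.
    await asyncio.sleep(0.01)
    return "billing" if "bill" in message.lower() else "general"


async def lookup_account(customer_id: str) -> dict:
    # Stand-in for a CRM lookup that happens to fail.
    await asyncio.sleep(0.01)
    raise TimeoutError("CRM did not respond")


async def handle(message: str, customer_id: str) -> dict:
    # The two agents do not depend on each other, so run them concurrently.
    # return_exceptions=True hands a failure back as a value instead of raising it.
    intent, account = await asyncio.gather(
        classify(message), lookup_account(customer_id), return_exceptions=True
    )
    return {
        "intent": "general" if isinstance(intent, Exception) else intent,
        "account": {} if isinstance(account, Exception) else account,
    }


result = asyncio.run(handle("My bill looks wrong", "user-123"))
print(result)
```

The account lookup fails, but the request still comes back with a usable intent and an empty account record, which the downstream agent can treat as "unknown customer".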

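Now let's see how each framework implements this.

LangGraph: wiring the flow yourself as a graph

LangGraph comes from the LangChain team. The core concept is to express the agent system as a graph: nodes are functions and edges are the flow between them. Everything is explicit, so when you wonder "what is it doing right now?" you can tell right away.

Core ideas

Node = a function (an agent, a tool, or plain logic)
Edge = what runs next
State = the data passed between nodes

The code makes this concrete: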

from typing import Annotated, TypedDict
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from langchain_openai import ChatOpenAI


# State shared by all agents
class AgentState(TypedDict):
    messages: Annotated[list, add_messages]
    current_intent: str
    knowledge_context: str
    account_info: dict
    should_escalate: bool


# Each agent is defined as a function
def classify_intent(state: AgentState) -> AgentState:
    """Intent classification agent"""
    llm = ChatOpenAI(model="gpt-4o")

    response = llm.invoke([
        # Ask for the bare label, because the result is used as a dictionary key later
        {"role": "system", "content": "Classify the user's intent as exactly one of: billing, technical, general, complaint. Reply with the label only."},
        {"role": "user", "content": state["messages"][-1].content}
    ])

    return {"current_intent": response.content.strip().lower()}


def retrieve_knowledge(state: AgentState) -> AgentState:
    """Knowledge retrieval agent"""
    intent = state["current_intent"]

    # In production this would be a vector DB query
    knowledge_map = {
        "billing": "Billing policy: refunds are available within 30 days...",
        "technical": "Technical issues: try restarting first...",
        "general": "About us: we are a SaaS platform...",
        "complaint": "Complaint handling: we will review this seriously..."
    }

    return {"knowledge_context": knowledge_map.get(intent, "")}


def lookup_account(state: AgentState) -> AgentState:
    """Account lookup agent"""
    # In production this would call the CRM API
    return {
        "account_info": {
            "tier": "premium",
            "tenure_months": 24,
            "open_tickets": 2
        }
    }


def generate_response(state: AgentState) -> AgentState:
    """Response generation agent"""
    llm = ChatOpenAI(model="gpt-4o")

    prompt = f"""Write a friendly answer based on the following information:

Intent: {state['current_intent']}
Reference information: {state['knowledge_context']}
Account info: {state['account_info']}
Customer message: {state['messages'][-1].content}"""

    response = llm.invoke([{"role": "user", "content": prompt}])
    return {"messages": [response]}


def check_escalation(state: AgentState) -> AgentState:
    """Check whether a human agent is needed"""
    should_escalate = (
        state["current_intent"] == "complaint" and 
        state["account_info"].get("tier") == "premium"
    )
    return {"should_escalate": should_escalate}


# Routing function for the conditional edge
def route_after_check(state: AgentState) -> str:
    if state["should_escalate"]:
        return "escalate"
    return "respond"


def escalate_to_human(state: AgentState) -> AgentState:
    return {
        "messages": [
            {"role": "assistant", "content": "I'll connect you with a specialist."}
        ]
    }


# Assemble the graph
def build_graph():
    workflow = StateGraph(AgentState)

    # Add nodes
    workflow.add_node("classify", classify_intent)
    workflow.add_node("retrieve", retrieve_knowledge)
    workflow.add_node("lookup", lookup_account)
    workflow.add_node("check_escalation", check_escalation)
    workflow.add_node("respond", generate_response)
    workflow.add_node("escalate", escalate_to_human)

    # Define the flow
    workflow.add_edge(START, "classify")
    workflow.add_edge("classify", "retrieve")
    workflow.add_edge("retrieve", "lookup")
    workflow.add_edge("lookup", "check_escalation")

    # Conditional branch
    workflow.add_conditional_edges(
        "check_escalation",
        route_after_check,
        {"respond": "respond", "escalate": "escalate"}
    )
    
    workflow.add_edge("respond", END)
    workflow.add_edge("escalate", END)
    
    return workflow.compile()


# Run it
graph = build_graph()
result = graph.invoke({
    "messages": [{"role": "user", "content": "My bill looks wrong!"}],
    "current_intent": "",
    "knowledge_context": "",
    "account_info": {},
    "should_escalate": False
})
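
One detail in the code above is easy to miss: each node returns only the keys it changed, and those partial updates are merged into the shared state. Keys declared with a reducer (like `messages` with `add_messages`) are combined, and the rest are overwritten. Here is a rough plain-Python sketch of that merge rule. `append_messages`, `REDUCERS`, and `apply_update` are hypothetical names for illustration, and the real `add_messages` does more than simple concatenation.

```python
from typing import Callable


def append_messages(old: list, new: list) -> list:
    # Reducer: combine instead of overwrite (a simplified stand-in for add_messages).
    return old + new


# Keys listed here are merged with a reducer; every other key is overwritten.
REDUCERS: dict[str, Callable] = {"messages": append_messages}


def apply_update(state: dict, update: dict) -> dict:
    merged = dict(state)
    for key, value in update.items():
        reducer = REDUCERS.get(key)
        merged[key] = reducer(state.get(key, []), value) if reducer else value
    return merged


state = {"messages": ["user: My bill looks wrong"], "current_intent": ""}
state = apply_update(state, {"current_intent": "billing"})            # overwritten
state = apply_update(state, {"messages": ["assistant: Let me check."]})  # appended
print(state)
```

This is why `generate_response` can return just `{"messages": [response]}` without wiping out the conversation so far.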

Why LangGraph is good

1. You can visualize it

from IPython.display import Image, display
display(Image(graph.get_graph().draw_mermaid_png()))

You can render the graph as an image, so even a complex flow is readable at a glance.

2. You can stop midway and pick up later

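from langgraph.checkpoint.memory import MemorySaver

memory = MemorySaver()
# build_graph() above already returns a compiled graph, so calling .compile() on its
# result again would fail. This assumes build_graph() is changed to end with
# `return workflow.compile(checkpointer=memory)`.
graph = build_graph()

config = {"configurable": {"thread_id": "user-123"}}
result = graph.invoke({"messages": [{"role": "user", "content": "My bill looks wrong!"}]}, config)

# Later: continue the same conversation by reusing the same thread_id
new_message = {"role": "user", "content": "I was charged twice."}  # example follow-up
result = graph.invoke({"messages": [new_message]}, config)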
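
If it is not obvious what the `thread_id` buys you, the following toy sketch may help. It is not LangGraph's implementation, just the idea under the assumption that a checkpointer stores one state per thread and reloads it on the next call. `InMemoryCheckpointer` and `run_turn` are hypothetical names.

```python
class InMemoryCheckpointer:
    """Toy stand-in for a checkpointer: one saved state per thread_id."""

    def __init__(self) -> None:
        self._store: dict[str, dict] = {}

    def load(self, thread_id: str) -> dict:
        return self._store.get(thread_id, {"messages": []})

    def save(self, thread_id: str, state: dict) -> None:
        self._store[thread_id] = state


def run_turn(checkpointer: InMemoryCheckpointer, thread_id: str, user_message: str) -> dict:
    state = checkpointer.load(thread_id)  # resume where this thread left off
    messages = state["messages"] + [f"user: {user_message}"]
    messages.append(f"assistant: reply #{len(messages) // 2 + 1}")  # fake agent reply
    new_state = {"messages": messages}
    checkpointer.save(thread_id, new_state)
    return new_state


cp = InMemoryCheckpointer()
run_turn(cp, "user-123", "My bill looks wrong")
second = run_turn(cp, "user-123", "It was charged twice")  # same thread: history kept
other = run_turn(cp, "user-456", "Hello")                  # new thread: starts empty
print(len(second["messages"]), len(other["messages"]))
```

The second call on `user-123` sees the earlier exchange, while `user-456` starts from scratch. Swapping the in-memory store for a database is what makes sessions survive a restart.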

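3. Adding human approval is easy

from langgraph.types import interrupt

# Assumes AgentState is extended with `requires_approval: bool` and `approved: bool`,
# and that the graph is compiled with a checkpointer (interrupt needs one to pause and resume).
def human_approval_node(state: AgentState) -> AgentState:
    if state["requires_approval"]:
        # Execution pauses here until the graph is resumed with the manager's decision
        approval = interrupt("Refund is 500,000 KRW or more, so manager approval is required")
        return {"approved": approval}
    return state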

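When LangGraph is a good fit

✅ Recommended:

You want direct control over the flow
Log tracing or auditing matters
There is a lot of complex branching
You need to keep sessions alive
You already use LangChain

❌ Not recommended:

You need a prototype fast (there is a fair amount to learn)
The flow is simple enough that a graph is overkill

CrewAI: splitting work by role, like a team

CrewAI takes a different angle. There is no graph. Instead it feels like giving teammates roles and handing them work, much like running a real team.

Core ideas

Agent = a team member (you can set a role, a goal, and even a personality)
Task = a piece of work
Crew = the team

The code should give you the feel right away: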

from crewai import Agent, Task, Crew, Process
from crewai_tools import SerperDevTool


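# Define the team members
classifier_agent = Agent(
    role="Customer intent analyst",
    goal="Work out exactly what the customer wants and classify it correctly",
    backstory="""You have years of customer support experience and
    can quickly tell whether something is a billing issue,
    a technical issue, or a complaint.""",
    verbose=True,
    allow_delegation=False
)

researcher_agent = Agent(
    role="Information retrieval specialist",
    goal="Find the information needed to solve the customer's problem",
    backstory="""You know the company's policies and processes inside out
    and can find the relevant information for any inquiry.""",
    tools=[SerperDevTool()],
    verbose=True
)

response_agent = Agent(
    role="Customer response specialist",
    goal="Write answers that leave the customer satisfied",
    backstory="""You can calm down even an angry customer.
    Your tone is professional yet warm,
    and customers feel respected.""",
    verbose=True
)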


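# Define the tasks
# Only keys passed to kickoff(inputs=...) can appear as {placeholders} in a description.
# Output from earlier tasks reaches later ones through `context`, not through placeholders.
classification_task = Task(
    description="""Analyze the customer message:

    Message: {customer_message}

    Classify it as one of billing, technical, general, complaint
    and also give the urgency (low/medium/high).""",
    expected_output="Intent classification and urgency",
    agent=classifier_agent
)

research_task = Task(
    description="""Using the classification from the previous task, find the relevant information.

    Summarize the policies or procedures needed to resolve the issue.""",
    expected_output="Relevant policy information and a proposed resolution",
    agent=researcher_agent,
    context=[classification_task]
)

response_task = Task(
    description="""Write an answer based on the classification and research results.

    Original message: {customer_message}

    Keep it friendly and professional.""",
    expected_output="The answer to send to the customer",
    agent=response_agent,
    context=[classification_task, research_task]
)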


# Assemble the crew
customer_service_crew = Crew(
    agents=[classifier_agent, researcher_agent, response_agent],
    tasks=[classification_task, research_task, response_task],
    process=Process.sequential,
    verbose=True
)


# Run it
result = customer_service_crew.kickoff(
    inputs={"customer_message": "My bill looks wrong!"}
)
print(result)
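
The `context=[...]` argument is what chains the tasks together: a later task receives the output of the tasks it lists. As a rough mental model (not CrewAI's actual implementation), a sequential crew behaves something like the sketch below. `SimpleTask` and this `kickoff` function are hypothetical, and lambdas stand in for the agents.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SimpleTask:
    name: str
    run: Callable[[dict, list], str]             # (inputs, context outputs) -> output
    context: list = field(default_factory=list)  # tasks whose output this one needs


def kickoff(tasks: list, inputs: dict) -> dict:
    outputs: dict = {}
    for task in tasks:  # sequential process: tasks run in list order
        upstream = [outputs[dep.name] for dep in task.context]
        outputs[task.name] = task.run(inputs, upstream)
    return outputs


classify = SimpleTask("classify", lambda inp, ctx: "billing")
research = SimpleTask("research", lambda inp, ctx: f"policy for {ctx[0]}", context=[classify])
respond = SimpleTask(
    "respond",
    lambda inp, ctx: f"Re '{inp['customer_message']}': {ctx[1]}",
    context=[classify, research],
)

outputs = kickoff([classify, research, respond], {"customer_message": "My bill looks wrong"})
print(outputs["respond"])
```

Note that a task can only use the output of tasks that ran before it, which is why the order of the task list matters in a sequential process.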

Why CrewAI is good

1. It supports a manager agent

crew = Crew(
    agents=[classifier_agent, researcher_agent, response_agent],
    tasks=[...],
    process=Process.hierarchical,  # hierarchical structure
    manager_llm="gpt-4o",  # model for the manager; given as a model name so no extra import is needed
    verbose=True
)

The manager decides on its own who should do what.

2. Memory

crew = Crew(
    agents=[...],
    tasks=[...],
    memory=True,
    embedder={
        "provider": "openai",
        "config": {"model": "text-embedding-3-small"}
    }
)

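With memory turned on, the crew keeps what it learned in earlier runs and can draw on it in later tasks.

3. Plenty of tools

from crewai_tools import (
    SerperDevTool,      # web search
    ScrapeWebsiteTool,  # web scraping
    FileReadTool,       # file reading
    CodeInterpreterTool # code execution
)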

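When CrewAI is a good fit

✅ Recommended:

You need a prototype fast
Thinking in terms of roles comes naturally to you
Non-developers such as product planners need to understand it too
You need memory or learning features

❌ Not recommended:

You need fine-grained control over the flow
You need exactly the same result every time
You have strict audit logging requirements

AutoGen: solving problems through chat

AutoGen, from Microsoft, takes yet another approach. Agents solve problems by talking to each other, like several people debating in a Slack channel.

Core ideas

Agents exchange messages
The conversation continues until a termination condition is met
A human is just another participant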

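# Note: this is the classic AutoGen 0.2-style API (the `autogen` module).
# Newer AutoGen releases ship a different API, so check which version you have installed.
from autogen import AssistantAgent, UserProxyAgent, GroupChat, GroupChatManager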
import os

config_list = [{"model": "gpt-4o", "api_key": os.environ["OPENAI_API_KEY"]}]
llm_config = {"config_list": config_list}


# Create the agents
classifier = AssistantAgent(
    name="Classifier",
    system_message="""You are in charge of classifying customer intent.
    Read the message and briefly state whether it is billing/technical/general/complaint
    and how urgent it is.""",
    llm_config=llm_config
)

researcher = AssistantAgent(
    name="Researcher",
    system_message="""You are in charge of research.
    Once the intent is known, find and summarize the relevant policies or solutions.""",
    llm_config=llm_config
)

responder = AssistantAgent(
    name="Responder",
    system_message="""You are in charge of writing the answer.
    Write a friendly reply based on the research.
    When you are done, end with 'TERMINATE'.""",
    llm_config=llm_config
)


human_proxy = UserProxyAgent(
    name="Customer",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=0,
    code_execution_config=False
)


# Set up the group chat
group_chat = GroupChat(
    agents=[human_proxy, classifier, researcher, responder],
    messages=[],
    max_round=10,
    speaker_selection_method="round_robin"
)

manager = GroupChatManager(groupchat=group_chat, llm_config=llm_config)


# Start the conversation
human_proxy.initiate_chat(manager, message="My bill looks wrong!")
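
Stripped of the LLM calls, a round-robin group chat is basically a loop: take turns, append each reply to a shared transcript, and stop on a termination keyword or a round cap. Here is a plain-Python sketch of that idea. `run_group_chat` is a hypothetical function, and the substring check for "TERMINATE" is my own simplification, not necessarily how AutoGen decides to stop.

```python
from itertools import cycle


def run_group_chat(agents: list, first_message: str, max_round: int = 10) -> list:
    """agents is a list of (name, reply_fn); reply_fn takes the transcript so far."""
    transcript = [("Customer", first_message)]
    speakers = cycle(agents)              # round-robin speaker order
    while len(transcript) < max_round:    # hard cap so the chat cannot run forever
        name, reply_fn = next(speakers)
        reply = reply_fn(transcript)
        transcript.append((name, reply))
        if "TERMINATE" in reply:          # termination condition
            break
    return transcript


agents = [
    ("Classifier", lambda t: "intent=billing, urgency=medium"),
    ("Researcher", lambda t: "Refunds are possible within 30 days."),
    ("Responder", lambda t: f"Sorry about that. {t[-1][1]} TERMINATE"),
]
transcript = run_group_chat(agents, "My bill looks wrong")
for speaker, text in transcript:
    print(f"{speaker}: {text}")

# Agents that never terminate are still stopped by max_round.
capped = run_group_chat([("Chatty", lambda t: "still thinking")], "hello", max_round=5)
```

Every agent sees the whole transcript on every turn, which is where the token cost of this style comes from.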

Why AutoGen is good

1. It writes code and runs it too

coder = AssistantAgent(
    name="Coder",
    system_message="You are a Python expert. Solve problems by writing code.",
    llm_config=llm_config
)

executor = UserProxyAgent(
    name="Executor",
    human_input_mode="NEVER",
    code_execution_config={
        "work_dir": "workspace",
        "use_docker": True
    }
)

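executor.initiate_chat(coder, message="Write a compound interest function")

Write code, run it, look at the result, fix it... a great fit for development automation!

2. Good for staging debates

optimist = AssistantAgent(name="Optimist", system_message="Look for the upsides...", llm_config=llm_config)
critic = AssistantAgent(name="Critic", system_message="Look for the problems...", llm_config=llm_config)
synthesizer = AssistantAgent(name="Synthesizer", system_message="Combine the opinions...", llm_config=llm_config)

# A bare `...` after keyword arguments is a syntax error, so the remaining arguments are spelled out
group_chat = GroupChat(agents=[optimist, critic, synthesizer], messages=[], max_round=10)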

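When AutoGen is a good fit

✅ Recommended:

You need to write and run code
The work needs iterative refinement
The answer is best reached through debate
You are building development automation tools

❌ Not recommended:

It has to run the same way every time
Token cost is a concern (conversations get long)
You need tight control over the flow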

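Comparison at a glance

Criterion	LangGraph	CrewAI	AutoGen
In a phrase	Design it as a graph	Split roles like a team	Collaborate through chat
Learning curve	Noticeable	Gentle	Moderate
Token efficiency	Good	Average	Token-hungry
Debugging	Strong (graph visualization)	Via logs	Fairly hard
Production readiness	Ready now	Maturing	Feels research-oriented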

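When to use which?

Scenario	Recommendation
Customer support bot	LangGraph (flow control + audit logs)
Content production pipeline	CrewAI (role split is intuitive)
Code generation/execution	AutoGen (built for running code)
Research automation	LangGraph (handles complex branching)
Quick MVP	CrewAI (easy to learn)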

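Practical tips

Pitfall 1: Don't create too many agents

If three agents would do the job, building twenty turns maintenance into a nightmare.

# ❌ Don't: start with a crew of 20 narrowly scoped agents
# ✅ Do: start with 2-3
# (response_task's context should then list only classification_task)
simple_crew = Crew(
    agents=[classifier_agent, response_agent],
    tasks=[classification_task, response_task]
)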

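Pitfall 2: Watch out for infinite loops

If agents keep handing work back and forth to each other, the run never ends.

# Always set a stop condition
graph.invoke(state, config={"recursion_limit": 25})  # LangGraph: cap on graph steps
group_chat = GroupChat(agents=[...], messages=[], max_round=10)  # AutoGen: cap on chat rounds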

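Pitfall 3: Context explosion

If you keep passing the entire conversation along, you will blow through your tokens. Summarize along the way.

from langchain_openai import ChatOpenAI

def summarize_context(state):
    # A smaller, cheaper model is enough for summarization
    summary_llm = ChatOpenAI(model="gpt-4o-mini")
    summary = summary_llm.invoke([
        {"role": "user", "content": f"Summarize this in 100 characters or fewer: {state['context']}"}
    ])
    return {"context": summary.content}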
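
Summarizing everything on every step is wasteful too, so a common variation is to keep the most recent messages verbatim and fold only the older ones into a summary. Here is a sketch of that in plain Python. `compact_history` and `fake_summarize` are hypothetical names, with `fake_summarize` standing in for an LLM call like the one above.

```python
from typing import Callable


def compact_history(messages: list, keep_last: int, summarize: Callable[[list], str]) -> list:
    """Replace everything except the last keep_last messages with one summary line."""
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    if len(messages) <= keep_last:
        return messages  # short enough, nothing to compact
    older, recent = messages[:-keep_last], messages[-keep_last:]
    return [f"summary: {summarize(older)}"] + recent


def fake_summarize(older: list) -> str:
    # Hypothetical stand-in for an LLM summarization call.
    return f"{len(older)} earlier messages about a billing issue"


history = [f"msg {i}" for i in range(1, 11)]
compacted = compact_history(history, keep_last=3, summarize=fake_summarize)
print(compacted)
```

Ten messages shrink to four entries: one summary plus the last three messages, so the latest turns stay word for word while the older ones cost far fewer tokens.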

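Conclusion: which one should you pick?

LangGraph = when flow control matters and you need production-grade behavior

CrewAI = when you want to build fast and thinking in roles suits you

AutoGen = when you need code execution or debate-style problem solving

Whichever you pick, the principles are the same:

Start with 2-3 agents and add more only when needed
Give every agent a clear role
Always add error handling
Monitoring is a must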
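
For the error handling and monitoring points, a small wrapper around each agent call already goes a long way. The sketch below is framework-agnostic and makes a few assumptions: `call_with_retry` and `flaky_agent` are hypothetical, a print statement stands in for real monitoring, and the fallback is just a label that the caller would map to a human handoff.

```python
import time
from typing import Callable


def call_with_retry(agent_call: Callable[[], str], retries: int = 3,
                    base_delay: float = 0.01, fallback: str = "escalate_to_human") -> str:
    for attempt in range(retries):
        try:
            return agent_call()
        except Exception as exc:
            print(f"attempt {attempt + 1} failed: {exc}")  # monitoring hook
            time.sleep(base_delay * 2 ** attempt)          # exponential backoff
    return fallback  # every attempt failed: hand over instead of crashing


calls = {"count": 0}


def flaky_agent() -> str:
    # Fails twice, then succeeds, to imitate a transient LLM timeout.
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("LLM request timed out")
    return "Here is your answer."


def broken_agent() -> str:
    raise RuntimeError("agent is down")


result = call_with_retry(flaky_agent)
gave_up = call_with_retry(broken_agent, retries=2)
print(result, gave_up)
```

In a real system you would catch narrower exception types and send the failure details to your logging or tracing tool rather than printing them.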

The agents are ready and the frameworks are mature enough. Now it's your turn to build!