DE Basic Terminology

📘 Data Engineering & Platform Glossary

✅ Keyword 📌 Core definition 💡 Practical points 🎯 Interview points

📊 1. Data Architecture & Design

Row-based vs Column-based (Row vs Column)

📌 Core definition

  • Row-based: stores data row by row. Optimized for OLTP systems (e.g. MySQL, PostgreSQL).
  • Column-based: stores data column by column. Optimized for OLAP systems and analytics (e.g. Parquet, ORC, ClickHouse).

💡 Practical points

  • Row-based: reads and writes whole rows at a time, which suits real-time transactions
  • Column-based: Parquet, ORC, ClickHouse, Druid → well suited to analytical queries
  • Compression: column-based storage keeps values of the same data type next to each other, so it compresses much better
  • Columnar DBs work well with query optimizations such as vectorization and late materialization

🎯 Interview points

  • “Why do data warehouses prefer column-based storage?” → Analytical queries scan many rows but only a few columns. Reading just those columns, and compressing them well, minimizes I/O cost.
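The sketch below is a toy in plain Python, not a real storage engine. The records and function names are hypothetical. It shows why the column layout helps analytics: summing one column touches only that column's values, and adjacent repeats in a column collapse under simple run-length encoding.

```python
from itertools import groupby

# Hypothetical sample records: (order_id, country, amount)
rows = [
    (1, "KR", 120),
    (2, "KR", 80),
    (3, "US", 200),
    (4, "US", 50),
    (5, "US", 30),
]

# Column layout: one contiguous list per column
columns = {
    "order_id": [r[0] for r in rows],
    "country": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}

def total_amount_row_store(rows):
    # A row store walks every full record even though only 'amount' is needed
    return sum(r[2] for r in rows)

def total_amount_column_store(columns):
    # A column store reads only the 'amount' column
    return sum(columns["amount"])

def run_length_encode(values):
    # Adjacent repeated values collapse into (value, count) pairs, which is
    # why sorted, low-cardinality columns compress so well
    return [(v, len(list(g))) for v, g in groupby(values)]

print(total_amount_row_store(rows), total_amount_column_store(columns))  # 480 480
print(run_length_encode(columns["country"]))  # [('KR', 2), ('US', 3)]
```

Real columnar formats such as Parquet combine encodings like this with general-purpose compression. The principle is the same.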

CAP

📌 Core definition
According to the CAP theorem, a distributed system can guarantee at most two of the following three properties at the same time. Network partitions cannot be ruled out in practice, so the real choice during a partition is between C and A:

  • Consistency (C): every read returns the most recent write, so all nodes see the same data
  • Availability (A): every request to a non-failed node receives a (non-error) response
  • Partition Tolerance (P): the system keeps operating even when the network between nodes is split

💡 Practical points

  • Kafka: the trade-off is configurable rather than fixed
    • acks=all + min.insync.replicas ≥ 2 + unclean leader election disabled (the default): consistency first (CP-like). Writes are rejected when too few in-sync replicas remain.
    • unclean.leader.election.enable=true: availability first (AP-like). An out-of-sync replica may become leader, at the risk of losing committed messages.
  • Cassandra: AP system (high availability, tolerates temporary inconsistency)
  • ZooKeeper: CP system (consistency first)

🎯 Interview points

  • “Which CAP trade-off does Kafka make?” → With its default settings, Kafka preserves Consistency during a partition and temporarily sacrifices Availability (CP)

OLTP vs OLAP

📌 Core definition

  • OLTP: processes transactions in real time. Suited to environments with many concurrent users.
  • OLAP: suited to large-scale aggregation and analysis. Handles multidimensional queries.

💡 Practical points

  • OLTP: normalized schema, low latency, strong consistency (e.g., banking systems)
  • OLAP: denormalized schema, used in data marts and data warehouses (e.g., sales analysis)

🎯 Interview points

  • “How do you turn OLTP data into OLAP data?” → Extract and clean it with ETL/ELT → model it (e.g., into a star schema) → load it

Normalize / Denormalize

📌 Core definition

  • Normalization: decomposes tables to remove redundancy and guarantee integrity
  • Denormalization: deliberately allows redundancy to speed up queries

💡 Practical points

  • OLTP systems: normalize (data integrity and storage savings)
  • OLAP systems: denormalize (fewer joins, faster queries)

🎯 Interview points

  • “Why denormalize in an analytics environment?” → To avoid many joins and cut query response time

ELT vs ETL

📌 Core definition

  • ETL: Extract → Transform → Load (the traditional approach)
  • ELT: Extract → Load → Transform (emerged with the cloud era)

💡 Practical points

  • ETL: transformations run in a separate engine such as Spark, Informatica, or Talend before loading
  • ELT: raw data is loaded first, then transformed inside BigQuery, Snowflake, etc. (push-down processing)
  • ELT is more flexible and easier to maintain. Falling storage prices make keeping the raw data affordable.

🎯 Interview points

  • “Why has ELT become more popular recently?” → Faster cloud DWHs + cheaper storage + a flexible query model

Batch vs Streaming

📌 Core definition

  • Batch: data accumulates over a period and is then processed all at once
  • Streaming: data is processed in real time as soon as it arrives

💡 Practical points

  • Batch: high volume, accuracy matters; Spark, Hive (often orchestrated with Airflow)
  • Streaming: real-time reaction; Kafka, Flink, Spark Structured Streaming

🎯 Interview points

  • “What are typical use cases that need streaming?” → Real-time fraud detection, real-time log monitoring, IoT data analysis, etc.

Idempotence

📌 Core definition

  • The property that applying the same operation multiple times gives the same result as applying it once

💡 Practical points

  • ETL/ELT retries must produce the same result
  • Use upsert or MERGE strategies together with an idempotent key

🎯 Interview points

  • “Why does idempotence matter in data pipelines?” → It keeps the pipeline safe when a failure causes a task to run more than once
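Here is a minimal in-memory sketch. The table, batch, and key names are hypothetical. It contrasts an append-only load, which duplicates rows when a failed task is retried, with an upsert keyed by an idempotent key, where re-running the same batch leaves the table unchanged.

```python
def append_load(table, batch):
    """Non-idempotent load: every run appends the batch again."""
    table.extend(batch)
    return table

def upsert_load(table, batch, key="order_id"):
    """Idempotent load: rows are merged by key, so re-runs converge to the same state."""
    for row in batch:
        table[row[key]] = row
    return table

# Hypothetical batch extracted from a source system
batch = [
    {"order_id": 1, "amount": 100},
    {"order_id": 2, "amount": 250},
]

appended = []
append_load(appended, batch)
append_load(appended, batch)  # retry after a simulated failure
print(len(appended))  # 4: every row is duplicated

merged = {}
upsert_load(merged, batch)
upsert_load(merged, batch)  # retry after a simulated failure
print(len(merged))  # 2: same as a single successful run
```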

Sharding

📌 Core definition

  • Splitting one table across multiple physical nodes to gain scalability and throughput

💡 Practical points

  • Shard key: choose it for even distribution and query efficiency
  • Cross-shard operations are complex and more expensive
  • Widely used in MongoDB, Elasticsearch, etc.

🎯 Interview points

  • “How do you choose a shard key?” → It should distribute data evenly, rarely change (immutability), and match the query access patterns
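This is a minimal sketch of hash-based shard routing. It assumes shard = stable_hash(key) mod number_of_shards; the helper name and the key format are hypothetical. It shows what routing needs from a shard key: the same key always lands on the same shard, and a high-cardinality key spreads evenly.

```python
import hashlib
from collections import Counter

def shard_for(key, num_shards):
    """Route a key to a shard with a stable hash, modulo the shard count.

    Python's built-in hash() is randomized per process for strings, so a
    SHA-256 digest is used here to get the same answer on every node.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Hypothetical high-cardinality shard key values
user_ids = [f"user-{i}" for i in range(10_000)]
distribution = Counter(shard_for(u, 4) for u in user_ids)
print(sorted(distribution.items()))  # roughly 2,500 keys per shard
```

With plain modulo routing, changing the shard count remaps most keys. Consistent hashing, or a fixed number of virtual shards mapped onto nodes, is a common way to limit that data movement.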

Replication

📌 Core definition

  • Copying data to multiple nodes to guarantee high availability (HA)

💡 Practical points

  • Read scaling (read replicas)
  • Failure recovery (failover)
  • Redundancy, multi-region replication, etc.

🎯 Interview points

  • “How do you handle replication lag?” → Tune the read consistency level and decide how much staleness is acceptable

Consistency vs Latency Trade-off

📌 Core definition

  • Stronger consistency guarantees generally mean longer response times (latency)

💡 Practical points

  • Quorum-based reads/writes (e.g., Cassandra at QUORUM): stronger consistency (LAT ↑)
  • Redis with asynchronous replication: reads from replicas may be stale (LAT ↓)
  • DynamoDB: eventually consistent reads by default (LAT ↓); strongly consistent reads are optional and cost more
  • Many databases offer tunable read consistency options

🎯 Interview points

  • “What consistency level does a distributed cache need?” → A cache usually prioritizes latency, so some consistency can be sacrificed
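The quorum condition used by Dynamo-style replicated stores, such as Cassandra, makes the trade-off concrete. Take N replicas, W write acknowledgements, and R read responses. When R + W > N, every read quorum overlaps every write quorum, so a read reaches at least one replica with the latest acknowledged write. Raising R or W means waiting on more (and slower) replicas. This is a simplified sketch; it ignores sloppy quorums and concurrent-write conflicts.

```python
def is_strongly_consistent(n, r, w):
    """Quorum overlap rule: read and write quorums must intersect (R + W > N)."""
    return r + w > n

# N = 3 replicas
print(is_strongly_consistent(3, 1, 1))  # False: fastest, but reads may be stale
print(is_strongly_consistent(3, 2, 2))  # True: QUORUM reads + QUORUM writes
print(is_strongly_consistent(3, 1, 3))  # True: fast reads, slow writes (W = ALL)
```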

Eventual Consistency

📌 Core definition

  • Guarantees that, once new writes stop, all replicas eventually converge to the same data

💡 Practical points

  • Suits services that do not need real-time accuracy, such as DNS or social-media like counts
  • “Write fast, read eventually accurate” strategy

🎯 Interview points

  • “Give examples of eventually consistent systems.” → Social-media like counts, product wishlist counts in online stores, etc.
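One common way to get convergent counters, like the like counts above, is a CRDT. The sketch below is a grow-only counter (G-Counter). It is a swapped-in illustration, not how any particular service implements it. Each replica accepts increments locally ("write fast"), and merging replica states in any order, any number of times, yields the same total ("eventually accurate").

```python
class GCounter:
    """Grow-only counter CRDT: each replica only increments its own slot."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica_id -> count contributed by that replica

    def increment(self, n=1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def merge(self, other):
        # Element-wise max is commutative, associative and idempotent,
        # so replicas converge regardless of sync order or repeated syncs
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)

    def value(self):
        return sum(self.counts.values())

a, b = GCounter("a"), GCounter("b")
a.increment(3)  # 3 likes land on replica a
b.increment(2)  # 2 likes land on replica b
print(a.value(), b.value())  # 3 2 (temporarily inconsistent)

a.merge(b)
b.merge(a)
print(a.value(), b.value())  # 5 5 (converged)
```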

Throughput vs Latency

📌 Core definition

  • Throughput: amount of work processed per second
  • Latency: time taken to process a single request

💡 Practical points

  • Throughput-oriented: batch pipelines (Spark)
  • Latency-oriented: real-time responses (REST APIs, Kafka)
  • Kafka: more partitions → higher throughput, but latency may also increase

🎯 Interview points

  • “How do you raise throughput in Kafka?” → Tune batch.size and linger.ms + enable compression + add partitions
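Producer batching shows the trade-off well. Below is a simplified, hypothetical model inspired by Kafka's batch.size and linger.ms: a batch is sent when it is full or when its linger time has passed. Here batch size counts messages, whereas Kafka's batch.size is in bytes, and the linger timer is only checked when the next message arrives.

```python
def form_batches(arrival_ms, batch_size, linger_ms):
    """Group message arrival times (ms) into producer batches.

    A batch closes when it holds batch_size messages, or when a new message
    arrives linger_ms or more after the batch's first message.
    """
    batches, current, first_ts = [], [], None
    for ts in arrival_ms:
        if current and ts - first_ts >= linger_ms:
            batches.append(current)  # linger time used up: close the open batch
            current = []
        if not current:
            first_ts = ts  # this message opens a new batch
        current.append(ts)
        if len(current) == batch_size:
            batches.append(current)  # batch full: send immediately
            current = []
    if current:
        batches.append(current)  # flush whatever is left
    return batches

arrivals = list(range(100))  # one message per millisecond for 100 ms

for size, linger in [(1, 0), (10, 20), (50, 20)]:
    batches = form_batches(arrivals, size, linger)
    # Gap between the first and last message of a batch: a rough proxy for
    # how long early messages wait before their batch is sent
    max_wait = max(b[-1] - b[0] for b in batches)
    print(f"batch_size={size}, linger_ms={linger}: {len(batches)} requests, max wait {max_wait} ms")
```

This prints 100, 10 and 5 requests, with maximum waits of 0, 9 and 19 ms. Fewer, larger requests raise throughput, but individual messages wait longer.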

🏗️ 2. Data Modeling & Schema

Fact & Dimension

📌 Core definition

  • Fact table: stores numeric event data (measures, metrics)
  • Dimension table: stores attributes and context (definitions, descriptions)

💡 Practical points

  • Fact: a large table that keeps growing over time and references dimensions through foreign keys
  • Dimension: attributes split out for convenient analysis (e.g., product name, customer segment)
  • Both are essential building blocks of a star schema

🎯 Interview points

  • “How do you tell a fact table from a dimension table?” → Data that accumulates over time and is numeric goes in a fact table; descriptive information goes in a dimension
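This minimal sketch with hypothetical product and sales data shows how the two table types work together. Each fact row carries measures plus a foreign key. Analysis joins it to the dimension and groups by a descriptive attribute. In SQL this is roughly `SELECT p.category, SUM(f.revenue) FROM fact_sales f JOIN dim_product p ON f.product_key = p.product_key GROUP BY p.category`.

```python
from collections import defaultdict

# Dimension table: descriptive attributes keyed by a surrogate key (hypothetical data)
dim_product = {
    1: {"name": "Keyboard", "category": "Electronics"},
    2: {"name": "Mouse", "category": "Electronics"},
    3: {"name": "Desk", "category": "Furniture"},
}

# Fact table: one row per sale event, numeric measures plus a foreign key
fact_sales = [
    {"product_key": 1, "quantity": 2, "revenue": 100},
    {"product_key": 2, "quantity": 1, "revenue": 30},
    {"product_key": 3, "quantity": 1, "revenue": 250},
    {"product_key": 1, "quantity": 1, "revenue": 50},
]

def revenue_by(attribute):
    """Join each fact row to its dimension row, then sum revenue per attribute value."""
    totals = defaultdict(int)
    for row in fact_sales:
        dim_row = dim_product[row["product_key"]]
        totals[dim_row[attribute]] += row["revenue"]
    return dict(totals)

print(revenue_by("category"))  # {'Electronics': 180, 'Furniture': 250}
```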

SCD

Slowly Changing Dimension

📌 Core definition

  • Strategies for tracking and managing dimension data that changes over time

💡 Practical points

  • SCD Type 1: overwrite the value (history is lost)
  • SCD Type 2: store each change as a new row (with start/end date columns)
  • Table formats such as Iceberg support MERGE INTO, which makes this easy to implement

🎯 Interview points

  • “What are the pros and cons of SCD Type 2?” → History is preserved, but queries become more complex
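Here is a minimal SCD Type 2 sketch over an in-memory dimension. The column names customer_id, valid_from, valid_to, and is_current, and the 9999-12-31 open-end date, are common conventions assumed for illustration. A change closes the current version and appends a new one; an unchanged re-run adds nothing.

```python
from datetime import date

OPEN_END = date(9999, 12, 31)  # conventional "still valid" end date for current rows

def apply_scd2(dim_rows, customer_id, new_attrs, change_date):
    """Close the current version of a member and append a new version."""
    for row in dim_rows:
        if row["customer_id"] == customer_id and row["is_current"]:
            if all(row.get(k) == v for k, v in new_attrs.items()):
                return dim_rows  # attributes unchanged: no new version (safe to re-run)
            row["valid_to"] = change_date  # close the old version
            row["is_current"] = False
    dim_rows.append({
        "customer_id": customer_id,
        **new_attrs,
        "valid_from": change_date,
        "valid_to": OPEN_END,
        "is_current": True,
    })
    return dim_rows

def as_of(dim_rows, customer_id, d):
    """Point-in-time lookup: the version whose validity interval contains d."""
    return next(r for r in dim_rows
                if r["customer_id"] == customer_id and r["valid_from"] <= d < r["valid_to"])

# Hypothetical customer dimension with one current row
customers = [
    {"customer_id": 7, "city": "Seoul", "valid_from": date(2023, 1, 1),
     "valid_to": OPEN_END, "is_current": True},
]
apply_scd2(customers, 7, {"city": "Busan"}, date(2024, 6, 1))
apply_scd2(customers, 7, {"city": "Busan"}, date(2024, 6, 1))  # re-run adds nothing

for r in customers:
    print(r["city"], r["valid_from"], r["valid_to"], r["is_current"])
# Seoul 2023-01-01 2024-06-01 False
# Busan 2024-06-01 9999-12-31 True
```

The `as_of` filter (`valid_from <= d < valid_to`) is the extra query complexity that Type 2 brings.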

Schema Evolution

📌 Core definition

  • The ability to let a data storage structure (schema) be extended or changed over time

💡 Practical points

  • Supported by Iceberg, Delta Lake, Avro, etc.
  • When adding, removing, or retyping fields, keep the schema compatible with existing readers and data
  • With a schema registry, the evolution (compatibility) policy can be declared explicitly

🎯 Interview points

  • “How do you manage schema evolution safely?” → Define a backward/forward compatibility policy + automate compatibility tests
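The check below is a simplified, hypothetical backward-compatibility test. Backward compatible here means a reader on the new schema can read records written with the old one. The schema representation and the rules are a deliberate simplification of Avro-style resolution, not the actual Avro or Schema Registry implementation.

```python
def is_backward_compatible(old_fields, new_fields):
    """Can a reader using new_fields read records written with old_fields?

    Schemas are dicts of field name -> {"type": ..., optional "default": ...}.
    Simplified rules:
      * a field added in the new schema needs a default, because old records lack it
      * a field dropped from the new schema is fine, the reader just ignores it
      * a field in both schemas must keep its type (type promotions are not modeled)
    """
    for name, spec in new_fields.items():
        if name not in old_fields:
            if "default" not in spec:
                return False
        elif old_fields[name]["type"] != spec["type"]:
            return False
    return True

def read_with(new_fields, record):
    """Resolve an old record against the new schema: fill defaults, ignore unknown fields."""
    return {name: record[name] if name in record else spec["default"]
            for name, spec in new_fields.items()}

v1 = {"user_id": {"type": "long"}, "email": {"type": "string"}}
v2_ok = {**v1, "country": {"type": "string", "default": "unknown"}}
v2_bad = {"user_id": {"type": "long"}, "country": {"type": "string"}}

print(is_backward_compatible(v1, v2_ok))   # True
print(is_backward_compatible(v1, v2_bad))  # False: new field without a default
print(read_with(v2_ok, {"user_id": 1, "email": "a@example.com"}))
# {'user_id': 1, 'email': 'a@example.com', 'country': 'unknown'}
```

A check like this can run in CI on every schema change, which is one way to implement the "automate compatibility tests" point above.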

Star vs Snowflake

📌 Core definition

  • Star: a fact table plus a single level of dimension tables
  • Snowflake: dimensions are normalized further and split into multiple related tables

💡 Practical points

  • Star: simple structure, fast queries (well suited to OLAP)
  • Snowflake: saves space and eases maintenance (normalization-based), at the cost of more joins

🎯 Interview points

  • “Why is the star schema preferred in practice?” → Fewer joins + a structure that is easy for users to understand

Data Type Optimization

📌 Core definition

  • Choosing data types efficiently with precision, memory, and query performance in mind

💡 Practical points

  • Small-range integers → INT8; flags → BOOLEAN
  • Low-cardinality strings → ENUM or dictionary encoding
  • Dates/times → store as integer-based timestamps rather than strings

🎯 Interview points

  • “What goes wrong when data types are poorly chosen?” → Slower queries, wasted storage, and indexes that become ineffective
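This small sketch shows dictionary encoding and the byte cost of integer widths, using only Python's standard `array` module. The status values are hypothetical. Columnar formats such as Parquet apply similar dictionary encoding in their writers, but this is not their implementation.

```python
from array import array

def dictionary_encode(values):
    """Map each distinct string to a small integer code (dictionary encoding)."""
    dictionary, index, codes = [], {}, []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    # 'B' = unsigned 1-byte integers: enough for up to 256 distinct values
    # (array() raises OverflowError beyond that)
    return dictionary, array("B", codes)

# Hypothetical low-cardinality status column
statuses = ["PAID", "SHIPPED", "PAID", "CANCELLED", "PAID", "SHIPPED"]
dictionary, codes = dictionary_encode(statuses)
print(dictionary)      # ['PAID', 'SHIPPED', 'CANCELLED']
print(list(codes))     # [0, 1, 0, 2, 0, 1]
print(codes.itemsize)  # 1 byte per value

# The same codes held as 8-byte integers (like a BIGINT column) take 8x the space
print(array("q", list(codes)).itemsize)  # 8
```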

Schema Registry

📌 Core definition

  • Manages message schemas centrally to guarantee compatibility between producers and consumers

💡 Practical points

  • Practically essential when using Kafka with Avro
  • Configure a schema compatibility policy (BACKWARD, FORWARD, etc.)
  • Provides versioning and automatic compatibility checks

🎯 Interview points

  • “What problems arise without a schema registry?” → Message format mismatches cause parsing failures and deserialization errors

Data Vault

📌 Core definition

  • An enterprise data warehouse modeling method designed for history preservation, scalability, and auditability

💡 Practical points

  • Built from Hubs (business keys), Links (relationships), and Satellites (attribute history)
  • Tracks changes and manages SCD-style history systematically
  • Harder to adopt, but strong for governance

🎯 Interview points

  • “When is Data Vault better than a star schema?” → When structures change frequently and history tracking and auditing matter

🔧 3. Data Processing & Optimization

Backfilling

📌 Core definition

  • Filling in missing historical data after the fact. Typically done after an ETL pipeline outage or when a new pipeline is developed.

💡 Practical points

  • Needed for data consistency, but the reference point in time must be clearly defined before running it
  • Idempotence is essential (prevents duplicates)
  • Reprocess a specified time range with Airflow + Spark

🎯 Interview points

  • “What should you watch out for when backfilling?” → Duplicate data, late-arriving data, time zones, distorted metrics, etc.
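Here is a minimal sketch of an idempotent daily backfill. It assumes a hypothetical target "table" that maps a dt partition value to its rows. Each partition in the requested range is recomputed from the source and overwritten as a whole, so re-running the same range is harmless. The table, events, and helper names are illustrative only.

```python
from datetime import date, timedelta

def daily_partitions(start, end):
    """Yield every logical date from start to end, inclusive."""
    d = start
    while d <= end:
        yield d
        d += timedelta(days=1)

def backfill(table, source_events, start, end):
    """Rebuild each daily partition from the source and overwrite it as a whole.

    Replacing the partition instead of appending to it is what makes
    re-running the same date range idempotent.
    """
    for day in daily_partitions(start, end):
        dt = day.isoformat()
        table[dt] = [e for e in source_events if e["dt"] == dt]
    return table

# Hypothetical source events (dt assumed to be in one agreed time zone)
events = [
    {"dt": "2024-01-01", "amount": 10},
    {"dt": "2024-01-02", "amount": 20},
    {"dt": "2024-01-02", "amount": 5},
    {"dt": "2024-01-03", "amount": 7},
]
# Target table with one stale, partially loaded partition
table = {"2024-01-02": [{"dt": "2024-01-02", "amount": 20}]}

backfill(table, events, date(2024, 1, 1), date(2024, 1, 3))
backfill(table, events, date(2024, 1, 1), date(2024, 1, 3))  # re-run is harmless
print({dt: sum(r["amount"] for r in rows) for dt, rows in sorted(table.items())})
# {'2024-01-01': 10, '2024-01-02': 25, '2024-01-03': 7}
```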

Partitioning

📌 Core definition

  • Splitting a table by a criterion (e.g., date, region) so that queries run faster

💡 Practical points

  • Time-based partitioning is the most common (e.g., on a dt column)
  • Built into Hive, BigQuery, Iceberg, etc.
  • Most effective when the partition key matches frequently used filter conditions

🎯 Interview points

  • “Why is partitioning important?” → It limits the range of data scanned → less I/O and faster queries

CDC

📌 Core definition

  • Change Data Capture: detecting changes in a source DB in real time or near real time and delivering them to downstream systems

💡 Practical points

  • Tools: Debezium, Maxwell, StreamSets, Oracle GoldenGate, etc.
  • Approaches: log-based (reading the database's change log, e.g., MySQL binlog or PostgreSQL WAL), trigger-based, and timestamp comparison
  • Changes are published to Kafka and then processed with Spark/Flink

🎯 Interview points

  • “What must you consider when implementing CDC?” → Ordering guarantees, duplicate handling, and handling schema changes
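This minimal sketch applies a CDC change stream to an in-memory replica. It covers the three concerns above: restoring source order, skipping duplicate deliveries, and deletes. The event shape is an assumption. The `op` codes c/u/d mirror Debezium's create/update/delete and `after` mirrors its row-image field, but `key` and `lsn` are simplified, hypothetical fields, and per-key positions would be persisted in a real pipeline.

```python
def apply_changes(target, events):
    """Apply CDC events to a key-value replica, tolerating duplicates and reordering.

    Each event: {"op": "c" | "u" | "d", "key": ..., "after": dict or None, "lsn": int}
    where lsn is the event's position in the source change log.
    """
    last_lsn = {}  # per key: log position of the last change applied
    for e in sorted(events, key=lambda e: e["lsn"]):  # restore source log order
        key = e["key"]
        if e["lsn"] <= last_lsn.get(key, -1):
            continue  # duplicate delivery (at-least-once): already applied
        if e["op"] == "d":
            target.pop(key, None)
        else:
            target[key] = e["after"]
        last_lsn[key] = e["lsn"]
    return target

# Hypothetical change events, delivered out of order and with one duplicate
events = [
    {"op": "c", "key": 1, "after": {"name": "Kim"}, "lsn": 10},
    {"op": "u", "key": 1, "after": {"name": "Lee"}, "lsn": 12},
    {"op": "c", "key": 2, "after": {"name": "Park"}, "lsn": 11},
    {"op": "u", "key": 1, "after": {"name": "Lee"}, "lsn": 12},  # duplicate
    {"op": "d", "key": 2, "after": None, "lsn": 13},
]
print(apply_changes({}, events))  # {1: {'name': 'Lee'}}
```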

Indexing Strategy

📌 Core definition

  • Defining indexes on columns to speed up data lookups

💡 Practical points

  • Index types include B-Tree, Bitmap, and Hash
  • Well-chosen indexes speed up reads; too many indexes slow down writes
  • Column order in a composite index matters (analyze the WHERE conditions and sort order)

🎯 Interview points

  • “When can an index hurt performance?” → Unnecessary or duplicate indexes, or indexes on frequently updated columns

Compressed Formats (Parquet, ORC)

📌 Core definition

  • Column-based compressed file formats for analytical data that reduce storage size and improve I/O performance

💡 Practical points

  • Parquet: Spark's default data source format, also widely supported by Hive, BigQuery, and others
  • ORC: a format optimized for Hive
  • The columnar layout gives high compression ratios and fast queries

🎯 Interview points

  • “Why use Parquet instead of CSV?” → Better compression + columnar reads + schema embedded in the file

Partition Pruning

📌 Core definition

  • An optimization that skips irrelevant partitions at query time so that only the needed ones are read

💡 Practical points

  • Works only when the WHERE clause filters on the partition key
  • Supported by Hive, Spark, and Iceberg
  • The benefit grows with the share of partitions that can be skipped, although a huge number of tiny partitions adds small-file and metadata overhead

🎯 Interview points

  • “What happens when partition pruning does not apply?” → A full scan → degraded performance
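Here is a minimal sketch of what pruning does before any data file is opened. It assumes a hypothetical Hive-style layout with one `dt=YYYY-MM-DD` directory per day. The filter on the partition key is evaluated against the directory names alone, so only matching partitions are read. If the filter were on a different column, for example an event timestamp instead of dt, this step could discard nothing and every partition would be scanned.

```python
from datetime import date

# Hypothetical Hive-style layout: one directory per dt value for January 2024
partition_paths = [f"s3://bucket/events/dt=2024-01-{day:02d}/" for day in range(1, 32)]

def partition_value(path, key="dt"):
    """Extract the value of a key=value path segment."""
    for segment in path.strip("/").split("/"):
        if segment.startswith(key + "="):
            return segment.split("=", 1)[1]
    raise ValueError(f"no {key}= segment in {path}")

def prune(paths, start, end):
    """Keep only partitions whose dt lies in [start, end]; the rest are never opened."""
    return [p for p in paths if start <= date.fromisoformat(partition_value(p)) <= end]

# Equivalent filter: WHERE dt BETWEEN '2024-01-10' AND '2024-01-12'
selected = prune(partition_paths, date(2024, 1, 10), date(2024, 1, 12))
print(len(partition_paths), "->", len(selected))  # 31 -> 3
```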

Join Optimization

📌 Core definition

  • Strategies for minimizing data movement and computation cost during joins

💡 Practical points

  • Broadcast join: loads the small table into memory on every executor (Spark)
  • Sort-merge join: sorts both sides on the join key; suited to joining large tables
  • Join order and filter order also affect performance

🎯 Interview points

  • “How do you speed up joins in Spark?” → Broadcast joins, filtering early, and a suitable partitioning strategy
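The two join strategies above are shown here as single-process Python sketches with hypothetical data, not Spark's implementation. The broadcast hash join builds a hash table only on the small side. The sort-merge join sorts both sides and merges them in one pass. Both return the same inner-join result.

```python
from operator import itemgetter

def broadcast_hash_join(large, small, key):
    """Inner join that builds a hash table on the small side only.

    In Spark the small side is shipped to every executor, so the large
    side is joined where it already sits, without a shuffle.
    """
    lookup = {}
    for small_row in small:
        lookup.setdefault(small_row[key], []).append(small_row)
    return [{**big_row, **small_row}
            for big_row in large
            for small_row in lookup.get(big_row[key], [])]

def sort_merge_join(left, right, key):
    """Inner join that sorts both sides on the key and merges them in one pass."""
    left = sorted(left, key=itemgetter(key))
    right = sorted(right, key=itemgetter(key))
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Pair this left row with every right row sharing the key
            k = j
            while k < len(right) and right[k][key] == lk:
                out.append({**left[i], **right[k]})
                k += 1
            i += 1  # keep j: the next left row may have the same key
    return out

# Hypothetical large fact-like side and small lookup side
orders = [{"order_id": i, "country_code": c} for i, c in enumerate(["KR", "US", "KR", "JP"])]
countries = [
    {"country_code": "KR", "country": "Korea"},
    {"country_code": "US", "country": "United States"},
]

bhj = broadcast_hash_join(orders, countries, "country_code")
smj = sort_merge_join(orders, countries, "country_code")
by_id = itemgetter("order_id")
print(sorted(bhj, key=by_id) == sorted(smj, key=by_id))  # True
print(len(bhj))  # 3: the JP order has no matching country
```

In Spark, tables smaller than `spark.sql.autoBroadcastJoinThreshold` (10 MB by default) are broadcast automatically. `pyspark.sql.functions.broadcast()` requests it explicitly.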

Vectorization

📌 Core definition

  • Processing data a block at a time instead of one value at a time, exploiting CPU SIMD (Single Instruction, Multiple Data) instructions

💡 Practical points

  • Spark SQL: vectorized Parquet/ORC readers are enabled by default (e.g., spark.sql.parquet.enableVectorizedReader)
  • Used in Arrow-based processing, Pandas UDFs, etc.
  • Works well with columnar formats

🎯 Interview points

  • “What is vectorized execution and when does it help?” → In bulk data operations, processing contiguous memory in batches maximizes CPU efficiency

Columnar Storage

📌 Core definition

  • A storage layout that keeps data column by column, improving analytical query performance and compression

💡 Practical points

  • Parquet, ORC, ClickHouse, etc.
  • Column-level I/O + high compression + well suited to vectorized processing
  • Fits analytics-oriented OLAP environments

🎯 Interview points

  • “What is the difference between row and column storage?” → Row: transactions; Column: analytics

🏢 4. Modern Data Platform & Governance

Data Mart / Lakehouse

📌 Core definition 💡 Practical points 🎯 Interview points

Data Mesh / Discovery / Hub

📌 Core definition 💡 Practical points 🎯 Interview points

Data Catalog

📌 Core definition 💡 Practical points 🎯 Interview points

Data Quality

📌 Core definition 💡 Practical points 🎯 Interview points

Lineage

📌 Core definition 💡 Practical points 🎯 Interview points

Governance

📌 Core definition 💡 Practical points 🎯 Interview points

Metadata Management

📌 Core definition 💡 Practical points 🎯 Interview points

Data Contract

📌 Core definition 💡 Practical points 🎯 Interview points

⚡ 5. Kafka

KRaft vs ZooKeeper

📌 Core definition 💡 Practical points 🎯 Interview points

Partition Strategy

📌 Core definition 💡 Practical points 🎯 Interview points

Consumer Groups & Offsets

📌 Core definition 💡 Practical points 🎯 Interview points

Exactly Once

📌 Core definition 💡 Practical points 🎯 Interview points

Schema Registry

📌 Core definition 💡 Practical points 🎯 Interview points

Kafka Connect

📌 Core definition 💡 Practical points 🎯 Interview points

Transactions & Idempotence

📌 Core definition 💡 Practical points 🎯 Interview points

Backpressure

📌 Core definition 💡 Practical points 🎯 Interview points

🚀 6. Spark

Tungsten

📌 Core definition 💡 Practical points 🎯 Interview points

RDD vs DF vs DS

📌 Core definition 💡 Practical points 🎯 Interview points

Partitioning Strategy

📌 Core definition 💡 Practical points 🎯 Interview points

Join Strategies (Broadcast, Shuffle)

📌 Core definition 💡 Practical points 🎯 Interview points

DPP (Dynamic Partition Pruning)

📌 Core definition 💡 Practical points 🎯 Interview points

AQE

Catalyst

📌 Core definition 💡 Practical points 🎯 Interview points

Streaming

📌 Core definition 💡 Practical points 🎯 Interview points

Checkpointing

📌 Core definition 💡 Practical points 🎯 Interview points

Event Time vs Processing Time

📌 Core definition 💡 Practical points 🎯 Interview points

Watermark

📌 Core definition 💡 Practical points 🎯 Interview points

Window Operations

📌 Core definition 💡 Practical points 🎯 Interview points

Checkpointing & Savepoint

📌 Core definition 💡 Practical points 🎯 Interview points

Backpressure

📌 Core definition 💡 Practical points 🎯 Interview points

Exactly-Once Processing

📌 Core definition 💡 Practical points 🎯 Interview points

State Management (State Backend)

📌 Core definition 💡 Practical points 🎯 Interview points

CEP (Complex Event Processing)

📌 Core definition 💡 Practical points 🎯 Interview points

🗄️ 8. Table Formats (Iceberg & Delta Lake)

ACID Transactions

📌 Core definition 💡 Practical points 🎯 Interview points

Time Travel

📌 Core definition 💡 Practical points 🎯 Interview points

Schema Evolution

📌 Core definition 💡 Practical points 🎯 Interview points

Partition Evolution

📌 Core definition 💡 Practical points 🎯 Interview points

Compaction

📌 Core definition 💡 Practical points 🎯 Interview points

Metadata Management

📌 Core definition 💡 Practical points 🎯 Interview points

Z-ordering

📌 Core definition 💡 Practical points 🎯 Interview points

Concurrency Control

📌 Core definition 💡 Practical points 🎯 Interview points

Incremental Reads

📌 Core definition 💡 Practical points 🎯 Interview points

🌊 9. Airflow

Executors

📌 Core definition 💡 Practical points 🎯 Interview points

DAG Design Strategy

📌 Core definition 💡 Practical points 🎯 Interview points

XCom

📌 Core definition 💡 Practical points 🎯 Interview points

Branching

📌 Core definition 💡 Practical points 🎯 Interview points

Sensor

📌 Core definition 💡 Practical points 🎯 Interview points

Hook

📌 Core definition 💡 Practical points 🎯 Interview points

SubDAG

📌 Core definition 💡 Practical points 🎯 Interview points

Dynamic DAGs

📌 Core definition 💡 Practical points 🎯 Interview points

Backfill

📌 Core definition 💡 Practical points 🎯 Interview points

📈 10. Monitoring & Observability

Pipeline Monitoring

Metrics Collection (Prometheus, Grafana)

Log Aggregation

Distributed Tracing

SLA/SLO

Data Drift

☁️ 11. Cloud & Infrastructure

K8s-based Pipelines

Serverless

Container Orchestration

Multi-cloud

Cost Optimization

🧪 12. Testing & Quality

Pipeline Testing Strategy

Data Validation / Profiling

Quality Metrics

A/B Testing Infrastructure