Simple 5 Steps Guide to get started with Apache Hudi and Glue 4.0 and query the data using Athena

Soumil Shah

1 year ago

3,244 views


Comments:

@chetancc - 08.12.2022 21:49

Hi Soumil,
Thanks for sharing. It would be really useful. God bless you.
Thanks,
Chetan from Kandivali, Mumbai, India :)

@lezaaarman6964 - 22.12.2022 22:41

I'm so close to transitioning to Hudi tables, but there's ONE missing feature that is a blocker:

Do you know the best practice for replacing the Glue job bookmark feature?
I'm actually building my own bookmarking capability to add to my new Glue jobs using Hudi (by replicating what the original Glue job bookmark does), but is that the best approach?
My source data is always pushed to S3, so I don't have the option of using a streaming job connected to a Kinesis stream; I just want to use the S3 bucket as the only source.

Thanks
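
A minimal sketch of the do-it-yourself bookmark described in this comment, assuming the bookmark is simply the last-modified timestamp of the newest object already processed. The function name `select_new_objects`, the object keys, and the timestamps are all hypothetical; in a real job the listing would come from S3 and the bookmark would have to be persisted between runs, neither of which is shown here.

```python
from datetime import datetime, timezone


def select_new_objects(objects, bookmark):
    """Return keys modified after the bookmark, plus the advanced bookmark.

    objects:  iterable of (key, last_modified) pairs, e.g. from an S3 listing.
    bookmark: datetime of the newest object already processed, or None on
              the first run (in which case every object is selected).
    """
    new = [(key, ts) for key, ts in objects if bookmark is None or ts > bookmark]
    new.sort(key=lambda item: item[1])  # process oldest first
    next_bookmark = new[-1][1] if new else bookmark
    return [key for key, _ in new], next_bookmark


# Hypothetical listing of the source prefix.
listing = [
    ("raw/a.json", datetime(2022, 12, 20, tzinfo=timezone.utc)),
    ("raw/b.json", datetime(2022, 12, 21, tzinfo=timezone.utc)),
    ("raw/c.json", datetime(2022, 12, 22, tzinfo=timezone.utc)),
]

# The previous run stopped after raw/a.json.
keys, bookmark = select_new_objects(
    listing, datetime(2022, 12, 20, tzinfo=timezone.utc)
)
```

The strict `>` comparison means two objects sharing the exact same timestamp could be split across runs and one of them missed, so a production version would likely also track the keys already processed.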

@AuroraNabi - 02.01.2023 21:22

Where can I find the Hudi MOR table Glue job script? Is it uploaded? I have checked your GitHub but couldn't find much.

@joegenshlea6827 - 21.01.2023 04:17

Great video. I really enjoy your positive energy and passion for the topic.

As a newbie to PySpark and AWS, it would be very nice if you could walk through the code just a touch more. I'm curious about how those parameters in the job setup get injected into the job. I'm also curious about this function:

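from pyspark.sql import SparkSession

def create_spark_session():
    # Set the Kryo serializer on the session (Hudi's Spark guide configures it)
    spark = SparkSession \
        .builder \
        .config('spark.serializer', 'org.apache.spark.serializer.KryoSerializer') \
        .getOrCreate()
    return spark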

The boilerplate script has this line to instantiate a Spark session:

spark = glueContext.spark_session

What is gained by your technique?
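
On the question of how job parameters reach the script: Glue passes each one on the command line as a `--NAME value` pair, and a Glue script normally reads them with `getResolvedOptions(sys.argv, [...])` from `awsglue.utils`. Because `awsglue` is only available inside the Glue runtime, the sketch below uses a hypothetical stand-in, `resolve_options`, to show the idea in plain Python. The parameter names `JOB_NAME` and `TABLE_NAME` and their values are assumptions for illustration, not taken from the video.

```python
def resolve_options(argv, names):
    """Hypothetical stand-in for awsglue.utils.getResolvedOptions.

    Looks up each requested name as a '--NAME value' pair in argv and
    returns the values as a dict keyed by the bare name.
    """
    resolved = {}
    for name in names:
        flag = "--" + name
        if flag not in argv:
            raise KeyError("missing job parameter " + flag)
        # The value is the argument immediately following the flag.
        resolved[name] = argv[argv.index(flag) + 1]
    return resolved


# Roughly what sys.argv might look like inside a job run (assumed values).
example_argv = ["script.py", "--JOB_NAME", "hudi-demo", "--TABLE_NAME", "orders"]

args = resolve_options(example_argv, ["JOB_NAME", "TABLE_NAME"])
table_name = args["TABLE_NAME"]
```

Inside an actual Glue job the equivalent call would be `args = getResolvedOptions(sys.argv, ['JOB_NAME', 'TABLE_NAME'])`, with the values supplied under the job's parameters in the console or API.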

@MunindherReddy - 03.02.2023 03:35

Do Hudi tables not allow $ or other special characters in column names?

@adarshverma5429 - 02.03.2023 15:20

I am getting an error like "failed to upsert for commit time 202303022121655469" while writing the data. Please help me resolve this issue.

@wistyroamlands7495 - 24.04.2023 08:45

I'm so glad you're still making videos. :) I wish you luck in your field of choice and I hope things are going well for you. Thanks for your contributions to society.

@tomyanth - 03.05.2023 12:30

Thanks. It is very clear, and I managed to reproduce this in AWS.

@MaheshWankhede-u2o - 15.01.2024 18:21

Great video. Can you provide the links for the JAR files used for the above script?

@MohamedFazanNismy - 16.05.2024 19:19

Thanks, Soumil. If I open the file, it shows 'page not found'.

@KartikGautam - 19.07.2024 20:02

Hi Soumil,
I am unable to access the PDF. Can you help me with that? Thanks.
