How to scrape Understat for football data in Python with requests and BeautifulSoup

McKay Johns

3 years ago

35,633 views


Comments:

@raaghulviswanath284 - 04.03.2021 18:37

Great work man. Appreciate it.

@wartawen - 08.03.2021 00:26

Thank you for providing this tutorial! If I have a list of the match IDs I want to scrape (instead of going one by one), what modifications to the code are necessary? I guess an additional for loop should be written, but I don't know where.
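
One possible way to wrap the single-match steps in a loop is sketched below. It assumes each match page still embeds its shots as `var shotsData = JSON.parse('...')`, and it swaps the BeautifulSoup script lookup for a regular expression so the example stays self-contained. The names `parse_shots`, `scrape_matches` and `fetch_html` are made up for this illustration.

```python
import json
import re

# Assumption: the page contains  var shotsData = JSON.parse('<hex-escaped JSON>')
SHOTS_PATTERN = re.compile(r"shotsData\s*=\s*JSON\.parse\('(.*?)'\)")


def parse_shots(html):
    """Return the decoded shotsData dict ({'h': [...], 'a': [...]}) from page HTML."""
    match = SHOTS_PATTERN.search(html)
    if match is None:
        raise ValueError("shotsData not found in page")
    # Same decoding step as in the video: turn \xNN escapes into characters.
    decoded = match.group(1).encode("utf8").decode("unicode_escape")
    return json.loads(decoded)


def scrape_matches(match_ids, fetch_html):
    """Collect shots for every id; fetch_html(match_id) must return the page HTML."""
    rows = []
    for match_id in match_ids:  # this is the extra outer loop
        data = parse_shots(fetch_html(match_id))
        for side in ("h", "a"):
            for shot in data[side]:
                rows.append({"match_id": match_id, "side": side, **shot})
    return rows
```

With requests, `fetch_html` could be something like `lambda match_id: requests.get(f"https://understat.com/match/{match_id}").text`; passing the fetcher in as an argument keeps the loop easy to try on saved pages first.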

@afiqaimanafr - 08.03.2021 16:08

Sorry, I would like to ask, as I am a beginner: what exactly is the aim of scraping Understat for football data?

@heisei7361 - 12.03.2021 19:39

Just coming across and had to click that subscribe button. You're so informative I wish you were my prof 😂 awesome work man!

@SuperYash1997 - 17.03.2021 01:03

This is really helpful especially for someone starting with football analysis and getting stuck at the initial step of finding the right data. Is there a way to get pass or any event data in general from understat?

@alexbushnell3608 - 01.04.2021 16:44

The web scraping demo here is fantastic, very clear and easy to apply to other aspects of the website. Top man!

@joshcaldwell6946 - 07.05.2021 10:48

This is an awesome tutorial! Thanks so much!

@christopheraryo3040 - 20.05.2021 11:10

THANKYOUUU

@sushantregmi2126 - 20.05.2021 16:34

Great stuff man, which club do you support? Please don't say arsenal

@grogg2243 - 26.05.2021 00:22

Hi mate. Is there a way to visualise the data at the end

@pramitbardhan7725 - 19.06.2021 15:57

But I don't think Understat has any international or CL data, right? Just the leagues, I guess.

@GuardianApe - 01.07.2021 00:44

This had to be done, thanks for sharing your knowledge.

@willykitheka7618 - 02.07.2021 13:37

Am a Real Madrid fan and I subscribed!😁😁😁...thanks for sharing...I will be visiting again!

@Paperscissor183 - 01.08.2021 18:19

Hi, this is a great video. Can you please scrape lotto data?

@Solace_Yard - 21.08.2021 14:50

Hello brother, thanks for the video. I want a scraping project done. Are you able to help, please? We can talk privately.

@bernoulisan9649 - 21.08.2021 16:23

How can I get data manually from a football match, please?

@sriram-uu6yd - 23.08.2021 14:11

Hi, thanks for the video. I scraped the shots data from Understat, but I am not sure how to convert the X and Y values into x-coordinate and y-coordinate values to create a shot map. Can you please give me an idea?
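
A minimal sketch of one way to do that conversion, assuming (as other commenters in this thread observe) that X and Y are fractions between 0 and 1 of the pitch length and width. The 105 x 68 pitch size and the function name `to_pitch_coords` are assumptions for illustration; which corner counts as the origin is not confirmed here.

```python
def to_pitch_coords(x, y, pitch_length=105.0, pitch_width=68.0):
    """Scale fractional X/Y (strings or floats) to pitch units.

    The default pitch size is an assumption; pass the dimensions your
    plotting library expects instead.
    """
    return float(x) * pitch_length, float(y) * pitch_width
```

Because length and width are scaled by different factors, the 0 to 1 range ends up as a rectangle rather than a square once plotted.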

@claudio7614 - 15.09.2021 12:06

Hi, can you help convert the third script on the page, called "rostersData"? I changed the script index from 1 to 2, but even after changing the variables it doesn't work. It seems to be a bit different from the shotsData one... thanks!
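
One likely reason the shots loop fails on the roster script is a different shape: the sketch below assumes each side of the roster data is a dict of player records keyed by id, rather than a list as with shotsData. That shape is an assumption to check against the decoded JSON, and `flatten_by_side` plus the sample field names are hypothetical.

```python
def flatten_by_side(data):
    """Turn {'h': ..., 'a': ...} into one list of dicts tagged with the side.

    Each side may be a list of records (as with shotsData) or a dict of
    records keyed by id (assumed here for the roster data).
    """
    rows = []
    for side in ("h", "a"):
        records = data.get(side, [])
        if isinstance(records, dict):
            records = records.values()  # drop the id keys, keep the records
        for record in records:
            rows.append({"side": side, **record})
    return rows
```

The same helper works on the decoded shots dict, so one code path can serve both scripts once the JSON string has been decoded and loaded.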

@joilsongb436 - 16.10.2021 02:34

can you do this method on the page
b e t 3 6 5 ?
I couldn't with the instruction in this video
Delete the spaces between the words

@emilsaji1762 - 31.10.2021 13:51

Thank you so much broooooo 😍

@bunnybabu1162 - 24.11.2021 11:06

Wowwww

@dr.vojislavhadzimilic3649 - 10.12.2021 23:03

Thank you so much for this video

@samscholes9727 - 13.12.2021 15:21

By converting everything to strings, surely that means we can't manipulate the numbers, since there aren't any numbers, just strings?
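
The string values can be converted back before doing any arithmetic. A small sketch, assuming the numeric fields of interest are `X`, `Y` and `xG` (the keys used elsewhere in this thread); `with_numbers` is a made-up helper name.

```python
NUMERIC_FIELDS = ("X", "Y", "xG")  # assumed list of fields worth converting


def with_numbers(shot):
    """Return a copy of one shot dict with its numeric fields cast to float."""
    converted = dict(shot)
    for field in NUMERIC_FIELDS:
        if field in converted:
            converted[field] = float(converted[field])
    return converted
```

In a pandas DataFrame the equivalent step would be something like `df[["X", "Y", "xG"]].astype(float)` or `pd.to_numeric` on each column.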

@GLDTruth - 24.12.2021 22:47

I had this working a while back, but went to run another game, and I'm getting this error:
NameError Traceback (most recent call last)
<ipython-input-6-dbe8a73dafcb> in <module>()
1 res = requests.get(url)
----> 2 soup = BeautifulSoup(res.content, 'lxml')
3 scripts = soup.find_all('script')

NameError: name 'BeautifulSoup' is not defined
Nothing else changed but the match id. Thank you for your tutorials

@Radiofreak87 - 17.01.2022 20:26

Could you explain the coordinate system that this dataframe uses a bit more? I can't understand where the origin (x,y)=(0,0) is located, because these coordinates are always positive (>0). Great video btw, GJ

😀

@BlueSkyGoldSun - 22.01.2022 23:58

Nice. Where can I learn football analytics?
And is it possible to land a job in football analytics?

@davidchapman2629 - 02.03.2022 03:33

Great video! I'm trying to do this in Java, do you know how to do the encode & decode in Java? I'm talking about this line:

encode('utf8').decode('unicode_escape')

Thank you!
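
For anyone porting that line, it may help to see what it does. The sketch below assumes the scraped string uses `\xNN` hex escapes; the sample string is invented. The second version spells the same replacement out by hand, which is the kind of logic that could be rewritten in another language.

```python
import re

# Invented sample in the hex-escaped style (a raw string, so the
# backslashes are literal characters, as they are in the page source).
raw = r"\x7B\x22h\x22\x3A\x5B\x5D\x7D"

# The line from the video: reinterpret \xNN escapes as characters.
decoded = raw.encode("utf8").decode("unicode_escape")

# The same effect done manually: replace each \xNN with chr(0xNN).
manual = re.sub(r"\\x([0-9A-Fa-f]{2})", lambda m: chr(int(m.group(1), 16)), raw)

print(decoded)  # {"h":[]}
```

The manual version only handles `\xNN` escapes, whereas `unicode_escape` also covers other forms such as `\uNNNN`.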

@henriquefriedrich5960 - 12.03.2022 15:36

Superb content man! Btw I have good memories of Barcelona, my team (Internacional) defeated them in 2006 with Adriano Gabiru's goal.

@andrkevichandvetal - 21.03.2022 22:50

Thank you very much, man! It is helpful for my graduation work in university

@goergejohn6986 - 23.03.2022 16:51

Do you know how I can scrape multiple matches/pages on that website?

@marianolambolla1013 - 21.04.2022 05:22

Great Video! Congrats! You could get the entire json converted directly to dataframe by doing:

import ast
pd.read_json(json.dumps(ast.literal_eval(str(data_json['h']))))

@qurramzaheer3882 - 26.04.2022 05:44

Hey, I was wondering: if I want to scrape multiple pages, what kind of timeout should I be using between each request? Thanks for the very helpful video
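
No specific limit is stated anywhere in this thread, so the sketch below simply pauses a fixed number of seconds between requests. The 2-second default is an arbitrary assumption, not a documented limit, and `fetch_politely` is a hypothetical helper name.

```python
import time


def fetch_politely(urls, fetch, delay_seconds=2.0, sleep=time.sleep):
    """Call fetch(url) for each url, pausing between consecutive requests.

    delay_seconds is an arbitrary, assumed value; `sleep` is injectable so
    the pause can be skipped or recorded when testing.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # wait before every request except the first
        results.append(fetch(url))
    return results
```

With requests, `fetch` could be `lambda url: requests.get(url, timeout=10).text`; note that the `timeout` argument there limits how long a single request may hang, which is separate from the delay between requests.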

@davidbrightnyirenda761 - 25.06.2022 21:25

Awesome video bro... help me write a program to alert me when my variable of choice (team) scores, gets a yellow card, wins a corner kick, etc. I need to be able to punch in the id of the team and the id of the variable I want to keep an eye on, hook it up to the internet and let it scrape while I wait for the program to alert me if the id (goal, corner, yellow card, penalty, odd) is True...

U get the idea....

@mathijshartmann2118 - 06.07.2022 10:24

Can I ask what the x and y mean in the match?

@inakigoya5959 - 01.08.2022 09:29

Hey man, excellent video!! I started a master's in data science and I wanted to practice with something related to football. I will use this for my FPL team

@zoeksnarf7 - 01.09.2022 14:23

Great tutorial, cheers McKay. Instant new sub!

@rmanalista9322 - 05.10.2022 15:30

Guys, I get the following error:
json_data = json_data.encode('uft8').decode('unicode_escape')
LookupError: unknown encoding: uft8
Do you know why I get this error, and how can I solve it?
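
The error message points at the codec name: `'uft8'` has two letters swapped, and Python only recognises `'utf8'` (or `'utf-8'`). A short sketch that checks a codec name and shows the corrected line; `codec_exists` and the sample string are made up for illustration.

```python
import codecs


def codec_exists(name):
    """Return True if Python knows a text encoding by this name."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


# Invented stand-in for the scraped, hex-escaped string.
sample = r"\x7B\x22a\x22\x3A\x5B\x5D\x7D"

# Corrected line: 'utf8', not 'uft8'.
json_data = sample.encode("utf8").decode("unicode_escape")
```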

@andreascalleja336 - 28.10.2022 20:05

Don't know if this has already been posted, but the nested for loops can be replaced with the following code:

for shot_event in data_home:
    x.append(shot_event['X'])
    y.append(shot_event['Y'])
    xg.append(shot_event['xG'])
    team.append(shot_event['h_team'])

And the same for the away team.
Much cleaner imo this way - No nested loops and no multiple ifs.

@MaartenRobaeys - 24.07.2023 21:49

Does the GitHub file still exist?

@chefjuan6322 - 02.08.2023 02:11

thanks man you saved few hours of my coding

@Qwertythemouse - 13.10.2023 14:40

As far as the transformation from JSON to pd.DataFrame is concerned, this one also works:

# Combine 'h' and 'a' dictionaries into a single list
combined_data = data['h'] + data['a']

# Create a DataFrame from the combined data
df = pd.DataFrame(combined_data)

# Display the DataFrame
df

So it really does create a full data frame from the JSON, with the home/away parameter as a column. Then anyone can try their own cleaning, wrangling or usage of the Understat data.

@AlimpanDey - 22.11.2023 23:54

What are that x and y? If those are the x,y coordinates, then why do they range from 0 to 1? Then it would be a square...
Please, someone help me out with this.

@braziliandre30 - 01.12.2023 23:01

Thank you for taking the time to do this! Been wanting to learn it for a while but lacked the basic skills to start and run run by run. I'd be great if there was a way to just pick a team and start scraping their data from each game for a specific time period... Maybe there's already more work on this as well. Either way I appreciate it!

@avazbektolibjonov4035 - 14.01.2024 09:25

great video lesson

@richardogujawa-oldaccount1336 - 11.02.2024 03:35

Thanks McKay, learned a lot from this!

@mobhamjee786 - 12.02.2024 18:26

How would you plot this for the shot map?

@sravanjs6749 - 08.05.2024 11:09

Please do a video on scraping data and saving it to a CSV file for pizza, radar and other charts.
🙏

@brandonflexer10 - 12.09.2024 19:21

Great video! Have you found a way to iterate over the competitions to retrieve all match URLs for each competition/season? Or, given the structure of Understat, do we have to collect all of them manually?

@Jacek.. - 14.09.2024 05:34

Where can I download updated scraped data from the Understat website? On GitHub someone shared a package with CSV files, but it was last updated 3 years ago. I'm not familiar with Python and can't update the data myself.
