I’m working on a Google Sheet that is importing some pages from Fangraphs, but there is one page that I have not been able to crack.
I’d like to import the Steamer ROS page and I can get the first 50 players, but that’s it. Someone taught me to put “page=1_1000” at the end of the link and that has worked on some pages but not this one.
Here’s my formula:
=importhtml(“2021 Projections » steamerr » Batters | FanGraphs Baseball”)
Does anyone have a theory that they think could solve this issue? Is there anyone who has already solved it for themselves and doesn’t mind letting me in on the secret? Any and all advice is appreciated.
I’m fairly certain you can’t pull those programmatically with importhtml because they use js to pull the next 50. I tried to solve this a year or so back and my solution was to use a web crawler like selenium to pull the csv exports to my local machine. This is not however my expertise so someone else maybe have a more google sheets type answer.
I’ve mentioned this in the past to others regarding this, but I still think the best way to get this data is from the player pages directly. If players that are not rostered need to be included, a series of calls that filter the data—by position, team, player, etc.—could be a fine proxy for the full data set.
For reference, simply include the player IDs:
I haven’t tried it yet. Will post an update once I do.
If this doesnt work out I can dig up the selenium script i had written to pull steamer projections and email it to you. If you are familiar with python and have an environment to do so, its not too complicated. A cursory google told me that you can upload local csvs to google sheets programatically using their api, so you could add that to the selenium script and have a fangraphs page → google sheets solution