The Entrepreneur Forum | Financial Freedom | Starting a Business | Motivation | Money | Success

Coding required for data extraction?

Daytraderz

Contributor
Speedway Pass
User Power
Value/Post Ratio
100%
Feb 11, 2016
67
67
27
Virginia
Hey guys, I had an idea today. Because data is currently so expensive, it is probably a large barrier to entry in many industries. Yet seeing how braggadocious data companies are online with their fancy interactive charts, I am wondering whether there is a way to extract that data. Because the chart is interactive rather than a normal static chart, I feel like there is some sort of underlying code that makes it possible. I am unsure, as I am no programmer, but if any of you have an idea of how this could be done and what one would have to learn to do it, I would appreciate it!
 

Ninjakid

Platinum Contributor
Speedway Pass
User Power
Value/Post Ratio
217%
Jun 23, 2014
1,936
4,206
Buddy Guy Eh
I literally never look at charts from data companies, but I'm going to say no, because I would imagine their interactive charts are in the form of an animation which lives on their server's backend, and is only available for viewing.
 

johnwmintz

Contributor
Read Fastlane!
User Power
Value/Post Ratio
147%
Jan 22, 2017
17
25
44
Anniston, AL
I would also say no. What you are describing is web scraping. If the data isn't listed on the page, it may not be possible. Fire up a Linux VM and see if you can do it with some bash commands or maybe Python?
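
A quick way to test whether the numbers are actually listed on the page is to parse the HTML and pull out any table cells. This is only a minimal sketch using Python's standard library; `SAMPLE_HTML` and `TableCellParser` are made-up names for illustration and do not come from any real data vendor's site.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows, each a list of cell strings
        self._row = None       # row currently being built, or None
        self._in_cell = False  # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._in_cell:
            self._row[-1] += data.strip()


# Hypothetical page fragment, standing in for a downloaded page.
SAMPLE_HTML = """
<table>
  <tr><th>Month</th><th>Revenue</th></tr>
  <tr><td>1</td><td>6000</td></tr>
  <tr><td>2</td><td>7000</td></tr>
</table>
"""

parser = TableCellParser()
parser.feed(SAMPLE_HTML)
rows = parser.rows
print(rows)
```

If `rows` comes back empty for a real page, the chart data is probably loaded some other way and this simple approach will not find it.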
 

lowtek

Legendary Contributor
FASTLANE INSIDER
EPIC CONTRIBUTOR
Read Fastlane!
Summit Attendee
Speedway Pass
User Power
Value/Post Ratio
332%
Oct 3, 2015
2,161
7,178
42
Phoenix, AZ
Of course you can, in a roundabout way. It's actually pretty straightforward.

You couldn't directly download their data, but you could reverse engineer it. I believe students had to do this before the era of computers... you know...drawing things on paper by hand.

You just have to map from the known pixel space of your screen, to the parameter space of whatever they're plotting.

Simple example:

Company X is plotting its monthly revenue. For simplicity, just assume it grows linearly. Time is the horizontal axis, in months. Thousands of dollars is the vertical axis. They give one data point per month.

Here's the heuristic:

1) Take a screen shot of the data

2) Make a note of the axes - how much is each tick mark? Let's say one horizontal tick is a month, and every vertical tick is $5,000 USD

3) Calculate the distance between vertical ticks, in pixels. This distance is equivalent to $5,000 USD - suppose it's 1000 pixels between ticks, so each pixel is $5

4) Calculate the distance between horizontal ticks, in pixels. This distance is equivalent to 1 month. Say it's also 1000 pixels, equivalent to 1 month.

5) Take the pixel coordinates of the plot's origin to be (0, 0)

6) Find the pixel coordinates of the center of the first dot. Let's say the center of the data point is at (1000, 1200). Then you know that at month 1 (1000 pixels) they made $5,000 / 1000 pixels * 1200 pixels = $6,000 USD

7) Rinse and repeat for every other data point, and then you have their whole data set.
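
The steps above can be sketched in a few lines of Python. This is a rough illustration rather than a finished tool: the names `AxisCalibration` and `pixel_to_data` are invented for this example, and the numbers are the ones assumed in the steps (1000 pixels per tick, one month per horizontal tick, $5,000 per vertical tick). The `y_axis_down` flag is an extra assumption to cover raw screenshots, where pixel rows usually count downward from the top.

```python
from dataclasses import dataclass


@dataclass
class AxisCalibration:
    origin_x: float            # pixel x of the plot origin
    origin_y: float            # pixel y of the plot origin
    px_per_x_tick: float       # pixels between horizontal ticks
    px_per_y_tick: float       # pixels between vertical ticks
    x_per_tick: float          # data units per horizontal tick (months)
    y_per_tick: float          # data units per vertical tick (dollars)
    y_axis_down: bool = False  # True if pixel y grows downward (screenshots)


def pixel_to_data(cal, px, py):
    """Map a pixel coordinate to (x, y) in the plot's own units."""
    dx = px - cal.origin_x
    dy = py - cal.origin_y
    if cal.y_axis_down:
        dy = -dy  # flip so that "up" on the chart is positive
    x = dx * cal.x_per_tick / cal.px_per_x_tick
    y = dy * cal.y_per_tick / cal.px_per_y_tick
    return x, y


# Calibration from the worked example: origin at (0, 0), 1000 px per tick.
cal = AxisCalibration(
    origin_x=0, origin_y=0,
    px_per_x_tick=1000, px_per_y_tick=1000,
    x_per_tick=1, y_per_tick=5000,
)

month, revenue = pixel_to_data(cal, 1000, 1200)
print(month, revenue)  # 1.0 6000.0
```

Step 7 is then just a loop over the pixel coordinates of every dot you read off the screenshot.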

The logic is going to work the same for every other type of 2D relationship. Parabolas, circles, hyperbolas, etc. Pie charts and histograms would be a cakewalk. If you get into 3D plots, then you have matrix operations to deal with. This isn't quite as trivial, but it's not a show stopper. If they are doing things like heat maps, it would get tricky because you would have to look at the gradient in the color scale and transform it into pixel space.
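
For the heat map case, one simple approach (an assumption on my part, not the only way to do it) is to sample the color bar at a few known values and then match each pixel to the nearest sampled color. The `legend` values below are invented for illustration, and `color_to_value` is a hypothetical helper name.

```python
def color_to_value(rgb, legend):
    """Return the legend value whose color is closest to rgb.

    legend is a list of (value, (r, g, b)) pairs sampled along the
    chart's color bar. Closeness is squared distance in RGB space.
    """
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    value, _ = min(legend, key=lambda entry: dist2(entry[1], rgb))
    return value


# Hypothetical color bar: blue = 0, purple = 50, red = 100.
legend = [
    (0, (0, 0, 255)),
    (50, (128, 0, 128)),
    (100, (255, 0, 0)),
]

print(color_to_value((250, 10, 5), legend))   # 100
print(color_to_value((120, 5, 130), legend))  # 50
```

The more points you sample from the color bar, the finer the values you can recover. Nearest-color matching only gives you the sampled values back, so it is a coarse estimate at best.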

EDIT: complications arise if the plots are continuous but the data is actually discrete. In that case they've done some interpolation, and you would need more information to separate the original data points from the continuous plot. In other words, if they don't show any data points but just lines, you won't know what the original data was exactly.

I'd have to think about that case a little more, but I think you could probably make some simplifying assumptions to get a "good enough" answer.

You probably couldn't sell the data, but if you were spying on a competitor.... you could get quite a bit of information they may not want you to have.
 

Ninjakid

Platinum Contributor
Speedway Pass
User Power
Value/Post Ratio
217%
Jun 23, 2014
1,936
4,206
Buddy Guy Eh
So basically the AI detects the pixel changes, and matches with the appropriate type of graph to generate its own?

It's definitely interesting and possible. I wonder if it would be worth the effort, though.
 

G-Man

Cantankerous Contributor
FASTLANE INSIDER
EPIC CONTRIBUTOR
Read Fastlane!
Read Unscripted!
Summit Attendee
Speedway Pass
User Power
Value/Post Ratio
543%
Jan 13, 2014
2,001
10,863
The data is so cheap from people like IRI that I don't know why you'd bother with doing this, unless it's @lowtek's spying scenario :clench:

This feels like the "business" version of some college kid spending 3 hours looking for a torrent site instead of paying 9.99 for the DVD. It only makes sense if you think of your time as being free.
 

Daytraderz

Contributor
Speedway Pass
User Power
Value/Post Ratio
100%
Feb 11, 2016
67
67
27
Virginia
@Ninjakid So if you find good data from a data company and it's free, you don't use it or even look at it?
 

Daytraderz

Contributor
Speedway Pass
User Power
Value/Post Ratio
100%
Feb 11, 2016
67
67
27
Virginia
@G-Man Tell me where to get local RE market data and statewide/nationwide RE data for $9.99/month. You would save me a lot of effort.
 

woodyb23

New Contributor
User Power
Value/Post Ratio
33%
Oct 7, 2015
6
2
42
Columbus, Ohio
I am a developer who prepares the data for these types of reports and helps people create the charts. It is not possible to extract the data unless it has been given to you. The charts you are looking at are more than likely built with Tableau and are not susceptible to any type of scraping.
 
