The Entrepreneur Forum | Financial Freedom | Starting a Business | Motivation | Money | Success

Any data scientists here?

maverick

Aspice, officio fungeris sine spe honoris ampliori
Read Fastlane!
Read Unscripted!
Speedway Pass
User Power
Value/Post Ratio
228%
Oct 26, 2012
605
1,380
I manage a team of data scientists and engineers at a big corporate. I'm going to be blunt here so brace yourself.

Your target audience needs to be more specific. Who do you mean by 'data scientists'? Who is your ideal customer? On your website, you also seem to use data scientists and engineers interchangeably.
  • Data scientists working at start-ups?
  • Data scientists working at corporates?
  • BI / Developers looking to integrate machine learning into their existing products?
  • BI / Developers looking to integrate machine learning into new products?
  • Data engineers?
Clearly articulate the problem you're trying to solve.

I'm sure you know that ingesting, parsing and pre-processing data is very time consuming, especially at corporates. Same for exposing outputs to end users. How does this work with your product?

I'll give you an example.

Say I create a demand forecast model for an ice cream brand. The objective is to predict sales over the coming 30 days. We use historical sales data and merge it with weather data from an external party. In short, the phases we need to go through are:

Ingesting => Parsing => Pre-processing => Modelling => Evaluating
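
To make the five phases concrete, here is a minimal Python sketch using only the standard library. Everything in it is an assumption for illustration: the inline CSV text, the column names (`units_sold`, `max_temp_c`), the toy numbers, and the choice of a one-variable linear regression. A real forecast would use far more history and a proper time series model.

```python
import csv
import io
from datetime import date

# Hypothetical toy data standing in for the real sales and weather feeds.
SALES_CSV = """date,units_sold
2019-07-01,120
2019-07-02,150
2019-07-03,n/a
2019-07-04,200
2019-07-05,170
2019-07-06,130
2019-07-07,220
2019-07-08,160
"""

WEATHER_CSV = """date,max_temp_c
2019-07-01,20
2019-07-02,23
2019-07-03,25
2019-07-04,28
2019-07-05,25
2019-07-07,30
2019-07-08,24
"""


def ingest(raw_text):
    """Ingesting: read raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def parse(rows, value_field):
    """Parsing: convert strings to typed values keyed by date, skipping bad rows."""
    parsed = {}
    for row in rows:
        try:
            parsed[date.fromisoformat(row["date"])] = float(row[value_field])
        except (KeyError, TypeError, ValueError):
            continue  # e.g. the "n/a" sales figure above
    return parsed


def preprocess(sales, weather):
    """Pre-processing: keep only days present in both sources, ordered by date."""
    days = sorted(set(sales) & set(weather))
    return [(weather[d], sales[d]) for d in days]


def fit_linear(pairs):
    """Modelling: least-squares fit of units = slope * temperature + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def evaluate(model, pairs):
    """Evaluating: mean absolute error of the model on held-out days."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in pairs) / len(pairs)


pairs = preprocess(
    parse(ingest(SALES_CSV), "units_sold"),
    parse(ingest(WEATHER_CSV), "max_temp_c"),
)
train, holdout = pairs[:-2], pairs[-2:]  # hold out the two most recent days
model = fit_linear(train)
mae = evaluate(model, holdout)
print(f"usable days: {len(pairs)}, slope: {model[0]:.2f}, holdout MAE: {mae:.2f}")
```

Even in this toy version, most of the code deals with getting the two sources into a joinable shape, not with the model itself.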

Where does your product fit in? How would I use your product? What is the added benefit of using your product instead of alternatives (e.g. open source libraries)? How does this differ from Jupyter notebooks?
 

Tipoki13

Contributor
User Power
Value/Post Ratio
112%
Jul 8, 2019
33
37
Hi everyone,

Are there any data scientists here?

I'm involved with an early-stage start-up called Deepez. The aim of Deepez is to serve the needs of data scientists. We are very much at the problem discovery stage, and at this point we are just trying to gain some insight into the day-to-day role of a data scientist and, more specifically, any problems or frustrations they encounter regularly. The aim is to create a possible solution that resolves such problems.

My co-founder is the tech guy in the company, but he really struggles with discovering problems. He has a very solution-based mindset (he spent months building an ML model that solves a problem that doesn't exist).

If there is anyone here with knowledge and expertise in the field of data science, I would really appreciate any insight that could be of value to us.

Thank you!

Conor
 

Tipoki13

Hi,

Thanks for getting back to me. We initially decided to target data scientists in start-ups by offering a cheaper product. However, we've decided to change direction: previously, we built a product that made it extremely easy to integrate ML into any product. We discovered that this isn't really a problem for data scientists/analysts/engineers.

Like you said, one of the biggest pain points is cleaning data. We're now trying to focus on this and we're trying to understand the problem more. Is there any insight you could offer on this? Is there any particular activity that is especially painful?

I've been told by other data people (I'm just going to use this general term) that the problem of cleaning data is just too big and difficult to solve. Data comes in various formats etc., which makes it difficult to produce a one-size-fits-all solution. Would you go along with this? I've been told it's best to zero in on one particular industry or domain.
 

maverick

Deploying stuff when you're in a startup => Easy
Deploying stuff when you work at a corporate => Hard

So targeting startups with a solution that makes deployments easier doesn't solve a real need. Plenty of tools already do this, and many can be found directly in the AWS / Azure stack.

Again, you need to really zoom into a problem before you can start thinking about 'building a solution'. What problems are there around data?

  • Big corporates are spread across countries/continents. Everyone has their own data standards. Master data management is usually terrible.

  • Building time series models requires years of historical data. This is not always available.

  • No strategy around data that is put into data lakes. Usually raw SAP tables are dropped in there without adding a semantic layer to describe what the data is. To give you an example: your data scientist / business person wants to do something with 'sales order data'. Try explaining to them that the main tables they need are called VBAK, VBAP and VBPA.
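
As a rough illustration of the semantic layer idea in the last bullet, here is a minimal Python sketch that maps a business term to the raw SAP tables behind it. The dictionary layout and the helper names `tables_for` and `describe` are hypothetical; only the three table names come from the example above, and the descriptions are the standard short texts for those tables.

```python
# A tiny, hypothetical semantic layer: business terms mapped to raw SAP tables.
SEMANTIC_LAYER = {
    "sales order data": {
        "VBAK": "Sales document: header data",
        "VBAP": "Sales document: item data",
        "VBPA": "Sales document: partner",
    },
}


def tables_for(business_term):
    """Return {table: description} for a business term (case-insensitive)."""
    key = business_term.strip().lower()
    if key not in SEMANTIC_LAYER:
        raise KeyError(f"No mapping recorded for {business_term!r}")
    return SEMANTIC_LAYER[key]


def describe(table_name):
    """Explain a raw table name in business language, if a mapping exists."""
    name = table_name.strip().upper()
    for term, tables in SEMANTIC_LAYER.items():
        if name in tables:
            return f"{name}: {tables[name]} (part of '{term}')"
    return f"{name}: no description recorded"


for table, meaning in tables_for("Sales order data").items():
    print(table, "->", meaning)
print(describe("vbap"))
```

In practice this kind of mapping would live in a data catalogue rather than in code, but the point is the same: someone has to write down what each table means before anyone else can use it.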
 

Tipoki13

OK, so I'm trying to find out more about data cleaning specifically, because the people I've talked to who work in the field pretty much all agree that this is the biggest problem.

I'm trying to find out and understand the process of data cleaning a little better:

What are the steps involved?

Are there any steps that are particularly painful/frustrating?

Are there tasks that could be automated or are there tasks that just can't be automated and are better done by a human?

Are tools being used to make the process less tedious? If so, what tools? Is there anything about these tools that you dislike or you feel could be improved upon?

Any help would be greatly appreciated.
Conor
 

GonnaBe2020

New Contributor
Read Fastlane!
User Power
Value/Post Ratio
113%
Dec 4, 2019
15
17
What did those guys tell you about data cleaning? "Data" is a very wide term, and "cleaning" certainly has a very precise meaning for each of them, depending on many things. One cannot come up with a generic solution...

Suppose you try to build a data cleaning system. The users of the system would need to describe to you what is "garbage" and what is actual useful "data" in the input you are receiving, and define how "clean" data should look for their specific usage...

All this sounds doable, but it is very data-specific in every case. Can you get back to those people and ask more questions, to get more details about their problem?
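
To show what "users define what is garbage" could look like, here is a minimal Python sketch of a cleaning step driven entirely by user-supplied rules. The function `clean`, the rule names, and the sample records are all hypothetical; a different user or domain would plug in different rules.

```python
def clean(records, rules):
    """Split records into (clean, rejected) using user-defined rules.

    `rules` maps a rule name to a function that returns True for acceptable
    records. Rejected records are paired with the names of the rules they failed.
    """
    clean_rows, rejected = [], []
    for record in records:
        failed = [name for name, check in rules.items() if not check(record)]
        if failed:
            rejected.append((record, failed))
        else:
            clean_rows.append(record)
    return clean_rows, rejected


# Hypothetical rules one user might supply for daily store sales.
SALES_RULES = {
    "has_store_id": lambda r: bool(r.get("store_id")),
    "non_negative_units": lambda r: isinstance(r.get("units"), (int, float))
    and r["units"] >= 0,
}

records = [
    {"store_id": "A1", "units": 12},
    {"store_id": "", "units": 5},
    {"store_id": "B2", "units": -3},
]

good, bad = clean(records, SALES_RULES)
print("clean:", good)
for record, reasons in bad:
    print("rejected:", record, "failed:", reasons)
```

The engine itself is generic and small. The hard, data-specific part is the rules, which is why the details from those conversations matter.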
 