The Entrepreneur Forum | Startups | Entrepreneurship | Starting a Business | Motivation | Success

Any data scientists here?

Tipoki13

New Contributor
Jul 8, 2019
10
9
12
Hi everyone,

Are there any data scientists here?

I'm involved with an early-stage start-up called Deepez, which aims to serve the needs of data scientists. We are very much at the problem discovery stage. At this point we are just trying to gain some insight into the day-to-day role of a data scientist and, more specifically, any problems or frustrations they regularly encounter. The goal is to build a solution that resolves those problems.

My co-founder is the tech guy in the company, but he really struggles with discovering problems. He has a very solution-based mindset (he spent months building an ML model that solves a problem that doesn't exist).

If there is anyone here with knowledge and expertise in the field of data science, I would really appreciate any insight that could be of value to us.

Thank you!

Conor
 


maverick

Unorthodox Nonconformist
FASTLANE INSIDER
Read Millionaire Fastlane
I've Read UNSCRIPTED
Speedway Pass
Oct 26, 2012
365
972
344
I manage a team of data scientists and engineers at a big corporate. I'm going to be blunt here so brace yourself.

Your target audience needs to be more specific. Who do you mean by 'data scientists'? Who is your ideal customer? On your website, you also seem to use data scientists and engineers interchangeably.
  • Data scientists working at start-ups?
  • Data scientists working at corporates?
  • BI / developers looking to integrate machine learning into their existing products?
  • BI / developers looking to integrate machine learning into new products?
  • Data engineers?
Clearly articulate the problem you're trying to solve.

I'm sure you know that ingesting, parsing and pre-processing data is very time consuming, especially at corporates. Same for exposing outputs to end users. How does this work with your product?

I'll give you an example.

Say I create a demand forecast model for an ice cream brand. The objective is to predict what sales will be over the coming 30 days. We will use historical sales data and merge it with weather data from an external party. In short, the phases we need to go through are:

Ingesting => Parsing => Pre-processing => Modelling => Evaluating
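
To make the five phases concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption for illustration: the inline CSV strings, the column names `units` and `temp_c`, and the straight-line model are hypothetical stand-ins, not real data or a recommended forecasting method. A real forecast would need far more history and a proper time series model.

```python
import csv
import io
from datetime import date

# Hypothetical toy inputs; a real project would read exports from the sales
# system and an external weather provider instead of inline strings.
SALES_CSV = """date,units
2019-06-01,120
2019-06-02,135
2019-06-03,
2019-06-04,160
2019-06-05,150
2019-06-06,175
"""
WEATHER_CSV = """date,temp_c
2019-06-01,20
2019-06-02,22
2019-06-03,24
2019-06-04,26
2019-06-05,25
2019-06-06,28
"""


def ingest(text):
    """Ingesting: read raw rows as dictionaries of strings."""
    return list(csv.DictReader(io.StringIO(text)))


def parse(rows, value_field):
    """Parsing: convert strings to typed values, keeping gaps as None."""
    parsed = {}
    for row in rows:
        raw = row[value_field].strip()
        parsed[date.fromisoformat(row["date"])] = float(raw) if raw else None
    return parsed


def preprocess(sales, weather):
    """Pre-processing: join on date and drop days with missing values."""
    return [
        (day, weather[day], units)
        for day, units in sorted(sales.items())
        if units is not None and weather.get(day) is not None
    ]


def fit_line(samples):
    """Modelling: ordinary least squares for units = intercept + slope * temp_c."""
    temps = [temp for _, temp, _ in samples]
    units = [sold for _, _, sold in samples]
    mean_t = sum(temps) / len(temps)
    mean_u = sum(units) / len(units)
    covariance = sum((t - mean_t) * (u - mean_u) for t, u in zip(temps, units))
    variance = sum((t - mean_t) ** 2 for t in temps)
    slope = covariance / variance
    return mean_u - slope * mean_t, slope


def evaluate(model, samples):
    """Evaluating: mean absolute error on held-out days."""
    intercept, slope = model
    errors = [abs(sold - (intercept + slope * temp)) for _, temp, sold in samples]
    return sum(errors) / len(errors)


rows = preprocess(parse(ingest(SALES_CSV), "units"), parse(ingest(WEATHER_CSV), "temp_c"))
train, holdout = rows[:-2], rows[-2:]  # keep the last two days for evaluation
model = fit_line(train)
mae = evaluate(model, holdout)
```

Even in this toy version, three of the five functions exist only to get the data into shape before any modelling happens, which is where most of the time goes in practice.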

Where does your product fit in? How would I use your product? What is the added benefit of using your product instead of alternatives (e.g. open source libraries)? How does this differ from Jupyter notebooks?
 
Tipoki13

Hi,

Thanks for getting back to me. We initially decided to target data scientists in start-ups by offering a cheaper product. However, we've decided to change direction: the product we originally built made it extremely easy to integrate ML into any product, and we discovered that this isn't really a problem for data scientists/analysts/engineers.

Like you said, one of the biggest pain points is cleaning data. We're now trying to focus on this and we're trying to understand the problem more. Is there any insight you could offer on this? Is there any particular activity that is especially painful?

I've been told by other data people (I'm just going to use this general term) that the problem of cleaning data is just too big and difficult to solve. Data comes in various formats etc., which makes it difficult to produce a one-size-fits-all solution. Would you go along with this? I've been told it's best to zero in on one particular industry or domain.
 

maverick

Deploying stuff when you're in a startup => Easy
Deploying stuff when you work at a corporate => Hard

So targeting startups with a solution that makes deployments easier doesn't solve a real need. There are loads of tools available for this, and many can be found directly in the AWS / Azure stack.

Again, you need to really zoom into a problem before you can start thinking about 'building a solution'. What problems are there around data?

  • Big corporates are spread across countries/continents. Everyone has their own data standards. Master data management is usually terrible.

  • Building time series models requires years of historical data. This is not always available.

  • There is no strategy around the data that is put into data lakes. Usually raw SAP tables are dropped in there without a semantic layer to describe what the data is. To give you an example: your data scientist / business person wants to do something with 'sales order data'. Try explaining to them that the main tables they need are called VBAK, VBAP and VBPA.
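
As a rough illustration of what a semantic layer could look like for the last point, here is a minimal Python sketch. The catalog structure, the `TableInfo` class, and the helper functions are hypothetical, and the readable column names are my own labels; check the column meanings against your own SAP system before relying on them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableInfo:
    business_name: str  # readable name a business user would search for
    description: str    # one-line explanation of what a row represents
    columns: dict       # raw column name -> readable column name


# Hypothetical catalog covering the three tables named above.
SEMANTIC_LAYER = {
    "VBAK": TableInfo(
        "sales_order_header",
        "One row per sales order",
        {"VBELN": "order_number", "ERDAT": "created_on", "NETWR": "net_value"},
    ),
    "VBAP": TableInfo(
        "sales_order_item",
        "One row per line item on a sales order",
        {"VBELN": "order_number", "POSNR": "item_number", "MATNR": "material_number"},
    ),
    "VBPA": TableInfo(
        "sales_order_partner",
        "Partners (sold-to, ship-to, and so on) linked to a sales order",
        {"VBELN": "order_number", "PARVW": "partner_function", "KUNNR": "customer_number"},
    ),
}


def find_tables(keyword):
    """Return raw table names whose readable name or description mentions the keyword."""
    needle = keyword.lower()
    return sorted(
        raw
        for raw, info in SEMANTIC_LAYER.items()
        if needle in info.business_name.replace("_", " ")
        or needle in info.description.lower()
    )


def rename_columns(raw_table, row):
    """Translate one raw row into readable column names, leaving unknown columns as-is."""
    columns = SEMANTIC_LAYER[raw_table].columns
    return {columns.get(key, key): value for key, value in row.items()}
```

With something like this in place, a business user can call `find_tables("sales order")` instead of having to know the raw table names in advance.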
 
Tipoki13

OK, so I'm trying to find out more about data cleaning specifically, because pretty much everyone I've talked to who works in the field agrees that this is the biggest problem.

I'm trying to find out and understand the process of data cleaning a little better:

What are the steps involved?

Are there any steps that are particularly painful/frustrating?

Are there tasks that could be automated or are there tasks that just can't be automated and are better done by a human?

Are tools being used to make the process less tedious? If so, what tools? Is there anything about these tools that you dislike or you feel could be improved upon?

Any help would be greatly appreciated.
Conor
 
