The Entrepreneur Forum | Startups | Entrepreneurship | Starting a Business | Motivation | Success

Any data scientists here?


Tipoki13

New Contributor
Jul 8, 2019
26
17
14
Hi everyone,

Are there any data scientists here?

I'm involved with an early-stage start-up called Deepez. The aim of Deepez is to serve the needs of data scientists. We are very much at the problem-discovery stage, and at this point we are just trying to gain some insight into the day-to-day role of a data scientist, and more specifically into any problems or frustrations they encounter regularly. The aim is to build a solution to those problems.

My co-founder is the tech guy in the company, but he really struggles with discovering problems. He has a very solution-based mindset (he spent months building an ML model that solves a problem that doesn't exist).

If there is anyone here with knowledge and expertise in the field of data science, I would really appreciate any insight that could be of value to us.

Thank you!

Conor
 


maverick

Aspice, officio fungeris sine spe honoris ampliori
FASTLANE INSIDER
Read Millionaire Fastlane
I've Read UNSCRIPTED
Speedway Pass
Oct 26, 2012
402
1,041
369
I manage a team of data scientists and engineers at a big corporate. I'm going to be blunt here so brace yourself.

Your target audience needs to be more specific. Who do you mean by 'data scientists'? Who is your ideal customer? On your website, you also seem to use data scientists and engineers interchangeably.
  • Data scientists working at start-ups?
  • Data scientists working at corporates?
  • BI/developers looking to integrate machine learning into their existing products?
  • BI/developers looking to integrate machine learning into new products?
  • Data engineers?
Clearly articulate the problem you're trying to solve.

I'm sure you know that ingesting, parsing and pre-processing data is very time-consuming, especially at corporates. The same goes for exposing outputs to end users. How does this work with your product?

I'll give you an example.

Say I create a demand forecast model for an ice cream brand. The objective is to predict what sales will be over the coming 30 days. We will use historical sales data and merge it with weather data from an external party. In short, the phases we need to go through are:

Ingesting => Parsing => Pre-processing => Modelling => Evaluating
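
To make those five phases concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption for illustration: the CSV contents are made up, the column names (`units`, `temp_c`) are hypothetical, and a one-variable least-squares line stands in for whatever forecasting model a real team would use.

```python
import csv
import io
from statistics import mean

# Hypothetical raw inputs: in practice these would come from a sales system
# and an external weather provider. All values are made up for illustration.
SALES_CSV = """date,units
2019-06-01,120
2019-06-02,135
2019-06-03,
2019-06-04,160
2019-06-05,150
2019-06-06,180
"""
WEATHER_CSV = """date,temp_c
2019-06-01,20
2019-06-02,22
2019-06-03,24
2019-06-04,26
2019-06-05,25
2019-06-06,29
"""

def ingest(text):
    """Ingesting: read raw rows as dictionaries of strings."""
    return list(csv.DictReader(io.StringIO(text)))

def parse(rows, field):
    """Parsing: convert one field to float, using None for blanks."""
    return {r["date"]: (float(r[field]) if r[field] else None) for r in rows}

def preprocess(sales, weather):
    """Pre-processing: join on date and drop days with missing values."""
    return [(weather[d], sales[d]) for d in sorted(sales)
            if sales[d] is not None and weather.get(d) is not None]

def fit(pairs):
    """Modelling: ordinary least squares for units = a + b * temp_c."""
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in pairs) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def evaluate(model, pairs):
    """Evaluating: mean absolute error on held-out days."""
    a, b = model
    return mean(abs(y - (a + b * x)) for x, y in pairs)

data = preprocess(parse(ingest(SALES_CSV), "units"),
                  parse(ingest(WEATHER_CSV), "temp_c"))
train, test = data[:-2], data[-2:]  # hold out the two most recent days
model = fit(train)
mae = evaluate(model, test)
print(f"rows used: {len(data)}, holdout MAE: {mae:.1f} units")
```

Even in this toy version, three of the five functions deal with getting data into shape rather than with the model itself, which is roughly the point of the questions that follow.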

Where does your product fit in? How would I use your product? What is the added benefit of using your product instead of alternatives (e.g. open-source libraries)? How does this differ from Jupyter notebooks?
 

Tipoki13

New Contributor
Jul 8, 2019
26
17
14
Hi,

Thanks for getting back to me. We initially decided to target data scientists in start-ups by offering a cheaper product. However, we've decided to change direction: previously we built a product that made it extremely easy to implement ML in any product. We discovered that this isn't really a problem for data scientists/analysts/engineers.

As you said, one of the biggest pain points is cleaning data. We're now focusing on this and trying to understand the problem better. Is there any insight you could offer on this? Is there any particular activity that is especially painful?

I've been told by other data people (I'm just going to use this general term) that the problem of cleaning data is just too big and difficult to solve. Data comes in various formats, which makes it difficult to produce a one-size-fits-all solution. Would you go along with this? I've been told it's best to zero in on one particular industry or domain.
 

maverick

Aspice, officio fungeris sine spe honoris ampliori
FASTLANE INSIDER
Read Millionaire Fastlane
I've Read UNSCRIPTED
Speedway Pass
Oct 26, 2012
402
1,041
369
Deploying stuff when you're in a startup => Easy
Deploying stuff when you work at a corporate => Hard

So targeting startups with a solution that makes deployments easier does not solve a real need. There are loads of tools available for this, and many can be found directly in the AWS / Azure stack.

Again, you need to really zoom into a problem before you can start thinking about 'building a solution'. What problems are there around data?

  • Big corporates are spread across countries/continents. Everyone has their own data standards. Master data management is usually terrible.

  • Building time series models requires years of historical data. This is not always available.

  • No strategy around the data that is put into data lakes. Usually raw SAP tables are dropped in there without a semantic layer to describe what the data is. To give you an example: your data scientist / business person wants to do something with 'sales order data'. Try explaining to them that the main tables they need are called VBAK, VBAP and VBPA.
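
As a rough illustration of the semantic-layer point, the sketch below maps the three SAP tables named above to business-friendly terms so that a search for 'sales order' finds them. The table descriptions follow the standard SAP meanings (header, item and partner data for sales documents); the friendly names and the `find_tables` helper are hypothetical and not part of any SAP tooling.

```python
# Hypothetical semantic layer: maps raw SAP table names to business terms.
SEMANTIC_LAYER = {
    "VBAK": {"name": "sales_order_header",
             "description": "Sales document: header data"},
    "VBAP": {"name": "sales_order_item",
             "description": "Sales document: item data"},
    "VBPA": {"name": "sales_order_partner",
             "description": "Sales document: partner data"},
}

def find_tables(term):
    """Return raw table names whose friendly name or description
    contains every word of the search term (case-insensitive)."""
    words = term.lower().split()
    matches = []
    for table, meta in SEMANTIC_LAYER.items():
        text = f"{meta['name']} {meta['description']}".lower().replace("_", " ")
        if all(word in text for word in words):
            matches.append(table)
    return sorted(matches)

print(find_tables("sales order"))
print(find_tables("partner"))
```

A real data lake would need far more than a dictionary, but even this much metadata lets someone search by business term instead of memorising table codes.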
 

Tipoki13

New Contributor
Jul 8, 2019
26
17
14
OK, so I'm trying to find out more about data cleaning specifically, because the people I've talked to who work in the field pretty much agree that this is the biggest problem.

I'm trying to find out and understand the process of data cleaning a little better:

What are the steps involved?

Are there any steps that are particularly painful/frustrating?

Are there tasks that could be automated, or are there tasks that just can't be automated and are better done by a human?

Are tools being used to make the process less tedious? If so, which tools? Is there anything about these tools that you dislike or feel could be improved?

Any help would be greatly appreciated.
Conor
 

GonnaBe2020

New Contributor
Read Millionaire Fastlane
Dec 4, 2019
15
15
17
What did those people tell you about data cleaning? "Data" is a very wide term, and "cleaning" certainly has a very precise meaning for each of them. It depends on many things, so one cannot come up with a generic solution.

Suppose you try to build a data cleaning system. The users of the system would need to describe to you what is "garbage" and what is actually useful "data" in the input you are receiving, and define how "clean" data should look for their specific usage.

All of this sounds doable, but it is very data-specific in every case. Can you get back to those people and ask more questions to get more details about their problem?
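
One way to picture "users describe what is garbage" is a cleaner that knows nothing about the data and takes all of its rules from the caller. The sketch below is only an illustration of that idea: the `clean` function, the rule format and the sample rows are all hypothetical.

```python
def clean(rows, rules):
    """Split rows into (kept, rejected) using caller-supplied rules.

    `rules` maps a column name to a function that returns True when the
    value in that column is acceptable for this particular use case.
    """
    kept, rejected = [], []
    for row in rows:
        failed = [col for col, ok in rules.items() if not ok(row.get(col))]
        if failed:
            rejected.append((row, failed))  # keep the reason for review
        else:
            kept.append(row)
    return kept, rejected

# Hypothetical rules one team might agree on for a sales feed.
sales_rules = {
    "units": lambda v: isinstance(v, int) and v >= 0,
    "store": lambda v: isinstance(v, str) and v.strip() != "",
}
rows = [
    {"store": "Dublin", "units": 12},
    {"store": "", "units": 7},
    {"store": "Cork", "units": -3},
]
kept, rejected = clean(rows, sales_rules)
for row, failed in rejected:
    print(f"rejected {row}: failed {failed}")
```

The generic part is a dozen lines; the hard, domain-specific part is agreeing on what goes into `sales_rules`, which is why the follow-up questions matter.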
 
