In our previous post, we covered survivorship bias, a tendency to focus on things that have survived and overlook those that didn’t. It misleads data scientists and distorts their reasoning. Far from being an isolated problem, however, survivorship bias is one form of a broader problem: selection bias. Selection bias covers any case in which a data set is not representative of the population it is meant to describe.
Customers often ask us how much data they need to run successful artificial intelligence (AI) and machine learning (ML) projects. This question is hard to answer in simple terms. A functioning ML model requires clean and large data sets, but their optimal size is affected by a range of factors including the complexity of the model, training method, and tolerance for errors. Fortunately, there are several ways of calculating your data needs and overcoming the lack of data.
Across industries, businesses of all sizes are embracing digital transformation (DX or DT). Using advanced technologies to improve operations and delight customers has become a source of competitive advantage, and companies spare no expense in these efforts. Worldwide spending on DX is forecast to reach $2.3 trillion in 2023, according to the International Data Corporation (IDC), a global market intelligence provider. The COVID-19 pandemic is set to accelerate this trend further.
The logical error of only considering the information that is seen.
Data scientists and software engineers are a critical part of any company looking to innovate and solve difficult problems. Each group uses its own tools to do its job, and both benefit from communicating and working together. Let’s take a moment to learn more about the roles of a data scientist and a software engineer (with the caveat that they may vary from company to company).
“Correlation does not mean causation.” “Is that correlation or causation?”
These comments get thrown around casually in discussions of data analysis, but decision-makers often misunderstand the difference between correlation and causation, and with it the statistics behind data science. Mistaking one for the other can lead to incorrect conclusions and actions.
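To make the distinction concrete, here is a minimal sketch on synthetic data. The variables (temperature, ice cream sales, sunburn counts) and every coefficient are invented for illustration and are not taken from any real data set. A shared driver makes two otherwise unrelated quantities look strongly correlated, and the correlation largely disappears once that driver is accounted for.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    x_bar, y_bar = mean(xs), mean(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    var_x = sum((x - x_bar) ** 2 for x in xs)
    var_y = sum((y - y_bar) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """What is left of ys after removing a least-squares linear fit on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return [(y - y_bar) - slope * (x - x_bar) for x, y in zip(xs, ys)]


def simulate(n=2000, seed=0):
    """Synthetic data: temperature drives both outcomes (assumed coefficients)."""
    rng = random.Random(seed)
    temperature = [rng.gauss(25, 5) for _ in range(n)]
    ice_cream = [2.0 * t + rng.gauss(0, 3) for t in temperature]
    sunburns = [1.5 * t + rng.gauss(0, 3) for t in temperature]
    return temperature, ice_cream, sunburns


temperature, ice_cream, sunburns = simulate()

# Raw correlation: high, even though neither variable causes the other.
raw = pearson(ice_cream, sunburns)

# Correlation after removing the effect of temperature from both: near zero.
adjusted = pearson(residuals(ice_cream, temperature),
                   residuals(sunburns, temperature))

print(f"raw correlation:      {raw:.3f}")
print(f"adjusted correlation: {adjusted:.3f}")
```

In this toy setup, ice cream sales and sunburns move together only because both depend on temperature. Real data rarely comes with the confounder labeled, which is why a strong correlation alone is not evidence of a causal link.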
Twitter can be a useful tool for understanding how people feel about the coronavirus (COVID-19) over time. We performed sentiment analysis on tweets related to COVID-19 from January 4th, 2020 to April 12th, 2020 to observe any trends or frequencies among the most positive and negative tweets during this period.
Kurvv’s AutoForecast product provides customers with a variety of quantitative sales forecasting methods, so they can simply connect their data source and receive a customized, accurate forecast in seconds. Once data is uploaded, AutoForecast tests several time-series forecasting methods, including decomposition, exponential smoothing, ARIMA, and regression (see below for more details about each). Finally, an average of all forecasts is computed and output with the final results. Customers can select the forecast output(s) they feel most comfortable with and customize the forecast time horizon and validation time windows.
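The idea of running several methods and averaging their forecasts can be sketched in a few lines. This is not AutoForecast's implementation: the function names, the smoothing factor `alpha=0.5`, and the choice of just two simple methods (simple exponential smoothing and a linear regression trend) are assumptions made to keep the illustration self-contained.

```python
from statistics import mean


def ses_forecast(series, horizon, alpha=0.5):
    """Simple exponential smoothing: a flat forecast at the last smoothed level."""
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return [level] * horizon


def linear_trend_forecast(series, horizon):
    """Least-squares line fitted against the time index, then extrapolated."""
    n = len(series)
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(series)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, series))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return [intercept + slope * (n + h) for h in range(horizon)]


def average_forecasts(forecasts):
    """Average several equal-length forecasts step by step."""
    return [mean(step) for step in zip(*forecasts)]


# Hypothetical monthly sales figures, used only for illustration.
sales = [10, 12, 14, 16, 18]
horizon = 2

ses = ses_forecast(sales, horizon)
trend = linear_trend_forecast(sales, horizon)
combined = average_forecasts([ses, trend])

print("exponential smoothing:", ses)
print("linear trend:         ", trend)
print("average of forecasts: ", combined)
```

On a steadily rising series like this one, exponential smoothing lags behind the trend while the regression line follows it, so the average lands between the two. An equal-weight average is the simplest way to combine forecasts; weighting each method by its validation error is a common alternative.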
Steve Jobs once said, “customers don’t know what they want until you show it to them.” Now, personalized recommendations exist everywhere from e-commerce (Amazon) to entertainment (Netflix, YouTube, and Spotify). Recommendation systems are key to driving these companies’ revenue and success. Amazon, for example, increased its revenue by 35% through its recommendation system and email campaigns.