Download Our Reports


Data Science Survey

Rexer Analytics designed and began this research in 2007.  The ten surveys from 2007 through 2023 examine the analytic behaviors, views and preferences of data analytic professionals.  We use a small set of consistent questions to enable the tracking of trends across the years.  We also incorporate new questions into each survey to explore emerging topics.  We appreciate the many people who have submitted survey questions – many have been included in the surveys.  Please send us your ideas and feedback about this research by using the form on our CONTACT page or emailing us at DataScienceSurvey@RexerAnalytics.com.  The survey summary reports can be downloaded using the links below.

2023 Survey

328 analytic professionals from 49 countries participated in the 2023 survey
Highlights:
  • NEW PARTNER:  Eric Siegel and his Machine Learning Week organization are partnering with us to design, promote, and analyze this year’s survey. Eric Siegel brings a wealth of knowledge and experience to this survey partnership. He wrote the bestselling book Predictive Analytics, is Executive Editor of The Machine Learning Times, founded the very popular conference series “Predictive Analytics World” (now rebranded as Machine Learning Week), has delivered 100+ keynote presentations, and is currently a Visiting Professor in Analytics at the UVA Darden School of Business.
  • CONFIDENTIALITY:  All survey responses are completely confidential; information provided on the survey will not be shared outside of Rexer Analytics in any way that identifies any individuals. All reports of survey findings will be done in the aggregate, and any quotes taken from open-end responses will be carefully scrubbed so no individual identifying information is revealed.
  • SURVEY RESULTS:  Reports summarizing the results of the 2007-2020 surveys are available for FREE download using the links below. Karl Rexer and Eric Siegel will present preliminary highlights of the 2023 survey results at the June 2023 Machine Learning Week conference. Join us in Vegas for the conference! The full summary report will be available for free download from the Rexer Analytics website near the end of 2023. If you want the summary report emailed to you, there is a place at the end of the survey to leave your email address.
  • INQUIRIES:  Please contact us (DataScienceSurvey@RexerAnalytics.com) if you have any questions about this research. We also want to hear your ideas for survey questions we can incorporate into future Data Science Surveys.

2020 Survey

579 analytic professionals from 71 countries participated in the 2020 survey
Highlights:
  • SUBSTANTIAL GROWTH IN PYTHON USE:  In the last 3 years an additional 15-20% of data scientists have begun using Python. Python is now the most commonly used tool in Corporate and Consulting environments, and in Academic and NGO/Government environments Python use is now surpassed only by R.
  • DATA SCIENTISTS ARE CAUTIOUS ABOUT AUTO-ML TOOLS:  40% feel that Auto-ML tools are very or somewhat problematic. Many respondents provided extreme examples of misuse and bad decisions resulting from Auto-ML use.
  • IMPACT OF ENHANCED DATA SECURITY & PRIVACY:  Almost half of respondents report that enhanced data security and data privacy (e.g., GDPR and HIPAA) have had a real impact on the analytics at their organization.
  • EARLY COVID IMPACT:  In May-July 2020 very few analytic professionals reported losing a job due to the pandemic. Consultants were more likely to report a reduction in projects and income. Over 20% of analytic people working in corporate settings saw an increase in their workload – twice the rate of consultants.
  • JOB SATISFACTION:  Even in the middle of a global pandemic, data science professionals reported very high job satisfaction.

2017 Survey

1,123 analytic professionals from 91 countries participated in the 2017 survey
Highlights:
  • ANALYTIC TRAINING:  The majority of respondents think formal analytics training is needed to properly model data. Many described problems they've witnessed among untrained staff.
  • WIDE USE OF R AND PYTHON:  Most data scientists use multiple tools, with R and Python being among the most commonly used. The mix of preferred tools varies among people working in different settings (corporate, consultants, academics and NGO / Government).
  • DO-IT-YOURSELF ANALYTIC TOOLS:  One third of respondents have seen difficulties when people outside of their company's data science team have used do-it-yourself analytic tools.
  • DEEP LEARNING:  There is a growing use of Deep Learning. However, it is still only used by a small proportion of analytic professionals. Respondents report particular success in using Deep Learning on image analysis tasks.
  • MOST IMPORTANT SKILLS:  Respondents report that the most important skills / knowledge for a data science professional are not programming or algorithm skills. The most important things are: 1) Data preparation and management skills, 2) Domain knowledge, and 3) General business experience or knowledge.
The full summary report includes additional material about algorithms and software usage, fields where data scientists are working, satisfaction levels, and more.

2015 Survey

1,220 analytic professionals from 72 countries participated in the 2015 survey
Highlights:
  • CORE ALGORITHM TRIAD:  Regression, Decision Trees, and Cluster analysis remain the most commonly used algorithms in the field.
  • THE ASCENDANCE OF R:  76% of respondents report using R. This is up dramatically from just 23% in 2007. More than a third of respondents (36%) identify R as their primary tool.
  • JOB SATISFACTION:  Job satisfaction in the field remains high, but has slipped since the 2013 survey. A number of factors predict Data Scientist job satisfaction levels.
  • DEPLOYMENT:  Deployment continues to be a challenge for organizations, with less than two thirds of respondents indicating that their models are deployed most or all of the time. Getting organizational buy-in is the largest barrier to deployment, with real-time scoring and other technology issues also causing significant deployment problems.
  • MOST IMPORTANT SKILLS:  Respondents report that the most important skills / knowledge for a data science professional are not programming or algorithm skills. The most important things are: 1) Data preparation and management skills, 2) Domain knowledge, and 3) General business experience or knowledge.
The full summary report includes additional material about algorithms and software usage, fields where data scientists are working, satisfaction levels, and more.

2013 Survey

1,259 analytic professionals from 75 countries participated in the 2013 survey.
Highlights:
  • FOCUS ON CRM:  In the past few years, there has been an increase among data miners in the already substantial area of customer-focused analytics. Respondents are looking for a better understanding of customers and seeking to improve the customer experience. This can be seen in their goals, analyses, big data endeavors, and in the focus of their text mining.
  • BIG DATA:  Many in the field are talking about the phenomenon of Big Data. There are clearly some areas in which the volume and sources of data have grown. However, it is unclear how much Big Data has impacted the typical data miner. While data miners believe that the size of their datasets has increased over the past year, data from previous surveys indicate consistent dataset size over time.
  • THE ASCENDANCE OF R:  The proportion of data miners using R is rapidly growing, and since 2010, R has been the most-used data mining tool. While R is frequently used along with other tools, an increasing number of data miners also select R as their primary tool.
  • CHALLENGES IN THE USE OF ANALYTICS:  Data miners continue to report challenges at each level of the analytic process. Companies often are not using analytics to their fullest and have continuing issues in the areas of deployment and performance measurement.
  • ENGAGEMENT & JOB SATISFACTION:  The Data Miners in our survey are highly engaged with the analytic community: consuming and producing content, entering competitions and searching for education and growth within their jobs. All of these activities lead to high job satisfaction, which has been increasing over time.
  • ANALYTIC SOFTWARE:  Data miners are a diverse group who are looking for different things from their data mining tools. Ease-of-use and cost are two distinguishing dimensions. Software packages vary in their strengths and features. STATISTICA, KNIME, SAS JMP and IBM SPSS Modeler all receive high satisfaction ratings.
The full summary report includes additional material about algorithms and software usage, computing environments, text mining, and more.

2011 Survey + "Best Practices" Verbatim Responses

1,319 analytic professionals from over 60 countries participated in the 2011 survey.
Highlights:
  • FIELDS & GOALS: Data miners work in a diverse set of fields. CRM / Marketing has been the #1 field in each of the past five years. Fittingly, “improving the understanding of customers”, “retaining customers” and other CRM goals continue to be the goals identified by the most data miners.
  • TEXT MINING: A third of data miners currently report using text mining and another third plan to in the future. Text mining is most often used to analyze customer surveys and blogs/social media.
  • TOOLS:  R continued its rise this year and is now being used by close to half of all data miners (47%). R users report preferring it for being free, open source, and having a wide variety of algorithms. Many people also cited R's flexibility and the strength of the user community. In the 2011 survey we asked R users to tell us more about their use of R. Read the R user comments about why they use R (pros), the cons of using R, why they select their R interface, and how they use R in conjunction with other tools. STATISTICA is selected as the primary data mining tool by the most data miners (17%). STATISTICA, KNIME, Rapid Miner and Salford Systems received the strongest satisfaction ratings.
  • VISUALIZATION:  Data miners frequently use data visualization techniques. More than four in five use them to explain results to others. MS Office is the most often used tool for data visualization. Data visualization is less prevalent in the Asia-Pacific region.
  • ANALYTIC CAPABILITY & SUCCESS:  Only 12% of corporate respondents rate their company as having very high analytic sophistication. However, companies with better analytic capabilities are outperforming their peers. Respondents report analyzing analytic success via Return on Investment (ROI), and analyzing the predictive validity or accuracy of their models. Challenges to measuring analytic success include client or user cooperation and data availability / quality. Read the best practices data miners shared for measuring analytic success.
  • OPTIMISTIC FUTURE:  Data miners are optimistic about continued growth in data mining adoption and the positive impact data mining will have. Participants pointed out that care must be taken to protect privacy when conducting data mining. Data miners also shared many examples of the positive impact they feel data mining can have to benefit society. Health / medical advances was the area of positive impact identified by the most data miners. Read the full list of positive impact examples identified by data miners in the 2011 survey.
The full summary report includes additional material about algorithms and software usage, the fields applying analytics, text mining, computing environments, data visualization tools, job satisfaction, and more.

2010 Survey + "Best Practices" Verbatim Responses

735 analytic professionals from 60 countries participated in the 2010 survey.
Highlights:
  • FIELDS & GOALS:  Data miners work in a diverse set of fields. CRM / Marketing has been the #1 field in each of the past four years. Fittingly, “improving the understanding of customers”, “retaining customers” and other CRM goals are also the goals identified by the most data miners surveyed.
  • MODELS:  About one-third of data miners typically build final models with 10 or fewer variables, while about 28% generally construct models with more than 45 variables.
  • TOOLS:  After a steady rise across the past few years, the open source data mining software R overtook other tools to become the tool used by more data miners (43%) than any other. STATISTICA, which has also been climbing in the rankings, is selected as the primary data mining tool by the most data miners (18%). STATISTICA, IBM SPSS Modeler, and R received the strongest satisfaction ratings in both 2010 and 2009.
  • TECHNOLOGY:  Data Mining most often occurs on a desktop or laptop computer, and frequently the data is stored locally. Model scoring typically happens using the same software used to develop models. STATISTICA users are more likely than other tool users to deploy models using PMML.
  • CHALLENGES:  As in previous years, dirty data, explaining data mining to others, and difficult access to data are the top challenges data miners face. This year data miners also shared best practices for overcoming these challenges. Read about their experiences overcoming these challenges.
The full summary report includes additional material about algorithms and software usage, tool selection priorities, data quality, model deployment, future trends, and more.

2009 Survey

710 analytic professionals from 58 countries participated in the 2009 survey.
Highlights:
  • ALGORITHMS:  As in previous years, data miners’ most commonly used algorithms are regression, decision trees, and cluster analysis.
  • ORGANIZATIONAL IMPORTANCE:  Half of data miners say their results are helping to drive strategic decisions and operational processes. 58% say they are adding to the knowledge base in the field.
  • IMPACT OF ECONOMY:  Most data miners feel that the economy will not negatively impact them.
  • CHALLENGES:  The top challenges facing data miners are dirty data, explaining data mining to others, and difficult access to data. However, in 2009 fewer data miners listed data quality and data access as challenges than in the previous year.
  • TOOLS:  IBM SPSS Modeler (SPSS Clementine), Statistica, and IBM SPSS Statistics (SPSS Statistics) are identified as the “primary tools” used by the most data miners. Open-source tools Weka and R made substantial movement up data miners’ tool rankings this year, and are now used by large numbers of both academic and for-profit data miners. Users of IBM SPSS Modeler, Statistica, and Rapid Miner are the most satisfied with their software.
The full summary report includes additional material about algorithms and software usage, the fields applying analytics, corporate analytic capabilities, analytic challenges, concerns, analytic success measurement, and more.

2008 Survey

348 analytic professionals from 44 countries participated in the 2008 survey.
Highlights:
  • ADDRESSING CHALLENGES:  Dirty data, data access issues, and explaining data mining to others remain the top challenges faced by data miners. Data miners are most likely to use descriptive stats, outlier detection, and face validity to identify / address dirty data.
  • TIME ALLOTMENT:  Data miners spend only 20% of their time on actual modeling. More than a third of time is spent accessing and preparing data.
  • CONCERNS:  The most prevalent concerns with how data mining is being utilized are: resistance to using data mining in contexts where it would be beneficial, insufficient training of some data miners, and lack of model refreshing.
  • TOOLS:  SPSS Clementine was identified as the primary software used by more data miners than any other software product. SPSS and SAS continue to dominate the software market. However, Statistica, R, and the Salford products saw increased usage this year. In selecting their analytic software, data miners place a high value on dependability, the ability to handle very large datasets, and quality output.
The full summary report includes additional material about algorithms and software usage, tool selection priorities, allocation of time across analytic tasks, analytic challenges, data quality, and more.

2007 Survey

314 analytic professionals from 35 countries participated in the inaugural 2007 survey.
Highlights:
  • ALGORITHMS:  Regression, decision trees and cluster analysis were the most commonly used algorithms (mean number of algorithms used: 6.8).
  • CHALLENGES:  Top challenges data miners report are dirty data, data access, and explaining data mining to others.
  • TOOLS:  SPSS, SPSS Clementine, and SAS are the three most frequently utilized tools (mean number of tools used: 4.5). There is increasing interest in the Oracle Data Mining tool, and decreasing interest in C4.5/C5.0/See5. The primary factors data miners consider when selecting an analytic tool are: 1) the dependability and stability of software, 2) the ability to handle large data sets, and 3) data manipulation capabilities.
The full summary report includes additional material about algorithms and software usage, tool selection priorities, allocation of time across analytic tasks, analytic challenges, data quality, and more.
"Rexer Analytics’ series of Data Science Surveys is a foundational contribution to this industry’s community.  If you have the opportunity to work with Karl’s firm, nab it!"

Eric Siegel, PhD
Author of “Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die”
Founder - Predictive Analytics World
Executive Editor - Predictive Analytic Times
"Rexer Analytics has been instrumental in helping advance the field of data mining through applied research, software evaluation, testing, professional conference support and consulting.  Their research evaluating the trends and preferences among the data mining community is a great resource for many."

Wayne Thompson, PhD
Chief Data Scientist - SAS
"I consider Rexer Analytics’ surveys one of the best independent analyses of Data Mining.  I stress the word ‘independent’ as that is the most useful.  Karl and his team have been active in this market for many years and bring considerable experience to the topic."

John MacGregor
Vice President, Customer Innovation & Strategic Projects, Products & Innovation - SAP
Author of “Predictive Analysis with SAP: The Comprehensive Guide”
"The Data Science Survey is one of the few unbiased resources that provides a comprehensive overview of the data science community.  It is a key source of information for analytic professionals regarding vendors, tools, and state-of-the-art algorithms.  As a frequently cited publication, it covers essential topics, new trends like “big data”, as well as general market surveillance."

Michael Zeller, PhD
CEO - Zementis
"Rexer Analytics’ Data Mining Survey is a comprehensive and accurate assessment of the industry’s attitudes, performance, and trends.  There’s no better place to get a firm grip on the direction of this rapidly growing field."

Eric A. King
President - The Modeling Agency, LLC
"The Rexer Analytics surveys are extremely useful.  They provide industry benchmarks, keep us abreast of new trends, and surprise us with new insights."

Julia Minkowski
Risk Analytics Manager - Fiserv
Co-founder - "Russian Speaking Women in Tech"
"The Rexer Analytics team is doing great work in analytics and data mining.  Their Data Miner Survey reports are always full of useful insights."

Gregory Piatetsky-Shapiro, PhD
President - KDnuggets
Co-founder - KDD conference & ACM SIGKDD
Author of “Knowledge Discovery in Databases”
"It might not be without anxiety that we await, every other year, the results of the Rexer Analytics Data Mining Survey; but that anxiety tells you exactly that this survey is an essential snapshot of this increasingly critical and competitive market!"

Director of Analytic Strategy & Decision Management
- US Fortune 100 technology firm
"The field of data science is moving very rapidly.  As the manager of an academic research support team, it’s critically important for us to know where the field is headed.  That’s why we carefully study the reports from Gartner, Forrester, and Rexer Analytics.  We have studied the Rexer Analytics reports for many years to learn what tools are on top, but more importantly, whose market share is headed up or down.  The Rexer surveys are a key component for our published ‘The Popularity of Data Analysis Software’ analyses."

Bob Muenchen
Manager, Research Computing Support - University of Tennessee
Author of “R for SAS and SPSS Users” and “R for Stata Users”
"Every year, I look forward to reading the data mining survey from Rexer Analytics."

Michael Berry
Analytics Director - TripAdvisor for Business
Author of “Data Mining Techniques”
"The Rexer Analytics Data Miner Surveys outline the ‘State of the Art’ for our emerging industry.  Their interaction with clients, practitioners, academicians, consultants, businesses, professional societies and software vendors inform these surveys and place them in a perfect position to collect un-biased ‘independent’ evaluations regarding software choices as well as the best practices and challenges practitioners face.  Their market intelligence is highly valued in our profession."

Mary Grace Crissey
Research Analyst - Analytic Focus, LLC
Council Member, CPMS - The Practice Section of INFORMS
"We believe in a fact-based approach to technology adoption.  So we rely on the Data Miner Surveys from Rexer Analytics to help our clients understand analytic technology adoption trends and issues.  Rexer’s survey reports are an invaluable resource."

Dan Vlamis
President - Vlamis Software Solutions
Author of “Data Visualization for Oracle Business Intelligence 11g”
"As an instructor of courses in a rapidly changing field – business analytics – and as concentration advisor to the undergraduate and graduate students in business analytics at Babson, it is of paramount importance for me to keep up with trends in analytics software, techniques and applications. The bi-annual Rexer Analytics Data Miner Surveys have been a great source of information on global trends in analytics, and have allowed me and my colleagues to teach Babson students skills, tools and techniques that can position them better in the marketplace."

Dessislava A. Pachamanova, PhD
Professor of Analytics and Computational Finance - Babson College
Co-designer of the Babson undergraduate and MBA concentrations in Business Analytics
Author of “Portfolio Construction and Analytics”, and “Simulation and Optimization in Finance”
"As the longest-running survey of data miners in the industry, the Rexer Analytics survey provides valuable insights and trends into the tools, methods and applications of advanced analytical techniques today."

David M. Smith
R Community Lead - Microsoft
"Over the years, Rexer Analytics' Data Miner Surveys have provided useful macro information about the dynamic and growing field now known by many more names than data mining."

Anne Milley
Director Analytic Strategy, JMP Product Marketing - SAS
"The Rexer Analytics Data Miner Survey is the best survey of the current state and direction of the data mining and predictive analytics industry.  I recommend it to my workshop and course attendees regularly as a way to understand the important trends in software, algorithms, job titles, and vertical markets, as well as issues impacting the analytics industry and which buzz words are gaining traction."

Dean Abbott
Co-Founder & Chief Data Scientist - SmarterHQ
Founder & President - Abbott Analytics
Author of “Applied Predictive Analytics: Principles and Techniques for the Professional Data Analyst” and “IBM SPSS Modeler Cookbook”
"The Rexer Analytics Data Mining Survey provides valuable insight into trends in tools and techniques, as well as backgrounds of data mining practitioners.  Rexer's analysis of the survey data dives into hype or reality of big data, the rise of analytics software like R, as well as challenges faced by analysts and their job satisfaction.  I look forward to each survey's results and increasingly see these results highlighted in presentations from other well-respected experts in the field."

Mark Hornick
Director, Oracle Advanced Analytics, Oracle Corporation
Author of “Using R to Unlock the Value of Big Data: Big Data Analytics with Oracle R Enterprise and Oracle R Connector for Hadoop”