What We Do

Big Data Science is utilized everywhere, from the world of athletics, where it is used to predict the opposing team's plays during games, to Pandora Radio, where it is used to categorize and play new songs the listener is likely to enjoy, to sequencing the Human Genome. Data has grown at a rapid rate over the last few years, further escalated as social media and smartphone use have increased exponentially. As a consequence, there is now a major shortage of analytic scholars to fill the rapidly growing number of data analytics positions. According to the U.S. Bureau of Labor Statistics, the national average growth rate for jobs is 5-8% yearly. However, a variety of occupations that require big data science are growing at a rate much faster than average, such as statisticians, with a growth rate of 34% per year. With the immense growth and utilization of big data in virtually all fields, it is becoming imperative that a wide variety of students become educated about how to visualize, analyze and draw conclusions from analyses of enormous sets of data. The DataJam was formed to develop new strategies for engaging high school students in the field of big data and encouraging them to obtain further education to participate fully in a data-driven world! The DataJam now offers programming for high school students (“The Annual High School DataJam”), community college students (“The Biannual Community College DataJam”), and middle school students (“Middle School DataJam Days”).

The DataJam is a nonprofit organization formed by software scientists from IBM, Oracle, Teradata, and IQVIA and faculty from the University of Pittsburgh, Carnegie Mellon University and the joint Pitt-CMU Supercomputer Center, who work together to develop and run the DataJam and to inspire high school youth to enter the field of big data science. The central goal was to familiarize youth with the field of big data science and make them more aware of the role of big data in their everyday lives. As many companies and businesses rely heavily on big data and are using it to help shape the future, the underlying concept was to build a pipeline of young people interested in and trained to participate in the big data science of the future. The DataJam started running a competition for teams from local high schools in the 2013-2014 school year and has expanded every year; teams from across the country now participate.

Ten Years of Introducing Youth to the Power of Data Science (2014-2024)

August 14, 2024

Read about the impact The DataJam has had on individuals who have participated (students, teachers, mentors), communities (schools, school districts, colleges & universities, business partners), and future data scientists! Track the national and international expansion of The DataJam, and learn about the ongoing activities that are strengthening The DataJam Community. The DataJam is impacting Access to Digital Skills for a wide diversity of youth. Learn how we hope to extend our mission “To grow communities of learners who use data, analysis, and critical thinking skills to better understand and impact the world around us” in the next ten years!

The High School DataJam

The High School DataJam is an academic competition for high school students and afterschool programs, like the Boys & Girls Clubs, which focuses on teaching about the use of big data to answer a research question. The program is set up in such a way that students usually work in teams of 3-8 students to formulate a research question, find publicly available data sets, analyze their data, make data visualizations, and present their findings to a panel of judges. Students learn skills pertaining to the scientific method, data analysis, and how to give scientific presentations. Schools can have multiple DataJam teams if they choose to.

The Fall of each academic year is a great time for teams to start forming and thinking about participating in the High School DataJam. A good way to get started is to view one of the two introductory videos, “Introduction to the DataJam” or “DataJam Mentor Overview to the DataJam”, that are available at the top of the DataJam page at pghdataworks.org. It is also very helpful for teams to view the video “A Walk Through The DataJam Website”, which can be accessed at the top of either the DataJam page or the Home Page. If teams or teachers have questions about the logistics of participating in the High School DataJam, just email datajam@thedatajam.org and ask for a Zoom conference, and we will meet with you and answer any questions you have.

New teams may want to join the online Slack workspace for the High School DataJam. This is a workspace on the web where teams have their own channel and can work on their project collaboratively when they are not all together. They can also message their High School DataJam mentor on the Slack workspace and get assistance when they need it. If your team wants to join the Slack workspace, each team member needs to have their parent fill out and sign a permission slip (found in the DataJam Guide Book that can be downloaded from the DataJam page). Email the permission slips to datajam@thedatajam.org.

During December and January teams need to submit a High School DataJam proposal that includes their research question, their hypothesis, and the datasets they plan to use to address their question. A template for the High School DataJam proposal is found in the High School DataJam Guide Book.

High School DataJam teams then work on analyzing their data until late March, at which point they turn in a poster describing the findings of their analysis and they give an oral presentation to a panel of judges. All teams have the opportunity to display posters of their project at a High School DataJam finale in late April. The week before the finale each team presents their project in an oral presentation on Zoom to a panel of High School DataJam judges. Awards are given for 1st, 2nd, and 3rd place, as well as for the Best New Team, and Best Presentation. All students receive a certificate of participation and a participation prize.

Community College DataJam

The Community College DataJam is a semester-long data science activity and competition, offered in both the Fall (Aug-Dec) and Spring (Jan-May) semesters each year, that introduces college students to data science and encourages them to engage with it, with a focus on any subject area they are interested in. DataJam’s goal is to help students successfully engage with the ways data is used everywhere to solve problems and to better understand social, economic and environmental aspects of our world. As with the High School DataJam, teams are able to choose their own topic to study, and all teams are paired with a DataJam mentor, usually a university student, who provides individualized assistance to their team. Importantly, college students can include their DataJam poster and presentation in their college portfolio as compelling examples of a hands-on learning project that will be of interest to potential employers.

Middle School DataJam Days

Middle School DataJam Days are designed to introduce middle school students to the power of analyzing data to find answers to questions. They are half-day workshops, run in person, in areas of the country where DataJam mentors are trained. Students work in groups with a mentor leading a session on a topic chosen to engage middle school students, such as “UFO Sightings” or “Shark Attack Locations”. The mentors come prepared to the workshop with data sets appropriate for addressing questions related to the topic. Students bring laptop computers to the workshop. Each group works together using Google Sheets and Docs. Mentors guide students through developing specific research questions, forming hypotheses, choosing what analyses to do, carrying out those analyses, and making data visualizations. At the end of the workshop each group gives a short presentation about their research findings.

Participating Middle Schools, High Schools, After School Programs & Community Colleges

Teams from thirty-five schools and afterschool programs have participated in the High School DataJam over the past eight years. Click a logo to learn more about a particular school or afterschool program.

The Board of Directors

The Board of Directors of The DataJam plays a crucial role in overseeing the organization and management of the annual DataJam competition. This team of eight members, including Beth Bauer, Cheryl Begandy, Judy Cameron, Catherine Cramer, Brian Macdonald, Devashish Saxena, Beth Schwanke, and Raja Sooriamurthi, is responsible for designing and continually updating the DataJam and its accompanying resources to ensure successful project implementation by participating teams. They are also involved in expanding the reach of the DataJam and the training of DataJam Mentors on a national scale, aiming to engage diverse communities across the country.

Beth Bauer

Beth Bauer is Founder and CEO of PosiROI, a consulting firm helping to maximize impactful and sustainable commercial outcomes through strategic innovations in data, analytics, and tech.

With 30+ years leading large teams in healthcare big data, Beth has traversed a non-traditional career path: from statistician/data scientist to business problem solver and strategist to trusted leader/inspirer. She is wildly passionate about data and the power it brings its users to understand their landscape, assess options, and guide decisions based on a coupling of technology, probabilistic outcomes, common sense, diverse perspectives and subject expertise. Beth’s customer consultative roles at PosiROI and IQVIA provided opportunities to create meaningful business impact through data for 26 Fortune 100 companies.

Preceding her current role at PosiROI, Beth envisaged and implemented Merck’s 1st end-to-end US Commercial Data Strategy to unlock new execution pathways and create foundational organizational and data structures that enable ongoing business innovations. The US Data Strategy was so measurably successful, it was selected by Merck’s CEO to roll out to Merck Human Health globally.

Beth’s work in US big data repeatedly reveals the disparities in health and wealth. Beth believes the heart of these disparities lies in reliable access to and trust in data and analytics – starting with improved data education for all. Beth speaks frequently about creating trusted shared value, healthcare, analytics, data strategy, and how compounding, but varied, social determinants impact community viability.

Cheryl Begandy

Cheryl Begandy is the Coordinator of External Relations for the Pittsburgh Supercomputing Center (PSC) where she is responsible for Government Relations at the state and local level. She also represents PSC in educational outreach programs, with a particular emphasis on K-12 STEM education and building 21st century skills for 21st century jobs. This includes DataJam where PSC participates in holding a high school competition in Big Data analytics.

Prior to coming to PSC, Ms. Begandy worked for Alcoa in a variety of information technology management positions. For over ten years she was a division manager at the Alcoa Technical Center, where she was responsible for directing scientists and engineers working in computational modeling, visualization, applied statistics, artificial intelligence, and computing and network services. She served as Board President for Pittsburgh Musical Theater (PMT), a professional theater company and conservatory for young performing artists. During her tenure as Board President at PMT, the company purchased a building to house its offices and studios, completed phase 1 renovations, and raised $1.7 million in its first capital campaign. She also served on the Board of Pittsburgh Action Against Rape, a sexual assault crisis center, including five years as Board President. Ms. Begandy currently serves as Board President for Ansar of Pittsburgh, a non-profit that advocates for and provides services and relief to immigrants settling in Western Pennsylvania.

Judy Cameron, Ph.D.

Judy Cameron, Ph.D., is the Executive Director of The DataJam, a member of the Board of Directors for The DataJam, and a Professor at the University of Pittsburgh. She has a long history of translating science to the public. She has served as the Director of the Pitt Science Outreach program, which provides science education programs to school children and the public, since 2009. She has made a series of short films for PBS addressing myths about mental illness. She was a founding member of the editorial board for the public information website BrainFacts. Dr. Cameron teaches an undergraduate course at Pitt, “Using Big Data for Community Good”, that trains Pitt students to be effective mentors for the DataJam.

Catherine Cramer

Catherine Cramer works at the intersection of data-driven science and learning, specifically as it pertains to the understanding of complexity and its application to data and network sciences, with a focus on underrepresented communities. For over 20 years she has developed tools and programs for the teaching and learning of complex network and data science, centering on creating and growing productive and innovative collaborations and partnerships between research, industry and academia. She worked with the NSF-sponsored Centers for Ocean Science Education Excellence (COSEE) and the Ocean Literacy initiative from 2004-2014, and was one of the founders of the Network Literacy and Network Science in Education movements. She remains active in both, most recently organizing the 12th annual Network Science in Education symposium as part of the 2023 International School and Conference on Network Science, and is on the Board of the Network Science Society. She is co-editor and co-author of the Springer volume Network Science in Education, published in October 2018. She is the Deputy Director of the West Big Data Innovation Hub and Director of Outreach and Engagement for Data Initiatives at the San Diego Supercomputer Center, where she is working to build out a network of DataJam teams and mentorship training throughout the state of California. She is the founder and director of the Woods Hole Institute, a non-profit focused on connecting people and ideas among disciplines through a wide range of experiences such as colloquia, seminars, retreats, workshops, performances, and installations. She is also a drummer and percussionist.

Brian Macdonald

Brian Macdonald is a Data Scientist who helps companies in the Fortune 500 to optimize business outcomes by leveraging data, machine learning and artificial intelligence. Brian has 30 years’ experience implementing analytic solutions to address a wide range of customer needs leveraging Statistics, Machine Learning, Big Data, Data Warehousing, Business Intelligence, OLAP, Hadoop, and ETL technologies.

Brian is currently a Data Scientist and Cloud Engineer and has held similar positions at IQVIA and Teradata. Brian is a co-author of the Oracle Big Data Handbook. Brian frequently speaks at industry events and is the President of The DataJam Board of Directors. The DataJam is an analytics competition that introduces high school students and afterschool programs to big data and data analytics.

Devashish Saxena

Devashish Saxena always starts with the human being at the heart of every transformational journey. He believes we are on a human journey for technology rather than the other way around. He helps to craft the vision, builds organizations and teaches them to make data-driven decisions in an agile and lean manner so that they can collaborate and deliver continuous financial impact. He has a remarkable track record of driving $2 billion digital business growth and scaling AI/ML initiatives with potential to impact EBIT by >$200 million, while consistently shepherding deep business transformation. He is an inspirational leader of global teams, who tells stories that inspire action, getting stuff done and delivering results. Devashish has a technology background with an undergraduate and a graduate degree in Computer Science from the University of Texas at Austin. He worked in tech in telecom in his early career. Following his MBA from the University of Texas at Austin, he has built his career at the intersection of digital and international. He spent time consulting for A. T. Kearney at the birth of the .com era, working largely in the automotive sector helping firms understand how value creation and value capture had shifted because of the Internet. He often jokes that he has been doing the same job ever since, as technology continues to evolve exponentially faster than the ability of businesses to adopt it and create value. Devashish has led the digital transformation journey at Texas Instruments, Premier Farnell, Rexel and PPG, bringing their business units into the digital age across the globe. Devashish serves on the Senior Advisory Council for AEA Investors, is an Adjunct Faculty member at Carnegie Mellon University and a member of the non-profit DataJam Board of Directors. He is a frequent keynote speaker at major industry events.
Devashish is a global citizen who was born in India, spent much of his childhood in Hong Kong and nurtured his deep love for exploring new cultures by working in Dallas, Singapore and Paris. He is now based in Pittsburgh with his wife who is a dentist and three children – a daughter and twin boys.

Beth Schwanke

Beth Schwanke is the Executive Director of the University of Pittsburgh Institute for Cyber Law, Policy, and Security (Pitt Cyber), where she leads efforts on tech policy issues ranging from municipal algorithmic governance to cyber workforce development. She also serves on the Digital Inclusion Steering Committee of the Community Engagement Center in the Hill District, the Steering Committee of the Collaboratory Against Hate, and the Workforce Advisory Council of the School of Computing and Information’s Professional Institute. Beth is a lawyer and previously practiced with the global law firm DLA Piper and led policy outreach efforts at the Center for Global Development in Washington, DC.

Raja Sooriamurthi, Ph.D.

Raja Sooriamurthi, Ph.D., is a Teaching Professor with the Information Systems Program at Carnegie Mellon University, Pittsburgh. He has been involved with DataJam in various roles since its inception in 2014. His research and teaching interests span the fields of artificial intelligence and software development with a current focus on data-driven decision making. Along with his co-authors, he has investigated a novel approach to teaching critical thinking and problem solving termed puzzle-based learning, resulting in the book Guide to Teaching Puzzle-based Learning (Springer, 2014). In addition to his university courses, Raja has taught several conference and industry workshops in the US, Australia, the Middle East (Qatar, the United Arab Emirates), and India. Over the years, beginning when he was a graduate student, his pedagogical efforts have been recognized with several awards for teaching excellence.

How Data Professionals Can Get Involved

At the DataJam, industry professionals play a crucial role in inspiring and guiding the next generation of data scientists. Your support directly impacts the educational experiences of high school and community college students, fostering their interest in data science and preparing them for future careers. By supporting DataJam, you engage with a diverse community of educators, students, and industry professionals, fostering collaboration and knowledge sharing. Your contribution fuels innovation in data science education, empowering students to tackle complex challenges and develop creative solutions using big data analytics. There are several ways you can support DataJam and make a meaningful impact!

Judge the Competition

Become a judge for DataJam competitions and help evaluate projects developed by high school and community college students. Your expertise and insights will contribute to recognizing and rewarding innovative solutions and exceptional analytical skills among young participants.

Host a Field Trip

Open your doors and host a field trip at your place of business. Show students firsthand how big data is applied in real-world settings. This immersive experience can spark curiosity, inspire learning, and provide valuable exposure to industry practices. We like to offer field trips for all teams winning DataJam awards after the DataJam Finale each year. For these special field trips we like to arrange for the teams to present their 10-minute final presentation to the industry professionals.

2023 Google Visit North Allegheny Team 1

Donate and Pledge your Financial Support

Consider making a monetary donation to DataJam or pledging a yearly contribution. Your financial support enables us to continue organizing competitions, providing resources to participants, and expanding educational initiatives in big data science.

If you're interested in supporting DataJam through donations of money or time, please reach out to us at datajam@thedatajam.org. We welcome contributions at all levels and appreciate your commitment to empowering the next generation of data-driven innovators.

Join us in shaping a brighter future through data science education. Together, we can inspire, educate, and empower young minds to excel in the dynamic world of big data.

Partners and Sponsors

PPG and the PPG Foundation

A Pittsburgh company since 1883, PPG is a global supplier of paints, coatings, optical products, and specialty materials. Through leadership in innovation, sustainability and color, PPG helps customers in industrial, transportation, consumer products, and construction markets and aftermarkets to enhance more surfaces in more ways than does any other company. Like many companies, PPG is using data science to fuel a digital transformation, providing inspiration and a realistic view of careers for the DataJam students. The PPG Foundation provided funding in 2019-20, helping the DataJam to shift to a virtual format in response to COVID-19.

Northeast Big Data Innovation Hub

The DataJam is proud to be a collaborator of the Northeast Big Data Innovation Hub (Northeast Hub). The mission of the Northeast Hub is to build and strengthen partnerships across industry, academia, nonprofits, and government to address societal and scientific challenges, spur economic development, and accelerate innovation in the national big data ecosystem. The Northeast Hub is a community convener, collaboration hub, and catalyst for data science innovation in the Northeast Region. The Hub amplifies successes of the community, and shares credit across the community to encourage collaboration and mutual success in data science endeavors. The Northeast Hub region includes the states of Connecticut, Maine, Massachusetts, New Hampshire, New Jersey, New York, Pennsylvania, Rhode Island and Vermont.

The Math & Science Collaborative

The MSC is a program of the Allegheny Intermediate Unit that focuses on STEM education. The MSC brings innovative and effective approaches in curriculum and instruction to the region, preparing educators to support all students for work and career in the 21st century. It was formed in 1994 by a regional “congress” of stakeholders and reaches out to more than 135 public and non-public schools/districts in Allegheny, Armstrong, Beaver, Butler, Fayette, Greene, Indiana, Lawrence, Mercer, Washington, and Westmoreland counties. The MSC helped promote the DataJam to its member school districts this past year and identified joint opportunities to work together in the future.

Pitt Cyber Institute

The University of Pittsburgh Institute for Cyber Law, Policy, and Security provides a unique interdisciplinary environment for tackling cyber challenges. They bring the breadth of one of the world’s leading public research universities to bear on the critical questions of networks, data, and algorithms, with a focus on the ever-changing gaps among law, policy, and technology. Their collective of legal, policy, and technical researchers engages with policymakers and industry to create both actionable proposals to address current demands and fundamental insights to understand the future as it arrives. DataJam is teaming with Pitt Cyber to bring issues of data ethics and security into the program for the students to consider as part of their research.

The DataJam Board of Directors is supported by their home institutions, all of which have major education, research and development programs in Big Data, Data Analytics, and Data Science:

University of Pittsburgh

The University of Pittsburgh is a state-related research university. Founded in 1787, Pitt is one of the oldest institutions of higher education in the United States. Pitt people have defeated polio, unlocked the secrets of DNA, led the world in organ transplantation, and pioneered TV and heavier-than-air flight, among numerous other accomplishments.

Carnegie Mellon University

Carnegie Mellon University is a private, global research university. Carnegie Mellon stands among the world's most renowned educational institutions and sets its own course with cutting-edge brain science, path-breaking performances, innovative start-ups, driverless cars, big data, big ambitions, Nobel and Turing prizes, hands-on learning, and a whole lot of robots.

Pittsburgh Supercomputing Center

Pittsburgh Supercomputing Center is a joint partnership with Carnegie Mellon University and the University of Pittsburgh. Established in 1986, PSC advances the state of the art in high-performance computing, communications and data analytics and offers a flexible environment for solving the largest and most challenging problems in data and computational science to scientists and engineers nationwide for unclassified research.

San Diego Supercomputer Center (SDSC)

The San Diego Supercomputer Center (SDSC) at UC San Diego is a leader in high-performance and data-intensive computing and cyberinfrastructure. Cyberinfrastructure refers to an accessible, integrated network of computer-based resources and expertise, focused on accelerating scientific inquiry and discovery.

SDSC provides resources, services and expertise to the local, regional, and national research community, including industry and academia. It supports hundreds of multidisciplinary programs spanning a wide variety of domains. SDSC was founded in 1985 with a $170 million grant from the NSF Supercomputer Centers program.

SDSC's history includes pioneering advances in data storage and cloud computing, from which have emerged several Centers of Excellence in the areas of large-scale data management, predictive analytics, health IT services, workflow automation and internet analysis.

PosiROI

With 30+ years working to identify, qualify and mine data, analytics and insights across the pharmaceutical and healthcare landscapes, we've been creating both small and scaled value solutions for our customers before most people knew the term Big Data. We work with our customers to understand the complexities and challenges they are facing. Then we work together to develop the business outcome-aligned data, analytic and technology strategies needed to accelerate and enable value today, while building the foundations and modern curation needed for sustainability.

We work with you to reveal win-win 360° customer value stories using your fit-for-purpose data, at scale. If you are not able to execute and quantify business impact, then all this data, research and insights, and all those data science dollars, appear as cost instead of value.

We are collaborating with the DataJam to help with expansion in data education for all, and share the variety of exciting consultative data careers that make an impact on our world and our future.

Woods Hole Institute

The Woods Hole Institute (WHI) is a 501(c)(3) non-profit that helps connect people and ideas among disciplines through a wide range of experiences such as colloquia, seminars, retreats, workshops, performances, and installations. WHI focuses on a range of topics including:

Complexity: Addressing the daunting problems we now face as a species requires valuing complexity as a framework for addressing humanity’s wicked problems.

Convergence: Addressing 21st century problems will take the collective minds of all of us. Converging the disciplines of science and valuing and working together across the social and physical sciences as well as the arts and humanities is going to be how we create a future for all of us. 

Sustainability and Resilience: In order to create communities that are adaptive to extreme environmental events and serve all its members in ways that can function in perpetuity, we must rethink our relationship to the places we live and work. This will only happen with deep engagement, understanding what matters, knowing what it means to be part of a dynamic system, and working together to create healthy and equitable ways of living. 

Emergence: With even the best minds and tools of science and engineering, we can’t always predict what happens next. But we can be prepared to expect the unexpected. With the rapidly changing climate, stresses on food systems, and increasing destruction of wildlands, we have to anticipate the next superstorm, the next pandemic, or the next dramatic change in climate and be willing to have plans to respond to something big.

West Big Data Innovation Hub (WBDIH)

The West Big Data Innovation Hub is an inclusive community for catalyzing and scaling data science for societal needs. Our mission is to build and strengthen partnerships across academia, industry, nonprofits, and government—connecting research, education, and practice to harness the data revolution. 

With a focus on thematic ‘verticals’ such as metro/urban data science, and natural resource management, especially water, as well as cross-cutting ‘horizontals’ such as open science, workforce development, and data ethics, the West Hub enables creative cross-pollination and resource-sharing. 

Fueled by outcomes-focused partnerships, the West Hub facilitates the development of collaborative pilot projects addressing regional needs, while connecting and scaling efforts as part of a larger global network. The WBDIH connects, convenes, curates, and communicates across our network with an emphasis on enabling interoperable, scalable, and sustainable solutions.

Past DataJams