Data about data

Data has become so central to everything we do that it has its own branch of research. The emergent field of data science combines knowledge from mathematics, statistics, computer science and other disciplines with the scientific method to develop new ways of working with and gleaning meaning from the data all around us.

Abstract binary code

Digital data

Digital data refers to information in a digital format, typically consisting of binary code. It can be processed, stored and transmitted electronically.

Rights: gonin, 123RF Ltd

Data mining and data scraping

Data scientists develop ways of training computers to perform techniques such as data mining, which means combing vast amounts of data for meaningful patterns and extracting the information these patterns can tell us about the data. This is useful because it’s a way of extracting information from datasets far too large for a human to sift through.
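As a rough illustration of what combing data for patterns can look like, the short Python sketch below counts which items tend to turn up together in a set of shopping baskets. The baskets, the function name frequent_pairs and the min_count threshold are all invented for this example. Real data mining works on far larger datasets and uses more sophisticated methods.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented purely for illustration.
baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
    {"milk", "jam"},
    {"bread", "jam"},
]


def frequent_pairs(baskets, min_count):
    """Return pairs of items that appear together in at least min_count baskets."""
    counts = Counter()
    for basket in baskets:
        # Sort the items so each pair is always counted under the same ordering.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


patterns = frequent_pairs(baskets, min_count=2)
print(patterns)
```

With only five baskets a person could spot these pairings by eye. The point of the sketch is that the same counting approach still works when there are millions of baskets.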

Data scientists also create specialised techniques for extracting data. Data scraping is the practice of programming computers to pull large amounts of data from another program’s end-user output. In other words, software is written to extract data from another program after it has been displayed as information for human consumption. Normally, data passed between machines or software programs uses structured formats designed for computers rather than human readers. When no such format is available, data scrapers have to find ways of extracting the raw data from a feed of information meant for people. People and computers process information very differently, so this method of acquiring data can be very time-consuming or resource-intensive.
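A minimal sketch of the idea, written in Python and using only its standard library, is shown below. It assumes a made-up web page in which each ingredient sits in a list item marked class="ingredient". The page text, the class name and the IngredientScraper name are hypothetical. Real pages are messier and rarely label their content this neatly, which is part of what makes scraping hard.

```python
from html.parser import HTMLParser

# A hypothetical recipe page as a browser would receive it. The markup and
# the "ingredient" class name are assumptions made for this example.
PAGE = """
<html><body>
  <p>My grandmother always said that summer begins with the first tomato.</p>
  <h1>Tomato pasta</h1>
  <ul>
    <li class="ingredient">4 ripe tomatoes</li>
    <li class="ingredient">2 cloves garlic</li>
    <li class="ingredient">1 tbsp olive oil</li>
  </ul>
</body></html>
"""


class IngredientScraper(HTMLParser):
    """Collect the text of every <li class="ingredient"> element."""

    def __init__(self):
        super().__init__()
        self.in_ingredient = False
        self.ingredients = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "li" and ("class", "ingredient") in attrs:
            self.in_ingredient = True

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_ingredient = False

    def handle_data(self, data):
        # Keep text only while inside an ingredient item; skip everything else.
        if self.in_ingredient and data.strip():
            self.ingredients.append(data.strip())


scraper = IngredientScraper()
scraper.feed(PAGE)
print(scraper.ingredients)
```

The scraper ignores the opening paragraph and the heading and keeps only the three ingredient lines, turning a page laid out for people back into a plain list that a program can use.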

Computer tablet with recipe beside tomatoes, garlic and olive oil

Online recipe

Online chefs use extraneous text to hide their work from data scrapers.

Background photo by fabiobalbi, 123RF Ltd. 

Rights: The University of Waikato Te Whare Wānanga o Waikato

These practices can have effects we notice around us without realising it. An example of a common use for data scraping is to trawl the internet for content that can be easily repurposed. Oddly enough, one of the best such types of content is food recipes. The reason every recipe page starts with several paragraphs of personal preamble from the author isn’t vanity – it’s to confuse data-scraper programs, ensuring only real humans can access the content they came for.

So if everyone’s constantly collecting and using data, what might it tell them about you?

Owning your data

We hear a lot about data lately. The more data is gathered and analysed, the fuller the picture it can give of what is going on in the world. Every time someone visits a website, posts to social media or pays for something electronically, a datum about them is logged somewhere in a much larger dataset. Even school performance is collected as data. Increasingly, it’s becoming important that individuals can find out what data is being gathered about them – and have a say in who has access to it.

Millions of copyrighted books, articles, essays, and poetry provide the “food” for AI systems.

The Authors’ Guild, Open Letter to Generative AI Leaders

One example is in the field of artificial intelligence (AI) research and development. Large language models (LLMs) like ChatGPT and generative AIs like Midjourney are trained using vast amounts of data – in this case, existing examples of human writing and art. The machines are able to digest this data to form their version of knowledge about how humans write and draw, which they use to generate new text or images from a prompt. But many artists and writers argue that their property – in this case, writing and artwork – has been included in the training data without their consent, meaning the AIs can be used to create material that infringes on their ownership of the work they created.

Where I would put my flag up for data sovereignty is when it comes to our stories or our narratives or our tikanga.

Sonny Ngatai, te reo Māori advocate

The field of data sovereignty – who owns and accesses data – even includes questions like whether LLMs like ChatGPT should be allowed access to languages such as te reo Māori. Some experts argue that te reo is too precious a taonga to be fed into the AI datasets and altered by machines.

Personal data can be used in many good ways. When students take a test in school, the data in their answers is analysed to determine how well they scored and it can be used for feedback. But once again, the issue is who has access to that information and what they’re able to do with it.
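As a simple sketch of how test answers become a score and feedback, the Python example below compares one student’s answers with an answer key. The questions, the answers and the mark_test name are invented for illustration. Real assessment tools are more involved.

```python
# Hypothetical answer key and one student's answers, invented for illustration.
answer_key = {"q1": "B", "q2": "D", "q3": "A", "q4": "C"}
student_answers = {"q1": "B", "q2": "A", "q3": "A", "q4": "C"}


def mark_test(answers, key):
    """Return the number of correct answers and the questions answered wrongly."""
    # A question that was left unanswered also counts as wrong.
    wrong = [question for question in key if answers.get(question) != key[question]]
    score = len(key) - len(wrong)
    return score, wrong


score, to_review = mark_test(student_answers, answer_key)
print(f"Score: {score}/{len(answer_key)}")
print("Questions to review:", to_review)
```

The score is the information drawn from the raw answers, and the list of questions to review is the feedback. Both are built from personal data about the student.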

Data in your world

The use and creation of data has become commonplace for most people – but it is useful to take a moment to ask some questions:

  • What are some ways you collect and use data?

  • Can you think of instances of data in your world?

  • What are some ways that you get information – and how does this differ from data?

  • When you think about ways that you get information or knowledge, where might the raw data come from?

  • In what instances in your everyday life do you provide data for others?

Data activities to sample with ākonga

Pendulums – collecting and using data: students can gather data, process it into information and use that information to make predictions.

Using radiocarbon carbon dioxide data: students work with pre-existing data in order to think about how information is presented and what it might mean.

Measuring the power output of elite athletes: students learn about some of the ways sportspeople use data to improve their performance.

Related content

The PLD webinar Digital tools for science learning introduces easy-to-use digital tools that can engage learners in real-time data collection.

AI and generative learning have many positives and quite a few drawbacks. The Futures thinking toolkit can be customised to explore how changes in this technology may impact our lives and the lives of future generations.

These Connected articles are useful in helping younger learners with the concept of data:

Activity ideas

There are numerous activities on the Science Learning Hub to facilitate learning about data. Use this link and then use the filters to narrow your search.

Published: 3 August 2023