
Businesses count the cost of using questionable AI datasets 



 By Adonis Celestine, Director of Automation at Applause

Generative AI tools like ChatGPT represent a significant opportunity for businesses because they can perform a variety of tasks faster than people can. But they also pose significant ethical and compliance risks if the data sets they’re built on are left unchecked. Adonis Celestine, Director of Automation at digital quality and testing company Applause, explores the risks and how to avoid them.

It has been a whirlwind six months since ChatGPT burst onto the scene. Its ability to mimic writing styles and produce content on request, thanks to natural language processing, has given many people a real taste of what AI-powered services are capable of. Generative AI tools are powered by sophisticated algorithms trained on huge amounts of data in the form of text, images, video and audio files scraped off the internet. But OpenAI has not revealed what data sets GPT-4, the latest iteration of ChatGPT’s underlying model, was trained on, citing competitive reasons. This raises serious ethical questions and concerns about data bias and privacy.

Regulators in Italy have gone as far as limiting the processing of Italian users’ data until they have reassurances about the policies surrounding ChatGPT’s use of personal data. OpenAI, the company behind ChatGPT, has insisted that it complies with GDPR and other privacy laws. Trade unions in Germany that represent the creative industries have also expressed concerns about potential copyright infringement, demanding new rules to restrict ChatGPT’s use of copyrighted material. It’s unclear who owns the copyright in content generated by AI. Legally, only humans can hold copyrights, which raises the question: who is liable for copyright infringement, the AI or its creator?

Examining the ethics of AI  

Advances in AI and machine learning (ML) are happening so quickly that regulators are playing catch-up. But this isn’t the first time generative AI has been scrutinised. Last year a user discovered a sensitive personal photo in the data sets used to train an image-generating AI tool. The user had not given consent for the image to be used. This exposed a massive flaw in the way data is extracted. Businesses and public organisations need to be mindful of what is contained in the data sets used to train AI services, whether that data complies with data privacy and copyright laws, and whether it has been ethically sourced. The European AI Act, designed to address these matters with punitive fines, will come into effect in a few years. Much like GDPR, it will apply to businesses that operate in the EU and is expected to influence similar legislation around the world.

Consumers give their verdict on AI

A survey conducted in February found that a large majority of people believe AI should be regulated. Only 6% of the 4,000 respondents said they did not think AI should be regulated at all.

More than half (53%) said AI should be regulated depending on its use and 35% said it should always be regulated.

The same study also looked at sentiment around the inherent biases that can affect interactions with generative AI services. Bias occurs when the underlying algorithm has been trained with poor or insufficient data. When asked about bias in generative AI tools, 86% of survey respondents expressed concern.

Up to one third said they were dissatisfied with the AI experience, and 32% agreed with the statement “I would use chatbots more if they responded more accurately to the way I phrase things.” Natural language processing failures can reflect gaps in training data, including limited data from various regional, generational and ethnic groups. As consumer and regulatory scrutiny intensifies, businesses developing AI tools need to ensure they’re collecting training data legally and ethically.

Building valid and ethical data sets

To build high-quality data sets, companies should focus on these four key points:


  • Make sure your organisation’s terms and conditions/privacy policy cover AI training use cases. If you’re planning to use customer data to train AI, then make sure they understand how the data will be used and how it will benefit them (e.g., improved product and service offerings).
  • Make sure participants have opted in. Businesses are on solid ground when participants have explicitly agreed to provide data that may be used to train AI algorithms.
  • Actively work to eliminate bias. Look at the data and make sure it accurately reflects the diversity of your customer base and target audience.
  • Consider creating balanced synthetic data based on patterns and abstractions.
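The opt-in point above can be enforced in code before any data reaches a training pipeline. The following is a minimal sketch, not a prescribed implementation: the `Record` class, its `consented_purposes` field and the `"ai_training"` purpose label are all hypothetical names chosen for illustration, and a real system would take them from whatever consent records the organisation already keeps.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One contributor's data plus the purposes they agreed to (hypothetical schema)."""
    user_id: str
    consented_purposes: set = field(default_factory=set)


# Hypothetical label for the purpose a contributor must have opted in to.
AI_TRAINING = "ai_training"


def usable_for_training(records):
    """Keep only records whose owner explicitly opted in to AI training."""
    return [r for r in records if AI_TRAINING in r.consented_purposes]


records = [
    Record("u1", {"ai_training", "marketing"}),  # opted in to AI training
    Record("u2", {"marketing"}),                 # consented to something else only
    Record("u3"),                                # no recorded consent at all
]

training_set = usable_for_training(records)
print([r.user_id for r in training_set])
```

Treating missing consent the same as refusal keeps the default on the safe side: a record is excluded unless the opt-in is explicit.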

Similarly, while data warehouses can provide artefacts at scale, it’s important for buyers to ensure the data may be expressly used to train AI algorithms without risk or repercussions. It’s important to ask if contributors have granted permission to have their biometrics used to train body or facial recognition technology, voice applications or other AI products.

 

Diverse data sets reflect authentic experiences

While consent is key, diversity of data and experience is also essential for training AI algorithms. Businesses need to ensure their data sets include samples from people with disabilities and from people of different ages, genders, races and other key demographics. One example done well comes from an international fitness company that sourced AI training data from 1,500 users with a variety of body types and fitness levels. The project produced 36,000 workout videos that were vetted to ensure relevance and quality, and finally approved by the fitness company’s Digital Quality Analyst (DQA) team. The videos provided the required diversity of data, including BMI, fitness abilities and varied workout clothing, with zero data bias.
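One simple way to check whether a data set reflects a target audience is to compare each group's share of the samples with its share of the intended customer base. The sketch below assumes a single demographic attribute (age band), and the target shares and the 5-percentage-point tolerance are made-up illustration values, not figures from the fitness project or the survey.

```python
from collections import Counter


def representation_gaps(samples, target_shares, tolerance=0.05):
    """Return groups whose share of the samples falls short of the target
    share by more than `tolerance`, mapped to the size of the shortfall."""
    counts = Counter(samples)
    total = len(samples)
    gaps = {}
    for group, target in target_shares.items():
        actual = counts.get(group, 0) / total if total else 0.0
        shortfall = target - actual
        if shortfall > tolerance:
            gaps[group] = round(shortfall, 3)
    return gaps


# Illustrative data: 100 samples labelled by age band.
samples = ["18-34"] * 70 + ["35-54"] * 25 + ["55+"] * 5

# Assumed make-up of the target audience (hypothetical figures).
target_shares = {"18-34": 0.40, "35-54": 0.35, "55+": 0.25}

gaps = representation_gaps(samples, target_shares)
print(gaps)
```

Groups flagged here are candidates for further data collection or, as suggested earlier, for balanced synthetic data. A check like this only covers the attributes that are measured, so it complements human review and does not replace it.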

As we continue to see AI used for an ever-growing number of new and different use cases, it is essential to ensure the quality and integrity of the experiences. Companies that focus on ethically collecting diverse, quality data and training their algorithms with it will see the biggest successes, both in releasing great AI experiences and in doing what’s right for their customers.

 

 

 
