The Role of Crowdsourcing in Amplifying Generative AI | Tech Mahindra

It is a common perception that generative AI and crowdsourcing are new trends in the technology world. But that's not really true. Crowdsourcing has existed for almost two decades, harnessing the 'power of the crowd' for everything from executing microtasks to solving complex problems and developing innovative ideas. The use of gig workers by large tech providers in AI, and by the services industry, has only accelerated the adoption of crowd-powered work in the technology world. Similarly, generative AI dates back to 2014, and with the emergence of ChatGPT and similar tools, use-case-driven adoption has surged across many fields, because it can create content in many forms, such as text, images, video, and code. While generative AI targets mundane tasks and automates them, crowdsourcing takes larger business processes or problems, splits them into microtasks, and uses crowd or community members to solve them.

Generative artificial intelligence (AI) is a breakthrough AI technology that can generate new content: not only text but also audio, code, images, simulations, and videos. The recent emergence of rapidly growing models such as ChatGPT and Bard is revolutionizing the way content will be created in the future.

Generative AI has the following components, all working on a large training dataset: a complex neural-network-based algorithm that analyzes the data and learns its patterns and relationships along with a good understanding of the underlying rules, and a user interface for building new use cases, scenarios, and training-data inputs. The content that generative AI creates also needs to be rated and moderated for trustworthiness and harmful material.

The Role of Crowdsourcing in Building Generative AI

Building and maintaining large training datasets with input from various domain experts is the biggest and most time-consuming task for any generative AI model. It calls for the true power of the crowd, in both skill and scale. Pairing generative AI with crowd or gig resources available around the clock speeds up training-data creation and, with it, time to market. A crowd drawn from a large network of data scientists can also help build newer neural network models for generative AI. Once a model has generated content, further use cases arise, such as rating that content and moderating it for offensiveness and bias, for ethical and cultural reasons. All of this must happen quickly and accurately, and the crowd offers the best way to gather wisdom at scale.

So, the key questions are: which crowd execution model works best for generative AI, and how do we assure the quality of work from gig workers? In my opinion, organizations starting to leverage crowdsourcing should begin with a managed crowd model (i.e., hiring gig workers on a part-time or as-available basis) to establish average handling time and quality processes and to build a community around each domain. Once these are set up, typically in the first 4-6 weeks, it is easier to transition to an outcome-based or transactional model with an open crowd.

On quality, under either a managed or an open crowd model, the output represents the collective wisdom of multiple domain experts on the same topic or question, so I strongly recommend a peer-review methodology, with an adjudicator resolving any conflict between two experts. Peer review can take the form of two people independently answering the same question, or of a submitter/approver methodology, depending on the domain and the crowd capacity available for it.
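The two peer-review options described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a Tech Mahindra tool: the names `peer_review`, `submit_and_approve`, and `ReviewOutcome` are hypothetical, and "agreement" between two experts is assumed to mean their answers match after ignoring case and extra whitespace. A real domain would likely need a richer comparison, such as rubric scores.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ReviewOutcome:
    question_id: str
    final_answer: str
    adjudicated: bool  # True only when the two experts disagreed and an adjudicator decided


def normalize(answer: str) -> str:
    """Hypothetical agreement rule: ignore case and collapse whitespace."""
    return " ".join(answer.lower().split())


def peer_review(
    question_id: str,
    answer_a: str,
    answer_b: str,
    adjudicate: Callable[[str, str], str],
) -> ReviewOutcome:
    """Two experts answer the same question independently; an adjudicator resolves conflicts."""
    if normalize(answer_a) == normalize(answer_b):
        # The experts agree, so accept the first answer without escalation.
        return ReviewOutcome(question_id, answer_a, adjudicated=False)
    # Conflict: pass both answers to the adjudicator, who returns the final answer.
    return ReviewOutcome(question_id, adjudicate(answer_a, answer_b), adjudicated=True)


def submit_and_approve(
    question_id: str,
    submission: str,
    approve: Callable[[str], bool],
) -> Optional[ReviewOutcome]:
    """Submitter/approver variant: one expert submits, a second approves or rejects."""
    if approve(submission):
        return ReviewOutcome(question_id, submission, adjudicated=False)
    # Rejected: return None so the task can be sent back to the submitter for rework.
    return None


if __name__ == "__main__":
    # Hypothetical adjudicator that prefers the more detailed (longer) answer.
    pick_longer = lambda a, b: max(a, b, key=len)
    print(peer_review("q1", "Paris", " paris ", pick_longer))
    print(peer_review("q2", "42", "Forty-two, per the spec", pick_longer))
```

Passing the adjudicator and approver in as functions keeps the routing logic separate from who actually performs each role, so the same flow could serve a managed crowd early on and an open crowd later.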

This brings me to the key challenges, or pitfalls, to watch for when using crowdsourcing for generative AI work:

1) Domain expert crowd capacity and availability

2) The typical gig-worker habit of searching the web for answers must be curtailed to keep answers original

3) Speed is important, but achieving it can be a challenge given limited SME availability

Exploring the Growing Influence of Crowdsourcing

Crowdsourcing's adoption in the generative AI field is still limited, since most tech companies have only just started exploring large data-labeling and problem-solving tasks on crowd platforms, using specialized communities to supplement their AI models. They turn to the crowd because they cannot hire and retain a large pool of domain experts, for cost and scale reasons. This, along with the language diversity the world needs in order to build LLMs, shows where crowdsourcing fits best.

Crowdsourcing will therefore improve the accuracy of generative AI models much faster than traditional methods, which depend on each technology company ramping up its own workforce.

Crowdsourcing depends on trust, honesty, plagiarism-free work, and confidentiality among crowd community members, as well as on gig workers certifying first-time quality when they finish their work. It also means preventing unauthorized collaboration and spamming by community workers. Addressing these issues requires crowd platforms to be security compliant, ensuring that each identified crowd member works only on assigned tasks from an identified IP location, backed by security features such as zero-trust security and monitoring of crowd-worker behavior.

Crowdsourcing as a Catalyst for Advancement in Generative AI

The competitive generative AI field demands speed: every tech and non-tech company needs to ramp up quickly in building LLMs, and crowdsourcing is the solution. Every advance in the technology will, in turn, require community members to update and upgrade their skills to stay competitive and keep earning extra gig income. In the future of generative AI work, the size and skill of its community will differentiate each crowd vendor.

About the Author
Dhananjay Deshmukh
Region Head, Tech Mahindra BPS, USA

Dhananjay is a seasoned professional with more than 33 years of industry experience in the talent models used by large enterprises, from in-sourcing (active and passive sourcing) to outsourcing and crowdsourcing. With security, risk, and scalability in mind, he helps Fortune 500 companies drive time to market with an optimal mix of talent-sourcing strategies.