Enhancing big data in the social sciences with crowdsourcing: Data augmentation practices, techniques, and opportunities PLOS ONE

Crowdsourcing has gained significance as a practice for extracting meaningful insights from big data. It has eased tasks that are hard for computers, including audio transcription, sentiment analysis, document summarization, document editing, entity resolution and image annotation. Enterprises that rely entirely on crowdsourced data to make critical business decisions, however, expose themselves to some pitfalls.

A growing means of interacting with non-customers is through crowd-based phenomena, which are therefore examined in this study as a way to further collect big data.
Especially as recent years have seen grassroots activism ramp up, communities have used platforms like GoFundMe to support families affected by police brutality or other violent attacks.
The funder had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Moreover, the proposed framework is also of managerial and policymaking interest because it indicates how it is possible to jointly leverage big data and crowd-based phenomena to further benefit organizations. More broadly, this study seeks to demonstrate the importance of jointly considering these phenomena under the proposed framework and nurturing further interest in this direction. Recent studies have started exploring the positive impact that the use of big data can have on organizations. Corte-Real et al. (2017) have highlighted, through a survey of managers, that the availability of big data can benefit a firm’s financial performance. In their study, Tiwari et al. (2018) found that the analysis of big data in the health-care sector can reduce costs and improve quality.

Francesco Cappa is based at the Department of Engineering, Campus Bio-Medico University, Rome, Italy and Luiss Business School, Luiss University, Rome, Italy. He is an Assistant Professor of Innovation at the Campus Bio-Medico University (Rome, Italy) and Adjunct Professor at Luiss Guido Carli University (Rome, Italy). He has been a visiting researcher at the New York University Tandon School of Engineering (New York, USA) and Pace University Seidenberg School of Computer Science (New York, USA).

Crowdsourcing: A Snapshot of Published Research

Data mining is one of the traditional processes that analytics experts use to characterize data. Crowdsourcing recruits smartphone users as volunteers, who share their annotations across different types of contributions. This review covers the opportunities and challenges of data analytics using crowdsourcing and summarizes the data analytics framework. It then discusses several algorithms, covering applications, cost control, quality control, latency control and the big data mining framework, that must be considered in crowdsourcing. Finally, it concludes with the limitations of data mining and gives some suggestions for future research in crowdsourced data analytics. Big data companies need crowdsourcing in their operations to ensure objectivity and diversity, guard against errors more effectively and let realistic social trends play a part in data analysis.


Prioritizing people’s rights to consent, privacy and security when using the data should be at the core of any crowdsourced business model. For example, your company should make it clear that both the data collected and the identity of the contributors are kept safe and anonymous. You should find ways to ensure transparency, safety and peace of mind in each task you assign. Crowd empowerment will likely only work when contributors have the peace of mind of knowing that their data and their well-being are secure at all points throughout the process. During crowdsourcing campaigns, organizations can collect big data from (customers and) non-customers from which they can extract valuable outcomes.

What Are the Main Types of Crowdsourcing?

The way in which scientific research is conducted has also changed due to the inclusion of crowds in data collection and analysis, leading to the birth of citizen science (Franzoni and Sauermann, 2014; Wildschut, 2017). Citizen science is another crowd-based phenomenon whose roots date back to the early 19th century when people were recruited to gather data used, for example, to catalogue birds (Land-Zandstra et al., 2015). Citizen science seeks to involve citizens, without any specific scientific background requirement, in the collection and analysis of data for research projects through technology-mediated interactions.

In our sample, data augmentation was much rarer for both psychological (23%) and non-psychological (16%) studies. Of the studies involving data augmentation, workers are most often asked to perform tasks replicating other data, such as lab experiments (16%). Less frequently, they are asked to code data provided by the investigator (6%) or elaborate it with additional information (9%).

Among the various possible partners that can be involved in open innovation (OI), such as suppliers, competitors and universities, the involvement of a large number of dispersed individuals is becoming more common thanks to advancements in IT and the digitalization of the general public. In this way private and public organizations now have the opportunity to exploit crowd wisdom, i.e. the wide variety of expertise and resources people are endowed with (Bayus, 2013; Mollick and Nanda, 2016). Individuals have therefore become increasingly involved in providing ideas for innovation, data for scientific projects and funds for promising new entrepreneurial ventures, which are referred to, respectively, as crowdsourcing, citizen science and crowdfunding. We provide a recommended reporting template in the S1 Appendix with both standard items that should be included in reporting all online crowdsourcing studies and items to use in reporting specifically for big data augmentation.

This is where big data analytics proves highly beneficial in securing crowdsourcing success. By applying well-established big data principles, organizations can identify the real nuggets in crowdsourced data that drive innovation, development decisions and market practices. Through crowdfunding, it may be possible to collect funds in an easier and faster manner than through traditional sources of financing. Indeed, crowdfunding involves fewer duties and complexities because it directly asks the general public, connected through web-based platforms, for financial help to support entrepreneurial ideas (Belleflamme et al., 2014; Bi et al., 2017; Mollick, 2014). Moreover, the nature of the interaction is online only, rather than face-to-face, thus limiting the exchange of information with backers (Cappa et al., 2022b). Finally, the definition of a supervisory authority and the definition of clear regulations are still in development and not yet uniform worldwide (Cicchiello et al., 2020; Vismara, 2016).

In reward-based crowdfunding, backers typically provide small amounts of money in exchange for a reward. This reward can be a prototype of the item that will be produced, or branded merchandise like a unique t-shirt or a discount on the product when it is ready; this is the most widespread form of crowdfunding (Belleflamme et al., 2014; Cappa et al., 2022b; Davis et al., 2017; Kraus et al., 2016; Zhang and Chen, 2019). In lending-based crowdfunding, on the other hand, entrepreneurs raise funds in the form of loans that they will pay back to lenders over a pre-determined timeline with a set interest rate (Moysidou and Hausberg, 2019). Finally, equity-based crowdfunding is based on the exchange of shares in a private company for financial capital (Block et al., 2018), in a manner similar to what happens during acquisitions through traditional stock markets. From this perspective, it is argued that organizations should also collect data from non-customers in order to make better business decisions and improve performance. As the advancement of IT has made it possible to overcome social, physical and geographical barriers, companies are increasingly including crowds in their activities, and it might thus be feasible for many organizations to leverage crowd-based phenomena to also collect big data from non-customers.

Our research practices and procedures distill large volumes of data into clear, precise recommendations. Our independence as a research firm enables our experts to provide unbiased advice you can trust. Meanwhile, an adult beverage company could look for data on how grocery stores are setting up Super Bowl endcaps. Instead of deploying high-priced consultants to do display audits, the company could pay customers to report on the appearance of those endcaps while they are out shopping. On top of that, increased investments in infrastructure, such as those the Boston Consulting Group describes, will likely continue to make it easier for people to access the mobile internet, regardless of where they are. In the past, Veeramachaneni’s group has developed software that automatically generates features by inferring relationships between data from the manner in which they’re organized.


We found that researchers are beginning to do this, but they do not offer enough detail on the process for formal evaluation or replication. In light of these opportunities and challenges, the remainder of this article examines three case studies and focuses on developing clear, evidence-based best practice guidelines on when and how researchers can successfully augment data with MTurk and report on doing so. Characteristic of social science skepticism around big data are concerns that “the reliability, statistical validity and generalizability of new forms of data are not well understood.” The type of big data we focus on does not come from a heavily theorized and well-planned scientific research project–they “are not the output of instruments designed to produce valid and reliable data amenable for scientific analysis”–which, at a minimum, creates discomfort among social scientists [5, 11]. Instead, it is a byproduct of other activity, which “has led some scholars to ask whether [big] data can provide anything beyond crude description” [12]. Without additional contextual information to help “tame” it, the concern is that such data will remain too “wild” for answering valuable questions of interest in the academic social sciences.

But where the top-scoring entries were the result of weeks or even months of work, the FeatureHub entries were produced in a matter of days. And while 32 collaborators on a single data science project is a lot by today’s standards, Micah Smith, an MIT graduate student in electrical engineering and computer science who helped lead the project, has much larger ambitions. MIT researchers have developed a new collaboration tool, dubbed FeatureHub, intended to make feature identification more efficient and effective. With FeatureHub, data scientists and experts on particular topics could log on to a central site and spend an hour or two reviewing a problem and proposing features. Software then tests myriad combinations of features against target data, to determine which are most useful for a given predictive task.
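The last step described above, testing combinations of proposed features against target data, can be illustrated with a minimal sketch. This is not FeatureHub's actual code or API: the function names, the toy data, and the choice of leave-one-out 1-nearest-neighbour accuracy as the scoring rule are all assumptions made for illustration.

```python
from itertools import combinations


def loo_accuracy(rows, labels, cols):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    that only looks at the feature columns listed in `cols`."""
    correct = 0
    for i, row in enumerate(rows):
        best_j, best_dist = None, float("inf")
        for j, other in enumerate(rows):
            if i == j:
                continue  # never compare a row with itself
            dist = sum((row[c] - other[c]) ** 2 for c in cols)
            if dist < best_dist:
                best_j, best_dist = j, dist
        correct += labels[best_j] == labels[i]
    return correct / len(rows)


def rank_feature_subsets(rows, labels, names, max_size=2):
    """Score every feature subset up to `max_size` and return
    (score, feature_names) pairs, best first; ties favour smaller subsets."""
    scored = []
    for size in range(1, max_size + 1):
        for cols in combinations(range(len(names)), size):
            score = loo_accuracy(rows, labels, cols)
            scored.append((score, tuple(names[c] for c in cols)))
    return sorted(scored, key=lambda item: (-item[0], len(item[1]), item[1]))


# Hypothetical toy data: "signal" separates the two classes, "noise" does not.
names = ["signal", "noise"]
rows = [
    (0.10, 0.5), (0.20, 0.1), (0.15, 0.9),  # class 0
    (0.90, 0.4), (0.80, 0.2), (0.95, 0.8),  # class 1
]
labels = [0, 0, 0, 1, 1, 1]

ranked = rank_feature_subsets(rows, labels, names)
for score, subset in ranked:
    print(f"{score:.2f}  {subset}")
```

Exhaustive enumeration grows exponentially with the number of candidate features, so a real system would need a cheaper search strategy; the sketch only shows the idea of scoring crowd-proposed features against a target.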

Crowdsourcing and big data analytics together can help organizations exploit information to make informed business decisions, which is a worthy goal. Organizations with on-premises or cloud big data management systems will have to bear not only hardware and software costs but also various other significant startup costs. Thus, companies might not be willing to hire employees for data management tasks, paving the way for crowdsourcing in big data ventures. Large language models (LLMs) like GPT-4 have captivated business leaders with the promise of enhanced decision-making, streamlined operations, and new innovation. Companies such as Zendesk and Slack have started using LLMs to advance customer support, improving satisfaction and reducing costs.

CrowdFlower pays 5 million data organizers to help clean up the system, crowdsourcing them through the internet in a relaxed, informal way. Similarly, Kaggle posts problems online so they can reach the right people for the most reliable data, an approach that can spark intense competition among scientists. This combination of big data and crowdsourcing leads to another important aspect of modern data collection: crowd science. The results of our content analysis highlight that academic use of MTurk is largely limited to experimental studies and surveys. In contrast to this typical use, we advocate that researchers expand their use of MTurk for data augmentation, which will have particular benefits for social science applications of big data that wish to address concerns about validity and value.

For example, a common application of big data augmentation through online crowdsourcing is asking workers to answer questions about a specific web link. If finding the initial links is also a goal, devoting a single task to identifying a suitable web address and asking subsequent workers to verify web address accuracy can save on excess pay while also providing cross-verification of the initial task’s success. Automated augmentation approaches, such as adding value through sentiment analysis, are also difficult to implement without advanced training and may themselves be of questionable validity. The Facebook experiment discussed above has been criticized by social scientists for the augmentation being of unknown and potentially low validity [19].
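The two-stage design described above, in which one worker proposes a web address and later workers verify it, can be sketched as a simple vote aggregation. The function names, the three-vote minimum, and the majority threshold below are illustrative assumptions, not part of the study or of any crowdsourcing platform's API.

```python
def aggregate_verifications(votes, min_votes=3, threshold=0.5):
    """Decide what to do with one proposed link.

    `votes` is a list of booleans, one per verifying worker
    (True means "this address is correct").
    """
    if len(votes) < min_votes:
        return "needs_more_votes"  # too few verifiers to decide
    share_confirming = sum(1 for v in votes if v) / len(votes)
    return "accept" if share_confirming > threshold else "reject"


def review_links(verifications, min_votes=3, threshold=0.5):
    """Apply the same decision rule to every proposed link."""
    return {
        link: aggregate_verifications(votes, min_votes, threshold)
        for link, votes in verifications.items()
    }


# Hypothetical verification results for three proposed links.
example = {
    "https://example.org/a": [True, True, False],
    "https://example.org/b": [False, False, True],
    "https://example.org/c": [True],
}

results = review_links(example)
for link, decision in results.items():
    print(link, decision)
```

Links marked `needs_more_votes` would be sent back out for further verification, and rejected links could be reposted as a new search task.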
