Responsible Innovation: Would you let an algorithm shortlist you for the job?

Updated: Mar 23



Online personality assessment tests are increasingly being used in the recruitment process by a wide range of organizations. These personality assessment tools use AI algorithms to ‘assess candidates' motivations, values and characteristics, and evaluate their odds of success in a specific role.’[1] They promise organizations an efficient and cost-effective way to screen out candidates who are deemed to be “not a fit” before inviting them to participate in the human-led interview process. Online personality assessments are part of a new wave of AI-enabled hiring processes which have been adopted by organizations without adequate oversight. Forward-thinking regulators, such as the City of New York, are just beginning to put in place regulatory mechanisms to counter widespread algorithmic bias in recruitment.[2]

Personality Assessments in Recruitment


Personality assessments typically occur after candidates have been shortlisted based on stated experience, achievements and competencies in their CV that align with the requirements for the specific role. Candidates are asked to complete a series of questions during the assessment. In some cases, candidates answer on a rating scale; in others, the questions are forced-choice, or ipsative. Some research suggests that an ipsative approach is less suited to personality assessment because it is less nuanced, less reliable and does not allow for comparisons between candidates.[3][4]
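To make the comparison problem concrete, here is a minimal sketch of the two scoring styles. It is our own illustration, not the scoring used by any particular vendor: the trait names, the answers and both functions are hypothetical. It assumes a forced-choice test awards one point per question to whichever trait the candidate picks, while a rating-scale test averages each trait's ratings independently.

```python
from collections import Counter

# Hypothetical trait names, used only for illustration.
TRAITS = ("energy", "detail", "teamwork")


def ipsative_scores(picks):
    """Score a forced-choice test: one point per question for the trait picked."""
    counts = Counter(picks)
    return {trait: counts.get(trait, 0) for trait in TRAITS}


def normative_scores(ratings):
    """Score a rating-scale test: average each trait's 1-5 ratings separately."""
    return {trait: sum(values) / len(values) for trait, values in ratings.items()}


# Two made-up candidates who rate themselves very differently in absolute terms...
ratings_a = {"energy": [5, 5], "detail": [4, 4], "teamwork": [3, 3]}
ratings_b = {"energy": [3, 3], "detail": [2, 2], "teamwork": [1, 1]}

# ...but who prefer the same traits in the same order, so they make the same picks.
picks_a = ["energy", "energy", "energy", "detail", "detail", "teamwork"]
picks_b = list(picks_a)

print("Rating scale A:", normative_scores(ratings_a))
print("Rating scale B:", normative_scores(ratings_b))
print("Forced choice A:", ipsative_scores(picks_a))
print("Forced choice B:", ipsative_scores(picks_b))
```

In this sketch the forced-choice points always add up to the number of questions, so a profile only shows which traits a person favours relative to their other traits. The two candidates end up with identical forced-choice profiles even though their rating-scale scores differ on every trait.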


We recently completed a widely used personality assessment as part of our research for this article. This test had approximately 100 questions and candidates were required to select a choice presented for each question asked within the allocated time. Upon completion, the candidate receives an assessment report, which may state something like:


This report outlines your strengths and development needs. It focuses on the most significant results. Your answers have been compared with a large group of persons with a similar professional background. On the basis of this comparison a benchmark profile is generated indicating which strengths you can build on in comparison to others, and which areas you need to take care of to ensure long-term professional success.

While the narrative of the report appears focused on personal improvement, the test results are actually used to draw inferences about personality fit for a particular role. This use is not clearly explained to candidates, nor is there any explanation of how these systems go about assessing the personality required for the role. Consequently, any inferences that the algorithm has made are open to challenge on the grounds of accuracy, validity, reliability, robustness and resilience.[5]


When we contacted the organization using this assessment tool for an explanation of the results, the recruitment representative was not able to provide one. The organization’s inability to explain how the algorithm’s inferences about the candidate’s personality were used in the decision-making process is problematic. This is coupled with a lack of challenge provisions for the candidate. Both issues raise clear concerns over the organization’s compliance with the transparency and explainability provisions in GDPR for the data subject’s rights, as well as over the validity of the assessment in the hiring process for the role in question.


How Personality Assessment Algorithms Work


Personality assessment tests measure a range of characteristics such as feelings, attitudes, motivations and behavioural traits. They promise to streamline the hiring process while helping employers find candidates who will perform successfully within an organization. Typically, the thought process that informs the use of these tests is a “more of the same” approach. That is to say, organizations tend to measure and predict fit for a new hire in terms of shared characteristics (traits, attitudes, behaviours) with the company's current model workforce holding similar roles. The characteristics or attributes that define success and organizational fit for a company are encoded into the testing system algorithms. At the same time, the system will also serve to screen out characteristics that are deemed undesirable.


Many of the algorithms used in commercial personality assessment tools are kept secret as part of a company's intellectual property. However, a search of the US patent database proved fruitful in finding an example of a system that could be examined in at least some level of detail based on its patent filing.


In 2007, a patent was issued to PeopleAnswers for an application entitled “Behavioral profiles in sourcing and recruiting as part of a hiring process”.[6] This system compares the survey results of a job candidate against the weighted survey results of model employees within the organization in order to match key behaviour traits deemed important for that role by the company. It then ranks how closely the candidate’s answers correspond to those of the model employees. For example, a company might identify “high energy” as a desirable trait for a sales position. The trait of “high energy” is measured by answers to specific survey questions. These answers are then fed back into the system, adjusted through a set of weightings and then rolled up into a final outcome.
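The general idea of matching a candidate against weighted model-employee results can be sketched in a few lines of code. This is a simplified illustration under our own assumptions, not the patented method: the trait names, weights, model profile, 1-5 answer scale and distance formula are all hypothetical, chosen only to show how answers might be grouped into traits, weighted and rolled up into a single ranking.

```python
# Hypothetical weights and model-employee averages on a 1-5 scale.
TRAIT_WEIGHTS = {"high_energy": 0.5, "persistence": 0.3, "sociability": 0.2}
MODEL_PROFILE = {"high_energy": 4.5, "persistence": 4.0, "sociability": 3.5}
SCALE_RANGE = 4.0  # widest possible gap between two scores on a 1-5 scale


def trait_scores(answers, question_traits):
    """Average a candidate's 1-5 answers for each trait the questions measure."""
    grouped = {}
    for question, answer in answers.items():
        grouped.setdefault(question_traits[question], []).append(answer)
    return {trait: sum(vals) / len(vals) for trait, vals in grouped.items()}


def fit_score(candidate, model=MODEL_PROFILE, weights=TRAIT_WEIGHTS):
    """Return 1.0 for a profile identical to the model, lower as it diverges."""
    total_weight = sum(weights.values())
    gap = sum(w * abs(candidate[t] - model[t]) for t, w in weights.items())
    return 1.0 - gap / (total_weight * SCALE_RANGE)


def rank_candidates(candidates):
    """Order candidate names from closest to furthest from the model profile."""
    return sorted(candidates, key=lambda name: fit_score(candidates[name]), reverse=True)


# Made-up candidates, already converted to trait scores.
candidates = {
    "Alex": {"high_energy": 4.5, "persistence": 4.0, "sociability": 3.5},
    "Blake": {"high_energy": 2.0, "persistence": 4.0, "sociability": 3.5},
    "Casey": {"high_energy": 4.5, "persistence": 4.0, "sociability": 1.0},
    "Dana": {"high_energy": 5.0, "persistence": 4.0, "sociability": 3.5},
}

for name in rank_candidates(candidates):
    print(name, round(fit_score(candidates[name]), 3))
```

Two things stand out even in this toy version. The weights decide everything: Blake and Casey each differ from the model by the same amount on one trait, yet Blake ranks lower because “high energy” carries more weight. And any difference from the model counts against a candidate, including Dana scoring higher than the model employees on energy.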


Does personality determine job performance?


One core assumption in the use of personality assessments in hiring is the role that personality, typically measured as attitudes and behavioural traits, plays in determining job success. When PeopleAnswers was purchased in 2014 by Infor, a global enterprise software company with over 70,000 clients, the press release stated that:


“According to research, 46 percent of new hires are bad hires and 89 percent of them fail because of attitudinal reasons. Skills are not predictive of positive outcomes for a new hire.”[7] (Infor)

These statistics are from a study conducted by Leadership IQ[8] and popularized in a book written by their CEO. The study sounds impressive.[9] It’s described as a three-year study of over 5,200 hiring managers at 312 organizations who collectively hired more than 20,000 employees during the study period. However, we could not find evidence of this particular study being peer-reviewed, published in a credible academic journal or even published independently by a third party. To be fair, we have not conducted a full literature review to determine if other credible research supports or refutes the finding that attitudinal reasons overwhelmingly account for new hire success. However, we can say with confidence that the use of personality measures as predictors of job performance remains a contested area.


How is the assessment constructed?


Yet, even if we set that issue aside and assume that personality, as measured by behavioural traits and attitudes, is very important to a new hire’s job performance, we still need to probe further. There are other questions we need to ask about the use of personality assessment tests. These include the following:

  • Can behaviour be reduced to a set of traits?

  • How are organizations defining these traits? For example, what is “high energy”? Would a set of model employees define and score this term the same way?

  • To what degree do cultural factors play into how we both define and value certain behavioural traits? For example, in western countries, the Big Five[10] personality traits (agreeableness, openness, conscientiousness, extraversion and neuroticism) represent what is valued and are often presented as being universal. However, they differ from the Chinese Personality Assessment[11] traits (accommodation, dependability, interpersonal relatedness and social potency). How much cultural bias is embedded in the construction of these assessments?

  • Even if we further assume that we can consistently identify, define and score these personality traits reliably in an online test, there are still further questions to ask:

  • Are these systems accurate measures of behavioural traits?

  • On the part of the company, there are layers of reductionist scoring taking place that represent a survey taken at a point in time. Is this information updated and if so, how?

  • On the part of the candidate, a person’s entire personality is being reduced to a set of 50 or 100 questions with forced choices (ipsative) answered at a specific point in time under some level of “duress”, as the candidate knows this information will be used to determine if they progress in the hiring process.

  • How are weightings calculated and assigned? Who decides those parameters?

Organizations have made two major assumptions: that personality is a good predictor of job performance and that personality traits can be reliably measured. Both assumptions remain open to question, and this raises a red flag around the use of personality assessments in hiring scenarios. Hiring decisions are high stakes and making decisions about who to hire is complex. These systems are often positioned as reducing bias and making more objective decisions. The reality is that they encode and amplify bias while providing a veneer of objectivity, thereby creating barriers for certain groups of people.[12] [13] This seems to be especially true for individuals with a disability, as the Center for Democracy & Technology demonstrated in a recent report.[14] Those involved in the recruitment process can absolve themselves of responsibility for making a choice by simply following the recommendations of the system.


Stakeholders and Risk


If we take a moment to provide a very high-level assessment of these systems, we see the following stakeholders as being involved:

  • Job candidate

  • Hiring organization overall. We could break this down further into specific contacts such as the HR recruiter, the departmental supervisor, other people who work in the department.

  • Software vendor of the personality assessment system

  • Society, particularly vulnerable groups who may be disproportionately disadvantaged by the system. This may be on a protected class variable such as race, age, gender, ability or sexual orientation. Intersectionality may further disadvantage certain people (e.g. BIPOC women).

The risks, by group, might look like the following:

  • Candidate: Strong and possibly better candidates excluded from being considered by the Hiring Organization due to their inferred personality not matching that defined by the trained model.

  • Candidate: There’s no transparency in the process of how these personality assessments derive their outcome. Thus, individuals often blame themselves when the outcomes are not favourable. Over time this can erode a person’s confidence and possibly result in mental health issues. It’s reported by Psychology Today[15] that up to 80% of organizations with over 100 people use personality assessments, so a candidate may face the same kind of screening again and again. Repeated discrimination reduces the chances of employment, which in turn harms financial well-being.

  • Hiring Organization: Risk that the inferences of the candidate’s personality from these systems do not accurately, validly, reliably, robustly or resiliently reflect their actual personality, resulting in strong and/or ideal candidates not being included in the shortlist for consideration, if the results are not reviewed in detail. Inferences are not facts.

  • Hiring Organization: Risk in using these systems to apply a “more of the same” approach to hiring resulting in less diversity - a value that many organizations say is important. In the wake of the murder of George Floyd and Black Lives Matter, organizations around the world have issued DEI statements, launched training initiatives and put in place DEI leaders. If the current employee pool is not as diverse or inclusive as organizations would like, how does measuring against the traits of current model employees as the standard for new hires make sense? Organizations that do not recognize this flaw are missing out on diversifying their talent pool and capabilities to innovate responsibly in the digital ecosystem, due to the application of unconscious bias in their recruitment process.

  • Vendor: If clients using these systems face legal action or reputational risk, they will surely implicate the system vendors as part of the chain of accountability. Regulators may also impose fines on the vendor in addition to the hiring organization.

  • Societal: Unless candidates applying for a role share the personality traits of the organization’s current employees, they are unlikely to be inferred to be suitable by algorithms trained on data representative of those employees. The majority of candidates in the current climate of high unemployment will continue to struggle to find work in organizations that have elected to deploy such algorithms in their hiring processes, further amplifying the societal impact of algorithmic discrimination. This will disadvantage certain groups more than others.

Regulations and Risk


Regulators around the world have responded with plans to introduce new regulations to safeguard the interests of candidates subject to the use of algorithms by organizations in the hiring process.


For organizations that need to comply with EU regulations, Annex III of the proposed EU AI Act[16] outlines High-Risk AI Systems Referred To In Article 6(2). It includes:

“4. Employment, workers management and access to self-employment:

(a) AI systems intended to be used for recruitment or selection of natural persons, notably for advertising vacancies, screening or filtering applications, evaluating candidates in the course of interviews or tests;


(b) AI intended to be used for making decisions on promotion and termination of work-related contractual relationships, for task allocation and for monitoring and evaluating performance and behaviour of persons in such relationships.”


The proposed EU AI Act has set a high bar for how High-Risk AI Systems within organizations are developed and/or deployed.


Whilst the EU AI Act is not yet law, the existing GDPR, in both its EU and UK forms, has provisions that afford a basic level of protection for data subjects whose personal data are processed by automated decision-making systems – essentially AI, algorithmic and autonomous systems.


For organizations that need to comply with UK regulations, the UK GDPR Article 22[17] addresses regulatory obligations when deploying automated decision-making and profiling:


Article 22.1 states, “The data subject shall have the right not to be subject to a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her”.


The ICO also outlines what organizations need to do[18] when they use AI, algorithmic or autonomous systems to process personal data and infer automated decisions:


“You must inform individuals if you are using their data for solely automated decision-making processes with legal or similarly significant effects. This applies whether you have received the data directly from the individuals concerned or from another source.


You must also provide meaningful information about the logic involved and what the likely consequences are for individuals.


This type of processing can be invisible to individuals so in circumstances where it can have a significant impact on them you need to make sure they understand what’s involved, why you use these methods and the likely results.


If someone is unhappy with a decision you’ve made using a solely automated process, they can ask for a review. It makes sense for you to explain how they can do this at the point you provide the decision.


You need to show how and why you reached the decision, so it’s important that you understand the underlying business rules that apply to automated decision-making techniques.


You must be able to verify the results and provide a simple explanation for the rationale behind the decision.


The systems you use should be able to deliver an audit trail showing the key decision points that formed the basis for the decision. If your system considered any alternative decisions you need to understand why these were not preferred.


You should have a process in place for individuals to challenge or appeal a decision, and the grounds on which they can make an appeal. You should also ensure that any review is carried out by someone who is suitably qualified and authorized to change the decision.


The reviewer should take into consideration the original facts on which the decision was based as well as any additional evidence the individual can provide to support their challenge.”


Most organizations that have implemented GDPR programs are likely to have had their legal teams reflect the relevant elements of GDPR in their privacy policies. We expect any organization that has AI, algorithmic or autonomous systems processing personal data to cite Article 22 in its privacy policy.


The question remains, how many organizations that have deployed AI-powered hiring tools, such as the Personality Assessments discussed earlier, can truly demonstrate that they have operationalized the actions listed above as suggested by the ICO?


Meanwhile, the New York City Council has introduced a law[19] that requires algorithms used in hiring to be “audited” for bias.


Mitigation Strategies


There are many dimensions to the challenges faced by organizations deploying online assessments powered by AI, algorithmic or autonomous systems. These are reflected in the risks outlined in this article. At the same time, there are ways that organizations can mitigate risks. These include:

  • Limiting the use of personality assessment testing in hiring decisions and, where such tests are used, exercising stringent due diligence before they are deployed

  • Providing transparency by sharing details of how personality tests are used

  • Ensuring adequate data privacy by understanding how vendors of these systems retain and use candidate data beyond the original purpose

Crucially, these socio-technical systems, unlike traditional IT systems, are trained on predetermined data sets, are non-deterministic and are deployed on people whose characteristics may or may not be represented by those datasets. Consequently, the automated decisions inferred by the algorithms may not be accurate, valid, reliable, robust or resilient, impacting not only the candidates but also the reputation of the hiring organization, its employees, its leadership, its Board and shareholders.


Any mitigation needs to be holistic and consider not just the technology and data aspects, but also the wider organizational capabilities, goals, ambitions, culture and leadership mindset.


ForHumanity has recently announced that its criteria for the New York City Council's Bias Audit law for Automated Employment Decision Tools (AEDT) are now available for immediate consultation by anyone interested in helping to ensure that the downside risks of using AI, algorithmic or autonomous systems in hiring are mitigated.


Conclusion


There are two key areas of concern that vendors and organizations that adopt vendor solutions incorporating AI, algorithmic or autonomous systems processing personal data need to address:

  1. Vendors of solutions that incorporate AI, algorithmic or autonomous systems processing personal data have a responsibility to disclose the limitations and downside risks related to their solutions during the sales process and work with their clients to mitigate those risks during the implementation process since adverse outcomes can easily be linked back to their solutions;

  2. Organizations that adopt vendor solutions incorporating AI, algorithmic or autonomous systems processing personal data need to be aware of the limitations and downside risks of these solutions and take the necessary steps to monitor their performance and implement robust controls and risk mitigations so that their customers are not subjected to the consequential adverse outcomes. Organizations can delegate responsibilities but they cannot delegate accountability for the adverse impact experienced by their customers.

Organizations that are building or deploying AI-powered solutions have a moral and, increasingly, a legal responsibility to ensure these solutions are not disadvantaging people, discriminating against them or causing them harm. Choosing to proactively think through and assess how these systems are being used in the context of recruitment is the responsible choice for organizations and their leaders to make.


We look forward to hearing your thoughts. Feel free to contact Chris or Katrina via LinkedIn to discuss and explore how we can help you and your organization innovate responsibly.


Chris Leong is a Fellow at ForHumanity and the Director of Leong Solutions Limited, a UK based management consultancy and licensee of ForHumanity’s Independent Audit of AI Systems.


Katrina Ingram is the Founder and CEO of Ethically Aligned AI, a Canadian based social enterprise aimed at helping organizations make better choices in designing and deploying AI systems.




_________


[1] How to use personality tests

[2] Why New York City is cracking down on AI in hiring

[3] Top Five Differences Between Ipsative and Normative Personality Assessments

[4] Examination of the Test–Retest Reliability of a Forced-Choice Personality Measure

[5] ForHumanity. Accuracy, Validity, Reliability, Robustness and Resilience (AVR3)

[6] Behavioral profiles in sourcing and recruiting as part of a hiring process

[7] Infor announces acquisition of PeopleAnswers

[8] Leadership IQ

[9] Leadership IQ: Why new hires fail

[10] Big Five Personality Traits

[11] Development of the Chinese Personality Assessment Inventory

[12] Job hiring increasingly relies on personality tests, but that can bar people with disabilities

[13] Hiring a “will do” workforce: ADA challenges to personality tests

[14] Algorithm-driven Hiring Tools: Innovative Recruitment or Expedited Disability Discrimination?

[15] The use and misuse of personality tests for coaching and development

[16] EU AI Act document 52021PC0206

[17] ICO - what does the UK GDPR say about automated decision-making and profiling?

[18] ICO - what else do we need to consider if article 22 applies?

[19] Why New York City is cracking down on AI in hiring