Organisation Diversity check in Python with DeepFace
Machine learning is a double-edged sword
There has been a lot of controversy about how bias in machine learning reproduces social inequalities and how this might affect outcomes for minority groups (see also the article “AI is sending people to jail”). To be blunt, I don’t agree. Looking past the headlines that try to stimulate the emotional part of our brain shows that machine learning is a double-edged sword. This post demonstrates the other edge of that sword. I describe a short experiment in which I try to show how machine learning can be used to identify, and possibly also counteract, a lack of diversity within certain organisations.
The idea came to mind during a search for a covid pocketbook, when I ran into a medical book publishing firm. The company was started a couple of years ago by two medical students who have since become successful publishers and medical doctors. To demonstrate the medical community’s involvement in creating content for their books, they posted a collection of pictures, with names, of medical professionals who are engaged in medical education and contributed to the pocketbooks. Looking at this contributors page I had the same shock as the first time I attended university: there were almost no people of other ethnicities. I had expected this imbalance to have improved in the years since I completed my studies more than half a decade ago, but that assumption was wrong. It looked like little had changed.
Nowadays, automation, data science and machine learning with Python give us the opportunity to look at data on a large scale with relatively little effort. They might also be used to assess and improve the level of diversity within organisations. I therefore posed the following question:
Can machine learning be used to evaluate and monitor diversity within an organisation?
1. Choosing the first target
The website discussed above looked like a good first target for a diversity check.
2. Scraping like a crazy man
I started by scraping the links to all the subpages of the homepage. I explained this scraping algorithm in detail in this tutorial. The subpage links were saved in a JSON file.
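As a rough illustration of this step, here is a minimal sketch that uses only the standard library. It is not the algorithm from the tutorial: `LinkCollector`, `collect_subpage_links`, `save_links` and the file name `links.json` are hypothetical names chosen for this example, and the sketch assumes the homepage HTML has already been fetched as a string.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def collect_subpage_links(html, base_url):
    """Return the sorted, de-duplicated links that stay on base_url's domain."""
    parser = LinkCollector()
    parser.feed(html)
    domain = urlparse(base_url).netloc
    # Resolve relative links against the homepage URL.
    links = {urljoin(base_url, href) for href in parser.hrefs}
    return sorted(link for link in links if urlparse(link).netloc == domain)


def save_links(links, path="links.json"):
    """Save the list of links in a JSON file (hypothetical file name)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(links, handle, indent=2)
```

Filtering on the domain keeps the crawl on the target website and drops links to external pages and `mailto:` addresses.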
3. Browse through web pages and download all images
We open the JSON file in which the links are stored, create a function for downloading the images on a web page, and call that function in a loop that goes through all the links from the JSON file. We use regex to search for jpg, jpeg, gif and png images.
Image download links can be reached in different ways. In the final code, I looked for them via “img”, “srcset” and “avatar”. Within the loop, “rand_sleep_int” creates a random time interval between download requests to avoid overloading the server.
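The two building blocks of this step could look roughly like the sketch below. The name `rand_sleep_int` comes from the post, but its body here is my own guess at what it does, and `IMAGE_URL` and `find_image_urls` are hypothetical names. The regex only finds absolute image URLs, wherever they appear in the page source, so relative links would need extra handling.

```python
import random
import re
import time

# Matches absolute URLs that end in one of the four image extensions.
IMAGE_URL = re.compile(r"https?://[^\s\"'<>,]+?\.(?:jpe?g|gif|png)\b", re.IGNORECASE)


def find_image_urls(html):
    """Return the unique image URLs in html, in order of first appearance."""
    seen = []
    for url in IMAGE_URL.findall(html):
        if url not in seen:
            seen.append(url)
    return seen


def rand_sleep_int(low=1, high=5, sleep=time.sleep):
    """Pause for a random whole number of seconds between low and high."""
    seconds = random.randint(low, high)
    sleep(seconds)
    return seconds
```

In the download loop, one would call `rand_sleep_int()` after every request so that the requests are spread out over time.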
4. Filter images in which the number of faces is equal to one
Use an if statement to check if the length of the list of face locations is equal to one:
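A minimal sketch of this filter is shown below. It assumes the face locations come from the `face_recognition` package, which the wording "list of face locations" suggests but the text does not confirm. The helper names `has_single_face`, `locate_faces` and `filter_single_face_images` are hypothetical.

```python
def has_single_face(face_locations):
    """True when exactly one face was located in the image."""
    return len(face_locations) == 1


def locate_faces(image_path):
    """Locate faces with the face_recognition package (assumed detector)."""
    import face_recognition  # third-party; imported lazily

    image = face_recognition.load_image_file(image_path)
    return face_recognition.face_locations(image)


def filter_single_face_images(image_paths, locate=locate_faces):
    """Keep only the images in which exactly one face is found."""
    kept = []
    for path in image_paths:
        if has_single_face(locate(path)):
            kept.append(path)
    return kept
```

Images without a face (logos, icons) and group pictures are both dropped, which leaves mostly profile pictures.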
5. Give an identity number to every face
We want to avoid letting our machine learning algorithm assess the same face multiple times, so we give each face an identification number; if the same face is encountered again, it gets the same identification number. The identification numbers are saved in a dictionary together with the corresponding image file names, as keys and values. In this way we can prevent the same person from being counted more than once:
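One way to sketch this bookkeeping is shown below. It assumes every face has already been turned into a numeric encoding (for example the 128-number vectors returned by `face_recognition.face_encodings`) and treats two faces as the same person when their Euclidean distance is at most 0.6, which mirrors that package's default tolerance. All function names are hypothetical, and the direction of the dictionary (file name to number) is my own choice.

```python
import math


def assign_identity(encoding, known_encodings, tolerance=0.6):
    """Return the ID of the first known face within tolerance, else a new ID."""
    for identity, known in enumerate(known_encodings):
        if math.dist(encoding, known) <= tolerance:
            return identity
    # No match: register the face and hand out the next free number.
    known_encodings.append(encoding)
    return len(known_encodings) - 1


def number_faces(encodings_by_file, tolerance=0.6):
    """Map each image file name to an identity number."""
    known_encodings = []
    identities = {}
    for file_name, encoding in encodings_by_file.items():
        identities[file_name] = assign_identity(encoding, known_encodings, tolerance)
    return identities


def unique_people(identities):
    """Keep the first file name seen for every identity number."""
    first_file = {}
    for file_name, identity in identities.items():
        first_file.setdefault(identity, file_name)
    return first_file
```

Only the files returned by `unique_people` would then be passed on to the ethnicity check, so that each person is counted once.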
6. The actual machine learning ethnicity check work
As Andrew Ng said in his interview with Lex Fridman about ML:
“In a software system the machine learning model is maybe five percent
or even fewer relative to the entire software system”
The guru’s expression is reflected in the second line of this code block (together with one line in step 6 it’s the only line until now that involves ML!):
After doing the ML work we save the data in a CSV file for further analysis as shown in the last line.
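The sketch below shows how small the ML part can be. It assumes the `deepface` package and its `DeepFace.analyze` function with the `race` and `gender` actions. The shape of the result differs between DeepFace versions (a dict in some, a list of dicts in others), so `to_row` accepts both. The column names follow the pandas output in this post, while the helper names and the file name `faces.csv` are hypothetical.

```python
import csv


def analyse_face(image_path):
    """Run DeepFace on one image (third-party; imported lazily)."""
    from deepface import DeepFace

    # This single call is the machine learning part of the whole script.
    return DeepFace.analyze(img_path=image_path, actions=["race", "gender"])


def to_row(file_name, analysis):
    """Flatten one DeepFace result into a row for the CSV file."""
    # Some DeepFace versions return a list with one dict per detected face.
    if isinstance(analysis, list):
        analysis = analysis[0]
    # The gender key differs between versions, so try both spellings.
    gender = analysis.get("dominant_gender", analysis.get("gender"))
    return {"File": file_name, "Ethnicity": analysis["dominant_race"], "Gender": gender}


def save_rows(rows, path="faces.csv"):
    """Save the rows in a CSV file for further analysis."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["File", "Ethnicity", "Gender"])
        writer.writeheader()
        writer.writerows(rows)
```

A typical run would build the rows with `to_row(name, analyse_face(name))` for every de-duplicated image and finish with `save_rows(rows)`.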
7. First results
Okeeeey, time for some results. We count ethnicity with pandas:
latino hispanic 3
middle eastern 2
Name: Ethnicity, dtype: int64
We find 114 faces with an identified ethnicity, of which 102 are white according to the DeepFace module, which is around 89%. This is an organisation in Amsterdam, a city where around 51% of the people have a migration background…
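The counting itself is simple arithmetic. The post does it with pandas; the sketch below does the equivalent with the standard library so that it runs without extra packages. `ethnicity_shares` is a hypothetical helper, and only the figures 102 and 114 are taken from the results above.

```python
from collections import Counter


def ethnicity_shares(labels):
    """Return {label: (count, percentage)} ordered from most to least common."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {
        label: (count, round(100 * count / total, 1))
        for label, count in counts.most_common()
    }


# The share reported in the text: 102 white faces out of 114.
white_share = round(100 * 102 / 114, 1)
```

Here `white_share` evaluates to 89.5, which is the "around 89%" mentioned above.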
8. Accuracy of ethnicity
Now let’s check what happens if we add a semi-random sample of 10 pictures of black people: 7 black female models, my paranymphs/friends during my PhD (bottom right) and me.
7 black females and two black males used in the sanity check. One of them seemingly eating some good Surinamese food :-).
latino hispanic 4
middle eastern 2
Name: Ethnicity, dtype: int64
Gender
Name: Gender, dtype: int64
We find in total 123 faces with an identified ethnicity:
- 7 new black people: 3 are black men (me and my friends), 4 are black women (models).
- 1 new Latino Hispanic woman.
- 1 new Asian woman.
- The face of one woman could not be detected.
The confusion about the ethnicity of two women is, in my opinion, understandable considering the mixed genetic makeup of many individuals who are classified as “black”. What DeepFace does NOT seem to be confused about in this case is the difference between white and non-white.
9. Assumptions for a feasible organisation diversity checker
The next step is to scale up and test the algorithm on a collection of organisations. We make a couple of potentially wrong assumptions to make this project feasible:
- A potential target industry should have a work culture wherein the presentation of profile pictures of their employees on a company website is considered a good habit.
- The organisations in the collection display only profile pictures of their employees. There could be profile pictures of other people sprinkled across the website, but we assume these do not make up a relevant part of the total number of profile pictures.
- The employees of these organisations have only one profile picture on the website. If there is more than one profile picture of the same employee, we assume this will not be relevant, as it will be either (a) an incidental finding or (b) a systemic abnormality. In case (a) it will not make much of a difference. In case (b) it makes no difference because the pictures of all people will appear in the same number of multiples.
10. Identifying an industry for analysis
We should also target an industry in which we think performance will be optimal if its level of diversity reflects the diversity within society. The first thing that came to my mind was law firms. They tend to display a list of pictures of friendly, smiling employees who offer their legal support. I would also expect it to be beneficial for society if not only the defendants came from a diverse background but also the other people in court, such as their respective lawyers.
11. Performing the analysis
Let’s take a look at what happens when we process a list of law firms in Amsterdam, the same area where the book publishing organisation is located. For convenience, we call them Law Firm 1 to 5 in a JSON file.
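Such a target file could be set up roughly as in the sketch below. The URLs are placeholders on `example.com`, not the real firms, and `save_targets`, `load_targets`, `check_all` and the file name `law_firms.json` are hypothetical names for this illustration.

```python
import json

# Placeholder names and URLs: the real firms are deliberately anonymised.
law_firms = {f"Law Firm {n}": f"https://example.com/law-firm-{n}" for n in range(1, 6)}


def save_targets(targets, path="law_firms.json"):
    """Store the name-to-URL mapping in a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(targets, handle, indent=2)


def load_targets(path="law_firms.json"):
    """Read the name-to-URL mapping back from the JSON file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def check_all(targets, check):
    """Apply a diversity check function to every target and collect the results."""
    return {name: check(url) for name, url in targets.items()}
```

The `check` argument stands for the whole pipeline of the earlier steps (scrape, download, filter, de-duplicate, analyse), wrapped in one function that takes a homepage URL.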
1. DeepFace machine learning performance
2. Manual check
When manually checking the result we get the following table:
A Python script was created to evaluate diversity within organisations based on lists of online profile pictures. The first check showed that the algorithm correctly identifies 9 of the 10 manually added profile pictures of black people as non-white. It was, however, more confused about the exact origin of the persons, which is quite understandable considering the mixed origin of many individuals whom we call “black” in the vernacular.
The second check showed that the script overclassifies white individuals into the Asian, Latino and Middle Eastern ethnicity classes, but not into the black ethnicity class. So the algorithm tends to overestimate non-black diversity. I haven’t delved into the potential causes of these discrepancies, but I guess it has something to do with distinguishing white from black being an easier task than distinguishing white from other ethnicities.
So in conclusion, the current algorithm seems to perform well in identifying black individuals as non-white but overclassifies other ethnicities. Nonetheless, we might be able to improve the configuration, create a better training set, take names into account or sharpen the scraping function. By doing so we could get a wider and better delineated picture of diversity within different organisations and monitor progress towards more diversity. A public website that incorporates such an algorithm could help citizens, journals and political institutes assess the diversity within organisations. It could thereby act as a stimulus for more diversity at high positions within their hierarchies, potentially making them more receptive to the needs of a multicultural society.
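A comparison like this can be made explicit by counting how often each pair of manual and predicted labels occurs. The sketch below is one way to do that. The function names are hypothetical, and the labels in the usage note are toy values, not the results of this experiment.

```python
from collections import Counter


def confusion_counts(manual_labels, predicted_labels):
    """Count how often each (manual, predicted) label pair occurs."""
    return Counter(zip(manual_labels, predicted_labels))


def overclassified(manual_labels, predicted_labels, true_label="white"):
    """Count the other classes predicted for faces manually labelled true_label."""
    pairs = confusion_counts(manual_labels, predicted_labels)
    return {
        predicted: count
        for (manual, predicted), count in pairs.items()
        if manual == true_label and predicted != true_label
    }
```

For toy input with manual labels `["white", "white", "black"]` and predictions `["white", "asian", "black"]`, `overclassified` reports one white face that was assigned to the Asian class.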
The full code can be found on my GitHub page. Feel free to send me a message or leave a comment if you have any suggestions for improvement.