Bias In Machine Learning

Hardt focuses mainly on how machine learning is at the cusp of taking over much of the decision making that humans currently do. He uses this to ground the importance of his argument and to explain why it has the potential for such far-reaching impacts. He begins by stating that algorithms are by no means inherently fair: a model will pick up any biases that are built into the underlying dataset. He makes a point of saying that even if no feature in the set specifically encodes a protected attribute (such as age or race), that attribute can be redundantly encoded in the other features. He also notes that a model is better at predicting things it has more data on, which means it is worse at predicting outcomes for minorities (differences in classification accuracy). He raises the possibility of using a separate classifier for minority groups, but that would require acting on a protected attribute. He brings up the idea of combining two linear models, one for the minority group and one for the majority, but notes that this is at the limit of what is possible today. Finally, he discusses what a 5% error actually means: because there are so few minority data points, the model could be far worse than 95% accurate on that group (say 50% accurate) without hugely impacting the overall accuracy of the model.
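To make that last point concrete, here is a minimal sketch of the arithmetic. The group sizes and per-group accuracies are invented purely for illustration; the point is only that a small group's poor accuracy barely moves the headline number.

```python
# Illustrative only: how a ~95% overall accuracy can hide coin-flip
# accuracy on a small minority group. All numbers are made up.

majority_n, minority_n = 9500, 500        # minority is 5% of the dataset
majority_acc, minority_acc = 0.97, 0.50   # hypothetical per-group accuracy

overall_acc = (majority_n * majority_acc + minority_n * minority_acc) / (
    majority_n + minority_n
)
print(f"Overall accuracy:  {overall_acc:.3f}")   # ~0.947, still looks like "95%"
print(f"Minority accuracy: {minority_acc:.3f}")  # 0.500, no better than chance
```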

Barocas & Selbst focused on very similar ideas as Hardt but go further to explain with words and examples, the topics that they are speaking about. One of the major differences is that in this paper there is a larger focus on how humans must translate the problem into target variables (what is a good employee?) that the computer can understand. They speak to how there is potential bias in the way that is translated. They go on to talk about how if the dataset is incomplete or biased then the model will also be biased, which is the same point as Hardt’s that the algorithm will pick up any bias that is in the underlying dataset. They also bring up the idea of that non-protected features can be connected to other protected features (act as proxies). They speak specifically to the bias in the selection of the features whereas Hardt focused on the point after the features were selected. Both papers speak to the idea that groups that represent small sets of data tend to have worse predictions coming out of the model. This paper brings up the idea that minorities may get overrepresented through their example that managers may devote disproportionate amounts of time to monitoring the behavior of minority groups. They bring up a new type of discrimination that is not spoken to by Hardt and that is masking, the idea that people can blame the algorithm for the bias when they intentionally injected the bias and then blamed the model.

The White House report speaks about the inadvertent bias that may be in these algorithms and how they can perpetuate existing biases. It covers both data that is poorly selected in ways that may disadvantage minorities and data that is incomplete and may disadvantage the people the missing data was about, and it adds the new topic of correlation being mistaken for causation. The report also goes a bit deeper into the algorithms themselves, discussing how they may restrict the flow of information to certain groups. Overall the report is much more focused on outcomes than on the technical inputs that the other two sources emphasize, which makes sense given that it is a governmental report. The White House report also talks about the lofty goal of using machine learning to remove bias from these processes, which the other papers do not cover.

There are many examples in the media of bias turning up in machine learning systems. Below is a short summary of the biases found and how they relate to the categories listed above.

In the White House report, Georgia State University is presented as a positive example of Big Data being used to make decisions about people. There is more to this story than a perfect, golden algorithm. The improvements in graduation rate are real and substantial, but I found that the rate was only in the area of 30% to begin with, so a 10-point improvement is great while still leaving plenty of room to go. My reading also found that much of the improvement came from correcting incorrect class registrations, which I would argue is below the caliber of machine learning and could be done with a very simple program: they steer students who are not taking the classes required for their majors back on track. That seems like a band-aid on the underlying problem of a lack of true advising. It is worth noting that I come from Bucknell, which has a very different student-to-faculty ratio than Georgia State, so I look at this situation from a different perspective. Also, it was not the machine learning that fixed the problem; it was the advisors who met with the students who had made mistakes during registration. In parallel, Georgia State more than doubled the number of advisors for students, and this likely played a large role in the change in graduation rates as well.

Machine learning is not just a way of introducing bias into processes; it has also been found to help remove bias in specific cases. There is presently a movement to remove bias from the hiring process, where the bias is likely unconscious on the part of the HR reviewer. AI would allow the process to be audited and stopped if certain standards are not met, and it would allow all candidates to be reviewed, compared to the current process in which only a very small portion of candidates are truly reviewed (1). A TechCrunch article cautions against assuming ML or AI is always correct, since that assumption allows bias in the algorithms to be perpetuated. The article also speaks to the importance of choosing the right learning algorithm for a given situation, using a representative dataset, and monitoring the outcomes of the model. There is a tendency to let models run free (unchecked), which is dangerous because a model's bias does not get better if it is left unexamined (2). There is also the suggestion that you should not use a model outside the realm where its data was acquired, or it can become inaccurate, much like extrapolating a straight-line fit beyond its data (3). The McKinsey paper suggests that machine learning could help reduce bias because models learn to consider only information that improves prediction accuracy, which could keep data with no bearing on the target variable from being considered (4).
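As a small illustration of that extrapolation caveat, here is a sketch in which a straight line is fit to data from a curved relationship. The quadratic "truth" and the ranges are assumptions chosen only to show how predictions degrade outside the training region.

```python
# Illustrative only: a linear fit that looks acceptable inside the training
# range can be badly wrong when extrapolated beyond it.
import numpy as np

rng = np.random.default_rng(0)
x_train = rng.uniform(0, 10, 200)
y_train = x_train**2 + rng.normal(0, 5, 200)   # true relationship is curved

slope, intercept = np.polyfit(x_train, y_train, 1)  # fit a straight line

for x in (5.0, 20.0):  # inside the data vs. far outside it
    predicted = slope * x + intercept
    print(f"x = {x:>4}: predicted {predicted:7.1f}, actual {x**2:7.1f}")
```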
