Ok but: doesn't bias, in the negative way lay people typically use that word in relation to algorithms (racial bias, capitalist mercenary bias), happen on top of / underneath / alongside (not sure which is most accurate) the "pruning by popularity" mode of bias that you're talking about in this post? For example, when fresh new baby algorithms get "trained" at the start of their lives, before they themselves interact with any mass data, they are fed training sets that have been selected just for them by their makers, who have visions of what they can and should do in the world. While those training sets may have emerged out of "pruning by popularity," they are chosen carefully, no?
And perhaps they are also fed models of likely use (or desired use) that are more intentionally crafted ...? Doesn't the negative form of bias slip in there?
It really depends on the application. The current rule of thumb is that you need about 10 million correctly labeled training examples to train up a deep learning AI. There's only so much care one can put into selecting a set that large. The mass data is there from the start--and when networks recondition themselves over time (as most do) in response to user feedback, even the initial conditions start to dissipate.
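To make that dissipation concrete, here's a toy sketch (pure NumPy, fabricated data, not modeled on any particular system): a classifier is first fit on a curated set where the label depends on one feature, then updated online from a feedback stream where users' labels depend on a different feature. Its accuracy on the original curated task drops as the feedback accumulates.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_step(w, x, y, lr=0.1):
    # One logistic-regression gradient step on a single (x, y) pair.
    return w - lr * (sigmoid(x @ w) - y) * x

def accuracy(w, X, y):
    return float(np.mean((sigmoid(X @ w) > 0.5) == y))

# Curated initial training set: the label depends only on the first feature.
X0 = rng.normal(size=(1000, 2))
y0 = X0[:, 0] > 0

w = np.zeros(2)
for x, y in zip(X0, y0):
    w = sgd_step(w, x, float(y))
print("accuracy on curated task after curation:", accuracy(w, X0, y0))

# Feedback stream: users' labels depend on the second feature instead.
for _ in range(20000):
    x = rng.normal(size=2)
    y = float(x[1] > 0)
    w = sgd_step(w, x, y)
print("accuracy on curated task after feedback:", accuracy(w, X0, y0))
```

After enough feedback updates, the model reflects whatever the users fed it rather than whatever the makers curated.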
Screening the training data ahead of time is a chicken-and-egg problem. You can try to adjust for particularly egregious biases on a case-by-case basis, but there is no way to correct (or even identify) the entire set of biases one wishes to eliminate.
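For a sense of what a case-by-case fix looks like, here's a minimal sketch of standard inverse-frequency reweighting (the group labels are hypothetical): it only corrects for a bias someone has already identified and annotated.

```python
from collections import Counter

def inverse_frequency_weights(group_labels):
    """Weight each example by the inverse of its group's frequency."""
    counts = Counter(group_labels)
    n_groups = len(counts)
    total = len(group_labels)
    return [total / (n_groups * counts[g]) for g in group_labels]

# Toy example: group "a" is over-represented 4:1 in the training set.
groups = ["a"] * 800 + ["b"] * 200
weights = inverse_frequency_weights(groups)
print(weights[0], weights[-1])  # "a" examples get 0.625, "b" examples get 2.5
```

The fix is only as good as the group-label column, and that column exists only for the biases someone already thought to measure; everything unmeasured passes straight through.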
Take MNIST, the 70,000-image database of handwritten digits used to train zip-code readers. There probably aren't many nefarious biases here, but you're still favoring certain ways of writing each digit and giving them massively increased normative force across the board. Depending on the data, AIs will have an easier time with crossed or uncrossed 7s and closed or open 4s. How could you eliminate a favorable bias toward certain forms of handwritten digits? You can't equalize the AI's error rates across the different forms, because again, it's a chicken-and-egg problem: how do you know which form a digit is if you can't guarantee the lack of bias in whatever is doing the sorting?
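In code, the dead end looks like this (MNIST carries no crossed-7 or open-4 annotations, so the form labels below are hypothetical, something you would have to produce yourself, which is the whole problem):

```python
import numpy as np

def error_rate_by_form(y_true, y_pred, form_labels):
    """Per-form error rates -- computable only if someone already labeled the forms."""
    rates = {}
    for form in set(form_labels):
        mask = np.array([f == form for f in form_labels])
        rates[form] = float(np.mean(y_true[mask] != y_pred[mask]))
    return rates

# Fabricated predictions and form labels, just to show the shape of the check:
y_true = np.array([7, 7, 7, 7])
y_pred = np.array([7, 1, 7, 7])
forms = ["crossed_7", "crossed_7", "plain_7", "plain_7"]
print(error_rate_by_form(y_true, y_pred, forms))  # crossed_7: 0.5, plain_7: 0.0
```

To equalize those rates for real you'd need form labels for every image, and any model you train to generate them inherits the same distributional bias you set out to measure.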
Another way to put it is that what we think of as negative bias is not sufficiently well-defined apart from the overall biases being learned--and attempting to define it separately only relocates the problem to whoever (or whatever) is doing the fixing. "Ethical AI" often serves as a fig leaf for ignoring the larger problem.