I assume that you are running unit tests on your code.
One idea, which may not do exactly what you want, is to use a linear model.
The benefit is that you can create other variables and include them in the analysis.
Let's say that you have a vector $\mathbf{y}$ that contains the outcomes of your tests, and another vector $\mathbf{x}$ that contains your predictions of those outcomes.
Now you can simply fit the linear model
$$
y_i = a + bx_i + \epsilon_i
$$
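and look at the estimated slope $b$. A larger positive $b$ indicates that the outcomes track your predictions more closely, so an increasing $b$ suggests that your predictions are getting better. (If the predictions are probabilities and perfectly calibrated, you would expect $a \approx 0$ and $b \approx 1$.)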
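As a minimal sketch of that fit, the snippet below estimates $a$ and $b$ by ordinary least squares in plain Python. The helper name `fit_line` and the data are made up for illustration; I assume the predictions are probabilities that the tests pass and the outcomes are coded as 1 (passed) or 0 (failed).

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data: predicted probability of passing, and what happened.
predictions = [0.9, 0.8, 0.3, 0.6, 0.2, 0.7, 0.4, 0.95]
outcomes = [1, 1, 0, 1, 0, 0, 0, 1]

a, b = fit_line(predictions, outcomes)
print(f"intercept a = {a:.3f}, slope b = {b:.3f}")
```

The fit needs at least two distinct prediction values, otherwise the slope is undefined.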
What makes this approach nice is that you can add other variables to see whether they improve the model and help you make better predictions. One such variable could be an indicator for the day of the week, e.g. $m_i$, which is 1 if test $i$ was run on a Monday and 0 otherwise. Including it in the model gives:
$$
y_i = a + a_{\text{Monday}} m_i + bx_i + \epsilon_i
$$
If the coefficient $a_{\text{Monday}}$ is significant and positive, outcomes on Mondays are better than your predictions alone would suggest, which could mean that you are more conservative in your predictions on Mondays.
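Here is a rough sketch of the model with the Monday indicator, using NumPy's least-squares solver. The data and the variable names (`is_monday`, `a_monday`) are invented for illustration. Note that this only gives point estimates; to judge significance you need standard errors, which a statistics package such as statsmodels would report.

```python
import numpy as np

# Hypothetical data: predictions, outcomes (1 = passed, 0 = failed),
# and an indicator that is 1 when the test was run on a Monday.
predictions = np.array([0.9, 0.8, 0.3, 0.6, 0.2, 0.7, 0.4, 0.95])
outcomes = np.array([1, 1, 0, 1, 0, 0, 0, 1], dtype=float)
is_monday = np.array([1, 0, 0, 1, 0, 0, 1, 0], dtype=float)

# Design matrix with columns: intercept, Monday indicator, prediction.
X = np.column_stack([np.ones_like(predictions), is_monday, predictions])

# Least-squares estimates of a, a_Monday and b.
coef, *_ = np.linalg.lstsq(X, outcomes, rcond=None)
a, a_monday, b = coef
print(f"a = {a:.3f}, a_Monday = {a_monday:.3f}, b = {b:.3f}")
```

Further variables (difficulty score, time spent, and so on) would simply become additional columns of `X`.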
You could also create a variable that scores the difficulty of the task you performed. If you use version control, you could e.g. take the number of lines of code as a proxy for difficulty, on the idea that the more code you write, the more likely something will break.
Other variables could be the number of cups of coffee that day, or an indicator for an upcoming deadline, meaning there is more pressure to finish things.
You can also include a time variable to see whether your predictions are getting better, as well as how long you spent on the task, how many sessions you spent on it, or whether it was a quick fix that might be sloppy.
In the end you have a prediction model that you can use to estimate the probability of success. If you manage to build it, you may not even need to make your own predictions: you can just feed in all the variables and get a pretty good guess at whether things will work.
However, you only wanted a single number. In that case you can use the simple model from the beginning and take just the slope $b$ as your score. Redo the calculation for each period, and then check whether there is a trend in that score over time.
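A small sketch of that per-period score, in plain Python. The period labels, the records and the helper `slope` are all hypothetical. I also assume each period contains at least two distinct prediction values, otherwise the slope cannot be computed.

```python
from collections import defaultdict


def slope(x, y):
    """Least-squares slope b of y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical records: (period, prediction, outcome).
records = [
    ("2024-01", 0.9, 0), ("2024-01", 0.2, 0), ("2024-01", 0.8, 1), ("2024-01", 0.3, 1),
    ("2024-02", 0.9, 1), ("2024-02", 0.2, 0), ("2024-02", 0.7, 0), ("2024-02", 0.4, 1),
    ("2024-03", 0.9, 1), ("2024-03", 0.1, 0), ("2024-03", 0.8, 1), ("2024-03", 0.3, 0),
]

# Group the (prediction, outcome) pairs by period.
by_period = defaultdict(list)
for period, prediction, outcome in records:
    by_period[period].append((prediction, outcome))

# One score per period: the slope of outcome on prediction.
scores = {}
for period in sorted(by_period):
    xs, ys = zip(*by_period[period])
    scores[period] = slope(xs, ys)

for period, score in scores.items():
    print(f"{period}: b = {score:.3f}")

# Crude trend: slope of the score against the period index (0, 1, 2, ...).
trend = slope(range(len(scores)), list(scores.values()))
print(f"trend in b per period = {trend:.3f}")
```

A positive `trend` would suggest the score is improving from period to period, though with only a handful of periods it is a very noisy indicator.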
Hope this helps.