As you all know, if you are a regular reader of this blog, I have been investing on the Isepankur p2p lending service for over a year. So far, I’m doing pretty well – Isepankur consistently ranks me into the top 10% of investors by achieved ROI. But I have to admit that my strategy was just based on common sense (or call it gut feeling), some general p2p lending knowledge and experience won over time. Of course I obeyed fundamentals like diversification.
Now Isepankur is one of the first European p2p lending marketplaces that made available the raw loan data for everyone. You can download it here.
What does the data export contain?
The data export contains over 50 parameters for each loan that Isepankur orginated since February 2011. Isepankur says new datasets will be published monthly.
How do I analyse the data?
A sophisticated person – or a statistican – will rightly recommend to use multivariate statistics to most accurately get conclusions from analysing this loan data. I don’t have the tools or the expertise to do that, so I thought I just give it a try and look how far I get in Excel. By the way – this is going to be a rather long blog post, but I think you’ll find it worthwhile.
First I defined a population of loans (universe) I wanted to look at. I selected Estonian credit grade “1000” loans (thereby excluding other credit grades and Spanish and Finnish loans) to get a somewhat homogeneous loan population. Initially I looked at loans with the parameter ‘TwoMonthsFromFirstPayment’, in order to look only at loans that are old enough to default. Later I also excluded loans that originated after Sep. 1st, 2013.
That leaves me with a population of 1325 loans to analyse.
What I want to find out
I am trying to find factors in the loan application that indicate an above average probability that a loan will go into 60+ days overdue. While Isepankur actually still recovers large parts of the principal of loans that go into 60+ days overdue (see these useful charts), it would still be great if I as an investor could reduce the percentage of my investments that become 60+ days late. There is a parameter in the download named ‘InDebt60Day’. This is what I analysed. Note that the description says ‘This loan has at one moment been overdue for 60 days’, meaning it does include loans that are now current again, or even paid off. But if we want to reduce the risks of a loan ever going into 60+ days overdue this is the parameter we want to look at.
For 126 of the 1325 loans this parameter is set to ‘1’, meaning the average risk is 9.5%. What does that absolute number tell us? Nothing much yet, it is just a reference point I’ll use to show above average and below average risk loans.
Let’s start
Okay, I downloaded the data set into Excel and excluded all loans other than the population described above. Now I use the pivot table function of Excel to look at the data.
One easy finding is that gender influences the 60+ days risk (from now on I’ll just call it risk in short).
I marked the percentage for loans to men that has ‘InDebt60Day’=’1’ orange as it is considerably above average and the percentage for loans to woman green as it is considerably below average. Continue reading