Decision Trees – Using The Available Data to Identify Lending Opportunities on Bondora – Part 2

This is part 2 of a guest post by British Bondora investor ‘ParisinGOC’.

Read part 1 first.

Data Mining the Bondora data.

The initial process.

To help understand the specific data cleansing that the Bondora Data Set needed, I first made use of the RapidMiner metadata view – a summary of all the attributes presented to the software. It shows the Attribute name, type, statistics (dependent on type; these include the least and most frequently occurring values, the modal value and the average value), Range (min, max, and the quantity of each value for polynominal and text attributes) and, most critically, “Missings” and “Role”.

“Role” is the name given by RapidMiner to the special attributes that are needed to allow certain operations. In my case, the Decision Tree module needed to know which Attribute was the “Target”, that is, the attribute that is the focus of the analysis and to which the Decision Tree has to relate the other attributes in its processing. My “Target” was the “Default” attribute – a “Binominal” attribute (RapidMiner’s term for an attribute with just 2 values) – 1 if the loan had defaulted, 0 if not.

“Missings” is easy – this is the number of times this attribute has no valid value. For example, my import of the raw Bondora input data has 150 attributes.  Only half of these attributes have no missing values.  The remainder have between 13 and 19132 rows with missing values from a data set of 20767 rows.
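The “Missings” count can also be reproduced outside RapidMiner. The following is a minimal sketch in plain Python, assuming a CSV export; the sample rows, the column names and the list of markers treated as “missing” are all made up for illustration and are not taken from the Bondora file.

```python
import csv
import io

# Hypothetical miniature export; the real data set has far more attributes.
SAMPLE_CSV = """LoanId,Country,Age,Default
1,EE,34,0
2,FI,,1
3,,41,0
4,ES,,0
"""

def count_missings(rows, empty_markers=("", "NA", "NULL")):
    """Return {attribute: number of rows with no valid value}."""
    counts = {}
    for row in rows:
        for name, value in row.items():
            # DictReader yields None for fields absent from a short row.
            is_missing = value is None or value.strip() in empty_markers
            counts[name] = counts.get(name, 0) + (1 if is_missing else 0)
    return counts

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
missings = count_missings(rows)

# Attributes that are safe to use without any cleansing.
complete = [name for name, n in missings.items() if n == 0]
```

In this toy sample, `missings` reports one missing `Country` and two missing `Age` values, and `complete` lists only the attributes with no gaps.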

To know whether these “missings” would impact my analysis, I needed to get to know the data in more detail.

I knew that Bondora had started to offer loans in Finland in summer 2013 with Spain following in October of that year and Slovakia in the first half of 2014.

I therefore decided not to bother with any loan issued prior to 2013.

Decision Trees – Using The Available Data to Identify Lending Opportunities on Bondora – Part 1

This is a guest post by British Bondora investor ‘ParisinGOC’.

Introduction

Financial institutions across the world have many ways of assessing whether a loan is worth making. A simple search on the web reveals that many use Data Mining. More specifically, “Decision Trees” are a Data Mining tool that has been analysed for this purpose, and I quickly found at least 2 papers (Mining Interesting Rules in Bank Loans Data and Assessing Loan Risks: A Data Mining Case Study) amongst many pointing in this direction.

Having had some experience of Data Mining in a financial environment, I believed I could use these same techniques in my own P2P lending which, after over 12 months’ activity, I felt could be improved.

In this document, I explore the use of the freely available Data Mining Software “RapidMiner” and its Decision Tree capabilities when applied to the data available to investors from Bondora, a peer-to-peer (P2P) lending site.

Bondora

Bondora is a P2P lending site based in Estonia that “unites investors and borrowers from all corners of the world”, allowing investors to invest funds to satisfy advertised borrowing needs.

Fundamentally, Bondora also provides comprehensive data to investors, allowing detailed data downloads of the individual loans held by the investor, as well as data on every application made to Bondora (originally known as Isepankur) since the first application on 21st February, 2009.

It is the complete Bondora data set that I have used as the raw data for analysis as it is the best data available to find out which potential borrowers are the right match to the potential lenders.  Only if enough lenders feel that a loan application is worth investing in will the loan be fulfilled.  Self-selection is taking place in both elements of the loan fulfilment and this data is the result of that interaction.

Also shown in this data are some elements of loan performance post-drawdown.  Crucially, it shows those loans that subsequently defaulted (failed to make any payments for a period in excess of 60 days).  Although Bondora will chase the debt on behalf of the investor and have a track record of some success, there is no guarantee that the investment, or any part of it, will be returned.

Decision Trees

www.investopedia.com/terms/d/decision-tree.asp defines a decision tree as: “A schematic tree-shaped diagram used to determine a course of action or show a statistical probability.”

In this case, I am using the data provided by Bondora on all its previous applications to reveal how the resulting loans that share similar characteristics have performed.

Specifically, I am using this data to show the percentage of those previous loans that have defaulted and using this to indicate how a similar, new application may perform should the application succeed in attracting enough investors.
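To make the idea concrete, here is a minimal sketch of the first step a decision tree takes: pick the attribute that best separates defaulted from non-defaulted loans, then report the share of defaults in each branch. The loans and attribute names are made up, and the Gini impurity criterion is my own choice for illustration; it is not necessarily the criterion RapidMiner applies.

```python
from collections import defaultdict

# Made-up loans for illustration only; attribute names are hypothetical.
loans = [
    {"country": "EE", "verified": "yes", "default": 0},
    {"country": "EE", "verified": "yes", "default": 0},
    {"country": "EE", "verified": "no", "default": 1},
    {"country": "ES", "verified": "no", "default": 1},
    {"country": "ES", "verified": "yes", "default": 0},
    {"country": "FI", "verified": "no", "default": 1},
    {"country": "FI", "verified": "yes", "default": 0},
    {"country": "FI", "verified": "yes", "default": 1},
]

def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means a pure group)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def split_by(rows, attribute, target="default"):
    """Group the target labels by each value of the given attribute."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    return groups

def weighted_gini(rows, attribute):
    """Impurity left after splitting on the attribute, weighted by group size."""
    groups = split_by(rows, attribute)
    return sum(len(g) / len(rows) * gini(g) for g in groups.values())

def best_split(rows, attributes):
    """Attribute whose split leaves the least impurity."""
    return min(attributes, key=lambda a: weighted_gini(rows, a))

root = best_split(loans, ["country", "verified"])

# Share of past loans that defaulted in each branch of the chosen split.
default_share = {
    value: sum(labels) / len(labels)
    for value, labels in split_by(loans, root).items()
}
```

A full tree repeats this step inside each branch until the groups are pure or too small to split further; the default share in the final group a new application falls into is the indication of how it may perform.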

In other words, I am using past performance data to show how future investments may perform – I feel sure I have seen this phrase somewhere before!

How To Filter Isepankur Loans to Reduce Risks and Achieve Higher ROIs

As you all know, if you are a regular reader of this blog, I have been investing on the Isepankur p2p lending service for over a year. So far, I’m doing pretty well – Isepankur consistently ranks me in the top 10% of investors by achieved ROI. But I have to admit that my strategy was just based on common sense (or call it gut feeling), some general p2p lending knowledge and experience gained over time. Of course I obeyed fundamentals like diversification.

Now Isepankur is one of the first European p2p lending marketplaces to make the raw loan data available to everyone. You can download it here.

What does the data export contain?

The data export contains over 50 parameters for each loan that Isepankur originated since February 2011. Isepankur says new datasets will be published monthly.

How do I analyse the data?

A sophisticated person – or a statistician – will rightly recommend using multivariate statistics to draw the most accurate conclusions from this loan data. I don’t have the tools or the expertise to do that, so I thought I would just give it a try and see how far I get in Excel. By the way – this is going to be a rather long blog post, but I think you’ll find it worthwhile.

First I defined a population of loans (universe) I wanted to look at. I selected Estonian credit grade “1000” loans (thereby excluding other credit grades and Spanish and Finnish loans) to get a somewhat homogeneous loan population. Initially I looked at loans with the parameter ‘TwoMonthsFromFirstPayment’, in order to look only at loans that are old enough to default. Later I also excluded loans that originated after Sep. 1st, 2013.
That leaves me with a population of 1325 loans to analyse.
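For readers who prefer a script to Excel filters, the population definition above might be sketched as follows. Only ‘TwoMonthsFromFirstPayment’ is a parameter name taken from the export; `Country`, `CreditGroup` and `LoanDate` are hypothetical column names, and the four rows are invented purely to show the filter at work.

```python
from datetime import date

# Loans originated after this date are excluded.
CUTOFF = date(2013, 9, 1)

def in_population(loan):
    """True if the loan belongs to the universe described above."""
    return (
        loan["Country"] == "EE"                    # Estonian loans only
        and loan["CreditGroup"] == "1000"          # credit grade "1000"
        and loan["TwoMonthsFromFirstPayment"] == 1 # old enough to default
        and loan["LoanDate"] <= CUTOFF
    )

# Invented rows; column names other than TwoMonthsFromFirstPayment are assumptions.
sample = [
    {"Id": 1, "Country": "EE", "CreditGroup": "1000",
     "TwoMonthsFromFirstPayment": 1, "LoanDate": date(2013, 5, 2)},
    {"Id": 2, "Country": "ES", "CreditGroup": "1000",
     "TwoMonthsFromFirstPayment": 1, "LoanDate": date(2013, 5, 2)},
    {"Id": 3, "Country": "EE", "CreditGroup": "900",
     "TwoMonthsFromFirstPayment": 1, "LoanDate": date(2013, 5, 2)},
    {"Id": 4, "Country": "EE", "CreditGroup": "1000",
     "TwoMonthsFromFirstPayment": 1, "LoanDate": date(2013, 10, 15)},
]

population = [loan for loan in sample if in_population(loan)]
population_ids = [loan["Id"] for loan in population]
```

Only the first invented row passes every condition; the others fail on country, credit grade and origination date respectively.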

What I want to find out

I am trying to find factors in the loan application that indicate an above average probability that a loan will go into 60+ days overdue. While Isepankur actually still recovers large parts of the principal of loans that go into 60+ days overdue (see these useful charts), it would still be great if I as an investor could reduce the percentage of my investments that become 60+ days late. There is a parameter in the download named ‘InDebt60Day’. This is what I analysed. Note that the description says ‘This loan has at one moment been overdue for 60 days’, meaning it does include loans that are now current again, or even paid off. But if we want to reduce the risks of a loan ever going into 60+ days overdue this is the parameter we want to look at.
For 126 of the 1325 loans this parameter is set to ‘1’, meaning the average risk is 9.5%. What does that absolute number tell us? Nothing much yet, it is just a reference point I’ll use to show above average and below average risk loans.
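The reference point is simple arithmetic, sketched here in Python. The `classify` helper and its 2 percentage point tolerance are my own assumptions for deciding what counts as “considerably” above or below average; the post itself does not state a threshold.

```python
def risk_rate(flagged, total):
    """Share of loans that were ever 60+ days overdue."""
    return flagged / total

# 126 of the 1325 loans have InDebt60Day set to 1.
baseline = risk_rate(126, 1325)
baseline_percent = round(baseline * 100, 1)  # 9.5

def classify(rate, reference, tolerance=0.02):
    """Label a segment relative to the reference rate (tolerance is an assumption)."""
    if rate > reference + tolerance:
        return "above average"
    if rate < reference - tolerance:
        return "below average"
    return "average"
```

For example, a segment in which 15% of loans went 60+ days overdue would be labelled “above average” against this baseline, while one at 5% would be “below average”.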

Let’s start

Okay, I downloaded the data set into Excel and excluded all loans other than the population described above. Now I use the pivot table function of Excel to look at the data.
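What the pivot table does here can be sketched in a few lines of Python: for each value of one column, count the loans, count those with ‘InDebt60Day’ set to 1, and divide. The `Segment` column and the eight rows below are invented for illustration and say nothing about the real data.

```python
from collections import Counter

def pivot_rate(rows, column, flag="InDebt60Day"):
    """Return {value: (flagged, total, rate)} for each value of the column."""
    totals, flagged = Counter(), Counter()
    for row in rows:
        totals[row[column]] += 1
        flagged[row[column]] += row[flag]
    return {
        value: (flagged[value], totals[value], flagged[value] / totals[value])
        for value in totals
    }

# Invented rows; "Segment" is a hypothetical stand-in for any export column.
sample = [
    {"Segment": "A", "InDebt60Day": 0},
    {"Segment": "A", "InDebt60Day": 0},
    {"Segment": "A", "InDebt60Day": 1},
    {"Segment": "A", "InDebt60Day": 0},
    {"Segment": "B", "InDebt60Day": 1},
    {"Segment": "B", "InDebt60Day": 0},
    {"Segment": "B", "InDebt60Day": 1},
    {"Segment": "B", "InDebt60Day": 0},
]

table = pivot_rate(sample, "Segment")
```

Each rate in `table` can then be compared with the overall average to see which values of the column carry above or below average risk.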

One easy finding is that gender influences the 60+ days risk (from now on I’ll just call it risk in short).

I marked the percentage of loans to men that have ‘InDebt60Day’=‘1’ orange as it is considerably above average and the percentage of loans to women green as it is considerably below average.

Do Your Friends Determine If You Are Creditworthy?

A newly filed US patent application claims that this may be a good idea! The patent application “System and method for assessing credit risk in an on-line lending environment” describes a risk assessment method in which a borrower’s first level links on a social network would be checked. It aims to derive insights from looking at the age and “activity” of these first level contacts.

Does What Your Friends Say About You Determine If You Are Creditworthy?

Going a step further, the method suggests to “invite linked users to provide a personal endorsement of the borrowing party; sending an endorsement invitation to identified users; and receiving endorsements from the identified users, the endorsement providing a rating of the user trustworthiness based on a numerical scale; determining and aggregate endorsement score from received endorsement which is included in the assessment score”.

Assessment score to be used as one of several criteria

The patent application filed by Canadian company Neobanx Technologies, Inc had already been filed in Canada. Inventors Ronald N. Ingram, Dylan Littlewood and Aston Lau describe the whole process, with the assessment score and endorsement score being only 2 of multiple elements that are used to assess the risk.

The potential problems with using data from social networks for the purpose of risk assessment in p2p lending were already described in detail in the article “For Debate: Can Data from Social Networks be Used to Reduce Risks in P2P Lending”. I still think that social network data could be used to some degree as additional data for lenders – but not to the degree this patent seems to imply.

It would be most interesting to see this implemented and monitor how it works out.

What is your opinion, dear reader?

Study Shows Women P2P Lenders Not More Risk-Averse

Are women more risk-averse than men when it comes to lending money to strangers via p2p lending services? A recent study by Nataliya Barasinska analyzed what impact gender has on investment decisions. In the study, which was supported by a grant from the European Commission, she looked at bidding and loan data of the German p2p lending service Smava for the time span from March 2007 to March 2010.

Women are a minority among lenders, but are no more risk-averse than men

Only about 10% of the lenders at Smava are women. But they do not perceive and react to risks differently than men, when it comes to picking loans for investments.

For Debate: Can Data from Social Networks be Used to Reduce Risks in P2P Lending?

P2P Lending is mostly anonymous and loans are unsecured. To make the risks of lending to a stranger acceptable for lenders, p2p lending services had to provide models for the lenders to judge the dimension of the risk of not getting paid back.

The initial estimation of the risk-level could not come from the platform itself as it had no track record and could not build a model that “calculated” the level of risk involved for the lender. The logical consequence was that nearly all p2p lending services relied on established third-party providers for credit history data and credit scores. Prosper for example showed Experian data on default levels to be expected depending on credit grade.

Over time it became obvious that the actual default levels at Prosper were much higher than the expected default levels based on Experian data. We don’t actually need to argue here what led to this (be it financial development of the economy, be it that p2p lending attracted bad risks, be it a poor validation process), but the result was that since defaults were much higher than expected, lender ROIs were much lower than expected at the time of the investment.

And this is not Prosper specific. Several other p2p lending services show clear signs that default levels will surpass (or have already surpassed) the initially published percentages of defaults to be expected based on external data.

Boober failed due to default levels; on Smava, levels are higher than the Schufa percentages forecast; and the same is likely for Auxmoney, where defaults will be higher than Schufa and Arvato Infoscore data suggested. The one exception to the rule is Zopa UK, which successfully manages to keep defaults low, as CEO Giles Andrews rightly points out.
