Applied Business Intelligence

Referral Assignment 2: Analysis of Czech Bank Data

Submission deadline: As announced on Blackboard

Submission format: Upload your answer to Blackboard as a .DOC or .DOCX document. Blackboard does not accept documents larger than 10 MB, so if necessary upload a zipped copy instead.

Learning Outcomes

This assignment will be assessed upon your ability to:

  1. Describe and critically evaluate the role and relevance of business intelligence and analytical investigation to the solution of business information problems
  2. Explain the concepts that underpin the subject area of business intelligence, making reference to main established concepts and developing areas
  3. Apply concepts and justify decisions when modelling and designing practical examples of applications using appropriate industry-standard software.

Notes

  • Pay particular attention to the requirements relating to Turnitin.

Assessment Criteria

Marks are indicated next to each question.

This is the second part of the assignment and consists of an individual piece of work that contributes 60% of the final mark for this module. It takes the form of a well-structured report of 1,000 words. Marks will be deducted if the report falls outside the 10% allowance (that is, below 900 or above 1,100 words).

Problem outline

For this assignment you are required to analyse a data set taken from the data mining competition held prior to the third International Conference on Principles and Practices of Knowledge Discovery in Databases (PKDD). This conference was held in Prague in 1999[1]. One of the challenges set for the competition was a collection of datasets describing the financial transactions and details of customers at a Czech bank. The full Access database is available on Blackboard, and its relationships are shown below:

Figure 1: Database Relationships for Czech Bank Data

You are required to analyse one table produced by a query on this database, as detailed below. Full details of the fields in this table are given below and in the appendix.

Details of the Query and resulting data

We wish to build a model of the bank's customers in order to gain insight into the patterns that exist within customer groups. Several queries were developed, culminating in a final query, QueryR, which is described in the appendix. It returns 4,500 customer records. Each customer makes different types of credits and withdrawals, which are categorised as follows:

Credits (paying money into your account): Cash; Bank collect; Other

Withdrawals (taking money out of your account): Cash; Bank remittance; Card

From the transaction table (Trans) it is possible to calculate both the number and the total value of each type of transaction. From these, the average value of each type of transaction has been calculated by dividing its total value by its number of transactions; these average fields have the prefix a (e.g. acredit). The resulting final table was produced in the Access database and is called Queryr. It also contains background information such as age, sex, whether there is a second account holder (second), whether the client has a loan (loan) and the frequency with which statements are issued (frequency).
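The averaging step described above can be sketched as follows. This is a minimal illustration, not the actual Access query: it assumes each transaction is a hypothetical (customer, type, amount) row, and the type labels used in the usage example ("credit", "cash") are invented for illustration. A customer with no transactions of a given type simply gets no field for that type; how QueryR handles that case is not stated here.

```python
from collections import defaultdict


def average_by_type(transactions):
    """Return {customer: {"a<type>": average amount}} from (customer, type, amount) rows.

    Hypothetical row layout; the real Trans table has its own column names.
    """
    counts = defaultdict(lambda: defaultdict(int))      # number of transactions per type
    totals = defaultdict(lambda: defaultdict(float))    # total value per type
    for customer, tx_type, amount in transactions:
        counts[customer][tx_type] += 1
        totals[customer][tx_type] += amount

    averages = {}
    for customer, per_type in counts.items():
        # Average = total value / number of transactions, with the "a" prefix.
        # Only types that actually occur are included, so n is never zero.
        averages[customer] = {
            "a" + tx_type: totals[customer][tx_type] / n
            for tx_type, n in per_type.items()
        }
    return averages


rows = [(1, "credit", 100.0), (1, "credit", 300.0), (1, "cash", 50.0), (2, "credit", 80.0)]
print(average_by_type(rows))
```

Keeping the count and the total separately mirrors the two quantities the text says can be calculated from Trans, with the average derived from them rather than stored independently.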

Analysis Carried Out and Questions

This data has been analysed using SAS Enterprise Miner as follows:

QUESTION

The bank wishes to see whether different customers have similar financial profiles and has therefore asked for the Queryr data to be clustered. It is looking for about eight clusters.

1) After the data had been prepared, SAS code was used to perform a hierarchical cluster analysis. The resulting dendrogram is shown in Figure 2 below. Fully interpret this dendrogram, explaining how many clusters should be chosen, and fully justify your answer.

(8 marks)

Figure 2: Dendrogram resulting from the hierarchical cluster analysis
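To illustrate what a dendrogram records, the sketch below builds one by naive agglomerative clustering. The linkage method and distance measure used by the SAS code are not stated here; this sketch assumes average linkage with Euclidean distance. It returns the height of every merge and the partition obtained by cutting the tree into k clusters (for example, partitions[8] for the eight clusters the bank has in mind). The helper largest_gap_k applies one common heuristic, cutting just below the largest jump in merge height; it is offered as an aid to reading a dendrogram, not as the required answer.

```python
import math
from itertools import combinations


def average_linkage(points):
    """Naive agglomerative clustering with average linkage (Euclidean distance).

    Returns (heights, partitions): heights[i] is the distance at which the
    i-th merge happened, and partitions[k] lists the clusters (as sorted
    lists of point indices) when the tree is cut into k clusters.
    """
    clusters = [[i] for i in range(len(points))]
    heights = []
    partitions = {len(clusters): [list(c) for c in clusters]}
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average of all pairwise distances between the two clusters.
            total = sum(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
            d = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
        heights.append(d)
        partitions[len(clusters)] = [list(c) for c in clusters]
    return heights, partitions


def largest_gap_k(heights):
    """Suggest k by cutting the tree just below the largest jump in merge height."""
    n_points = len(heights) + 1
    if len(heights) < 2:
        return 1
    gaps = [heights[i + 1] - heights[i] for i in range(len(heights) - 1)]
    i = max(range(len(gaps)), key=gaps.__getitem__)
    # After merges 0..i have been made, n_points - (i + 1) clusters remain.
    return n_points - (i + 1)


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
heights, partitions = average_linkage(pts)
k = largest_gap_k(heights)
print(k, partitions[k])
```

On a dendrogram, each merge height is the vertical level of a horizontal join; a long vertical stretch with no joins corresponds to a large gap, which is why such a stretch is often where the tree is cut.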

2) A k-means cluster analysis was also carried out, and the resulting Cubic Clustering Criterion (CCC) plot is shown in Figure 3. Fully discuss this plot, using it to explain how many clusters could be chosen.

(8 marks)
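The CCC is computed by SAS from the observed R² of each k-cluster solution, compared with the R² expected under a uniform reference distribution. The sketch below shows only the first ingredient: k-means (Lloyd's algorithm) followed by R² = 1 - (within-cluster sum of squares / total sum of squares) for a given k. The initialisation SAS Enterprise Miner uses is not stated here; the sketch assumes a deterministic farthest-point initialisation so that results are reproducible. It does not compute the CCC itself.

```python
def _mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def _sq(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with farthest-point initialisation (an assumption, see text)."""
    centres = [points[0]]
    while len(centres) < k:
        # Next seed: the point farthest from its nearest existing seed.
        centres.append(max(points, key=lambda p: min(_sq(p, c) for c in centres)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: _sq(p, centres[j])) for p in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster becomes empty
                centres[j] = _mean(members)
    return labels, centres


def r_squared(points, labels, centres):
    """Proportion of total variation explained by the clustering."""
    overall = _mean(points)
    sst = sum(_sq(p, overall) for p in points)
    sse = sum(_sq(p, centres[lab]) for p, lab in zip(points, labels))
    return 1 - sse / sst


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
for k in range(1, 5):
    labels, centres = kmeans(pts, k)
    print(k, round(r_squared(pts, labels, centres), 3))
```

R² always tends to rise as k increases, so on its own it cannot pick k; the CCC adjusts for this by comparing against the expected R², which is why its plot is read for peaks rather than for the largest value of R².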