# CONSIDER THE TASK OF CLASSIFYING A CUSTOMER THAT OWNS A BANK CREDIT CARD AND IS ACTIVELY USING ONLINE BANKING SERVICES. LOOKING AT THE PIVOT TABLE, WHAT IS THE PROBABILITY THAT THIS CUSTOMER

Partition the data into training (60%) and validation (40%) sets.
Table 1 gives the pivot table for the training data with Online as a column variable, CC as a row variable, and Loan as a secondary row variable. Each cell holds a count (the number of training records that fall in that cell).
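As an illustration of the setup above, here is a minimal sketch of a 60%/40% partition followed by a count of records per (CC, Loan, Online) cell. The helper names `partition` and `pivot_counts`, the seed, and the toy records are assumptions made for this example; they are not the actual bank data or the tool used to produce Table 1.

```python
import random
from collections import Counter


def partition(records, train_frac=0.6, seed=1):
    """Shuffle a copy of the records and split it into training and validation lists."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = round(train_frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def pivot_counts(records):
    """Count how many records fall in each (CC, Loan, Online) cell."""
    return Counter((r["CC"], r["Loan"], r["Online"]) for r in records)


# Hypothetical toy records, NOT the real data set.
records = [
    {"CC": i % 2, "Loan": int(i % 5 == 0), "Online": int(i % 3 == 0)}
    for i in range(20)
]

train, valid = partition(records)
counts = pivot_counts(train)  # keys are (CC, Loan, Online) tuples
```

Only the training partition is passed to `pivot_counts`, mirroring the instruction that the pivot table is built from the training data.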

Table 1: The pivot table for the training data

1) Consider the task of classifying a customer who owns a bank credit card and is actively using online banking services. Looking at the pivot table, what is the probability that this customer will accept the loan offer? (This is the probability of loan acceptance (Loan = 1) conditional on having a bank credit card (CC = 1) and being an active user of online banking services (Online = 1).)

……………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………
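The conditional probability in question 1) is a ratio of two cell counts: the loan acceptors in the (CC = 1, Online = 1) cell divided by all records in that cell. A minimal sketch follows, assuming the counts are stored in a dictionary keyed by (CC, Loan, Online); the function name and the counts are hypothetical and are not the values of Table 1.

```python
def conditional_acceptance(counts, cc=1, online=1):
    """Estimate P(Loan = 1 | CC = cc, Online = online) from crossed cell counts."""
    accepted = counts.get((cc, 1, online), 0)
    rejected = counts.get((cc, 0, online), 0)
    total = accepted + rejected
    if total == 0:
        raise ValueError("no training records with this CC/Online combination")
    return accepted / total


# Hypothetical counts keyed by (CC, Loan, Online); NOT the values of Table 1.
example_counts = {(1, 1, 1): 5, (1, 0, 1): 45}
p_exact = conditional_acceptance(example_counts)  # 5 / (5 + 45)
```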

2) Consider the two separate pivot tables for the training data: Table 2 has Loan (rows) as a function of Online (columns), and Table 3 has Loan (rows) as a function of CC.

Table 2: Pivot table for the training data: Loan (rows) as a function of Online (columns)

Table 3: Pivot table for the training data: Loan (rows) as a function of CC

Compute the following quantities (P(A|B) means “the probability of A given B”):

Answer: (6 marks, 1 for each)

i. P(CC = 1|Loan = 1) (the proportion of credit card holders among the loan acceptors)
……………………………………………………………………………………………………………………………………………………………………………………………………………………………………
ii. P(Online = 1|Loan = 1)
…………………………………………………………………………………………………………………………………………………………………………………………………………………………………
iii. P(Loan = 1) (the proportion of loan acceptors)
……………………………………………………………………………………………………………………………………………………………………………………………………………………………………

iv. P(CC = 1|Loan = 0)
……………………………………………………………………………………………………………………………………………………………………………………………………………………………………

v. P(Online = 1|Loan = 0)
……………………………………………………………………………………………………………………………………………………………………………………………………………………………………
vi. P(Loan = 0)
……………………………………………………………………………………………………………………………………………………………………………………………………………………………………
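Each of the six quantities above is a row proportion or a row share of a two-way table. A minimal sketch, assuming each pivot table is stored as a nested dictionary `table[loan][feature_value] = count`; the helper names and the counts are hypothetical and are not the values of Tables 2 and 3.

```python
def class_conditional(table, feature_value, loan):
    """P(feature = feature_value | Loan = loan): a cell count divided by its row total."""
    row = table[loan]
    return row[feature_value] / sum(row.values())


def prior(table, loan):
    """P(Loan = loan): a row total divided by the grand total."""
    grand_total = sum(sum(row.values()) for row in table.values())
    return sum(table[loan].values()) / grand_total


# Hypothetical Loan-by-CC counts: loan_by_cc[loan][cc]; NOT the values of Table 3.
loan_by_cc = {0: {0: 60, 1: 30}, 1: {0: 6, 1: 4}}

p_cc1_given_loan1 = class_conditional(loan_by_cc, 1, 1)  # 4 / 10
p_cc1_given_loan0 = class_conditional(loan_by_cc, 1, 0)  # 30 / 90
p_loan1 = prior(loan_by_cc, 1)                           # 10 / 100
```

The same two helpers apply unchanged to a Loan-by-Online table for quantities ii and v.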

3) Use the quantities computed above to compute the naive Bayes probability:

P(Loan = 1|CC = 1, Online = 1)
……………………………………………………………………………………………………………………………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………………………………………………………………………………………………………………………
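Under the naive Bayes assumption that CC and Online are independent given Loan, the numerator is P(CC = 1|Loan = 1) × P(Online = 1|Loan = 1) × P(Loan = 1), and the denominator is that product plus the corresponding product for Loan = 0. A minimal sketch of this calculation follows; the function name and the input probabilities are hypothetical placeholders, not the answers to question 2).

```python
def naive_bayes_posterior(p_cc_l1, p_on_l1, p_l1, p_cc_l0, p_on_l0, p_l0):
    """Naive Bayes estimate of P(Loan = 1 | CC = 1, Online = 1)."""
    accept = p_cc_l1 * p_on_l1 * p_l1  # evidence for Loan = 1
    reject = p_cc_l0 * p_on_l0 * p_l0  # evidence for Loan = 0
    return accept / (accept + reject)


# Hypothetical inputs, NOT the quantities from question 2).
p_nb = naive_bayes_posterior(
    p_cc_l1=0.4, p_on_l1=0.5, p_l1=0.1,
    p_cc_l0=0.3, p_on_l0=0.6, p_l0=0.9,
)
```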

4) Compare this value with the one obtained from the crossed pivot table in question 1). Which is a more accurate estimate?

……………………………………………………………………………………………………………………………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………………………………………………………………………………………………………………………
5) Table 4 below is an output of XLMiner using the k-nearest-neighbors (KNN) technique. Which choice of k balances between overfitting and ignoring the predictor information? Justify your answer.

Table 4: Prior class probabilities

Answer: (3 marks, 1.5 for the value and 1.5 for the justification)
……………………………………………………………………………………………………………………………………………………………………………………………………………………………………
…………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………………………………………………………………………………………………………………………
…………………………………………………………………………………………………………………
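One common way to choose k is to compare the validation error across the candidate values and take the k with the lowest error: a very small k tends to fit noise in the training data, while a very large k approaches always predicting the majority class. A minimal sketch of that rule follows; the function name and the error values are hypothetical and are not taken from Table 4.

```python
def best_k(validation_error):
    """Return the k with the lowest validation error (smallest k wins ties)."""
    return min(sorted(validation_error), key=validation_error.get)


# Hypothetical validation error (%) per k; NOT the values of Table 4.
errors = {1: 5.0, 3: 4.2, 5: 4.6, 7: 4.2}
chosen_k = best_k(errors)
```

Breaking ties toward the smallest k is an arbitrary choice made for this sketch; a different tie-breaking rule may be preferred in practice.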