In [1]:
%reset -f
In [2]:
%matplotlib inline
In [3]:
import pandas as pd
In [4]:
import matplotlib.pyplot as plt
import helpers as h
import numpy as np
import scipy as sp
import scipy.stats  # make sure sp.stats is available for the McNemar test below
In [5]:
import json, os
In [6]:
joined = pd.read_csv('cleaned-terms.csv',index_col=0)
In [ ]:
 
In [7]:
pd.options.display.max_columns=400
pd.options.display.max_colwidth= 100
In [8]:
requiredPaths = ["tab", "img", "tex"]
for p in requiredPaths:
    if not os.path.exists(p):
        os.mkdir(p)
In [ ]:
 
In [ ]:
 
In [9]:
with open("descriptions-terms-cleaned.json", "r") as f:
    descriptions = json.load(f)
In [ ]:
 
In [10]:
newColumnNames = joined.columns.tolist()
In [11]:
joined[:2]
Out[11]:
[first two rows of the 116-column dataframe: demographic answers, per-question timing columns, the scenario and comprehension responses, free-text answers, and the thematically coded columns; too wide to reproduce legibly here]
In [12]:
joined['interviewtime'].plot(kind="hist")
Out[12]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f3671c093d0>
In [13]:
groupNames = [x for x in newColumnNames if x.startswith("groupTime")]
In [14]:
#time spent reading the terms:
In [15]:
joined.groupTime75.describe()
Out[15]:
count    151.000000
mean     203.985563
std      147.190293
min       14.360000
25%      102.505000
50%      161.760000
75%      273.175000
max      775.680000
Name: groupTime75, dtype: float64
In [ ]:
 
In [16]:
joined.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 151 entries, 0 to 150
Columns: 116 entries, DemoEducation to lossReasonRevCoded
dtypes: bool(4), float64(62), object(50)
memory usage: 133.9+ KB
In [17]:
joined['catch5'].describe()
Out[17]:
count               151
unique                2
top       Once per week
freq                150
Name: catch5, dtype: object
In [18]:
failedUnicorn = joined[joined['catch5'] != "Once per week"]
len(failedUnicorn)
Out[18]:
1
In [19]:
failedUnicorn[["interviewtime"] + groupNames]
Out[19]:
interviewtime groupTime62 groupTime67 groupTime71 groupTime72 groupTime73 groupTime74 groupTime75 groupTime76 groupTime77 groupTime78 groupTime79 groupTime80
127 1503.35 131.52 31.06 15.95 60.51 77.81 91.73 304.2 181.59 135.81 69.92 397.84 5.41
In [20]:
failedUnicorn.dropna(axis=1,how="all")
Out[20]:
[the single matching row: participant 127, a German-language respondent (startlanguage de) who answered "Once per month" on the catch question; columns that are all NaN have been dropped]
In [ ]:
 
In [21]:
# responses that still need coding:
In [22]:
joined[joined.termsUnclearCoded.isnull()]
Out[22]:
Empty DataFrame
Columns: [the 116 columns listed above]
Index: []
In [23]:
joined[joined.phishingReasonCoded.isnull()]
Out[23]:
Empty DataFrame
Columns: [the 116 columns listed above]
Index: []
In [24]:
joined.shape
Out[24]:
(151, 116)
In [ ]:
 
In [25]:
import re
In [26]:
def descriptiveValueCounts(series, caption = "", reference = "", header=[], isFloat=True, save=True, folder="terms/tab", 
                           reorder = []):
    # Build a two-column LaTeX table (value, count) from the value_counts of a Series.
    t = h.Table("lc", "lc")
    t.isFloat = isFloat
    if caption: 
        t.setCaption(caption)
    if reference:
        t.reference = reference
    if header:
        t.setHeader(header)
    vc = series.value_counts()
    for k, v in vc.iteritems():
        t.addRow([k,"$%d$" %v])
    if reorder:  # optional manual row order, given as indices into the value_counts rows
        t.rows = [t.rows[i] for i in reorder]
    if save:
        t.writeLatexToFile(path=folder)
    return t
    
In [27]:
def descriptiveValuePercentages(data, caption = "", reference = "", header=[], isFloat=True, save=True, 
                                folder="tab", reorder = [], addRows = []):
    # Expects a grouped Series (e.g. joined.groupby('startlanguage').column) and builds
    # a LaTeX table of per-language percentages for each answer category.
    d = list(data.value_counts().iteritems())
    languages = list(set([x[0][0] for x in d]))
    degrees = sorted(list(set([x[0][1] for x in d])) + addRows)
    if reorder:
        degrees = [degrees[i] for i in reorder]
    # total number of answers per language, used as the percentage denominator
    languageDistribution = [sum(x[1] for x in d if x[0][0]== lan) for lan in languages]
    print languages  # verify that the column order matches the header passed in
    def getStat(lang, deg, dist):
        t = [x[1] for x in d if x[0][0] == lang and x[0][1] == deg]
        return 1.0*t[0]/dist*100 if t else 0.0
    vc = [(deg, [getStat(lang, deg, dist) for dist, lang in zip(languageDistribution, languages)]) for deg in degrees]
    t = h.Table("l" + "c"*len(languages), "l"  + "c"*len(languages))
    t.isFloat = isFloat
    if caption: 
        t.setCaption(caption)
    if reference:
        t.reference = reference
    if header:
        t.setHeader(header)
    for k, v in vc:
        t.addRow([k] + ["$%.0f%%$" %vt for vt in v])
    if save:
        t.writeLatexToFile(path=folder)
    return t
    
In [28]:
descriptiveValuePercentages(joined.groupby('startlanguage').DemoEducation, caption = "Educational Demographics", 
                            reference="demoEducation", header=["Highest Qualification", "US", "DE", "UK"],
                           reorder = [2, 0, 6, 1, 4, 7, 3])
['en-us', 'de', 'en']
Out[28]:
Educational Demographics
Highest Qualification | US | DE | UK
GCSE Level education (e.g., GCSE, O-Levels or Standards) or lower | $0%$ | $7%$ | $15%$
A-Level education (e.g., A, AS, S-Levels, Highers) | $12%$ | $24%$ | $11%$
Some undergraduate education (e.g., No completed degree) | $18%$ | $10%$ | $19%$
Degree / Graduate education (e.g., BSc, BA) | $43%$ | $32%$ | $35%$
Postgraduate education (e.g., MSc, MA, MBA, PhD) | $16%$ | $22%$ | $19%$
Vocational education (e.g., NVQ, HNC, HND) | $5%$ | $5%$ | $0%$
Other | $4%$ | $0%$ | $2%$
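In [ ]:
# Cross-check sketch (added for clarity, not part of the original run): the same
# percentages that descriptiveValueCounts/descriptiveValuePercentages produce can
# be obtained with a plain column-normalised crosstab.
ct = pd.crosstab(joined.DemoEducation, joined.startlanguage)
print (100.0 * ct / ct.sum()).round(0)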
In [29]:
languagesShort = ['en-us', 'de', 'en']
languagesLong = ["US", "DE", "UK"]
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [30]:
mf = h.figuresToLaTeX(columns=3,basename='demoAge',path='',
                      caption="Histogram of our participants' age")
for sl, ll in zip(languagesShort, languagesLong):
    data = joined[joined.startlanguage == sl].demoAge
    a = plt.figure(figsize=(4,4), dpi=80)
    ax = data.plot(kind="hist",bins=np.arange(10,80,5))
    plt.setp(ax.patches, 'facecolor', '0.3','edgecolor', '0.15', 'alpha', 0.75)
    locs, labels = plt.xticks()
    plt.xticks(locs,[r'$%g$' %x for x in locs],size='large')
    locs, labels = plt.yticks()
    plt.yticks(locs,[r'$%g$' %x for x in locs],size='large')
    plt.xlabel('Age')
    plt.ylabel('Frequency')
    h.rstyle(ax)
    mf.addFigure(a,subcaption="Country: %s" %ll,describeText=h.describe(data.values))
    print h.describe(data.values)
mf.writeLaTeX()
Size: 56, min: 18, max: 54, mean: 30.4464, variance: 89.488, skewness: 1.22813
Size: 41, min: 19, max: 49, mean: 27.0732, variance: 46.9695, skewness: 2.00962
Size: 54, min: 18, max: 69, mean: 33.7407, variance: 152.347, skewness: 0.869762
In [31]:
descriptiveValuePercentages(joined.groupby('startlanguage').DemoEmployment,
                            caption="Employment Demogpraphics of the participants", 
                       reference="demoEmployment", header=["Employment Status", "US", "DE", "UK"],
                            reorder = [0, 4, 5, 3, 2, 1])
['en-us', 'de', 'en']
Out[31]:
Employment Demographics of the participants
Employment Status | US | DE | UK
Employed | $57%$ | $22%$ | $48%$
Student | $14%$ | $63%$ | $30%$
Unemployed | $12%$ | $2%$ | $4%$
Self-employed | $16%$ | $7%$ | $13%$
Retired | $0%$ | $2%$ | $6%$
Prefer not to say | $0%$ | $2%$ | $0%$
In [32]:
descriptiveValuePercentages(joined.groupby('startlanguage').demoGender, caption="Gender of the participants", 
                            reference="demoGender", header= ["Gender", "US", "DE", "UK"])
['en-us', 'de', 'en']
Out[32]:
Gender of the participants
Gender | US | DE | UK
Female | $27%$ | $24%$ | $52%$
Male | $71%$ | $73%$ | $48%$
Other | $2%$ | $2%$ | $0%$
In [33]:
descriptiveValuePercentages(joined.groupby('startlanguage').DemoLearningDiff,
                            caption="Learning difficulty or disability of the participants", 
                       reference="demoDisability", header=["Answer",  "US", "DE", "UK"])
['en-us', 'de', 'en']
Out[33]:
Learning difficulty or disability of the participants
Answer | US | DE | UK
No | $96%$ | $98%$ | $98%$
Prefer not to say | $2%$ | $2%$ | $0%$
Yes | $2%$ | $0%$ | $2%$
In [34]:
descriptiveValuePercentages(joined.groupby('startlanguage').demoLanguage,
                            caption="How proficient are you in English/German", 
                       reference="demoLanguage", header=["Proficiency",  "US", "DE", "UK"], 
                            addRows=["No proficiency", "Elementary proficiency"], reorder = [5,2,3,1,0,4])
['en-us', 'de', 'en']
Out[34]:
How proficient are you in English/German
Proficiency | US | DE | UK
No proficiency | $0%$ | $0%$ | $0%$
Elementary proficiency | $0%$ | $0%$ | $0%$
Intermediate ability | $0%$ | $2%$ | $2%$
Advanced level | $2%$ | $0%$ | $4%$
'Near-native' level | $5%$ | $0%$ | $2%$
Native or bilingual proficiency | $93%$ | $98%$ | $93%$
In [35]:
joined.demoNumberAccounts.describe()
Out[35]:
count    151.000000
mean       2.099338
std        1.598979
min        0.000000
25%        1.000000
50%        2.000000
75%        2.000000
max       15.000000
Name: demoNumberAccounts, dtype: float64
In [36]:
mf = h.figuresToLaTeX(columns=3,basename='numberCards',path='',
                      caption="Participants' number of payment cards")

for sl, ll in zip(languagesShort, languagesLong):
    data = joined[joined.startlanguage == sl].demoNumberCards
    
    a = plt.figure(figsize=(4,4), dpi=80)
    ax = data.plot(kind="hist",bins=np.arange(0,11,1))
    plt.setp(ax.patches, 'facecolor', '0.3','edgecolor', '0.15', 'alpha', 0.75)
    locs, labels = plt.xticks()
    plt.xticks(locs,[r'$%g$' %x for x in locs],size='large')
    locs, labels = plt.yticks()
    plt.yticks(locs,[r'$%g$' %x for x in locs],size='large')
    plt.xlabel('Number of Payment Cards')
    plt.ylabel('Frequency')
    h.rstyle(ax)
    mf.addFigure(a,subcaption="Country: %s" %ll,describeText=h.describe(data.values))
    print h.describe(data.values)
mf.writeLaTeX()
Size: 56, min: 1, max: 10, mean: 3.08929, variance: 4.73734, skewness: 1.17642
Size: 41, min: 0, max: 5, mean: 2, variance: 1.45, skewness: 0.95674
Size: 54, min: 1, max: 10, mean: 2.7037, variance: 4.13697, skewness: 1.79329
In [37]:
mf = h.figuresToLaTeX(columns=3,basename='numberBankAccounts',path='',
                      caption="Participants' number of bank accounts")

for sl, ll in zip(languagesShort, languagesLong):
    data = joined[joined.startlanguage == sl].demoNumberAccounts
    
    a = plt.figure(figsize=(4,4), dpi=80)
    ax = data.plot(kind="hist",bins=np.arange(0,16,1))
    plt.setp(ax.patches, 'facecolor', '0.3','edgecolor', '0.15', 'alpha', 0.75)
    locs, labels = plt.xticks()
    plt.xticks(locs,[r'$%g$' %x for x in locs],size='large')
    locs, labels = plt.yticks()
    plt.yticks(locs,[r'$%g$' %x for x in locs],size='large')
    plt.xlabel('Number of Bank Accounts')
    plt.ylabel('Frequency')
    h.rstyle(ax)
    mf.addFigure(a,subcaption="Country: %s" %ll,describeText=h.describe(data.values))
    print h.describe(data.values)
mf.writeLaTeX()
Size: 56, min: 1, max: 3, mean: 1.78571, variance: 0.535065, skewness: 0.350127
Size: 41, min: 0, max: 6, mean: 1.97561, variance: 1.37439, skewness: 1.17851
Size: 54, min: 0, max: 15, mean: 2.51852, variance: 5.34871, skewness: 3.38021
In [38]:
descriptiveValuePercentages(joined.groupby('startlanguage').demoCardsFrequency,
                            caption="Frequency of use of any of the participants payment cards", 
                       reference="demoCardUseFrequency", header=["Frequency",  "US", "DE", "UK"],
                          reorder = [0,5, 3, 2, 6, 4, 1])
['en-us', 'de', 'en']
Out[38]:
Frequency of use of any of the participants' payment cards
Frequency | US | DE | UK
Every day | $20%$ | $0%$ | $19%$
Several times a week | $55%$ | $63%$ | $65%$
Once per week | $20%$ | $22%$ | $13%$
Once per month | $4%$ | $5%$ | $2%$
Several times per year | $0%$ | $7%$ | $2%$
Once per year or less | $2%$ | $0%$ | $0%$
Never | $0%$ | $2%$ | $0%$
In [39]:
descriptiveValuePercentages(joined.groupby('startlanguage').demoExperiencedFraud,
                            caption="Have you ever experienced fradulent transactions or incidents on any of your payment cards or bank accounts?", 
                       reference="demoFraudExperienced", header=["Frequency",  "US", "DE", "UK"])
                         # reorder = [0,5, 3, 2, 6, 4, 1])
['en-us', 'de', 'en']
Out[39]:
Have you ever experienced fraudulent transactions or incidents on any of your payment cards or bank accounts?
Frequency | US | DE | UK
No | $66%$ | $88%$ | $72%$
Yes | $34%$ | $12%$ | $28%$
In [ ]:
 
In [40]:
def expandCoded(data, splitString):
    # Split multi-code strings (e.g. "R1, R2") into columns, one-hot encode each
    # column, and sum the dummies, yielding one 0/1 indicator column per code.
    t = data.str.split(splitString, expand=True)
    t2 = pd.get_dummies(t[0])
    for k in range(t.shape[1]-1):
        t2 = t2.add(pd.get_dummies(t[k+1]),fill_value=0)
    return t2
In [41]:
tdata = joined.demoFraudDescriptionCoded.str.split(',\W?',expand=True)
In [42]:
def expandedFreq(data, splitString):
    # Like expandCoded, but returns the total frequency of each code directly.
    tdata = data.str.split(splitString, expand=True)
    fullsum = tdata[0].value_counts()
    for i in range(1,tdata.shape[1]):
        fullsum = fullsum.add(tdata[i].value_counts(),fill_value=0)
    return fullsum
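In [ ]:
# Illustrative sketch (added for clarity, not part of the original run): the
# behaviour of the two helpers above on a toy series of comma-separated codes.
toy = pd.Series(["R1, R2", "R2", "R1, R3"])
print expandCoded(toy, ',\W?')   # one indicator column per code (R1, R2, R3)
print expandedFreq(toy, ',\W?')  # total count per code: R1=2, R2=2, R3=1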
In [ ]:
 
In [43]:
t3 = expandCoded(joined.demoFraudDescriptionCoded, ',\W?')
In [44]:
with open('demoFraudDescriptionCoded.txt', 'r') as f:
    lines = f.readlines()
mapping = dict()
for key in t3.columns:
    # codebook lines look like "KEY: description"; strip the "KEY: " prefix
    potentialLines = [x.strip()[len(key)+2:] for x in lines if x.strip().startswith(key)]
    mapping[u"demoFraudDescriptionCoded_%s" %key] = u"%s" %potentialLines[0] if potentialLines else u""
descriptions.update(mapping)
In [45]:
t3.columns = ["demoFraudDescriptionCoded_%s" %i for i in t3.columns]
In [46]:
joined = pd.concat([joined, t3],axis=1)
In [ ]:
 
In [47]:
tt = joined[joined.demoExperiencedFraud == "Yes"].groupby('startlanguage')[t3.columns]
In [48]:
freq = joined.groupby('startlanguage').demoFraudFrequency.sum()
In [49]:
freq = freq[[k for k in freq.keys() if freq[k] > 0]]
In [50]:
ttt = tt.sum().T.divide(freq)
In [51]:
ttt
Out[51]:
startlanguage de en en-us
demoFraudDescriptionCoded_B0 0.285714 0.55 0.607143
demoFraudDescriptionCoded_B1 0.000000 0.00 0.035714
demoFraudDescriptionCoded_B2 0.000000 0.05 0.000000
demoFraudDescriptionCoded_B3 0.428571 0.30 0.214286
demoFraudDescriptionCoded_F0 0.428571 0.30 0.428571
demoFraudDescriptionCoded_F1 0.142857 0.15 0.178571
demoFraudDescriptionCoded_F2 0.142857 0.40 0.142857
demoFraudDescriptionCoded_F3 0.000000 0.00 0.071429
demoFraudDescriptionCoded_F4 0.000000 0.05 0.071429
demoFraudDescriptionCoded_F5 0.000000 0.05 0.000000
demoFraudDescriptionCoded_R1 0.285714 0.30 0.321429
demoFraudDescriptionCoded_R2 0.142857 0.80 0.821429
In [ ]:
 
In [52]:
def mcnemar_midp(b, c):
    """
    Compute McNemar's test using the "mid-p" variant suggested by:
    
    M.W. Fagerland, S. Lydersen, P. Laake. 2013. The McNemar test for 
    binary matched-pairs data: Mid-p and asymptotic are better than exact 
    conditional. BMC Medical Research Methodology 13: 91.
    
    `b` is the number of observations correctly labeled by the first---but 
    not the second---system; `c` is the number of observations correctly 
    labeled by the second---but not the first---system.
    """
    n = b + c
    x = min(b, c)
    dist = sp.stats.binom(n, .5)
    p = 2. * dist.cdf(x)
    midp = p - dist.pmf(x)
    return midp
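In [ ]:
# Illustrative sketch (added for clarity, not part of the original run): for b=3
# and c=10 discordant pairs we have n=13 and x=3, so the two-sided exact p is
# 2*P(X<=3) with X ~ Binomial(13, 0.5); the mid-p variant subtracts P(X=3),
# giving roughly 0.0574.
print mcnemar_midp(3, 10)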
In [53]:
t = h.Table("lccc")
t.setHeader(["Code", "DE", "UK", "US"])
t.fromNPArray(ttt.values*100, "$%.1f%%$", rowdesc=[descriptions[x] for x in t3.columns])
t.setCaption("Thematic analysis of description of fraud experienced by participants. The first five codes describe the " + \
            "identification of fraud, the next five codes describe the type of fraud, and the last two describe the " + \
            "follow up actions that happened.")
t.reference = "demoFraudDescription"
t.writeLatexToFile(path='tab/')
t
Out[53]:
Thematic analysis of description of fraud experienced by participants. The first five codes describe the identification of fraud, the next five codes describe the type of fraud, and the last two describe the follow-up actions that happened.
Code | DE | UK | US
Fraud identified at a later stage | $28.6%$ | $55.0%$ | $60.7%$
Transaction before card blocked | $0.0%$ | $0.0%$ | $3.6%$
Transaction after card blocked | $0.0%$ | $5.0%$ | $0.0%$
Transaction blocked by bank | $42.9%$ | $30.0%$ | $21.4%$
Other/No idea where fraud occurred | $42.9%$ | $30.0%$ | $42.9%$
Offline transaction | $14.3%$ | $15.0%$ | $17.9%$
Online transaction | $14.3%$ | $40.0%$ | $14.3%$
Cash withdrawal | $0.0%$ | $0.0%$ | $7.1%$
Card stolen | $0.0%$ | $5.0%$ | $7.1%$
Online account hacked | $0.0%$ | $5.0%$ | $0.0%$
New card | $28.6%$ | $30.0%$ | $32.1%$
Full refund | $14.3%$ | $80.0%$ | $82.1%$
In [54]:
## The two scenarios: does the person get their money back?
In [55]:
af = joined['startlanguage'].value_counts()
In [56]:
af
Out[56]:
en-us    56
en       54
de       41
Name: startlanguage, dtype: int64
In [57]:
freq = joined.groupby('startlanguage')[['phishingMoneyBack','phishingMoneyBackRev', 'lossMoneyBack', 'lossMoneyBackRev']]
In [58]:
ttt = freq.sum().T.divide(af)
In [59]:
significances = []
for a,b in [(joined.phishingMoneyBack,joined.phishingMoneyBackRev), (joined.lossMoneyBack, joined.lossMoneyBackRev)]:
    ct = pd.crosstab(a,b)
    # discordant pairs: answered False before / True after the T&Cs, and vice versa
    pMcnemar = mcnemar_midp(ct[True][False],ct[False][True])
    if pMcnemar < 0.05:
        significances.append("significant with $p<%s$" %("0.01" if pMcnemar <0.01 else "0.05"))
    else:
        significances.append("not significant")
In [ ]:
 
In [60]:
t = h.Table("lccc")
t.setHeader(["Question", "DE", "UK", "US"])
t.fromNPArray(ttt.values*100, "$%.1f%%$", rowdesc=["Scenario Phishing", "Scenario Phishing after T&Cs",
                                                 "Scenario Theft", "Scenario Theft after T&Cs"])
t.setCaption("Percentage of participants that say that the money should be returned in each of the scenarios. " + \
             "McNemar's test is %s for the Scenario Phishing and is %s for the Scenario Theft." %tuple(significances))
t.reference = "scenarioMoneyReturned"
t.writeLatexToFile(path='tab/')
t
Out[60]:
Percentage of participants that say that the money should be returned in each of the scenarios. McNemar's test is significant with $p<0.05$ for the Scenario Phishing and is significant with $p<0.05$ for the Scenario Theft.
Question | DE | UK | US
Scenario Phishing | $31.7%$ | $37.0%$ | $35.7%$
Scenario Phishing after T&Cs | $43.9%$ | $46.3%$ | $42.9%$
Scenario Theft | $41.5%$ | $81.5%$ | $76.8%$
Scenario Theft after T&Cs | $70.7%$ | $66.7%$ | $96.4%$
In [61]:
significances
Out[61]:
['significant with $p<0.05$', 'significant with $p<0.05$']
In [ ]:
 
In [62]:
descriptiveValuePercentages(joined.groupby('startlanguage').toolHeard,
                            caption="Have you ever used third party online banking services?", 
                       reference="toolUsed", header=["Frequency",  "US", "DE", "UK"], reorder = [1, 0])
['en-us', 'de', 'en']
Out[62]:
Have you ever used third party online banking services?
Frequency | US | DE | UK
Yes | $23%$ | $29%$ | $2%$
No | $77%$ | $71%$ | $98%$
In [63]:
#We did ask more questions here, but it's pointless to analyse them, as the sample size is < 20.
In [64]:
# Let's look at some comprehension questions.
In [65]:
descriptiveValuePercentages(joined.groupby('startlanguage').termsConfidence,
                            caption="How confident are you that you have understood the T&Cs?", 
                       reference="termsConfidence", header=["Level",  "US", "DE", "UK"],
                         addRows=["	Understood nothing"], reorder = [0,4,2,3,1])
['en-us', 'de', 'en']
Out[65]:
How confident are you that you have understood the T&Cs?
Level | US | DE | UK
Understood nothing | $0%$ | $0%$ | $0%$
Understood the minority | $2%$ | $7%$ | $6%$
Understood half of it | $4%$ | $12%$ | $2%$
Understood the majority | $50%$ | $59%$ | $54%$
Understood everything | $45%$ | $22%$ | $39%$
In [ ]:
 
In [ ]:
 
In [66]:
t3 = expandCoded(joined.termsUnclearCoded, ',\W?')

with open('termsUnclearCoded.txt', 'r') as f:
    lines = f.readlines()
mapping = dict()
for key in t3.columns:
    potentialLines = [x.strip()[len(key)+2:] for x in lines if x.strip().startswith(key)]
    mapping[u"termsUnclearCoded_%s" %key] = u"%s" %potentialLines[0] if potentialLines else u""
descriptions.update(mapping)

t3.columns = ["termsUnclearCoded_%s" %i for i in t3.columns]
In [67]:
t3.columns
Out[67]:
Index([u'termsUnclearCoded_A1', u'termsUnclearCoded_R0',
       u'termsUnclearCoded_R1', u'termsUnclearCoded_R2',
       u'termsUnclearCoded_R3', u'termsUnclearCoded_R4',
       u'termsUnclearCoded_R5'],
      dtype='object')
In [68]:
print "\n".join(["%s : %s" %(key, mapping[key]) for key in t3.columns])
termsUnclearCoded_A1 : Tips useful
termsUnclearCoded_R0 : All Ok
termsUnclearCoded_R1 : Complicated
termsUnclearCoded_R2 : Unclear
termsUnclearCoded_R3 : Abbreviations, special terms
termsUnclearCoded_R4 : Gross negligence
termsUnclearCoded_R5 : Negligence limits unclear
In [69]:
joined = pd.concat([joined, t3],axis=1)
In [70]:
tt = joined.groupby('startlanguage')[t3.columns]

freq = joined['startlanguage'].value_counts()
freq = freq[[k for k in freq.keys() if freq[k] > 0]]

ttt = tt.sum().T.divide(freq)
In [ ]:
 
In [71]:
t = h.Table("lccc")
t.setHeader(["Code", "DE", "UK", "US"])
t.fromNPArray(ttt.values*100, "$%.1f%%$", rowdesc=[descriptions[x] for x in t3.columns])
t.setCaption("Thematic analysis of understanding issues of the T&Cs of the participants.")
t.reference = "termsUnclear"
t.writeLatexToFile(path='tab/')
t
Out[71]:
Thematic analysis of the participants' issues in understanding the T&Cs.
Code | DE | UK | US
Tips useful | $0.0%$ | $0.0%$ | $1.8%$
All Ok | $36.6%$ | $51.9%$ | $73.2%$
Complicated | $29.3%$ | $13.0%$ | $17.9%$
Unclear | $51.2%$ | $13.0%$ | $19.6%$
Abbreviations, special terms | $24.4%$ | $25.9%$ | $1.8%$
Gross negligence | $0.0%$ | $13.0%$ | $0.0%$
Negligence limits unclear | $0.0%$ | $0.0%$ | $5.4%$
In [ ]:
 
In [72]:
descriptiveValuePercentages(joined.groupby('startlanguage').compShare,
                            caption=descriptions['compShare'], 
                       reference="compShare", header=["Frequency",  "US", "DE", "UK"], reorder = [1, 0])
['en-us', 'de', 'en']
Out[72]:
Are you allowed to share your PIN with a family member?
Frequency | US | DE | UK
Yes | $9%$ | $7%$ | $4%$
No | $91%$ | $93%$ | $96%$
In [73]:
descriptiveValuePercentages(joined.groupby('startlanguage').compCardCheque,
                            caption=descriptions['compCardCheque'], 
                       reference="compCardCheque", header=["Frequency",  "US", "DE", "UK"], reorder = [1, 0])
['en-us', 'de', 'en']
Out[73]:
Should you keep your Debit card and Cheque book together?
Frequency | US | DE | UK
Yes | $11%$ | $20%$ | $2%$
No | $89%$ | $80%$ | $98%$
In [74]:
toDraw = ["compRememberCoded", "compOnlineBankingCoded", "compLiableCoded", "compNegligenceCoded"]
In [75]:
t3s = []
for question in toDraw:
    t3 = expandCoded(joined[question], ',\W?')

    with open('comprehensionCodes-clean.txt', 'r') as f:
        lines = f.readlines()
    mapping = dict()

    # compLiableCoded and compNegligenceCoded share the same L* code names in the
    # codebook, so for compNegligence we pick the second matching line.
    i = 0 if question != "compNegligenceCoded" else 1
    for key in t3.columns:
        potentialLines = [x.strip()[len(key)+2:] for x in lines if x.strip().startswith(key)]
        mapping[u"%s_%s" %(question,key)] = u"%s" %potentialLines[i] if potentialLines else u""
    descriptions.update(mapping)

    t3.columns = ["%s_%s" %(question,k) for k in t3.columns]
    t3s.append(t3)
In [76]:
for t3 in t3s:
    joined = pd.concat([joined, t3],axis=1)
In [77]:
for t3 in t3s:
    print "\n".join(["%s : %s" %(key, descriptions[key]) for key in t3.columns])
compRememberCoded_R1 : Write down
compRememberCoded_R2 : Change periodically
compRememberCoded_R3 : Memory technique
compRememberCoded_R4 : Use existing/memorable numbers
compRememberCoded_R5 : Choose unique
compRememberCoded_R6 : just remember it
compRememberCoded_R7 : write down encrypted
compRememberCoded_R8 : Don't know
compOnlineBankingCoded_O0 : Forget
compOnlineBankingCoded_O1 : be alone
compOnlineBankingCoded_O2 : Virus program, firewall etc
compOnlineBankingCoded_O3 : No insecure wireless/network
compOnlineBankingCoded_O4 : Check website
compOnlineBankingCoded_O5 : No shared/public computer
compOnlineBankingCoded_O6 : Password protect Computer
compLiableCoded_L0 : Don't know
compLiableCoded_L1 : Notified not quickly enough
compLiableCoded_L2 : shared details
compLiableCoded_L3 : violate T&Cs
compLiableCoded_L4 : fraudulently
compLiableCoded_L5 : always
compLiableCoded_L6 : if you notice something suspicious
compLiableCoded_L7 : not kept details safe
compLiableCoded_L8 : been phished
compLiableCoded_L9 : gross negligence
compNegligenceCoded_L0 : Don't know
compNegligenceCoded_L1 : Carelessness
compNegligenceCoded_L2 : not being careful with details
compNegligenceCoded_L3 : Your fault
compNegligenceCoded_L4 : Ignoring warnings
compNegligenceCoded_L5 : Not informing your bank of loss
compNegligenceCoded_L6 : negligence beyond reasonable practice
compNegligenceCoded_L7 : Harmful misconduct
compNegligenceCoded_L8 : Not following the T&Cs
In [ ]:
 
In [78]:
for t3, question in zip(t3s, toDraw):
    tt = joined.groupby('startlanguage')[t3.columns]

    freq = joined['startlanguage'].value_counts()
    freq = freq[[k for k in freq.keys() if freq[k] > 0]]

    ttt = tt.sum().T.divide(freq)
    
    t = h.Table("lccc")
    t.setHeader(["Code", "DE", "UK", "US"])
    t.fromNPArray(ttt.values*100, "$%.1f%%$", rowdesc=[descriptions[x] for x in t3.columns])
    t.setCaption("Thematic analysis of the answers to the comprehension question: \"%s\"" %descriptions[question[:-5]])
    t.reference = question[:-5]
    t.writeLatexToFile(path='tab/')
    t.display()
Thematic analysis of the answers to the comprehension question: "What can you do to remember your PIN?"
CodeDEUKUS
Write down$17.1%$$11.1%$$26.8%$
Change periodically$0.0%$$0.0%$$21.4%$
Memory technique$36.6%$$14.8%$$16.1%$
Use exiting/memorable numbers$9.8%$$31.5%$$14.3%$
Choose unique$4.9%$$1.9%$$0.0%$
just remember it$26.8%$$27.8%$$25.0%$
write down encrypted$4.9%$$3.7%$$1.8%$
Don't know$7.3%$$11.1%$$5.4%$
Thematic analysis of the answers to the comprehension question: "What should you do before starting online banking?"
CodeDEUKUS
Forget$14.6%$$33.3%$$48.2%$
be alone$2.4%$$3.7%$$10.7%$
Virus program, firewall etc$39.0%$$35.2%$$5.4%$
No insecure wireless/network$2.4%$$7.4%$$5.4%$
Check website$43.9%$$25.9%$$23.2%$
No shared/public computer$0.0%$$5.6%$$10.7%$
Password protect Computer$7.3%$$0.0%$$3.6%$
Thematic analysis of the answers to the comprehension question: "When are you liable for an unauthorized transaction?"
CodeDEUKUS
Don't know$2.4%$$7.4%$$7.1%$
Notified not quickly enough$19.5%$$13.0%$$80.4%$
shared details$7.3%$$27.8%$$3.6%$
violate T&Cs$7.3%$$18.5%$$1.8%$
fraudulantly$0.0%$$16.7%$$0.0%$
always$7.3%$$5.6%$$3.6%$
if you notice something suspicious$0.0%$$0.0%$$1.8%$
not kept details safe$19.5%$$9.3%$$3.6%$
been phished$2.4%$$1.9%$$0.0%$
gross negligence$53.7%$$27.8%$$0.0%$
Thematic analysis of the answers to the comprehension question: "What is gross negligence?"
CodeDEUKUS
Don't know$4.9%$$3.7%$$12.5%$
Carelessness$4.9%$$31.5%$$46.4%$
not being careful with details$53.7%$$48.1%$$8.9%$
Your fault$2.4%$$11.1%$$5.4%$
Ignoring warnings$2.4%$$1.9%$$0.0%$
Not informing your bank of loss$7.3%$$14.8%$$14.3%$
negligence beyond reasonable practice$17.1%$$9.3%$$10.7%$
Harmful misconduct$7.3%$$0.0%$$3.6%$
Not following the T&Cs$7.3%$$5.6%$$0.0%$
In [ ]:
 
In [79]:
ReplacementPhishingReasonRevCoded = {"C6" : "C5", "C8" : "C3", "C11" : "C1", "C7" : "C1", "C10" : "C2", "C9" : "C2",
                                    "Code3" : "Code11, Code5", "Code7"  : "Code2", "Code8" : "Code12, Code11, Code6",
                                     "Code9" : "Code6, Code12", "Code10" : "Code5"}
ReplacementPhishingReasonCoded = {"C9" : "C2", "C5" : "C4", "C12" : "C2", "C13" : "C2", "C11" : "C14", "C15" : "C16",
                                  "C10" : "C4", "C17" : "C16, C14", "CODE3" : "CODE2", "CODE6" : "CODE7", "CODE8" : "CODE7",
                                 "CODE9" : "CODE7", "CODE10" :"CODE7, CODE2", "CODE11" : "CODE7", "12" : "CODE2, CODE4",
                                 "CODE14" : "CODE7", "13" : "C0", "CODE16" : "CODE15", "C7" : "C2, C3", "CODE5" : "CODE1"}
ReplacementLossReasonCoded = {"C2" : "C1", "C7" : "C4", "C8" : "C4",  "C11" : "C9", "CODE3" : "CODE2", "CODE5" : "CODE4",
                              "10" : "CODE4", "CODE11" : "CODE7", "CODe8": "CODE6", "CODE12": "CODE9"}
ReplacementLossReasonRevCoded = {"C6" : "C3", "C8" : "C1", "C9" : "C1", "Code2" : "Code3", "Code5" : "Code3"}
In [80]:
toFix = [("phishingReasonCoded", ReplacementPhishingReasonCoded), 
         ("phishingReasonRevCoded", ReplacementPhishingReasonRevCoded),
         ("lossReasonCoded", ReplacementLossReasonCoded), ("lossReasonRevCoded", ReplacementLossReasonRevCoded)]
In [ ]:
 
In [81]:
t3s = []
with open('secondary-coding-codebook.txt', 'r') as f:
    # the codebook holds one section per question, separated by "---"
    content = f.read().split("---")[1:]
for question, fulltext in zip(zip(*toFix)[0],content):
    t3 = expandCoded(joined[question], ',\W?')

    lines = fulltext.splitlines()
    mapping = dict()
    
    for key in t3.columns:
        potentialLines = [x.strip()[len(key)+2:] for x in lines if x.strip().startswith(key)]
        mapping[u"%s_%s" %(question,key)] = u"%s" %potentialLines[0] if potentialLines else u""
    #print mapping
    descriptions.update(mapping)

    t3.columns = ["%s_%s" %(question,i) for i in t3.columns]
    t3s.append(t3)
In [82]:
for t3 in t3s:
    joined = pd.concat([joined, t3],axis=1)
In [ ]:
 
In [83]:
#We can draw 8 tables: for each scenario, the reasons for and against reimbursement, both before and after reading the T&Cs.
In [ ]:
 
In [84]:
columns = [x for x in descriptions.keys() if x.startswith("%sReason%sCoded" %("loss", ""))]
In [85]:
columns
Out[85]:
[u'lossReasonCoded_CODE1',
 u'lossReasonCoded_C3',
 u'lossReasonCoded_C0',
 u'lossReasonCoded_C4',
 u'lossReasonCoded_CODE4',
 u'lossReasonCoded_C6',
 u'lossReasonCoded_C5',
 u'lossReasonCoded_CODE2',
 u'lossReasonCoded_CODE7',
 u'lossReasonCoded_CODE6',
 u'lossReasonCoded_CODE9',
 u'lossReasonCoded_C9',
 u'lossReasonCoded_C1',
 u'lossReasonCoded_C10',
 u'lossReasonCoded_CODE20']
In [ ]:
 
In [86]:
joined[(joined.lossMoneyBack == True) & (joined.lossReasonCoded_CODE1 > 0)]
Out[86]:
Empty DataFrame
Columns: [all columns of joined, including the expanded *_Coded indicator columns]
Index: []
In [87]:
for scenario in ["phishing", "loss"]:
    for foragainst in [True, False]:
        for beforeAfter in ["", "Rev"]:
            columns = [x for x in descriptions.keys() if x.startswith("%sReason%sCoded" %(scenario, beforeAfter))]
            columns = sorted(columns,key=lambda x: descriptions[x])
            data = joined[joined["%sMoneyBack%s" %(scenario,beforeAfter)] == foragainst].groupby('startlanguage')[columns]
            freq = joined[joined["%sMoneyBack%s" %(scenario,beforeAfter)] == foragainst]['startlanguage'].value_counts()
            freq = freq[[k for k in freq.keys() if freq[k] > 0]]
            # keep only codes that occur in this subset, then normalise per language
            ttt = data.sum().T[data.sum().sum() > 0].divide(freq)
            print scenario, foragainst, beforeAfter
            print ttt
            t = h.Table("lccc")
            t.setHeader(["Code", "DE", "UK", "US"])
            t.fromNPArray(ttt.values*100, "$%.1f%%$", rowdesc=[descriptions[x] for x in ttt.T.columns])
            t.setCaption("Thematic analysis of the answers%s in support of reimbursement in scenario %s%s." %(
                    (" not" if not foragainst else ""), scenario, 
                    ", after the participants have seen the T&Cs." if beforeAfter else ""))
            t.reference = "%sReason%s%s" %(scenario, "For" if foragainst else "Against", beforeAfter)
            t.writeLatexToFile(path='tab/')
            t.display()
            
phishing True 
                                  de    en  en-us
phishingReasonCoded_CODE2   0.538462  0.50   0.55
phishingReasonCoded_C0      0.000000  0.00   0.05
phishingReasonCoded_CODE4   0.307692  0.35   0.15
phishingReasonCoded_CODE1   0.153846  0.10   0.10
phishingReasonCoded_CODE7   0.307692  0.35   0.25
phishingReasonCoded_CODE15  0.000000  0.00   0.10
Thematic analysis of the answers in support of reimbursement in scenario phishing.
Code | DE | UK | US
Banks have good security that should have prevented fraud | $53.8%$ | $50.0%$ | $55.0%$
Don't know/unsure | $0.0%$ | $0.0%$ | $5.0%$
He was tricked into phoning the number on the back of his card | $30.8%$ | $35.0%$ | $15.0%$
If the fraud can be proven | $15.4%$ | $10.0%$ | $10.0%$
The bank should be insured/reverse the transaction/be ethical | $30.8%$ | $35.0%$ | $25.0%$
The scammer can be someone working in the bank | $0.0%$ | $0.0%$ | $10.0%$
phishing True Rev
                                     de    en     en-us
phishingReasonRevCoded_C0      0.166667  0.00  0.041667
phishingReasonRevCoded_Code11  0.277778  0.48  0.083333
phishingReasonRevCoded_Code12  0.222222  0.28  0.041667
phishingReasonRevCoded_Code6   0.222222  0.20  0.000000
phishingReasonRevCoded_Code1   0.000000  0.04  0.000000
phishingReasonRevCoded_Code4   0.166667  0.28  0.750000
phishingReasonRevCoded_Code5   0.166667  0.08  0.083333
phishingReasonRevCoded_Code2   0.000000  0.04  0.041667
Thematic analysis of the answers in support of reimbursement in scenario phishing, after the participants have seen the T&Cs.
Code | DE | UK | US
Don't know/unsure | $16.7%$ | $0.0%$ | $4.2%$
He could not have been aware that there was a technical fix in place | $27.8%$ | $48.0%$ | $8.3%$
He followed the security procedures as documented for telephone calls | $22.2%$ | $28.0%$ | $4.2%$
He was not grossly negligent | $22.2%$ | $20.0%$ | $0.0%$
If the fraud can be proven | $0.0%$ | $4.0%$ | $0.0%$
It is not an authorized transaction | $16.7%$ | $28.0%$ | $75.0%$
Phishing not covered by the T&C | $16.7%$ | $8.0%$ | $8.3%$
The bank can retrieve the money | $0.0%$ | $4.0%$ | $4.2%$
phishing False 
                               de        en     en-us
phishingReasonCoded_C2   0.071429  0.117647  0.166667
phishingReasonCoded_C16  0.035714  0.088235  0.055556
phishingReasonCoded_C3   0.071429  0.058824  0.166667
phishingReasonCoded_C0   0.000000  0.000000  0.027778
phishingReasonCoded_C14  0.750000  0.088235  0.194444
phishingReasonCoded_C1   0.178571  0.647059  0.333333
phishingReasonCoded_C4   0.000000  0.176471  0.305556
Thematic analysis of the answers not in support of reimbursement in scenario phishing.
Code | DE | UK | US
Banking accounts have no protection | $7.1%$ | $11.8%$ | $16.7%$
Banks tend not to care about customers | $3.6%$ | $8.8%$ | $5.6%$
Difficult to recover the money | $7.1%$ | $5.9%$ | $16.7%$
Don't know/unsure | $0.0%$ | $0.0%$ | $2.8%$
His own fault, he was scammed | $75.0%$ | $8.8%$ | $19.4%$
May have acted fraudulently | $17.9%$ | $64.7%$ | $33.3%$
No one can tell the difference between the fraudster | $0.0%$ | $17.6%$ | $30.6%$
phishing False Rev
                                 de        en    en-us
phishingReasonRevCoded_C2  0.086957  0.034483  0.12500
phishingReasonRevCoded_C0  0.043478  0.000000  0.03125
phishingReasonRevCoded_C4  0.130435  0.103448  0.03125
phishingReasonRevCoded_C3  0.608696  0.482759  0.28125
phishingReasonRevCoded_C1  0.347826  0.379310  0.43750
phishingReasonRevCoded_C5  0.086957  0.000000  0.09375
Thematic analysis of the answers not in support of reimbursement in scenario phishing, after the participants have seen the T&Cs.
Code | DE | UK | US
Difficult to recover the money | $8.7%$ | $3.4%$ | $12.5%$
Don't know/unsure | $4.3%$ | $0.0%$ | $3.1%$
He gave his details out on the phone to the fraudsters | $13.0%$ | $10.3%$ | $3.1%$
It is gross negligence | $60.9%$ | $48.3%$ | $28.1%$
Mr. L transferred the money himself | $34.8%$ | $37.9%$ | $43.8%$
Phishing not covered by the T&C | $8.7%$ | $0.0%$ | $9.4%$
loss True 
                           de        en     en-us
lossReasonCoded_C5   0.000000  0.045455  0.000000
lossReasonCoded_C9   0.000000  0.000000  0.093023
lossReasonCoded_C6   0.000000  0.022727  0.139535
lossReasonCoded_C4   0.352941  0.386364  0.488372
lossReasonCoded_C10  0.176471  0.068182  0.023256
lossReasonCoded_C3   0.529412  0.500000  0.418605
lossReasonCoded_C1   0.058824  0.113636  0.069767
Thematic analysis of the answers in support of reimbursement in scenario loss.
Code | DE | UK | US
Banks have good security that should have prevented fraud | $0.0%$ | $4.5%$ | $0.0%$
Depending on the T&C of the bank | $0.0%$ | $0.0%$ | $9.3%$
Insurance will compensate her | $0.0%$ | $2.3%$ | $14.0%$
People are protected from fraud by the bank. | $35.3%$ | $38.6%$ | $48.8%$
She did not authorize the transaction | $17.6%$ | $6.8%$ | $2.3%$
The theft was reported swiftly | $52.9%$ | $50.0%$ | $41.9%$
Yes, because the bank can prove it wasn't her, due to CCTV at ATM | $5.9%$ | $11.4%$ | $7.0%$
loss True Rev
                             de        en     en-us
lossReasonRevCoded_C7  0.000000  0.000000  0.018519
lossReasonRevCoded_C4  0.000000  0.000000  0.018519
lossReasonRevCoded_C2  0.310345  0.611111  0.981481
lossReasonRevCoded_C5  0.000000  0.027778  0.000000
lossReasonRevCoded_C3  0.862069  0.638889  0.074074
lossReasonRevCoded_C1  0.000000  0.166667  0.000000
Thematic analysis of the answers in support of reimbursement in scenario loss, after the participants have seen the T&Cs.
Code | DE | UK | US
However, it's hard for debit, as opposed to credit, cards | $0.0%$ | $0.0%$ | $1.9%$
Insurance will reimburse her | $0.0%$ | $0.0%$ | $1.9%$
She reported the card stolen within the time limits | $31.0%$ | $61.1%$ | $98.1%$
She used the landline to report the incident | $0.0%$ | $2.8%$ | $0.0%$
The card was stolen, the transaction was unauthorized, it's fraud | $86.2%$ | $63.9%$ | $7.4%$
Yes, if it can be proved that the card was stolen | $0.0%$ | $16.7%$ | $0.0%$
loss False 
                              de   en     en-us
lossReasonCoded_CODE2   0.041667  0.2  0.230769
lossReasonCoded_CODE7   0.000000  0.1  0.230769
lossReasonCoded_C0      0.041667  0.0  0.000000
lossReasonCoded_CODE6   0.250000  0.2  0.307692
lossReasonCoded_CODE9   0.000000  0.2  0.000000
lossReasonCoded_CODE4   0.041667  0.2  0.230769
lossReasonCoded_CODE1   0.291667  0.1  0.000000
lossReasonCoded_CODE20  0.583333  0.1  0.000000
Thematic analysis of the answers not in support of reimbursement in scenario loss.
Code | DE | UK | US
Common perception that the customer loses | $4.2%$ | $20.0%$ | $23.1%$
Debit, as opposed to credit, cards do not have fraud protection | $0.0%$ | $10.0%$ | $23.1%$
Don't know/unsure | $4.2%$ | $0.0%$ | $0.0%$
Her mistake | $25.0%$ | $20.0%$ | $30.8%$
Her purse is not insured, thief must be caught | $0.0%$ | $20.0%$ | $0.0%$
Money cannot be retrieved once it leaves someone's account | $4.2%$ | $20.0%$ | $23.1%$
She may have been grossly negligent | $29.2%$ | $10.0%$ | $0.0%$
She waited too long before notifying her bank | $58.3%$ | $10.0%$ | $0.0%$
loss False Rev
                                de        en  en-us
lossReasonRevCoded_Code3  0.000000  0.111111    1.0
lossReasonRevCoded_Code1  0.166667  0.666667    0.0
lossReasonRevCoded_Code4  0.833333  0.388889    0.0
Thematic analysis of the answers not in support of reimbursement in scenario loss, after the participants have seen the T&Cs.
Code | DE | UK | US
It is difficult to recover the money | $0.0%$ | $11.1%$ | $100.0%$
PIN might have been written down in her purse | $16.7%$ | $66.7%$ | $0.0%$
She was grossly negligent as she lost her card and failed to immediately cancel it | $83.3%$ | $38.9%$ | $0.0%$
In [88]:
ttt
Out[88]:
                                de        en  en-us
lossReasonRevCoded_Code3  0.000000  0.111111    1.0
lossReasonRevCoded_Code1  0.166667  0.666667    0.0
lossReasonRevCoded_Code4  0.833333  0.388889    0.0
In [ ]: