Lecture 26
Cornell University
INFO 2951 - Spring 2025
April 29, 2025
Bet!!! I just LOWKEY vibed w/ this 10/10 fire idea, and no cap, it’s honestly giving major slay energy — so I’m finna drop it, like fr, and let y’all totally stan. 🤙
Sentence generated by ChatGPT
bet!!! i just lowkey vibed w/ this 10/10 fire idea, and no cap, it’s honestly giving major slay energy — so i’m finna drop it, like fr, and let y’all totally stan. 🤙
bet!!! i just lowkey vibed w/ this / fire idea, and no cap, it’s honestly giving major slay energy — so i’m finna drop it, like fr, and let y’all totally stan. 🤙
bet i just lowkey vibed w this fire idea and no cap its honestly giving major slay energy so im finna drop it like fr and let yall totally stan 🤙
bet lowkey vibed fire idea cap honestly giving major slay energy im finna drop fr yall totally stan 🤙
bet lowkei vibe fire idea cap honestli give major slai energi im finna drop fr yall total stan 🤙
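The sequence above lowercases the text, drops numbers, strips punctuation, removes stop words, and finally stems each word. A minimal base R sketch of the first four steps on an ASCII fragment of the same sentence might look like this. The `stop_words` vector is a hypothetical four-word list chosen for this example, not a standard lexicon, and stemming is left out because it needs an add-on package.

```r
# ASCII fragment of the example sentence
raw <- "Bet!!! I just LOWKEY vibed w/ this 10/10 fire idea"

lower    <- tolower(raw)                               # 1. lowercase
no_num   <- gsub("[0-9]+", "", lower)                  # 2. remove numbers
no_punct <- gsub("[[:punct:]]", "", no_num)            # 3. remove punctuation
clean    <- trimws(gsub("[[:space:]]+", " ", no_punct)) # collapse leftover spaces

# 4. split into words and drop stop words
# (hypothetical stop list, for illustration only)
tokens     <- strsplit(clean, " ", fixed = TRUE)[[1]]
stop_words <- c("i", "just", "w", "this")
kept       <- tokens[!tokens %in% stop_words]

print(paste(kept, collapse = " "))
```

The order of the steps matters: removing punctuation before numbers, for example, would turn `10/10` into `1010` rather than leaving a stray `/` to clean up.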
Convert the text string into quantifiable measures
Image credit: DALL·E
Bet!!! I just LOWKEY vibed w/ this 10/10 fire idea, and no cap, it’s honestly giving major slay energy — so I’m finna drop it, like fr, and let y’all totally stan. 🤙
It, this — bet!!! And let vibed I’m so fr, 🤙 stan. 10/10 totally w/ it’s honestly I slay just fire cap, and finna y’all lowkey no idea, major like giving energy drop
Stan. this fire and it’s totally 🤙 like w/ bet!!! I’m major and finna idea, slay giving vibed lowkey no y’all — energy drop cap, I it, just honestly so fr, 10/10 let
Honestly bet!!! It’s finna so giving totally and lowkey major w/ no this I’m vibed energy and — y’all just 🤙 slay like let stan. drop it, fr, fire I cap, 10/10 idea,
This bet!!! Lowkey y’all cap, slay vibed it’s I idea, honestly fire and giving it, fr, I’m 10/10 finna — let stan. 🤙 like energy no just totally drop w/ so and major
Order is meaningless.
| Document | are | ate | cat | cheese | delicious | dog | mice | mouse | silly | the | was |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mice are silly | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| The cat ate the mouse | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 2 | 0 |
| The cheese was delicious | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 |
| The dog ate the cat | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 2 | 0 |
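The count table above can be rebuilt in a few lines of base R. This is a sketch that assumes whitespace tokenization and lowercasing are enough for these four toy documents; the object names (`docs`, `vocab`, `dtm`) are illustrative.

```r
docs <- c(
  "Mice are silly",
  "The cat ate the mouse",
  "The cheese was delicious",
  "The dog ate the cat"
)

# Lowercase and split each document on single spaces
tokens <- strsplit(tolower(docs), " ", fixed = TRUE)

# Vocabulary: every distinct word, in alphabetical order
vocab <- sort(unique(unlist(tokens)))

# One row per document, one column per word, filled with counts
dtm <- matrix(0L, nrow = length(docs), ncol = length(vocab),
              dimnames = list(docs, vocab))
for (i in seq_along(tokens)) {
  counts <- table(tokens[[i]])
  dtm[i, names(counts)] <- as.integer(counts)
}

print(dtm)
```

Shuffling the words within any document leaves its row unchanged, which is the sense in which order is meaningless here.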
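Term frequency: how often a term occurs in a document (the tf-idf table below uses the term's count divided by the total number of words in the document)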
Inverse document frequency:
\[idf(\text{term}) = \ln{\left(\frac{n_{\text{documents}}}{n_{\text{documents containing term}}}\right)}\]
tf-idf = term frequency \(\times\) inverse document frequency
Frequency of a term adjusted for how rarely it is used
More information: Text Mining with R
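As a worked example, take "the" in "The cat ate the mouse" from the four-document corpus above. This assumes term frequency is the term's share of the document's words, which is the convention that reproduces the tf-idf values tabulated for this corpus. The word appears 2 times among 5 words, and 3 of the 4 documents contain it:

\[\text{tf} = \frac{2}{5} = 0.4, \qquad idf(\text{the}) = \ln{\left(\frac{4}{3}\right)} \approx 0.288, \qquad \text{tf-idf} = 0.4 \times 0.288 \approx 0.115\]

By contrast, "mouse" appears once in the same document but in no other, so its score is \(\frac{1}{5} \times \ln{(4)} \approx 0.277\), more than twice that of "the".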
| Document | are | ate | cat | cheese | delicious | dog | mice | mouse | silly | the | was |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mice are silly | 0.462 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.462 | 0.000 | 0.462 | 0.000 | 0.000 |
| The cat ate the mouse | 0.000 | 0.139 | 0.139 | 0.000 | 0.000 | 0.000 | 0.000 | 0.277 | 0.000 | 0.115 | 0.000 |
| The cheese was delicious | 0.000 | 0.000 | 0.000 | 0.347 | 0.347 | 0.000 | 0.000 | 0.000 | 0.000 | 0.072 | 0.347 |
| The dog ate the cat | 0.000 | 0.139 | 0.139 | 0.000 | 0.000 | 0.277 | 0.000 | 0.000 | 0.000 | 0.115 | 0.000 |
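The values in the table above can be reproduced with base R. This sketch assumes term frequency is the count divided by the document's word total and uses the natural-log idf defined earlier; the object names are illustrative.

```r
docs <- c(
  "Mice are silly",
  "The cat ate the mouse",
  "The cheese was delicious",
  "The dog ate the cat"
)
tokens <- strsplit(tolower(docs), " ", fixed = TRUE)
vocab  <- sort(unique(unlist(tokens)))

# Raw counts: documents in rows, words in columns
dtm <- matrix(0L, nrow = length(docs), ncol = length(vocab),
              dimnames = list(docs, vocab))
for (i in seq_along(tokens)) {
  counts <- table(tokens[[i]])
  dtm[i, names(counts)] <- as.integer(counts)
}

# Term frequency as a share of each document's words
tf <- dtm / rowSums(dtm)

# idf = ln(number of documents / number of documents containing the term)
idf <- log(nrow(dtm) / colSums(dtm > 0))

# Multiply every column of tf by that term's idf
tf_idf <- sweep(tf, 2, idf, `*`)

print(round(tf_idf, 3))
```

A word that appeared in all four documents would get an idf of ln(1) = 0 and therefore a tf-idf of 0 everywhere.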
Word embedding: a mathematical representation of a word in a continuous vector space
Image credit: Solving the Embedding Mystery!
| word | d1 | d2 | d3 | d4 | d5 | d6 | d7 | d8 | d9 | d10 | d11 | d12 | d13 | d14 | d15 | d16 | d17 | d18 | d19 | d20 | d21 | d22 | d23 | d24 | d25 | d26 | d27 | d28 | d29 | d30 | d31 | d32 | d33 | d34 | d35 | d36 | d37 | d38 | d39 | d40 | d41 | d42 | d43 | d44 | d45 | d46 | d47 | d48 | d49 | d50 | d51 | d52 | d53 | d54 | d55 | d56 | d57 | d58 | d59 | d60 | d61 | d62 | d63 | d64 | d65 | d66 | d67 | d68 | d69 | d70 | d71 | d72 | d73 | d74 | d75 | d76 | d77 | d78 | d79 | d80 | d81 | d82 | d83 | d84 | d85 | d86 | d87 | d88 | d89 | d90 | d91 | d92 | d93 | d94 | d95 | d96 | d97 | d98 | d99 | d100 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| are | -0.51533000 | 0.831860 | 0.22457 | -0.738650 | 0.187180 | 0.260210 | -0.42564 | 0.671210 | -0.310840 | -0.612750 | 0.089526 | -0.240110 | 1.18780 | 0.676090 | -0.022885 | -0.92533 | 0.071174 | 0.388370 | -0.4292400 | 0.371440 | 0.326710 | 0.431410 | 0.874950 | 0.3400900 | -0.23189 | -0.41144 | 0.490610 | -0.32906 | -0.491090 | -0.189880 | 0.334080 | -0.212450 | -0.383860 | -0.080547 | 1.116100 | 0.236170 | 0.313330 | 0.492860 | 0.100000 | -0.151310 | -0.141760 | -0.280200 | -0.23880 | -0.35486 | 0.18282 | -0.191340 | 0.60544 | 0.074573 | -0.207310 | -0.609650 | 0.199080 | -0.570240 | -0.174270 | 1.441900 | -0.250190 | -1.86480 | 0.416710 | -0.246070 | 1.5010000 | 0.874150 | -0.671350 | 1.27620 | -0.272100 | 0.17583 | 1.22420 | 0.28242 | 0.6237500 | 0.6395100 | 0.369140 | -0.846770 | -0.322700 | -0.671520 | -0.1963500 | -0.4078900 | -0.209660 | -0.19623 | 0.041885 | 0.539670 | -1.110500 | -0.395150 | 0.66590000 | -0.233000 | -1.082000 | 0.046465 | -2.09930 | -0.284930 | 0.0800250 | -0.129630 | -0.30011 | -0.467640 | -0.818310 | -0.048509 | -0.32233 | -0.320130 | -1.1207000 | -0.056788 | -0.730040 | -1.20240 | 1.130400 | 0.347900 |
| ate | -0.08029200 | 0.659240 | 0.35281 | 0.034911 | -0.944040 | 0.306810 | 0.60626 | 0.390930 | 0.228050 | -0.710910 | 0.322700 | 0.499360 | 0.39814 | 0.611360 | -0.010969 | -0.09097 | -0.421980 | -0.080869 | -0.3649000 | 0.074443 | 0.544210 | 0.350360 | 0.010708 | -0.5578100 | -0.23541 | 0.16357 | -0.941980 | -0.15397 | -0.361850 | 0.138090 | 0.351410 | 1.066500 | 0.545790 | 0.056154 | 0.332340 | 1.009100 | 0.029193 | 0.526120 | 0.161590 | -0.344020 | -0.029192 | -0.413610 | -0.20168 | -0.16338 | -0.13938 | 0.378120 | -0.54910 | 0.109800 | 0.152180 | -0.739240 | -0.034577 | -0.202590 | 0.304410 | 0.423220 | -0.975890 | -0.25193 | -0.411190 | 0.126880 | 0.0158810 | 0.390360 | 0.365970 | 1.35690 | 0.047675 | -0.62382 | -0.32479 | -0.10494 | 0.0878120 | -0.7758900 | 0.433540 | 0.222770 | -0.200040 | 0.013524 | 0.7980100 | 0.5074600 | -0.716180 | 0.92140 | -0.960170 | -0.785590 | 0.048053 | 0.730540 | 0.25351000 | 0.257890 | -0.824790 | 0.181390 | -0.66272 | -0.886150 | 0.0548580 | -0.086880 | -0.77234 | 0.432990 | 0.714370 | -0.881040 | 0.43407 | -0.066353 | -0.9752000 | -0.907160 | 0.147380 | 0.03475 | 0.384050 | 0.175360 |
| cat | 0.23088000 | 0.282830 | 0.63180 | -0.594110 | -0.585990 | 0.632550 | 0.24402 | -0.141080 | 0.060815 | -0.789800 | -0.291020 | 0.142870 | 0.72274 | 0.204280 | 0.140700 | 0.98757 | 0.525330 | 0.097456 | 0.8822000 | 0.512210 | 0.402040 | 0.211690 | -0.013109 | -0.7161600 | 0.55387 | 1.14520 | -0.880440 | -0.50216 | -0.228140 | 0.023885 | 0.107200 | 0.083739 | 0.550150 | 0.584790 | 0.758160 | 0.457060 | -0.280010 | 0.252250 | 0.689650 | -0.609720 | 0.195780 | 0.044209 | -0.31136 | -0.68826 | -0.22721 | 0.461850 | -0.77162 | 0.102080 | 0.556360 | 0.067417 | -0.572070 | 0.237350 | 0.471700 | 0.827650 | -0.292630 | -1.34220 | -0.099277 | 0.281390 | 0.4160400 | 0.105830 | 0.622030 | 0.89496 | -0.234460 | 0.51349 | 0.99379 | 1.18460 | -0.1636400 | 0.2065300 | 0.738540 | 0.240590 | -0.964730 | 0.134810 | -0.0072484 | 0.3301600 | -0.123650 | 0.27191 | -0.409510 | 0.021909 | -0.606900 | 0.407550 | 0.19566000 | -0.418020 | 0.186360 | -0.032652 | -0.78571 | -0.138470 | 0.0440070 | -0.084423 | 0.04911 | 0.241040 | 0.452730 | -0.186820 | 0.46182 | 0.089068 | -0.1818500 | -0.015230 | -0.736800 | -0.14532 | 0.151040 | -0.714930 |
| cheese | -0.63712000 | 0.605150 | -0.19317 | 0.116060 | -0.410510 | 0.129780 | 1.74050 | 0.053119 | 0.208400 | -0.536420 | 0.061240 | -0.027045 | -0.17595 | 1.296300 | 0.416620 | 0.90429 | 0.384430 | -0.615150 | -0.4669300 | 0.618620 | -0.597650 | 0.886310 | -0.374760 | -0.9017800 | -0.16541 | 1.00080 | 0.070107 | -0.38194 | -0.620150 | -0.412870 | 0.046083 | 0.613130 | -0.560240 | -0.593780 | 0.055440 | 0.622950 | 0.193900 | -0.214870 | 0.110400 | -1.433400 | 1.016800 | -1.591000 | -0.64335 | -0.88056 | -0.13692 | -0.166660 | 0.37185 | -0.198730 | -0.105600 | -0.647160 | -0.162720 | -0.266330 | -0.604040 | 0.677650 | -1.660300 | -0.76015 | -0.592030 | 0.690610 | 0.0982840 | 0.090139 | 0.970170 | 0.63826 | 0.700190 | -0.07888 | 0.77505 | -0.59275 | 0.0099363 | 0.1458000 | 0.090962 | -0.997450 | -0.332210 | 0.605890 | 0.6329000 | 0.4926700 | 0.312280 | 0.90852 | -0.434890 | -0.319390 | 0.835890 | 0.832720 | 0.47300000 | 0.053605 | -0.429040 | 0.330060 | 0.11979 | -1.012000 | -0.3595800 | 0.190870 | 0.53706 | -0.605020 | 0.014610 | 0.136870 | -1.18810 | -0.222550 | -0.9175600 | -1.289900 | 0.186770 | -0.27083 | 1.303300 | 0.036128 |
| delicious | -0.65534000 | 0.340340 | 0.30284 | -0.148540 | 0.176830 | 0.337250 | 0.51254 | 0.047677 | 0.203640 | -0.169770 | 0.064244 | -0.030980 | 0.29266 | 0.256680 | 0.266270 | 0.55210 | -0.199290 | -0.455120 | 0.0758580 | 0.672750 | 0.074552 | 0.212680 | 0.043048 | -0.9397500 | 0.16909 | 1.26090 | -0.118490 | 0.19958 | -0.780670 | -0.968800 | -0.273490 | 0.471600 | -0.011452 | -0.742100 | 0.413170 | 0.604600 | -0.075988 | 0.218740 | 0.186800 | -1.350800 | 0.686080 | -0.138280 | -0.29852 | -0.72438 | 0.56742 | 0.317580 | -0.11389 | -0.063852 | 0.062136 | -0.102100 | 0.309080 | -0.538150 | 0.341190 | 0.019077 | -0.991060 | -1.00930 | 0.773920 | 0.453050 | 0.0667420 | -0.897930 | -0.490000 | 1.16020 | -0.293620 | -0.31742 | 0.22462 | -1.19390 | 0.2820300 | -0.5876100 | -0.109370 | -0.941000 | -0.046886 | 0.327370 | 0.2178300 | 0.5369800 | -0.200270 | 1.17190 | -0.669520 | -0.533590 | 0.405850 | 0.336610 | -0.12291000 | -0.188850 | -0.452200 | 0.605610 | -0.46547 | -0.441810 | 0.2503800 | 0.173040 | -0.51647 | -0.225460 | 0.164590 | 0.279910 | -0.42529 | -0.468750 | -1.1439000 | -0.615680 | -0.426700 | -0.68853 | 0.089564 | 0.723000 |
| dog | 0.30817000 | 0.309380 | 0.52803 | -0.925430 | -0.736710 | 0.634750 | 0.44197 | 0.102620 | -0.091420 | -0.566070 | -0.532700 | 0.201300 | 0.77040 | -0.139830 | 0.137270 | 1.11280 | 0.893010 | -0.178690 | -0.0019722 | 0.572890 | 0.594790 | 0.504280 | -0.289910 | -1.3491000 | 0.42756 | 1.27480 | -1.161300 | -0.41084 | 0.042804 | 0.548660 | 0.188970 | 0.375900 | 0.580350 | 0.669750 | 0.811560 | 0.938640 | -0.510050 | -0.070079 | 0.828190 | -0.353460 | 0.210860 | -0.244120 | -0.16554 | -0.78358 | -0.48482 | 0.389680 | -0.86356 | -0.016391 | 0.319840 | -0.492460 | -0.069363 | 0.018869 | -0.098286 | 1.312600 | -0.121160 | -1.23990 | -0.091429 | 0.352940 | 0.6464500 | 0.089642 | 0.702940 | 1.12440 | 0.386390 | 0.52084 | 0.98787 | 0.79952 | -0.3462500 | 0.1409500 | 0.801670 | 0.209870 | -0.860070 | -0.153080 | 0.0745230 | 0.4081600 | 0.019208 | 0.51587 | -0.344280 | -0.245250 | -0.779840 | 0.274250 | 0.22418000 | 0.201640 | 0.017431 | -0.014697 | -1.02350 | -0.396950 | -0.0056188 | 0.305690 | 0.31748 | 0.021404 | 0.118370 | -0.113190 | 0.42456 | 0.534050 | -0.1671700 | -0.271850 | -0.625500 | 0.12883 | 0.625290 | -0.520860 |
| mice | 0.00063935 | 0.275940 | 0.11937 | -0.587170 | -0.732070 | 0.364360 | 0.73082 | 0.194790 | -0.456630 | -0.712230 | -0.462910 | 0.354310 | 0.41265 | 0.011087 | 0.704830 | 1.15380 | -0.865050 | 0.747780 | 1.0898000 | -0.136560 | -0.215850 | -0.608840 | 0.068820 | -0.2693900 | -0.14702 | 0.23594 | -0.362450 | -0.80454 | -0.619630 | 0.478210 | 0.721450 | 0.343340 | -0.329530 | 0.190550 | 1.033400 | 0.230030 | 0.115860 | 0.874050 | -0.253240 | 0.421480 | -0.464190 | -0.243130 | -1.36830 | -0.28809 | -0.18192 | 0.294360 | 0.33680 | -0.068659 | -0.929580 | -0.135920 | -0.850740 | -0.245050 | 0.089080 | 0.628800 | 0.069943 | -0.72037 | -0.561120 | -0.256980 | -0.5670900 | -0.195380 | 0.013889 | 1.16350 | 0.238500 | -0.12460 | 0.50788 | 1.59060 | -0.3817100 | 0.3070000 | 0.738250 | 0.060485 | 0.065348 | -0.019585 | 0.4766500 | 0.2848400 | -0.783970 | 0.29604 | 0.098664 | -0.142200 | -0.128560 | 0.357240 | 0.18805000 | -0.272090 | -1.156600 | 1.092900 | -1.53750 | 0.345480 | 1.5179000 | -0.030003 | -0.95319 | 0.416920 | -0.111090 | -0.608480 | 0.58638 | 0.179360 | -0.4151700 | -0.343450 | -0.857680 | -0.81315 | 0.254300 | -1.163200 |
| mouse | -0.09320700 | 0.049685 | 0.25748 | -0.525010 | -0.180090 | 0.468880 | 0.26035 | -0.484460 | -0.020865 | -1.021200 | -0.642040 | 0.062146 | 0.17611 | -0.521840 | 0.589680 | 1.54660 | -0.418890 | 0.750560 | 1.2493000 | -0.252390 | -0.275400 | 0.094360 | 0.658510 | -0.5618800 | 0.89223 | 0.82503 | -0.589030 | -0.70064 | -0.229580 | 0.036496 | 0.385330 | 0.822370 | 0.028273 | 0.533260 | 1.044000 | 0.413500 | -0.626240 | -0.199070 | 0.626840 | -0.193680 | 0.071461 | -0.056608 | -0.62716 | -0.21990 | -0.70554 | 0.756930 | -0.33047 | 0.248220 | -0.334600 | 0.413430 | -0.508890 | 0.171170 | 0.193200 | 0.417950 | -0.204310 | -1.48530 | -0.821540 | 0.069956 | 0.0020854 | 0.310960 | 0.452840 | 1.14810 | 0.089534 | 0.17282 | 0.56481 | 1.00160 | -0.3856100 | 0.2381400 | 0.659000 | 0.207000 | -0.136880 | 0.049653 | 0.0198350 | -0.6654400 | -0.365960 | 0.39073 | -0.183770 | 0.218370 | 0.042889 | 0.791930 | -0.09979700 | -0.206130 | -0.446030 | 0.172250 | -1.25740 | 1.084900 | 0.9162000 | -0.176950 | 0.56489 | -0.017692 | -0.045254 | 0.458630 | 0.47844 | -0.160780 | 0.0030882 | -0.092954 | -0.496070 | -0.58809 | 0.777270 | -0.670310 |
| silly | -0.08140800 | 0.059552 | 0.77880 | -0.646800 | -0.615850 | 0.647310 | -0.44597 | 0.308900 | -0.071626 | 0.266020 | 0.161110 | -0.040699 | -0.43499 | -0.134010 | 0.688020 | 0.53160 | -0.762000 | 0.814480 | 0.2602000 | 0.574170 | 0.828190 | 0.422930 | 0.305790 | -1.0311000 | 0.32201 | 0.68830 | -0.553720 | 0.13781 | -0.330430 | -0.024804 | -0.302030 | 0.399540 | 0.156220 | -0.948060 | -0.572130 | 0.460430 | -0.856440 | -0.653490 | 0.165680 | -0.346040 | 0.387710 | 0.912410 | -0.33025 | -0.41045 | -0.74941 | -0.215180 | 0.26530 | 0.523260 | -0.462110 | -0.477560 | 0.405750 | -0.187820 | 0.177040 | -0.039180 | -0.760020 | -1.10750 | 0.447030 | 0.884780 | 0.1169300 | 0.070433 | -0.093688 | 0.66467 | -0.649070 | 0.26288 | 0.27458 | -0.52282 | 1.0216000 | 0.0037161 | -0.361660 | -0.236730 | -0.269150 | -0.207520 | 0.0701320 | -0.0048971 | -0.583350 | 0.53387 | -0.570200 | 0.355030 | -0.083076 | 0.180800 | -0.04327600 | -0.325590 | 0.436960 | -0.069350 | -1.72520 | -0.085043 | -0.5303200 | 0.148600 | -0.13186 | 0.054436 | -0.264000 | 0.316100 | -0.24254 | -0.560520 | -0.0719670 | 0.051976 | -1.059800 | -0.11550 | -0.540620 | 0.194170 |
| the | -0.03819400 | -0.244870 | 0.72812 | -0.399610 | 0.083172 | 0.043953 | -0.39141 | 0.334400 | -0.575450 | 0.087459 | 0.287870 | -0.067310 | 0.30906 | -0.263840 | -0.132310 | -0.20757 | 0.333950 | -0.338480 | -0.3174300 | -0.483360 | 0.146400 | -0.373040 | 0.345770 | 0.0520410 | 0.44946 | -0.46971 | 0.026280 | -0.54155 | -0.155180 | -0.141070 | -0.039722 | 0.282770 | 0.143930 | 0.234640 | -0.310210 | 0.086173 | 0.203970 | 0.526240 | 0.171640 | -0.082378 | -0.717870 | -0.415310 | 0.20335 | -0.12763 | 0.41367 | 0.551870 | 0.57908 | -0.334770 | -0.365590 | -0.548570 | -0.062892 | 0.265840 | 0.302050 | 0.997750 | -0.804810 | -3.02430 | 0.012540 | -0.369420 | 2.2167000 | 0.722010 | -0.249780 | 0.92136 | 0.034514 | 0.46745 | 1.10790 | -0.19358 | -0.0745750 | 0.2335300 | -0.052062 | -0.220440 | 0.057162 | -0.158060 | -0.3079800 | -0.4162500 | 0.379720 | 0.15006 | -0.532120 | -0.205500 | -1.252600 | 0.071624 | 0.70565000 | 0.497440 | -0.420630 | 0.261480 | -1.53800 | -0.302230 | -0.0734380 | -0.283120 | 0.37104 | -0.252170 | 0.016215 | -0.017099 | -0.38984 | 0.874240 | -0.7256900 | -0.510580 | -0.520280 | -0.14590 | 0.827800 | 0.270620 |
| was | 0.13717000 | -0.542870 | 0.19419 | -0.299530 | 0.175450 | 0.084672 | 0.67752 | 0.098295 | -0.035611 | 0.213340 | 0.516630 | 0.206870 | 0.44082 | -0.336550 | 0.560250 | -0.68790 | 0.519570 | -0.212580 | -0.5270800 | -0.122490 | 0.330990 | 0.026448 | 0.590070 | 0.0065469 | 0.45405 | -0.33884 | -0.282610 | -0.24633 | 0.108470 | 0.316400 | -0.153680 | 0.735030 | 0.118580 | 0.708420 | 0.075081 | 0.297380 | -0.113950 | 0.408070 | -0.042531 | -0.213010 | -0.798490 | -0.127030 | 0.75200 | -0.41746 | 0.46615 | -0.039097 | 0.65961 | -0.323360 | 0.442000 | -0.941370 | -0.231250 | -0.306040 | 0.799120 | 1.458100 | -0.881990 | -3.00410 | -0.752430 | -0.205030 | 1.1998000 | 0.948810 | 0.306490 | 0.48411 | -0.757200 | 0.65856 | 0.70107 | -0.93141 | 0.5292800 | 0.2332300 | 0.188570 | 0.386910 | 0.011489 | -0.319370 | 0.0118580 | 0.2294400 | 0.177640 | 0.16868 | 0.140030 | 0.586470 | -1.544700 | -0.064425 | -0.00064711 | 0.136060 | -0.326950 | 0.100430 | -1.54600 | -0.547600 | 0.2102700 | -0.671950 | -0.15970 | -0.682710 | -0.220430 | -0.870880 | -0.16248 | 0.830860 | -0.2304500 | 0.198640 | -0.051892 | -0.52057 | 0.254340 | -0.237590 |
| Document | d1 | d2 | d3 | d4 | d5 | d6 | d7 | d8 | d9 | d10 | d11 | d12 | d13 | d14 | d15 | d16 | d17 | d18 | d19 | d20 | d21 | d22 | d23 | d24 | d25 | d26 | d27 | d28 | d29 | d30 | d31 | d32 | d33 | d34 | d35 | d36 | d37 | d38 | d39 | d40 | d41 | d42 | d43 | d44 | d45 | d46 | d47 | d48 | d49 | d50 | d51 | d52 | d53 | d54 | d55 | d56 | d57 | d58 | d59 | d60 | d61 | d62 | d63 | d64 | d65 | d66 | d67 | d68 | d69 | d70 | d71 | d72 | d73 | d74 | d75 | d76 | d77 | d78 | d79 | d80 | d81 | d82 | d83 | d84 | d85 | d86 | d87 | d88 | d89 | d90 | d91 | d92 | d93 | d94 | d95 | d96 | d97 | d98 | d99 | d100 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| The dog ate the cat | 0.3823700 | 0.761710 | 2.96888 | -2.283849 | -2.100396 | 1.662016 | 0.50943 | 1.021270 | -0.953455 | -1.891862 | 0.074720 | 0.708910 | 2.50940 | 0.148130 | 0.002381 | 1.59426 | 1.664260 | -0.839063 | -0.1195322 | 0.192823 | 1.833840 | 0.320250 | 0.399229 | -2.518988 | 1.64494 | 1.64415 | -2.931160 | -2.15007 | -0.857546 | 0.428495 | 0.568136 | 2.091679 | 1.964150 | 1.779974 | 1.281640 | 2.577146 | -0.352927 | 1.760771 | 2.022710 | -1.471956 | -1.058292 | -1.444141 | -0.27188 | -1.89048 | -0.02407 | 2.333390 | -1.02612 | -0.474051 | 0.297200 | -2.261423 | -0.801794 | 0.585309 | 1.281924 | 4.558970 | -2.999300 | -8.88263 | -0.576816 | 0.022370 | 5.511771 | 2.029852 | 1.191380 | 5.21898 | 0.268633 | 1.34541 | 3.87267 | 1.49202 | -0.5712280 | 0.0386500 | 1.869626 | 0.232350 | -1.910516 | -0.320866 | 0.2493246 | 0.4132800 | -0.061182 | 2.00930 | -2.778200 | -1.419931 | -3.843887 | 1.555588 | 2.084650 | 1.036390 | -1.462259 | 0.657001 | -5.54793 | -2.026030 | -0.0536298 | -0.431853 | 0.33633 | 0.191094 | 1.317900 | -1.215248 | 0.54077 | 2.305245 | -2.775600 | -2.215400 | -2.255480 | -0.27354 | 2.815980 | -0.519190 |
| The cat ate the mouse | -0.0190070 | 0.502015 | 2.69833 | -1.883429 | -1.543776 | 1.496146 | 0.32781 | 0.434190 | -0.882900 | -2.346992 | -0.034620 | 0.569756 | 1.91511 | -0.233880 | 0.454791 | 2.02806 | 0.352360 | 0.090187 | 1.1317400 | -0.632457 | 0.963650 | -0.089670 | 1.347649 | -1.731768 | 2.10961 | 1.19438 | -2.358890 | -2.43987 | -1.129930 | -0.083669 | 0.764496 | 2.538149 | 1.412073 | 1.643484 | 1.514080 | 2.052006 | -0.469117 | 1.631780 | 1.821360 | -1.312176 | -1.197691 | -1.256629 | -0.73350 | -1.32680 | -0.24479 | 2.700640 | -0.49303 | -0.209440 | -0.357240 | -1.355533 | -1.241321 | 0.737610 | 1.573410 | 3.664320 | -3.082450 | -9.12803 | -1.306927 | -0.260614 | 4.867406 | 2.251170 | 0.941280 | 5.24268 | -0.028223 | 0.99739 | 3.44961 | 1.69410 | -0.6105880 | 0.1358400 | 1.726956 | 0.229480 | -1.187326 | -0.118133 | 0.1946366 | -0.6603200 | -0.446350 | 1.88416 | -2.617690 | -0.956311 | -3.021158 | 2.073268 | 1.760673 | 0.628620 | -1.925720 | 0.843948 | -5.78183 | -0.544180 | 0.8681890 | -0.914493 | 0.58374 | 0.151998 | 1.154276 | -0.643428 | 0.59465 | 1.610415 | -2.605342 | -2.036504 | -2.126050 | -0.99046 | 2.967960 | -0.668640 |
| Mice are silly | -0.5960986 | 1.167352 | 1.12274 | -1.972620 | -1.160740 | 1.271880 | -0.14079 | 1.174900 | -0.839096 | -1.058960 | -0.212274 | 0.073501 | 1.16546 | 0.553167 | 1.369965 | 0.76007 | -1.555876 | 1.950630 | 0.9207600 | 0.809050 | 0.939050 | 0.245500 | 1.249560 | -0.960400 | -0.05690 | 0.51280 | -0.425560 | -0.99579 | -1.441150 | 0.263526 | 0.753500 | 0.530430 | -0.557170 | -0.838057 | 1.577370 | 0.926630 | -0.427250 | 0.713420 | 0.012440 | -0.075870 | -0.218240 | 0.389080 | -1.93735 | -1.05340 | -0.74851 | -0.112160 | 1.20754 | 0.529174 | -1.599000 | -1.223130 | -0.245910 | -1.003110 | 0.091850 | 2.031520 | -0.940267 | -3.69267 | 0.302620 | 0.381730 | 1.050840 | 0.749203 | -0.751149 | 3.10437 | -0.682670 | 0.31411 | 2.00666 | 1.35020 | 1.2636400 | 0.9502261 | 0.745730 | -1.023015 | -0.526502 | -0.898625 | 0.3504320 | -0.1279471 | -1.576980 | 0.63368 | -0.429651 | 0.752500 | -1.322136 | 0.142890 | 0.810674 | -0.830680 | -1.801640 | 1.070015 | -5.36200 | -0.024493 | 1.0676050 | -0.011033 | -1.38516 | 0.003716 | -1.193400 | -0.340889 | 0.02151 | -0.701290 | -1.607837 | -0.348262 | -2.647520 | -2.13105 | 0.844080 | -0.621130 |
| The cheese was delicious | -1.1934840 | 0.157750 | 1.03198 | -0.731620 | 0.024942 | 0.595655 | 2.53915 | 0.533491 | -0.199021 | -0.405391 | 0.929984 | 0.081535 | 0.86659 | 0.952590 | 1.110830 | 0.56092 | 1.038660 | -1.621330 | -1.2355820 | 0.685520 | -0.045708 | 0.752398 | 0.604128 | -1.782942 | 0.90719 | 1.45315 | -0.304713 | -0.97024 | -1.447530 | -1.206340 | -0.420809 | 2.102530 | -0.309182 | -0.392820 | 0.233481 | 1.611103 | 0.207932 | 0.938180 | 0.426309 | -3.079588 | 0.186520 | -2.271620 | 0.01348 | -2.15003 | 1.31032 | 0.663693 | 1.49665 | -0.920712 | 0.032946 | -2.239200 | -0.147782 | -0.844680 | 0.838320 | 3.152577 | -4.338160 | -7.79785 | -0.558000 | 0.569210 | 3.581526 | 0.863029 | 0.536880 | 3.20393 | -0.316116 | 0.72971 | 2.80864 | -2.91164 | 0.7466713 | 0.0249500 | 0.118100 | -1.771980 | -0.310445 | 0.455830 | 0.5546080 | 0.8428400 | 0.669370 | 2.39916 | -1.496500 | -0.472010 | -1.555560 | 1.176529 | 1.055093 | 0.498255 | -1.628820 | 1.297580 | -3.42968 | -2.303640 | 0.0276320 | -0.591160 | 0.23193 | -1.765360 | -0.025015 | -0.471199 | -2.16571 | 1.013800 | -3.017600 | -2.217520 | -0.812102 | -1.62583 | 2.475004 | 0.792158 |
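The document-level rows above are consistent with summing the word vectors of every token in the document, counting repeated words each time they occur. The sketch below checks this for "The dog ate the cat" using only the first three dimensions, with values copied from the word table; summing is inferred from the numbers rather than stated in the source, and averaging is another common choice.

```r
# First three dimensions of the word embeddings shown earlier
emb <- rbind(
  the = c(-0.038194, -0.244870, 0.72812),
  dog = c( 0.308170,  0.309380, 0.52803),
  ate = c(-0.080292,  0.659240, 0.35281),
  cat = c( 0.230880,  0.282830, 0.63180)
)
colnames(emb) <- c("d1", "d2", "d3")

# Tokenize the document; "the" appears twice and is therefore added twice
tokens <- strsplit(tolower("The dog ate the cat"), " ", fixed = TRUE)[[1]]

# Look up one row per token, then add the rows together
doc_vec <- colSums(emb[tokens, , drop = FALSE])

print(round(doc_vec, 5))
```

Unlike the count and tf-idf tables, the result has the same number of columns no matter how large the vocabulary grows.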
[1] "transworld systems inc. \nis trying to collect a debt that is not mine, not owed and is inaccurate."
[2] "I would like to request the suppression of the following items from my credit report, which are the result of my falling victim to identity theft. This information does not relate to [ transactions that I have made/accounts that I have opened ], as the attached supporting documentation can attest. As such, it should be blocked from appearing on my credit report pursuant to section 605B of the Fair Credit Reporting Act."
[3] "Over the past 2 weeks, I have been receiving excessive amounts of telephone calls from the company listed in this complaint. The calls occur between XXXX XXXX and XXXX XXXX to my cell and at my job. The company does not have the right to harass me at work and I want this to stop. It is extremely distracting to be told 5 times a day that I have a call from this collection agency while at work."
[4] "I was sold access to an event digitally, of which I have all the screenshots to detail the transactions, transferred the money and was provided with only a fake of a ticket. I have reported this to paypal and it was for the amount of {$21.00} including a {$1.00} fee from paypal. \n\nThis occured on XX/XX/2019, by paypal user who gave two accounts : 1 ) XXXX 2 ) XXXX XXXX"
Example credit: Supervised Machine Learning for Text Analysis in R
Document-feature matrix of: 117,214 documents, 46,099 features (99.88% sparse) and 0 docvars.
features
docs account auto bank call charg chase dai date dollar
3113204 1 1 2 2 1 1 1 3 1 1
3113208 0 1 0 6 3 5 0 0 1 1
3113804 0 0 0 0 0 0 0 2 2 0
3113805 0 1 0 0 0 0 0 0 0 0
3113807 0 2 0 0 0 1 0 0 0 0
3113808 0 0 0 0 0 0 0 0 0 0
[ reached max_ndoc ... 117,208 more documents, reached max_nfeat ... 46,089 more features ]


Image credit: Supervised Machine Learning for Text Analysis in R
# A tibble: 400,000 × 101
token d1 d2 d3 d4 d5 d6 d7 d8 d9
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 "the" -0.0382 -0.245 0.728 -0.400 0.0832 0.0440 -0.391 0.334 -0.575
2 "," -0.108 0.111 0.598 -0.544 0.674 0.107 0.0389 0.355 0.0635
3 "." -0.340 0.209 0.463 -0.648 -0.384 0.0380 0.171 0.160 0.466
4 "of" -0.153 -0.243 0.898 0.170 0.535 0.488 -0.588 -0.180 -1.36
5 "to" -0.190 0.0500 0.191 -0.0492 -0.0897 0.210 -0.550 0.0984 -0.201
6 "and" -0.0720 0.231 0.0237 -0.506 0.339 0.196 -0.329 0.184 -0.181
7 "in" 0.0857 -0.222 0.166 0.134 0.382 0.354 0.0129 0.225 -0.438
8 "a" -0.271 0.0440 -0.0203 -0.174 0.644 0.712 0.355 0.471 -0.296
9 "\"" -0.305 -0.236 0.176 -0.729 -0.283 -0.256 0.266 0.0253 -0.0748
10 "'s" 0.589 -0.202 0.735 -0.683 -0.197 -0.180 -0.392 0.342 -0.606
# ℹ 399,990 more rows
# ℹ 91 more variables: d10 <dbl>, d11 <dbl>, d12 <dbl>, d13 <dbl>, d14 <dbl>,
# d15 <dbl>, d16 <dbl>, d17 <dbl>, d18 <dbl>, d19 <dbl>, d20 <dbl>,
# d21 <dbl>, d22 <dbl>, d23 <dbl>, d24 <dbl>, d25 <dbl>, d26 <dbl>,
# d27 <dbl>, d28 <dbl>, d29 <dbl>, d30 <dbl>, d31 <dbl>, d32 <dbl>,
# d33 <dbl>, d34 <dbl>, d35 <dbl>, d36 <dbl>, d37 <dbl>, d38 <dbl>,
# d39 <dbl>, d40 <dbl>, d41 <dbl>, d42 <dbl>, d43 <dbl>, d44 <dbl>, …
Word embeddings learn semantics and meaning from human-written text. If the training text is biased, the embeddings will encode that bias.

# Song lyrics stored as a character vector, one element per line
text <- c(
  "Yeah, with a boy like that it's serious",
  "There's a boy who is so wonderful",
  "That girls who see him cannot find back home",
  "And the gigolos run like spiders when he comes",
  "'Cause he is Eros and he's Apollo",
  "Girls, with a boy like that it's serious",
  "Senoritas, don't follow him",
  "Soon, he will eat your hearts like cereals",
  "Sweet Lolitas, don't go",
  "You're still young",
  "But every night they fall like dominoes",
  "How he does it, only heaven knows",
  "All the other men turn gay wherever he goes (wow!)"
)
text
[1] "Yeah, with a boy like that it's serious"
[2] "There's a boy who is so wonderful"
[3] "That girls who see him cannot find back home"
[4] "And the gigolos run like spiders when he comes"
[5] "'Cause he is Eros and he's Apollo"
[6] "Girls, with a boy like that it's serious"
[7] "Senoritas, don't follow him"
[8] "Soon, he will eat your hearts like cereals"
[9] "Sweet Lolitas, don't go"
[10] "You're still young"
[11] "But every night they fall like dominoes"
[12] "How he does it, only heaven knows"
[13] "All the other men turn gay wherever he goes (wow!)"
# A tibble: 13 × 2
line text
<int> <chr>
1 1 Yeah, with a boy like that it's serious
2 2 There's a boy who is so wonderful
3 3 That girls who see him cannot find back home
4 4 And the gigolos run like spiders when he comes
5 5 'Cause he is Eros and he's Apollo
6 6 Girls, with a boy like that it's serious
7 7 Senoritas, don't follow him
8 8 Soon, he will eat your hearts like cereals
9 9 Sweet Lolitas, don't go
10 10 You're still young
11 11 But every night they fall like dominoes
12 12 How he does it, only heaven knows
13 13 All the other men turn gay wherever he goes (wow!)
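A table with one row per line, like the one above, is a convenient starting point for splitting text into one word per row. The following base R sketch does this for the first two lines only. It uses a plain `data.frame` as a stand-in for the tibble, and the column names `line` and `word` and the simple punctuation rule are assumptions made for this example.

```r
text <- c(
  "Yeah, with a boy like that it's serious",
  "There's a boy who is so wonderful"
)

# One row per line of text, numbered in order
lyrics <- data.frame(line = seq_along(text), text = text,
                     stringsAsFactors = FALSE)

# Keep letters, apostrophes, and spaces; lowercase; split on spaces
cleaned <- tolower(gsub("[^[:alpha:]' ]", "", lyrics$text))
words   <- strsplit(cleaned, " +")

# One row per word, remembering which line each word came from
tokens <- data.frame(
  line = rep(lyrics$line, lengths(words)),
  word = unlist(words),
  stringsAsFactors = FALSE
)

print(head(tokens))
```

Keeping the `line` column alongside each word makes it possible to count words per line or join the tokens back to the original text.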
ae-24
Instructions
ae-24 (repo name will be suffixed with your GitHub name).
Run renv::restore() to install the required packages, open the Quarto document in the repo, and follow along and complete the exercises.