module 'd2l.torch' has no attribute 'count_corpus' #2559

Open
wangze1219 opened this issue Sep 29, 2023 · 3 comments

@wangze1219

```python
import math
import os
import random

import matplotlib.pyplot as plt
import torch
from d2l import torch as d2l

os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

#@save
d2l.DATA_HUB['ptb'] = (d2l.DATA_URL + 'ptb.zip',
                       '319d85e578af0cdc590547f26231e4e31cdf1e42')

#@save
def read_ptb():
    """Load the PTB dataset into a list of text lines."""
    data_dir = d2l.download_extract('ptb')
    # Read the training set.
    with open(os.path.join(data_dir, 'ptb.train.txt')) as f:
        raw_text = f.read()
    return [line.split() for line in raw_text.split('\n')]

sentences = read_ptb()
print(f'# sentences: {len(sentences)}')
vocab = d2l.Vocab(sentences, min_freq=10)
print(f'vocab size: {len(vocab)}')

#@save
def subsample(sentences, vocab):
    """Subsample high-frequency words."""
    # Exclude unknown tokens ('<unk>')
    sentences = [[token for token in line if vocab[token] != vocab.unk]
                 for line in sentences]
    counter = d2l.count_corpus(sentences)
    num_tokens = sum(counter.values())

    # Return True if the token is kept during subsampling
    def keep(token):
        return (random.uniform(0, 1) <
                math.sqrt(1e-4 / counter[token] * num_tokens))

    return ([[token for token in line if keep(token)] for line in sentences],
            counter)

subsampled, counter = subsample(sentences, vocab)
d2l.show_list_len_pair_hist(
    ['origin', 'subsampled'], '# tokens per sentence',
    'count', sentences, subsampled)
plt.show()
```
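The keep rule inside subsample compares a uniform draw against sqrt(t / f), where f = count / num_tokens is the token's relative frequency and t = 1e-4. Because random.uniform(0, 1) never exceeds 1, any value above 1 means the token is always kept. A minimal sketch of that probability in isolation (keep_probability is a hypothetical helper name, not part of d2l):

```python
import math

def keep_probability(count, num_tokens, t=1e-4):
    """Hypothetical helper: chance that one occurrence of a token survives subsampling.

    Mirrors the comparison in `keep`: uniform(0, 1) < sqrt(t / f), with
    f = count / num_tokens. Values above 1 are clipped, since the draw is at most 1.
    """
    return min(1.0, math.sqrt(t / (count / num_tokens)))

# A token making up 1% of a 1,000,000-token corpus is kept roughly 10% of the time.
print(keep_probability(10_000, 1_000_000))
# A token seen only 50 times is always kept.
print(keep_probability(50, 1_000_000))  # 1.0
```

So frequent words such as "the" are heavily thinned out, while rare words pass through untouched.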

@AnirudhDagar
Member

Please run pip install d2l==0.17.6 to use the older version of d2l, which still has these saved functions. In the latest version, we refactored the code and removed them.
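If pinning the old release is not an option, a local stand-in for count_corpus can be defined in the script instead. This is a minimal sketch, assuming the removed helper simply flattened a list of token lists and counted tokens with collections.Counter; it is not the current d2l API.

```python
import collections

def count_corpus(tokens):
    """Local stand-in (assumed behavior): count token frequencies.

    `tokens` may be a 1D list of tokens or a 2D list of token lists.
    """
    if len(tokens) == 0 or isinstance(tokens[0], list):
        # Flatten a list of token lists into a single list of tokens
        tokens = [token for line in tokens for token in line]
    return collections.Counter(tokens)

counter = count_corpus([["the", "cat"], ["the", "dog"]])
print(counter["the"], sum(counter.values()))  # 2 4
```

With this defined, the call in subsample becomes counter = count_corpus(sentences) instead of d2l.count_corpus(sentences).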

@zaffnet

zaffnet commented Oct 27, 2023

What a nightmare. So much backward incompatibility! Could you mention which version to use at the beginning of each chapter?

@AnirudhDagar
Member

If you are using the latest version of the book, then it should work with the latest d2l package.
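To see which d2l release is installed, and whether it still exposes a given helper before running older chapter code, a quick check like the following may help. d2l_report is a hypothetical helper written for illustration; it only uses the standard importlib machinery and hasattr, not any d2l API.

```python
import importlib
import importlib.metadata

def d2l_report(helpers=("count_corpus", "Vocab")):
    """Hypothetical helper: return (installed d2l version or None, {helper: present?})."""
    try:
        version = importlib.metadata.version("d2l")
    except importlib.metadata.PackageNotFoundError:
        # d2l is not installed in this environment
        return None, {}
    try:
        module = importlib.import_module("d2l.torch")
    except (ImportError, OSError):
        # d2l is installed, but its PyTorch backend (or torch itself) failed to load
        return version, {}
    return version, {name: hasattr(module, name) for name in helpers}

print(d2l_report())
```

On a release where count_corpus was removed, the dictionary reports it as False, which signals that the chapter code expects an older d2l (or a local replacement).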
