A while back we announced a gadget to estimate how many characters you know.
Well, that gadget simply gave you a number, like 575 characters. But what does 575 characters mean?
To give that number some context, we have produced a few visualizations to accompany the estimate.
Here’s an example:
The curve shows the relationship between the number of characters you might know and the fraction of typical text made up of those characters. Because a small number of common characters make up much of everyday Chinese, the curve is quite steep. The blue point marks the estimate for a student who knows 575 characters.
How does it work?
In each case, the summary is based purely on our estimate of your overall character knowledge, not on the exact characters you know. If you have tried out our knowledge estimator, you will have noticed that we ask you about only a few characters. Based on your responses, we then estimate your overall character knowledge.
We don’t know exactly which characters you know, only roughly how many. Nonetheless, by sampling sets of characters that someone who knows 575 characters plausibly knows, we can work out what fraction of text is made up of such characters.
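For the curious, here is a rough Python sketch of that sampling idea. It is not our actual implementation. It assumes that "plausibly known" means more frequent characters are more likely to be in your set, and the names sample_known_set and estimate_coverage, along with the tiny frequency table, are made up for illustration.

```python
import random

def sample_known_set(freq, n_known, rng):
    """Draw n_known distinct characters, favouring frequent ones.

    ASSUMPTION: frequency-weighted sampling without replacement stands in
    for "characters a learner plausibly knows". Counts must be positive.
    """
    # Weighted sampling without replacement: give each character the key
    # u ** (1 / weight) with u uniform in [0, 1), then keep the largest keys.
    keyed = sorted(freq, key=lambda ch: rng.random() ** (1.0 / freq[ch]),
                   reverse=True)
    return set(keyed[:n_known])

def estimate_coverage(freq, n_known, trials=200, seed=0):
    """Average fraction of text covered over many sampled character sets."""
    rng = random.Random(seed)
    total = sum(freq.values())
    covered = 0.0
    for _ in range(trials):
        known = sample_known_set(freq, n_known, rng)
        covered += sum(freq[ch] for ch in known) / total
    return covered / trials

# Toy occurrence counts (hypothetical numbers, not real corpus data).
toy_freq = {"的": 50, "是": 30, "不": 15, "龘": 5}
print(estimate_coverage(toy_freq, 2))
```

With a real frequency table you would call estimate_coverage(freq, 575) to get one point on the curve, and repeat for other counts to trace the whole thing.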
What are other implications of knowing 575 characters?
But why is knowing those characters useful? Well, even with not very many characters, you can form an awful lot of words. With 575 characters one can form over 10,000 words. That’s quite a good-sized vocabulary. Now you just need to learn those words!
This plot (above) illustrates the relationship between the number of characters one might know and how many words can be formed with just those characters. The idea is to give you some sense of how many words you could learn with just the characters you know. Many of these words are probably ones you already know!
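If you want to try this on your own word list, the check is simple: a word counts as formable when every one of its characters is in your known set. The sketch below uses a made-up function name, formable_words, and a tiny toy vocabulary rather than the dictionary behind our plot.

```python
def formable_words(known_chars, vocabulary):
    """Return the words whose characters are all in known_chars."""
    known = set(known_chars)
    # Skip empty strings; keep a word only if every character is known.
    return [w for w in vocabulary if w and all(ch in known for ch in w)]

# Toy data for illustration only.
known = "中国人大学生"
vocabulary = ["中国", "大学", "学生", "大学生", "老师", "中国人"]
words = formable_words(known, vocabulary)
print(len(words), words)
```

Six characters already give five words here, which is the same effect the plot shows at a much larger scale.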
But what sort of words do I likely know?
We also provide a breakdown by HSK level:
Here we show a pie chart for each HSK level. The shaded portion indicates what fraction of the words at that HSK level can be formed from characters you probably know.
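The number behind each pie is just a fraction per level. Here is a minimal sketch, assuming you have the HSK word lists as a dictionary from level to words. The function name fraction_by_level and the two-word lists are placeholders, not the real HSK lists.

```python
def fraction_by_level(known_chars, words_by_level):
    """Map each level to the fraction of its words formable from known_chars."""
    known = set(known_chars)
    result = {}
    for level, words in words_by_level.items():
        formable = sum(1 for w in words if all(ch in known for ch in w))
        # An empty level gets 0.0 rather than a division by zero.
        result[level] = formable / len(words) if words else 0.0
    return result

# Placeholder lists for illustration; NOT the real HSK vocabulary.
toy_levels = {1: ["我", "你好"], 2: ["时间", "你们"]}
shaded = fraction_by_level("我你好们", toy_levels)
print(shaded)
```

Each value is the shaded share of the pie for that level.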