As an example, let's say I wanted to list the frequency of each letter of the alphabet in a string. What would be the easiest way to do it?
This is an example of what I'm thinking of... the question is how to make allTheLetters equal to said letters without something like allTheLetters = "abcdefg...xyz". In many other languages I could just do letter++ and increment my way through the alphabet, but thus far I haven't come across a way to do that in Python.
def alphCount(text):
    lowerText = text.lower()
    for letter in allTheLetters:
        print letter + ":", lowerText.count(letter)
-
Something like this?
for letter in range(ord('a'), ord('z') + 1):
    print chr(letter) + ":", lowerText.count(chr(letter))
(I don't speak Python; please forgive my syntax errors)
paxdiablo : I think your "letter" inside the count() should be "chr(letter)"
paxdiablo : Since you fixed it (and didn't have my off-by-one bug resulting in only checking up to 'y' :-), I've deleted my answer and upvoted yours.
Adam Pierce : This looks fine to me, why is it getting voted down?
John Millikin : @Adam: I temporarily voted it down to remove it from the top position and elevate Matthew's answer. It's also not very Pythonic code.
paxdiablo : @John: oooh, market manipulation. Does the SEC monitor these forums? :-)
From Jacob -
the question is how to make allTheLetters equal to said letters without something like allTheLetters = "abcdefg...xyz"
That's actually provided by the string module, it's not like you have to manually type it yourself ;)
import string
allTheLetters = string.ascii_lowercase
def alphCount(text):
    lowerText = text.lower()
    for letter in allTheLetters:
        print letter + ":", lowerText.count(letter)
Ber : This solution is slow, since it has nested iterations (lowertext.count() iterates over the string in order to find the count)
paxdiablo : However, the specific question was answered. Other problems are the original poster's problem.
From Matthew Trevor -
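For readers on Python 3, where print is a function, the string.ascii_lowercase approach above might be sketched as follows. The name alph_count and the choice to return the formatted lines instead of printing inside the function are assumptions made here for illustration; they are not part of the original answer.

```python
import string

def alph_count(text):
    """Return one 'letter: count' line per ASCII lowercase letter."""
    lower_text = text.lower()
    lines = []
    for letter in string.ascii_lowercase:
        # str.count scans the whole string, so this makes 26 passes in total
        lines.append(letter + ": " + str(lower_text.count(letter)))
    return lines

for line in alph_count("Hello, world!"):
    print(line)
```

As Ber points out, each count() call walks the whole string, which is fine for short inputs but does repeated work on long ones.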
Do you mean using:
import string
string.ascii_lowercase
then,
counters = dict()
for letter in string.ascii_lowercase:
    counters[letter] = lowerText.count(letter)
All lowercase letters are accounted for; letters that do not occur get a count of zero.
Using a generator expression:
counters = dict((letter, lowerText.count(letter)) for letter in string.ascii_lowercase)
From gimel -
If you just want to do a frequency count of a string, try this:
s = 'hi there'
f = {}
for c in s:
    f[c] = f.get(c, 0) + 1
print f
Ber : This is a very good solution as it only iterates once over the given string, and thus is O(n) as opposed to using nested iterations. Even better if you use f = defaultdict(int) and then simply f[c] += 1
paxdiablo : Is the get member O(1)? If it's O(n), then the whole thing is O(n^2).
S.Lott : @Pax Diablo: Mappings are hashed. Dictionary gets are O(1).
From Adam Pierce -
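Ber's defaultdict(int) suggestion can be sketched like this, in Python 3 syntax. The function name frequencies is a placeholder chosen here, and the collections.Counter line is a swapped-in standard-library alternative that was not mentioned in the thread.

```python
from collections import Counter, defaultdict

def frequencies(s):
    """Count every character of s in a single pass."""
    f = defaultdict(int)  # missing keys start at int() == 0
    for c in s:
        f[c] += 1
    return f

f = frequencies('hi there')
print(dict(f))

# Counter does the same single-pass counting in one call
print(Counter('hi there'))
```

Both versions rely on hashed dictionary lookups, so the loop stays linear in the length of the string.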
The main question is "iterate through the alphabet":
import string
for c in string.lowercase:
    print c
How to get letter frequencies with some efficiency and without counting non-letter characters:
import string
sample = "Hello there, this is a test!"
letter_freq = dict((c, 0) for c in string.lowercase)
for c in [c for c in sample.lower() if c.isalpha()]:
    letter_freq[c] += 1
print letter_freq
From mhawke -
The question you've asked (how to iterate through the alphabet) is not the same question as the problem you're trying to solve (how to count the frequency of letters in a string).
You can use string.lowercase, as other posters have suggested:
import string
allTheLetters = string.lowercase
To do things the way you're "used to", treating letters as numbers, you can use the "ord" and "chr" functions. There's absolutely no reason to ever do exactly this, but maybe it comes closer to what you're actually trying to figure out:
def getAllTheLetters(begin='a', end='z'):
    beginNum = ord(begin)
    endNum = ord(end)
    for number in xrange(beginNum, endNum+1):
        yield chr(number)
You can tell it does the right thing because this code prints True:
import string
print ''.join(getAllTheLetters()) == string.lowercase
But, to solve the problem you're actually trying to solve, you want to use a dictionary and collect the letters as you go:
from collections import defaultdict
def letterOccurrances(string):
    frequencies = defaultdict(lambda: 0)
    for character in string:
        frequencies[character.lower()] += 1
    return frequencies
Use like so:
occs = letterOccurrances("Hello, world!")
print occs['l']
print occs['h']
This will print '3' and '1' respectively.
Note that this works for unicode as well:
# -*- coding: utf-8 -*-
occs = letterOccurrances(u"héĺĺó, ẃóŕĺd!")
print occs[u'l']
print occs[u'ĺ']
If you were to try the other approach on unicode (incrementing through every character) you'd be waiting a long time; there are millions of unicode characters.
To implement your original function (print the counts of each letter in alphabetical order) in terms of this:
def alphCount(text):
    for character, count in sorted(letterOccurrances(text).iteritems()):
        print "%s: %s" % (character, count)
alphCount("hello, world!")
Ber : Excellent tutorial!
hop : you really should use string.ascii_lowercase instead of writing your own getAllTheLetters. also, that is a horribly unpythonic name for a function!
mhawke : Your letterOccurrances() function will also count whitespace and punctuation, perhaps not intentionally.
Windows programmer : Actually the number of Unicode characters is still under a million. Also a few of them are non-alphabetic, so you want to exclude those when printing out frequencies.
Windows programmer : "string.ascii_lowercase" -- I hope there's a unicode_lowercase to handle Cyrillic, Greek, etc. I hope it knows how to downcase Turkish I's correctly depending on the current locale.
Tony Meyer : Rather than collections.defaultdict(lambda: 0), using collections.defaultdict(int) will do the same thing, and is clearer IMO.
technomalogical : Nice solution Glyph, but I don't think your final solution solves the problem exactly the same way. In the original solution it prints "a: 0" if there were no "a"s in the original string. Yours would skip over "a" in this case, correct?
From Glyph
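The two points raised in the comments above (punctuation being counted, and letters with a zero count being skipped) could be handled together along these lines. This is only a sketch in Python 3 syntax: the name alph_count_with_zeros is hypothetical, it covers ASCII letters only, and it uses collections.Counter in place of the defaultdict from the answer.

```python
import string
from collections import Counter

LETTERS = set(string.ascii_lowercase)

def alph_count_with_zeros(text):
    """Return (letter, count) pairs for a-z, including letters that never occur."""
    # One pass over the text, keeping ASCII letters only
    counts = Counter(c for c in text.lower() if c in LETTERS)
    # A Counter returns 0 for missing keys, so absent letters still get an entry
    return [(letter, counts[letter]) for letter in string.ascii_lowercase]

for letter, count in alph_count_with_zeros("hello, world!"):
    print("%s: %s" % (letter, count))
```

Iterating over string.ascii_lowercase for the output keeps the alphabetical order of the original alphCount, while the counting itself still touches each character of the text only once.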