Monday, April 29, 2019

Leetcode: Top K Frequent Words

Given a non-empty list of words, return the k most frequent elements.
Your answer should be sorted by frequency from highest to lowest. If two words have the same frequency, then the word with the lower alphabetical order comes first.
Example 1:
Input: ["i", "love", "leetcode", "i", "love", "coding"], k = 2
Output: ["i", "love"]
Explanation: "i" and "love" are the two most frequent words.
    Note that "i" comes before "love" due to a lower alphabetical order.
Example 2:
Input: ["the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"], k = 4
Output: ["the", "is", "sunny", "day"]
Explanation: "the", "is", "sunny" and "day" are the four most frequent words,
    with the number of occurrence being 4, 3, 2 and 1 respectively.
Note:
  1. You may assume k is always valid, 1 ≤ k ≤ number of unique elements.
  2. Input words contain only lowercase letters.
Follow up:
  1. Try to solve it in O(n log k) time and O(n) extra space.

Solution:

There are different ways to solve this problem. 

1. hashmap + priority queue
    
First use a hashmap to scan the whole string vector, use key value pair to record the frequency of each string.
Then use priority queue to insert the objects (pari<string, int>)in hashmap with an order which defined by
comparison function. 
   
Priority Queue :  A list of objects which always have largest value on top in given compare function. Has push(), emplace(), top(), pop().

Emplace() vs Push():

To fully understand what emplace_back does, one must first understand variadic templates and rvalue references.
This is a fairly advanced, and deep concept in modern C++. On a map, it would be labeled "there be dragons".
You say that you're new to C++ and trying to learn this stuff. This may not be the answer you might be looking for, but you should skip this detail for now, and come back later after you've wrapped your brain around variadic templates and rvalue references. Then it should all make sense.
But if you insist: for a container containing simple, elementary types like integers, there's little, if any difference. The difference comes when the container's type is some large, sophisticated class, with a complicated constructor, and/or copy-constructor.
The end result of either push or emplace is exactly, 100%, the same. The container gets another element appended to it. The difference is where the element comes from:
1) push takes an existing element, and appends a copy of it to the container. Simple, straightforward. push always takes exactly one argument, the element to copy to the container.

2) emplace creates another instance of the class in the container, that's already appended to the container. The arguments to emplace are forwarded as arguments to the container's class's constructor. Emplace can have either one argument, more than one argument, or no argument at all, if the class has a default constructor.

Time complexity: push() - O(logn) , emplace() -, top() - O(1), pop() - .
Time and space complexity could be different based on different container given for priority queue. 

Time complexity: O(n) + O(nlogk)
Space: O(n)

2. hashmap + Sorting

use hashmap do the same thing as 1). Then sort by frequency and by the alphabetic order. Then take the last k elements out of the sorted vector as result. 


class Solution {
    struct comp
    {
        comp() {}
        ~comp() {}
        bool operator() (const pair<string, int>& a, const pair<string, int>& b)
        {
            return a.second > b.second ||(a.second == b.second && a.first < b.first);
        }
    };
    
public:
    
    vector<string> topKFrequent(vector<string>& words, int k) {
        vector<string> res;
        
        if (words.size() == 0 || k == 0)
        {
            return res;
        }
        
        unordered_map<string, int> dict;
        
        for (string& s : words)
        {
            dict[s]++;
        }
        
        priority_queue<pair<string, int>, vector<pair<string, int>>, comp> pq;
        
        for (auto it : dict)
        {
            pq.emplace(it.first, it.second);
            if (pq.size() > k) pq.pop();
        }
        
        while(!pq.empty())
        {
            res.push_back(pq.top().first);
            pq.pop();
        }
        reverse(res.begin(), res.end());
        return res;
    }
};

No comments:

Post a Comment

Leetcode : Employee Free Time

We are given a list  schedule  of employees, which represents the working time for each employee. Each employee has a list of non-overlap...