No, the feature vector is not converted. It contains the count n_i of
how often each term t_i occurs (or a TF-IDF transformation of those
counts). You are finding the class c such that
P(c) * P(t_1|c)^n_1 * ... is maximized.
In log space that is log(P(c)) + n_1*log(P(t_1|c)) + ...
So your n_i counts (or TF-IDF values) are used as-is, and this is where
the dot product comes from.
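
To make the log-space rule concrete, here is a minimal Java sketch. The
class and method names are made up for illustration and are not MLlib's
API. It assumes pi and theta already hold log priors and log conditional
probabilities, as described in the question, and that the feature vector
holds raw counts.

```java
public final class LogSpaceNaiveBayes {

    /** Returns log(P(c)) + sum_i n_i * log(P(t_i|c)) for every class c. */
    public static double[] scores(double[] pi, double[][] theta, double[] counts) {
        double[] result = new double[pi.length];
        for (int c = 0; c < pi.length; c++) {
            double s = pi[c];                  // log prior, already in log space
            for (int i = 0; i < counts.length; i++) {
                s += counts[i] * theta[c][i];  // raw count times log conditional
            }
            result[c] = s;                     // one score per class (row of theta)
        }
        return result;
    }

    /** Index of the highest-scoring class. */
    public static int predict(double[] pi, double[][] theta, double[] counts) {
        double[] s = scores(pi, theta, counts);
        int best = 0;
        for (int c = 1; c < s.length; c++) {
            if (s[c] > s[best]) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up numbers: two classes, three terms.
        double[] pi = {Math.log(0.5), Math.log(0.5)};
        double[][] theta = {
            {Math.log(0.7), Math.log(0.2), Math.log(0.1)},
            {Math.log(0.1), Math.log(0.2), Math.log(0.7)}
        };
        double[] counts = {0, 1, 3};  // raw term counts, not logged
        System.out.println(predict(pi, theta, counts));
    }
}
```

Each class gets a single number: the inner loop sums the products over
all terms, so the result is a matrix-vector product plus the prior
vector, not an element-wise product.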
Your bug is probably something lower-level and simple. I'd step through
the Spark example and print its exact values for the log priors, the
log conditional probabilities, and the results of the matrix
operations, then print the same values from your code and see where
they first differ.
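
One way to do that comparison, as a rough sketch: dump each array (pi,
a row of theta, the per-class scores) from both implementations and
look for the first entry that disagrees. The helper below is
hypothetical, and the numbers in main are placeholders rather than real
output.

```java
public final class CompareValues {

    /** Returns the first index where a and b differ by more than tol, or -1 if none. */
    public static int firstMismatch(double[] a, double[] b, double tol) {
        if (a.length != b.length) {
            throw new IllegalArgumentException(
                "length mismatch: " + a.length + " vs " + b.length);
        }
        for (int i = 0; i < a.length; i++) {
            if (Math.abs(a[i] - b[i]) > tol) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Placeholder values; in practice use what each implementation printed.
        double[] fromSpark = {-0.69, -1.20, -2.30};
        double[] fromMine  = {-0.69, -1.20, -0.10};
        int i = firstMismatch(fromSpark, fromMine, 1e-9);
        System.out.println(i < 0 ? "all values match" : "first difference at index " + i);
    }
}
```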
On Thu, Nov 27, 2014 at 11:37 AM, jatinpreet <jatinpreet@gmail.com> wrote:
> Hi,
>
> I have been running through some troubles while converting the code to Java.
> I have done the matrix operations as directed and tried to find the maximum
> score for each category. But the predicted category is mostly different from
> the prediction done by MLlib.
>
> I am fetching iterators of the pi, theta and testData to do my calculations.
> pi and theta are in log space while my testData vector is not, could that
> be a problem because I didn't see explicit conversion in Mllib also?
>
> For example, for two categories and 5 features, I am doing the following
> operation,
>
> [1,2] + [1 2 3 4 5 ] * [1,2,3,4,5]
>         [6 7 8 9 10]
> These are simple elementwise matrix multiplication and addition operators.

