Problematic results #2

Open
yue-wu opened this issue Oct 3, 2014 · 4 comments

yue-wu commented Oct 3, 2014

I made a toy example to test your code, and I suspect some of the results are incorrect. Here is the code I used.

# under ipython

from matplotlib import pyplot  # explicit import, needed outside an ipython --pylab session
from sklearn.decomposition import PCA
from pyIPCA import CCIPCA, Skocaj_IPCA, Hall_IPCA
import numpy as np

# make toy data

data = np.random.rand(10000, 10) * 100

# use sklearn pca

ncomp = 2
pca = PCA(n_components=ncomp)
pca.fit(data)
data_pca = pca.transform(data)
pyplot.scatter(data_pca[:, 0], data_pca[:, 1])
pyplot.title('Sklearn-PCA')
pyplot.show()
[image: sklearn_pca]

# use CCIPCA

ipca = CCIPCA(n_components=2)
ipca.fit(data)
idata_pca = ipca.transform(data)
pyplot.scatter(idata_pca[:, 0], idata_pca[:, 1])
pyplot.title('CCIPCA')
pyplot.show()
[image: ccipca]

# use Skocaj_IPCA

ipca = Skocaj_IPCA(n_components=2)
ipca.fit(data)
idata_pca = ipca.transform(data)
pyplot.scatter(idata_pca[:, 0], idata_pca[:, 1])
pyplot.title('Skocaj_IPCA')
pyplot.show()
[image: skocaj_ipca]

# use Hall_IPCA

ipca = Hall_IPCA(n_components=2)
ipca.fit(data)
idata_pca = ipca.transform(data)
pyplot.scatter(idata_pca[:, 0], idata_pca[:, 1])
pyplot.title('Hall_IPCA')
pyplot.show()
[image: hall_ipca]

It seems that neither CCIPCA nor Skocaj_IPCA works properly: after transformation, their centers are far from the origin (0, 0), and their point clouds look more like an oval than a circle.
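The visual comparison above can also be made numeric. The sketch below is only an illustration: `projection_stats` and `batch_pca_project` are hypothetical helper names, and the reference projection is a plain NumPy SVD on mean-centered data rather than any of the pyIPCA classes. For i.i.d. uniform toy data, a centered PCA projection should have its centroid at the origin and roughly equal spread along both axes.

```python
import numpy as np


def projection_stats(projected):
    """Return (distance of the centroid from the origin, ratio of axis spreads)."""
    projected = np.asarray(projected, dtype=float)
    center_distance = float(np.linalg.norm(projected.mean(axis=0)))
    spreads = projected.std(axis=0)
    spread_ratio = float(spreads.max() / spreads.min())
    return center_distance, spread_ratio


def batch_pca_project(data, n_components):
    """Reference batch PCA: SVD of the mean-centered data (hypothetical helper)."""
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


rng = np.random.default_rng(0)
toy = rng.random((10000, 10)) * 100  # same shape and scale as the toy data above
center_distance, spread_ratio = projection_stats(batch_pca_project(toy, 2))
print(center_distance, spread_ratio)
```

Passing the output of an incremental estimator's `transform` to `projection_stats` gives two numbers to compare against this baseline: a large centroid distance suggests the mean was not removed, and a spread ratio far from 1 matches the oval shape.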

By the way, Skocaj_IPCA often raises the following warning on my machine:
RuntimeWarning: invalid value encountered in divide
explained_variance_.sum())
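NumPy emits "invalid value encountered" for operations such as 0/0 or any arithmetic involving NaN. The traceback fragment suggests a division by `explained_variance_.sum()`, but how pyIPCA computes it is an assumption here. A minimal sketch for turning the warning into an exception, so the offending call can be located (`variance_ratio_strict` is a hypothetical name, not part of pyIPCA):

```python
import numpy as np


def variance_ratio_strict(explained_variance):
    """Normalize variances to ratios, raising instead of warning on 0/0 or NaN input."""
    explained_variance = np.asarray(explained_variance, dtype=float)
    # Promote floating-point "invalid" and "divide" warnings to FloatingPointError.
    with np.errstate(invalid="raise", divide="raise"):
        return explained_variance / explained_variance.sum()


print(variance_ratio_strict([3.0, 1.0]))

try:
    variance_ratio_strict([0.0, 0.0])  # 0/0: would only warn by default
    raised = False
except FloatingPointError:
    raised = True
print(raised)
```

Wrapping the `fit` call in the same `np.errstate` context should produce a full traceback at the first invalid operation.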

Many thanks for your contributions to sklearn.

Rex

@kevinhughes27
Owner

Hmm, you are right, something does look off there. It's been a while since I worked on this, but I remember the last component being off with some of the methods. With real data the last dimension is usually useless and simply orthogonal to the others, so I think these incremental methods might not bother getting it correct. Maybe re-read the papers and see if they mention it.

@yue-wu
Author

yue-wu commented Oct 3, 2014

Thank you for your efforts. At least Hall_IPCA works fine. I will use it to compute PCA for 3M samples. If I find anything wrong, I will let you know. By the way, I later noticed that pylearn2 (still under development) also has its own online PCA, but I guess it uses yet another method. In case you are interested, here is the link to the project: http://deeplearning.net/software/pylearn2/index.html.

@kevinhughes27
Owner

Cool, thanks for the link!

If you find any problems feel free to send a patch my way!

@yue-wu
Author

yue-wu commented Oct 3, 2014

Thank you again. It seems that incremental PCA only solves the problem of possible memory shortage; it is not a good idea to ask a cluster to use only one core to compute PCA. I am now reading http://mdp-toolkit.sourceforge.net/tutorial/parallel.html. It seems that they provide a way to perform parallel PCA estimation.
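One generic way to spread PCA over several cores is to accumulate per-chunk sufficient statistics and merge them afterwards. The sketch below illustrates that idea only; it is not taken from MDP or pyIPCA, and every function name in it is hypothetical. The chunks are processed serially here, but `chunk_stats` has no shared state, so it could be mapped over workers.

```python
import numpy as np


def chunk_stats(chunk):
    """Per-chunk sufficient statistics: row count, column sums, and X^T X."""
    chunk = np.asarray(chunk, dtype=float)
    return chunk.shape[0], chunk.sum(axis=0), chunk.T @ chunk


def combine_stats(stats):
    """Merge chunk statistics into the overall mean and sample covariance."""
    n = sum(s[0] for s in stats)
    total = sum(s[1] for s in stats)
    scatter = sum(s[2] for s in stats)
    mean = total / n
    cov = (scatter - n * np.outer(mean, mean)) / (n - 1)
    return mean, cov


def pca_from_covariance(cov, n_components):
    """Leading eigenpairs of a symmetric covariance matrix, largest variance first."""
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvals[order], eigvecs[:, order]


rng = np.random.default_rng(0)
data = rng.random((10000, 10)) * 100
# Each chunk could be handled by a separate worker; done serially here.
stats = [chunk_stats(c) for c in np.array_split(data, 4)]
mean, cov = combine_stats(stats)
variances, components = pca_from_covariance(cov, 2)
projected = (data - mean) @ components
print(variances)
```

Accumulating uncentered products can lose precision when the mean is large relative to the spread, so this form is best treated as a starting point rather than a numerically robust implementation.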
