python - Change data type of columns in Pandas

You have four main options for converting types in pandas: to_numeric() - provides functionality to safely convert non-numeric types (e.g. strings) to a suitable numeric type. (See also to_datetim...
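
As a minimal sketch of the to_numeric() option only (the sample values are made up for illustration), the conversion might look like this:

```python
import pandas as pd

# Hypothetical data: numbers stored as strings, plus one unparseable value
s = pd.Series(["1", "2", "4.5", "oops"])

# errors="coerce" replaces values that cannot be parsed with NaN instead of raising
converted = pd.to_numeric(s, errors="coerce")

print(converted.dtype)
print(converted)
```

Coercing is a choice made here for the example; the default, errors="raise", fails loudly on the bad value instead.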

numpy - What's the fastest way in Python to calculate cosine similarity given sparse matrix data?

You can compute pairwise cosine similarity on the rows of a sparse matrix directly using sklearn. As of version 0.17 it also supports sparse output: from sklearn.metrics.pairwise import cosine_simila...
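
A short sketch of the idea, assuming scikit-learn and SciPy are installed; the small matrix is invented for illustration:

```python
import numpy as np
from scipy import sparse
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical sparse matrix: rows 0 and 2 point in the same direction
A = sparse.csr_matrix(np.array([
    [1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [3.0, 0.0, 0.0],
]))

# Dense ndarray of pairwise similarities between rows
sim_dense = cosine_similarity(A)

# dense_output=False keeps the result sparse when the input is sparse
sim_sparse = cosine_similarity(A, dense_output=False)

print(sim_dense)
```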

python - Multi Index Sorting in Pandas

When sorting by a MultiIndex column you need to wrap the tuple describing the column in a list*: In [11]: df.sort_values([('Group1', 'C')], ascending=False) Out[11]: Group1 Group2...
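
A self-contained sketch of that call; the frame below is hypothetical and only reuses the 'Group1'/'Group2' level names from the excerpt:

```python
import pandas as pd

# Hypothetical frame with two-level column labels
columns = pd.MultiIndex.from_tuples(
    [("Group1", "A"), ("Group1", "C"), ("Group2", "B")]
)
df = pd.DataFrame([[1, 5, 7], [2, 9, 8], [3, 1, 6]], columns=columns)

# The column key is a tuple, and it is passed inside a list
sorted_df = df.sort_values([("Group1", "C")], ascending=False)

print(sorted_df)
```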

python - Custom sorting in pandas dataframe

Pandas 0.15 introduced Categorical Series, which allows a much clearer way to do this: First make the month column a categorical and specify the ordering to use. In [21]: df['m'] = pd.Categorical(df[...
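
A minimal sketch of the categorical approach. The column name 'm' comes from the excerpt; the month values and the 'value' column are assumptions for illustration:

```python
import pandas as pd

# Hypothetical data with months out of calendar order
df = pd.DataFrame({"m": ["March", "January", "February"], "value": [3, 1, 2]})

# Declare the desired order explicitly; ordered=True makes the categories comparable
df["m"] = pd.Categorical(
    df["m"], categories=["January", "February", "March"], ordered=True
)

# Sorting now follows the category order, not alphabetical order
sorted_df = df.sort_values("m")

print(sorted_df)
```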

csv - Read specific columns with pandas or other python module

An easy way to do this is using the pandas library like this. import pandas as pd fields = ['star_name', 'ra'] df = pd.read_csv('data.csv', skipinitialspace=True, usecols=fields) # See the keys prin...
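
To try this without a file on disk, the sketch below feeds read_csv an in-memory string. The CSV content and the extra 'dec' column are made up; only 'star_name' and 'ra' come from the excerpt:

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for data.csv (note the spaces after commas)
csv_text = (
    "star_name, ra, dec\n"
    "Sirius, 101.28, -16.71\n"
    "Vega, 279.23, 38.78\n"
)

fields = ["star_name", "ra"]

# skipinitialspace=True strips the space after each delimiter,
# so the header ' ra' is read as 'ra' and matches usecols
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True, usecols=fields)

print(df.columns.tolist())
```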

python - T-test in Pandas

It depends on what sort of t-test you want to do (one-sided or two-sided, dependent or independent), but it should be as simple as: from scipy.stats import ttest_ind cat1 = my_data[my_data['Category']=='...
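
A sketch of the independent two-sample case, assuming SciPy is installed. The category labels 'A' and 'B', the 'score' column, and the numbers are all hypothetical, since the excerpt cuts off before naming them:

```python
import pandas as pd
from scipy.stats import ttest_ind

# Hypothetical data: two groups of five measurements each
my_data = pd.DataFrame({
    "Category": ["A"] * 5 + ["B"] * 5,
    "score": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
})

cat1 = my_data[my_data["Category"] == "A"]
cat2 = my_data[my_data["Category"] == "B"]

# Two-sided test for independent samples, assuming equal variances (the default)
t_stat, p_value = ttest_ind(cat1["score"], cat2["score"])

print(t_stat, p_value)
```

For paired (dependent) samples, scipy.stats.ttest_rel would be the function to reach for instead.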

python - How to apply a function to two columns of Pandas dataframe

Here's an example using apply on the dataframe, which I am calling with axis = 1. Note the difference is that instead of trying to pass two values to the function f, rewrite the function to accept a...
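
One way this might look, with made-up column names 'a' and 'b' and a hypothetical function body; the function receives one row at a time and picks out the values it needs:

```python
import pandas as pd

# Hypothetical frame with two numeric columns
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

def f(row):
    # row is a Series holding one row; index it by column name
    return row["a"] + row["b"]

# axis=1 applies f to each row rather than each column
df["total"] = df.apply(f, axis=1)

print(df)
```

For simple arithmetic like this, df["a"] + df["b"] is faster; apply with axis=1 earns its keep when the per-row logic is harder to vectorise.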

python - How to set some xlim and ylim in Seaborn lmplot facetgrid

The lmplot function returns a FacetGrid instance. This object has a method called set, to which you can pass key=value pairs and they will be set on each Axes object in the grid. Secondly, you can se...
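
A sketch of the set approach, assuming seaborn and matplotlib are installed. The data, the column names, and the limits are invented for illustration:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import pandas as pd
import seaborn as sns

# Hypothetical data split into two facets by 'group'
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "y": [2, 4, 5, 4, 6, 7],
    "group": ["a", "a", "a", "b", "b", "b"],
})

# lmplot returns a FacetGrid
g = sns.lmplot(data=df, x="x", y="y", col="group")

# set() forwards the key=value pairs to every Axes in the grid
g.set(xlim=(0, 10), ylim=(0, 12))

print([ax.get_xlim() for ax in g.axes.flat])
```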

Python Pandas : pivot table with aggfunc = count unique distinct

Do you mean something like this? In [39]: df2.pivot_table(values='X', index='Y', columns='Z', aggfunc=lambda x: len(x.unique())) Out[39]: Z Z1 Z2 Z3 Y Y1 1...
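
A runnable sketch of a distinct-count pivot on hypothetical data. In current pandas, pivot_table takes the row and column keys as index and columns, and the string 'nunique' is used here as a shorthand alternative to the lambda:

```python
import pandas as pd

# Hypothetical data: X values to count, grouped by Y (rows) and Z (columns)
df2 = pd.DataFrame({
    "X": ["X1", "X2", "X1", "X3", "X3"],
    "Y": ["Y1", "Y1", "Y1", "Y2", "Y2"],
    "Z": ["Z1", "Z1", "Z2", "Z3", "Z3"],
})

# Count distinct X values for each (Y, Z) combination
table = df2.pivot_table(values="X", index="Y", columns="Z", aggfunc="nunique")

print(table)
```

Combinations that never occur in the data come out as NaN; passing fill_value=0 is one way to show them as zero.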

python - How to replace text in a column of a Pandas dataframe?

Use the vectorised str method replace: In [30]: df['range'] = df['range'].str.replace(',','-') df Out[30]: range 0 (2-30) 1 (50-290) EDIT So if we look at what you tried and why it didn't...
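
The same call as a self-contained sketch. The input values are reconstructed from the output shown in the excerpt, and regex=False is added here as an assumption so the comma is treated literally regardless of pandas version:

```python
import pandas as pd

# Input values inferred from the excerpt's output
df = pd.DataFrame({"range": ["(2,30)", "(50,290)"]})

# Vectorised string replacement across the whole column
df["range"] = df["range"].str.replace(",", "-", regex=False)

print(df)
```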