I like Orhan's suggestion; it is less work.

Slight correction to my comment above:

"For each of the n chunks, if there is no non-zero value in the 100th column, you will get an error that looks like this..."

I meant

"For each of the n chunks, if there is no value of any kind (0 or otherwise) in the 100th column, you will get an error that looks like this..."

Frank

On Thu, Jan 4, 2018 at 5:26 PM, Orhan Kislal wrote:
Hello Anthony,

I agree with Frank's suggestion, operating on chunks of the matrix should work. An alternate workaround for the 100th column issue you might encounter could be this:

Check if there exists a value in the last column for the first (or last, or any other) row. If there is one, you can use the chunk as is. If not, insert 0 as the value for that particular row/column. This ensures the matrix size is calculated correctly, will not affect the output, and will not require any additional operation when assembling the final vector.
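As a rough illustration of this check-and-pad idea (a pure-Python sketch with a hypothetical `pad_last_column` helper, not a MADlib API; entries are assumed to be `(row_id, col_id, value)` triples):

```python
# Sketch of the padding workaround: if a chunk has no entry in the
# last column, dimensions inferred from max(col_id) come out short,
# so insert an explicit 0 there. A zero value does not change the
# matrix-vector product, so no fixup is needed later.

def pad_last_column(entries, n_cols, row_id=1):
    """entries: list of (row_id, col_id, value); returns a padded copy."""
    if not any(col == n_cols for _, col, _ in entries):
        entries = entries + [(row_id, n_cols, 0)]
    return entries

chunk = [(1, 1, 9), (2, 3, 5)]        # no entry in column 100
padded = pad_last_column(chunk, 100)
print(padded[-1])                      # (1, 100, 0)
```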

Please let us know if you have any questions.

Thanks,

Orhan Kislal

On Thu, Jan 4, 2018 at 12:12 PM, Frank McQuillan wrote:
Anthony,

In that case, I think you are hitting the 1GB PostgreSQL limit.

Operations on the sparse matrix format require loading into memory two INTEGERs for row/col plus the value (INTEGER, DOUBLE PRECISION, whatever size it is).

That means for your matrix the two INTEGERs alone are ~1.0e9 bytes, which is already at the limit without even considering the values.
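A quick back-of-envelope check of that figure (assuming 4-byte INTEGERs and the ~1% density you described):

```python
# 1.25e8 x 100 matrix at ~1% density => ~1.25e8 nonzero entries.
# Each entry carries two 4-byte INTEGERs (row_id, col_id) before
# the value is even counted.
rows, cols, density = 125_000_000, 100, 0.01
nnz = int(rows * cols * density)   # nonzero entries in the matrix
index_bytes = nnz * 2 * 4          # two 4-byte INTEGERs per entry
print(nnz, index_bytes)            # 125000000 1000000000  (~1 GB)
```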

So I would suggest you do the computation in blocks.  One approach to this:

* chunk your tall matrix into n smaller VIEWs, say n=10 (MADlib matrix operations do work on VIEWs)
* call matrix*vector for each chunk
* reassemble the n result vectors into the final vector

You could do this in a PL/pgSQL or PL/Python function.
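The three steps above could be sketched like this (a pure-Python stand-in, with a hypothetical `chunk_mat_vec_mult` function; in Greenplum each chunk would be a VIEW passed to madlib's matrix-times-vector operation inside a PL/pgSQL or PL/Python function):

```python
# Split a (row_id, col_id, value) sparse matrix into n row-chunks,
# multiply each chunk by the dense vector, and concatenate the
# partial results into the final vector.

def chunk_mat_vec_mult(entries, n_rows, vector, n_chunks):
    """entries: iterable of (row_id, col_id, value), 1-based indices."""
    chunk_size = (n_rows + n_chunks - 1) // n_chunks
    result = [0.0] * n_rows
    for c in range(n_chunks):
        lo, hi = c * chunk_size + 1, min((c + 1) * chunk_size, n_rows)
        # "VIEW" for this chunk: the entries whose row falls in [lo, hi]
        for r, col, v in entries:
            if lo <= r <= hi:
                result[r - 1] += v * vector[col - 1]
    return result

entries = [(1, 1, 9), (1, 5, 6), (2, 1, 8), (3, 2, 9), (4, 3, 1)]
vec = [1, 2, 3, 4, 5]
print(chunk_mat_vec_mult(entries, 4, vec, 2))  # [39.0, 8.0, 18.0, 3.0]
```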

There is one subtlety to be aware of, though, because you are working with sparse matrices. For each of the n chunks, if there is no non-zero value in the 100th column, you will get an error that looks like this:

NULL,
array[1,2,3,4,5,6,7,8,9,10]
);
ERROR:  plpy.Error: Matrix error: Dimension mismatch between matrix (1 x 9) and vector (10 x 1)
CONTEXT:  Traceback (most recent call last):
PL/Python function "matrix_vec_mult", line 24, in <module>
matrix_in, in_args, vector)
PL/Python function "matrix_vec_mult", line 2031, in matrix_vec_mult
PL/Python function "matrix_vec_mult", line 77, in _assert
PL/Python function "matrix_vec_mult"

See the explanation at the top of
regarding dimensionality of sparse matrices.

One way around this is to add a (fake) row to the bottom of your VIEW with a 0 in the 100th column.  But if you do this, be sure to drop the last (fake) entry of each of the n intermediate vectors before you assemble into the final vector.

Frank

On Wed, Jan 3, 2018 at 8:15 PM, Anthony Thomas wrote:

Best,

Anthony

On Wed, Jan 3, 2018 at 3:13 PM, Frank McQuillan wrote:

Anthony,

Correct, the install-check error you are seeing is not related.

A couple of questions:

(1)
Are you using:

-- Multiply matrix with vector
matrix_vec_mult( matrix_in, in_args, vector)

(2)
Is matrix_in encoded in sparse format like at the top of

e.g., like this?

 row_id | col_id | value
--------+--------+-------
      1 |      1 |     9
      1 |      5 |     6
      1 |      6 |     6
      2 |      1 |     8
      3 |      1 |     3
      3 |      2 |     9
      4 |      7 |     0

Frank

On Wed, Jan 3, 2018 at 2:52 PM, Anthony Thomas wrote:
Okay - thanks Ivan, and good to know about support for Ubuntu from Greenplum!

Best,

Anthony

On Wed, Jan 3, 2018 at 2:38 PM, Ivan Novick wrote:
Hi Anthony, this does NOT look like a Ubuntu problem, and in fact there is OSS Greenplum officially on Ubuntu you can see here:
http://greenplum.org/install-greenplum-oss-on-ubuntu/

Greenplum and PostgreSQL do limit each field (row/col combination) to 1 GB, but there are techniques to manage data sets within these constraints.  I will let someone with more experience than me working with matrices answer what the best way is to do so in a case like the one you have provided.

Cheers,
Ivan

On Wed, Jan 3, 2018 at 2:22 PM, Anthony Thomas wrote:

I have a large tall-and-skinny sparse matrix which I'm trying to multiply by a dense vector. The matrix is 1.25e8 by 100 with approximately 1% nonzero values. This operation always triggers an error from Greenplum:

plpy.SPIError: invalid memory alloc request size 1073741824 (context 'accumArrayResult') (mcxt.c:1254) (plpython.c:4957)
CONTEXT:  Traceback (most recent call last):
PL/Python function "matrix_vec_mult", line 24, in <module>
matrix_in, in_args, vector)
PL/Python function "matrix_vec_mult", line 2044, in matrix_vec_mult
PL/Python function "matrix_vec_mult", line 2001, in _matrix_vec_mult_dense
PL/Python function "matrix_vec_mult"

Some Googling suggests this error is caused by a hard limit in Postgres which restricts the maximum size of an array to 1GB. If this is indeed the cause of the error I'm seeing, does anyone have any suggestions for circumventing this issue? It comes up in other cases as well, such as transposing a tall-and-skinny matrix. MVM with smaller matrices works fine.

Here is relevant version information:

SELECT VERSION();
PostgreSQL 8.3.23 (Greenplum Database 5.1.0 build dev) on x86_64-pc-linux-gnu, compiled by GCC gcc (Ubuntu 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609 compiled on Dec 21 2017 09:09:46

MADlib version: 1.12, git revision: unknown, cmake configuration time: Thu Dec 21 18:04:47 UTC 2017, build type: RelWithDebInfo, build system: Linux-4.4.0-103-generic, C compiler: gcc 4.9.3, C++ compiler: g++ 4.9.3

MADlib install-check reported one error in the "convex" module related to "loss too high", which seems unrelated to the issue described above. I know Ubuntu isn't officially supported by Greenplum, so I'd like to be confident this issue isn't just the result of using an unsupported OS. Please let me know if any other information would be helpful.

Thanks,

Anthony

--
Ivan Novick, Product Manager Pivotal Greenplum