The old million cube question.

Yes, I am asking the old million cube question again, but it is the only solution to my problem, at least to start with.
I have read all of the posts that I can find on this forum and others dealing with my problem but still cannot find help.

I have 1 million+ data points being calculated on my CUDA GPUs in a 100×100×100 3D grid.
Each data point represents a cube in one of its 24 orientations,
and each of the cube's faces also represents a data value (i.e. a color).

I need to be able to visually fly through the data set, rotate the universe and watch the cubes as the data changes, and PICK a cube that falls outside the standard deviation of the data so I can examine the other data stored at that point via reference.

I can handle the CUDA and the other related data crunching. My issue is finding a code sample that I can use and modify that can handle a large set of data cubes in OpenGL on Linux using C++.

I started my thesis project using Processing (processing.org) and Java, but now that I am using CUDA, I need to move up to C++.

ANY samples or code that can draw the 1M+ cubes would be greatly appreciated.
I keep finding Windows code, but cannot figure out how to migrate it for use on my Linux systems.
Thank you for any and all suggestions and samples.

Here is a sample image of my Java attempt:

You have two possibilities: do it all yourself, as it seems you want to do, or use a tool that will help you render this at (hopefully) interactive frame rates.
For the latter, I highly suggest you have a look at 3D engines, such as Unreal Engine. This will mean learning how the engine works and understanding how to link it with your CUDA results.
For the former, I would suggest having a look at instancing, occlusion queries, and good spatial partitioning. I would personally go for an octree.

Since the OP states that this is for a thesis, I’m going to assume that “roll your own” code is a requirement.

Nowadays, drawing 1 million cubes is not a problem for the vertex pipeline - modern (and even almost-modern) GPUs have sufficient processing power and bandwidth to handle it easily (I've personally drawn 1 million cubes using Direct3D 9 on an integrated Intel GPU; that's the kind of baseline we're talking about here). What's important is to submit them in as few draw calls as possible - preferably one. If memory usage concerns you, you can implement techniques such as instancing and/or geometry shaders to get the vertex count down, but I'd suggest just drawing the full data set first to see how you get on.
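To make that "just draw the full data set" suggestion concrete, here is a minimal sketch of the brute-force baseline: expand every cube of a 100×100×100 grid into a single large position-only vertex buffer up front and draw it with one call. Everything here (names, spacing, winding) is illustrative, not from the thread:

```cpp
#include <vector>
#include <GL/glew.h>

struct Vertex { float x, y, z; };

// 8 corners of a unit cube and 36 indices (12 triangles) into them.
// Winding is illustrative only; check it before enabling face culling.
static const float kCorner[8][3] = {
    {0,0,0},{1,0,0},{1,1,0},{0,1,0},{0,0,1},{1,0,1},{1,1,1},{0,1,1}};
static const int kTri[36] = {
    0,1,2, 0,2,3,  4,6,5, 4,7,6,
    0,4,5, 0,5,1,  3,2,6, 3,6,7,
    0,3,7, 0,7,4,  1,5,6, 1,6,2};

GLuint buildAllCubes()   // one VBO holding 100*100*100*36 vertices (~432 MB)
{
    const int N = 100;
    std::vector<Vertex> verts;
    verts.reserve(static_cast<size_t>(N) * N * N * 36);

    for (int x = 0; x < N; ++x)
        for (int y = 0; y < N; ++y)
            for (int z = 0; z < N; ++z)
                for (int i = 0; i < 36; ++i) {
                    const float* c = kCorner[kTri[i]];
                    verts.push_back({ c[0] + 2.0f * x,   // 2-unit grid spacing
                                      c[1] + 2.0f * y,
                                      c[2] + 2.0f * z });
                }

    GLuint vbo = 0;
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, verts.size() * sizeof(Vertex),
                 verts.data(), GL_STATIC_DRAW);
    return vbo;
}

// Per frame, with a VAO describing the position attribute bound:
//     glDrawArrays(GL_TRIANGLES, 0, 100 * 100 * 100 * 36);
```

It is wasteful on memory, but it gives you a working single-draw-call reference point before you bring in instancing.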

Where you’re more likely to bottleneck is on the per-fragment pipeline, but that’s going to depend on how large your cubes are, whether they overlap, whether you’re doing blending, how complex your fragment shaders are, etc. Remember - even with this size of data set - there will be more fragments than vertices. You’ll need to give more information in order to get the best guidance for how to approach that.

Once you’ve got a GL context up and running, and all other things (GPU, GL_VERSION, driver quality, etc.) being equal, the OS is irrelevant. OpenGL is cross-platform, so you can take GL code written for any OS and run it on any other; don’t be put off by the fact that you can only find code for Windows - the GPU and drivers are what matter.

Getting a GL context up and running is the OS-specific part, and for that you can either use native code or a framework/library such as GLUT or SDL. I’d suggest the latter, since your job isn’t to create a GL context; it’s to visualize a data set. Using such a framework/library will also give you portable input capability, so you can fly around your data set.

It’s also recommended to use something like GLEW for loading modern OpenGL functions and extensions in your project. You can of course write your own loading code too, but that’s honestly just donkey-work that everybody does once and then forgets about.
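As a concrete illustration of those last two points, here is a minimal sketch of the OS-independent setup, assuming SDL2 for the window/context/input and GLEW as the function loader (any equivalent framework and loader would do):

```cpp
#include <SDL2/SDL.h>
#include <GL/glew.h>

int main()
{
    SDL_Init(SDL_INIT_VIDEO);
    SDL_GL_SetAttribute(SDL_GL_CONTEXT_MAJOR_VERSION, 3);
    SDL_GL_SetAttribute(SDL_GL_CONTEXT_MINOR_VERSION, 3);
    SDL_GL_SetAttribute(SDL_GL_CONTEXT_PROFILE_MASK, SDL_GL_CONTEXT_PROFILE_CORE);

    SDL_Window* window = SDL_CreateWindow("cubes",
        SDL_WINDOWPOS_CENTERED, SDL_WINDOWPOS_CENTERED,
        1280, 720, SDL_WINDOW_OPENGL);
    SDL_GLContext context = SDL_GL_CreateContext(window);

    glewExperimental = GL_TRUE;   // helps older GLEW versions with core profiles
    glewInit();

    bool running = true;
    while (running) {
        SDL_Event e;
        while (SDL_PollEvent(&e))
            if (e.type == SDL_QUIT) running = false;

        glClearColor(0.1f, 0.1f, 0.1f, 1.0f);
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
        // ... draw the cube data set here ...
        SDL_GL_SwapWindow(window);
    }

    SDL_GL_DeleteContext(context);
    SDL_DestroyWindow(window);
    SDL_Quit();
    return 0;
}
```

The same source builds on Linux and Windows; only the compiler/link flags change.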

Refer to the Getting Started pages on the OpenGL wiki for further guidance.

Here’s a tutorial that shows you the basics of OpenGL:

Read this, too:
https://www.khronos.org/opengl/wiki/Vertex_Specification_Best_Practices

Put the vertices of a cube into a vertex buffer / element buffer and reuse those vertices for every cube you draw; just vary the transformation (orientation data, like position + rotation) per cube. Use one “VAO” (vertex array object).
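A sketch of that setup, assuming each instance is described by a 4×4 model matrix (glm::mat4) built from its position and orientation; buffer names and attribute locations are illustrative:

```cpp
#include <GL/glew.h>
#include <glm/glm.hpp>

GLuint vao, cubeVbo, cubeEbo, instanceVbo;

void setupCubeInstancing(const float* cubeVerts, GLsizeiptr cubeVertsBytes,
                         const unsigned* cubeIndices, GLsizeiptr cubeIndicesBytes,
                         const glm::mat4* transforms, GLsizei instanceCount)
{
    glGenVertexArrays(1, &vao);
    glBindVertexArray(vao);

    // One copy of the cube's vertex positions, reused by every instance.
    glGenBuffers(1, &cubeVbo);
    glBindBuffer(GL_ARRAY_BUFFER, cubeVbo);
    glBufferData(GL_ARRAY_BUFFER, cubeVertsBytes, cubeVerts, GL_STATIC_DRAW);
    glEnableVertexAttribArray(0);
    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 3 * sizeof(float), (void*)0);

    // Cube indices: 36 of them for the 12 triangles.
    glGenBuffers(1, &cubeEbo);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, cubeEbo);
    glBufferData(GL_ELEMENT_ARRAY_BUFFER, cubeIndicesBytes, cubeIndices, GL_STATIC_DRAW);

    // One mat4 per cube: attribute locations 1..4, advanced once per instance.
    glGenBuffers(1, &instanceVbo);
    glBindBuffer(GL_ARRAY_BUFFER, instanceVbo);
    glBufferData(GL_ARRAY_BUFFER, instanceCount * sizeof(glm::mat4),
                 transforms, GL_DYNAMIC_DRAW);
    for (int i = 0; i < 4; ++i) {
        glEnableVertexAttribArray(1 + i);
        glVertexAttribPointer(1 + i, 4, GL_FLOAT, GL_FALSE, sizeof(glm::mat4),
                              (void*)(sizeof(glm::vec4) * i));
        glVertexAttribDivisor(1 + i, 1);   // per-instance, not per-vertex
    }
    glBindVertexArray(0);
}
```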

This one shows you how to reduce the number of “draw calls” to 1:
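With the VAO from the sketch above, that single call could look like this (cubeShader stands in for a compiled program whose vertex shader reads the per-instance matrix from attribute locations 1-4):

```cpp
#include <GL/glew.h>

void drawAllCubes(GLuint cubeShader, GLuint vao,
                  GLsizei indexCount, GLsizei instanceCount)
{
    glUseProgram(cubeShader);
    glBindVertexArray(vao);
    // One call for the whole grid: 36 indices per cube, 1,000,000 instances.
    glDrawElementsInstanced(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT,
                            nullptr, instanceCount);
    glBindVertexArray(0);
}
```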

Use CUDA (or GL compute shaders) to compute the transformations and put the results into the “instance buffer”. Also, try to figure out whether a cube is behind the camera (or otherwise outside the view frustum) and skip those cubes.
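A sketch of feeding that instance buffer straight from CUDA via the CUDA/OpenGL interop API, so the transforms never round-trip through the CPU. The InstanceData layout and the kernel are placeholders for whatever your simulation actually produces:

```cpp
#include <GL/glew.h>
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>

struct InstanceData { float transform[16]; };        // one 4x4 matrix per cube

static cudaGraphicsResource* instanceResource = nullptr;

// Register once, after the GL instance buffer has been created and sized.
void registerInstanceBuffer(GLuint instanceVbo)
{
    cudaGraphicsGLRegisterBuffer(&instanceResource, instanceVbo,
                                 cudaGraphicsMapFlagsWriteDiscard);
}

// Call each simulation step, before drawing.
void updateInstancesFromCuda(int cubeCount)
{
    InstanceData* devPtr = nullptr;
    size_t bytes = 0;

    cudaGraphicsMapResources(1, &instanceResource, 0);
    cudaGraphicsResourceGetMappedPointer(reinterpret_cast<void**>(&devPtr),
                                         &bytes, instanceResource);

    // Launch your transform/culling kernel here (compiled with nvcc), e.g.:
    //     writeTransforms<<<blocks, threads>>>(devPtr, cubeCount);

    cudaGraphicsUnmapResources(1, &instanceResource, 0);  // hand the buffer back to GL
}
```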

“Face culling” discards the roughly 50% of each cube’s triangles that point away from the camera before they are rasterized:
https://www.khronos.org/opengl/wiki/Face_Culling
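Enabling it is a one-time setup call, assuming your cube geometry uses consistent winding:

```cpp
glEnable(GL_CULL_FACE);
glCullFace(GL_BACK);     // discard faces pointing away from the camera
glFrontFace(GL_CCW);     // counter-clockwise triangles count as front faces
```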

“Deferred shading” allows you to perform lighting only for the pixels that are actually visible:

https://learnopengl.com/#!Advanced-Lighting/Deferred-Shading
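The core of that technique is a G-buffer: render positions, normals and colors to textures first, then run the lighting pass only on the pixels that survived the depth test. A minimal setup sketch, with sizes and formats chosen for illustration only:

```cpp
#include <GL/glew.h>

GLuint gBuffer, gPosition, gNormal, gAlbedo, depthRbo;

void createGBuffer(int width, int height)
{
    glGenFramebuffers(1, &gBuffer);
    glBindFramebuffer(GL_FRAMEBUFFER, gBuffer);

    auto makeTarget = [&](GLuint& tex, GLenum internalFormat, GLenum attachment) {
        glGenTextures(1, &tex);
        glBindTexture(GL_TEXTURE_2D, tex);
        glTexImage2D(GL_TEXTURE_2D, 0, internalFormat, width, height, 0,
                     GL_RGBA, GL_FLOAT, nullptr);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
        glFramebufferTexture2D(GL_FRAMEBUFFER, attachment, GL_TEXTURE_2D, tex, 0);
    };
    makeTarget(gPosition, GL_RGBA16F, GL_COLOR_ATTACHMENT0);  // world-space position
    makeTarget(gNormal,   GL_RGBA16F, GL_COLOR_ATTACHMENT1);  // surface normal
    makeTarget(gAlbedo,   GL_RGBA8,   GL_COLOR_ATTACHMENT2);  // per-face color value

    GLenum drawBuffers[3] = { GL_COLOR_ATTACHMENT0, GL_COLOR_ATTACHMENT1,
                              GL_COLOR_ATTACHMENT2 };
    glDrawBuffers(3, drawBuffers);

    glGenRenderbuffers(1, &depthRbo);
    glBindRenderbuffer(GL_RENDERBUFFER, depthRbo);
    glRenderbufferStorage(GL_RENDERBUFFER, GL_DEPTH_COMPONENT24, width, height);
    glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT,
                              GL_RENDERBUFFER, depthRbo);
    glBindFramebuffer(GL_FRAMEBUFFER, 0);
}
```

For opaque, unlit or simply-lit data cubes this may be overkill, so treat it as optional.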

“Occlusion queries” let you check whether (and how many) fragments would actually be rendered. For example, you can first render only a 4x4x4 grid of bigger boxes that each “wrap” a block of the actual cubes; if a fragment of one of those big boxes passes, render the cubes it wraps, otherwise you can discard them altogether (saves performance).
https://www.khronos.org/opengl/wiki/Query_Object#Occlusion_queries
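A sketch of that coarse-block test using conditional rendering; drawBlockBoundingBox() and drawCubesInBlock() are placeholders for your own draw routines:

```cpp
#include <GL/glew.h>

void drawBlockBoundingBox(int block);   // one coarse box wrapping a block of cubes
void drawCubesInBlock(int block);       // the detailed cubes inside that block

void drawWithOcclusionCulling()
{
    static GLuint queries[64] = {};            // one query per 4x4x4 block
    if (queries[0] == 0) glGenQueries(64, queries);

    for (int block = 0; block < 64; ++block) {
        // Pass 1: test the block's bounding box without writing color or depth.
        glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
        glDepthMask(GL_FALSE);
        glBeginQuery(GL_ANY_SAMPLES_PASSED, queries[block]);
        drawBlockBoundingBox(block);
        glEndQuery(GL_ANY_SAMPLES_PASSED);
        glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
        glDepthMask(GL_TRUE);

        // Pass 2: the GPU skips the detailed cubes if no box sample passed.
        glBeginConditionalRender(queries[block], GL_QUERY_WAIT);
        drawCubesInBlock(block);
        glEndConditionalRender();
    }
}
```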

Use “timer queries” to measure the time (in nanoseconds, NOT FPS!) needed to complete the rendering task. Here’s an example:
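A minimal sketch along those lines, wrapping the cube pass in a GL_TIME_ELAPSED query:

```cpp
#include <cstdio>
#include <GL/glew.h>

void timedDraw()
{
    static GLuint timerQuery = 0;
    if (timerQuery == 0) glGenQueries(1, &timerQuery);

    glBeginQuery(GL_TIME_ELAPSED, timerQuery);
    // ... issue the cube draw calls here ...
    glEndQuery(GL_TIME_ELAPSED);

    // Reading the result waits for the GPU to finish; in a real loop you would
    // double-buffer queries or poll GL_QUERY_RESULT_AVAILABLE instead of stalling.
    GLuint64 elapsedNs = 0;
    glGetQueryObjectui64v(timerQuery, GL_QUERY_RESULT, &elapsedNs);
    std::printf("cube pass: %.3f ms\n", elapsedNs / 1.0e6);
}
```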

Adding to what the others have said:
You can do all of those things very easily in Java too. There are JOGL and LWJGL, which both give you very low-level access to OpenGL. I have personally never used JOGL, but I have never heard anything bad about it. I can confirm, though, that LWJGL 3 is very powerful and very fast; it is highly optimized and runs at speeds almost equivalent to C. So if you don't want to switch from Java to C++, keep in mind that you don't need to switch just because you want to work with OpenGL.

Thank you for the suggestions.
Since I have been wanting to learn C++ ever since my LISP and Fortran days, I will continue through the tutorials.
The main interfaces (APIs) for CUDA and deep learning are in C++, so I also think it would be best to write everything in C++.
Hardware is NOT an issue, since I have an i7 machine with 64 GB of RAM and four 1080 Ti cards (three for CUDA and one for OpenGL).
My problem is that I am a better code modifier and reviewer than a writer.
I have tried using the tutorial examples and growing them to my needs, but as soon as I try adding the instancing and OpenGL calls, I get errors, lost or otherwise. I have tried MANY different applications through the years, including Processing.org, which is what my prototype was built with, but it tops out at 70x70x70 cubes (343K cubes). You would think that with all of the people talking about wanting to draw as many cubes as they can, there would be examples written on Linux that I could easily modify. Every sample I look at uses Windows.h and DirectX code that I cannot seem to replace. I tried Unity and some of the other platforms, but between the number of cubes, the associated rules, and interfacing with the CUDA code, these were quickly eliminated. I am not, and do not want to be, a graphics developer; I simply want to use CUDA for the heavy math output and OpenGL for simple yet fast visualization.

Thank you again for the tutorials and the paths forward, which I will make sure to use, but does anybody have any example code that can get me at least part of the way to drawing the cubes?

THANK YOU for your help.

That means you want to “review and rewrite/modify” some code you don’t actually understand? Why not use a renderer instead?

Quite the opposite: it means that I can understand the code once it is written, but I know my limitations and my lack of creativity. I have tried writing the code before and keep running into one limitation or another. Instead of writing a poor starter version and then asking for help fixing the many mistakes I will make, including the wrong types of arrays or the wrong GL calls, I would like to start with something written by somebody who knows what they are doing and has used the best options for optimal performance, then break it down line by line until I understand it, and then change it to integrate with what I do.

A good analogy: if you want to build a high-speed race car, you can study go-karts all day long, but if you want to understand performance and get the maximum out of today's technology, you study other winning cars and model yours after their designs.