GoodData CL Commands Explained

Zdenek Svoboda
posted this on August 06, 2010 04:50

Update: for the most up-to-date documentation of the GoodData CL tool, please refer to the GoodData developer website. The GoodData CL commands are described in the Runtime Commands documentation.

I've recently received a few support requests reporting missing table errors. I realized that we haven't described well how the CL commands work. I'll try to fix this quickly with this post, knowing that we need to extend our documentation ASAP.

The CL tool provides multiple connectors. Connectors allow you to transfer data from various data sources. For example, file (e.g. CSV), JDBC, Google Analytics, and Salesforce connectors are available in the CL tool.

Connectors always load data to a GoodData project. Most integration scenarios load one dataset and typically follow this sequence:

  1. Initialize Project by issuing one of the commands CreateProject, OpenProject, or RetrieveProject. Each of these commands sets a project context that is used by the subsequent commands. The project context remains active until the next CreateProject, OpenProject, or RetrieveProject command. The project context can be saved using the StoreProject command and retrieved using the RetrieveProject command.
  2. Generate a Dataset Default Configuration File that describes the input data structure. The configuration file tells the CL tool what needs to happen with the individual input data elements (columns). Some of them will turn into attributes, some will become facts, and some will connect the dataset to another dataset (so-called references). Typically you want to generate the config once, edit it (nobody is perfect), and then use it for repeated data loading.
  3. Initialize Connector using a Load<ConnectorName> command. For example, you can initialize the CSV connector using the LoadCsv command. Similarly, you can use LoadJdbc, LoadSfdc, etc. Each Load<ConnectorName> command sets a connector context that is used by the subsequent commands. The connector context remains active until the next Load<ConnectorName> command.
  4. Generate and Execute MAQL via the GenerateMaql and ExecuteMaql commands. These commands generate and execute a script (MAQL) that initializes all of the project's analytical objects (attributes, facts, etc.). These commands get information from both the project context and the connector context. You need to do this once per project. Once you have executed the MAQL script, you can start loading data to your project. Typically you get an error when you try to execute the same MAQL script twice against the same project.
  5. Transfer the Data via the TransferLastSnapshot, TransferAllSnapshots, or TransferSnapshots commands. These commands get information from both the project and connector contexts. They first transform (normalize) the input data and then transfer it to the project. Please note that you must run one of the Transfer commands for a dataset before you can reference it from another dataset.
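
To make the two contexts easier to picture, here is a toy Python model of the sequence above. It is only a sketch and not how the CL tool is implemented: the class CLSession, its method names, and the error messages are all hypothetical, and the rule that a Transfer needs the MAQL to have been executed first is my simplification of step 4.

```python
class CLSession:
    """Toy model of the project and connector contexts (hypothetical, not CL tool code)."""

    def __init__(self):
        self.project = None     # set by CreateProject / OpenProject / RetrieveProject
        self.connector = None   # set by a Load<ConnectorName> command
        self.maql_done = set()  # (project, dataset) pairs whose MAQL was executed

    def open_project(self, project_id):
        # A new project context replaces the previous one.
        self.project = project_id

    def load_connector(self, connector, dataset):
        # A new connector context replaces the previous one.
        self.connector = (connector, dataset)

    def _require_contexts(self, command):
        if self.project is None:
            raise RuntimeError(f"{command}: no project context")
        if self.connector is None:
            raise RuntimeError(f"{command}: no connector context")

    def execute_maql(self):
        # Needs both contexts; modeled as failing on a second run (assumption).
        self._require_contexts("ExecuteMaql")
        key = (self.project, self.connector[1])
        if key in self.maql_done:
            raise RuntimeError("ExecuteMaql: MAQL already executed for this dataset")
        self.maql_done.add(key)

    def transfer(self):
        # Needs both contexts and a previously executed MAQL (assumption).
        self._require_contexts("Transfer")
        key = (self.project, self.connector[1])
        if key not in self.maql_done:
            raise RuntimeError("Transfer: MAQL has not been executed yet")
        return f"transferred {self.connector[1]} to {self.project}"


session = CLSession()
session.open_project("my-project")          # hypothetical project name
session.load_connector("csv", "quotes")     # hypothetical dataset name
session.execute_maql()
print(session.transfer())
```

The point of the model is only that every later command reads whatever project and connector were set most recently, so the order of the commands in a script matters.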

 

Here are the most common scenarios:

1. Generate the input file configuration (typically executed once per dataset):

GenerateCsvConfig | GenerateJdbcConfig | GenerateGoogleAnalyticsConfig | GenerateSfdcConfig

2. Initialize a new GoodData project and store the project context (typically executed once per dataset):

CreateProject
StoreProject
LoadCsv | LoadJdbc | LoadGoogleAnalytics | LoadSfdc 
GenerateMaql
ExecuteMaql  

3. Load data to the project (typically executed daily for a dataset):

RetrieveProject | OpenProject
LoadCsv | LoadJdbc | LoadGoogleAnalytics | LoadSfdc 
TransferLastSnapshot | TransferAllSnapshots | TransferSnapshots

Many examples of the CL scripts are available in this document.

 

Here is an incorrect scenario that I have noticed a few times recently:

1. Generate the input file configuration for a DATASET 1:

GenerateCsvConfig | GenerateJdbcConfig | GenerateGoogleAnalyticsConfig | GenerateSfdcConfig

2. Initialize a new GoodData project and store the project context for the DATASET 1 :

CreateProject
StoreProject
LoadCsv | LoadJdbc | LoadGoogleAnalytics | LoadSfdc 
GenerateMaql
ExecuteMaql  

3. Generate the input file configuration for a DATASET 2 that references the DATASET 1 :

GenerateCsvConfig | GenerateJdbcConfig | GenerateGoogleAnalyticsConfig | GenerateSfdcConfig

4. Initialize a new GoodData project and store the project context for the DATASET 2  :

CreateProject
StoreProject
LoadCsv | LoadJdbc | LoadGoogleAnalytics | LoadSfdc 
GenerateMaql
ExecuteMaql  

5. Load data to the project for the DATASET 2:

RetrieveProject | OpenProject
LoadCsv | LoadJdbc | LoadGoogleAnalytics | LoadSfdc 
TransferLastSnapshot | TransferAllSnapshots | TransferSnapshots

This scenario chokes because DATASET 1, which is referenced from DATASET 2, has never been transferred via a Transfer command. This means that the DATASET 1 transformation structures haven't been created yet. These structures are necessary when processing DATASET 2, which references DATASET 1.
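
One way to avoid this is to work out the transfer order from the references before writing the script. The following is a minimal Python sketch, not part of the CL tool: the function transfer_order and the dataset names are hypothetical, and it assumes you maintain the list of references between your datasets yourself.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def transfer_order(references):
    """Order datasets so that every referenced dataset comes before its referrers.

    `references` maps each dataset name to the datasets it references.
    """
    return list(TopologicalSorter(references).static_order())


# Hypothetical names mirroring the scenario above:
# DATASET 2 references DATASET 1, so DATASET 1 must be transferred first.
references = {
    "DATASET 1": [],
    "DATASET 2": ["DATASET 1"],
}

for dataset in transfer_order(references):
    print("Run a Transfer command for", dataset)
```

If the references form a cycle, TopologicalSorter raises a CycleError, which is a useful early warning that no valid transfer order exists.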

 

Comments

Gareth Davies

How does GoodData handle changes to data sets that have already been uploaded? Incremental loads are fine for growing data sets, but not for changing ones. Is it expected that the entire data set be deleted, recreated, and uploaded again?

Thanks

June 17, 2011 08:43.
Ray Light
GoodData

Hi Gareth, This is handled by the "CONNECTION POINT" of the data set. If you update data which already has an existing record, the system will update the record based on this Connection Point (primary key).

Hope this helps!

June 17, 2011 09:10.
Gareth Davies

Thanks Ray!

June 17, 2011 09:16.