If you have a large amount of data to start with, performing batch import will be much faster than sending every event over an HTTP connection.
Preparing Input File
The import tool expects its input to be a file stored either in the local filesystem or on HDFS. Each line of the file should be a JSON object string representing an event. For more information about the format of event JSON object, please refer to this page.
Shown below is an example that contains 5 events ready to be imported to the Event Server.
1 2 3 4 5 | {"event":"buy","entityType":"user","entityId":"3","targetEntityType":"item","targetEntityId":"0","eventTime":"2014-11-21T01:04:14.716Z"} {"event":"buy","entityType":"user","entityId":"3","targetEntityType":"item","targetEntityId":"1","eventTime":"2014-11-21T01:04:14.722Z"} {"event":"rate","entityType":"user","entityId":"3","targetEntityType":"item","targetEntityId":"2","properties":{"rating":1.0},"eventTime":"2014-11-21T01:04:14.729Z"} {"event":"buy","entityType":"user","entityId":"3","targetEntityType":"item","targetEntityId":"7","eventTime":"2014-11-21T01:04:14.735Z"} {"event":"buy","entityType":"user","entityId":"3","targetEntityType":"item","targetEntityId":"8","eventTime":"2014-11-21T01:04:14.741Z"} |
Use SDK to Prepare Batch Input File
Some of the Apache PredictionIO SDKs also provides FileExporter client. You may use them to prepare the JSON file as described above. The FileExporter creates event in the same way as EventClient except that the events are written to a JSON file instead of being sent to EventSever. The written JSON file can then be used by batch import.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 | import predictionio from datetime import datetime import pytz # Create a FileExporter and specify "my_events.json" as destination file exporter = predictionio.FileExporter(file_name="my_events.json") event_properties = { "someProperty" : "value1", "anotherProperty" : "value2", } # write the events to a file event_response = exporter.create_event( event="my_event", entity_type="user", entity_id="uid", target_entity_type="item", target_entity_id="iid", properties=event_properties, event_time=datetime(2014, 12, 13, 21, 38, 45, 618000, pytz.utc)) # ... # close the FileExporter when finish writing all events exporter.close() |
1 | (coming soon) |
Import Events from Input File
Importing events from a file can be done easily using the command line interface. Assuming that pio
be in your search path, your App ID be 123
, and the input file my_events.json
be in your current working directory:
1 | $ pio import --appid 123 --input my_events.json
|
After a brief while, the tool should return to the console without any error. Congratulations! You have successfully imported your events.