Coursera Crawler – WING – Web IR / NLP Group at NUS

Abstract

Coursera Crawler collects discussion forum data from the Coursera website. It uses PhantomJS to simulate the login process and PycURL to fetch the target data through hidden APIs.

The crawler targets discussion forum data only, but it is easy to extend: any data that a Coursera page loads dynamically through an API can be fetched with PycURL in the same way. The hard part is finding the hidden APIs.
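As a rough illustration of the second step, the sketch below shows how session cookies obtained from a browser login might be passed to PycURL to call a JSON API. It is not taken from the crawler's code. The function names `cookie_header` and `fetch_json` are hypothetical, and the cookie format (a list of dicts with `name` and `value` keys) is an assumption about what the login step exports.

```python
import json
from io import BytesIO


def cookie_header(cookies):
    """Join cookies (a list of {'name': ..., 'value': ...} dicts) into a Cookie header value."""
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


def fetch_json(url, cookies):
    """GET `url` with the given session cookies and decode the JSON response body."""
    # Imported here so that cookie_header() can be used without PycURL installed.
    import pycurl

    buffer = BytesIO()
    curl = pycurl.Curl()
    curl.setopt(pycurl.URL, url)
    curl.setopt(pycurl.COOKIE, cookie_header(cookies))  # reuse the logged-in session
    curl.setopt(pycurl.FOLLOWLOCATION, True)
    curl.setopt(pycurl.WRITEDATA, buffer)  # collect the response body in memory
    try:
        curl.perform()
        status = curl.getinfo(pycurl.RESPONSE_CODE)
    finally:
        curl.close()
    if status != 200:
        raise RuntimeError(f"unexpected HTTP status {status}")
    return json.loads(buffer.getvalue().decode("utf-8"))
```

A call would look like `fetch_json(api_url, cookies)`, where `api_url` is whichever hidden endpoint you have identified (for example, by watching the browser's network requests while a forum page loads) and `cookies` comes from the simulated login.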

Resources

The Coursera Crawler code is available on GitHub.

Members

An Yahui (Intern)
Muthu Kumar Chandrasekaran (Project Lead)
Min-Yen Kan (Advisor and Professor)

Meeting Minutes

29 Aug 2016
13 Sep 2016
27 Sep 2016