Thursday, May 5, 2011

What's the easiest way to grab a web page in C?

I'm working on an old-school Unix-like system (QNX, to be exact) and need a way to grab a web page (no cookies or login; the target URL is just a text file) using nothing but sockets and arrays.

Anyone got a snippet for this?

Note: I don't control the server, and I've got very little to work with besides what is already on the box (adding libraries is not really "easy" given the constraints, although I do love libcurl).

From stackoverflow
  • I do have some code, but it also supports (Open)SSL so it's a bit long to post here.

    In essence:

    • parse the URL (split out the URL scheme, host name, port number, and scheme-specific part)

    • create the socket:

      s = socket(PF_INET, SOCK_STREAM, proto);

    • populate a sockaddr_in structure with the remote IP and port

    • connect the socket to the far end:

      err = connect(s, (struct sockaddr *)&addr, sizeof(addr));

    • make the request string:

      n = snprintf(headers, sizeof(headers), "GET /%s HTTP/1.0\r\nHost: %s\r\n\r\n", ...);

    • send the request string:

      write(s, headers, n);

    • read the data:

      while ((n = read(s, buffer, bufsize)) > 0) { ... }

    • close the socket:

      close(s);
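    The steps above can be pulled together roughly as follows. This is only a sketch, not the answerer's actual code: the function names tcp_connect and http_get_fd are made up for illustration, it assumes getaddrinfo() is available on the target (on an older system gethostbyname() plus a hand-filled sockaddr_in would take its place), and it assumes the request fits in 512 bytes. As in the pseudo-code, the path is passed without its leading slash.

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: resolve host and connect a TCP socket to it.
 * Returns the connected descriptor, or -1 on failure. */
int tcp_connect(const char *host, const char *port)
{
    struct addrinfo hints, *res, *ai;
    int s = -1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;          /* IPv4 only, like PF_INET above */
    hints.ai_socktype = SOCK_STREAM;

    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    for (ai = res; ai != NULL; ai = ai->ai_next) {
        s = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (s < 0)
            continue;
        if (connect(s, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                      /* connected */
        close(s);
        s = -1;
    }
    freeaddrinfo(res);
    return s;
}

/* Hypothetical helper: send an HTTP/1.0 GET on fd and read the reply
 * (headers and body) into buf until EOF or until buf is full.
 * buf is always NUL-terminated. Returns bytes stored, or -1 on error. */
long http_get_fd(int fd, const char *host, const char *path,
                 char *buf, size_t cap)
{
    char req[512];                      /* assumed large enough */
    size_t sent = 0, total = 0;
    ssize_t n;
    int len;

    if (cap == 0)
        return -1;

    len = snprintf(req, sizeof(req),
                   "GET /%s HTTP/1.0\r\nHost: %s\r\n\r\n", path, host);
    if (len < 0 || (size_t)len >= sizeof(req))
        return -1;                      /* request did not fit */

    while (sent < (size_t)len) {        /* write() may be partial */
        n = write(fd, req + sent, (size_t)len - sent);
        if (n <= 0)
            return -1;
        sent += (size_t)n;
    }

    /* An HTTP/1.0 server closes the connection after the reply,
     * so reading until EOF collects the whole response. */
    while (total < cap - 1) {
        n = read(fd, buf + total, cap - 1 - total);
        if (n < 0)
            return -1;
        if (n == 0)
            break;                      /* EOF */
        total += (size_t)n;
    }
    buf[total] = '\0';
    return (long)total;
}
```

    A caller would do something like s = tcp_connect("example.com", "80"), then http_get_fd(s, "example.com", "file.txt", buf, sizeof(buf)), then close(s); the host and file name here are placeholders. Keeping the request/read part separate from the connect part also makes it possible to exercise it over any descriptor, not only a real TCP connection.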

    NB: the pseudo-code above collects both the response headers and the data. The two are separated by the first blank line.
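    To illustrate the split at the first blank line, here is a minimal sketch. The function name split_response is made up for this example. It assumes the whole response is already in one NUL-terminated buffer and that the body is text (as in the question), since strstr() stops at the first NUL byte. It also accepts a bare "\n\n" separator in case the server does not send proper CRLF line endings.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split a NUL-terminated HTTP response in place.
 * The header block is terminated with a NUL at the first blank line,
 * and a pointer to the start of the body is returned.
 * Returns NULL if no blank line is found. */
char *split_response(char *resp)
{
    char *crlf = strstr(resp, "\r\n\r\n");  /* standard separator */
    char *lf = strstr(resp, "\n\n");        /* lenient fallback */
    char *sep;
    size_t seplen;

    if (crlf != NULL && (lf == NULL || crlf < lf)) {
        sep = crlf;
        seplen = 4;
    } else if (lf != NULL) {
        sep = lf;
        seplen = 2;
    } else {
        return NULL;                        /* headers not complete */
    }

    *sep = '\0';                            /* resp now holds only headers */
    return sep + seplen;                    /* body starts after the blank line */
}
```

    After the call, the original buffer reads as the header block and the returned pointer reads as the page content, so no second buffer is needed.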

    Rob : You appear to be missing the request, i.e. sending the GET /blah.htm.
    Alnitak : I was adding it as you posted your comment
    Andrioid : Nice detailed answer, +1 (maybe put the SSL code on pastebin?)
    eviljack : PLEASE post your snippet that also supports SSL
    Alnitak : I'll see what I can do - it's somewhat old and probably needs some more comments adding if it's going to be shown publicly :)
  • I'd look at libcurl if you want SSL support or anything fancy.

    However, if you just want to get a simple web page from port 80, then just open a TCP socket, send "GET /index.html HTTP/1.0\r\n\r\n" and parse the output.
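    As one small piece of "parse the output", the first line of the reply carries the status code, which is worth checking before trusting the body. A minimal sketch, where the function name http_status is made up for this example and the reply is assumed to be in a NUL-terminated buffer:

```c
#include <stdio.h>

/* Hypothetical helper: extract the numeric status code from the
 * status line of an HTTP response, e.g. "HTTP/1.0 200 OK".
 * Returns the code (100..599), or -1 if the line does not parse. */
int http_status(const char *resp)
{
    int code;

    /* %*d skips the major and minor version numbers without storing them,
     * so a successful parse assigns exactly one value. */
    if (sscanf(resp, "HTTP/%*d.%*d %d", &code) != 1)
        return -1;
    if (code < 100 || code > 599)
        return -1;
    return code;
}
```

    A result of 200 means the page was returned; anything else (a redirect such as 301 or an error such as 404) means the body is not the file that was asked for.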

    Alnitak : Re: libcurl - and most likely add several hundred KB and goodness knows how many library dependencies to your program's executable...
