Discussion:
Tornado - Handling requests to upload large files
beyond
2014-08-12 21:43:19 UTC
Permalink
I am writing a web application that will handle large files and save them to
local disk.

I am using the stream_request_body decorator to read the request stream in chunks
and keep memory usage low. In order to write these chunks to a file
asynchronously, I am looking into tornado.iostream.BaseIOStream. Is that the
right direction? Any pointers on how to write the request data to a file
asynchronously would be highly appreciated.

Let's say the web application's major role is handling large files. In that
case, does it make sense to make the Tornado web app async, since most of the
time is spent reading request data?
rande
2014-08-13 08:16:48 UTC
Permalink
Maybe you want to offload this task to nginx:
http://thomas.rabaix.net/blog/2014/05/handling-file-upload-is-not-always-easy
Ben Darnell
2014-08-16 15:51:29 UTC
Permalink
Linux doesn't have good support for asynchronous writes to regular files;
your best bet is to do the writes from a thread pool.
concurrent.futures.ThreadPoolExecutor can be used directly from a Tornado
coroutine.
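
In case it helps, here is a minimal sketch of that approach, assuming Tornado 4.x:
a stream_request_body handler whose data_received yields a Future from a
single-threaded ThreadPoolExecutor, so each chunk is written off the IOLoop
thread. The handler name and destination path are illustrative, not anything
from this thread.

import concurrent.futures

import tornado.gen
import tornado.ioloop
import tornado.web

# A single worker thread keeps writes to a given file ordered while the
# IOLoop thread stays free to keep reading the request.
pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)


@tornado.web.stream_request_body
class UploadHandler(tornado.web.RequestHandler):
    def prepare(self):
        # Illustrative destination path; a real app would choose a name
        # per upload.
        self.fileobj = open("/tmp/upload.bin", "wb")

    @tornado.gen.coroutine
    def data_received(self, chunk):
        # data_received may return a Future; Tornado waits for it before
        # reading the next chunk, which gives flow control for free.
        yield pool.submit(self.fileobj.write, chunk)

    def put(self):
        self.fileobj.close()
        self.write("ok")


if __name__ == "__main__":
    app = tornado.web.Application([(r"/upload", UploadHandler)])
    app.listen(8888)
    tornado.ioloop.IOLoop.current().start()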
beyond
2014-08-19 17:32:54 UTC
Permalink
Thanks.
We are avoiding nginx because some preprocessing is required before the
download can continue.

I will look into using a thread-pool thread to write the files. I'm wondering
whether there is any benefit to using thread-pool threads when the web server's
primary task is downloading large data (assuming there is a single disk).
Couldn't multiple threads/processes trying to write to the disk even degrade
performance?
Ben Darnell
2014-08-19 18:03:23 UTC
Permalink
Modern disks and filesystems are pretty clever; you might see benefits from
using more than one thread. But even if not, a thread pool with one thread
gives the easiest interface for working with threads from a coroutine.

-Ben
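
If it's useful, one tidy way to get that single-thread pool is
tornado.concurrent.run_on_executor; a rough sketch (the class name and path
handling are mine, not from this thread):

import concurrent.futures

from tornado.concurrent import run_on_executor


class ChunkWriter(object):
    # One worker thread: writes stay in order, and the coroutine that
    # calls write() never blocks the IOLoop.
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)

    def __init__(self, path):
        self._fileobj = open(path, "wb")

    @run_on_executor
    def write(self, chunk):
        self._fileobj.write(chunk)

    @run_on_executor
    def close(self):
        self._fileobj.close()

A data_received coroutine can then just do "yield writer.write(chunk)" and the
next chunk won't be read until the write has finished.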
beyond
2014-08-20 17:18:26 UTC
Permalink
Agreed. I will try it and post here if I find anything interesting. Thanks
again.
Victor Sergienko
2014-10-15 16:43:49 UTC
Permalink
Hey,

Did anything interesting pop up?

BTW, has anyone tried using BaseIOStream to *read* large files, or small files?
I'm considering it because I want an asynchronous fopen() call.
Ben Darnell
2014-10-15 21:49:26 UTC
Permalink
It won't work because Linux doesn't support asynchronous reads for regular
files. They always appear to be "readable" to epoll and then block during
the read. If you're doing a lot of I/O to regular files, your best bet is to
use a thread pool.

-Ben
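
For example, here is a rough sketch of an "asynchronous open/read" built on a
thread pool, assuming a Tornado coroutine; the function name, pool size, and
chunk size are just illustrative:

import concurrent.futures

import tornado.gen

pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


@tornado.gen.coroutine
def read_file(path, chunk_size=64 * 1024):
    # open() and read() run on a worker thread; the coroutine only
    # resumes on the IOLoop once each call has finished.
    fileobj = yield pool.submit(open, path, "rb")
    try:
        chunks = []
        while True:
            chunk = yield pool.submit(fileobj.read, chunk_size)
            if not chunk:
                break
            chunks.append(chunk)
    finally:
        fileobj.close()
    raise tornado.gen.Return(b"".join(chunks))

From another coroutine this is just: data = yield read_file("/path/to/file").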
Shane Spencer
2014-10-15 22:59:42 UTC
Permalink
Just a heads up to everybody that the nginx secure URL solution for some of
this is rather nice. It adds a layer of authentication and can have an
expiration, and the links are very simple to create in Python as well (see the
sketch below): secure upload URLs, secure download URLs.

As for post-processing: nginx has a post_action directive, which I use constantly
in my own development. It simply does a GET to some other location once
something is done. This helps with ingesting downloads, as well as having
nginx fire off some information to a custom handler; I use it for analytics
and debugging.
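
A minimal sketch of generating such a link in Python, assuming an nginx
secure_link_md5 expression of "$secure_link_expires$uri <secret>" (a common
configuration); the URI layout and query parameter names here are illustrative:

import base64
import hashlib
import time


def make_secure_link(uri, secret, ttl=3600):
    # Matches an nginx location configured roughly like:
    #   secure_link $arg_md5,$arg_expires;
    #   secure_link_md5 "$secure_link_expires$uri <secret>";
    expires = int(time.time()) + ttl
    raw = ("%d%s %s" % (expires, uri, secret)).encode("utf-8")
    digest = hashlib.md5(raw).digest()
    # nginx expects URL-safe base64 with the trailing '=' padding removed.
    token = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
    return "%s?md5=%s&expires=%d" % (uri, token, expires)

e.g. make_secure_link("/downloads/big.bin", "my secret") gives a link that
nginx will reject once the hour is up.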

--
http://about.me/ShaneSpencer
Hitesh Chaudhary
2014-10-16 03:30:23 UTC
Permalink
I just used regular os.open. I wanted to use BaseIOStream but couldn't find a
reference on how to use it to read/write files.