glusterfs.git -

	Commit message	Author	Age	Files	Lines
*	cluster/dht: Add "afr.readdir-failover=off" option the rebalance process	shishir gowda	2012-12-17	2	-7/+28
	By failing over readdir (the default behaviour), rebalance could get duplicate files, as readdir would re-read from offset 0. Rebalance should not attempt to migrate these files again. Additionally, we need to handle these cases as failures in the rebalance crawl.
	BUG: 859387 Change-Id: I77c5c14176bb4d9e593efd6d4739fbc8233bd0c5 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/1991 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
*	cluster/afr: Provide option to disable readdir failover	Pranith Kumar K	2012-12-17	5	-25/+42
	In a replica pair, unlike files, directories may not have their contents in the same order, so readdir for the same (offset, size) may not give the same entries on both subvolumes of the replica pair. Switching over from one subvolume to another may therefore lead to duplicate entries, fewer entries, or both. This patch provides a way to disable readdir-failover so that applications like rebalance can retry if they want to.
	Change-Id: I02e5762e7f8a5847eaf54356e5d6b5f49fe6c609 BUG: 859387 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/1989 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
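The failure mode described above can be illustrated with a toy model. This is only a sketch: the `readdir` helper, the list-based "bricks", and the integer offsets are assumptions made for illustration (real readdir offsets are opaque cookies), and none of it is AFR code.

```python
def readdir(entries, offset, size):
    """Toy readdir: return up to `size` entries starting at position `offset`."""
    return entries[offset:offset + size]


def list_with_failover(primary, secondary, size, fail_after):
    """Read `fail_after` batches from `primary`, then continue at the
    same offset on `secondary`, as a readdir failover would."""
    seen, offset, batches = [], 0, 0
    source = primary
    while True:
        if batches == fail_after:
            source = secondary  # failover: same offset, different brick
        batch = readdir(source, offset, size)
        if not batch:
            return seen
        seen.extend(batch)
        offset += len(batch)
        batches += 1


# Both bricks hold the same names, but in a different on-disk order.
brick_a = ["a", "b", "c", "d"]
brick_b = ["c", "d", "a", "b"]
result = list_with_failover(brick_a, brick_b, size=2, fail_after=1)
```

In this toy run the listing contains "a" and "b" twice and never returns "c" or "d", which is the "duplicate entries or fewer entries or both" case the commit message warns about.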
*	geo-rep / gsyncd: play nicely with peer multiplexing when setting a checkpoint	Niels de Vos	2012-12-17	1	-5/+16
	From upstream commit 01217e4e16677b13c7febc66e4e4ca3f0025739b:
	> The gsyncd invocation that instruments the "geo-rep config" command is
	> multiplexed over peers to ensure the uniformity of configuration.
	> In general, that works well, but checkpoint setting is a special case,
	> because (unlike other instances of config-set) it is logged (as recording
	> of checkpoint events is part of the feature).
	>
	> Problem is that the path components leading to the log file are
	> created only on the original node, where gsyncd was started.
	> Therefore the logging attempt will fail on the other nodes.
	>
	> Fix: ignore if opening the logfile on behalf of checkpoint setting
	> fails with ENOENT.
	>
	> Change-Id: I677f3f081bf4b9e3ba4d25d58979d86931e6beb4
	> BUG: 881997
	> Signed-off-by: Csaba Henk <csaba@redhat.com>
	> Reviewed-on: http://review.gluster.org/4248
	> Reviewed-by: Niels de Vos <ndevos@redhat.com>
	> Tested-by: Christos Triantafyllidis <ctrianta@redhat.com>
	> Reviewed-by: Christos Triantafyllidis <ctrianta@redhat.com>
	> Reviewed-by: Anand Avati <avati@redhat.com>
	Change-Id: I83b2cb7f78cf8613b78d3c8ff8e7b3828050cfc3 BUG: 881736 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/1929 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
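The fix described above amounts to tolerating one specific errno when opening the log file. A minimal sketch of that pattern follows; the function name `open_checkpoint_log` is hypothetical and is not taken from gsyncd.

```python
import errno


def open_checkpoint_log(path):
    """Open `path` for appending; return None if a path component is missing.

    Hypothetical helper: only ENOENT is swallowed, every other error
    (EACCES, EISDIR, ...) still propagates to the caller.
    """
    try:
        return open(path, "a")
    except OSError as e:  # IOError is an alias of OSError in Python 3
        if e.errno == errno.ENOENT:
            return None
        raise
```

A caller on a peer node where the log directory was never created would get `None` and skip logging the checkpoint event instead of failing the whole config-set.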
*	geo-replication: catch select.error on select()	Niels de Vos	2012-12-17	1	-2/+2
	From upstream commit 15bf92d53c72774e2fd7aba146644a2e460e543f:
	> tailer() in resource.py does not correctly catch exceptions from
	> select(). select() can raise an instance of the select.error class and
	> the current expression only catches ValueError (and the instance will
	> have reference called selecterror).
	>
	> The geo-rep log contains a call trace like this:
	> > E [syncdutils:190:log_raise_exception] <top>: FAIL:
	> > Traceback (most recent call last):
	> >   File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 216, in twrap
	> >     tf(aa)
	> >   File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 123, in tailer
	> >     poe, _ ,_ = select([po.stderr for po in errstore], [], [], 1)
	> >   File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 276, in select
	> >     return eintr_wrap(oselect.select, oselect.error, a)
	> >   File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 269, in eintr_wrap
	> >     return func(*a)
	> > error: (9, 'Bad file descriptor')
	>
	> BUG: 880308
	> Change-Id: I2babe42918950d0e9ddb3d08fa21aa3548ccf7c5
	> Signed-off-by: Niels de Vos <ndevos@redhat.com>
	> Reviewed-on: http://review.gluster.org/4233
	> Reviewed-by: Peter Portante <pportant@redhat.com>
	> Reviewed-by: Csaba Henk <csaba@redhat.com>
	> Tested-by: Gluster Build System <jenkins@build.gluster.com>
	BUG: 880308 Change-Id: Iece1f50c0064853669d1dd4a777f77f10e2fd0dc Upstream-bug: 886808 (changed after upstream merge) Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/1927 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
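The exception-handling issue can be reproduced in isolation. The sketch below is not the code from resource.py; `poll_readable` is a hypothetical helper that shows why catching only `ValueError` is insufficient when a descriptor has been closed. Note that in Python 3.3 and later `select.error` is an alias of `OSError`.

```python
import os
import select


def poll_readable(fds, timeout):
    """Return the readable descriptors, or None if the set is no longer valid.

    select() raises ValueError for a negative descriptor and
    select.error (errno 9, "Bad file descriptor") for a closed one,
    so both have to be caught.
    """
    try:
        readable, _, _ = select.select(fds, [], [], timeout)
        return readable
    except (ValueError, select.error):
        return None
```

With only `except ValueError`, the closed-descriptor case would escape as an unhandled exception, matching the traceback quoted in the commit message.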
*	fuse: have setxattr on geo-rep related xattrs take effect	Niels de Vos	2012-12-17	1	-3/+3
	From upstream commit 6e3244a131b6d25141bef0cbc59968d3271f8ea3:
	> In http://review.gluster.com/3687 setxattr was made to a noop for
	> geo-rep special clients, with the exception of some special ones,
	> relevant to geo-rep. These exceptions were all in trusted namespace.
	>
	> That's no good, because with a mountbroker (unprivileged) setup,
	> the relevant attributes are in system namespace. So here we
	> just let setxattr through for any geo-rep related xattr, regardless
	> of namespace.
	>
	> Change-Id: I261141293b7db955a2e8b2405b4510cb10a42694
	> BUG: 848447
	> Signed-off-by: Csaba Henk <csaba@redhat.com>
	> Reviewed-on: http://review.gluster.com/3821
	> Tested-by: Gluster Build System <jenkins@build.gluster.com>
	> Reviewed-by: Venky Shankar <vshankar@redhat.com>
	> Reviewed-by: Anand Avati <avati@redhat.com>
	BUG: 883827 Change-Id: I86a044d52ad3e679b21ff3832ee6536c5c6809fb Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/1925 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
*	cluster/dht: fail fix-layout if any of the subvol is down	shishir gowda	2012-12-17	5	-35/+47
	If any subvolume is down, and a layout is re-written and hash values change, entry names in the downed subvol can be reused in the other subvol which got the same hash range. When the downed subvol is brought back up, duplicate entries might appear. Also separated the handling of the ENOSPC and ENOTCONN errors.
	Change-Id: I1a49a689f6891a32128adcfb92dc46f39eaddec7 BUG: 860599 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/1898 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
*	storage/posix: if create returns EXIST, donot set gfid/xattrs	shishir gowda	2012-12-17	1	-0/+4
	Change-Id: I9f2b75b10bde428d36d6516aa09c18e590d17ed9 BUG: 864801 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/1896 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
*	protocol/client: Conditional logging in client3_1_unlink_cbk	Venkatesh Somyajulu	2012-12-17	1	-1/+4
	Change-Id: Ic6f4e276a5ab6906e4b3ad28e9b8c7eed52b3080 BUG: 861925 Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/1985 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
*	cluster/afr : Edited log message in afr_sh_entry_expunge_entry_cbk	Venkatesh Somyajulu	2012-12-17	1	-1/+2
	Change-Id: Ic5256650652416e3a043b9e4640748ce1fa50e83 BUG: 860246 Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/1986 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
*	cluster/afr Changed the message's log level from Error to Debug	Venkatesh Somyajulu	2012-12-17	1	-3/+3
	Change-Id: I64ca577839fc25952025651873ab60a2fcc3702c BUG: 859411 Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/1984 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
*	Cluster/afr: Fix output for gluster volume heal vn info healed	Venkatesh Somyajula	2012-12-17	7	-17/+53
	Problem: Whenever the "gluster volume heal vol full" command is executed, the entries stored in the circular buffer for sh->healed are added to the dictionary in the _crawl_post_sh_action function, irrespective of whether an actual self-heal (due to non-zero values in the change log) takes place or not.
	Fix: The value of the key (actual-sh-done) will be set to 1 whenever self-heal takes place due to non-zero change log values. If, for some FOP, the self-heal daemon finds that no self-heal is required after examining the pending matrix, the value will be 0.
	Change-Id: I11fd0b9ee76759af17c5bca6bfafbaf66bcaacbc BUG: 863068 Signed-off-by: Venkatesh Somyajula <vsomyaju@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/1902 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
*	afr: make flush non-transactional	Brian Foster	2012-12-17	3	-136/+38
	Flush is historically a transaction to ensure all previous writes were complete. This is no longer required as write-behind has learned to make flush a barrier operation (re: conversation w/ Avati). Flush taking a full file lock causes VMs running on afr volumes to stall when a migration occurs and self-heal is in progress. Make afr_flush() a non-transactional operation.
	BUG: 874045 Change-Id: Ie287b79e7f300df88aca6030e2d80311772746bf Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/1912 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
*	afr: use data trylock mode in read/write self-heal trigger paths	Brian Foster	2012-12-17	1	-1/+8
	Self-heal data lock contention between clients and glustershd instances can lead to long wait and user response times if the client ends up pending its lock on glustershd self-heal of a large file. We have reports of guest VM instances going completely unresponsive during self-heal of virtual disk images. Optimize the read/write self-heal trigger codepath (i.e., afr_open_fd_fix()) to trylock for self-heal and skip the self-heal otherwise, to minimize the likelihood of a running/active guest competing with glustershd on arrival of a brick. Note that lock contention is still possible from the client (e.g., via lookup).
	BUG: 874045 Change-Id: I077e2c0aaa424b80734a471284173bda8871cdc3 Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/1911 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
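The "trylock, otherwise skip" behaviour can be sketched with an ordinary mutex. This is an analogy only: AFR uses distributed inode locks rather than a local `threading.Lock`, and `maybe_self_heal` and its `block` parameter are hypothetical names chosen to echo the blocking/non-blocking flag mentioned in this series.

```python
import threading


def maybe_self_heal(lock, heal, block=False):
    """Run `heal()` under `lock`.

    With block=False the lock is only tried; if another party (for
    example a self-heal daemon) already holds it, the heal is skipped
    and False is returned instead of waiting.
    """
    if not lock.acquire(blocking=block):
        return False
    try:
        heal()
        return True
    finally:
        lock.release()
```

The trade-off is the one the commit message states: the caller in the I/O path never stalls behind a long-running heal, at the cost of leaving the heal to whoever currently holds the lock.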
*	perf/io-threads: least-rate-limit least priority throttling	Brian Foster	2012-12-17	3	-2/+91
	The 'least-rate-limit' io-threads translator option enables throttling of least priority operations. This is initially intended as a debug/diagnostic tool for users who might experience overloaded servers via background activity (i.e., self-heal).
	least-rate-limit defines the maximum number of least priority operations the io-threads translator will dequeue in one second. If the specified rate limit is met, the worker threads sleep for the minimal amount of time before the next least priority operation becomes available (or until a new request arrives).
	The requests/second metric is generic and relative to a variety of factors involved with a background operation (server, storage, etc.). The most recent measured rate ("cached least rate") is added to the io-threads state dump content (kill -USR1) to serve as a reference point to throttle background activity under particular conditions.
	[This backport drops the iot_priv_dump() bits as they do not exist in downstream.]
	BUG: 853680 Change-Id: If7d28439372a2ea1a64e92e4a4b13826840a5248 Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/1909 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
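A per-second dequeue limit of this kind can be sketched as a fixed one-second window with a counter. The class below is an illustration under stated assumptions, not the io-threads implementation: the name `LeastRateLimiter`, the explicit `now` argument (used instead of a real clock so the behaviour is deterministic), and the convention that a limit of 0 means "unlimited" are all assumptions.

```python
class LeastRateLimiter:
    """Allow at most `limit` least-priority dequeues per one-second window."""

    def __init__(self, limit, now=0.0):
        self.limit = limit          # assumed: 0 disables throttling
        self.window_start = now
        self.count = 0

    def wait_time(self, now):
        """Return 0.0 if an operation may be dequeued at time `now`,
        otherwise the number of seconds to sleep until the window resets."""
        if now - self.window_start >= 1.0:
            self.window_start = now  # start a fresh one-second window
            self.count = 0
        if self.limit == 0 or self.count < self.limit:
            self.count += 1
            return 0.0
        return self.window_start + 1.0 - now


limiter = LeastRateLimiter(limit=2)
```

A worker thread would sleep for the returned duration (or until a higher-priority request arrives) before retrying the least-priority queue.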
*	afr: support self-heal data trylock mechanism	Brian Foster	2012-12-17	4	-7/+14
	Introduce a block flag to support an optional blocking or non-blocking mode in the self-heal data locking mechanism. All callers are modified to use blocking mode, which is the current default behavior (no change in behavior is introduced by this commit).
	BUG: 874045 Change-Id: I89bd2e698bd3db898c3ad57b55cf5c38e822e136 Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/1910 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
*	Rebase commit #2	Vijay Bellur	2012-12-12	1	-0/+49
	Change-Id: Ie983d0b9862cc1401187532ed896e57bd3488e2b BUG: 871323 Reviewed-on: https://code.engineering.redhat.com/gerrit/1893 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
*	protocol/client: add an option to filter O_DIRECT flag in open	Amar Tumballi	2012-12-12	3	-2/+28
	With this option, the idea is that all client-side caching will be disabled, whereas on the server-side process the fd will be treated as a regular fd, which helps performance. "gluster volume set <VOLNAME> remote-dio enable" sets this option in the client protocol volumes.
	Change-Id: I08f3d1f6fed6da58501b5b94e5572216593c2847 Signed-off-by: Amar Tumballi <amarts@redhat.com> BUG: 856156 Reviewed-on: https://code.engineering.redhat.com/gerrit/1685 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/1885
*	storage/posix: Handle undefined symbol when aio is not available	Pranith Kumar K	2012-12-12	1	-0/+10
	Change-Id: I47b93a5e72f06bda016b5b9ab820cbc8f99fab28 BUG: 871323 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/182 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/1884
*	fuse: make the default background queue length as 512	Amar Tumballi	2012-12-12	1	-2/+2
	This should help VM hosting performance when more VMs are hosted from a single data store.
	Change-Id: I0f2df352e410e10845cfade5f27fe1b0b5b06250 Signed-off-by: Amar Tumballi <amarts@redhat.com> BUG: 859589 Reviewed-on: https://code.engineering.redhat.com/gerrit/1504 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/1883
*	glusterd: Made volume reset recognize options in <domain>.<specifier> format	Krutika Dhananjay	2012-12-12	1	-0/+11
	Change-Id: Id057606c2882584310119a1e7dd8674943857841 BUG: 866565 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/178 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/1882
*	protocols: Suppress getxattr log when errno is ENOENT	Pranith Kumar K	2012-12-12	2	-2/+6
	Change-Id: I4c170464cb9aa013588d615c2916bf87c370e9dc BUG: 861015 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/162 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/1881
*	storage/posix: Make rchecksum O_DIRECT friendly	Pranith Kumar K	2012-12-12	2	-24/+53
	Problem: When posix-aio is enabled, the fd is set with O_DIRECT whenever possible in the read and writev fops in order to perform aio. Rchecksum does not take this into account. If the offset, size, or memory buffer passed to pread in the rchecksum fop is not aligned, pread fails with EINVAL.
	Fix: Before doing pread, the necessary O_DIRECT manipulation is done when aio is enabled. The memory buffer passed to pread is now page-aligned.
	Test:
	1) Create replica volume with aio enabled.
	2) dd if=/dev/urandom of=a bs=1M count=1
	3) kill one of the bricks in the replica pair
	4) dd if=/dev/urandom of=a bs=1M count=1
	5) bring back the brick.
	Self-heal succeeds after the change. The test above checks both the rchecksum and writev fops that were changed in this patch.
	Change-Id: I5126e20ca1d6aeb71d4d66d14de277729fc8e89f BUG: 866459 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/156 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/1880
*	cluster/afr: check transaction type for eager-lock after it is set	Pranith Kumar K	2012-12-12	1	-6/+6
	Problem: The eager-locking lk-owner decision is taken before the transaction type is set. The default transaction type is DATA, so all transactions are treated as DATA transactions at the time of the eager-locking decision.
	Fix: Move the code that takes the lk-owner decision after the transaction type is set.
	Test: Checked in gdb that the transaction type is set properly at the time of the lk-owner decision.
	Change-Id: Icb1464bc572cf0be73bdd4d5803a2326b5d22655 BUG: 865321 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/85 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/1879
*	cluster/distribute: cli support for setting directory-layout-spread	shishir gowda	2012-12-12	1	-0/+1
	'gluster volume set <volname> subvols-per-directory <value>' controls across how many (value) subvolumes a directory's layout will be spread.
	Change-Id: I0aed937f6bbc66629e36b6a856432e51b180747c BUG: 865669 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/122 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/1877
*	cluster/afr: Wake up post-op on non-co-operative transaction	Pranith Kumar K	2012-12-12	1	-0/+22
	Problem: The problem is observed when a kernel untar is done: only one file is untarred every second. The reason is that the setattr lock is blocked on the previous fd data-transaction full-lock (because of eager-lock). Because of post-op-delay, the post-op (xattrop + unlock) of the previous data-transaction happens after 1 sec. Until then the setattr is blocked, resulting in performance problems in untar.
	Fix: Whenever a loc data or meta-data transaction comes, it should wake up the previous post-op on the same process' fd.
	Tests: The performance problem in untar went away. I put a breakpoint in client_finodelk for a 2G file dd and the inodelk is hit only 4 times. This confirms that the change does not affect post-op-delay negatively.
	Change-Id: I32e272727f8ea03ae8768509695bbae183aff17d BUG: 853679 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/83 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/1876
*	features/marker: use buf->ia_gfid in all the lookup callbacks	Raghavendra Bhat	2012-12-12	2	-10/+23
	In general, use buf->ia_gfid for the gfid instead of the inode's gfid in the callbacks of the fops where a new inode is created (such as create, mkdir, mknod, symlink). In the callback path the inode does not have the gfid within it if it is not yet linked to the inode table, which happens in protocol/server.
	Change-Id: Ie2e5ce6d25181e13d32c1ab99ee488a55fe64117 BUG: 848318 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/64 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/1875
*	linux-aio: fixes while setting O_DIRECT flag	Amar Tumballi	2012-12-12	3	-40/+60
	Linux AIO needs O_DIRECT to be set for effective operation. O_DIRECT in turn has constraints on when it can work (offset and size alignment). So use O_DIRECT (unless instructed by the application) only when the offset and size alignments match. Otherwise, io_submit() will happen over a non-O_DIRECT fd, effectively blocking till the completion of the IO.
	Also fix a multithreading bug where detection/setting of O_DIRECT for a request was not atomic with io_submit() of that request.
	Change-Id: I190017e8bc78217429aff0714dca224cbe6f251d BUG: 859406 Signed-off-by: Amar Tumballi <amarts@redhat.com> Reviewed-on: http://review.gluster.org/4006 Tested-by: Amar Tumballi <amarts@redhat.com> Original-Author: Anand Avati <avati@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/61 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/1874
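The per-request decision described above reduces to an alignment predicate. The sketch below assumes a 4096-byte alignment unit and a hypothetical function name, `can_use_o_direct`; the real alignment requirement depends on the filesystem and device, and the posix translator's actual check may differ.

```python
ALIGN = 4096  # assumed alignment unit; the real value is device/filesystem specific


def can_use_o_direct(offset, size, app_requested=False, align=ALIGN):
    """Decide whether a request should go through an O_DIRECT fd.

    If the application itself asked for O_DIRECT it is always honoured;
    otherwise O_DIRECT is used only when both offset and size are
    multiples of the alignment unit.
    """
    if app_requested:
        return True
    return offset % align == 0 and size % align == 0
```

For example, an 8192-byte read at offset 0 qualifies, while a 100-byte read does not and would be submitted over the non-O_DIRECT fd.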
*	Rebase commit #1	Vijay Bellur	2012-12-12	1	-0/+519
	Change-Id: Iad1acb3fb744d0a30498bddbee32e64fa0413f66 BUG: 858469 Reviewed-on: https://code.engineering.redhat.com/gerrit/1873 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
*	features/marker: if parent inode is NULL, then get it by inode_parent	Raghavendra Bhat	2012-12-12	2	-4/+13
	* If the parent inode is NULL (nameless lookups, which use the gfid for looking up the inode), then try to get it by inode_parent, instead of returning, which results in the inode's contribution not being added to the list.
	* Prevent excessive logging while adding the inode's contribution to the list if the operation fails. (Check if the inode's gfid is null, which indicates that the inode is not yet linked to the inode table and hence addition of its contribution to the list can fail.)
	BUG: 851953 Change-Id: I4539b0534894e9d9cf5036c12fbf591ecad586bb Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/35 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/138
*	cluster/dht: set conf->defrag to NULL after freeing the defrag structure	Raghavendra Bhat	2012-12-12	1	-2/+3
	Also, there is no need to free the xlator object after rebalance is over, as the process is about to be killed.
	Change-Id: Id13cc74edf367660eef96ce215878e4dac7b4ba1 BUG: 862981 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/53 Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/1872
*	logging: log ENOENT errors in DEBUG mode instead of ERROR or INFO	Raghavendra Bhat	2012-12-12	2	-2/+4
	Change-Id: I08a34e58892a8b2a2fdecc606bed8db292d36332 BUG: 851953 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/36 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/137
*	protocol/client: Remember the gfid of opened fd	Pranith Kumar K	2012-12-12	3	-112/+107
	This is needed because, when a fresh lookup triggers self-heal, the gfid won't be present in the inode yet. A similar situation happens with rebalance, as it does not perform inode_link. Added a similar fix for re-opendir. Removed inode from fdctx and removed some duplication of code.
	Change-Id: I87679df7171bc6a25c4396af3a3fc04534a65c9c BUG: 859387 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/1581 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
*	write-behind: fix off-by-one bug in wb_requests_overlap()	Anand Avati	2012-12-12	1	-7/+6
	and backport an upstream review comment
	Change-Id: If683ee051cc3bd969417d69705bd63343650b541 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/1869 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
*	performance/write-behind: Add missing memory accounting type	Vijay Bellur	2012-12-12	1	-0/+1
	Change-Id: I578b41b721d1a4aca679e637082737dfcf6a3194 Reviewed-on: https://code.engineering.redhat.com/gerrit/1867 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
*	write-behind: implement causal ordering and other cleanup	Anand Avati	2012-12-12	1	-2321/+1212
	Rules of causal ordering implemented:
	- If request A arrives after the acknowledgement (to the app, i.e., STACK_UNWIND) of another request B, then request B is said to have 'caused' request A.
	- (corollary) Two requests which, at any point of time, are unacknowledged simultaneously in the system can never 'cause' each other (wb_inode->gen is based on this).
	- If request A is caused by request B, AND request A's region has an overlap with request B's region, then the fulfillment of request A is guaranteed to happen after the fulfillment of B.
	- FD of origin is not considered for the determination of causal ordering.
	- Append operation's region is considered the whole file.
	Other cleanup:
	- wb_file_t not required any more.
	- wb_local_t not required any more.
	- O_RDONLY fd's operations now go through the queue to make sure writes in the requested region get fulfilled before getting processed.
	- O_SYNC fd's operations now go through the queue to make sure previously acknowledged writes on the file (via other fds) are fulfilled before getting processed.
	- Option to not honor O_SYNC is now removed.
	- Option to ignore O_DIRECT is added (useful when running a VM and the drive appears with NCQ/TCQ or WCE=1 for the guest).
	- Option to disable_first_nbytes is removed (as the cause of the bug which required this was diagnosed to be missing TCP_NODELAY).
	- General cleanup and better conformance to coding style and convention.
	Change-Id: Ib44fb72da3727246b4a85174cb568c2f0231f6de BUG: 857673 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/1866 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
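The first three ordering rules can be sketched with a generation counter and an interval-overlap test. Everything here is an illustrative assumption rather than write-behind's actual data structures: `Request`, `overlaps`, and `must_wait_for` are hypothetical names, and `gen` is modelled as a counter that is bumped each time a request is acknowledged, so that a request which arrives after another's acknowledgement carries a strictly larger value.

```python
from dataclasses import dataclass


@dataclass
class Request:
    offset: int
    size: int
    gen: int  # generation at arrival; hypothetical stand-in for wb_inode->gen


def overlaps(a, b):
    """True if the half-open byte ranges [offset, offset + size) intersect."""
    return a.offset < b.offset + b.size and b.offset < a.offset + a.size


def must_wait_for(a, b):
    """True if request `a` must be fulfilled after request `b`:
    `b` caused `a` (smaller generation) and their regions overlap."""
    return b.gen < a.gen and overlaps(a, b)
```

Using half-open ranges keeps adjacent requests (one ending exactly where the next begins) from being treated as overlapping, which is the kind of boundary an off-by-one in an overlap check gets wrong. Requests with equal generations were outstanding together and therefore never order each other.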
*	fuse: make the 'gid-timeout' default value as 2sec instead of 0	Amar Tumballi	2012-12-12	1	-1/+1
	Done for the performance benefits.
	Change-Id: I4788800fb911ac571c4ff636db5d09e95b335a6e BUG: 858469 Signed-off-by: Amar Tumballi <amar@gluster.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/1864 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
*	storage/posix: Option to set brick(of a volume)'s root dir's uid/gid	Krishnan Parthasarathi	2012-12-12	2	-5/+45
	Change-Id: I529d4cd949477a436a5b571b69da9f1c8b33ee8f BUG: 858469 Reviewed-on: https://code.engineering.redhat.com/gerrit/1863 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
*	fuse: make background queue length configurable	Amar Tumballi	2012-12-12	3	-3/+54
	* also make 'congestion_threshold' an option
	* make 'congestion_threshold' 75% of the background queue length if not explicitly specified
	* in glusterfsd.c, moved all the fuse option dictionary setting code to a separate function
	Change-Id: Ie1680eefaed9377720770a09222282321bd4132e Signed-off-by: Amar Tumballi <amarts@redhat.com> BUG: 845214 Reviewed-on: https://code.engineering.redhat.com/gerrit/1860 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
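The 75% default works out as simple arithmetic. The helper below is a hypothetical sketch; in particular, rounding down with integer division is an assumption, since the commit message does not say how a fractional result is handled.

```python
def congestion_threshold(background_qlen, explicit=None):
    """Return the congestion threshold for a given background queue length.

    An explicitly configured value wins; otherwise default to 75% of
    the queue length (integer division, rounding down: an assumption).
    """
    if explicit is not None:
        return explicit
    return (background_qlen * 3) // 4
```

For a background queue length of 512, this gives a threshold of 384; for 64 it gives 48.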
*	cli: Added special key "group" for bulk volume set.	Krishnan Parthasarathi	2012-12-12	1	-29/+37
	gluster volume set VOLNAME group group_name
	- where group_name is a file under /var/lib/glusterd/groups containing one key, value pair per line as below,
	  key1=value1
	  key2=value2
	  [...]
	- the command sets key1 to value1 and so on.
	Change-Id: Ic4c8dedb98d013b29a74e57f8ee7c1d3573137d2 BUG: 851237 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/1859 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
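The group-file format described above (one key=value pair per line) is easy to model. The parser below is a sketch for illustration only: `parse_group_file` and `volume_set_commands` are hypothetical helpers, not part of the gluster CLI, and the handling of blank lines and malformed lines is an assumption.

```python
def parse_group_file(text):
    """Parse 'key=value' lines into an ordered dict of options.

    Blank lines are skipped (assumption); a line without '=' is
    rejected rather than silently ignored.
    """
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("expected key=value, got %r" % line)
        options[key.strip()] = value.strip()
    return options


def volume_set_commands(volname, options):
    """Expand a parsed group into the equivalent individual set commands."""
    return ["gluster volume set %s %s %s" % (volname, k, v)
            for k, v in options.items()]
```

Expanding a group into individual `volume set` invocations is just one way to picture the effect; the commit message says only that "the command sets key1 to value1 and so on".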
*	cluster/afr: post-op-delay support	Anand Avati	2012-12-12	6	-1/+173
	post-op-delay introduces an artificial delay between the OP and POST-OP-CHANGELOG phases of a write transaction to increase the probability of changelog-piggyback and eager-locking to work more efficiently.
	Change-Id: I865ca4b68512c44818719c7e388952f15d53e6c2 BUG: 836033 Signed-off-by: Anand Avati <avati@redhat.com> Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/1858 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
*	cluster/afr: cleanup lk_owner and PID mess	Anand Avati	2012-12-12	3	-41/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Historically PID (frame->root->pid) was used by the locks translator to identify a locker (and make decisions about which locks contend or cooperate/merge). Since the introduction of the lock_owner parameter the usage of PID (for locks) was deprecated and is now unused. This patch nukes the usage of PID in AFR.

The usage of lk_owner has also ended up being a mess, because of the differentiation required between ->lk() and ->inodelk() (->lk() needs to be identified by the process (roughly) and ->inodelk() needs to be identified by the transaction), and also because of optimizations like eager locking (locks are no longer identified by the transaction as they now get inherited by the next transaction).

The scheme (and technique) now is:
- All FOPs (the third phase of the transaction) happen with the lk_owner which is set by the topmost layer (FUSE, NFS etc.)
- All entrylks are issued with lk_owner set to the frame->root address.
- Inodelks which will not be subject to eager locking are issued with lk_owner set to frame->root.
- Inodelks which are subject to eager locking are issued with lk_owner set to the address of fd_t (these are the only type of frames which are subject to the eager locking optimization).
- At the start of the transaction, the transaction frame's lk_owner is set to either frame->root or fd_t (and never left unmodified) depending on the type of transaction.
- Just before the third phase (FOP phase) the set lk_owner is "saved" away and overwritten by the lk_owner submitted by the top layer (FUSE or NFS).
- Right after the third phase, the saved lk_owner is "restored" to resume the transaction into the POST-OP and eventually UNLOCK using the same lk_owner which was used during the LOCK phase.

Change-Id: I6ab8e4d6b65ae4185fa85ad3fded8e9188b2f929 BUG: 836033 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/1857 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
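The save/restore step described in the commit message above can be sketched roughly as follows. This is only an illustration under stated assumptions: `lk_owner_t`, `txn_frame_t` and the three `txn_*` helpers are hypothetical names invented here, not the AFR data structures or functions touched by the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a lock owner; not the real GlusterFS type. */
typedef struct {
    uint64_t id;
} lk_owner_t;

/* Hypothetical, pared-down transaction frame. */
typedef struct {
    lk_owner_t lk_owner;       /* owner currently visible to lower layers */
    lk_owner_t saved_lk_owner; /* transaction owner parked during the FOP */
} txn_frame_t;

/* LOCK phase: the owner is the fd address for eager-lock transactions,
 * otherwise the address of the frame root. It is always set explicitly. */
static void txn_set_lock_owner(txn_frame_t *frame, const void *root,
                               const void *fd, int eager_lock)
{
    assert(frame != NULL);
    frame->lk_owner.id = (uint64_t)(uintptr_t)(eager_lock ? fd : root);
}

/* Just before the FOP phase: park the transaction owner and expose the
 * owner supplied by the top layer (FUSE, NFS, ...). */
static void txn_fop_begin(txn_frame_t *frame, lk_owner_t top_owner)
{
    assert(frame != NULL);
    frame->saved_lk_owner = frame->lk_owner;
    frame->lk_owner = top_owner;
}

/* Right after the FOP phase: restore the transaction owner so POST-OP and
 * UNLOCK use the same owner as the LOCK phase. */
static void txn_fop_end(txn_frame_t *frame)
{
    assert(frame != NULL);
    frame->lk_owner = frame->saved_lk_owner;
}
```

A caller would invoke `txn_set_lock_owner()` once at the start of the transaction and bracket the FOP with `txn_fop_begin()` and `txn_fop_end()`.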
*	glusterd: Persisted hooks friendly user.* keys	Krishnan Parthasarathi	2012-12-12	3	-33/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Fixed validation of user.* keys in presence of multiple key, value pairs in a single volume set command Change-Id: I5b96de2d009fbc79772121308d9b4c0a552bac52 BUG: 825902 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.com/3715 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/1855 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
*	glusterd-hooks: added support for separate namespace for 'volume set' keys	Krishnan Parthasarathi	2012-12-12	2	-0/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	[cherry-picked Amar's patch from master] The keys in the above mentioned namespace could be used by hook scripts to perform tasks on 'special' keys as defined by the storage admin. The choice of the key and its semantics are outside the scope of glusterd. It is the responsibility of the storage admin to keep the meaning of the key(s) consistent. If a user gives a command like 'gluster volume set <VOLNAME> user.for-this-key do-this', scripts would get 'user.for-this-key=do-this' as argument. Change-Id: I5509e17d99e4ddd8bf5df968dcd51ff9a80dc3ab Signed-off-by: Amar Tumballi <amarts@redhat.com> BUG: 825902 Reviewed-on: http://review.gluster.com/3443 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kp@gluster.com> Reviewed-by: Anand Avati <avati@redhat.com> Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/1854 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
*	libglusterfs,mount/fuse: implement gidcache mechanism in fuse-bridge	Brian Foster	2012-12-12	5	-1/+63
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change genericizes the cache mechanism implemented in commit 8efd2845 into libglusterfs/src/gidcache.[ch] and adds fuse-bridge as a client. The cache mechanism is fundamentally equivalent, with some minor changes: - Change cache key from uid_t to uint64_t. - Modify the cache add logic to locate and use an entry with a matching ID, should it already exist. This addresses a bug in the existing mechanism where an expired entry supersedes a newly added entry in lookup, causing repeated adds and flushing of a cache bucket. The fuse group cache is disabled by default. It can be enabled via the 'gid-timeout' fuse-bridge translator option and accompanying mount option (e.g., '-o gid-timeout=1' for a 1s entry timeout). BUG: 800892 Change-Id: I0b34a2263ca48dbb154790a4a44fc70b733e9114 Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/1853 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
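The "reuse a matching entry on add" fix described above can be illustrated with a small sketch. The types, the bucket width and the eviction rule below are assumptions made for the example; this is not the code in libglusterfs/src/gidcache.[ch].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUCKET_SLOTS 4 /* hypothetical bucket width */

typedef struct {
    uint64_t id;       /* cache key (uint64_t rather than uid_t) */
    uint64_t deadline; /* stale once now >= deadline; 0 marks an empty slot */
    int      value;    /* placeholder for the cached group list */
} entry_t;

typedef struct {
    entry_t slots[BUCKET_SLOTS];
} bucket_t;

static void bucket_init(bucket_t *b)
{
    memset(b, 0, sizeof(*b));
}

/* Lookup stops at the first slot whose ID matches. If a stale duplicate
 * were allowed to sit in front of a fresh one, every lookup would miss. */
static const entry_t *bucket_lookup(const bucket_t *b, uint64_t id,
                                    uint64_t now)
{
    for (size_t i = 0; i < BUCKET_SLOTS; i++) {
        const entry_t *e = &b->slots[i];
        if (e->deadline != 0 && e->id == id)
            return (now < e->deadline) ? e : NULL;
    }
    return NULL;
}

/* Add overwrites an existing slot with the same ID if there is one, so a
 * bucket never holds two entries for one ID. Otherwise it takes the slot
 * with the smallest deadline (empty slots first). */
static void bucket_add(bucket_t *b, uint64_t id, int value, uint64_t now,
                       uint64_t timeout)
{
    assert(timeout > 0); /* keeps deadline distinct from the empty marker */
    entry_t *victim = &b->slots[0];
    for (size_t i = 0; i < BUCKET_SLOTS; i++) {
        entry_t *e = &b->slots[i];
        if (e->deadline != 0 && e->id == id) {
            victim = e; /* matching ID: refresh in place */
            break;
        }
        if (e->deadline < victim->deadline)
            victim = e;
    }
    victim->id = id;
    victim->value = value;
    victim->deadline = now + timeout;
}
```

Re-adding an ID after its entry has expired refreshes the same slot, so the following lookup hits instead of triggering another add.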
*	cli: Proper xml output for "gluster peer status"	Kaushal M	2012-12-12	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \|	Change-Id: I90952ba2ea606552cf4ad67dd296a440f90592d6 BUG: 847760 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/3870 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/1852 Tested-by: Vijay Bellur <vbellur@redhat.com>
*	Self-heald: Fix inode leak	Pranith Kumar K	2012-12-12	1	-12/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	RCA: There is an inode leak because inode_link returns the linked inode by taking a reference. That needs to be unreffed. Fix: Added the code to perform unrefs. In addition to that, updated the loc inode with the linked inode because that is the best practice. The code to update the input inode's gfid can be removed later; it is already removed in master. Tests: Checked that opendir comes with a loc with a valid inode. Checked that re-opendir happens successfully. Tested that index and full self-heal work fine with the fix. BUG: 826580 Change-Id: I0c68192ff98f76152ed112b393d497b8fee93355 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.org/3518 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/1851 Tested-by: Vijay Bellur <vbellur@redhat.com>
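The reference-counting pattern behind this fix can be sketched with a toy model. Everything below is a simplified stand-in: `demo_inode_t`, `demo_loc_t` and `demo_link()` are hypothetical names that only mimic the behaviour described in the commit message (a link call that returns the linked inode with an extra reference).

```c
#include <assert.h>
#include <stddef.h>

/* Toy inode with a bare reference count. */
typedef struct {
    int refs;
} demo_inode_t;

/* Toy location that holds one reference on its inode. */
typedef struct {
    demo_inode_t *inode;
} demo_loc_t;

static demo_inode_t *demo_ref(demo_inode_t *inode)
{
    inode->refs++;
    return inode;
}

static void demo_unref(demo_inode_t *inode)
{
    assert(inode->refs > 0);
    inode->refs--;
}

/* Stand-in for a link call: returns the inode that is actually linked in
 * the table, with one extra reference owned by the caller. */
static demo_inode_t *demo_link(demo_inode_t *table_inode)
{
    return demo_ref(table_inode);
}

/* Point the loc at the linked inode, then drop the reference that the
 * link call handed back. Skipping the final unref is the leak. */
static void demo_loc_adopt_linked(demo_loc_t *loc, demo_inode_t *table_inode)
{
    demo_inode_t *linked = demo_link(table_inode);
    if (loc->inode != linked) {
        demo_inode_t *old = loc->inode;
        loc->inode = demo_ref(linked); /* the loc takes its own reference */
        if (old != NULL)
            demo_unref(old);
    }
    demo_unref(linked); /* release the reference returned by demo_link() */
}
```

Calling `demo_loc_adopt_linked()` repeatedly on the same loc leaves the reference count unchanged, which is the property the fix restores.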
*	dht/rebalance: set the correct ownership on the dst file.	shishir gowda	2012-12-12	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, the dst file created has root:root ownership until migration is completed. During this phase, open fails on the dst file if uid/gid is non-root. Setting the dst file to the correct ownership fixes the issue. Change-Id: Icfec89eb10dc866cdee38dab17695fe21174ef99 BUG: 852361 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/3862 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/1850 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
*	storage/posix: implement native linux AIO support	shishir gowda	2012-12-12	6	-6/+89
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Configurable via the CLI with the "storage.linux-aio" settable option. Backported Avati's patch http://review.gluster.org/#change,3627 BUG: 837495 Change-Id: Ia7c26f5734d34d341debd422a5c59bba31eef844 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/3849 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/1849 Tested-by: Vijay Bellur <vbellur@redhat.com>
*	cluster/dht: handle percent option for 'min-free-disk'	Amar Tumballi	2012-12-05	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* With the init option cleanups, the setting of 'conf->disk_unit' was reset, so the '%' in the option was no longer honoured. * Bring in a global check which makes the option assume a percentage as long as the value is < 100. Upstream Patch : http://review.gluster.org/3918 Signed-off-by: Amar Tumballi <amarts@redhat.com> BUG: 858488 Change-Id: I7916d69ba72f0647881062d910bae73884a1b1c7 Reviewed-on: https://code.engineering.redhat.com/gerrit/144 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
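The "treat it as a percentage when the value is below 100" rule can be sketched as below. This is a simplified illustration, not the DHT option parser: `parse_min_free_disk()` and `disk_unit_t` are hypothetical, and the sketch deliberately accepts only a plain number with an optional trailing '%' (no size suffixes).

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { UNIT_PERCENT, UNIT_BYTES } disk_unit_t;

/* Parse a 'min-free-disk' style value. Returns 0 on success, -1 on error.
 * An explicit '%' always means percent; a bare number is taken as a
 * percentage when it is below 100 and as a byte count otherwise. */
static int parse_min_free_disk(const char *s, double *value,
                               disk_unit_t *unit)
{
    assert(s != NULL && value != NULL && unit != NULL);

    char *end = NULL;
    double v = strtod(s, &end);
    if (end == s || v < 0)
        return -1; /* no digits, or a negative value */

    if (*end == '%') {
        if (end[1] != '\0' || v > 100)
            return -1; /* trailing junk, or more than 100 percent */
        *unit = UNIT_PERCENT;
    } else if (*end == '\0') {
        *unit = (v < 100) ? UNIT_PERCENT : UNIT_BYTES;
    } else {
        return -1; /* unsupported suffix in this sketch */
    }

    *value = v;
    return 0;
}
```

With this rule, "10%" and "25" both yield a percentage, while "1048576" is read as a byte count.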
*	mount/fuse: readdir() should return 32-bit inodes when 'enable-ino32' is used	Niels de Vos	2012-11-19	3	-4/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	From upstream commit 9cc24de746ce0e616fa09120b89aaa9a626f33cb: > The glusterfs mount option 'enable-ino32' does not change the behaviour > of readdir(). fuse_readdir_cbk() uses entry->d_ino directly, and this > was missed in commit c13823bd16b26bc471d3efb15f63b76fbfdf0309. > > By adding the function gf_fuse_fill_dirent(), the fuse_dirent structure > is filled in a similar way as the fuse_attr structure. This helper uses > the same function to squash the 64-bit inode in a 32-bit attribute. > > Change-Id: Ia20e7144613124a58691e7935cb793b6256aef79 > BUG: 850352 > URL: http://lists.nongnu.org/archive/html/gluster-devel/2012-09/msg00051.html > Tested-by: Steve Bakke <sbakke@netzyn.com> > Signed-off-by: Niels de Vos <ndevos@redhat.com> > Reviewed-on: http://review.gluster.org/3955 > Tested-by: Gluster Build System <jenkins@build.gluster.com> > Reviewed-by: Brian Foster <bfoster@redhat.com> > Reviewed-by: Anand Avati <avati@redhat.com> > Signed-off-by: Niels de Vos <ndevos@redhat.com> BUG: 876679 Change-Id: I0d6514fa6d118805b66cb942d94f40bb09045326 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/1586 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>